This tool aligns reads back to a reference genome. It is also my final project for CSE 185 Advanced Bioinformatics Lab. Won Best Presentation Award among 60+ student projects.

WillardFord/wf-align-CSE185

wf-align (CSE185 Project Demo)

This is a demonstration project for CSE185. It implements a smaller, simpler version of bwa-backtrack. See the BWA page for more details. The materials I submitted for the course, which summarize this project, are in the final-project-files directory.

Install instructions

Installation doesn't require any additional libraries.

Navigate to the directory in which you would like to download this tool and use the following command:

git clone https://github.com/WillardFord/wf-align-CSE185

Change into the cloned directory and install wf-align so that it can be run from the command line:

cd wf-align-CSE185
python setup.py install

Note: if you do not have root access, add the --user option to install locally:

python setup.py install --user

If the install succeeded, running wf-align --help should print a usage message.

Basic usage

The basic usage of wf-align is:

wf-align reference.fa reads.fq [-o output.sam] [other options]

To run wf-align on a small test example (using files in this repo):

wf-align example-files/test_reference.fa example-files/test_reads.fastq 

This should produce the output below:

@HD VN:1.6 SO:unknown
SEQ_ID_1        0       chrTEST 3       255     10M     0       0       10      CTAGCTACGT      FFFFFFFFFF
SEQ_ID_2        0       chrTEST2        1       255     10M     0       0       10      TAGCTAGGTT      HHHHHHHHHH
SEQ_ID_3        0       chrTEST 57      255     8M      0       0       8       GCTAGCAT        HHHHHHHH

wf-align options

wf-align requires two inputs: a reference FASTA file and a FASTQ file containing reads. Users may additionally specify the options below:

  • -o FILE, --output FILE: Write output to file. By default, output is written to stdout.

  • -m FILE, --metrics FILE: Write metrics to file. By default, metrics are written to {cur_time}_wf_align_metrics.txt where cur_time is the result of time.time() at the end of alignment.
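The interface described above can be sketched with argparse. This is only an illustration under assumptions: the README does not say how wf-align parses its arguments, and `build_parser` and `default_metrics_path` are hypothetical names, not functions from this repository. Only the option names and the default metrics filename pattern come from the text above.

```python
import argparse
import time


def build_parser():
    """Hypothetical parser mirroring the documented wf-align interface."""
    parser = argparse.ArgumentParser(
        prog="wf-align",
        description="Align reads to a reference genome (illustrative sketch).",
    )
    # The two required positional inputs.
    parser.add_argument("reference", help="reference FASTA file")
    parser.add_argument("reads", help="FASTQ file containing reads")
    # Optional outputs; None stands for the documented default behaviour.
    parser.add_argument("-o", "--output", metavar="FILE", default=None,
                        help="write output to FILE (default: stdout)")
    parser.add_argument("-m", "--metrics", metavar="FILE", default=None,
                        help="write metrics to FILE")
    return parser


def default_metrics_path(now=None):
    """Build the default metrics filename from a time.time() value."""
    cur_time = time.time() if now is None else now
    return f"{cur_time}_wf_align_metrics.txt"


if __name__ == "__main__":
    args = build_parser().parse_args()
    metrics_path = args.metrics or default_metrics_path()
    print(f"output: {args.output or 'stdout'}, metrics: {metrics_path}")
```

Because the default name embeds a timestamp, every run without -m writes a new metrics file; passing -m explicitly keeps the location predictable.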

SARS-CoV-2 Example

I've benchmarked wf-align against BWA-MEM using a SARS-CoV-2 reference genome in the benchmark file.

File format

The output is a SAM file, the same format that bwa mem produces. See: https://samtools.github.io/hts-specs/SAMv1.pdf
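For downstream scripting, an alignment line can be split into the 11 mandatory SAM fields. The sketch below is a minimal reader, not part of wf-align; `SamRecord` and `parse_sam_line` are hypothetical names, and it assumes the fields are tab-separated as the SAM specification requires.

```python
from typing import NamedTuple, Optional


class SamRecord(NamedTuple):
    """The 11 mandatory SAM fields, in specification order."""
    qname: str
    flag: int
    rname: str
    pos: int      # 1-based leftmost mapping position
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str


def parse_sam_line(line: str) -> Optional[SamRecord]:
    """Parse one SAM line; return None for header lines starting with '@'."""
    if line.startswith("@"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 fields, got {len(fields)}")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10],
    )


if __name__ == "__main__":
    # First record from the test example in Basic usage.
    line = "SEQ_ID_1\t0\tchrTEST\t3\t255\t10M\t0\t0\t10\tCTAGCTACGT\tFFFFFFFFFF"
    record = parse_sam_line(line)
    print(record.qname, record.rname, record.pos, record.cigar)
```

Any optional fields after the eleventh are ignored here; a full parser such as pysam is a better choice for real pipelines.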

Methodology

I used a modified version of the BWA-backtrack algorithm that runs in O(length of reference genome * length of reads) time using O(length of reference genome) space. For more details, refer to the Methods section of the project report in the final-project-files directory.
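For readers new to the underlying idea, here is a minimal sketch of exact-match backward search over a Burrows-Wheeler transform, the core operation that BWA-backtrack builds on. It is not the code in this repository and does not reproduce the modifications described in the project report; all function names are hypothetical. It counts occurrences by scanning the BWT directly, which costs O(reference length) per read character and is one way a bound of the shape stated above can arise. The suffix array is built by naive sorting for brevity, so this is suitable only for small inputs.

```python
def build_suffix_array(text):
    """Naive suffix array: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text, sa):
    """BWT is the character preceding each sorted suffix (wraps to '$')."""
    return "".join(text[i - 1] for i in sa)


def backward_search(read, bwt, first_occ):
    """Return the half-open suffix-array interval [lo, hi) matching read."""
    lo, hi = 0, len(bwt)
    for ch in reversed(read):
        if ch not in first_occ:
            return 0, 0
        # Occurrence counts are found by scanning the BWT prefix: O(n) each.
        lo = first_occ[ch] + bwt.count(ch, 0, lo)
        hi = first_occ[ch] + bwt.count(ch, 0, hi)
        if lo >= hi:
            return 0, 0
    return lo, hi


def find_exact(reference, read):
    """Return sorted 1-based positions where read occurs exactly."""
    text = reference + "$"  # '$' sorts before A, C, G, T
    sa = build_suffix_array(text)
    bwt = bwt_from_suffix_array(text, sa)
    # Rank of the first suffix starting with each character.
    first_occ = {}
    for rank, start in enumerate(sa):
        first_occ.setdefault(text[start], rank)
    lo, hi = backward_search(read, bwt, first_occ)
    return sorted(sa[k] + 1 for k in range(lo, hi))


if __name__ == "__main__":
    print(find_exact("ACGTACGT", "CGT"))  # [2, 6]
```

The full BWA-backtrack algorithm extends this search to tolerate mismatches and gaps by exploring alternative characters at each step; that part is omitted here.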

Sources

Available in final-project-files/Project-Report.pdf

Contributors

This repository was generated by Willard Ford, with inspiration from the CSE 185 Example Repository and the work of my fellow students.

Please submit a pull request with any corrections or suggestions.
