- Download Data
- SRR13577846
- Quality Assesment using FastQC V0.11.9
fastqc --threads 2 Data/SRR13577846.fastq.gz --outdir fastqc_raw/
- We have two Summary report statistics that fail: Per base sequence content, which shows expected variation for the first 10 zoomed in basepairs, and realy large differences in the basepairs from 13000-13499. A reason for this could be that only a couple to one read of this length exists, thus making the distribution not as average. The end contains only Gs, highly repetitive region? The other failed statistic is Per sequence GC content. The main peak we see is twice the size of the expected one at 38%. We see two smaller peaks at around 17% and 44%. These might be worth investigating, as they may stem from contaminets.
- DeNovo Assembly with Hifiasm V0.18.7
hifiasm -o hifisam_result/SRR13577846.asm -t 10 Data/SRR13577846.fastq.gz
- Convert resulting .gfa files to fasta:
awk '/^S/{print ">"$2;print $3}' SRR13577846.asm.bp.hap2.p_ctg.gfa > SRR13577846.asm.bp.hap2.p_ctg.fa```
- QC with Quast V5.2.0
- Search for Reference Genome on SGD
http://sgd-archive.yeastgenome.org/sequence/S288C_reference/
by downloading latest genome releases, selecting the latest (RefSeq assembly GCF_000146045.2), and Downloading it.
quast hifisam_result/SRR13577846.asm.bp.hap1.p_ctg.fa -o quast/hap1/ -r Data/GCF_000146045.2_R64_genomic.fna.gz quast hifisam_result/SRR13577846.asm.bp.hap2.p_ctg.fa -o quast/hap2/ -r Data/GCF_000146045.2_R64_genomic.fna.gz
- This results in all of the repotrs found in the quast dir. Most natably, for the first/second haploid: 51/26 contigs, largest contig of 1506376/1410597, total length 12737440/11724748, N50 809047/809047, NG50 809047/809047, misassemblies 118/96
- Search for Reference Genome on SGD
- QC with BUSCO V5.4.4
- Select linage database saccharomycetes_odb10, for concrete comparison.
busco -i hifisam_result/SRR13577846.asm.bp.hap1.p_ctg.fa -l saccharomycetes_odb10 -m genome -o busco_result/hap1 -f busco -i hifisam_result/SRR13577846.asm.bp.hap2.p_ctg.fa -l saccharomycetes_odb10 -m genome -o busco_result/hap2 -f
- Main results for hap1 are: Complete 2129(99.7%), fragmented 2 (0.1%), missing 6 (0.2%) and total BUSCOs 2137. For hap2: Complete 2039, fragmented 3, missing 98 and total BUSCOs 2137
-
Notifications
You must be signed in to change notification settings - Fork 0
mjheid/denovo_exc
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
About
No description, website, or topics provided.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published