Index error with Salmon v1.0 #451

Open
chilampoon opened this issue Nov 20, 2019 · 5 comments
Labels
fixed in develop: this bug has been fixed in develop and the issue will be closed when merged into master
SA: Issue related to Selective Alignment

Comments

@chilampoon

Is the bug primarily related to salmon (bulk mode) or alevin (single-cell mode)?
salmon

Describe the bug
I want to quantify the expression of 3′ UTR isoforms using QAPA. One step in that package indexes the 3′ UTR FASTA file with Salmon, but the index build fails:

salmon index -t output_sequences.fa -i utr_library

Version Info: This is the most recent version of salmon.
index ["utr_library"] did not previously exist  . . . creating it
[2019-11-20 19:50:23.102] [jLog] [info] building index
out : utr_library
[2019-11-20 19:50:23.102] [puff::index::jointLog] [info] Running fixFasta

[Step 1 of 4] : counting k-mers
counted k-mers for 40000 transcripts
[2019-11-20 19:50:26.017] [puff::index::jointLog] [info] Replaced 0 non-ATCG nucleotides
[2019-11-20 19:50:26.017] [puff::index::jointLog] [info] Clipped poly-A tails from 86 transcripts
wrote 41857 cleaned references
seqHash 256 : 53fb8234c46e608b5ffa1f70869f5705573b3f671f35cbc2490ac78dd90e917d
seqHash 512 : 87b7752997ca977ff56d02f69857a32f13b3c39a0a084c72feaa2c97e698b9b04d80a88c6755b97aede5604b89fdf66789a14f7976a89597a7832760a47e8919
nameHash 256 : 54e47ff5eb21b38ef24c8ffa3fc2a192ee5d9c0541bc6ee2da9414ecbd0f8c59
nameHash 512 : 163b337219cfd19b0c4c99cece12c2c2b760b3bf7e4686dbe633259c78552a56f2f015f18740a33c18e0f14c5f362997395c3168590f3ad80704071cabfab13a
[2019-11-20 19:50:26.273] [puff::index::jointLog] [info] Filter size not provided; estimating from number of distinct k-mers
[2019-11-20 19:50:27.059] [puff::index::jointLog] [info] ntHll estimated 34379504 distinct k-mers, setting filter size to 2^30
Threads = 2
Vertex length = 31
Hash functions = 5
Filter size = 1073741824
Capacity = 2
Files: 
utr_library/ref_k31_fixed.fa
--------------------------------------------------------------------------------
Round 0, 0:1073741824
Pass	Filling	Filtering
1	17	42	
2	2	0
True junctions count = 102593
False junctions count = 122933
Hash table size = 225526
Candidate marks count = 1387640
--------------------------------------------------------------------------------
Reallocating bifurcations time: 0
True marks count: 1100523
Edges construction time: 13
--------------------------------------------------------------------------------
Distinct junctions = 102593

approximateContigTotalLength: 29519449
counters:
13519 5 4 5
ERROR!! DOESN'T SUPPORT STRING LENGTH LONGER THAN 255. String length: 317

The output_sequences.fa file was generated with:

qapa fasta -f genome/hg38/hg38.fa /qapa/qapa_3utrs.gencode_V31.hg38.bed output_sequences.fa

The bed file is the pre-compiled annotation file from QAPA.

Desktop (please complete the following information):

  • OS: Ubuntu Linux
  • Ubuntu 18.04.2 LTS
@rob-p
Collaborator

rob-p commented Nov 20, 2019

Thanks for the report, @chilampoon. I think this is due to an arbitrary (but fixable) limit on the length of identifier names in the input. @fataltes, can you look into this and fix it in the upstream indexing code? We'll ping back here once we have a fix. If you need something immediately, I believe version 0.15.0 does not have this issue.
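To confirm that over-long identifiers are what triggers the error above, one could scan the FASTA headers before indexing. The following is a minimal sketch, not part of Salmon or QAPA: the function name `find_long_names` is hypothetical, and it assumes the limit applies to the whole header line after `>` (the thread does not say whether only the text up to the first whitespace is counted).

```python
from typing import Iterable, Iterator, Tuple

# Limit quoted in the error message; how Salmon measures the name is an assumption.
MAX_NAME_LEN = 255


def find_long_names(
    lines: Iterable[str], limit: int = MAX_NAME_LEN
) -> Iterator[Tuple[int, int]]:
    """Yield (line_number, name_length) for FASTA headers longer than `limit`."""
    for line_number, line in enumerate(lines, start=1):
        if line.startswith(">"):
            # Drop the leading '>' and the trailing newline before measuring.
            name = line[1:].rstrip("\r\n")
            if len(name) > limit:
                yield line_number, len(name)
```

Passing an open file handle for output_sequences.fa to `find_long_names` would list the offending header lines together with their lengths.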

@boiscat

boiscat commented Nov 26, 2019

I get the same error when building a 3′ UTR index, and I'm looking forward to the next version.

@k3yavi
Member

k3yavi commented Nov 26, 2019

Hi guys,

I think (though I'm not completely sure) that one workaround while pufferfish is being fixed would be to rename the targets in the input FASTA so that each name is shorter than 255 characters but still unique to its sequence.
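The renaming workaround above could be sketched roughly as follows. This is an illustration under assumptions, not an official tool: the function `shorten_names` and the `prefix_0000001` naming scheme are hypothetical, and every header is replaced (not only the long ones) so that uniqueness is guaranteed by construction. The returned mapping lets the original names be restored in any downstream output that needs them.

```python
from typing import Dict, Iterable, List, Tuple


def shorten_names(
    lines: Iterable[str], prefix: str = "seq"
) -> Tuple[List[str], Dict[str, str]]:
    """Replace every FASTA header with a short, unique ID.

    Returns the rewritten lines and a dict mapping short ID -> original name.
    """
    renamed: List[str] = []
    mapping: Dict[str, str] = {}
    for line in lines:
        if line.startswith(">"):
            # A running counter keeps the IDs unique and well under 255 characters.
            short_id = f"{prefix}_{len(mapping) + 1:07d}"
            mapping[short_id] = line[1:].rstrip("\r\n")
            renamed.append(f">{short_id}\n")
        else:
            # Sequence lines are passed through unchanged.
            renamed.append(line)
    return renamed, mapping
```

The rewritten lines would be saved to a new FASTA file for `salmon index`, and the mapping saved alongside it (for example as a two-column table) so that the short IDs in the quantification output can be translated back to the original names.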

@rob-p
Collaborator

rob-p commented Nov 26, 2019

Right; this has been fixed upstream, and the limitation will be removed in the next release. As @k3yavi says, one option is to rename the reference sequences in the input so that each name has length <255. The other option is to use the 0.15.0 release, which does not have this limitation, until the next release fixes this in the pufferfish-based index.

@chilampoon
Author

Got it, thank you all!

@k3yavi added the "SA" (Issue related to Selective Alignment) and "fixed in develop" (this bug has been fixed in develop and the issue will be closed when merged into master) labels on Nov 29, 2019