salmon 1.2.0 index size with Rnor data using SA method #505
Comments
No; this is not expected. Can you list the contents of the index directory? |
Hi @rob-p Salmon index command used
Directory size after indexing completes
Listing of files and their sizes
|
So, I am still very surprised by the 45G number, but one big difference here is that the refgenomes indices are with the default value of k ( |
Ok - I will give this a try with Will update soon |
When I try it this way, the program exits
These are the last few lines
|
I am a bit puzzled that salmon indexing runs fine with a |
Hi @tamuanand, Ok, it seems to be something simple with the preparation of the decoys.txt file. I'm looking into it. If you watch the log, you see the following output before the (intentional) exit with status code 1:
--Rob |
Ok, for some reason it's not finding these two particular decoys (even though they do seem to be in the gentrome file):
still looking further. |
Ok, I figured it out :) — these two decoy references are (1) identical with each other and (2) collide with another decoy reference. Currently, the way we process decoys, we don't allow duplicate decoys (it makes even less sense to allow duplicate decoys than to allow duplicate transcripts). However, the reason indexing worked with However, for the time being, I think the best thing to do is simply to remove |
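One way to spot this kind of collision before indexing is to group the records of the gentrome FASTA by sequence content. The following is a minimal sketch, not salmon's own check: `read_fasta` and `find_duplicate_sequences` are hypothetical helper names, and the sketch assumes that "duplicate" means an exact, case-insensitive match of the full sequence.

```python
import hashlib
from collections import defaultdict


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token of the header.
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_duplicate_sequences(lines):
    """Return groups of record names whose sequences are identical.

    Assumption: duplicates are exact matches after upper-casing; this is
    an illustration, not the comparison salmon performs internally.
    """
    groups = defaultdict(list)
    for name, seq in read_fasta(lines):
        # Hash the sequence so whole chromosomes need not be kept as dict keys.
        digest = hashlib.sha256(seq.upper().encode("ascii")).hexdigest()
        groups[digest].append(name)
    return [names for names in groups.values() if len(names) > 1]
```

Passing an open file handle on the gentrome file to `find_duplicate_sequences` returns one list of names per set of identical sequences, which can then be compared against the entries in decoys.txt.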
thanks for looking into it @rob-p Just curious, how did it work without issues when I had |
@rob-p Looks like we both posted at the same time 🤔 you answered my question on why it worked with |
The key flag there is |
Ok, the indexing completed in just under 20 minutes of real time (with 12 threads).
|
Thanks @rob-p I will close this |
Hi @rob-p and @tamuanand - I ran into the same issue with the Rnor6.0 reference from Ensembl, so thanks for the tips and solutions here on the two duplicate sequences! |
Hi @tamuanand and @uros-sipetic, Thanks for the feedback on this! I just cut v1.2.1 which "fixes" the behavior. It will simply discard any duplicate decoy sequences, which resolves this problem without requiring manual intervention. |
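For readers who want a picture of what "discard any duplicate decoy sequences" could mean, here is a rough sketch. It is not the code in salmon 1.2.1: the function name `drop_duplicate_decoys`, the choice to keep the first occurrence, and the exact-match comparison are all assumptions made for illustration. Records are given as (name, sequence) pairs already parsed from the FASTA file.

```python
def drop_duplicate_decoys(records, decoy_names):
    """Filter out decoys whose sequence repeats an earlier decoy.

    Returns (kept, dropped): `kept` is the list of (name, seq) records that
    survive, `dropped` is the list of discarded decoy names.

    Assumptions (hypothetical, not salmon's implementation): the first
    occurrence of each decoy sequence is kept, matching is exact after
    upper-casing, and non-decoy records pass through untouched.
    """
    decoys = set(decoy_names)
    seen = set()
    kept, dropped = [], []
    for name, seq in records:
        if name in decoys:
            key = seq.upper()
            if key in seen:
                # Same sequence as an earlier decoy: discard this record.
                dropped.append(name)
                continue
            seen.add(key)
        kept.append((name, seq))
    return kept, dropped
```

With 1.2.0, the equivalent manual step is to remove the names in `dropped` from both the gentrome file and decoys.txt before indexing.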
@rob-p Thanks for releasing 1.2.1 Do any of the new fixes affect Why the question: I have already started using 1.2.0 for different |
No, there are no changes here. Further, indices built from version 1.0.0 are forward-compatible up through the current release. There is no need to rebuild any indices. Also, though 1.2.1 added a new flag, it made no changes to defaults, so quantifications between 1.2.0 and 1.2.1 are directly comparable. |
Hi
I am using salmon 1.2.0 to build a Salmon-SAF index on rat data from Ensembl, using the full genome as decoy.
After I build the SA index, the directory size is 45 GB. The same index built with v1.1.0 on the RefGenomes site is about 15 GB: http://refgenomes.databio.org/v2/asset/rn6/salmon_sa_index/splash?tag=default
Is this expected?