Change the flag filtering default to include PCR duplicates #13

Closed
GWW opened this issue Jul 10, 2020 · 4 comments

Comments

@GWW

GWW commented Jul 10, 2020

From my cursory understanding of cellSNP, you filter all reads that Cell Ranger marks as PCR duplicates. However, this removes a large number of UMI duplicates, as noted in the VarTrix documentation:

ignore alignments marked as duplicates? Take care when turning this on with scRNA-seq data, as duplicates are marked in that pipeline for every extra read sharing the same UMI/CB pair, which will result in most variant data being lost.

I wouldn't be surprised if this hurts the performance of Vireo on some datasets.
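To make the scale of the loss concrete, here is a minimal Python sketch. It assumes, as the VarTrix note describes, that the pipeline keeps the first read of each cell barcode/UMI pair and marks every later read sharing that pair with the SAM duplicate bit (0x400). The read records and the function name `mark_umi_duplicates` are hypothetical; in a Cell Ranger BAM the barcode and UMI would come from the `CB` and `UB` tags.

```python
from typing import List, Tuple

# Hypothetical reads: (read_name, cell_barcode, umi).
# In a Cell Ranger BAM these would come from the CB and UB tags.
Read = Tuple[str, str, str]

DUPLICATE_BIT = 0x400  # SAM FLAG bit: "PCR or optical duplicate"


def mark_umi_duplicates(reads: List[Read]) -> List[int]:
    """Return a FLAG value per read, setting the duplicate bit on every
    read after the first one seen for its (cell barcode, UMI) pair.
    This mimics the marking behaviour described in the VarTrix note;
    it is a sketch, not Cell Ranger's actual implementation."""
    seen = set()
    flags = []
    for _name, cb, umi in reads:
        key = (cb, umi)
        flags.append(DUPLICATE_BIT if key in seen else 0)
        seen.add(key)
    return flags


reads = [
    ("r1", "AAAC", "UMI1"),
    ("r2", "AAAC", "UMI1"),
    ("r3", "AAAC", "UMI1"),
    ("r4", "AAAC", "UMI2"),
    ("r5", "TTTG", "UMI1"),
    ("r6", "TTTG", "UMI1"),
]
flags = mark_umi_duplicates(reads)
# Count reads that would survive a filter dropping duplicate-marked reads
kept = sum(1 for f in flags if not (f & DUPLICATE_BIT))
print(flags)  # [0, 1024, 1024, 0, 0, 1024]
print(f"{kept} of {len(reads)} reads survive duplicate filtering")
```

Only one read per UMI/CB pair survives, so any additional reads covering a SNV position in that molecule are discarded along with the duplicates.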

@hxj5
Collaborator

hxj5 commented Jul 14, 2020

Hi, thanks for your feedback. If reads sharing the same UMI/CB pair are marked as PCR duplicates by Cell Ranger, cellSNP will filter them when the maxFLAG parameter is set to a small value. Because we found some test datasets on which Cell Ranger did not perform this marking, we will run Cell Ranger on a few more datasets, particularly with its default parameters.
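A minimal sketch of why a small maxFLAG drops duplicates, assuming cellSNP keeps a read only when its SAM FLAG is less than or equal to maxFLAG (the exact comparison and default values should be checked against the cellSNP README). Because the duplicate bit is 0x400 (1024), any threshold below 1024 excludes every duplicate-marked read, while a larger threshold lets them through. The function names and the two threshold values below are illustrative, not cellSNP's own code.

```python
# SAM FLAG bits (from the SAM specification)
REVERSE = 0x10
SECONDARY = 0x100
DUPLICATE = 0x400


def passes_max_flag(flag: int, max_flag: int) -> bool:
    """Hypothetical filter: keep a read only if its FLAG does not exceed max_flag."""
    return flag <= max_flag


def is_duplicate(flag: int) -> bool:
    """True if the SAM duplicate bit (0x400) is set."""
    return bool(flag & DUPLICATE)


flags = [0, REVERSE, DUPLICATE, REVERSE | DUPLICATE, SECONDARY]

for max_flag in (255, 4096):  # illustrative thresholds only
    kept = [f for f in flags if passes_max_flag(f, max_flag)]
    dups_kept = sum(is_duplicate(f) for f in kept)
    print(f"maxFLAG={max_flag}: kept {len(kept)} of {len(flags)} reads, "
          f"{dups_kept} of them duplicates")
```

Under these assumptions, the threshold of 255 keeps 2 of the 5 reads and no duplicates, while 4096 keeps all 5, including both duplicates. A single numeric ceiling is a blunt instrument: at 255 the secondary alignment (0x100) is excluded as well.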

@GWW
Author

GWW commented Jul 14, 2020

Alright, if you think filtering them is the more reasonable choice. It may be worth stating explicitly in the manual that they are filtered by default and that this can increase the SNV false-negative rate, which has been the case in my experience.

GWW closed this as completed Jul 14, 2020
@hxj5
Collaborator

hxj5 commented Jul 20, 2020

We have changed the default value of maxFLAG so that PCR duplicates are included for scRNA-seq data when UMItag is turned on, and we have documented this in the README file. Thanks for your advice!

@GWW
Author

GWW commented Jul 20, 2020

No problem. I am glad you made the change; in our experience, Vireo has performed extremely well using all of the reads, including PCR duplicates.
