False indel-derived offtargets #67

Open
thomas-davis opened this issue Sep 10, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@thomas-davis

thomas-davis commented Sep 10, 2024

Describe the bug

I suspect CRISPRme occasionally reports indel-derived off-targets that are false, i.e. that are not actually generated by the reported indel.

As part of an analysis, I generate oligos representing variant-containing off-target sequences with surrounding genomic context.
To do this, I extract the reference sequence and surrounding context for off-target sites reported by CRISPRme, and then introduce the reported variant into the sequence. 95-99% of the time I am able to find the reported off-target in this extracted sequence. Rarely, however, for some indel-associated off-targets, I am unable to find a sequence that matches the reported off-target.
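
The check described above can be sketched roughly as follows. This is only an illustration, not the notebook code: the helper names (`apply_variant`, `contains_off_target`) are hypothetical, coordinates are assumed to be 1-based as in VCF, and the reported off-target is assumed to mark bulges with `-` characters, which are stripped before searching. Both strands are searched because the off-target may lie on the reverse strand.

```python
def apply_variant(ref_seq, window_start, pos, ref, alt):
    """Apply one VCF-style variant (SNP or indel) to an extracted window.

    window_start and pos are 1-based genomic coordinates (assumed convention).
    """
    offset = pos - window_start
    if offset < 0 or offset + len(ref) > len(ref_seq):
        raise ValueError("variant falls outside the extracted window")
    if ref_seq[offset:offset + len(ref)].upper() != ref.upper():
        raise ValueError("REF allele does not match the extracted sequence")
    # Replace the REF allele with the ALT allele
    return ref_seq[:offset] + alt + ref_seq[offset + len(ref):]


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a plain A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def contains_off_target(alt_seq, reported_target):
    """Check whether the reported off-target occurs on either strand."""
    # Assumption: '-' marks a bulge gap in the reported alignment
    query = reported_target.replace("-", "").upper()
    haystack = alt_seq.upper()
    return query in haystack or reverse_complement(query) in haystack


# Toy example: a 2-bp deletion (CCC -> C) at position 104
window = "AAACCCGGGTTT"  # extracted reference, starting at position 101
alt_window = apply_variant(window, 101, 104, "CCC", "C")
print(alt_window, contains_off_target(alt_window, "AC-GG"))
```

A "false" indel-derived off-target in the sense of this report is one where `contains_off_target` returns `False` even after the reported indel has been applied to the correct window.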

I can't send explicit examples unfortunately because they involve proprietary spacers, but I have observed this same bug across two versions of crisprme (v2.1.5 and v2.1.1). Is this a known bug? What other information would be helpful in tracking this down? I'm happy to disclose the code I'm using to add variants to the extracted sequence. If you have outputs for a public spacer I can take a look at whether the same bug is present and give concrete examples. The bug is preventing us from being able to use crisprme for our application.

To Reproduce
Ran crisprme via the command line using HG38, 1000G + HGDP.

--genome Genomes/full_renamed \
--vcf 1000G_and_hgdp_list_vcf.txt \
--samplesID list_samplesID.txt \
--guide THE_NAME_OF_OUR_SPACER_spacer.txt \
--pam PAMs/20bp-FOO-CAS.txt \
--annotation Annotations/encode+gencode.hg38.bed \
--gene_annotation Annotations/encode+gencode.hg38.bed \
--mm 6 \
--bDNA 2 \
--bRNA 2 \
--bMax 2 \
--merge 3 \
--output THE_NAME_OF_OUR_SPACER \
--thread 64
  1. Spacer sequences
    I can't disclose this, apologies. Observed across two different spacers. If need be, I can try to reproduce with a public spacer.

  2. Cas protein
    Also cannot disclose this. Effect observed across two different Cas proteins.

  3. PAM
    Also cannot disclose this. Effect observed across two different PAMs

  4. Genome
    HG38

  5. Variants dataset (OPTIONAL)
    HGDP + 1000G

  6. Thresholds
    Mismatches: 6
    DNA Bulges: 2
    RNA Bulges: 2

Expected behavior
Indels should generate the expected off-target sequence

Screenshots
If running CRISPRme via website, add screenshots to help explain your problem.

Environment (please complete the following information, ONLY applicable if running CRISPRme via command line):

  • Python version 3.9.15
  • CRISPRme version 2.1.5 and 2.1.1
  • CRISPRitz version 2.6.6
  • axel version 2.17.11
  • gdown version 5.2
  • numpy version 1.20.0
  • dash version 1.10.0
  • dash-bootstrap-components version 0.10
  • dash-core-components version 1.9.0
  • dash-daq version 0.4.0
  • dash-html-components version 1.0.3
  • dash-renderer version 1.3.0
  • dash-table version 4.6.2
  • flask version 1.1.3
  • flask-caching version 1.7.1
  • flask-compress version 1.5.0
  • fontconfig version 2.13.1
  • freetype version 2.10.1
  • future version 0.18.2
  • gettext version 0.19.8.1
  • gunicorn version 20.0.4
  • werkzeug version 1.0.1
  • pandas version 1.2.5

Thank you in advance for your help!

@thomas-davis thomas-davis added the bug Something isn't working label Sep 10, 2024
@ManuelTgn
Contributor

Hi @thomas-davis,

Thank you for bringing this issue to our attention. Could you kindly share the code you’re using to add variants to the sequences? This will help us replicate the behavior in CRISPRme using a public spacer, so we can investigate if the issue persists in a broader context.

Best,
Manuel

@ManuelTgn
Contributor

Hi @thomas-davis,

I am writing to follow up on the status of the scripts that were being adapted to utilize public guides for reproducing the unexpected behaviors observed in CRISPRme. We believe that having access to these scripts could be instrumental in helping us identify the underlying causes of the inconsistencies we have encountered.

Any updates or insights you could provide would be greatly appreciated, as they may help expedite our understanding and resolution of these issues.

Additionally, I would like to let you know that version 2.1.1 has been deprecated, as it contains a known bug that affects the correct reporting of targets generated by indels. We advise using CRISPRme v2.1.5 or later to avoid this issue, although you reported similar problems even with that version.

Thank you for your time and assistance.

Best,
Manuel

@thomas-davis
Author

thomas-davis commented Sep 16, 2024

Hey @ManuelTgn, apologies, I'll get back to you as soon as I can.

We were also wondering which public spacer you planned to use for CRISPRme testing; ideally we'd like to run it on our end as well to test our infrastructure. Is it the one listed in sg1617.txt?

@lucapinello

lucapinello commented Sep 16, 2024 via email

@thomas-davis
Author

thomas-davis commented Sep 23, 2024

Sorry, I still owe you code, but a side question: we tried running sg1617 to replicate the bug and ran into an error.

Traceback (most recent call last):
  File "/opt/conda/opt/crisprme/PostProcess/./new_simple_analysis.py", line 791, in <module>
    clusters_with_scores = calculate_scores(cluster_to_save)
  File "/opt/conda/opt/crisprme/PostProcess/./new_simple_analysis.py", line 566, in calculate_scores
    cluster_with_CRISTA_score = preprocess_CRISTA_score(cluster_to_save)
  File "/opt/conda/opt/crisprme/PostProcess/./new_simple_analysis.py", line 469, in preprocess_CRISTA_score
    crista_score_list_alt = CRISTA_predict_list(
  File "/opt/conda/opt/crisprme/PostProcess/CRISTA_score.py", line 399, in CRISTA_predict_list
    features = get_features(full_dna_seq=dna_seq_29nt, aligned_sgRNA=sgRNA_seq,
  File "/opt/conda/opt/crisprme/PostProcess/CRISTA_score.py", line 254, in get_features
    dna_enthalpy = sum([DNA_PAIRS_THERMODYNAMICS[gapless_dnaseq[i-1:i+1]]
  File "/opt/conda/opt/crisprme/PostProcess/CRISTA_score.py", line 254, in <listcomp>
    dna_enthalpy = sum([DNA_PAIRS_THERMODYNAMICS[gapless_dnaseq[i-1:i+1]]
KeyError: 'AR'
CRISPRme ERROR: annotation analysis failed (script: ./scriptAnalisiNNN_v3.sh line 97)
CRISPRme ERROR: SNP analysis failed (script: ./post_analisi_snp.sh line 33)

We were running the following command with CRISPRme v2.1.5.
Have you run 2.1.5 on the sg1617 spacer? Have you seen this crash before?
Thanks!

crisprme.py complete-search \
--genome Genomes/full_renamed \
--vcf 1000G_and_hgdp_list_vcf.txt \
--samplesID list_samplesID.txt \
--guide sg1617_spacer.txt \
--pam PAMs/20bp-FOO-SpCas9.txt \
--annotation Annotations/encode+gencode.hg38.bed \
--gene_annotation Annotations/encode+gencode.hg38.bed \
--mm 5 \
--bDNA 2 \
--bRNA 2 \
--bMax 2 \
--merge 3 \
--output sg1617 \
--thread 64
cat Results/sg1617/log_error.txt
cat Results/sg1617/log_error_no_check.txt

It seems the code handles the case of 'N' being present in the reference genome match, but not necessarily other IUPAC ambiguity codes (like 'R'), which causes the crash at CRISTA_score.py line 254 shown in the traceback above.
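
To illustrate the failure mode, here is a minimal sketch of how IUPAC ambiguity codes could be detected or expanded before a lookup that only knows A/C/G/T dinucleotides. It is not CRISPRme's or CRISTA's code; the names `ambiguous_positions` and `expand_iupac` are hypothetical, and the mapping is the standard IUPAC nucleotide table.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def ambiguous_positions(seq):
    """Return (index, code) pairs for every base that is not A, C, G or T."""
    return [(i, base) for i, base in enumerate(seq.upper()) if base not in "ACGT"]


def expand_iupac(seq):
    """Enumerate every concrete A/C/G/T sequence encoded by an IUPAC string."""
    choices = [IUPAC[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]


# The dinucleotide from the KeyError: 'R' stands for A or G
print(ambiguous_positions("AR"))  # [(1, 'R')]
print(expand_iupac("AR"))         # ['AA', 'AG']
```

A dictionary keyed only on plain dinucleotides such as 'AA' or 'AG' will raise `KeyError` on 'AR', so a sequence would need to be either expanded or rejected with a clear message before reaching that lookup. Note that expansion grows exponentially with the number of ambiguous bases.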

@thomas-davis
Author

thomas-davis commented Sep 23, 2024

Also here's the notebook code. Currently run on a proprietary spacer, so I can't include intermediate outputs or much detail.

Once we run sg1617.txt I can link with intermediate files.
notebook_example.zip

@thomas-davis
Author

One last comment: one of my colleagues will be taking over this correspondence. Thanks in advance for your help.

@dkuo-ttx

Hi, I'll be taking over correspondence for this issue from @thomas-davis. Let me know if there's anything else that's needed from our side and thanks!

@ManuelTgn
Contributor

Hi @thomas-davis and @dkuo-ttx,

Thank you for sharing the details. The error log is indeed unusual, and we haven’t encountered this issue before. We will thoroughly investigate the problem and work on identifying a solution.

Best,
Manuel

@ManuelTgn
Contributor

Hi @dkuo-ttx,

We attempted to reproduce the reported error using a different PAM, such as NAA, which seems similar to what you’re working with. However, we weren't able to replicate the issue, and the run completed smoothly. Below is the exact sequence of commands we executed for testing:

mamba create -n crisprme-2.1.5  # create fresh environment for CRISPRme v2.1.5
mamba activate crisprme-2.1.5  # activate the environment 
mamba install crisprme=2.1.5 -y  # install CRISPRme from bioconda targeting v2.1.5
crisprme.py --version  # v2.1.5
# run the tool -- 1000G + HGDP, NAA PAM
crisprme.py complete-search \
  --genome Genomes/hg38  \
  --vcf vcf_list_1000G_HGDP.txt \
  --samplesID samplesIDs_list_1000G_HGDP.txt \
  --guide sg1617_test_guide.txt \
  --pam PAMs/20bp-NAA-iSpyMacCas9.txt \
  --annotation Annotations/dhs+encode+gencode.hg38.bed \
  --gene_annotation Annotations/dhs+encode+gencode.hg38.bed \
  --mm 5 \
  --bDNA 2 \
  --bRNA 2 \
  --bMax 2 \
  --merge 3 \
  --output sg1617 \
  --thread 64

I've also attached the full log files for reference:
log_error_no_check.txt
log_verbose.txt
log.txt

Could you please share the specific PAM sequence file you're using? There might be an issue related to that. Additionally, if the conversation may touch on any confidential information, we can switch to email to ensure everything remains secure.

Best,
Manuel

@dkuo-ttx

dkuo-ttx commented Oct 4, 2024

Hi @ManuelTgn,

Thanks for the replies and apologies for my delay in getting back to you. My colleague was more deeply engaged in this issue so thanks also for your patience as I get up to speed.

Interesting that the run completed smoothly. We used CTAACAGTTGCTTTTATCAC as the sg1617 spacer and had searched for an NGG PAM when we ran into the issue. I'll try to run the code locally as you've described to see whether it was a deployment issue on our side.

I think once we can get the test spacer to run, we can assess the InDel bug together. I'll send another update in the next 2-3 days. Thank you!
