-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Appropiate background genes for GC & length correction #71
Comments
Hey, Thanks for bringing this to our attention. We are looking into it and will get back to you on it soon. |
Hi @xoelmb, thanks for bringing this to our attention. I just checked if this was something @Al-Murphy or I had introduced when making updates to EWCE over the last couple years, but it seems that this bit of code existed long before any of those changes were made (at least as far back as May 2018): EWCE/R/check.ewce.genelist.input.r Line 87 in 699ff6d
I agree that this seems to contradict what the documentation says and I can't think of a reason why genes absent from the SCT/CTD wouldn't be dropped in this situation. @NathanSkene do you have any insight into why this was here originally? |
Can't think of any reason.... doesn't mean there wasn't one... let's drop it and see what happens?
…________________________________
From: Brian M. Schilder ***@***.***>
Sent: 21 July 2022 22:21
To: NathanSkene/EWCE ***@***.***>
Cc: Skene, Nathan G ***@***.***>; Mention ***@***.***>
Subject: Re: [NathanSkene/EWCE] Appropiate background genes for GC & length correction (Issue #71)
This email from ***@***.*** originates from outside Imperial. Do not click on links and attachments unless you recognise the sender. If you trust the sender, add them to your safe senders list<https://spam.ic.ac.uk/SpamConsole/Senders.aspx> to disable email stamping for this address.
Hi @xoelmb<https://github.com/xoelmb>, thanks for bringing this to our attention. I just checked if this was something Alan or I had introduced when making updates to EWCE over the last couple years, but it seems that this was implemented long before any of those changes were made:
https://github.com/NathanSkene/EWCE/blob/699ff6dead2970bf40ff5d7b73c212253930ef35/R/check.ewce.genelist.input.r#L87
I agree that this seems to contradict what the documentation says and I can't think of a reason why genes absent from the SCT/CTD wouldn't be dropped in this situation. @NathanSkene<https://github.com/NathanSkene> do you have any insight into why this was here originally?
—
Reply to this email directly, view it on GitHub<#71 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AH5ZPE4UQ3S4J62VVPXP653VVG5O3ANCNFSM53IAS2XA>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
Will do @NathanSkene Come to think of it, I seem to recall @KittyMurphy encountered a similar effect of increased significant results when using |
Good point! Thanks for figuring this out
…________________________________
From: Brian M. Schilder ***@***.***>
Sent: 21 July 2022 22:30
To: NathanSkene/EWCE ***@***.***>
Cc: Skene, Nathan G ***@***.***>; Mention ***@***.***>
Subject: Re: [NathanSkene/EWCE] Appropiate background genes for GC & length correction (Issue #71)
This email from ***@***.*** originates from outside Imperial. Do not click on links and attachments unless you recognise the sender. If you trust the sender, add them to your safe senders list<https://spam.ic.ac.uk/SpamConsole/Senders.aspx> to disable email stamping for this address.
Will do @NathanSkene<https://github.com/NathanSkene>
Come to think of it, I seem to recall @KittyMurphy<https://github.com/KittyMurphy> encountered a similar effect of increased significant results when using geneSizeControl =TRUE. This likely explains why.
—
Reply to this email directly, view it on GitHub<#71 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AH5ZPE6ISHT5YHJL26ZBBFTVVG6PHANCNFSM53IAS2XA>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
Great! Thank you all for spending your time on this! I'll keep an eye on how this gets solved and in the meantime I'll be using the results without geneSizeControl. |
Hey! The fix has been implemented (version 1.5.4) in the Github master branch so feel free to install EWCE from there so you can use it with geneSizeControl. Otherwise it will be up on the bioconductor dev version in the next few days (usually 2-3). You can use that by installing:
Thanks for pointing this out! |
Great! Thanks to you again!! |
Is your feature request related to a problem? Please describe.
I have run the bootstrap enrichment test with EWCE on ~40 genesets and a scRNA-seq dataset with and without geneSizeControl (GC % & gene length control), and with automatic background genes. The number of significant results (q value < 0.05) I get when geneSizeControl is on is huge in comparison. I have been reading the code and I think that the background genes created when geneSizeControl is on include genes not present in the SCT expression dataset.
Describe the solution you'd like
I think there is a simple solution to this problem: restricting the background genes used even when geneSizeControl is on to those present in the expression data. So, basically removing one check would make it. Changing this
#### Restrict gene sets to only genes in the SCT dataset ####
if (!geneSizeControl) {
hits <- hits[hits %in% sct_genes]
bg <- bg[bg %in% sct_genes]
}
For this
#### Restrict gene sets to only genes in the SCT dataset ####
hits <- hits[hits %in% sct_genes]
bg <- bg[bg %in% sct_genes]
The function is check_ewce_genelist_input
EWCE/R/check_ewce_genelist_input.r
Lines 129 to 133 in 9400d4c
Describe alternatives you've considered
Maybe I'm missing something. Is this filtering done in some other part of the code that I have not reached? Is this expected behavior?
Additional context
I think this filtering is meant to be done. According to the original EWCE article:
Methods:
Bootstrapping with Controls for Transcript Length and GC Content
The text was updated successfully, but these errors were encountered: