Appropriate background genes for GC & length correction #71

xoelmb · 2022-07-11T16:01:01Z

Is your feature request related to a problem? Please describe.
I ran the EWCE bootstrap enrichment test on ~40 gene sets and a scRNA-seq dataset, with and without geneSizeControl (control for GC content and gene length), using the automatic background genes. With geneSizeControl on, I get far more significant results (q value < 0.05) than with it off. After reading the code, I think the background gene set built when geneSizeControl is on includes genes that are not present in the SCT expression dataset.

Describe the solution you'd like
I think there is a simple solution: restrict the background genes to those present in the expression data even when geneSizeControl is on. Removing one check would do it. Change this:
    #### Restrict gene sets to only genes in the SCT dataset ####
    if (!geneSizeControl) {
        hits <- hits[hits %in% sct_genes]
        bg <- bg[bg %in% sct_genes]
    }
to this:
    #### Restrict gene sets to only genes in the SCT dataset ####
    hits <- hits[hits %in% sct_genes]
    bg <- bg[bg %in% sct_genes]

The function is check_ewce_genelist_input

EWCE/R/check_ewce_genelist_input.r

Lines 129 to 133 in 9400d4c

    
    #### Restrict gene sets to only genes in the SCT dataset ####
    if (!geneSizeControl) {
        hits <- hits[hits %in% sct_genes]
        bg <- bg[bg %in% sct_genes]
    }
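
To make the difference concrete, here is a minimal sketch with made-up gene symbols. `filter_gene_lists` and its `always_filter` argument are hypothetical and not part of EWCE; they only mimic the filtering step quoted above, so the two behaviours can be compared side by side.

```r
# Hypothetical helper, not part of EWCE: mimics only the filtering step.
filter_gene_lists <- function(hits, bg, sct_genes, geneSizeControl,
                              always_filter = FALSE) {
    # Current behaviour: filter only when geneSizeControl is FALSE.
    # Proposed behaviour (always_filter = TRUE): filter unconditionally.
    if (always_filter || !geneSizeControl) {
        hits <- hits[hits %in% sct_genes]
        bg <- bg[bg %in% sct_genes]
    }
    list(hits = hits, bg = bg)
}

# Made-up gene symbols, for illustration only.
sct_genes <- c("Gene1", "Gene2", "Gene3")
hits <- c("Gene1", "Gene4")
bg <- c("Gene2", "Gene3", "Gene5", "Gene6")

current <- filter_gene_lists(hits, bg, sct_genes, geneSizeControl = TRUE)
proposed <- filter_gene_lists(hits, bg, sct_genes, geneSizeControl = TRUE,
                              always_filter = TRUE)

# Background genes that are absent from the expression data
leaked_current <- setdiff(current$bg, sct_genes)
leaked_proposed <- setdiff(proposed$bg, sct_genes)
```

In this toy case `leaked_current` holds the two background genes that are not in the expression data, while `leaked_proposed` is empty.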

Describe alternatives you've considered
Maybe I'm missing something. Is this filtering done in some other part of the code that I have not reached? Is this expected behavior?

Additional context
I think this filtering is meant to be done. According to the original EWCE article:
Methods:
Bootstrapping with Controls for Transcript Length and GC Content

...The deciles of gene size and GC content were calculated over the set of genes expressed in the SCT dataset (after dropping those with low expression levels as described above). The two sets of decile values were used to define a grid, and each gene assigned to a position within the grid based on it's transcript lengths and GC content...
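
As I read that passage, the grid could be built roughly like this. This is only a sketch under my own assumptions, not EWCE's actual implementation: `assign_decile_grid`, the column names `gene`, `length` and `gc`, and the simulated data are all hypothetical. The point is that the deciles are computed only over genes present in the SCT dataset.

```r
# Hypothetical helper, not EWCE's implementation.
assign_decile_grid <- function(gene_info, sct_genes) {
    # Keep only genes expressed in the SCT dataset before computing deciles
    expressed <- gene_info[gene_info$gene %in% sct_genes, , drop = FALSE]
    decile <- function(x) {
        breaks <- stats::quantile(x, probs = seq(0, 1, by = 0.1),
                                  na.rm = TRUE)
        # unique() guards against duplicated break points caused by ties
        cut(x, breaks = unique(breaks), include.lowest = TRUE,
            labels = FALSE)
    }
    expressed$length_decile <- decile(expressed$length)
    expressed$gc_decile <- decile(expressed$gc)
    # Each gene's grid position is its pair of decile indices
    expressed$grid_cell <- paste(expressed$length_decile,
                                 expressed$gc_decile, sep = "_")
    expressed
}

# Simulated gene annotations, for illustration only.
set.seed(1)
gene_info <- data.frame(
    gene = paste0("Gene", 1:200),
    length = round(runif(200, 500, 10000)),
    gc = runif(200, 0.3, 0.7),
    stringsAsFactors = FALSE
)
sct_genes <- gene_info$gene[1:150]

grid <- assign_decile_grid(gene_info, sct_genes)
head(grid)
```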


Al-Murphy · 2022-07-20T19:13:02Z

Hey,

Thanks for bringing this to our attention. We are looking into it and will get back to you on it soon.

bschilder · 2022-07-21T21:21:38Z

Hi @xoelmb, thanks for bringing this to our attention. I just checked if this was something @Al-Murphy or I had introduced when making updates to EWCE over the last couple years, but it seems that this bit of code existed long before any of those changes were made (at least as far back as May 2018):

EWCE/R/check.ewce.genelist.input.r

Line 87 in 699ff6d

if(geneSizeControl==FALSE){

I agree that this seems to contradict what the documentation says and I can't think of a reason why genes absent from the SCT/CTD wouldn't be dropped in this situation. @NathanSkene do you have any insight into why this was here originally?

NathanSkene · 2022-07-21T21:28:33Z

Can't think of any reason.... doesn't mean there wasn't one... let's drop it and see what happens?

bschilder · 2022-07-21T21:30:16Z

Will do @NathanSkene

Come to think of it, I seem to recall @KittyMurphy encountered a similar increase in significant results when using geneSizeControl = TRUE. This likely explains why.

NathanSkene · 2022-07-21T21:31:40Z

Good point! Thanks for figuring this out

xoelmb · 2022-07-22T07:13:42Z

Great! Thank you all for spending your time on this! I'll keep an eye on how this gets solved and in the meantime I'll be using the results without geneSizeControl.
I'm happy I could make this little contribution to such a good package!

Al-Murphy · 2022-07-22T07:19:04Z

Hey! The fix has been implemented (version 1.5.4) in the GitHub master branch, so feel free to install EWCE from there to use it with geneSizeControl. Otherwise it will be in the Bioconductor devel version in the next few days (usually 2-3). You can get that by installing:

BiocManager::install(version='devel')
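
If it helps, a quick way to check whether an installed copy already includes the fix is to compare its version against 1.5.4. This is just a sketch: `has_gene_size_fix` is a hypothetical helper, and the commented-out GitHub install assumes the remotes package is available and that the repository lives at NathanSkene/EWCE.

```r
# Hypothetical helper: TRUE when a version is at least the fixed release.
has_gene_size_fix <- function(version, fixed_in = "1.5.4") {
    utils::compareVersion(as.character(version), fixed_in) >= 0
}

# Only query the installed version if EWCE is actually installed
if (requireNamespace("EWCE", quietly = TRUE)) {
    print(has_gene_size_fix(utils::packageVersion("EWCE")))
}

# To install from GitHub instead (assumes remotes is installed):
# remotes::install_github("NathanSkene/EWCE")
```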

Thanks for pointing this out!

xoelmb · 2022-07-22T07:25:15Z

Great! Thanks to you again!! ☺️

bschilder self-assigned this Jul 21, 2022

bschilder added the bug label Jul 21, 2022

Al-Murphy closed this as completed Jul 22, 2022
