Review API for NCBI Gene #697

Closed
kdahlquist opened this issue Nov 1, 2018 · 7 comments

Comments

@kdahlquist
Collaborator

TODO: Review API query for NCBI Gene

@kdahlquist
Collaborator Author

@johnllopez616 will copy the API call out of the code and paste it as a comment to this issue for @kdahlquist to review.

@jlopez616
Collaborator

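// Look up a gene in NCBI Gene: esearch finds the UID, esummary fetches the record.
let getNCBIInfo = function (geneSymbol) {
    return $.get({
        url: serviceRoot + "/ncbi/entrez/eutils/esearch.fcgi",
        data: {
            db: "gene",
            term: geneSymbol + "[gene]+Saccharomyces+cerevisiae[Organism]",
        },
        dataType: "text",
        timeout: 5000,
    }).then(function (data) {
        // Pull the first <Id> out of the esearch XML.
        // Note: exec() returns null when nothing matches, so [1] would throw here.
        const regex = /<Id>(\d*)<\/Id>/gm;
        const id = regex.exec(data)[1];
        // Fetch the gene summary for that UID.
        return $.get({
            url: serviceRoot + "/ncbi/entrez/eutils/esummary.fcgi?db=gene&id=" + id,
            dataType: "xml",
            timeout: 5000,
        });
    });
};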

@jlopez616
Collaborator

Here is also a straightforward way to un-hardcode the organism name (held in the variable geneName below). The catch is that the current query doesn't use the taxon ID, the way the JASPAR and UniProt queries do. I'll investigate whether we can search NCBI by taxon, for the sake of convenience:

let getNCBIInfo = function (geneSymbol) {
    let geneName = "Saccharomyces+cerevisiae";
    return $.get({
        url: serviceRoot + "/ncbi/entrez/eutils/esearch.fcgi",
        data: {
            db: "gene",
            term: geneSymbol + "[gene]+" + geneName + "[Organism]",
        },
        dataType: "text",
        timeout: 5000,
    }).then(function (data) {
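
A minimal sketch of how the search term could be assembled once the organism is a parameter. The helper name buildEsearchUrl and the constant EUTILS_ROOT are hypothetical and not part of the GRNsight code; the sketch also calls the public E-utilities host directly instead of going through serviceRoot. It writes the term with plain spaces and an explicit AND, and leaves the encoding to the standard URLSearchParams class instead of hand-inserting "+" characters.

```javascript
// Hypothetical helper (not in GRNsight): builds an esearch URL for a gene
// symbol and an organism name, letting URLSearchParams handle the encoding.
const EUTILS_ROOT = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils";

function buildEsearchUrl(geneSymbol, organismName) {
    // Plain spaces and an explicit AND; no hand-inserted "+" characters.
    const term = geneSymbol + "[gene] AND " + organismName + "[Organism]";
    const params = new URLSearchParams({ db: "gene", term: term });
    return EUTILS_ROOT + "/esearch.fcgi?" + params.toString();
}

console.log(buildEsearchUrl("YHP1", "Saccharomyces cerevisiae"));
```

With this shape, swapping the organism only means passing a different second argument.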

@jlopez616
Collaborator

Found a slightly odd workaround that lets us look up the species name (the value stored in geneName above) from the taxon ID. It requires this query:
"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=taxonomy&id=" + taxon number. It returns the following:

[image: screenshot of the query response, not preserved]

If nothing else, we could parse the data here, replace the space in the species name with a plus symbol, and perform the query mentioned above.
I'll see if I can find something better.

At the very least, we now have a way to get the species name from the taxon ID, in case we need this for other purposes, such as the Ensembl API (#696)
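
A rough sketch of that parsing step. It assumes the taxonomy summary carries the species name in an Item element named ScientificName; the response in this thread was only posted as a screenshot, so the element name and the sample string below are assumptions, and extractScientificName is a hypothetical helper.

```javascript
// Hypothetical helper: pulls the scientific name out of an esummary
// taxonomy response.
// ASSUMPTION: the response contains <Item Name="ScientificName" ...>.
function extractScientificName(xmlText) {
    const match = /<Item Name="ScientificName"[^>]*>([^<]*)<\/Item>/.exec(xmlText);
    return match ? match[1].trim() : null;
}

// Illustrative sample only, not a captured NCBI response.
const sampleTaxonomySummary =
    "<DocSum><Id>4932</Id>" +
    '<Item Name="ScientificName" Type="String">Saccharomyces cerevisiae</Item>' +
    "</DocSum>";

const speciesName = extractScientificName(sampleTaxonomySummary);
console.log(speciesName);                    // Saccharomyces cerevisiae
console.log(speciesName.replace(/ /g, "+")); // Saccharomyces+cerevisiae
```

Returning null when the element is missing lets the caller decide how to handle an unknown taxon ID.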

@kdahlquist
Collaborator Author

I'm surprised that there has to be a workaround. NCBI owns the taxonomy database and I would be surprised if that cannot be accessed via API.

@jlopez616
Collaborator

jlopez616 commented Jan 28, 2019

After browsing the capabilities of the NCBI API, I conclude that it is not possible to retrieve gene data directly from the taxon ID alone.

According to the documentation, the way to access gene information through the NCBI database is by providing an Entrez Unique Identifier (UID). Obtaining the UID requires knowing both the gene name and the organism name in advance, which are presently both passed into the gene page as the page is created.

This lookup is the purpose of the first get() function:

return $.get({
        url: serviceRoot + "/ncbi/entrez/eutils/esearch.fcgi",
        data: {
            db: "gene",
            term: geneSymbol + "[gene]+" + geneName + "[Organism]",
        },
        dataType: "text",
        timeout: 5000,
    }).

The result of this function is a page of XML data, which we use to get the UID.

Example query: YHP1

<eSearchResult>
<Count>1</Count>
<RetMax>1</RetMax>
<RetStart>0</RetStart>
<IdList>
<Id>852062</Id> <!-- We want this -->
</IdList>
<TranslationSet>
<Translation>
<From>+Saccharomyces+cerevisiae[Organism]</From>
<To>"Saccharomyces cerevisiae"[Organism]</To>
</Translation>
</TranslationSet>
<TranslationStack>
<TermSet>
<Term>YHP1[gene]</Term>
<Field>gene</Field>
<Count>1</Count>
<Explode>N</Explode>
</TermSet>
<TermSet>
<Term>"Saccharomyces cerevisiae"[Organism]</Term>
<Field>Organism</Field>
<Count>7062</Count>
<Explode>Y</Explode>
</TermSet>
<OP>AND</OP>
</TranslationStack>
<QueryTranslation>
YHP1[gene] AND "Saccharomyces cerevisiae"[Organism]
</QueryTranslation>
</eSearchResult>
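
A small sketch of pulling the UID out of a response like the one above. The name extractFirstUid is hypothetical, and the two sample strings are trimmed-down illustrations, not captured responses. Unlike calling regex.exec(data)[1] directly, which throws a TypeError when exec() returns null, this version returns null when no ID is present.

```javascript
// Hypothetical helper: returns the first UID in an esearch response,
// or null if the response contains none.
function extractFirstUid(xmlText) {
    // \d+ (not \d*) so an empty <Id></Id> does not count as a match.
    const match = /<Id>(\d+)<\/Id>/.exec(xmlText);
    return match ? match[1] : null;
}

// Illustrative, shortened samples.
const found = "<eSearchResult><Count>1</Count><IdList><Id>852062</Id></IdList></eSearchResult>";
const empty = "<eSearchResult><Count>0</Count><IdList></IdList></eSearchResult>";

console.log(extractFirstUid(found)); // 852062
console.log(extractFirstUid(empty)); // null
```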

We take that value and pass it to a SECOND get() function to retrieve the gene data we want:

return $.get({
            url: serviceRoot + "/ncbi/entrez/eutils/esummary.fcgi?db=gene&id=" + id,
            dataType: "xml",
            timeout: 5000,
        });

As stated in my previous comment, it is possible to look up the species name from the taxon ID by accessing the taxonomy database, which would give us the organism name for the search term. It would require a third get() function, however, which isn't difficult but would slow the page down slightly. Is this a workaround we want to implement, @kdahlquist?
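
For discussion, a sketch of what the three-request chain might look like. Everything here is hypothetical: the function name getGeneSummaryByTaxon, the injected fetchText callback (any function that takes a URL and resolves to the response text), and the assumption that the taxonomy summary exposes an Item named ScientificName. Each request depends on the result of the previous one, so the three run in sequence.

```javascript
// Sketch only. fetchText is injected so the flow can be exercised
// without touching the network.
async function getGeneSummaryByTaxon(geneSymbol, taxonId, fetchText) {
    const root = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils";

    // 1. Taxon ID -> species name (ASSUMES a ScientificName item exists).
    const taxXml = await fetchText(
        root + "/esummary.fcgi?db=taxonomy&id=" + encodeURIComponent(taxonId)
    );
    const nameMatch = /<Item Name="ScientificName"[^>]*>([^<]*)<\/Item>/.exec(taxXml);
    if (!nameMatch) {
        throw new Error("No scientific name found for taxon " + taxonId);
    }

    // 2. Gene symbol + species name -> UID.
    const term = geneSymbol + "[gene] AND " + nameMatch[1].trim() + "[Organism]";
    const query = new URLSearchParams({ db: "gene", term: term }).toString();
    const searchXml = await fetchText(root + "/esearch.fcgi?" + query);
    const idMatch = /<Id>(\d+)<\/Id>/.exec(searchXml);
    if (!idMatch) {
        throw new Error("No gene UID found for " + term);
    }

    // 3. UID -> gene summary.
    return fetchText(root + "/esummary.fcgi?db=gene&id=" + idMatch[1]);
}

// Usage with a stand-in fetcher that returns canned, illustrative text.
const stubFetch = async (url) => {
    if (url.includes("db=taxonomy")) {
        return '<Item Name="ScientificName" Type="String">Saccharomyces cerevisiae</Item>';
    }
    if (url.includes("esearch.fcgi")) {
        return "<IdList><Id>852062</Id></IdList>";
    }
    return "summary requested from " + url;
};

getGeneSummaryByTaxon("YHP1", 4932, stubFetch).then(console.log);
```

In the page itself the fetcher could wrap the existing $.get calls; the stub only stands in for the network here.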

@kdahlquist
Collaborator Author

With the posting of the API wiki page: https://github.com/dondi/GRNsight/wiki/Web-API-Guide, and today's discussion, this issue is closed.
