Review API query for Ensembl #696

Closed
kdahlquist opened this issue Nov 1, 2018 · 9 comments

Comments

@kdahlquist
Collaborator

TODO: Review API query for Ensembl

@kdahlquist
Collaborator Author

@johnllopez616 will copy the API call out of the code and paste it as a comment to this issue for @kdahlquist to review.

@jlopez616
Collaborator

jlopez616 commented Nov 8, 2018

    let getEnsemblInfo = function (geneSymbol) {
        // First request: look up the gene by symbol for the hard-coded species.
        return $.get({
            url: serviceRoot + "/ensembl/lookup/symbol/saccharomyces_cerevisiae/" + geneSymbol,
            dataType: "json",
            timeout: 5000,
            beforeSend: function (xhr) {
                xhr.setRequestHeader("content-type", "application/json");
            },
        }).then(function (data) {
            // Second request: look up the returned Ensembl ID with expanded details.
            return $.get({
                url: serviceRoot + "/ensembl/lookup/id/" + data.id + "?expand=1",
                dataType: "json",
                timeout: 5000,
                beforeSend: function (xhr) {
                    xhr.setRequestHeader("content-type", "application/json");
                },
            });
        });
    };

@kdahlquist
Collaborator Author

So I see that this call looks up the gene ID in the "symbol" field, which is what we want.

I also see that the species is hard-coded in the call, which is what we will work on with #698.

@jlopez616
Collaborator

I would like to discuss this in detail at our next meeting, but first I want to note a few observations about the Ensembl API and a possible lack of the data we need for Saccharomyces cerevisiae.

First, the data we are trying to get from Ensembl comes down to two fields, the Ensembl ID and the description:
    const ensemblTemplate = {
        ensemblID: data.id,
        description: data.description,
        dnaSequence: "Not found", // Information unavailable via regular API
        geneLocation: "Not found", // Information unavailable via regular API
    };

According to the documentation (https://rest.ensembl.org/ and https://rest.ensembl.org/documentation/info/symbol_lookup), when a species name and gene symbol are given to the first $.get call in the comment above, the data we seek should be returned. There isn't actually a need for the second call.
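As a rough illustration of how a lookup response would map onto the template above, the sketch below fills the two fields and falls back to "Not found" when one is missing. The helper name `toEnsemblTemplate` is hypothetical and the sample input is a placeholder, not real Ensembl output; only the `id` and `description` fields are taken from the snippet above.

```javascript
// Hypothetical helper, not part of the existing code.
const NOT_FOUND = "Not found";

function toEnsemblTemplate(data) {
    // Tolerate a missing or empty response object.
    const source = data || {};
    return {
        ensemblID: source.id || NOT_FOUND,
        description: source.description || NOT_FOUND,
        dnaSequence: NOT_FOUND, // Information unavailable via regular API
        geneLocation: NOT_FOUND, // Information unavailable via regular API
    };
}

// Placeholder input, used only to show the shape of the result.
const filled = toEnsemblTemplate({ id: "EXAMPLE_ID", description: "example description" });
const empty = toEnsemblTemplate({});
```

With a fallback like this, an incomplete response still yields a complete template, and the missing fields show up as "Not found".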

Expected:
image

This works for Homo sapiens, and has worked for other species (I tested the capuchin monkey). Look what happens, however, when we try a gene for Saccharomyces cerevisiae:

Actual:
image

This was tested for ACE2, YHP1, and ADH1.

This lack of data can be substantiated by examining the Ensembl interface. Observe what occurs when a search is made for a gene like ACE2:
image

Note that the entry for the human ACE2 gene contains an Ensembl ID and summary data:
image

Compare this to ACE2 for Saccharomyces cerevisiae:
image

@jlopez616
Collaborator

image

Notice how it works, however, for a gene (ALAS1) with "homo sapiens" hardcoded.

@jlopez616
Collaborator

At the last meeting, we discussed that the most important thing to obtain from Ensembl is the link to the Ensembl page. To access the Ensembl page for a given gene, use

https://uswest.ensembl.org/Saccharomyces_cerevisiae/Gene/Summary?db=core;g= + locus tag,

where the locus tag may be obtained from NCBI.
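A minimal sketch of building that link in code might look like the following. The function name `buildEnsemblGeneUrl` is hypothetical, and the locus tag in the example is a placeholder rather than a real identifier. The host is a parameter that defaults to the one in the URL above, so a different host could be swapped in without touching the rest.

```javascript
// Hypothetical helper; the default host and the path mirror the URL quoted above.
function buildEnsemblGeneUrl(locusTag, host = "uswest.ensembl.org") {
    return "https://" + host
        + "/Saccharomyces_cerevisiae/Gene/Summary?db=core;g="
        + encodeURIComponent(locusTag);
}

// Placeholder locus tag, for illustration only.
const geneUrl = buildEnsemblGeneUrl("EXAMPLE_LOCUS_TAG");
```

The species segment is still hard-coded here, as it is in the URL above.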

@kdahlquist
Collaborator Author

It looks like the URL in the previous comment points at a specific mirror of the Ensembl database. Do we want to hardcode a specific mirror or should we just hit "ensembl.org"?

@jlopez616
Collaborator

According to the documentation, looking up a gene requires either an Ensembl ID or the gene's species and symbol. Our code presently uses the latter:

let getEnsemblInfo = function (query) {
    const geneSymbol = query.symbol;
    const geneSpecies = query.species;
    return $.get({
        url: serviceRoot + "/ensembl/lookup/symbol/" + geneSpecies + "/"
        + geneSymbol + "?content-type=application/json",
        dataType: "json",
        timeout: 5000
    });
};

Interestingly enough, there is a workaround that would let us obtain the species name when we start from just the symbol. According to the documentation for the taxonomy database, we can get the species name using the NCBI taxon, which, as explained in #697, we parse anyway in order to have NCBI's API call working. We might be able to retrieve the gene's species from there.

This means, however, that if NCBI is not functioning on a given day, we won't be able to access Ensembl unless we retrieve the species data from another source. Is this something we want to discuss further, @kdahlquist?
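To make the idea concrete, here is a small sketch of turning a species name into the form used in the lookup URL and then building that URL. It assumes the species arrives as a scientific name such as "Saccharomyces cerevisiae"; what the NCBI response actually returns is not confirmed here. The helper names and the `https://example.org` service root are hypothetical.

```javascript
// Hypothetical helpers, not part of the existing code.

// Assumes input like "Saccharomyces cerevisiae"; produces "saccharomyces_cerevisiae",
// the form that appears in the lookup URL earlier in this thread.
function toEnsemblSpecies(scientificName) {
    return scientificName.trim().toLowerCase().replace(/\s+/g, "_");
}

// Builds the same symbol lookup path that getEnsemblInfo requests.
function buildSymbolLookupUrl(serviceRoot, scientificName, geneSymbol) {
    return serviceRoot + "/ensembl/lookup/symbol/"
        + encodeURIComponent(toEnsemblSpecies(scientificName)) + "/"
        + encodeURIComponent(geneSymbol)
        + "?content-type=application/json";
}

// Placeholder service root, for illustration only.
const lookupUrl = buildSymbolLookupUrl("https://example.org", "Saccharomyces cerevisiae", "ACE2");
```

Keeping the URL construction separate from the request means the species could come from NCBI or from any other source without changing the lookup itself.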

@kdahlquist
Collaborator Author

We will not use the Ensembl API, but will just populate a URL on the gene page, for which another issue is opened. Also see: https://github.com/dondi/GRNsight/wiki/Web-API-Guide.
