Write function to dynamically populate species id in API queries #698

Closed
kdahlquist opened this issue Nov 1, 2018 · 15 comments

Comments

@kdahlquist
Collaborator

Upon review of the JASPAR API query, it was found that the taxon ID for yeast is hard-coded into the query. To make the gene pages more generic for species other than yeast, we will need to set up some code that allows the species to be discovered via the UniProt query, like it is for naming the species at the top of the page, and then dynamically populate the other queries.
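A minimal sketch of the idea, assuming the taxon ID is passed in as an argument rather than written into the query string. The function name `buildJasparQuery`, the base URL, and the `tax_id` and `name` parameter names are assumptions for illustration and are not taken from the GRNsight code; they should be checked against the JASPAR API documentation.

```javascript
// Hypothetical base URL; the endpoint GRNsight actually calls may differ.
const JASPAR_BASE_URL = "http://jaspar.genereg.net/api/v1/matrix/";

// Build a JASPAR query URL from a gene symbol and a taxon ID.
// The parameter names `tax_id` and `name` are assumptions.
function buildJasparQuery(geneSymbol, taxonId) {
  // Taxon IDs are plain positive integers, so reject anything else early.
  if (!/^\d+$/.test(String(taxonId))) {
    throw new Error(`Invalid taxon ID: ${taxonId}`);
  }
  const params = new URLSearchParams({
    tax_id: String(taxonId),
    name: geneSymbol,
  });
  return `${JASPAR_BASE_URL}?${params.toString()}`;
}

// Yeast (NCBI taxon ID 4932) becomes just one possible argument.
console.log(buildJasparQuery("ACE2", 4932));
// → http://jaspar.genereg.net/api/v1/matrix/?tax_id=4932&name=ACE2
```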

@jlopez616
Collaborator

I would like to discuss this tomorrow in detail because I have several questions as to how this would work. What if the UniProt API is not functioning? Would this new code resort to Saccharomyces cerevisiae as a default? Moreover, what if it is a gene like ACE2, which is found in both humans and yeast?

@kdahlquist
Collaborator Author

Addressing the questions in the previous comment:

  • If we are "discovering" the species from UniProt, what happens if UniProt is not working?
    • It turns out that we may not actually be discovering the species from UniProt, at least for the JASPAR and Ensembl APIs, which are the ones we have looked at so far.
    • We will likely need input from the user about the species, just as we will when the same gene name occurs in multiple species (see below).
  • Resorting to Saccharomyces cerevisiae as a default will depend on our mechanism for determining the species.
  • What happens if the same gene is found in multiple species?
    • We are going to need a mechanism to get input from the user. This input is going to need to be stored by GRNsight for as long as the same network is loaded. We will likely need a multi-pronged approach:
      1. We might encode the species in the "optimization_parameters" sheet of GRNmap (requires testing of GRNmap to make sure this doesn't mess something up). Won't work for SIF, might work for GraphML.
      2. We might use a dialog to ask the user at the point of loading a network.
      3. We might allow the user to set a preference in a menu.
      4. We might ask the user at the point of clicking on a gene.
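One way the multi-pronged approach could fit together is a fixed priority order over the possible sources, sketched below for the first three options. The function name `resolveSpecies`, the property names on `sources`, and the choice to fall back to yeast are all assumptions for illustration; whether a default should exist at all is still open.

```javascript
// Fallback used only when no other source supplies a species.
// Defaulting to yeast is an assumption, not a decision made in this thread.
const DEFAULT_SPECIES = { name: "Saccharomyces cerevisiae", taxonId: "4932" };

// Return the first species found, checking sources in priority order.
// All property names on `sources` are hypothetical.
function resolveSpecies(sources) {
  const candidates = [
    sources.workbookSpecies,   // 1. read from the GRNmap workbook, if present
    sources.dialogSpecies,     // 2. chosen in a dialog when the network was loaded
    sources.preferenceSpecies, // 3. set by the user in a menu
  ];
  const found = candidates.find((species) => species && species.taxonId);
  return found || DEFAULT_SPECIES;
}

// A SIF file carries no species, so the dialog answer is used instead.
const resolved = resolveSpecies({
  dialogSpecies: { name: "Homo sapiens", taxonId: "9606" },
});
console.log(resolved.name); // → Homo sapiens
```

Storing the resolved value on the loaded network would keep it available for as long as that network stays loaded.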

There is more we discussed, but I have to run. I'll comment on the rest later.

@kdahlquist
Collaborator Author

Nix asking the user when they click on a gene; ask when the network is loaded in the first place.

@kdahlquist
Collaborator Author

That way information is captured in the network object and can be exported (like what @jtorre39 is working on).

@dondi
Owner

dondi commented Nov 29, 2018

After today's meeting, the sequence of recommended tasks:

  1. @johnllopez616 completes review of UniProt, NCBI, and JASPAR to verify the ability to unhardcode species from each of their APIs
  2. If the review fails, we need to determine reliable ways to unhardcode species for each database
  3. If review succeeds (except Ensembl), we have a decision point on whether to unhardcode first vs. to solve Ensembl first (probably for next semester).

@jlopez616
Collaborator

Looks like it's quite possible to un-hardcode UniProt, NCBI, and JASPAR.

It seems that UniProt and JASPAR require an organism's taxon ID, while NCBI and Ensembl use an organism's species name. However, using an NCBI function as described in #697, it's possible to get that information as long as NCBI is operating.
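The split described above could be captured in a small lookup so that each API call asks for the identifier it needs. This is only a sketch: the names `IDENTIFIER_KIND` and `speciesIdentifierFor` and the shape of the `species` object are hypothetical, and the mapping simply restates the observation in this comment.

```javascript
// Which kind of species identifier each database expects,
// as reported in this comment (not independently verified here).
const IDENTIFIER_KIND = {
  UniProt: "taxonId",
  JASPAR: "taxonId",
  NCBI: "speciesName",
  Ensembl: "speciesName",
};

// Pick the right identifier for a database from a single species record.
// `species` is assumed to look like { name: "...", taxonId: "..." }.
function speciesIdentifierFor(database, species) {
  const kind = IDENTIFIER_KIND[database];
  if (!kind) {
    throw new Error(`Unknown database: ${database}`);
  }
  return kind === "taxonId" ? species.taxonId : species.name;
}

const yeast = { name: "Saccharomyces cerevisiae", taxonId: "4932" };
console.log(speciesIdentifierFor("JASPAR", yeast)); // → 4932
console.log(speciesIdentifierFor("NCBI", yeast));   // → Saccharomyces cerevisiae
```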

@kdahlquist
Collaborator Author

Let's keep in mind for later that we could ask the user for the taxon ID directly or we could ask them for the species name or both. We might move the species look-up function that you have made to the point when we are asking the user for the species instead of at the point of opening the gene page.

@kdahlquist
Collaborator Author

  • @kdahlquist needs to review whether we will keep Ensembl as a data source.
  • @kdahlquist will define where the species will live in the GRNmap workbook so that it can be parsed and added to the GRNstruct.
  • @johnllopez616 will start work on un-hardcoding the species from the API calls.

@jlopez616
Collaborator

I spent a 2-hour work session on 1/16/19 getting data from graph.js sent to the gene page for construction. So far, the gene name and taxon are hardcoded values in graph.js, but they are passed to the gene page. The only visible change is that the URL for the gene page now has a query for species name and taxon, instead of just the gene symbol. Is this something you would like to see in a pull request, or should we wait until the entire gene symbol is unhardcoded for the PR?
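For reference, the URL change might look roughly like the sketch below. The function name `buildGenePageUrl`, the `info` path, and the query parameter names `symbol`, `species`, and `taxon` are assumptions for illustration, not the names used in graph.js.

```javascript
// Build the gene page URL with species information alongside the gene symbol.
// Path and parameter names are hypothetical.
function buildGenePageUrl(basePath, gene) {
  const params = new URLSearchParams({
    symbol: gene.symbol,
    species: gene.speciesName,
    taxon: gene.taxonId,
  });
  return `${basePath}?${params.toString()}`;
}

const url = buildGenePageUrl("info", {
  symbol: "ACE2",
  speciesName: "Saccharomyces cerevisiae", // still hardcoded at this stage
  taxonId: "4932",
});
console.log(url);
// → info?symbol=ACE2&species=Saccharomyces+cerevisiae&taxon=4932

// On the gene page, the values can be read back from the query string.
const query = new URLSearchParams(url.split("?")[1]);
console.log(query.get("species")); // → Saccharomyces cerevisiae
```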

@kdahlquist
Collaborator Author

Not sure, @dondi?

@dondi
Owner

dondi commented Jan 22, 2019

This is a good intermediate step. Let's work toward making the species name or taxon ID a gene page parameter then take it from there.

One question for @kdahlquist is whether we can get away with standardizing on just one or the other. They are somewhat redundant and ideally we have a single value.

@kdahlquist
Collaborator Author

To answer @dondi, we are probably going to need an internal converter in our code. From what @johnllopez616 found out, some APIs are using species name and others are using taxon ID (at least that's what I remember). It would be better (and less ambiguous) if they used taxon ID, but I don't know if that's possible for all the databases.
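An internal converter could be as simple as a lookup table that goes both ways, as in this sketch. The table contents and the function names `toSpeciesName` and `toTaxonId` are illustrative assumptions; a real version would need more species, and possibly a remote lookup for anything not in the table.

```javascript
// Small illustrative table keyed by NCBI taxon ID.
const SPECIES_TABLE = [
  { taxonId: "4932", name: "Saccharomyces cerevisiae" },
  { taxonId: "9606", name: "Homo sapiens" },
];

// Taxon ID -> species name, or null when the ID is not in the table.
function toSpeciesName(taxonId) {
  const entry = SPECIES_TABLE.find((s) => s.taxonId === String(taxonId));
  return entry ? entry.name : null;
}

// Species name -> taxon ID (case-insensitive), or null when unknown.
function toTaxonId(speciesName) {
  const wanted = speciesName.trim().toLowerCase();
  const entry = SPECIES_TABLE.find((s) => s.name.toLowerCase() === wanted);
  return entry ? entry.taxonId : null;
}

console.log(toSpeciesName(4932));       // → Saccharomyces cerevisiae
console.log(toTaxonId("homo sapiens")); // → 9606
```

Keeping the taxon ID as the stored value and deriving the name from it would leave a single source of truth.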

Another wrinkle is bacteria (and other species) that have substrains with different taxon IDs.

In terms of specifying in the GRNmap input workbook, we can certainly say the user has to use the taxon ID.

If we want to force the user to use it upon file upload, we can ask for the ID and maybe provide a link for them to look it up if they don't know it or something. Just a thought.

@kdahlquist
Collaborator Author

To record discussion from today, @johnllopez616 will work on closing other API issues before tackling this one. Note that once we start work on this we will tackle the case of the xlsx input workbook first.

We may end up having a "favorites" list of a dozen species or so that a user can choose from and if their species isn't on the list, then they can type in the taxon ID.
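A rough sketch of how the "favorites" list with a typed fallback might behave. The list contents, the function name `chooseTaxonId`, and the rule that a typed taxon ID must be a positive integer are assumptions for illustration.

```javascript
// Hypothetical short list of favorite species with their NCBI taxon IDs.
const FAVORITE_SPECIES = [
  { name: "Saccharomyces cerevisiae", taxonId: "4932" },
  { name: "Homo sapiens", taxonId: "9606" },
  { name: "Mus musculus", taxonId: "10090" },
];

// Prefer the favorite chosen from the list; otherwise accept a typed taxon ID.
// Returns null when neither input is usable, so the caller can ask again.
function chooseTaxonId(selectedFavorite, typedTaxonId) {
  const favorite = FAVORITE_SPECIES.find((s) => s.name === selectedFavorite);
  if (favorite) {
    return favorite.taxonId;
  }
  const typed = String(typedTaxonId || "").trim();
  return /^[1-9]\d*$/.test(typed) ? typed : null;
}

console.log(chooseTaxonId("Mus musculus", "")); // → 10090
console.log(chooseTaxonId(null, " 562 "));      // → 562
console.log(chooseTaxonId(null, "yeast"));      // → null
```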

@kdahlquist
Collaborator Author

If I'm not mistaken, this one is the subject of the current PR from @johnllopez616. As soon as that PR is approved and merged with beta, we can close this one.

@jlopez616
Collaborator

@johnllopez616 Note to self: add “review requested”


3 participants