Add USPTO-480K dataset from https://doi.org/10.1039/C8SC04228D #61

Merged Feb 17, 2021 (10 commits)

@skearnes skearnes commented Feb 12, 2021

Here's the notebook I used to create these: uspto-480k.zip. Only the reaction SMILES are included; the validations won't pass until open-reaction-database/ord-schema#559 is resolved.

The data in https://github.com/connorcoley/rexgen_direct is licensed under GPL-3.0, and @connorcoley has consented to re-license for inclusion in the ORD.

@skearnes skearnes changed the base branch from main to 480k February 12, 2021 16:41
@skearnes skearnes changed the base branch from 480k to main February 17, 2021 17:44
@skearnes skearnes changed the base branch from main to 480k February 17, 2021 17:46
@skearnes
Contributor Author

Updated to use only reaction SMILES and not add in made-up stuff to make the validations happy; PTAL.

@connorcoley
Contributor

The information after the reaction SMILES is an annotation of bond changes during the reaction, not actually information related to the reaction following the extended SMILES format. Everything after the space should just be discarded and the type changed to plain old REACTION_SMILES (also the description updated to reflect that these aren't REACTION_CXSMILES)
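
A minimal sketch of the cleanup described above, assuming each source line holds an atom-mapped reaction SMILES, then a space, then the bond-change annotation. The function name and the sample line are hypothetical and are not taken from the notebook or from rexgen_direct.

```python
def strip_bond_annotation(line: str) -> str:
    """Return only the reaction SMILES from a line, dropping the annotation.

    Assumption: the reaction SMILES is the first whitespace-delimited token
    and everything after it is the bond-change annotation.
    """
    parts = line.strip().split(maxsplit=1)
    if not parts:
        raise ValueError("empty line")
    reaction_smiles = parts[0]
    # A reaction SMILES has the form reactants>agents>products,
    # so it contains exactly two '>' separators.
    if reaction_smiles.count(">") != 2:
        raise ValueError(f"not a reaction SMILES: {reaction_smiles!r}")
    return reaction_smiles


if __name__ == "__main__":
    # Illustrative line only; the annotation syntax here is an assumption.
    sample = "[CH3:1][OH:2].[Cl:3][CH3:4]>>[CH3:1][O:2][CH3:4] 2-4-1.0;3-4-0.0"
    print(strip_bond_annotation(sample))
```

The returned string would then be stored with the identifier type REACTION_SMILES rather than REACTION_CXSMILES, since nothing in it follows the extended SMILES format.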

@skearnes
Contributor Author

> The information after the reaction SMILES is an annotation of bond changes during the reaction, not actually information related to the reaction following the extended SMILES format. Everything after the space should just be discarded and the type changed to plain old REACTION_SMILES (also the description updated to reflect that these aren't REACTION_CXSMILES)

Thanks; fixed.

@skearnes skearnes merged commit 78b3587 into open-reaction-database:480k Feb 17, 2021
@skearnes skearnes deleted the 480k branch February 17, 2021 22:45
skearnes added a commit that referenced this pull request Feb 18, 2021
…64)

* Add USPTO-480K dataset from https://doi.org/10.1039/C8SC04228D (#61)

* Add USPTO-480K dataset from https://doi.org/10.1039/C8SC04228D

* trigger

* bump ord-schema version

* update to just use reaction smiles

* bump ord-schema version

* trigger

* remove LFS placeholder

* REACTION_SMILES

* fix typo

* Update submission

* Update badges

* Update validation.yml

* Update submission

* bump ord-schema version

* Update submission

Co-authored-by: github-actions <github-actions@github.com>
2 participants