Add USPTO-480K dataset from https://doi.org/10.1039/C8SC04228D #61
Updated to use only reaction SMILES and not add in made-up stuff to make the validations happy; PTAL.
The text after the reaction SMILES is an annotation of the bond changes during the reaction, not reaction data in the extended SMILES format. Everything after the space should just be discarded and the type changed to plain old |
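For anyone repeating this cleanup, the "discard everything after the space" step could look roughly like the sketch below. It is not taken from the notebook attached to this PR; the function name `strip_bond_annotation` and the example line (including the layout of its bond-change annotation) are made up for illustration.

```python
def strip_bond_annotation(line: str) -> str:
    """Return only the reaction SMILES from a line, dropping any trailing annotation.

    Assumption: the reaction SMILES is the first whitespace-delimited token
    and everything after it is the bond-change annotation.
    """
    parts = line.strip().split(maxsplit=1)
    if not parts:
        raise ValueError("empty line")
    reaction_smiles = parts[0]
    # A reaction SMILES has the shape reactants>agents>products,
    # so it must contain at least two '>' separators.
    if reaction_smiles.count(">") < 2:
        raise ValueError(f"not a reaction SMILES: {reaction_smiles!r}")
    return reaction_smiles


# Hypothetical input line; the annotation after the space is illustrative only.
example = "[CH3:1][Cl:2].[OH-:3]>>[CH3:1][OH:3] 1-3-1.0;1-2-0.0"
print(strip_bond_annotation(example))
```

Lines that carry no annotation pass through unchanged, so the same function can be applied to every row regardless of whether the annotation is present.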
Thanks; fixed.
…64)
* Add USPTO-480K dataset from https://doi.org/10.1039/C8SC04228D (#61)
* Add USPTO-480K dataset from https://doi.org/10.1039/C8SC04228D
* trigger
* bump ord-schema version
* update to just use reaction smiles
* bump ord-schema version
* trigger
* remove LFS placeholder
* REACTION_SMILES
* fix typo
* Update submission
* Update badges
* Update validation.yml
* Update submission
* bump ord-schema version
* Update submission
Co-authored-by: github-actions <github-actions@github.com>
Here's the notebook I used to create these: uspto-480k.zip. Only the reaction SMILES are included; the validations will not pass until open-reaction-database/ord-schema#559 is resolved.
The data in https://github.com/connorcoley/rexgen_direct is licensed under GPL-3.0, and @connorcoley has consented to re-license for inclusion in the ORD.