Miniaturised Suzuki couplings, reductive aminations, and N-alkylation #203

Open
wants to merge 12 commits into
base: main

bdeadman
Collaborator

…s (#202)

  • 1440 suzuki rxns

1440 suzuki rxns from https://doi.org/10.1038/s44160-023-00351-1.

  • added first subset of reductive aminations, and added name for suzuki dataset

  • now with all 3 dataset files including updates to suzuki and red am

  • Rogue quotation mark in description - fixed

  • Delete alkylation_merge.pbtxt

  • Fixed tag on core_SMILES outcome

  • Added dataset name and descriptor to Suzuki


Change summary:

Filename Added Removed Changed
data/alkylation_merge.pbtxt 0 0 0
data/red_am_1_dataset_v2.pbtxt 0 0 0
data/suzuki_dataset.pbtxt 0 0 0
Total 0 0 0

@bdeadman bdeadman closed this Sep 27, 2024
@bdeadman bdeadman reopened this Sep 27, 2024

Change summary:

Filename Added Removed Changed
data/8c/ord_dataset-8c36a0c558ab4012a2b049bfc1466988.pb.gz 1440 0 0
data/a8/ord_dataset-a8ad4beb9a6d4cccacd2bbf9272528a3.pb.gz 104 0 0
data/c5/ord_dataset-c50391cfbc4f43efb359199893b293c2.pb.gz 768 0 0
Total 2312 0 0

@bdeadman
Collaborator Author

bdeadman commented Oct 3, 2024

1440 Suzuki reactions, 768 reductive aminations, and 104 N-alkylations with Boc deprotection, from https://doi.org/10.1038/s44160-023-00351-1, a paper by Tim Cernak from when he was at Merck. Reactions are miniaturized to the µL scale.

submission.zip has the templates and the CSV files. Note that the N-alkylation set was assembled in two parts and then joined so that the control reactions are formatted correctly.

@bdeadman bdeadman self-assigned this Oct 3, 2024
@bdeadman bdeadman requested a review from skearnes October 3, 2024 15:54
@bdeadman
Collaborator Author

bdeadman commented Oct 3, 2024

@skearnes - these three haven't been peer reviewed yet.


github-actions bot commented Oct 7, 2024

Change summary:

Filename Added Removed Changed
data/8c/ord_dataset-8c36a0c558ab4012a2b049bfc1466988.pb.gz 1440 0 0
data/a8/ord_dataset-a8ad4beb9a6d4cccacd2bbf9272528a3.pb.gz 104 0 0
data/c5/ord_dataset-c50391cfbc4f43efb359199893b293c2.pb.gz 768 0 0
Total 2312 0 0

@skearnes skearnes closed this Oct 7, 2024
@skearnes skearnes reopened this Oct 7, 2024

github-actions bot commented Oct 7, 2024

Change summary:

Filename Added Removed Changed
data/8c/ord_dataset-8c36a0c558ab4012a2b049bfc1466988.pb.gz 1440 0 0
data/a8/ord_dataset-a8ad4beb9a6d4cccacd2bbf9272528a3.pb.gz 104 0 0
data/c5/ord_dataset-c50391cfbc4f43efb359199893b293c2.pb.gz 768 0 0
Total 2312 0 0

@skearnes
Contributor

skearnes commented Oct 7, 2024

Thanks @bdeadman! Review notes:

Suzuki

  • Check names in the dataset description.
  • There's no addition order 2? I see two inputs with addition order 4 in the template.
  • H2O is listed as 1600 nmol; should that be nL?
  • Should you include the boronate reactant in the product list if it has no associated measurements?
  • You include the well row in the analytical data, but I think this is already in the reaction identifiers?

Amination

  • Does this dataset only include the optimization set and not the library set? I see two sheets in the SI with 768 and 384 rows.
  • I'm confused about the solvent placeholders; from the SI it looks like NMP was used a lot but I'm not seeing it?

Alkylation

  • I see some unpopulated values, like $Electrophile ID$ in the final reaction.
  • The base input has no SMILES.
  • Typo in "Forumula" in product identifiers.
  • There are multiple products marked with is_desired_product.
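
Two of the alkylation checks above (leftover placeholders and repeated is_desired_product flags) could be scanned for in the text form of a dataset. The following is a minimal sketch, not part of the ORD tooling: it assumes placeholders always look like $Name$, as in "$Electrophile ID$", and the helper names and sample text are hypothetical.

```python
import re

# Assumption: template placeholders are delimited by dollar signs, e.g. "$Electrophile ID$".
# Caveat: "$" is also a (rare) quadruple-bond symbol in SMILES, so hits need a manual look.
PLACEHOLDER = re.compile(r"\$[^$\n]+\$")
DESIRED = re.compile(r"is_desired_product:\s*true")


def find_unfilled_placeholders(pbtxt: str) -> list[str]:
    """Return the distinct $...$ placeholders still present in the text, sorted."""
    return sorted(set(PLACEHOLDER.findall(pbtxt)))


def count_desired_products(pbtxt: str) -> int:
    """Count how many products in the text are flagged is_desired_product: true."""
    return len(DESIRED.findall(pbtxt))


# Hypothetical fragment of one enumerated reaction, for illustration only.
sample = """
identifiers { type: CUSTOM details: "Electrophile ID" value: "$Electrophile ID$" }
outcomes {
  products { is_desired_product: true }
  products { is_desired_product: true }
}
"""

print(find_unfilled_placeholders(sample))
print(count_desired_products(sample))
```

A count above one is only a prompt to review the reaction; whether more than one product should be marked as desired is a curation decision.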

Replacement Suzuki dataset updated after review.

github-actions bot commented Oct 9, 2024

Change summary:

Filename Added Removed Changed
data/8c/ord_dataset-8c36a0c558ab4012a2b049bfc1466988.pb.gz 0 0 0
data/a8/ord_dataset-a8ad4beb9a6d4cccacd2bbf9272528a3.pb.gz 104 0 0
data/c5/ord_dataset-c50391cfbc4f43efb359199893b293c2.pb.gz 768 0 0
Total 872 0 0

@bdeadman
Collaborator Author

bdeadman commented Oct 9, 2024

Suzuki

  • Boronic acid/ester is addition 2, and catalyst is addition 3.
  • The H2O is included in the core stock solution at the 4-equivalent level, but the value should be 400 (not 1,600).
  • Removed the boronate from the products. It was probably left over from my working in the GUI to check SMILES, and I missed deleting it.
  • Deleted well row from analytical details. This was a late change to make it consistent with the other datasets in this paper.
  • Corrected Time to Tim in dataset description.

Dataset .pb.gz file has been replaced and pushed to this branch.
Files to generate dataset: Cernak_miniature_suzuki_v2.zip
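
The addition-order issue raised in the review (no order 2, two inputs at order 4) is the kind of thing a small check can surface before upload. This is a rough sketch under stated assumptions: the function name and the generic input names are hypothetical, and only the order values mirror the discussion above.

```python
from collections import Counter


def check_addition_orders(orders: dict[str, int]) -> dict[str, list[int]]:
    """Report skipped and repeated addition_order values for one reaction."""
    counts = Counter(orders.values())
    highest = max(counts, default=0)
    missing = [n for n in range(1, highest + 1) if n not in counts]
    repeated = sorted(n for n, c in counts.items() if c > 1)
    return {"missing": missing, "repeated": repeated}


# Hypothetical input names; only the order values follow the review comment.
before = {"input_a": 1, "input_b": 3, "input_c": 4, "input_d": 4}
after = {"input_a": 1, "boronic acid/ester": 2, "catalyst": 3, "input_d": 4}

print(check_addition_orders(before))
print(check_addition_orders(after))
```

A repeated value is not necessarily an error, since inputs added together can share an order, so the result is best treated as a list of things to review.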

@bdeadman bdeadman closed this Oct 9, 2024
@bdeadman bdeadman reopened this Oct 9, 2024

github-actions bot commented Oct 9, 2024

Change summary:

Filename Added Removed Changed
data/3b/ord_dataset-3b8a2ef300e145468579027f206a3ac8.pb.gz 1440 0 0
data/a8/ord_dataset-a8ad4beb9a6d4cccacd2bbf9272528a3.pb.gz 104 0 0
data/c5/ord_dataset-c50391cfbc4f43efb359199893b293c2.pb.gz 768 0 0
Total 2312 0 0

@bdeadman
Collaborator Author

bdeadman commented Oct 9, 2024

Reductive Amination

  • Current dataset only includes the optimisation. I will add the library set to the same dataset.
  • Not sure what the problem is here. In the optimisation set, half the reactions are run in DMA and the other half in NMP. The dataset on my PC appears to have the solvents set up. When I add the extra library data we can check again.
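
One way to confirm the DMA/NMP split described above is to tally the solvent column of the source spreadsheet. The sketch below assumes a CSV export with a column named "solvent"; that column name and the four demo rows are hypothetical, not taken from the submission files.

```python
import csv
import io
from collections import Counter


def solvent_counts(csv_text: str, column: str = "solvent") -> Counter:
    """Tally how many rows use each solvent in the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column].strip() for row in reader)


# Hypothetical four-row extract; the real optimisation sheet has 768 rows.
demo = "well,solvent\nA1,DMA\nA2,NMP\nA3,DMA\nA4,NMP\n"
counts = solvent_counts(demo)

print(counts)
```

For the 768-reaction optimisation set, an even split would correspond to 384 rows per solvent.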


Change summary:

Filename Added Removed Changed
data/3b/ord_dataset-3b8a2ef300e145468579027f206a3ac8.pb.gz 1440 0 0
data/a8/ord_dataset-a8ad4beb9a6d4cccacd2bbf9272528a3.pb.gz 104 0 0
data/c5/ord_dataset-c50391cfbc4f43efb359199893b293c2.pb.gz 0 0 0
Total 1544 0 0

@bdeadman bdeadman closed this Oct 10, 2024
@bdeadman bdeadman reopened this Oct 10, 2024

Change summary:

Filename Added Removed Changed
data/3b/ord_dataset-3b8a2ef300e145468579027f206a3ac8.pb.gz 1440 0 0
data/a8/ord_dataset-a8ad4beb9a6d4cccacd2bbf9272528a3.pb.gz 104 0 0
data/c5/ord_dataset-c50391cfbc4f43efb359199893b293c2.pb.gz 0 0 0
Total 1544 0 0

@bdeadman bdeadman closed this Oct 10, 2024
@bdeadman bdeadman reopened this Oct 10, 2024

Change summary:

Filename Added Removed Changed
data/2b/ord_dataset-2be11f57f3304e678ea8469baf6dd1bc.pb.gz 1152 0 0
data/3b/ord_dataset-3b8a2ef300e145468579027f206a3ac8.pb.gz 1440 0 0
data/a8/ord_dataset-a8ad4beb9a6d4cccacd2bbf9272528a3.pb.gz 104 0 0
Total 2696 0 0

Updates made in response to peer review.
@bdeadman
Collaborator Author

Alkylation Dataset

  • $Electrophile ID$ had a double space in the spreadsheet. Changed key and column header to Electrophile_ID for clarity.
  • Base SMILES had a few empty cells in the source spreadsheet. Resolved.
  • "Forumula" typo corrected.
  • Changed the deprotected core product to have is_desired_product = false.

Replacement source files: alkylation.zip
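
The double-space problem described above is easy to miss by eye. As an illustration only, the sketch below compares the placeholders in a template with the spreadsheet headers and reports pairs that differ only in whitespace. It assumes both sides carry the same $...$ delimiters; the function names and sample values are hypothetical.

```python
import re

# Assumption: placeholders and column headers both use the $Name$ form.
PLACEHOLDER = re.compile(r"\$[^$\n]+\$")


def normalize(name: str) -> str:
    """Collapse runs of whitespace so 'A  B' and 'A B' compare equal."""
    return re.sub(r"\s+", " ", name).strip()


def whitespace_near_misses(template: str, headers: list[str]) -> list[tuple[str, str]]:
    """Pair each unmatched placeholder with a header that differs only in whitespace."""
    by_normalized = {normalize(h): h for h in headers}
    pairs = []
    for placeholder in sorted(set(PLACEHOLDER.findall(template))):
        if placeholder in headers:
            continue  # exact match, nothing to report
        candidate = by_normalized.get(normalize(placeholder))
        if candidate is not None:
            pairs.append((placeholder, candidate))
    return pairs


# Hypothetical template line and headers; the second header has a double space.
template = 'value: "$Electrophile ID$"'
headers = ["$Base SMILES$", "$Electrophile  ID$"]

print(whitespace_near_misses(template, headers))
```

Renaming the key to a form without spaces, such as Electrophile_ID, removes this class of mismatch altogether.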


Change summary:

Filename Added Removed Changed
data/2b/ord_dataset-2be11f57f3304e678ea8469baf6dd1bc.pb.gz 1152 0 0
data/3b/ord_dataset-3b8a2ef300e145468579027f206a3ac8.pb.gz 1440 0 0
data/a8/ord_dataset-a8ad4beb9a6d4cccacd2bbf9272528a3.pb.gz 0 0 0
Total 2592 0 0

@bdeadman bdeadman closed this Oct 10, 2024
@bdeadman bdeadman reopened this Oct 10, 2024

Change summary:

Filename Added Removed Changed
data/17/ord_dataset-172039a759a440219a68af62d203b79b.pb.gz 104 0 0
data/2b/ord_dataset-2be11f57f3304e678ea8469baf6dd1bc.pb.gz 1152 0 0
data/3b/ord_dataset-3b8a2ef300e145468579027f206a3ac8.pb.gz 1440 0 0
Total 2696 0 0

@bdeadman
Collaborator Author

@skearnes I've made the updates and replaced the .pb.gz files. The reaction count looks correct. Let me know if there are any other changes required.
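
The reaction count can be cross-checked against the numbers quoted in this thread: 1440 Suzuki reactions, 768 + 384 reductive aminations (the two SI sheets), and 104 N-alkylations. The sketch below parses the file rows of a change summary as they appear in this conversation; the parser is a hypothetical helper, not part of the bot or of the ORD tooling.

```python
def parse_change_summary(text: str) -> dict[str, int]:
    """Map each dataset file in a change summary to its 'Added' count."""
    added = {}
    for line in text.splitlines():
        parts = line.split()
        # File rows have four fields: filename, added, removed, changed.
        if len(parts) == 4 and parts[0].endswith(".pb.gz"):
            added[parts[0]] = int(parts[1])
    return added


# File rows copied from the latest change summary in this thread.
summary = """\
Filename Added Removed Changed
data/17/ord_dataset-172039a759a440219a68af62d203b79b.pb.gz 104 0 0
data/2b/ord_dataset-2be11f57f3304e678ea8469baf6dd1bc.pb.gz 1152 0 0
data/3b/ord_dataset-3b8a2ef300e145468579027f206a3ac8.pb.gz 1440 0 0
"""

added = parse_change_summary(summary)
total = sum(added.values())
expected = 1440 + (768 + 384) + 104  # Suzuki + reductive amination + N-alkylation

print(total, expected)
```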
