Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate ord-data to ord-schema v0.3.0 #75

Merged
merged 3 commits into from
Mar 8, 2021
Merged

Migrate ord-data to ord-schema v0.3.0 #75

merged 3 commits into from
Mar 8, 2021

Conversation

skearnes
Copy link
Contributor

@skearnes skearnes commented Mar 4, 2021

@skearnes skearnes closed this Mar 4, 2021
@skearnes skearnes reopened this Mar 4, 2021
@skearnes
Copy link
Contributor Author

skearnes commented Mar 5, 2021

@connorcoley I'm going to request an exception to the usual checks here. There's a bug in process_dataset.py with how comparisons are made against gzipped datasets vs. the main branch that is causing the jobs to fail. But even if that bug weren't there, the fact that process_dataset.py validates each dataset twice (once on the way in and once on the way out) means that the job would never finish in the 6 h time limit.

The count_reactions job confirms that the number of reactions hasn't changed, and the (sharded) validate_database jobs are checking that everything in the update validates correctly, so I think we're safe to merge without the process_submission jobs.

@skearnes skearnes marked this pull request as ready for review March 5, 2021 02:15
@connorcoley
Copy link
Contributor

@connorcoley I'm going to request an exception to the usual checks here. There's a bug in process_dataset.py with how comparisons are made against gzipped datasets vs. the main branch that is causing the jobs to fail. But even if that bug weren't there, the fact that process_dataset.py validates each dataset twice (once on the way in and once on the way out) means that the job would never finish in the 6 h time limit.

The count_reactions job confirms that the number of reactions hasn't changed, and the (sharded) validate_database jobs are checking that everything in the update validates correctly, so I think we're safe to merge without the process_submission jobs.

Since the number of reactions is the same and the validation is working, then I'm inclined to agree. I suppose this will be a problem any time we do a migration since process_dataset will far too long given the current size of the database?

@skearnes
Copy link
Contributor Author

skearnes commented Mar 8, 2021

Since the number of reactions is the same and the validation is working, then I'm inclined to agree. I suppose this will be a problem any time we do a migration since process_dataset will far too long given the current size of the database?

Yep, but I think in situations like this where we're not expecting changes the validations should be sufficient.

@skearnes skearnes merged commit b08b91e into main Mar 8, 2021
@skearnes skearnes deleted the migrate branch March 8, 2021 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants