-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vendored scripts #1
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wish the auto-generated messages (f9d6407, ee78e39) had more information on the remote. If I'm reading the git history alone, it's not clear where these come from. Suggestion: Modify one of the auto-generated messages to include the remote URL, https://github.com/nextstrain/ingest. I'm not sure which is best since the remote commit hash is in f9d6407, but only the merge commit ee78e39 will show up as the one making changes. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
# ingest | ||
|
||
Shared internal tooling for pathogen data ingest. Used by our individual | ||
pathogen repos which produce Nextstrain builds. Expected to be vendored by | ||
each pathogen repo using `git subtree` (or `git subrepo`). | ||
|
||
Some tools may only live here temporarily before finding a permanent home in | ||
`augur curate` or Nextstrain CLI. Others may happily live out their days here. | ||
|
||
## History | ||
|
||
Much of this tooling originated in | ||
[ncov-ingest](https://github.com/nextstrain/ncov-ingest) and was passaged thru | ||
[monkeypox's ingest/](https://github.com/nextstrain/monkeypox/tree/@/ingest/). | ||
It subsequently proliferated from [monkeypox][] to other pathogen repos | ||
([rsv][], [zika][], [dengue][], [hepatitisB][], [forecasts-ncov][]) primarily | ||
thru copying. To [counter that | ||
proliferation](https://bedfordlab.slack.com/archives/C7SDVPBLZ/p1688577879947079), | ||
this repo was made. | ||
|
||
[monkeypox]: https://github.com/nextstrain/monkeypox | ||
[rsv]: https://github.com/nextstrain/rsv | ||
[zika]: https://github.com/nextstrain/zika/pull/24 | ||
[dengue]: https://github.com/nextstrain/dengue/pull/10 | ||
[hepatitisB]: https://github.com/nextstrain/hepatitisB | ||
[forecasts-ncov]: https://github.com/nextstrain/forecasts-ncov | ||
|
||
## Elsewhere | ||
|
||
The creation of this repo, in both the abstract and concrete, and the general | ||
approach to "ingest" has been discussed in various internal places, including: | ||
|
||
- https://github.com/nextstrain/private/issues/59 | ||
- @joverlee521's [workflows document](https://docs.google.com/document/d/1rLWPvEuj0Ayc8MR0O1lfRJZfj9av53xU38f20g8nU_E/edit#heading=h.4g0d3mjvb89i) | ||
- [5 July 2023 Slack thread](https://bedfordlab.slack.com/archives/C7SDVPBLZ/p1688577879947079) | ||
- [6 July 2023 team meeting](https://docs.google.com/document/d/1FPfx-ON5RdqL2wyvODhkrCcjgOVX3nlXgBwCPhIEsco/edit) | ||
- _…many others_ |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,18 +5,6 @@ | |
any additional transformations on top of the user curations. | ||
""" | ||
|
||
""" | ||
Copied from https://github.com/nextstrain/monkeypox/blob/62fca491c6775573ad036eedf34b271b4952f2c2/ingest/bin/apply-geolocation-rules | ||
with two changes: | ||
Comment on lines
-9
to
-10
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
A comment is used to describe the pathogen-specific changes, when those might be better described by git history. Suggestion: Put these changes in a separate commit. I'd expect something like one commit to add the original contents from There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, I just realized that 7d507f7 is a direct copy of nextstrain/ingest@5afbcc7 and these aren't pathogen-specific changes. You can disregard this suggestion then. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yeah, or perhaps the other way around! The ingest commit was created by pushing the subtree from this (hbv) repo. In hindsight, probably not an approach I'd reach for again, but it was a good learning exercise. |
||
|
||
First change allows missing fields in the input ndjson | ||
- annotated_values = transform_geolocations(geolocation_rules, [record.[field] for field in location_fields]) | ||
+ annotated_values = transform_geolocations(geolocation_rules, [record.get(field, '') for field in location_fields]) | ||
|
||
Second change allows blank lines in the location-rules TSV | ||
- if line.lstrip()[0] == '#': | ||
+ if line.strip()=="" or line.lstrip()[0] == '#': | ||
""" | ||
|
||
import argparse | ||
import json | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestions:
git subtree pull