grep max-count option #77
Comments
I thought it has a flag like
It would gain some speed improvement, without using extra data structure.
Yeah that would cover the most frequent (IMHO) use case of
I'll add this weekend
It works for "normal" and regex patterns now:
Many thanks. I got a slight runtime improvement. Not much because all hits were near the end of the file.
It would be nice if csvtk grep supported a -m NUM, --max-count=NUM parameter like the usual grep utility. The rationale behind this is performance.
I'm repeatedly looking up unique values (stored in a file) in a column of a very large database that contains only unique values (primary keys). It takes some time to go over the whole database, and I believe it would run faster if I could provide something like --max-count=1. But that requires a map from the query values to their occurrences already found.
Here is the command I use:
pigz -dc prot.accession2taxid.gz | csvtk grep -t -f accession.version -P some_accessions.txt | csvtk cut -t -f accession.version,taxid
prot.accession2taxid.gz is several gigabytes large.
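To make the request concrete, here is a minimal Python sketch of the idea, assuming every key in the column is unique. It is not csvtk's implementation: the function name grep_first_hits and the four-row sample table are made up for illustration. Each query value is dropped from a pending set once it has been matched, and reading stops when the set is empty. The saving depends on where the hits sit in the file, so if they are all near the end, almost the whole input is still read.

```python
import csv
import io


def grep_first_hits(rows, key_index, wanted):
    """Yield each row whose key is in `wanted`, at most once per key.

    Hypothetical helper: stops consuming `rows` as soon as every
    wanted key has been matched once.
    """
    pending = set(wanted)  # query values that have not been matched yet
    if not pending:
        return
    for row in rows:
        key = row[key_index]
        if key in pending:
            pending.remove(key)  # same effect as a per-key max count of 1
            yield row
            if not pending:
                return  # everything found, leave the rest of the input unread


# Made-up sample data in the same tab-separated layout as the command above.
table = (
    "accession.version\ttaxid\n"
    "A1.1\t9606\n"
    "B2.1\t10090\n"
    "C3.1\t7227\n"
    "D4.1\t562\n"
)
reader = csv.reader(io.StringIO(table), delimiter="\t")
header = next(reader)
key_index = header.index("accession.version")
hits = list(grep_first_hits(reader, key_index, {"A1.1", "B2.1"}))
```

In this example both query values are found in the first two data rows, so the remaining rows are never read from the reader.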