new command "scatter" #265

Closed
VladimirAlexiev opened this issue Feb 7, 2024 · 8 comments

Comments

@VladimirAlexiev

VladimirAlexiev commented Feb 7, 2024

Can you add a command that's opposite of "gather"?

I have a file like this:

module    t                  c
-------   ----------------   -
address   Class              1
address   DatatypeProperty   2
address   ObjectProperty     3
agent     DatatypeProperty   4

I want to convert it to:

module    Class   DatatypeProperty    ObjectProperty
-------   -----   ----------------    --------------
address   1       2                   3
agent             4

I don't know what would be an appropriate name, maybe "scatter"?
It could be invoked like this:

csvtk scatter --key t --value c -f Class,DatatypeProperty,ObjectProperty

where -f is an OPTIONAL list of key values to be used for sorting the output columns.
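
For readers who want the requested reshaping spelled out, here is a minimal Python sketch of the long-to-wide pivot. It is not csvtk's implementation: the function name `scatter`, its parameters, and the `columns` argument (standing in for the proposed optional -f list) are hypothetical. The agent row is assumed to carry DatatypeProperty, matching the output reported later in this thread.

```python
def scatter(rows, id_col, key_col, value_col, columns=None, fill=""):
    """Pivot long rows (a list of dicts) into one wide row per id value."""
    wide = {}        # id value -> {key value: cell value}
    seen_keys = []   # key values in order of first appearance
    for row in rows:
        wide.setdefault(row[id_col], {})[row[key_col]] = row[value_col]
        if row[key_col] not in seen_keys:
            seen_keys.append(row[key_col])
    # Hypothetical analogue of the proposed -f: a caller-chosen column order.
    out_cols = list(columns) if columns is not None else seen_keys
    header = [id_col] + out_cols
    body = [[ident] + [cells.get(k, fill) for k in out_cols]
            for ident, cells in wide.items()]
    return header, body


# Sample rows modeled on the table above (agent row is an assumption).
rows = [
    {"module": "address", "t": "Class", "c": "1"},
    {"module": "address", "t": "DatatypeProperty", "c": "2"},
    {"module": "address", "t": "ObjectProperty", "c": "3"},
    {"module": "agent", "t": "DatatypeProperty", "c": "4"},
]

header, body = scatter(rows, "module", "t", "c",
                       columns=["ObjectProperty", "Class", "DatatypeProperty"])
print(header)
for line in body:
    print(line)
```

In this sketch, key values missing from `columns` are dropped. Whether a real command should drop them or append them after the listed ones is a design choice the proposal leaves open.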


I don't have a useful case for multiple key or value columns, but I guess that is possible, e.g. from

gender education number percent
male   basic
female highschool
...

to something like

male_basic_number male_highschool_number female_basic_number female_highschool_number

Multiple --key columns make sense only when the key columns have very few distinct values.
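
To illustrate why the number of distinct key values matters, here is a rough Python sketch that only enumerates the wide column names implied by several key and value columns. The helper `wide_column_names` and the `_` separator are assumptions for illustration, not an existing csvtk feature; the rows hold just the key values shown above, since the numbers are elided.

```python
from itertools import product


def wide_column_names(rows, key_cols, value_cols, sep="_"):
    """List the wide column names implied by the given key and value columns."""
    # Distinct values of each key column, in order of first appearance.
    distinct = [list(dict.fromkeys(row[k] for row in rows)) for k in key_cols]
    # One output column per combination of key values, per value column.
    return [sep.join(combo + (value,))
            for combo in product(*distinct)
            for value in value_cols]


rows = [
    {"gender": "male", "education": "basic"},
    {"gender": "female", "education": "highschool"},
]

number_only = wide_column_names(rows, ["gender", "education"], ["number"])
both_values = wide_column_names(rows, ["gender", "education"],
                                ["number", "percent"])
print(number_only)
print(len(both_values))
```

The column count is the product of the distinct values in each key column times the number of value columns, so it grows quickly as key columns gain more values.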

@shenwei356
Owner

There's a 'spread': https://bioinf.shenwei.me/csvtk/usage/#spread

@VladimirAlexiev
Author

VladimirAlexiev commented Feb 14, 2024

Uh-oh, I was still on version 0.24.
Thanks! After upgrading, I get exactly what I need:

$ csvtk space2tab test.txt | csvtk spread -t -k t -v c | csvtk pretty -t
module    Class   DatatypeProperty   ObjectProperty
-------   -----   ----------------   --------------
address   1       2                  3
agent             4

@VladimirAlexiev
Author

Hi @shenwei356. Does it make sense to add scatter as a synonym of spread?
scatter is a better match for gather: it even rhymes :-)

@shenwei356
Owner

Not really :). "scatter" sounds like a scatter plot.

gather and spread are from the R package tidyr. They perform opposite operations.

$ csvtk -h
Commands for Data Transformation:
  fold            fold multiple values of a field into cells of groups
  gather          gather columns into key-value pairs, like tidyr::gather/pivot_longer
  sep             separate column into multiple columns
  spread          spread a key-value pair across multiple columns, like tidyr::spread/pivot_wider
  transpose       transpose CSV data
  unfold          unfold multiple values in cells of a field
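
The "opposite operations" point can be checked with a small round trip. The sketch below is plain Python, not csvtk or tidyr code: the function names and signatures are hypothetical, and dropping empty cells in `gather` is a simplifying assumption (real tools may keep them as rows with empty values).

```python
def gather(header, body, id_col, key_name, value_name, fill=""):
    """Wide -> long: emit one row per non-empty cell outside the id column."""
    id_idx = header.index(id_col)
    long_rows = []
    for row in body:
        for col, cell in zip(header, row):
            if col != id_col and cell != fill:
                long_rows.append(
                    {id_col: row[id_idx], key_name: col, value_name: cell})
    return long_rows


def spread(rows, id_col, key_col, value_col, fill=""):
    """Long -> wide: one column per distinct key, one row per id value."""
    keys = list(dict.fromkeys(r[key_col] for r in rows))
    wide = {}
    for r in rows:
        wide.setdefault(r[id_col], {})[r[key_col]] = r[value_col]
    header = [id_col] + keys
    body = [[ident] + [cells.get(k, fill) for k in keys]
            for ident, cells in wide.items()]
    return header, body


# Wide table as reported after upgrading, earlier in this thread.
header = ["module", "Class", "DatatypeProperty", "ObjectProperty"]
body = [["address", "1", "2", "3"], ["agent", "", "4", ""]]

long_rows = gather(header, body, "module", "t", "c")
round_trip = spread(long_rows, "module", "t", "c")
print(len(long_rows))
print(round_trip == (header, body))
```

The round trip only restores the original column order here because every key appears in the first gathered row; in general the order follows first appearance in the long data.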

@VladimirAlexiev
Author

@shenwei356 Yes: gather != spread = scatter.

"spread" and "scatter" mean the same (in this context), and "scatter" rhymes better with "gather".
I don't know tidyr, that's why I guessed there should be "scatter" as the opposite of "gather".

@shenwei356
Owner

> Does it make sense to add scatter as a synonym of spread?

OK. But spread remains the main name, for consistency with tidyr, a popular R package widely used for table manipulation.

@VladimirAlexiev
Author

sure!

@shenwei356
Owner

done.
