converting pairwise values into a matrix or semi-matrix #91

Closed
avilella opened this issue Oct 11, 2019 · 5 comments

avilella commented Oct 11, 2019

There may already be a way to do this by combining summary, gather, or collapse, but here is a feature I find myself wanting every once in a while:

https://stackoverflow.com/questions/22492767/converting-pairwise-distances-into-a-distance-matrix-in-r

Take a list of pairwise distances and convert into a matrix:

A1  A1  0.90
A1  B1  0.85
A1  C1  0.45
A1  D1  0.96
B1  B1  0.90
B1  C1  0.85
B1  D1  0.56
C1  C1  0.55
C1  D1  0.45
D1  D1  0.90

The desired output would look like this:

       A1      B1      C1      D1
A1    0.90    0.85    0.45    0.96
B1            0.90    0.85    0.56
C1                    0.55    0.45
D1                            0.90
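
As a rough illustration of the conversion being requested, here is a minimal awk sketch. The function name `pairs_to_matrix` is hypothetical and is not part of csvtk or datamash. It assumes tab-separated input with exactly three columns (row label, column label, value) and keeps labels in order of first appearance, leaving missing pairs empty.

```shell
# Hypothetical helper (not part of csvtk or datamash): reads tab-separated
# "row<TAB>column<TAB>value" triples on stdin and prints a matrix.
pairs_to_matrix() {
    awk -F'\t' -v OFS='\t' '
    {
        # Remember row and column labels in order of first appearance.
        if (!($1 in seen_row)) { seen_row[$1] = 1; rows[++nr] = $1 }
        if (!($2 in seen_col)) { seen_col[$2] = 1; cols[++nc] = $2 }
        cell[$1, $2] = $3
    }
    END {
        # Header line: empty corner cell followed by the column labels.
        line = ""
        for (j = 1; j <= nc; j++) line = line OFS cols[j]
        print line
        # One line per row label; pairs that never occurred stay empty.
        for (i = 1; i <= nr; i++) {
            line = rows[i]
            for (j = 1; j <= nc; j++) {
                if ((rows[i], cols[j]) in cell) {
                    line = line OFS cell[rows[i], cols[j]]
                } else {
                    line = line OFS
                }
            }
            print line
        }
    }'
}

# Usage with a small subset of the pairs listed above.
printf 'A1\tA1\t0.90\nA1\tB1\t0.85\nB1\tB1\t0.90\n' | pairs_to_matrix
```

Because only the upper-triangle pairs are present in the input, the lower triangle comes out empty, which gives the semi-matrix form.
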
@avilella
Author

For reference, this is how datamash, a tool similar to csvtk, currently generates this kind of contingency table:

cat data.tsv | datamash crosstab 2,1 unique 3

@shenwei356
Owner

I think this use case is too specialized; probably very few people need it. And since datamash already provides this function, there's no need to reinvent the wheel right now.

cwarden commented Dec 30, 2022

I don't think the use case is that unusual. tidyr provides spread as the inverse of gather (now pivot_wider and pivot_longer, respectively). The pivot documentation provides example use cases.

avilella commented Dec 30, 2022 via email

@shenwei356
Owner

Implemented:

With your data:

$ csvtk spread -Ht -k 2 -v 3 data.tsv \
    |  csvtk pretty -t -S bold

┏━━━━┳━━━━━━┳━━━━━━┳━━━━━━┳━━━━━━┓
┃    ┃ A1   ┃ B1   ┃ C1   ┃ D1   ┃
┣━━━━╋━━━━━━╋━━━━━━╋━━━━━━╋━━━━━━┫
┃ A1 ┃ 0.90 ┃ 0.85 ┃ 0.45 ┃ 0.96 ┃
┣━━━━╋━━━━━━╋━━━━━━╋━━━━━━╋━━━━━━┫
┃ B1 ┃      ┃ 0.90 ┃ 0.85 ┃ 0.56 ┃
┣━━━━╋━━━━━━╋━━━━━━╋━━━━━━╋━━━━━━┫
┃ C1 ┃      ┃      ┃ 0.55 ┃ 0.45 ┃
┣━━━━╋━━━━━━╋━━━━━━╋━━━━━━╋━━━━━━┫
┃ D1 ┃      ┃      ┃      ┃ 0.90 ┃
┗━━━━┻━━━━━━┻━━━━━━┻━━━━━━┻━━━━━━┛

Another example, with shuffled columns:

$ csvtk cut -f 1,4,2,3 testdata/names.csv \
  | csvtk pretty -S simple
----------------------------------------
id   username   first_name   last_name
----------------------------------------
11   rob        Rob          Pike
2    ken        Ken          Thompson
4    gri        Robert       Griesemer
1    abc        Robert       Thompson
NA   123        Robert       Abel
----------------------------------------

Round trip: data -> gather/longer -> spread/wider. Note that the order of both rows and columns is preserved :)

$ csvtk cut -f 1,4,2,3 testdata/names.csv \
    | csvtk gather -k item -v value -f -1 \
    | csvtk spread -k item -v value \
    | csvtk pretty -S simple
----------------------------------------
id   username   first_name   last_name
----------------------------------------
11   rob        Rob          Pike
2    ken        Ken          Thompson
4    gri        Robert       Griesemer
1    abc        Robert       Thompson
NA   123        Robert       Abel
----------------------------------------
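
To make the gather/longer step of that round trip concrete without csvtk, here is a small awk sketch. The function name `gather_long` is hypothetical. It assumes simple comma-separated input with a header row and no quoted fields, treats the first column as the identifier, and uses the output column names `item` and `value` to mirror the `-k item -v value` flags above. It is not how csvtk implements gather.

```shell
# Hypothetical helper: turns a wide CSV into long (id, item, value) form.
# Assumes no quoted fields or embedded commas.
gather_long() {
    awk -F',' -v OFS=',' '
    NR == 1 {
        # Keep the header names of every column after the first.
        for (j = 2; j <= NF; j++) name[j] = $j
        print $1, "item", "value"
        next
    }
    {
        # Emit one output line per (row, column) cell.
        for (j = 2; j <= NF; j++) print $1, name[j], $j
    }'
}

# Usage with two of the columns shown above.
printf 'id,username\n11,rob\n2,ken\n' | gather_long
```

Rows are emitted in input order and, within each row, columns in header order. A spread step can only restore the original layout because that ordering is preserved.
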
