Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tables that are also vectors are not printed as tables #28

Closed
piever opened this issue Jan 1, 2020 · 11 comments
Closed

Tables that are also vectors are not printed as tables #28

piever opened this issue Jan 1, 2020 · 11 comments

Comments

@piever
Copy link

piever commented Jan 1, 2020

Thanks for a very nice package! I've tried to use this with StructArrays but ran into the following limitation. A StructArray is a table but also an AbstractVector, and the same holds for TypedTables, and even Vector{NamedTuple} (which again is a table in the Tables.istable sense). The issue is that pretty_table uses the AbstractVector fallback, giving the following:

julia> using PrettyTables, StructArrays

julia> pretty_table(StructArray(a=1:2, b=1:2))
┌────────────────┐
│         Col. 1 │
├────────────────┤
│ (a = 1, b = 1) │
│ (a = 2, b = 2) │
└────────────────┘

Would it be possible to check for Tables.istable before going for the AbstractVector fallback?

@piever piever changed the title Tables that are also vectors Tables that are also vectors are not printed as tables Jan 1, 2020
@ronisbr
Copy link
Owner

ronisbr commented Jan 5, 2020

Hi @piever !

Thanks :) I am glad the package is being useful.

Hum, I am a little lost here. Today, PrettyTables.jl prints data that are of type AbstractVecOrMat as usual (like vectors or matrices). It also supports Dict. If the type is not any of those, then we check if the variable supports Tables.jl API and, if so, use it to print the data.

In your proposal, the very first thing will be to check if the table supports the Tables.jl API. This will lead to a huge change in PrettyTables.jl because the way a variable is printed can be changed by simply defining the function Tables.istable.

I think I prefer to always go for the variable super type. If it is AbstractVecOrMat, then print it as a vector or matrix. This is simple and straightforward. What we can do is to define a way to force the function to use the Tables.jl API for such edge cases. What do you think?

@piever
Copy link
Author

piever commented Jan 5, 2020

In your proposal, the very first thing will be to check if the table supports the Tables.jl API. This will lead to a huge change in PrettyTables.jl because the way a variable is printed can be changed by simply defining the function Tables.istable.

I imagine in practice the main difference should be deprecating pretty_table(v::AbstractVector, :name) in favor of pretty_table((name = v,)), or pretty_table([v], [:name]) as in DataFrames. This would be in line with, e.g., the DataFrame constructor:

julia> DataFrame(rand(10))
ERROR: ArgumentError: 'Array{Float64,1}' iterates 'Float64' values, which don't satisfy the Tables.jl Row-iterator interface
Stacktrace:
 [1] invalidtable(::Array{Float64,1}, ::Float64) at /home/pietro/.julia/packages/Tables/FXXeK/src/tofromdatavalues.jl:34
 [2] iterate at /home/pietro/.julia/packages/Tables/FXXeK/src/tofromdatavalues.jl:40 [inlined]
 [3] buildcolumns at /home/pietro/.julia/packages/Tables/FXXeK/src/fallbacks.jl:147 [inlined]
 [4] columns at /home/pietro/.julia/packages/Tables/FXXeK/src/fallbacks.jl:178 [inlined]
 [5] #DataFrame#451(::Bool, ::Type{DataFrame}, ::Array{Float64,1}) at /home/pietro/.julia/packages/DataFrames/uPgZV/src/other/tables.jl:32
 [6] DataFrame(::Array{Float64,1}) at /home/pietro/.julia/packages/DataFrames/uPgZV/src/other/tables.jl:23
 [7] top-level scope at REPL[24]:1

julia> DataFrame((a=rand(10),)) # a column table
10×1 DataFrame
│ Row │ a        │
│     │ Float64  │
├─────┼──────────┤
│ 10.576454 │
│ 20.640508 │
│ 30.245764 │
│ 40.429384 │
│ 50.762505 │
│ 60.934582 │
│ 70.587146 │
│ 80.629278 │
│ 90.16324  │
│ 100.954647 │

julia> DataFrame([rand(10)], [:a])
10×1 DataFrame
│ Row │ a        │
│     │ Float64  │
├─────┼──────────┤
│ 10.855044 │
│ 20.984927 │
│ 30.923934 │
│ 40.423184 │
│ 50.641871 │
│ 60.887677 │
│ 70.839145 │
│ 80.178055 │
│ 90.552524 │
│ 100.264694

I think I prefer to always go for the variable super type. If it is AbstractVecOrMat, then print it as a vector or matrix. This is simple and straightforward. What we can do is to define a way to force the function to use the Tables.jl API for such edge cases. What do you think?

I confess my personal preference is to be fully Tables.jl compliant. That is to say, if a type declares that it is a table, setting Tables.istable(::Type{MyTable}) = true, then IMO one should treat it as such. There are generic implementation to get the columns (with names) as a NamedTuple, doing Tables.columntable(t) for any t for which Tables.istable(t) = true.

OT: this is a bit more precise than Tables.matrix and Tables.schema, as there can be tables that have names, but those names are only discovered upon collection, and not encoded in the schema, for example t = ((a=i, b=i+1) for i in 1:10).

What concerns me is that, with the current implementation, changing table format affects the output:

julia> using StructArrays

julia> s = StructArray(a=rand(10), b=rand(10));

julia> pretty_table(Tables.rowtable(s))
┌───────────────────────────────────────────────────┐
│                                            Col. 1 │
├───────────────────────────────────────────────────┤
│  (a = 0.7583714543331594, b = 0.4024629535770552) │
│  (a = 0.8674712007442931, b = 0.6351637553145835) │
│ (a = 0.2384485960958933, b = 0.09565612336993445) │
│ (a = 0.21514157556662372, b = 0.7297000027493539) │
│  (a = 0.9520885500604304, b = 0.4858508060544866) │
│  (a = 0.8562978877759044, b = 0.5312473904578003) │
│  (a = 0.5723581430865925, b = 0.4232925679936128) │
│   (a = 0.5698941317352904, b = 0.759946202174999) │
│  (a = 0.2798279533785477, b = 0.9416218026500813) │
│ (a = 0.04355598866286026, b = 0.6851099552033448) │
└───────────────────────────────────────────────────┘

julia> pretty_table(Tables.columntable(s))
┌─────────────────────┬─────────────────────┐
│                   a │                   b │
│             Float64 │             Float64 │
├─────────────────────┼─────────────────────┤
│  0.75837145433315940.4024629535770552 │
│  0.86747120074429310.6351637553145835 │
│  0.23844859609589330.09565612336993445 │
│ 0.215141575566623720.7297000027493539 │
│  0.95208855006043040.4858508060544866 │
│  0.85629788777590440.5312473904578003 │
│  0.57235814308659250.4232925679936128 │
│  0.56989413173529040.759946202174999 │
│  0.27982795337854770.9416218026500813 │
│ 0.043555988662860260.6851099552033448 │
└─────────────────────┴─────────────────────┘

What we can do is to define a way to force the function to use the Tables.jl API for such edge cases. What do you think?

I actually consider the Tables.istable function to be that. Being an AbstractVector of rows for a table can be quite natural (as in rowtable, TypedTable, and StructVector), and is in general more of a technical consideration (see for example JuliaData/IndexedTables.jl#214), whereas declaring to be a Tables.istable states that the data can be interpreted as a table in a specific way.

Some cases, like the DataFrame constructor, go a bit further and "try their luck", even if they can't tell for sure that the input is a table, e.g.:

julia> DataFrame((a=i, b=i+1) for i in 1:10)
10×2 DataFrame
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 112     │
│ 223     │
│ 334     │
│ 445     │
│ 556     │
│ 667     │
│ 778     │
│ 889     │
│ 9910    │
│ 101011

That would also be a possibility here.

@ronisbr
Copy link
Owner

ronisbr commented Jan 5, 2020

Ok I understood your point.

I thought about a good way to do this, but it involves using a bunch of if’s instead of relying in multiple dispatch. This can work but I need to do some tests. The ideia is to have only one signature of the print function that will check the type of the variable being printed. In this case, the very first thing will see if it is Tables compliant. If not, then we will check the other supported types. Finally, we can throw an error if the type is not support.

Is it good?

@piever
Copy link
Author

piever commented Jan 5, 2020

Thanks for considering the proposal!

I think you are right that it needs to be done with an if check. Maybe this logic can be factored out in an external function, such as:

julia> using Tables: istable, columntable

julia> function matrix_and_header(t)
           if istable(t)
               nt = columntable(t)
               return hcat(nt...), collect(keys(nt))
           elseif t isa AbstractVecOrMat
               return t, [Symbol("Col $i") for i in 1:size(t, 2)]
           else
               error("Only table, vectors, and matrices supported!")
           end
       end

so that external types can overload it if necessary.

@ronisbr
Copy link
Owner

ronisbr commented Jan 5, 2020

@piever I did an initial proposal in the branch tables. Can you please test?

@piever
Copy link
Author

piever commented Jan 6, 2020

Thanks again! It seems to work fine for my StructArray usecase. I guess a good test (to avoid adding extra testing dependencies) would be to check that pretty_table(df), pretty_table(Tables.rowtable(df)), and pretty_table(Tables.columntable(df)) all give the same output for df::DataFrame.

@ablaom
Copy link

ablaom commented Jan 8, 2020

There still seems to be an issue:

julia> X = (x=rand(3), y=rand(3))
(x = [0.128352, 0.94915, 0.0852043], y = [0.868149, 0.0826012, 0.355312])

Then pretty_tables(X) throws an error, although, Tables.istable(X) == true. Furthermore:

julia> V = Tables.rowtable(X)
3-element Array{NamedTuple{(:x, :y),Tuple{Float64,Float64}},1}:
 (x = 0.12835211602619134, y = 0.8681491602242126)
 (x = 0.9491499588629446, y = 0.08260115451349814)
 (x = 0.0852042807808937, y = 0.35531185121618947)

julia> using PrettyTables

julia> pretty_table(V)
┌───────────────────────────────────────────────────┐
│                                            Col. 1 │
├───────────────────────────────────────────────────┤
│ (x = 0.12835211602619134, y = 0.8681491602242126) │
│ (x = 0.9491499588629446, y = 0.08260115451349814) │
│ (x = 0.0852042807808937, y = 0.35531185121618947) │
└───────────────────────────────────────────────────┘

julia> Tables.istable(V)
true

julia> versioninfo()
Julia Version 1.1.1
Commit 55e36cc (2019-05-16 04:10 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin15.6.0)
  CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_PATH = /Applications/Julia-1.1.app/Contents/Resources/julia/bin/julia

julia> Pkg.installed()
Dict{String,Union{Nothing, VersionNumber}} with 2 entries:
  "Tables"       => v"0.2.11"
  "PrettyTables" => v"0.8.0"

@ronisbr
Copy link
Owner

ronisbr commented Jan 8, 2020

@ablaom are you sure that you are using the branch tables? I did not commit the fixes yet and this is working using that branch:

julia> X = (x=rand(3), y=rand(3))
(x = [0.04964261477073162, 0.8212012992612663, 0.20400748817965408], y = [0.5621636055315691, 0.14886525225444225, 0.42568071532721796])

julia> pretty_table(X)
┌─────────────────────┬─────────────────────┐
│                   x │                   y │
│             Float64 │             Float64 │
├─────────────────────┼─────────────────────┤
│ 0.049642614770731620.5621636055315691 │
│  0.82120129926126630.14886525225444225 │
│ 0.204007488179654080.42568071532721796 │
└─────────────────────┴─────────────────────┘

@ablaom
Copy link

ablaom commented Jan 8, 2020

Well, I can't explain that but PrettyTables#branch is working for me now on both tables.

Thanks for your time!

@ronisbr
Copy link
Owner

ronisbr commented Jan 12, 2020

@piever well, seems to be working :)

julia> using PrettyTables

julia> using Tables

julia> df = DataFrame(x=1:3, y='a':'c');

julia> pretty_table(df)
┌───────┬──────┐
│     x │    y │
│ Int64 │ Char │
├───────┼──────┤
│     1 │    a │
│     2 │    b │
│     3 │    c │
└───────┴──────┘

julia> pretty_table(Tables.rowtable(df))
┌───────┬──────┐
│     x │    y │
│ Int64 │ Char │
├───────┼──────┤
│     1 │    a │
│     2 │    b │
│     3 │    c │
└───────┴──────┘

julia> pretty_table(Tables.columntable(df))
┌───────┬──────┐
│     x │    y │
│ Int64 │ Char │
├───────┼──────┤
│     1 │    a │
│     2 │    b │
│     3 │    c │
└───────┴──────┘

@ronisbr
Copy link
Owner

ronisbr commented Jan 25, 2020

Done! During my tests, I did not find any regressions. Hence, if the tests here went fine, then I will tag a new version today.

ronisbr added a commit that referenced this issue Jan 30, 2020
The Tables.jl API is not the priority when checking if a variable
can be printed as a tables.

This fixes the problem related to `StructArray`. This type is an
`AbstractVecOrMat` and also complies with Tables.jl API. However,
the old version gave preference to the `AbstractVecOrMat` type,
leading to a strange printing.

Closes #28
ronisbr added a commit that referenced this issue Jan 30, 2020
The Tables.jl API is not the priority when checking if a variable
can be printed as a tables.

This fixes the problem related to `StructArray`. This type is an
`AbstractVecOrMat` and also complies with Tables.jl API. However,
the old version gave preference to the `AbstractVecOrMat` type,
leading to a strange printing.

Closes #28
ronisbr added a commit that referenced this issue Jan 30, 2020
The Tables.jl API is not the priority when checking if a variable
can be printed as a tables.

This fixes the problem related to `StructArray`. This type is an
`AbstractVecOrMat` and also complies with Tables.jl API. However,
the old version gave preference to the `AbstractVecOrMat` type,
leading to a strange printing.

Closes #28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants