Need some way to specify that you only want certain columns #72

Closed
hadley opened this issue Mar 11, 2015 · 3 comments

Comments

@hadley
Member

hadley commented Mar 11, 2015

col_types = only("a", "k", "z")

col_types = only("a", "k", z = col_factor(c("a","b")))

??

@HenrikBengtsson

Since I'm just starting to look into readr, it might be that I'm unaware of some existing features of the package, so please forgive me if that's the case. If not, have a look at how I do it in R.filesets::readDataFrame(). Maybe you can enhance col_types to support named character vectors as well. Example:

readDataFrame(pathname, colClasses=c("*"="NULL", "(x|y)"="integer", "char"="character"))

Here names(colClasses) specifies regular expressions that are matched against the column names. The "*"="NULL" entry sets the default column class to "NULL", i.e. all columns are dropped/skipped by default, except those specified.

In your case I can imagine something like:

read_tsv(pathname, col_types=list("*"=col_skip(), x=col_integer(), y=col_integer(), char=col_character()))

An alternative is to let an empty name represent the default behavior.

You could also extend col_types to support the compact form:

read_tsv(pathname, col_types=c("*"="_", x="i", y="i", char="c"))

such that it expands to the above list. With empty name for default, you'd have:

read_tsv(pathname, col_types=c("_", x="i", y="i", char="c"))
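To make the expansion concrete, here is a minimal base-R sketch of how a regex-keyed compact spec could be resolved against the actual column names. This is not readr's API: `resolve_types()` and its `default` argument are hypothetical names used only for illustration, and the default is passed as a separate argument rather than as a "*" or empty name.

```r
# Hypothetical helper (not part of readr): map each column name to a
# compact type code, using the names of `spec` as regular expressions.
resolve_types <- function(col_names, spec, default = "_") {
  # Start with every column set to the default ("_" = skip).
  out <- rep(default, length(col_names))
  names(out) <- col_names
  for (pattern in names(spec)) {
    # Anchor the pattern so it must match the whole column name.
    hit <- grepl(paste0("^(?:", pattern, ")$"), col_names, perl = TRUE)
    out[hit] <- spec[[pattern]]
  }
  out
}

resolved <- resolve_types(
  c("x", "y", "char", "other"),
  c("(x|y)" = "i", "char" = "c")
)
print(resolved)
# x and y resolve to "i", char to "c", and other falls back to "_"
```

Later patterns overwrite earlier ones in this sketch, so the order of `spec` decides which type wins when two patterns match the same column.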

@hadley
Member Author

hadley commented Jun 10, 2015

I'm not a big fan of overloading column names with additional structure. What happens if there is a column called *?

@HenrikBengtsson

That's why I proposed the empty-name alternative. Of course, then there could be empty column names as well. Using regular expressions handles it all, but you'd need to escape special characters such as * when matching them literally.
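As a small illustration of the escaping point, the base-R sketch below matches a column that is literally named "*". The column names are made up for the example; only `grepl()` from base R is used.

```r
# Made-up column names, one of which is literally "*".
col_names <- c("*", "x", "a*b")

# Escaped and anchored: matches only the column named exactly "*".
exact_star <- grepl("^\\*$", col_names)
print(exact_star)
# TRUE FALSE FALSE

# fixed = TRUE disables regex interpretation, so "*" is a plain substring.
contains_star <- grepl("*", col_names, fixed = TRUE)
print(contains_star)
# TRUE FALSE TRUE
```

Either approach works for literal names, but it means users of a regex-keyed spec have to know which characters are special, which is the overloading concern raised above.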

@hadley hadley closed this as completed in ac24e9e Sep 22, 2015
@lock lock bot locked and limited conversation to collaborators Sep 25, 2018