Columns are read even if not mentioned in col_types in read_tsv #132

Closed
anders-sjogren opened this issue Apr 16, 2015 · 6 comments

@anders-sjogren

From ?read_tsv:

col_types
[...]
If a list, it must contain one "collector" for each column. If you only want to read a subset of the columns, you can use a named list (where the names give the column names). If a column is not mentioned by name, it will not be included in the output. [...]

Bug reproduction steps

read_tsv("1\t2\n3\t4\n", col_names = c("X", "Y"), col_types = list("X" = col_integer()))

Expected result

  X
1 1
2 3

Observed result

  X Y
1 1 2
2 3 4

Side note: with the compact string form, read_tsv("1\t2\n3\t4\n", col_names = c("X", "Y"), col_types = c("i_")) yields the expected result.

@anders-sjogren
Author

A workaround is to define

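# Build a compact col_types string (e.g. "i_") from a named list of collectors.
# Columns in allColNames that are not named in colTypes are skipped ("_").
makeColTypesString <- function(colTypes, allColNames) {
  getColChar <- function(collector) {
    if (inherits(collector, "collector_double")) return("d")
    if (inherits(collector, "collector_integer")) return("i")
    if (inherits(collector, "collector_character")) return("c")
    if (inherits(collector, "collector_logical")) return("l")
    # Fail loudly instead of silently returning NULL for other collectors.
    stop("Unsupported collector class: ", class(collector)[1])
  }
  chars <- vapply(allColNames, function(x) {
    if (x %in% names(colTypes)) getColChar(colTypes[[x]]) else "_"
  }, character(1))
  paste(chars, collapse = "")
}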

and to use it as

cnames <- c("X", "Y")
ctypes <- list("X" = col_integer())
read_tsv("1\t2\n3\t4\n", col_names = cnames, col_types = makeColTypesString(ctypes, cnames))

@hadley
Member

hadley commented Apr 16, 2015

Oops, yes, that's a documentation error. See also #72.

@anders-sjogren
Author

OK. IMHO, it would be neat if it worked as documented.

@hadley
Member

hadley commented Apr 16, 2015

That's what I had originally planned, but it's a huge pain if you just want to adjust the type of one column. That seems to be more common in practice, but I think it should be possible to have a solution where you just replace list() with something like only() and then you only get the specified cols.
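
To illustrate the "adjust the type of one column" case, here is a minimal sketch. It assumes a readr release that exports cols() and col_character(); whether this matches the API at the time of this issue is not confirmed by the thread. Only X is given an explicit type, and Y is still read with a guessed type.

```r
library(readr)

# Assumption: the installed readr exports cols() and col_character().
# Only X gets an explicit type; Y is still read, with its type guessed.
df <- read_tsv(
  "1\t2\n3\t4\n",
  col_names = c("X", "Y"),
  col_types = cols(X = col_character())
)
print(names(df))    # both X and Y are present
print(class(df$X))  # X was read as character
```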

@anders-sjogren
Author

Is there some "ignore" collector that one could use to say that a column should be skipped (cf. "Ignore Token Processing" on http://www.codeproject.com/Articles/23198/C-String-Toolkit-StrTk-Tokenizer)?
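
For reference, a hedged sketch of what such a collector looks like in readr releases that export col_skip(). That it was available in the version discussed here is an assumption. It is the collector counterpart of "_" in the compact string form.

```r
library(readr)

# Assumption: the installed readr exports cols(), col_integer() and col_skip().
# Y is named explicitly and marked to be skipped, so it is dropped from the output.
df <- read_tsv(
  "1\t2\n3\t4\n",
  col_names = c("X", "Y"),
  col_types = cols(X = col_integer(), Y = col_skip())
)
print(names(df))
```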

@anders-sjogren
Author

"but I think it should be possible to have a solution where you just replace list() with something like only() and then you only get those cols."

That seems like a good idea.
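
A sketch of what the only() idea looks like in practice, assuming a readr release that exports cols_only(). The name only() above was a suggestion, and the thread does not say what the closing commit implemented. Columns not named in the specification are dropped.

```r
library(readr)

# Assumption: the installed readr exports cols_only() and col_integer().
# Only the columns named in cols_only() are read; Y is dropped.
df <- read_tsv(
  "1\t2\n3\t4\n",
  col_names = c("X", "Y"),
  col_types = cols_only(X = col_integer())
)
print(names(df))
```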

@hadley hadley closed this as completed in f330023 Apr 16, 2015
@lock lock bot locked and limited conversation to collaborators Sep 25, 2018