Integer+Alpha codes parse as numeric #339

Closed
barryrowlingson opened this issue Dec 28, 2015 · 2 comments

@barryrowlingson

I have a CSV column with area codes that are of the form "13T". They are guessed as numeric, and the trailing char is lost:

> parse_guess(c("13T","13T","21R"))
[1] 13 13 21

Unless one of the values starts with a zero:

> parse_guess(c("13T","13T","0F"))
[1] "13T" "13T" "0F" 
> parse_guess(c("13T","13T","00F"))
[1] "13T" "13T" "00F"

I'd prefer these ("13T" etc.) to be parsed as character. I can't think of a context where this pattern would be numeric, except in special cases such as issue #316, where the reporter has 21N, 13.5E etc. for coordinates. There the values come back as numeric (though I'm not sure that is what's wanted), except in a column containing "0N", which comes back as character...

More than one trailing letter seems to be enough to trigger character guessing:

> parse_guess(c("13T","13T","10N"))
[1] 13 13 10
> parse_guess(c("13T","13T","10NN"))
[1] "13T"  "13T"  "10NN"

Seems a bit inconsistent...

readr_0.2.2
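
One possible workaround while the guessing behaves this way is to skip guessing and name the parser explicitly. This is only a minimal sketch: it assumes the codes are already in a character vector, and the `codes` variable is made up for illustration.

```r
library(readr)

# Hypothetical vector of integer+letter area codes.
codes <- c("13T", "13T", "21R")

# parse_character() does no type guessing, so the trailing letter is kept.
parsed <- parse_character(codes)
print(parsed)
```

This sidesteps the heuristic rather than changing it, so it only helps when the column is known in advance to hold codes and not numbers.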

@benmarwick
Contributor

I've run into this problem also. Here's a reproducible example:

df <- data.frame(one = letters[1:10],
                 two = 1:10,
                 three = sapply(1:10, function(i) paste0(i, "L")))
df
##    one two three
## 1    a   1    1L
## 2    b   2    2L
## 3    c   3    3L
## 4    d   4    4L
## 5    e   5    5L
## 6    f   6    6L
## 7    g   7    7L
## 8    h   8    8L
## 9    i   9    9L
## 10   j  10   10L
str(df)
## 'data.frame':    10 obs. of  3 variables:
##  $ one  : Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10
##  $ two  : int  1 2 3 4 5 6 7 8 9 10
##  $ three: Factor w/ 10 levels "10L","1L","2L",..: 2 3 4 5 6 7 8 9 10 1

If I use base::read.csv I see the "L" in the data still:

write.csv(df, "df.csv")
df_in <- read.csv("df.csv")
str(df_in)
## 'data.frame':    10 obs. of  4 variables:
##  $ X    : int  1 2 3 4 5 6 7 8 9 10
##  $ one  : Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10
##  $ two  : int  1 2 3 4 5 6 7 8 9 10
##  $ three: Factor w/ 10 levels "10L","1L","2L",..: 2 3 4 5 6 7 8 9 10 1

But when I use readr::read_csv, it's coerced to numeric and I lose the "L" from my data, which is quite unexpected:

library(readr)
## Warning: package 'readr' was built under R version 3.2.3
df_in_ <- read_csv("df.csv")
str(df_in_)
## Classes 'tbl_df', 'tbl' and 'data.frame':    10 obs. of  4 variables:
##  $      : int  1 2 3 4 5 6 7 8 9 10
##  $ one  : chr  "a" "b" "c" "d" ...
##  $ two  : int  1 2 3 4 5 6 7 8 9 10
##  $ three: num  1 2 3 4 5 6 7 8 9 10
sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252   
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                      
[5] LC_TIME=English_Australia.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] readr_0.2.2   plyr_1.8.3    Bchron_4.1.2  inline_0.3.14 knitr_1.11   
[6] ktc11_0.1    

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.1      highr_0.5.1      tools_3.2.2      digest_0.6.8    
 [5] mclust_5.1       G2Sd_2.1.5       gtable_0.1.2     lattice_0.20-33 
 [9] shiny_0.12.2     DBI_0.3.1        yaml_2.1.13      parallel_3.2.2  
[13] proto_0.3-10     rJava_0.9-7      coda_0.18-1      stringr_1.0.0   
[17] dplyr_0.4.3      xlsxjars_0.6.1   grid_3.2.2       hdrcde_3.1      
[21] ellipse_0.3-8    R6_2.1.1         rmarkdown_0.9.2  reshape2_1.4.1  
[25] ggplot2_1.0.1    magrittr_1.5     scales_0.3.0     htmltools_0.2.6 
[29] MASS_7.3-43      assertthat_0.1   mime_0.4         xtable_1.8-0    
[33] colorspace_1.2-6 httpuv_1.3.3     xlsx_0.5.7       stringi_1.0-1   
[37] munsell_0.4.2 
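
A possible way to keep the "L" when reading a file is to pass the column types to read_csv instead of relying on guessing. The sketch below is an assumption-laden illustration, not a fix for the guessing itself: it writes a small made-up three-column file to a temporary path and uses the compact string form of `col_types` (one letter per column, `c` for character, `i` for integer).

```r
library(readr)

# Write a small hypothetical file with an integer+letter column.
path <- tempfile(fileext = ".csv")
writeLines(c("one,two,three", "a,1,1L", "b,2,2L", "c,3,3L"), path)

# Compact column spec, positional: character, integer, character.
df_in <- read_csv(path, col_types = "cic")
str(df_in$three)
```

Because the compact spec is positional, it needs one letter for every column in the file, including an unnamed row-name column if the file was written with write.csv defaults.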

@hadley
Member

hadley commented Jun 1, 2016

Yeah, this is a duplicate of #316

hadley closed this as completed Jun 1, 2016
lock bot locked and limited conversation to collaborators Sep 25, 2018