position_nudge does not work with geom_boxplot #2733

jpasquier · 2018-07-06T06:50:43Z

position_nudge does not seem to work with geom_boxplot.

library(ggplot2)
df <- data.frame(x = factor("x"), y = 1:10)
ggplot(df, aes(x = x, y = y)) +
  geom_boxplot(position = position_nudge(x = -0.2, y = 0))
#> Error: position_nudge requires the following missing aesthetics: y

What I eventually want to do is place a dotplot next to a boxplot, like on this image.

(I managed to produce this image with ggplot2. Oddly, this issue does not occur with some data, which is why I am completely lost...)

I use R 3.5.1 with ggplot2 3.0.0 on Debian stretch. The reproducible example was made with reprex.

ptoche · 2018-07-06T08:48:33Z

Do you have a reprex of an example that works? Your example has just one discrete value; does it work if you have two values? Reading the text below made me think that with one discrete value there may not be enough information to position the nudge (one value does not define a unit, though I have no idea whether that's relevant to the code). Reading position-nudge.R, it looks like it just takes a (small) numerical value for x and a value for y. I haven't looked beyond this, so I don't have much useful to add.

quoting from https://github.com/tidyverse/ggplot2/blob/master/R/position-nudge.R

#' position_nudge is generally useful for adjusting the position of
#' items on discrete scales by a small amount. Nudging is built in to
#' [geom_text()] because it's so useful for moving labels a small
#' distance from what they're labelling.

jpasquier · 2018-07-06T09:46:42Z

Thank you for your response.

Here is an example where it works:

library(ggplot2)
df <- data.frame(
  x = factor(c("3m", "1d", "6w", "24m", "preop", "12m", "6m",
               "1d", "3m", "3m", "1d", "3m", "1d", "1d", "3w",
               "3w", "12m", "preop", "3m", "preop"),
             levels = c("preop", "1d", "3w", "6w", "3m", "6m",
                        "12m", "24m")),
  y = c(10, 13, 12, 11, 21, 16, 12, 4, 12, 13, 15, 7, 12,
        15, 10, 16, 9, 18, 14, 30)
)
ggplot(df, aes(x = x, y = y)) +
  geom_boxplot(width = .3, position = position_nudge(x = -0.2, y = 0),
               outlier.shape=NA) +
  geom_dotplot(binaxis = "y", binwidth = 0.5, dotsize = 0.7)

And here is a similar example where it does not work:

library(ggplot2)
df <- data.frame(
  x = factor(c("preop", "3m", "1w", "1d", "1w", "3w", "1d",
               "1d", "1w", "3m", "1d", "1w", "6m", "preop",
               "6m", "6w", "6w", "preop", "3w", "6m"),
               levels = c("preop", "1d", "1w", "3w", "6w",
                          "3m", "6m")),
  y = c(14, 10, 13, 12, 10, 11, 7, 14, 15, 18, 15, 9, 17,
        26, 8, 8, 14, 23, 21, 13)
)
ggplot(df, aes(x = x, y = y)) +
  geom_boxplot(width = .3, position = position_nudge(x = -0.2, y = 0),
               outlier.shape=NA) +
  geom_dotplot(binaxis = "y", binwidth = 0.5, dotsize = 0.7)
#> Error: position_nudge requires the following missing aesthetics: y

It is odd, isn't it?

(I used a sample of my data)

clauswilke · 2018-07-06T13:18:32Z

I can confirm this. Simplified reprex below. For one data set the y aesthetic makes it all the way into the final data frame and for the other it doesn't. I can't see yet what triggers this.

library(ggplot2)
df1 <- data.frame(
  x = factor(c("3m", "1d", "6w", "24m", "preop", "12m", "6m",
               "1d", "3m", "3m", "1d", "3m", "1d", "1d", "3w",
               "3w", "12m", "preop", "3m", "preop"),
             levels = c("preop", "1d", "3w", "6w", "3m", "6m",
                        "12m", "24m")),
  y = c(10, 13, 12, 11, 21, 16, 12, 4, 12, 13, 15, 7, 12,
        15, 10, 16, 9, 18, 14, 30)
)

df2 <- data.frame(
  x = factor(c("preop", "3m", "1w", "1d", "1w", "3w", "1d",
               "1d", "1w", "3m", "1d", "1w", "6m", "preop",
               "6m", "6w", "6w", "preop", "3w", "6m"),
             levels = c("preop", "1d", "1w", "3w", "6w",
                        "3m", "6m")),
  y = c(14, 10, 13, 12, 10, 11, 7, 14, 15, 18, 15, 9, 17,
        26, 8, 8, 14, 23, 21, 13)
)

layer_data(ggplot(df1, aes(x = x, y = y)) + geom_boxplot())
#>   ymin lower middle upper ymax outliers notchupper notchlower x PANEL
#> 1   18 19.50   21.0 25.50   30            26.47328  15.526719 1     1
#> 2   12 12.00   13.0 15.00   15        4   15.11979  10.880208 2     1
#> 3   10 11.50   13.0 14.50   16            16.35169   9.648314 3     1
#> 4   12 12.00   12.0 12.00   12            12.00000  12.000000 4     1
#> 5    7 10.00   12.0 13.00   14            14.11979   9.880208 5     1
#> 6   12 12.00   12.0 12.00   12            12.00000  12.000000 6     1
#> 7    9 10.75   12.5 14.25   16            16.41030   8.589700 7     1
#> 8   11 11.00   11.0 11.00   11            11.00000  11.000000 8     1
#>   group  y ymin_final ymax_final  xmin  xmax xid newx new_width weight
#> 1     1 NA         18         30 0.625 1.375   1    1      0.75      1
#> 2     2 NA          4         15 1.625 2.375   2    2      0.75      1
#> 3     3 NA         10         16 2.625 3.375   3    3      0.75      1
#> 4     4 12         12         12 3.625 4.375   4    4      0.75      1
#> 5     5 NA          7         14 4.625 5.375   5    5      0.75      1
#> 6     6 12         12         12 5.625 6.375   6    6      0.75      1
#> 7     7 NA          9         16 6.625 7.375   7    7      0.75      1
#> 8     8 11         11         11 7.625 8.375   8    8      0.75      1
#>   colour  fill size alpha shape linetype
#> 1 grey20 white  0.5    NA    19    solid
#> 2 grey20 white  0.5    NA    19    solid
#> 3 grey20 white  0.5    NA    19    solid
#> 4 grey20 white  0.5    NA    19    solid
#> 5 grey20 white  0.5    NA    19    solid
#> 6 grey20 white  0.5    NA    19    solid
#> 7 grey20 white  0.5    NA    19    solid
#> 8 grey20 white  0.5    NA    19    solid
layer_data(ggplot(df2, aes(x = x, y = y)) + geom_boxplot())
#>   ymin lower middle upper ymax outliers notchupper notchlower x PANEL
#> 1   14 18.50   23.0 24.50   26            28.47328  17.526719 1     1
#> 2    7 10.75   13.0 14.25   15            15.76500  10.235000 2     1
#> 3    9  9.75   11.5 13.50   15            14.46250   8.537500 3     1
#> 4   11 13.50   16.0 18.50   21            21.58614  10.413856 4     1
#> 5    8  9.50   11.0 12.50   14            14.35169   7.648314 5     1
#> 6   10 12.00   14.0 16.00   18            18.46891   9.531085 6     1
#> 7    8 10.50   13.0 15.00   17            17.10496   8.895040 7     1
#>   group ymin_final ymax_final  xmin  xmax xid newx new_width weight colour
#> 1     1         14         26 0.625 1.375   1    1      0.75      1 grey20
#> 2     2          7         15 1.625 2.375   2    2      0.75      1 grey20
#> 3     3          9         15 2.625 3.375   3    3      0.75      1 grey20
#> 4     4         11         21 3.625 4.375   4    4      0.75      1 grey20
#> 5     5          8         14 4.625 5.375   5    5      0.75      1 grey20
#> 6     6         10         18 5.625 6.375   6    6      0.75      1 grey20
#> 7     7          8         17 6.625 7.375   7    7      0.75      1 grey20
#>    fill size alpha shape linetype
#> 1 white  0.5    NA    19    solid
#> 2 white  0.5    NA    19    solid
#> 3 white  0.5    NA    19    solid
#> 4 white  0.5    NA    19    solid
#> 5 white  0.5    NA    19    solid
#> 6 white  0.5    NA    19    solid
#> 7 white  0.5    NA    19    solid

ggplot(df1, aes(x = x, y = y)) +
  geom_boxplot(position = position_nudge(x = -0.2))

ggplot(df2, aes(x = x, y = y)) +
  geom_boxplot(position = position_nudge(x = -0.2))
#> Error: position_nudge requires the following missing aesthetics: y

Created on 2018-07-06 by the reprex package (v0.2.0).

ptoche · 2018-07-06T13:41:26Z

Is this related to some na.rm = TRUE that drops y? I notice that even in df1, there are many NAs.

There is a na.rm = TRUE on line 84 of https://github.com/tidyverse/ggplot2/blob/master/R/stat-boxplot.r
but I couldn't say if it's got anything to do with this...

hadley · 2018-07-06T13:43:26Z

@ptoche you can link to a specific line, like so: https://github.com/tidyverse/ggplot2/blob/master/R/stat-boxplot.r#L84 (just click on the line number to get the link in the address bar)

clauswilke · 2018-07-06T13:43:40Z

It seems the difference is whether there is any group with only a single data point. If there is at least one, the final data set has a y column. That column has NAs everywhere but in the rows corresponding to groups with just a single data point. If there is no such group, the final data set does not have a y column.

The bigger question though is whether required_aes = c("x", "y") is appropriate in position_nudge():

ggplot2/R/position-nudge.R

Line 44 in 4f272fe

required_aes = c("x", "y"),

What is required is some y aesthetic, but it could be ymin, ymax, etc.

hadley · 2018-07-06T13:47:27Z

I only imagined position_nudge() working with 0d geoms (like point and text). But it seems reasonable to extend it.

clauswilke · 2018-07-06T14:05:04Z

I think the correct way to fix this is to come up with some way to check for classes of aesthetics in Position$setup_data(), i.e., do we have any of c("x", "xmin", "xmax") rather than exactly "x".

ggplot2/R/position-.r

Lines 51 to 54 in 8922e24

    
  setup_data = function(self, data, params) {
    check_required_aesthetics(self$required_aes, names(data), snake_class(self))
    data
  },

I note though that position adjustment in the y direction would likely fail because stat_boxplot() creates aesthetics called lower, middle, upper.
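A check by class of aesthetic, as suggested above, might look roughly like the following base-R sketch. It is only an illustration: `aes_classes` and `has_aes_class()` are hypothetical names, not ggplot2 functions, and the grouping of aesthetics into classes is an assumption.

```r
# Hypothetical grouping of aesthetics into positional classes
# (an assumption for illustration, not ggplot2's own definition).
aes_classes <- list(
  x = c("x", "xmin", "xmax"),
  y = c("y", "ymin", "ymax")
)

# For each class, TRUE if at least one of its members is a data column.
has_aes_class <- function(classes, present) {
  vapply(classes, function(members) any(members %in% present), logical(1))
}

# Column names like those stat_boxplot() produced for df2 above:
# there is no `y`, but `ymin` and `ymax` are present.
cols <- c("ymin", "lower", "middle", "upper", "ymax", "x", "xmin", "xmax")

has_aes_class(aes_classes, cols)  # class-based check passes for x and y
"y" %in% cols                     # the exact-name check is what fails
```

With these column names the class-based check is satisfied for both directions, while a test for exactly "y" is not, which matches the error reported in this issue.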

clauswilke · 2018-07-06T14:24:14Z

@jpasquier As a workaround for your problem, you can just define a new position adjustment that only goes horizontally.

library(ggplot2)

# horizontal nudge position adjustment
position_hnudge <- function(x = 0) {
  ggproto(NULL, PositionHNudge, x = x)
}

PositionHNudge <- ggproto("PositionHNudge", Position,
  x = 0,
  required_aes = "x",
  setup_params = function(self, data) {
    list(x = self$x)
  },
  compute_layer = function(data, params, panel) {
    transform_position(data, function(x) x + params$x)
  }
)

df <- data.frame(
  x = factor(c("preop", "3m", "1w", "1d", "1w", "3w", "1d",
               "1d", "1w", "3m", "1d", "1w", "6m", "preop",
               "6m", "6w", "6w", "preop", "3w", "6m"),
             levels = c("preop", "1d", "1w", "3w", "6w",
                        "3m", "6m")),
  y = c(14, 10, 13, 12, 10, 11, 7, 14, 15, 18, 15, 9, 17,
        26, 8, 8, 14, 23, 21, 13)
)
ggplot(df, aes(x = x, y = y)) +
  geom_boxplot(width = .3, position = position_hnudge(x = -0.2),
               outlier.shape=NA) +
  geom_dotplot(binaxis = "y", binwidth = 0.5, dotsize = 0.7)

Created on 2018-07-06 by the reprex package (v0.2.0).

hadley · 2018-07-06T14:54:43Z

(@clauswilke this is one place where I think a data frame column might make sense - y could be a data frame with columns min, max, mid etc. I'm not proposing we do this, as it would be a huge change to the internals, but I think it's an interesting thought experiment)
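To make the thought experiment concrete, here is a minimal base-R sketch of a `y` column that is itself a data frame. The column names `min`, `mid` and `max` follow the comment above; the helper `nudge_y()` and the values are hypothetical, and nothing here reflects how ggplot2 stores its data.

```r
# One row per box; `y` is a nested data frame (hypothetical layout).
boxes <- data.frame(x = c(1, 2))
boxes$y <- data.frame(min = c(7, 8), mid = c(13, 11), max = c(15, 14))

# A vertical nudge then only needs to know about the single `y` column
# and shifts every component inside it.
nudge_y <- function(data, by) {
  data$y[] <- lapply(data$y, function(v) v + by)
  data
}

nudged <- nudge_y(boxes, 1)
nudged$y
```

Under this layout a position adjustment would not need to know which of ymin, lower, middle, and so on a given stat happens to produce.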

jpasquier · 2018-07-06T15:11:32Z

@clauswilke Thank you very much for the workaround. It solves my problem and allows me to produce the figures I need.

clauswilke · 2018-07-07T16:49:45Z

It bugged me that I didn't understand why the two data frames give different results, so I hunted it down. The relevant code is here:

ggplot2/R/stat-.r

Lines 107 to 117 in 3d022ed

    
  stats <- mapply(function(new, old) {
    if (empty(new)) return(data.frame())
    unique <- uniquecols(old)
    missing <- !(names(unique) %in% names(new))
    cbind(
      new,
      unique[rep(1, nrow(new)), missing, drop = FALSE]
    )
  }, stats, groups, SIMPLIFY = FALSE)

  do.call(plyr::rbind.fill, stats)

Line 109 finds any data columns that are constant within a group, and the rest of the code then copies those data columns from the data frame before the stat transformation to the data frame after the stat transformation. If a group consists of only 1 value, all data columns meet that condition and are copied over. Then, in line 117, if some of those columns are not constant for other groups, they are filled with NAs for those groups. This explains why y is copied over only when some groups have only one value, and why when that happens all groups with more than one y value have NA in the final y column.
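The mechanism described above can be reproduced in a small base-R sketch. The helpers `constant_cols()` and `rbind_fill()` are hypothetical stand-ins for the internal `uniquecols()` and for `plyr::rbind.fill()`; their behaviour is assumed from the description in this comment, and the toy "stat" simply computes a minimum and maximum per group.

```r
# Keep the columns that hold a single distinct value (assumed behaviour
# of the internal uniquecols() helper).
constant_cols <- function(df) {
  keep <- vapply(df, function(col) length(unique(col)) == 1, logical(1))
  df[1, keep, drop = FALSE]
}

# Row-bind data frames, filling columns that a piece lacks with NA
# (stand-in for plyr::rbind.fill()).
rbind_fill <- function(pieces) {
  all_names <- unique(unlist(lapply(pieces, names)))
  pieces <- lapply(pieces, function(p) {
    for (nm in setdiff(all_names, names(p))) p[[nm]] <- NA
    p[all_names]
  })
  do.call(rbind, pieces)
}

# Group 1 has two observations, group 2 has only one.
old <- data.frame(group = c(1, 1, 2), y = c(10, 14, 12))
groups <- split(old, old$group)

# Toy "stat": one summary row per group, without a y column.
stats <- lapply(groups, function(g) {
  data.frame(ymin = min(g$y), ymax = max(g$y))
})

# Copy over the columns that are constant within each group.
merged <- mapply(function(new, old) {
  const <- constant_cols(old)
  missing <- !(names(const) %in% names(new))
  cbind(new, const[rep(1, nrow(new)), missing, drop = FALSE])
}, stats, groups, SIMPLIFY = FALSE)

result <- rbind_fill(merged)
result
```

In `result`, the single-observation group keeps its `y` value (12), while the group with two observations gets NA in `y`. If every group had more than one distinct `y` value, no `y` column would be created at all.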

I think this code could use some commenting.

clauswilke · 2018-08-29T04:51:15Z

@hadley How do you feel about addressing this issue by introducing horizontal and vertical nudge position adjustments, just like I did in my workaround? And if we do, what should they be called? hnudge, nudge_h, xnudge, nudge_x?

clauswilke · 2018-08-29T05:34:02Z

Actually, right after I wrote this I realized it should be possible to just write different ggproto objects and pick the appropriate one in the position_nudge() constructor based on whether any of x or y are zero. Let me try that.
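A rough sketch of that idea, modelled on the workaround earlier in this thread, is shown below. All names ending in `Sketch` or `_sketch` are hypothetical, and this is not the code that was eventually merged; it only illustrates choosing a ggproto object in the constructor so that a direction with zero nudge is never touched.

```r
library(ggplot2)

# Nudges only x-type columns; never looks at y.
PositionNudgeXSketch <- ggproto("PositionNudgeXSketch", Position,
  x = 0,
  setup_params = function(self, data) list(x = self$x),
  compute_layer = function(data, params, panel) {
    transform_position(data, trans_x = function(x) x + params$x)
  }
)

# Nudges only y-type columns; never looks at x.
PositionNudgeYSketch <- ggproto("PositionNudgeYSketch", Position,
  y = 0,
  setup_params = function(self, data) list(y = self$y),
  compute_layer = function(data, params, panel) {
    transform_position(data, trans_y = function(y) y + params$y)
  }
)

# Nudges in both directions.
PositionNudgeXYSketch <- ggproto("PositionNudgeXYSketch", Position,
  x = 0,
  y = 0,
  setup_params = function(self, data) list(x = self$x, y = self$y),
  compute_layer = function(data, params, panel) {
    transform_position(
      data,
      trans_x = function(x) x + params$x,
      trans_y = function(y) y + params$y
    )
  }
)

# Constructor: pick the object that matches the non-zero directions.
position_nudge_sketch <- function(x = 0, y = 0) {
  if (y == 0) {
    ggproto(NULL, PositionNudgeXSketch, x = x)
  } else if (x == 0) {
    ggproto(NULL, PositionNudgeYSketch, y = y)
  } else {
    ggproto(NULL, PositionNudgeXYSketch, x = x, y = y)
  }
}
```

With this arrangement, `geom_boxplot(position = position_nudge_sketch(x = -0.2))` would use the x-only object, so the absence of a `y` column in the boxplot data would not matter.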

* make nudging more robust. closes #2733.
* add regression tests for position_nudge()
* simplify position_nudge, remove required aesthetics

lock · 2019-02-28T19:59:59Z

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

clauswilke added a commit to wilkelab/ggplot2_archive that referenced this issue Aug 29, 2018

make nudging more robust. closes tidyverse#2733.

244af6a

clauswilke mentioned this issue Aug 29, 2018

Make nudging more robust #2874

Merged

clauswilke closed this as completed in #2874 Sep 1, 2018

clauswilke added a commit that referenced this issue Sep 1, 2018

Make nudging more robust (#2874)

07b7457


josesho mentioned this issue Oct 15, 2018

position_nudge requires a y aesthetic, does not play well with geoms that don't require y #2940

Closed

lock bot locked and limited conversation to collaborators Feb 28, 2019
