position_nudge does not work with geom_boxplot #2733

Closed
jpasquier opened this issue Jul 6, 2018 · 15 comments · Fixed by #2874

Comments

@jpasquier

position_nudge does not seem to work with geom_boxplot:

library(ggplot2)
df <- data.frame(x = factor("x"), y = 1:10)
ggplot(df, aes(x = x, y = y)) +
  geom_boxplot(position = position_nudge(x = -0.2, y = 0))
#> Error: position_nudge requires the following missing aesthetics: y

What I eventually want to do is place a dotplot next to a boxplot, as in this image:
[image: tmp]
(I managed to produce this image with ggplot2. Oddly, the error does not occur with some data, which is why I am completely lost.)

I use R 3.5.1 with ggplot2 3.0.0 on Debian stretch. The reproducible example was made with reprex.

@ptoche

ptoche commented Jul 6, 2018

Do you have a reprex of an example that works? Your example has just one discrete value; does it work with two? Reading the text below made me think that with one discrete value there may not be enough information to position the nudge (one value does not define a unit, though I have no idea whether that is relevant to the code). Reading position-nudge.R, it looks like it just takes a (small) numerical value for x and a value for y. I haven't looked beyond this, so I don't have much useful to add.

Quoting from https://github.com/tidyverse/ggplot2/blob/master/R/position-nudge.R:

#' position_nudge is generally useful for adjusting the position of
#' items on discrete scales by a small amount. Nudging is built in to
#' [geom_text()] because it's so useful for moving labels a small
#' distance from what they're labelling.

@jpasquier
Author

Thank you for your response.

Here is an example where it works:

library(ggplot2)
df <- data.frame(
  x = factor(c("3m", "1d", "6w", "24m", "preop", "12m", "6m",
               "1d", "3m", "3m", "1d", "3m", "1d", "1d", "3w",
               "3w", "12m", "preop", "3m", "preop"),
             levels = c("preop", "1d", "3w", "6w", "3m", "6m",
                        "12m", "24m")),
  y = c(10, 13, 12, 11, 21, 16, 12, 4, 12, 13, 15, 7, 12,
        15, 10, 16, 9, 18, 14, 30)
)
ggplot(df, aes(x = x, y = y)) +
  geom_boxplot(width = .3, position = position_nudge(x = -0.2, y = 0),
               outlier.shape=NA) +
  geom_dotplot(binaxis = "y", binwidth = 0.5, dotsize = 0.7)

[image: tmp2]

And here is a similar example where it does not work:

library(ggplot2)
df <- data.frame(
  x = factor(c("preop", "3m", "1w", "1d", "1w", "3w", "1d",
               "1d", "1w", "3m", "1d", "1w", "6m", "preop",
               "6m", "6w", "6w", "preop", "3w", "6m"),
             levels = c("preop", "1d", "1w", "3w", "6w",
                        "3m", "6m")),
  y = c(14, 10, 13, 12, 10, 11, 7, 14, 15, 18, 15, 9, 17,
        26, 8, 8, 14, 23, 21, 13)
)
ggplot(df, aes(x = x, y = y)) +
  geom_boxplot(width = .3, position = position_nudge(x = -0.2, y = 0),
               outlier.shape=NA) +
  geom_dotplot(binaxis = "y", binwidth = 0.5, dotsize = 0.7)
#> Error: position_nudge requires the following missing aesthetics: y

It is odd, isn't it?

(I used a sample of my data)

@clauswilke
Member

clauswilke commented Jul 6, 2018

I can confirm this. Simplified reprex below. For one data set the y aesthetic makes it all the way into the final data frame and for the other it doesn't. I can't see yet what triggers this.

library(ggplot2)
df1 <- data.frame(
  x = factor(c("3m", "1d", "6w", "24m", "preop", "12m", "6m",
               "1d", "3m", "3m", "1d", "3m", "1d", "1d", "3w",
               "3w", "12m", "preop", "3m", "preop"),
             levels = c("preop", "1d", "3w", "6w", "3m", "6m",
                        "12m", "24m")),
  y = c(10, 13, 12, 11, 21, 16, 12, 4, 12, 13, 15, 7, 12,
        15, 10, 16, 9, 18, 14, 30)
)

df2 <- data.frame(
  x = factor(c("preop", "3m", "1w", "1d", "1w", "3w", "1d",
               "1d", "1w", "3m", "1d", "1w", "6m", "preop",
               "6m", "6w", "6w", "preop", "3w", "6m"),
             levels = c("preop", "1d", "1w", "3w", "6w",
                        "3m", "6m")),
  y = c(14, 10, 13, 12, 10, 11, 7, 14, 15, 18, 15, 9, 17,
        26, 8, 8, 14, 23, 21, 13)
)

layer_data(ggplot(df1, aes(x = x, y = y)) + geom_boxplot())
#>   ymin lower middle upper ymax outliers notchupper notchlower x PANEL
#> 1   18 19.50   21.0 25.50   30            26.47328  15.526719 1     1
#> 2   12 12.00   13.0 15.00   15        4   15.11979  10.880208 2     1
#> 3   10 11.50   13.0 14.50   16            16.35169   9.648314 3     1
#> 4   12 12.00   12.0 12.00   12            12.00000  12.000000 4     1
#> 5    7 10.00   12.0 13.00   14            14.11979   9.880208 5     1
#> 6   12 12.00   12.0 12.00   12            12.00000  12.000000 6     1
#> 7    9 10.75   12.5 14.25   16            16.41030   8.589700 7     1
#> 8   11 11.00   11.0 11.00   11            11.00000  11.000000 8     1
#>   group  y ymin_final ymax_final  xmin  xmax xid newx new_width weight
#> 1     1 NA         18         30 0.625 1.375   1    1      0.75      1
#> 2     2 NA          4         15 1.625 2.375   2    2      0.75      1
#> 3     3 NA         10         16 2.625 3.375   3    3      0.75      1
#> 4     4 12         12         12 3.625 4.375   4    4      0.75      1
#> 5     5 NA          7         14 4.625 5.375   5    5      0.75      1
#> 6     6 12         12         12 5.625 6.375   6    6      0.75      1
#> 7     7 NA          9         16 6.625 7.375   7    7      0.75      1
#> 8     8 11         11         11 7.625 8.375   8    8      0.75      1
#>   colour  fill size alpha shape linetype
#> 1 grey20 white  0.5    NA    19    solid
#> 2 grey20 white  0.5    NA    19    solid
#> 3 grey20 white  0.5    NA    19    solid
#> 4 grey20 white  0.5    NA    19    solid
#> 5 grey20 white  0.5    NA    19    solid
#> 6 grey20 white  0.5    NA    19    solid
#> 7 grey20 white  0.5    NA    19    solid
#> 8 grey20 white  0.5    NA    19    solid
layer_data(ggplot(df2, aes(x = x, y = y)) + geom_boxplot())
#>   ymin lower middle upper ymax outliers notchupper notchlower x PANEL
#> 1   14 18.50   23.0 24.50   26            28.47328  17.526719 1     1
#> 2    7 10.75   13.0 14.25   15            15.76500  10.235000 2     1
#> 3    9  9.75   11.5 13.50   15            14.46250   8.537500 3     1
#> 4   11 13.50   16.0 18.50   21            21.58614  10.413856 4     1
#> 5    8  9.50   11.0 12.50   14            14.35169   7.648314 5     1
#> 6   10 12.00   14.0 16.00   18            18.46891   9.531085 6     1
#> 7    8 10.50   13.0 15.00   17            17.10496   8.895040 7     1
#>   group ymin_final ymax_final  xmin  xmax xid newx new_width weight colour
#> 1     1         14         26 0.625 1.375   1    1      0.75      1 grey20
#> 2     2          7         15 1.625 2.375   2    2      0.75      1 grey20
#> 3     3          9         15 2.625 3.375   3    3      0.75      1 grey20
#> 4     4         11         21 3.625 4.375   4    4      0.75      1 grey20
#> 5     5          8         14 4.625 5.375   5    5      0.75      1 grey20
#> 6     6         10         18 5.625 6.375   6    6      0.75      1 grey20
#> 7     7          8         17 6.625 7.375   7    7      0.75      1 grey20
#>    fill size alpha shape linetype
#> 1 white  0.5    NA    19    solid
#> 2 white  0.5    NA    19    solid
#> 3 white  0.5    NA    19    solid
#> 4 white  0.5    NA    19    solid
#> 5 white  0.5    NA    19    solid
#> 6 white  0.5    NA    19    solid
#> 7 white  0.5    NA    19    solid

ggplot(df1, aes(x = x, y = y)) +
  geom_boxplot(position = position_nudge(x = -0.2))

ggplot(df2, aes(x = x, y = y)) +
  geom_boxplot(position = position_nudge(x = -0.2))
#> Error: position_nudge requires the following missing aesthetics: y

Created on 2018-07-06 by the reprex package (v0.2.0).

@ptoche

ptoche commented Jul 6, 2018

Is this related to some na.rm = TRUE that drops y? I notice that even in df1, there are many NAs.

There is an na.rm = TRUE on line 84 of https://github.com/tidyverse/ggplot2/blob/master/R/stat-boxplot.r, but I couldn't say whether it has anything to do with this.

@hadley
Member

hadley commented Jul 6, 2018

@ptoche you can link to a specific line, like so: https://github.com/tidyverse/ggplot2/blob/master/R/stat-boxplot.r#L84 (just click on the line number to get the link in the address bar)

@clauswilke
Member

clauswilke commented Jul 6, 2018

It seems the difference is whether there is any group with only a single data point. If there is at least one, the final data set has a y column. That column has NAs everywhere but in the rows corresponding to groups with just a single data point. If there is no such group, the final data set does not have a y column.

The bigger question though is whether required_aes = c("x", "y") is appropriate in position_nudge():

required_aes = c("x", "y"),

What is required is some y aesthetic, but it could be ymin, ymax, etc.
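The single-data-point observation can be checked directly on the two x vectors from the reprex, without ggplot2. This is a minimal base-R sketch; the helper name `singletons` is hypothetical and only counts observations per level.

```r
# x values copied from df1 and df2 in the reprex above
x1 <- c("3m", "1d", "6w", "24m", "preop", "12m", "6m",
        "1d", "3m", "3m", "1d", "3m", "1d", "1d", "3w",
        "3w", "12m", "preop", "3m", "preop")
x2 <- c("preop", "3m", "1w", "1d", "1w", "3w", "1d",
        "1d", "1w", "3m", "1d", "1w", "6m", "preop",
        "6m", "6w", "6w", "preop", "3w", "6m")

# Hypothetical helper: names of the levels that occur exactly once
singletons <- function(x) names(which(table(x) == 1))

singletons(x1)  # the levels 6w, 6m and 24m each occur once
singletons(x2)  # empty: every level occurs at least twice
```

In df1 these are the fourth, sixth and eighth factor levels, which matches the rows of the layer_data() output that carry a non-NA y.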

@hadley
Member

hadley commented Jul 6, 2018

I only imagined position_nudge() working with 0d geoms (like point and text). But it seems reasonable to extend it.

@clauswilke
Member

I think the correct way to fix this is to come up with some way to check for classes of aesthetics in Position$setup_data(), i.e., do we have any of c("x", "xmin", "xmax") rather than exactly "x".

ggplot2/R/position-.r

Lines 51 to 54 in 8922e24

setup_data = function(self, data, params) {
  check_required_aesthetics(self$required_aes, names(data), snake_class(self))
  data
},

I note though that position adjustment in the y direction would likely fail because stat_boxplot() creates aesthetics called lower, middle, upper.
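One way to picture a class-based check is the base-R sketch below. It is not ggplot2 code: `aes_families`, `has_aes_family` and `missing_families` are hypothetical names, and the family membership is an assumption taken from the c("x", "xmin", "xmax") example in the comment.

```r
# Assumed aesthetic families; only the x family is spelled out in the thread
aes_families <- list(
  x = c("x", "xmin", "xmax"),
  y = c("y", "ymin", "ymax")
)

# TRUE if at least one member of the family is among the present columns
has_aes_family <- function(present, family) {
  any(aes_families[[family]] %in% present)
}

# Families from `required` that have no representative in `present`
missing_families <- function(present, required = c("x", "y")) {
  ok <- vapply(required, has_aes_family, logical(1), present = present)
  required[!ok]
}

# Column names like those in the boxplot layer data for df2 (no plain y)
boxplot_cols <- c("ymin", "lower", "middle", "upper", "ymax",
                  "x", "xmin", "xmax")
missing_families(boxplot_cols)     # nothing missing: ymin/ymax stand in for y
missing_families(c("x", "lower"))  # the y family is missing
```

A check like this only decides whether to raise an error. As noted above, lower, middle and upper fall outside the assumed y family, so a vertical nudge would still leave them untouched.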

@clauswilke
Member

@jpasquier As a workaround for your problem, you can just define a new position adjustment that only goes horizontally.

library(ggplot2)

# horizontal nudge position adjustment
position_hnudge <- function(x = 0) {
  ggproto(NULL, PositionHNudge, x = x)
}

PositionHNudge <- ggproto("PositionHNudge", Position,
  x = 0,
  required_aes = "x",
  setup_params = function(self, data) {
    list(x = self$x)
  },
  compute_layer = function(data, params, panel) {
    transform_position(data, function(x) x + params$x)
  }
)

df <- data.frame(
  x = factor(c("preop", "3m", "1w", "1d", "1w", "3w", "1d",
               "1d", "1w", "3m", "1d", "1w", "6m", "preop",
               "6m", "6w", "6w", "preop", "3w", "6m"),
             levels = c("preop", "1d", "1w", "3w", "6w",
                        "3m", "6m")),
  y = c(14, 10, 13, 12, 10, 11, 7, 14, 15, 18, 15, 9, 17,
        26, 8, 8, 14, 23, 21, 13)
)
ggplot(df, aes(x = x, y = y)) +
  geom_boxplot(width = .3, position = position_hnudge(x = -0.2),
               outlier.shape=NA) +
  geom_dotplot(binaxis = "y", binwidth = 0.5, dotsize = 0.7)

Created on 2018-07-06 by the reprex package (v0.2.0).

@hadley
Member

hadley commented Jul 6, 2018

(@clauswilke this is one place where I think a data frame column might make sense - y could be a data frame with columns min, max, mid etc. I'm not proposing we do this, as it would be a huge change to the internals, but I think it's an interesting thought experiment)
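To make the thought experiment concrete, base R already allows a data frame to hold another data frame as a single column. The sketch below is an illustration only; the column names min, mid and max follow the comment, the values are made up, and nothing here reflects how ggplot2 stores its data.

```r
# One row per box; y is a single column that is itself a data frame
stats <- data.frame(x = 1:3)
stats$y <- data.frame(
  min = c(7, 9, 11),
  mid = c(13, 11.5, 16),
  max = c(15, 15, 21)
)

names(stats)  # the outer frame has only the columns x and y

# A vertical nudge becomes one operation on the whole y family
stats$y <- stats$y + 0.5
stats$y$min
```

With such a layout, a position adjustment could require "y" and still move every y-like value together.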

@jpasquier
Author

@clauswilke Thank you very much for the workaround. It solves my problem and allows me to produce the figures I need.

@clauswilke
Member

It bugged me that I didn't understand why the two data frames give different results, so I hunted it down. The relevant code is here:

ggplot2/R/stat-.r

Lines 107 to 117 in 3d022ed

stats <- mapply(function(new, old) {
  if (empty(new)) return(data.frame())
  unique <- uniquecols(old)
  missing <- !(names(unique) %in% names(new))
  cbind(
    new,
    unique[rep(1, nrow(new)), missing,drop = FALSE]
  )
}, stats, groups, SIMPLIFY = FALSE)

do.call(plyr::rbind.fill, stats)

Line 109 finds any data columns that are constant within a group, and the rest of the code then copies those data columns from the data frame before the stat transformation to the data frame after the stat transformation. If a group consists of only 1 value, all data columns meet that condition and are copied over. Then, in line 117, if some of those columns are not constant for other groups, they are filled with NAs for those groups. This explains why y is copied over only when some groups have only one value, and why when that happens all groups with more than one y value have NA in the final y column.

I think this code could use some commenting.
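The mechanism can be reproduced on a toy data set in base R. In this sketch `constant_cols` and `rbind_fill` are hypothetical stand-ins for ggplot2's internal uniquecols() and for plyr::rbind.fill(), written only to mimic the behaviour described above; the data and the one-column "stat" are made up.

```r
# Stand-in for uniquecols(): columns that are constant within a data frame
constant_cols <- function(df) {
  keep <- vapply(df, function(col) length(unique(col)) == 1, logical(1))
  df[1, keep, drop = FALSE]
}

# Stand-in for plyr::rbind.fill(): row-bind, filling absent columns with NA
rbind_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  filled <- lapply(dfs, function(d) {
    for (nm in setdiff(all_cols, names(d))) d[[nm]] <- NA
    d[all_cols]
  })
  do.call(rbind, filled)
}

# Copy constant columns from each group's input to its stat output
run_stat <- function(old) {
  groups <- split(old, old$x)
  stats <- lapply(groups, function(g) data.frame(middle = median(g$y)))
  merged <- Map(function(new, old) {
    unique <- constant_cols(old)
    missing <- !(names(unique) %in% names(new))
    cbind(new, unique[rep(1, nrow(new)), missing, drop = FALSE])
  }, stats, groups)
  rbind_fill(merged)
}

# Group "b" has a single value, so y is copied for it and NA for "a"
with_single <- run_stat(data.frame(x = c("a", "a", "a", "b"), y = c(1, 2, 3, 5)))
# No single-value group, so no y column at all
without_single <- run_stat(data.frame(x = c("a", "a", "b", "b"), y = c(1, 2, 3, 5)))

names(with_single)
names(without_single)
```

The first result has a y column that is NA for group "a" and 5 for group "b"; the second has no y column, which is the situation that triggered the error for df2.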

@clauswilke
Member

@hadley How do you feel about addressing this issue by introducing horizontal and vertical nudge position adjustments, just like I did in my workaround? And if we do, what should they be called? hnudge, nudge_h, xnudge, nudge_x?

@clauswilke
Member

Actually, right after I wrote this I realized it should be possible to just write different ggproto objects and pick the appropriate one in the position_nudge() constructor based on whether any of x or y are zero. Let me try that.
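The idea of choosing the variant in the constructor can be sketched in base R with plain functions standing in for ggproto objects. Everything here is hypothetical (`shift`, `nudge_h`, `nudge_v`, `nudge_sketch`, and the choice of which columns count as x-like or y-like); it illustrates the dispatch, not the code that was eventually merged.

```r
# Add `by` to whichever of the named columns exist in the data
shift <- function(data, cols, by) {
  for (col in intersect(cols, names(data))) data[[col]] <- data[[col]] + by
  data
}

# Direction-specific variants, analogous to separate position objects
nudge_h <- function(x) function(data) shift(data, c("x", "xmin", "xmax"), x)
nudge_v <- function(y) function(data) shift(data, c("y", "ymin", "ymax"), y)

# Constructor: pick the variant based on which nudges are non-zero
nudge_sketch <- function(x = 0, y = 0) {
  if (x != 0 && y != 0) {
    function(data) nudge_v(y)(nudge_h(x)(data))
  } else if (x != 0) {
    nudge_h(x)
  } else if (y != 0) {
    nudge_v(y)
  } else {
    identity
  }
}

# A boxplot-like row without a plain y column
box <- data.frame(x = 1, xmin = 0.625, xmax = 1.375, ymin = 14, ymax = 26)
nudged <- nudge_sketch(x = -0.2)(box)
nudged
```

Because the horizontal variant never looks at y, a missing y column is no longer a problem when only x is nudged.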

clauswilke added a commit to wilkelab/ggplot2_archive that referenced this issue Aug 29, 2018
clauswilke added a commit that referenced this issue Sep 1, 2018
* make nudging more robust. closes #2733.

* add regression tests for position_nudge()

* simplify position_nudge, remove required aesthetics
@lock

lock bot commented Feb 28, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Feb 28, 2019