Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let pick() support NULL #6685

Closed
psychelzh opened this issue Feb 6, 2023 · 5 comments · Fixed by #6756
Closed

Let pick() support NULL #6685

psychelzh opened this issue Feb 6, 2023 · 5 comments · Fixed by #6756
Milestone

Comments

@psychelzh
Copy link

psychelzh commented Feb 6, 2023

In the documentation:

pick() is useful as a bridge between data-masking functions (like mutate() or group_by()) and functions with tidy-select behavior (like select()).

Therefore, it will be as expected if they both support NULL to fall to no grouping.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df <- data.frame(x = 1)
df |> 
  group_by(across(NULL))
#> # A tibble: 1 × 1
#>       x
#>   <dbl>
#> 1     1
df |> 
  group_by(pick(NULL))
#> Error in `group_by()`:
#> ℹ In argument: `pick(NULL)`.
#> Caused by error:
#> ! `pick(NULL)` must be size 1, not 0.

#> Backtrace:
#>      ▆
#>   1. ├─dplyr::group_by(df, pick(NULL))
#>   2. ├─dplyr:::group_by.data.frame(df, pick(NULL))
#>   3. │ └─dplyr::group_by_prepare(.data, ..., .add = .add, error_call = current_env())
#>   4. │   └─dplyr:::add_computed_columns(.data, new_groups, error_call = error_call)
#>   5. │     └─dplyr:::mutate_cols(...)
#>   6. │       ├─base::withCallingHandlers(...)
#>   7. │       └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
#>   8. │         └─mask$eval_all_mutate(quo)
#>   9. │           └─dplyr (local) eval()
#>  10. ├─dplyr:::dplyr_internal_error(...)
#>  11. │ └─rlang::abort(class = c(class, "dplyr:::internal_error"), dplyr_error_data = data)
#>  12. │   └─rlang:::signal_abort(cnd, .file)
#>  13. │     └─base::signalCondition(cnd)
#>  14. └─dplyr (local) `<fn>`(`<dpl:::__>`)
#>  15.   └─rlang::abort(message, class = error_class, parent = parent, call = error_call)

Created on 2023-02-06 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.2 (2022-10-31 ucrt)
#>  os       Windows 10 x64 (build 22621)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  Chinese (Simplified)_China.utf8
#>  ctype    Chinese (Simplified)_China.utf8
#>  tz       Asia/Taipei
#>  date     2023-02-06
#>  pandoc   2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.0   2023-01-09 [1] CRAN (R 4.2.2)
#>  digest        0.6.31  2022-12-11 [1] CRAN (R 4.2.2)
#>  dplyr       * 1.1.0   2023-01-29 [1] CRAN (R 4.2.2)
#>  evaluate      0.20    2023-01-17 [1] CRAN (R 4.2.2)
#>  fansi         1.0.4   2023-01-22 [1] CRAN (R 4.2.2)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.1)
#>  fs            1.6.0   2023-01-23 [1] CRAN (R 4.2.2)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.2.1)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.1)
#>  htmltools     0.5.4   2022-12-07 [1] CRAN (R 4.2.2)
#>  knitr         1.42    2023-01-25 [1] CRAN (R 4.2.2)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.2.1)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.1)
#>  pillar        1.8.1   2022-08-19 [1] CRAN (R 4.2.1)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.1)
#>  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.2.2)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.2.1)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.2.2)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.1)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.2.1)
#>  rlang         1.0.6   2022-09-24 [1] CRAN (R 4.2.1)
#>  rmarkdown     2.20    2023-01-19 [1] CRAN (R 4.2.2)
#>  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.2.1)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.1)
#>  styler        1.9.0   2023-01-15 [1] CRAN (R 4.2.2)
#>  tibble        3.1.8   2022-07-22 [1] CRAN (R 4.2.1)
#>  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.2.1)
#>  utf8          1.2.3   2023-01-31 [1] CRAN (R 4.2.2)
#>  vctrs         0.5.2   2023-01-23 [1] CRAN (R 4.2.2)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.1)
#>  xfun          0.37    2023-01-31 [1] CRAN (R 4.2.2)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.2.2)
#> 
#>  [1] C:/Users/lenovo/AppData/Local/R/win-library/4.2
#>  [2] C:/Program Files/R/R-4.2.2/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@psychelzh psychelzh changed the title Let pick() supports NULL Let pick() supports NULL Feb 6, 2023
@psychelzh psychelzh changed the title Let pick() supports NULL Let pick() support NULL Feb 6, 2023
@hadley
Copy link
Member

hadley commented Feb 6, 2023

By our standard rules, pick(NULL) should be equivalent to pick(), so this definitely feels like a bug to me.

@DavisVaughan
Copy link
Member

FWIW that gives

library(dplyr)
df <- data.frame(x = 1)
df %>% group_by(pick())
#> Error in `group_by()`:
#> ℹ In argument: `pick()`.
#> Caused by error in `pick()`:
#> ! Must supply at least one input to `pick()`.

We've declared that pick() is theoretically equivalent to an inlined call to tibble(). Like pick(c(a, b)) == tibble(a = a, b = b).

So pick(NULL) ends up as tibble(), returning a 0 row data frame, resulting in this error.


It might seem obvious that pick() should always return a data frame with the same number of rows as the current group size, but it isn't that simple because we've allowed pick() to select "fresh" variables, like:

library(dplyr)

df <- tibble(g = c(1, 1, 1, 2, 2), x = c(2, 3, 1, 5, 3))

df |>
  reframe(y = mean(x), z = pick(everything()), .by = g)
#> # A tibble: 5 × 3
#>       g     y   z$x    $y
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     1     2     2     2
#> 2     1     2     3     2
#> 3     1     2     1     2
#> 4     2     4     5     4
#> 5     2     4     3     4

So in this case we have to compute the common size of the columns selected by pick() and do recycling to recycle the mean values.

That means that when 0 columns are selected, the common size is 0, and we end up with the above error.

@DavisVaughan
Copy link
Member

DavisVaughan commented Feb 6, 2023

The other potential definition for pick() is:

"You can only pick() columns from the initial data frame"

I do actually like this definition. Possibly across() should have been defined this way too.

It becomes clear that pick() always results in a data frame with a number of rows matching the current group size.

However it does mean they are slightly less like macros though, because you can't replace pick() with an equivalent tibble() call in all cases.

It would be slightly complicated to implement, as we'd need a 2nd set of pick/across specific chop promises. We can't use the standard data mask because it gets modified by other expressions along the way (which can result in deletion of columns, insertion of columns, or modification of columns before pick() is called, like in the above summarise() example)

@psychelzh
Copy link
Author

psychelzh commented Feb 7, 2023

From a user's perspective, it seems more reasonable to support picking no variables at all for these data-masking functions.

@DavisVaughan DavisVaughan added this to the 1.1.1 milestone Feb 7, 2023
@DavisVaughan
Copy link
Member

DavisVaughan commented Feb 14, 2023

We explicitly defined pick() with no arguments to be an error right now - for the purpose of possibly letting pick() with no arguments be interpreted as pick(everything()) in a future release. So I don't think we should change that.
5ec033c

But I can look into seeing if pick(<empty_selection>) can work better. We probably need to detect that special case and return a 1 row data frame that can be recycled to any group size. Or just return a data frame with number of rows equal to the group size if that information is easy to access from the pick() internals.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants