Let `pick()` support `NULL` #6685

psychelzh · 2023-02-06T08:24:07Z

In the documentation:

pick() is useful as a bridge between data-masking functions (like mutate() or group_by()) and functions with tidy-select behavior (like select()).

Therefore, it will be as expected if they both support NULL to fall to no grouping.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df <- data.frame(x = 1)
df |> 
  group_by(across(NULL))
#> # A tibble: 1 × 1
#>       x
#>   <dbl>
#> 1     1
df |> 
  group_by(pick(NULL))
#> Error in `group_by()`:
#> ℹ In argument: `pick(NULL)`.
#> Caused by error:
#> ! `pick(NULL)` must be size 1, not 0.

#> Backtrace:
#>      ▆
#>   1. ├─dplyr::group_by(df, pick(NULL))
#>   2. ├─dplyr:::group_by.data.frame(df, pick(NULL))
#>   3. │ └─dplyr::group_by_prepare(.data, ..., .add = .add, error_call = current_env())
#>   4. │   └─dplyr:::add_computed_columns(.data, new_groups, error_call = error_call)
#>   5. │     └─dplyr:::mutate_cols(...)
#>   6. │       ├─base::withCallingHandlers(...)
#>   7. │       └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
#>   8. │         └─mask$eval_all_mutate(quo)
#>   9. │           └─dplyr (local) eval()
#>  10. ├─dplyr:::dplyr_internal_error(...)
#>  11. │ └─rlang::abort(class = c(class, "dplyr:::internal_error"), dplyr_error_data = data)
#>  12. │   └─rlang:::signal_abort(cnd, .file)
#>  13. │     └─base::signalCondition(cnd)
#>  14. └─dplyr (local) `<fn>`(`<dpl:::__>`)
#>  15.   └─rlang::abort(message, class = error_class, parent = parent, call = error_call)

^{Created on 2023-02-06 with reprex v2.0.2}

Session info

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.2 (2022-10-31 ucrt)
#>  os       Windows 10 x64 (build 22621)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  Chinese (Simplified)_China.utf8
#>  ctype    Chinese (Simplified)_China.utf8
#>  tz       Asia/Taipei
#>  date     2023-02-06
#>  pandoc   2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.0   2023-01-09 [1] CRAN (R 4.2.2)
#>  digest        0.6.31  2022-12-11 [1] CRAN (R 4.2.2)
#>  dplyr       * 1.1.0   2023-01-29 [1] CRAN (R 4.2.2)
#>  evaluate      0.20    2023-01-17 [1] CRAN (R 4.2.2)
#>  fansi         1.0.4   2023-01-22 [1] CRAN (R 4.2.2)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.1)
#>  fs            1.6.0   2023-01-23 [1] CRAN (R 4.2.2)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.2.1)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.1)
#>  htmltools     0.5.4   2022-12-07 [1] CRAN (R 4.2.2)
#>  knitr         1.42    2023-01-25 [1] CRAN (R 4.2.2)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.2.1)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.1)
#>  pillar        1.8.1   2022-08-19 [1] CRAN (R 4.2.1)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.1)
#>  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.2.2)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.2.1)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.2.2)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.1)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.2.1)
#>  rlang         1.0.6   2022-09-24 [1] CRAN (R 4.2.1)
#>  rmarkdown     2.20    2023-01-19 [1] CRAN (R 4.2.2)
#>  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.2.1)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.1)
#>  styler        1.9.0   2023-01-15 [1] CRAN (R 4.2.2)
#>  tibble        3.1.8   2022-07-22 [1] CRAN (R 4.2.1)
#>  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.2.1)
#>  utf8          1.2.3   2023-01-31 [1] CRAN (R 4.2.2)
#>  vctrs         0.5.2   2023-01-23 [1] CRAN (R 4.2.2)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.1)
#>  xfun          0.37    2023-01-31 [1] CRAN (R 4.2.2)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.2.2)
#> 
#>  [1] C:/Users/lenovo/AppData/Local/R/win-library/4.2
#>  [2] C:/Program Files/R/R-4.2.2/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

The text was updated successfully, but these errors were encountered:

hadley · 2023-02-06T16:13:51Z

By our standard rules, pick(NULL) should be equivalent to pick(), so this definitely feels like a bug to me.

DavisVaughan · 2023-02-06T17:02:10Z

FWIW that gives

library(dplyr)
df <- data.frame(x = 1)
df %>% group_by(pick())
#> Error in `group_by()`:
#> ℹ In argument: `pick()`.
#> Caused by error in `pick()`:
#> ! Must supply at least one input to `pick()`.

We've declared that pick() is theoretically equivalent to an inlined call to tibble(). Like pick(c(a, b)) == tibble(a = a, b = b).

So pick(NULL) ends up as tibble(), returning a 0 row data frame, resulting in this error.

It might seem obvious that pick() should always return a data frame with the same number of rows as the current group size, but it isn't that simple because we've allowed pick() to select "fresh" variables, like:

library(dplyr)

df <- tibble(g = c(1, 1, 1, 2, 2), x = c(2, 3, 1, 5, 3))

df |>
  reframe(y = mean(x), z = pick(everything()), .by = g)
#> # A tibble: 5 × 3
#>       g     y   z$x    $y
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     1     2     2     2
#> 2     1     2     3     2
#> 3     1     2     1     2
#> 4     2     4     5     4
#> 5     2     4     3     4

So in this case we have to compute the common size of the columns selected by pick() and do recycling to recycle the mean values.

That means that when 0 columns are selected, the common size is 0, and we end up with the above error.

DavisVaughan · 2023-02-06T17:08:23Z

The other potential definition for pick() is:

"You can only pick() columns from the initial data frame"

I do actually like this definition. Possibly across() should have been defined this way too.

It becomes clear that pick() always results in a data frame with a number of rows matching the current group size.

However it does mean they are slightly less like macros though, because you can't replace pick() with an equivalent tibble() call in all cases.

It would be slightly complicated to implement, as we'd need a 2nd set of pick/across specific chop promises. We can't use the standard data mask because it gets modified by other expressions along the way (which can result in deletion of columns, insertion of columns, or modification of columns before pick() is called, like in the above summarise() example)

psychelzh · 2023-02-07T01:08:17Z

From a user's perspective, it seems more reasonable to support picking no variables at all for these data-masking functions.

DavisVaughan · 2023-02-14T21:19:38Z

We explicitly defined pick() with no arguments to be an error right now - for the purpose of possibly letting pick() with no arguments be interpreted as pick(everything()) in a future release. So I don't think we should change that.
5ec033c

But I can look into seeing if pick(<empty_selection>) can work better. We probably need to detect that special case and return a 1 row data frame that can be recycled to any group size. Or just return a data frame with number of rows equal to the group size if that information is easy to access from the pick() internals.

psychelzh changed the title ~~Let pick() supports NULL~~ Let pick() supports NULL Feb 6, 2023

psychelzh changed the title ~~Let pick() supports NULL~~ Let pick() support NULL Feb 6, 2023

DavisVaughan added this to the 1.1.1 milestone Feb 7, 2023

DavisVaughan mentioned this issue Feb 14, 2023

Make the current group size a variable updated by reference #6727

Merged

DavisVaughan mentioned this issue Feb 23, 2023

Force pick() empty selections to result in 1 row tibbles #6756

Merged

DavisVaughan closed this as completed in #6756 Feb 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Let `pick()` support `NULL` #6685

Let `pick()` support `NULL` #6685

psychelzh commented Feb 6, 2023 •

edited

Loading

hadley commented Feb 6, 2023

DavisVaughan commented Feb 6, 2023

DavisVaughan commented Feb 6, 2023 •

edited

Loading

psychelzh commented Feb 7, 2023 •

edited

Loading

DavisVaughan commented Feb 14, 2023 •

edited

Loading

Let pick() support NULL #6685

Let pick() support NULL #6685

Comments

psychelzh commented Feb 6, 2023 • edited Loading

hadley commented Feb 6, 2023

DavisVaughan commented Feb 6, 2023

DavisVaughan commented Feb 6, 2023 • edited Loading

psychelzh commented Feb 7, 2023 • edited Loading

DavisVaughan commented Feb 14, 2023 • edited Loading

Let `pick()` support `NULL` #6685

Let `pick()` support `NULL` #6685

psychelzh commented Feb 6, 2023 •

edited

Loading

DavisVaughan commented Feb 6, 2023 •

edited

Loading

psychelzh commented Feb 7, 2023 •

edited

Loading

DavisVaughan commented Feb 14, 2023 •

edited

Loading