Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bind_rows() for database tables #6905

Open
mgirlich opened this issue Aug 7, 2023 · 3 comments
Open

bind_rows() for database tables #6905

mgirlich opened this issue Aug 7, 2023 · 3 comments
Labels
feature a feature request or enhancement rows ↕️ Operations on rows: filter(), slice(), arrange()

Comments

@mgirlich
Copy link

mgirlich commented Aug 7, 2023

It would be nice if bind_rows() would work for database tables.

library(dplyr, warn.conflicts = FALSE)
library(purrr)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Create random slices of mtcars
tables <-
  1:5 |>
  map(function(n) {
    tbl_name <- paste0("mtcars_", n)
    copy_to(con, slice_sample(mtcars, n = 3), tbl_name)

    tbl(con, tbl_name)
  })
bind_rows(!!!tables)
#> Error in `bind_rows()`:
#> ! Argument 1 must be a data frame or a named atomic vector.
#> Backtrace:
#>     ▆
#>  1. └─dplyr::bind_rows(!!!tables)
#>  2.   └─rlang::abort(glue("Argument {i} must be a data frame or a named atomic vector."))

Created on 2023-08-07 with reprex v2.0.2

I know there were issues with the double dispatch and what happens if local tables and database tables are mixed.
But I think it would be sufficient if either all inputs are local tables or database tables.
This was also requested again in dbplyr: tidyverse/dbplyr#1342.

@etiennebacher
Copy link

etiennebacher commented Nov 6, 2023

I made this change in the source and it passes all dplyr tests:

bind_rows <- function(..., .id = NULL) {
  dots <- list2(...)
  is_flattenable <- function(x) !is_named(x)
  if (length(dots) == 1 && is_bare_list(dots[[1]])) {
    dots <- dots[[1]]
  }
  dots <- list_flatten(dots, fn = is_flattenable)
  dots <- discard(dots, is.null)
  if (length(dots) == 0) {
    return(tibble())
  }
  UseMethod("bind_rows", dots[[1]])
}

#' @export
bind_rows.tbl_SQLiteConnection <- function(...) {
  message("hello there")
}

#' @export
bind_rows.LazyFrame <- function(...) {
  polars::pl$concat(..., how = "vertical")
}

#' @export
bind_rows.default <- function(..., .id = NULL) {
  # Current code of `bind_rows()`
}

Then, internally, each method could determine how to handle the ... (e.g error if not all items have the same class).

Here, I check that bind_rows() works fine with dplyr, some database connections (I don't know how it would bind rows so I just put a message), and polars:

library(dplyr, warn.conflicts = FALSE)
library(purrr)
library(polars)

### classic dplyr
bind_rows(tibble(mtcars), tibble(mtcars)) |> 
  head()
#> # A tibble: 6 × 11
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  21       6   160   110  3.9   2.62  16.5     0     1     4     4
#> 2  21       6   160   110  3.9   2.88  17.0     0     1     4     4
#> 3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
#> 4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
#> 5  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2
#> 6  18.1     6   225   105  2.76  3.46  20.2     1     0     3     1

### databases
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

tables <-
  1:5 |>
  map(function(n) {
    tbl_name <- paste0("mtcars_", n)
    copy_to(con, slice_sample(mtcars, n = 3), tbl_name)
    
    tbl(con, tbl_name)
  })
bind_rows(!!!tables)
#> hello there

### Polars LazyFrames
bind_rows(pl$LazyFrame(mtcars), pl$LazyFrame(mtcars))
#> [1] "polars LazyFrame naive plan: (run ldf$describe_optimized_plan() to see the optimized plan)"
#> UNION
#>   PLAN 0:
#>     DF ["mpg", "cyl", "disp", "hp"]; PROJECT */11 COLUMNS; SELECTION: "None"
#>   PLAN 1:
#>     DF ["mpg", "cyl", "disp", "hp"]; PROJECT */11 COLUMNS; SELECTION: "None"
#> END UNION

Would this implementation be enough or am I missing something?

@DavisVaughan DavisVaughan added feature a feature request or enhancement rows ↕️ Operations on rows: filter(), slice(), arrange() labels Nov 6, 2023
@etiennebacher
Copy link

@DavisVaughan it would be great to know whether my suggested implementation makes sense. If so, I can prepare a PR for this

@avhz
Copy link

avhz commented Sep 25, 2024

Any update on this one ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement rows ↕️ Operations on rows: filter(), slice(), arrange()
Projects
None yet
Development

No branches or pull requests

4 participants