Updating use of superseded map_df to map %>% list_rbind #753

Merged
merged 2 commits into from
Aug 15, 2024
12 changes: 7 additions & 5 deletions vignettes/articles/readxl-workflows.Rmd
@@ -134,17 +134,18 @@ What if the datasets found on different sheets have the same variables? Then you

readxl ships with an example sheet `deaths.xlsx`, containing data on famous people who died in 2016 or 2017. It has two worksheets named "arts" and "other", but the spreadsheet layout is the same in each and the data tables have the same variables, e.g., name and date of death.

- The `map_df()` function from purrr makes it easy to iterate over worksheets and glue together the resulting data frames, all at once.
+ The `map()` function from purrr makes it easy to iterate over worksheets. Use `purrr::list_rbind()` to glue together the resulting data frames.

* Store a self-named vector of worksheet names (critical for the ID variable below).
- * Use `purrr::map_df()` to import the data, create an ID variable for the source worksheet, and row bind.
+ * Use `purrr::map() %>% purrr::list_rbind()` to import the data, create an ID variable for the source worksheet, and row bind.

```{r}
path <- readxl_example("deaths.xlsx")
deaths <- path %>%
excel_sheets() %>%
set_names() %>%
- map_df(~ read_excel(path = path, sheet = .x, range = "A5:F15"), .id = "sheet")
+ map(~ read_excel(path = path, sheet = .x, range = "A5:F15")) %>%
+ list_rbind(names_to = "sheet")
print(deaths, n = Inf)
```
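Why the self-named vector matters can be seen without touching a spreadsheet. The following is a minimal sketch, assuming purrr 1.0.0 or later (where `list_rbind()` is available); the toy data frames and the `toy` and `combined` names are illustrative stand-ins, not part of the vignette.

```r
library(purrr)

# Hypothetical stand-in for the worksheet names returned by excel_sheets().
sheets <- set_names(c("arts", "other"))

# set_names() with no second argument names each element after itself,
# so map() returns a list whose names are the worksheet names.
toy <- map(sheets, ~ data.frame(n_chars = nchar(.x)))

# list_rbind() copies those list names into the column given by names_to.
combined <- list_rbind(toy, names_to = "sheet")
print(combined)
```

Without the names, `list_rbind()` would have nothing to put in the `sheet` column, which is why the naming step comes first.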

@@ -162,7 +163,7 @@ Even though the worksheets in `deaths.xlsx` have the same layout, we'll pretend

* Store a self-named vector of worksheet names.
* Store a vector of cell range specifications.
- * Use `purrr::map2_df()` to iterate over those two vectors in parallel, importing the data, row binding, and creating an ID variable for the source worksheet.
+ * Use `purrr::map2() %>% purrr::list_rbind()` to iterate over those two vectors in parallel, importing the data, row binding, and creating an ID variable for the source worksheet.
* Cache the unified data to CSV.

```{r}
@@ -171,12 +172,13 @@ sheets <- path %>%
excel_sheets() %>%
set_names()
ranges <- list("A5:F15", cell_rows(5:15))
- deaths <- map2_df(
+ deaths <- map2(
sheets,
ranges,
~ read_excel(path, sheet = .x, range = .y),
.id = "sheet"
Review comment (Member): I'm just catching up, but should this `.id` go away in favor of `names_to` in the `list_rbind()` call, @SokolovAnatoliy?

) %>%
+ list_rbind() %>%
write_csv("deaths.csv")
print(deaths, n = Inf)
```
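The review comment in this hunk asks whether `.id` should give way to `names_to`. `map2()` has no `.id` argument, and `list_rbind()` only adds an ID column when `names_to` is supplied, so the hunk as shown would likely lose the `sheet` column. A minimal sketch of the reviewer's suggestion follows, assuming purrr 1.0.0 or later and the `deaths.xlsx` example that ships with readxl. It is one possible fix, not necessarily the change that was eventually applied, and the `write_csv()` step is left out so the sketch writes no files.

```r
library(readxl)
library(purrr)

path <- readxl_example("deaths.xlsx")

# Self-named vector of worksheet names; the names feed the ID column below.
sheets <- path %>%
  excel_sheets() %>%
  set_names()

# One range specification per worksheet, as in the vignette.
ranges <- list("A5:F15", cell_rows(5:15))

# Import each sheet with its own range, then row bind. The ID column is
# requested from list_rbind() via names_to instead of passing .id to map2().
deaths <- map2(
  sheets,
  ranges,
  ~ read_excel(path, sheet = .x, range = .y)
) %>%
  list_rbind(names_to = "sheet")

print(deaths, n = Inf)
```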