Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cross_join() in place of by = character() #6604

Closed
DavisVaughan opened this issue Dec 13, 2022 · 0 comments · Fixed by #6612
Closed

cross_join() in place of by = character() #6604

DavisVaughan opened this issue Dec 13, 2022 · 0 comments · Fixed by #6612
Assignees
Labels
feature a feature request or enhancement tables 🧮 joins and set operations
Milestone

Comments

@DavisVaughan
Copy link
Member

Rationale for adding an explicit cross_join() function in favor of the existing by = character() method:

  • I think by = character() is fairly unintuitive, and I’ve seen 2 cases recently where people didn’t even know it existed

  • I think by = character() should really be an error. I think its current behavior of doing a cross join is actually a little scary. Like, supplying c("x", "y") matches on two columns, supplying "x" matches on one column, but character() all of the sudden matches all rows? Seems very odd to me.

  • Importantly, currently only inner_join(by = character()) is a true cross join. SQL cross joins must result in nrow(x) * nrow(y) number of rows, and only the inner cross join does this in the edge case where one of the inputs has 0 rows (i.e. the left join would retain all x rows). I know this is edge case behavior, but the weirdness here feels worth resolving by just adding cross_join()

  • I think it would be more straightforward for dbplyr and friends to convert cross_join() to SQL. Converting left_join(by = character()) to SQL is probably quite odd.

  • It would be nice that cross_join() wouldn’t have a by argument at all, making it very clear it doesn’t have anything to do with columns

  • We could document cross_join() on its own help page, raising awareness of it

  • join_by() wouldn’t have to support cross joins, which it currently does by allowing empty join_by() calls to be treated like cross joins. I could turn that into an error for 1.1.0 if we also added cross_join() at the same time, which would be much nicer than having to deprecate it slowly in the future.

@DavisVaughan DavisVaughan added feature a feature request or enhancement tables 🧮 joins and set operations labels Dec 13, 2022
@DavisVaughan DavisVaughan added this to the 1.1.0 milestone Dec 13, 2022
@DavisVaughan DavisVaughan self-assigned this Dec 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement tables 🧮 joins and set operations
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant