You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Rationale for adding an explicit cross_join() function in favor of the existing by = character() method:
I think by = character() is fairly unintuitive, and I’ve seen 2 cases recently where people didn’t even know it existed
I think by = character() should really be an error. I think its current behavior of doing a cross join is actually a little scary. Like, supplying c("x", "y") matches on two columns, supplying "x" matches on one column, but character() all of the sudden matches all rows? Seems very odd to me.
Importantly, currently only inner_join(by = character()) is a true cross join. SQL cross joins must result in nrow(x) * nrow(y) number of rows, and only the inner cross join does this in the edge case where one of the inputs has 0 rows (i.e. the left join would retain all x rows). I know this is edge case behavior, but the weirdness here feels worth resolving by just adding cross_join()
I think it would be more straightforward for dbplyr and friends to convert cross_join() to SQL. Converting left_join(by = character()) to SQL is probably quite odd.
It would be nice that cross_join() wouldn’t have a by argument at all, making it very clear it doesn’t have anything to do with columns
We could document cross_join() on its own help page, raising awareness of it
join_by() wouldn’t have to support cross joins, which it currently does by allowing empty join_by() calls to be treated like cross joins. I could turn that into an error for 1.1.0 if we also added cross_join() at the same time, which would be much nicer than having to deprecate it slowly in the future.
The text was updated successfully, but these errors were encountered:
Rationale for adding an explicit
cross_join()
function in favor of the existingby = character()
method:I think
by = character()
is fairly unintuitive, and I’ve seen 2 cases recently where people didn’t even know it existedI think
by = character()
should really be an error. I think its current behavior of doing a cross join is actually a little scary. Like, supplyingc("x", "y")
matches on two columns, supplying"x"
matches on one column, butcharacter()
all of the sudden matches all rows? Seems very odd to me.Importantly, currently only
inner_join(by = character())
is a true cross join. SQL cross joins must result innrow(x) * nrow(y)
number of rows, and only the inner cross join does this in the edge case where one of the inputs has 0 rows (i.e. the left join would retain all x rows). I know this is edge case behavior, but the weirdness here feels worth resolving by just addingcross_join()
I think it would be more straightforward for dbplyr and friends to convert
cross_join()
to SQL. Convertingleft_join(by = character())
to SQL is probably quite odd.It would be nice that
cross_join()
wouldn’t have aby
argument at all, making it very clear it doesn’t have anything to do with columnsWe could document
cross_join()
on its own help page, raising awareness of itjoin_by()
wouldn’t have to support cross joins, which it currently does by allowing emptyjoin_by()
calls to be treated like cross joins. I could turn that into an error for 1.1.0 if we also addedcross_join()
at the same time, which would be much nicer than having to deprecate it slowly in the future.The text was updated successfully, but these errors were encountered: