[FEA] Add support for bucketed writes #22

Closed
revans2 opened this issue May 28, 2020 · 3 comments · Fixed by #10957
Labels
cudf_dependency: An issue or PR with this label depends on a new feature in cudf
feature request: New feature or request
P1: Nice to have for release
SQL: part of the SQL/Dataframe plugin

Comments

Collaborator

revans2 commented May 28, 2020

Is your feature request related to a problem? Please describe.
The SQL plugin supports partitioned writes but not bucketed writes. The main thing preventing this from working is the lack of consistent hashing between the CPU and GPU implementations. This will require us to create a version of the murmur3 hash that matches exactly what Spark does, and we may need to write it ourselves, as it is likely to be Spark-specific.
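To make the consistency requirement concrete, here is a minimal pure-Python sketch of how a bucket id could be derived for a single 32-bit integer column. It assumes that Spark computes the bucket as a non-negative modulo of a Murmur3 x86_32 hash seeded with 42; the function names (`spark_hash_int`, `bucket_id`) are hypothetical and exist only for this illustration. Other column types (longs, strings, nulls, multiple columns) are not covered.

```python
# Illustrative sketch only: models Murmur3 x86_32 for one 32-bit int,
# under the assumption that Spark seeds the hash with 42 for bucketing.
MASK = 0xFFFFFFFF
C1 = 0xCC9E2D51
C2 = 0x1B873593


def _rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK


def _mix_k1(k1):
    k1 = (k1 * C1) & MASK
    k1 = _rotl32(k1, 15)
    return (k1 * C2) & MASK


def _mix_h1(h1, k1):
    h1 ^= k1
    h1 = _rotl32(h1, 13)
    return (h1 * 5 + 0xE6546B64) & MASK


def _fmix(h1, length):
    """Final avalanche step; length is the input size in bytes."""
    h1 ^= length
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK
    h1 ^= h1 >> 16
    return h1


def _to_signed32(x):
    """Reinterpret an unsigned 32-bit value as a signed JVM-style int."""
    return x - (1 << 32) if x & 0x80000000 else x


def spark_hash_int(value, seed=42):
    """Hash one 32-bit integer (hypothetical helper name)."""
    k1 = _mix_k1(value & MASK)
    h1 = _mix_h1(seed & MASK, k1)
    return _to_signed32(_fmix(h1, 4))


def bucket_id(value, num_buckets, seed=42):
    """Non-negative modulo of the hash, i.e. a pmod-style bucket id."""
    # Python's % already returns a non-negative result for a positive divisor.
    return spark_hash_int(value, seed) % num_buckets


print(spark_hash_int(1))   # -559580957
print(bucket_id(1, 8))     # 3
```

If the CPU and GPU disagree on even one bit of the hash, the same row can land in different bucket files, which is why an approximate match is not enough here.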

revans2 added the "feature request", "? - Needs Triage", and "SQL" labels on May 28, 2020
sameerz added the "P1" label and removed the "? - Needs Triage" label on Oct 13, 2020
Collaborator Author

revans2 commented Oct 13, 2020

This depends on #937

wjxiz1992 pushed a commit to wjxiz1992/spark-rapids that referenced this issue Oct 29, 2020
* Instructions for standalone/yarn wip

* Update instructions

* Fix typo

* Small fixes

* jars->jar
Collaborator Author

revans2 commented Feb 18, 2021

We could partially implement this now.

revans2 added the "cudf_dependency" label on Feb 18, 2021
Collaborator Author

revans2 commented Feb 18, 2021

To fully implement this, we will need full support for murmur3 hashing that is bit-for-bit identical to Spark's.
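One place where "bit-for-bit identical" can bite is byte-string input. The sketch below contrasts the textbook Murmur3 x86_32 tail handling with a per-byte tail variant. It rests on the assumption that Spark's byte-array hashing mixes each trailing byte individually as a sign-extended int; that is my understanding of the JVM implementation, not something stated in this issue. The names `spark_hash_bytes` and `standard_murmur3_32` are hypothetical.

```python
# Illustrative sketch only. ASSUMPTION: Spark mixes each leftover byte
# separately (sign-extended), unlike the reference Murmur3 x86_32.
MASK = 0xFFFFFFFF


def _rotl32(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK


def _mix_k1(k1):
    k1 = (k1 * 0xCC9E2D51) & MASK
    k1 = _rotl32(k1, 15)
    return (k1 * 0x1B873593) & MASK


def _mix_h1(h1, k1):
    h1 = _rotl32(h1 ^ k1, 13)
    return (h1 * 5 + 0xE6546B64) & MASK


def _fmix(h1, length):
    h1 ^= length
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK
    return h1 ^ (h1 >> 16)


def _to_signed32(x):
    return x - (1 << 32) if x & 0x80000000 else x


def _mix_blocks(data, seed):
    """Mix all complete 4-byte little-endian words; return (h1, tail)."""
    h1 = seed & MASK
    aligned = len(data) - len(data) % 4
    for i in range(0, aligned, 4):
        word = int.from_bytes(data[i:i + 4], "little")
        h1 = _mix_h1(h1, _mix_k1(word))
    return h1, data[aligned:]


def spark_hash_bytes(data, seed=42):
    """Assumed Spark-style variant: each tail byte gets a full mix round."""
    h1, tail = _mix_blocks(data, seed)
    for b in tail:
        signed = b - 256 if b >= 128 else b  # JVM bytes are signed
        h1 = _mix_h1(h1, _mix_k1(signed & MASK))
    return _to_signed32(_fmix(h1, len(data)))


def standard_murmur3_32(data, seed=0):
    """Reference variant: tail bytes are packed into one partial word."""
    h1, tail = _mix_blocks(data, seed)
    if tail:
        h1 ^= _mix_k1(int.from_bytes(tail, "little"))
    return _to_signed32(_fmix(h1, len(data)))


# The two agree when the length is a multiple of 4 and may differ otherwise.
for sample in (b"abcdefgh", b"abcde"):
    print(sample, spark_hash_bytes(sample), standard_murmur3_32(sample, 42))
```

Under that assumption, an off-the-shelf murmur3 kernel would only match for inputs whose length is a multiple of four bytes, which is consistent with the earlier remark that the hash may need to be written specifically for Spark.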

tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023
Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
sperlingxx added a commit to sperlingxx/spark-rapids that referenced this issue Jan 18, 2024
Signed-off-by: sperlingxx <lovedreamf@gmail.com>
res-life pushed a commit to res-life/spark-rapids that referenced this issue Jun 27, 2024
* optimizing Expand+Aggregate in sqlw with many count distinct

Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>

* Add GpuBucketingUtils shim to Spark 4.0.0 (NVIDIA#11092)

* Add GpuBucketingUtils shim to Spark 4.0.0

* Signing off

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

---------

Signed-off-by: Raza Jafri <rjafri@nvidia.com>

* Improve the diagnostics for 'conv' fallback explain (NVIDIA#11076)

* Improve the diagnostics for 'conv' fallback explain

Signed-off-by: Jihoon Son <ghoonson@gmail.com>

* don't use nil

Signed-off-by: Jihoon Son <ghoonson@gmail.com>

* the bases should not be an empty string in the error message when the user input is not

Signed-off-by: Jihoon Son <ghoonson@gmail.com>

* more user-friendly message

* Update sql-plugin/src/main/scala/org/apache/spark/sql/rapids/stringFunctions.scala

Co-authored-by: Gera Shegalov <gshegalov@nvidia.com>

---------

Signed-off-by: Jihoon Son <ghoonson@gmail.com>
Co-authored-by: Gera Shegalov <gshegalov@nvidia.com>

* Disable ANSI mode for window function tests [databricks] (NVIDIA#11073)

* Disable ANSI mode for window function tests.

Fixes NVIDIA#11019.

Window function tests fail on Spark 4.0 because of NVIDIA#5114 (and NVIDIA#5120 broadly),
because spark-rapids does not support SUM, COUNT, and certain other aggregations
in ANSI mode.

This commit disables ANSI mode tests for the failing window function tests. These may be
revisited, once error/overflow checking is available for ANSI mode in spark-rapids.

Signed-off-by: MithunR <mithunr@nvidia.com>

* Switch from @ansi_mode_disabled to @disable_ansi_mode.

---------

Signed-off-by: MithunR <mithunr@nvidia.com>

---------

Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Signed-off-by: Jihoon Son <ghoonson@gmail.com>
Signed-off-by: MithunR <mithunr@nvidia.com>
Co-authored-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
Co-authored-by: Raza Jafri <razajafri@users.noreply.github.com>
Co-authored-by: Jihoon Son <jihoonson@apache.org>
Co-authored-by: Gera Shegalov <gshegalov@nvidia.com>
Co-authored-by: MithunR <mithunr@nvidia.com>