[FEA] Add support for bucketed writes #22

revans2 · 2020-05-28T20:34:37Z

Is your feature request related to a problem? Please describe.
The SQL plugin supports partitioned writes but not bucketed writes. The main thing preventing this from working is consistent hashing between the CPU and GPU implementations. This will require us to create a version of the murmur3 hash that matches exactly what Spark does, and we may need to write it ourselves, as it is likely to be Spark-specific.
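To make the hashing requirement concrete, here is a minimal sketch in plain Python of how a bucket id is derived for a single 32-bit integer column. It assumes the bucket id is `pmod(murmur3(value, seed=42), numBuckets)`, which is my understanding of what Spark does on the CPU. The function names (`spark_hash_int`, `bucket_id`) are hypothetical and are not part of the plugin or of Spark. Other types are not covered here; strings in particular are where Spark's handling of trailing bytes is believed to diverge from standard murmur3.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value in unsigned 32-bit range


def _rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def _mix_k1(k1):
    k1 = (k1 * 0xCC9E2D51) & MASK32
    k1 = _rotl32(k1, 15)
    return (k1 * 0x1B873593) & MASK32


def _mix_h1(h1, k1):
    h1 ^= k1
    h1 = _rotl32(h1, 13)
    return (h1 * 5 + 0xE6546B64) & MASK32


def _fmix(h1, length):
    """Final avalanche step; length is the input size in bytes."""
    h1 ^= length
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK32
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK32
    h1 ^= h1 >> 16
    return h1


def _to_signed32(x):
    """Reinterpret an unsigned 32-bit value as a signed (JVM-style) int."""
    return x - (1 << 32) if x & 0x80000000 else x


def spark_hash_int(value, seed=42):
    """Murmur3 x86_32 of one 32-bit int; seed 42 is an assumed Spark default."""
    k1 = _mix_k1(value & MASK32)
    h1 = _mix_h1(seed & MASK32, k1)
    return _to_signed32(_fmix(h1, 4))


def bucket_id(value, num_buckets, seed=42):
    """Assumed bucket assignment: positive modulus of the signed hash."""
    # Python's % already returns a non-negative result for a positive modulus,
    # which matches pmod semantics.
    return spark_hash_int(value, seed) % num_buckets


print(spark_hash_int(1))  # -559580957
print(bucket_id(1, 8))    # 3
```

If the GPU hash differs from the CPU hash in even one bit, rows land in different bucket files, and readers that rely on the bucketing layout would get incorrect results. That is why an approximate match is not enough here.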

revans2 · 2020-10-13T21:57:22Z

This depends on #937

revans2 · 2021-02-18T18:19:45Z

We could partially implement this now.

revans2 · 2021-02-18T18:20:18Z

To fully implement this, we will need full support for bit-for-bit identical murmur3 hashing.

revans2 added the "feature request" (New feature or request), "? - Needs Triage" (Need team to review and classify), and "SQL" (part of the SQL/Dataframe plugin) labels May 28, 2020

YeahNew mentioned this issue Sep 9, 2020

[QST]The Tpcx_bb query#5,#16,#21,#22 on GPU are slower than CPU #697

Closed

sameerz added the "P1" (Nice to have for release) label and removed the "? - Needs Triage" (Need team to review and classify) label Oct 13, 2020

revans2 mentioned this issue Oct 13, 2020

[FEA] have murmur3 hash function that matches exactly with spark #937

Closed

wjxiz1992 pushed a commit to wjxiz1992/spark-rapids that referenced this issue Oct 29, 2020

Instructions for standalone/yarn wip (NVIDIA#22)

699d761

* Instructions for standalone/yarn wip * Update instructions * Fix typo * Small fixes * jars->jar

YeahNew mentioned this issue Nov 27, 2020

[QST]There are two questions about TPCxBB Like query results in README.md #1212

Closed

revans2 added the "cudf_dependency" (An issue or PR with this label depends on a new feature in cudf) label Feb 18, 2021

tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023

Update submodule cudf to d19d683 (NVIDIA#22)

2cdcc74

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>

sperlingxx added a commit to sperlingxx/spark-rapids that referenced this issue Jan 18, 2024

Dump Parquet Meta as SparkMetrics (NVIDIA#22)

7654636

Signed-off-by: sperlingxx <lovedreamf@gmail.com>

jlowe mentioned this issue Feb 2, 2024

[FEA] It would be nice if we could support Hive-style write bucketing table #10366

Closed

firestarman mentioned this issue Jun 2, 2024

Support bucketing write for GPU #10957

Merged

firestarman closed this as completed in #10957 Jun 24, 2024
