
feat: add Spark Sql DB engine spec and support Spark 3.x #20462

Merged: 7 commits, merged Jun 27, 2022

@SusurHe (Contributor) commented Jun 22, 2022

SUMMARY

Move the Spark engine spec out of hive.py into a new file, spark.py, and support Spark 3.x date formats.

  1. As tools like Kyuubi mature, Spark SQL has clear advantages over Hive, and PyHive cannot support some Spark SQL features. A standalone Spark SQL engine spec is cleaner, and it will make it easier to add better Kyuubi or Spark server support in the future.
  2. Spark 3.0 removed support for some date format patterns (https://issues.apache.org/jira/browse/SPARK-31892). DatabricksEngineSpec already handles this (fix: Update time grain expressions for Spark >= 3.x #18690), but the Spark SQL engine spec did not, so this PR applies a simple fix.
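The split and the date-format fix described above can be sketched roughly as follows. This is a minimal, self-contained approximation, not the code merged in this PR: `HiveEngineSpec` and `SparkEngineSpec` here are simplified stand-ins for the Superset classes, and the `get_timestamp_expr` helper, the `engine_name` value and the exact templates are assumptions for illustration. The idea shown is that the Spark subclass inherits the Hive plumbing but overrides its time grain templates with Spark SQL's built-in `date_trunc(fmt, ts)` instead of Hive-style format pattern strings.

```python
from typing import Dict, Optional


class HiveEngineSpec:
    """Hypothetical, heavily simplified stand-in for a Hive engine spec."""

    engine_name = "Apache Hive"
    # Hive-style grains built from date format patterns (illustrative only).
    time_grain_expressions: Dict[Optional[str], str] = {
        None: "{col}",
        "P1D": "from_unixtime(unix_timestamp({col}), 'yyyy-MM-dd 00:00:00')",
    }

    @classmethod
    def get_timestamp_expr(cls, col: str, time_grain: Optional[str]) -> str:
        """Render the SQL expression that truncates `col` to `time_grain`."""
        try:
            template = cls.time_grain_expressions[time_grain]
        except KeyError:
            raise ValueError(
                f"{cls.engine_name} does not support grain {time_grain!r}"
            ) from None
        return template.format(col=col)


class SparkEngineSpec(HiveEngineSpec):
    """Spark SQL spec kept in its own module, reusing the Hive base class."""

    engine_name = "Apache Spark SQL"  # hypothetical display name
    # Assumption: date_trunc sidesteps the format patterns Spark 3.x rejects.
    time_grain_expressions: Dict[Optional[str], str] = {
        None: "{col}",
        "PT1S": "date_trunc('second', {col})",
        "PT1M": "date_trunc('minute', {col})",
        "PT1H": "date_trunc('hour', {col})",
        "P1D": "date_trunc('day', {col})",
        "P1W": "date_trunc('week', {col})",
        "P1M": "date_trunc('month', {col})",
        "P3M": "date_trunc('quarter', {col})",
        "P1Y": "date_trunc('year', {col})",
    }


if __name__ == "__main__":
    # prints: date_trunc('week', event_ts)
    print(SparkEngineSpec.get_timestamp_expr("event_ts", "P1W"))
```

Keeping Spark as a subclass means connection and dialect handling stay shared with Hive, while only the Spark-specific pieces (here, the time grain templates) need overriding.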

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

[Two screenshots, not reproduced here]

TESTING INSTRUCTIONS

Connect to a Spark SQL database and run a query; see the screenshots above.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

codecov bot commented Jun 22, 2022

Codecov Report

Merging #20462 (e82cd60) into master (44f0b51) will decrease coverage by 0.13%.
The diff coverage is 100.00%.

❗ Current head e82cd60 differs from pull request most recent head 40deb7c. Consider uploading reports for the commit 40deb7c to get more accurate results.

@@            Coverage Diff             @@
##           master   #20462      +/-   ##
==========================================
- Coverage   66.75%   66.62%   -0.14%     
==========================================
  Files        1740     1741       +1     
  Lines       65172    65175       +3     
  Branches     6900     6900              
==========================================
- Hits        43505    43421      -84     
- Misses      19918    20005      +87     
  Partials     1749     1749              
Flag      Coverage Δ
hive      53.73% <100.00%> (+<0.01%) ⬆️
mysql     ?
postgres  ?
presto    53.59% <100.00%> (+<0.01%) ⬆️
python    82.56% <100.00%> (-0.28%) ⬇️
sqlite    82.19% <100.00%> (+<0.01%) ⬆️
unit      50.57% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
superset/db_engine_specs/databricks.py 92.30% <ø> (ø)
superset/db_engine_specs/hive.py 85.76% <ø> (-0.11%) ⬇️
superset/db_engine_specs/spark.py 100.00% <100.00%> (ø)
superset/sql_validators/postgres.py 50.00% <0.00%> (-50.00%) ⬇️
superset/views/database/mixins.py 60.34% <0.00%> (-20.69%) ⬇️
superset/sql_validators/__init__.py 80.00% <0.00%> (-20.00%) ⬇️
superset/databases/commands/update.py 85.71% <0.00%> (-8.17%) ⬇️
superset/common/utils/dataframe_utils.py 85.71% <0.00%> (-7.15%) ⬇️
superset/databases/commands/create.py 86.27% <0.00%> (-5.89%) ⬇️
superset/db_engine_specs/postgres.py 91.52% <0.00%> (-5.09%) ⬇️
... and 14 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 44f0b51...40deb7c.

@zhaoyongjie (Member) left a comment

Thanks for the contribution. LGTM.

@eschutho (Member) left a comment

Thank you!

@eschutho eschutho merged commit c4d2238 into apache:master Jun 27, 2022
@SusurHe SusurHe deleted the spark3-support branch June 28, 2022 05:35
akshatsri pushed a commit to charan1314/superset that referenced this pull request Jul 19, 2022
* add apache spark3

* add Spark DB engine spec

* rebase secret key

* modify License error

* rebase databricks

* modify code style

* black code style

Co-authored-by: kai.he01 <kai.he01@idiaoyan.com>
@egorsmth commented

Hi, is there any progress toward Spark 3.x support without PyHive? It seems to have stalled last year.

@mistercrunch added the 🏷️ bot and 🚢 2.1.0 labels Mar 13, 2024
Labels: 🏷️ bot, size/M, 🚢 2.1.0

5 participants