Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow users to estimate query cost before executing it #8172

Merged
merged 12 commits into from
Sep 17, 2019

Conversation

betodealmeida
Copy link
Member

@betodealmeida betodealmeida commented Sep 4, 2019

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

We currently added to Presto support for estimating the number of bytes scanned (trinodb/trino#806), and we'd like to surface that information to SQL Lab users before they actually run a query.

This PR extends the DB specs with an allows_cost_estimate attribute and associated methods, allowing pre-execution costs to be computed from DBs that support it.

In order to use it, the feature flag ESTIMATE_QUERY_COST must be enabled, and it needs to be explicitly turned on for each database that supports query cost estimation. When all those conditions are met, a new button will show up in SQL Lab, allowing users to run cost estimates for the whole query or for the selected SQL.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

DBs where the feature is not supported or not enabled are unmodified:

Screen Shot 2019-09-04 at 8 48 46 AM

Here's a Presto DB:

Screen Shot 2019-09-04 at 8 59 15 AM

Waiting for results:

query_cost

The result:

Screen Shot 2019-09-04 at 8 58 57 AM

And how errors (timeout, syntax errors) are surfaced:

Screen Shot 2019-09-04 at 8 53 07 AM

TEST PLAN

Tested with a Presto cluster that supports query cost estimation, running version 0.319 and with the feature enabled via extra:

{
    "version": "0.319",
    "cost_estimate_enabled": true,
    "metadata_params": {},
    "engine_params": {
       "connect_args": {
           "protocol":"https", 
           "source":"superset"
        }
    },
    "metadata_cache_timeout": {"schema_cache_timeout": 86400, "table_cache_timeout": 86400},
    "default_schemas": ["core", "default"]
}

Additionally, I tested:

  • Lowering the Presto version removes the button.
  • Other DBs don't show the button.
  • Turning off the feature flag removes the button.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

REVIEWERS

@etr2460 @mistercrunch

@betodealmeida betodealmeida changed the title Allow users to estimate query cost before executing them Allow users to estimate query cost before executing it Sep 4, 2019
@betodealmeida betodealmeida added enhancement:request Enhancement request submitted by anyone from the community .database sqllab Namespace | Anything related to the SQL Lab lyft Related to Lyft risk:db-migration PRs that require a DB migration labels Sep 4, 2019
Copy link
Member

@etr2460 etr2460 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a bunch of comments, also @john-bodley or @villebro should probably double check the db engine specs stuff

superset/db_engine_specs/base.py Outdated Show resolved Hide resolved
superset/db_engine_specs/base.py Outdated Show resolved Hide resolved
superset/db_engine_specs/presto.py Outdated Show resolved Hide resolved
superset/db_engine_specs/presto.py Outdated Show resolved Hide resolved
superset/models/core.py Outdated Show resolved Hide resolved
superset/sql_validators/presto_db.py Outdated Show resolved Hide resolved
superset/views/core.py Show resolved Hide resolved
superset/views/core.py Outdated Show resolved Hide resolved
@betodealmeida
Copy link
Member Author

@etr2460, I addressed all the comments. I removed the DB migration, and the feature is enabled per DB in extras. Also, I use the DB version to determine if it's supported.


prefixes = ["K", "M", "G", "T", "P", "E", "Z", "Y"]
prefix = ""
to_next_prefix = 1000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commenting again, shouldn't this be 1024? And we should make it a const

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I replied to your comment but I think I resolved the conversation. This is used not just for bytes, but also for cpu and network cost, so 1000 is the correct unit. Also, 1000 is the correct unit for the prefixes K, M, G, etc. For 1024 the prefixes are Ki, Mi, Gi.

Eg, 1024 B = 1 KiB = 1.024 KB.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same concern as @etr2460 , learned something new here (had to google to double check) 👍


db_engine_spec.execute(cursor, sql)
polled = cursor.poll()
while polled:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the only way to tell if the query is finished? This seems a little sketchy, can we not pass a callback or something on success?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let me simplify this.

result = json.loads(first)
estimate = result["estimate"]

def humanize(value, suffix):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add types here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -773,6 +773,13 @@ def name(self):
def allows_subquery(self):
return self.db_engine_spec.allows_subqueries

@property
def allows_cost_estimate(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a return type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@expose("/estimate_query_cost/<database_id>/", methods=["POST"])
@expose("/estimate_query_cost/<database_id>/<schema>/", methods=["POST"])
@event_logger.log_this
def estimate_query_cost(self, database_id, schema=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add types

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -148,6 +149,10 @@ class BaseEngineSpec:
max_column_name_length = 0
try_remove_schema_from_table_name = True

@classmethod
def get_allow_cost_estimate(cls, version=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add types

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@betodealmeida
Copy link
Member Author

@etr2460, I added types and cleaned up the query execution.

Copy link
Member

@etr2460 etr2460 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind making the test plan a little more robust? Test with the feature flag both enabled and disabled, with presto dbs that are configured at a passing version and prior version? With non presto dbs? I'm sure you've tested other cases, but right now the test plan only references the happy path, so a bit more detail would be great.

other than that and my 2 comments here, this lgtm! I'll approve to unblock

superset/views/core.py Show resolved Hide resolved
superset/views/core.py Outdated Show resolved Hide resolved
@betodealmeida betodealmeida merged commit 8847e10 into apache:master Sep 17, 2019
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.35.0 labels Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels enhancement:request Enhancement request submitted by anyone from the community lyft Related to Lyft risk:db-migration PRs that require a DB migration size/L sqllab Namespace | Anything related to the SQL Lab 🚢 0.35.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants