Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add slice_id to query object #25192

Closed
wants to merge 0 commits into from
Closed

Conversation

hwmarkcheng
Copy link

@hwmarkcheng hwmarkcheng commented Sep 5, 2023

SUMMARY

Hi team,

We're looking to make a few changes that would allow folks to populate the sql_query_mutator with more useful params. Ultimately we're hoping to get chart_id and dataset_id into query comments, to automate our data masking with Satori. Open to discussion on the best practices to do so.

These are the three items we're looking to change:

  • Change mutate_query_from_config to take in kwargs (Consistent with intended purpose of sql_query_mutator, see this PR)
  • Add slice_id to query_obj : As addition context for query_obj. Having slice_id is versatile and can open doors for powerful integrations/automations. We originally wanted to add in the whole slice object for more options, but some integration tests failed since the object cannot be converted to a dictionary. Happy to discuss the best way to do this going forward.
  • Add chart_id and dataset_id as default parameters for mutate_query_from_config

Example Usecase, Automated Data Access Requests via Satori:
Context: We use Satori for our data access management.

  • Users posts chart link to request access to specific schemas -> DAR team approves request -> We grab chart/dataset id from query -> Integrate with Satori to automatically identify all schemas in the chart and grant the user access to them

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

  • Change sql_query_mutator to add kwargs as comments. Note this will be reverted before committing (we don't want to change this)
    • Click view query on any chart in a dashboard. You should now see the chart_id, as well as the dataset_id
    • Screenshot 2023-08-23 at 4 34 57 PM

Changed query_obj tests to account for additional parameters:

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@Vitor-Avila
Copy link
Contributor

thank you so much, @hwmarkcheng!

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hwmarkcheng thanks for the PR! If form_data is passed along in the query context, slice_id should be available via that (if you search for slice_id under superset/common you should find how it's being used). But I kind of agree that slice_id should be a top level property, and I've been considering adding it, too, so maybe this is a good way to be more explicit about the relevant chart.

To avoid duplicated logic related to slice_id, can you harmonize there to only be one way of passing it in? Also, I didn't notice any frontend changes - we should probably also add automatic population of slice_id to the frontend.

@hwmarkcheng
Copy link
Author

Hey @villebro!

Thank you for your review!

Yes, slice_id is in query_context, however it seems that most query actions for the BaseDatasource class (which contains sql_query_mutator) uses query_obj as the parameter.

We traced back to where query_object is created to add this slice_id in. Wondering if you can clarify on the harmonization part, I think given this slice_id is only passed in through it's creation in query_context_factory/query_object_factory?

Ah we haven't dived too much into the frontend portion of the code. Would the ask be to add something like a #slice_id into the chart editor UI so that users can easily get the ID of the chart?

@zhaoyongjie
Copy link
Member

zhaoyongjie commented Sep 13, 2023

@hwmarkcheng Thanks for your contribution, but I have different potin for this feature, --- how to identify a dashboard or a chart in SQL_QUERY_MUTATOR.

I don't think this PR is a good pattern in current Query Object because the "slice_id" isn't a part of a Query, we don't need slice_id to generate a SQL. In our fork-superset(Burda Group), we set lots of "tags" in Superset. e.g.:

image

In fact, this is a simple implement, the slice_id or any ID(dashboard_id, or others ids) passed from url parameter.
I give some pseudocode

      def SQL_QUERY_MUTATOR(sql: str, **kwargs) -> str:
          import json
          import logging
          from enum import Enum
          logger = logging.getLogger(__name__)

          from flask import request

          from superset.security import SupersetSecurityManager
          from superset.models.dashboard import Dashboard
          from superset.models.slice import Slice as Chart
          from superset.datasource.dao import DatasourceDAO
          from superset import db

          class QueryType(str, Enum):
            DASHBOARD = "dashboard"
            CHART = "chart"
            REPORT = "report"
            SQL_LAB = "sql_lab"
            EXPLORE = "explore"
            NATIVE_FILTER = "native_filter"

          form_data = request.args.get("form_data", "")
          try:
            form_data = json.loads(form_data)
          except json.JSONDecodeError:
            form_data = {}

          slice_id = form_data.get("slice_id")
          dashboard_id = request.args.get("dashboard_id")
          query_info_dict = {}

          if dashboard_id is not None:
            # 1) Query from Dashboard
            dashboard = Dashboard.get(str(dashboard_id))
            query_info_dict["query_type"] = QueryType.DASHBOARD.value
            query_info_dict["dashboard_id"] = dashboard_id
            query_info_dict["dashboard_title"] = getattr(dashboard, "dashboard_title", None)
          if slice_id is not None:
            # 2) Query from Chart
            chart = Chart.get(slice_id)
            query_info_dict["query_type"] = QueryType.CHART.value
            query_info_dict["chart_id"] = slice_id
            query_info_dict["chart_title"] = getattr(chart, "slice_name", None)

          if slice_id is None and dashboard_id is None and "sql_json" in request.path:
            # 3) Query from SQL Lab
            query_info_dict["query_type"] = QueryType.SQL_LAB.value

          if (
            slice_id is None
            and dashboard_id is None
            and ({"/api/v1/chart/data", "/superset/explore_json/"}).intersection({request.path})
          ):
            # 4) Query from Explore(unsaved chart)
            query_info_dict["query_type"] = QueryType.EXPLORE.value

          if request.args.get("tag") == "report" and request.args.get("execution_id"):
            # 5) Query from Report
            query_info_dict["query_type"] = QueryType.REPORT.value
            query_info_dict["report_id"] = request.args.get("report_id")
            query_info_dict["execution_id"] = request.args.get("execution_id")
            query_info_dict["report_name"] = request.args.get("report_name")

          if request.args.get("tag") == "native_filter" and request.args.get("filter_name"):
            # 6) a query from Dashboard Native Filter
            query_info_dict["query_type"] = QueryType.NATIVE_FILTER.value
            query_info_dict["filter_name"] = request.args.get("filter_name")
          ....
          ....

@eschutho
Copy link
Member

eschutho commented Sep 14, 2023

@hwmarkcheng thanks for the PR! If form_data is passed along in the query context, slice_id should be available via that (if you search for slice_id under superset/common you should find how it's being used). But I kind of agree that slice_id should be a top level property, and I've been considering adding it, too, so maybe this is a good way to be more explicit about the relevant chart.

yeah, to me it feels a little strange that we're taking the slice_id from the form_data, and then fetching the slice from it and then getting the id off the slice, when we could just pull it from the form_data that's passed into the query_context. Which is prob another reason @villebro why you suggested to create only way to pass in the chart.. makes sense. Maybe to make this simpler for now, I'd suggest can we move forward with just using the form_data from the query_context instead of explicitly passing in the chart/slice_id into the query_obj? It also feels to me at least that the chart/slice_id has no relationship to the query object, but it does have a relationship to the form_data which is more closely related to a chart.

@eschutho
Copy link
Member

I don't think this PR is a good pattern in current Query Object because the "slice_id" isn't a part of a Query, we don't need slice_id to generate a SQL.

I agree with @zhaoyongjie on this point as well. I think a middle ground could be what @villebro had alluded to earlier that I commented on, which is that the slice_id already exists in the form data. LMK if you think that approach works @zhaoyongjie.

@hwmarkcheng
Copy link
Author

Thank you everyone for chiming in here! We agree that it doesn't make much sense to add slice_id into the current query object, we should instead directly retrieve it from form_data into the sql_query_mutator. Next steps we'll be looking to do:

  1. Remove the slice_id from the query object.
  2. Update the test code to make sure that: The data we want to add can still be found in the mutator from the form_data. We'll be using Yongjie’s suggestion as a starting point

@eschutho
Copy link
Member

eschutho commented Sep 15, 2023

Thanks @hwmarkcheng. Yeah if everyone else is ok with # 1, I think it sounds like a good plan. I still think it's a good idea to pass form_data to the mutator as taking it from the url could be brittle, and there will be use cases of api requests rather than a browser request that could trigger a query.

@hwmarkcheng
Copy link
Author

hwmarkcheng commented Sep 18, 2023

Closing PR as I'm looking to just revert my changes (accidentally merged upstream on fork). Will look to start a clean fork and implement proposed changes, apologies for the confusion here!

@MDeadman
Copy link

MDeadman commented Oct 3, 2023

Hi folks! @eschutho @villebro Thank you for your feedback on this thread!

I have been looking into the suggested approach you mentioned with @hwmarkcheng, about pulling the slice_id from the form_data in query_context instead of injecting the slice_id into the query_object.

While it makes sense that currently slice_id lives only in query_context/form_data as right now its not a part of a query, with this PR we are in essence changing that, as we now need the slice_id to build that comment for the final query.

The problem is however, that the places in which the final query string is constructed (get_query_str methods), do not have access to the query_context, as only the query_object is passed into that point.

This is the reason why we injected the value into the object as opposed to just pulling from query_context, as there would be what looks like a lot of functions and calls we would need to update to now pass along that query_context (or just form_data) everywhere, and that feels like it would be making a big shift away from why I am guessing the distinction between the query_context and query_object was made in the first place. This approach of injecting means there is a lot less broad changes we need to make for this to happen.

Should we be looking to make those changes and pass them around anyways? The other approach of modifying the SQL_QUERY_MUTATOR to use something like request args seems to brittle...

If you have any further guidance on what approach would be best, we would love to hear your input. Thank you so much!!

@eschutho
Copy link
Member

eschutho commented Oct 6, 2023

Thanks for that context @MDeadman, and that's a good assessment. If you'd like to reopen the PR and rebase with master, I'm happy to take another look at the changes proposed and try to help get this merged for you.

@hwmarkcheng hwmarkcheng reopened this Oct 18, 2023
@rusackas
Copy link
Member

It looks like the linter is having a problem. Hopefully running pre-commit run --all-files will either fix all the problems or at least point you in the right direction :)

@hwmarkcheng
Copy link
Author

Thanks! I've ran the pre-commit run --all-files, linting issues have been fixed

@hwmarkcheng
Copy link
Author

Still noticing some Errors in the linting CI step after running the pre-commit command. slice_id isn't really being directly used, but just passed through.
^ I think this is the portion we wanted to discuss about as mentioned in Matthew's comment.

************* Module superset.models.helpers
superset/models/helpers.py:1421:8: W0613: Unused argument 'slice_id' (unused-argument)
************* Module superset.common.query_object
superset/common/query_object.py:136:8: W0613: Unused argument 'slice_id' (unused-argument)

@eschutho
Copy link
Member

eschutho commented Nov 1, 2023

I have some ideas around this. There have been a few other cases where we've needed to pass some data around for the purpose of logging, and maybe I can tackle all of those with a new global pattern. I'll try to get a PR up for review in the next few days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants