Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SIP-119] Table Visualizations with Time Comparison Data #27432

Closed
eschutho opened this issue Mar 8, 2024 · 15 comments
Closed

[SIP-119] Table Visualizations with Time Comparison Data #27432

eschutho opened this issue Mar 8, 2024 · 15 comments
Assignees
Labels
sip Superset Improvement Proposal

Comments

@eschutho
Copy link
Member

eschutho commented Mar 8, 2024

[SIP-119] Proposal for Table Visualizations with Time Comparison Data

Motivation

Enhancing our visualization capabilities on the table chart to include time comparison data will help improve and facilitate efficient data analysis. Currently there are advanced analytics with time shift and comparisons in other charts. We would like to bring them into the Table chart with a few modifications.

Proposed Change

We would like to add additional columns for the time shifted value, the delta and the percentage, as well as an arrow showing direction. The extra data columns can be both collapsed and sorted. Because this table visualization has a high amount of usage, we would like to start on the development behind a feature flag CHART_PLUGINS_EXPERIMENTAL and plan to remove the changes from the feature flag once this feature is hardened and tested.

Preview of Table viz with time comparison data:

image

New or Changed Public Interfaces

Changes in the UI are shown above. All new features are planned to be included behind a feature flag.
We are working to use as much of the original api as possible.

New dependencies

No new dependencies

Other relevant changes

A time comparison for Big Number was also introduced recently with similar functionality.
Screenshot 2024-03-08 at 2 06 50 PM

We would like to also add this functionality to line and bar visualizations (in a separate SIP), but as those chart types already have time shift features, the improvements proposed will be changes and additions to the existing capabilities.

Migration Plan and Compatibility

No new migrations

Rejected Alternatives

Describe alternative approaches that were considered and rejected.

@eschutho eschutho added the sip Superset Improvement Proposal label Mar 8, 2024
@michael-s-molina
Copy link
Member

Thank you for the SIP @eschutho. Given that it's a WIP, I recommend checking out this issue to help work out how to display time comparisons in line charts. Time comparisons are frequently used at Airbnb and we're excited about the possible upcoming improvements. We look forward to the final version of the SIP to start the discussions.

@eschutho eschutho changed the title WIP [SIP] Visualizations with Time Comparison Data WIP [SIP] Table Visualizations with Time Comparison Data Mar 8, 2024
@eschutho
Copy link
Member Author

eschutho commented Mar 8, 2024

Thanks @michael-s-molina for the reference. It looks like we're not ready for the line and bar improvements just yet, so we will open a new SIP for those visualizations and focus just on Table for now.

@eschutho eschutho changed the title WIP [SIP] Table Visualizations with Time Comparison Data [SIP] Table Visualizations with Time Comparison Data Mar 12, 2024
@eschutho eschutho changed the title [SIP] Table Visualizations with Time Comparison Data [SIP-117] Table Visualizations with Time Comparison Data Mar 12, 2024
@eschutho eschutho changed the title [SIP-117] Table Visualizations with Time Comparison Data [SIP-119] Table Visualizations with Time Comparison Data Mar 12, 2024
@michael-s-molina
Copy link
Member

michael-s-molina commented Mar 13, 2024

Thank you for the SIP @eschutho. The proposed features look very promising and will definitely improve analysis power for Superset users. Some thoughts/questions:

  1. Given that we're using icons to represent time shift information, do we have tooltips explaining the meaning of them? For example, when I hover the delta icon, does it say what it is, and especially, what's the actual comparison window applied?

  2. Multiple time comparisons are supported in other chart types and are frequently used. They will be very useful for table charts where users can easily compare numerical metrics over different time shifts. Could you describe how are they going to work, specifically, how are we going to differentiate the headers and display "Actual range for comparison"?

  3. How "Inherit range from time filters" behave when the time filter is set to No filter?

  4. In all other chart types, time shifts are inside the Advanced Analytics panel. Here we are proposing to show them in a different section. I'm worried this will decrease UI consistency, as users frequently associate available features with the categories that the panel represent. In other words, once they associate that time shifts are inside the Advanced Analytics, they will always expect to find that feature in the same place between plugins.

  5. I understand that the current implementation of time shifts is limited and not flexible as the one we are proposing here. Assuming that all plugins will adhere to this new form of configuring a time shift, and trying to avoid the situation where time shifts behave differently between plugins, would be possible to change this globally? Apart from UI consistency, I'm also thinking about the required API/query object changes and maintaining a single implementation. If we want to do this in multiple steps, I would suggest to implement the new table features using the current time shift implementation and later work on the time shift changes globally.

  6. Could you expand on what are the required changes for our query API/concepts? I think this is the main point of the SIP given the many discussions happening on feat(plugins): Time Comparison in Table Chart #27355 and refactor: build shifted time filter in frontend #27386 about potential solutions.

  7. Given that Big Number with Time Comparison was introduced without a SIP, can we apply the resulting changes of this discussion to that plugin as well to ensure consistency?

I think what's being proposed in this SIP is really valuable. Our users frequently use these features at Airbnb, and it's a great opportunity to improve our current implementation. Thank you for driving the effort.

@eschutho
Copy link
Member Author

eschutho commented Mar 14, 2024

Thanks @michael-s-molina!

Here are some answers. I am going to defer to a few other folks to help here:
1. do we have tooltips explaining the meaning of them?
I'm going to ask @yousoph for help on this one.

2. Multiple time comparisons are supported in other chart types and are frequently used. Could you describe how are they going to work.
The table and bar charts do not currently have time comparisons, which is why we started here, but for Bar and Line which do have time comparisons, we are looking to see how we can best improve the existing functionality and will do a separate SIP on those chart types. If we find a better way to integrate with the existing time filters, we may need to come back to this chart type and update the filters to match. Another reason why this is being developed behind a feature flag.

3. How "Inherit range from time filters" behave when the time filter is set to No filter?
Good question. A time filter is required in order for inherit to work.

4. In all other chart types, time shifts are inside the Advanced Analytics panel. Here we are proposing to show them in a different section.
When we first looked at this chart type, the time shifts didn't seem like advanced analytics. So similar answer to above, there's likely going to be some work during Bar and Line to unify these, and we will definitely have more answers about that in the next SIP.

5. I understand that the current implementation of time shifts is limited and not flexible as the one we are proposing here. ...would be possible to change this globally?
Yes, that would be the goal.

6. Could you expand on what are the required changes for our query API/concepts?
The required changes for the api are actually very limited, as the advanced analytics already perform this time shift comparison. The only new addition that I noted was that we thought it would be a small improvement in the UI to show what the resulting time range would be, like this:
Screenshot 2024-03-13 at 5 49 08 PM

I've updated the sip with what the api could look like to fetch that data. For the other api changes, those aren't a result of this feature, but rather from a bug that was found in the existing visualizations that we wanted to fix. I filed a bug report for the issue found with how the data is being fetched, and we can discuss the best solution in that ticket or a separate SIP if we find that it would be helpful.

7. Given that Big Number was introduced without a SIP, can we apply the resulting changes of this discussion to that plugin as well to ensure consistency?
Yes, of course.

@kasiazjc
Copy link
Contributor

Taking this one @eschutho :)

Here are some answers. I am going to defer to a few other folks to help here: 1. do we have tooltips explaining the meaning of them? I'm going to ask @yousoph for help on this one.

Yes, when hovering over the column headers there will be tooltips displayed -
image

@zhaoyongjie
Copy link
Member

zhaoyongjie commented Mar 18, 2024

  1. Multiple time comparisons are supported in other chart types and are frequently used. Could you describe how are they going to work.
    The table and bar charts do not currently have time comparisons, which is why we started here, but for Bar and Line which do have time comparisons, we are looking to see how we can best improve the existing functionality and will do a separate SIP on those chart types. If we find a better way to integrate with the existing time filters, we may need to come back to this chart type and update the filters to match. Another reason why this is being developed behind a feature flag.

If you populate a temporal dimension, select a corresponding time filter, and enable advanced analytics section, the all viz except legacy nvd3 charts, have readily supported time comparison, rolling, and resample. I've opened a PR for demonstrate that Table chart with Advanced Analytics.

New or Changed Public Interfaces
Changes in the UI are shown above. Because we also want to show the time range in the UI, we will need to calculate that value separately. This may require a new API endpoint. All new features are planned to be included behind a feature flag.
The endpoint to fetch the time range could look like this:
get_relative_time_range_schema = {
"type": "object",
"properties": {"base_time_range": {"type": "string"}, "shift": {"type": "string"}},
}

The original /api/time_range already support multiple time ranges request at PR, IMO, no need to introduce a seprated endpoint

@Antonio-RiveroMartnez
Copy link
Member

If you populate a temporal dimension...

Would it work without populating such column? If not, what would be the easiest/clearest way to help user to set it properly and read the table's data if they don't want to have such column displayed?

The original /api/time_range already support multiple time ranges request...

IIRC the main difference it's not the multiple ranges support but rather the shift arg that's not supported by the current endpoint. If we want to pass two ranges (current and shifted) we would have to build the shifted range in the frontend as proposed here #27386, but as stated there, the backend will have to build the shifted ranges too at some point so we could either duplicate logic in frontend and backend or just pass the range in the api call?

@zhaoyongjie
Copy link
Member

Would it work without populating such column? If not, what would be the easiest/clearest way to help user to set it properly and read the table's data if they don't want to have such column displayed?

It might be a UI/UX topic, i'm not very familiar with that.

IIRC the main difference it's not the multiple ranges support but rather the shift arg that's not supported by the current endpoint. If we want to pass two ranges (current and shifted) we would have to build the shifted range in the frontend as proposed here #27386, but as stated there, the backend will have to build the shifted ranges too at some point so we could either duplicate logic in frontend and backend or just pass the range in the api call?

Not true, even though to build shifted time range in backend and follow all your designs, the original endpoint also easily extend to add "shift" offset, I've added comment at schema of time range

         humanize_time_range = [
            {"timeRange": "2021-01-01 : 2022-02-01"},
            {"timeRange": "2022-01-01 : 2023-02-01", "shift": "some-time-delta"},
            {"timeRange": "2022-01-01 : 2023-02-01", "shift": "some-time-delta"},
            {"timeRange": "2022-01-01 : 2023-02-01", "shift": "some-time-delta"},
         ]
         uri = f"api/v1/time_range/?q={prison.dumps(humanize_time_range)}"

@Antonio-RiveroMartnez
Copy link
Member

the original endpoint also easily extend to add "shift" offset, I've added [comment]...

Awesome! ❤️

@eschutho
Copy link
Member Author

Thanks @zhaoyongjie. I've removed the requirement for the new api from the doc. That's great.

@toransahu
Copy link

+1

@rusackas
Copy link
Member

rusackas commented May 7, 2024

@eschutho is this ready for a [VOTE]?

@Antonio-RiveroMartnez
Copy link
Member

Hey @rusackas I'm going to share some general snippets we want to introduce to better help understand the proposed changes and after that it should be ready. Thanks

@Antonio-RiveroMartnez
Copy link
Member

Extending current /time_range API endpoint

We are going to leverage recent changes in the API that let us use arrays as input of this API and extend the schema of it. We are adding a new shift property so the endpoint can return the adequate since and until with any passed shift applied to it using the existing get_since_untilmethod.

superset/views/api.py

get_time_range_schema = {
    "type": ["string", "array"],
    "items": {
        "type": "object",
        "properties": {
            "timeRange": {"type": "string"},
            "shift": {"type": "string"},
        },
    },
}
...
since, until = get_since_until(
    time_range=time_range["timeRange"],
    time_shift=time_range.get("shift"),
)
...

Extending how time_offset is handled in the backend

Superset doesn't support having only textual columns (dimension) when a time_offset comes into play. Instead we are breaking the hard requirement for a temporal column as part of the queyObject and a xaxis_label. Also the time_grain is mandatory as long a temporal dimensions are used.

def get_since_until_from_query_object(
    ...
    # The condition that checked for get_xaxis_label is removed
    if flt.get("op") == FilterOperator.TEMPORAL_RANGE.value and isinstance(
            flt.get("val"), str
        ):
    ...
)

Since now our time_offset API supports textual columns we might end up running a query that has no temporal column in it, in that case we cannot apply the offset to any column using the same filter values, for that particular case where no temporal column is passed we must run the query altering the filter itself so we can perform the joins later on.

superset/common/query_context_processor.py

def processing_time_offsets(
...
if not dataframe_utils.is_datetime_series(df.get(index)):
    # Lets find the first temporal filter in the filters array and change
    # its val to be the result of get_since_until with the offset
    for flt in query_object_clone.filter:
        if flt.get(
            "op"
        ) == FilterOperator.TEMPORAL_RANGE.value and isinstance(
            flt.get("val"), str
        ):
            time_range = cast(str, flt.get("val"))
            (
                new_outer_from_dttm,
                new_outer_to_dttm,
            ) = get_since_until_from_time_range(
                time_range=time_range,
                time_shift=offset,
            )
            flt["val"] = f"{new_outer_from_dttm} : {new_outer_to_dttm}"

Additionally, we are proposing the use of Full Outer Joins as a way to solve the missing datapoints when the query uses a temporal dimension, since SQLAlchemy cannot perform a Left Join as it is right now using ON clauses like ON queryA.temporal_column = queryB.temporal_column + time_delta.

The usage of the Full Outer Joins would produce more datapoints in our visualizations but generate a more correct result set.

superset/common/query_context_processor.py

def processing_time_offsets(
...

if join_keys and not time_grain:
    # only do a left join if textual dimensions are the only
    # thing we care about
    df = dataframe_utils.left_join_df(
        left_df=df,
        right_df=offset_df,
        join_keys=join_keys,
        rsuffix=R_SUFFIX,
    )
    else:
        # If a temporal dimension is used, we must use full outer join to
        # ensure that data is not lost if not present in both dataframes
        df = dataframe_utils.full_outer_join_df(
            left_df=df,
            right_df=offset_df,
            join_keys=join_keys,
            rsuffix=R_SUFFIX,
        )

Adding a new and reusable control section to charts

Advanced Analytics controls are meant to be our source of truth for visualizations that want to use time comparison, however, our main goal is to extract some of the inner controls in it to a more compact and flexible set of controls which are called Time Comparison.

In order to not create friction for visualizations unintentionally and collect better feedback from users, we are going to use this new section in a limited set of visualizations, meaning, until the full transition occurs to this new one, some degree of duplication is expected.

superset-frontend/packages/superset-ui-chart-controls/src/sections/timeComparison.tsx

export const timeComparisonControls: (
  multi?: boolean, // Multiple is not always true as in Advanced Analytics
  showCalculationType?: boolean, // Some visualizations might not want to make use of the calculation type
  showFullChoices?: boolean, // We can show a limited set of choices instead of the very populated one
) => 
...
) => ({
  label: t('Time Comparison'),
  tabOverride: 'data',
  description: t('Compare results with other time periods.'),
  controlSetRows: [
    [
      {
        name: 'time_compare',
        config: {
          type: 'SelectControl',
          multi,
          freeForm: true,
          placeholder: t('Select or type a custom value...'),
          label: t('Time shift'),
          choices: showFullChoices ? fullChoices : reducedChoices,
          // We would support the new Custom Start date and Inherit form time filter
          description: ...
        },
      },
    ],
    [
      {
        name: 'start_date_offset',
        config: {
          type: 'TimeOffsetControl',
          label: t('shift start date'),
          visibility: ({ controls }) =>
            controls?.time_compare.value === 'custom',
        },
      },
    ],
    [
      {
        name: 'comparison_type',
          ...
          visibility: () => Boolean(showCalculationType),
        },
      },
    ],
    [
      {
        name: 'comparison_range_label',
        config: {
          type: 'ComparisonRangeLabel',
          multi,
          visibility: ({ controls }) => Boolean(controls?.time_compare.value),
        },
      },
    ],
  ],
});

@eschutho
Copy link
Member Author

Vote passed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sip Superset Improvement Proposal
Projects
Status: Implemented / Done
Development

No branches or pull requests

7 participants