
ref(alerts): Update Snuba queries to match events-stats more closely #77755

Draft
wants to merge 11 commits into
base: master

@ceorourke ceorourke commented Sep 18, 2024

When a user creates an anomaly detection alert, we need to query Snuba for 28 days' worth of historical data to send to Seer, which calculates the anomalies. Originally (#74614) I tried to pull out the relevant parts of the events-stats endpoint to mimic the data shown in metric alert preview charts (over a longer time period, and after the rule is saved, so I can't use anything from the request object). I think I missed some things, so this PR aims to make the two data sets match.

Closes https://getsentry.atlassian.net/browse/ALRT-288 (hopefully)

TODO

  • Double check each metric alert type's events-stats SQL output against anomaly detection's
  • Try to put crash rate alerts back in there

@@ -42,6 +42,27 @@
from sentry.utils.snuba import MAX_FIELDS, SnubaTSResult


def get_query_columns(columns, rollup):

I moved this to be reused by anomaly detection

"""
serializer = SnubaTSResultSerializer(organization=organization, lookup=None, user=None)
@ceorourke ceorourke Sep 18, 2024

I'm using the same serializer that the events-stats endpoint uses and pulling the data off it to format into a list of TimeSeriesPoints for Seer's API. I clicked through every alert type, and the result always has the timestamp and count.
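
As a rough illustration of that conversion, the sketch below assumes the serialized payload has the shape `{"data": [[timestamp, [{"count": n}]], ...]}` and that a point is a `timestamp`/`value` pair, falling back to 0 when a bucket has no count. The `TimeSeriesPoint` fields and the `format_historical_data` helper are assumptions for illustration, not the code in this PR.

```python
from typing import Any, TypedDict


class TimeSeriesPoint(TypedDict):
    # Assumed shape for illustration only.
    timestamp: float
    value: float


def format_historical_data(serialized: dict[str, Any]) -> list[TimeSeriesPoint]:
    """Flatten [[timestamp, [{"count": n}]], ...] into timestamp/value points."""
    points: list[TimeSeriesPoint] = []
    for timestamp, buckets in serialized.get("data", []):
        # An empty bucket list or a missing/None count falls back to 0.
        count = buckets[0].get("count") if buckets else None
        points.append({"timestamp": timestamp, "value": count or 0})
    return points


example = {"data": [[1726617600, [{"count": 3}]], [1726621200, []]]}
print(format_historical_data(example))
```

Keeping the fallback inside the conversion means the caller does not depend on the serializer having zero-filled the series first.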

data,
resolve_axis_column(query_columns[0]),
allow_partial_buckets=False,
zerofill_results=False,
@ceorourke ceorourke Sep 18, 2024

I was getting strange results in tests with this set to True. For our purposes I don't think it matters much, since we default to sending Seer a 0 if we don't find a count anyway.


By "strange" I mean it was hitting this line and overwriting data that had a count in my test with an empty array.
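
For context, zero-filling means making sure every rollup bucket in the window has an entry. The sketch below is a generic illustration of that idea, not Sentry's implementation; the `zerofill` name, its signature, and the `[timestamp, [{"count": n}]]` row shape are assumptions. It fills only the buckets that are absent and leaves existing rows untouched.

```python
from typing import Any

Row = list[Any]  # assumed row shape: [timestamp, [{"count": n}]]


def zerofill(rows: list[Row], start: int, end: int, rollup: int) -> list[Row]:
    """Return one row per rollup bucket in [start, end), adding empty buckets."""
    by_timestamp = {row[0]: row for row in rows}
    filled: list[Row] = []
    for timestamp in range(start, end, rollup):
        # Keep a row that already exists; only absent buckets get a zero count.
        filled.append(by_timestamp.get(timestamp, [timestamp, [{"count": 0}]]))
    return filled


rows = [[0, [{"count": 2}]], [7200, [{"count": 5}]]]
print(zerofill(rows, start=0, end=10800, rollup=3600))
```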


codecov bot commented Sep 19, 2024

❌ 3 Tests Failed:

Tests completed | Failed | Passed | Skipped
21625           | 3      | 21622  | 207
View the top 3 failed tests by shortest run time
tests.sentry.incidents.endpoints.test_organization_alert_rule_anomalies.AlertRuleAnomalyEndpointTest test_seer_error
Stack Traces | 8.74s run time
.../incidents/endpoints/test_organization_alert_rule_anomalies.py:276: in test_seer_error
    resp = self.get_error_response(
.../sentry/testutils/cases.py:793: in get_error_response
    assert_status_code(response, status_code)
.../sentry/testutils/asserts.py:39: in assert_status_code
    assert minimum <= response.status_code < maximum, (
E   AssertionError: (200, b'[]')
E   assert 400 <= 200
E    +  where 200 = <Response status_code=200, "application/json">.status_code

tests.sentry.incidents.endpoints.test_organization_alert_rule_anomalies.AlertRuleAnomalyEndpointTest test_simple
Stack Traces | 8.98s run time
.../incidents/endpoints/test_organization_alert_rule_anomalies.py:115: in test_simple
    assert mock_seer_request.call_count == 1
E   AssertionError: assert 0 == 1
E    +  where 0 = <MagicMock name='urlopen' id='139907010458256'>.call_count

tests.sentry.incidents.endpoints.test_organization_alert_rule_anomalies.AlertRuleAnomalyEndpointTest test_timeout
Stack Traces | 9.07s run time
.../incidents/endpoints/test_organization_alert_rule_anomalies.py:204: in test_timeout
    resp = self.get_error_response(
.../sentry/testutils/cases.py:793: in get_error_response
    assert_status_code(response, status_code)
.../sentry/testutils/asserts.py:39: in assert_status_code
    assert minimum <= response.status_code < maximum, (
E   AssertionError: (200, b'[]')
E   assert 400 <= 200
E    +  where 200 = <Response status_code=200, "application/json">.status_code

To view individual test run time comparison to the main branch, go to the Test Analytics Dashboard

stats_period=None,
environments=environments,
)
snuba_query_string = get_snuba_query_string(snuba_query)

This is one of the key changes here: the front end constructs a stringified query based on snuba_query.query AND snuba_query.event_types. This adds a join to the table for things like an errors count with the is:unresolved query, or when you use the dropdown to select the "errors", "default", or "errors OR default" event types.
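
A minimal sketch of how such a stringified query might be assembled, assuming event types are rendered as `event.type:<name>` terms joined with `OR` and then combined with the user's query via `AND`. The `build_query_string` helper and the exact output format are hypothetical; they are not taken from `get_snuba_query_string`.

```python
def build_query_string(query: str, event_types: list[str]) -> str:
    """Combine a user query with event-type filters into one search string."""
    type_terms = [f"event.type:{event_type}" for event_type in event_types]
    if len(type_terms) > 1:
        # Parenthesize so the OR binds tighter than the AND below.
        type_filter = "(" + " OR ".join(type_terms) + ")"
    else:
        type_filter = "".join(type_terms)

    wrapped_query = f"({query})" if query else ""
    parts = [part for part in (type_filter, wrapped_query) if part]
    return " AND ".join(parts)


print(build_query_string("is:unresolved", ["error"]))
print(build_query_string("", ["error", "default"]))
```

Under these assumptions, an empty user query still yields an event-type filter, which is the case the dropdown selection alone would produce.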

@ceorourke

The "users experiencing errors" query selects the data under a different alias but is otherwise the same. I don't know if that makes a difference to the outcome?
events-stats:

SELECT (events._snuba_events.time AS _snuba_events.time), (uniq((events._snuba_events.tags[sentry:user] AS _snuba_events.tags[sentry:user])) AS _snuba_count_unique_user)

anomaly detection:
SELECT (events._snuba_events.time AS _snuba_events.time), (uniq((events._snuba_events.tags[sentry:user] AS _snuba_events.tags[sentry:user])) AS _snuba_count_unique_tags_sentry_user)
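
One way to reason about the alias question: in SQL generally, an alias changes the key a value is returned under, not the value itself. The sketch below shows this with SQLite and a made-up `events` table; it says nothing about Snuba or ClickHouse specifically, and the table, column names, and `run` helper are assumptions. The practical risk would be a consumer that looks the value up by a hard-coded alias.

```python
import sqlite3


def run(alias: str) -> list[dict]:
    """Count distinct users per time bucket, returning rows keyed by column name."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute("CREATE TABLE events (bucket_time INTEGER, user_id TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [(0, "a"), (0, "b"), (0, "a"), (3600, "c")],
    )
    # The alias is interpolated only because it is a fixed, trusted string here.
    sql = (
        f'SELECT bucket_time, COUNT(DISTINCT user_id) AS "{alias}" '
        "FROM events GROUP BY bucket_time ORDER BY bucket_time"
    )
    rows = [dict(row) for row in conn.execute(sql)]
    conn.close()
    return rows


a = run("count_unique_user")
b = run("count_unique_tags_sentry_user")
# Same values under different keys.
print([tuple(r.values()) for r in a] == [tuple(r.values()) for r in b])
print(sorted(a[0]) != sorted(b[0]))
```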

Labels
Scope: Backend Automatically applied to PRs that change backend components