
otel: code cleanup #2286

Closed
3 tasks done
deniseli opened this issue Aug 7, 2024 · 1 comment · Fixed by #2432
deniseli commented Aug 7, 2024

- [x] The noop refactor
- [x] Check across the metrics for attribute consistency, especially whether attributes we log on some metrics would be useful to add elsewhere
- [x] Check whether the async call queue depth and latency metrics/attrs are enough, or whether we'll want equivalents on pubsub directly (hopefully not, but double check)
deniseli commented:

> Check if the async call queue depth and latency metrics/attrs are enough, or if we'll want equivalents on pubsub directly (hopefully not, but double check)

We should ideally add source tracing through all of these calls, but that's not a P1.

github-merge-queue bot pushed a commit that referenced this issue Aug 20, 2024
Fixes #2286

### Log Buckets

This PR introduces the ability to constrain `logBucket` with a max and
min bucket. A log bucket simply assigns a number (in this case, latency
in ms) to an exponentially sized range:

Example for base 4:
```
<1, [1,4), [4,16), [16,64), [64,256), [256,1024), [1024,4096), ...
```

In DataDog, looking across the last week, we see several latency buckets
displaying low, noisy counts, especially on the buckets closest to 1.
This makes sense given the intrinsic nature of log buckets: they are
densest close to 1 and grow sparser the further up you go. To reduce
cardinality (and thus cost), we can chop away that noise at the low end.

It's similarly worth capping the buckets at the high end: although each
bucket is exponentially larger than the last, there is no limit on how
many new buckets can be created. We haven't been hit with any _major_
issues yet, but you can imagine how a latency spike could suddenly
inflate cardinality.

### Call Latency

DataDog links: [bucketed
counts](https://app.datadoghq.com/metric/explorer?fromUser=false&graph_layout=split&start=1723496312134&end=1724101112134&paused=false#N4Ig7glgJg5gpgFxALlAGwIYE8D2BXJVEADxQEYAaELcqyKBAC1pEbghkcLIF8qo4AMwgA7CAgg4RKUAiwAHOChASAtnADOcAE4RNIKtrgBHPJoQaUAbVBGN8qVoD6gnNtUZCKiOq279VKY6epbINiAiGOrKQdpYZAYgUJ4YThr42gDGSsgg6gi6mZaBZnHKGniqyIIIaAB0mRho9UZBGhbAAFQ8dRgaTpn4IggAFACUIDwAulSu7niYoeFzqgsYMaXxkzMg7VhoOaDyTYgIOUk4MAOLGhCZiWiicE5yiuU4j1APT070TMoiNweNCTfgQeyYLAvBTnR4iJTTHh8XbyR4IADCUmEMBkKLRTigPjgIlujmsoCkzxg2gw8kYTkU2mUNXqjWadW0eBELyJTlUGjqACM8JkANaIbZUR6qcQoABMABYqOltIQKdoBEzcgINPckTwgA),
[avg latency
rollup](https://app.datadoghq.com/metric/explorer?fromUser=false&graph_layout=multi&start=1723498319625&end=1724103119625&paused=false#N4Ig7glgJg5gpgFxALlAGwIYE8D2BXJVEADxQEYAaELcqyKBAC1pEbghkcLIF8qo4AMwgA7CAgg4RKUAiwAHOChASAtnADOcAE4RNIKtrgBHPJoQaUAbVBGN8qVoD6gnNtUZCKiOq279VKY6epbINiAiGOrKQdpYZAYgUJ4YThr42gDGSsgg6gi6mZaBZnHKGABuMMiCCGgAdJkYaA2qGk4IOE6ZOKryaIhwwABUPPXaOC148gAUlTAUAAQAbAAMAJQgPAC6VK7ueJih4fuqhxgxpfFbuyAacgMyIPLNiAg5STgw3UcaEJmJNCiOAdBQfdJAqCA4FOehMZQiNweNBbfgQeyYLCgxTKIEiJQ7Hh8O79cQAYSkwhgKBEhzQPCAA)

Currently set to base 2 without min/max.
* `<1`, `1-2` .. `8-16`: noise
* `16-32` .. `1024-2048`: healthy number of data points
* `2048-4096`, `4096-8192`: some spikes, but low data
* `8192-16384`: healthy number of data points (worth investigating
separately from this change)
* 15 buckets total

Proposed change: base 4, min `4^3` (i.e. `<64` is the smallest bucket),
max `4^7` (i.e. `>=16384` is the highest bucket, accounting for
outliers).
* => Max cardinality = 6 buckets
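As a sanity check on the cardinality math: with a minimum exponent m and a maximum exponent M, the possible labels are one underflow bucket, M − m middle buckets, and one overflow bucket, for M − m + 2 total. A quick enumeration (Python, hypothetical helper) confirms the count:

```python
def bucket_labels(base: int, min_exp: int, max_exp: int) -> list[str]:
    """Enumerate every bucket label possible under the given clamp."""
    labels = [f"<{base ** min_exp}"]  # underflow bucket
    for e in range(min_exp, max_exp):
        labels.append(f"[{base ** e},{base ** (e + 1)})")
    labels.append(f">={base ** max_exp}")  # overflow bucket
    return labels

# Base 4, min 4^3, max 4^7: 7 - 3 + 2 = 6 buckets.
print(bucket_labels(4, 3, 7))
# ['<64', '[64,256)', '[256,1024)', '[1024,4096)', '[4096,16384)', '>=16384']
```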

### Async Call Latency
DataDog links: [bucketed
counts](https://app.datadoghq.com/metric/explorer?fromUser=false&graph_layout=split&start=1723497145524&end=1724101945524&paused=false#N4Ig7glgJg5gpgFxALlAGwIYE8D2BXJVEADxQEYAaELcqyKBAC1pEbghkcLIF8qo4AMwgA7CAgg4RKUAiwAHOChASAtnADOcAE4RNIKtrgBHPJoQaUAbVBGN8qVoD6gnNtUZCKiOq279VKY6epbINiAiGOrKQdpYZAYgUJ4YThr42gDGSsgg6gi6mZaBZnHKGniqyIIIaAB0GBpYIplOmRho9Zk4qvJoiHBQwABUPA0abfgiCAAUAJQgPAC6VK7ueJih4WuqGxgxpfGLKyAacv0yIPIdiAg5STgwbZsaEJmJaKJwTnKK5TifKAfL5OehMZQiNweNCLfgQeyYLA-BT3T4iJTLHh8U59cQAYSkwhglwR4icUB8cBEr0c1lAUm+MG0GHkjCcim0yhq9UazVa7U6dTU31eLRFmTYUA2gycnicqg0dQARnhMgBrRDHKifVTiFAAJgALFR0tpCPTtAJObkBBp3lieEA),
[avg latency
rollup](https://app.datadoghq.com/metric/explorer?fromUser=false&graph_layout=multi&start=1723498319625&end=1724103119625&paused=false#N4Ig7glgJg5gpgFxALlAGwIYE8D2BXJVEADxQEYAaELcqyKBAC1pEbghkcLIF8qo4AMwgA7CAgg4RKUAiwAHOChASAtnADOcAE4RNIKtrgBHPJoQaUAbVBGN8qVoD6gnNtUZCKiOq279VKY6epbINiAiGOrKQdpYZAYgUJ4YThr42gDGSsgg6gi6mZaBZnHKGABuMMiCCGgAdBgaWCKZTpkYaA2qGk4IOO04qvJoiHDAAFQ89do4XXjyABSVMBQABABsAAwAlCA8ALpUru54mKHhJ6pnGDGl8ftHIBpyozIg8p2ICDlJODDtc4aCCZRJoURwPoKX7pcFQMEQpz0JjKERuDxofb8CD2TBYKGKZTgkRKQ48PjPEbiADCUmEMBQIjOaB4QA)

Currently set to base 8 without min/max.
* `8-64`: noise. Note that `<1` and `1-8` buckets have never occurred,
though they are possible in the current setup.
* `64-512`, `512-4096`: healthy number of data points
* `4096-32768`: noise
* 4 buckets total

Proposed change: base 4, min `4^4` (i.e. `<256` is the smallest bucket),
max `4^6` (i.e. `>=4096` is the highest bucket).
* => Max cardinality = 4 buckets

### Ingress Latency
DataDog links: [bucketed
counts](https://app.datadoghq.com/metric/explorer?fromUser=false&graph_layout=split&start=1723497891973&end=1724102691973&paused=false#N4Ig7glgJg5gpgFxALlAGwIYE8D2BXJVEADxQEYAaELcqyKBAC1pEbghkcLIF8qo4AMwgA7CAgg4RKUAiwAHOChASAtnADOcAE4RNIKtrgBHPJoQaUAbVBGN8qVoD6gnNtUZCKiOq279VKY6epbINiAiGOrKQdpYZAYgUJ4YThr42gDGSsgg6gi6mZaBZnHKGniqyIIIaAB0ojB2GnVGQRoWwABUPHUYGk6Z+CIIABQAlCA8ALpUru54mKHh86qLGDGl8VOzIB1YaDmg8hiHCAg5STgwg0saEJmJaKJwTnKK5TjPUE8vTvRMZQiNweNBTfgQeyYLBvBSXZ4iJQzHh8PbyZ4IADCUmEMBkaIxTigPjgInujmsoCkryaGHkjCcim0yhq9UazVaeBEbxJTlULQARnhMgBrRA7KjPVTiFAAJgALFR0tpCFTtAJmbkBBpHiieEA),
[avg latency
rollup](https://app.datadoghq.com/metric/explorer?fromUser=false&graph_layout=multi&start=1723498319625&end=1724103119625&paused=false#N4Ig7glgJg5gpgFxALlAGwIYE8D2BXJVEADxQEYAaELcqyKBAC1pEbghkcLIF8qo4AMwgA7CAgg4RKUAiwAHOChASAtnADOcAE4RNIKtrgBHPJoQaUAbVBGN8qVoD6gnNtUZCKiOq279VKY6epbINiAiGOrKQdpYZAYgUJ4YThr42gDGSsgg6gi6mZaBZnHKGABuMMiCCGgAdKIwdhr1qhpOCDhOmTiq8miIcMAAVDz12jhoaHjyABSVMBQABABsAAwAlCA8ALpUru54mKHhh6rHGDGl8Tv7IBpygzIg8hiDCAg5STgwPScaCCZRJoURwToKb7pUFQEFgpz0JjKERuDxoHb8CD2TBYCGKZSgkRKPY8PgPAbiADCUmEMBQImOaB4QA)

Currently set to base 2 without min/max.
* `<1`, `1-2` .. `16-32`: noise.
* `32-64` .. `1024-2048`: healthy number of data points
* `2048-4096`, `4096-8192`: noise
* `8192-16384`: healthy number of data points (worth investigating
separately from this change)
* 15 buckets total

Proposed change: base 4, min `4^3` (i.e. `<64` is the smallest bucket),
max `4^7` (i.e. `>=16384` is the highest bucket).
* => Max cardinality = 6 buckets