[Discussion] Important metrics to be recorded for athens #446

manugupt1 · 2018-08-09T22:55:04Z

Metrics can be used to measure how Athens is performing. We should have metrics for both Olympus and Proxy.

Some candidate metrics are storage used, latency, and how much data has been transferred.

Along with that, we need to decide at what granularity to measure each one. For example,
A) Disk usage / day.
B) Errors / minute
C) Time to process a request / second

and so on.

It would be great to have a clear idea of the metrics we want, create them as sub-tickets of this one, and start measuring them one at a time.

ghost · 2018-08-09T22:59:31Z

Related to: #360

michalpristas · 2018-08-10T07:36:29Z

I'm thinking of:

Requests - count (total, failed, succeeded), duration (ms)
Errors - count
Disk usage
Network usage
For workers, the number of in/out work items and time spent in the queue
Cache hit rate (because it's fun to watch)

I imagine a PR for a bullet point, so it's not so huge

timraymond · 2018-09-26T03:56:35Z

I'll give this a shot. As discussed in Slack on 9/21/18 (discussion began roughly here), we're going to try to do this with opencensus.io, and more specifically, the ochttp plugin. There's some question as to how it can be integrated with Buffalo, which is part of what I'll be investigating as I look into this. Initial goals are getting Go runtime stats and RED-style metrics[1] around HTTP handlers. This should correspond roughly to the first two bullet points on @michalpristas's list.

[1] Stands for (R)ate, (E)rrors, (D)uration. This article does a decent job of introducing it, along with its cousin, USE (Utilization, Saturation, Errors).

arschles · 2018-12-07T01:51:32Z

@timraymond still interested in giving this a shot?

timraymond · 2018-12-13T21:10:09Z

@arschles Apologies, I had to take a step back due to some family issues that came up back in November. If someone else wants to pick this up, please feel free to do so.

Fortunately, things are starting to settle down again, so I might be able to contribute soon :)

bndw · 2019-09-27T21:02:21Z

Would adding per-module metrics be of use to anyone else? I'd like to be able to audit which packages are regularly accessed (think a counter with fields like name, version).

abursavich · 2019-09-27T23:36:11Z

@bndw, I would vote against any per-module metrics because they could have unbounded cardinality, which usually isn't ideal for real-time monitoring systems and may be more suited to (offline) log aggregation.

bndw · 2019-09-30T20:58:20Z

@abursavich Good point. I agree that the general "import all the modules" use case probably makes per-module metrics a bad idea.

I was looking at this through a corp lens where module usage is more constrained.

arschles · 2019-10-02T19:03:01Z

@bndw @abursavich do you think that a simple counter for top-level paths is still too much cardinality?

bndw · 2019-10-03T16:27:20Z

@arschles I think "too much" is subjective. Personally, our use case needs metrics around package usage. Ideally, Athens can provide that without us having to maintain a fork.

arschles · 2020-02-21T20:47:44Z

I totally lost track of this one. Sorry @bndw - I agree that "too much" is subjective. I would say that Athens should emit a "medium" amount of cardinality by default, and then you can turn it up as needed via a config variable. Not sure about this, but would you be open to having Athens emit more metrics as the log level goes up?

ghost · 2020-02-21T22:31:23Z

> I imagine a PR for a bullet point, so it's not so huge

agreed, and maybe just the dashboard itself with whoever starts the first one.

arschles · 2020-03-11T00:05:05Z

@robjloranger We already have a Prometheus stats exporter. Would you be cool with expanding our Prometheus output (possibly based on the log level) and then using the built-in Prometheus dashboard for #360?

linzhp · 2022-08-27T01:03:53Z

Can I propose two more metrics?

vcsLister.List
goGetFetcher.Fetch

These two operations both depend on external services and are both expensive. It would be helpful to track their latency and count (success, failure).

For implementation, both operations call os/exec. Is there any existing View that I can use?

ghost mentioned this issue Aug 9, 2018

Add proxy admin dashboard page #360

Open

michalpristas added the observability Improving the observability of Athens running in production environments label Aug 10, 2018

michalpristas mentioned this issue Oct 25, 2018

Support for Metrics using Prometheus #816

Closed

t-tomalak mentioned this issue Dec 3, 2018

Add prometheus metrics collectors for http handlers #958

Merged

arschles mentioned this issue Dec 7, 2018

Proposal: add indication in logs whether returned data was in storage or not #892

Open

linzhp mentioned this issue Aug 27, 2022

Register HTTP client views for stats #1787

Merged

manugupt1 closed this as completed in #1787 Sep 23, 2022

manugupt1 reopened this Sep 23, 2022

gomods locked and limited conversation to collaborators Apr 9, 2024

matt0x6F converted this issue into discussion #1940 Apr 9, 2024
