This issue was moved to a discussion.

[Discussion] Important metrics to be recorded for athens #446

Closed
manugupt1 opened this issue Aug 9, 2018 · 14 comments · Fixed by #1787
Labels
observability Improving the observability of Athens running in production environments

Comments

@manugupt1
Member

Metrics can be used to measure how Athens is performing. We should have metrics for both Olympus and Proxy.

Candidate metrics include storage, latency, and how much data has been transferred.

Along with that, we need to decide at what granularity to measure each one. For example:
A) Disk usage / day
B) Errors / minute
C) Time to process a request / second

and so on.

It would be great if we could get a clear idea of the metrics we want, create them as sub-tickets of this one, and start measuring them one at a time.

@ghost

ghost commented Aug 9, 2018

Related to: #360

@michalpristas
Member

I'm thinking of:

  • Requests: count (total, failed, succeeded), duration (ms)
  • Errors: count
  • Disk usage
  • Network usage
  • For workers: number of in/out work items and time spent in a queue
  • Cache hit rate (because it's fun to watch)

I imagine one PR per bullet point, so that none of them is too large.

@michalpristas michalpristas added the observability Improving the observability of Athens running in production environments label Aug 10, 2018
@timraymond

I'll give this a shot. As discussed in Slack on 9/21/18 (discussion began roughly here), we're going to try to do this with opencensus.io, and more specifically, the ochttp plugin. There's some question as to how it can be integrated with Buffalo, which is part of what I'll be investigating as I look into this. Initial goals are getting Go runtime stats and RED-style metrics[1] around HTTP handlers. This should roughly cover the first two bullet points on @michalpristas's list.

[1] Stands for (R)ate, (E)rrors, (D)uration. This article does a decent job of introducing it, along with its cousin, USE (Utilization, Saturation, Errors).
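To make the RED idea concrete, here is a minimal sketch that uses only the Go standard library rather than OpenCensus or ochttp, so it shows the shape of the measurement and not how Athens or Buffalo would actually be wired. The names `redStats`, `statusRecorder`, `instrument`, and `exercise` are hypothetical, and treating only status codes of 500 and above as errors is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// redStats holds hypothetical RED-style counters for a handler.
type redStats struct {
	mu       sync.Mutex
	total    int64         // Rate: every request seen
	failed   int64         // Errors: responses with status >= 500 (assumption)
	duration time.Duration // Duration: cumulative handling time
}

// statusRecorder captures the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// instrument wraps next and updates stats after every request.
func instrument(stats *redStats, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		start := time.Now()
		next.ServeHTTP(rec, req)
		elapsed := time.Since(start)

		stats.mu.Lock()
		defer stats.mu.Unlock()
		stats.total++
		if rec.status >= 500 {
			stats.failed++
		}
		stats.duration += elapsed
	})
}

// snapshot returns the total, failed, and succeeded request counts.
func (s *redStats) snapshot() (total, failed, succeeded int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.total, s.failed, s.total - s.failed
}

// exercise sends two successful requests and one failing request
// through an instrumented in-memory handler.
func exercise(stats *redStats) {
	mux := http.NewServeMux()
	mux.HandleFunc("/ok", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	})
	mux.HandleFunc("/boom", func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "boom", http.StatusInternalServerError)
	})
	h := instrument(stats, mux)
	for _, path := range []string{"/ok", "/ok", "/boom"} {
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, path, nil))
	}
}

func main() {
	stats := &redStats{}
	exercise(stats)
	total, failed, succeeded := stats.snapshot()
	fmt.Printf("total=%d failed=%d succeeded=%d\n", total, failed, succeeded)
}
```

A metrics library would typically replace the hand-rolled counters with its own measures and exporters; the middleware position (wrapping the handler, timing around `ServeHTTP`, reading the status code) is the part that carries over.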

@arschles
Member

arschles commented Dec 7, 2018

@timraymond still interested in giving this a shot?

@timraymond

@arschles Apologies, I had to take a step back due to some family issues that came up back in November. If someone else wants to pick this up, please feel free to do so.

Fortunately, things are starting to settle down again, so I might be able to contribute soon :)

@bndw
Contributor

bndw commented Sep 27, 2019

Would adding per-module metrics be of use to anyone else? I'd like to be able to audit which packages are regularly accessed (think a counter with fields like name, version).

@abursavich

@bndw, I would vote against any per-module metrics because they could have unbounded cardinality, which usually isn't ideal for real-time monitoring systems and may be more suited to (offline) log aggregation.
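One common way to keep a per-label counter from growing without bound is to cap the number of distinct labels and fold everything else into a catch-all bucket. The sketch below illustrates that idea with the standard library only; `cappedCounter`, `overflowLabel`, and the cap value are hypothetical and are not part of Athens or of any metrics library.

```go
package main

import (
	"fmt"
	"sync"
)

// overflowLabel is the hypothetical bucket for labels beyond the cap.
const overflowLabel = "other"

// cappedCounter counts events per label but tracks at most maxLabels
// distinct labels; later unseen labels are counted under overflowLabel.
type cappedCounter struct {
	mu        sync.Mutex
	maxLabels int
	counts    map[string]int64
}

func newCappedCounter(maxLabels int) *cappedCounter {
	return &cappedCounter{maxLabels: maxLabels, counts: make(map[string]int64)}
}

// distinctLocked returns the number of tracked labels, excluding the
// overflow bucket. The caller must hold c.mu.
func (c *cappedCounter) distinctLocked() int {
	n := len(c.counts)
	if _, ok := c.counts[overflowLabel]; ok {
		n--
	}
	return n
}

// inc increments the counter for label, or for overflowLabel when the
// label is new and the cap has already been reached.
func (c *cappedCounter) inc(label string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, seen := c.counts[label]; !seen && c.distinctLocked() >= c.maxLabels {
		label = overflowLabel
	}
	c.counts[label]++
}

// get returns the current count for label (0 if it was never tracked).
func (c *cappedCounter) get(label string) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[label]
}

func main() {
	c := newCappedCounter(2)
	for _, module := range []string{"a", "b", "c", "a", "d"} {
		c.inc(module)
	}
	fmt.Println(c.get("a"), c.get("b"), c.get(overflowLabel))
}
```

The trade-off is that whichever modules are seen first get their own series, which may not be the ones worth auditing; a constrained corporate setting could instead use an explicit allow-list.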

@bndw
Contributor

bndw commented Sep 30, 2019

@abursavich Good point, I agree that the general "import all the modules" use case probably makes per-module metrics a bad idea.

I was looking at this through a corp lens where module usage is more constrained.

@arschles
Member

arschles commented Oct 2, 2019

@bndw @abursavich do you think that a simple counter for top-level paths is still too much cardinality?

@bndw
Contributor

bndw commented Oct 3, 2019

@arschles I think too much is subjective. Personally, our use-case needs metrics around package usage. Ideally Athens can provide that without us having to maintain a fork.

@arschles
Member

I totally lost track of this one. Sorry @bndw - I agree that too much is subjective. I would say that Athens by default emits a "medium" amount of cardinality, and then you can turn it up as needed, via a config variable. Not sure about this, but would you be open to having Athens emit more metrics as the log level goes up?

@ghost

ghost commented Feb 21, 2020

> I imagine a PR for a bullet point, so it's not so huge

agreed, and maybe just the dashboard itself with whoever starts the first one.

@arschles
Member

@robjloranger We already have a Prometheus stats exporter. Would you be cool with expanding our Prometheus output (possibly based on the log level) and then using the built-in Prometheus dashboard for #360?

@linzhp
Contributor

linzhp commented Aug 27, 2022

Can I propose two more metrics?

  • vcsLister.List
  • goGetFetcher.Fetch

These two operations both depend on external services and are both expensive. It would be helpful to track their latency and count (success, failure).

For implementation, both operations call os/exec. Is there any existing View that I can use?

@manugupt1 manugupt1 reopened this Sep 23, 2022
@gomods gomods locked and limited conversation to collaborators Apr 9, 2024
@matt0x6F matt0x6F converted this issue into discussion #1940 Apr 9, 2024


7 participants