Let’s talk Prometheus generally as an idea.
Up/down monitoring: Focus is whether something is up or down - a binary version of monitoring.

Metrics: Focus on the status of something - # of connections to an endpoint, CPU usage of a server, similar.

Telemetry: A more general term for expanded data gathering from a server. A longer interval than the above would be expected - hours or once a day. For instance, the domains on a server, or the packages and versions installed on a server.

Log shipping: Generally not exactly monitoring, per se. Sending the logs on a server off to a central location. Some systems allow alerting on specified conditions.

Push: Data is generated on the server then pushed off to an endpoint.

Pull: A service not on the server connects to a server and retrieves some data. With basic up/down status, this may be checking whether the server is up via ping. With more complex data (metrics, telemetry, log shipping) the data is likely presented on a webpage or similar and an external server scrapes it.
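For Prometheus specifically, that scraped "webpage" is a plain-text /metrics endpoint in the text exposition format. A hand-written fragment for illustration - metric names and values here are made up:

```
# HELP demo_connections Current connections to an endpoint.
# TYPE demo_connections gauge
demo_connections{port="443"} 187
# HELP demo_cpu_seconds_total Seconds each CPU has spent busy.
# TYPE demo_cpu_seconds_total counter
demo_cpu_seconds_total{cpu="0"} 12345.6
```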
Target discovery: Determining what to monitor with each Prometheus node.

Exporter: Something that runs and provides data for ingest into Prometheus.

Aggregator: Service that combines streaming data from multiple sources.
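To make the exporter idea concrete: anything that serves that text format over HTTP will do. A minimal stdlib-Python sketch - the metric name and port are made up, and real exporters usually build on the official prometheus_client library instead:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import os


def render_metrics() -> str:
    """Render current load averages in the Prometheus text exposition format."""
    load1, load5, load15 = os.getloadavg()
    lines = [
        "# HELP demo_load_average System load average.",
        "# TYPE demo_load_average gauge",
        f'demo_load_average{{window="1m"}} {load1}',
        f'demo_load_average{{window="5m"}} {load5}',
        f'demo_load_average{{window="15m"}} {load15}',
    ]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To actually serve it (blocks forever):
#   HTTPServer(("", 9101), MetricsHandler).serve_forever()
```

Point a scrape job at http://host:9101/metrics and Prometheus handles the rest on its scrape interval.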
This is the ideal Prometheus Cluster. You may not like it, but this is what Peak Performance looks like.
The majority of this will not carry state - it should not need it, aside from specific parts. Thus, stick everything in containers and stick it in Kubernetes so it scales.
So, do like k3os or something and spin up Rancher somewhere and go do it. And if you’re not using helm and flux in there, you’re doing it wrong. Go revisit that.
Also, y’all gonna be using these.
You’re going to have a replicated stack to do the scraping and alerting.

- prometheus (https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus#configuration) - do the scraping
- blackbox_exporter - prometheus scrapes against this to do ping etc…
- alertmanager - prometheus sends its stream of data and you get alerts

You’re going to run a bunch of replicas of this. Keep your host count per replica down - theoretically 50 servers for now. There will probably shortly be examples in here idk fam.
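The prometheus-scrapes-blackbox wiring can be sketched as a scrape job in prometheus.yml - the module name, target hosts, and the exporter’s address below are placeholders:

```yaml
scrape_configs:
  - job_name: blackbox-icmp
    metrics_path: /probe
    params:
      module: [icmp]            # a probe module defined in blackbox.yml
    static_configs:
      - targets:
          - server-01.example.com
          - server-02.example.com
    relabel_configs:
      # Rewrite the target into the ?target= param for the exporter...
      - source_labels: [__address__]
        target_label: __param_target
      # ...keep the probed host as the instance label...
      - source_labels: [__param_target]
        target_label: instance
      # ...and actually scrape the blackbox_exporter itself.
      - target_label: __address__
        replacement: blackbox-exporter:9115
```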
Target discovery is difficult to do. You’ve got three basic options:

- consul - but then you’re bought into the Hashicorp system.

Stick up a pair of Prometheus nodes, outside of kubernetes.
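For the consul option, discovery is built into Prometheus via consul_sd_configs - the server address here is an assumption:

```yaml
scrape_configs:
  - job_name: consul-services
    consul_sd_configs:
      - server: consul.example.com:8500
    relabel_configs:
      # Carry the Consul service name over as the job label.
      - source_labels: [__meta_consul_service]
        target_label: job
```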
Ingest data into there. Just keep it simple. Hit it for current stats as well - you’ll appreciate it when your persistent storage gets flakey.
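One way to ingest data into that standalone pair is Prometheus federation: the pair scrapes the /federate endpoint of the in-cluster replicas. Hostnames and the match[] expression below are placeholders:

```yaml
scrape_configs:
  - job_name: federate
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]':
        - '{job=~".+"}'    # grab everything; narrow this in practice
    static_configs:
      - targets:
          - prometheus-replica-0:9090
          - prometheus-replica-1:9090
```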
Ingest Prometheus stream data into InfluxDB or something IDK you figure that out.
Then point a Prometheus node at that as a storage location so you query the same Prometheus endpoint.
Bonus points if you reuse the Prometheus historical stuff from before
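If the long-term store does end up being InfluxDB, the hookup could look roughly like this - a sketch assuming InfluxDB 1.x’s native Prometheus remote read/write endpoints, with a made-up hostname and database name:

```yaml
remote_write:
  - url: http://influxdb.example.com:8086/api/v1/prom/write?db=prometheus
remote_read:
  - url: http://influxdb.example.com:8086/api/v1/prom/read?db=prometheus
```

The remote_read half is what lets a Prometheus node answer queries out of that storage location.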