Switch to prometheus-stack (!380) · Merge requests · stackspin / stackspin

Maarten de Waard requested to merge 742-try-prometheus-kube-stack into master Apr 07, 2021

I switched to kube-prometheus-stack, which includes the alerts, dashboards and rules that we wanted, without having to download them from a separate kubernetes-mixins project.

The downside of this switch is that we are back to using prometheus-operator, but the operator itself seems to use very little CPU and around 40 MB of RAM. That's acceptable IMO. An extra advantage is that this approach is compatible with multi-node clusters, which our old approach (Prometheus and Grafana installed separately) was not.

Resources used to make sure the config works with k3s: https://www.civo.com/learn/monitoring-k3s-with-the-prometheus-operator-and-custom-email-alerts and https://github.com/cablespaghetti/k3s-monitoring

Closed issues:

#743 (closed) -- I changed this to check the helm-operator logs. I don't think we specifically had to test whether the eventrouter logs were there. Knowing that Loki works and that Grafana can show the data should be sufficient for this integration test.
#742 (closed) -- Obviously
#712 (closed) Adds several dashboards with Kubernetes metrics
#711 (closed) Adds better node dashboards
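
For illustration, the kind of check described for #743 (confirming that Loki holds helm-operator logs) could be sketched as below. This is not the test code from this merge request. The Loki address, the `{app="helm-operator"}` stream selector and all function names are assumptions made for the example; only the `/loki/api/v1/query_range` endpoint and its JSON response shape come from Loki's HTTP API.

```python
import json
from urllib.parse import urlencode

# Assumed values: the real Loki address and stream labels depend on the cluster.
LOKI_URL = "http://loki:3100"
HELM_OPERATOR_SELECTOR = '{app="helm-operator"}'


def build_query_url(base_url, selector, limit=10):
    """Build a Loki query_range URL for a LogQL stream selector."""
    params = urlencode({"query": selector, "limit": limit})
    return f"{base_url}/loki/api/v1/query_range?{params}"


def count_log_lines(response_body):
    """Count the log lines in a Loki query_range JSON response body."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        return 0
    streams = payload.get("data", {}).get("result", [])
    # Each stream carries a list of [timestamp, line] pairs under "values".
    return sum(len(stream.get("values", [])) for stream in streams)


def has_logs(response_body):
    """Return True when the response contains at least one log line."""
    return count_log_lines(response_body) > 0
```

A test would fetch `build_query_url(LOKI_URL, HELM_OPERATOR_SELECTOR)` with any HTTP client and assert `has_logs(...)` on the response body. Keeping the parsing separate from the request makes it checkable without a running cluster.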

Closes #743 (closed) #742 (closed) #712 (closed) #711 (closed)

Edited Apr 08, 2021 by Maarten de Waard
