Monitor droplet resource usage
Related: #722 (closed)
We need a way to monitor resource usage (CPU, RAM, disk) so we can track possible regressions over time, e.g.:
- How much of the high load discussed in #722 (closed) is caused by OAS?
- How does an MR influence resource usage?
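As a rough illustration of how such questions could be answered once metrics are collected, the sketch below builds instant-query URLs for the Prometheus HTTP API (`/api/v1/query`) over standard node-exporter metrics. The base URL scheme, the `instance` label value and the choice of PromQL expressions are assumptions made for illustration, not something decided in this issue.

```python
from urllib.parse import urlencode

# Assumed base URL: the issue names this host, but not the scheme or path.
PROMETHEUS_URL = "https://prometheus.oas.greenhost.net"

# PromQL templates over standard node-exporter metrics.
# Doubled braces are literal braces after str.format().
QUERIES = {
    "cpu_busy_ratio": (
        '1 - avg(rate(node_cpu_seconds_total'
        '{{mode="idle",instance="{instance}"}}[5m]))'
    ),
    "ram_used_ratio": (
        '1 - node_memory_MemAvailable_bytes{{instance="{instance}"}}'
        ' / node_memory_MemTotal_bytes{{instance="{instance}"}}'
    ),
    "disk_used_ratio": (
        '1 - node_filesystem_avail_bytes{{instance="{instance}",mountpoint="/"}}'
        ' / node_filesystem_size_bytes{{instance="{instance}",mountpoint="/"}}'
    ),
}


def build_query_url(metric: str, instance: str, base_url: str = PROMETHEUS_URL) -> str:
    """Return the instant-query URL for one of the QUERIES templates."""
    promql = QUERIES[metric].format(instance=instance)
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    # "DROPLETNAME" is a placeholder instance label, not a real target.
    for name in QUERIES:
        print(build_query_url(name, "DROPLETNAME"))
```

Running the same queries before and after an MR is deployed would give a first, coarse comparison of its resource impact.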
GitLab has integrated Performance Monitoring which we might make use of, and it would be great to use it here, integrated in open.gh.
After researching how to do that, I found the following workflow to be the best:
Todo
Prometheus
On prometheus.oas.greenhost.net
- Configure it to use DNS-based service discovery, so that new droplets are auto-discovered and scraped
- Configure a Prometheus federation job to scrape certain metrics:
  - node-exporter metrics (`node_.*`)
  - Configure authentication
- Enable ingress for Prometheus (not sure whether GitLab can handle Prometheus auth)
- Increase grafana metrics retention time to ~8 weeks to have a better historical overview (!409 (merged))
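The server-side items above could translate into a single scrape job roughly like the one generated below. This is only a sketch: the job name, the credentials, the password file path and the refresh interval are hypothetical, and only the SRV name and the `node_.*` selector come from this issue. The field names (`dns_sd_configs`, `metrics_path`, `honor_labels`, `params`, `basic_auth`) are standard Prometheus scrape configuration keys.

```python
import json

# SRV name taken from this issue; all other values are illustrative assumptions.
SRV_NAME = "_prom-targets._tcp.ci.openappstack.net"


def federation_scrape_config(username: str, password_file: str) -> dict:
    """Build a scrape job that federates node-exporter metrics from
    every droplet discovered through the SRV record."""
    return {
        "job_name": "droplet-federation",  # hypothetical job name
        "scheme": "https",
        "metrics_path": "/federate",
        # Keep the labels set by the droplet's own Prometheus.
        "honor_labels": True,
        # Only pull series whose metric name starts with "node_".
        "params": {"match[]": ['{__name__=~"node_.*"}']},
        # Same credentials for each droplet, as planned in this issue.
        "basic_auth": {"username": username, "password_file": password_file},
        # Targets and ports come from the SRV records.
        "dns_sd_configs": [
            {"names": [SRV_NAME], "type": "SRV", "refresh_interval": "60s"}
        ],
    }


# Hypothetical user name and password file path.
config = {
    "scrape_configs": [
        federation_scrape_config("prometheus", "/etc/prometheus/droplet-password")
    ]
}

if __name__ == "__main__":
    print(json.dumps(config, indent=2))
```

JSON is used here only to avoid a third-party YAML dependency. YAML parsers generally accept JSON, so the output should be usable as a config fragment, but that is worth verifying against the deployed Prometheus version.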
On each droplet to scrape
- Enable ingress for Prometheus
  - Use same credentials for each droplet
- On creation, create an SRV DNS record for `_prom-targets._tcp.ci.openappstack.net` with the content `10 50 443 prometheus.DROPLETNAME.ci`
- On termination, delete the SRV record
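The SRV record content follows the usual `priority weight port target` order. A minimal sketch of how the creation and cleanup steps might build and check that content is shown below; the helper names are hypothetical, and the actual DNS API calls are left out because the issue does not name a provider.

```python
from typing import NamedTuple

# Record name taken from this issue.
SRV_NAME = "_prom-targets._tcp.ci.openappstack.net"


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str

    def content(self) -> str:
        """Render the record data in 'priority weight port target' order."""
        return f"{self.priority} {self.weight} {self.port} {self.target}"


def record_for_droplet(droplet_name: str) -> SrvRecord:
    """SRV record for one droplet, using the values given in this issue."""
    return SrvRecord(
        priority=10, weight=50, port=443, target=f"prometheus.{droplet_name}.ci"
    )


def parse_content(content: str) -> SrvRecord:
    """Parse record data back into its fields, e.g. to find the
    record that has to be deleted on termination."""
    priority, weight, port, target = content.split()
    return SrvRecord(int(priority), int(weight), int(port), target)


if __name__ == "__main__":
    record = record_for_droplet("DROPLETNAME")
    print(SRV_NAME, "SRV", record.content())
```

On termination, the same `record_for_droplet` call yields the exact content to match against existing records before deleting.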
GitLab integration
Moved to #759
Edited by Varac