
Monitor droplet resource usage

Related: #722 (closed)

We need a way to monitor resource usage (CPU, RAM, disk) so we can track possible regressions over time. For example:

  • How much of the high load discussed in #722 (closed) is caused by OAS?
  • How does an MR influence resource usage?

GitLab has an integrated Performance Monitoring feature that we might make use of, and it would be great to have it integrated in open.gh.

After researching a bit how to do that, I found the following workflow to be the best option:

Todo

Prometheus

On prometheus.oas.greenhost.net

  • Configure it to use DNS-based service discovery, so that new droplets are auto-discovered and scraped
  • Configure a Prometheus federation job to scrape only certain metrics:
    • node-exporter metrics (node_.*)
    • configure authentication
  • Enable ingress for Prometheus (not sure whether GitLab can handle Prometheus authentication)
  • Increase Grafana metrics retention time to ~8 weeks to have a better historical overview (!409 (merged))
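
The Prometheus side of this could look roughly like the scrape config below. This is only a sketch: the job name, the username, and the password file path are placeholders I made up, and it assumes each droplet's Prometheus is reachable over HTTPS with basic auth on the port published in the SRV record described in the droplet section.

```yaml
# Sketch only: job name, credentials and file paths are placeholders.
scrape_configs:
  - job_name: 'federate-droplets'        # hypothetical job name
    scheme: https                        # assumes the droplet ingress serves TLS
    metrics_path: /federate              # Prometheus federation endpoint
    honor_labels: true                   # keep the labels set by the droplet's Prometheus
    params:
      'match[]':
        - '{__name__=~"node_.*"}'        # only pull node-exporter metrics
    basic_auth:
      username: 'CHANGEME'               # placeholder, same credentials on every droplet
      password_file: /etc/prometheus/federate.password  # placeholder path
    dns_sd_configs:
      - names:
          - '_prom-targets._tcp.ci.openappstack.net'
        type: SRV                        # host and port are taken from the SRV records
        refresh_interval: 30s            # how often the record set is re-resolved
```

With SRV-based discovery, a droplet shows up as a target as soon as its record exists and disappears after the record is deleted, so no config change is needed per droplet.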

On each droplet to scrape

  • Enable ingress for Prometheus
    • Use the same credentials for each droplet
  • On creation, create an SRV DNS record for _prom-targets._tcp.ci.openappstack.net with the content 10 50 443 prometheus.DROPLETNAME.ci (priority 10, weight 50, port 443, target host)
  • On termination, delete the SRV record

Gitlab integration

Moved to #759.

Edited by Varac