Commit 582a30ee authored by Arie Peterson

Merge branch '1169-document-alertmanager-emails-and-what-they-mean' into 'main'

Resolve "Document alertmanager emails and what they mean"

Closes #1169

See merge request stackspin/stackspin!864
parents 6e72bcb4 93149e5f
......@@ -38,6 +38,7 @@ For more information, go to `the Stackspin website`_.
:caption: System Administration
logging
monitoring
maintenance
upgrading
customizing
......
......@@ -41,8 +41,11 @@ Outgoing email
Stackspin uses SMTP to send emails. This is essential for finishing account
setups with password recovery links. Additionally, apps like Nextcloud, Zulip
and Alertmanager will be able to send email notifications from the email address
and Wordpress will be able to send email notifications from the email address
configured here.
You may also receive alert notification emails from Stackspin's
monitoring system. See :ref:`monitoring:Email alerts` for more information about
those alerts, especially the ones sent during installation.
Because Stackspin does not include an email server, you need to search your
(external) email provider's helpdesk for SMTP configuration details.
......
Monitoring
==========
To monitor your Stackspin cluster, we include the kube-prometheus-stack_
Helm chart, which bundles the applications Grafana_, Prometheus_ and Alertmanager_,
and also ships pre-configured Prometheus alerts and Grafana dashboards.
Grafana
-------
Grafana can be accessed by clicking on the ``Monitoring`` icon in the ``Utilities``
section of the dashboard. Use Stackspin single sign-on to log in.
Dashboards
~~~~~~~~~~
Browse through the pre-configured dashboards to explore metrics of your
Stackspin cluster. Describing every dashboard would go beyond the scope of this
page; reach out to us if you don't find what you are looking for.
Browse aggregated logs in Grafana
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
See :ref:`logging:Viewing logs in Grafana` for how to do this.
Prometheus
----------
Prometheus can be reached by adding ``prometheus.`` in front of your cluster
domain, e.g. ``https://prometheus.stackspin.example.org``. Until we `configure single
sign-on for prometheus`_ you need to log in using basic auth.
The user name is ``admin``; the password can be retrieved by running:

.. code::

   python -m stackspin CLUSTERNAME secrets | grep prometheus-basic-auth
Alertmanager
------------
Alertmanager can be reached by adding ``alertmanager.`` in front of your cluster
domain, e.g. ``https://alertmanager.stackspin.example.org``. Until we `configure single
sign-on for prometheus`_ you need to log in using basic auth.
The user name is ``admin``; the password can be retrieved by running:

.. code::

   python -m stackspin CLUSTERNAME secrets | grep alertmanager-basic-auth
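
To cross-check an alert email against what Alertmanager currently holds, the
alert list can also be fetched over HTTP. The following is a minimal sketch,
assuming ``curl`` is installed and that you have exported the basic-auth
password as ``ALERTMANAGER_PASSWORD``. That variable name and the
``am_active_alerts`` helper are illustrative and not part of Stackspin. The
request goes to Alertmanager's v2 API endpoint ``/api/v2/alerts``.

```shell
# Hypothetical helper: print the alerts that are currently active (and not
# silenced) in Alertmanager as JSON.
# Assumptions: ALERTMANAGER_PASSWORD holds the basic-auth password, and the
# first argument is your cluster domain (e.g. stackspin.example.org).
am_active_alerts() {
    domain="${1:-}"
    if [ -z "$domain" ] || [ -z "${ALERTMANAGER_PASSWORD:-}" ]; then
        echo "usage: ALERTMANAGER_PASSWORD=... am_active_alerts DOMAIN" >&2
        return 1
    fi
    # --fail makes curl exit non-zero on HTTP errors such as 401.
    curl --silent --show-error --fail \
        --user "admin:${ALERTMANAGER_PASSWORD}" \
        "https://alertmanager.${domain}/api/v2/alerts?active=true&silenced=false"
}
```

For example, ``am_active_alerts stackspin.example.org`` prints the active
alerts as JSON. Note that passing the password with ``--user`` makes it visible
in the local process list while the command runs.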
Email alerts
------------
From time to time you might get email alerts sent by Alertmanager_ to the email
address you have set in the cluster configuration.
Common alerts include the following (identified by the ``alertname`` shown in
the email body):
* **KubeJobCompletion**: A job did not complete successfully. This often happens
  during the initial setup phase. If the alert persists, list all jobs in the
  ``stackspin-apps`` namespace with, for example,
  ``kubectl -n stackspin-apps get jobs``, then delete the failed job
  to silence the alert, e.g.
  ``kubectl -n stackspin-apps delete job nc-nextcloud-cron-27444460``.
* **ReconciliationFailure**: A `flux helmRelease`_ could not be reconciled
  successfully. This also often happens during the initial setup phase, but it
  can have different root causes. Use
  ``flux -n stackspin-apps get helmreleases`` to view the current state of
  all ``helmReleases`` in the ``stackspin-apps`` namespace.
  If the ``helmRelease`` in question is stuck in an ``install retries exhausted``
  or ``upgrade retries exhausted`` state, you can force a reconciliation
  (shown here for the ``zulip`` release) with:

  .. code::

     flux -n stackspin-apps suspend helmrelease zulip
     flux -n stackspin-apps resume helmrelease zulip

  Depending on the underlying cause, this may or may not fix the
  ``helmRelease`` state.
  For more information on this issue, see `helmrelease upgrade retries exhausted regression`_.
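
The two checks described in this list can be wrapped in small shell functions.
This is only a sketch, assuming ``kubectl`` and ``flux`` are already configured
for your cluster. The function names ``failed_jobs`` and ``force_reconcile``
are hypothetical, and the jsonpath filter relies on the assumption that a job's
``status.failed`` field is only present once at least one of its pods has
failed.

```shell
# List the names of jobs in a namespace that report failed pods.
# Assumption: status.failed is absent for jobs without failures, so the
# existence filter below selects only jobs with at least one failed pod.
failed_jobs() {
    namespace="${1:-stackspin-apps}"
    kubectl -n "$namespace" get jobs \
        -o jsonpath='{range .items[?(@.status.failed)]}{.metadata.name}{"\n"}{end}'
}

# Force a reconciliation of a helmRelease by suspending and resuming it.
# The resume step only runs if the suspend step succeeded.
force_reconcile() {
    release="${1:-}"
    namespace="${2:-stackspin-apps}"
    if [ -z "$release" ]; then
        echo "usage: force_reconcile RELEASE [NAMESPACE]" >&2
        return 1
    fi
    flux -n "$namespace" suspend helmrelease "$release" &&
        flux -n "$namespace" resume helmrelease "$release"
}
```

For example, ``failed_jobs`` prints one job name per line for the
``stackspin-apps`` namespace, and ``force_reconcile zulip`` runs the suspend
and resume commands for the ``zulip`` release. Review the list of failed jobs
before deleting anything.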
.. _kube-prometheus-stack: https://artifacthub.io/packages/helm/prometheus-community/kube-prometheus-stack
.. _Grafana: https://grafana.com
.. _Prometheus: https://prometheus.io
.. _Alertmanager: https://prometheus.io/docs/alerting/latest/alertmanager
.. _configure single sign-on for prometheus: https://open.greenhost.net/stackspin/stackspin/-/issues/371
.. _flux helmRelease: https://fluxcd.io/docs/guides/helmreleases
.. _helmrelease upgrade retries exhausted regression: https://github.com/fluxcd/flux2/issues/1878