diff --git a/docs/index.rst b/docs/index.rst
index fe6d38116522326477fd536737b7293f85148c17..082dbdae91af60c29b0bbc0986ae42bcfd2eab43 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -41,6 +41,7 @@ For more information, go to `the Stackspin website`_.
    :maxdepth: 2
    :caption: Administration
 
+   logging
    maintenance
    upgrading
    customizing
diff --git a/docs/logging.rst b/docs/logging.rst
new file mode 100644
index 0000000000000000000000000000000000000000..f4fb041190d7eb3339eb69586ab17f4ca44db9ac
--- /dev/null
+++ b/docs/logging.rst
@@ -0,0 +1,138 @@
+Logging
+=======
+
+Logs from pods and containers can be read in different ways:
+
+- In the cluster filesystem at ``/var/log/pods/`` or
+  ``/var/log/containers/``.
+- Using `kubectl logs`_.
+- Querying aggregated logs with Grafana, see below.
+
+Central log aggregation
+-----------------------
+
+We use `Promtail`_, `Loki`_ and `Grafana`_ for easy access to aggregated
+logs. The `Loki documentation`_ is a good starting point for learning how
+this setup works, and the `Using Loki in Grafana`_ guide gets you started
+with querying your cluster logs in Grafana.
+
+You will find the Loki Grafana integration on your cluster at
+https://grafana.stackspin.example.org/explore together with some generic query
+examples.
+
+LogQL query examples
+~~~~~~~~~~~~~~~~~~~~
+
+Please also refer to the `LogQL documentation`_.
+
+Query all aggregated logs (unfortunately, we have not found a better way
+of doing this, since LogQL always expects a stream label in the query):
+
+.. code:: bash
+
+   logcli query '{foo!="bar"}'
+
+Query all logs for a keyword:
+
+.. code:: bash
+
+   logcli query '{foo!="bar"} |= "error"'
+
+Query all k8s apps for errors using a regular expression:
+
+.. code:: bash
+
+   logcli query '{job=~".*"} |~ "error|fail|exception|fatal"'
+
+Flux
+^^^^
+
+`Flux`_ is responsible for installing applications. It uses four
+controllers:
+
+- ``source-controller`` that tracks Helm and Git repositories like
+  https://open.greenhost.net/stackspin/stackspin for updates.
+- ``kustomize-controller`` to deploy ``kustomizations`` that often
+  install ``helmreleases``.
+- ``helm-controller`` to deploy the ``helmreleases``.
+- ``notification-controller`` that is responsible for inbound and
+  outbound Flux messages.
+
+Query all messages from the ``source-controller``:
+
+.. code:: bash
+
+   {app="source-controller"}
+
+Query all messages from ``source-controller`` and ``helm-controller``:
+
+.. code:: bash
+
+   {app=~"(source-controller|helm-controller)"}
+
+``helm-controller`` messages containing ``wordpress``:
+
+.. code:: bash
+
+   {app = "helm-controller"} |= "wordpress"
+
+``helm-controller`` messages containing ``wordpress`` without
+``unchanged`` events (to only show the installation messages):
+
+.. code:: bash
+
+   {app = "helm-controller"} |= "wordpress" != "unchanged"
+
+Filter out redundant ``helm-controller`` messages:
+
+.. code:: bash
+
+   { app = "helm-controller" } !~ "(unchanged | event=refreshed | method=Sync | component=checkpoint)"
+
+Debug OAuth2 single sign-on with Zulip:
+
+.. code:: bash
+
+   {container_name=~"(hydra|zulip)"}
+
+Query Kubernetes events processed by the ``eventrouter`` app containing
+``warning``:
+
+.. code:: bash
+
+   logcli query '{app="eventrouter"} |~ "warning"'
+
+Cert-manager
+^^^^^^^^^^^^
+
+Cert-manager is responsible for requesting Let’s Encrypt TLS
+certificates.
+
+Query ``cert-manager`` messages containing ``chat``:
+
+.. code:: bash
+
+   {app="cert-manager"} |= "chat"
+
+Hydra
+^^^^^
+
+Hydra is the single sign-on system.
+
+Show only warnings and errors from ``hydra``:
+
+.. code:: bash
+
+   {container_name="hydra"} != "level=info"
+
+.. _kubectl logs: https://kubernetes.io/docs/concepts/cluster-administration/logging
+.. _Promtail: https://grafana.com/docs/loki/latest/clients/promtail/
+.. _Loki: https://grafana.com/oss/loki/
+.. _Grafana: https://grafana.com/
+.. _Loki documentation: https://grafana.com/docs/loki/latest/
+.. _Using Loki in Grafana: https://grafana.com/docs/grafana/latest/datasources/loki
+.. _LogQL documentation: https://grafana.com/docs/loki/latest/logql
+.. _Flux: https://fluxcd.io/
+.. _reach out to us: https://stackspin.net/contact.html
+.. _taints: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
+.. _out of resource handling with kubelet: https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/
diff --git a/docs/maintenance.rst b/docs/maintenance.rst
index 81d002615a8ef4557e7ee267b716ae26ad2494f5..544fb0c92e13d358daa1329b2a6bd7861e3bc66d 100644
--- a/docs/maintenance.rst
+++ b/docs/maintenance.rst
@@ -1,133 +1,6 @@
 Maintenance
 ===========
 
-Logging
--------
-
-Logs from pods and containers can be read in different ways:
-
-- In the cluster filesystem at ``/var/log/pods/`` or
-  ``/var/logs/containers/``.
-- Using `kubectl logs`_
-- Querying aggregated logs with Grafana, see below.
-
-Central log aggregation
------------------------
-
-We use `Promtail`_, `Loki`_ and `Grafana`_ for easy access of aggregated
-logs. The `Loki documentation`_ is a good starting point how this setup
-works, and the `Using Loki in Grafana`_ gets you started with querying
-your cluster logs with Grafana.
-
-You will find the Loki Grafana integration on your cluster at
-https://grafana.stackspin.example.org/explore together with some generic query
-examples.
-
-LogQL query examples
-~~~~~~~~~~~~~~~~~~~~
-
-Please also refer to the `LogQL documentation`_.
-
-Query all aggregated logs (unfortunatly we can’t find a better way of
-doing this since LogQL always expects a stream label to get queried):
-
-.. code:: bash
-
-   logcli query '{foo!="bar"}'
-
-Query all logs for a keyword:
-
-.. code:: bash
-
-   logcli query '{foo!="bar"} |= "error"'
-
-Query all k8s apps for errors using a regular expression:
-
-.. code:: bash
-
-   logcli query '{job=~".*"} |~ "error|fail|exception|fatal"'
-
-Flux
-^^^^
-
-`Flux`_ is responsible for installing applications. It uses four
-controllers:
-
-- ``source-controller`` that tracks Helm and Git repositories like
-  https://open.greenhost.net/stackspin/stackspin for updates.
-- ``kustomize-controller`` to deploy ``kustomizations`` that often
-  install ``helmreleases``.
-- ``helm-controller`` to deploy the ``helmreleases``.
-- ``notification-controller`` that is responsible for inbound and
-  outbound flux messages
-
-Query all messages from the ``source-controller``:
-
-.. code:: bash
-
-   {app="source-controller"}
-
-Query all messages from ``flux`` and ``helm-controller``:
-
-.. code:: bash
-
-   {app=~"(source-controller|helm-controller)"}
-
-``helm-controller`` messages containing ``wordpress``:
-
-.. code:: bash
-
-   {app = "helm-controller"} |= "wordpress"
-
-``helm-controller`` messages containing ``wordpress`` without
-``unchanged`` events (to only show the installation messages):
-
-.. code:: bash
-
-   {app = "helm-controller"} |= "wordpress" != "unchanged"
-
-Filter out redundant ``helm-controller`` messages:
-
-.. code:: bash
-
-   { app = "helm-controller" } !~ "(unchanged | event=refreshed | method=Sync | component=checkpoint)"
-
-Debug oauth2 single sign-on with zulip:
-
-.. code:: bash
-
-   {container_name=~"(hydra|zulip)"}
-
-Query kubernetes events processed by the ``eventrouter`` app containing
-``warning``:
-
-.. code:: bash
-
-   logcli query '{app="eventrouter"} |~ "warning"'
-
-Cert-manager
-^^^^^^^^^^^^
-
-Cert manager is responsible for requesting Let’s Encrypt TLS
-certificates.
-
-Query ``cert-manager`` messages containing ``chat``:
-
-.. code:: bash
-
-   {app="cert-manager"} |= "chat"
-
-Hydra
-^^^^^
-
-Hydra is the single sign-on system.
-
-Show only warnings and errors from ``hydra``:
-
-.. code:: bash
-
-   {container_name="hydra"} != "level=info"
-
 Backup
 ------
 
@@ -204,14 +77,6 @@ following command that will apply the changes to all installed kustomizations:
 
    flux get -A kustomizations --no-header | awk -F' ' '{system("flux reconcile -n " $1 " kustomization " $2)}'
 
-.. _kubectl logs: https://kubernetes.io/docs/concepts/cluster-administration/logging
-.. _Promtail: https://grafana.com/docs/loki/latest/clients/promtail/
-.. _Loki: https://grafana.com/oss/loki/
-.. _Grafana: https://grafana.com/
-.. _Loki documentation: https://grafana.com/docs/loki/latest/
-.. _Using Loki in Grafana: https://grafana.com/docs/grafana/latest/datasources/loki
-.. _LogQL documentation: https://grafana.com/docs/loki/latest/logql
-.. _Flux: https://fluxcd.io/
 .. _Velero’s documentation: https://velero.io/docs/v1.4/
 .. _reach out to us: https://stackspin.net/contact.html
 .. _taints: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/