Maintenance
===========
Logging
-------
Logs from pods and containers can be read in several ways:

- On the cluster node's filesystem, under ``/var/log/pods/`` or
  ``/var/log/containers/``.
- Using `kubectl logs`_.
- By querying aggregated logs with Grafana, see below.
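As a rough illustration of the ``kubectl logs`` route, the commands below show
a few commonly useful flags. The pod name ``wordpress-0`` and the namespace
``oas-apps`` are placeholders chosen for this sketch, not names taken from your
cluster, so list the real ones first.

.. code:: bash

   # List pods in all namespaces to find the real pod and namespace names.
   kubectl get pods --all-namespaces

   # Show the last 100 log lines of one pod (both names are placeholders).
   kubectl logs --namespace oas-apps wordpress-0 --tail=100

   # Follow new log lines, starting with those from the past hour.
   kubectl logs --namespace oas-apps wordpress-0 --since=1h --follow
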
Central log aggregation
-----------------------
We use `Promtail`_, `Loki`_ and `Grafana`_ to provide easy access to aggregated
logs. The `Loki documentation`_ is a good starting point for understanding how
this setup works, and `Using Loki in Grafana`_ gets you started with querying
your cluster logs in Grafana.
You will find the Loki Grafana integration on your cluster at
https://grafana.oas.example.org/explore together with some generic query
examples.
LogQL query examples
~~~~~~~~~~~~~~~~~~~~
Please also refer to the `LogQL documentation`_.
Query all aggregated logs (unfortunately, we have not found a better way of
doing this, since LogQL always expects a stream label to query):
.. code:: bash
logcli query '{foo!="bar"}'
Query all logs for a keyword:
.. code:: bash
logcli query '{foo!="bar"} |= "error"'
Query all k8s apps for errors using a regular expression:
.. code:: bash
logcli query '{job=~".+"} |~ "error|fail|exception|fatal"'
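The ``logcli`` examples in this section only work if ``logcli`` can reach your
Loki instance. The sketch below shows one way to point it there and to keep
queries small. The address ``http://localhost:3100`` is an assumption of this
sketch (for example, Loki made reachable through a port-forward), not an
OpenAppStack default.

.. code:: bash

   # Tell logcli where Loki listens (assumed address, adjust to your setup).
   export LOKI_ADDR=http://localhost:3100

   # List the values of the "app" label to see which streams exist.
   logcli labels app

   # Restrict a query to the last hour and to at most 50 lines.
   logcli query --since=1h --limit=50 '{app="helm-controller"} |= "error"'

Limiting the time range and line count keeps broad queries responsive on a
small cluster.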
Flux
^^^^
`Flux`_ is responsible for installing applications. It uses four
controllers:
- ``source-controller`` that tracks Helm and Git repositories like
https://open.greenhost.net/openappstack/openappstack for updates.
- ``kustomize-controller`` to deploy ``kustomizations`` that often
install ``helmreleases``.
- ``helm-controller`` to deploy the ``helmreleases``.
- ``notification-controller`` that is responsible for inbound and
  outbound Flux messages.
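To see what these controllers are currently managing, you can list the
corresponding custom resources with ``kubectl``. This is only a sketch: it
assumes the Flux custom resource definitions are installed under their
standard names and that the controllers run in the ``flux-system`` namespace
(the Flux default, which may differ on your cluster).

.. code:: bash

   # The controller pods themselves (namespace is an assumption).
   kubectl get pods --namespace flux-system

   # Sources tracked by the source-controller.
   kubectl get gitrepositories,helmrepositories --all-namespaces

   # Kustomizations applied by the kustomize-controller.
   kubectl get kustomizations --all-namespaces

   # Helm releases reconciled by the helm-controller.
   kubectl get helmreleases --all-namespaces
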
Query all messages from the ``source-controller``:
.. code:: bash
{app="source-controller"}
Query all messages from the ``source-controller`` and the ``helm-controller``:
.. code:: bash
{app=~"(source-controller|helm-controller)"}
``helm-controller`` messages containing ``wordpress``:
.. code:: bash
{app = "helm-controller"} |= "wordpress"
``helm-controller`` messages containing ``wordpress`` without
``unchanged`` events (to only show the installation messages):
.. code:: bash
{app = "helm-controller"} |= "wordpress" != "unchanged"
Filter out redundant ``helm-controller`` messages:
.. code:: bash
{ app = "helm-controller" } !~ "(unchanged|event=refreshed|method=Sync|component=checkpoint)"
Debug OAuth2 single sign-on with Rocket.Chat:
.. code:: bash
{container_name=~"(hydra|rocketchat)"}
Query Kubernetes events processed by the ``eventrouter`` app containing
``warning``:
.. code:: bash
logcli query '{app="eventrouter"} |~ "warning"'
Cert-manager
^^^^^^^^^^^^
Cert-manager is responsible for requesting Let’s Encrypt TLS
certificates.
Query ``cert-manager`` messages containing ``chat``:
.. code:: bash
{app="cert-manager"} |= "chat"
Hydra
^^^^^
Hydra is the single sign-on system.
Show only warnings and errors from ``hydra``:
.. code:: bash
{container_name="hydra"} != "level=info"
Backup
------
On your provisioning machine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
During the installation process, a cluster config directory is created
on your provisioning machine, located in the top-level sub-directory
``clusters`` in your clone of the openappstack git repository. Although
these files are not essential for your OpenAppStack cluster to continue
functioning, you may want to back this folder up because it allows easy
access to your cluster.
On your cluster
~~~~~~~~~~~~~~~
OpenAppStack supports using the program Velero to make backups of your
OpenAppStack instance to external storage via the S3 API. See
:ref:`backups-with-velero` in the installation instructions for setup details.
By default this will make nightly backups of the entire cluster (minus
Prometheus data). To make a manual backup, run
.. code:: bash
cluster$ velero create backup BACKUP_NAME --exclude-namespaces velero --wait
from your VPS. See ``velero --help`` for other commands, and `Velero’s
documentation`_ for more information.
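After creating a backup it is worth checking that it actually completed. The
following sketch uses standard Velero subcommands. ``BACKUP_NAME`` is the same
placeholder as in the manual backup command.

.. code:: bash

   # List all backups with their status and expiry.
   velero backup get

   # Show details of one backup, including warnings and errors.
   velero backup describe BACKUP_NAME --details

   # Print the log of the backup run.
   velero backup logs BACKUP_NAME
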
Note: in case you want to make an (additional) backup of application
data via alternate means, all persistent volume data of the cluster are
stored in directories under ``/var/lib/OpenAppStack/local-storage``.
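One such alternate means is a plain ``tar`` archive of that directory. The
helper below is only a sketch: the function name ``archive_local_storage`` and
the target path in the example are made up for illustration. Archiving live
volumes may capture files mid-write, so consider stopping the affected
applications first.

.. code:: bash

   # Archive a directory into a gzip-compressed tarball.
   # Usage: archive_local_storage SOURCE_DIR TARGET_FILE
   # (hypothetical helper, not part of OpenAppStack)
   archive_local_storage() {
       src_dir="$1"
       target="$2"
       # -C switches to the parent directory so the archive holds relative paths.
       tar -czf "$target" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
   }

   # Example (run as root on the cluster node; the target path is a placeholder):
   # archive_local_storage /var/lib/OpenAppStack/local-storage /root/local-storage.tar.gz

Copy the resulting archive off the node afterwards. A backup that lives only
on the cluster disk does not survive losing that disk.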
Restore
-------
Restore instructions will follow. Please `reach out to us`_ if you need
assistance in the meantime.
Change the IP of your cluster
-----------------------------
If your cluster needs to move to another IP address, make sure to update
the IP address in ``/etc/rancher/k3s/k3s.yaml`` and, if applicable, in your
local kube config and in ``inventory.yml`` in the cluster directory
``clusters/oas.example.org``.
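Editing each file by hand works, but a small helper can make the change
repeatable. The function below is a minimal sketch: the name ``replace_ip`` is
made up for illustration, and the addresses in the example come from the
reserved documentation ranges, so they are placeholders for your real old and
new IP.

.. code:: bash

   # Replace every occurrence of OLD_IP with NEW_IP in FILE, keeping a .bak copy.
   # Usage: replace_ip OLD_IP NEW_IP FILE
   # (hypothetical helper, not part of OpenAppStack)
   replace_ip() {
       old_ip="$1"
       new_ip="$2"
       file="$3"
       # Escape the dots so they match literally instead of "any character".
       old_pattern=$(printf '%s' "$old_ip" | sed 's/\./\\./g')
       sed -i.bak "s/${old_pattern}/${new_ip}/g" "$file"
   }

   # Example with placeholder addresses:
   # replace_ip 203.0.113.10 198.51.100.20 /etc/rancher/k3s/k3s.yaml

The pattern is a plain substring match, so an old address that is a prefix of
another address in the same file (``203.0.113.1`` inside ``203.0.113.10``)
would also be rewritten. Running ``grep`` for the old address first shows what
will change.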
Delete evicted pods
-------------------
When your cluster disk is full, Kubernetes `taints`_ the node with
``DiskPressure`` and then tries to evict pods, which is pointless in a
single-node setup but can still happen. We have seen hundreds of pods in the
``Evicted`` state that still showed up after ``DiskPressure`` had cleared. See
also the `out of resource handling with kubelet`_ documentation.
You can delete all evicted pods with this command:
.. code:: bash
kubectl get pods --all-namespaces -ojson | jq -r '.items[] | select(.status.reason!=null) | select(.status.reason | contains("Evicted")) | .metadata.name + " " + .metadata.namespace' | xargs -L1 bash -c 'kubectl delete pods "$0" --namespace="$1"'
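Before deleting anything, you may want to preview which pods would be
affected. The sketch below uses the same ``kubectl`` and ``jq`` approach as the
delete command, but only prints each namespace and pod name. It assumes that
evicted pods carry ``Evicted`` as their ``status.reason``.

.. code:: bash

   # Preview only: print "NAMESPACE NAME" for every evicted pod, delete nothing.
   kubectl get pods --all-namespaces -o json \
     | jq -r '.items[]
              | select(.status.reason == "Evicted")
              | .metadata.namespace + " " + .metadata.name'

If the list looks right, the delete command removes exactly these pods.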
.. _kubectl logs: https://kubernetes.io/docs/concepts/cluster-administration/logging
.. _Promtail: https://grafana.com/docs/loki/latest/clients/promtail/
.. _Loki: https://grafana.com/oss/loki/
.. _Grafana: https://grafana.com/
.. _Loki documentation: https://grafana.com/docs/loki/latest/
.. _Using Loki in Grafana: https://grafana.com/docs/grafana/latest/datasources/loki
.. _LogQL documentation: https://grafana.com/docs/loki/latest/logql
.. _Flux: https://fluxcd.io/
.. _Velero’s documentation: https://velero.io/docs/v1.4/
.. _reach out to us: https://openappstack.net/contact.html
.. _taints: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
.. _out of resource handling with kubelet: https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/