Maintenance
Logging
Logs from pods and containers can be read in different ways:
- In the cluster filesystem at /var/log/pods/ or /var/log/containers/.
- Using kubectl logs.
- Querying aggregated logs with Grafana, see below.
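For example, to read the logs of a single pod with kubectl (a sketch; POD_NAME is a placeholder and the stackspin namespace is an example, use the first command to find real names):
kubectl get pods --all-namespaces
kubectl logs --namespace stackspin --tail=100 --follow POD_NAME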
Central log aggregation
We use Promtail, Loki and Grafana for easy access to aggregated logs. The Loki documentation is a good starting point to learn how this setup works, and the Using Loki in Grafana documentation gets you started with querying your cluster logs with Grafana.
You will find the Loki Grafana integration on your cluster at https://grafana.stackspin.example.org/explore together with some generic query examples.
LogQL query examples
Please also refer to the LogQL documentation.
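The logcli examples below assume logcli can reach your Loki instance. One way to achieve that from your provisioning machine is a port-forward (a sketch; the service name loki and the namespace stackspin are assumptions, check your cluster):
kubectl --namespace stackspin port-forward svc/loki 3100:3100 &
export LOKI_ADDR=http://localhost:3100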
Query all aggregated logs (unfortunately we can't find a better way of doing this, since LogQL always expects a stream label to be queried):
logcli query '{foo!="bar"}'
Query all logs for a keyword:
logcli query '{foo!="bar"} |= "error"'
Query all k8s apps for errors using a regular expression:
logcli query '{job=~".*"} |~ "error|fail|exception|fatal"'
Flux
Flux is responsible for installing applications. It uses four controllers:
- source-controller that tracks Helm and Git repositories like https://open.greenhost.net/stackspin/stackspin for updates.
- kustomize-controller to deploy kustomizations that often install helmreleases.
- helm-controller to deploy the helmreleases.
- notification-controller that is responsible for inbound and outbound flux messages.
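To see what these controllers are currently doing, the flux command line tool can list their resources (standard flux commands):
flux check
flux get kustomizations --all-namespaces
flux get helmreleases --all-namespaces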
Query all messages from the source-controller:
{app="source-controller"}
Query all messages from the flux source-controller and helm-controller:
{app=~"(source-controller|helm-controller)"}
helm-controller messages containing wordpress:
{app = "helm-controller"} |= "wordpress"
helm-controller messages containing wordpress without unchanged events (to only show the installation messages):
{app = "helm-controller"} |= "wordpress" != "unchanged"
Filter out redundant helm-controller messages:
{ app = "helm-controller" } !~ "(unchanged | event=refreshed | method=Sync | component=checkpoint)"
Debug oauth2 single sign-on with rocketchat:
{container_name=~"(hydra|rocketchat)"}
Query kubernetes events processed by the eventrouter app containing warning:
logcli query '{app="eventrouter"} |~ "warning"'
Cert-manager
Cert-manager is responsible for requesting Let's Encrypt TLS certificates.
Query cert-manager messages containing chat:
{app="cert-manager"} |= "chat"
Hydra
Hydra is the single sign-on system.
Show only warnings and errors from hydra:
{container_name="hydra"} != "level=info"
Backup
On your provisioning machine
During the installation process, a cluster config directory is created on your provisioning machine, located in the top-level sub-directory clusters in your clone of the stackspin git repository. Although these files are not essential for your OpenAppStack cluster to continue functioning, you may want to back this folder up because it allows easy access to your cluster.
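A minimal way to do that is archiving the directory from the root of your stackspin clone (a sketch; choose any destination you like):
tar czf stackspin-clusters-backup.tar.gz clusters/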
On your cluster
OpenAppStack supports using the program Velero to make backups of your OpenAppStack instance to external storage via the S3 API. See :ref:`backups-with-velero` in the installation instructions for setup details. By default this will make nightly backups of the entire cluster (minus Prometheus data). To make a manual backup, run
cluster$ velero create backup BACKUP_NAME --exclude-namespaces velero --wait
from your VPS. See velero --help for other commands, and Velero's documentation for more information.
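To list existing backups and inspect the contents and status of a specific one (standard Velero commands):
cluster$ velero backup get
cluster$ velero backup describe BACKUP_NAME --details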
Note: in case you want to make an (additional) backup of application data via alternate means, all persistent volume data of the cluster are stored in directories under /var/lib/OpenAppStack/local-storage.
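For example, to copy that data to another machine with rsync (a sketch; BACKUP_HOST and the destination path are placeholders):
rsync -av /var/lib/OpenAppStack/local-storage/ BACKUP_HOST:/backups/local-storage/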
Restore
Restore instructions will follow; please reach out to us if you need assistance.
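As a starting point, Velero can restore a named backup with its standard restore command (a sketch, untested in this setup; consult Velero's documentation before relying on it):
cluster$ velero restore create --from-backup BACKUP_NAME --wait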
Change the IP of your cluster
In case your cluster needs to migrate to another IP, make sure to update the IP address in /etc/rancher/k3s/k3s.yaml and, if applicable, in your local kube config and inventory.yml in the cluster directory clusters/stackspin.example.org.
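For example, replacing the address in the k3s kubeconfig on the VPS (a sketch; OLD_IP and NEW_IP are placeholders):
cluster$ sudo sed -i 's/OLD_IP/NEW_IP/g' /etc/rancher/k3s/k3s.yaml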
Delete evicted pods
In case your cluster disk is full, kubernetes taints the node with DiskPressure. Then it tries to evict pods, which is pointless in a single node setup but can still happen. We have experienced hundreds of pods in evicted state that still showed up after DiskPressure had recovered. See also the out of resource handling with kubelet documentation.
You can delete all evicted pods with this command:
kubectl get pods --all-namespaces -ojson | jq -r '.items[] | select(.status.reason!=null) | select(.status.reason | contains("Evicted")) | .metadata.name + " " + .metadata.namespace' | xargs -n2 -l bash -c 'kubectl delete pods $0 --namespace=$1'
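A shorter alternative is deleting all pods in the Failed phase with a field selector; note that this also removes failed pods that were not evicted:
kubectl delete pods --field-selector=status.phase==Failed --all-namespaces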
Apply changes to flux variables
Before installing, you configured cluster variables in your cluster directory in .flux.env. If you change any of these variables after installation, you can apply the changes by following the :ref:`install_core_apps` instructions until the step kubectl apply -k $CLUSTER_DIR. Then, use the following command that will apply the changes to all installed kustomizations:
flux get -A kustomizations --no-header | awk -F' ' '{system("flux reconcile -n " $1 " kustomization " $2)}'
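Afterwards you can verify that all kustomizations have reconciled successfully:
flux get kustomizations --all-namespaces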