Maintenance

Logging

Logs from pods and containers can be read in different ways:

  • In the cluster filesystem at /var/log/pods/ or /var/log/containers/.
  • Using kubectl logs (see the example after this list).
  • Querying aggregated logs with Grafana, see below.
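
For the kubectl logs option, a typical invocation looks like this (NAMESPACE and POD_NAME are placeholders for your own values):

# Stream the logs of one pod; NAMESPACE and POD_NAME are placeholders
kubectl logs --namespace NAMESPACE POD_NAME --follow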

Central log aggregation

We use Promtail, Loki and Grafana for easy access to aggregated logs. The Loki documentation is a good starting point to understand how this setup works, and the Using Loki in Grafana guide gets you started with querying your cluster logs with Grafana.

You will find the Loki Grafana integration on your cluster at https://grafana.oas.example.org/explore together with some generic query examples.
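
The logcli examples below assume logcli can reach the Loki API, which it reads from the LOKI_ADDR environment variable. A minimal sketch, assuming Loki's service is called loki, runs in the oas namespace, and listens on its default port 3100 (all of which may differ on your cluster):

# Forward Loki's port to your machine; service name and namespace are assumptions
kubectl port-forward --namespace oas service/loki 3100:3100 &
export LOKI_ADDR=http://localhost:3100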

LogQL query examples

Please also refer to the LogQL documentation.

Query all aggregated logs (unfortunately we can’t find a better way of doing this, since LogQL always expects a stream label to be queried):

logcli query '{foo!="bar"}'

Query all logs for a keyword:

logcli query '{foo!="bar"} |= "error"'

Query all k8s apps for errors using a regular expression:

logcli query '{job=~".*"} |~ "error|fail|exception|fatal"'
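
Log levels are not always lowercase. LogQL regular expressions use RE2 syntax, so you can make the match case-insensitive with the (?i) flag:

logcli query '{job=~".*"} |~ "(?i)(error|fail|exception|fatal)"'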

Flux

Flux is responsible for installing applications. It uses four controllers:

  • source-controller, which tracks Helm and Git repositories like https://open.greenhost.net/openappstack/openappstack for updates.
  • kustomize-controller, which deploys kustomizations that often install helmreleases.
  • helm-controller, which deploys the helmreleases.
  • notification-controller, which handles inbound and outbound flux messages.
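
To verify that these controllers are running, you can list their pods. A minimal sketch, assuming the controllers live in the flux-system namespace (older OpenAppStack versions may use a different one):

# The flux-system namespace is an assumption; adjust if your setup differs
kubectl get pods --namespace flux-system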

Query all messages from the source-controller:

{app="source-controller"}

Query all messages from the source-controller and helm-controller:

{app=~"(source-controller|helm-controller)"}

helm-controller messages containing wordpress:

{app = "helm-controller"} |= "wordpress"

helm-controller messages containing wordpress without unchanged events (to only show the installation messages):

{app = "helm-controller"} |= "wordpress" != "unchanged"

Filter out redundant helm-controller messages:

{app="helm-controller"} !~ "(unchanged|event=refreshed|method=Sync|component=checkpoint)"

Debug OAuth2 single sign-on with Rocket.Chat:

{container_name=~"(hydra|rocketchat)"}

Query Kubernetes events processed by the eventrouter app for messages containing warning:

logcli query '{app="eventrouter"} |~ "warning"'

Cert-manager

Cert-manager is responsible for requesting Let’s Encrypt TLS certificates.

Query cert-manager messages containing chat:

{app="cert-manager"} |= "chat"

Hydra

Hydra is the single sign-on system.

Show only warnings and errors from hydra:

{container_name="hydra"} != "level=info"

Backup

On your provisioning machine

During the installation process, a cluster config directory is created on your provisioning machine, located in the top-level sub-directory clusters in your clone of the openappstack git repository. Although these files are not essential for your OpenAppStack cluster to continue functioning, you may want to back this folder up because it allows easy access to your cluster.
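
A simple way to do that is to archive the directory, for example (run from the root of your openappstack clone; the archive name is just an illustration):

# The archive name is illustrative; store the result somewhere safe
tar czf oas-cluster-config-backup.tar.gz clusters/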

On your cluster

OpenAppStack supports using the program Velero to make backups of your OpenAppStack instance to external storage via the S3 API. See :ref:`backups-with-velero` in the installation instructions for setup details. By default this will make nightly backups of the entire cluster (minus Prometheus data). To make a manual backup, run

cluster$ velero create backup BACKUP_NAME --exclude-namespaces velero --wait

from your VPS. See velero --help for other commands, and Velero’s documentation for more information.
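
To list existing backups and inspect one of them (BACKUP_NAME is a placeholder):

cluster$ velero backup get
cluster$ velero backup describe BACKUP_NAME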

Note: in case you want to make an (additional) backup of application data via alternate means, all persistent volume data of the cluster are stored in directories under /var/lib/OpenAppStack/local-storage.
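
For example, a crude but simple approach is an archive made on the VPS. Note that files of running databases may be captured in an inconsistent state, so treat this as a sketch rather than a complete backup strategy:

# /root/oas-data-backup.tar.gz is an illustrative destination path
cluster$ tar czf /root/oas-data-backup.tar.gz /var/lib/OpenAppStack/local-storage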

Restore

Restore instructions will follow; please reach out to us if you need assistance.

Change the IP of your cluster

In case your cluster needs to migrate to another IP, make sure to update the IP address in /etc/rancher/k3s/k3s.yaml and, if applicable, your local kube config and inventory.yml in the cluster directory clusters/oas.example.org.
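
For example, you could replace the address in the k3s config like this (OLD_IP and NEW_IP are placeholders for the old and new address):

# OLD_IP and NEW_IP are placeholders
cluster$ sed -i 's/OLD_IP/NEW_IP/g' /etc/rancher/k3s/k3s.yaml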

Delete evicted pods

In case your cluster disk is full, Kubernetes taints the node with DiskPressure. It then tries to evict pods, which is pointless in a single-node setup but can still happen. We have experienced hundreds of pods in the Evicted state that still showed up after the DiskPressure condition had cleared. See also the out of resource handling with kubelet documentation.
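
To check whether the node is (still) under disk pressure, inspect its conditions and taints (NODE_NAME is a placeholder; kubectl get nodes lists the node names):

# NODE_NAME is a placeholder; look for the DiskPressure condition in the output
kubectl describe node NODE_NAME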

You can delete all evicted pods with this command:

kubectl get pods --all-namespaces -ojson | jq -r '.items[] | select(.status.reason!=null) | select(.status.reason | contains("Evicted")) | .metadata.name + " " + .metadata.namespace' | xargs -n2 -l bash -c 'kubectl delete pods $0 --namespace=$1'
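
Alternatively, since evicted pods end up in the Failed phase, a shorter command cleans them up; note that this also removes failed pods that were not evicted:

kubectl delete pods --all-namespaces --field-selector=status.phase=Failed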

Apply changes to flux variables

Before installing, you configured cluster variables in your cluster directory in .flux.env. If you change any of these variables after installation, you can apply the changes by following the Step 1: Install core applications instructions until the step kubectl apply -k $CLUSTER_DIR. Then use the following command, which applies the changes to all installed helm releases:

kubectl get -A hr --template '{{range .items}}{{.metadata.namespace}}/{{.metadata.name}}{{"\n"}}{{end}}' | awk -F'/' '{system("flux reconcile -n " $1 " helmrelease " $2)}'
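
Afterwards you can verify that all helm releases reconciled successfully (assuming a flux CLI version that provides this subcommand):

flux get helmreleases --all-namespaces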