Pod eviction due to low resources
Adding a big file to Nextcloud, or simply running the cluster for a long time, can lead to low disk space. When that happens, Kubernetes tries to free up resources by evicting pods, which are then recreated by their corresponding deployments. Since this doesn't actually free up space, the cycle repeats until the cluster falls apart completely. One aggravating factor is that the error logs grow faster and faster, causing even more pods to be evicted, and so on.
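A quick way to check whether this is happening (a sketch, assuming kubectl access to the cluster; these commands need a live cluster to run against) is to list failed/evicted pods and look for resource-pressure conditions on the nodes:

```shell
# Evicted pods end up in phase "Failed" and linger until deleted manually.
kubectl get pods --all-namespaces --field-selector=status.phase=Failed

# Look for DiskPressure / MemoryPressure in the node conditions.
kubectl describe nodes | grep -A 6 'Conditions:'
```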
In my setup the kubelet logfile, located in the rancher-kubelet container, had already grown to 7.1 GB. Apparently no logrotate mechanism takes care of it.
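One thing worth knowing before touching the file: a process keeps writing to a deleted file for as long as it holds the file descriptor, so rm does not return the disk space until kubelet restarts. Truncating the file in place avoids that. A minimal sketch (the path here is a scratch file for demonstration, not the real kubelet log):

```shell
# Demo on a scratch file -- substitute the real kubelet log path in practice
# (in my setup it lived inside the rancher-kubelet container).
LOG=$(mktemp)
head -c 1048576 /dev/zero > "$LOG"   # simulate a 1 MiB logfile
truncate -s 0 "$LOG"                 # empty it in place; the open inode stays valid
ls -lh "$LOG"
```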
Deleting the logfile didn't really help in my case. I had to restart the cluster with RKE and then reinstall some applications, because they didn't start properly while local-storage was unavailable during start-up.
In the end I added two additional arguments to kubelet by modifying the cluster group vars:
case@localhost [03:52:43 PM] [~/src/openappstack/clusters/test] [v0.3 *]
-> % cat group_vars/all/settings.yml
acme_staging: false
admin_email: admin@oas.example.net
cluster_dir: /home/case/src/openappstack/clusters/test
domain: oas.example.net
ip_address: 111.111.111.11
local_flux: true
release_name: test
rke_custom_config:
  services:
    kubelet:
      extra_args:
        eviction-hard: "memory.available<100Mi,nodefs.available<1Gi,imagefs.available<1Gi"
        eviction-minimum-reclaim: "memory.available=0Mi,nodefs.available=0Mi,imagefs.available=0Gi"
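After the cluster has been reprovisioned, it is worth confirming that the new flags actually reached the kubelet process. A sketch, assuming RKE's default container name `kubelet` and docker access on the node:

```shell
# Print the kubelet container's command-line arguments and filter for the
# eviction flags we just configured.
docker inspect kubelet --format '{{range .Args}}{{println .}}{{end}}' | grep eviction
```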
See the Kubernetes documentation on out-of-resource handling: https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/