# Troubleshooting

Note: `cluster$` indicates that the commands should be run as root on your OAS cluster.

If you encounter problems when upgrading your cluster, first make sure that any
new values from `ansible/group_vars/all/settings.yml.example` are included in
your `clusters/YOUR_CLUSTERNAME/group_vars/all/settings.yml`, then rerun the installation.

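
One way to spot settings you have not yet copied over is to compare the key names of both files. The following is a minimal sketch, not part of OAS: the function names are made up for illustration, and it assumes that keys start in the first column in the form `name:`. Nested and commented-out keys are ignored.

```shell
# Print the top-level keys (lines starting with `name:`) of a YAML file.
top_level_keys() {
    grep -oE '^[A-Za-z_][A-Za-z0-9_]*:' "$1" | tr -d ':' | sort -u
}

# Print keys that exist in the example file ($1) but not in your own file ($2).
missing_settings_keys() {
    example_keys=$(mktemp)
    own_keys=$(mktemp)
    top_level_keys "$1" > "$example_keys"
    top_level_keys "$2" > "$own_keys"
    # comm -23 prints the lines that only occur in the first sorted file
    comm -23 "$example_keys" "$own_keys"
    rm -f "$example_keys" "$own_keys"
}
```

Run it from your checkout as `missing_settings_keys ansible/group_vars/all/settings.yml.example clusters/YOUR_CLUSTERNAME/group_vars/all/settings.yml`. No output means every top-level key of the example file is present in your settings.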
## HTTPS Certificates

OAS uses [cert-manager](http://docs.cert-manager.io/en/latest/) to automatically
fetch [Let's Encrypt](https://letsencrypt.org/) certificates for all deployed
services. If you experience invalid SSL certificates (for example, your browser
warns you when visiting Nextcloud at `https://files.YOUR.CLUSTER.DOMAIN`), here's
how to debug this:

Did you create your cluster using the `--acme-staging` argument?
Please check the resulting value of the `acme_staging` key in
`clusters/YOUR_CLUSTERNAME/group_vars/all/settings.yml`. If this is set to `true`, certificates
are fetched from the [Let's Encrypt staging API](https://letsencrypt.org/docs/staging-environment/),
which issues certificates that browsers do not trust by default.
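
The check can be scripted. The helper below is a hypothetical sketch that assumes the key sits on its own line as `acme_staging: true`, optionally followed by a comment. Adjust the pattern if your settings file is laid out differently.

```shell
# Succeed (exit status 0) if acme_staging is set to true in the given file.
acme_staging_enabled() {
    grep -Eqi '^[[:space:]]*acme_staging:[[:space:]]*true[[:space:]]*(#.*)?$' "$1"
}
```

For example: `acme_staging_enabled clusters/YOUR_CLUSTERNAME/group_vars/all/settings.yml && echo "staging certificates in use"`.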

Are all cert-manager pods in the `oas` namespace in the `READY` state?

    cluster$ kubectl -n oas get pods | grep cert-manager
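
To avoid reading the `READY` column by eye, you can filter for pods whose ready count differs from the expected count. This is a sketch only: `not_ready_pods` is a made-up helper name, and it relies on the default `kubectl get pods` column order (`NAME`, `READY`, `STATUS`, ...).

```shell
# Read `kubectl get pods` output on stdin and print the name, READY value and
# status of every pod matching the pattern ($1) whose READY column is not n/n.
not_ready_pods() {
    awk -v pat="$1" '
        $1 ~ pat && split($2, r, "/") == 2 && r[1] != r[2] { print $1, $2, $3 }
    '
}

# Usage on the cluster:
#   cluster$ kubectl -n oas get pods | not_ready_pods cert-manager
```

Empty output means all matching pods report themselves as ready.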

Are there any `cm-acme-http-solver-*` pods still running, indicating that there
are unfinished certificate requests?

    cluster$ kubectl get pods --all-namespaces | grep cm-acme-http-solver
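
If several solver pods show up, grouping them by namespace can hint at which part of the installation is still waiting for its certificate. The helper below is an illustrative sketch. It assumes the column order of `kubectl get pods --all-namespaces`, where the namespace comes first and the pod name second.

```shell
# Read `kubectl get pods --all-namespaces` output on stdin and print one line
# per namespace: the namespace followed by its number of solver pods.
solver_pods_per_namespace() {
    awk '$2 ~ /^cm-acme-http-solver-/ { n[$1]++ }
         END { for (ns in n) print ns, n[ns] }' | sort
}

# Usage on the cluster:
#   cluster$ kubectl get pods --all-namespaces | solver_pods_per_namespace
```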

Show the logs of the main `cert-manager` pod:

    cluster$ kubectl -n oas logs -l "app.kubernetes.io/name=cert-manager"

You can `grep` for your cluster domain or for any specific subdomain to narrow
down results.
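
A small sketch of such a filter follows. The function name is hypothetical. It uses `grep -F` so that the dots in the domain are matched literally rather than as regular expression wildcards. Note that `kubectl logs` only prints the last 10 lines per pod when a label selector is given, so the usage example adds `--tail=-1` to fetch the complete log.

```shell
# Keep only the log lines on stdin that contain the given domain ($1) literally.
filter_logs_for_domain() {
    grep -F -- "$1"
}

# Usage on the cluster:
#   cluster$ kubectl -n oas logs -l "app.kubernetes.io/name=cert-manager" --tail=-1 \
#       | filter_logs_for_domain files.YOUR.CLUSTER.DOMAIN
```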


## Purge OAS and install from scratch

If things ever fail beyond recovery, here's how to completely purge an OAS installation and start from scratch:

    cluster$ apt purge docker-ce-cli containerd.io
    cluster$ mount | grep -E '^(tmpfs.*kubelet|nsfs.*docker)' | cut -d' ' -f 3 | xargs -r umount
    cluster$ systemctl reboot
    cluster$ rm -rf /var/lib/docker /var/lib/OpenAppStack /etc/kubernetes /var/lib/etcd /var/lib/rancher /var/lib/kubelet /var/log/OpenAppStack /var/log/containers /var/log/pods
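
Because `rm -rf` is irreversible, it can help to list which of these paths actually exist before deleting them, and to confirm afterwards that nothing is left. The helper below is a sketch with a made-up name. It only reads the file system and deletes nothing.

```shell
# Print every path given as an argument that currently exists.
existing_paths() {
    for path in "$@"; do
        if [ -e "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
}

# Usage on the cluster (same paths as in the rm command above):
#   cluster$ existing_paths /var/lib/docker /var/lib/OpenAppStack /etc/kubernetes \
#       /var/lib/etcd /var/lib/rancher /var/lib/kubelet /var/log/OpenAppStack \
#       /var/log/containers /var/log/pods
```

After a successful purge, the command prints nothing.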