# Troubleshooting

Note: `cluster$` indicates that the commands should be run as root on your OAS cluster.

## Run the CLI tests

To get an overall status of your cluster you can run the tests from the command line. There are two types of tests: [testinfra](https://testinfra.readthedocs.io/en/latest/) tests and [behave](https://behave.readthedocs.io/en/latest/) tests.

### Testinfra tests

Testinfra tests are split into two groups, let's call them `blackbox` and `whitebox` tests. The blackbox tests run on your provisioning machine and test the OAS cluster from the outside. For example, the certificate check verifies that the OAS cluster returns valid certificates for the provided services. The whitebox tests run on the OAS host itself and check, for example, whether Docker is installed in the right version.

To run the tests against your cluster, first export the `CLUSTER_DIR` environment variable with the location of your cluster config directory:

    export CLUSTER_DIR="../clusters/CLUSTERNAME"

Run all tests:

    py.test -s --ansible-inventory=${CLUSTER_DIR}/inventory.yml --hosts='ansible://*'

#### Advanced usage

Specify a host manually:

    py.test -s --hosts='ssh://root@example.openappstack.net'

Run only tests tagged with `prometheus`:

    py.test -s --ansible-inventory=${CLUSTER_DIR}/inventory.yml --hosts='ansible://*' -m prometheus

Run the cert test manually using the ansible inventory file:

    py.test -s --ansible-inventory=${CLUSTER_DIR}/inventory.yml --hosts='ansible://*' -m certs

Run the cert test manually against a different cluster, not configured in any ansible inventory file, either by using pytest:

    FQDN='example.openappstack.net' py.test -sv -m 'certs'

or by calling the test file directly:

    FQDN='example.openappstack.net' pytest/test_certs.py

#### Known Issues

- The default ssh backend for testinfra tests is `paramiko`, which doesn't work out of the box: it fails to connect to the host because the `ed25519` host key was not verified. Therefore we need to force plain `ssh://` with either `connection=ssh` or `--hosts=ssh://…`.

#### Running tests with a local gitlab-runner docker executor

Export the following environment variables like this:

    export CI_REGISTRY_IMAGE='open.greenhost.net:4567/openappstack/openappstack'
    export SSH_PRIVATE_KEY="$(cat ~/.ssh/id_ed25519_oas_ci)"
    export COSMOS_API_TOKEN='…'

then run:

    gitlab-runner exec docker --env CI_REGISTRY_IMAGE="$CI_REGISTRY_IMAGE" --env SSH_PRIVATE_KEY="$SSH_PRIVATE_KEY" --env COSMOS_API_TOKEN="$COSMOS_API_TOKEN" bootstrap

### Behave tests

Behave tests run in a headless browser and check that all the interfaces are up and running and correctly connected to each other. They are integrated into the `openappstack` CLI command suite. To run the behave tests, run the following command in this repository:

    python -m openappstack CLUSTERNAME test

In the future, this command will run all tests, but currently only the *behave* tests are implemented. To learn more about the `test` subcommand, run:

    python -m openappstack CLUSTERNAME test --help

## Upgrading

If you encounter problems after upgrading your cluster, first make sure to merge all potential new values from `ansible/group_vars/all/settings.yml.example` into your `clusters/YOUR_CLUSTERNAME/group_vars/all/settings.yml`, and rerun the installation script. A diff sketch for spotting new values follows below.
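To spot keys that are new or have changed, you can diff your cluster settings against the example file. This is a generic sketch, assuming you run it from the root of this repository; adapt the paths to your setup:

    diff -u clusters/YOUR_CLUSTERNAME/group_vars/all/settings.yml ansible/group_vars/all/settings.yml.example

Lines prefixed with `+` exist only in the example file and are candidates to copy into your cluster settings.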
## HTTPS Certificates

OAS uses [cert-manager](http://docs.cert-manager.io/en/latest/) to automatically fetch [Let's Encrypt](https://letsencrypt.org/) certificates for all deployed services. If you experience invalid SSL certificates, i.e. your browser warns you when visiting Nextcloud (`https://files.YOUR.CLUSTER.DOMAIN`), here's how to debug this:

Did you create your cluster using the `--acme-staging` argument? Please check the resulting value of the `acme_staging` key in `clusters/YOUR_CLUSTERNAME/group_vars/all/settings.yml`. If this is set to `true`, certificates are fetched from the [Let's Encrypt staging API](https://letsencrypt.org/docs/staging-environment/), which can't be validated by default in your browser.

Are all cert-manager pods in the `oas` namespace in the `READY` state?

    cluster$ kubectl -n oas get pods | grep cert-manager

Are there any `cm-acme-http-solver-*` pods still running, indicating that there are unfinished certificate requests?

    cluster$ kubectl get pods --all-namespaces | grep cm-acme-http-solver

Show the logs of the main `cert-manager` pod:

    cluster$ kubectl -n oas logs -l "app.kubernetes.io/name=cert-manager"

You can `grep` for your cluster domain or for any specific subdomain to narrow down the results.

## Purge OAS and install from scratch

If ever things fail beyond possible recovery, here's how to completely purge an OAS installation in order to start from scratch:

    cluster$ apt purge docker-ce-cli containerd.io
    cluster$ mount | egrep '^(tmpfs.*kubelet|nsfs.*docker)' | cut -d' ' -f 3 | xargs umount
    cluster$ systemctl reboot
    cluster$ rm -rf /var/lib/docker /var/lib/OpenAppStack /etc/kubernetes /var/lib/etcd /var/lib/rancher /var/lib/kubelet /var/log/OpenAppStack /var/log/containers /var/log/pods
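As an optional sanity check after the reboot and the final `rm -rf` (this is not part of the official purge procedure, just a quick way to confirm the cleanup under the assumptions above), you can verify that no Kubernetes or Docker state is left behind:

    cluster$ mount | egrep '(kubelet|docker)'
    cluster$ ls /var/lib/docker /var/lib/rancher /etc/kubernetes

Both commands should come up empty: the first should print no mounts, and the second should report `No such file or directory` for each path.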