Consider allowing nextcloud to run when postStart fails
Currently, when any step in our postStart script fails, the nextcloud pod crashes. Depending on why the script failed, this may be too harsh.
For example, we recently had
- a case where nextcloud ended up in maintenance mode. We are not sure why; maybe an upgrade was ended prematurely by the liveness probe. In this state the postStart script fails consistently, which prevents nextcloud from working at all.
- a case where a plugin upgrade was unsuccessful (possibly because the plugin was not compatible with the installed nextcloud version).
In both cases, it would be better to allow nextcloud to start. In the second case that would allow most or all usage of nextcloud to continue. In the first case, this would make it substantially easier to get nextcloud out of maintenance mode and back on the road.
The downside of this change would be that problems could go unnoticed. So instead of crashing the pod, we should add a mechanism that raises a prometheus alert whenever a step in the postStart script fails.
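
One possible shape for such a mechanism, as a rough sketch only: run every postStart step regardless of earlier failures, record which ones failed, and write the result as a gauge in the Prometheus text exposition format so an alert rule can fire on it. Everything named here is an assumption and not taken from our current script: the metric name `nextcloud_poststart_step_failed`, the helper names, the use of Python instead of shell, and the idea that the file gets scraped (for example through a textfile collector, if one is available in our setup).

```python
import os
import subprocess
import tempfile

# Hypothetical metric name; pick whatever fits our naming scheme.
METRIC = "nextcloud_poststart_step_failed"


def run_steps(steps):
    """Run every step, even after a failure; return {step name: exit code}."""
    results = {}
    for name, command in steps.items():
        try:
            results[name] = subprocess.run(command, check=False).returncode
        except OSError:
            # Command missing or not executable: count it as a failure.
            results[name] = 127
    return results


def render_metrics(results):
    """Render one gauge per step: 1 if the step failed, 0 if it succeeded."""
    lines = [
        f"# HELP {METRIC} 1 if the postStart step exited non-zero, else 0.",
        f"# TYPE {METRIC} gauge",
    ]
    for name, code in results.items():
        lines.append(f'{METRIC}{{step="{name}"}} {int(code != 0)}')
    return "\n".join(lines) + "\n"


def write_metrics(path, text):
    """Write via a temp file and rename, so a reader never sees a partial file."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    os.chmod(tmp, 0o644)  # mkstemp creates the file as 0600
    os.replace(tmp, path)


def main(steps, metrics_path):
    """Run all steps, publish the metrics, and report success to the caller."""
    write_metrics(metrics_path, render_metrics(run_steps(steps)))
    return 0  # step failures are reported through the metric, not the exit code
```

The script would call `main` with a mapping of step names to commands (for instance `{"upgrade": ["php", "occ", "upgrade"]}`, as a placeholder for our real steps) and a path to the metrics file. An alert rule on `nextcloud_poststart_step_failed == 1` would then take over the role that the pod crash plays today.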