Certificate status doesn't recover from failed state

We have had a lot of failed certs in CI since we switched to ZeroSSL, and I hope this gets better when we move to another CA. However, there also seems to be something wrong with cert-manager. I was investigating the droplet used for !681 (merged), where only the sso cert is still not valid. I wrote a script that summarizes cert-manager resources (attached: k8s_debug_cert_manager.sh). The output shows that only the certificate resource itself is failing, although the secret already holds the proper ZeroSSL cert:

❯ ~/bin/k8s_debug_cert_manager.sh 
Issuers:
No resources found in default namespace.

ClusterIssuers:
NAME                 READY   AGE
letsencrypt-issuer   True    3h48m
zerossl-issuer       True    3h48m

Challenges:
No resources found

Orders:
NAMESPACE   NAME                                                              STATE   AGE
stackspin   dashboard.1077-possible-loki-mem-leak.ci.stackspin.n-2821898803   valid   3h44m
stackspin   alertmanager-tls-952q9-2580165342                                 valid   3h47m
stackspin   single-sign-on-userpanel.tls-9q2rj-4061664042                     valid   3h48m
stackspin   grafana-tls-w5dbf-1045808427                                      valid   3h47m
stackspin   prometheus-tls-7bc44-2182223537                                   valid   100m
stackspin   hydra-public.tls-hwgz9-4285727226                                 valid   39m

Certificaterequest:
NAMESPACE   NAME                                                         APPROVED   DENIED   READY   ISSUER           REQUESTOR                                         AGE
stackspin   dashboard.1077-possible-loki-mem-leak.ci.stackspin.n-bcr89   True                True    zerossl-issuer   system:serviceaccount:cert-manager:cert-manager   3h44m
stackspin   alertmanager-tls-952q9                                       True                True    zerossl-issuer   system:serviceaccount:cert-manager:cert-manager   3h47m
stackspin   single-sign-on-userpanel.tls-9q2rj                           True                True    zerossl-issuer   system:serviceaccount:cert-manager:cert-manager   3h48m
stackspin   grafana-tls-w5dbf                                            True                True    zerossl-issuer   system:serviceaccount:cert-manager:cert-manager   3h47m
stackspin   prometheus-tls-7bc44                                         True                True    zerossl-issuer   system:serviceaccount:cert-manager:cert-manager   100m
stackspin   hydra-public.tls-hwgz9                                       True                True    zerossl-issuer   system:serviceaccount:cert-manager:cert-manager   39m

Certificates:
NAMESPACE   NAME                                                         READY   SECRET                                                       AGE
stackspin   dashboard.1077-possible-loki-mem-leak.ci.stackspin.net-tls   True    dashboard.1077-possible-loki-mem-leak.ci.stackspin.net-tls   3h44m
stackspin   alertmanager-tls                                             True    alertmanager-tls                                             3h47m
stackspin   single-sign-on-userpanel.tls                                 True    single-sign-on-userpanel.tls                                 3h48m
stackspin   grafana-tls                                                  True    grafana-tls                                                  3h47m
stackspin   prometheus-tls                                               True    prometheus-tls                                               3h47m
stackspin   hydra-public.tls                                             False   hydra-public.tls                                             3h48m

Certificate resource issuers:
dashboard.1077-possible-loki-mem-leak.ci.stackspin.net-tls: zerossl-issuer
alertmanager-tls: zerossl-issuer
single-sign-on-userpanel.tls: zerossl-issuer
grafana-tls: zerossl-issuer
prometheus-tls: zerossl-issuer
hydra-public.tls: zerossl-issuer

Secrets:
NAMESPACE     NAME                                                         TYPE                DATA   AGE
kube-system   k3s-serving                                                  kubernetes.io/tls   2      3h52m
stackspin     dashboard.1077-possible-loki-mem-leak.ci.stackspin.net-tls   kubernetes.io/tls   2      3h41m
stackspin     alertmanager-tls                                             kubernetes.io/tls   2      3h40m
stackspin     single-sign-on-userpanel.tls                                 kubernetes.io/tls   2      3h40m
stackspin     grafana-tls                                                  kubernetes.io/tls   2      3h35m
stackspin     prometheus-tls                                               kubernetes.io/tls   2      99m

Certificate secrets and actual issuers:
kube-system/k3s-serving: issuer=CN = k3s-server-ca@1639391072 subject=O = k3s, CN = k3s
stackspin/dashboard.1077-possible-loki-mem-leak.ci.stackspin.net-tls: issuer=C = AT, O = ZeroSSL, CN = ZeroSSL RSA Domain Secure Site CA subject=CN = dashboard.1077-possible-loki-mem-leak.ci.stackspin.net
stackspin/alertmanager-tls: issuer=C = AT, O = ZeroSSL, CN = ZeroSSL RSA Domain Secure Site CA subject=CN = alertmanager.1077-possible-loki-mem-leak.ci.stackspin.net
stackspin/single-sign-on-userpanel.tls: issuer=C = AT, O = ZeroSSL, CN = ZeroSSL RSA Domain Secure Site CA subject=CN = admin.1077-possible-loki-mem-leak.ci.stackspin.net
stackspin/grafana-tls: issuer=C = AT, O = ZeroSSL, CN = ZeroSSL RSA Domain Secure Site CA subject=CN = grafana.1077-possible-loki-mem-leak.ci.stackspin.net
stackspin/prometheus-tls: issuer=C = AT, O = ZeroSSL, CN = ZeroSSL RSA Domain Secure Site CA subject=CN = prometheus.1077-possible-loki-mem-leak.ci.stackspin.net

I don't know how to trigger another evaluation of the cert resource, since everything else is there.
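
As a starting point for narrowing this down, here is a minimal sketch of a helper that picks out the certificates that are not ready from the `kubectl get certificates` table, so their status conditions can be inspected one by one. The function name `not_ready_certs` is hypothetical and not part of the attached script. It assumes the column layout shown in the output above (NAMESPACE, NAME, READY, ...).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of k8s_debug_cert_manager.sh.
# Reads the table printed by `kubectl get certificates --all-namespaces`
# on stdin and prints "namespace/name" for every certificate whose READY
# column is anything other than "True". The header line is skipped.
not_ready_certs() {
  awk 'NR > 1 && NF && $3 != "True" { print $1 "/" $2 }'
}

# Example usage against a live cluster (requires kubectl access):
#   kubectl get certificates --all-namespaces | not_ready_certs
#   kubectl describe certificate hydra-public.tls -n stackspin
```

The `describe` output should list the conditions and recent events on the Certificate resource, which may explain why it stays in the failed state. If the goal is only to force a new issuance attempt, cert-manager's CLI offers `cmctl renew <name> -n <namespace>`; whether that clears this particular state is an assumption I have not verified.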