Certificate status doesn't recover from failed state
We have had a lot of failed certificates in CI since we switched to ZeroSSL, and I hope this improves once we move to another CA. However, there also seems to be something wrong with cert-manager itself. I was investigating the droplet used for !681 (merged), where only the SSO certificate (hydra-public.tls) is still not valid. I wrote a script that summarizes the cert-manager resources (attached: k8s_debug_cert_manager.sh). Its output shows that only the Certificate resource itself is failing, although its Order is already valid and its CertificateRequest is Ready:
❯ ~/bin/k8s_debug_cert_manager.sh
Issuers:
No resources found in default namespace.
ClusterIssuers:
NAME                 READY   AGE
letsencrypt-issuer   True    3h48m
zerossl-issuer       True    3h48m
Challenges:
No resources found
Orders:
NAMESPACE   NAME                                                              STATE   AGE
stackspin   dashboard.1077-possible-loki-mem-leak.ci.stackspin.n-2821898803   valid   3h44m
stackspin   alertmanager-tls-952q9-2580165342                                 valid   3h47m
stackspin   single-sign-on-userpanel.tls-9q2rj-4061664042                     valid   3h48m
stackspin   grafana-tls-w5dbf-1045808427                                      valid   3h47m
stackspin   prometheus-tls-7bc44-2182223537                                   valid   100m
stackspin   hydra-public.tls-hwgz9-4285727226                                 valid   39m
Certificaterequest:
NAMESPACE   NAME                                                         APPROVED   DENIED   READY   ISSUER           REQUESTOR                                         AGE
stackspin   dashboard.1077-possible-loki-mem-leak.ci.stackspin.n-bcr89   True                True    zerossl-issuer   system:serviceaccount:cert-manager:cert-manager   3h44m
stackspin   alertmanager-tls-952q9                                       True                True    zerossl-issuer   system:serviceaccount:cert-manager:cert-manager   3h47m
stackspin   single-sign-on-userpanel.tls-9q2rj                           True                True    zerossl-issuer   system:serviceaccount:cert-manager:cert-manager   3h48m
stackspin   grafana-tls-w5dbf                                            True                True    zerossl-issuer   system:serviceaccount:cert-manager:cert-manager   3h47m
stackspin   prometheus-tls-7bc44                                         True                True    zerossl-issuer   system:serviceaccount:cert-manager:cert-manager   100m
stackspin   hydra-public.tls-hwgz9                                       True                True    zerossl-issuer   system:serviceaccount:cert-manager:cert-manager   39m
Certificates:
NAMESPACE   NAME                                                         READY   SECRET                                                       AGE
stackspin   dashboard.1077-possible-loki-mem-leak.ci.stackspin.net-tls   True    dashboard.1077-possible-loki-mem-leak.ci.stackspin.net-tls   3h44m
stackspin   alertmanager-tls                                             True    alertmanager-tls                                             3h47m
stackspin   single-sign-on-userpanel.tls                                 True    single-sign-on-userpanel.tls                                 3h48m
stackspin   grafana-tls                                                  True    grafana-tls                                                  3h47m
stackspin   prometheus-tls                                               True    prometheus-tls                                               3h47m
stackspin   hydra-public.tls                                             False   hydra-public.tls                                             3h48m
Certificate resource issuers:
dashboard.1077-possible-loki-mem-leak.ci.stackspin.net-tls: zerossl-issuer
alertmanager-tls: zerossl-issuer
single-sign-on-userpanel.tls: zerossl-issuer
grafana-tls: zerossl-issuer
prometheus-tls: zerossl-issuer
hydra-public.tls: zerossl-issuer
Secrets:
NAMESPACE     NAME                                                         TYPE                DATA   AGE
kube-system   k3s-serving                                                  kubernetes.io/tls   2      3h52m
stackspin     dashboard.1077-possible-loki-mem-leak.ci.stackspin.net-tls   kubernetes.io/tls   2      3h41m
stackspin     alertmanager-tls                                             kubernetes.io/tls   2      3h40m
stackspin     single-sign-on-userpanel.tls                                 kubernetes.io/tls   2      3h40m
stackspin     grafana-tls                                                  kubernetes.io/tls   2      3h35m
stackspin     prometheus-tls                                               kubernetes.io/tls   2      99m
Certificate secrets and actual issuers:
kube-system/k3s-serving: issuer=CN = k3s-server-ca@1639391072 subject=O = k3s, CN = k3s
stackspin/dashboard.1077-possible-loki-mem-leak.ci.stackspin.net-tls: issuer=C = AT, O = ZeroSSL, CN = ZeroSSL RSA Domain Secure Site CA subject=CN = dashboard.1077-possible-loki-mem-leak.ci.stackspin.net
stackspin/alertmanager-tls: issuer=C = AT, O = ZeroSSL, CN = ZeroSSL RSA Domain Secure Site CA subject=CN = alertmanager.1077-possible-loki-mem-leak.ci.stackspin.net
stackspin/single-sign-on-userpanel.tls: issuer=C = AT, O = ZeroSSL, CN = ZeroSSL RSA Domain Secure Site CA subject=CN = admin.1077-possible-loki-mem-leak.ci.stackspin.net
stackspin/grafana-tls: issuer=C = AT, O = ZeroSSL, CN = ZeroSSL RSA Domain Secure Site CA subject=CN = grafana.1077-possible-loki-mem-leak.ci.stackspin.net
stackspin/prometheus-tls: issuer=C = AT, O = ZeroSSL, CN = ZeroSSL RSA Domain Secure Site CA subject=CN = prometheus.1077-possible-loki-mem-leak.ci.stackspin.net
I don't know how to make cert-manager re-evaluate the Certificate resource, since all the other resources it depends on are already in place.
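To make the cross-check in the debug output repeatable, here is a minimal Python sketch that reads the Certificates and Secrets tables as printed by the script and lists every Certificate that is not Ready, together with whether the secret named in its SECRET column exists. The helper names `parse_rows` and `stuck_certificates` are hypothetical and not part of the attached script. The parser assumes whitespace-separated cells with no empty columns, which holds for these two tables but not for the CertificateRequest table, whose DENIED column is empty.

```python
"""Hypothetical helper: flag Certificates that are not Ready and check
whether the secret they should write to exists.

Assumes `kubectl get ... -A` style tables whose cells contain no spaces
and are never empty (true for the Certificates and Secrets tables).
"""


def parse_rows(table: str) -> list[dict[str, str]]:
    """Turn a whitespace-separated table into one dict per row, keyed by header."""
    lines = [line.split() for line in table.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def stuck_certificates(certs_table: str, secrets_table: str) -> list[dict[str, object]]:
    """Return non-Ready Certificates and whether their target secret is present."""
    existing = {(s["NAMESPACE"], s["NAME"]) for s in parse_rows(secrets_table)}
    stuck = []
    for cert in parse_rows(certs_table):
        if cert["READY"] != "True":
            stuck.append({
                "certificate": f"{cert['NAMESPACE']}/{cert['NAME']}",
                # The SECRET column names the secret this Certificate writes to.
                "secret_exists": (cert["NAMESPACE"], cert["SECRET"]) in existing,
            })
    return stuck


# Excerpt of the two tables from the debug output above.
CERTS = """\
NAMESPACE NAME READY SECRET AGE
stackspin alertmanager-tls True alertmanager-tls 3h47m
stackspin prometheus-tls True prometheus-tls 3h47m
stackspin hydra-public.tls False hydra-public.tls 3h48m
"""

SECRETS = """\
NAMESPACE NAME TYPE DATA AGE
kube-system k3s-serving kubernetes.io/tls 2 3h52m
stackspin alertmanager-tls kubernetes.io/tls 2 3h40m
stackspin prometheus-tls kubernetes.io/tls 2 99m
"""

if __name__ == "__main__":
    for entry in stuck_certificates(CERTS, SECRETS):
        print(entry)
    # → {'certificate': 'stackspin/hydra-public.tls', 'secret_exists': False}
```

Splitting on whitespace keeps the sketch short; a more robust variant would read the resources with `kubectl get ... -o json` instead of parsing the human-readable tables.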