A TLS SAN quirk that broke mTLS
One of our backend services rejected a client cert after a renewal. The new cert verified fine with openssl verify. It chained to the same CA. It had the same CN. But the server said tls: bad certificate. Turned out the problem was in the SAN.
The setup
We do mTLS between internal services. The CA is an offline root plus an online intermediate. Each service has a cert with CN equal to its hostname and a SAN list with the same hostname plus any DNS aliases it might be reached by. Our cert template for issuance was an openssl config that read:
[req]
prompt = no
distinguished_name = dn
req_extensions = ext
[dn]
CN = service-a.internal
[ext]
subjectAltName = @alt
[alt]
DNS.1 = service-a.internal
DNS.2 = service-a
IP.1 = 10.20.30.40
When we renewed the cert last Tuesday, the resulting cert verified fine and the service was redeployed. Five hours later a scheduled job called into this service from another cluster and failed:
x509: certificate is valid for service-a.internal, service-a, 10.20.30.40, not service-a.internal
Yes, that error message says the cert is valid for service-a.internal and not valid for service-a.internal at the same time. Fun.
Digging in
I dumped the cert with openssl:
openssl x509 -in /etc/tls/service-a.pem -text -noout | grep -A3 'Subject Alternative'
# X509v3 Subject Alternative Name:
# DNS:service-a.internal, DNS:service-a, IP Address:10.20.30.40
Looks right. I dumped the old cert from the backup. Looks identical. I ran a byte diff of the two cert files; of course they differ, because the validity dates and signature change on every renewal. I pulled the SAN extension as raw DER and diffed that:
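openssl asn1parse -in /etc/tls/service-a.pem | grep -A2 'X509v3 Subject Alternative Name'
openssl asn1parse -in /etc/tls/service-a.pem -strparse OFFSET  # OFFSET: the number at the start of the OCTET STRING line above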
On the working cert, the SAN DNS entries were IA5STRING encoded. On the broken cert, the DNS entries were UTF8STRING encoded. That is a subtle violation of RFC 5280, which specifies DNS names as IA5String in the SAN extension. Go’s crypto/x509 strictly validates this and rejects UTF8String.
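The "diff the raw DER" step can also be done without eyeballing two dumps. Below is a minimal sketch in Python that reports the first byte where two DER blobs diverge. The function names and the sample inputs are mine, not from our tooling; the inputs are a one-byte string under universal tag 0x16 (IA5String) and 0x0c (UTF8String), chosen only to show how a one-byte tag change hides behind identical-looking text.

```python
def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if a == b."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One blob is a prefix of the other: they differ where the shorter ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def show_difference(a: bytes, b: bytes, context: int = 4) -> str:
    """Describe the first difference with a few bytes of hex context."""
    at = first_difference(a, b)
    if at is None:
        return "identical"
    lo = max(0, at - context)
    hi = at + context + 1
    return f"offset {at}: {a[lo:hi].hex(' ')} vs {b[lo:hi].hex(' ')}"


# Illustrative inputs only, not bytes from the real certs.
ia5 = bytes.fromhex("16 01 61")   # tag 0x16, length 1, content "a"
utf8 = bytes.fromhex("0c 01 61")  # tag 0x0c, length 1, content "a"
print(show_difference(ia5, utf8))  # prints: offset 0: 16 01 61 vs 0c 01 61
```

The content bytes are the same in both blobs, which is why a text dump of the names shows no difference.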
Why did the renewal do this?
Our issuance pipeline had been updated. The new pipeline used a newer version of cfssl, and the bundle template had a minor change that caused DNS entries to be emitted as UTF8String instead of IA5String. openssl accepts both because openssl is famously permissive. Go does not. Python’s ssl module also accepts both. Java accepts both. So for a long time we did not notice, because our clients were Python and Java. The job that failed was a Go binary.
The fix
Two parts:
Fix the cfssl template to emit IA5String. The relevant cfssl config snippet:
{
  "signing": {
    "default": {
      "usages": ["signing", "key encipherment", "server auth", "client auth"],
      "expiry": "8760h"
    }
  }
}
cfssl does not directly expose the string encoding choice, so there was no setting to flip. Instead we pinned cfssl to the previous version, which did not have the regression, until a newer release settled.
Add a lint to our CI pipeline that parses every issued cert and fails if any SAN entry is encoded as anything other than IA5String or iPAddress. A short Python script:
from cryptography import x509
from cryptography.hazmat.backends import default_backend

# Load the PEM cert and pull out its SAN extension.
with open('cert.pem', 'rb') as f:
    cert = x509.load_pem_x509_certificate(f.read(), default_backend())
san = cert.extensions.get_extension_for_class(x509.SubjectAlternativeName)

# Fail the build on any DNS name that is not plain ASCII.
for v in san.value:
    if isinstance(v, x509.DNSName) and not v.value.isascii():
        raise SystemExit(f"non-ascii DNS name: {v.value}")
(The real check goes one level lower and pokes at the encoding; the cryptography lib abstracts that away, but parsing the DER directly with asn1crypto catches it.)
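For a sense of what a lower-level check can look like, here is a rough sketch that uses only the Python standard library. It is not our production lint, and every name in it is made up for illustration. It assumes the SAN extension can be located by searching the DER for the encoded OID 2.5.29.17 (a shortcut, not a full X.509 parse), then walks the GeneralName entries and flags any entry that is not a dNSName (context tag 0x82) or iPAddress (context tag 0x87), plus any dNSName whose bytes fall outside the 7-bit IA5 range.

```python
import base64
import re

SAN_OID = bytes.fromhex("0603551d11")  # DER encoding of OID 2.5.29.17
DNS_NAME = 0x82    # GeneralName dNSName, context tag [2]
IP_ADDRESS = 0x87  # GeneralName iPAddress, context tag [7]


def read_tlv(buf, pos):
    """Return (tag, value_start, value_end) for the DER element at pos."""
    tag = buf[pos]
    first = buf[pos + 1]
    if first < 0x80:  # short-form length
        length, start = first, pos + 2
    else:             # long-form length: low 7 bits give the byte count
        n = first & 0x7F
        length = int.from_bytes(buf[pos + 2:pos + 2 + n], "big")
        start = pos + 2 + n
    return tag, start, start + length


def pem_to_der(pem_text):
    """Decode the first PEM certificate block in pem_text to DER bytes."""
    m = re.search(
        r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----",
        pem_text, re.S)
    if m is None:
        raise ValueError("no PEM certificate block found")
    return base64.b64decode("".join(m.group(1).split()))


def lint_san(der):
    """Return a list of problems found in the SAN extension of a DER cert."""
    at = der.find(SAN_OID)  # heuristic: first occurrence of the OID bytes
    if at < 0:
        return ["no subjectAltName extension found"]
    tag, start, end = read_tlv(der, at + len(SAN_OID))
    if tag == 0x01:  # optional BOOLEAN 'critical' flag, skip it
        tag, start, end = read_tlv(der, end)
    if tag != 0x04:  # the extension value must be an OCTET STRING
        return ["subjectAltName value is not an OCTET STRING"]
    tag, pos, seq_end = read_tlv(der, start)
    if tag != 0x30:  # ...wrapping a SEQUENCE of GeneralName
        return ["subjectAltName does not contain a SEQUENCE"]
    problems = []
    while pos < seq_end:
        tag, start, end = read_tlv(der, pos)
        value = der[start:end]
        if tag == DNS_NAME:
            if any(b > 0x7F for b in value):
                problems.append(f"dNSName with non-IA5 bytes: {value!r}")
        elif tag != IP_ADDRESS:
            problems.append(f"unexpected SAN entry tag 0x{tag:02x}")
        pos = end
    return problems
```

In CI this would be called as `lint_san(pem_to_der(text))` for each issued cert, failing the build when the returned list is non-empty. Because the OID search is a shortcut, a real pipeline should prefer a proper ASN.1 parser.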
The wider lesson
I now assume openssl and Go will disagree on some aspect of any cert that came out of a non-trivial pipeline. When we suspect cert weirdness I run this set of cross-checks:
openssl s_client -connect service-a.internal:443 -cert client.pem -key client.key < /dev/null
echo | go run verify.go -cert client.pem -server service-a.internal:443
python -c "import ssl; ssl.create_default_context().load_cert_chain('client.pem','client.key'); ..."
The one that disagrees is the one telling the truth. Go and Rust are usually the strictest. Python and Java are usually permissive. openssl often accepts encodings that the standards do not allow.
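Running those by hand gets old, so the comparison itself is easy to script. This is a minimal sketch, assuming each checker exits non-zero when it rejects the cert (not a given for every tool, so check each one's flags). The `cross_check` and `split_verdicts` names are hypothetical, and the command lists you pass in are whatever verifiers you have on hand.

```python
import subprocess
import sys


def cross_check(commands, timeout=30):
    """Run each named command; map name -> True if it exited with code 0."""
    results = {}
    for name, argv in commands.items():
        try:
            proc = subprocess.run(
                argv,
                stdin=subprocess.DEVNULL,  # same effect as '< /dev/null'
                capture_output=True,
                timeout=timeout,
            )
            results[name] = proc.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            # A missing binary or a hang counts as a rejection here.
            results[name] = False
    return results


def split_verdicts(results):
    """Return (accepted, rejected) as sorted lists of checker names."""
    accepted = sorted(n for n, ok in results.items() if ok)
    rejected = sorted(n for n, ok in results.items() if not ok)
    return accepted, rejected


# Stand-in commands so the sketch runs anywhere; swap in real verifiers.
demo = {
    "lenient": [sys.executable, "-c", "raise SystemExit(0)"],
    "strict": [sys.executable, "-c", "raise SystemExit(1)"],
}
accepted, rejected = split_verdicts(cross_check(demo))
if accepted and rejected:
    print(f"parsers disagree: accepted={accepted} rejected={rejected}")
```

When both lists are non-empty, the rejecting tools are the ones to read the error output from first.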
Reflection
Certificate issuance tooling is a place where subtle bugs have a way of being invisible for months. I have lost a full day to a cert with the wrong key usage, a cert with a weird critical-or-not boolean on an extension, and now a cert with the wrong string type. Every time the lesson is the same: lint your certs with multiple parsers in CI.
If your internal PKI is not doing this yet, it takes about two hours to set up and has paid me back at least three times. Related: see my post on DNS-01 challenges with a split-horizon DNS for adventures in Let’s Encrypt specifically.