Problem with ssl:connect after upgrading Erlang/OTP to 24.3.4

I have an application that uses custom certificates, and it breaks when upgrading from OTP 24.3.3 to 24.3.4.

I’ve been able to reduce the code to the following minimal reproduction:

$ ./24.3.4/bin/erl
Erlang/OTP 24 [erts-12.3.2] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [jit]

Eshell V12.3.2  (abort with ^G)
1> ssl:start().
ok
2> ssl:connect(
2>   "some.url",
2>   9094,
2>   [
2>     {verify, verify_none},
2>     {cacertfile, "/tmp/certificate_chain"},
2>     {keyfile, "/tmp/key"},
2>     {certfile, "/tmp/certificate"},
2>     {password, "the_password"}
2>   ],
2>   5000
2> ).
=NOTICE REPORT==== 26-May-2022::12:26:02.100208 ===
TLS client: In state hello at ssl_handshake.erl:892 generated CLIENT ALERT: Fatal - Handshake Failure
 - {unknown_or_malformed_handshake,13}
{error,{tls_alert,{handshake_failure,"TLS client: In state hello at ssl_handshake.erl:892 generated CLIENT ALERT: Fatal - Handshake Failure\n {unknown_or_malformed_handshake,13}"}}}
3>

But on the previous version, 24.3.3, the same code works:

$ ./24.3.3/bin/erl
Erlang/OTP 24 [erts-12.3.1] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [jit]

Eshell V12.3.1  (abort with ^G)
1> ssl:start().
ok
2> ssl:connect(
2>   "some.url",
2>   9094,
2>   [
2>     {verify, verify_none},
2>     {cacertfile, "/tmp/certificate_chain"},
2>     {keyfile, "/tmp/key"},
2>     {certfile, "/tmp/certificate"},
2>     {password, "the_password"}
2>   ],
2>   5000
2> ).
{ok,{sslsocket,{gen_tcp,#Port<0.5>,tls_connection,undefined},
               [<0.118.0>,<0.117.0>]}}

Now, I know I’m using verify_none, but if I use verify_peer instead, version 24.3.4 outputs the same error, while version 24.3.3 fails with a different one:

$ ./24.3.3/bin/erl
Erlang/OTP 24 [erts-12.3.1] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [jit]

Eshell V12.3.1  (abort with ^G)
1> ssl:start().
ok
2> ssl:connect(
2>   "b-2.reviews-msk-qa.rs5hgu.c4.kafka.us-east-1.amazonaws.com",
2>   9094,
2>   [
2>     {verify, verify_peer},
2>     {cacertfile, "/tmp/certificate_chain"},
2>     {keyfile, "/tmp/key"},
2>     {certfile, "/tmp/certificate"},
2>     {password, "the_password"}
2>   ],
2>   5000
2> ).
=NOTICE REPORT==== 26-May-2022::12:44:13.561279 ===
TLS client: In state certify at ssl_handshake.erl:2074 generated CLIENT ALERT: Fatal - Unknown CA

{error,{tls_alert,{unknown_ca,"TLS client: In state certify at ssl_handshake.erl:2074 generated CLIENT ALERT: Fatal - Unknown CA\n"}}}

Suspecting that there might be something off with the certificate chain, I validated it with:

$ openssl s_client -connect the_url:9094 -CAfile ./certificate_chain
CONNECTED(00000005)
depth=4 C = US, O = "Starfield Technologies, Inc.", OU = Starfield Class 2 Certification Authority
verify return:1
depth=3 C = US, ST = Arizona, L = Scottsdale, O = "Starfield Technologies, Inc.", CN = Starfield Services Root Certificate Authority - G2
verify return:1
depth=2 C = US, O = Amazon, CN = Amazon Root CA 1
verify return:1
depth=1 C = US, O = Amazon, OU = Server CA 1B, CN = Amazon
verify return:1
depth=0 CN = *.the_url
verify return:1
4623955628:error:1401E0F4:SSL routines:CONNECT_CR_FINISHED:unexpected message:/AppleInternal/Library/BuildRoots/b6051351-c030-11ec-96e9-3e7866fcf3a1/Library/Caches/com.apple.xbs/Sources/libressl/libressl-2.8/ssl/ssl_both.c:510:
---
Certificate chain
 0 s:/CN=*.the_url
   i:/C=US/O=Amazon/OU=Server CA 1B/CN=Amazon
 1 s:/C=US/O=Amazon/OU=Server CA 1B/CN=Amazon
   i:/C=US/O=Amazon/CN=Amazon Root CA 1
 2 s:/C=US/O=Amazon/CN=Amazon Root CA 1
   i:/C=US/ST=Arizona/L=Scottsdale/O=Starfield Technologies, Inc./CN=Starfield Services Root Certificate Authority - G2
 3 s:/C=US/ST=Arizona/L=Scottsdale/O=Starfield Technologies, Inc./CN=Starfield Services Root Certificate Authority - G2
   i:/C=US/O=Starfield Technologies, Inc./OU=Starfield Class 2 Certification Authority
---
Server certificate
-----BEGIN CERTIFICATE-----
.....
-----END CERTIFICATE-----
subject=/CN=*.the_url
issuer=/C=US/O=Amazon/OU=Server CA 1B/CN=Amazon
---
Acceptable client certificate CA names
/C=US/O=Amazon/CN=Amazon Root CA 1
/C=US/O=Amazon/CN=Amazon Root CA 2
/O=CompanyName/OU=gdmnp/ST=Virginia/CN=g-gdmnp-privateca/L=Ashburn
/C=US/O=Amazon/CN=Amazon Root CA 4
/C=US/ST=Arizona/L=Scottsdale/O=Starfield Technologies, Inc./CN=Starfield Services Root Certificate Authority - G2
/C=US/O=Amazon/CN=Amazon Root CA 3
Server Temp Key: ECDH, P-256, 256 bits
---
SSL handshake has read 11986 bytes and written 169 bytes
---
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES256-GCM-SHA384
    Session-ID: 95FF948CC4933966E48CAA9701D9E5D43037FD6C51C6404D8EF9861B7F30EF3B
    Session-ID-ctx:
    Master-Key: A9CCCDFEED3755CF19D1CFE99DD6A7A86F65CB929487C41D8635E6620D72B75A53522A528B1EEFB8500C08084FB3BFB9
    Start Time: 1653559670
    Timeout   : 7200 (sec)
    Verify return code: 0 (ok)
---

And that seems to be OK.
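
As a cross-check from the Erlang side, the bundle passed as cacertfile can be inspected with the public_key application. The following is only a sketch that was not run in this thread: it lists each certificate in the file together with a flag saying whether it is self-signed, which shows whether the bundle contains a root certificate at all. The name SelfSignedFlags is hypothetical, and the path is the placeholder used in the connect calls above.

```erlang
%% Sketch: list each certificate in a PEM bundle with a self-signed flag.
%% Returns [{Index, IsSelfSigned}] in file order.
SelfSignedFlags = fun(PemBin) ->
  Ders = [Der || {'Certificate', Der, not_encrypted} <- public_key:pem_decode(PemBin)],
  lists:zip(lists:seq(1, length(Ders)),
            [public_key:pkix_is_self_signed(Der) || Der <- Ders])
end,
%% "/tmp/certificate_chain" is the placeholder path from the calls above.
case file:read_file("/tmp/certificate_chain") of
  {ok, Pem} -> SelfSignedFlags(Pem);
  {error, Reason} -> {error, Reason}
end.
```

If no entry is flagged as self-signed, the file holds only intermediates, which would be one possible explanation for the Unknown CA alert with verify_peer.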

So, at this point I’m wondering:

  1. Am I facing a single issue, generated by something wrong with the certificate?
  2. Are these two distinct issues that I should try to tackle separately?

Does anybody have any pointers on how I should move forward debugging this?
When looking at the diff between the two versions, this commit seems relevant, but I’m not versed in the world of TLS: "ssl: Fix version mismatch" (erlang/otp@5e48b34 on GitHub).
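
One way to get more detail out of the failing handshake is to raise the log level for the connection and pin the protocol version. This is a sketch under assumptions, not something that was tried in this thread: log_level and versions are documented ssl options, but whether they narrow down this particular failure is a guess. The version is pinned to TLS 1.2 only because that is what openssl negotiated above. Host, port, paths, and password are the same placeholders as in the original calls.

```erlang
%% Sketch only: the same connect call as above plus two diagnostic options.
{ok, _} = application:ensure_all_started(ssl),
Opts = [
  {verify, verify_none},
  {cacertfile, "/tmp/certificate_chain"},
  {keyfile, "/tmp/key"},
  {certfile, "/tmp/certificate"},
  {password, "the_password"}
],
DebugOpts = [
  {log_level, debug},       %% verbose logging of TLS protocol messages
  {versions, ['tlsv1.2']}   %% assumption: pin to the version openssl negotiated
  | Opts
],
ssl:connect("some.url", 9094, DebugOpts, 5000).
```

Comparing the debug output of 24.3.3 and 24.3.4 for the same call should show which handshake message the newer version rejects.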

For suspected bugs, please create an issue in the erlang/otp repository on GitHub.

But before doing that, take a look at the already existing issues in case the same problem has been reported by someone else. In this case I suspect your problem is the same as this:

Ohh, thank you!! I completely missed that bug report. My search-fu failed me.
I opened the topic here because I wasn’t sure whether it was a bug, but looking at that ticket and some related ones, it seems I might be hitting two separate bugs!

Thank you!!!
