Connecting two nodes using OTP28 with -proto_dist inet_tls

The Problem (TL;DR)

For reasons that I won’t get into right now :trade_mark:, my system needs to run two Erlang nodes on the same machine, one running OTP24 and the other running OTP28. That, in itself, is not a big deal, but here are the kickers:

  1. That server must use TLS for Erlang distribution.
  2. The connection must be started from the OTP28 node (i.e., this is the node that has to ping the other one).

As you might have guessed, that doesn’t work and I want to know why :slight_smile:

Context

Let me show you what I tried so far…

I have a file called inet_tls_short_name.config with the following contents:

[
    {server,
        [
            {verify, verify_peer} ,
            {depth, 0} ,
            {certfile, "elbrujohalcon.cer"} ,
            {keyfile, "elbrujohalcon.key"} ,
            {cacertfile, "pca.cer"} ,
            {dhfile, "dhparam.pem"}
        ]
    },
    {client,
        [
            {verify, verify_peer} ,
            {depth, 0} ,
            {certfile, "elbrujohalcon.cer"} ,
            {keyfile, "elbrujohalcon.key"} ,
            {cacertfile, "pca.cer"} ,
            {dhfile, "dhparam.pem"}
        ]
    }
].

Trying without TLS

So, without TLS everything works as expected on both sides. (I disconnected both nodes before each run, so they didn’t know each other when I ran net_adm:ping(…) in them.)

OTP28 node

Erlang/OTP 28 [erts-16.0.2] [source] [64-bit] [smp:2:2] [ds:2:2:10] [async-threads:1]

Eshell V16.0.2 (press Ctrl+G to abort, type help(). for help)
(brujo28@elbrujohalcon)1> net_adm:ping('brujo24@elbrujohalcon').
pong

OTP24 node

Erlang/OTP 24 [erts-12.3.2.17] [source] [64-bit] [smp:2:2] [ds:2:2:10] [async-threads:1] [jit]

Eshell V12.3.2.17  (abort with ^G)
(brujo24@elbrujohalcon)1> net_adm:ping('brujo28@elbrujohalcon').
pong

Trying with TLS

This is what happens if I try the same thing using TLS distribution. I’m showing you the command line arguments now so you can see how I used the file I showed above.

OTP28 node

[elbrujohalcon@elbrujohalcon ~]$ /…/erts-16.0.2/bin/erl -boot start_clean -sname brujo28 -setcookie cookie -proto_dist inet_tls -ssl_dist_optfile "inet_tls_short_name.config"
Erlang/OTP 28 [erts-16.0.2] [source] [64-bit] [smp:2:2] [ds:2:2:10] [async-threads:1]

Eshell V16.0.2 (press Ctrl+G to abort, type help(). for help)
(brujo28@elbrujohalcon)1> net_adm:ping('brujo24@elbrujohalcon').
pang

OTP24 node

[elbrujohalcon@elbrujohalcon ~]$ /…/erts-12.3.2.17/bin/erl -boot start_clean -sname brujo24 -setcookie cookie -proto_dist inet_tls -ssl_dist_optfile "inet_tls_short_name.config"
Erlang/OTP 24 [erts-12.3.2.17] [source] [64-bit] [smp:2:2] [ds:2:2:10] [async-threads:1] [jit]

Eshell V12.3.2.17  (abort with ^G)
(brujo24@elbrujohalcon)1> net_adm:ping('brujo28@elbrujohalcon').
pong

and, of course, if I then try on the OTP28 node…

(brujo28@elbrujohalcon)2> net_adm:ping('brujo24@elbrujohalcon').
pong

Conclusion

Something is preventing my OTP28 node from pinging the OTP24 node when using TLS distribution. I don’t know what that is, and I don’t even know how to find information about what it is or what to do about it.
I need ideas :light_bulb:
Thanks :slight_smile:

I can’t believe I didn’t think of this before, but the issue is actually simpler to reproduce. You don’t need OTP24 at all…

I still have this file, called inet_tls_short_name.config:

[
    {server,
        [
            {verify, verify_peer} ,
            {depth, 0} ,
            {certfile, "elbrujohalcon.cer"} ,
            {keyfile, "elbrujohalcon.key"} ,
            {cacertfile, "pca.cer"} ,
            {dhfile, "dhparam.pem"}
        ]
    },
    {client,
        [
            {verify, verify_peer} ,
            {depth, 0} ,
            {certfile, "elbrujohalcon.cer"} ,
            {keyfile, "elbrujohalcon.key"} ,
            {cacertfile, "pca.cer"} ,
            {dhfile, "dhparam.pem"}
        ]
    }
].

I start 2 nodes with OTP28…

[elbrujohalcon@elbrujohalcon ~]$ /…/erts-16.0.2/bin/erl -boot start_clean -sname brujo28-1 -setcookie cookie -proto_dist inet_tls -ssl_dist_optfile "inet_tls_short_name.config"
Erlang/OTP 28 [erts-16.0.2] [source] [64-bit] [smp:2:2] [ds:2:2:10] [async-threads:1]

Eshell V16.0.2 (press Ctrl+G to abort, type help(). for help)
(brujo28-1@elbrujohalcon)1>

[elbrujohalcon@elbrujohalcon ~]$ /…/erts-16.0.2/bin/erl -boot start_clean -sname brujo28-2 -setcookie cookie -proto_dist inet_tls -ssl_dist_optfile "inet_tls_short_name.config"
Erlang/OTP 28 [erts-16.0.2] [source] [64-bit] [smp:2:2] [ds:2:2:10] [async-threads:1]

Eshell V16.0.2 (press Ctrl+G to abort, type help(). for help)
(brujo28-2@elbrujohalcon)1>

If I try to ping either node from the other one, I get a pang in return. They can’t see each other. Any idea on how to debug this, folks?

Not super-helpful, but you could also try OTP27, in case it’s a regression?

Good call.
Sadly, it also fails in OTP27:

[elbrujohalcon@elbrujohalcon ~]$ /…/erts-15.2.7/bin/erl -boot start_clean -sname brujo27-1 -setcookie cookie -proto_dist inet_tls -ssl_dist_optfile "inet_tls_short_name.config"
Erlang/OTP 27 [erts-15.2.7] [source] [64-bit] [smp:2:2] [ds:2:2:10] [async-threads:1]

Eshell V15.2.7 (press Ctrl+G to abort, type help(). for help)
(brujo27-1@elbrujohalcon)1>

[elbrujohalcon@elbrujohalcon ~]$ /…/erts-15.2.7/bin/erl -boot start_clean -sname brujo27-2 -setcookie cookie -proto_dist inet_tls -ssl_dist_optfile "inet_tls_short_name.config"
Erlang/OTP 27 [erts-15.2.7] [source] [64-bit] [smp:2:2] [ds:2:2:10] [async-threads:1]

Eshell V15.2.7 (press Ctrl+G to abort, type help(). for help)
(brujo27-2@elbrujohalcon)1> net_adm:ping('brujo27-1@elbrujohalcon').
pang
(brujo27-2@elbrujohalcon)2>

Have you tried verifying your ssl parameters without the Erlang distribution, using only the ssl API in shells?
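A minimal sketch of what such a check could look like, assuming the same inet_tls_short_name.config file is in the current directory of two plain erl shells started without -proto_dist. The port 9999 is an arbitrary choice, and using "elbrujohalcon" as the host name is an assumption taken from the node names; peer verification may still fail for reasons unrelated to the distribution setup.

```erlang
%% Shell 1 (server side). Port 9999 is a hypothetical, arbitrary choice.
{ok, _} = application:ensure_all_started(ssl).
{ok, [Conf]} = file:consult("inet_tls_short_name.config").
ServerOpts = proplists:get_value(server, Conf).
{ok, Listen} = ssl:listen(9999, [{reuseaddr, true} | ServerOpts]).
%% Blocks until the client in shell 2 connects.
{ok, Transport} = ssl:transport_accept(Listen).
ssl:handshake(Transport).

%% Shell 2 (client side). Host name assumed from the node names.
{ok, _} = application:ensure_all_started(ssl).
{ok, [Conf]} = file:consult("inet_tls_short_name.config").
ClientOpts = proplists:get_value(client, Conf).
%% Left unmatched on purpose, so an {error, Reason} is printed as-is.
ssl:connect("elbrujohalcon", 9999, ClientOpts).
```

The last call in each shell is deliberately not pattern-matched, so any rejected option or handshake failure shows up directly as the return value instead of a badmatch.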

Not exactly, but I did use inet_tls_dist:dbg(). (A colleague told me about it)… and it showed the actual problem:

(<0.113.0>) exception_from {inet_tls_dist,'-setup_fun/7-fun-0-',7} {exit,{ssl_connect_failed,{10,122,198,19},
                          46520,
                          {error,{option,server_only,dhfile}}}}
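For anyone wanting to reproduce a trace like the one above, this is roughly the sequence involved; a sketch only, run on the node that returns pang. The target node name is the one from the OTP27 test earlier in the thread, and the trace output will differ depending on the OTP version and the failure.

```erlang
%% Enable tracing of the TLS distribution module on this node.
inet_tls_dist:dbg().
%% Retry the failing connection; the trace is printed in the shell.
net_adm:ping('brujo27-1@elbrujohalcon').
```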

dhfile is no longer a valid option for the client section of the configuration. As soon as I removed it from the config file, the problem was solved.
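For reference, the config file after that change would look roughly like this. The error only complains about dhfile on the client side, so keeping it in the server section is an assumption on my part; everything else is unchanged from the original file.

```erlang
[
    {server,
        [
            {verify, verify_peer},
            {depth, 0},
            {certfile, "elbrujohalcon.cer"},
            {keyfile, "elbrujohalcon.key"},
            {cacertfile, "pca.cer"},
            %% dhfile is a server-only option, so it stays here.
            {dhfile, "dhparam.pem"}
        ]
    },
    {client,
        [
            {verify, verify_peer},
            {depth, 0},
            {certfile, "elbrujohalcon.cer"},
            {keyfile, "elbrujohalcon.key"},
            {cacertfile, "pca.cer"}
            %% dhfile removed: it is rejected as server_only for clients.
        ]
    }
].
```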


It never was; it was just ignored previously. And then we improved option error handling, as we realized that ignoring it was a source of a lot of confusion, since users keep guessing which options they should include.
