Are you confusing ssh_connection_handler.erl doing the wrong thing with my test code doing the wrong thing? My code does the wrong thing because… that’s the point of a test case. It’s supposed to demonstrate the crash in a way where it’s obvious what’s triggering the problem.
ssh_connection_handler.erl should not care if a process unrelated to SSH exits normally.
Or, running it in reverse, let’s try your proposed solution, which is “make sure there are no leftover messages in the message queue”. We’ll do that with a flush:
robert() ->
crypto:start(),
ssh:start(),
erlang:process_flag(trap_exit, true),
spawn_link(fun() -> timer:sleep(40) end),
timer:sleep(40),
flush(),
{ok, _SSH} = ssh:connect("localhost", 22,
[{user, "bogus"}, {password, "bogus"},
{silently_accept_hosts, true}]).
flush() ->
receive
_ -> flush()
after 0 ->
done
end.
Here’s what it does:
test_ssh_crash:robert().
** exception error: no match of right hand side value {error,{exit,normal}}
in function test_ssh_crash:robert/0 (test_ssh_crash.erl, line 34)
That’s bad. It should have complained that the login credentials were wrong (assuming we’re running on a machine with an SSH server that has password auth enabled).
Now, you could point out that my test case is unnecessarily complicated and racy, and simplify it:
robert() ->
crypto:start(),
ssh:start(),
erlang:process_flag(trap_exit, true),
spawn_link(fun() -> ok end),
{ok, _SSH} = ssh:connect("localhost", 22,
[{user, "bogus"}, {password, "bogus"},
{silently_accept_hosts, true}]).
And now we’re pretty much back to what I posted in the first place. Sure, test case looks contrived. Such is life for minimal test cases.
But: it’s not reasonable for an SSH client to (quietly!) require that the calling code isn’t trapping exits and doesn’t have any linked processes that might exit during the handshake phase of the SSH connection.
I can think of a few possible fixes:
-
Documentation fix. Put a warning in the docs that the SSH client can’t handle being run in an environment where exits are trapped and a linked process might exit, even normally. Easy but it’s going to catch someone out.
-
Precondition check. The SSH client checks there are no ‘EXIT’ messages in the queue at startup, plus checks that trap_exit is false. Better, but hacky.
-
Wrap the SSH client in another process, i.e. make sure the precondition is true. Also hacky.
-
Fix ssh_connection_handler so it doesn’t go matching ‘EXIT’ messages for processes that are none of its business. A bit fiddly to implement and then difficult to test well enough to be confident that it doesn’t break something else.
I didn’t submit a patch because… I’m lazy.