Ten-year-old breakage in to_erl/run_erl

Hi,

Short: I think to_erl/run_erl have been slightly broken for about ten years.

The actual problem

Back in 2013, the protocol between to_erl and run_erl was changed. The opening byte used to be 022 (^R), but then it was changed to 014 (^L). run_erl displays half the handshake to the user, which was harmless with 022 but causes a clear-screen with 014, thus making the banner from ‘to_erl’ disappear.

The protocol was changed in 0d4a6b50, “on behalf of shell search”, probably to avoid having search-backwards (^R) trigger the handshake.
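For readers not used to the octal notation: 022 and 014 are simply the bytes a terminal sends for Ctrl+R and Ctrl+L (014 is also ASCII form feed, which is why terminals and shells tend to treat it as "clear screen"). A minimal sketch of that mapping; the helper names `ctrl` and `show` are made up for illustration and do not come from run_erl.c:

```c
#include <stdio.h>

/* Hypothetical helper: the byte sent for Ctrl+<letter> is the letter's
 * ASCII code with everything above the low five bits masked off. */
unsigned char ctrl(char letter)
{
    return (unsigned char)(letter & 0x1f);
}

/* Hypothetical helper: print the control code in octal, the notation
 * used for the handshake byte in the thread above. */
void show(char letter)
{
    printf("^%c = 0%02o\n", letter, ctrl(letter));
}
```

Calling `show('R')` and `show('L')` prints the two handshake bytes discussed above in the same octal form.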

Old behaviour (e.g. R14B03)

/usr/local/src/otp_src_R14B03/bin >./x86_64-unknown-linux-gnu/to_erl 
Attaching to /tmp/erlang.pipe.3 (^D to exit)


1> // this is the erlang shell

New behaviour (e.g. R26)

/usr/local/src/otp_src_26.0.2/bin > ./x86_64-pc-linux-gnu/to_erl

1> // this is the erlang shell

i.e. the banner “Attaching to…” disappears because the screen gets cleared before you can read it. That messes up alternative shells which have help text in the banner, for instance “press enter to start the shell”.

Patch

--- otp_src_26.0.2/erts/etc/unix/run_erl.c.orig 2023-10-05 20:58:36.116555285 +0200
+++ otp_src_26.0.2/erts/etc/unix/run_erl.c      2023-10-05 21:09:44.403926705 +0200
@@ -714,6 +714,7 @@

                if (!got_some && wfd && buf[0] == '\014') {
                    char wbuf[30];
+                    buf[0] = ' '; // re-write ^L in the handshake to a harmless character
                    int wlen = sn_printf(wbuf,sizeof(wbuf),"[run_erl v%u-%u]\n",
                                         RUN_ERL_HI_VER, RUN_ERL_LO_VER);
                    outbuf_append(wbuf,wlen);
@@ -1459,5 +1460,3 @@
 }

 #endif /* DEBUG */

Background

to_erl and run_erl are used to run Erlang in embedded unix-y systems. They’re sort-of a home-made version of ‘Screen’, i.e. they re-direct Erlang’s console input and output to a named pipe so you can start an erlang shell and connect to it later on.

You can try it out yourself. In one window:

cd /usr/local/src/otp_src_26.0.2/bin
export ROOTDIR=/usr/local/src/otp_src_26.0.2
export BINDIR=/usr/local/src/otp_src_26.0.2/bin/x86_64-pc-linux-gnu/
x86_64-pc-linux-gnu/run_erl /tmp/ /tmp/ x86_64-pc-linux-gnu/erlexec

In another window:

./x86_64-pc-linux-gnu/to_erl

Just to show how much this is a re-invention of ‘screen’, you don’t have to run Erlang through it, you can run other things too, like Emacs:

unset DISPLAY
x86_64-pc-linux-gnu/run_erl /tmp/ /tmp/ emacs

This works well enough to play tetris :wink:

Historical digression

The manual for run_erl and to_erl includes an archaeological leftover: a set of instructions for running embedded Erlang on a ‘VME board from Force computers’ which apparently had 64 MB of RAM and ran a version of Solaris which needed 17 MB. Great that grandpa had a supercomputer. According to the source, he worked for ETX. Maybe an Ericsson greybeard can remind me what ETX was, I only remember UAB, ERA and EPA.


I wish the old mailing list was still running, as this is pure gold.

It has a bit of everything: some history, some fun, a workaround, and it also draws the reader to a mystery. Literally this is the software equivalent of a core sample and it is great.

Core samples are preserved as they are useful to go back to. The old mailing list was preservable[1] too.

I hope this nugget does not get lost in some future migration to something shinier.

[1] you do have to grind through the firewall based rate limiter, but it is possible…

For the patch, I probably would invoke the wrath of all those Rust fans and use memmove() to pull back the contents of the buf by one character, something like:

memmove(buf, &buf[1], len - 1);

I have never used these tools, though. From glancing at the code and your description, it looks like the signalling is multiplexed with the data, so would it be safe to just remove the signalling byte before outputting?
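A slightly fuller sketch of that idea, under stated assumptions: `strip_handshake` and `HANDSHAKE_BYTE` are hypothetical names, not anything in run_erl.c, and the real code would have to fit this into pass_on() and its own length variable. The one point the bare memmove() call leaves out is that the length has to shrink by one as well, otherwise the last byte gets passed on twice:

```c
#include <stddef.h>
#include <string.h>

/* Assumption: the handshake starts with ^L (octal 014), as described
 * in the original post. */
#define HANDSHAKE_BYTE '\014'

/* Hypothetical helper: if buf starts with the handshake byte, shift the
 * rest of the buffer down by one and return the new, shorter length.
 * Otherwise leave buf untouched and return len unchanged. */
size_t strip_handshake(char *buf, size_t len)
{
    if (len == 0 || buf[0] != HANDSHAKE_BYTE)
        return len;

    /* memmove, not memcpy: source and destination overlap. */
    memmove(buf, buf + 1, len - 1);
    return len - 1;
}
```

Compared with overwriting the byte with a space, this passes nothing extra on to the Erlang side, at the cost of a copy that only ever happens once per attach.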

Cool that someone else is interested; I’ve wondered if maybe my hardware is the last thing using to_erl/run_erl.

You’re right, my patch is a band-aid. On the plus side, it’s obviously correct and easy to modify to see that the original bug really is as I say, i.e. make it print ‘#’ instead of ‘ ’.

I considered a bigger clean-up. pass_on() would be more accurately called open_log_open_pipes_handshake_sometimes_pass_on_manage_timeouts_and_clean_up(), but C code often ends up like that. I wasn’t gung-ho enough to try separating out the handshake phase into a separate function.

The handshake was added in 2008, in R12B-3. The release notes mention OTP-5107, OTP-7252 and OTP-7342 as tickets related to that code, and a diff of R12B and R13B shows a general clean-up in run_erl.c.


Would be sad if that is the case, but other than baked into some Ericsson/Cisco kit, you may be right!

Really glad you shared it, and likewise your R15 starved processes thread a while back.

This probably is a box of spiders, or a can of worms at best.

I am just waiting for the CVE where some vendor had a banner stating “hardcoded credentials are…”

Cheers

FWIW, the “extended start scripts” generated by rebar3 release still use run_erl/to_erl when running the service in daemon/background mode …

Everything we do uses run_erl, and to_erl is also part of our support workflow.

Good to know.

Could be that everyone else thought the clear screen was a new feature.

If I remember correctly we chose ^L because it would trigger a redraw of the current line. Before that change if a user had written init:stop(). and then did Ctrl+D in to_erl, the expression would remain in the buffer, and then if you come back later and attach you hit enter to get the prompt and get a surprise shutdown of the system. This may seem like a silly thing to do, but given enough time silly things tend to happen more than once…

OTP 26 recently changed the behaviour of ^L to be the same as in many other shells, i.e. clear screen + redraw. The interaction with to_erl is not something that we thought of (nor noticed) when making that change…

As a workaround you should be able to use the new shell_keymap to configure the behaviour of ^L to not clear the screen.

The current behaviour is a bug. I opened “to_erl clears screen” (issue #7737 in erlang/otp on GitHub) to track it. Thanks for the report!
