New paper: Unsafe Impedance: Safe Languages and Safe by Design Software

adolfont · July 19, 2024, 11:27am

A preprint of “Unsafe Impedance: Safe Languages and Safe by Design Software” is now available on arXiv (2407.13046). Lee Barney (BYU-Idaho) and I are the authors. It will be presented at the Erlang Workshop 2024 (Erlang 2024, at ICFP 2024). We would love any feedback.

Abstract:
In December 2023, security agencies from five countries in North America, Europe, and the South Pacific produced a document encouraging senior executives in all software-producing organizations to take responsibility for and oversight of the security of the software their organizations produce. In February 2024, the White House released a cybersecurity outline, highlighting the December document. In this work we review the safe languages listed in these documents, and compare the safety of those languages with Erlang and Elixir, two BEAM languages.
These security agencies’ declaration of some languages as safe is necessary but insufficient to make wise decisions regarding what language to use when creating code. We propose an additional way of looking at languages and the ease with which unsafe code can be written and used. We call this new perspective “unsafe impedance”. We then go on to use unsafe impedance to examine nine languages that are considered to be safe. Finally, we suggest that business processes include what we refer to as an Unsafe Acceptance Process. This Unsafe Acceptance Process can be used as part of the memory safe roadmaps suggested by these agencies. Unsafe Acceptance Processes can aid organizations in their production of safe by design software.

josevalim · July 19, 2024, 8:32pm

Thank you for sharing, Adolfo and Lee. The current efforts aim to increase the baseline safety of programming languages but do not say much beyond that. Your paper takes the next step and I hope it gets the discussion it deserves.

My only remark is about data races. As you noticed, the data structures in BEAM languages do not have data races because they are immutable. And, beyond data types, all other constructs, such as counters, ETS, etc., are safe from corruption under concurrent access. So only logical data races remain. This contrasts with other languages, where you can still use the wrong data structure, leading to data corruption or even segmentation faults.

bcardarella · July 19, 2024, 8:56pm

This is excellent!

yenrabbyui · July 22, 2024, 6:49pm

@josevalim Thank you, Jose. I hope this starts a conversation within the BEAM languages and teams, as well as for other MSLs (memory safe languages).

yenrabbyui · July 22, 2024, 6:50pm

@bcardarella Thanks Brian!

juhlig · July 22, 2024, 7:48pm

Great, I fully agree with the paper.

nzok · July 24, 2024, 11:51am

Nice paper.
The preferred spelling of the adjective meaning “relating to space” is “spatial”. Blame the Romans, it’s Latin “spatium” + “-al”.
In summary, “if you can load low-level code, you’re toast”.

In fairness to Erlang, perhaps there should be a discussion of C nodes: the old (and still safer) way for Erlang to talk to C was to run the C code in separate operating system processes with separate address spaces.

The irony is that older languages did it better.
The first programming language I wrote serious programs in was Burroughs Extended Algol for the B6700. Out of the 10 CVEs in Table 1, only “dangling pointer” and “data race” were possible there, and “dangling pointer” got fixed. So, by about 1977, the main application programming language for a major operating system could express only 1 of the 10 kinds of problems, and this was in a language with dynamic loading. The architecture (emulated on x86[-64]), operating system, and language are still in use.

kiko · August 7, 2024, 6:56am

Well explained, and race conditions can always happen.
What do you mean by “It only remains logical data races”? I am not sure I follow what a logical data race is.

josevalim · August 7, 2024, 10:47am

I meant that we can still have data races when someone uses ETS non-atomically: for example, doing a lookup followed by an insert instead of a single update. What we are not going to have are races where two processes use the same ETS table and corrupt it internally. Compare this to other languages, where simply using the wrong data structure across threads can cause a segmentation fault.