There’s an old joke about someone boasting
“This feature fills a much-needed gap.”
to which the reply was
“We REALLY needed that gap.”
To my mind, string interpolation is just such an example of
filling a much-needed gap. Consider
Language X: “foo($X) = bar($Y)”
Language Y: <| ps(‘foo(’); pr(X); ps(‘) = bar(’); pr(Y); ps(‘)’) |>
It’s obvious! Language X is better than language Y!
But wait.
Language Y: <| applisti(Map, lambda(K,V)
ps(‘foo(’); pr(K); ps(‘) = bar(’); pr(V); ps(‘)\n’)
end) |>
Language X: stunned incomprehension.
PROBLEM 1: SMALL FIXED NUMBER.
String interpolation lets you interpolate a small fixed
number of strings (or things printed like strings). It
does not handle options well and it does not handle
iteration well.
String interpolation does not scale with structure.
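A minimal sketch of the contrast, in Python (chosen here purely for illustration; it is not one of the languages discussed above, and the names `emit_fixed` and `emit_map` are hypothetical). An f-string covers the fixed two-hole case nicely, but once the output has to follow the shape of a map, the code ends up writing to a stream inside a loop anyway.

```python
import io

def emit_fixed(x, y):
    # The case interpolation is good at: a small, fixed number of holes.
    return f"foo({x}) = bar({y})"

def emit_map(out, mapping):
    # The output follows the structure of the data: one line per entry,
    # written straight to the stream rather than assembled as one string.
    for key, value in mapping.items():
        out.write("foo(")
        out.write(str(key))
        out.write(") = bar(")
        out.write(str(value))
        out.write(")\n")

buf = io.StringIO()
emit_map(buf, {"a": 1, "b": 2})
result = buf.getvalue()
```

Optional parts fit the second style just as easily: an `if` around a `write` call, with no need to smuggle conditionals into a template.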
PROBLEM 2: MISDIRECTION.
String interpolation seduces you into building a string.
I’m reminded of something that happened at Quintus. The
emulator was written in a high level language not entirely
unlike PL/360 in spirit, called Progol. There was a
program that translated Progol into assembly code for, oh,
seven different architectures. We needed it to be much
easier to extend to the next five architectures (the x86
was by far the biggest pain) and we needed it to be faster.
The new guy who was given the task was SERIOUSLY in love
with strings. I mean SERIOUSLY. The result was that the
new Progol->assembler translator was bigger, harder to
extend, and markedly slower.
The issue was that there are data structures for processing,
and trees are just BRILLIANT for that. And there are data
structures for communicating with the rest of the world, and
they are called STREAMS. Strings are OK for talking to
databases.
Erlang gets a lot of mileage out of saying “hey, you don’t
actually want to construct the string X++Y++Z, you just
want to send the characters of X followed by the characters
of Y followed by the characters of Z to the same stream;
why don’t you write [X,Y,Z] and I’ll take care of the rest?”
This observation goes back to Smalltalk, which got brilliantly
right something that Java got abjectly wrong. If you want to
be able to convert data structures to character sequences,
what are the building blocks you want?
Smalltalk:
printOn: aStream
where aStream could discard everything, count the
characters, send them to a file, broadcast them to a dozen
other streams, do any of these while logging to another
stream, or build a string.
Java:
.toString()
and if you want this multi-megabyte string sent to a stream,
you have to first build the string, then write it, then clean up.
I’d done benchmarks comparing writing data structures in Smalltalk
and Java, and for all Java has had huge effort put into compiling
it, any decent Smalltalk leaves Java in the dust.
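The difference in building blocks can be sketched in Python (an illustration only; `Tree`, `print_on` and `CountingSink` are hypothetical names, not Smalltalk's or Java's actual classes). The object knows how to write itself to any sink that has a `write` method, and producing a string is just one use of that among many.

```python
import io

class CountingSink:
    # A "stream" that keeps no characters at all, only a running count.
    def __init__(self):
        self.count = 0

    def write(self, text):
        self.count += len(text)
        return len(text)

class Tree:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

    def print_on(self, out):
        # The building block: write yourself to whatever sink you are given.
        out.write(self.label)
        if self.children:
            out.write("(")
            for i, child in enumerate(self.children):
                if i > 0:
                    out.write(",")
                child.print_on(out)
            out.write(")")

    def __str__(self):
        # Building a string is just one more use of print_on.
        buf = io.StringIO()
        self.print_on(buf)
        return buf.getvalue()

tree = Tree("f", [Tree("x"), Tree("g", [Tree("y")])])
counter = CountingSink()
tree.print_on(counter)
```

Going the other way round, defining `print_on` in terms of `__str__`, would force every caller to pay for the complete string even when all they wanted was the character count.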
So Erlang HAS a perfectly good, indeed far more powerful and
efficient, alternative to string interpolation, and that’s
iolists.
String interpolation does not scale well with size,
in either time taken or storage turned over.
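For readers who have not met iolists, here is a rough Python analogue (a sketch under stated assumptions: `write_iolist` is a hypothetical helper, and real Erlang iolists also admit bytes and binaries, which this ignores). The fragments stay in a nested list and are only walked when they are sent to the stream, so no intermediate concatenated string is ever built.

```python
import io

def write_iolist(out, data):
    # Hypothetical helper: walk a nested list of fragments and write each
    # string to the stream in order, never concatenating them.
    if isinstance(data, str):
        out.write(data)
    else:
        for item in data:
            write_iolist(out, item)

x, y, z = "alpha", "beta", "gamma"
buf = io.StringIO()
write_iolist(buf, [x, [y, [z]], "\n"])
written = buf.getvalue()
```

Nesting costs nothing here: a function can return a list of fragments and its caller can drop that list into a bigger one without copying a single character.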
PROBLEM 3: LITTLE BOBBY TABLES.
Looking at the two libraries I maintain, I see support for
- INI files
- IBM stanza files
- GNUstep / Apple property lists (textual, not XML or binary yet)
- JSON
- CSV
- XML
- generating (but not parsing) shell scripts
plus some more specialised things.
For the first two, simply pasting in the characters of the
variable or embedded expression works as well as anything
else. Everything else needs some sort of quoting/escaping,
and they all need DIFFERENT quoting/escaping.
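To see how different the rules are, here is one awkward value pushed through four Python standard-library encoders (an illustrative sketch; the sample value is made up, while `xml.sax.saxutils.escape`, `json.dumps`, `shlex.quote` and `csv.writer` are real standard-library calls).

```python
import csv
import io
import json
import shlex
from xml.sax.saxutils import escape as xml_escape

value = 'Tom & "Jerry" <cat>, $HOME'

as_xml = xml_escape(value)      # escapes &, < and > for XML character data
as_json = json.dumps(value)     # wraps in double quotes, backslash-escapes "
as_shell = shlex.quote(value)   # wraps in single quotes for a POSIX shell

buf = io.StringIO()
csv.writer(buf).writerow([value])
as_csv = buf.getvalue()         # wraps in double quotes, doubles embedded "
```

Four targets give four different results from the same input, and none of them is what plain interpolation of `value` would have produced.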
Are you an H. P. Lovecraft or Clark Ashton Smith fan?
Then savour the eldritch horror of generating a PHP file
containing an XSLT script, by hand, then open the dread
portal of interdimensional horror on your quivering brain
by trying to imagine doing it with string interpolation.
Correctly!
And this, of course, is why you can’t stop at just
interpolating variables. Either you extend string
interpolation with a fixed collection of quoting/
escaping rules, e.g., “where X = $(X:SQL)” and
“$(Body:XmlPCDATA)”
OR you allow expressions like
“$(xml:pcdata(Body))”
And really,
“” ++ xml:pcdata(Body) ++ “”
doesn’t look so bad any more and
[“”, xml:pcdata(Body), “”]
looks downright sexy.
The documentation on my Smalltalk system and library includes
a section on why string interpolation isn’t provided.
“But what happens when you have many kinds of data and
many ways to convert them to text?”
One particularly nasty one is the difference between numbers,
which you want interpolated by using the usual ‘print’
feature of your language, and strings, which you DON’T want
interpolated that way.
String interpolation does not scale to multiple
kinds of escaping and tends to produce insecure code.
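The Little Bobby Tables failure itself is easy to reproduce. The sketch below uses Python's standard `sqlite3` module (the table and the sample name are made up for illustration). Note that the safe version does not escape anything: it swaps in a different technique, parameter binding, where the value travels separately from the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

name = "Robert'); DROP TABLE students;--"

# Interpolated version (do not do this): the quote inside `name` ends the
# SQL string literal early, so the text is no longer the statement intended.
unsafe_sql = f"INSERT INTO students (name) VALUES ('{name}')"

# Parameterised version: the value travels separately from the SQL text,
# so no quoting rule has to be applied by hand.
conn.execute("INSERT INTO students (name) VALUES (?)", (name,))
stored = conn.execute("SELECT name FROM students").fetchall()
```

The interpolated string is only built here, never executed; the parameterised insert stores the name exactly as given.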
Seriously, what NUL-termination is to C strings,
string interpolation is to Ruby and all the other
languages that have it. It reminds me of a Perl
book I had that argued that Perl was a great language
for writing specialised software engineering tools.
It did everything it could with strings and regular
expressions, and in a couple of hundred pages, I do
not believe there was a single program that worked.
There is a Tesla massive power storage battery being
installed in a little town in Queensland. It has 40
container-sized units, each I believe holding the
equivalent of 40 EV batteries. Right now, one of those
units is on fire, sending highly toxic fumes around the
community. The Fire Service and Tesla agree on how you
deal with such fires. Sit and wait for them to go out.
This is a parable. Massive power storage batteries are
a step in the renewable electricity dream. They are
cool tech. They are seductive, powerful. And we do not
know how to make them safe. (Although everything we DO
know about how to make bomb storage sites safe appears
to have been ignored in this case.)
String interpolation is cool tech. It’s seductive,
powerful, and dangerously toxic if misused, which it
very easily is.
ANY complexity added to a tokeniser to support string
interpolation is like effort put into the gun you are
planning to shoot your foot off with.
Having said that, once you allow string interpolation
into your programming language, you are pretty much
forced to allow at least function calls so that addicts
aren’t FORCED to make Little Bobby Tables mistakes.