String Sigils in Erlang/OTP 27

An implementation of String Sigils according to EEP 66 has been merged to ‘master’ and will be released in OTP-27.0, including its pre-releases.

This implementation, as written in the EEP, is limited to the sigils ~, ~b, ~B, ~s and ~S (default, binary and string, with escape chars or verbatim), and limited to normal strings and triple-quoted strings (no new string delimiter characters).
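To make the five sigil prefixes concrete, here is a small sketch of what each one should produce on an ordinary quoted string. It assumes an OTP 27 compiler; the module name `sigil_demo` and the function `kinds/0` are invented for this illustration.

```erlang
-module(sigil_demo).
-export([kinds/0]).

%% Illustration only. Needs an OTP 27 (or later) compiler;
%% earlier releases do not accept the ~ sigil syntax.
kinds() ->
    %% ~s: list of code points, escape sequences processed (\t is one tab).
    "a\tb" = ~s"a\tb",
    %% ~S: list of code points, content taken verbatim (backslash, then t).
    "a\\tb" = ~S"a\tb",
    %% ~b: UTF-8 binary, escape sequences processed.
    <<"a\tb">> = ~b"a\tb",
    %% ~B: UTF-8 binary, content taken verbatim.
    <<"a\\tb">> = ~B"a\tb",
    %% ~ (default): UTF-8 binary; on a normal quoted string it
    %% processes escapes like ~b.
    <<"a\tb">> = ~"a\tb",
    ok.
```

Calling `sigil_demo:kinds()` returns `ok` if every match holds.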


I note with some amusement that the Wikipedia page on Sigils says flatly
that prefixes used in this way are NOT sigils.
“In computer programming, a sigil (/ˈsɪdʒəl/) is a symbol affixed to a
variable name, showing the variable’s datatype or scope, usually a
prefix”
“As this affects the semantics (value) of a literal, rather than the
syntax or semantics of an identifier (name), this is neither stropping
(identifier syntax) nor a sigil (identifier semantics).”

I also note with some embarrassment that since “sigil” is derived from
“sigillum” I’ve been pronouncing it with a hard “g” (also partly by
association with siglum/sigla).

I have a question about ~S. Here’s what the EEP says:

Creates a Unicode codepoint list, with verbatim string content in that
only the end delimiter character can be escaped with a «\» character.

This means that ~S""foo"" is OK, with the " sequences producing quotes.
But it also means that ~S"C:\Users\Owner\Data" is NOT OK, because the
\ at the end, which is supposed to be part of the payload, is taken as
escaping the closing quote.
I think that this should follow Common Lisp in allowing \ as well as ".
The same applies to ~B of course.
Even if the EEP allowed alternative string delimiters, it would not
help because it’s not the delimiter as such that is the problem, but a
backslash preceding a delimiter where the backslash is intended to be
taken literally.

Being a bear of very little brain, I can all too easily imagine myself
writing Windows file names using ~B or ~S and then one day wanting a
directory name and POW. Is this addressed in the implementation?
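The concern can be made concrete with a toy model of the rule quoted from the EEP. This is only a sketch, not the actual scanner in OTP: the module `verbatim_scan_model` and its `scan/1` function are hypothetical, and the input is taken to be the characters that follow the opening quote of a ~S or ~B sigil.

```erlang
-module(verbatim_scan_model).
-export([scan/1]).

%% Toy model only; this is NOT how erl_scan is implemented.
%% Rule modelled: \" yields a literal ", a bare " ends the string,
%% and every other character (other backslashes included) is kept as is.
scan(Chars) ->
    scan(Chars, []).

scan([$\\, $\" | Rest], Acc) ->
    %% Backslash directly before a quote: treated as an escaped quote.
    scan(Rest, [$\" | Acc]);
scan([$\" | Rest], Acc) ->
    %% Unescaped quote: end of the string content.
    {ok, lists:reverse(Acc), Rest};
scan([C | Rest], Acc) ->
    scan(Rest, [C | Acc]);
scan([], _Acc) ->
    %% Ran out of input without finding a closing quote.
    {error, unterminated}.
```

Under this model, the source text `C:\Users\Owner\Data"` scans to the path as expected, while `C:\Users\Owner\Data\"` has its final backslash consumed as an escape and never finds a closing quote, so `scan/1` returns `{error, unterminated}`.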


It currently appears that it does not handle this, but I imagine it would be a small thing to fix up.


Well, yes, Elixir used this name for this feature, and it seems like the lesser evil that we use the same name.

I guess some \ characters got lost in the forum e-mail translation, so I assume you wrote:

This means that ~S"\"foo\"" is OK, with the \" sequences producing quotes.

Which is correct.

I guess that was ~S"C:\Users\Owner\Data\" before forum e-mail mangling, and yes, that wouldn’t work! :frowning:

Thank you for pointing this out, it needs to be fixed!
The goal was to have the ~S and ~B sigils as verbatim as possible, not having to escape all \ characters (in addition to ") to facilitate e.g. Windows file names and regular expressions.

A previous version of EEP 66 did not allow any escaping in a verbatim sigil content, and also had a larger set of delimiters so you were supposed to choose an ending delimiter that was not part of the string content. Or use a triple-quoted string.

When having only one string delimiter, ", it would be bad to not be able to get " in the string content…

One possibility would be to have special treatment of the last character and there handle escaping also of \. That would be more practical than pretty:
~S"\"foo\bar\""  →  "foo\bar"
~S"\foo\bar\\"  →  \foo\bar\
These simple cases, escaping " in a string and escaping \ at the end of a string, look fairly decent, but it is not obvious that \\" is a string end. This is also not that pretty:
~S"\"\foo\bar\\""  →  "\foo\bar\"
Triple-quoted strings would be much clearer for these.

The other options I can think of are:

  • Not allow the end delimiter in a verbatim sigil string at all, which would be a limitation that could be somewhat mitigated by allowing a large set of delimiters.
  • Require escaping of both \ and the end delimiter in a verbatim sigil string, which would be awkward for things sigils were intended to be convenient for, like Windows paths and regular expressions. I guess this is the Common Lisp way…?

I just wanted to say that, if escaping is done at all, we may not want to call this verbatim. It’s not wrong, but when I first heard “verbatim string” I heard “as is”, “without modification”, similar to raw strings in Python (e.g., r"C:\foo\bar\baz"). I suppose not escaping in verbatim strings is indeed an option, and I can think of no drawback to it, sans inconsistency between the various forms, but that doesn’t mean it’s not intuitive either. I say this all without understanding what led to the escaping in a verbatim string or binary in the first place.

What about just supporting it? There’s probably an edge case I’m not thinking about, but we could wait until EOL or EOF to say whether it is or is not valid syntax ({error, ...} or {ok, ...}). The problem, I suppose, is that the reader may have to do a double or triple take. I also think it would be nice to avoid conditionals in the documentation for this (i.e., this is a syntax error, UNLESS ...).

Edit:

FWIW, Elixir supports ~S"C:\foo\bar\baz\\"


This thread seems to disagree: Single backslash problem at end of term in sigil_S/2 · Issue #8989 · elixir-lang/elixir · GitHub


Was there consideration of a sigil that created iolists when doing interpolation?

An example of where I’d want to use sigils and triple quote strings is https://github.com/open-telemetry/opentelemetry-erlang-contrib/blob/main/examples/roll_dice_elli/src/roll_dice_handler.erl#L49 to be something like:

 ~B"""<html>
        <head>
          <script src="/static/index.js"> </script>
          <meta name="traceparent" content="~TraceContext~" />
       ...
     """

But still result in a

I suppose doing:

    [~b"""<html>
            <head>
              <script src="/static/index.js"> </script>
              <meta name="traceparent" content="""", TraceContext, ~b""" />
         ....
         """]

would still be an improvement to the current version where I have to escape all over the place, so maybe it doesn’t make sense to try to create a sigil that somehow knows you want a list of strings and not a flat string…
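For reference, a variant of the same idea that should compile under the triple-quoted string rules in OTP 27 (text starts on the line after the opening delimiter, and the indentation of the closing delimiter is stripped from every content line). This is a sketch under those assumptions; the module `page_demo` and function `page/1` are made up, and the HTML is shortened.

```erlang
-module(page_demo).
-export([page/1]).

%% Builds the page as an iolist: two verbatim binaries with the
%% trace context spliced in between. No quotes need escaping.
%% Assumes TraceContext is itself iodata (a binary or string).
page(TraceContext) ->
    [~B"""
     <html>
       <head>
         <script src="/static/index.js"> </script>
         <meta name="traceparent" content="
     """, TraceContext, ~B"""
     " />
       </head>
     </html>
     """].
```

The result can be handed to anything that accepts iodata, or flattened with `iolist_to_binary/1` when a single binary is needed.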


Interpolation is off the table, for now.

But yes, interpolation, if implemented, should maybe/probably create iolists.


I have suggested a change to EEP 66, defining the verbatim sigils as truly verbatim, to remedy the trailing backslash problem pointed out by @nzok.


Oh my. Embedded HTML/XML as a use case for giant mutant strings. One of the reasons I liked processing XML in Erlang and Prolog and Scheme was that I could represent XML as trees instead of strings.

(html
  (head
    (script (src . "/static/index.js"))
    (meta (name . "traceparent")
          (content . "")))
  …)

[html,
  [head,
    [script, src="/static/index.js"],
    [meta, name="traceparent", content=""],

[html,
  [head,
    [script, {src,"/static/index.js"}],
    [meta, {name,"traceparent"}, {content,""}],

Adding pointy brackets to this is a rendering step to do later. As late as possible, in fact. It’s not quite as pretty in Haskell, but then Haskell programmers figured out a way to turn (much of) XML validity into type-checking.

So can we please keep HTML out of discussions about mutant strings, including quoting and interpolation. Interpolation into trees in Erlang is trivial.
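A minimal sketch of what that could look like, assuming the Erlang-style tree shape above: a list whose head is the tag atom, followed by {Name, Value} attribute tuples and nested element lists. The module `xml_tree_demo` and the helpers `page/1`, `render/1` and `escape/1` are hypothetical, and text nodes are deliberately left out to keep it short.

```erlang
-module(xml_tree_demo).
-export([page/1, render/1]).

%% "Interpolation" is just putting a variable into the term.
%% TraceContext is assumed to be a flat string (list of characters).
page(TraceContext) ->
    [html,
     [head,
      [script, {src, "/static/index.js"}],
      [meta, {name, "traceparent"}, {content, TraceContext}]]].

%% Rendering step: turn the tree into an iolist with pointy brackets.
render([Tag | Rest]) when is_atom(Tag) ->
    Attrs = [A || {_, _} = A <- Rest],        % attribute tuples
    Kids = [K || K <- Rest, is_list(K)],      % child elements
    Name = atom_to_list(Tag),
    ["<", Name,
     [[" ", atom_to_list(K), "=\"", escape(V), "\""] || {K, V} <- Attrs],
     ">",
     [render(Kid) || Kid <- Kids],
     "</", Name, ">"].

%% Escape the characters that would break a double-quoted attribute value.
escape(Text) ->
    lists:flatmap(fun escape_char/1, Text).

escape_char($&) -> "&amp;";
escape_char($<) -> "&lt;";
escape_char($\") -> "&quot;";
escape_char(C) -> [C].
```

Because quoting happens only in `render/1`, the value placed in the tree never needs to be escaped by hand.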


I’ve come to prefer working on actual HTML. It makes copying and pasting not require a conversion step :slight_smile:

Copying and pasting from random *ML files into Erlang (or Prolog or Lisp) or vice versa isn’t something I do very often; doubtless it is a sad defect in me that I can’t imagine a conversion step being much of a problem for me.

To insert XML verbatim:

Ctrl-X Ctrl-I ESC

To insert XML as Erlang:

Ctrl-U Meta-! qh -hr ESC (-hp/-hl for Prolog/Lisp).

Of course, the first way could insert random junk; the second way, if the file doesn’t contain well-formed XML, the “Quick Hack” parser will report an error and nothing will be inserted. Win-win.

Pasting as Erlang

  • gives me something where bracket matching works in the editor

  • gives me something where “select term” and “swap terms” and
    “move forward/backward by terms” work in the editor

  • gives me something where anything that looks like an Erlang
    string is an Erlang string and anything that doesn’t
    look like an Erlang string isn’t one

  • gives me something I can indent and outdent without needing
    teenage mutant ninja string quotes and without altering the
    spaces in the data

  • gives me something where if the Erlang brackets match,
    the XML generated will probably be well-formed

and that’s just the benefits while I’m staring at it.
Processing XML as strings when you could process them as trees,
well it doesn’t bear thinking of.

The only language I commonly use where XML-as-strings isn’t a pain in the parse is AWK, and in AWK it’s a pain in the parse one has to put up with because AWK doesn’t do trees. Even in C there are better approaches.

To repeat: where there are data (like XML or assembly code) that are naturally representable as trees, in languages that make trees easy to write, such data are best represented as trees, keeping the textual representation strictly for external use. Such data should not be presented as use cases for teenage ninja mutant string quotes. After a while, you’ll forget some of the reasons you switched, because mistakes you DON’T make are hard to notice.

Of course that should be

Ctrl-X Ctrl-I filename ESC

Ctrl-U Meta-! qh -hr filename ESC

The point is, of course, that it’s not that much of a burden.


EEP 66 has been updated to have verbatim sigils with no escaping of the end delimiter character, with a larger set of string delimiter characters.

The String Sigils implementation has been updated on ‘master’ accordingly.
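As a sketch of what the updated rules should allow, assuming the delimiter set as it shipped in OTP 27 (which includes brackets and /): a trailing backslash is now ordinary content, and a double quote is included by choosing a different delimiter. The module `verbatim_demo` and function `check/0` are made up for this example.

```erlang
-module(verbatim_demo).
-export([check/0]).

%% Illustration only; needs an OTP 27 (or later) compiler.
check() ->
    %% Nothing is escaped in a verbatim sigil, so the final backslash
    %% is part of the string and the next " ends it.
    "C:\\Users\\Owner\\Data\\" = ~S"C:\Users\Owner\Data\",
    %% To include a ", pick a delimiter that does not occur in the content.
    "say \"hi\"" = ~S[say "hi"],
    %% Regular expressions keep their backslashes and quotes untouched.
    <<"^\\s*\"(\\w+)\"">> = ~B/^\s*"(\w+)"/,
    ok.
```

Content that contains every available delimiter can still fall back to a triple-quoted string.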
