Feature: Heredocs / Triple-quoted text

This is a welcome addition, but I think we are missing an opportunity here. While triple-quoted strings are convenient for formatting, another way of thinking about them is as the introduction of a specific context for token/character interpretation. In other words, characters between " delimiters and between """ delimiters are consumed differently.
This is furthered by:

as triple-quoted text would, I assume, transform the AST, or at least create an unintuitive one.

The missed opportunity is that triple-quoted text introduces one new context without attempting to generalize the situation. To fully support multiple (not just two) contexts, maybe we could consider something like:

    p"preserved
        formatting
        string",

    b"string as binary",

    ...

So, in general, an <atom>"text" form, where the atom could map to a module that the compiler uses to interpret the quoted text (this smells like parse transforms, but at a lower level). Support for interpolation could then be “easily” and nicely provided by the pin operator:

    Year = 2023,
    inter"this is year ^Year".

This would also let tools like merl use both formatting preservation and pinning, making them more accessible to users: the current syntax works, but it is very difficult to read and work with.

One more benefit (or pitfall) would be that interpretations of strings, or “contexts”, could be user-defined, so different domains could use different interpretations.

There is a proposal for this here: Create eep-0063.md: Lightweight UTF-8 binary string literals and patterns by TD5 · Pull Request #46 · erlang/eep · GitHub

To me, they should be separate proposals. The main benefit of """ is that you no longer need to escape " and this becomes important if documentation is going to be provided as regular strings, as proposed in EEP 59. If something like u"..." is introduced, then it could also be used with triple-quotes, such as u"""...""".

@williamthome btw, have you also considered exploring doctests? If Erlang does get docstrings, then doctests would also be a fantastic feature to have.

Thanks for the link. As predicted, there are 3 drafts in there (triple-quoted strings, interpolation, simple UTF-8 binary syntax) producing several kinds of modifiers: """, b, bd, bf, lf, ld. That’s exactly what I was “afraid” of.
I’m in favor of adding all three features but would like to see a more holistic approach. Composition of modifiers can get ridiculous: for example, we have """...""", then b, but according to the interpolation proposal we can also have bf"""...""". Is that the complete set of options? Could we end up in a situation where additional modifiers are needed, such as bfx....z"""...""", which gets ridiculous IMHO?

Agree.

Yes! To me it’s a powerful and simple way to add tests to docs.
BTW, why haven’t you considered adding doctests to EEP 59? Oh, doctests are in EEP 59.

doctests would also be a fantastic feature to have.

We already have doctests at home:

We have a preliminary decision by the OTP Technical Board to add a warning about triple-quoted double-quotes in OTP-26.1, and to add this feature for OTP-27.0.

One small change was decided: Update after OTP Technical Board 2023-08-31 by RaimoNiskanen · Pull Request #50 · erlang/eep · GitHub

The OTB meeting lacked key people so there has to be a final decision in a few weeks, but we will nevertheless add the warning in OTP-26.1.

The implementation of triple-quoted strings has been merged to ‘master’ in 43d277c, to be released in OTP-27.0 and the preceding release candidates. An eager user can, of course, build off ‘master’ now…

The warning for triple-quotes was merged to ‘maint’ in 5ac156d to be released in OTP-26.1.

I have found a peculiar detail that wasn’t considered in EEP 64 regarding string endings and concatenation.

Strings can be concatenated like this:

    "First line\n"
    "Second line\n"

But strings can also be concatenated without white space in between, so before OTP 26.1 this was true:

    """" = ""

since the 4 quote characters are interpreted as two concatenated empty strings.

Since OTP 26.1, in preparation for triple-quoted strings, that produces a warning.

But at the end of a string there is no warning, so this is accepted:

    " """ = " "

since after a string end a new string may immediately follow. To the left of the = char above is a string with a space char concatenated with an empty string.

We intend to disallow this peculiarity and introduce a new warning when strings are concatenated without white space in between. Currently there is a warning for doing that at the start of a string, but not at the end.

And in OTP 27 that will become an error.

This also applies to triple-quoted strings, so the following is accepted today on ‘master’ but will become a syntax error:

    """
        abc
        """"" = "abc"

What say ye; OK?

This means that also this will become a syntax error:

    "abc""def"

It should be written as "abc" "def", "abcdef", or:

    "abc"
    "def"
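As a quick check of the suggested spellings, here is a minimal sketch (the module name concat_demo is made up for illustration). It relies on the compiler joining adjacent string literals that are separated by white space into a single literal, so all three forms should yield the same list:

```erlang
-module(concat_demo).
-export([forms/0]).

%% Hypothetical demo module: the three spellings recommended above.
%% Adjacent string literals separated by white space (spaces or a
%% newline) are joined into one literal at compile time.
forms() ->
    Spaced = "abc" "def",
    Single = "abcdef",
    MultiLine = "abc"
                "def",
    [Spaced, Single, MultiLine].
```

Calling concat_demo:forms() in the shell should return ["abcdef","abcdef","abcdef"].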

Still OK?

I am not sure any of those cases are common enough in practice to justify the hassle of adding the warning. Even if we say they may happen, the valid cases (e.g. "abc""def") feel far more likely than the invalid ones (e.g. "abc"""). So we would be doing extra work and rejecting valid code to guard against something very unlikely.

There are other places where the Erlang syntax is ambiguous (Foo=<<"bar">>) and we don’t ask to add spaces today, right? Or here is another example where the space makes a difference: Foo+ +Bar (vs Foo++Bar).

Foo=<<"bar">> is not ambiguous, it’s an error.
And ‘++’ is an operator.

You are mixing apples and pears, as we say in German.

I was writing some test cases with 3-quoted strings inside 4-quoted ones and 4-quoted strings inside 5-quoted ones, and started thinking that it would be nice to have a sanity check on the end delimiter, so that accidentally typing too many quote characters would be noticed somehow.

Then I realized that the current warning in OTP-26.1 for 3-or-more quote characters did not detect this case:

    A = "
        """

since that is string concatenation at the end of a regular string, not a triple-quoted string.

So the conclusion was that adjacent string literals without intervening white space are hard to read and have no good use case…

@josevalim: Although I don’t think those were valid examples, I think I hear what you are saying…

The question is whether we can improve this. It is a consequence of the scanner being greedy and continuing to scan from the point where it completed the previous token. I’d say Foo=<<"bar">> is actually a counterexample: since the scanner scans that as Foo =< < "bar" >>, it is consistent that it scans " """ as " " "". White space between tokens is optional.

If we would start requiring white space between string literals that should probably be seen as an inconsistency in the scanner, although it might improve readability in many cases.

(It also immediately turned out that we have test cases with "file"".ext" in the source…, so yes - such code has obviously been written)

And that the warning for 3-or-more quotes in OTP-26.1 does not detect

    "
    """

should not be a problem, since that code will behave the same in OTP-27 as before, because it won’t become a triple-quoted string…

I’m starting to lean back towards that we shouldn’t require white space between string literals in the scanner…

Foo=<<"bar">> is definitely ambiguous for the developer: the compiler makes a choice that leads to an error, and some may find that surprising. The compiler could have chosen to make it work as Foo = <<"bar">> (which I am not arguing it should).

Similarly, Foo++Bar could have two distinct interpretations for a human, Foo ++ Bar or Foo + +Bar, the second being the case in Python, Java, etc:

    $ python3
    Python 3.11.4 (main, Jun 20 2023, 17:37:48) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> foo=1
    >>> bar=1
    >>> foo++bar
    2

But of course, the Erlang compiler is making a correct choice here.

My point is that if someone looks at " """ = " " and thinks it is a triple-quote, it is no different from someone looking at Foo=<<"bar">> and thinking it is the same as Foo = <<"bar">>, or looking at Foo++Bar and thinking it is the same as Foo + +Bar. These are incorrect human assumptions that they will unlearn once they compile and run the code. The compiler behaviour is clear.


I guess apples and pears are the same in Germany. :wink:

But not always! I cannot remove the white space here: Foo = <<"bar">>. Nor can I remove it here: Foo + +Bar. But I can remove it here: Foo + -Bar.

The opposite though I believe to always be true: I can always add whitespace between tokens.
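Where removing white space moves a token border can be observed directly with erl_scan:string/1 from the standard library. The sketch below (the module and function names are illustrative, not part of OTP) scans a fragment and keeps only each token's category:

```erlang
-module(scan_demo).
-export([categories/1]).

%% Illustrative helper: scan Source with erl_scan:string/1 and keep only
%% each token's category (its first tuple element), dropping locations.
categories(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    [element(1, Token) || Token <- Tokens].
```

Applied to the examples in this thread:

    scan_demo:categories("Foo = <<\"bar\">>.").  % [var,'=','<<',string,'>>',dot]
    scan_demo:categories("Foo=<<\"bar\">>.").    % [var,'=<','<',string,'>>',dot]
    scan_demo:categories("Foo+ +Bar.").          % [var,'+','+',var,dot]
    scan_demo:categories("Foo++Bar.").           % [var,'++',var,dot]
    scan_demo:categories("Foo+-Bar.").           % [var,'+','-',var,dot]

The greedy scanner turns =<< into =< followed by <, and ++ into a single operator, whereas Foo+-Bar scans exactly like Foo + -Bar, so there the white space really is optional.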

Right. One can always add white space between tokens so one can say that white space is always optional at the token borders.

But in = << the white space is used to avoid scanner greed, so if the white space is removed the token border moves.

Sorry if I was being unclear :slight_smile:

Oh, and apples and pears are the same in Swedish too.

With all that said, it feels like not warning makes the most sense?

You could definitely argue for both cases. You could go in the same direction as =<< and raise, but most tokens in Erlang allow you to remove whitespace around them (even outside of token borders). So keeping the current behaviour means:

  1. It is backwards compatible

  2. It behaves as most tokens where whitespace can be removed regardless of border

Parser-wise, you could equally argue that a string should consume only as many quotes as it needs for closing, rather than being eager and consuming all of them. Plus, being eager in this case would always mean consuming all contiguous quotes.

As far as I remember, the Erlang compiler simply concatenates adjacent strings, even without any operator between them, so I’m not sure this is really required.

    "This is a \n"
    "simple example…"

This is valid. It is not exactly a heredoc, but it makes it easier to see how much white space there is at the end of each line.

(1) Erlang syntax was based on Prolog syntax.
In DEC-10 Prolog, 'x''y' was a SINGLE atom with 3 characters,
and "a""b" was a SINGLE list with three characters. Getting
an embedded quote by writing it twice is a very old convention
as programming goes. Fortran, COBOL, Snobol, Algol 68, SQL
(including PL/SQL), Simula 67 (at least according to the 1986
standard, which also includes !ddd! character escapes and
string pasting with mandatory white space), APL (which also
has string pasting with mandatory white space thanks to ‘strand’
notation), Smalltalk, and a host of other programming languages
follow this convention. Pascal does, Ada does. LOTS of them.

It would certainly help me if a programming language that
doesn’t follow this convention would make "…"" a syntax
error.

(2) The argument for complaining about """…" is well taken.
For consistency, "…""" should also be complained about.
Since both forms involve pasting the empty string, they
might be unlikely, but if Erlang code is generated by
another program, this could easily happen.

Given that Erlang allows ++ in patterns, it’s not clear
why Erlang even has string pasting in the first place.

(3) Strings are wrong. One of the things I’ve always
loved about languages like Prolog, Erlang, SML, Haskell,
is how easy it is to represent data as trees rather than
strings. Triple-quoted strings are a “hey Python, ME DO
LIKE YOU” attempt to make it easier to do the wrong thing.

This is a good point which I missed. Given the parser is eager when parsing opening double quotes, maybe it should be eager when closing too (which would lead to a syntax error).
