Feature: Heredocs / Triple-quoted text

I’m proposing to implement heredocs, i.e. triple-quoted text.
@josevalim describes/proposes it in EEP 59 as:

we may support triple-quoted strings, which reads better in multi-line format and reduces the need for escaping

This PR implements it. Improvements and tests are needed, but in general, it works.

For example, given this module:

-module(triple_quotes).
-export([examples/0]).

examples() ->
    % Expands to <<"foo \"bar\" baz\nfizz buzz\n  The indent is related to the first char">>
    <<"""
    foo "bar" baz
    fizz buzz
      The indent is related to the first char
    """>>,

    % Expands to "\" \"foo\" \"bar\" \"baz\" \"\n\t\"fizz\" \"buzz\""
    """
    " "foo" "bar" "baz" "
    	"fizz" "buzz"
    """,

    % Expands to <<"foo">>
    <<"""foo""">>,

    % Expands to <<"foo\"bar\"baz">>
    <<"""foo"bar"baz""">>.

And this is how the module is scanned using the code of the commit:

1> c("core_scan.erl"),io:format("~p~n", [core_scan:string(begin {ok, Bin} = file:read_file("/home/williamthome/triple_quotes.erl"), binary_to_list(Bin) end)]).
{ok,[{'-',1},
     {module,1},
     {'(',1},
     {triple_quotes,1},
     {')',1},
     {'.',1},
     {'-',2},
     {export,2},
     {'(',2},
     {'[',2},
     {examples,2},
     {'/',2},
     {integer,2,0},
     {']',2},
     {')',2},
     {'.',2},
     {examples,4},
     {'(',4},
     {')',4},
     {'->',4},
     {'<',6},
     {'<',6},
     {string,6,
             "foo \"bar\" baz\nfizz buzz\n  The indent is related to the first char"},
     {'>',8},
     {'>',8},
     {',',8},
     {string,10,"\" \"foo\" \"bar\" \"baz\" \"\n\t\"fizz\" \"buzz\""},
     {',',11},
     {'<',13},
     {'<',13},
     {string,13,"foo"},
     {'>',13},
     {'>',13},
     {',',13},
     {'<',15},
     {'<',15},
     {string,15,"foo\"bar\"baz"},
     {'>',15},
     {'>',15},
     {'.',15}],
    19}

The motivation is to make code more readable and to simplify writing multi-line text.
Python and Elixir have it as well.
For me, this is a great feature.
What do you think about it?

9 Likes

Nice! Implementing this feature requires a couple of considerations. Both Ruby and Python have heredocs/triple-quotes, and because their implementations keep all indentation, they both require post-processing. Imagine you write this:

foo() ->
  bar("""this is a
  very long
  string""").

In the original Ruby and Python implementations, the actual string will have a newline and two spaces before “very” and “string”. Python introduced inspect.cleandoc to address it, Rails has shipped a strip_heredoc method for ages, and Ruby added a special sigil <<~ that does the stripping for you.
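
A small Python sketch of the problem described above, using the same three-line text. The function name `bar` is only borrowed from the Erlang example; `inspect.cleandoc` is the standard-library helper mentioned.

```python
import inspect


def bar():
    # The continuation lines keep the two spaces of source indentation.
    return """this is a
  very long
  string"""


raw = bar()
cleaned = inspect.cleandoc(raw)  # post-processing step that strips the margin

print(repr(raw))      # 'this is a\n  very long\n  string'
print(repr(cleaned))  # 'this is a\nvery long\nstring'
```

Without the explicit `cleandoc` call, the source-code indentation leaks into the value.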

Elixir chose a solution where the indentation is given by the closing """. It feels natural and it just works; it only requires the closing quotes to be on their own line. And, to make it all consistent, we require the opening """ to be on its own line too. So in this example, there is no indentation on each line:

foo() ->
  bar("""
  this is a
  very long
  string
  """).

This example indents every line with two spaces:

foo() ->
  bar("""
  this is a
  very long
  string
""").

If you have text starting earlier than the indentation of the closing quotes, a warning is raised.
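
The rule can be modelled in a few lines of Python. This is only a sketch of the behaviour described above, not Elixir's actual implementation: the helper name `strip_by_closing_indent` is made up, it assumes indentation uses spaces only, and it returns warnings as a list instead of printing them.

```python
def strip_by_closing_indent(body_lines, closing_indent):
    """Strip up to `closing_indent` leading spaces from each body line.

    Hypothetical helper. Returns (text, warnings).
    """
    out, warnings = [], []
    for number, line in enumerate(body_lines, start=1):
        leading = len(line) - len(line.lstrip(" "))
        # Text that starts left of the closing quotes triggers a warning.
        if line.strip() and leading < closing_indent:
            warnings.append(f"line {number} is outdented")
        out.append(line[min(leading, closing_indent):])
    # Every body line, including the last one, ends with a newline.
    return "".join(l + "\n" for l in out), warnings


body = ["  this is a", "  very long", "  string"]
print(strip_by_closing_indent(body, 2))     # closing quotes indented by two
print(strip_by_closing_indent(body, 0))     # closing quotes at column zero
print(strip_by_closing_indent(["foo"], 2))  # text left of the closing quotes
```

The first call yields the text with no indentation, the second keeps the two spaces on every line, and the third returns the text unchanged together with one warning.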

The other concern is what happens when you have long text but don’t want the line breaks to show up to the user. For example, if I am writing documentation for a CLI tool, I don’t want my line breaks to be the ones printed to the user. Elixir allows using \ to escape a newline:

foo() ->
  bar("""
  this is a \
  very long \
  string\
  """).

That would be equivalent to “this is a very long string”.
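
For comparison, Python string literals treat a backslash at the end of a line the same way, and the rule is easy to model on already-dedented text. A minimal sketch: `join_escaped_newlines` is a hypothetical name, and it deliberately ignores the case of an escaped backslash (`\\`) right before a newline.

```python
def join_escaped_newlines(text):
    # Remove each backslash that is immediately followed by a newline,
    # together with that newline, so the two lines are joined.
    return text.replace("\\\n", "")


dedented = "this is a \\\nvery long \\\nstring\\\n"
print(repr(join_escaped_newlines(dedented)))  # 'this is a very long string'

# Python's own triple-quoted literals apply the same escape natively.
native = """\
this is a \
very long \
string\
"""
print(repr(native))  # 'this is a very long string'
```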

In any case, I am not saying Erlang should pick the same solutions as Elixir, but you should 100% consider the indentation issue, because it is 100% guaranteed it will be an issue in practice. :slight_smile:

8 Likes

Thanks, @josevalim!
Good points.
Here is what my implementation outputs for your examples:

% "this is a\n  very long\n  string"
foo() ->
  bar("""this is a
  very long
  string""").

% "this is a\nvery long\nstring"
foo() ->
  bar("""
  this is a
  very long
  string
  """).

% "this is a\nvery long\nstring\n\")."
foo() ->
  bar("""
  this is a
  very long
  string
""").

Some bugs, as expected. I believe this should bring up some discussion.

Nice! I think this is a good one to be part of this proposal.

1 Like

Yeah, I suspect you are going ahead and removing all indentation, but that can also be problematic when you want the text to naturally have some indentation. For example, imagine you are writing four-space indented code blocks to be added within another doctest.

There is an additional consideration: whether this escape character should be added to all Erlang strings and, if so, what the backwards compatibility story is. Similarly, """foo bar""" is valid Erlang today. Luckily, none of those have meaning or are useful, so a simple warning telling people to not use those for now may be enough?

3 Likes

Right. Would be nice to have more people’s feedback about this.

EDIT

This is not correct; I’m not removing all indentation.
For example:

"""
This is a documentation text
    
    And here is the example with tab space

More doc text
  And some white spaces
"""

This will be the result:

"This is a documentation text\n     And here is the example with tab space\n More doc text\n  And some white spaces"

It contains bugs: line breaks are being removed incorrectly, but indentation is not.

I think there is no reason to worry about backward compatibility because the result is exactly the same.

2 Likes

Just to chime in here, coming from a Python background I really enjoy the approach Elixir went for. Python’s behaviour always felt confusing, and having to de-indent your entire triple-quoted string (or putting it into some cleaner function) always felt hacky. It would be nice to have this behaviour in Erlang as well IMO :slight_smile:

6 Likes

Thanks for your feedback, @jchrist!
Could you please provide some examples of what behavior you expect and do not expect?

2 Likes

Sure, here’s an example Elixir module:

# elixir example.exs
defmodule Example do
  def foo1, do:
    """
    This is an example triple-quoted string.
      Even with indents.
    """

  def foo2, do:
    """
      This is an example indented triple-quoted-string.
    """

  def foo3, do:
    """
This is an example for how to "naturally" de-dent it in Python.
"""

  def foo4, do:
    """
#{foo1()}
"""
end


IO.inspect(Example.foo1())
IO.inspect(Example.foo2())
IO.inspect(Example.foo3())
IO.inspect(Example.foo4())

And here’s an example Python module that looks the same way:

# python3 example.py
def foo1():
    return """
    This is an example triple-quoted string.
      Even with indents.
    """

def foo2():
    return """
      This is an example indented triple-quoted string.
    """

def foo3():
    return """
This is an example for how to "naturally" de-dent it in Python.
"""

def foo4():
    return f"""
{foo1()}
"""

print(repr(foo1()))
print(repr(foo2()))
print(repr(foo3()))
print(repr(foo4()))

The outputs are as follows:

$ python3 example.py 
'\n    This is an example triple-quoted string.\n      Even with indents.\n    '
'\n      This is an example indented triple-quoted string.\n    '
'\nThis is an example for how to "naturally" de-dent it in Python.\n'
'\n\n    This is an example triple-quoted string.\n      Even with indents.\n    \n'
$ elixir example.exs 
"This is an example triple-quoted string.\n  Even with indents.\n"
"  This is an example indented triple-quoted-string.\n"
"This is an example for how to \"naturally\" de-dent it in Python.\n"
"This is an example triple-quoted string.\n  Even with indents.\n\n"

Getting the “right” thing in Python out of the box is unintuitive to me because it breaks the natural flow of reading the module, especially if you take into account that in Python, indentation is syntactically very important.
The leading newline in the Python example is also often a point of contention, because people debate whether you should start your docstring with """This function does ... or add a newline after the initial quotes - in which case you need to deal with cleaning it up :confused: I think there are even linter plugins for that …
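
For reference, one common way to get the Elixir-style result for foo1 in Python is to combine a backslash after the opening quotes with `textwrap.dedent` from the standard library. This is just one possible cleanup and it assumes the text is indented with spaces:

```python
import textwrap


def foo1():
    # The backslash after the opening quotes suppresses the leading newline;
    # dedent() then removes the indentation common to all lines.
    return textwrap.dedent("""\
        This is an example triple-quoted string.
          Even with indents.
        """)


print(repr(foo1()))
# 'This is an example triple-quoted string.\n  Even with indents.\n'
```

It works, but the cleanup has to be spelled out at every call site, which is the part that feels hacky.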

5 Likes

Either way, outside of indent behaviour discussion, I also just wanted to say: thanks for taking the time to implement this!

2 Likes

Great!

Oh, string interpolation is an interesting (maybe important) thing that I hadn’t thought about.
I’m thinking about how to deal with interpolation using triple quotes, for example:

<<"""foo""", """ "bar" """, """baz""">>
% Maybe not a good example; it can simply be <<"foo", """ "bar" """, "baz">>

Outputs <<"foo \"bar\" baz">>, but it looks weird to add these triple quotes.
Should string interpolation be a thing in Erlang?

Maybe we can discuss interpolation another time.

2 Likes

I’ll wait for the core team’s feedback before doing any improvements or bug fixes.

2 Likes

I will be looking into this, hopefully soon-ish so that I can provide some useful feedback from the OTP team

3 Likes

@williamthome thank you so much for creating this PR :tada: :confetti_ball:

Before we can proceed, we think it is advisable to create an EEP that details how things will be handled.

I will try to write such an EEP (unless you prefer to do so). The main idea is that we would like to get as close as seems reasonable to Elixir’s semantics. In this sense, the triple quotes will create a binary instead of a string.

More details about the EEP coming soon, as soon as I figure out the semantics of Elixir :slight_smile:

4 Likes

Feel free to ping me. I will be glad to discuss/review anything. :slight_smile:

4 Likes

Thanks, @kiko!
Awesome! You can proceed with the EEP, I have no doubt that you will detail it better. BTW, I can help to do the implementation. I will also be looking at the Elixir semantics. Also, creating a binary makes total sense to me.
Thanks again!
All the best.

1 Like

It would be great if you could work on the implementation :slight_smile: I plan on migrating the OTP documentation from XML to Markdown and adding documentation compiler attributes, and I could really use your implementation for the documentation :slight_smile:

I am not sure I can have an EEP by end of day today, busy with other stuff. But I will try to hurry up and have the EEP hopefully by end of next week (Tuesday is Swedish national day, so the week is a bit shorter, hence the end of the next week)

4 Likes

Nice! :smiley:
Just to add a note on Elixir’s syntax/semantics: Elixir has triple double-quotes for strings/binaries and triple single-quotes for charlists.

1 Like

Please take a look at this project.
It contains pseudocode for the proposal.
I’ll list all the test results below.
All of the results are based on the discussion in this topic and on the Elixir documentation.
cc @josevalim @jchrist

% TEST 1
test = """
    this
    is
    a
    test
"""
% "    this\n    is\n    a\n    test\n"


% TEST 2
test = """
    This
    Is
    A
    Test
    """
% "This\nIs\nA\nTest\n"


% TEST 3
foo() ->
  bar("""
  this is a
  very long
  string
  """).
% "this is a\nvery long\nstring\n"


% TEST 4 
foo() ->
  bar("""
  this is a
  very long
  string
""").
% "  this is a\n  very long\n  string\n"


% TEST 5
foo() ->
  bar("""
  this is a \
  very long \
  string\
  """).
% "this is a very long string"


% TEST 6
    """
    This is an example triple-quoted string.
      Even with indents.
    """
% "This is an example triple-quoted string.\n  Even with indents.\n"


% TEST 7
    """
      This is an example indented triple-quoted-string.
    """
% "  This is an example indented triple-quoted-string.\n"


% TEST 8
    """
This is an example for how to "naturally" de-dent it in Python.
"""
% "This is an example for how to \"naturally\" de-dent it in Python.\n"


% TEST 9
"""
this should contains "quotes"
and """triple quotes""" and
ends here
"""
% "this should contains \"quotes\"\nand \"\"\"triple quotes\"\"\" and\nends here\n"


% TEST 10
  """
foo
  """
% error(outdented)
% Please see the note in the footer of this message.

% TEST 11
"""foo
% error(badarg)

Note about TEST 10

Instead of error(outdented) Elixir just emits a warning. I have more details about this here.
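
To make the rules behind TEST 1 to TEST 11 concrete, here is a minimal executable model in Python. It is not the code from the PR or from the linked project: `parse_heredoc` and `HeredocError` are hypothetical names, the input is assumed to be the literal alone (from the opening quotes through the closing quotes), and indentation is assumed to use spaces only.

```python
class HeredocError(Exception):
    """Raised for malformed literals; the reason is in args[0]."""


def parse_heredoc(source):
    lines = source.split("\n")
    opening, body, closing = lines[0], lines[1:-1], lines[-1]

    # Opening and closing quotes must each sit alone on their line (TEST 11).
    if len(lines) < 2 or opening.strip() != '"""' or closing.strip() != '"""':
        raise HeredocError("badarg")

    # The indentation of the closing quotes is stripped from every line.
    indent = len(closing) - len(closing.lstrip(" "))
    out = []
    for line in body:
        if line.strip() and not line.startswith(" " * indent):
            raise HeredocError("outdented")  # TEST 10
        out.append(line[indent:])

    text = "".join(l + "\n" for l in out)
    return text.replace("\\\n", "")  # backslash-newline joins lines (TEST 5)


test2 = '"""\n    This\n    Is\n    A\n    Test\n    """'
print(repr(parse_heredoc(test2)))  # 'This\nIs\nA\nTest\n'

test4 = '"""\n  this is a\n  very long\n  string\n"""'
print(repr(parse_heredoc(test4)))  # '  this is a\n  very long\n  string\n'
```

Raising on outdented text follows the error(outdented) result listed for TEST 10; a model of Elixir's behaviour would emit a warning at that point instead.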

2 Likes

This looks intuitive, I’m a fan! And I’m excited to hear about the OTP documentation updates as well :slight_smile:

3 Likes

I created an EEP for the semantics of triple-quoted text.
Overall, the semantics are “the same” as Elixir’s, modulo throwing an error instead of a warning.

Feedback on the EEP is welcome.

4 Likes