Adding atoms to the definition of IO data?

I often want to render IO data where the source data contains atoms. It would be super nice if atoms were automatically converted to binaries when used in e.g. iolist_to_binary/1 and other IO functions.

I’m thinking of cases like these:

callback_url(Type) when is_atom(Type) ->
    iolist_to_binary([config(callback_root), $/, atom_to_binary(Type)]).

% Which could become:
callback_url(Type) when is_atom(Type) ->
    iolist_to_binary([config(callback_root), $/, Type]).

(Ignore the fact that this might not be the cleanest way to join URI components; that is beside the point.)

I don’t immediately see a problem with it, since it would only happen at “encoding” time (and not at “decoding” time, which is another can of worms). It would just be a convenience for things that already look like text to developers, e.g. iolist_to_binary([foo, $., bar, $., baz]) would generate <<"foo.bar.baz">>. Then one could use iodata() structures as a poor man’s templating system as far as atoms are concerned.

What are your opinions? Any reasons why this would be a bad idea? Are there cases where it would cause problems or not be feasible?

I’m not sure why, but lists:concat/1 seems to have a similar behavior:

1> lists:concat([foo, ".", bar, ".", baz]).
"foo.bar.baz"

But with a character literal instead of a string it does not return the expected result:

2> lists:concat([foo, $., bar, $., baz]).
"foo46bar46baz"

Interesting find. It even supports integers and floats too. :thinking:

IO data is kind of ideal, because in the end one does not have to concatenate or flatten anything, so supporting atoms there would be the most performant choice.

It would be impossible to support integers since those are already used for code points, but personally I don’t want that anyway because there usually isn’t a good 1:1 mapping between an integer/float and the desired string representation since it usually varies (precision, thousand separator etc.). But for atoms if you really want them as a string by default it is easy, since they are their own string representation (there’s just no automatic conversion for it, which is what I wish for with this post :smile:).

This is because $. is just the integer 46 (the character code for "."), not a list.
The list form would be [$.] or ".".

1> lists:concat([foo, [$.], bar, [$.], baz]).
"foo.bar.baz"
2> 46 =:= $..
true

The implementation is:

integer() -> list().
float() -> list().
atom() -> list().
list() -> list().
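
Spelled out as code, that mapping looks roughly like the sketch below. This is written from the behaviour shown in the shell sessions above, not copied from the OTP source, and the module name concat_sketch is made up for illustration. It shows why $. ends up as "46": it is an integer, so it takes the integer clause.

```erlang
-module(concat_sketch).
-export([concat/1]).

%% Convert every element to its list representation and append the results.
concat(List) ->
    lists:flatmap(fun thing_to_list/1, List).

thing_to_list(X) when is_integer(X) -> integer_to_list(X);  % $. is 46 -> "46"
thing_to_list(X) when is_float(X)   -> float_to_list(X);
thing_to_list(X) when is_atom(X)    -> atom_to_list(X);
thing_to_list(X) when is_list(X)    -> X.                   % assumed to be a string
```

With this sketch, concat_sketch:concat([foo, ".", bar]) gives "foo.bar", while concat_sketch:concat([foo, $., bar]) gives "foo46bar".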

I think there’s a typo in the documentation comment:

%% concat(L) concatenate the list representation of the elements
%%  in L - the elements in L can be atoms, numbers of strings.
%%  Returns a list of characters.

It should probably say “numbers or strings”.

From the documentation it also looks like lists:concat/1 only handles flat lists so it’s not a replacement for deeply nested IO list creation, which is very efficient.

An iolist is a collection of bytes, not a collection of characters, so I don’t think that it would make sense to include something that would need to be converted to characters in there.

If we were to add it anywhere, I would add it to unicode:chardata/0, as that is a collection of characters where we have a known encoding to convert to. You could then do unicode:characters_to_list(["hello ", atom]) and get what you expect. I don’t think that we should do this, as dealing with chardata is complex enough in places like the string module.
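
The bytes-versus-characters distinction can be seen with today's functions. A small sketch (the module name bytes_vs_chars and the function demo/0 are made up for illustration): an integer above 255 is not a valid byte in iodata(), but it is a valid code point in chardata() and gets encoded to UTF-8 there.

```erlang
-module(bytes_vs_chars).
-export([demo/0]).

demo() ->
    %% 16#400 does not fit in a byte, so iolist_to_binary/1 fails with badarg.
    Bytes = try iolist_to_binary([16#400]) catch error:badarg -> badarg end,
    %% As chardata() the same integer is a code point and is encoded as UTF-8.
    Chars = unicode:characters_to_binary([16#400]),
    {Bytes, Chars}.
```

Calling bytes_vs_chars:demo() returns {badarg, <<208,128>>}.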

As a small aside, the file:name/0 type is already almost what you describe that you want :slight_smile:
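
For illustration, here is a small sketch of what that looks like in practice, assuming filename:flatten/1 behaves as documented (it converts a possibly deep list of characters and atoms into a flat string). The module name file_name_demo is made up.

```erlang
-module(file_name_demo).
-export([demo/0]).

demo() ->
    %% A file:name() value may mix characters, atoms and nested lists.
    filename:flatten([["log", $/], foo, ".txt"]).
```

Note that this produces a flat string rather than a binary, so it is not a single-pass replacement for iolist_to_binary/1.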

Right, so with that explanation, I realize that what I’m really asking for is a built-in, single-pass, recursive way to encode (some) terms to UTF-8… I suppose.

That is, something like foobar_to_binary/1,2 where ..._to_binary has the same meaning as in atom_to_binary/1,2 (since it is encoding aware) and foobar_to_... has the same recursive meaning and flexibility as iolist_to_....

If I understand you correctly what you are saying is that iolist_to_binary is purely a way to efficiently traverse BEAM memory (from various sources) and pipe it into an IO sink (socket, binary, file, whatever) and that due to complexity of encoding it will not get any support for exporting such data to the configured encoding of the VM (in the way atom_to_binary does).

The other alternatives are less interesting because they remove the (most) interesting property of iolist_to_binary (or whatever underlying mechanism is used that works with sockets/files etc.) in that it can be done in a single pass by the VM itself. The multiple-pass way is what any Erlang project already uses today, either explicitly (a homegrown recursive algorithm before the IO sink) or implicitly (via some library that does it wholly or partly, e.g. the new json module in OTP).
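
To make the "explicit" multiple-pass variant concrete, here is a minimal sketch of such a homegrown pre-pass. The module atom_iodata and its functions are hypothetical, not an existing OTP API: the first pass swaps atoms for UTF-8 binaries, and the second pass is the normal iolist_to_binary/1.

```erlang
-module(atom_iodata).
-export([to_binary/1]).

%% Hypothetical helper: expand atoms first, then flatten as usual.
to_binary(Data) ->
    iolist_to_binary(expand(Data)).

%% Extra pass over the structure; this is the cost the thread wants to avoid.
expand([]) -> [];
expand(A) when is_atom(A) -> atom_to_binary(A, utf8);
expand([H | T]) -> [expand(H) | expand(T)];
expand(Other) -> Other.   % bytes and binaries pass through unchanged
```

For example, atom_iodata:to_binary([foo, $., bar, [$., <<"baz">>]]) returns <<"foo.bar.baz">>.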

How are atom text representations stored internally in the BEAM? As UTF-8 binaries somewhere?

What you want is some simpler way to convert atoms to strings without having to litter the code with s(Atom) or adding some extra pass. This is very similar to the problem that EEP-62 tries to solve, only that it does it for more datatypes than just atoms. If EEP-62 were to be introduced, would adding atoms to iodata() still be useful? Or are they two different solutions to the same problem?

How are atom text representations stored internally in the BEAM? As UTF-8 binaries somewhere?

yes.

The property I would like to keep is to be able to recursively create an IO data structure without flattening it more than once (which happens for free when you pass an IO list to some output sink). Interpolating it myself, with some library or with EEP-62 doesn’t really matter that much because it is still one additional (unnecessary) pass over the structure. If the EEP-62 syntax would be compiled to something that retains the IO data properties, then that of course would solve the problem.

Right, so in my naive dream world I’d like for something like this to work: iolist_to_binary(["foo", <<"bar">>, baz], utf8) :smiley: where the atoms would just be piped/encoded automagically (and where the default option would of course be utf8 as with atom_to_binary/1).

It’s difficult to see how EEP-62 would help.
Suppose you want to include an atom in an IO list.
There are two cases.
(1) You have a specific atom in mind.
In this case, you just turn 'X!&Z' into "X!&Z" and there’s no problem.
(2) The atom will not be known until run-time.
In this case, EEP-62 is a rather heavy-weight alternative to atom_to_list(Atom),
which could be given a local abbreviation if used often.

This turns our focus to “where are the atoms coming from, and why are they atoms?”

I am actually very sympathetic to the idea of allowing atoms in I/O lists.
I have a library for another programming language in which its analogues of
I/O lists (called chartrees and bytetrees) DO allow atoms. If Erlang had
happened to allow this way back when, I doubt that anyone would have argued for
removing that feature when making Erlang Unicode-friendly.

Since the feature isn’t present in Erlang (yet), a useful response is to say
“what does the code look like where this would be helpful?”

Back in the day there used to be a joke about New Zealand.
When the USA came up with a new educational idea,
we’d wait 10 years until it was proven to be a very bad idea.
And then we would adopt it.
EEP-62 reminds me forcibly of that joke.