Can't set unicode in release mode

Hi All

Even though I’m setting +pc unicode (in vm.args) in my rebar3-based release, it doesn’t seem to have any effect:

$ echo $LANG
en_US.UTF-8

~$ /app/bin/mirakle eval 'lists:keyfind(encoding, 1, io:getopts()).'
{encoding, latin1}

To make it work, I have to connect to the node and explicitly set it:

~$ /app/bin/mirakle remote_console
...
> ok = io:setopts([{encoding, unicode}]).

The author of this blog post (Printing non-ascii characters in Erlang releases) ran into the same issue:

[…] I first checked the shell environment, but os:getenv("LANG") correctly returns en_US.UTF-8 in both cases. I then suspected the Erlang VM was initialized with a different +pc flag in console mode, but adding +pc unicode to vm.args did not change anything.

My rebar3-based release runs in foreground mode.

What am I missing?
Help appreciated.

/Z

You might want to check what io:put_chars(os:cmd("locale")). reports. Several environment variables can influence the encoding you end up with. For example, LC_ALL might be set to something other than “en_US.UTF-8”.

@starbelly Thanks. But we don’t ship the locale command with the Docker image.

> io:put_chars(os:cmd("locale")).
/bin/sh: locale: not found
ok

You should manually check the locale env vars, starting with LC_ALL. Here is a list, followed by some examples of how these env vars can influence applications.

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

$ env LANG="en_US.UTF-8" LC_ALL="foo" erl -noshell -eval "erlang:display(lists:keyfind(encoding, 1, io:getopts())),erlang:halt()."
{encoding,latin1}
$ env LC_ALL="something else" erl -noshell -eval "erlang:display(lists:keyfind(encoding, 1, io:getopts())),erlang:halt()."
{encoding,latin1}
$ env LANG="en_US.UTF-8" LC_ALL="" erl -noshell -eval "erlang:display(lists:keyfind(encoding, 1, io:getopts())),erlang:halt()."
{encoding,unicode}
$ env LC_ALL="en_US.UTF-8" erl -noshell -eval "erlang:display(lists:keyfind(encoding, 1, io:getopts())),erlang:halt()."
{encoding,unicode}
$ env LC_CTYPE="foo" erl -noshell -eval "erlang:display(lists:keyfind(encoding, 1, io:getopts())),erlang:halt()."
{encoding,latin1}
$ env LANG="foo" LC_CTYPE="bar" LC_ALL="en_US.UTF-8" erl -noshell -eval "erlang:display(lists:keyfind(encoding, 1, io:getopts())),erlang:halt()."
{encoding,unicode}

Here’s some doc on priority of environment variables.
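Since the image has no locale command, the precedence can also be checked by hand. Below is a minimal sketch in POSIX sh, assuming the usual POSIX order for character handling (a non-empty LC_ALL wins, then LC_CTYPE, then LANG). The function names effective_ctype and looks_utf8 are made up for illustration. It only inspects the variables; it does not tell you whether the named locale is actually installed in the image, nor exactly how the Erlang VM makes its decision.

```shell
#!/bin/sh
# effective_ctype (hypothetical helper): print the locale name that governs
# character handling, assuming POSIX precedence:
# non-empty LC_ALL, else non-empty LC_CTYPE, else non-empty LANG, else "C".
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'
    fi
}

# looks_utf8 (hypothetical helper): succeed if that name mentions UTF-8.
looks_utf8() {
    case "$(effective_ctype)" in
        *UTF-8*|*utf-8*|*UTF8*|*utf8*) return 0 ;;
        *) return 1 ;;
    esac
}

echo "effective LC_CTYPE: $(effective_ctype)"
if looks_utf8; then
    echo "looks like a UTF-8 locale"
else
    echo "does not look like a UTF-8 locale"
fi
```

Run it in the same environment that starts the release (for example from the container entrypoint), not in an interactive shell you open afterwards, since the two may not share the same variables.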

@starbelly Thanks again. I’ve tried the above combinations without success.

Gonna stick with simplicity for now. This works within Docker:

%% Switch standard I/O to unicode when LANG names a UTF-8 locale.
-spec io_unicode() -> ok.
io_unicode() ->
    case os:getenv("LANG") of
        false -> ok;  % LANG is unset: leave the encoding alone
        Lang  -> io_unicode(match == re:run(Lang, "UTF-8", [{capture,none}]))
    end.

io_unicode(false) -> ok;
io_unicode(true) ->
    %% Set the encoding on standard_io and, explicitly, on the group leader.
    ok = io:setopts([{encoding, unicode}]),
    ok = io:setopts(group_leader(), [{encoding, unicode}]).

start(_StartType, _StartArgs) ->
    ok = io_unicode(),
    % ... rest of my startup code

FWIW, and to mimic what you are doing:

In one shell I have:

$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"

$ _build/default/rel/my_app/bin/my_app foreground
Exec: ... 

Then in another shell:

$ _build/default/rel/my_app/bin/my_app eval 'lists:keyfind(encoding, 1, io:getopts()).'
{encoding, unicode}
$ env LC_ALL="not_utf_8" _build/default/rel/my_app/bin/my_app eval 'lists:keyfind(encoding, 1, io:getopts()).'
{encoding, unicode}

So you can see that setting LC_ALL for the eval did nothing. This makes sense, as we’re evaluating the expression on the running node.

Likewise, I can start up a release like so:

$ env LANG="not_utf_8" LC_ALL="not_utf_8" _build/default/rel/my_app/bin/my_app foreground
Exec: ... 

Then back over on the other shell:

$ env LC_ALL="en_US.UTF-8" _build/default/rel/my_app/bin/my_app eval 'lists:keyfind(encoding, 1, io:getopts()).'
{encoding, latin1}

As you can see from the above, the environment variable passed to eval has no effect, because we are reporting the encoding that was set on the running node when it started.

As such, I still suggest doubling back to see what’s going on with the environment your release is starting in, but I am glad you have found a temporary work-around.

I was a bit surprised that you don’t get {error,enotsup} when doing app eval 'io:getopts()', so I decided to dig a bit. rebar3 uses erl_call -e to talk to nodes, and erl_call by default redirects all I/O from the call to the user process of the remote node. This is why you get the UTF status of the remote node and not of the local shell when doing eval. erl_call has a flag called -fetch_stdout that can be used to print things written to stdout when doing -e, which I think should be added to that call. By adding that switch we also get feature parity with the old nodetool that was used before.

However, when using erl_call the return value will never be printed as unicode as erl_call is very dumb and not very unicode aware at all. It does print whatever it gets via the -fetch_stdout flag as unicode though, so maybe if someone does a PR to relx to change extended_bin to use -fetch_stdout you can get unicode to work for eval.

FYI: Setting +pc unicode controls the heuristic that turns lists of integers into strings when printed by ~tp. For example:

$ erl
> ~/erlang/28.0/bin/erl
Erlang/OTP 28 [erts-16.0] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit:ns]

Eshell V16.0 (press Ctrl+G to abort, type help(). for help)
1> io:format("~tp~n",[[257]]).
[257]
ok
$ erl +pc unicode
> ~/erlang/28.0/bin/erl
Erlang/OTP 28 [erts-16.0] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit:ns]

Eshell V16.0 (press Ctrl+G to abort, type help(). for help)
1> io:format("~tp~n",[[257]]).
"ā"
ok

Thanks, @garazdawi, for the note about -fetch_stdout. Adding that would indeed make things less surprising, so I opened a pull request.
