Creating a Unicode string from a list or bitstring?

I’m trying to write a client for a TCP server, but I don’t fully understand how to handle a response that contains a Unicode string. It doesn’t matter what the server does; just imagine a response like “*?4\n🌾”.

Depending on whether the client TCP socket is created with the binary option or not, what I actually receive is either <<42,63,52,10,240,159,140,190>> or [42,63,52,10,240,159,140,190].

My problem is that when I use io:format on either of those two responses, I get “*?4\nð¾”. I’ve also tried io_lib:write_string and io_lib:write_unicode_string, but I get “"*?4\nð\237\214¾"”.

How can I turn the response into a Unicode string?
Thanks

Try calling io:put_chars on this; it seems to handle all the Unicode characters correctly. Also, make sure that your terminal supports emojis if you want to print them.

Try io:format("~ts", [<<42,63,52,10,240,159,140,190>>]).

See: https://www.erlang.org/doc/apps/stdlib/unicode_usage.html

The problem with the io functions is that they only print the result. I need the Unicode string itself, so I need a function that converts the response to a Unicode string and returns it so I can work with it later.

If you want a list of numbers where each number represents a single character (a 1:1 relation), you can use string:to_graphemes/1. In this example, it will convert the 4-byte 🌾 emoji into a single integer, and you can process it later as you want.
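
As a quick illustration, here is what that looks like for the response bytes from the question, pasted into the Erlang shell. This is only a sketch; the variable names Response and Graphemes are made up for the example.

```erlang
%% The response bytes from the question, as a UTF-8 encoded binary.
Response = <<42,63,52,10,240,159,140,190>>.

%% string:to_graphemes/1 returns one list element per grapheme cluster.
%% Here every cluster is a single code point, so the result is a flat list.
Graphemes = string:to_graphemes(Response).
%% Graphemes is [42,63,52,10,127806]
```

The last element, 127806, is the code point U+1F33E that the four bytes 240,159,140,190 encode.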

Given a binary encoded in UTF-8 (as is the binary response from your server), it can be converted to a list of characters (Unicode code points) using unicode:characters_to_list/1:

1> unicode:characters_to_list(<<42,63,52,10,240,159,140,190>>).
[42,63,52,10,127806]

If you have the server response as a list, you will need to convert it to a binary using list_to_binary/1 before calling unicode:characters_to_list/1.
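
Both cases can be wrapped in one small helper. This is a minimal sketch, assuming the server always sends UTF-8; the module name utf8_response and the function to_codepoints/1 are hypothetical names chosen for illustration and are not part of OTP.

```erlang
-module(utf8_response).
-export([to_codepoints/1]).

%% to_codepoints/1 is a hypothetical helper, not an OTP function.
%% It accepts the raw response either as a list of bytes or as a binary
%% and returns {ok, CodePoints} on success.
to_codepoints(Bytes) when is_list(Bytes) ->
    %% A list of bytes (socket opened without the binary option)
    %% is first turned into a binary.
    to_codepoints(list_to_binary(Bytes));
to_codepoints(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin, utf8) of
        Chars when is_list(Chars) ->
            {ok, Chars};
        %% The binary ended in the middle of a multi-byte character.
        {incomplete, Decoded, Rest} ->
            {incomplete, Decoded, Rest};
        %% The binary contained bytes that are not valid UTF-8.
        {error, Decoded, Rest} ->
            {error, Decoded, Rest}
    end.
```

With this, utf8_response:to_codepoints([42,63,52,10,240,159,140,190]) and utf8_response:to_codepoints(<<42,63,52,10,240,159,140,190>>) both return {ok,[42,63,52,10,127806]}. The incomplete case is worth handling in a TCP client, since a read can end partway through a multi-byte character; the Rest binary can then be prepended to the next chunk of data.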

Using string:to_graphemes/1 happens to work with this example, but it is probably slower than unicode:characters_to_list/1, and it will also combine code points into clusters, as in this example from the documentation of string:to_graphemes/1:

2> string:to_graphemes(<<"ß↑e̊"/utf8>>).
[223,8593,[101,778]]

That might not be what you want.
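
To make the difference concrete, here is a small shell sketch that runs both functions on the same input. The binary is built from explicit code points (an "e" followed by the combining ring above, 778) so the result does not depend on how the source text is normalized; the variable names are made up for the example.

```erlang
%% "ß", "↑", then "e" (101) followed by the combining ring above (778).
Bin = <<223/utf8, 8593/utf8, 101/utf8, 778/utf8>>.

%% One integer per code point, always a flat list.
Flat = unicode:characters_to_list(Bin).
%% Flat is [223,8593,101,778]

%% One element per grapheme cluster; the "e" and its combining mark are grouped.
Clusters = string:to_graphemes(Bin).
%% Clusters is [223,8593,[101,778]]
```

If you need to compare or index individual code points, the flat list is usually easier to work with; the clustered form is more useful when you care about what a reader perceives as one character.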

Another example of a function that requires a string is:

wxTextCtrl:create/4

Working out the acceptable way to supply a string is very time-consuming.
Thank you for this important question.
