I’m trying to write a client for a TCP server, but I don’t fully understand how to handle a response that contains a Unicode string. It doesn’t matter what the server does; just imagine a response like “*?4\n🌾”.
Depending on whether the client TCP socket is created with the binary option or not, what I actually receive is <<42,63,52,10,240,159,140,190>> or [42,63,52,10,240,159,140,190].
My problem is that when I use io:format on either of those two responses, I get “*?4\nð¾”. I’ve also tried io_lib:write_string and io_lib:write_unicode_string, but I get “"*?4\nð\237\214¾"”.
How can I turn the response into a Unicode string?
Thanks
Try calling io:put_chars on this; it seems to handle all the Unicode characters correctly. Also, make sure that your terminal supports emojis if you want to print them.
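For what it's worth, here is a minimal sketch of the printing side, meant to be pasted into the Erlang shell. It assumes the binary form of the response from the question; the variable name Response is just for illustration. Whether the emoji actually shows up still depends on the terminal and on the encoding of the I/O device.

```erlang
%% UTF-8 bytes of the response from the question.
Response = <<42,63,52,10,240,159,140,190>>.

%% ~s treats every byte as a Latin-1 character, which gives the garbled output.
%% ~ts (the "t" is the Unicode translation modifier) treats the binary as UTF-8.
io:format("~ts~n", [Response]).

%% io:put_chars/1 takes Unicode chardata, so a UTF-8 binary can be passed directly.
io:put_chars(Response).
```

Both calls return ok; they only print, so they do not give you a value to work with.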
The problem with the io functions is that they just print the result. I need the Unicode string itself, so I need a function that converts the response to a Unicode string and returns it so I can work with it later.
If you want a list of numbers where each number represents a single character (a 1:1 relation), you can use string:to_graphemes/1. Applied to the binary response in this example, it will convert the 4-byte corn emoji into a single integer that you can process later however you want.
Given a binary encoded in UTF-8 (as is the binary response from your server), it can be converted to a list of characters (Unicode code points) using unicode:characters_to_list/1.
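A short sketch of that conversion for the shell, using the bytes from the question (the variable names are only illustrative):

```erlang
%% UTF-8 bytes of the response from the question.
Response = <<42,63,52,10,240,159,140,190>>.

%% Decode the UTF-8 binary into a list of Unicode code points.
Chars = unicode:characters_to_list(Response).

%% The four bytes 240,159,140,190 collapse into the single code point 16#1F33E.
[42,63,52,10,127806] = Chars.
5 = length(Chars).
```

Note that unicode:characters_to_list/1 does not always return a list. For invalid input it returns an {error, Decoded, Rest} tuple, and for input that ends in the middle of a character it returns {incomplete, Decoded, Rest}. The latter can happen on a TCP stream if a multi-byte character is split across two packets, so it is probably worth matching on the result.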
If you have the server response as a list, you will need to convert it to a binary using list_to_binary/1 before calling unicode:characters_to_list/1.
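As a sketch of the list case, again with the bytes from the question and illustrative variable names:

```erlang
%% The response as received from a socket opened without the binary option:
%% a list of bytes, not a list of code points.
ListResponse = [42,63,52,10,240,159,140,190].

%% Turn the bytes into a binary first, then decode it as UTF-8.
Decoded = unicode:characters_to_list(list_to_binary(ListResponse)).
[42,63,52,10,127806] = Decoded.

%% Without list_to_binary/1 every integer is taken to be a code point already,
%% so the list comes back unchanged.
ListResponse = unicode:characters_to_list(ListResponse).
```

The last line shows why the extra step is needed: a list of integers is already valid character data, so nothing gets decoded.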
Using string:to_graphemes/1 happens to work with this example, but it is probably slower than unicode:characters_to_list/1, and it will also combine code points into grapheme clusters, as the documentation of string:to_graphemes/1 shows.
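The difference can be sketched as follows. This is modelled on the kind of input used in the string:to_graphemes/1 documentation, not copied from it, and the code points are written as integers so that it does not depend on the source encoding:

```erlang
%% ß, ↑, then "e" followed by U+030A COMBINING RING ABOVE.
%% The last two code points together form one grapheme cluster.
Input = [223, 8593, 101, 778].

%% to_graphemes/1 groups the cluster into a nested list.
[223, 8593, [101, 778]] = string:to_graphemes(Input).

%% characters_to_list/1 keeps a flat list of code points.
Utf8 = unicode:characters_to_binary(Input).
[223, 8593, 101, 778] = unicode:characters_to_list(Utf8).
```

So if you need a flat list of code points, unicode:characters_to_list/1 is the safer choice; string:to_graphemes/1 is for when you want user-perceived characters.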