t__ - An implementation of the Gettext system in Erlang

ergenius · September 4, 2022, 11:19pm

Hello everyone!

I want to share t__ with you, an Erlang implementation of Gettext.

t__ is available on:

GitHub
Hex

Features:

Highly configurable with multiple PO sources and languages per Erlang application or process.
Supports contexts.
Supports interpolation using familiar Erlang format control sequences (from io:format).
Supports translating singular terms with or without interpolation and with or without context.
Supports translating plural terms with or without interpolation and with or without context.
Supports all plural formulas defined by Unicode CLDR.
Supports ETS-table-based caching.
Supports monitoring PO file changes and reloading.

You can find a full demo application on GitHub: demoapp. The application demonstrates most of the functionality, including using multiple repositories and having t__ monitor the repositories' PO files for changes and reload them in real time.

Any feedback is highly appreciated…

Thank you!

ergenius · September 5, 2022, 10:41am

After some consideration about the priorities in the project road map, I decided to start working on supporting other control sequences for interpolation besides the Erlang io:format ones, mainly because you can’t change the order of the variables in a translation with Erlang control sequences…

I will do this by making t__ aware of as many gettext flags as possible, like:
#, c-format
#, python-format
#, java-format
and eventually try to register a new format with the gettext community, called
#, erlang-format
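The ordering limitation mentioned above can be seen directly with io_lib:format, which consumes its argument list strictly from left to right. A minimal sketch (the module name order_demo and the sample strings are only for illustration, they are not part of t__):

```erlang
-module(order_demo).
-export([english/0, reordered/0]).

%% io_lib:format/2 pairs control sequences with arguments by position:
%% the first sequence gets the first argument, and so on.
english() ->
    lists:flatten(io_lib:format("Only ~p bytes free on ~s.", [1024, "disk0"])).

%% A translation that wants the disk name first cannot say
%% "use argument 2 here". With the same argument list, ~s now
%% receives the integer 1024 and io_lib:format/2 fails with badarg.
reordered() ->
    try
        lists:flatten(io_lib:format("On ~s only ~p bytes are free.", [1024, "disk0"]))
    catch
        error:badarg -> {error, badarg}
    end.
```

Formats such as c-format avoid this with explicit positions (%2$d), which is what an erlang-format flag would have to offer in some form.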

I will keep you informed about my progress on this…

mworrell · September 5, 2022, 11:26am

Interesting, I especially like that you implemented the plural forms.

ergenius · September 5, 2022, 1:52pm

Thank you for your kind comment and… Zotonic!

Yeah, that was the most time-consuming part, involving lots of googling around.

The unexpected part of the research was… many implementations are not 100% compatible with each other or with Unicode CLDR. Moreover, some well-known translation software or services (Crowdin, for example) use slightly different formulas. Some of them even have broken formulas with plural cases missing.

I chose to trust CLDR and I don’t really know if this is the best choice. I am still asking myself whether it would be better to collect and support all the alternative formulas I could find…
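To make the idea of a plural formula concrete, here is a small sketch of two CLDR-style rules written as Erlang clauses. It is not taken from t__; the module name plural_demo is made up, and it only handles non-negative integer counts (CLDR also defines rules for fractional numbers, which are ignored here).

```erlang
-module(plural_demo).
-export([index/2]).

%% Zero-based plural form index for a non-negative integer N.

%% English has two categories: "one" (exactly 1) and "other".
index(en, 1) -> 0;
index(en, N) when is_integer(N), N >= 0 -> 1;

%% Romanian has three categories for integers:
%% "one" for 1, "few" for 0 and for N rem 100 in 1..19, "other" otherwise.
index(ro, 1) -> 0;
index(ro, N) when is_integer(N), N >= 0 ->
    case N rem 100 of
        _ when N =:= 0 -> 1;
        R when R >= 1, R =< 19 -> 1;
        _ -> 2
    end.
```

For example, plural_demo:index(ro, 101) selects the "few" form (index 1), while plural_demo:index(ro, 20) selects "other" (index 2). A service whose formula drops one of these cases would return a different index for the same n.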

mworrell · September 6, 2022, 9:29am

I was thinking of parsing & compiling the expressions. In Zotonic we have expression parsers and a template compiler, which could be adapted to make an Erlang module to quickly check the expressions, given an n value.

Didn’t do it yet, something with time. And I have to figure out how to use this with Crowdin, which we use to manage all our translations. (We generate a .pot file from all templates and Erlang sources, and then Crowdin automatically picks up that new .pot file, adds new translations and makes a merge request for .po files for those new texts.)

ergenius · September 11, 2022, 3:48pm

Yep, time is what I’m up against here, too. This week was quite busy at work, that’s why I’m answering so late…

I can’t promise a deadline, but I will try to come up with a PR for Zotonic implementing plural formulas (compatible with Crowdin). But first I need some… time to study your existing implementation. I will come back with any questions I have after familiarizing myself with Zotonic i18n…

mworrell · September 13, 2022, 11:40am

oh wow, that is nice if you can find some time to look into this.

We have quite a different implementation, though based on the gettext.erl from old times (2009 or so - way before the current rebar and hex days).

For the number-variations I was thinking of using our {% trans ... %} tag and then looking for an n argument for the variation selection. The trans tag compiles a string with placeholders and then performs the runtime argument substitution, see:

In this case we would need to compile all n-variations and then select at runtime which version to execute, based on the value of the passed n argument.

This needs some changes to the underlying gettext implementation to be able to read those, maybe with an extra ETS table or special ETS keys to store the variations (we have an ETS table with translations for every site running).

ergenius · September 17, 2022, 12:35pm

Hello again!
(I believe we will meet each other on this forum every weekend from now on, it’s the best I can do.)

This morning I did some research on various official formats used with gettext. Here are some highlights:

The best/full implementations support both argument reordering and argument formatting. As a side note, most implement at least argument formatting, with argument reordering being optional.

Your proposed formatting solves the problem of argument reordering very well, but for a full 100% implementation I suggest we find a standard for argument formatting options as well.

Some gettext formatting from other languages:

c-format:
"Only %2$d bytes free on %1$s."
(positional index reordering and formatting)
python-format:
'%(language)s has %(number)03d quote types.'
(named keys reordering and formatting)
ruby-format:
"%{name}s have %{count}d book"
(named keys reordering and formatting)
%{key}[flags][width][.precision]type
php-format:
'The %2$s contains %1$d monkeys. That\'s a nice %2$s full of %1$d monkeys.'
(positional index reordering and formatting; repeating the same argument is allowed)
%[argnum$][flags][width][.precision]specifier

Let’s come up with a standard for erlang-format that has them all. In Erlang we have ~, which is the best candidate for a separator between the placeholder name and the (optional) formatting metadata.

A Zotonic and Erlang io:format compatible solution could be:

{argname~F.P.PadModC}
Where:

[argname] - is required
[~F.P.PadModC] - is optional
parameters can appear more than once, in any order
It has the advantage that it can be easily parsed by the existing implementation: we only need to split on ~ and then call io_lib:format for each argument.

This will look like this:

{% trans "Hello {{foo}}, and this is {foo~16.16.0b} and this is the same {foo} and this is another {foo2}." foo=1234 foo2=5678 %}
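A rough sketch of how the proposed {argname~F.P.PadModC} placeholders could be expanded, following the "split on ~, then io_lib:format" idea. Everything here is an assumption for illustration: the module name fmt_demo, the choice of a map with string keys for the arguments, and the default rendering of arguments without a format spec. It only handles single-brace placeholders and is not the Zotonic or t__ implementation.

```erlang
-module(fmt_demo).
-export([expand/2]).

%% Expand "{name}" and "{name~Spec}" placeholders in Template.
%% Args is a map from placeholder names (strings) to values.
expand(Template, Args) when is_list(Template), is_map(Args) ->
    lists:flatten(scan(Template, Args)).

scan([], _Args) ->
    [];
scan([${ | Rest], Args) ->
    {Inner, Tail} = until_close(Rest, []),
    [render(Inner, Args) | scan(Tail, Args)];
scan([C | Rest], Args) ->
    [C | scan(Rest, Args)].

%% Collect characters up to the closing brace.
until_close([$} | Tail], Acc) -> {lists:reverse(Acc), Tail};
until_close([C | Tail], Acc) -> until_close(Tail, [C | Acc]);
until_close([], _Acc) -> error(unterminated_placeholder).

%% Split on the first "~": the left part is the argument name, the
%% right part (if any) is handed to io_lib:format/2 as "~Spec".
render(Inner, Args) ->
    case string:split(Inner, "~") of
        [Name] -> plain(maps:get(Name, Args));
        [Name, Spec] -> io_lib:format([$~ | Spec], [maps:get(Name, Args)])
    end.

%% Default rendering when no format spec is given (an assumption).
plain(V) when is_integer(V) -> integer_to_list(V);
plain(V) when is_binary(V) -> binary_to_list(V);
plain(V) when is_list(V) -> V;
plain(V) -> io_lib:format("~p", [V]).
```

With this sketch, fmt_demo:expand("{foo} is {foo~4.16.0b} in hex", #{"foo" => 1234}) returns "1234 is 04d2 in hex", and the same argument can be referenced several times in any order.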

What do you think about this proposed format?

If I get the green light from you, I will start adding a format parser to both Zotonic and t__.

As for the ETS, in t__ I use almost the same approach: one table for all languages together per application repository. Because with t__ one application can have more than one PO repo, that’s the only difference (plural and singular are kept together in the same ETS table; a compound key distinguishes them when needed). Because both the implementation and the translator need to know if a plural is required (this info is also necessary when generating POT files), all implementations have some sort of marker for plural forms. Most double the source string, giving 2 English forms (one for singular and one for many). I have rarely seen a plural flag used. In Zotonic this could look something like this, I believe (notice the plural):

{% trans "There is {count} books" plural count=1 %}
or like this (it offers a default plural translation when plural translations are missing, which is why most provide this one/many pair; this actually looks like the preferred approach in the most advanced implementations):
{% trans "There is {count} book" "There are {count} books" count=56 %}

Another advantage of using both forms is that you can easily create a compound key from the input that needs to be translated, like this for example:
{language, "There is a dog"} - ETS key for a singular translation
{language, "There is {count} book", "There are {count} books"} - ETS key for plural
Full ETS rows including the key:
{{es, "There is one book"}, ["singular form"]}
{{ro, "There is {count} book", "There are {count} books"}, ["plural form 1", "plural form 2", … "plural form N"]}

Then all that is left is a selector over the value list. The singular selector always returns the only available form; the plural selector picks the proper list element using the plural formula and the count N.
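The compound keys and the two selectors described above could be sketched as follows. This is only an illustration under assumptions: the module name ets_demo, the function names, the placeholder translations and the fallback to the English source strings are all made up here, and the plural index fun is passed in by the caller rather than looked up per language.

```erlang
-module(ets_demo).
-export([new/0, singular/3, plural/6]).

%% One table holding both kinds of rows; the key arity tells them apart.
new() ->
    Tab = ets:new(translations, [set, public]),
    ets:insert(Tab, {{es, "There is one book"}, ["Hay un libro"]}),
    ets:insert(Tab, {{ro, "There is {count} book", "There are {count} books"},
                     ["<ro one>", "<ro few>", "<ro other>"]}),
    Tab.

%% Singular selector: the value list holds exactly one form.
singular(Tab, Lang, Msgid) ->
    case ets:lookup(Tab, {Lang, Msgid}) of
        [{_Key, [Form]}] -> Form;
        [] -> Msgid                       % no translation: use the source text
    end.

%% Plural selector: IndexFun maps the count N to a zero-based form index.
plural(Tab, Lang, One, Many, N, IndexFun) ->
    case ets:lookup(Tab, {Lang, One, Many}) of
        [{_Key, Forms}] -> lists:nth(IndexFun(N) + 1, Forms);
        [] when N =:= 1 -> One;           % no translation: English one/many
        [] -> Many
    end.
```

Passing both English forms is what makes the fallback in the last two clauses possible when a language has no plural translation yet.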

Please note that gettext does not support mixing two or more variables that EACH have plural forms in one message, because the resulting combinations would make the translator’s job and the implementations impractical (some languages have 6 plural forms…). For this situation gettext simply recommends splitting the variables into different phrases/translations.

Like this:
{% trans "There is {count1} book." "There are {count1} books." count1=56 %}
{% trans "{count2} is reserved." "{count2} are reserved." count2=56 %}
NOT like this:
{% trans "There is {count1} book and {count2} are reserved" "There are {count1} books and {count2} are reserved" count1=56 count2=1 %} ← you need more forms/combinations: one/many, many/many, many/one…

mworrell · September 19, 2022, 6:52am

Those formatters are a nice idea.

I was wondering how we should do the type casting (handling undefined etc.). Maybe we could inspect the formatter and apply a type cast; that shouldn’t be too hard, especially as we already do some special lookups and dig into data structures to obtain the correct variable values.

For the plural forms, having the two versions sounds like a good indicator. Then at least the English form is already provided (in case there aren’t translations). It should be easy to add an optional second string to the template parser. Then we only need to know which variable to check for the plural form; maybe using an n variable (or count) could solve that. I was thinking of n as that is also used in the formulas and is fairly neutral in meaning.

ergenius · September 19, 2022, 11:01am

Sounds right, the n idea!

I have to dig a little bit more into Zotonic’s model/view rendering process to understand what’s going on with the casting and how exactly you match the model properties with tags from templates.

I am waiting for next weekend to start coding and to come back again with… huge feedback. I hope I’m not abusing you with too much talk.

mworrell · September 19, 2022, 11:23am

In the Zotonic template compiler we do a lookup of the strings at compile time.

Then the strings are translated into a list of expressions, which are evaluated at runtime.
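As a rough illustration of the "look up at compile time, evaluate at runtime" idea, here is a sketch that splits each language's text once and returns one fun per language that only does the substitution when called. It is not the Zotonic implementation (which generates abstract syntax with erl_syntax); the module name compile_demo, the part tuples and the map of string-valued arguments are assumptions made for this example.

```erlang
-module(compile_demo).
-export([compile/1]).

%% Tr is a list of {Lang, Text}. Each text is split into parts once,
%% and the returned fun only performs the substitution when called.
compile(Tr) ->
    [{Lang, to_fun(split(Txt, [], []))} || {Lang, Txt} <- Tr].

to_fun(Parts) ->
    fun(Args) -> lists:flatten([eval(P, Args) || P <- Parts]) end.

%% Split "Hello {name}" into [{lit, "Hello "}, {var, "name"}].
split([], Lit, Acc) ->
    lists:reverse(push_lit(Lit, Acc));
split([${ | Rest], Lit, Acc) ->
    {Name, [$} | Tail]} = lists:splitwith(fun(C) -> C =/= $} end, Rest),
    split(Tail, [], [{var, Name} | push_lit(Lit, Acc)]);
split([C | Rest], Lit, Acc) ->
    split(Rest, [C | Lit], Acc).

%% Flush the pending literal text (kept reversed) into the part list.
push_lit([], Acc) -> Acc;
push_lit(Lit, Acc) -> [{lit, lists:reverse(Lit)} | Acc].

%% Runtime evaluation of one part; Args maps names to strings.
eval({lit, Text}, _Args) -> Text;
eval({var, Name}, Args) -> maps:get(Name, Args).
```

A caller would pick the fun for the current language, for example with lists:keyfind/3, and apply it to the runtime arguments. Plural support would add one such fun per n-variation.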

github.com

zotonic/template_compiler/blob/master/src/template_compiler_element.erl#L764-L835


      
trans_ext({string_literal, SrcPos, Text}, Args, CState, Ws) ->
    Unescaped = template_compiler_utils:unescape_string_literal(Text),
    trans_ext_1({trans, [{en, Unescaped}]}, Args, SrcPos, CState, Ws);
trans_ext({trans_literal, SrcPos, Tr}, Args, CState, Ws) ->
    trans_ext_1(Tr, Args, SrcPos, CState, Ws).

trans_ext_1({trans, Tr}, Args, SrcPos, #cs{runtime=Runtime} = CState, Ws) ->
    Split = [ {Lang, split_string(Txt, <<>>, [])} || {Lang, Txt} <- Tr ],
    {FunAsts, Ws1} = lists:foldl(
                    fun({Lang, Parts}, {FAcc, WsAcc}) ->
                        {WsAcc1, Fun} = trans_ext_fun(Parts, Args, CState, WsAcc),
                        {[{Lang, Fun}|FAcc], WsAcc1}
                    end,
                    {[], Ws},
                    Split),
    FunListAst = erl_syntax:list(
                    lists:map(
                        fun ({Lng,FunAst}) ->
                            erl_syntax:tuple([
                                    erl_syntax:atom(Lng),

This file has been truncated.