Leex documentation tells me to use a function whose documentation says I shouldn't use it

Hello!

I am starting to write some Erlang code and decided to build a simple lexer (without a parser attached) using leex, but I found the documentation a bit confusing.

I know the lexer produced by leex exports a string/1 function, which reads tokens from a string and returns them as a list; I can make that work fine.

But what if I want to read the input incrementally – for example, when interacting with a user in a REPL-like setting, where a token may be split across lines, and I want to read and process the complete tokens of one line before moving on to the next?

For example, an extremely simple lexer for integers and strings of lowercase chars would be like this:

Definitions.

Integer  = [0-9]+
String   = \"([a-z\n]*)\"

Rules.
{Integer}  : {token, {integer,   TokenLine, begin {N, _} = string:to_integer(TokenChars), N end}}.
{String}   : {token, {string,    TokenLine, string:slice(lists:droplast(TokenChars),1)}}.

Erlang code.

-export([tokenize/1]).

tokenize(String) ->
    {ok, Tokens, _EndLine} = lexer:string(String),
    Tokens.
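
As a reference point, here is roughly how I generate and try out the lexer (assuming the definitions above are saved as `lexer.xrl`; the file and module names are my choice, and note the lexer has no whitespace rule, so the tokens are written back to back):

```erlang
%% Generate lexer.erl from lexer.xrl, compile it, and scan a sample input.
%% leex:file/1 and compile:file/1 are the standard OTP entry points.
{ok, _Xrl} = leex:file("lexer.xrl"),
{ok, lexer} = compile:file("lexer.erl"),
%% Scan an integer followed immediately by a string literal.
{ok, Tokens, _EndLine} = lexer:string("42\"abc\"").
%% Tokens should be [{integer, 1, 42}, {string, 1, "abc"}]
```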

I’d like the lexer to read the strings even if they are split across multiple lines, without having to manage a buffer myself. So I saw in the leex documentation that it generates token/2 and token/3:

token(Cont, Chars, StartLoc)

This is a re-entrant call to try and scan a single token from Chars.

If there are enough characters in Chars to either scan a token or detect an error then this will be returned with {done,…}. Otherwise {cont,Cont} will be returned where Cont is used in the next call to token() with more characters to try an scan the token. This is continued until a token has been scanned. Cont is initially [].

It is not designed to be called directly by an application, but is used through the I/O system where it can typically be called in an application by:

io:request(InFile, {get_until,unicode,Prompt,Module,token,[Loc]})
→ TokenRet

token/2 is like token/3, with StartLoc=1. Great! But then,

1> h(io,request).

request(Request)

The documentation for request/1 is hidden. This probably means that it is internal and not to be used by other applications.

request(Name, Request)

The documentation for request/2 is hidden. This probably means that it is internal and not to be used by other applications.
ok

Hmm, perhaps this (the documentation) could be improved? The leex documentation tells me to use a function whose own documentation says it is not supposed to be used…

So I have two questions:

First, how do I achieve what I am trying to do? As I understand it, if io:request should not be used, I need to create and manage a buffer for the lines myself, appending each new line, and so on. Is that the case?
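
What I was imagining is a loop along these lines (a sketch only – it assumes the generated module is named `lexer` and uses its re-entrant tokens/2, which, as discussed below, returns `{more, Cont}` until a complete token sequence has been scanned):

```erlang
%% Sketch: read lines interactively and feed them to the re-entrant
%% tokens/2 generated by leex. Cont is initially [].
read_tokens(Cont) ->
    Line = io:get_line("> "),
    case lexer:tokens(Cont, Line) of
        {more, Cont1} ->
            %% A token is split across lines; keep reading.
            read_tokens(Cont1);
        {done, {ok, Tokens, _EndLine}, _Rest} ->
            Tokens;
        {done, {error, Reason, _EndLine}, _Rest} ->
            {error, Reason}
    end.
```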

Second, should I open an issue on the erlang/otp GitHub repository requesting clarification of the documentation?

I have used it by calling the generated tokens/2 function directly.

An end token is used to detect the end of the content:

Rules.

%% rules...

\r\r\n : {end_token, {string, TokenLoc, ""}}.

And the code to read from stream:

-type pos() :: {non_neg_integer(), non_neg_integer()}.
-type token() :: {string, pos(), string()} | {identifier, pos(), atom()} | {atom(), pos()}.

-type token_cont() :: tuple() | [].

-spec collect_tokens_from_stream(io:device(), token_cont()) -> [token()].
collect_tokens_from_stream(IoDevice, Cont) ->
    case file:read(IoDevice, 1024 * 1024) of
        {ok, Contents} ->
            {more, Cont1} = scanner:tokens(Cont, Contents),
            collect_tokens_from_stream(IoDevice, Cont1);
        eof ->
            %% feed chars to create an `end_token`
            {done, {ok, Tokens, _}, _} = scanner:tokens(Cont, "\r\r\n"),
            Tokens
    end.
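
For example (the file name here is hypothetical), it can be driven from a device opened for reading:

```erlang
%% Hypothetical driver: open a file and collect all tokens from it,
%% starting with the initial continuation [].
{ok, IoDevice} = file:open("input.txt", [read]),
Tokens = collect_tokens_from_stream(IoDevice, []),
ok = file:close(IoDevice).
```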

I have never used io:request/2, since there is no documentation for it.

I would also like to know the recommended way of using it.


And it looks like the documentation is wrong or outdated:

token/3 and tokens/3 do not return {cont, Cont}, they return {more, Cont}.
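
For instance, with the integer/string lexer from the first post (untested sketch; assuming the generated module is named `lexer`), feeding token/2 an incomplete string literal looks like this:

```erlang
%% The first call cannot finish scanning the string token, so it
%% should return {more, Cont} – not {cont, Cont} as the docs say:
{more, Cont} = lexer:token([], "\"ab"),
%% A later call with the rest of the input completes the token and
%% returns {done, TokenRet, RestChars}:
{done, {ok, {string, 1, "abc"}, _EndLine}, _Rest} = lexer:token(Cont, "c\"\n").
```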

I created a PR to fix it.


Thanks a lot, it works that way!
