Erlang Blog Post: Leex and yecc by example: part 1

I'm starting a series on lexing and LALR(1) parser generators using leex and yecc. The series focuses on the “by example” part, since it took me a long time to get started with these tools.

The first example is lexing and parsing Advent of Code 2024 day 3. Hope you enjoy!

Sounds amazing! :tada: Focusing on “by example” is a fantastic way to make lexing and LALR-1 parsing approachable. Using Advent of Code Day 3 as a practical case is a brilliant idea - looking forward to learning from your insights! :rocket::computer:

Part 2 is now available: Leex and yecc by example: part 2 | Chiroptical’s Blog

I discuss the lexer and parser for Advent of Code day 4 and a simplification I was able to make to day 3.

@chiroptical Amazing blog.

Is there an example of how to skip nested comments with leex? I want to skip things between (* *) symbols, which are Oberon comment style.

Is there an example of how to skip nested comments with leex?

I haven’t used leex specifically (I prefer to write my scanners by hand), but for nested comments you need to step outside of the finite automaton framework a bit. (Nested comments aren’t a regular language, so they cannot be described by finite automata or regular expressions.)

The standard solution is to have the first open-comment sequence initialize a counter and start a sub-automaton. That one scans and throws away characters, and for the open-comment and close-comment sequences it adjusts the counter. If it drops to zero, you’re done and resume the normal automaton.
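A minimal sketch of that counter approach in plain Erlang, for Oberon-style (* *) comments. The module name nested_comments and the function strip/1 are made up for illustration and are not part of leex. The idea would be to run this as a pre-pass and hand the result to the string/1 function of the leex-generated lexer. It assumes the input is a character list and that "(*" never appears inside a string literal.

```erlang
-module(nested_comments).
-export([strip/1]).

%% Remove (possibly nested) "(* ... *)" comments from a string.
%% Returns {ok, Stripped} or {error, unterminated_comment}.
strip(Input) ->
    scan(Input, []).

%% Normal mode: copy characters until a comment opener is seen.
scan("(*" ++ Rest, Acc) ->
    comment(Rest, 1, Acc);                  % first opener: counter starts at 1
scan([C | Rest], Acc) ->
    scan(Rest, [C | Acc]);
scan([], Acc) ->
    {ok, lists:reverse(Acc)}.

%% Comment mode: throw characters away and adjust the nesting counter.
comment("(*" ++ Rest, Depth, Acc) ->
    comment(Rest, Depth + 1, Acc);
comment("*)" ++ Rest, 1, Acc) ->
    scan(Rest, Acc);                        % outermost comment closed: resume
comment("*)" ++ Rest, Depth, Acc) ->
    comment(Rest, Depth - 1, Acc);
comment([$\n | Rest], Depth, Acc) ->
    comment(Rest, Depth, [$\n | Acc]);      % keep newlines so line numbers survive
comment([_ | Rest], Depth, Acc) ->
    comment(Rest, Depth, Acc);
comment([], _Depth, _Acc) ->
    {error, unterminated_comment}.
```

For example, nested_comments:strip("a (* x (* y *) z *) b") returns {ok, "a  b"}. Newlines inside comments are kept on purpose so that the line numbers leex reports afterwards still match the original file.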

leex can handle C-style comments, though they are not truly nested: the first “*/” ends the comment.

The leex code:

Rules.

/\*([^*]*\*+[^*/])*[^*]*\*+/ : skip_token.
//[^\n]*\n? : skip_token.
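A quick way to see that the first rule does not nest is to try the same pattern with the re module. This is only a sketch: re uses PCRE rather than leex's own regex engine, so it assumes both treat this particular pattern the same way. The module name c_comment_check is made up for illustration.

```erlang
-module(c_comment_check).
-export([strip/1]).

%% Delete every C-style block comment matched by the pattern from the
%% leex rule above (backslashes doubled for the Erlang string literal).
strip(Input) ->
    Pattern = "/\\*([^*]*\\*+[^*/])*[^*]*\\*+/",
    re:replace(Input, Pattern, "", [global, {return, list}]).
```

Calling c_comment_check:strip("/* a /* b */ c */") returns " c */": the match stops at the first “*/”, and the rest of the outer comment is left behind.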

This site explained how it works.

You can even test it with sed:

sed -z -E "s#/\*([^*]*\*+[^*/])*[^*]*\*+/##g" <<EOF   
/*** this is c comment ** /
 **/

int blah(struct myobj **p)
{
        return (*p)->f(p);
}

/* /* /*
 * other c comment
 */
EOF

The output of the command:

int blah(struct myobj **p)
{
        return (*p)->f(p);
}