I'm starting a series on lexing and LALR(1) parsing with leex and yecc. The series focuses on the “by example” part, since it took me a long time to get started with these tools.
The first example is lexing and parsing Advent of Code 2024, Day 3. Hope you enjoy!
Sounds amazing! Focusing on “by example” is a fantastic way to make lexing and LALR(1) parsing approachable. Using Advent of Code Day 3 as a practical case is a brilliant idea - looking forward to learning from your insights!
Is there an example of how to skip nested comments with leex?
I haven’t used leex specifically (I prefer to write my scanners by hand), but for nested comments you need to step a bit outside the finite-automaton framework. (Nested comments aren’t a regular language, so they cannot be described by finite automata or regular expressions.)
The standard solution is to have the first open-comment sequence initialize a counter to one and start a sub-automaton. That sub-automaton scans and throws away characters, incrementing the counter on each open-comment sequence and decrementing it on each close-comment sequence. When the counter drops to zero, the comment is finished and the normal automaton resumes.
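To make the counter idea concrete, here is a minimal, language-neutral sketch in Python. It is not leex code. The function name `strip_nested_comments` and the `/*` and `*/` delimiters are assumptions chosen for the example, and it ignores string literals, which a real scanner would have to handle.

```python
def strip_nested_comments(text, open_seq="/*", close_seq="*/"):
    """Return text with (possibly nested) block comments removed."""
    out = []    # characters kept while outside any comment
    depth = 0   # nesting counter; 0 means the "normal" scanner is active
    i = 0
    while i < len(text):
        if text.startswith(open_seq, i):
            # An opener either enters comment mode or nests one level deeper.
            depth += 1
            i += len(open_seq)
        elif depth > 0 and text.startswith(close_seq, i):
            # A closer only counts inside a comment; at zero we are back to normal.
            depth -= 1
            i += len(close_seq)
        else:
            if depth == 0:
                out.append(text[i])  # ordinary input, keep it
            i += 1                   # inside a comment the character is dropped
    if depth > 0:
        raise ValueError("unterminated comment")
    return "".join(out)
```

For example, `strip_nested_comments("a /* b /* c */ d */ e")` returns `"a  e"`: the inner `*/` only lowers the counter from 2 to 1, so ` d ` is still discarded.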
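# Strip C-style block comments, which do not nest: the first */ closes the comment.
# -z (GNU sed) reads the whole input as one record, so a comment may span lines.
sed -z -E "s#/\*([^*]*\*+[^*/])*[^*]*\*+/##g" <<EOF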
/*** this is c comment ** /
**/
int blah(struct myobj **p)
{
    return (*p)->f(p);
}
/* /* /*
* other c comment
*/
EOF