RE module: strange behaviour when regular expression is compiled

Hi All,

Let’s run a regex against these HTTP headers:

> Hdrs = <<"Host: 127.0.0.1:8080\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X) AppleWebKit/602.1 (KHTML, like Gecko) Version/10.0 Safari/602.1\r\nOrigin: https://www.foobar.com/\r\nAccept: */*\r\nConnection: Keep-Alive\r\nAccept-Encoding: gzip, deflate\r\nAccept-Language: en-CH,*\r\n\n\n">>.

These work:

> re:run(Hdrs, <<"Origin:\s?(.*?)\r\n">>, []).
{match,[{153,33},{161,23}]}
> re:run(Hdrs, <<"Origin:\s?(.*?)\r\n">>, [caseless]).
{match,[{153,33},{161,23}]}

%% Let's compile our regex
> {ok, RE}=re:compile(<<"Origin:\s?(.*?)\r\n">>).
> re:run(Hdrs, RE, []).
{match,[{153,33},{161,23}]}

But this doesn’t (when I specify [caseless]):

> re:run(Hdrs, RE, [caseless]).
** exception error: bad argument
     in function  re:run/3
        called as re:run(<<"Host: 127.0.0.1:8080\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X) AppleWebKit/602.1 (KHTML, like Gecko) "...>>,
                         {re_pattern,1,0,0,
                                     <<69,82,67,80,101,0,0,0,0,0,0,0,81,8,0,0,255,255,255,255,
                                       255,255,...>>},
                         [caseless])

I’m able to reproduce this behavior on both macOS and Linux running Erlang 26.x.
Thanks

I suspect you need to specify caseless when compiling the RE, i.e. re:compile(..., [caseless]).
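
A minimal sketch of that suggestion, assuming a hypothetical module name re_caseless_demo and a helper origin/1 that are not part of the original post. The pattern is written in lowercase on purpose, so the match only succeeds because caseless is given at compile time:

```erlang
%% Hypothetical demo module; the name and origin/1 are illustrative only.
-module(re_caseless_demo).
-export([origin/1]).

%% Extract the value of the Origin header, ignoring case.
origin(Hdrs) ->
    %% caseless is passed to re:compile/2, not to re:run/3.
    {ok, RE} = re:compile(<<"origin:\\s?(.*?)\r\n">>, [caseless]),
    %% Only run-time options are passed alongside the compiled pattern.
    case re:run(Hdrs, RE, [{capture, all_but_first, binary}]) of
        {match, [Origin]} -> {ok, Origin};
        nomatch -> error
    end.
```

Calling re_caseless_demo:origin(Hdrs) with the headers from the first post should then give back the Origin value as a binary.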

Thanks @mikpe. But I still find the behavior inconsistent.
It doesn’t crash, for example, when I do:

> Hdrs = <<"Host: 127.0.0.1:4567\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X) AppleWebKit/602.1 (KHTML, like Gecko) KenViewer/3.8 Version/10.0 Safari/602.1\r\nOrigin: https://www.foobar.com/\r\nAccept: */*\r\nConnection: Keep-Alive\r\nAccept-Encoding: gzip, deflate\r\nAccept-Language: en-CH,*\r\n\n\n">>.
> {ok, RE}=re:compile(<<"Origin:\s?(.*?)\r\n">>).
> re:run(Hdrs, RE, [{capture,all_but_first,binary}]).
{match,[<<"https://www.foobar.com/">>]}

Am I missing something here?

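The documentation for re:run/3 (Erlang -- re) has a section starting with “If the regular expression is previously compiled, the option list can only contain the following options:” which should explain this: {capture, ...} is a run-time option and is on that list, while caseless is a compile-time option and is not.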
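
To see the difference side by side, here is a rough sketch, assuming a hypothetical module re_opts_demo with a helper try_run/1 (neither is from the thread). It runs the same precompiled pattern with whatever option list you hand it and turns the badarg exception into a return value:

```erlang
%% Hypothetical demo module; the name and try_run/1 are illustrative only.
-module(re_opts_demo).
-export([try_run/1]).

%% Run a precompiled pattern with the given re:run/3 options.
try_run(Opts) ->
    {ok, RE} = re:compile(<<"Origin:\\s?(.*?)\r\n">>),
    Hdrs = <<"Origin: https://www.foobar.com/\r\n">>,
    try re:run(Hdrs, RE, Opts) of
        Result -> {ok, Result}
    catch
        %% Raised when Opts holds something not allowed for compiled patterns.
        error:badarg -> {error, badarg}
    end.
```

With the behaviour reported above on Erlang 26.x, re_opts_demo:try_run([{capture, all_but_first, binary}]) should return the match, and re_opts_demo:try_run([caseless]) should return {error, badarg}.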

@mikpe thanks a lot.