Does pattern matching order matter in terms of performance?

nhpip · February 26, 2024, 4:19pm

Given code like this, does the order matter? If I discover that PACKET_FOO_BAR is received with a higher frequency than the others would it make sense to move it to the top (from a performance perspective)?

decode_inbound_packet(?PACKET_FOO) ->
       %% Do packet foo work
...
decode_inbound_packet(?PACKET_BAR) ->
       %% Do packet bar work
...
...
decode_inbound_packet(?PACKET_FOO_BAR) ->
       %% Do packet foo bar work

williamthome · February 26, 2024, 4:47pm

AFAIK, no. The compiler decides in which order the pattern match will be executed based on “the best fit”. Please see:

nhpip · February 26, 2024, 5:59pm

Thanks. I never used the erlc -S option before. It’s interesting to peak beneath the hood to see what pattern matching does.

mikpe · February 26, 2024, 7:15pm

That depends on how that pattern match will be implemented. If it gets implemented as a linear search, you’ll want the most common case early. If it gets implemented via a jump table, order doesn’t matter. Look at the compiler’s “beam assembly” output (erlc -S) if you’re curious.

There’s lots of literature on strategies for compiling structural pattern matching, and switch aka case statements in imperative languages, but most of it is several decades old by now.

nzok · February 26, 2024, 11:23pm

It’s really impossible to say much without knowing what ?PACKET_FOO, ?PACKET_BAR,
and ?PACKET_FOO_BAR are. For example, if they were identical, then obviously the
order would matter a lot! There are two issues here: efficiency and correctness.

Considering correctness, the only time that order matters is when the patterns
(and associated guards, if any) overlap. One thing I would find helpful is a

compiler option or other tool to say “clauses X and Y of function Z overlap”,
because that’s when I need to be careful that the order is right.

Considering efficiency, the only guarantee the compiler offers is that it’s AS IF
the matches were done clause by clause in the order written. So there is nothing
you can do to “optimise” the clause order, and if there were, there’s no reason
to believe that the best order in release R won’t be the worst order in release R+1.
So don’t bother about it.

nzok · February 26, 2024, 11:25pm

There’s one piece of advice about clause order you may find helpful.

BASE CASE FIRST.

There are two reasons for this. The first is that it’s not unusual for
people to put so much effort into worrying about the recursive case(s)
that they forget the base case(s). Making a habit of putting the base
case(s) first deals with that.

The second is that as soon as you have the base case(s) down, you have
something you can test.

raimo · February 27, 2024, 9:05am

Thank you for that reasoning.
I have always been ambivalent about this, due to lack of good arguments for either way…

rvirding · February 27, 2024, 1:58pm

One thing to be aware of is that while the compiler will sometimes happily reorder the clauses it GUARANTEES that matching will logically occur in the order of the clauses top-down. This is guaranteed!

So for example there might be sequence of clauses where the argument pattern is a tuple of 5 elements, then the compiler can fix it so that if the argument is not a tuple of 5 elements then these clauses can be skipped as they will never match.

williamthome · February 27, 2024, 2:44pm

Nice to read those clarifications. Please disregard my previous comment. BTW, the attached links stills relevant.

nhpip · February 27, 2024, 2:44pm

Thanks everyone. In this case they are ethernet frames.

jimdigriz · February 27, 2024, 3:29pm

In this case they are ethernet frames.

If you are sensitive on performance you should be able to offload this onto the kernel using SO_ATTACH_FILTER and a vanilla BPF (no need for eBPF, just ignore it for this).

Create a socket for each of your conditional branches for your expected ‘hot’ paths and apply a filter to each and then just have a catch all socket for the rest with a filter to grab only the rest.

To get familiar with the byte code you can just use:

$ tcpdump -d 'ether proto 0xbeef'
Warning: assuming Ethernet
(000) ldh      [12]
(001) jeq      #0xbeef          jt 2	jf 3
(002) ret      #262144
(003) ret      #0

The pcap-filter and *BSD bpf(4) manpages are useful as is the Linux filtering documentation.

If you need to crib some C code, do have a look at catnip.

nhpip · February 27, 2024, 6:38pm

Thanks for that, looks awesome

jimdigriz · March 2, 2024, 3:21pm

If you are sensitive on performance

I should qualify this.

‘Performance’ means you are processing of the order of more than 10k packets per second per core; this includes if you have your interface in promisc and need to filter out the noise.

Below this, the complexity of teaching the rest of the team about BPF is unlikely to pay off and you should stick to matching rules. Only worry about optimising when you measure there is contention here.

If this is for your own stuff and you want to learn…of course go wild!