Compiler optimization idea: binary:split/3 compiling lists as patterns at compile time

Hi,

Context:

  • using compiled patterns (binary:compile_pattern/1) for functions like binary:split/3 can be much faster than passing a list of binaries
  • due to compiled patterns being refs, they cannot be generated at compile time and reused at runtime
  • this makes it inconvenient to compile patterns once and reuse them (e.g. by stashing them in persistent_term)

Proposal:

  • when a function that accepts a pattern is called with a literal binary or list, the compiler could compile the literal into a pattern at compile time and emit code that uses the compiled pattern
  • the compiled pattern would somehow need to be kept in the BEAM file so that it could be pulled from the literal pool

So writing

binary:split(Text, [<<" ">>, <<"\n">>], Opts).

would emit code like

binary:split(Text, PatternRef, Opts).

I’m not knowledgeable enough about the VM internals to know where to get started, or whether it would even be possible, but I’m sharing the idea just in case.

2 Likes

I think you very much need to hear from the OTP team on this. I did think it over, and while I’m an infant in this area, I’m not sure how this could work with the current implementation: as you stated, these are references tied to objects (tries, it seems) that only exist in memory at runtime.

You would need a way to JIT-compile, or deserialize, a serialized version of the data (rather than a reference) stored in the BEAM file. Of the two, a JIT instruction sounds more favorable, but I’m still not sure either is possible, and if it is: is it safe? Does it introduce surprising behavior? And what would the performance improvement be on average? :thinking:

I do like ideas though, and I don’t think there are bad ideas :smile:

1 Like

I see two ways of making this work: one at compile and load-time, and one purely at runtime.

  1. At compile time, recognize literal binary search patterns and mark them as special literals in the BEAM file. At load time, compile them and store the compiled versions in the literal pool. Pros: works for the requested cases and doesn’t need dynamic storage management. Cons: complicates the compiler and loader slightly (the loader already does a lot of magic, so I’m hoping this could fit there).
  2. At runtime, maintain a cache mapping search patterns to their compiled versions. Pros: works for all cases. Cons: all the usual downsides of caching (cost of lookups, how to bound the cache size, what to expire and when, and more code in the VM for all of this). A user-space sketch of this option follows below.
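To make option 2 concrete, here is a minimal user-space sketch using persistent_term as the cache backend (the module name is hypothetical, and the cache is unbounded, so it sidesteps the expiry questions above):

%% Sketch only: memoize compiled patterns keyed by the raw pattern term.
%% The cache never shrinks; a real implementation would need a size
%% limit and an expiry policy.
-module(cached_split).
-export([split/3]).

split(Subject, Pattern, Opts) ->
    Key = {?MODULE, Pattern},
    Compiled =
        try
            persistent_term:get(Key)
        catch
            error:badarg ->
                C = binary:compile_pattern(Pattern),
                persistent_term:put(Key, C),
                C
        end,
    binary:split(Subject, Compiled, Opts).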

Is compiling patterns at runtime so expensive that we’d want something along these lines? I don’t know.

2 Likes

I wonder if we could somehow generalize the first version. If we had a module_term, which is a persistent term but tied to a module, we could compute the precompiled pattern and store it either on first invocation (or perhaps during on_load). This way I could precompile it myself in the cases I find relevant, as sketched below.
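As a rough approximation with today’s primitives (module and key names are just for illustration), the on_load variant could look like this, with persistent_term standing in for the hypothetical module_term:

-module(my_splitter).
-export([split/1]).
-on_load(init/0).

%% Compile the pattern once when the module is loaded; persistent_term
%% stands in for the hypothetical module_term.
init() ->
    persistent_term:put({?MODULE, pattern},
                        binary:compile_pattern([<<" ">>, <<"\n">>])),
    ok.

split(Data) ->
    binary:split(Data, persistent_term:get({?MODULE, pattern})).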

Or perhaps a simpler option is to make binary:compile_pattern(…) return a binary representing a blob of the compiled pattern, instead of a reference. This means we could reuse the existing “pure function inlining code” already in the compiler, which would then inline the binary blob into the call (this is already used by re:compile/1).
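For contrast, a quick shell-level illustration of the difference (this pokes at opaque internals, so the exact shapes may vary across OTP releases):

%% An re pattern is an ordinary tuple whose payload is a binary blob,
%% so the compiler can place it in the literal pool; a binary pattern
%% wraps a runtime reference, which cannot be a literal.
{ok, RE} = re:compile(<<"foo|bar">>),
true = is_binary(element(tuple_size(RE), RE)),
CP = binary:compile_pattern([<<"foo">>, <<"bar">>]),
true = is_reference(element(tuple_size(CP), CP)).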

2 Likes

Thanks everybody for the feedback!

I did some quick benchmarking, but the gain seems to depend heavily on the binary size (much higher for smaller binaries, insignificant for big ones) and on the length of the pattern list (the more complex the list, the greater the benefit of compiling).

To take a somewhat pathological case, splitting a small string like "Lorem ipsum dolor" on whitespace using Elixir’s String.split/1 (which uses a list of 22 whitespace characters) could be ~29x faster with a precompiled pattern.
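For anyone who wants to reproduce this, a minimal micro-benchmark sketch (the four separators are a short stand-in for the full 22-element list, and absolute numbers will vary by machine and OTP version):

-module(split_bench).
-export([run/0]).

%% Compare splitting with an uncompiled pattern list against a
%% precompiled pattern, on a small subject binary.
run() ->
    Pat = [<<" ">>, <<"\n">>, <<"\t">>, <<"\r">>],
    Compiled = binary:compile_pattern(Pat),
    Text = <<"Lorem ipsum dolor">>,
    N = 100000,
    {T1, _} = timer:tc(fun() -> loop(Text, Pat, N) end),
    {T2, _} = timer:tc(fun() -> loop(Text, Compiled, N) end),
    io:format("uncompiled: ~p us, precompiled: ~p us~n", [T1, T2]).

loop(_Text, _Pat, 0) ->
    ok;
loop(Text, Pat, N) ->
    _ = binary:split(Text, Pat, [global]),
    loop(Text, Pat, N - 1).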

But the biggest benefit, if the compiler could take care of this optimization, is that users could mostly stop caring about compiling and storing patterns: just split directly and get optimal performance. And string manipulation is ubiquitous, so this could affect many codebases.

1 Like

I do think that the proper way to introduce this would be to have proper first-class module constants/statics. You could then fairly easily define something like this (the syntax is of course just a placeholder):

PATTERN -> binary:compile_pattern([~"foo", ~"bar", ~"baz"]).

split(Data) -> binary:split(Data, PATTERN).

Whether the evaluation happens eagerly at load time or lazily, the first time the value is accessed, is also something to be determined. persistent_term somewhat covers this already, but it is fairly high-ceremony compared to something like this. Ideally the constants could also be accessed remotely (and in guards) with something like searcher:PATTERN, eliminating another use case for macros and allowing these constants (as first-class language constructs) to be properly documented.

Doing it completely automatically in the compiler is tempting, but it seems like a much more complicated case than something explicit like this. It also feels more “erlangish” to avoid automatic behaviour and fickle optimisations (since changing the code slightly could cause the compiler to no longer “see” the whole pattern and fail to optimise it).

Elixir could, of course, build on top of that to provide something even more ergonomic with inline definitions, sigils, or something similar.

3 Likes

Indeed, having proper first-class constants in Erlang would solve this use case, and it would be a satisfying way to achieve it.

Doing it completely automatically in the compiler is tempting, but it seems like a much more complicated case than something explicit like this. It also feels more “erlangish” to avoid automatic behaviour and fickle optimisations (since changing the code slightly could cause the compiler to no longer “see” the whole pattern and fail to optimise it).

This is a good point, but to be fair it applies to any compiler optimization, and that doesn’t mean they can’t be worthwhile. In the present case, the benefits of the automatic optimization would be:

  • it would automatically speed up all existing code without any change
  • library code that needs to support several OTP versions could benefit immediately, whereas an explicit construct would force authors to require a minimum OTP version and to modify their code
  • since all the necessary information is available at compile time in this case, the non-optimized call is always sub-optimal, and compile time seems like the natural moment to do the work

I totally agree this would have to be weighed against the complexity added to the compiler itself, and it might not be worth it in the end. I just wanted to lay out, for the record, the benefits of the approach suggested above.

You’ve got a point there :wink: And I do love all the optimizations as of late. The main things for me are still: is it safe? Is there surprising behavior? And is the complication worth the benefit (as you stated)?

Additionally, one more question, a historical one that perhaps also informs a decision: why were compiled patterns kept as refs, rather than done the way re does it? I think that’s pretty interesting.