Why does the efficiency guide not mention how expensive exported functions can be for compilation times

Reposting this from Slack for visibility.

Why does the efficiency guide not mention how expensive exported functions can be for compilation times? Is it because the default in Erlang is private? I got bitten by this in Elixir, where I had forgotten to make the majority of my functions private, and these functions had a lot of function heads.

It’s crazy to think about that the majority of Elixir codebases is suboptimal due to one character. The Erlang compiler does a hell of a job at inlining, yet not even forced inlining helps you in Elixir land. I would go as far as saying that Elixir in general is slower than Erlang.

I do not have access to Slack (and I suspect many here do not either), so it would be best to post the relevant parts of the discussion here.

It’s crazy to think about that the majority of Elixir codebases is suboptimal due to one character.

There are several assumptions being made here without proper evidence that they are true:

  1. It assumes the majority of Elixir codebases do not use defp properly
  2. It assumes the particular optimizations which only apply to private functions can benefit the majority of code written
  3. It assumes the applicability of said optimizations has always been the same across Erlang/OTP versions (and will remain the same)

The discussion will be way more productive if we can understand the actual issue with examples (I suspect it is related to private functions and binary context optimizations), so we can potentially discuss how to address those limitations in the compiler.

I am curious what’s in the Slack message. I hit a registration page when I click the link, which is “mauvais ton” in any internet discussion :grin:


In compilation times or at runtime? You first say “how expensive exported functions can be for compilation times” and then you say that it is “in general slower”. Well, even if Elixir had fewer private functions than Erlang, it would still be slower to compile, because Elixir = Erlang + some modules and compile-time features like protocols, so the compiler simply has a little bit of extra work to do. Plus, all the Elixir compiler does is translate Elixir source code to Erlang AST (the abstract form) with all macros expanded and then call the Erlang compiler, so it has always been slower.

If you’re talking about Elixir code always being slower at runtime than Erlang, you’re wrong. Well, maybe not 100% wrong, but 99% wrong, and that 1% has nothing to do with public/private functions; it’s a completely different topic.

Are assumptions being made? Certainly. Regardless of anecdotes and YMMV, from my perspective as someone who considers myself a newbie in programming, I don’t see any way to escape the fact that a user needs discipline, not only when writing the code but also when reviewing it. And as someone who has spent my whole life trying to learn from mistakes, I can only draw one conclusion here: one character is an easy mistake to make. I made it several times in elixir-dbvisor/sql on GitHub (an extensible SQL parser and sigil for Elixir). There is an order of magnitude of difference in runtime performance when constructing lists, although less when working with binaries, where optimizing the match context is the most important factor, so making those functions unexported only gives a reasonable speedup.

From a compilation perspective, having tons of function heads in an exported function is obviously going to be problematic, as they generate more code, whereas if they were private the compiler could do what it wants.

It is not mentioned because it is nothing that we have noticed. The ASN.1 compiler generates huge modules with many exported functions for some of the standard ASN.1 specs, and as far as I can tell the compilation time is entirely explained by the size of the source code and not by the number of exported functions.

Also, our approach to abnormally slow compilation is to fix the compiler, not to document it in the Efficiency Guide.

Inlining is not enabled by default in the Erlang compiler (for a few good reasons). So are you talking about compilation times when inlining is enabled?
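For readers following along from the Elixir side, here is a minimal sketch of how inlining can be requested explicitly for a single function through the `@compile` module attribute. The module and function names (`InlineDemo`, `double/1`, `quadruple/1`) are made up for illustration and are not taken from the code under discussion.

```elixir
defmodule InlineDemo do
  # Request inlining for one private function, given as a name/arity pair.
  # Without an attribute like this, no inlining is requested.
  @compile {:inline, double: 1}

  # Public entry point that calls the private helper twice.
  def quadruple(x), do: double(double(x))

  # Private helper; a candidate for inlining into quadruple/1.
  defp double(x), do: x * 2
end
```

Whether this changes compile time or runtime in any measurable way depends on the code and the OTP version, so it is something to measure rather than assume.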

Is this huge difference in runtime performance when you have enabled inlining?

Without inlining, I can’t really see how that would make a difference. The compiler can’t really do much optimization of code handling lists. The compiler does a lot more optimization for binaries (both construction and matching), and in this case having all functions exported can result in fewer optimizations being applied.

It is not that obvious to me. I am not sure what the number of function heads has to do with it. Function heads are rewritten to clauses in a case early in the compiler. For some local functions the compiler can do more optimizations when the types of the arguments are known, but for others the compiler can’t do much to simplify the code. Can you give a concrete example?
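To make the point about function heads and case clauses concrete, here is a small sketch in Elixir. `HeadsDemo` and its functions are hypothetical names; the second function is only a hand-written illustration of the rough shape described above, not actual compiler output.

```elixir
defmodule HeadsDemo do
  # Written as three separate function heads.
  def kind(<<"a", _rest::binary>>), do: :a
  def kind(<<"b", _rest::binary>>), do: :b
  def kind(_other), do: :other

  # Written as a single head with a case over the argument.
  # Both functions return the same result for every input.
  def kind_case(arg) do
    case arg do
      <<"a", _rest::binary>> -> :a
      <<"b", _rest::binary>> -> :b
      _other -> :other
    end
  end
end
```

If the two forms end up as the same clauses, the number of heads by itself would not be expected to explain a difference between exported and private functions.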

Yes, we definitely need a minimal way to reproduce this, as it is also unclear to me what the root cause may be (as it may also be an Elixir compiler issue).

By the way, it would be great if it didn’t do that. :sweat_smile: (I know it’s a grown and mature code base, so it’s probably wishful thinking.)

I have modules that take 4-5 minutes to compile. It would certainly be a boon if that could be broken down into separate modules, given that more cores are readily available.

Which is funny (or ironic), given the context of the thread, because the alleged “extra cost” per module for exported functions is not really a problem on machines with many cores. Especially large files that take a long time to compile, however, are a major driver of overall compile times in my (Elixir) project, so much so that we have a mechanism to preserve binaries across builds for only these files if the input .asn files have not changed, counter to our usual policy of compiling from scratch.

I’m still working on a minimal example, but this might be of interest since it shows a similar issue: “Compiling files with many function heads is very slow (OTP 26 issue?)”, post #35 by BartOtten, in Questions / Help on the Elixir Programming Language Forum.

Here is the timing of lib/lexer.ex (on main in elixir-dbvisor/sql on GitHub) when all functions are exported. One weird thing is that it seems to get slower each time I run `ERL_COMPILER_OPTIONS=time mix run lib/lexer.ex`:

Compiling /Users/benjaminschultzer/src/sql/lib/lexer.ex
 get_module_name_from_core     :      0.000 s  126645.2 kB
 core_lint_module              :      0.058 s  126645.5 kB
 core_compile_directives       :      0.000 s  126645.5 kB
 sys_core_fold                 :      0.053 s  118018.0 kB
 sys_core_alias                :      0.021 s  118018.0 kB
 core_transforms               :      0.000 s  118018.0 kB
 sys_core_bsm                  :      0.010 s  118018.0 kB
 core_to_ssa                   :      0.300 s   45996.7 kB
 beam_ssa_bool                 :      0.717 s   35429.6 kB
 beam_ssa_share                :      0.014 s   35428.0 kB
 beam_ssa_recv                 :      0.001 s   35428.0 kB
 beam_ssa_bsm                  :      1.020 s   44993.2 kB
    %% Sub passes of beam_ssa_bsm from slowest to fastest:
    combine_matches            :      0.425 s  42 %
    accept_context_args        :      0.280 s  27 %
    skip_outgoing_tail_extracti:      0.242 s  24 %
    allow_context_passthrough  :      0.073 s   7 %
    annotate_context_parameters:      0.000 s   0 %
 beam_ssa_opt                  :      5.514 s   34200.4 kB
    %% Sub passes of beam_ssa_opt from slowest to fastest:
    ssa_opt_dead               :      1.951 s  36 %
    ssa_opt_type_start         :      0.925 s  17 %
    ssa_opt_type_continue      :      0.680 s  12 %
    ssa_opt_live               :      0.583 s  11 %
    ssa_opt_alias              :      0.476 s   9 %
    ssa_opt_cse                :      0.183 s   3 %
    ssa_opt_coalesce_phis      :      0.097 s   2 %
    ssa_opt_ne                 :      0.084 s   2 %
    ssa_opt_tail_phis          :      0.070 s   1 %
    ssa_opt_sink               :      0.061 s   1 %
    ssa_opt_bsm_shortcut       :      0.056 s   1 %
    ssa_opt_element            :      0.056 s   1 %
    ssa_opt_linearize          :      0.053 s   1 %
    ssa_opt_split_blocks       :      0.031 s   1 %
    ssa_opt_record             :      0.026 s   0 %
    ssa_opt_merge_blocks       :      0.022 s   0 %
    ssa_opt_try                :      0.018 s   0 %
    ssa_opt_float              :      0.016 s   0 %
    ssa_opt_ranges             :      0.016 s   0 %
    ssa_opt_trim_unreachable   :      0.011 s   0 %
    ssa_opt_bs_create_bin      :      0.010 s   0 %
    ssa_opt_tuple_size         :      0.007 s   0 %
    ssa_opt_bs_ensure          :      0.006 s   0 %
    ssa_opt_tail_literals      :      0.006 s   0 %
    ssa_opt_redundant_br       :      0.006 s   0 %
    ssa_opt_update_tuple       :      0.005 s   0 %
    ssa_opt_blockify           :      0.002 s   0 %
    ssa_opt_merge_updates      :      0.002 s   0 %
    ssa_opt_bsm                :      0.001 s   0 %
    ssa_opt_no_reuse           :      0.001 s   0 %
    ssa_opt_bc_size            :      0.001 s   0 %
    ssa_opt_get_tuple_element  :      0.001 s   0 %
    ssa_opt_destructive_update :      0.000 s   0 %
    ssa_opt_sw                 :      0.000 s   0 %
    ssa_opt_unfold_literals    :      0.000 s   0 %
    ssa_opt_type_finish        :      0.000 s   0 %
 beam_ssa_throw                :      0.029 s   34200.4 kB
 beam_ssa_pre_codegen          :      0.453 s   37718.2 kB
    %% Sub passes of beam_ssa_pre_codegen from slowest to fastest:
    place_frames               :      0.125 s  28 %
    live_intervals             :      0.108 s  24 %
    reserve_regs               :      0.037 s   8 %
    fix_bs                     :      0.035 s   8 %
    linear_scan                :      0.033 s   7 %
    find_yregs                 :      0.022 s   5 %
    sanitize                   :      0.018 s   4 %
    assert_no_critical_edges   :      0.016 s   4 %
    expand_update_tuple        :      0.014 s   3 %
    number_instructions        :      0.013 s   3 %
    opt_get_list               :      0.011 s   2 %
    reserve_yregs              :      0.006 s   1 %
    frame_size                 :      0.005 s   1 %
    turn_yregs                 :      0.005 s   1 %
    copy_retval                :      0.002 s   0 %
    expand_match_fail          :      0.001 s   0 %
    fix_receives               :      0.000 s   0 %
 beam_ssa_codegen              :      0.135 s   29734.0 kB
 beam_validator_strong         :      0.505 s   29734.0 kB
 beam_a                        :      0.005 s   29650.6 kB
 beam_block                    :      0.008 s   31261.6 kB
 beam_jump                     :      0.060 s   30780.2 kB
 beam_clean                    :      0.010 s   30780.2 kB
 beam_trim                     :      0.002 s   30805.3 kB
 beam_flatten                  :      0.001 s   29196.9 kB
 beam_z                        :      0.002 s   29095.6 kB
 beam_validator_weak           :      0.504 s   29095.6 kB
 beam_asm                      :      0.061 s   20258.2 kB

Compiling /Users/benjaminschultzer/src/sql/lib/lexer.ex
 get_module_name_from_core     :      0.000 s  126645.2 kB
 core_lint_module              :      0.058 s  126645.5 kB
 core_compile_directives       :      0.000 s  126645.5 kB
 sys_core_fold                 :      0.053 s  118018.0 kB
 sys_core_alias                :      0.022 s  118018.0 kB
 core_transforms               :      0.000 s  118018.0 kB
 sys_core_bsm                  :      0.010 s  118018.0 kB
 core_to_ssa                   :      0.314 s   45996.7 kB
 beam_ssa_bool                 :      0.719 s   35429.6 kB
 beam_ssa_share                :      0.015 s   35428.0 kB
 beam_ssa_recv                 :      0.001 s   35428.0 kB
 beam_ssa_bsm                  :      1.018 s   44993.2 kB
    %% Sub passes of beam_ssa_bsm from slowest to fastest:
    combine_matches            :      0.433 s  43 %
    accept_context_args        :      0.284 s  28 %
    skip_outgoing_tail_extracti:      0.233 s  23 %
    allow_context_passthrough  :      0.068 s   7 %
    annotate_context_parameters:      0.000 s   0 %
 beam_ssa_opt                  :      5.776 s   34200.4 kB
    %% Sub passes of beam_ssa_opt from slowest to fastest:
    ssa_opt_dead               :      2.005 s  35 %
    ssa_opt_type_start         :      0.972 s  17 %
    ssa_opt_type_continue      :      0.699 s  12 %
    ssa_opt_live               :      0.616 s  11 %
    ssa_opt_alias              :      0.516 s   9 %
    ssa_opt_cse                :      0.200 s   3 %
    ssa_opt_coalesce_phis      :      0.113 s   2 %
    ssa_opt_ne                 :      0.075 s   1 %
    ssa_opt_tail_phis          :      0.070 s   1 %
    ssa_opt_bsm_shortcut       :      0.067 s   1 %
    ssa_opt_sink               :      0.065 s   1 %
    ssa_opt_element            :      0.063 s   1 %
    ssa_opt_linearize          :      0.058 s   1 %
    ssa_opt_split_blocks       :      0.028 s   0 %
    ssa_opt_record             :      0.025 s   0 %
    ssa_opt_merge_blocks       :      0.022 s   0 %
    ssa_opt_try                :      0.019 s   0 %
    ssa_opt_ranges             :      0.019 s   0 %
    ssa_opt_float              :      0.018 s   0 %
    ssa_opt_trim_unreachable   :      0.013 s   0 %
    ssa_opt_update_tuple       :      0.012 s   0 %
    ssa_opt_tuple_size         :      0.008 s   0 %
    ssa_opt_redundant_br       :      0.006 s   0 %
    ssa_opt_bs_create_bin      :      0.006 s   0 %
    ssa_opt_bs_ensure          :      0.006 s   0 %
    ssa_opt_tail_literals      :      0.006 s   0 %
    ssa_opt_blockify           :      0.002 s   0 %
    ssa_opt_merge_updates      :      0.002 s   0 %
    ssa_opt_bc_size            :      0.002 s   0 %
    ssa_opt_no_reuse           :      0.001 s   0 %
    ssa_opt_bsm                :      0.001 s   0 %
    ssa_opt_get_tuple_element  :      0.001 s   0 %
    ssa_opt_sw                 :      0.000 s   0 %
    ssa_opt_destructive_update :      0.000 s   0 %
    ssa_opt_unfold_literals    :      0.000 s   0 %
    ssa_opt_type_finish        :      0.000 s   0 %
 beam_ssa_throw                :      0.032 s   34200.4 kB
 beam_ssa_pre_codegen          :      0.483 s   37718.2 kB
    %% Sub passes of beam_ssa_pre_codegen from slowest to fastest:
    live_intervals             :      0.123 s  26 %
    place_frames               :      0.121 s  25 %
    fix_bs                     :      0.039 s   8 %
    linear_scan                :      0.038 s   8 %
    reserve_regs               :      0.037 s   8 %
    find_yregs                 :      0.024 s   5 %
    sanitize                   :      0.021 s   4 %
    assert_no_critical_edges   :      0.017 s   3 %
    expand_update_tuple        :      0.016 s   3 %
    number_instructions        :      0.013 s   3 %
    opt_get_list               :      0.012 s   2 %
    reserve_yregs              :      0.007 s   1 %
    turn_yregs                 :      0.006 s   1 %
    frame_size                 :      0.006 s   1 %
    copy_retval                :      0.002 s   0 %
    expand_match_fail          :      0.001 s   0 %
    fix_receives               :      0.000 s   0 %
 beam_ssa_codegen              :      0.133 s   29734.0 kB
 beam_validator_strong         :      0.519 s   29734.0 kB
 beam_a                        :      0.006 s   29650.6 kB
 beam_block                    :      0.011 s   31261.6 kB
 beam_jump                     :      0.061 s   30780.2 kB
 beam_clean                    :      0.009 s   30780.2 kB
 beam_trim                     :      0.002 s   30805.3 kB
 beam_flatten                  :      0.001 s   29196.9 kB
 beam_z                        :      0.002 s   29095.6 kB
 beam_validator_weak           :      0.530 s   29095.6 kB
 beam_asm                      :      0.060 s   20258.2 kB


Compiling /Users/benjaminschultzer/src/sql/lib/lexer.ex
 get_module_name_from_core     :      0.000 s  126645.2 kB
 core_lint_module              :      0.058 s  126645.5 kB
 core_compile_directives       :      0.000 s  126645.5 kB
 sys_core_fold                 :      0.053 s  118018.0 kB
 sys_core_alias                :      0.021 s  118018.0 kB
 core_transforms               :      0.000 s  118018.0 kB
 sys_core_bsm                  :      0.010 s  118018.0 kB
 core_to_ssa                   :      0.302 s   45996.7 kB
 beam_ssa_bool                 :      0.717 s   35429.6 kB
 beam_ssa_share                :      0.016 s   35428.0 kB
 beam_ssa_recv                 :      0.001 s   35428.0 kB
 beam_ssa_bsm                  :      1.069 s   44993.2 kB
    %% Sub passes of beam_ssa_bsm from slowest to fastest:
    combine_matches            :      0.441 s  41 %
    accept_context_args        :      0.288 s  27 %
    skip_outgoing_tail_extracti:      0.261 s  24 %
    allow_context_passthrough  :      0.079 s   7 %
    annotate_context_parameters:      0.000 s   0 %
 beam_ssa_opt                  :      5.897 s   34200.4 kB
    %% Sub passes of beam_ssa_opt from slowest to fastest:
    ssa_opt_dead               :      2.007 s  34 %
    ssa_opt_type_start         :      0.964 s  16 %
    ssa_opt_type_continue      :      0.846 s  14 %
    ssa_opt_live               :      0.590 s  10 %
    ssa_opt_alias              :      0.522 s   9 %
    ssa_opt_cse                :      0.205 s   4 %
    ssa_opt_coalesce_phis      :      0.108 s   2 %
    ssa_opt_ne                 :      0.075 s   1 %
    ssa_opt_tail_phis          :      0.071 s   1 %
    ssa_opt_bsm_shortcut       :      0.069 s   1 %
    ssa_opt_sink               :      0.066 s   1 %
    ssa_opt_element            :      0.060 s   1 %
    ssa_opt_linearize          :      0.056 s   1 %
    ssa_opt_split_blocks       :      0.033 s   1 %
    ssa_opt_record             :      0.024 s   0 %
    ssa_opt_try                :      0.021 s   0 %
    ssa_opt_merge_blocks       :      0.020 s   0 %
    ssa_opt_float              :      0.018 s   0 %
    ssa_opt_ranges             :      0.018 s   0 %
    ssa_opt_trim_unreachable   :      0.014 s   0 %
    ssa_opt_update_tuple       :      0.011 s   0 %
    ssa_opt_tuple_size         :      0.008 s   0 %
    ssa_opt_tail_literals      :      0.008 s   0 %
    ssa_opt_bs_create_bin      :      0.007 s   0 %
    ssa_opt_redundant_br       :      0.006 s   0 %
    ssa_opt_bs_ensure          :      0.006 s   0 %
    ssa_opt_blockify           :      0.002 s   0 %
    ssa_opt_merge_updates      :      0.002 s   0 %
    ssa_opt_no_reuse           :      0.001 s   0 %
    ssa_opt_bc_size            :      0.001 s   0 %
    ssa_opt_bsm                :      0.001 s   0 %
    ssa_opt_get_tuple_element  :      0.001 s   0 %
    ssa_opt_sw                 :      0.000 s   0 %
    ssa_opt_destructive_update :      0.000 s   0 %
    ssa_opt_unfold_literals    :      0.000 s   0 %
    ssa_opt_type_finish        :      0.000 s   0 %
 beam_ssa_throw                :      0.033 s   34200.4 kB
 beam_ssa_pre_codegen          :      0.475 s   37718.2 kB
    %% Sub passes of beam_ssa_pre_codegen from slowest to fastest:
    place_frames               :      0.126 s  27 %
    live_intervals             :      0.114 s  24 %
    fix_bs                     :      0.041 s   9 %
    linear_scan                :      0.039 s   8 %
    reserve_regs               :      0.035 s   7 %
    find_yregs                 :      0.023 s   5 %
    sanitize                   :      0.020 s   4 %
    assert_no_critical_edges   :      0.017 s   4 %
    expand_update_tuple        :      0.015 s   3 %
    number_instructions        :      0.012 s   3 %
    opt_get_list               :      0.011 s   2 %
    frame_size                 :      0.006 s   1 %
    reserve_yregs              :      0.006 s   1 %
    turn_yregs                 :      0.006 s   1 %
    copy_retval                :      0.002 s   0 %
    expand_match_fail          :      0.001 s   0 %
    fix_receives               :      0.000 s   0 %
 beam_ssa_codegen              :      0.139 s   29734.0 kB
 beam_validator_strong         :      0.518 s   29734.0 kB
 beam_a                        :      0.007 s   29650.6 kB
 beam_block                    :      0.010 s   31261.6 kB
 beam_jump                     :      0.060 s   30780.2 kB
 beam_clean                    :      0.009 s   30780.2 kB
 beam_trim                     :      0.002 s   30805.3 kB
 beam_flatten                  :      0.001 s   29196.9 kB
 beam_z                        :      0.002 s   29095.6 kB
 beam_validator_weak           :      0.526 s   29095.6 kB
 beam_asm                      :      0.064 s   20258.2 kB

Versus all of them non-exported:

Compiling /Users/benjaminschultzer/src/sql/lib/lexer.ex
 get_module_name_from_core     :      0.000 s  126494.5 kB
 core_lint_module              :      0.057 s  126494.8 kB
 core_compile_directives       :      0.000 s  126494.9 kB
 sys_core_fold                 :      0.052 s  117867.3 kB
 sys_core_alias                :      0.020 s  117867.3 kB
 core_transforms               :      0.000 s  117867.3 kB
 sys_core_bsm                  :      0.011 s  117867.3 kB
 core_to_ssa                   :      0.290 s   45846.6 kB
 beam_ssa_bool                 :      0.705 s   35279.5 kB
 beam_ssa_share                :      0.016 s   35277.9 kB
 beam_ssa_recv                 :      0.001 s   35277.9 kB
 beam_ssa_bsm                  :      1.052 s   44843.0 kB
    %% Sub passes of beam_ssa_bsm from slowest to fastest:
    combine_matches            :      0.427 s  41 %
    accept_context_args        :      0.283 s  27 %
    skip_outgoing_tail_extracti:      0.261 s  25 %
    allow_context_passthrough  :      0.080 s   8 %
    annotate_context_parameters:      0.000 s   0 %
 beam_ssa_opt                  :      4.967 s   30842.2 kB
    %% Sub passes of beam_ssa_opt from slowest to fastest:
    ssa_opt_type_start         :      2.005 s  41 %
    ssa_opt_dead               :      1.137 s  23 %
    ssa_opt_alias              :      0.441 s   9 %
    ssa_opt_type_continue      :      0.373 s   8 %
    ssa_opt_live               :      0.332 s   7 %
    ssa_opt_cse                :      0.138 s   3 %
    ssa_opt_coalesce_phis      :      0.104 s   2 %
    ssa_opt_linearize          :      0.060 s   1 %
    ssa_opt_element            :      0.060 s   1 %
    ssa_opt_sink               :      0.051 s   1 %
    ssa_opt_split_blocks       :      0.035 s   1 %
    ssa_opt_tail_phis          :      0.032 s   1 %
    ssa_opt_ne                 :      0.031 s   1 %
    ssa_opt_bsm_shortcut       :      0.025 s   1 %
    ssa_opt_record             :      0.019 s   0 %
    ssa_opt_merge_blocks       :      0.011 s   0 %
    ssa_opt_float              :      0.009 s   0 %
    ssa_opt_try                :      0.009 s   0 %
    ssa_opt_ranges             :      0.007 s   0 %
    ssa_opt_tuple_size         :      0.007 s   0 %
    ssa_opt_update_tuple       :      0.006 s   0 %
    ssa_opt_trim_unreachable   :      0.006 s   0 %
    ssa_opt_bs_ensure          :      0.003 s   0 %
    ssa_opt_tail_literals      :      0.003 s   0 %
    ssa_opt_redundant_br       :      0.003 s   0 %
    ssa_opt_bs_create_bin      :      0.002 s   0 %
    ssa_opt_no_reuse           :      0.002 s   0 %
    ssa_opt_blockify           :      0.001 s   0 %
    ssa_opt_merge_updates      :      0.001 s   0 %
    ssa_opt_bc_size            :      0.001 s   0 %
    ssa_opt_type_finish        :      0.001 s   0 %
    ssa_opt_bsm                :      0.000 s   0 %
    ssa_opt_get_tuple_element  :      0.000 s   0 %
    ssa_opt_destructive_update :      0.000 s   0 %
    ssa_opt_sw                 :      0.000 s   0 %
    ssa_opt_unfold_literals    :      0.000 s   0 %
 beam_ssa_throw                :      0.013 s   30842.2 kB
 beam_ssa_pre_codegen          :      0.197 s   32996.8 kB
    %% Sub passes of beam_ssa_pre_codegen from slowest to fastest:
    place_frames               :      0.052 s  26 %
    live_intervals             :      0.043 s  22 %
    fix_bs                     :      0.019 s  10 %
    linear_scan                :      0.016 s   8 %
    reserve_regs               :      0.014 s   7 %
    find_yregs                 :      0.012 s   6 %
    sanitize                   :      0.009 s   5 %
    assert_no_critical_edges   :      0.007 s   4 %
    number_instructions        :      0.007 s   3 %
    expand_update_tuple        :      0.005 s   3 %
    opt_get_list               :      0.004 s   2 %
    reserve_yregs              :      0.003 s   2 %
    frame_size                 :      0.003 s   1 %
    turn_yregs                 :      0.002 s   1 %
    copy_retval                :      0.001 s   1 %
    expand_match_fail          :      0.000 s   0 %
    fix_receives               :      0.000 s   0 %
 beam_ssa_codegen              :      0.061 s   27338.0 kB
 beam_validator_strong         :      0.250 s   27338.0 kB
 beam_a                        :      0.002 s   27296.9 kB
 beam_block                    :      0.003 s   28163.1 kB
 beam_jump                     :      0.027 s   27815.5 kB
 beam_clean                    :      0.003 s   27815.4 kB
 beam_trim                     :      0.001 s   27840.7 kB
 beam_flatten                  :      0.000 s   26978.7 kB
 beam_z                        :      0.001 s   26914.3 kB
 beam_validator_weak           :      0.250 s   26914.3 kB
 beam_asm                      :      0.040 s   20109.3 kB

Thanks for posting how you did your measurements. I can confirm that compilation is a little bit slower when using the export_all option.

I think I can see why. The binary matching optimizations heavily depend on the type-based optimizations. With all functions exported, the type-based optimizations are essentially disabled between functions. That will result in larger code because some unnecessary code is not removed, and other optimizations will run slower because they will have to traverse more code. (The resulting code will probably also be less efficient.)

I don’t think that this finding generalizes to all kinds of code. If the type-based optimization passes cannot discard a lot of code, I don’t expect that compilation will be slower when all functions are exported. In fact, compilation could even be faster because there is less type information to keep track of. (Note that with all functions exported, the ssa_opt_dead pass was the slowest, but with only some functions exported, the type-optimization passes were the slowest.)

I could not reproduce that on my desktop Mac. Did you compile on a laptop? Some laptops clock down their CPU when they get hot.

Interesting, it’s a laptop indeed.

That’s indeed what I thought could be happening. We were bitten by this earlier this year when we made a function public and someone later reported that the code had become slower. I wonder whether, for binary optimizations specifically, it is worth making a private version of the function so we can keep the closed loop for binary optimizations…

But then the compiler may indeed become slower with more public functions. I guess that’s the tricky bit of doing compiler work: everyone wants the compiler to be as fast as possible but also to optimize as much as possible. :sweat_smile:

Yes, I think it could be worth it for binary operations. You probably don’t need to copy the entire function, though. It will probably work if you have a public function that starts off the binary matching and then calls the private function. Something like this (not tested):

pub(<<Bin/binary>>) ->
    private(Bin).
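
The same shape could be written in Elixir roughly as follows. This is a sketch only: `WrapperDemo`, `count_a/1` and `do_count_a/2` are hypothetical names chosen for illustration, and whether the compiler keeps the match context across the call would still need to be checked for the OTP version in use.

```elixir
defmodule WrapperDemo do
  # Public entry point: matches the argument as a binary,
  # then hands it over to the private loop.
  def count_a(<<bin::binary>>), do: do_count_a(bin, 0)

  # Private worker: all recursion stays between private clauses,
  # so every caller of the loop is visible inside this module.
  defp do_count_a(<<?a, rest::binary>>, acc), do: do_count_a(rest, acc + 1)
  defp do_count_a(<<_byte, rest::binary>>, acc), do: do_count_a(rest, acc)
  defp do_count_a(<<>>, acc), do: acc
end
```

For example, `WrapperDemo.count_a("banana")` counts the `a` bytes in the input, and only the thin wrapper is exported.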

It might be good for Elixir to have a similar guide, since this is mostly an Elixir issue: the default is inverted there (functions defined with def are public unless you use defp), so it’s easy to forget to make a function private.

Also, I’ve noticed something after refactoring to an inline case statement:

This generates almost 150k cases. It compiles fast and the runtime is great, almost 2.5x faster than a guard version. However, if I change the return to a function, then everything blows up:

  case_ast =
    for c <- Unicode.Set.to_pattern!("[[:Lu:], [:Ll:], [:Lt:], [:Lm:], [:Lo:], [:Nl:], [:Mn:], [:Mc:], [:Nd:], [:Pc:], [:Cf:]]") do
      hd(
        quote do
          <<unquote(c), _::binary>> -> true
        end
      )
    end ++ quote do
      _ -> false
    end

  def case_fn(binary) do
    case binary do
      unquote(case_ast)
    end
  end
 sql git:(main) ✗ mix sql.bench
Operating System: macOS
CPU Information: Apple M1 Max
Number of Available Cores: 10
Available memory: 64 GB
Elixir 1.20.0-dev
Erlang 28.1
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 2 s
reduction time: 2 s
parallel: 1
inputs: 1..100_000
Estimated total run time: 22 s

Measured function call overhead as: 0 ns
Benchmarking case with input 1..100_000 ...
Benchmarking guard with input 1..100_000 ...
Calculating statistics...
Formatting results...

##### With input 1..100_000 #####
Name           ips        average  deviation         median         99th %
case        15.07 M       66.35 ns ±42238.21%          42 ns          42 ns
guard        6.14 M      162.76 ns ±15740.33%         125 ns         167 ns

Comparison:
case        15.07 M
guard        6.14 M - 2.45x slower +96.41 ns

Memory usage statistics:

Name    Memory usage
case             40 B
guard            40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

Reduction count statistics:

Name Reduction count
case                2
guard               2 - 1.00x reduction count +0
Compiling lib/lexer.ex (it's taking more than 10s)
Compiling /Users/benjaminschultzer/src/sql/lib/lexer.ex
 get_module_name_from_core     :      0.000 s 1567585.5 kB
 core_lint_module              :      0.178 s 1567585.8 kB
 core_compile_directives       :      0.000 s 1567585.9 kB
 sys_core_fold                 :      1.161 s 1573464.2 kB
 sys_core_alias                :      0.303 s 1573464.2 kB
 core_transforms               :      0.000 s 1573464.2 kB
 sys_core_bsm                  :      0.158 s 1573464.2 kB
 core_to_ssa                   :      2.666 s  860694.9 kB
 beam_ssa_bool                 :      4.381 s  851660.4 kB
 beam_ssa_share                :      1.505 s  851494.1 kB
 beam_ssa_recv                 :      0.012 s  851494.1 kB
 beam_ssa_bsm                  :     24.190 s  863380.4 kB
    %% Sub passes of beam_ssa_bsm from slowest to fastest:
    combine_matches            :     10.507 s  43 %
    skip_outgoing_tail_extracti:      6.934 s  29 %
    accept_context_args        :      4.348 s  18 %
    allow_context_passthrough  :      2.392 s  10 %
    annotate_context_parameters:      0.000 s   0 %
 beam_ssa_opt                  :     99.483 s 1047863.4 kB
    %% Sub passes of beam_ssa_opt from slowest to fastest:
    ssa_opt_type_start         :     43.199 s  44 %
    ssa_opt_alias              :     12.884 s  13 %
    ssa_opt_live               :      9.043 s   9 %
    ssa_opt_dead               :      6.087 s   6 %
    ssa_opt_type_continue      :      5.636 s   6 %
    ssa_opt_bsm_shortcut       :      3.546 s   4 %
    ssa_opt_cse                :      3.431 s   3 %
    ssa_opt_ranges             :      1.818 s   2 %
    ssa_opt_try                :      1.559 s   2 %
    ssa_opt_trim_unreachable   :      1.284 s   1 %
    ssa_opt_merge_blocks       :      1.233 s   1 %
    ssa_opt_linearize          :      1.028 s   1 %
    ssa_opt_tail_phis          :      0.972 s   1 %
    ssa_opt_element            :      0.958 s   1 %
    ssa_opt_bs_ensure          :      0.946 s   1 %
    ssa_opt_tail_literals      :      0.833 s   1 %
    ssa_opt_float              :      0.719 s   1 %
    ssa_opt_split_blocks       :      0.692 s   1 %
    ssa_opt_coalesce_phis      :      0.570 s   1 %
    ssa_opt_redundant_br       :      0.561 s   1 %
    ssa_opt_record             :      0.361 s   0 %
    ssa_opt_update_tuple       :      0.187 s   0 %
    ssa_opt_bsm                :      0.165 s   0 %
    ssa_opt_no_reuse           :      0.109 s   0 %
    ssa_opt_blockify           :      0.078 s   0 %
    ssa_opt_ne                 :      0.077 s   0 %
    ssa_opt_tuple_size         :      0.061 s   0 %
    ssa_opt_get_tuple_element  :      0.055 s   0 %
    ssa_opt_type_finish        :      0.053 s   0 %
    ssa_opt_sink               :      0.044 s   0 %
    ssa_opt_bs_create_bin      :      0.043 s   0 %
    ssa_opt_destructive_update :      0.016 s   0 %
    ssa_opt_bc_size            :      0.012 s   0 %
    ssa_opt_sw                 :      0.011 s   0 %
    ssa_opt_merge_updates      :      0.009 s   0 %
    ssa_opt_unfold_literals    :      0.000 s   0 %
 beam_ssa_throw                :      2.323 s 1047863.4 kB
 beam_ssa_pre_codegen          :     20.689 s 1036708.9 kB
    %% Sub passes of beam_ssa_pre_codegen from slowest to fastest:
    fix_bs                     :      4.854 s  24 %
    live_intervals             :      3.679 s  18 %
    reserve_regs               :      2.682 s  13 %
    sanitize                   :      2.137 s  10 %
    place_frames               :      1.730 s   8 %
    number_instructions        :      1.486 s   7 %
    assert_no_critical_edges   :      1.218 s   6 %
    linear_scan                :      1.089 s   5 %
    expand_update_tuple        :      1.049 s   5 %
    opt_get_list               :      0.647 s   3 %
    expand_match_fail          :      0.023 s   0 %
    find_yregs                 :      0.021 s   0 %
    fix_receives               :      0.012 s   0 %
    reserve_yregs              :      0.004 s   0 %
    turn_yregs                 :      0.003 s   0 %
    frame_size                 :      0.002 s   0 %
    copy_retval                :      0.001 s   0 %
 beam_ssa_codegen              :      7.125 s  702342.0 kB
 beam_validator_strong         :      1.898 s  702342.0 kB
 beam_a                        :      0.122 s  704474.8 kB
 beam_block                    :      0.164 s  763859.5 kB
 beam_jump                     :      1.409 s  735968.3 kB
 beam_clean                    :      0.081 s  735968.3 kB
 beam_trim                     :      0.016 s  735993.6 kB
 beam_flatten                  :      0.049 s  676610.4 kB
 beam_z                        :      0.036 s  668886.5 kB
 beam_validator_weak           :      2.062 s  668886.5 kB
 beam_asm                      :      1.794 s  478814.8 kB
Compiling /Users/benjaminschultzer/src/sql/lib/mix/tasks/sql.get.ex
 get_module_name_from_core     :      0.000 s     473.2 kB
 core_lint_module              :      0.000 s     473.7 kB
 core_compile_directives       :      0.000 s     473.7 kB
 sys_core_fold                 :      0.000 s     432.9 kB
 sys_core_alias                :      0.000 s     432.9 kB
 core_transforms               :      0.000 s     432.9 kB
 sys_core_bsm                  :      0.000 s     432.9 kB
 core_to_ssa                   :      0.000 s     248.7 kB
 beam_ssa_bool                 :      0.000 s     243.0 kB
 beam_ssa_share                :      0.000 s     241.4 kB
 beam_ssa_recv                 :      0.000 s     241.4 kB
 beam_ssa_bsm                  :      0.000 s     241.8 kB
    %% Sub passes of beam_ssa_bsm from slowest to fastest:
    allow_context_passthrough  :      0.000 s  93 %
    combine_matches            :      0.000 s   2 %
    annotate_context_parameters:      0.000 s   2 %
    accept_context_args        :      0.000 s   1 %
    skip_outgoing_tail_extracti:      0.000 s   1 %
 beam_ssa_opt                  :      0.005 s     310.2 kB
    %% Sub passes of beam_ssa_opt from slowest to fastest:
    ssa_opt_type_start         :      0.001 s  31 %
    ssa_opt_type_continue      :      0.001 s  27 %
    ssa_opt_alias              :      0.001 s  12 %
    ssa_opt_live               :      0.000 s   9 %
    ssa_opt_dead               :      0.000 s   8 %
    ssa_opt_cse                :      0.000 s   4 %
    ssa_opt_sink               :      0.000 s   2 %
    ssa_opt_float              :      0.000 s   1 %
    ssa_opt_tail_phis          :      0.000 s   1 %
    ssa_opt_merge_blocks       :      0.000 s   1 %
    ssa_opt_try                :      0.000 s   1 %
    ssa_opt_linearize          :      0.000 s   1 %
    ssa_opt_coalesce_phis      :      0.000 s   1 %
    ssa_opt_unfold_literals    :      0.000 s   0 %
    ssa_opt_trim_unreachable   :      0.000 s   0 %
    ssa_opt_ranges             :      0.000 s   0 %
    ssa_opt_destructive_update :      0.000 s   0 %
    ssa_opt_merge_updates      :      0.000 s   0 %
    ssa_opt_element            :      0.000 s   0 %
    ssa_opt_split_blocks       :      0.000 s   0 %
    ssa_opt_redundant_br       :      0.000 s   0 %
    ssa_opt_record             :      0.000 s   0 %
    ssa_opt_tail_literals      :      0.000 s   0 %
    ssa_opt_bs_ensure          :      0.000 s   0 %
    ssa_opt_update_tuple       :      0.000 s   0 %
    ssa_opt_bs_create_bin      :      0.000 s   0 %
    ssa_opt_no_reuse           :      0.000 s   0 %
    ssa_opt_tuple_size         :      0.000 s   0 %
    ssa_opt_bsm                :      0.000 s   0 %
    ssa_opt_ne                 :      0.000 s   0 %
    ssa_opt_bc_size            :      0.000 s   0 %
    ssa_opt_blockify           :      0.000 s   0 %
    ssa_opt_get_tuple_element  :      0.000 s   0 %
    ssa_opt_bsm_shortcut       :      0.000 s   0 %
    ssa_opt_sw                 :      0.000 s   0 %
    ssa_opt_type_finish        :      0.000 s   0 %
 beam_ssa_throw                :      0.000 s     310.2 kB
 beam_ssa_pre_codegen          :      0.001 s     323.6 kB
    %% Sub passes of beam_ssa_pre_codegen from slowest to fastest:
    live_intervals             :      0.000 s  20 %
    linear_scan                :      0.000 s  13 %
    place_frames               :      0.000 s  12 %
    reserve_regs               :      0.000 s  12 %
    reserve_yregs              :      0.000 s   9 %
    find_yregs                 :      0.000 s   8 %
    frame_size                 :      0.000 s   5 %
    turn_yregs                 :      0.000 s   4 %
    sanitize                   :      0.000 s   4 %
    assert_no_critical_edges   :      0.000 s   2 %
    expand_update_tuple        :      0.000 s   2 %
    number_instructions        :      0.000 s   2 %
    copy_retval                :      0.000 s   2 %
    fix_bs                     :      0.000 s   1 %
    fix_receives               :      0.000 s   1 %
    opt_get_list               :      0.000 s   1 %
    expand_match_fail          :      0.000 s   0 %
 beam_ssa_codegen              :      0.000 s     202.1 kB
 beam_validator_strong         :      0.000 s     202.1 kB
 beam_a                        :      0.000 s     201.7 kB
 beam_block                    :      0.000 s     207.0 kB
 beam_jump                     :      0.000 s     206.8 kB
 beam_clean                    :      0.000 s     206.9 kB
 beam_trim                     :      0.000 s     206.7 kB
 beam_flatten                  :      0.000 s     200.5 kB
 beam_z                        :      0.000 s     189.4 kB
 beam_validator_weak           :      0.000 s     189.4 kB
 beam_asm                      :      0.001 s     132.6 kB


Compiling /Users/benjaminschultzer/src/sql/lib/lexer.ex
 get_module_name_from_core     :      0.000 s 1567585.5 kB
 core_lint_module              :      0.179 s 1567585.8 kB
 core_compile_directives       :      0.000 s 1567585.9 kB
 sys_core_fold                 :      1.129 s 1573464.2 kB
 sys_core_alias                :      0.302 s 1573464.2 kB
 core_transforms               :      0.000 s 1573464.2 kB
 sys_core_bsm                  :      0.257 s 1573464.2 kB
 core_to_ssa                   :      2.865 s  860694.9 kB
 beam_ssa_bool                 :      3.936 s  851660.4 kB
 beam_ssa_share                :      1.484 s  851494.1 kB
 beam_ssa_recv                 :      0.014 s  851494.1 kB
 beam_ssa_bsm                  :     24.808 s  863380.4 kB
    %% Sub passes of beam_ssa_bsm from slowest to fastest:
    combine_matches            :     10.964 s  44 %
    skip_outgoing_tail_extracti:      7.198 s  29 %
    accept_context_args        :      4.480 s  18 %
    allow_context_passthrough  :      2.154 s   9 %
    annotate_context_parameters:      0.000 s   0 %
 beam_ssa_opt                  :    102.630 s 1047863.4 kB
    %% Sub passes of beam_ssa_opt from slowest to fastest:
    ssa_opt_type_start         :     45.082 s  44 %
    ssa_opt_alias              :     14.295 s  14 %
    ssa_opt_live               :      9.553 s   9 %
    ssa_opt_dead               :      5.745 s   6 %
    ssa_opt_type_continue      :      5.058 s   5 %
    ssa_opt_bsm_shortcut       :      4.023 s   4 %
    ssa_opt_cse                :      3.329 s   3 %
    ssa_opt_ranges             :      1.894 s   2 %
    ssa_opt_try                :      1.759 s   2 %
    ssa_opt_trim_unreachable   :      1.208 s   1 %
    ssa_opt_merge_blocks       :      1.195 s   1 %
    ssa_opt_bs_ensure          :      1.001 s   1 %
    ssa_opt_linearize          :      0.951 s   1 %
    ssa_opt_element            :      0.912 s   1 %
    ssa_opt_tail_phis          :      0.885 s   1 %
    ssa_opt_tail_literals      :      0.812 s   1 %
    ssa_opt_float              :      0.706 s   1 %
    ssa_opt_redundant_br       :      0.634 s   1 %
    ssa_opt_split_blocks       :      0.632 s   1 %
    ssa_opt_coalesce_phis      :      0.520 s   1 %
    ssa_opt_record             :      0.345 s   0 %
    ssa_opt_update_tuple       :      0.173 s   0 %
    ssa_opt_bsm                :      0.150 s   0 %
    ssa_opt_ne                 :      0.132 s   0 %
    ssa_opt_no_reuse           :      0.088 s   0 %
    ssa_opt_blockify           :      0.081 s   0 %
    ssa_opt_get_tuple_element  :      0.058 s   0 %
    ssa_opt_type_finish        :      0.055 s   0 %
    ssa_opt_tuple_size         :      0.053 s   0 %
    ssa_opt_sink               :      0.046 s   0 %
    ssa_opt_bs_create_bin      :      0.043 s   0 %
    ssa_opt_destructive_update :      0.014 s   0 %
    ssa_opt_bc_size            :      0.012 s   0 %
    ssa_opt_merge_updates      :      0.011 s   0 %
    ssa_opt_sw                 :      0.010 s   0 %
    ssa_opt_unfold_literals    :      0.000 s   0 %
 beam_ssa_throw                :      2.316 s 1047863.4 kB
 beam_ssa_pre_codegen          :     21.428 s 1036708.9 kB
    %% Sub passes of beam_ssa_pre_codegen from slowest to fastest:
    fix_bs                     :      4.609 s  22 %
    live_intervals             :      4.025 s  19 %
    reserve_regs               :      2.829 s  13 %
    sanitize                   :      2.361 s  11 %
    place_frames               :      1.683 s   8 %
    number_instructions        :      1.459 s   7 %
    expand_update_tuple        :      1.276 s   6 %
    assert_no_critical_edges   :      1.221 s   6 %
    linear_scan                :      1.188 s   6 %
    opt_get_list               :      0.661 s   3 %
    expand_match_fail          :      0.031 s   0 %
    find_yregs                 :      0.019 s   0 %
    fix_receives               :      0.013 s   0 %
    reserve_yregs              :      0.004 s   0 %
    turn_yregs                 :      0.002 s   0 %
    frame_size                 :      0.002 s   0 %
    copy_retval                :      0.001 s   0 %
 beam_ssa_codegen              :      6.595 s  702342.0 kB
 beam_validator_strong         :      2.608 s  702342.0 kB
 beam_a                        :      0.311 s  704474.8 kB
 beam_block                    :      0.346 s  763859.5 kB
 beam_jump                     :      1.486 s  735968.3 kB
 beam_clean                    :      0.154 s  735968.3 kB
 beam_trim                     :      0.011 s  735993.6 kB
 beam_flatten                  :      0.082 s  676610.4 kB
 beam_z                        :      0.103 s  668886.5 kB
 beam_validator_weak           :      2.331 s  668886.5 kB
 beam_asm                      :      1.832 s  478814.8 kB
  defp lex(rest, context, line, column, acc) do
    case rest do
      <<226, 129, 166, _::binary>> -> {:error, :bidi, line, column}
      <<226, 129, 167, _::binary>> -> {:error, :bidi, line, column}
      <<226, 129, 168, _::binary>> -> {:error, :bidi, line, column}
      <<226, 129, 169, _::binary>> -> {:error, :bidi, line, column}
      <<226, 128, 170, _::binary>> -> {:error, :bidi, line, column}
      <<226, 128, 171, _::binary>> -> {:error, :bidi, line, column}
      <<226, 128, 172, _::binary>> -> {:error, :bidi, line, column}
      <<226, 128, 173, _::binary>> -> {:error, :bidi, line, column}
      <<226, 128, 174, _::binary>> -> {:error, :bidi, line, column}
      <<239, 187, 191, _::binary>> -> {:error, :zero_width, line, column}
      <<226, 128, 141, _::binary>> -> {:error, :zero_width, line, column}
      <<226, 128, 140, _::binary>> -> {:error, :zero_width, line, column}
      <<226, 128, 139, _::binary>> -> {:error, :zero_width, line, column}
      <<226, 129, 160, _::binary>> -> {:error, :zero_width, line, column}
      <<225, 158, 181, _::binary>> -> {:error, :zero_width, line, column}
      <<225, 158, 180, _::binary>> -> {:error, :zero_width, line, column}
      <<225, 160, 142, _::binary>> -> {:error, :zero_width, line, column}
      <<205, 143, _::binary>> -> {:error, :cgj, line, column}
      <<9, rest::binary>> -> lex(rest, context, line, column+1, acc)
      <<32, rest::binary>> -> lex(rest, context, line, column+1, acc)
      <<194, 160, rest::binary>> -> lex(rest, context, line, column+1, acc)
      <<225, 154, 128, rest::binary>> -> lex(rest, context, line, column+1, acc)
      <<226, 128, 130, rest::binary>> -> lex(rest, context, line, column+1, acc)
      <<226, 128, 131, rest::binary>> -> lex(rest, context, line, column+1, acc)
      <<226, 128, 137, rest::binary>> -> lex(rest, context, line, column+1, acc)
      <<226, 128, 175, rest::binary>> -> lex(rest, context, line, column+1, acc)
      <<226, 129, 159, rest::binary>> -> lex(rest, context, line, column+1, acc)
      <<227, 128, 128, rest::binary>> -> lex(rest, context, line, column+1, acc)
      <<194, 133, rest::binary>> -> lex(rest, context, line+1, 0, acc)
      <<226, 128, 168, rest::binary>> -> lex(rest, context, line+1, 0, acc)
      <<226, 128, 169, rest::binary>> -> lex(rest, context, line+1, 0, acc)
      <<10, rest::binary>> -> lex(rest, context, line+1, 0, acc)
      <<11, rest::binary>> -> lex(rest, context, line+1, 0, acc)
      <<12, rest::binary>> -> lex(rest, context, line+1, 0, acc)
      <<13, rest::binary>> -> lex(rest, context, line+1, 0, acc)
      <<?., rest::binary>> ->
        column = column+1
        case acc do
          [{t, _, _}|_] when t in ~w[ident double_quote bracket]a ->
            lex(rest, context, line, column, node(:dot, :operator, line, column, line, column, context, [], acc))
          _ ->
            num(rest, [?.], context, line, column, 0, acc)
        end
      <<?;, rest::binary>> ->
        column = column+1
        lex(rest, context, line, column, node(:colon, :delimiter, line, column, line, column, context, [], acc))
      <<?,, rest::binary>> ->
        column = column+1
        lex(rest, context, line, column, node(:comma, :delimiter, line, column, line, column, context, [], acc))
      <<?-, ?-, rest::binary>> -> comment(rest, [], line, column+1, context, line, column+1, acc)
      <<?/, ?*, rest::binary>> -> comments(rest, [], line, column+1, context, line, column+1, acc)
      <<?{, ?{, rest::binary>> -> double_brace(rest, [], line, column+1, 0, context, 0, 1, acc)
      <<?`, rest::binary>> -> backtick(rest, [], line, column+1, context, line, column, acc)
      <<?', rest::binary>> -> quote(rest, [], line, column+1, context, line, column, acc)
      <<?", rest::binary>> -> double_quote(rest, [], line, column+1, context, line, column, acc)
      <<?0, ?x, rest::binary>> -> hex(rest, [?x, ?0], context, line, column+1, 1, acc)
      <<?0, ?X, rest::binary>> -> hex(rest, [?X, ?0], context, line, column+1, 1, acc)
      <<?0, ?b, rest::binary>> -> bin(rest, [?b, ?0], context, line, column+1, 1, acc)
      <<?0, ?B, rest::binary>> -> bin(rest, [?B, ?0], context, line, column+1, 1, acc)
      <<?0, ?o, rest::binary>> -> oct(rest, [?o, ?0], context, line, column+1, 1, acc)
      <<?0, ?O, rest::binary>> -> oct(rest, [?O, ?0], context, line, column+1, 1, acc)
      <<?!, rest::binary>> -> special(rest, [?!], context, line, column+1, 0, acc)
      <<?#, rest::binary>> -> special(rest, [?#], context, line, column+1, 0, acc)
      <<?$, rest::binary>> -> special(rest, [?$], context, line, column+1, 0, acc)
      <<?%, rest::binary>> -> special(rest, [?%], context, line, column+1, 0, acc)
      <<?&, rest::binary>> -> special(rest, [?&], context, line, column+1, 0, acc)
      <<?*, rest::binary>> -> special(rest, [?*], context, line, column+1, 0, acc)
      <<?+, rest::binary>> -> special(rest, [?+], context, line, column+1, 0, acc)
      <<?-, rest::binary>> -> special(rest, [?-], context, line, column+1, 0, acc)
      <<?/, rest::binary>> -> special(rest, [?/], context, line, column+1, 0, acc)
      <<?:, rest::binary>> -> special(rest, [?:], context, line, column+1, 0, acc)
      <<?<, rest::binary>> -> special(rest, [?<], context, line, column+1, 0, acc)
      <<?=, rest::binary>> -> special(rest, [?=], context, line, column+1, 0, acc)
      <<?>, rest::binary>> -> special(rest, [?>], context, line, column+1, 0, acc)
      <<??, rest::binary>> -> special(rest, [??], context, line, column+1, 0, acc)
      <<?@, rest::binary>> -> special(rest, [?@], context, line, column+1, 0, acc)
      <<?^, rest::binary>> -> special(rest, [?^], context, line, column+1, 0, acc)
      <<?|, rest::binary>> -> special(rest, [?|], context, line, column+1, 0, acc)
      <<?~, rest::binary>> -> special(rest, [?~], context, line, column+1, 0, acc)
      <<?_, rest::binary>> -> ident(rest, [?_], context, line, column+1, 0, acc)
      <<?0, rest::binary>> -> num(rest, [?0], context, line, column+1, 0, acc)
      <<?1, rest::binary>> -> num(rest, [?1], context, line, column+1, 0, acc)
      <<?2, rest::binary>> -> num(rest, [?2], context, line, column+1, 0, acc)
      <<?3, rest::binary>> -> num(rest, [?3], context, line, column+1, 0, acc)
      <<?4, rest::binary>> -> num(rest, [?4], context, line, column+1, 0, acc)
      <<?5, rest::binary>> -> num(rest, [?5], context, line, column+1, 0, acc)
      <<?6, rest::binary>> -> num(rest, [?6], context, line, column+1, 0, acc)
      <<?7, rest::binary>> -> num(rest, [?7], context, line, column+1, 0, acc)
      <<?8, rest::binary>> -> num(rest, [?8], context, line, column+1, 0, acc)
      <<?9, rest::binary>> -> num(rest, [?9], context, line, column+1, 0, acc)
      <<?A, rest::binary>> -> ident(rest, [?A], context, line, column+1, 0, acc)
      <<?B, rest::binary>> -> ident(rest, [?B], context, line, column+1, 0, acc)
      <<?C, rest::binary>> -> ident(rest, [?C], context, line, column+1, 0, acc)
      <<?D, rest::binary>> -> ident(rest, [?D], context, line, column+1, 0, acc)
      <<?E, rest::binary>> -> ident(rest, [?E], context, line, column+1, 0, acc)
      <<?F, rest::binary>> -> ident(rest, [?F], context, line, column+1, 0, acc)
      <<?G, rest::binary>> -> ident(rest, [?G], context, line, column+1, 0, acc)
      <<?H, rest::binary>> -> ident(rest, [?H], context, line, column+1, 0, acc)
      <<?I, rest::binary>> -> ident(rest, [?I], context, line, column+1, 0, acc)
      <<?J, rest::binary>> -> ident(rest, [?J], context, line, column+1, 0, acc)
      <<?K, rest::binary>> -> ident(rest, [?K], context, line, column+1, 0, acc)
      <<?L, rest::binary>> -> ident(rest, [?L], context, line, column+1, 0, acc)
      <<?M, rest::binary>> -> ident(rest, [?M], context, line, column+1, 0, acc)
      <<?N, rest::binary>> -> ident(rest, [?N], context, line, column+1, 0, acc)
      <<?O, rest::binary>> -> ident(rest, [?O], context, line, column+1, 0, acc)
      <<?P, rest::binary>> -> ident(rest, [?P], context, line, column+1, 0, acc)
      <<?Q, rest::binary>> -> ident(rest, [?Q], context, line, column+1, 0, acc)
      <<?R, rest::binary>> -> ident(rest, [?R], context, line, column+1, 0, acc)
      <<?S, rest::binary>> -> ident(rest, [?S], context, line, column+1, 0, acc)
      <<?T, rest::binary>> -> ident(rest, [?T], context, line, column+1, 0, acc)
      <<?U, rest::binary>> -> ident(rest, [?U], context, line, column+1, 0, acc)
      <<?V, rest::binary>> -> ident(rest, [?V], context, line, column+1, 0, acc)
      <<?W, rest::binary>> -> ident(rest, [?W], context, line, column+1, 0, acc)
      <<?X, rest::binary>> -> ident(rest, [?X], context, line, column+1, 0, acc)
      <<?Y, rest::binary>> -> ident(rest, [?Y], context, line, column+1, 0, acc)
      <<?Z, rest::binary>> -> ident(rest, [?Z], context, line, column+1, 0, acc)
      <<?a, rest::binary>> -> ident(rest, [?a], context, line, column+1, 0, acc)
      <<?b, rest::binary>> -> ident(rest, [?b], context, line, column+1, 0, acc)
      <<?c, rest::binary>> -> ident(rest, [?c], context, line, column+1, 0, acc)
      <<?d, rest::binary>> -> ident(rest, [?d], context, line, column+1, 0, acc)
      <<?e, rest::binary>> -> ident(rest, [?e], context, line, column+1, 0, acc)
      <<?f, rest::binary>> -> ident(rest, [?f], context, line, column+1, 0, acc)
      <<?g, rest::binary>> -> ident(rest, [?g], context, line, column+1, 0, acc)
      <<?h, rest::binary>> -> ident(rest, [?h], context, line, column+1, 0, acc)
      <<?i, rest::binary>> -> ident(rest, [?i], context, line, column+1, 0, acc)
      <<?j, rest::binary>> -> ident(rest, [?j], context, line, column+1, 0, acc)
      <<?k, rest::binary>> -> ident(rest, [?k], context, line, column+1, 0, acc)
      <<?l, rest::binary>> -> ident(rest, [?l], context, line, column+1, 0, acc)
      <<?m, rest::binary>> -> ident(rest, [?m], context, line, column+1, 0, acc)
      <<?n, rest::binary>> -> ident(rest, [?n], context, line, column+1, 0, acc)
      <<?o, rest::binary>> -> ident(rest, [?o], context, line, column+1, 0, acc)
      <<?p, rest::binary>> -> ident(rest, [?p], context, line, column+1, 0, acc)
      <<?q, rest::binary>> -> ident(rest, [?q], context, line, column+1, 0, acc)
      <<?r, rest::binary>> -> ident(rest, [?r], context, line, column+1, 0, acc)
      <<?s, rest::binary>> -> ident(rest, [?s], context, line, column+1, 0, acc)
      <<?t, rest::binary>> -> ident(rest, [?t], context, line, column+1, 0, acc)
      <<?u, rest::binary>> -> ident(rest, [?u], context, line, column+1, 0, acc)
      <<?v, rest::binary>> -> ident(rest, [?v], context, line, column+1, 0, acc)
      <<?w, rest::binary>> -> ident(rest, [?w], context, line, column+1, 0, acc)
      <<?x, rest::binary>> -> ident(rest, [?x], context, line, column+1, 0, acc)
      <<?y, rest::binary>> -> ident(rest, [?y], context, line, column+1, 0, acc)
      <<?(, rest::binary>> ->
        column = column+1
        case lex(rest, context, line, column, []) do
          {rest, context, l, c, []=data} ->
            offset = case acc do
              [{_, [{:span, {_, _, el, ec}}|_], _}|_] -> {line-el, (column-ec)-1, 0, 0}
              [] -> {0, 0, 0, 0}
            end
            lex(rest, context, l, c, [{:paren, [span: {line, column, l, c}, offset: offset, type: :expression, file: context.file], data}|acc])
          {rest, context, l, c, [{_, [{:span, {_, _, eel, eec}}|_], _}|_]=data} ->
            offset = case acc do
              [{_, [{:span, {_, _, el, ec}}|_], _}|_] -> {line-el, (column-ec)-1, l-eel, (c-eec)-1}
              [] -> {0, 0, l-eel, (c-eec)-1}
            end
            lex(rest, context, l, c, [{:paren, [span: {line, column, l, c}, offset: offset, type: :expression, file: context.file], data}|acc])
          {end_line, end_column, _context, _acc} ->
            {:error, file: context.file, end_line: end_line, end_column: end_column, line: line, column: column, opening_delimiter: :"(", expected_delimiter: :")"}
        end
      <<?[, rest::binary>> ->
        column = column+1
        case lex(rest, context, line, column, []) do
          {rest, context, l, c, []=data} ->
            offset = case acc do
              [{_, [{:span, {_, _, el, ec}}|_], _}|_] -> {line-el, (column-ec)-1, 0, 0}
              [] -> {0, 0, 0, 0}
            end

            lex(rest, context, l, c, [{:bracket, [span: {line, column, l, c}, offset: offset, type: :expression, file: context.file], data}|acc])
          {rest, context, l, c, [{_, [{:span, {_, _, eel, eec}}|_], _}|_]=data} ->
            offset = case acc do
              [{_, [{:span, {_, _, el, ec}}|_], _}|_] -> {line-el, (column-ec)-1, l-eel, (c-eec)-1}
              [] -> {0, 0, l-eel, (c-eec)-1}
            end
            lex(rest, context, l, c, [{:bracket, [span: {line, column, l, c}, offset: offset, type: :expression, file: context.file], data}|acc])
          {end_line, end_column, _context, _acc} ->
            {:error, file: context.file, end_line: end_line, end_column: end_column, line: line, column: column, opening_delimiter: :"[", expected_delimiter: :"]"}
        end
      <<?{, rest::binary>> ->
        column = column+1
        case lex(rest, context, line, column, []) do
          {rest, context, l, c, []=data} ->
            offset = case acc do
              [{_, [{:span, {_, _, el, ec}}|_], _}|_] -> {line-el, (column-ec)-1, 0, 0}
              [] -> {0, 0, 0, 0}
            end
            lex(rest, context, l, c, [{:brace, [span: {line, column, l, c}, offset: offset, type: :expression, file: context.file], data}|acc])

          {rest, context, l, c, [{_, [{:span, {_, _, eel, eec}}|_], _}|_]=data} ->
            offset = case acc do
              [{_, [{:span, {_, _, el, ec}}|_], _}|_] -> {line-el, (column-ec)-1, l-eel, (c-eec)-1}
              [] -> {0, 0, l-eel, (c-eec)-1}
            end

            lex(rest, context, l, c, [{:brace, [span: {line, column, l, c}, offset: offset, type: :expression, file: context.file], data}|acc])
          {end_line, end_column, _context, _acc} ->
            {:error, file: context.file, end_line: end_line, end_column: end_column, line: line, column: column, opening_delimiter: :"{", expected_delimiter: :"}"}
        end
      <<?}, rest::binary>> -> {rest, context, line, column+1, acc}
      <<?), rest::binary>> -> {rest, context, line, column+1, acc}
      <<?], rest::binary>> -> {rest, context, line, column+1, acc}
      # <<b::utf8, rest::binary>> when Unicode.Set.match?(b, "[[:Lu:], [:Ll:], [:Lt:], [:Lm:], [:Lo:], [:Nl:]]") ->
      #   ident(rest, [b], context, line, column+1, 0, acc)
      "" -> {line, column, context, acc}
      rest -> ident_start(rest, context, line, column, acc)
    end
  end

  case_ast =
    for b <- tl(Unicode.Set.to_pattern!("[[:Lu:], [:Ll:], [:Lt:], [:Lm:], [:Lo:], [:Nl:]]")) do
      hd(
        quote do
          <<unquote(b), rest::binary>> -> ident(rest, [unquote(b)], var!(context), var!(line), var!(column)+1, 0, var!(acc))
        end
      )
    end

  defp ident_start(rest, context, line, column, acc) do
    case rest do
      unquote(case_ast)
    end
  end

I believe there might be an issue in the compiler, @bjorng. I don’t understand why none of the previous cases cause a problem while this one does: the return value is fairly simple and should not break the binary optimization.

The patterns look like this:

iex(2)> tl(Unicode.Set.to_pattern!("[[:Lu:], [:Ll:], [:Lt:], [:Lm:], [:Lo:], [:Nl:]]"))
["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P",
 "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "a", "b", "c", "d", "e", "f",
 "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v",
 "w", "x", "y", "z", "ª", "µ", "º", "À", "Á", "Â", "Ã", "Ä", "Å", "Æ",
 "Ç", "È", "É", "Ê", "Ë", "Ì", "Í", "Î", "Ï", "Ð", "Ñ", "Ò", "Ó",
 "Ô", "Õ", "Ö", "Ø", "Ù", "Ú", "Û", "Ü", "Ý", "Þ", "ß", "à", "á",
 "â", "ã", "ä", "å", "æ", "ç", "è", "é", "ê", "ë", "ì", "í", ...]

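For anyone following along, a minimal sketch of what these patterns are at the byte level. Each one is a UTF-8 encoded binary, so a non-ASCII pattern such as "À" expands to the same multi-byte match as the hand-written clauses in lex/5. The module name PatternBytes and its functions are made up for illustration and are not part of the lexer.

```elixir
defmodule PatternBytes do
  # Bytes that make up a pattern string (Elixir strings are UTF-8 binaries).
  def bytes(pattern), do: :binary.bin_to_list(pattern)

  # Clause written with a string literal, as the generated clauses are.
  def strip_literal(<<"À", rest::binary>>), do: {:ok, rest}
  def strip_literal(_other), do: :error

  # The same clause with the bytes spelled out, as in the hand-written lex/5.
  def strip_bytes(<<195, 128, rest::binary>>), do: {:ok, rest}
  def strip_bytes(_other), do: :error
end

# Print the bytes as integer lists rather than charlists.
IO.inspect(PatternBytes.bytes("A"), charlists: :as_lists)
IO.inspect(PatternBytes.bytes("À"), charlists: :as_lists)
```

Both strip functions accept exactly the same inputs, so the generated clauses differ from the hand-written ones only in how many of them there are.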
I’d once again suggest giving reproducible examples, even if they need Mix.install at the top. It is very hard to give feedback on compiler issues without a way of reproducing them.
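As a rough idea of what a dependency-free reproduction could look like, here is a sketch that generates the same kind of clauses from plain codepoint ranges and times how long the module takes to compile. Everything in it is an assumption for illustration: the CompileBench module, its function names, and the codepoint ranges are stand-ins for the Unicode.Set patterns, and there is no claim that it triggers the slow path described in this thread.

```elixir
defmodule CompileBench do
  # Build one `<<pattern, _::binary>> -> true` clause per codepoint,
  # followed by a catch-all clause, mirroring the shape of case_ast.
  def clauses(codepoints) do
    matches =
      for cp <- codepoints do
        pattern = <<cp::utf8>>

        hd(
          quote do
            <<unquote(pattern), _::binary>> -> true
          end
        )
      end

    fallback =
      quote do
        _ -> false
      end

    matches ++ fallback
  end

  # Define module `name` with a generated ident_start?/1 and return the
  # wall-clock time spent in Module.create/3, in microseconds.
  def time(name, codepoints) do
    generated = clauses(codepoints)

    body =
      quote do
        def ident_start?(binary) do
          case binary do
            unquote(generated)
          end
        end
      end

    {micros, _module} =
      :timer.tc(fn -> Module.create(name, body, Macro.Env.location(__ENV__)) end)

    micros
  end
end

# Hypothetical inputs: ASCII letters only, then a wider slice of codepoints.
ascii = CompileBench.time(BenchAscii, Enum.concat(?A..?Z, ?a..?z))
wide = CompileBench.time(BenchWide, 0x00C0..0x024F)
IO.puts("ascii: #{ascii} µs, wide: #{wide} µs")
```

Growing the range passed to time/2 gives a quick way to see how compile time scales with the number of clauses, without anyone having to clone the repository.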

My example is too big to post; the updated lexer is here, with the slow path commented out: sql/lib/lexer.ex at main · elixir-dbvisor/sql · GitHub

Here is the fast minimal example:

Mix.install([{:unicode_set, "~> 1.0"}])
defmodule Fast do
  require Unicode.Set
  case_ast =
    for c <- Unicode.Set.to_pattern!("[[:Lu:], [:Ll:], [:Lt:], [:Lm:], [:Lo:], [:Nl:], [:Mn:], [:Mc:], [:Nd:], [:Pc:], [:Cf:]]") do
      hd(
        quote do
          <<unquote(c), _::binary>> -> true
        end
      )
    end ++ quote do
      _ -> false
    end

  def case_fn(binary) do
    case binary do
      unquote(case_ast)
    end
  end
end
Compiling /Users/benjaminschultzer/src/sql/fast_bug.exs
 get_module_name_from_core     :      0.000 s  557330.6 kB
 core_lint_module              :      0.016 s  557330.8 kB
 core_compile_directives       :      0.000 s  557330.9 kB
 sys_core_fold                 :      0.299 s  557323.8 kB
 sys_core_alias                :      0.138 s  557323.8 kB
 core_transforms               :      0.000 s  557323.8 kB
 sys_core_bsm                  :      0.107 s  557323.8 kB
 core_to_ssa                   :      1.262 s  295862.4 kB
 beam_ssa_bool                 :      1.098 s  295862.4 kB
 beam_ssa_share                :      0.494 s  151986.5 kB
 beam_ssa_recv                 :      0.000 s  151986.5 kB
 beam_ssa_bsm                  :      0.205 s  151986.7 kB
    %% Sub passes of beam_ssa_bsm from slowest to fastest:
    combine_matches            :      0.118 s  58 %
    accept_context_args        :      0.065 s  32 %
    allow_context_passthrough  :      0.011 s   5 %
    skip_outgoing_tail_extracti:      0.010 s   5 %
    annotate_context_parameters:      0.000 s   0 %
 beam_ssa_opt                  :      1.252 s  156173.3 kB
    %% Sub passes of beam_ssa_opt from slowest to fastest:
    ssa_opt_live               :      0.291 s  23 %
    ssa_opt_type_start         :      0.268 s  22 %
    ssa_opt_dead               :      0.154 s  12 %
    ssa_opt_type_continue      :      0.109 s   9 %
    ssa_opt_bsm_shortcut       :      0.100 s   8 %
    ssa_opt_alias              :      0.096 s   8 %
    ssa_opt_cse                :      0.048 s   4 %
    ssa_opt_ranges             :      0.023 s   2 %
    ssa_opt_try                :      0.019 s   2 %
    ssa_opt_merge_blocks       :      0.018 s   1 %
    ssa_opt_trim_unreachable   :      0.016 s   1 %
    ssa_opt_linearize          :      0.014 s   1 %
    ssa_opt_float              :      0.014 s   1 %
    ssa_opt_bs_ensure          :      0.011 s   1 %
    ssa_opt_element            :      0.010 s   1 %
    ssa_opt_tail_phis          :      0.010 s   1 %
    ssa_opt_split_blocks       :      0.007 s   1 %
    ssa_opt_tail_literals      :      0.007 s   1 %
    ssa_opt_coalesce_phis      :      0.006 s   1 %
    ssa_opt_record             :      0.006 s   0 %
    ssa_opt_redundant_br       :      0.006 s   0 %
    ssa_opt_bsm                :      0.001 s   0 %
    ssa_opt_blockify           :      0.001 s   0 %
    ssa_opt_update_tuple       :      0.001 s   0 %
    ssa_opt_no_reuse           :      0.000 s   0 %
    ssa_opt_get_tuple_element  :      0.000 s   0 %
    ssa_opt_destructive_update :      0.000 s   0 %
    ssa_opt_sink               :      0.000 s   0 %
    ssa_opt_ne                 :      0.000 s   0 %
    ssa_opt_bs_create_bin      :      0.000 s   0 %
    ssa_opt_tuple_size         :      0.000 s   0 %
    ssa_opt_sw                 :      0.000 s   0 %
    ssa_opt_merge_updates      :      0.000 s   0 %
    ssa_opt_bc_size            :      0.000 s   0 %
    ssa_opt_unfold_literals    :      0.000 s   0 %
    ssa_opt_type_finish        :      0.000 s   0 %
 beam_ssa_throw                :      0.035 s  156173.3 kB
 beam_ssa_pre_codegen          :      0.222 s  156028.6 kB
    %% Sub passes of beam_ssa_pre_codegen from slowest to fastest:
    fix_bs                     :      0.052 s  24 %
    live_intervals             :      0.034 s  15 %
    place_frames               :      0.030 s  14 %
    sanitize                   :      0.025 s  11 %
    reserve_regs               :      0.020 s   9 %
    assert_no_critical_edges   :      0.017 s   8 %
    expand_update_tuple        :      0.015 s   7 %
    number_instructions        :      0.014 s   6 %
    opt_get_list               :      0.007 s   3 %
    linear_scan                :      0.007 s   3 %
    expand_match_fail          :      0.000 s   0 %
    fix_receives               :      0.000 s   0 %
    reserve_yregs              :      0.000 s   0 %
    turn_yregs                 :      0.000 s   0 %
    copy_retval                :      0.000 s   0 %
    frame_size                 :      0.000 s   0 %
    find_yregs                 :      0.000 s   0 %
 beam_ssa_codegen              :      0.098 s  151450.1 kB
 beam_validator_strong         :      0.211 s  151450.1 kB
 beam_a                        :      0.010 s  151412.3 kB
 beam_block                    :      0.001 s  151597.6 kB
 beam_jump                     :      0.050 s  139120.4 kB
 beam_clean                    :      0.000 s  139120.4 kB
 beam_trim                     :      0.000 s  139120.4 kB
 beam_flatten                  :      0.000 s  139119.1 kB
 beam_z                        :      0.000 s  139123.7 kB
 beam_validator_weak           :      0.017 s  139123.7 kB
 beam_asm                      :      0.142 s  137917.5 kB

And the slow version:

Mix.install([{:unicode_set, "~> 1.0"}])
defmodule Slow do
  require Unicode.Set
  case_ast =
    for c <- Unicode.Set.to_pattern!("[[:Lu:], [:Ll:], [:Lt:], [:Lm:], [:Lo:], [:Nl:], [:Mn:], [:Mc:], [:Nd:], [:Pc:], [:Cf:]]") do
      hd(
        quote do
          <<unquote(c), rest::binary>> -> case_fn(rest, [unquote(c)|var!(acc)])
        end
      )
    end ++ quote do
      _ -> var!(acc)
    end

  def case_fn(binary, acc \\ []) do
    case binary do
      unquote(case_ast)
    end
  end
end
Compiling /Users/benjaminschultzer/src/sql/slow_bug.exs
 get_module_name_from_core     :      0.000 s  885159.0 kB
 core_lint_module              :      0.058 s  885159.2 kB
 core_compile_directives       :      0.000 s  885159.3 kB
 sys_core_fold                 :      0.605 s  888540.0 kB
 sys_core_alias                :      0.170 s  888540.0 kB
 core_transforms               :      0.000 s  888540.0 kB
 sys_core_bsm                  :      0.145 s  888540.0 kB
 core_to_ssa                   :      1.744 s  531660.5 kB
 beam_ssa_bool                 :      2.730 s  531660.5 kB
 beam_ssa_share                :      1.182 s  531658.9 kB
 beam_ssa_recv                 :      0.009 s  531658.9 kB
 beam_ssa_bsm                  :     17.316 s  535050.5 kB
    %% Sub passes of beam_ssa_bsm from slowest to fastest:
    combine_matches            :      8.233 s  48 %
    skip_outgoing_tail_extracti:      4.488 s  26 %
    accept_context_args        :      3.033 s  18 %
    allow_context_passthrough  :      1.554 s   9 %
    annotate_context_parameters:      0.000 s   0 %
 beam_ssa_opt                  :     51.694 s  687403.9 kB
    %% Sub passes of beam_ssa_opt from slowest to fastest:
    ssa_opt_alias              :      9.425 s  19 %
    ssa_opt_live               :      8.518 s  17 %
    ssa_opt_type_start         :      7.475 s  15 %
    ssa_opt_dead               :      3.571 s   7 %
    ssa_opt_bsm_shortcut       :      3.062 s   6 %
    ssa_opt_cse                :      2.385 s   5 %
    ssa_opt_type_continue      :      2.268 s   4 %
    ssa_opt_ranges             :      1.590 s   3 %
    ssa_opt_try                :      1.393 s   3 %
    ssa_opt_merge_blocks       :      1.372 s   3 %
    ssa_opt_element            :      1.025 s   2 %
    ssa_opt_trim_unreachable   :      1.022 s   2 %
    ssa_opt_linearize          :      0.966 s   2 %
    ssa_opt_split_blocks       :      0.947 s   2 %
    ssa_opt_bs_ensure          :      0.898 s   2 %
    ssa_opt_coalesce_phis      :      0.885 s   2 %
    ssa_opt_tail_phis          :      0.772 s   2 %
    ssa_opt_tail_literals      :      0.741 s   1 %
    ssa_opt_float              :      0.674 s   1 %
    ssa_opt_redundant_br       :      0.552 s   1 %
    ssa_opt_record             :      0.357 s   1 %
    ssa_opt_update_tuple       :      0.248 s   0 %
    ssa_opt_no_reuse           :      0.130 s   0 %
    ssa_opt_bsm                :      0.111 s   0 %
    ssa_opt_blockify           :      0.080 s   0 %
    ssa_opt_bs_create_bin      :      0.058 s   0 %
    ssa_opt_get_tuple_element  :      0.053 s   0 %
    ssa_opt_ne                 :      0.048 s   0 %
    ssa_opt_tuple_size         :      0.044 s   0 %
    ssa_opt_destructive_update :      0.042 s   0 %
    ssa_opt_sink               :      0.030 s   0 %
    ssa_opt_sw                 :      0.010 s   0 %
    ssa_opt_bc_size            :      0.009 s   0 %
    ssa_opt_merge_updates      :      0.009 s   0 %
    ssa_opt_unfold_literals    :      0.000 s   0 %
    ssa_opt_type_finish        :      0.000 s   0 %
 beam_ssa_throw                :      1.797 s  687403.9 kB
 beam_ssa_pre_codegen          :     15.206 s  660826.1 kB
    %% Sub passes of beam_ssa_pre_codegen from slowest to fastest:
    fix_bs                     :      3.970 s  26 %
    live_intervals             :      2.191 s  14 %
    reserve_regs               :      2.007 s  13 %
    sanitize                   :      1.711 s  11 %
    number_instructions        :      1.062 s   7 %
    place_frames               :      1.027 s   7 %
    assert_no_critical_edges   :      0.916 s   6 %
    linear_scan                :      0.870 s   6 %
    expand_update_tuple        :      0.825 s   5 %
    opt_get_list               :      0.440 s   3 %
    expand_match_fail          :      0.142 s   1 %
    fix_receives               :      0.008 s   0 %
    turn_yregs                 :      0.000 s   0 %
    frame_size                 :      0.000 s   0 %
    reserve_yregs              :      0.000 s   0 %
    copy_retval                :      0.000 s   0 %
    find_yregs                 :      0.000 s   0 %
 beam_ssa_codegen              :      4.635 s  336084.4 kB
 beam_validator_strong         :      0.636 s  336084.4 kB
 beam_a                        :      0.103 s  338307.4 kB
 beam_block                    :      0.095 s  362047.0 kB
 beam_jump                     :      0.653 s  362046.9 kB
 beam_clean                    :      0.071 s  362046.9 kB
 beam_trim                     :      0.010 s  362046.9 kB
 beam_flatten                  :      0.011 s  338307.4 kB
 beam_z                        :      0.019 s  336084.0 kB
 beam_validator_weak           :      0.661 s  336084.0 kB
 beam_asm                      :      1.045 s  248700.5 kB


I’d say that’s pretty much expected. In one version you are compiling 144693 clauses of shape:

<<unquote(c), _::binary>> -> true

And in the other 144693 clauses of shape:

<<unquote(c), rest::binary>> -> case_fn(rest, [unquote(c)|var!(acc)])

The first has this AST on the right side of ->:

true

The second has this AST:

{:case_fn, [],
 [
   {:rest, [], Elixir},
   [
     {:|, [],
      [
        c,
        {:var!, [context: Elixir, imports: [{1, Kernel}, {2, Kernel}]],
         [{:acc, [], Elixir}]}
      ]}
   ]
 ]}

The second version has to compile a larger amount of code. If you give more code to the compiler, it will most likely take longer to compile. That’s why it is always a good idea to keep the amount of generated code to a minimum.
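A rough way to see the size difference is to count the AST nodes on the right-hand side of each clause. This is only a sketch: `AstSize` and `count_nodes/1` are made-up names for illustration, the literal `"a"` stands in for a single `unquote(c)` value, and node count is at best a proxy for how much work the later compiler passes do.

```elixir
defmodule AstSize do
  # Hypothetical helper (not part of any library): counts every node
  # that Macro.prewalk/3 visits in the given quoted expression.
  def count_nodes(ast) do
    {_ast, count} = Macro.prewalk(ast, 0, fn node, acc -> {node, acc + 1} end)
    count
  end
end

# Right-hand side of a clause in the fast version.
fast_rhs = quote do: true

# Right-hand side of a clause in the slow version; "a" is an assumed
# stand-in for one of the generated characters.
slow_rhs = quote do: case_fn(rest, ["a" | var!(acc)])

IO.inspect(AstSize.count_nodes(fast_rhs), label: "fast clause body")
IO.inspect(AstSize.count_nodes(slow_rhs), label: "slow clause body")
```

Whatever the per-clause difference is, it gets multiplied by the 144693 clauses, so even a handful of extra nodes per clause adds up to a lot of extra input for every pass.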


Just as a matter of interest, what does the expanded code look like?
