Erlang compiler slow on beam_ssa_bool and beam_ssa_opt

Hi, I have a project, elixir-dbvisor/sql on GitHub (an extensible SQL parser and sigil for Elixir that lets you confidently write SQL with automatic parameterized queries), that has recently become slow to compile after a refactor. Before I start trying to please the compiler, I want to understand whether there is any other recourse; who knows, it might be an issue in the compiler that I’m not aware of. Two of my files have started showing the infamous warning that compilation is taking longer than 10 seconds.

Running ERL_COMPILER_OPTIONS="[time,no_stack_trimming]" mix run lib/lexer.ex doesn’t tell me much other than where things are slow: beam_ssa_bool and beam_ssa_opt. How would I go about debugging where the real issue is, and whether there is potential to optimize the compiler?

Compiling /Users/benjaminschultzer/src/sql/lib/lexer.ex
 get_module_name_from_core     :      0.000 s   25389.3 kB
 core_lint_module              :      0.025 s   25389.6 kB
 core_old_inliner              :      0.616 s  777410.2 kB
 sys_core_fold                 :      0.996 s  546326.5 kB
 core_fold_after_inlining      :      0.196 s  546326.5 kB
 sys_core_alias                :      0.105 s  546326.5 kB
 core_transforms               :      0.000 s  546326.5 kB
 sys_core_bsm                  :      0.123 s  546326.5 kB
 core_to_ssa                   :      4.403 s  175685.5 kB
 beam_ssa_bool                 :     15.568 s  113570.0 kB
 beam_ssa_share                :      0.176 s  113568.4 kB
 beam_ssa_recv                 :      0.004 s  113568.4 kB
 beam_ssa_bsm                  :      5.639 s  113698.9 kB
    %% Sub passes of beam_ssa_bsm from slowest to fastest:
    combine_matches            :      3.248 s  58 %
    skip_outgoing_tail_extracti:      1.245 s  22 %
    accept_context_args        :      0.814 s  14 %
    allow_context_passthrough  :      0.329 s   6 %
    annotate_context_parameters:      0.000 s   0 %
 beam_ssa_opt                  :      6.725 s   29720.3 kB
    %% Sub passes of beam_ssa_opt from slowest to fastest:
    ssa_opt_alias              :      1.226 s  19 %
    ssa_opt_live               :      1.072 s  16 %
    ssa_opt_type_start         :      0.988 s  15 %
    ssa_opt_type_continue      :      0.621 s  10 %
    ssa_opt_dead               :      0.617 s   9 %
    ssa_opt_cse                :      0.504 s   8 %
    ssa_opt_sink               :      0.406 s   6 %
    ssa_opt_element            :      0.235 s   4 %
    ssa_opt_linearize          :      0.194 s   3 %
    ssa_opt_coalesce_phis      :      0.182 s   3 %
    ssa_opt_split_blocks       :      0.152 s   2 %
    ssa_opt_record             :      0.054 s   1 %
    ssa_opt_tail_phis          :      0.053 s   1 %
    ssa_opt_bsm_shortcut       :      0.053 s   1 %
    ssa_opt_ne                 :      0.051 s   1 %
    ssa_opt_update_tuple       :      0.025 s   0 %
    ssa_opt_ranges             :      0.017 s   0 %
    ssa_opt_merge_blocks       :      0.014 s   0 %
    ssa_opt_tuple_size         :      0.013 s   0 %
    ssa_opt_try                :      0.012 s   0 %
    ssa_opt_float              :      0.010 s   0 %
    ssa_opt_trim_unreachable   :      0.007 s   0 %
    ssa_opt_bc_size            :      0.005 s   0 %
    ssa_opt_bs_ensure          :      0.004 s   0 %
    ssa_opt_redundant_br       :      0.004 s   0 %
    ssa_opt_bs_create_bin      :      0.004 s   0 %
    ssa_opt_tail_literals      :      0.004 s   0 %
    ssa_opt_blockify           :      0.002 s   0 %
    ssa_opt_merge_updates      :      0.002 s   0 %
    ssa_opt_bsm                :      0.001 s   0 %
    ssa_opt_destructive_update :      0.001 s   0 %
    ssa_opt_get_tuple_element  :      0.001 s   0 %
    ssa_opt_sw                 :      0.000 s   0 %
    ssa_opt_unfold_literals    :      0.000 s   0 %
    ssa_opt_type_finish        :      0.000 s   0 %
 beam_ssa_throw                :      0.014 s   29720.3 kB
 beam_ssa_pre_codegen          :      0.425 s   34166.5 kB
    %% Sub passes of beam_ssa_pre_codegen from slowest to fastest:
    find_yregs                 :      0.094 s  22 %
    live_intervals             :      0.083 s  19 %
    linear_scan                :      0.050 s  12 %
    reserve_regs               :      0.044 s  10 %
    reserve_yregs              :      0.032 s   8 %
    fix_bs                     :      0.032 s   8 %
    place_frames               :      0.027 s   6 %
    frame_size                 :      0.011 s   3 %
    sanitize                   :      0.010 s   2 %
    turn_yregs                 :      0.010 s   2 %
    number_instructions        :      0.008 s   2 %
    assert_no_critical_edges   :      0.007 s   2 %
    expand_update_tuple        :      0.007 s   2 %
    opt_get_list               :      0.005 s   1 %
    copy_retval                :      0.005 s   1 %
    expand_match_fail          :      0.001 s   0 %
    fix_receives               :      0.000 s   0 %
 beam_ssa_codegen              :      0.111 s   15120.9 kB
 beam_validator_strong         :      0.232 s   15120.9 kB
 beam_a                        :      0.003 s   15169.1 kB
 beam_block                    :      0.003 s   17486.1 kB
 beam_jump                     :      0.138 s   16373.7 kB
 beam_clean                    :      0.003 s   16373.7 kB
 beam_flatten                  :      0.006 s   14174.1 kB
 beam_z                        :      0.002 s   13907.9 kB
 beam_validator_weak           :      0.234 s   13907.9 kB
 beam_asm                      :      0.019 s    4354.0 kB

/edit
I found out what causes the issue: it’s when I inline the main function in my modules. Without the inlining there is a bit of a regression in runtime performance, so the inlining does help, but at the cost of compile time, due to the use of guards. So I wonder what would give the best of both worlds, and whether guards are beneficial from a runtime performance perspective?
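
For readers unfamiliar with the setup, here is a minimal sketch of what explicit inlining of a guarded function looks like in Elixir. The module and function names (InlineSketch, lex/2, classify/1) are hypothetical and are not taken from the sql project; the sketch only illustrates why the code size grows, since every call site receives its own copy of the guarded clauses.

```elixir
defmodule InlineSketch do
  # Hypothetical stand-in for a lexer module; not code from the sql project.
  # Explicit inlining: each call to classify/1 inside this module is replaced
  # by a copy of its body, including all of the guard tests below.
  @compile {:inline, classify: 1}

  # Walk the binary one byte at a time and collect a token class per byte.
  def lex(<<b, rest::binary>>, acc), do: lex(rest, [classify(b) | acc])
  def lex(<<>>, acc), do: Enum.reverse(acc)

  # Guarded clauses; with inlining these tests are duplicated at each call site.
  defp classify(b) when b in ?0..?9, do: :digit
  defp classify(b) when b in ?a..?z or b in ?A..?Z, do: :letter
  defp classify(b) when b in [?\s, ?\t, ?\n], do: :space
  defp classify(_), do: :other
end
```

With many call sites and many guarded clauses, the duplicated tests are what passes such as beam_ssa_bool then have to work through.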

Unfortunately, there does not seem to be any potential to optimize the compiler. The inlining hugely increases the code size, and the increased code size explains the slower compilation. I’ve tried to make the beam_ssa_bool pass faster, but only succeeded in making it a tiny bit faster for your code.

Instead of explicitly inlining certain functions, have you tried using the implicit inliner? The compilation will be almost as fast as with no inlining. However, I am not sure whether it will actually increase the speed of your code. You will have to try it and measure. Here is an example of how to enable implicit inlining and make it more aggressive than the default (by default it will attempt to inline while trying not to increase the code size):

-compile([inline,{inline_size,100}]).
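
Since the project is written in Elixir, the attribute above would presumably be written with @compile instead. The following is a minimal sketch under that assumption: it relies on Elixir forwarding these options unchanged to the Erlang compiler, and the module and function names (ImplicitInlineSketch, sum_sq/2, sq/1) are hypothetical.

```elixir
defmodule ImplicitInlineSketch do
  # Hypothetical module; not code from the sql project.
  # Assumption: Elixir passes these options through to the Erlang compiler,
  # mirroring -compile([inline,{inline_size,100}]) in an Erlang module.
  @compile [:inline, {:inline_size, 100}]

  def sum_sq(a, b), do: sq(a) + sq(b)

  # Small private helper; a candidate for the implicit inliner.
  defp sq(x), do: x * x
end

# Rough timing of a single call: :timer.tc/1 returns {microseconds, result}.
{micros, result} = :timer.tc(fn -> ImplicitInlineSketch.sum_sq(3, 4) end)
IO.puts("sum_sq(3, 4) = #{result} in #{micros} µs")
```

A single timed call like this is far too noisy to compare inlining settings; for a real measurement, run the lexer over a representative input many times with each setting and compare the totals.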