Rebar3 is taking longer to compile projects

Hi all,
Our project uses rebar3 to manage dependencies and releases. As the project grew, some colleagues began to complain that rebar3 compile was taking longer and longer to execute. I found that after deleting the source_project_apps.dag file, the rebar3 compile execution time returns to normal. I also noticed that the source_project_apps.dag file grows by about 20 kB every time rebar3 compile runs, even without any changes to the project, and the compile time increases accordingly. The "Analyzing applications" phase also takes longer and longer to execute while compiling.

The project has about 2000 files. Is this a known problem, and how can I speed up the compilation of my project?

Is your project set up as an umbrella app? And if so, do you have {vsn, "git"} in each of the app.src files of your sub-applications?

We ran into this issue with zotonic, which is set up as an umbrella app. When this is the case, rebar3 runs a couple of git commands to determine the version of each sub-application. This can be very time-consuming.

We fixed it by adding a rebar.config.script file with a command that writes the version number to a file. Every sub-application has {vsn, {file, "../../VERSION"}} in its app.src file, so it reuses that version number.
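
For reference, here is a minimal sketch of that approach (the actual zotonic script is more involved, and the git describe invocation is just one way to derive a version number):

%% rebar.config.script -- evaluated by rebar3 when reading the config.
%% Write the git-derived version to a single VERSION file so that every
%% sub-application's {vsn, {file, "../../VERSION"}} entry can reuse it,
%% instead of rebar3 shelling out to git once per application.
Version = string:trim(os:cmd("git describe --tags --always")).
ok = file:write_file("VERSION", Version).
CONFIG.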

Our project is an umbrella project, but all of our dependencies are placed in the _checkouts directory and we work offline.

I checked all the *.app.src files in the project and each one has an entry like {vsn, "x.x.x"}, which should be what the template generates by default.

Check. Then it must be something different.

Are you running on Erlang 26?

We use Erlang versions 25 and 26.

Rebar3 version 3.22.1.

Has this always been the case with the project or has it gotten slower with an upgrade to rebar3?

We have found that the problem is indeed a bug in rebar3: checking the files to be compiled and checking dependencies repeatedly adds vertex data to the DAG structure. We are trying to correct it ourselves; unfortunately, we probably won't be able to open a pull request because we work on an intranet.

There are many changes that need to be made, such as adding digraph:del_vertex(G, Source) in the following places to prevent data from being added to the dag file repeatedly.
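
As a sketch of that pattern (a hypothetical helper, not the actual rebar3 code):

%% Drop the existing vertex before re-adding it, so repeated compiles do
%% not accumulate stale entries. Note that digraph:del_vertex/2 also
%% removes every edge touching Source, forcing them to be re-added fresh.
refresh_vertex(G, Source, Label) ->
    digraph:del_vertex(G, Source),
    digraph:add_vertex(G, Source, Label).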

Do you, by any chance, have ebin folders in your applications? erlang.mk and similar tools used to do that, and it throws rebar3 off… I can’t find that thread, but what was happening is that rebar3 would make a copy of all files in ebin, leading to very slow compile times.

Hm, interesting one. Looking at the digraph implementation, add_vertex/3 defers to this call: https://github.com/erlang/otp/blob/1cfee42abc140e91f00f23b628728b42b93b6efd/lib/stdlib/src/digraph.erl#L377-L379

The vertices and edges are declared as set tables: https://github.com/erlang/otp/blob/1cfee42abc140e91f00f23b628728b42b93b6efd/lib/stdlib/src/digraph.erl#L77-L79

Since the table is of type set, the insertion uses only the first element of the tuple as the key, and changing the label should have no impact; entries should just be replaced as-is.

So at least in theory, the need to drop file entries on updates doesn’t make sense right now. There may be a need to dig further.
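
A quick shell experiment agrees with that reading; re-adding a vertex with a different label replaces the entry rather than duplicating it:

1> G = digraph:new().  % returns the digraph handle
2> digraph:add_vertex(G, "a.erl", old_label).
"a.erl"
3> digraph:add_vertex(G, "a.erl", new_label).
"a.erl"
4> digraph:no_vertices(G).
1
5> digraph:vertex(G, "a.erl").
{"a.erl",new_label}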

I’m sorry, I made a mistake before; the real problem is here.

These two locations add edges repeatedly, which is the root cause of the bloated dag files.
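
The mechanism is easy to reproduce in a shell: digraph:add_edge/4 without an explicit edge id mints a fresh ['$e'|N] id on every call, so identical From/To pairs are never deduplicated by the set table:

1> G = digraph:new(), digraph:add_vertex(G, "mod.beam"), digraph:add_vertex(G, "mod.erl").
"mod.erl"
2> digraph:add_edge(G, "mod.beam", "mod.erl", artifact).
['$e'|0]
3> digraph:add_edge(G, "mod.beam", "mod.erl", artifact).
['$e'|1]
4> digraph:no_edges(G).  % the same dependency is now stored twice
2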

These two edges track include files and the beam files coming out of source files. They definitely should not grow endlessly as a dataset.

I can imagine a few things that might have that effect.

The first one is what Max suggested here:

Rebar3 will copy any files in app-level ebin/ directories of the source app down to the artifacts step, in case anything pre-compiled is present. These files generally end up having variable update times, which messes with the DAG and takes more time than usual to check.

The second one is whether there’s something odd happening with the filesystem: are there relative paths in directory configurations where there usually aren’t? Are there symlinks across source files in ways that are maybe unusual? Are you using a virtualized filesystem (e.g. working with shared host paths in VirtualBox on Windows)? All of these are less often encountered/tested ways of working, or are known to be buggy (particularly symlinks in VirtualBox with shared host drives).

The third thing I’m thinking of is whether you’re doing anything special around file generation: adding extra compilation steps, or editing/modifying files in _build via plugins or hooks?

Each of these would require extra digging.

Another option would be for you to load the DAG’s file dump to analyze it and see if there are file paths you don’t expect in there:

1> {ok, Data} = file:read_file("_build/default/lib/.rebar3/rebar_compiler_erl/source_project_apps.dag"),
   binary_to_term(Data).
%% #dag{vsn=?DAG_VSN, meta = CritMeta, vtab = VTab,
%%      etab = ETab, ntab = NTab} 
{dag,4, % version 4
     [{compiler,"8.3"}], % compiler version
     %% list of vertices: build artifacts track all the compiler options;
     [{"/home/ferd/code/self/rebar3/_build/default/lib/relx/ebin/relx.beam",
       {artifact,[{compiler_version,"8.3"},
                  {outdir,"/home/ferd/code/self/rebar3/_build/default/lib/relx/ebin"},
                  no_spawn_compiler_process,
                  {d,'RLX_LOG',rebar_log},
                  debug_info,warnings_as_errors,inline,
                  {i,"/home/ferd/code/self/rebar3/_build/default/lib/relx/src"},
                  {i,"/home/ferd/code/self/rebar3/_build/default/lib/relx/include"},
                  {i,"/home/ferd/code/self/rebar3/_build/default/lib/relx"},
                  return]}},
       %% header files and code files are just the file and their timestamp
      {"/home/ferd/code/self/rebar3/vendor/erlware_commons/include/ec_cmd_log.hrl",
       {{2023,6,24},{0,4,40}}},
      {"/home/ferd/code/self/rebar3/vendor/erlware_commons/src/ec_vsn.erl",
       {{2023,6,24},{0,4,40}}},
      {...}|...],
    %% edges have unique IDs, you can see the two files which represent a "depends on" path
    %% where the first file depends on the second one in the build order. Source files (used for
    %% includes or headers or parse transforms) have the same representation
     [{['$e'|115],
       "/home/ferd/code/self/rebar3/apps/rebar/src/rebar_prv_packages.erl",
       "/home/ferd/code/self/rebar3/vendor/providers/src/provider.erl",
       []},
    %% build artifacts have a specific artifact label.
      {['$e'|385],
       "/home/ferd/code/self/rebar3/_build/default/lib/rebar/ebin/rebar_prv_help.beam",
       "/home/ferd/code/self/rebar3/apps/rebar/src/rebar_prv_help.erl",
       artifact},
      {...}|...],
    %% neighbor table, maintained internally by the digraph app.
     [{{in,"/home/ferd/code/self/rebar3/vendor/relx/src/rlx_log.erl"},
       ['$e'|259]},
      {{out,"/home/ferd/code/self/rebar3/_build/default/lib/rebar/ebin/rebar_string.beam"},
       ['$e'|384]},
      {...}|...]}

If you were able to provide an anonymized version of the dump, it could be workable for me to investigate where the duplicated paths come from. I could expect or understand long build times if the whole directory kept being refreshed on timestamps, but the fact that it takes longer and longer hints at some file paths being dynamically added in ways they shouldn’t be, and the DAG should show what they are.


I work on Windows and some of my colleagues work on Linux.

The projects live locally on the hard disk and do not go through technologies such as Samba shares.

Our project does have a lot of custom rebar3 plugins, but only to speed up development.

I have a simple example that reproduces this problem.

Use rebar3 new lib mylib to create a new library.

Edit the mylib.erl file:

-module(mylib).

-export([test/0]).


test() ->
   a.

Modify the return value of mylib:test/0, run rebar3 compile, and repeat several times, observing how the size of the source_project_apps.dag file changes.

The source_project_apps.dag file will increase by about 14 bytes each time.

Yep, confirmed. It looks like the edges get re-added even if they already exist; the de-duplication doesn’t work because each edge appears to get its own incrementing or unique value as an id. So they’ll need to be checked before being re-added, to avoid slowly growing the cost of loading them into memory.

This appears to only happen with the artifacts edge, not the other ones.
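
For anyone who wants to check their own project, a rough diagnostic along these lines (a hypothetical module relying on the {dag, Vsn, Meta, VTab, ETab, NTab} layout shown in the dump above, not part of rebar3) counts duplicated edges in the on-disk file:

-module(dag_dups).
-export([count/1]).

%% Returns a map of {From, To, Label} triples that occur more than once
%% in the edge table; on a healthy DAG the result is empty.
count(Path) ->
    {ok, Bin} = file:read_file(Path),
    {dag, _Vsn, _Meta, _VTab, ETab, _NTab} = binary_to_term(Bin),
    Counts = lists:foldl(
               fun({_Id, From, To, Label}, Acc) ->
                       maps:update_with({From, To, Label},
                                        fun(N) -> N + 1 end, 1, Acc)
               end, #{}, ETab),
    maps:filter(fun(_Edge, N) -> N > 1 end, Counts).

For example: dag_dups:count("_build/default/lib/.rebar3/rebar_compiler_erl/source_project_apps.dag").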

See if this PR helps: Dedupe compiler DAG edge insertion for artifacts by ferd · Pull Request #2850 · erlang/rebar3 · GitHub. You may want to delete the _build/default/lib/.rebar3/rebar_compiler_erl/source_project_apps.dag file first; the fix will look for duplicates to avoid inserting more, but it won’t clean up the old ones.
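
The gist of that change looks something like this (a paraphrase of the deduplication idea, not the literal patch):

%% Only insert the artifact edge if an identical one is not already
%% present; digraph:out_edges/2 plus digraph:edge/2 let us compare the
%% endpoints and label of each existing edge.
maybe_add_edge(G, From, To, Label) ->
    Dup = lists:any(
            fun(E) ->
                    case digraph:edge(G, E) of
                        {E, From, To, Label} -> true;
                        _ -> false
                    end
            end, digraph:out_edges(G, From)),
    Dup orelse digraph:add_edge(G, From, To, Label),
    ok.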


I have upgraded to the latest version of rebar3, but the problem is still there.

You can create a project with rebar3 new release, add several third-party dependencies, and then generate some erl files in the project. Compile a specific profile with rebar3 as xxxx compile. If that command is executed repeatedly without changing any files, the source_project_apps.dag file keeps growing.

I sincerely appreciate it.

I’ll try to take a look. I’m unlikely to have much time to dedicate to this issue this week, but I’ve confirmed that a few bytes are added on each run when trying this on the rebar3 project itself.

See if Prevent infinite DAG growth by ferd · Pull Request #2892 · erlang/rebar3 · GitHub helps. That’s the bug I managed to identify and fix, and given there are only 3 places where we add edges (one for fresh files, one for cross-app tracking, and the one I patched), this should address your problem.


Sorry for taking so long to reply. I have tried the new PR and confirmed that the problem is completely fixed. Thank you for doing this :grinning: