Erl_tar performance on Apple M1

ericmj · March 6, 2022, 8:43pm

erl_tar:extract performance is about 10-30x slower on Apple computers using the new M1 chip compared to Intel machines. This issue has been confirmed by multiple people in the community and seems to be affecting M1 Max/Pro chips more than the original M1 chips.

The issue can be circumvented by excluding the extraction directory from Spotlight indexing.

Given that the issue only happens on M1 chips and can be worked around by disabling Spotlight indexing, it seems to be an Apple issue, but I haven’t been able to reproduce it in other languages or with GNU tar.

The issue is affecting the community because running tools such as mix deps.get can take up to 2 minutes even when packages are cached and Hex only has to extract the packages to the project directory. I haven’t tested if it also affects rebar3 but it should since it is also using erl_tar.

To reproduce the issue I have created a shell script to download some Hex package tarballs and an Erlang script that will extract them to show the performance issue:

packages=(
bamboo-2.2.0.tar
bamboo_phoenix-1.0.0.tar
bcrypt_elixir-2.3.0.tar
certifi-2.8.0.tar
comeonin-5.3.2.tar
connection-1.1.0.tar
corsica-1.1.3.tar
cowboy-2.9.0.tar
cowboy_telemetry-0.4.0.tar
cowlib-2.11.0.tar
db_connection-2.4.1.tar
decimal-2.0.0.tar
earmark-1.4.18.tar
earmark_parser-1.4.17.tar
ecto-3.7.1.tar
ecto_sql-3.7.1.tar
elixir_make-0.6.3.tar
eqrcode-0.1.10.tar
ex_aws-2.2.7.tar
ex_aws_s3-2.3.1.tar
ex_aws_ses-2.3.0.tar
ex_machina-2.7.0.tar
file_system-0.2.10.tar
goth-1.3.0-rc.3.tar
hackney-1.18.0.tar
hex_core-0.8.2.tar
idna-6.1.1.tar
jason-1.2.2.tar
jose-1.11.2.tar
libcluster-3.3.0.tar
logster-1.0.2.tar
metrics-1.0.1.tar
mime-1.6.0.tar
mimerl-1.2.0.tar
mox-1.0.1.tar
parse_trans-3.3.1.tar
phoenix-1.6.2.tar
phoenix_ecto-4.4.0.tar
phoenix_html-3.1.0.tar
phoenix_live_dashboard-0.6.1.tar
phoenix_live_reload-1.3.3.tar
phoenix_live_view-0.17.3.tar
phoenix_pubsub-2.0.0.tar
phoenix_view-1.0.0.tar
plug-1.12.1.tar
plug_attack-0.4.3.tar
plug_cowboy-2.5.2.tar
plug_crypto-1.2.2.tar
postgrex-0.15.13.tar
pot-1.0.2.tar
ranch-1.8.0.tar
rollbax-0.11.0.tar
ssl_verify_fun-1.1.6.tar
sweet_xml-0.7.1.tar
telemetry-1.0.0.tar
telemetry_metrics-0.6.1.tar
telemetry_poller-1.0.0.tar
unicode_util_compat-0.7.0.tar
)

# Iterate over every array element; the names already end in .tar
for package in "${packages[@]}"; do
  wget "https://repo.hex.pm/tarballs/${package}"
done

%% Unpack each outer Hex tarball, keeping only contents.tar.gz
lists:foreach(fun(Path) ->
  erl_tar:extract(
    Path, [
      {cwd, filename:basename(Path, ".tar")},
      {files, ["contents.tar.gz"]}
    ]
  )
end, filelib:wildcard("*.tar")).

%% Running this will show the slow performance
lists:foreach(fun(Path) ->
  erl_tar:extract(
    Path, [
      compressed,
      {cwd, filename:rootname(Path, ".tar.gz")}
    ]
  )
end, filelib:wildcard("*/contents.tar.gz")).

I am not sure how to continue investigating this issue so any help would be appreciated.

mikl · March 6, 2022, 11:30pm

There are a couple of ways to tell Spotlight not to index a folder, as outlined on StackOverflow. That might be one way to avoid (if not solve) this issue.

ericmj · March 6, 2022, 11:32pm

Unfortunately, support for .metadata_never_index has been removed in later versions of macOS, and suffixing directory names with .noindex would be a breaking change.

starbelly · March 7, 2022, 1:21am

Extracting to memory and then doing a second pass to write each file out seems to help. I was seeing average times of 7 seconds and now see an average of 2 seconds (not concurrently). That’s still unfortunately slow, but I’m not sure there’s a better solution short of trying to optimize the extraction algorithm.

As you mentioned, conventions such as dropping in a .metadata_never_index file don’t work anymore, and I didn’t see the directory.noindex approach work either.

Alternatives would be executing commands with sudo to temporarily disable Spotlight, etc., but that’s not a good idea.

Edit:

Spawning a process for each file to be written, I see an average of 400ms, which is faster than when Spotlight is turned off altogether. With Spotlight turned off and just running what you had above, I saw 700ms on average.

extract(PackagePath) ->
    ok = file:set_cwd(PackagePath),

    %% Step 1: pull contents.tar.gz out of each outer Hex tarball.
    lists:foreach(fun(Path) ->
        erl_tar:extract(Path, [
            {cwd, filename:basename(Path, ".tar")},
            {files, ["contents.tar.gz"]}
        ])
    end, filelib:wildcard("*.tar")),

    %% Step 2: extract each contents.tar.gz into memory, then write every
    %% file from its own process. Note that spawn/1 returns immediately,
    %% so this function does not wait for the writes to finish.
    lists:foreach(fun(Path) ->
        Opts = [compressed, memory, {cwd, filename:rootname(Path, ".tar.gz")}],
        {ok, Files} = erl_tar:extract(Path, Opts),
        Root = filename:dirname(Path),
        lists:foreach(fun({Name, Bin}) ->
            spawn(fun() ->
                Target = filename:join(Root, Name),
                filelib:ensure_dir(Target),
                ok = file:write_file(Target, Bin)
            end)
        end, Files)
    end, filelib:wildcard("*/contents.tar.gz")).

Edit:

Sadly, that seemed to be a fluke. I ran it several times, but now coming back to it again, I’m seeing sludge
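
One thing worth noting about the spawn-based version above: spawn/1 returns immediately, so extract/1 can return before the files are on disk, and a timing wrapped around it may under-report the real cost. Whether that explains the 400ms reading is an assumption; it was not verified in this thread. A minimal sketch that waits for every writer before returning (the parallel_write module and write_all/2 are hypothetical names):

```erlang
-module(parallel_write).
-export([write_all/2]).

%% Write every {Name, Bin} pair under Root, one process per file, and
%% block until all writer processes have exited.
write_all(Root, Files) ->
    Refs = [begin
                {_, Ref} = spawn_monitor(fun() ->
                    Target = filename:join(Root, Name),
                    ok = filelib:ensure_dir(Target),
                    ok = file:write_file(Target, Bin)
                end),
                Ref
            end || {Name, Bin} <- Files],
    %% Collect one 'DOWN' message per writer; fail if any writer crashed.
    lists:foreach(fun(Ref) ->
        receive
            {'DOWN', Ref, process, _, normal} -> ok;
            {'DOWN', Ref, process, _, Reason} -> error({write_failed, Reason})
        end
    end, Refs).
```

Timing a call to parallel_write:write_all(Root, Files) with timer:tc/1 then covers the writes themselves, not just the cost of spawning the processes.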

ericmj · March 7, 2022, 3:08pm

I think any performance improvement from extracting into memory and then writing the files manually is because it skips safety checks that also do a bunch of I/O, such as relative path checking and following symlinks.
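
To illustrate the kind of check a manual write skips, here is a minimal sketch that validates each entry name with filelib:safe_relative_path/2 (available since OTP 23) before writing it. It is not a copy of what erl_tar does internally; the safe_write module and write_entries/2 are names made up for the example.

```erlang
-module(safe_write).
-export([write_entries/2]).

%% Write {Name, Bin} pairs (the shape returned by erl_tar:extract/2 with
%% the `memory` option) under Root, refusing any name that is absolute
%% or that would escape Root via ".." or a symlink.
write_entries(Root, Entries) ->
    lists:foreach(fun({Name, Bin}) ->
        case filelib:safe_relative_path(Name, Root) of
            unsafe ->
                error({unsafe_path, Name});
            Safe ->
                Target = filename:join(Root, Safe),
                ok = filelib:ensure_dir(Target),
                ok = file:write_file(Target, Bin)
        end
    end, Entries).
```

Because safe_relative_path/2 has to look at the filesystem to resolve symlinks, a check like this adds extra I/O per entry, which fits the explanation above for why the unchecked version looked faster.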

I think the issue could be related to how macOS schedules processes on efficiency and performance cores. Running mix deps.get or extracting tarballs seems to use 100% CPU on the efficiency cores without the performance cores being used at all. Maybe, because the processes are doing mostly I/O, the OS wants to schedule them on the efficiency cores even though those cores are already saturated by the Spotlight indexing work? It would also explain why the vanilla M1 chips (not Pro or Max) seem to be unaffected by this issue, since they have 4 efficiency cores instead of the Pro/Max’s 2.

starbelly · March 7, 2022, 5:27pm

That makes sense to me.

I am on a vanilla M1

I think this is unrelated to M1 in general, albeit it seems worse on M1.

I tried this on an x86 Mac (2018, Intel i7, 6 cores) running Catalina: same problem. With it off, I saw an average of 1 second; the first run would be 5 seconds, with subsequent runs between 1.2 and 2 seconds. I’m willing to bet that if I upgraded this Mac to Big Sur or Monterey, I’d see it perform as badly as my M1. That would be in line with reports of Spotlight generally being worse after others upgraded to Big Sur (in general, unrelated to tarball extraction).

Note that my M1 was on Big Sur and is now on Monterey, with no improvement.

Other languages and their dependency managers seem to have had these issues in the past and present as well, so it’s not just BEAM projects that are affected. There doesn’t seem to be a resolution, at least nothing recent. For example, with npm in 2018 the solution was .metadata_never_index; of course, that doesn’t work anymore.

I’ve never run into this before because the first thing I do when setting up a new mac is disable spotlight, as it just slows everything down per my personal experience.

Really, the only thing I think we can do is advise people to disable Spotlight, or make an alias to easily disable Spotlight for specific directories (i.e., instead of going through the GUI).

We can also try griping at Apple.

bjorng · March 8, 2022, 3:01pm

I tried to do some benchmarking, but the times fluctuate wildly from 0.5 seconds up to about 4 seconds on both my M1 MacBook Pro (Monterey) and my Intel iMac (Big Sur), though the times tend to be at the lower end of that range more often on my iMac than on the M1 Mac.
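
With timings this noisy, a single timer:tc/1 call says little. A small helper that repeats the measurement and reports the minimum, median and maximum can make before/after comparisons easier to read. This is only a sketch; the tar_bench module name and the map keys are assumptions for the example, not something used in this thread.

```erlang
-module(tar_bench).
-export([run/2]).

%% Call Fun N times and summarise the wall-clock times in microseconds.
%% For an even N the lower of the two middle values is reported as the median.
run(Fun, N) when is_function(Fun, 0), is_integer(N), N > 0 ->
    Times = lists:sort([element(1, timer:tc(Fun)) || _ <- lists:seq(1, N)]),
    #{min => hd(Times),
      median => lists:nth((N + 1) div 2, Times),
      max => lists:last(Times)}.
```

For example, tar_bench:run(Fun, 10), where Fun wraps the lists:foreach extraction loop from the first post, gives three numbers per configuration instead of a single sample.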

Doing some profiling using eprof, I saw some functions that were surprisingly slow. I have tried to optimize them in this pull request. I am not sure how much good that will do, if any. Perhaps it would be better to do much more work so that BEAM could hog those performance cores from the Spotlight indexer.

starbelly · March 8, 2022, 4:20pm

Did you benchmark after you made changes?

Either way, I’ll try this when I have some time.

ericmj · March 8, 2022, 5:38pm

M1 vanilla or a Pro/Max? From the various data I have collected by asking people on Slack and other places to run tests, the non-Pro/Max chips seem to be far less affected by this issue.

Haha, you have a point, maybe some busy loops would help? Thank you for the PR!

On my Intel i9 MacBook, running the above test completes in 0.7s, and on my M1 Pro MacBook it takes 19s pretty consistently (with Spotlight enabled on both):

Intel i9 (spotlight enabled):

2> timer:tc(fun() ->
2>   lists:foreach(fun(Path) ->
2>     erl_tar:extract(
2>       Path, [
2>         compressed,
2>         {cwd, filename:rootname(Path, ".tar.gz")}
2>       ]
2>     )
2>   end, filelib:wildcard("*/contents.tar.gz"))
2> end).
{673983,ok}
3> timer:tc(fun() -> ...
{691248,ok}
4> timer:tc(fun() -> ...
{707146,ok}

M1 Pro (spotlight enabled):

1> timer:tc(fun() ->
1>   lists:foreach(fun(Path) ->
1>     erl_tar:extract(
1>       Path, [
1>         compressed,
1>         {cwd, filename:rootname(Path, ".tar.gz")}
1>       ]
1>     )
1>   end, filelib:wildcard("*/contents.tar.gz"))
1> end).
{19935160,ok}
2> timer:tc(fun() -> ...
{19345774,ok}
3> timer:tc(fun() -> ...
{19343511,ok}

M1 Pro (spotlight disabled):

1> timer:tc(fun() ->
1>   lists:foreach(fun(Path) ->
1>     erl_tar:extract(
1>       Path, [
1>         compressed,
1>         {cwd, filename:rootname(Path, ".tar.gz")}
1>       ]
1>     )
1>   end, filelib:wildcard("*/contents.tar.gz"))
1> end).
{745903,ok}
2> timer:tc(fun() -> ...
{655094,ok}
3> timer:tc(fun() -> ...
{736323,ok}

bjorng · March 9, 2022, 4:39am

I benchmarked before and after, but did not see any clear improvement because of the wild fluctuations.

M1 vanilla.

How long does the test take on your M1 Pro without the PR?

ericmj · March 9, 2022, 8:23pm

Unfortunately, I didn’t see a noticeable difference with or without the PR.

josevalim · April 6, 2022, 8:56am

I wonder if this pull request is related to this: erts: on macOS, prefer F_BARRIERFSYNC over of F_FULLFSYNC by mkuratczyk · Pull Request #5847 · erlang/otp · GitHub

jhogberg · April 6, 2022, 9:10am

No, that only affects the performance of file:sync/1, which erl_tar doesn’t use.