Atom hashing performance

fhunleth · February 15, 2023, 8:34pm

I was profiling Erlang on a slow embedded system and came across the atom_hash C function. This function gets called a lot even though it’s pretty fast. It probably shows up higher on profiles than it really should.

Having said that, there are a few things I don’t understand:

1. Is the Latin1 clutch for r16 still needed? A quick test shows a tiny but measurable improvement when it is deleted.
2. I replaced the whole atom_hash function with return 1;. My test didn’t slow down perceptibly, which really surprised me. I expected things to be unusable if all ~30K atoms hashed to the same value. Does this make sense? I assume that I’ve somehow avoided the critical paths that make use of the hash.
3. Is there any reason to stick with the hashpjw algorithm? It looks like any decent hash function should work.
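
For readers who have not looked at the function, here is a minimal sketch of a hashpjw-style hash with a Latin1 folding step in front of it. The name hashpjw_sketch is made up for illustration, and the folding rule is an assumption about what the "Latin1 clutch" does (collapse a two-byte UTF-8 sequence for a code point below 256 into the single Latin1 byte); the real atom_hash in ERTS may differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of a hashpjw-style hash over a byte string.
 * hashpjw_sketch is a hypothetical name, not the OTP function. */
uint32_t hashpjw_sketch(const unsigned char *name, size_t len)
{
    uint32_t h = 0;
    size_t i = 0;

    while (i < len) {
        unsigned char v = name[i++];

        /* Assumed Latin1 folding: a two-byte UTF-8 sequence that encodes
         * a code point in 0x80..0xFF is collapsed to that single byte, so
         * both encodings of the same character hash identically. */
        if (i < len && (v & 0xFE) == 0xC2 && (name[i] & 0xC0) == 0x80) {
            v = (unsigned char)((v << 6) | (name[i] & 0x3F));
            i++;
        }

        /* Classic hashpjw step: shift in 4 bits, then fold the top
         * nibble back into the lower bits and clear it. */
        h = (h << 4) + v;
        uint32_t g = h & 0xF0000000u;
        if (g != 0) {
            h ^= g >> 24;
            h ^= g;
        }
    }
    return h;
}
```

With this sketch, the UTF-8 bytes 0xC3 0xA9 and the single Latin1 byte 0xE9 produce the same value. The extra branch runs once per byte, which fits the observation that removing it gives only a tiny gain.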

I’m happy to make PRs (or not) based on what I learn.

Thanks!

max-au · February 15, 2023, 9:05pm

Tangentially related: I ran into the same problem when I attempted to use NIFs (e.g. esqlite). It turned out that most, if not all, calls to atom_hash were caused by excessive (and unnecessary) use of the enif_make_atom routine. Hence in my implementation I followed the best practice: create the atoms upfront, then just reference them.

Is there a chance you’re using some NIFs that are causing excessive calls to atom_hash?

fhunleth · February 15, 2023, 11:21pm

It looks like the NIFs I’m using are all being responsible and calling enif_make_atom on load. I’ll dig in some more. I had thought that most of the calls were on load of the beam files.

NAR · February 16, 2023, 10:02am

“I followed the best practice”

Where is this best practice documented? It might be a useful resource.

sverker · February 16, 2023, 12:20pm

The “best practice” is unfortunately not documented, even though I have thought many times about doing it.

It would be explained as an exception to the rule that every ERL_NIF_TERM belongs to an ErlNifEnv. Atoms created during loading (by the load or upgrade callbacks) can be referred to as terms in any ErlNifEnv. That is, the best practice is to create all your atoms during loading and store them in static/global variables.

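#include <erl_nif.h>

/* Atoms created in load() stay valid in every ErlNifEnv. */
static ERL_NIF_TERM atom_ok;
static ERL_NIF_TERM atom_error;

static int load(ErlNifEnv* env, void** priv_data, ERL_NIF_TERM load_info)
{
    atom_ok = enif_make_atom(env, "ok");
    atom_error = enif_make_atom(env, "error");
    return 0;  /* 0 signals a successful load */
}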

A documentation PR is welcome.

sverker · February 16, 2023, 1:52pm

atom_hash() is used for two things:

1. As the internal hash function for the atom table. It is called when creating or looking up atoms from strings.
2. To be returned by erlang:phash2, which promises the same hash values across all releases, architectures and implementations. That’s why the Latin1 clutch is still there.
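
The first use can be illustrated with a toy intern table. This is only a sketch under stated assumptions: intern, hash_name, hash_calls and the table sizes are hypothetical names and values, the hash is a simple djb2-style stand-in rather than hashpjw, and the real ERTS atom table is organized differently. The point it shows is that every lookup from a string hashes that string, while reusing a handle obtained earlier costs nothing.

```c
#include <stddef.h>
#include <string.h>

#define BUCKETS 64
#define MAX_ATOMS 256

/* One entry per interned name; entries in the same bucket form a chain. */
struct entry {
    const char *name;   /* caller keeps the string alive */
    int next;           /* index of the next entry in the chain, or -1 */
};

static struct entry entries[MAX_ATOMS];
static int bucket_head[BUCKETS];
static int entry_count = 0;
static int table_ready = 0;
unsigned long hash_calls = 0;   /* counts how often a name is hashed */

/* Stand-in hash (djb2 style); not the function ERTS uses. */
static unsigned long hash_name(const char *s)
{
    unsigned long h = 5381;
    hash_calls++;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Return the index for name, creating an entry on first use.
 * Returns -1 if the table is full. Every call hashes the string. */
int intern(const char *name)
{
    if (!table_ready) {
        for (int i = 0; i < BUCKETS; i++) bucket_head[i] = -1;
        table_ready = 1;
    }
    unsigned long b = hash_name(name) % BUCKETS;
    for (int i = bucket_head[b]; i != -1; i = entries[i].next)
        if (strcmp(entries[i].name, name) == 0) return i;
    if (entry_count == MAX_ATOMS) return -1;
    entries[entry_count].name = name;
    entries[entry_count].next = bucket_head[b];
    bucket_head[b] = entry_count;
    return entry_count++;
}
```

Calling intern("ok") on every request increments hash_calls each time, whereas storing the returned index once and comparing against it later does not. In a chained table like this one, a constant hash would still give correct answers; all entries would simply share one chain and each lookup would become a linear scan.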

fhunleth · February 16, 2023, 2:54pm

Thanks! That is super helpful. Plus, the side topic on enif_make_atom turned up some NIFs that weren’t following best practice. Sadly they were not causing performance problems, but it is nice to clean them up anyway.

starbelly · February 17, 2023, 9:11pm

I’ve had to clean up a few NIFs where this was happening recently. I’ll submit a doc PR soon. I figured it would go alongside the function, with a code example, but maybe you see it as something that should be at the top? We can hash it out in the PR if I don’t hear back first.

NAR · February 17, 2023, 9:35pm

The PR is here: Add best practice for atoms in NIFs by NAR · Pull Request #6888 · erlang/otp

starbelly · February 17, 2023, 9:36pm

Thank you!