High-concurrency SQLite reads with esqlite3 NIF - performance bottleneck

Hi Folks,

I’m experiencing a performance bottleneck with the esqlite3 NIF driver ( GitHub - mmzeeman/esqlite: Erlang NIF for sqlite ) under high concurrent read load.

Current situation

  • Read-only heavy workload against SQLite database (no writes)

  • Erlang/OTP 28.0.2 on Linux Ubuntu 24.04 LTS

  • perf profiling shows significant CPU time in SQLite operations

  • All esqlite3 operations use ERL_NIF_DIRTY_JOB_IO_BOUND flag

  • Default dirty IO schedulers: 10 threads

Performance profile excerpt

+   10.28%     0.02%  erts_dios_6      esqlite3_nif.so        [.] sqlite3_step
   - 10.26% sqlite3_step
        6.15% sqlite3VdbeExec
        1.23% sqlite3BtreeIndexMoveto
+    7.42%  erts_dios_6      esqlite3_nif.so        [.] vdbeRecordCompareString
+    5.82%  erts_dios_6      esqlite3_nif.so        [.] sqlite3BtreeIndexMoveto
+    4.69%  erts_dios_6      esqlite3_nif.so        [.] sqlite3VdbeExec

The sqlite3_step function (which maps to esqlite_step NIF) consumes 10.26% CPU, with total SQLite operations around ~28%.

The bottleneck

Even though SQLite supports concurrent readers natively, the dirty IO scheduler pool (10 threads) appears to be serializing database operations. Multiple SQLite connections don’t help since they all queue through the same limited dirty scheduler pool.

Relevant C code from esqlite3

Looking at the NIF function exports in sqlite3_nif.c:

static ErlNifFunc nif_funcs[] = {
    {"open", 1, esqlite_open, ERL_NIF_DIRTY_JOB_IO_BOUND},
    {"close", 1, esqlite_close, ERL_NIF_DIRTY_JOB_IO_BOUND},
    {"exec", 2, esqlite_exec, ERL_NIF_DIRTY_JOB_IO_BOUND},
    {"step", 1, esqlite_step, ERL_NIF_DIRTY_JOB_IO_BOUND},
    {"interrupt", 1, esqlite_interrupt, ERL_NIF_DIRTY_JOB_IO_BOUND},
    {"backup_init", 4, esqlite_backup_init, ERL_NIF_DIRTY_JOB_IO_BOUND},
    {"backup_step", 2, esqlite_backup_step, ERL_NIF_DIRTY_JOB_IO_BOUND},
    {"backup_finish", 1, esqlite_backup_finish, ERL_NIF_DIRTY_JOB_IO_BOUND},
    // ... other functions
};

All database I/O operations are marked as ERL_NIF_DIRTY_JOB_IO_BOUND, which seems correct for disk operations but creates a concurrency bottleneck.

Questions

  1. Safety: Is it safe to increase dirty IO schedulers to 50 with +SDio 50? Are there any gotchas or downsides beyond memory usage?

  2. Hardware correlation: How do dirty IO schedulers correlate with physical CPU cores? I have X cores - is there an optimal ratio or upper limit I should consider?

  3. Alternative approaches: Are there better patterns for high-concurrency SQLite access in Erlang, or is increasing +SDio the correct solution?

The esqlite3 NIF marks all operations as dirty IO-bound (which seems architecturally correct for disk I/O), but this creates an artificial bottleneck when SQLite itself could handle much higher concurrency. All SQL operations are serialized through the limited dirty scheduler pool, losing Erlang’s core concurrency advantage.

Any guidance on optimal dirty scheduler configuration for read-heavy database workloads would be greatly appreciated.

Thanks
/Z

1 Like

There’s a general throughput optimization tactic for dealing with dirty I/O NIFs, that is to buffer up operations in the Erlang code (that can be scheduled efficiently), and then commit them all at once as a batch to the NIF. This is sometimes orders of magnitude more efficient than flooding dirty IO schedulers with lots of small tasks.

To apply this tactic one usually needs one of two things:

  1. Either the business logic should allow merging multiple operation into one transaction. It’s not always the case.
  2. Or the NIF library itself is implemented with batching in mind. I haven’t worked with sqlite NIF, so I can’t say for certain if this is the case, so take it for what it’s worth.
2 Likes