Hi Folks,
I’m experiencing a performance bottleneck with the esqlite3 NIF driver ( GitHub - mmzeeman/esqlite: Erlang NIF for sqlite ) under high concurrent read load.
Current situation
-
Read-only heavy workload against SQLite database (no writes)
-
Erlang/OTP 28.0.2 on Linux Ubuntu 24.04 LTS
-
perf
profiling shows significant CPU time in SQLite operations -
All esqlite3 operations use
ERL_NIF_DIRTY_JOB_IO_BOUND
flag -
Default dirty IO schedulers: 10 threads
Performance profile excerpt
+ 10.28% 0.02% erts_dios_6 esqlite3_nif.so [.] sqlite3_step
- 10.26% sqlite3_step
6.15% sqlite3VdbeExec
1.23% sqlite3BtreeIndexMoveto
+ 7.42% erts_dios_6 esqlite3_nif.so [.] vdbeRecordCompareString
+ 5.82% erts_dios_6 esqlite3_nif.so [.] sqlite3BtreeIndexMoveto
+ 4.69% erts_dios_6 esqlite3_nif.so [.] sqlite3VdbeExec
The sqlite3_step
function (which maps to esqlite_step
NIF) consumes 10.26% CPU, with total SQLite operations around ~28%.
The bottleneck
Even though SQLite supports concurrent readers natively, the dirty IO scheduler pool (10 threads) appears to be serializing database operations. Multiple SQLite connections don’t help since they all queue through the same limited dirty scheduler pool.
Relevant C code from esqlite3
Looking at the NIF function exports in sqlite3_nif.c:
static ErlNifFunc nif_funcs[] = {
{"open", 1, esqlite_open, ERL_NIF_DIRTY_JOB_IO_BOUND},
{"close", 1, esqlite_close, ERL_NIF_DIRTY_JOB_IO_BOUND},
{"exec", 2, esqlite_exec, ERL_NIF_DIRTY_JOB_IO_BOUND},
{"step", 1, esqlite_step, ERL_NIF_DIRTY_JOB_IO_BOUND},
{"interrupt", 1, esqlite_interrupt, ERL_NIF_DIRTY_JOB_IO_BOUND},
{"backup_init", 4, esqlite_backup_init, ERL_NIF_DIRTY_JOB_IO_BOUND},
{"backup_step", 2, esqlite_backup_step, ERL_NIF_DIRTY_JOB_IO_BOUND},
{"backup_finish", 1, esqlite_backup_finish, ERL_NIF_DIRTY_JOB_IO_BOUND},
// ... other functions
};
All database I/O operations are marked as ERL_NIF_DIRTY_JOB_IO_BOUND
, which seems correct for disk operations but creates a concurrency bottleneck.
Questions
-
Safety: Is it safe to increase dirty IO schedulers to 50 with
+SDio 50
? Are there any gotchas or downsides beyond memory usage? -
Hardware correlation: How do dirty IO schedulers correlate with physical CPU cores? I have X cores - is there an optimal ratio or upper limit I should consider?
-
Alternative approaches: Are there better patterns for high-concurrency SQLite access in Erlang, or is increasing +SDio the correct solution?
The esqlite3 NIF marks all operations as dirty IO-bound (which seems architecturally correct for disk I/O), but this creates an artificial bottleneck when SQLite itself could handle much higher concurrency. All SQL operations are serialized through the limited dirty scheduler pool, losing Erlang’s core concurrency advantage.
Any guidance on optimal dirty scheduler configuration for read-heavy database workloads would be greatly appreciated.
Thanks
/Z