Hi
I can understand the requirement for a shared queue; that would indeed be a nice thing to have. But why the requirement for such extreme speed, i.e. where 200-350ns vs 1µs makes a difference?
I don’t know what you are planning to use this for, but in a general sense the primary use case for something like a shared queue is a producer/consumer setup. Producers create and enqueue tasks, consumers (workers) dequeue tasks and execute them; you know the pattern. But typically the tasks are comparatively expensive and long-running, so I doubt that an extra 650-800ns in fetching a task would have a noticeable effect.
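To put rough numbers on that, here is a quick back-of-the-envelope sketch. The 650-800ns gap comes from the figures above; the task durations are purely my own assumptions for illustration, not anything measured.

```python
# Extra dequeue latency of the slower queue, from the figures above:
# 1 µs minus 350 ns, and 1 µs minus 200 ns.
EXTRA_NS = (650, 800)

# Hypothetical task durations, chosen only for illustration.
TASK_NS = {
    "10 µs task": 10_000,
    "1 ms task": 1_000_000,
    "100 ms task": 100_000_000,
}


def overhead_percent(extra_ns, task_ns):
    """Extra dequeue latency as a percentage of the task's own run time."""
    return 100.0 * extra_ns / task_ns


for label, task_ns in TASK_NS.items():
    low = overhead_percent(EXTRA_NS[0], task_ns)
    high = overhead_percent(EXTRA_NS[1], task_ns)
    print(f"{label}: {low:.5f}% to {high:.5f}% extra per task")
```

With these assumed durations, the gap stays well under 0.1% for tasks of a millisecond or longer, and only reaches several percent once a task takes around 10µs.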
You might have something entirely different in mind, though. As you are willing to resort to implementing NIFs to gain speed, it looks like you desperately need it. But I can’t imagine what for, so please tell me.