…I smell AdTech
> But your answer led me in another direction: use a counter stored on disk and increment it only at startup and when the offset changes.
I’m lazy, so I would look to OS guarantees, mostly because I want to ignore the harder problem of guaranteeing that state makes it to disk consistently and reliably; leave that to the DB people.
I’m taking Linux as an example here, but each OS provides its own options. Of course, the suitability of this depends on whether you have a persistent server or your entire runtime is transient.
If you include the OS process PID (you can raise the value at which PIDs wrap, /proc/sys/kernel/pid_max on Linux, if need be), you effectively get an incrementing counter for free that captures whether your runtime restarted.
If you want to detect a system reboot, then boot_id (/proc/sys/kernel/random/boot_id) is perfect to mix in. Of course, it is a random value (so not sortable for your use case), but you may want to squirrel it away as something you can store on disk safely: if you cannot read back your recorded copy, you just assume there was a reboot.
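A minimal sketch of the boot_id check, assuming Linux. The module name boot_check and the state-file path passed in by the caller are hypothetical; the only point is the rule from the paragraph: an exact, readable match means "same boot", and anything else is treated as a reboot.

```erlang
-module(boot_check).
-export([current_boot_id/0, classify/2, check_and_record/1]).

%% Linux-only: a random UUID generated once per boot.
-define(BOOT_ID_PATH, "/proc/sys/kernel/random/boot_id").

current_boot_id() ->
    read_first_line(?BOOT_ID_PATH).

%% Pure decision: only an exact, readable match counts as the same boot.
%% Anything unreadable or different is treated as a reboot.
classify({ok, Id}, {ok, Id}) -> same_boot;
classify(_Current, _Stored) -> assume_reboot.

%% Compare against the copy saved at StatePath, then overwrite it.
%% No fsync or atomic rename: a lost or torn write simply reads back
%% as a mismatch, which errs on the side of assume_reboot.
check_and_record(StatePath) ->
    Current = current_boot_id(),
    Verdict = classify(Current, read_first_line(StatePath)),
    case Current of
        {ok, Id} -> ok = file:write_file(StatePath, [Id, $\n]);
        {error, _} -> ok
    end,
    Verdict.

%% Read a single line; this also works for /proc files, which report size 0.
read_first_line(Path) ->
    case file:open(Path, [read, binary]) of
        {ok, Fd} ->
            try file:read_line(Fd) of
                {ok, Line} -> {ok, string:trim(Line)};
                eof -> {error, eof};
                {error, _} = ReadErr -> ReadErr
            after
                file:close(Fd)
            end;
        {error, _} = OpenErr ->
            OpenErr
    end.
```

Because the fallback is "assume a reboot", the state file needs no durability guarantees at all, which keeps the hard disk-consistency problem out of the picture.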
You can, though, infer a reboot counter from your filesystem's mount count:
$ sudo tune2fs -l /dev/sda1 | grep 'Mount count'
Mount count: 9
Even if you remount without a reboot, it does not matter as long as the counter goes upwards.
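If you want that number from inside the node, one option is to parse the tune2fs output. This is a hypothetical sketch (the module name mount_count is illustrative); the parser is the part worth keeping, while the shell wrapper inherits tune2fs's requirements.

```erlang
-module(mount_count).
-export([parse/1, read/1]).

%% Pull the value of the "Mount count:" line out of `tune2fs -l` output.
%% Anchored at line start, so "Maximum mount count:" is not matched.
parse(Output) ->
    case re:run(Output, "^Mount count:\\s*(\\d+)",
                [multiline, {capture, all_but_first, list}]) of
        {match, [Digits]} -> {ok, list_to_integer(Digits)};
        nomatch -> {error, not_found}
    end.

%% Hypothetical wrapper: tune2fs only handles ext2/3/4, usually needs
%% root (the post runs it under sudo), and Device must be trusted
%% because it is passed to a shell.
read(Device) ->
    parse(os:cmd("tune2fs -l " ++ Device)).
```

Reading it once at startup is enough; the value only has to be larger than last time, not contiguous.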
> So when we generate an ID, we avoid any disk operations or counter updates.
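One way to get there with the OS-provided values: pay for the lookups once at startup and keep the per-ID path in memory. A minimal sketch, assuming a persistent_term prefix and a hypothetical module cheap_id; the epoch is whatever reboot counter you pick (for example the mount count), passed in as a plain integer.

```erlang
-module(cheap_id).
-export([init/1, next/0]).

%% Run once at startup. Epoch is whatever reboot/restart counter you
%% settled on (e.g. a filesystem mount count); here it is a parameter.
%% The OS PID distinguishes runtime restarts within one epoch.
init(Epoch) when is_integer(Epoch) ->
    OsPid = list_to_integer(os:getpid()),
    persistent_term:put({?MODULE, prefix}, {Epoch, OsPid}).

%% Hot path: a persistent_term lookup plus erlang:unique_integer/1,
%% with no disk I/O and no shared counter. Without the monotonic
%% modifier the integer is unique per runtime instance but unordered.
%% Crashes with badarg if init/1 has not been called.
next() ->
    {Epoch, OsPid} = persistent_term:get({?MODULE, prefix}),
    {Epoch, OsPid, erlang:unique_integer([positive])}.
```

The tuple is only as unique as the {Epoch, OsPid} pair: if PIDs wrap within one epoch, an earlier runtime's prefix can come back, which is where raising pid_max helps.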
If you are planning to generate millions per second, note the warning in the manpage about calling erlang:unique_integer/1 with the monotonic modifier: strictly monotonically increasing values have to be synchronised between all your CPU cores, so they are expensive to generate and scale poorly.
I personally would avoid using it.
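The difference the manpage is warning about comes down to one modifier. A small illustrative sketch (the module name uniq_demo is hypothetical):

```erlang
-module(uniq_demo).
-export([cheap/0, ordered/0]).

%% Unique on this runtime instance with no cross-core synchronisation;
%% the values say nothing about creation order.
cheap() ->
    erlang:unique_integer([positive]).

%% Strictly increasing in creation order. This is the variant the
%% manpage warns about: it must be synchronised between CPU cores.
ordered() ->
    erlang:unique_integer([positive, monotonic]).
```

If ordering is not a hard requirement, the cheap/0 form avoids the contention entirely.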
> Meanwhile, I feel it is a little bit over-engineered.
I think you need to step back a moment and meditate on…the speed of light is a limit…CPU ticks take time…memory bandwidth is not infinite.
These of course sound stupid to say out loud, but it is easy to forget their impact.
So the question is: what actually is the time ordering of the events coming in?
When they hit the NIC?
When they hit the OS?
When they get routed into your application's network buffer?
When you actually pop the packet out of the buffer?
Now throw in that these events get reordered in most environments, because your network card (and the switch it is plugged into) is multi-queued, etc. Erlang also has inherent jitter: it may pop the packets off in one order, but your processes get scheduled in a different order from the one in which they were popped.
You are talking about something on the order of 1M events per second, so microseconds are your base resolution. What is the impact on your service if the timestamps of some events are jittered by 10µs? What about 100µs? Some environments do have this constraint (finance is one space I can think of); yours may not, and as time-sortability is not a hard requirement for you, I suspect it does not?
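Putting rough numbers on those questions, assuming events arrive evenly at the quoted rate (real traffic is burstier, so treat these as averages):

```latex
\Delta t_{\text{event}} = \frac{1\,\text{s}}{10^{6}\ \text{events}} = 1\,\mu\text{s per event},
\qquad
\frac{10\,\mu\text{s}}{1\,\mu\text{s}} \approx 10\ \text{events},
\qquad
\frac{100\,\mu\text{s}}{1\,\mu\text{s}} \approx 100\ \text{events}
```

In other words, 10µs of jitter can swap the order of roughly ten neighbouring events, and 100µs roughly a hundred.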
As I treat time as quantised (having discrete values), I would probably recommend something more like the following, as applications should just embrace jitter (like you are embracing time correction):
Of course, you could push it through erlang:phash2 if you really needed an integer here.
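If you do squash an ID term to an integer this way, keep in mind that erlang:phash2/2 is a hash, not an encoding. A minimal sketch; the module name id_hash and the example tuple shape are hypothetical, not the snippet the post refers to.

```erlang
-module(id_hash).
-export([to_int/1]).

%% erlang:phash2/2 accepts a Range of at most 2^32.
-define(RANGE, 1 bsl 32).

%% Map any ID term to an integer in [0, 2^32). The hash is portable
%% (same term, same value across machines and ERTS versions), but
%% distinct terms can collide.
to_int(IdTerm) ->
    erlang:phash2(IdTerm, ?RANGE).
```

By the birthday bound, a 32-bit range reaches a 50% chance of at least one collision after roughly 77,000 distinct IDs, which at 1M events per second is well under a second. That is fine for sharding or bucketing, but not if the integer itself must be unique.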
I would then include a timestamp separately, with erlang:time_offset() coupled to it, probably as an event in your event stream that applies to all events after it… which will itself be jittered…
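A sketch of that marker-event idea, assuming ordinary events are stamped with erlang:monotonic_time() and a separate marker carries erlang:time_offset(). The module and tuple shapes are hypothetical; the arithmetic follows the ERTS definition that Erlang system time equals Erlang monotonic time plus the time offset.

```erlang
-module(offset_marker).
-export([marker/0, stamp/1, to_system_time/2]).

%% Marker event: the current offset between Erlang monotonic time and
%% Erlang system time, in native time units. Emit it at startup and
%% whenever the offset may have changed.
marker() ->
    {time_offset, erlang:time_offset()}.

%% Ordinary events carry only monotonic time, which never goes backwards.
stamp(Payload) ->
    {event, erlang:monotonic_time(), Payload}.

%% Consumer side: apply the most recent marker to the events after it.
%% Erlang system time = Erlang monotonic time + time offset.
to_system_time({time_offset, Offset}, {event, Mono, _Payload}) ->
    Mono + Offset.
```

To know when to emit a fresh marker, erlang:monitor(time_offset, clock_service) delivers a 'CHANGE' message when the offset changes; after startup that can only happen if the runtime runs in a time warp mode such as +C multi_time_warp.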