I need a “unique” (see below) correlation ID for a network protocol. It’s defined as a 32-bit integer. I looked at using erlang:unique_integer(), but it doesn’t look immediately suitable.
The documentation for erlang:unique_integer/0,1 simply says that it returns “an integer…”.
From experimentation:
erlang:unique_integer([positive]) returns relatively small numbers, and they increment by a small amount (~30) each time I call it.
erlang:unique_integer([monotonic, positive]) starts counting at 1, incrementing by 1 for each call.
Looking in the source code, the returned value is constrained to a 64-bit integer.
Am I safe to truncate this further to a 32-bit integer?
Clarification of “unique”: the network protocol in question (Kafka, as it happens) requires that the correlation ID be unique for the requests in flight on a particular TCP connection. This means that they can be re-used for different connections, and can be re-used for a particular connection once the reply is received.
So they don’t need to be CRNG-quality (or even PRNG-quality).
It would be simple to keep the correlation ID in the connection-handling process’s state, but for optimisation purposes, I want to assign it in the caller.
Am I safe to truncate this further to a 32-bit integer?
That depends on your definition of “safe”: plain truncation can lead to loss of randomness. Since you only want 32 bits, I’d suggest erlang:phash2/2 with semi-random input, say {os:timestamp(), self()}.
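A minimal sketch of the phash2 suggestion, assuming a hypothetical module corr_id_hash (the module and function names are illustrative, not from the thread). Passing 16#100000000 (2^32) as the range makes erlang:phash2/2 return a value between 0 and 16#ffffffff. A hash does not guarantee uniqueness, so two in-flight requests could still end up with the same ID.

```erlang
-module(corr_id_hash).
-export([new/0]).

%% Hypothetical helper: hash semi-random input into the range 0..2^32-1.
%% The result is not guaranteed to be unique; collisions are unlikely
%% but possible.
new() ->
    erlang:phash2({os:timestamp(), self()}, 16#100000000).
```

Calling corr_id_hash:new() from the requesting process yields an integer that fits in 32 bits.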
Since what you want is uniqueness and not randomness, you could also use a counter in a public ETS table, incrementing by 1 and wrapping around to 0 after 16#ffffffff.
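The counter idea could look roughly like this, assuming a hypothetical module and table name corr_id_counter and a single key counter (all illustrative). It relies on the {Pos, Incr, Threshold, SetValue} form of ets:update_counter/3, which resets the counter to SetValue when the incremented result would exceed Threshold.

```erlang
-module(corr_id_counter).
-export([init/0, next/0]).

-define(TAB, corr_id_counter).   % hypothetical table name
-define(MAX, 16#ffffffff).

%% Create the public table once, from a long-lived process that owns it.
init() ->
    ?TAB = ets:new(?TAB, [named_table, public, {write_concurrency, true}]),
    true = ets:insert(?TAB, {counter, 0}),
    ok.

%% Atomically add 1 to the counter. If the result would be greater
%% than ?MAX, the counter is set to 0 instead and 0 is returned.
next() ->
    ets:update_counter(?TAB, counter, {2, 1, ?MAX, 0}).
```

Any caller can then use corr_id_counter:next() without going through the connection-handling process. The table disappears when its owner exits, so init/0 would normally be called from something like a supervisor or application start function.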
It depends on your use case, but you should avoid the monotonic option for anything that needs to call this a large number of times per second.
Am I safe to truncate this further to a 32-bit integer?
Assuming you are talking about erlang:unique_integer(), sure, but make sure you wrap the value into the desired range with a modulo rather than simply truncating the high bits.
You really just want something as close as possible to an incrementing counter here; anything else risks collisions.
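As a rough sketch of the modulo wrapping described above, assuming a hypothetical module corr_id_wrap (names are illustrative). The [positive] option is used so that rem, which keeps the sign of its left operand in Erlang, never produces a negative result.

```erlang
-module(corr_id_wrap).
-export([new/0, wrap/1]).

%% Take a unique positive integer and wrap it into 0..2^32-1.
new() ->
    wrap(erlang:unique_integer([positive])).

%% Wrap a non-negative integer into the 32-bit range.
wrap(N) when is_integer(N), N >= 0 ->
    N rem 16#100000000.
```

After wrapping, two values collide only if the original integers differ by a multiple of 2^32, which for the per-connection, in-flight-only uniqueness described in the question may be acceptable.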
It would be simple to keep the correlation ID in the connection-handling process’s state, but for optimisation purposes, I want to assign it in the caller.
It depends on the use case, of course, but TCP connections are really cheap, and using several of them has an advantage: if one connection becomes saturated (or stalls) and starts to block, your other TCP sessions may continue to be okay.
This applies with multipath networking (especially over the WAN), and also because the receiver has an upper limit on how fast it can slurp bytes out of a single connection.