Timer precision - can we improve it?

Is it this? :rofl:
Or do you mean using a single process to manage all timers?


No no, I don't mean that. In the virtual machine, all timers might be managed by a timer wheel that sends a timeout message to the target process when a timer is triggered.
If you start the timer inside the process itself, the timeout message is sent directly to that process. If you use a global timer, the timeout message is sent first to the global timer process and then on to your target process.


OK, I got it. I use the erlang module to do it too, and I have edited my reply :slightly_smiling_face:
But the point I want to discuss is whether such a heavy load is normal.

BTW, I write game servers too. Do you work in GZ? :rofl:


What does that mean? I don't get it. :joy:

I am in CD :grinning:


from this reply


What generates heavy load for the timer server really depends on what you are doing, and whether you are using the old (≤ OTP 24) or the new (≥ OTP 25) implementation.

Both the old and new timer implementations are based on a single gen_server process, yes, but in the new implementation that single process does much less work.

The old timer was doing the timing itself via gen_server timeouts: it would look into its timer table to find out what to do, then create a new timeout. This timeout would also be interrupted by adding and cancelling timers, upon which it would have to be recalculated. With many short-lived timers, timers being created and cancelled often, destination processes exiting often, etc., the timer server would become quite stressed, timers would start running late, and catching up on them would put even more stress on the timer server.

The new timer does not concern itself with the task of timing at all: when a timer is created, it just starts an erlang timer to be notified when it is time to perform the requested action (which is done fast, as it is either a send operation, or an apply which is done via spawn(M, F, A), i.e. asynchronously in a separate process). Apart from that, it just keeps a table of the timers it knows about, to be able to eventually cancel them. Creating and cancelling timers does not disturb the process of timing. Most of the work has been moved into the client process, too, and some operations may even bypass the timer server altogether, namely send_after and apply_after with zero timeouts, and send_after with non-zero timeouts if the destination is a local pid. In a nutshell, even if you pile up a considerable number of timers, the timer will work pretty steadily and not flinch (much).
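For what it's worth, the bypass cases can be seen from the API itself; a small shell sketch (the timeout values, message name, and registered name are made up for illustration):

```erlang
%% With a local pid destination, this maps onto the erlang:send_after/3
%% BIF in the client process and never touches the timer server:
{ok, TRef1} = timer:send_after(5000, self(), timeout_msg),

%% With a registered-name destination, the request still goes through
%% the timer server process:
{ok, TRef2} = timer:send_after(5000, some_registered_name, timeout_msg),

%% Cancellation works the same either way:
{ok, cancel} = timer:cancel(TRef1),
{ok, cancel} = timer:cancel(TRef2).
```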


There is something to be said about interval timers, especially with short intervals. Maybe I should write a warning about that in the timer docs…

The code to be executed (directly, or triggered via a message to a running process) should, at least on average, complete well within the interval, otherwise processes or messages will pile up.

Assume timer:apply_interval(1, timer, sleep, [1000]) for example. That is, every millisecond, start a process (apply is done by way of spawn) that takes 1000ms to complete. When it triggers for the first time, you have one process with 1000ms left to run. When it triggers the second time, you have a process with 1000ms and one with 999ms left to run. When it triggers the third time, you have a process with 1000ms, one with 999ms and one with 998ms left to run, and so on. Over time, you will get up to 1000 processes running that code, and only then will it level out, as for each new process being started the then oldest will finish, and vice versa.
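This pile-up can be observed from an Erlang shell; a rough sketch (the exact counts will wobble a bit depending on scheduling):

```erlang
%% Count the processes before, let the interval timer run for a while,
%% then count again; the difference approaches the ~1000 steady state.
Before = erlang:system_info(process_count),
{ok, TRef} = timer:apply_interval(1, timer, sleep, [1000]),
timer:sleep(3000),
After = erlang:system_info(process_count),
{ok, cancel} = timer:cancel(TRef),
io:format("extra processes: ~p~n", [After - Before]).
```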

timer:send_interval is arguably even more dangerous. If you do timer:send_interval(1, SomeProcess, do_something) to send a message to SomeProcess every millisecond, and assume SomeProcess takes 1000ms to process it, messages will pile up in SomeProcess' message queue quickly. When it has processed the first message, 999 more will already be waiting in its message queue. When it has processed the second, there will be 1998 waiting, and so on. This will never level out.
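A similar shell sketch for send_interval, with a hypothetical slow consumer that needs 1000ms per message:

```erlang
%% A deliberately slow consumer: 1000ms to "process" each message.
SlowPid = spawn(fun Loop() ->
                        receive do_something -> ok end,
                        timer:sleep(1000),
                        Loop()
                end),
{ok, TRef} = timer:send_interval(1, SlowPid, do_something),
timer:sleep(5000),
%% The queue is already in the thousands, and would only keep growing.
{message_queue_len, QLen} = erlang:process_info(SlowPid, message_queue_len),
{ok, cancel} = timer:cancel(TRef),
io:format("queued: ~p~n", [QLen]).
```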


Agree!

bad example, I’m sorry!

For send_after it's best.
But if I use functions implemented via {apply_once, {Started, Time, MFA}} or {apply_interval, {Started, Time, Pid, MFA}} (the code in handle_call), it may run into problems, for example:
when we got Phoenix to 2 million connections
and

do_apply({M,F,A}) is quick, but the server may need to handle many messages in a short time.


Well, I would say whether or not the timer module is a good choice for anybody's use case largely depends :wink: It is a global (per-node, actually), central component, which means that even if it works well for you today, a new or updated dependency may suddenly put heavy load on it without your knowing, and thereby degrade the performance timer has for your actual application :man_shrugging:

That was referring to the old timer; AFAIK it was never tested with the new implementation. And why should they? They have a working solution already, for all I know one that is better suited to their use case.
@josevalim, any comments? :wink:

Yes, it is :wink: But that is not a fault of timer, nor in its scope, it's just how it works. (One should be aware of it, though… to my knowledge, it is not mentioned anywhere in the docs, so go ahead @Maria-12648430 :blush:) If you think this may be a danger to you, then send_interval is not what you want, and you had better roll your own scheme: set a one-shot timer (via timer:send_after or erlang:send_after), let your process do its task when the time comes, and only after it has finished set a new timer.
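Such a self-rescheduling scheme could look like this (a minimal sketch; tick_loop/2, start_ticker/2 and the tick message are made-up names, and the work is passed in as a fun):

```erlang
%% A new one-shot timer is set only after the work for the previous
%% tick has finished, so tick messages can never pile up in the queue.
tick_loop(Interval, Work) ->
    receive
        tick ->
            Work(),                                     %% takes however long it takes
            erlang:send_after(Interval, self(), tick),  %% reschedule only now
            tick_loop(Interval, Work)
    end.

start_ticker(Interval, Work) ->
    spawn(fun() ->
                  self() ! tick,
                  tick_loop(Interval, Work)
          end).
```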


FWIW: Improve handling of lagging timers (Suggestion) by juhlig · Pull Request #6145 · erlang/otp · GitHub (Comments welcome)


OMG! :sweat_smile:

OK, I will forget about it

I think the new timer is good enough for me.
Thanks for your reply :+1:


You’re welcome :wink: If you run into issues after all, let us know. There is always room for improvement, and user experience is valuable input :slightly_smiling_face:


I ran some back-of-a-napkin tests: 1,000 interval timers with a 1ms interval. The timers start lagging a little, but not too badly. But something else came up which may be a bit more serious: when I wanted to pile on more timers, starting them took progressively longer. This is because the messages to the timer server to start a new timer are enqueued together with the timeout messages coming in from the already running timers, which causes delays in the client processes. The same goes for timer cancellation.

While I have to admit that using timers at a scale like I did is quite unlikely, there is the point that @juhlig hinted at: everything that uses timers will go through the (node-local) timer server (except when it gets bypassed), no matter if it is code under your control or an external dependency.

I'm not sure how big of a problem this all is, or whether it is worth looking into. However, I see two possible solutions, which could even be combined:

  • The current implementation of the timer actually packs everything it needs; it no longer depends on a named table like the old timer did. So, with a few changes, it would be possible to run multiple timer servers as parts of users' supervision trees, alongside the global one.
  • Go multi-process, i.e. let the timer server do only timer management, and let each timer run in its own dedicated process, hosted under a simple_one_for_one supervisor alongside the timer server. That supervisor may have to be customized, though, otherwise it may become very busy just cleaning up after itself when many one-shot timers are firing.

Timers can be set directly without sending requests to the timer server. The gTimer I wrote seems to have none of these problems: timers are set up quickly and with almost no lag.

the test code:

timer(_, _) ->
    I = atomics:add_get(persistent_term:get(cnt), 1, 1),
    io:format("IMY******* ~p~n", [I]),
    case I of
        1000000 ->
            io:format("end time ~p~n", [erlang:system_time(millisecond)]);
        _ ->
            ignore
    end,
    ok.

test(N, Time) ->
    io:format("start time1 ~p~n", [erlang:system_time(millisecond)]),
    persistent_term:put(cnt, atomics:new(1, [])),
    gTimer:startWork(16),
    doTest(N, Time).

doTest(0, Time) ->
    io:format("start time2 ~p~n", [erlang:system_time(millisecond)]),
    gTimer:setTimer(rand:uniform(Time), {?MODULE, timer, []});
doTest(N, Time) ->
    gTimer:setTimer(rand:uniform(Time), {?MODULE, timer, []}),
    doTest(N - 1, Time).

Set 1,000,000 timers; run as:
testMod:test(1000000, MaxOverTime).


Just FYI, @Maria-12648430 and I made another PR in order to improve timer more:


Not trying to hijack the post, but I wanted to ask what is maybe a silly question.

Would it not make sense to have apply_interval treat the interval as a minimum?
I.e., time the execution of the applied function and calculate the next run accordingly.

e.g.:

timer:apply_interval(1000, timer, sleep, [100])
executes the timer:sleep(100) which returns in 102ms
determines next apply should be in 898ms
waits 898ms before applying again

This behavior would hopefully make the application more uniform and prevent the explosion should the interval be too small relative to the execution time of the applied function.

edit:
If the application took longer than the 1000ms, the next apply would be immediate (it would be nice to have it emit a warning/info message of some kind so that it'd be obvious).
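For what it's worth, that "interval as a minimum" behavior can be sketched with the monotonic clock (illustrative only; min_interval_loop/2 is a made-up name, not an OTP API):

```erlang
%% Measure the execution time and only wait out the remainder of the
%% interval before running again.
min_interval_loop(Interval, {M, F, A}) ->
    Start = erlang:monotonic_time(millisecond),
    apply(M, F, A),
    Elapsed = erlang:monotonic_time(millisecond) - Start,
    %% e.g. Interval = 1000, Elapsed = 102  =>  wait 898 more ms;
    %% if the call overran the interval, the next run is immediate.
    timer:sleep(max(0, Interval - Elapsed)),
    min_interval_loop(Interval, {M, F, A}).
```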


What you describe is, in a nutshell, what apply_repeatedly/4 (as proposed later in the linked PR) does, except that it uses the {abs, true} option for the underlying erlang timer, thereby omitting the calculation of the remaining time and preventing timer drift.

apply_interval/4 has been kept as is with regard to not waiting for the execution to complete, for backwards compatibility. And actually, there is nothing wrong with parallel execution, as long as you're careful (a warning has been added to the docs in the same PR for that reason).
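Roughly, the {abs, true} approach looks like this (a sketch of the idea only, not the actual OTP implementation; abs_loop/3 and start_abs/2 are made-up names):

```erlang
%% Each tick is scheduled at an absolute monotonic time,
%% NextAbs = LastAbs + Interval, so there is no remaining-time
%% arithmetic and no drift; if handling a tick overruns, the
%% next (already past) absolute timer fires immediately.
abs_loop(NextAbs, Interval, {M, F, A}) ->
    erlang:start_timer(NextAbs, self(), tick, [{abs, true}]),
    receive
        {timeout, _TRef, tick} ->
            apply(M, F, A),  %% completes before the next tick is handled
            abs_loop(NextAbs + Interval, Interval, {M, F, A})
    end.

start_abs(Interval, MFA) ->
    Now = erlang:monotonic_time(millisecond),
    spawn(fun() -> abs_loop(Now + Interval, Interval, MFA) end).
```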


Thanks for the response and explanation. I suppose reading the PR code changes vs skimming the PR comments would have helped :stuck_out_tongue:
