Disabling TCO selectively

Hi,

TCO (Tail-Call Optimization) shouldn’t (and cannot) be disabled as a whole, due to how central recursive functions are. That’s OK.

However, in some cases TCO leads to ellipses in the stacktraces that make debugging far harder than it should be (e.g. the trace “jumps” from an entry point in one’s code straight to the failing code deep inside a given library, while any of the dozens of in-between calls made from one’s code to that library may have led to the error). Good luck easily finding which one.

I do not think there are compile flags to disable TCO on a per-module/per-function basis, but apparently wrapping one’s potentially guilty code as a whole in a try/catch clause (at least extending to the function return; just for the sake of debugging) is enough to disable TCO for that function, and to restore the offending frame in the stacktrace, so that we finally obtain a relevant line number in that function to scrutinize. It seems a neat (temporary) trick to me.
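
For instance, a minimal sketch of that trick (the module name and the use of lists:nth/2 as a stand-in "library call" are made up for illustration):

```erlang
-module(tco_demo).
-export([process/1]).

%% Without the try/catch, the call to lists:nth/2 would be a tail
%% call and process/1 would not appear in stacktraces. Wrapping the
%% whole body keeps the frame alive, at the cost of defeating TCO
%% for this function.
process(N) ->
    try
        lists:nth(N, [a, b, c])
    catch
        E -> throw(E)
    end.
```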

But maybe there are simpler/cleaner/less invasive ways to do the same?

Thanks in advance for any hint,

Olivier.

Return-tracing the suspicious functions temporarily disables TCO while they’re being traced, and you can limit this to a specific process or certain conditions (match specs) if you wish. :slight_smile:

https://www.erlang.org/doc/man/dbg.html
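
A sketch of what that might look like, assuming the suspicious code lives in a module named foobar (a placeholder):

```erlang
-module(trace_demo).
-export([start_return_tracing/1]).

%% Return-trace all functions of the (hypothetical) module foobar,
%% for the calls made by process Pid.
start_return_tracing(Pid) ->
    dbg:tracer(),     % start the default text tracer
    dbg:p(Pid, [c]),  % trace the calls made by that process
    %% 'x' is dbg's built-in alias for a match spec with
    %% exception/return tracing; while it is active, return tracing
    %% effectively disables LCO for the traced functions:
    dbg:tpl(foobar, '_', x).
```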

Wrapping the tail-call in a call to an identity function should disable TCO at that point, unless the compiler gets too clever.
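
A sketch of that idea (the helper module name is made up):

```erlang
-module(id_util).
-export([id/1]).

%% A remote call to this identity function cannot be tail-call
%% optimized away by the caller (as long as the call stays remote
%% and the compiler does not inline it), so the caller keeps its
%% stack frame.
id(X) -> X.

%% Usage in the suspect function: instead of ending with
%%     some_library:work(Arg)
%% end with
%%     id_util:id(some_library:work(Arg))
```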

Hello John,

Thanks for your answer. This must be indeed a very flexible option (that I have yet to master! [1]).

It is puzzling that just by adding runtime traces one can change how the already-compiled code behaves (from LCO to non-LCO [2]). Does it mean that tracing involves some kind of in-RAM rewriting/transformation/insertion within the VM opcodes corresponding to the functions of a loaded module? Is there some high-level description of how tracing has been implemented?

@mikpe: you are right, feeding the resulting value of the function of interest to a cross-module call to an identity function does the trick: the stacktrace regained the missing levels. Thanks as well! Moreover it is more convenient than wrapping suspicious code in a try ... catch E -> throw(E) end.

We could even imagine a parse transform that, for the given modules to debug, would automatically add to each of their functions a call to such an identity function.
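
Concretely, such a transform would rewrite tail positions roughly like this (module and function names made up; lists:last/1 is used here as a stand-in remote identity):

```erlang
-module(lco_transform_demo).
-export([f_before/1, f_after/1]).

g(X) -> X * 2.

%% Before the (hypothetical) transform: g/1 is in tail position,
%% so f_before/1 leaves no frame on the stack.
f_before(X) -> g(X).

%% After: the last call is fed through a remote identity-like call,
%% so f_after/1 keeps its stack frame.
f_after(X) -> lists:last([g(X)]).
```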

Best regards,

Olivier.

[1] Knowing that all modules are compiled with debug_info and that the suspicious function is in a foobar module, I thought I could spy on the process of interest by spawning from it an “inspector” process that would execute:

	erlang:trace(PidOfSpawner, _EnableTracing=true, [exiting, call, return_to]),
	erlang:trace_pattern({foobar, '_', '_'}, true, [local]),
	follow_traces().

follow_traces() ->
	receive

		M ->
			% ~n added so that successive traces do not run together:
			io:format("[tracer] ~p.~n", [M]),
			follow_traces()

	end.

yet for some reason this process does not seem to receive any trace; I must have made a mistake somewhere.

[2] I guess I should have called it LCO (Last Call Optimisation), as in this case this is not a recursive function but functions calling other functions (TCO being a special case of LCO, as I understand).
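
To illustrate the distinction (a minimal sketch):

```erlang
-module(lco_vs_tco).
-export([count/1, f/1]).

%% Tail recursion: the last call targets the same function (the
%% classic "TCO" case); this loops in constant stack space.
count(0) -> done;
count(N) when N > 0 -> count(N - 1).

%% Last-call optimisation in general: the last call targets a
%% different (here remote) function; its caller frame is reused too.
f(X) -> lists:reverse(X).
```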

This reminds me of when I did a lot of work in Prolog.
Compiled code did last call optimisation.
Tracing and debugging did not.
This led to the opposite inconvenience when debugging, that the stack could fill up with thousands upon thousands of calls to the same “function”.
It strikes me that there is a middle ground which might be useful in some (but not all) cases, and that is switching off last call optimisation when the caller and callee have different names or are in different modules. That would be analogous to debugging a procedural language, where loops do not show up on the stack but calls do.

A parse transform wrapping non-self tail calls sounds like a good way to do that.

What I found was that adding assertions to the code and doing more checks earlier helped more.

It gets less puzzling if you look at last call optimization as an implementation detail. In the absence of tracing, stack traces, running out of memory, and so on there is no way to know whether it has been applied or not – it’s not “observable” from regular code. An Erlang implementation for Magical Unicorn Machines™ with infinite memory could freely skip last call optimization without affecting the semantics of the code (modulo user-defined process heap limits and so on).

Return tracing gives you the return value of every function call, tail called or not, and the way it does so indirectly disables tail call optimization. The optimization is technically still there, but the tracing pushes an extra stack frame on every call to keep track of returns, countering the optimization.

I always get uncomfortable when people refer to this as an “optimization”. The term “optimization” makes it sound like it’s an optional nice-to-have but not essential for correct operation. That’s not true: proper tail-calls are as central to the semantics of affected programming languages (including Scheme, the ML family, and Erlang) as proper arithmetic or proper assignments (if the language has them) are.

I guess it depends on how pedantic you are: the results are the same with and without LCO (modulo running out of memory), so saying it’s an optimization isn’t wrong. That it’s required to make things usable in practice is another matter. :slight_smile:

For some programming languages, tail call optimisation is an optional optimisation.
Notably C. gcc, clang, and I believe icc will do tail call optimisation for the languages they support. For gcc and clang the compiler flag is -foptimize-sibling-calls. (In fact my Smalltalk to C compiler has relied on this for a long time; TRO isn’t normally done by Smalltalk systems, but it’s nice to have it.)

When David Warren built last call ‘optimisation’ into Edinburgh Prolog, I’m pretty sure he didn’t pick it up from Scheme. I’m also pretty sure that we all knew about the technique already. Was it from Landin? I know that Pop-2 (in use at Edinburgh long before Prolog) picked up the idea of the ‘J’ function (called ‘jumpout’ in Pop-2) from Landin. The general idea that function call can replace goto came from Landin. Ah, GOT IT. The earliest implementation of TRO that I know of is David Turner’s SASL from 1973. "The only method of iteration was recursion — the interpreter recognised tail recursion and implemented it efficiently."

So the technique has been around for 50 years.

In that time it has gained too many names.
Generally, I’m all for “the rectification of names.”
But the blackboard in my parents’ house is green and it would be silly to call it a greenboard.
The cupboard by my desk isn’t a board and has never contained cups.

What really worries me is all the people out there on the net (including the designer of a widely used programming language named for a legless reptile) who think that full stack traces are so essential for debugging that TRO should not be done. (Mind you, said designer ALSO thought it was clever to limit function call depth to 30 so that people wouldn’t write deeply recursive code.) I am reminded of one of the many reasons I am so thankful that my first “real” programming experiences were on a B6700 rather than a System/360. I never had to learn how to wade through 3-inch-thick program dumps.

FWIW, I quickly implemented the trick and added it to a base parse transform I routinely use [1], and it seems to work as intended: by default LCO is left enabled (and stacktraces in user code might be ellipsed), but if the -Dmyriad_disable_lco command-line flag is added when compiling a module, LCO is disabled for that module. Stacktraces then appear complete with regard to that module (at least as shown by my initial test case), of course at the expense of the memory footprint of recursive functions.
Might be useful - but for debugging purposes only!

[1] https://github.com/Olivier-Boudeville/Ceylan-Myriad/blob/e8ff53a07f3add61c15cd5b89a1bdfdf2ebf2fa1/src/meta/myriad_parse_transform.erl#L1250

Hi,

Still on this subject: in a given module m, I have an (exported) function f that ends up with a call to a local function g, whose single clause ends up itself with a remote call.

So, as I understand it, by default LCO should be applied within f’s body, but not within that of g.

I could check that the aforementioned parse-transform auto-protected the call to g done from f (it wrapped it in a remote identity call; I could even add such an identity call directly in the code of f - this changed nothing).

Yet, when executing m:f from a test case, in the course of the evaluation of g a call to a library is known to fail.

The problem is that the call to f does appear in the stacktrace, but the one to g does not (whereas it was the most interesting one: a line number in that function would point to the actual library call at the origin of the error - g could be making many such calls).

This happens whether or not g is exported. m is not compiled with fancy options or defines.

Am I making a silly mistake or assumption here? Are there other reasons that would result in an aggressive inlining where g could disappear in f? How could I force the presence of g in the stacktrace?

Thanks in advance for any hint!

Best regards,

Olivier.

TCO / LCO / Proper tail-calls isn’t disabled or enabled based on whether a function is exported or not.

Yes, I agree, I thought that maybe any unintended extra inlining of g in f could have been prevented if g had been exported.

Anyway I tried to minimise a test case to rule out any mistake/confusing setting.

If compiling with erlc small_test.erl the following test module:

-module(small_test).
-export([f/0]).

f() ->
	% Poor man's identity (forcing a remote call to avoid any LCO):
	lists:max([g()]).

g() ->
	xxxx:x().

we have:

> erl -pa . -eval 'small_test:f()'
Erlang/OTP 26 [erts-14.1] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit:ns]

Error! Failed to eval: small_test:f()

Runtime terminating during boot ({undef,[{xxxx,x,[],[]},
{small_test,f,0,[{file,"small_test.erl"},{line,6}]},
{erl_eval,do_apply,7,[{file,"erl_eval.erl"},{line,746}]},
{init,start_it,1,[]},{init,start_em,1,[]},{init,do_boot,3,[]}]})

Crash dump is being written to: erl_crash.dump...done

I wish there was a {small_test,g,0,[{file,"small_test.erl"},{line,9}]} term just after the undef one.

No way to have it listed?

Olivier.

PS: without the anti-LCO hack, even the call to f is missing:


Runtime terminating during boot ({undef,[{xxxx,x,[],[]},
{erl_eval,do_apply,7,[{file,"erl_eval.erl"},{line,746}]},{init,start_it,1,[]},
{init,start_em,1,[]},{init,do_boot,3,[]}]})