Questions about performance differences between recursion and tail recursion

Here is a Fibonacci code implementation, divided into recursion and tail recursion

-module(fib).

-compile(export_all).

%% Tail recursion
fib1(N) -> fib1(N, 0, 1).
fib1(0, A, _) -> A;
fib1(N, A, B) -> fib1(N - 1, B, A + B).

%% Recursion
fib2(0) -> 0;
fib2(1) -> 1;
fib2(N) -> fib2(N - 1) + fib2(N - 2).

Run two functions

Erlang/OTP 25 [erts-13.1.2] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1][jit:ns]

Eshell V13.1.2  (abort with ^G)
1> c(fib).
{ok, fib}
2> tiemr:tc(fun() -> fib:fib1(50) end).
{0,12586269025}
3> tiemr:tc(fun() -> fib:fib2(50) end).
{132362547,12586269025}
4>

Why is the performance of the two functions so much worse and i also see no significant change in fib2 memory at run time. Is there any optimization?

On Windwos 10 LTSC

1 Like

fib2 uses a lot more function calls. fib1(10) calls fib1 about 12 times if I counted properly (fib1(10), fib1(10, 0, 1), fib1(9, 1, 1), fib1(8, 1, 2), etc.) while fib2 calls fib2(10), fib2(9), fib2(8), fib2(8), fib2(7), fib2(7), fib2(6), etc.). The difference is in the algorithm, not in tail or non-tail recursion.

3 Likes

You are right. I’m such a fool

2 Likes

Since you did this from the shell, you have measured the interpreter, not compiled code.

Since the time for fib1(50) is 0, you seem to have a timer granularity problem. It should not take zero time. You should write code that loops over your test function in a tight loop, and maybe also in linear code (to diminish the loop overhead) to make the time measurable.

IIRC the timer on Windows has (have had) much coarser granularity than on Linux.

Also, the first time you run a function like this, it causes garbage collections that expand the heap and warm up caches. The second and later runs are therefore often much faster and in general the number you are interested in.

There are existing benchmark frameworks that do all this for you. I have temporarily forgotten names…

4 Likes

erlperf

4 Likes