fib2 makes far more function calls. fib1(10) calls fib1 about 12 times if I counted properly (fib1(10), fib1(10, 0, 1), fib1(9, 1, 1), fib1(8, 1, 2), etc.), while fib2(10) calls fib2(10), fib2(9), fib2(8), fib2(8), fib2(7), fib2(7), fib2(6), etc., so its number of calls grows exponentially with the argument instead of linearly. The difference is in the algorithm, not in tail or non-tail recursion.
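To make the difference in call counts concrete, here is a small sketch in Python. The original functions are not shown, so both definitions below are assumed reconstructions from the call sequences listed above: fib1 as an accumulator-passing version and fib2 as the naive doubly recursive version. The `calls` counter is added purely for illustration.

```python
# Assumed reconstructions of the two functions; the originals are not shown.
calls = {"fib1": 0, "fib2": 0}

def fib1(n, a=0, b=1):
    """Accumulator-passing version: one call per step down from n to 0."""
    calls["fib1"] += 1
    if n == 0:
        return a
    return fib1(n - 1, b, a + b)

def fib2(n):
    """Naive version: every call with n >= 2 spawns two further calls."""
    calls["fib2"] += 1
    if n < 2:
        return n
    return fib2(n - 1) + fib2(n - 2)

result1 = fib1(10)
result2 = fib2(10)
print(result1, result2, calls)
```

In this sketch both functions return 55, but fib1(10) gets there in 11 calls while fib2(10) needs 177. (The count of about 12 above includes a separate one-argument wrapper call, which the default arguments replace here.)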
Since you did this from the shell, you have measured the interpreter, not compiled code.
Since the time for fib1(50) is 0, you seem to have a timer granularity problem; it should not take zero time. To make the time measurable, call your test function many times in a tight loop, possibly with the calls also unrolled into straight-line code to reduce the loop overhead, and divide the total time by the number of calls.
IIRC the timer on Windows has (or at least used to have) much coarser granularity than on Linux.
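A minimal sketch of the loop-and-divide idea, written in Python only for illustration. The names `work` and `time_per_call` and the repeat count are hypothetical; `work` stands in for whatever function you are actually testing.

```python
import time

def work():
    # Hypothetical stand-in for the function under test.
    return sum(range(100))

def time_per_call(func, repeats=100_000):
    """Return the average seconds per call, measured over many calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    elapsed = time.perf_counter() - start
    return elapsed / repeats

per_call = time_per_call(work)
print(f"{per_call * 1e9:.0f} ns per call")
```

The result still includes the loop overhead, so for very cheap functions it is an upper bound rather than an exact figure.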
Also, the first time you run a function like this, it triggers garbage collections that expand the heap, and it warms up the caches. The second and later runs are therefore often much faster, and they are generally the numbers you are interested in.
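One way to keep the first, slow runs out of the result is to run the function a few times untimed and only then start measuring. The sketch below assumes Python for illustration; `work`, `measure`, and all the counts are hypothetical choices, not recommendations.

```python
import time

def work():
    # Hypothetical stand-in for the function under test.
    return sum(range(1000))

def measure(func, warmup=3, runs=5, repeats=10_000):
    """Discard warm-up runs, then return per-call timings for several runs."""
    for _ in range(warmup):
        func()  # untimed: lets the heap grow and the caches warm up
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        for _ in range(repeats):
            func()
        timings.append((time.perf_counter() - start) / repeats)
    return timings

timings = measure(work)
print(f"best: {min(timings) * 1e9:.0f} ns per call")
```

Reporting the minimum over several runs is one common choice, since it is the run least disturbed by other activity on the machine.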
There are existing benchmark frameworks that do all this for you. I have temporarily forgotten their names…