SIMD schMIMD. If you want top speed, send the work to the GPU.

The only thing I’ve looked at so far is complex.erl.

The point of a module with an interface that hides the internal representation is that on the INSIDE of the module the representation is NOT hidden.

There is nothing to be gained, and much to be lost, by writing

divide(Z1, Z2) ->
    new((real(Z1) * real(Z2) + imaginary(Z1) * imaginary(Z2))
          / (real(Z2) * real(Z2) + imaginary(Z2) * imaginary(Z2)),
        (imaginary(Z1) * real(Z2) - real(Z1) * imaginary(Z2))
          / (real(Z2) * real(Z2) + imaginary(Z2) * imaginary(Z2))).

when you could have written

divide({R1,I1}, {R2,I2}) ->
    D = R2*R2 + I2*I2,
    {(R1*R2 + I1*I2)/D, (R2*I1 - R1*I2)/D}.

Which one is easier to read?

Which one is easier to check?

Which one is easier to see the numerical problems in?

In fact complex division is a notoriously tricky problem to get right. If you are working with exact rational numbers, no worries. If you are working with floating-point numbers, you have to worry about overflow, underflow, and cancellation.

In my Smalltalk library, complex division uses (a minor variant of) Smith's algorithm, taking about two dozen lines of terse code (single-letter variables). A comment there remarks that the code should be replaced by the more robust Baudin+Smith algorithm.

There are also problems with complex multiplication.

Technical Report OUCS-2017-07, "Complex Arithmetic is Complex", Otago University Computer Science, goes into some but not all of the details. The multiplication algorithm given there works for IEEE arithmetic but, infuriatingly, not in Erlang.

But the thing that really vexed me was

equal(Z1, Z2) ->
    erlang:abs(real(Z1) - real(Z2)) < 0.005
    andalso erlang:abs(imaginary(Z1) - imaginary(Z2)) < 0.005.

This is not equality. It isn’t remotely equality. So why call it equal/2? The magic number 0.005 has no place in a module called ‘complex’ that is not documented as tied to a specific application. What you want here is

l_infinity_distance({R1,I1}, {R2,I2}) ->
    max(abs(R1-R2), abs(I1-I2)).

then in the caller, instead of

equal(Z1, Z2)

use

l_infinity_distance(Z1, Z2) < 0.005

because it is only in the caller that we can tell what value to use for the bound.

Now let’s stop and think about something else.

Memory.

Suppose we are dealing with floating-point numbers.

32-bit machine: 3 words (FLONUM tag, 2 data words)

64-bit machine: 2 words

A 2-tuple costs 3 words either way (going by the BEAM book).

So {1.2,3.4} costs 9 words on a 32-bit machine, 7 on a 64-bit one.

Compare this with 4 words (32-bit) or 2 words (64-bit) using C or Ada.

Sticking with floats for now, the only way to reduce the space overhead (and thus the time overhead) is to eliminate the tuples.

-define(ADD(Rr,Ir, R1,I1, R2,I2), begin Rr = R1+R2, Ir = I1+I2 end).
-define(SQUARE(Rr,Ir, R,I), begin Rr = R*R - I*I, Ir = 2*R*I end).
-define(NORMSQ(R,I), (R*R+I*I)).
…

and instead of

escape(_, _, Limit, It) when It >= Limit ->
    It;
escape(Z, C, Limit, It) ->
    case complex:norm_sqr(Z) < 2.0 of
        false ->
            It;
        true ->
            escape(complex:add(complex:mul(Z, Z), C),
                   C, Limit, It + 1)
    end.

write

escape(R, I, Rc, Ic, Limit, It) ->
    if It >= Limit -> It
     ; ?NORMSQ(R, I) >= 2.0 -> It
     ; true -> ?SQUARE(Rs,Is, R,I),
               ?ADD(Rx,Ix, Rs,Is, Rc,Ic),
               escape(Rx, Ix, Rc, Ic, Limit, It+1)
    end.

Oh look, you don’t actually NEED complex division or general complex multiplication. Good.
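For comparison, the fully unboxed loop the macros are emulating is, in C, just this (a sketch; it keeps the 2.0 bound from the Erlang code, though the conventional Mandelbrot escape test is |z|² >= 4.0, i.e. |z| >= 2):

```c
/* Escape-time iteration on bare components: no boxing, no tuples,
   no function calls per arithmetic operation. */
static int escape(double r, double i, double rc, double ic, int limit) {
    int it = 0;
    while (it < limit && r*r + i*i < 2.0) {
        double rs = r*r - i*i;   /* z^2, real part */
        double is = 2.0 * r * i; /* z^2, imaginary part */
        r = rs + rc;             /* z^2 + c */
        i = is + ic;
        it++;
    }
    return it;
}
```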

The next place I'd look to save memory: you are calculating the whole picture and then rendering it. That means building up a giant data structure in memory that you have no actual use for. Switching to calculating-as-you-render is easy, because the pixel calculations are independent. (This is why the task shines with SIMD and GPU support: it's trivially parallelisable.)

Finally, I’d ask whether you really need to use floats.
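One concrete reading of that question (mine, not spelled out above): the escape count can be computed entirely in integer fixed-point arithmetic. A sketch in Q4.28 format, assuming the usual arithmetic right shift on signed integers:

```c
#include <stdint.h>

/* Q4.28 fixed point in a signed 32-bit word: 28 fraction bits gives
   range |x| < 8, ample for the escape iteration, where |z| stays small. */
typedef int32_t fix_t;
#define FRAC 28
#define FIX(x) ((fix_t)((x) * (1 << FRAC)))

/* Widen to 64 bits, multiply, rescale back down. */
static fix_t fmul(fix_t a, fix_t b) {
    return (fix_t)(((int64_t)a * b) >> FRAC);
}

/* Same loop as before, same 2.0 bound as the Erlang code, floats gone. */
static int escape_fix(fix_t r, fix_t i, fix_t rc, fix_t ic, int limit) {
    int it = 0;
    while (it < limit && fmul(r, r) + fmul(i, i) < FIX(2.0)) {
        fix_t rs = fmul(r, r) - fmul(i, i);
        fix_t is = 2 * fmul(r, i);
        r = rs + rc;
        i = is + ic;
        it++;
    }
    return it;
}
```

Whether 28 fraction bits is enough precision depends on how deep you zoom; the point is only that nothing in the computation demands floating point.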