In Erlang/OTP 27, +0.0 will no longer be exactly equal to -0.0

jhogberg · May 11, 2023, 6:22am

This seems to have blown up elsewhere on the internet (hi Hacker News!) and a lot of people misunderstand what this change means. Since I can’t reply everywhere, I’ll try to explain it here in a way that hopefully makes more sense to people who are unfamiliar with Erlang:

Like Prolog before us, we only have one data type: terms. This means that the language is practically untyped. There are only terms, and while you can certainly categorize them if you wish, there are no types in the sense most people use that word.

Each function has a domain of terms that it operates on, and going outside that domain results in an exception that’s often analogous to a type error (for example, trying to add a list to an integer), which can be mistaken for having traditional types. To make things more confusing, we have functions that can tell you which pre-defined category a term belongs to, like is_atom returning true for all terms that are in the atom space and false for all terms outside of it.

This mixup is so prevalent that even our documentation refers to these pre-defined categories as “types” despite them being nothing more than value spaces, but it’s important to remember that at the end of the day we only have one data type, and that many functions are defined for all terms.

The arithmetic equality operator (==) returns whether two terms are considered arithmetically equal (for clarity it’s defined for all combinations of terms). Primitive non-numeric terms are simply checked for identity, compound terms are compared recursively, and any numeric terms (floats and integers) are compared according to the rules listed in the documentation. This operator will remain unchanged in the future and 0.0 == -0.0 will continue to hold.
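For readers coming from other languages, here is a rough analogue in Python rather than Erlang. It is only a sketch: Python’s == on numbers happens to behave like Erlang’s == for these two cases, and the variable names are made up for illustration.

```python
# Python analogue only; this is not Erlang code.
pos_zero = 0.0
neg_zero = -0.0

# Numeric comparison treats the two zeros as equal...
zeros_equal = (pos_zero == neg_zero)

# ...and compares an integer with a float by value.
int_float_equal = (1 == 1.0)

print(zeros_equal, int_float_equal)
```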

This operator covers just about all common uses, but is not enough for code that needs to reason about terms in the general sense, for example sets, memoization, or other things you would use generics for in another language.

For that we have the exact equality operator (=:=), which returns whether two terms are indistinguishable. That is, for any terms X and Y, X =:= Y returns whether f(X) =:= f(Y) holds for all pure functions f.

Since there exist several f that distinguish f(0.0) from f(-0.0), we either have to conclude that those functions are broken (where does that leave copysign?) or say that 0.0 =:= -0.0 should not return true.
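To make “functions that distinguish the two zeros” concrete, here is a small sketch in Python rather than Erlang, since Python exposes copysign directly. The helpers sign_of and bits_of are hypothetical names for this illustration. Each result differs between 0.0 and -0.0 even though the two compare equal numerically.

```python
import math
import struct


def sign_of(x: float) -> float:
    """Return 1.0 or -1.0 according to the sign bit of x (zeros included)."""
    return math.copysign(1.0, x)


def bits_of(x: float) -> bytes:
    """Return the big-endian IEEE 754 binary64 encoding of x."""
    return struct.pack(">d", x)


pos_zero, neg_zero = 0.0, -0.0

# Arithmetically equal...
arith_equal = (pos_zero == neg_zero)

# ...yet several pure functions tell them apart.
signs = (sign_of(pos_zero), sign_of(neg_zero))
angles = (math.atan2(pos_zero, -1.0), math.atan2(neg_zero, -1.0))
encodings_differ = bits_of(pos_zero) != bits_of(neg_zero)

print(arith_equal, signs, angles, encodings_differ)
```

If two values must be indistinguishable to count as exactly equal, any one of these functions is enough to rule that out for the two zeros.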

We could make it consistent by removing all the things that let us observe the difference, but that removes functionality that people rely on, so it’s not much of an option. So what we’ve done so far is try to sweep these differences deep under the rug and hope no one notices, one small patch at a time.

Since we’ve never exposed copysign or the other IEEE functions that allow you to observe the sign of zero, it has kind-of-sort-of worked as long as people stayed entirely within Erlang-land. Unfortunately with the rising popularity of GPGPU in general and the Nx library in particular, this bug has been rearing its ugly head with increasing regularity.

Not only because the compiler flubs the constants 0.0 and -0.0 like in the examples earlier in the thread, but because application code does the same. If you want to memoize the result of the aforementioned f(0.0) it will be confused with f(-0.0) unless you invoke some arcane nonsense to keep them apart.
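The memoization problem can be sketched in Python, whose dictionaries treat 0.0 and -0.0 as the same key, much as a cache keyed on an equality that conflates the two zeros would. This is an analogue, not Erlang code. The function sign_sensitive and both caches are hypothetical, and keying on the raw bit pattern stands in for the kind of workaround described above.

```python
import struct


def sign_sensitive(x: float) -> str:
    """Hypothetical pure function whose result depends on the sign bit of x."""
    sign_bit_set = struct.pack(">d", x)[0] & 0x80
    return "negative" if sign_bit_set else "non-negative"


naive_cache = {}


def memo_naive(x: float) -> str:
    # Keyed on the float itself: 0.0 and -0.0 hash and compare equal,
    # so they share a single cache entry.
    if x not in naive_cache:
        naive_cache[x] = sign_sensitive(x)
    return naive_cache[x]


exact_cache = {}


def memo_exact(x: float) -> str:
    # Workaround: key on the IEEE 754 bit pattern, which keeps the zeros apart.
    key = struct.pack(">d", x)
    if key not in exact_cache:
        exact_cache[key] = sign_sensitive(x)
    return exact_cache[key]


naive_results = (memo_naive(0.0), memo_naive(-0.0))
exact_results = (memo_exact(0.0), memo_exact(-0.0))
print(naive_results, exact_results)
```

The naive cache returns the result computed for 0.0 when asked about -0.0, which is the confusion described above. The bit-pattern key avoids it at the cost of extra ceremony in every piece of code that stores floats.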

The compiler is just where this bug is most visible. While we can certainly try to make the compiler distinguish between these values at all times, there’s nothing we can do for application code, hence our attempt at breaking backwards compatibility. If it fails, we’re most likely going to have to throw in the towel on this bug.

Yes.

The compiler will raise a warning whenever 0.0 is used in this manner, so our hope is that it will not be too difficult to find where to change the code.

Yes, we’ll look into it.