In Erlang/OTP 27, +0.0 will no longer be exactly equal to -0.0

bjorng · May 5, 2023, 8:20am

Currently, the floating point numbers 0.0 and -0.0 have distinct internal representations. That can be seen if they are converted to binaries:

1> <<0.0/float>>.
<<0,0,0,0,0,0,0,0>>
2> <<-0.0/float>>.
<<128,0,0,0,0,0,0,0>>

However, when they are matched against each other or compared using the =:= operator, they are considered to be equal. Thus, 0.0 =:= -0.0 currently returns true.

@nzok considers this behaviour to be a bug. We also consider it a bug, but until recently we were not sure that fixing and introducing an incompatibility would be be worth it.

A recent bug found by erlfuzz made us reconsider. An optimization in the compiler to share identical code would consider the code for the two clauses in this function to be identical:

f(_V0, _V0) ->
    -0.0;
f(_, _) ->
    0.0.

and essentially rewrite it to:

f(_, _) ->
    0.0.

To fix this optimization when =:= considers 0.0 and -0.0 to be equal would be cumbersome and could make the compiler slower. It also likely that other optimizations in the compiler could be affected and would have to be fixed in similarly cumbersome ways.

Therefore, the OTP Technical Board decided that in Erlang/OTP 27, we will change +0.0 =:= -0.0 so that it will return false, and matching positive and negative 0.0 against each other will also fail. When used as map keys, 0.0 and -0.0 will be considered to be distinct.

The == operator will continue to return true for 0.0 == -0.0.

To help to find code that might need to be revised, in OTP 27 there will be a new compiler warning when matching against 0.0 or comparing to that value using the =:= operator. The warning can be suppressed by matching against +0.0 instead of 0.0.

We also plan to introduce the same warning in OTP 26.1, but by default it will be disabled. Anyone that suspect they have code that might be affected can turn on that warning in OTP 26.1.

eiji7 · May 6, 2023, 12:54pm

Interesting … for people not in topic could you also please describe (or link to some article) why <<-0.0/float>> is represented as <<128,0,0,0,0,0,0,0>>? I think that many people and especially newbies would be confused and assume that -0.0 and +0.0 is always seen the same i.e. 0.0.

If I understand correctly regardless of value (0.0, 12.34, -98.76, …) first bit of every float is 0 (positive) or 128 (negative) and Erlang previously saw that like:

defmodule Example do
  def negative?(<<128, _data::8*7>>), do: true
  def negative?(<<_sign, _data::8*7>>), do: false

  def positive?(<<0, _data::8*7>>), do: true
  def positive?(<<_sign, _data::8*7>>), do: false

  def same?(<<_first_sign, 0::8*7>>, <<_second_sign, 0::8*7>>), do: true
  def same?(same, same), do: true
  def same?(_left, _right), do: false
end

and the change is like about removing first clause of Example.same?/2 function, right?

bjorng · May 6, 2023, 1:52pm

Here is a link to Wikipedia’s description:

Yes, the most significant bit is the sign bit. Your example have the wrong sizes for the segments, but it does seem that you have correctly understood both the current behavior and the new behavior. Here is a corrected Erlang version of your same?/2 function:

is_same(<<_:1, 0:63>>, <<_:1, 0:63>>) -> true;
is_same(Same, Same) -> true;
is_same(_, _) -> false.

eiji7 · May 6, 2023, 2:21pm

Since 8-bit 0 is 00000000 and 128 is 10000000 both examples are good. The <<0>> is <<0:8>> and <<128>> is <<1:1, 0:7>>

1> <<0, 0:56>> =:= <<0.0/float>>.             
true
2> <<128, 0:56>> =:= <<-0.0/float>>.
true
3> <<0:1, 0:63>> =:= <<0.0/float>>. 
true
4> <<1:1, 0:63>> =:= <<-0.0/float>>.  
true
5> <<0, 0:56>> == <<0:1, 0:63>>.
true
6> <<128, 0:56>> == <<1:1, 0:63>>.
true

For those who do not understand:

My example have:
a) 8 bits of 0 + 56 bits of 0 (<<0, 0:56>> or <<0, 0::56>> in Elixir)
b) 8 bits of 128 + 56 bits of 0 (<<128, 0:56>> or <<128, 0::56>> in Elixir)
Your example have:
a) 1 bit of 0 and 63 bits of 0 (<<0:1, 0:63>> or <<0::1, 0::63>> in Elixir)
b) 1 bit of 1 and 63 bits of 0 (<<1:1, 0:63>> or <<1::1, 0::63>> in Elixir)

So:

1> <<0>> =:= <<0:8>>.
true
2> <<128>> =:= <<1:1, 0:7>>.
true
3> <<0, 0:56>> =:= <<0:1, 0:63>>.
true
4> <<128, 0:56>> =:= <<1:1, 0:7, 0:56>>.
true
5> <<1:1, 0:7, 0:56>> =:= <<1:1, 0:63>>.
true

Also this article may be helpful especially for new developers:

jhogberg · May 8, 2023, 6:38am

No, they work for 0.0 but to be picky they drag in another 7 bits that will either be significant (positive? and negative?) or insignificant (same?) which will give funny results for other numbers. For example, -2.0 would be considered non-negative by negative?, and equal to 2.0 according to same?.

eiji7 · May 8, 2023, 8:28am

ok, now I understand why my code was wrong

I was thinking in good way, but I assumed that all first 8 bits are used to store a sign. It’s why I asked if it would be always 0 or 128. That’s obvious waste of memory, the only relevant is the first bit i.e. <<relevant::1, rest::63>>.

The image says more than thousands words!

s3cur3 · May 10, 2023, 12:26am

I’m not a floating point expert, but, per the Comparison section on the Wikipedia page on signed zero, won’t this break IEE 754 compatibility?

msheldon · May 10, 2023, 2:59am

I’m also not a floating point expert. I’d be interested to hear from the core team on this.

According to https://irem.univ-reunion.fr/IMG/pdf/ieee-754-2008.pdf, “Comparisons shall ignore the sign of zero (so +0 = −0).” So, the standard does indeed say this.

I suspect the idea is that == will still show them equal, and that satisfies the constraint that " negative zero and positive zero should compare as equal with the usual (numerical) comparison operators, like the == operators of C and Java," as the Wikipedia page says. Though Erlang’s == also equates integer 0 and floating point 0.

Erlang’s exact equality operator =:= will distinguish them, but maybe that is regarded as outside the standard’s requirement?

jhogberg · May 10, 2023, 9:20am

No, arithmetic comparison (==, >, =<, et al) will still consider them equal. They will only be considered unequal in exact comparisons (=:=, =/=) which are an entirely different beast.

IEEE754 distinguishes between 0.0 and -0.0 on purpose because it’s actually useful in many contexts, and people have complained about our behavior of randomly losing (or gaining!) the sign of zero many times. The bug referenced in the first post was just the straw that broke the camel’s back.

Why? Consider the following:

foo(X, Y) when X =:= Y ->
    Y;
foo(... snip ...) ->
    ... snip ...

When X and Y are the exact same it shouldn’t matter whether we return X or Y from the first clause: if the compiler deduces that it’s cheaper to return X it should be free to do so, but 0.0 =:= -0.0 makes that impossible because we could return 0.0 when the user wanted -0.0 or vice versa.

This affects all code in the compiler that has to reason about exact equality. We cannot blindly assume that A and B represent the same term unless we know for certain that they’re not a zero float, so the compiler skips these optimizations when this isn’t the case. This is not much fun since the highly dynamic nature of Erlang means that we rarely have this information, but we can live with it.

So far so good, right? We know about this problem and have taken measures to combat it. This is where the linked bug comes in. In our SSA form the code looks like this:

function `t`:`f`(_0, _1) {
0: %% Basic block 0
  @ssa_bool = bif:'=:=' _1, _0
  br @ssa_bool, ^4, ^3

4: %% Basic block 4
  ret `-0.0`

3:
  ret `0.0`
}

We have an optimization for sharing basic blocks that more or less just checks whether two blocks contain the same instructions and de-duplicates them if they do. Because 0.0 =:= -0.0 the return instruction in blocks 3 and 4 are considered equal and are thus shared, returning the wrong result.

To make this respect the sign bit of 0.0 we would have to implement our own comparisons manually to recursively dig into arbitrarily complex terms and treat 0.0 and -0.0 differently. This is not very difficult per se, but the rub is that we did not expect this to be a problem here. To solve it in this manner we would have to identify all the places in the compiler where this is necessary, which would quickly become a game of whack-a-mole that we’re sure would cost a lot of precious time and risks introducing bugs on its own.

As Björn mentioned we’ve considered -0.0 =:= 0.0 to be a bug for quite some time but just haven’t found it big enough of a deal to risk changing until now. People are using floats far more often now than they used to on our platform (the recently launched Elixir Nx being one big user) so we can’t just stick our heads in the sand and pretend it’s not an issue anymore. We’re faced with the options of:

Giving up all attempts to distinguish 0.0 and -0.0, throwing everyone who needs that distinction under the bus.
Having the compiler team bend over backwards trying to make the compiler 0.0-aware everywhere. We are just two people who split our time on many other things as well so we’d rather not have to do this.
Changing exact comparisons (=:=) to consider them different, keeping arithmetic comparison the same.

As we consider the behavior itself a bug and it’s not only the compiler that’s affected, we’re attempting option 3 in the hopes that it won’t cause too many problems elsewhere. If it does we’ll have to backtrack before the OTP 27 release (or perhaps punt it to OTP 28 or even later).

Yes, == and friends will work just as they have before. It’s only exact comparisons that will change.

This is a rather annoying wrinkle: people who have used X =:= Y as both an arithmetic equality and implicit type test cannot blindly rewrite their code to X == Y, and will have to figure out whether the type test is unnecessary or rewrite it as X == Y andalso (is_float(X) == is_float(Y)) or similar.

Judging by a survey of the float-heaviest code bases we know of we don’t expect this to become a problem, but you never know, and we’re well aware that we may have to change our course after the release candidates for OTP 27. We’re making noise now in the hopes that we’ll catch these issues early.

s3cur3 · May 10, 2023, 10:20am

Ahh, understood. I was getting tripped up on the exact-equals. I really appreciate the detailed explanation.

msheldon · May 10, 2023, 5:37pm

Thank you! This is very helpful and interesting.

I am not heavily impacted personally, but I think the long lead time is a good idea.

I’ve probably used =:= or =/= as an implicit type test as you mention. I also give an introductory homework assignment that asks students to solve the quadratic equation. Normal solutions, including mine, pattern match on 0.0 for the coefficient of the x^2 term.

I assume this kind of code will have to go from

f(0.0) -> %...

to

f(X) when X == 0.0 -> %...

(or separate positive and negative 0.0 or do explicit type checks if that matters).

This will also arise in unit tests, because ?assertEqual(0.0, StudentAnswer) currently succeeds for both positive and negative 0.0 and will fail in the future. My unit tests for the above mentioned homework will require some changes (to use, e. g., ?assert(StudentAnswer == 0.0))

For me, it’s probably just a handful of relatively short files to look through, but I can imagine it will take others a fair amount of time and testing.

Again, thank you so much for the clarifications!

asabil · May 10, 2023, 9:04pm

Out of curiosity, would it make sense to introduce a feature in OTP 26.x maybe to preemptively enable this change?

jhogberg · May 11, 2023, 6:22am

This seems to have blown up elsewhere on the internet (hi Hacker News!) and a lot of people misunderstand what this change means. Since I can’t reply everywhere I’ll try to explain it here in a manner that hopefully makes more sense for people who are unfamiliar with Erlang:

Like Prolog before us we only have one data type, terms. This means that the language is practically untyped. There are only terms and while you can certainly categorize them if you wish, there are no types in the sense most people use that term.

Functions have a domain over which terms they operate, and going outside their domain results in an exception that’s often analogous to a type error (for example trying to add a list to an integer) which can be mistaken for having traditional types. To make things more confusing, we have functions that can tell you which pre-defined category a term belongs to, like is_atom returning true for all terms that are in the atom space and false for all terms outside of it.

This mixup is so prevalent that even our documentation refers to these pre-defined categories as “types” despite them being nothing more than value spaces, but it’s important to remember that at the end of the day we only have one data type, and that many functions are defined for all terms.

The arithmetic equality operator (==) returns whether two terms are considered arithmetically equal (for clarity it’s defined for all combinations of terms). Primitive non-numeric terms are simply checked for identity, compound terms are compared recursively, and any numeric terms (floats and integers) are compared according to the rules listed in the documentation. This operator will remain unchanged in the future and 0.0 == -0.0 will continue to hold.

This operator covers just about all common uses, but is not enough for code that needs to reason about terms in the general sense, for example sets, memoization, or other things you would use generics for in another language.

For that we have the exact equality operator (=:=) which returns whether two terms are indistinguishable. That is for any terms X and Y, X =:= Y returns whether f(X) =:= f(Y) for all pure functions f.

Since there exists several f that distinguish f(0.0) from f(-0.0), we either have to conclude that those functions are broken (where does that leave copysign?) or say that 0.0 =:= -0.0 should not return true.

We could make it consistent by removing all the things that let us observe the difference, but that removes functionality that people rely on so it’s not much of an option. So what we’ve done so far is to try to sweep these differences deep under the rug and hoping no one notices, one small patch at a time.

Since we’ve never exposed copysign or the other IEEE functions that allow you to observe the sign of zero, it has kind-of-sort-of worked as long as people stayed entirely within Erlang-land. Unfortunately with the rising popularity of GPGPU in general and the Nx library in particular, this bug has been rearing its ugly head with increasing regularity.

Not only because the compiler flubs the constants 0.0 and -0.0 like in the examples earlier in the thread, but because application code does the same. If you want to memoize the result of the aforementioned f(0.0) it will be confused with f(-0.0) unless you invoke some arcane nonsense to keep them apart.

The compiler is just where this bug is most visible. While we can certainly try to make the compiler distinguish between these values at all times there’s nothing we can do for application code, hence our attempt at breaking backwards compatibility. If it fails we’re most likely going to have to throw in the towel on this bug.

Yes.

The compiler will raise a warning whenever 0.0 is used in this manner, so our hope is that it will not be too difficult to find where to change the code.

Yes, we’ll look into it.

Wondersye · September 22, 2023, 9:11am

Hi,

What would then be the idiomatic way to match both types of zero (as with previous Erlang versions) in expressions like:
[ [ 1.0, 0.0 ], [ 0.0, 1.0 ] ] = matrix:identity( 2 )?

With 26.1, compiler reports:

matrix_test.erl:60:11: matching on the float 0.0 will no longer also match -0.0 in OTP 27. If you specifically intend to match 0.0 alone, write +0.0 instead.
%   60|     [ [ 1.0, 0.0 ], [ 0.0, 1.0 ] ] = matrix:identity( 2 ),

Even [ [ 1.0, Z ], [ Z, 1.0 ] ] when Z == 0.0 would not be that readable (moreover it cannot be a LHS, and would not bear the same semantics, as the two zeros can have different signs and would thus not match a single Z)

Thanks in advance for any hint!

Best regards,

Olivier.

nzok · September 22, 2023, 11:41am

I respectfully suggest that you DON’T want to match both kinds of zero in
[ [ 1.0, 0.0 ], [ 0.0, 1.0 ] ] = matrix:identity( 2 )

because considered as matrices,
[[1.0, 0.0],[ 0.0,1.0]] and
[[1.0,-0.0],[-0.0,1.0]]
are DIFFERENT (and the two variants obtained by flipping just
one sign give you four matrices with different behaviour).
In particular, only the variant with positive zeros is in fact
the identity matrix.

There’s a rule of thumb that we teach beginning programmers:
NEVER COMPARE FLOATING-POINT NUMBERS FOR EQUALITY.

When your program might have to run on machines with base-2,
base-8, or base-16 arithmetic, which might have no guard
digits, some guard digits, or enough guard digits, which
might treat -0.0 as different from +0.0, numerically equal,
or a run-time error, that was essential. To this day,
when people get upset that (1.0/49.0)*49.0 =:= 1.0 is false
(which it is), this advice is worth taking as a starting point.

There are situations, especially in IEEE/IEC arithmetic,
where it makes sense to compare floating-point numbers for
equality. Like avoiding division by zero (which IEEE arithmetic
was designed to make safe, oddly enough.) But in general it is
the kind of thing which should make you stop and think very
carefully, and write down your reasoning in exceptionally clear
comments.

Myself, I’d like an option to warn whenever a pattern (in a
function head, or a case clause, or a match expression)
contains a floating-point literal, and I’d like it enabled by
default.

nzok · September 22, 2023, 11:49am

A further thought on the identity matrix.
Why do you want -0.0 and +0.0 to match but neither of them to match 0?
What if matrix:identity(2) returns [[1,0],[0,1]], which is arguably
the most useful result for it to return?

If you want numeric equality, you’re going to have to use it.
is_identity2([[A,B],[C,D]]) →
A == 1 andalso B == 0 andalso C == 0 andalso D == 1.
There are 36 combinations that make this true…

raimo · September 22, 2023, 11:58am

[ [ 1.0, Z1], [ Z2, 1.0] ] when is_float(Z1), is_float(Z2), Z1 == 0.0, Z2 == 0.0 ->
would be semantically equivalent to the old behaviour.

But as @nzok says, comparing floating point numbers for equality, or even worse, matching them, is probably the wrong thing to do.

It is, for example not unlikely that your code one day will need a rounding delta within which floating point values are considered equal enough. So it might be a good idea to already now write a helper function that tests for the identity matrix, and hides the gory details of what is currently the right thing to do.

Wondersye · September 22, 2023, 2:03pm

Hello Richard, hello Raimo,

Thanks for your answers. I wholeheartedly agree with the fact that in general terms we should not compare floating-point values for strict equality. This is why in the general case I use [1], which boils down to relying on the absolute [2] if not relative [3] difference between each pair of coordinates, based on its absolute value and on an epsilon that is either default or user-specified.
Yet for a unit test I was preferring strict equality to be enforced, as I supposed that zero, one and the identity matrix were well-defined and should just thus exactly match with constants expressed the same way. Moreover I would have felt that otherwise I was testing simultaneously are_equal/2 and identity/1. But I agree that this is a corner case and I can use are_equal/2 there as well.

@Richard: on whether integer coordinates should be allowed (regarding test against 0), I would prefer letting the user specify them as number() coordinates indeed, yet afterwards, for simplicity, handling only float()-based matrices after a construction-like phase (with a new/1 operator, from user matrices to “internal” ones). It may be a tad larger in memory but, at least for matrices of statically-known dimensions (e.g. 4), we can introduce plain matrices, compact (homogeneous) ones and identity_4, manage them in the operators of interest (as in [4]) and possibly spare quite many bytes and CPU cycles. Depending on the data source, most if not all may be already float() anyway.

Thanks to both of you for your guidance!

Olivier.

[1] https://github.com/Olivier-Boudeville/Ceylan-Myriad/blob/9295df8c06455de4bde744d713aada4d6d42e977/src/maths/matrix.erl#L612
[2] https://github.com/Olivier-Boudeville/Ceylan-Myriad/blob/9295df8c06455de4bde744d713aada4d6d42e977/src/maths/math_utils.erl#L534
[3] https://github.com/Olivier-Boudeville/Ceylan-Myriad/blob/9295df8c06455de4bde744d713aada4d6d42e977/src/maths/math_utils.erl#L592
[4] https://github.com/Olivier-Boudeville/Ceylan-Myriad/blob/9295df8c06455de4bde744d713aada4d6d42e977/src/maths/matrix4.erl#L88

raimo · September 22, 2023, 5:09pm

On a side note / anecdote: for the Wings3D project I once wrote a small matrix library subset optimized for sparse matrices where I made use of the fact that float() and integer() are different types. A column was represented as a list of float() and integer() where an integer() meant that number of zero rows (±0.0s).

That representation saves a lot of memory when most columns in a row are zero, and many matrix operations can quickly skip over a bunch of zeros and thus avoid floating point multiplications.