Total term order

Maps have a defined order independent of implementation.
The changed ordering in OTP 26 of atom keys in maps is a totally internal affair. The only visible effect is the undefined traversal order exposed by io:format, maps:iterator and the external format.

4 Likes

We can’t, not without a breaking change. Honestly… maybe something we have to accept, maybe by raising or warning on it. At some point, it is ok to say that old code needs work to compile on modern OTP. We broke old rebar forcing a lot of people to port it to rebar3 a few years ago after all.

To go back to the original question: Map key order, BIF, no new operators, and yes, please break or yell on comparing floats and integers. Erlang is strongly typed, and this particular behavior with floats have been a source of bugs multiple time for me.

1 Like

I’m a little concerned about the phrasing of things here, which may confuse things more than they should have to. The order defined by the operators <, =<, >, >=, ==, and /= is an arithmetic term order (== being arithmetic equality), and it is a total order, precisely because it does not separate integers from floats and considers the integer and float forms of the same number to be the same object. This is great for how humans work with numbers and orderings; both individual numbers as well as tuples of numbers or lists of numbers get the order you expect, and it’s total. Nothing peculiar at all about it.

Only when doing computery things is this often not what you want. In data structures we want fast comparisons on keys, and not have to consider whether we should promote an int to a float just for comparing. We probably do not care about the order between floats and ints, since ints are common as keys, while using floats as keys is rarely a good idea except for things like memoizing, and we typically want to have separate entries for 3 and 3.0. The built-in pattern matching behaves like this, using the exact equality operators =:= and =/=.

It is when we combine < and ´>´ with =:= that we don’t get a total order:

    case X of
        Y -> equal;    # uses =:=
        _ when X > Y -> greater;
        _ when X < Y -> less
    end

if X is 3 and Y is 3.0, none of the clauses match, because we have a mix of operators. What we need is an alternative total order, so we don’t end up with this mix, with the new operators :< and >:. Using these in the above code, one clause would always match, since we’d have integer() < float() (in the map key ordering).

I don’t think the “glitch at zero” solution is something to be sad about. I have seen other functional languages do the exact same thing in order to get a total order on floats. I also think that the maps ordering on keys is quite perfect as it is, and that there is nothing weird in saying that one map is less than some other map, any more than saying one tuple {2, 1} is less than another tuple {2, 2}, given that there is an ordering on the keys.

I think that the candidate named “the exact term order” in the proposal, which would consider an integer to be infinitesimally greater than the corresponding float, would cause too many problems. For one thing, you’d get the unwanted result that converting maps into lists of keys/value pairs would produce a different order than when comparing them as maps. And for several “computery” applications, you’d want integers and floats grouped separately by a sort, so you’d have to implement yet another order anyway on top of this.

5 Likes

Precisely. If the =:= and =/= operators had not been introduced, it would have been “impossible” (we would have to be without is_float() and is_integer() too, and match them as equal) to distinguish between a float == an integer. And now when we have them there is no total order they (and pattern matching) are part of.

I think it is less obvious since a tuple has got a very natural element order numbered 1…N, so defining a sort order is easy. For a data structure intended for lookup access, key order as well as sort order is uninteresting. A sort order has to be constructed/forced, and there are multiple ways with different trade-offs.

If that had been the case, the map key order would have been a stronger contestant for me. But now to sort maps by converting them to simpler terms, since they are sorted on size first then keys before values, I suppose you would have to use something like:

MX =
    lists:unzip(
        lists:sort(
            fun (KV1, KV2) -> element(1, KV1) =:< element(1, KV2) end,
            maps:to_list(M)),
{list_to_tuple(element(1, MX), list_to_tuple(element(2, MX))

and then compare these terms according to the maps key order.
That conversion is unfortunately not as useful as maps:to_list(M).

1 Like

It is important to have comparison operations that behave themselves sensibly
for numbers.

It is useful to have a total order on terms.

But what are the fundamental requirements for ordering?
(1) X =< X
(2) X =< Y and Y =< Z implies X =< Z
(3) X =< Y and Y =< X implies X equals Z
(4) X =< Y or Y =< X
(5) X equals X
(6) X equals Y and Y equals Z implies X equals Z.

Modulo some rather desperate hand-waving over “equals”,
this is true of the arithmetic order. The equivalence
induced by =< in Erlang puts {0,-0.0,0.0} in the same
equivalence class, but that’s OK.

When one of X,Y is exact (an integer, say) and the
other is inexact (a float, say), it is important to
compare them by converting the inexact number to an
exact one rather than the other way around. Otherwise
you can easily break (6).

But we’re not talking about total orders in general
at this point, but an ARITHMETIC total order. And for
those we have expectations like
(7) X - Y equals 0 implies X equals Y.
And here, I’m afraid, Erlang’s arithmetic order fails.

Consider X = 255 + 1, Y = 2.055.
Then X-Y == 0 but it is not true that X == Y.

Ada and Haskell forbid mixing integers and floats
without explicit conversions. It’s a very good idea,
even for dynamic programming languages.

The problem comes when we confuse arithmetic equality
with identity. We expect that for any pure F,
(8) X equals Y implies F(X) equals F(Y)
and that just isn’t true in Erlang for several
built-in F when equals is taken to be == .

We are driven to the conclusion that
the arithmetic ordering in Erlang
IS a total order
but is NOT an arithmetic total order.
due in large part to mashing up integers and floats.

As for the term order, the only thing that really really matters
is that ALL uses of it be CONSISTENT with the rules and with each
other.

It is important to compare ex

1 Like

Curse this formatted mail nonsense!
Consider X = 255 + 1, Y = 2.055.
Was, of course, X = 2 “raised to the power” 55 + 1,
Y = 2.0 “raised to the power” 55.

1 Like

I confirm the behaviour:

1> X = 1 bsl 53.
9007199254740992
2> Y = float(X).
9.007199254740992e15
3> Z = X + 1.
9007199254740993
4> X - Y == 0.
true
5> X == Y.
true
6> Z - Y == 0.
true
7> Z == Y.
false

I would say that

is an assumption about arithmetic operations that we cannot expect to hold for the arithmetic model that we have.

Floating point values are exact, but floating point operations may be inexact.

Even a simple operation like Z - Y may cause rounding that produces exactly 0.0, while the values of Z and Y do not compare equal. In this case it is the implicit conversion from integer to float of Z that rounds off one bit.

The floating point value 2.0^53 (math:pow(2.0, 53)) has got an exact value that is equal to the integer 1 bsl 53 and should not be equal to any other integer. Therefore the floating point value is converted to an exact integer representation when being compared to an integer, if possible, to not break your (6), as you say.

I would say that the arithmetic ordering in Erlang IS an arithmetic total order, but floating point operations such as automatic conversion from integer to float may cause rounding errors.

1 Like

Say… did anything come out of this proposal? :sweat_smile:

On a related but different topic: wouldn’t it be a good idea to have a compare function alongside the comparison operators, which would return whether the first argument is less, equal or greater than the second? Something like…

compare(X, Y) when X < Y ->
    less;
compare(X, Y) when X > Y ->
    greater;
compare(_X, _Y) ->
    equal.

… only as a more efficient BIF. AFAICT, there is already something like that, internally.

Where could this be useful? If I look at the lists:usort/1 code for example, I see many occurences of “… is X>Y? if not, is X==Y? …”, that is, two comparisons are necessary if X=<Y. If X and Y are for example long lists which are equal or only differ in their last elements, comparisons are expensive because all elements up to the first different one are compared. And in (u)sorting, comparisons are done a lot.

Where could this be useful? If I look at the lists:usort/1 code for example, I see many occurences of “… is X>Y? if not, is X==Y? …”, that is, two comparisons are necessary if X=<Y. If X and Y are for example long lists which are equal or only differ in their last elements, comparisons are expensive because all elements up to the first different one are compared. And in (u)sorting, comparisons are done a lot.

The compiler should be able to compile clauses that only differ in such term comparison expressions into a single cmp call.

I’m not sure I’d want to lift that into the language though.

if X > Y → A; X < Y → B; true → C end

is better IMO than

case cmp(X, Y) of greater → A; less → B; _ → C end

The problem with compiler optimization magic is that it is largely undocumented, including the circumstances when it will (not) happen. The preconditions may change only so slightly, or the optimization might be gone altogether tomorrow.

Aside from that, I agree that the first looks better than the second :wink:

case compare x y of
LT → …
GT → …
EQ → …
is how you would do it in Haskell; I really cannot see the two-comparison version as anything other than an ugly approximation to this.
I was and am a big fan of compare/3 in Prolog and never did understand why Erlang didn’t pick this up.

Now that you put it that way… :smile:

For me, I think both notations (!) have their merits.

The two (or three, if you want to be really strict) -comparison way makes it pretty clear, even without looking at the surrounding (compare...) context, when what will happen.

The compare notation is by and large just as good, depending on circumstances possibly even more expressive. Depending on circumstances again, possibly less.

That said, I would like to have the choice of both.


When it comes to runtime characteristics such as what we generally term “performance”, it is pretty clear just from looking at it that the compare approach can go with less actual comparison work than the multi-comparison one.

The compiler may be able to massage the latter to be like the former performance-wise, but unless you know about it, the latter (“tell me how the first argument compares to the second”) always at least looks inferior to the former (“tell me if the first argument is less than the second, and if not tell me if the first argument is equal to the second”).

Compiler optimizations (not) happening in the background without anybody (at the user level) knowing about them is always a fickle thing. Small changes in the code may stop them from happening, or they may be all gone by tomorrow because they got in the way of something.

That said, I prefer something that is reliably slow over something that may be fast if the stars are right any day :woman_shrugging:

Maybe @raimo and/or @bjorng can shed some light on the background here? *poke*

The thing about “compare” is that it is intention-revealing.
if X < Y →
; Y < X->
; true →
end
is not particularly intention-revealing. You have to read it carefully. It is easy to get
wrong (the first time I typed the example about the second line read “; X < Y” oops).
There are arbitrary choices to be made. For example,
if X =:= Y →
; x < Y →
; true →
end
might be marginally faster: when comparing two atoms =:= can fail fast; when comparing two
tuples or binaries =:= can fail fast if they are different lengths. And there is the
choice I always find confusing in Erlang, whether to use =:= or == or make it less
efficient and put it as the default. With “compare” there is one obvious way to do it.

What’s more, the “compare” approach can be extended in a natural way to comparisons
(such as IEEE floating-point comparisons) where there is a fourth possibility, order
undefined.

Ignoring the =:= vs == question, I can think of about 20 different ways to express
“A if X < Y or B if X > Y or C if X = Y” using 2-way comparisons, with slight but
detectable performance differences.

Mind you, I’ve been burned a few times by languages where testing whether
X =< Y runs you into “not(Y < X)” and (X < Y or X = Y) can give different answers.

I don’t disagree :wink:

It’s not an either- this-way-or-that question, too. compare would not stand in the way of using </==/> and vice versa, so a user could chose whichever (he thinks) best fits a particular use case.

Only awkward thing I see is that, given that we do have == and =:= (and maybe :< etc coming), we would also need a strict (=:=-ish) and a non-strict (==-ish) version of compare :woman_shrugging:

And maybe also the maps key order version