Any chances to support rational numbers in BEAM?

mat-hek · June 23, 2022, 5:59pm

Hey,

rationals seem like a natural extension of big integers to big floats. In our use case we need them all over the place as we can’t allow accumulating errors. We use an Elixir library for that, but having rationals built in would be a totally next-level experience. What do you think?

DianaOlympos · June 25, 2022, 7:36pm

Not only rational, but fixed precision in general is something i floated before. There is still no good idea on how we would provide a nice syntax for them. If someone had a good proposition around this, i would be happy to help.

nzok · June 26, 2022, 1:41am

The semantics of fixed precision is massively unclear.
I wrote a paper about this, complete with a specification
and test suite, and could not find any venue that was
interested, including the Language Independent Arithmetic
standards people.
The reason I did this was the situation in Smalltalk,
where there is standard syntax for ScaledDecimal numbers.
For example, 123.45s2 is a ScaledDecimal with 2 decimal places.
The standard says that these are fixed point numbers,
and points the LIA standard for the semantics. Yet the LIA
standard in question not only has nothing to say about
fixed point, but explicitly disavows any intention of saying
anything. My Smalltalk implements fixed point as fixed
point, some Smalltalks map it onto IBM/360 decimal fixed
point, and some say in effect “a scaled decimal is an exact
rational whose denominator can be ANY positive integer, and
the scale is just advice about how to print it”. So I looked
at every standard I could find, and there isn’t even any
agreement about whether fixed point is an exact type or an
approximate type. COBOL these days defines fixed point in
terms of decimal floating point.

What, for example, is 1.23 (fixed(2)) * 3.456 (fixed(3))?
Naively, you might expect 4.25088 (fixed(5)), but there are
systems out there where you will get 4.25 (fixed(2)), and
from there on it gets weird.

If you really want to see disagreement, what happens when
you divide one fixed point number by another using /
(rather than div)?

Before asking for fixed point arithmetic in Erlang, we should
be really clear about WHAT we are asking for and WHY that
(rather than some other interpretation of the term) is the
right thing to add.

These days, I’d rather effort went into getting Erlang running
on RP2040s and ESP32-C3s.

nzok · June 26, 2022, 1:45am

I got distracted. Sorry.
SYNTAX for fixed point is EASY.
Just borrow Smalltalk’s + “.” + “s” +
where the number after “s” is the scale, and that job is done.
SEMANTICS for fixed point is hard to agree on.
I can’t even tell if you want binary fixed point or decimal fixed point (Ada and PL/I offer both).

mat-hek · June 26, 2022, 9:40am

Thanks @DianaOlympos @nzok for chiming in. I agree that fixed precision is not trivial to handle, so maybe it’s better just to focus on rationals first? Especially that rationals would already solve some problems that fixed precision solves: you can get arbitrary precision from a rational and building a fixed precision library on top of rationals (choosing arbitrary semantics) would be way easier I guess.

But I think rationals are not that easy to implement as well - to make them work with other numbers, jit etc. I’d love to hear someone’s opinion on that part.

Regarding syntax, is there any discussion on that somewhere? I’m not really into inventing syntax, but of course, I’ll do my best to propose something

So, to create a rational we could have either:

rational(numerator, denominator) function
or numerator // denominator or some other operator
or (numerator)r(denominator) syntax, for example 22r19 would mean 22 over 19. Some other letter could be used as well, but a-f would conflict with hex I guess

Not sure if that’s possible in Erlang for any of these options, but ideally the same syntax could be used for matching on rationals, like
rational(num, denom) = some_rational
or
num // denom = some_rational

However, I don’t think it would be necessary, like you don’t match on mantissa and exponent in the case of floats.

Functions like numerator(rational), denominator(rational), is_rational(term) would also come in useful, and it seems that could suffice. Of course, if all the operators worked with rationals too

nzok · June 26, 2022, 11:10am

Arithmetic on fixed point numbers represented as
(m, s) where m and s are arbitrary integers and
encoding the value m / 10**s is actually pretty
straightforward, provided
(a) you care more about correctness than efficiency
– while caring somewhat about efficiency
(b) you already have arbitrary precision integers
to work with.
I have code written in Smalltalk that I am willing, nay eager,
to share. I also have the specification in Haskell that I
would be tearfully grateful to share.

But that’s on the assumption that what you want is EXACT
decimal numbers, where each number may have its own precision.
And that’s where the hard part comes in: deciding what the
semantics should be.

I actually have two implementations in Smalltalk. The other
one implements just enough of 31-digit packed decimal with a
17th byte holding the scale (0…31), which sounds pretty weird
but it’s there to interoperate with VisualAge Smalltalk, which
uses this IBM mainframe datatype. Like I said, FIRST agree on
the SEMANTICS, and then the rest is actually not that hard.

nzok · June 26, 2022, 11:55am

Rationals are if anything harder to implement than fixed
precision, but there is less disagreement about what they mean.
There are some semantic issues nonetheless.
Is 1/1 exactly identical to 1
or is it just numerically equivalent to it?
(Think of the fun you can have if you associated something
with 1/1 in a map, go looking for it with 1, and don’t find it!)
Is 2/4 exactly identical to 1/2
or are they just numerically equivalent?
Imagine the fun you can have when N and D are positive
integers and float(N/D) works but float((N100)/(D100))
doesn’t, thanks to floating-point overflow.
Semantic questions.
Should introducing rationals change the semantics
of N/D in expressions from its present “answer a float”
to the nicer “answer an integer or rational”?
Semantic questions.

As for integrating them with the built-in operations,
consider the present situation.
$(X)
where $ is a unary operation and X an expression.
There are four cases:
X is an immediate integer
X is a floating-point number
X is a bignum
X is something else.
A naive implementation allocates a new floating-point value in
the heap for every operation with a floating-point result.
HiPE used to work quite hard to avoid this.
Another approach that works nicely on 64-bit machines wes
developed by Andres Valloud, and yields very pleasant speedups
indeed, and that is to use 64-bit floating-point immediates,
where you steal a couple of bits out of the exponent for
tagging, so that extremely large numbers are boxes but most
numbers are not. Either way, avoiding float boxing as long
as you can is a good idea. Let’s ignore that for now.

Are you going to do $(X) with in-line code when X is a bignum?
No, you aren’t. So your in-line code is going to look like
if X is an small integer
compute Y from X
if Y is too big, trap out
if X is a floating point immediate
compute Y from X
if X is anything else, trap out.

Here “trap out” might mean “call a support function using the
cheapest-available calling method” or “invoke a user trap
handler”, whatever works well on your architecture.

Now the trap handler redoes the case analysis, and since there
is only one copy of it, it’s no harder to add cases for ratios
and fixed precision than it was to add cases for bignums and
boxed floats.

Now consider @(X,Y), a binary operation on numbers.
You might do
if X and Y are small integers
compute Z from X and Y

if it’s too big, trap out
if X and Y are immediate floats
compute Z from X and Y
if it doesn’t fit, trap out
trap out
or you might do a 2x2 case analysis. Once again,
you are NOT going to place copies of bignum division
in-line. There are now (immediate integer, bignum,
ratio, fixed, immediate float, box float, other) = 7
cases for each operand, so 49 cases all up, but that
can be simplified and it’s in one place, so no big
problem.

The result is that adding ratios and fixed precision
does bloat up the case analysis in the trap handlers,
but not insuperably, but it doesn’t affect the size
of the in-line code or the time of the fast paths.

There are all sorts of fiddly details to worry about,
but it’s the kind of stuff I call “tedious rather than
difficult”. What’s needed, as always, is

agreement on the SEMANTICS
agreement on the UTILITY of the addition
resources to do the work.

mat-hek · June 26, 2022, 4:08pm

Thanks for all the insights! Good to hear that the overhead may be acceptable. Regarding the semantics, here is my personal view, though somehow validated by using rationals for a while

It should not be exactly identical, just like 1.0 is not exactly identical to 1. Having clear boundaries between types is crucial to avoid confusion IMHO. It can also save you from a situation when something is an integer 99% of the time, but in the remaining 1% of cases it happens to be a rational (obviously in prod ) and thus lead to an error because of some code handling integers only. For this specific reason I convinced the author to change that behaviour in the library we use - Implicit conversion of integer rationals to integers may cause problems · Issue #28 · Qqwy/elixir-rational · GitHub

It shouldn’t be “answer an integer or rational” for the same reason as above. It could be “always answer a rational”, but that would be backwards incompatible. To get a rational, one would use a rational constructor, which could be the num // denom operator (returning always a rational), which seems to fit in this context: we’d have div always returning an integer, / always returning a float and // always returning a rational.

These would have to be exactly identical. I’d even consider reducing fractions after each operation, so that we would never keep something like 2/4. Reducing fractions, even if not each time, is necessary to prevent memory leaks and this requires 2/4 to be identical to 1/2.

That can be avoided by reducing the fraction in the float function itself, no matter which approach is chosen.

nzok · June 27, 2022, 5:46am

Right here we have a pretty fundamental disagreement about semantics.
Considering Common Lisp, Scheme, and Smalltalk
(eq 1/1 1)
(eq? 1/1 1)
1/1 == 1
all come out true. Taking Smalltalk as the example,
there are five classes
Fraction
Integer
SmallInteger
LargePositiveInteger
LargeNegativeInteger
but this is most easily understood as a single type
“rational number” with 4 concrete implementations. This has been
working smoothly for the last 40+ years in Smalltalk and the last
50+ years in Lisp.

In strongly statically typed languages like SML and Haskell, it makes sense for rationals and integers to be different types, just as it makes sense to have a dozen or so incompatible integer types. In Erlang, not so much. Now mathematicians distinguish between infinitely many modular integer GF(p) types, but they are perfectly happy considering Z to be a subset of Q. Do you suppose that might be because it works? I have often wished for a distinction between Z and N in a programming language; the “Elixir” example reminds me forcibly of cases where tests passed because all the numbers happened to be positive, but failed on negative ones. Is anyone going to argue for a distinction between (signed) integers and (non-negative) natural numbers in Erlang as distinct types?

There are arguments for NOT reducing rationals after each operation.
I’m not convinced by them, but they do exist. Bignum division is hard.

I think you missed the point of the float(N/D) example because I made
the mistake of writing float((N100)/(D100)) instead of
float((N100)/(D100+1)). The issue is not reduction but overflow. My Smalltalk library basically does this:
to convert N/D to float:
convert N to Fn * 2Sn where 1/2 <= |Fn| < 1
convert D to Fd * 2Sd where 1/2 <= |Dn| < 1
return (Fn/Fd) * 2**(Sn-Sd)
so that overflow or underflow occur only if the result is genuinely too big or too small. You’d be amazed how many runtimes get this wrong.

mat-hek · June 27, 2022, 8:41am

Ok, so to my understanding what you’re proposing is not to introduce a new type for rationals, but one that would cover both integers and rationals. That seems interesting and way better than having distinct types that have a common intersection (as I understood before). I’m afraid that in this case, we’d have to remove the existing integer type, which would of course break everything. Alternatively, the integer type would have to be extended, but well, it wouldn’t be integer anymore

Anyway, I’d be more than happy to know what the core team thinks of having rationals (in any shape).

bjorng · June 29, 2022, 12:04pm

I find it unlikely that we will ever implement rationals.

Not because we don’t like the idea of rationals, but because we don’t think that they are sufficiently useful for enough applications to be worth the huge effort of both figuring out the semantics and then do the actual implementation. Then there is the future cost of maintaining the implementation and make sure that it continues to work.

AstonJ · July 6, 2022, 1:21pm

6 posts were split to a new topic: Complex numbers in Erlang

mat-hek · July 6, 2022, 9:29am

Since rationals are not going to be supported, I started wondering how we can improve our experience without having them built-in. One of the tricky things is comparison. Erlang’s comparison operators work for all terms, so they’re going to return a boolean no matter what we’re comparing. Thus, when transitioning from integers or floats to rationals (represented for example by a record {:rational, numerator, denominator}), when one forgets to use the dedicated comparison function somewhere, the default one doesn’t fail, but it sometimes returns invalid results… Such a bug may be also hidden behind a call to max or sort. Any ideas on how to prevent that?