"Suspended" computation - papers / articles?

al2o3cr · July 4, 2023, 7:52pm

I’m working on understanding the classic paper “Purely Functional Data Structures” (Okasaki 1996) that’s referenced in places like the queue module. One thing that seems very un-BEAM-like is the memoization of suspended computations to avoid re-computation. For instance, the accounting in the paper requires the second call here to take O(1) time:

Q = produce_a_queue_that_needs_reversing()
{{value, V1}, Q2} = queue:out(Q)
{{value, V2}, Q3} = queue:out(Q)

but the implementation obviously doesn’t work like that - the first call to queue:out and the second both do exactly the same things since they receive the same Q.

Two questions related to this:

where would I find prior work with “suspension” in BEAM languages? I’d assume there’s no general way to do them that handles cross-process / cross-node forcing, but perhaps something could be done locally
what are your favorite references for functional data structures with the additional requirement of strict evaluation?

eproxus · July 5, 2023, 8:50am

I haven’t dug into what the paper actually refers to but you could implement memoization by always updating your reference. A hypothetical example:

Q1 = produce_a_myqueue_that_needs_reversing()
{{value, V1}, Q2} = myqueue:out(Q1)
{{value, V2}, Q3} = myqueue:out(Q2)

siiky · July 5, 2023, 1:01pm

Unless I’m missing some detail (I wish I am), Erlang doesn’t have mutation on the “functional language” level, so there’s no delay/force (I wish there was!). I think the only way to achieve that, which is very Erlang-like, is with processes: your queue (or any other data structure) is actually a process that does things for you. Then you get everything one’s used to in other languages, such as sharing computation results between threads/processes.

However, at the same time you lose the “functional” part of “functional data structures” – suddenly there is mutation, and foo:bar(X) may give different results at different times.

EDIT: as an example of “futures” kind of thing, see Elixir’s Task. A delay/force is possible too with a similar approach, but I don’t know of any implementation.

juhlig · July 24, 2023, 4:13pm

It has been a while since I read (part of) Okasaki’s paper, and I have forgotten most of it since then. Dunno about Erlang-related papers on the topic, either.

However, “Suspended computation” aka Laziness in Erlang can be achieved by wrapping it in a function. So if you have X and Y and want to add it, but later, you can do…

1> X=1.
1
2> Y=2.
2
3> XPlusY=fun() -> X+Y end.
#Fun<erl_eval.43.3316493>
4> XPlusY().
3

A fun project for lazy sequences that @Maria-12648430 and me did a while ago was GitHub - hnc-agency/lazy: Lazy sequences for Erlang. Gleam has that already built in as far as I remember.

Memoization is another topic. The above doesn’t memoize anything, it always does the addition again whenever you call XPlusY.

From the example you gave, I assume that produce_a_queue_that_needs_reversing() will give you a queue with all the items in the rear list, right? Like, {[], [z, y, x]}?

Giving this to queue:out will have to reverse the rear list and make it the front, ie {[x, y, z], []}, then give you back {x, {[y, z], []}}. With hypothetical memoization, the second call to queue:out should remember that it already saw {[], [z, y, x]} and what it returned in that case, so instead of doing the reversal again, it should just look up and return {x, {[y, z], []}}.

Erlang doesn’t do that. Memoization may be nice, but as they say, there is no free beer. You have to pay. In the case of memo-ization, you pay with memo-ry, ie you have to store inputs and related outputs somewhere, maybe the process dictionary. And even looking things up doesn’t come for free, either, stuff needs to be compared to other stuff etc.

Another way to do it that circumvents storing stuff (semi-)permanently in the process dictionary or elsewhere would be to generate new functions that have the inputs and outputs memoized, and return them along with the value:

1> MemoAddHelper=fun
    F(X, Y, C) when is_map_key({X, Y}, C) ->
        {maps:get({X, Y}, C), fun(X1, Y1) -> F(X1, Y1, C) end};
    F(X, Y, C) ->
        Z=X+Y,
        timer:sleep(1000),
        {Z, fun(X1, Y1) -> F(X1, Y1, C#{{X, Y} => Z}) end}
end.
#Fun<erl_eval.17.3316493>
2> MemoAdd=fun(X, Y) -> MemoAddHelper(X, Y, #{}) end.
#Fun<erl_eval.41.3316493>
3> {_, MemoAdd1}=MemoAdd(1, 2).
% one second passes
{3, #Fun<erl_eval.41.3316493>}
4> MemoAdd1(1, 2).
% instantly
{3, #Fun<erl_eval.41.3316493>}
5> {_, MemoAdd2}=MemoAdd1(2, 3).
% one second passes
{5, #Fun<erl_eval.41.3316493>}
6> MemoAdd2(1, 2).
% instantly
{3, #Fun<erl_eval.41.3316493>}
5> MemoAdd2(2, 3).
% instantly
{5, #Fun<erl_eval.41.3316493>}

Calling MemoAdd with two integers like 1 and 2 will defer to MemoAddHelper, which will add them (and sleep for 1s to simulate that something very complex is going on) and finally give you the result of adding those two integers, 3, and a new function that has the arguments 1 and 2 and the result 3 memoized, in a tuple.

Calling the returned function with 1 and 2 again will instantly return the memoized 3 and another function, in a tuple; calling it with different integers which it hasn’t seen before, like 2 and 3, will again add them (and wait the 1s), then return 5 and a function that also has those new arguments and the associated result memoized, in a tuple.

Etc etc.