Known ways to impose an explicit order on a set of distinct terms?

(The below was prompted by a real world use case, in which I had to determine the minimum required encoding scheme (labeled by atoms) for a binary, by traversing it and looking at the individual bytes. For the sake of discussion, I have abstracted and abbreviated the problem, in hope to make it simpler for everyone to follow.)


Say you have a limited set of known distinct terms, for example one, two and three.

In implicit Erlang term order, which in this case coincides with alphabetic ordering, their order is one < three < two. Their semantic ordering (unknown to Erlang of course, but known to you), however, is one < two < three.

Implementing functions for determining the min/max of two values for this semantic ordering is very verbose, the best I could come up with at first was something like this for max

max2(three, _) -> three;
max2(_, three) -> three;
max2(two, _) -> two;
max2(_, two) -> two;
max2(one, _) -> one;
max2(_, one) -> one.

… and likewise for min, just with the clauses reversed.
Devising a function that could be used in sorting is similar to the above, just with trues and falses as return values in the proper places.


Not exactly hard, but really tedious. There are 3 terms in this example, what if you have 100? And the function is specific to this very set, what if you have more different ones?

Is there a better, more general way to do this? The best I could come up with is this:

which([T1|_], T1, _T2) -> T1;
which([T2|_], _T1, T2) -> T2;
which([_|Rest], T1, T2) -> which(Rest, T1, T2).

which takes the terms of the set ordered according to the semantic rules as first argument, and the two terms to compare as second and third. It returns the term which appears first in the order list.
min can be realized by giving the order in ascending order, max by giving it in descending order.
A generalized sort function for explicit orders could be something like this:

sort(Order, List) ->
    lists:sort(
        fun (T1, T2) -> which(Order, T1, T2)=:=T1 end,
        List).

So far, this is my best shot. Are there other, better known ways or ideas to achieve this?

4 Likes

If you already know the semantic ordering might something like the following work?
(edited to remove the Item check)

-module(st).

-export([ks/1]).

ks(Items) ->
  OrderMap = #{one   => 1, two   => 2,
               three => 3, four  => 4,
               five  => 5, six   => 6,
               seven => 7, eight => 8,
               nine  => 9, ten   => 10},
  SortFun = fun(A, B) -> maps:get(A, OrderMap) =< maps:get(B, OrderMap) end,
  lists:sort(SortFun, Items).
1> c(st).
{ok,st}
2> st:ks([nine,ten,eight,one,three,two]).
[one,two,three,eight,nine,ten]
4 Likes

alternate with an ordermap generator (edited to remove the Item check)

ks(Order, Items) ->
  OrderMap = maps:from_list(lists:zip(Order, lists:seq(1, length(Order)))),
  SortFun = fun(A, B) -> maps:get(A, OrderMap) =< maps:get(B, OrderMap) end,
  lists:sort(SortFun, Items).

4> st:ks([one,two,three,four,five,six,seven,eight,nine,ten], [nine,ten,eight,one,three,two]).
[one,two,three,eight,nine,ten]
4 Likes

Yes, that would work, but it is way more elaborate than what I was imagining :sweat_smile:

It also has the downside that you have to renumber the mappings if you want to insert another term in the middle, or resort to floats. With number names that’s unlikely, so think of preferences:

#{milk => 1,
  water => 2,
  coffee => 3}

To insert orange_juice between milk and water, you’ll have to renumber water and coffee, or assign 1.5 to orange_juice.

3 Likes

Ok, that solves that concern :wink: And it is shorter than the previous version, too. Ok :slight_smile:

As I’m interested in reaping more ideas and opinions, I’ll not mark this as the solution, though (sorry), as this would prevent others from looking at the thread :blush:

3 Likes

@LeonardB I have done some napkin benchmarks.

For min/max of two values, my solution is about 3 times as fast as an adapted version of yours. For sorting, your version is about 4 times as fast as mine. The differences become more pronounced as the sets of possible values get larger.

Both results are not very surprising, though. For min/max, the zipping and map conversion in your solution poses an overhead that my solution avoids. For sorting, the fact that the zipping and map conversion has to be done only once at the beginning while the actual comparisons in the individual sorting steps use fast map lookups comes in beneficial, whereas my solution has to do list traversals in each step.

So I guess it’s mix-and-match. If you need min/max you should use my solution, if you need sorting you should use yours. It is noteworthy, however, that the behavior differs when a value is present multiple times in the semantic order, like [ a, b, c, b, d ]. My solution will use the first occurrence (resulting in the order a < b < c < d) whereas yours will use the last occurrence (resulting in the order a < c < b < d).

2 Likes

Not unexpected that it works better for larger lists being sorted.

I’m not sure that allowing repeated items in the semantic order should be allowed since ordering should imply a specific and unique position for a key in the index.

This could be prevented by changing how the OrderMap is generated, but would add more overhead.

ks(Order, Items) ->
  OrderMap = maps:from_list(lists:zip(Order, lists:seq(1, length(Order)))),
  SortFun = fun(A, B) -> maps:get(A, OrderMap) =< maps:get(B, OrderMap) end,
  lists:sort(SortFun, Items).

ks2(Order, Items) ->
  OrderMap = mk_ordermap(Order),
  SortFun = fun(A, B) -> maps:get(A, OrderMap) =< maps:get(B, OrderMap) end,
  lists:sort(SortFun, Items).
  
mk_ordermap(Order) ->
    {_, Res} =
      lists:foldl(
        fun(K, {X, M}) when not is_map_key(K,M) ->
             {X + 1, M#{K => X}};
           (_K, MX) ->
             MX
        end, {0, #{}}, Order),
    Res.
2 Likes

I wouldn’t really mind very much if a value is in the order multiple times as long as the behavior is consistent (and documented). It makes no sense, but does no harm and coincides with the “feel” of how such things generally work out in Erlang.

That said, there is an easier way to achieve this than resorting to a fold: Either reverse the order and zip up with a descending seq, or reverse the zipped list before passing it to maps:from_list (there is probably no difference in performance between the two):

mk_ordermap(Order) ->
    maps:to_list(
        lists:reverse(
            lists:zip(
                Order,
                lists:seq(1, length(Order))
            )
        )
    ).

For the record, here are my slightly revised versions of min and max:

min(_, T, T) -> T;
min(Order, T1, T2) -> min1(Order, T1, T2).

min1([T1|_], T1, _) -> T1;
min1([T2|_], _, T2) -> T2;
min1([_|Rest], T1, T2) -> min1(Rest, T1, T2);
min1([], T1, T2) -> min(T1, T2).

max(_, T, T) -> T;
max(Order, T1, T2) -> max1(Order, T1, T2).

max1([T1|_], T1, T2) -> T2;
max1([T2|_], T1, T2) -> T1;
max1([_|Rest], T1, T2) -> max1(Rest, T1, T2);
max1([], T1, T2) -> max(T1, T2).

This implementation is relaxed when one or both terms are not present in the order, such that a term that is not in the order is greater than any term that is in the order, and that if both terms are not in the order their Erlang term order applies. While this is a loophole (and the sort function does not have that), I personally am willing to accept it to avoid the otherwise necessary overhead.

3 Likes

To get to the same behavior for sort as described for min and max at the end of my last post, I guess something like this would be required:

ks3(Order, Items) ->
    OrderMap = mk_ordermap(Order),
    SortFun = fun (A, B) ->
        case OrderMap of
            #{A := OrderA, B := OrderB} -> OrderA =< OrderB;
            #{A := _} -> true;
            #{B := _} -> false;
            #{} -> A =< B
        end
    end,
    lists:sort(SortFun, Items).
3 Likes