Spec clauses - any way except parse transformations to make specs more precise?

Let's say there is a relatively large number of different object types (say, a hundred), and each object type has its own reference type.

Let's assume that the following types are defined:

-type object_ref() :: object_1_ref() | object_2_ref() | ... | object_n_ref().
-type object() :: object_1() | object_2() | ... | object_n().

and that I want to implement a function:

-spec get(object_ref()) -> object().

and its inverse:

-spec get_ref(object()) -> object_ref().

Is there any way, except parse transformations, to make those specs more precise without exhaustively writing out all the boilerplate spec clauses like:

-spec get(object_1_ref()) -> object_1();
         (object_2_ref()) -> object_2();
         ...

I eventually came up with a macro solution where each clause is a macro call, but I still need to invoke that macro for every object type, which is still very cumbersome and prone to mistakes.
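For illustration, a minimal sketch of what such a per-clause macro might look like (the macro name and the `object_*` types are placeholders, not the exact code from my project). The Erlang preprocessor expands macros at the token level, so they can appear inside a `-spec` attribute:

```erlang
%% Hypothetical helper: expands to one overloaded spec clause.
-define(GET_CLAUSE(Ref, Obj), (Ref) -> Obj).

-spec get
    ?GET_CLAUSE(object_1_ref(), object_1());
    ?GET_CLAUSE(object_2_ref(), object_2());
    %% ... one macro call per object type, which is exactly the
    %% boilerplate I would like to avoid.
    ?GET_CLAUSE(object_n_ref(), object_n()).
```

Each clause is now a one-liner, but there is still one call per object type.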

Is there any way to simulate a for-each macro call, or something like that?


I think the more general question is how to type a set of transformations instead of a transformation of a set.

Why rule out parse transformations? This seems like a perfectly valid use case for one.

-spec get(object_1_ref()) -> object_1();
         (object_2_ref()) -> object_2().

and

-spec get(object_1_ref() | object_2_ref()) -> object_1() | object_2().

are, afaik, equivalent – at least to Dialyzer. Not sure if you're after more precision for Dialyzer or for documentation purposes.

I believe that this isn’t true. Look at the following example:

-module(dia).

-export([a/1, t1/0, t2/0]).

-spec a([]) -> ok;
       ([ok,...]) -> error.
a(X) ->
    case length(X) of
        0 -> ok;
        _ -> error
    end.

t1() ->
    ok = a([]).

t2() ->
    ok = a([ok, ok]).

If you run Dialyzer over that, you'll get an error in t2/0 (as you should). If you rewrite the spec of a/1 as a union, you won't get an error.
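For comparison, the union form of the spec above – the version under which Dialyzer no longer flags t2/0 – would be:

```erlang
%% Merged (union) spec: the per-clause correspondence between
%% argument and return type is lost, so a([ok, ok]) returning
%% error is no longer distinguishable from a([]) returning ok.
-spec a([] | [ok, ...]) -> ok | error.
```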

Parse transforms are always a last resort; I just want to make sure there is no better solution that doesn't involve them. There are many ways I could do this with parse transforms.

In simple cases Dialyzer distinguishes them, but in more or less complex projects they do indeed get merged.


Is there any example of when/how/why those get merged? What triggers the merge?

-module(dia).

-export([example1/0, example2/0]).

-type map1() :: #{a1 := atom(), union13() => atom()}.
-type map2() :: #{a2 := atom(), union13() => atom()}.

-type map3() :: #{a1 := atom(), union14() => atom()}.
-type map4() :: #{a2 := atom(), union14() => atom()}.

-type union13() :: a | b | c | d | e | f | g | h | i | j | k | l | m.
-type union14() :: a | b | c | d | e | f | g | h | i | j | k | l | m | n.

-spec bar1
    (map1()) -> a1;
    (map2()) -> a2.
bar1(#{a1 := _}) -> a1;
bar1(#{a2 := _}) -> a2.

-spec example1() -> ok.
example1() ->
    a1 = bar1(#{a1 => a, a => b}),
    a1 = bar1(#{a2 => a}),
    ok.

-spec bar2
    (map3()) -> a1;
    (map4()) -> a2.
bar2(#{a1 := _}) -> a1;
bar2(#{a2 := _}) -> a2.

-spec example2() -> ok.
example2() ->
    a1 = bar2(#{a1 => a, a => b}),
    a1 = bar2(#{a2 => a}),
    ok.

If you run Dialyzer, you get an error for example1, but not for example2. The only difference is the complexity of the types used.

Removing any element from union14() results in an error for example2 as well.


Damn!

set_union(S1, S2) ->
  case ordsets:union(S1, S2) of
    S when length(S) =< ?SET_LIMIT -> S;
    _ -> ?any
  end.

?SET_LIMIT is 13 (which seems quite low, tbh), which is exactly the number of alternatives in union13().

Thanks a lot for the example; it seems like Dialyzer won't help me in this case…

In static analysis of code, it can happen that the analysis loops, computing ever-increasing (more detailed) approximations of some value – a type, in the context of Dialyzer. The analysis may then "jump" to a value that is an upper fixed point of the current value, the fixed point ensuring termination of the analysis.

Search for "widening" and any papers by the Cousots if you want to dive into it.


But is that limit set to ensure termination, or for performance? How could one write an infinite union type (other than integer ranges, which let us express (+|-)inf)?

But is that limit set to ensure termination or for performance?

Both, in practice. Dialyzer iterates to a fixed point to find the appropriate types for your code. Having unboundedly unrolled recursive types would mean the inferred types could keep growing. Also, some operations scale poorly with the size of the types, so keeping a conservative limit on them keeps performance from drastically degrading as a codebase grows.
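As an illustration (a constructed example, not from Dialyzer's own test suite): a function that wraps its result in another tuple on every recursive step has no finite "exact" success typing, so the inference has to cut the unrolling off at some depth:

```erlang
%% Each recursive step adds one level of tuple nesting, so the
%% precise success typing would be the infinite union
%% ok | {ok} | {{ok}} | {{{ok}}} | ...
%% Dialyzer widens to a depth-limited approximation instead of
%% iterating forever.
nest(0) -> ok;
nest(N) when N > 0 -> {nest(N - 1)}.
```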

How could one write an infinite union type (except integer ranges which allow us to express (+|-)inf)?

Depends on precisely what you are looking for. There’s no (or a very high) limit on the size of specs, so you can write very detailed function type definitions, albeit not arbitrarily complex nor all infinite types. Notably though, once any specs are combined with Dialyzer’s inferred success types (and during the inference of the success types themselves), then types are limited in various ways (level of unrolling of recursive types, sizes of sets of integers, atoms, etc., unions of ranges of numbers are coalesced, etc.).

I suppose, as with any such system, there’s some trade-off between expressiveness, soundness and performance, and Dialyzer sits in a fairly unconventional position in that space.

To your original question:

Is there any way except parse transformations to make those specs more precise without exhaustively creating all those boilerplate spec clauses

Whether you manually create all those clauses or have a tool generate them, Dialyzer (and, of course, other tools may do other things) probably won’t benefit from them, due to the various simplifying generalisations it makes. Whilst you can increase those very conservative limits, performance will suffer, so there will always be a point where you want to express a type which is too complex to feasibly exhaustively check with Dialyzer. What’s more, Dialyzer errors can be tricky for a human to trace back to a root cause, so even if Dialyzer could generate a warning, it might still be infeasible to understand and rectify it.

On a positive note, I am working on substantially improving Dialyzer’s performance (talk at CodeBEAM in Berlin next month, if you’re interested!), and I am aware of some techniques to improve the quality of error messages by displaying the reasoning steps Dialyzer took to reach its conclusion, so whilst the fundamental limits will continue to exist, we might well have a better experience in practice in coming releases.
