Static Typing discussion (split thread)

starbelly · December 11, 2021, 7:09pm

I meant to add… I think a static type system would be a nice-to-have, but I very much agree with what José said here:

kaashyapan · December 12, 2021, 12:24pm

I liked @rvirding’s quoted answer better. They made choices for the problems they were trying to solve at that time.

Erlang and OTP have traditionally served a niche: low-latency distributed applications. As far as I know they haven’t been used extensively in applications where domain modelling is more important than latency, or in applications that require ‘the other kind of scaling’: an army of developers of all skill levels who come and go and are required to work on a legacy code base.

As Anders Hejlsberg said when he introduced TypeScript, they were finding it difficult to scale large JS code bases. Not to mention the Python typing efforts. So no one feels the need for typing until code bases become large and the number of hands touching the code base becomes numerous. An Erlang company that had this problem hired the guy who wrote Purerl. So it is a real problem, and businesses are willing to spend money to solve it.

Just as there were multiple efforts in JS (Flow, TypeScript), in Clojure (Spec, Typed Clojure, Malli), and now in Python, there have been multiple efforts on the BEAM.
My takeaway from observing these projects in other ecosystems:
A typing solution may be had if

a tech giant like Microsoft or WhatsApp funded the development of a dialect of Erlang (as with TypeScript/ReasonML), or
an open source project were officially incubated and blessed by the Erlang foundation.

By all means let the project move as slowly or as quickly as resources allow. Looking at the number of man-hours spent on the projects in the list created by @michallepicki, I can’t help but wonder what might have been if the majority of that effort had been channeled into a single officially ‘blessed’ product that would see wider adoption by virtue of having the backing of the foundation. I think we can learn from other ecosystems and do better than Dialyzer.

Many of the talks by the original Ericsson developers quote examples comparing Erlang networking code performance vs C++ / Java. If the BEAM is to outgrow that niche and see wider adoption in newer domains, it needs static typing. Dynamic typing was good for the problems that were solved back then. There is an entire other class of problems that are better served by static typing.

michallepicki · December 12, 2021, 3:32pm

I imagine it could end up being a very unfocused mess, with most of the people involved not being happy with it

I don’t see the number of different approaches to “typing Erlang” as a bad thing. There are many trade-offs to consider; it is not viable to have a language and a type system with all possible features in the world. And by having one feature and giving up another, each choice limits the expressiveness or changes the ergonomics of the language in a slightly different way. One person may desire type classes, another some kind of typed multimethods or protocols, another OCaml-style functors, and someone else source generators, staging or typed macros. Or any combination of those.

There is also the question of whether you want a “type system for Erlang”, possibly giving up some type system features, maybe even giving up soundness of the type system (like TypeScript does), in order to be able to “type” more of the code that is out there in the wild. If so, then in my opinion the closest thing would look like a mainly structural, optional/gradual type system with advanced type flow analysis (“occurrence typing” / type refinement). I have not worked much with TypeScript, Flow, Pyright, Sorbet, Typed Racket, Typed Clojure or other similar systems, but they come to mind. Gradualizer is an example of such an effort for Erlang.

(as an aside, I wonder how the research on semantic set-theoretic types from polymorphic CDuce could be applied here - here’s a recent online demo and code of an example language that uses CDuce’s typechecking code)

kaashyapan · December 12, 2021, 4:14pm

That is the limitation of the community model: a lot of master’s thesis material and weekend projects are left dangling.
Which is why something guided and promoted by the platform stewards would be beneficial to the entire community as it would have a coherent vision and ensure the longevity of the project.

There are going to be a lot of dissatisfied people even if you implemented all of these

I was sad to see activity on Caramel winding down. Gleam and Purerl are going strong. But I don’t know how many businesses would bet on them.
I had grand hopes when I heard WhatsApp was making a typed dialect of Erlang. Sad to see that winding down as well.

peerst · December 12, 2021, 4:35pm

This is an unproved assumption.

Many computer scientists like static type systems because they are good for writing papers and theses.

If and how much static typing is useful in practice is yet to be demonstrated. We just assume one needs to have static type systems in a language or it’s lacking something.

There certainly is a small benefit in static types in that they catch a certain relatively small subset of bugs.

But there is also a cost. E.g. the evolvability of a system is hindered. Hot code loading and distributed systems are completely unsolved problems for static typing. Productivity is not higher: time is spent either making the types check or making the tests pass. To a first-order approximation it’s the same effort.

So if we don’t take static type systems as a must-have given by the gods, there is a cost associated with them. What if I invest this time in writing more tests? Maybe even property-based tests?

My learning so far: testing > static type systems, kind of unsurprisingly.

Ultimately it boils down to a matter of taste.

What I find cool in Erlang is the pluggable type system: I can annotate types in my code or I can leave it, I can run dialyzer or any other static analyser if I want but I don’t have to.

Definitely not.

a) it long ago escaped the niche and is used for all kinds of things
b) the lack of a static type system is definitely not the problem with the niche. It’s just attracting one kind of developer and chasing away another kind

Dialyzer and Gradualizer are both interesting approaches to non intrusively type check Erlang programs if one wants.

lpil · December 12, 2021, 4:51pm

I’m a typing fanatic, so rather predictably I’m going to wade in and have an opinion.

I think the idea that the value of types comes from catching bugs is a misunderstanding of types, or alternatively, describes the value of a disappointing and not very useful type system.

I believe that ~~types~~ static analysis is a development tool, and its purpose is to lower the cost of maintenance. If it makes development slower then something has gone wrong, and the tool should be discarded.

I think tests do a better job at catching bugs, but they are not as useful for lowering the cost of editing code.
With sufficiently powerful static analysis it should be relatively easy to step into a large and unfamiliar codebase and start making edits, with your editor telling you what you need to do after you’ve told it what kind of change you wish to make.

It’s possible to get a degree of this experience with tests, but they fall down any time you need to make a change to an API or to the behaviour of the program, because they verify that the code does what it was intended to do when the tests were written, not what you intend it to do today. Those two things may be different.

Yes, completely!

This is one of the things that for me makes Gleam, Purerl, etc., exciting for the BEAM. We have a wealth of fantastic dynamically typed languages; if we were to have the same quality and robustness for statically typed languages, we’d have a wider range of programming styles that could draw even more people into our ecosystem.
I frequently get messages from people saying “I’ve always wanted to try Erlang, but the lack of a static type system meant I didn’t feel it was right for me before”. With Gleam, etc, these people are now trying out the BEAM.

The more BEAMers the better!

peerst · December 12, 2021, 5:02pm

I started this argument probably because I miss conferences and drunk discussions about static vs. dynamic in the pub

Nothing against a static analysis tool if it doesn’t get in my way too much.

A refactoring tool in an editor is certainly useful but not tied to static types as the two refactoring tools for Erlang show (wrangler will be back soon BTW).

starbelly · December 12, 2021, 5:17pm

^ that. Having had a background mostly in dynamic languages I’ve never quite understood what static types give me that I’m missing and if anything it would primarily be what you stated above.

Specifically, going into large high-risk code bases (i.e., high risk because of the impact a change would have on millions to billions of users). From conversations I’ve had with many, many people about the practicality of static type systems, it always comes back to this: it’s a development tool that should, in a lot of cases, result in safer refactoring of very large, high-risk code bases.

That said, I myself have never had trouble making changes in very large code bases (millions of LOC), but they weren’t, say, WhatsApp risk level either.

I think what you’re doing with Gleam is awesome FWIW and I happily point people to Gleam when they really want a statically typed BEAM language.

I still think a static type system for Erlang itself would be a nice-to-have, and one I would love to have if 1) it didn’t fundamentally change the language (or change it at all, really) and 2) it didn’t slow me down.

The question as always is, is it possible? Would it be accepted?

Love this discussion overall though

lpil · December 12, 2021, 5:49pm

You and me both!

peerst · December 12, 2021, 8:58pm

That exists in two forms: success typing (Dialyzer) and gradual typing (Gradualizer).
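To make “success typing” concrete, here is a minimal Erlang sketch. The module `success_typing` and the function `half/1` are hypothetical, made up only for illustration. Roughly speaking, Dialyzer reports a call only when it can show the call can never succeed, so a call that fails for some values of the right type slips through.

```erlang
%% Hypothetical module illustrating what a success typing does and does not catch.
-module(success_typing).
-export([half/1]).

%% Only even integers match this clause. The success typing Dialyzer
%% infers is roughly half(integer()) -> integer(), because the function
%% can succeed for some integers. A call such as half(3) fits that type,
%% so it is not reported, even though it fails with function_clause
%% at run time.
half(N) when N rem 2 =:= 0 ->
    N div 2.
```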

All other efforts making Erlang into a static typed language have failed. You can have optional typing, like with the above tools and that’s about it without changing the language fundamentally.

I think efforts building languages different from Erlang with static typing are more useful than messing too much with Erlang itself. Its dynamic typing is one of its strengths; why change it into a completely different language?

starbelly · December 12, 2021, 9:57pm

Yeah, I agree. There’s a reason why I’ve never made noise about this. I’ve brought it up in recent conversations, but mainly because it’s an interesting conversation to have sometimes.

The nice thing about dialyzer is I can opt in and out when I want to (and with granularity). I can choose when I want it to “slow me down”.

AstonJ · December 13, 2021, 12:05am

I think @peerst’s post contains a good argument for why not, and @lpil’s post for why, and like Bryan, it’s the potential improvements to developer experience and productivity that Louis (and others like ODL have previously) highlighted that would make it interesting to me.

Generally though I prefer languages and frameworks to get out of your way and to feel natural and effortless - particularly in the creation stage - when you’re most excited about working on a project and so anything that dampens that or makes the process more mundane than it needs to be isn’t great for people like me. I can however also see why others with large ever-changing apps might prefer the assurances and longer term benefits of static typing.

What might be an interesting discussion is asking what people would most like to see in/for Erlang - ‘better’ performance? Static typing? Better BEAM interop? While one doesn’t mean another has to be ignored, it could be informative and interesting nonetheless - even if to see just how much people would like to see it with the wider picture in mind.

starbelly · December 13, 2021, 12:12am

Indeed. The thread about what should be in OTP is interesting, but I think something more general, rather than specific things you’d like to see, would be good. Spin it up and I will comment.

AstonJ · December 13, 2021, 1:18am

…done: What would you like to see in Erlang or the BEAM world?

michallepicki · December 13, 2021, 9:09am

I am no 10x programmer, but I’ll list some reasons I find static analysis tools / code checkers useful.

A good static analysis tool lowers my cognitive overhead and boosts my confidence in the correctness of code modifications I make. When modifying code in a way that affects some module’s or function’s interface in Erlang or Elixir, I have to think about, and check:

where are all possible code sites this code is being called from
what are the accepted and correctly handled values that I can return
what will be other possible side-effects of the change I am about to make
how will I need to change other code

If there is sufficient test coverage, failing tests may point out a few use-cases I didn’t think about. But even before I get to running tests, the main thing I do is perform textual searches to understand the use-cases and think about whether the modifications I want to make are the right way to go. Running all tests may also be much slower than running type checks. There’s also just no way to have test coverage for all possible execution paths (combinations of data and state of the execution environment). And in the presence of dynamic dispatch (through apply and atom module names being passed around, for dependency injection or other reasons, through functions being passed around, etc.) I can’t rename a function and rely on the Erlang compiler to point out all code sites. And I can’t rely on Dialyzer either, because with its “success typing” approach, it will be happy if there’s any execution path that succeeds. And it will “give up” after hitting sufficient complexity.
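A minimal sketch of the dynamic dispatch described above, assuming a hypothetical module `dyn_dispatch` that receives its handler module as a runtime value. Neither the compiler nor a rename in the handler module will flag these call sites; a missing `handle/1` only surfaces as an `undef` error when the call actually runs.

```erlang
%% Hypothetical module illustrating dynamic dispatch in Erlang.
-module(dyn_dispatch).
-export([call_handler/2, call_handler_apply/2]).

%% The target module arrives as a runtime value (e.g. injected for
%% testing), so the compiler cannot check that Mod:handle/1 exists.
%% Renaming handle/1 in the real handler module is not reported here;
%% the mistake only shows up as an undef error when the call runs.
-spec call_handler(module(), term()) -> term().
call_handler(Mod, Req) ->
    Mod:handle(Req).

%% The same call written with apply/3. apply/3 also accepts the function
%% name as a runtime value, in which case even a textual search for the
%% name can miss the call site.
-spec call_handler_apply(module(), term()) -> term().
call_handler_apply(Mod, Req) ->
    apply(Mod, handle, [Req]).
```

Passing a module that has no `handle/1` (for example `lists`) compiles without complaint and only fails when called.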

(I’m not saying types are a replacement for tests, they are not. I see value in both, and think they complement each other nicely.)

Writing and modifying code is a lot of work. What if there were computer programs that could help me with that?

If I know how, I can use the static analysis tool to do some of the mundane code reading and data flow mapping work for me. And I would love all the help I can get; reading code is hard. I can use a compiler or static analysis tool with more advanced type checking to do the searches and check interfaces for me, so that:

it is faster than running all the tests
it is more reliable in pointing out all code sites affected by interface changes
I can describe and see the effects of an interface change as a type/specification change before I even make the code change

Describing data types and type specifications explicitly also helps me reason about, and model, computation in a succinct way. If a function has a type declared, or my code editor can show me the type of a function with the help of a type checker, I immediately know a lot about a piece of code without reading it fully or executing it. Writing down all data type variants helps me think about and document the possible states my program will be in. The existing Erlang type and spec language can be used for that a bit, but that type language is limiting (it does not even allow you to say “the return type is the same as the type of the first argument”, you have to say “any/term”), and it only serves as documentation (so it will get out of date as the program gets modified by many people, it doesn’t provide any checks by default, and even with Dialyzer the checks are limited).
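As a rough illustration of writing down data type variants with the existing type and spec language, here is a sketch using a hypothetical `conn_state` module. The `-type` lists every state a connection can be in. The spec is documentation that the compiler does not check; Dialyzer can use it if you run it.

```erlang
%% Hypothetical module: the type spells out every state a connection
%% can be in, and the spec documents what describe/1 accepts and returns.
-module(conn_state).
-export([describe/1]).
-export_type([state/0]).

-type state() :: idle
               | {connecting, non_neg_integer()}  % retry attempt number
               | {connected, pid()}.

-spec describe(state()) -> binary().
describe(idle) ->
    <<"idle">>;
describe({connecting, Attempt}) ->
    <<"connecting (attempt ", (integer_to_binary(Attempt))/binary, ")">>;
describe({connected, _Pid}) ->
    <<"connected">>.
```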

It may seem counter-intuitive, but by limiting the code I (and others) can write that the tool will accept, it often helps me write simpler, more focused code. Yes, it will get in my way if I want to write one function that takes integers or atoms or heterogeneous lists of various pieces of data and returns a value of a different data type depending on the day of the month and the state of the database. Often it’s just better to write functions that take one type and return another type.

That said, I am personally not super strict about static types, I only see static analysis as a tool that can help me. I want “escape hatches”. I want to be able to tell the compiler: “trust me this will be the type” or “do a run-time check or crash otherwise, that’s fine” or “I know there could also be an error tuple returned here, but ignore it for now”. I want to be able to throw and handle exceptions because that can actually happen in the run-time. I want to use dynamic programming techniques wherever they are more convenient or result in better performing code (e.g. to avoid converting from and to tagged tuples). Optional typing can probably do that. But also I see value in mixing statically typed and dynamically typed languages like Erlang/Elixir/LFE with Gleam/Purerl/Sesterl in a single project.
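One escape hatch plain Erlang already offers is the assertive match, which amounts to “do a run-time check or crash otherwise”. A minimal sketch, using a hypothetical `escape_hatch` module and a config map with a `port` key chosen only for illustration:

```erlang
%% Hypothetical module: assertive matches as "check at run time or crash".
-module(escape_hatch).
-export([port/1]).

-spec port(#{atom() => term()}) -> pos_integer().
port(Config) ->
    %% Crashes with {badmatch, error} if the key is missing.
    {ok, Port} = maps:find(port, Config),
    %% Crashes with {badmatch, false} if the value is not a positive integer.
    true = is_integer(Port) andalso Port > 0,
    Port.
```

The crash reason names the failed match, which fits the usual let-it-crash style where a supervisor handles the failure.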

peerst · December 13, 2021, 11:33am

Maybe I lack this kind of trust. My confidence doesn’t change by a noticeable amount if I have some static analysis tool.

There is a huge boost in confidence when I can say: I have QuickCheck tests in place.

That points in an interesting direction. Having good code navigation tools is very important when jumping into a new codebase, and for that one doesn’t need static types.

Property based testing can get you there. If one can’t trigger certain execution paths maybe they should be removed?

I don’t see this complementarity. The dynamic dispatch etc. you mentioned further up is one of the things that would be taken away if we really had static types. Otherwise any type checker needs to give up there. Likewise with messages, distribution and hot code loading: what version of which code is sending you this message from another node? Once you try to hedge that in with types, the usefulness of Erlang’s approach breaks down. You can only have one or the other.

https://www.cs.kent.ac.uk/projects/wrangler/Wrangler/Home.html (soon to be back for latest OTP and integrated into Erlang LS thanks to EEF funding)

I read here that you are perfectly fine with the optional typing Erlang provides.

Because if we really would move to a static typed Erlang (the horrors) that would mean that if the types don’t check it doesn’t compile.

The thing with compilers and static types is that they make code generation somewhat simpler when you want static dispatch. And some still think that is necessary for performance. Well, that used to be true in the 8-bit, 16-bit and early 32-bit era, but it’s quite a bogus claim now.

With static dispatch one either gives up on generic programming or ends up with C++ and Rust’s templating garbage that generates all code path variants. This kills all locality, which messes up the usefulness of caches. But caches are a crucial part of performance, and a tight late-binding dynamic dispatch loop like the one at the center of Objective-C/Swift, or a VM that fits in the second-level cache, is actually not so bad in real-world performance.

All these cache effects (direct, looking only at one thread/process, and indirect, driving other stuff out of the cache) are happily ignored by the “Rust/C++ is faster because of static typing” faction.

michallepicki · December 13, 2021, 11:59am

I’m pretty sure there’s still a bit of performance on the table to gain with the Erlang JIT leveraging more type-guided optimizations, e.g. in private functions of a module that has type guards present at the entry points of the module. At least one of the recently merged PRs mentions there’s room for more aggressive integer optimizations. Compile-time type information is needed to do those optimizations. If I as a programmer can help the compiler generate more efficient code by adding type guards in the right places (or type specs that also get turned into run-time checks by a clever compiler), I’d like to be able to do that.
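A sketch of the “type guards at entry points” pattern, using a hypothetical `guarded_sum` module. The guards give the compiler’s type analysis facts it can propagate into the private helper; whether the compiler or the JIT turns that into faster code depends on the OTP release, so treat this as an illustration of the pattern rather than a measured optimization.

```erlang
%% Hypothetical module: type guards at the exported entry point.
-module(guarded_sum).
-export([sum/1]).

-spec sum([integer()]) -> integer().
sum(List) when is_list(List) ->
    sum(List, 0).

%% Private helper: every element is guarded with is_integer/1 and the
%% accumulator starts at 0, so Acc is always an integer here, which is
%% the kind of fact a type-based optimization pass could exploit.
sum([H | T], Acc) when is_integer(H) ->
    sum(T, Acc + H);
sum([], Acc) ->
    Acc.
```

Non-integer elements are rejected with `function_clause` rather than being added.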

robashton · December 13, 2021, 12:12pm

I mean - if there was a thread on these forums that would cause me to join…

I work at a company (and have worked there for nearly a decade now) writing Erlang inside what ended up being a very large code base. How large? Well, showing the code base to anybody at Erlang Solutions or at conferences generally yielded a “My god” reaction, because we’ve just written nearly everything in Erlang, with C sat underneath where needed for interop with all of the various drivers/libraries we end up needing to interface with.

The last two years have been spent moving a lot of this code over to Purescript (Purerl) and Rust, and developing the libraries required in Purescript to make this all possible. A quick look over at GitHub tells me that the first commit to Pinto (the OTP bindings for Purescript) was back in 2019, which is what we use to drive the vast majority of our Purescript-based code (albeit inside our own monadic types or whatever). We have pushed out dozens of ports of the JS-based libraries onto GitHub as well as dozens of libraries binding onto the various Erlang-specific modules (and libraries) to facilitate these efforts. (See: id3as · GitHub)

Why have we done this? Why have we gone to the effort of doing this when we had perfectly good Erlang that has worked and carried on working in deployed client environments over all of these years? The answer is quite simple, in a sufficiently large codebase, no matter how much effort you make to keep things de-coupled, the likelihood of a small change causing unintended side effects tends towards being higher than in smaller code bases.

It’s not like we didn’t leverage the tools available to us at the time, we annotated all of our records with types, the vast swathes of code written in the latter years had specs on all public functions, we ran dialyzer regularly to ensure that we weren’t doing silly things (and dialyzer wouldn’t run on my laptop because it would OOM because of the size of our projects and sometimes I’d wonder if I’d grow old before it finished running or not but I digress). For us, the benefits of an actual typed language were obvious and a couple of years on we’re looking at taking on new client projects and being able to use this new world we’ve created to do so - the peace of mind that we now have when making changes to our code to allow for client requests is something else.

The other day I wrote a whole chunk of code in Rust, and a whole chunk of code in Purescript. The types of my records were used to define structs in Rust, and we have macros to marshal them automatically so that the NIF+FFI layer can just be dumb proxies. It all worked first time, despite being fairly complicated code. I didn’t have to repeatedly run it while I looked for oddities and typos, and in five years when that code is being looked at by somebody else, they won’t need to worry about breaking it because

a) Types
b) Tests

Yes indeed, it’s not all or nothing, just because you have types doesn’t mean you don’t need tests, in our experience you probably just need fewer of them - check the happy path for a sanity check and then check the edge cases because they’re not encodable in the type system.

Every now and then I have the pleasure of diving into our old code bases and “just writing erlang”, it’s highly enjoyable to ditch the mental effort of reasoning about types, but sadly after a few hours it becomes apparent that all I’m doing is delaying the mental effort to the debugging stage when it inevitably turns out that some of my understandings or assumptions were invalid. In a one time piece of work this doesn’t really matter, but if you’re working with a long-term code-base that is in excess of 13 years old in places, this is not an enjoyable place to be spending your time repeatedly.

It’s very frustrating to open up a chunk of code with arcane row types and typeclasses in it and have to start the work of unpicking to work out just how exactly you’re going to make a change, but when you finally get around to making that change - the joy of seeing it “just work” makes it all worth it.

erszcz · December 16, 2021, 6:19pm

Totally! I’d like to be able to throw away the tests, though, and yet keep the confidence thanks to a tool that runs faster and works ahead of time (not “compile time”, as it’s not the compiler…). Ideally, in the background as I type.

Gradualizer is not there yet, but maybe one day? The level of maturity is definitely way behind QuickCheck or PropEr, but you have to start somewhere… It was an enlightening experience (and a bit meta) when I worked on property-based tests for Gradualizer - ideally, we would want the same level of confidence as the tests give, but with writing just the types and specs.

“In the background as I type” actually works quite well already thanks to Erlang LS and the new Gradualizer diagnostics.

starbelly · December 17, 2021, 2:36am

I’d love to see that old code base. I love digging into stuff like that and pondering how things could have been done differently to avoid the problems you alluded to.

Do you have any thoughts on what could have been done differently to avoid the problems you’re thinking of?

Also, how many SLOC was it? Not that it matters entirely. I’ve seen havoc go on in small code bases and it always had to do with everything but the language itself, but every domain is different, which is why I’m interested.

Perhaps an interesting case in point is Erlang/OTP itself. Counting only the lines of Erlang code within lib gives us 1,710,728 lines of code.

To note here is the SLOC for erts:

    1660 text files.
    1479 unique files.
     359 files ignored.

github.com/AlDanial/cloc v 1.90  T=2.17 s (604.3 files/s, 498136.1 lines/s)
---------------------------------------------------------------------------------------
Language                             files          blank        comment           code
---------------------------------------------------------------------------------------
C                                      411          54146          42781         374800
C/C++ Header                           354          23355          36631         111275
Erlang                                 213          15254          12686          93392
SVG                                      2             42              4          85626
XML                                     43           3057            132          63248
C++                                     93          10992           7406          44509
Bourne Shell                            70           4672           4129          30581
make                                    48           1360           1389          19952
SQL                                     23           2547            644          10862
Markdown                                25           1658              0           5630
Assembly                                 4             53              2           4541
Perl                                    12            716            879           3751
D                                        1             70            568           2934
m4                                       1            451            249           2916
Python                                   3             76             97            648
HTML                                     1              6             21            277
YAML                                     2              1              1            105
Windows Resource File                    2             21             50             84
DOS Batch                                1             17              4             55
Bourne Again Shell                       1              6              2             34
Windows Message File                     1              3              8             22
Windows Module Definition                1              0              0              4
---------------------------------------------------------------------------------------
SUM:                                  1312         118503         107683         855246
---------------------------------------------------------------------------------------

And here is the SLOC for lib:

   9479 text files.
    8811 unique files.
    3970 files ignored.

github.com/AlDanial/cloc v 1.90  T=5.28 s (1094.9 files/s, 585903.5 lines/s)
-----------------------------------------------------------------------------------
Language                         files          blank        comment           code
-----------------------------------------------------------------------------------
Erlang                            3922         280407         322071        1710728
XML                                901          38376           2439         384636
C++                                 19           7821           4020          85206
Bourne Shell                        52           8341           6893          52882
C                                  153           8300           7562          50890
make                               352           5868           9199          15463
XSD                                 37            360            223          14486
Lisp                                12           1340           1603           9665
Java                                71           1773           6465           8473
C/C++ Header                        86           1240           2499           7991
XSLT                                 8            971            667           5704
HTML                                26           1258             92           4895
Python                              29           1002              0           4821
m4                                   9            515            490           3179
DTD                                 22            936           1109           2512
Assembly                            28            278              0           1885
SQL                                  3            130              0           1804
CSS                                  6            132             39            853
D                                   14             71            305            821
JavaScript                           7            151             56            747
INI                                  3              0              0            424
Perl                                 2             32            109            365
Markdown                             2             54              0            171
Bourne Again Shell                   2             19              6             97
Elixir                               1              7              7             91
diff                                 2             21            106             87
sed                                  8             30            190             72
DOS Batch                            5              2              0             16
Windows Resource File                1              1              0              3
-----------------------------------------------------------------------------------
SUM:                              5783         359436         366150        2368967
-----------------------------------------------------------------------------------