Comments on merge request OTP/#8026 - Change documentation to use ExDoc

kuna.prime · January 18, 2024, 3:02pm

I wanted to comment on the OTP pull request "Change documentation to use ExDoc and EEP-59 style documentation":

First of all, I must thank @garazdawi for this huge amount of work. Also, the move from XML to Markdown (.md) files is very welcome progress.

However, IMHO there are two major regressions in the current state of this PR:

1. Readability of the docs is significantly reduced when it comes to function specifications/reference:
Old:

[screenshot: fun_info documentation, old style]

vs. new:

[screenshot: fun_info documentation, new style]

In particular, the font for types (and specs in general) is smaller than the rest of the text, which I believe is semantically wrong, as these are the most important information one looks for in everyday work with the language. Also notice that the fun_info_item() options are no longer listed directly, which is a major inconvenience.
I’m not that familiar with ExDoc; could these things be fixed with better configuration?

2. The docs now depend on Elixir and Elixir tooling to build. Of course, it is not realistic to expect every tool to be bootstrapped, and in fact some already are not (compiler, make, …), but doesn’t this introduce an additional maintenance burden? It also seems to conflict with the reasoning for why rebar3 is not part of OTP (as I remember, the argument was that it has too many external dependencies and would therefore be a major maintenance burden).

So what do the rest of you think? Are these valid concerns?

starbelly · January 18, 2024, 4:44pm

On point two: ex_doc (i.e., the project on GitHub) generates escripts for this task. I don’t know what the OTP team’s plans are for shipping it with OTP (i.e., including it in the OTP repo), but with the escript you don’t have to depend on Elixir, if you will; it’s all in the escript.

garazdawi · January 18, 2024, 7:40pm

I’m no expert in readability; different people will always have different preferences and be used to things looking different ways. One of the things I like about using ExDoc is that there are people better at those things than me looking at it and making decisions. If you want to engage and talk about it, then I suggest opening an issue on the ex_doc GitHub page.

Regarding the inlined types, I agree that for some types it is very nice to have. It is something I’ve been thinking of extending ExDoc to be able to do, but I’ve not gotten around to it yet. A heuristic I think might work well is: if a type is used by only a single spec, then it should be inlined into that function.
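The single-spec inlining heuristic can be sketched in a few lines. This is only a Python illustration under assumptions: the input is a hypothetical mapping from a function (as name/arity) to the set of user-defined type names its -spec references, not ExDoc's actual internal data, and the function and type names in the example are made up.

```python
from collections import defaultdict


def types_to_inline(spec_types):
    """Sketch of the 'used by only a single spec' inlining heuristic.

    spec_types: hypothetical mapping of "name/arity" -> set of type names
    referenced by that function's -spec.
    Returns "name/arity" -> sorted list of types to inline into that
    function's documentation. Functions with nothing to inline are omitted.
    """
    # Record which specs reference each type.
    users = defaultdict(set)
    for spec, types in spec_types.items():
        for type_name in types:
            users[type_name].add(spec)

    # A type referenced by exactly one spec is a candidate for inlining.
    inline = defaultdict(list)
    for type_name, specs in users.items():
        if len(specs) == 1:
            (only_spec,) = specs
            inline[only_spec].append(type_name)

    return {spec: sorted(types) for spec, types in inline.items()}


if __name__ == "__main__":
    # Hypothetical module: handle() is shared, the option types are not.
    specs = {
        "open/2": {"open_option", "handle"},
        "close/1": {"handle"},
        "parse/2": {"parse_option"},
    }
    print(types_to_inline(specs))
    # prints {'open/2': ['open_option'], 'parse/2': ['parse_option']}
```

Counting the users of each type first keeps the check to one pass over the specs. Whether a real implementation should also treat types referenced by other types, or exported types meant for reuse, as "shared" is left open in this sketch.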

Regarding #2, in addition to what @starbelly wrote, I also answered in the PR.

nzok · January 18, 2024, 11:18pm

I agree that the second version is markedly less readable than the first, at least as displayed by gmail.
By “readability” I mean here simply being able to discriminate what the text is, and the aim is contrast between foreground and background. The characters in the second version are thinner and paler (at least as displayed by gmail). This is not a matter of whether I like it or not. It’s not an issue of taste. It’s an issue of it being physically harder to tell what the text is.

The red text screams out in a way that the blue text does not. This is contrast going the other way. But the red text is not the most salient part of the information, so it should not be the first thing that catches the eye. Again, this is not a matter of whether I like it or not. It’s simply that when I look at the second version I can’t attend to anything else until I have read the red text, which is actually a waste of my time. Reserve red for warnings. (This also applies to the change bar in the left margin, which looks as though it’s trying to tell me that this is a deprecated or dangerous function.)

paulo-f-oliveira · January 19, 2024, 4:02pm

ex_doc is also fed by a bunch of CSS files.

Isn’t it possible to override those (maybe ex_doc already allows it - we did it for EDoc at $prevCompany - but I don’t remember if we used functionality from EDoc or simply copied over the generated files)?

Maybe a PR to ex_doc would expose required features and then Erlang/OTP could be styled differently?

juhlig · January 19, 2024, 7:45pm

I think @nzok put it well. I have been wondering why I don’t like the new style. After all, let’s be honest, it does look nice, shinier and more modern than the old. I realized that the new style somehow requires a higher cognitive load to read. The effect of nice, shiny and new will wear off quickly, but the higher cognitive load will mostly remain. After a while, I will only want to know what a described function does and how to use it; I won’t care how nicely and colorfully it is presented to me. In that respect, the current/old style is pretty much perfect and the new one isn’t: it puts too much emphasis on looks at the expense of readability. After all, this is not about selling this to a prospective customer or investor who might be impressed by cool looks; it is for people who have already bought into the thing and need to know how to use it.

Maria-12648430 · January 19, 2024, 8:00pm

I largely agree with what @nzok said and @juhlig elaborated on.

Maybe the EEF could sponsor a project or thesis that focuses on finding out what the best form of documentation is? There are fields of study making things like that their specialty. @peerst?

Making just a simple poll would probably not yield a good result: The “old” Erlangers would probably vote for the old (current) style, simply because that is what Erlang documentation looks like. Newcomers and people coming from Elixir would probably vote for the new style, for the same reason.

The best way probably lies somewhere in the middle, or somewhere completely different, but that would likely not be considered much, not when people are given only the choice between “old or new?”

juhlig · January 19, 2024, 8:01pm

Well put

starbelly · January 20, 2024, 4:34am

This is actually already doable using the before_closing_body_tag config option.

nzok · January 20, 2024, 9:03am

I like examples. A salient example for me is the InterLISP-D manual. Back in the mid to late 1980s I loved it.
I read it cover to cover, and learned everything I needed to know from it. Two days after first reading the
(two!) chapters on graphics, I was able to code up a multi-panel debugger for the software I was working on.
(I have never since found anything as straightforward.) Now that I have it installed on my Linux box, I find the
same documentation unattractive, awkward, and frustrating.

What has changed? My typographical taste certainly has. The text is indented far from the left margin and I do not like that. There is heavy use of horizontal rules in tables of contents, and something visually much lighter would be easier to read. But there are structural issues. It’s not until page 163 that we see a function definition, and not until page 171 that we learn what the form actually does. These days I want more examples, because the most important change is in the way I want to use the documentation. I’m not reading from cover to cover trying to learn everything. I’m trying to answer the question “how do I do THIS”.

This leads me to a problem I have with Erlang’s stdlib. There are too many I/O modules and I can never remember which one I need to read the documentation of. Has anyone considered tool support for ‘union documents’ which let you distribute implementations across several modules for information hiding/maintenance purposes, but put the documentation in a single topically organised file?

Another lesson from the InterLISP-D example is that different users have different needs.
We should, for example, consider the needs of the approximately 10% of male readers who have some degree of colour-blindness.
We should also consider the needs of people reading the documentation using some form of text-to-speech system.
ALL the information should be accessible through the plain text.

But people have different information-accessing needs. Some people want to learn how to use an application (in the Erlang
sense). Some people need to read the overview of a module (possibly a union of modules). Some people need to see some examples of using a particular function. Some need clarity on a type, and want to know what’s in the type and what functions work with it. Some need to follow a topic which is neither a function nor a type nor a module. Some need to see lots of examples. Some will be annoyed by them.

There’s a book I got several decades ago. The title is “Programming as if People Mattered: Friendly Programs, Software Engineering, and Other Noble Delusions”. My copy is >1000km away so if I recall correctly this is the book that described an experiment done when it first became possible to display fancy styled text on a computer screen. The conditions were (existing text vs improved text) x (plain display vs styled display). The result was that users obtained no benefit from the styled display, whereas improving the text reduced the time it took to solve their problems. On the other hand, users liked the styled text better.

So a poll of users might well result in a preference for fancy styled text that they can’t actually read very well.
It’s important to ask “what is the PURPOSE of this documentation system?” Is it to pound one’s chest and say “lookit whut I can do!”
Is it to say “hey, we’re hip with the latest cool fashions”? Is it to help people accomplish tasks, and if so, what people and what tasks?

I shall always be grateful for some lessons Kennita Watson (lead technical writer at Quintus for several years) taught me:
(1) engineers should not be allowed to write user documentation – they know too much
(2) the table of contents and indices are also documentation. Index, index, INDEX!
(3) there should be examples
(4) the examples MUST work
(5) if the documentation and the code disagree, one of them is wrong, and probably both are.

Free text search can to some extent substitute for indexing, but it’s a poor substitute.
(a) It puts words in the examples on the same level as words in the explanations.
(b) It finds words, not concepts. You can put a concept in the index even if the word isn’t on the page.

Personal example… I am resurrecting an old AI programming language. Writing the emulator is fairly straightforward.
Writing the library has been straightforward and educational – the language was capable of far more than I realised
at the time. Writing the compiler has been exceedingly tricky because I am working from documentation and it was
very inadequate on the scope rules. I don’t actually have a manual for the dialect I particularly wanted to
reconstruct because all surviving ones were accidentally destroyed in a fire. I do have several other manuals,
but while the authors were all good at writing clear prose (better than me), they KNEW what the scope rules were
and the only way that the later manuals help is by pointing out that what the old manual DID say was wrong and no
implementation ever did what it said. My current attempt at documenting the scope rules is my third, and when
I’m done I’ll finally be able to finish the compiler. (What’s particularly annoying is that my old University had
a listing of the original compiler, which they have since lost.) I have a useful amount of old code, but none of it
goes anywhere near the corner cases…

Do I need to point out that fancy styling would be of no help to me at all?

Do I need to point out that this perfectly exemplifies Kennita’s “engineers should not be allowed to write documentation – they know too much”?

Do I need to point out that ‘scope’ is not a keyword in the language, nor the name of a function or a type or a module, but that I needed to find everything I could about it in the manuals? If you want an Erlang example, searching for ‘lifetime’ took me to the debugger, not the language (lifetime of a variable) or a network module (lifetime of a connection). Keyword search has its limitations.

So. If you want to propose a new documentation system for Erlang, how about one that makes it easy to generate a concept index for a module? I’ve got cmark and hxindex, but having to write lots of and elements kind of defeats the point of Markdown, no?
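To make the word-versus-concept distinction concrete, here is a small Python sketch. The page titles, page texts and concept tags are made up for illustration (loosely following the ‘lifetime’ example above), and the hand-written tags stand in for whatever markup an author would use; nothing here is an existing OTP, ExDoc, cmark or hxindex API.

```python
from collections import defaultdict


def keyword_search(pages, word):
    """Free-text search: titles of pages whose text contains the word."""
    word = word.lower()
    return sorted(title for title, text in pages.items() if word in text.lower())


def build_concept_index(concept_tags):
    """Invert author-supplied tags (page -> concepts) into concept -> pages."""
    index = defaultdict(set)
    for page, concepts in concept_tags.items():
        for concept in concepts:
            index[concept].add(page)
    # Sorted concepts and sorted pages, like a printed index.
    return {concept: sorted(found) for concept, found in sorted(index.items())}


if __name__ == "__main__":
    # Hypothetical pages: only one of them uses the literal word "lifetime".
    pages = {
        "debugger": "Shows the lifetime of a breakpoint.",
        "variables": "A variable stays bound until the end of its clause.",
        "sockets": "A connection stays open until it is closed.",
    }
    # Hypothetical concept tags chosen by a documentation author.
    tags = {
        "debugger": ["breakpoints"],
        "variables": ["lifetime", "scope"],
        "sockets": ["lifetime"],
    }
    print(keyword_search(pages, "lifetime"))       # ['debugger']
    print(build_concept_index(tags)["lifetime"])   # ['sockets', 'variables']
```

The point of the sketch is only that index entries come from the author, so ‘scope’ can be indexed on a page that never uses the word, while keyword search finds the one page that merely mentions it.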

Maria-12648430 · January 20, 2024, 9:43am

Yes! Well said, I wholeheartedly agree

Maria-12648430 · January 20, 2024, 9:56am

Btw, I think that the actual purpose of the PR we’re talking about here was to make it easier to write documentation and adapt the current documentation to this. This has not been discussed yet.
For that matter, I think that the new system for writing documentation is good and sound; the XML files were a drag.
That, at the same time, the overall look and feel of the generated documentation becomes quite radically different from what it used to be, for better or worse, is another matter. But as that is the case, it must be discussed, too, of course.

jimdigriz · January 20, 2024, 11:41am

You have highlighted the problem. The PR states its mission is to change how documentation is implemented but it then goes on to also change the output.

XML meant any language could parse it and generate the output it wants. Crucially, decorators and context could be included in the form of elements and attributes. That is now lost, though it is debatable whether this actually has a measurable effect outside of OTP; maybe a Language Server implementer would have something to add here.

It is clear this conversion has lost some context or accessibility, and that is what has people coming out of the woodwork.

The positive change of the PR is embedding documentation alongside the code it is referencing which hopefully couples the two more tightly. I suspect no one has any problem with this, if anything it probably was long overdue.

Consider if the PR had instead only tackled inlining the documentation and had (mostly) no effect on the output, and then a second, later PR changed the styling. I suspect no one would care, as it would have been demonstrated that the rendering is independent of the source.

Right now I am not convinced this is the case. It looks as though, while the effect is positive, there was also a negative impact. If this were some other PR, you might have been able to weigh that against other positive effects, but when the implementation has mostly only one purpose (i.e. producing human-readable documentation), it is a problem.

It is worse; just say it out loud, it is okay to do this.

My concern is that I will not be able to just fall back on manpages to sidestep any UX/branding/marketing drive, as now the content is affected; in particular, “what values can I use here in this function?” is harder to answer.

The relatively recent changes to the styling of the documentation really have been good. Visually it was tidied up, but the content did not fundamentally change. I think it was generally an all-round improvement compared to what it was before; I found having a dedicated ‘Data Types’ section useful, and it cut back on possible repetition.

Can someone point to examples of problems with the existing formatting of the documentation? What problem does this PR solve by changing the formatting radically that the old formatting failed to address?

jimdigriz · January 20, 2024, 12:02pm

Not exactly sure funding a social psychology R&D effort to ‘feel’ out how documentation should be would be any more use than just collecting the gripes of those actively using the documentation in a ticket somewhere.

Are there any gripes (about the output not the source format) to be noted here?

The EEF seems to have no bandwidth to put into packaging, which still seems to be a dead end, so asking for an R&D effort in a non-software field is probably too big an ask.

I do 100% agree with your statement that 'old’ies tend to lean towards field tested stuff whilst 'new’ies often prefer the superfluousness of shiny stuff.

Cheers

kuna.prime · January 20, 2024, 12:56pm

I think @Maria-12648430 is completely right in pointing out that the primary purpose of the PR is making writing documentation easier, and I also get that notion from the author of the PR: this was the primary focus of the effort. I would also argue that this new approach achieves that primary goal nicely (mostly), especially the use of Markdown, documentation attributes, and referencing external files for documentation.

leaders are important

Having said that, we must examine the end product and its implications on a larger scale, because OTP is the “core” project for the entire ecosystem, and I would hope that many (most) users look to OTP as inspiration and as a guide to doing things “right”. I cannot stress enough the importance of a leading project making good-quality choices in order to lead by example, even when that slows things down a bit.
Whenever the OTP project does something, we should always ask the following question:

fundamental question of leader project

If OTP is introducing/doing/changing X, what does the world look like if all projects using Erlang are using/doing/changing X?

choice is great, reliability is even better

Choice is essential in order to have a vibrant community and to explore ideas, hopefully with the goal of converging on an optimal solution to whatever problem is interesting at the time. Choice must be enabled and encouraged, but one should be careful to be aware of the distinction between making a choice and enabling a choice.

In my experience, we as developers, system designers, tool makers (and humans, of course) are limited by (at least) two things: time and memory. Life and brain capacity, both unfortunately finite resources. Having tried out multiple strategies for optimizing those resources for myself, I found out that I don’t like surprises, edge cases, exceptions. Those are wasteful, and committing them to memory will make you seem like an “expert” while you are really just becoming a glorified database of infuriating problems that scarred you once before.
The solution is seemingly simple: have tools adhere to rules, have them be simple and algebraic (composable, uniform), make them reliable!

examination

So let me try once again to examine this PR, and in part EEP-59, by looking not at the implementation (which I’m sure is great and fine) but at the end product and its effects: the produced documentation and processes.

Markdown - it is objectively simpler and less powerful than XML, but it seems that we don’t need the full power of XML, and Markdown is much faster to write. This is great; it saves time.
doc attributes - make documentation more structured and easily processed. An amazing thing: enables choice, more reliable/enforceable.
using exdoc - in its current state, ExDoc introduces dependencies that are not easily available (not being directly installable from distribution repos makes it hard; in my case, as a frequent user of GitLab rather than GitHub, it took me a stupid amount of time to find the downloadable escript in the repo, and no, it really does not matter which platform is used). This is a surprise. Also, by the fundamental question above, it follows that all of my projects now depend either on Elixir or on a rebar3 plugin, which also wastes time.
also, by the fundamental question, it follows that the entire Erlang world is now reading docs in the format of the new version, which is less readable (a waste of time). So not being an expert in readability, or suggesting that this is a problem that should be fixed in an external tool, is a choice that IMHO a leading project cannot afford to make (to be clear, this is not criticism of @garazdawi at all).
dumping -spec in documentation - currently specs are dumped directly into the function documentation, which I have also seen in other projects using ExDoc and which, to be honest, infuriates me. Firstly, this introduces useless characters into the documentation (-spec, when), and it also has inconsistent formatting (or maybe it is consistent with the source, I could not care less), meaning that sometimes everything is on one line while other times it is closer to the old format, which is wasteful and not reliable/consistent. But the main problem is that a type specification is not documentation: code is not documentation. Code/type specifications are a conversation with tools; documentation is a conversation with users, and we should be very careful about that. Which brings me to the opportunity to quote this gem that I will use often from now on:

duplication of function signatures - if you look at my original example, “fun_info(Fun, Item)” appears twice for no reason. I assume the first occurrence is the result of the “slogan” feature, which is by itself surprising, and it is wrapped in a code block element, also without any utility. All of this makes reading and parsing the documentation much worse than before, which of course wastes time.

this is so on point:

and variation of that by Bjarne Stroustrup:

Every new (powerful) feature will be overused and misused

We are all susceptible to shiny and new, there is no escape from that, so I hope that this discussion will make all of us reflect on it, in an effort to help the leading project continue to make very considered choices and raise its standards of quality even higher.

We obviously need a new doc tool for the future (replacing multiple tools in OTP with one would also be nice), and maybe the EEF could be the right place/organization to at least try to specify that tool and its format/representation.

nzok · January 21, 2024, 12:20am

So this claim has been made:

Be clear though, the positive change of the PR is embedding documentation alongside the code it is referencing which hopefully couples the two more tightly. I suspect no one has any problem with this, if anything it probably was long overdue.

I respectfully disagree. STRONGLY. Coupling the code and documentation tightly is a BAD idea. I have a very big problem with this. Insisting that the documentation goes in the source code means insisting that there be ONE document, and that is just NUTS. Users and maintainers have DIFFERENT information needs. To continue my old AI programming language example, consider

foldlist2 : (seq(s,n+l), seq(t,n+m) or map(k,s,n+l), map(k,t,n+m)), a, (s, t, a => a) => a

whose implementation is

function foldlist2 : genfold(% applist2 %);

A maintainer needs to know about genfold and why genfold is being used instead of direct code. A user might need to be reminded of the naming conventions for higher order functions and the accumulator-last convention for folds. A maintainer could well be reminded of these things, but once per file, not once per function. A user should be warned that this function was not in the original library; a maintainer has no reason to care about that. And so it goes.

Literate programming focuses on explaining algorithms to people who need to read the source code. It’s information for maintainers and reviewers. As such, it generally makes poor to very poor user documentation. I note that Knuth wrote the TeXbook for users of TeX, but a completely different ‘literate program’ for the source code of TeX. Conversely, my experience has been that JavaDoc not only clutters up source files unbearably, to the point where many people hate to write it, but also produces poor to very poor documentation for maintainers.

Even the order of the documentation needs to be different. Documentation for maintainers needs to respect the internal layering of the functions: if g uses f, then you should expound f before you expound g. But that is not relevant for users. Alphabetic order, as commonly used in the OTP documentation, is great for finding functions, but lousy for learning about them, where a topical order is better.

So no, the very LAST thing any sensible person should want is for the existing OTP user documentation to be replaced by documentation comments in the source code.

juhlig · January 21, 2024, 8:21am

I agree. I spent a few years doing Java, the JavaDoc stuff in the code was pretty much unbearable. User documentation is naturally somewhat verbose and example-laden, which totally clutters the code.

Maria-12648430 · January 21, 2024, 10:11am

Hm, now that you and @nzok put it that way, I cannot but agree… But I wonder if we shouldn’t have had this discussion earlier. As far as I can tell, this ship is about to set sail.

jimdigriz · January 21, 2024, 10:13am

Consider me sold.

kuna.prime · January 21, 2024, 10:36am

I’m also 100% for decoupling code and documentation, but naturally I would say that, in order to have as much help available as possible while writing documentation, it is important that the doc tool understands the code. So in that sense documentation should be coupled to code (references, the ability to interpret and reformat type specs and other module data), but not the other way around.

Also, as far as I can see, there should be three layers of documentation:

User guides → Erlang is doing this part right
references → Erlang is also doing this right most of the time in terms of presentation, but the fact that the docs are embedded in the source code is something we should avoid in the future
comments → and I 100% agree with @nzok here: there are great differences between user and maintainer documentation. While the former should be mostly outside the code but referencing interfaces, the latter can be, maybe should be, in the form of comments demystifying implementation details. Here one should strive to make the code readable, but that’s another topic.

And @Maria-12648430, I would really hope that we can slow that ship down in order to have a superior solution in the future.

Also, once you are in a conversation, it always seems like it should have happened earlier!