Comments on merge request OTP/#8026 - Change documentation to use ExDoc

mmin · January 21, 2024, 1:23pm

I don’t see how code and documentation are more tightly coupled in the new documentation format. If you want to write extensive documentation for a module and/or functions, you can write it in a separate file and reference it in the -doc attribute. You have a choice: either inline it if it’s short enough, or move it to a separate file if you believe there is a need. If you want, you don’t even need to couple documentation to modules; you can just include it in the documentation with ex_doc’s extras option.
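A minimal sketch of the two styles, assuming OTP 27 or later for the documentation attributes. The module, its functions, and the Markdown path are hypothetical. The file-based form is shown only as a comment, because the referenced file would have to exist at compile time.

```erlang
-module(shapes).
-moduledoc "Area helpers for simple shapes (hypothetical example module).".
-export([area/1]).

%% Short documentation inlined next to the function it describes.
-doc "Returns the area of a square or a rectangle.".
area({square, Side}) -> Side * Side;
area({rectangle, W, H}) -> W * H.

%% Longer documentation could live in a separate Markdown file instead.
%% The path below is an assumption, resolved relative to this source file:
%%
%% -doc {file, "../doc/src/shapes_area.md"}.
```

Either way the generated docs look the same to the reader; only the place where the text is maintained differs.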

100%, but “user” and “maintainer” are very vague terms and “what they need to know” might change across different projects. There could also be more roles, because why not have different documentation for sysadmins, for devops, and for the security team? I can see that resolved with profile-based documentation that could actually be implemented on top of the current solution. E.g.:

-doc #{scope => user}.
api() ->
    do_stuff().

-doc #{scope => maintainer}.
do_stuff() ->
    some_super_optimized_algorithm().

Then you could run something like rebar3 as maintainer ex_doc and get your maintainers docs. I’d say that maintainer docs should also build user docs, but should not contain it directly. Maybe there could be a link to the user doc for the same function or it could be inlined (that could also be configurable).
That way one could achieve different aspects of documentation.

Yes, but if there is so much to document, with the new docs you can move it to a separate file. In most cases you just put a line or two, and I’d say that it’s not clutter; sometimes it is more convenient to have docs directly in the code. If you want to include examples, edge cases, pitfalls, move them to a separate file.

Back to the real world, most projects will (hopefully) write user docs, while maintainer docs will be contained in a few (if any) md files. We should make writing user docs as smooth and simple as possible, but eventually provide support for different aspects of documentation that could fit big and complex projects.

jimdigriz · January 21, 2024, 2:22pm

I should probably qualify this.

OTP is not my code base. I am not a maintainer. I do not contribute to its running costs.

I am just a moocher.

As such, the implementation is irrelevant to me.

I do though love standing on a soap box…

lawik · January 21, 2024, 6:06pm

Seems like contributions to inline relevant specs would be helpful.

From my experience using the Erlang docs, that type information is indeed very important, so tweaking the ex_doc CSS to give it more focus could be well worth the effort.

I look forward to this change a lot. I read a lot of docs on my phone which is a bad experience under current docs.

As an Elixir dev I will suddenly start reading Erlang code because ex_doc puts a direct link to the source with every piece of function documentation. Power of habit and a low barrier, I guess.

juhlig · January 21, 2024, 6:59pm

To be clear, I’m not talking about the documentation system in general; that is OK, and after all, everyone is still free to choose for their own projects. I’m talking about putting inline documentation in OTP code, which affects all of us to some degree, at least those who want to contribute to OTP. As long as it is for simple, straightforward functions in simple, straightforward modules, putting it inline is OK, I don’t mind. As an example on the other end of the spectrum, take the supervisor module. See the changes the discussed PR inflicts on this module, and then tell me in all honesty that this doesn’t bloat the module a little bit. I don’t mind the moduledoc at the top, but the function docs tear the actual code apart.

mmin · January 21, 2024, 7:16pm

But those changes are made with a tool, not manually. So I believe that, if one takes a bit of their time, moving supervisor’s docs to separate files would be accepted as a PR. Take a look at the lists module: many of its function docs contain just a few lines. Would you extract everything into a separate file?
There could actually be a tool that would extract e.g. function docs larger than X lines to a separate file placed in a docs/<app>/<module> directory with name <fun_name>/<arity>.md, but it takes some time and effort to develop.
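The decision such a tool would make can be sketched roughly as follows. This is only an illustration: the module name, the function names, and the exact path layout are assumptions, and the code merely computes where a doc should go without reading or writing any files.

```erlang
-module(doc_extract).
-export([target/3, line_count/1]).

%% Number of lines in a doc string.
line_count(Doc) ->
    length(string:split(Doc, "\n", all)).

%% Decide where the docs for App:Mod:Fun/Arity should live:
%% 'inline' when short enough, otherwise {file, Path} with
%% Path = docs/<app>/<module>/<fun_name>/<arity>.md (assumed layout).
target({App, Mod, Fun, Arity}, Doc, MaxLines) ->
    case line_count(Doc) > MaxLines of
        false ->
            inline;
        true ->
            File = integer_to_list(Arity) ++ ".md",
            {file, filename:join(["docs",
                                  atom_to_list(App),
                                  atom_to_list(Mod),
                                  atom_to_list(Fun),
                                  File])}
    end.
```

For example, `doc_extract:target({stdlib, lists, foldl, 3}, Doc, 20)` would point at `docs/stdlib/lists/foldl/3.md` for any `Doc` longer than 20 lines.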

nzok · January 22, 2024, 6:54am

Well, this thread is the first I’ve heard about it.
My comments are pretty much bog-standard technical writing ACM SIGDOC stuff.
None of them are original and all of the points are DECADES old.

As for the ship having nearly sailed, BEWARE THE SUNK COST FALLACY.

nzok · January 22, 2024, 7:23am

Here are two points about embedded documentation.
(1) User documentation is best done by technical writers who know enough about the
system but not too much. They should not be changing the code. The engineers
know too much. They should not be changing the documentation. So in an ideal
world, the version control system would not grant engineers write access to
documentation or technical writers write access to code. If your budget does
not stretch to a technical writer for user documentation, bear role-based
access in mind and remember that coder and manual-writer are different roles.

(2) Code and documentation do or should have different revision cycles. If
there is an error in the documentation, it should be possible to fix that
without the build system thinking the code has changed. Now of course it
is possible to program a system where there is a tool that reads the
abstract syntax of a module (filtering out comment and layout changes) and
sorts it into a canonical order (filtering out presentation order changes)
and computes some sort of message digest from that, so that foobar.erl =>
foobar.erl.b2, and drives rebuilding from the message digests. (You’d want
to canonically relabel variables too so that variable name changes were
ignored.) In fact with what’s in OTP it would not be much of a challenge
to do this, but if it has been done nobody bothered to tell me about it.
Conversely, to avoid excess documentation regeneration, you need tools that
strip the code out and just notice changes to the documentation.
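A rough sketch of the digest idea in point (2), using only modules that ship with OTP (erl_scan, erl_parse, erl_anno, crypto). The module name is hypothetical, SHA-256 is an arbitrary choice of digest, and the source is taken as a string to keep the example self-contained. It ignores comments, layout, form order, and -doc/-moduledoc attributes, but it does NOT canonically relabel variables, so renaming a variable still changes the digest.

```erlang
-module(code_digest).
-export([digest/1]).

%% Digest of the code in Source (a string containing Erlang forms).
digest(Source) ->
    %% erl_scan drops comments and whitespace by default.
    {ok, Tokens, _End} = erl_scan:string(Source),
    Forms = [parse(Ts) || Ts <- split_forms(Tokens)],
    %% Drop documentation attributes and erase line information.
    Code = [strip(F) || F <- Forms, not is_doc(F)],
    %% Sorting makes the result independent of presentation order.
    crypto:hash(sha256, term_to_binary(lists:sort(Code))).

%% Split the token list into one list per form, at each dot token.
split_forms(Tokens) -> split_forms(Tokens, [], []).

split_forms([], [], Acc) ->
    lists:reverse(Acc);
split_forms([{dot, _} = Dot | Rest], Cur, Acc) ->
    split_forms(Rest, [], [lists:reverse([Dot | Cur]) | Acc]);
split_forms([T | Rest], Cur, Acc) ->
    split_forms(Rest, [T | Cur], Acc).

parse(Tokens) ->
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.

is_doc({attribute, _, doc, _}) -> true;
is_doc({attribute, _, moduledoc, _}) -> true;
is_doc(_) -> false.

%% Replace every annotation (line number) with the same dummy value.
strip(Form) ->
    erl_parse:map_anno(fun(_) -> erl_anno:new(0) end, Form).
```

A build tool could store this digest next to each module and rebuild only when it changes; the converse check for documentation would hash just the forms that this sketch filters out.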

nzok · January 22, 2024, 7:30am

The topic was REPLACING the existing documentation and documentation system.

The only problem I ever had with the old printed Erlang documentation
was that it desperately needed copy-editing by a native speaker of English.
(And a lack of per-module concept indices, like the mini-indices in the
Stanford GraphBase book.)

So effort put into REPLACING documentation would be better spent proof-reading
and indexing the documentation we already have.

nzok · January 22, 2024, 7:45am

“the rendering is independent of the source.”
Hmm. Documents written using Troff look different from documents written using Scribe look different from documents written using plain TeX look different from documents written using LaTeX look different from documents written using ConTeXT look different from documents written using Markdown look different from HTML (where every browser I’ve tried, even with quite detailed CSS, renders pages to PDF differently) look different from documents written using Typst (yes, Typst is a real wanna-be TeX replacement and can produce some beautiful documents) look different from documents made using Word (and I have never yet seen two word processors render the same RTF or .doc file so that you couldn’t tell which did it).

With advice from a professional typographer, and style sheets that adapt to characteristics of the medium, I have no doubt that a new documentation system could produce documentation that was EASIER to read than what we have. (Under “adapt to … the medium” I note “which font is more readable” depends on whether the text is plain or coloured and whether it’s light text on a dark background or vice versa. It also depends on the reader’s visual acuity, sigh. I’m going to the optometrist on Friday…)

I don’t mind if “new” and “old” look different. I just don’t want “new” being harder to read.

juhlig · January 22, 2024, 8:07am

Each time I revisit this thread, I find myself more agreeing with @nzok et al

Yes. But the supervisor module was just an example. There are tons of other modules around that would need a similar rework. The PR touches >3.000 files, round about half of that are removals of the XML files, that leaves 1.500. They would all have to be reviewed and some of them reworked.

To be honest, I would move all of them into 1 separate file, much as it is now.

And then we have documentation in several places. Some in the code, some in separate files, the latter for no other reason than that they are too long. Having a separate file for each function means that you lose cohesion between them. Many pieces of documentation for one function refer to other functions in the same module, more often than not those are the functions with large explanations. And the functions with short documentation still interrupt the reading of code.

The more I think about it, the less I see any benefit in placing user documentation in the code. Turning it upside down, why don’t we place the code in the documentation files? Absurd, of course. Still, this is what this kinda does.

Maria-12648430 · January 22, 2024, 8:08am

I was just pointing it out

Maria-12648430 · January 22, 2024, 8:40am

My current experience entirely

To elaborate on this… I can see three aspects concerning documentation.

The user. This is who the documentation is written for (the user documentation, to be sure, I’ll leave maintainer documentation etc out of this for now). For him, what counts is to understand what a given function does and how to use it, what other options and related functions exist, pitfalls, examples, all that. What the user does not care about is where the documentation is placed and how it was generated. What he also does care about is the form of the presentation, like indentation, colors and fonts used, all the things that make up readability (and understandability).
The developer. This is the person who writes the code. For him, what counts is how easy it is to browse and navigate the code. Comments help. User documentation is useless and disturbing to him. If he needs user documentation for a function, he can read the generated user documentation, ie go to “user mode”.
The documentation guy (often the developer, but not necessarily so). This is the person in between the developer and the user. He speaks “user language” and the system the code is written in (XML, md, whatever). He ideally has some understanding of the code, but to a much lesser degree than the developer. Ideally, the developer tells him what a function does etc, and he puts it into words understandable by users.

So there, I see no benefit in putting user documentation in or alongside the code.

The user doesn’t care about where what he is reading comes from. He will read the generated document, not look inside the code.
The developer, as long as he is developing at least, does not care about the documentation. If he does need some documentation at all, he will also look at the generated document instead of finding the documentation in the code. Worse for him, the documentation sprinkled across the code disrupts his flow (at least it would seriously disrupt mine).
The documentation guy, for him the location of the documentation matters. But now he will have to do it in the source files, be careful not to accidentally mess some code up by a search-and-replace for instance. Having code between the documentation disrupts his flow. Plus, working in the same files as the developer, they are bound to step on each other’s toes.

To be clear, this statement is made from a person regularly making contributions to OTP. For users running own projects, I don’t care, if putting code and documentation together is what works best for you, it’s ok with me. But for OTP itself following that path is IMO a bad choice.

garazdawi · January 22, 2024, 9:09am

Hello everyone!

Thanks for all of your ideas and opinions!

I’m not going to try to address everything, but rather try to speak to some general themes in the discussion.

The purpose of this change

From my perspective, the purpose is threefold.

Firstly, we want to migrate to a way of writing documentation that makes it easier for experienced and beginner users to contribute to the documentation of Erlang/OTP. This has led us to want to move away from our own XML format to something that many more tend to be familiar with, which we have chosen to be Markdown. This choice poses a problem though: Markdown is a lot more complex to parse than XML, and while Erlang parsers exist we would very much prefer to not have to support one.

That brings us to the second purpose, decreasing the support burden for documentation tools. No-one at the Erlang/OTP team is any good at visual design. If we were to design the documentation it would look like this. Which, although it is functional, cannot be called nice. We don’t work with HTML/CSS on a daily basis, so each time we need to make a change somewhere it takes a lot of time for us. So for us, switching to a tool where we are not the primary maintainers makes a lot of sense. We will of course still need to do the occasional PR to fix things in ExDoc and its dependencies.

And lastly, we want to move towards having fewer documentation tools in the BEAM ecosystem. It thus makes sense to put our efforts into the currently most used tool for documentation.

Regarding docs in code vs separately

I’ve worked most of my career with documentation outside code and I’m a fan of that approach for most of the reasons already mentioned here. Having said that, many other language communities seem to manage just fine with docs in code. Maybe they have a larger tendency to write an API module and move the actual code to supervisor_impl, or maybe Java engineers have better code folding IDEs which means that they can hide code/docs when needed.

It will not be hard to write a tool to move all the docs out of the code again if in the future we decide to do so. What will be hard is migrating from Markdown to anything else in the future. Markdown is a very odd language.

Regarding the styling of ExDoc

As I said before in this thread, if you have concerns I suggest you talk to the primary maintainers of ExDoc. If we make ExDoc better, both the Erlang/OTP docs and any Erlang project that uses ExDoc will get better. We will in due time propose changes that we think make sense, so while in places it will look worse in this PR than the current docs, things will improve with time. This is only a single step in a long journey.

Maria-12648430 · January 22, 2024, 9:28am

I would argue that they just don’t know any better, it is what they grew up with

Anyway, I guess what I was saying earlier is that there is no reason to do it that way if you don’t already have it that way. I just see no good point to doing it (as I elaborated), only bad.

As to that, I have a firm belief that you should not depend on IDEs to make your life liveable. They can make your life much easier, for sure, especially in large projects. But at the same time, any source code should at least be readable/understandable with as much as a simple plain text editor, ie without any help from an IDE. Reading code without syntax highlighting and all is hard enough, so avoiding anything that makes it harder (like documentation between the code) is much appreciated.

That is an upside to things, yes, and the earlier the better IMO. I’m not looking forward to the time until that is done, though.

(Just my two cents)

NAR · January 22, 2024, 10:45am

In my experience even if the user documentation is within the source file, it’s somewhat separated from the code. I mean source files tend to start with the exported (and thus documented) functions, then there’s a comment like %% Internal functions followed by the actual implementation (which does not have user documentation). The exported functions are usually (but not always!) small, e.g. gen_server:call(...) or foo_impl:do_something(...), so the first part of the source file tends to be more documentation, less code. The second part is more code, less user documentation, that’s where the logic is. Don’t forget that we already have inline type specs which also provide documentation and are also “not code” that uses up screen estate.
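The layout described above might look like this in practice. It is a hypothetical module written for illustration (assuming OTP 27 for the -doc attribute), not code taken from OTP: small exported wrappers carry the documentation at the top, and the implementation follows without any user docs.

```erlang
-module(counter).
-behaviour(gen_server).

-export([start_link/0, increment/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% API: small, documented wrappers.

-doc "Starts a counter process with an initial value of 0.".
start_link() ->
    gen_server:start_link(?MODULE, 0, []).

-doc "Increments the counter and returns the new value.".
-spec increment(pid()) -> pos_integer().
increment(Pid) ->
    gen_server:call(Pid, increment).

%% Internal functions: the logic, without user documentation.

init(N) ->
    {ok, N}.

handle_call(increment, _From, N) ->
    {reply, N + 1, N + 1}.

handle_cast(_Msg, N) ->
    {noreply, N}.
```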

nzok · January 22, 2024, 11:30am

“Firstly, we want to migrate to a way of writing documentation that
makes it easier for experienced and beginner users to contribute to
the documentation of Erlang/OTP.”

Back when I first looked at the Erlang XML documentation, I was
thoroughly familiar with SGML and XML and had an abundance of tools
for processing it. The one thing I didn’t have was documentation
about the Erlang DTD - what tags should be used for what in which
contexts. I had XML toolkits of my own in C (still faster than
expat!), Prolog, Erlang, Scheme, Smalltalk…, and writing XML
processing tools using languages like these was about two orders of
magnitude easier than using XSLT. C#, which does the
heavily-marked-up ‘documentation’-in-code thing, uses XML (see for
example

so if you want to go with “some language communities are OK with doc
comments” then you also need to go with “some language communities are
OK with XML for beginner users” as well.

The question is, what does it actually *MEAN* for experienced and
beginner users to contribute to documentation, especially if by this
we mean “to make USEFUL contributions, good quality, to documentation”?
From the perspective of a whiny entitled user, it should mean “help
them to produce better text that is suited to my needs, thank you very
much, and no I can’t afford to pay you anything.” I care that

code fragments should be current and correct (right names, right
punctuation, up to date)
text should use correct word choice, be correctly spelled and
punctuated, be grammatical, and idiomatic.
cross references (explicit or implicit) should refer to the correct
thing (no dead links, internal or external)
where concepts are complex the prose must be simple

So tools like spelling checkers, grammar checkers, the old Unix
‘style(1)’ and ‘diction(1)’, index/link checkers, THAT is the kind of
thing I would see as helping people contribute to documentation.
Templates. Guides. Possibly even VSCode support specialised for
writing and revising documentation (which is simpler if you don’t have
to cope with code at the same time).

I would argue that beginning users should NOT be contributing to
Erlang/OTP documentation directly.
They should be encouraged and helped to contribute suggestions for
improvements to the text, but are the very people who should NOT be
touching the files. If you do want beginners to touch documentation,
that means you MOST CERTAINLY DO NOT want the documentation in the
code!

Which introduces the point that there is no good reason to believe
that something intended to help experienced users contribute to the
documentation will also help beginning users, and vice versa.

kuna.prime · January 22, 2024, 1:02pm

This is exactly the wrong course of action.
So we are changing to a tool that produces a worse format of documentation than we have today, in the hope that it will get better in the future.
And don’t get me wrong, your arguments for the architectural reasons are fine if the OTP team is the sole owner of that new tool, but that’s not the case. Sure, we can hope that there will be some contributions in the “right” direction, but the community cannot, and for that matter the OTP team also cannot, force the issue, and then we are stuck with whatever horrible format the external tool produces (and format is not the only issue, dumping -spec is another one). Then we are back at square one: either we will be forced to tolerate the regression, or the OTP team will again have to take on the burden of producing a correct tool.

To put it simply: first make sure that the tool is right for the job, then transition to it. ExDoc is not the right tool because, no matter how much it simplifies the transition to Markdown or whatever, it produces worse, less readable documentation, and for new and current users, reading documentation is much more important than the question “XML or Markdown”.

garazdawi · January 22, 2024, 1:17pm

We’ve had documentation for the DTD since R11B-5, which is the same release in which the documentation XML files were added to the opensource release tar archive. So the reason why you didn’t have it was because you didn’t find it, not because it was not there.

It is indeed. One of the nice things that comes with using Markdown is that we can dip into the tools that have been developed for it, like Markdown linting and formatting tools. There are also equivalents to diction, style and aspell that can be used.

My intent was not (although I admit it was not clear) to refer to beginners of Erlang, but rather beginners with contributing to us. The main issue there is that we use a different documentation format and tool than everyone else in the community.

garazdawi · January 22, 2024, 1:34pm

Except that now any change we make will benefit all of the BEAM ecosystem, instead of only the Erlang/OTP docs. I think that is worth a (mostly) temporary visual regression.

I have no illusions about the fact that we will be forced to make contributions, in fact if you check the ExDoc repo statistics you will see that I am currently #7 on contributor list and I expect that I will climb. We chose ExDoc because we have a good relationship with the maintainers and expect to continue working with them to make it better for everyone.

So I suppose that what I’m asking for is a bit of patience, the OTP 27 release is still 4 months away and we are going to continue working on this until the last day before the release.

MononcQc · January 22, 2024, 1:57pm

Maria-12648430:

The user. This is who the documentation is written for (the user documentation, to be sure, I’ll leave maintainer documentation etc out of this for now). For him, what counts is to understand what a given function does and how to use it, what other options and related functions exist, pitfalls, examples, all that. What the user does not care about is where the documentation is placed and how it was generated. What he also does care about is the form of the presentation, like indentation, colors and fonts used, all the things that make up readability (and understandability).

The developer. This is the person who writes the code. For him, what counts is how easy it is to browse and navigate the code. Comments help. User documentation is useless and disturbing to him. If he needs user documentation for a function, he can read the generated user documentation, ie go to “user mode”.

The documentation guy (often the developer, but not necessarily so). This is the person in between the developer and the user. He speaks “user language” and the system the code is written in (XML, md, whatever). He ideally has some understanding of the code, but to a much lesser degree than the developer. Ideally, the developer tells him what a function does etc, and he puts it into words understandable by users.

So there, I see no benefit in putting user documentation in or alongside the code.

I would define things a bit differently:

you have users, but they are also fitting into different categories: are they long timers in the community? New joiners? junior engineers or experienced ones? Are they people with or without experience in functional languages? Documentation written for experts (often reference content) tends to have a different shape from documentation written for people who have limited functional programming experience or none at all (often more tutorials)
you have developers who wrote the code the documentation is about. Here again there are differences depending on where you are: is documentation an afterthought? Something added after writing code? Something done as part of writing code? Are the developers receiving any training in technical writing or is any toolchain provided to them?
you have technical writers, maybe. Most places don’t. Some places have them, but you might have one or two per department or company, and they also write public copy for marketing material or end-user documentation not intended for engineers building with what is there. Some places give you one technical writer per team. The lack of actual formally trained technical writers is why we see a term like documentation guy, the same way we tend to have engineers write their own tests and we see fewer and fewer tester roles out there, even if they still exist.

The thing that’s missing in your overview is that very often, these 3 roles are done by the same person: the engineer writing the code has to write the docs for themselves and the rest of their teams, and the same documentation is made available to external users.

If your documentation tool is less pleasant for more perfect results, you’re trading that off on your engineering time for other issues.

This has knock-on effects: if you find yourself receiving contributions from the public, who may already be familiar with other tools but not yours, your maintainership turns into a job of validating commits and writing docs (which the original contributor now has to validate, rather than writing it themselves), for example.

And so while I can understand the arguments going in favor of what is the theoretical ideal for producing the best documentation around, my experience with the Erlang community is that the thing it has the most glaring lack of is people with time and desire to contribute. There’s more work to do than people available to do it. This is true of the core language, of the toolchain, and of the foundation. This doesn’t mean there are no contributors or that the existing ones aren’t appreciated, but that the roadmap maintainers would wish to hit isn’t realistic under their current workforce and contributors.

I’ve not been directly involved in this change but I’ve seen ongoing discussion about it over the course of the last year or so, where most of the contributors aligned on the need for a change in the implementation format but most of the issue was on the redaction format. You can see this more closely in the list of EEPs that were required for that change as well:

EEP 48: Documentation storage and format (~Jan 2018)
EEP 59: Module attributes for documentation (~Feb 2022)

Ignoring the advantages coming with these changes (aligning toolchains across parts of the community, integrating with other testing or publishing tools), this appears to partly be the very frequently seen outcome of decision-making as done per communities of volunteers: The people doing the maintaining and the volunteers involved in the decision-making aren’t the same people as the critics, and the critics present in the room at publication time aren’t the total set of end-users either.

My intent here is not to say “if you don’t want surprises, get involved” because it’s frankly unrealistic to expect everyone to be involved in everything all the time, and I hate the “you had to be there if you want to say anything” line. Some of these changes are seemingly years in the making, and the current Erlang documentation format has been pointed out as a blocker often, albeit by other people than those here, including people submitting changes to Erlang/OTP itself.

Anyway I wanted to point out that the distinct user sets are possibly true in some work environments, but the user/developer/writer distinction does not really exist in this specific case.

If the criticism of the changeset does not recognize this reality, it is criticizing a change without a proper understanding of the workflows it aims to change and correct and instead works on a mistaken ideal that doesn’t exist.