Comments on merge request OTP/#8026 - Change documentation to use ExDoc

NelsonVides · January 22, 2024, 2:04pm

My grain of salt here.

TL;DR: Perhaps, just maybe, a PR achieving three different things is too big and the whole change can be made incrementally. But overall I’m greatly in favour of the proposal.

Edit, TL;DR: @MononcQc just gave an awesome, detailed breakdown of this whole thing: Comments on merge request OTP/#8026 - Change documentation to use ExDoc - #40 by MononcQc

Markdown >>> XML

A million times. It is a ubiquitous markup language, widely used across IT communities all over the world. It is pleasant and very easy to use for both the reader and the writer, without tools. This makes the support burden way smaller:

I don’t need to install XML tools on my system, which I’ve never needed for anything else, only to build documentation for OTP. Markdown support comes out of the box with the build system in my local build, without global installs.
It is much easier for new contributors, as there is no DTD to learn and no closing tags to be careful with. If I simply want to throw in a quick PR fixing some typos, it’s no effort at all, whereas with the current documentation I’ve been put off from doing that way too many times.
It is readable even without pretty-printing: just reading Markdown in a raw text editor clearly shows me the structure of the text, whereas XML takes a huge cognitive load to find closing tags and not get confused by nesting levels. Yes, yes, there are tools that help with XML, but that’s the point: XML is so unfriendly that it needs tools.
For example, I just wrote this whole answer in Markdown, as we all do here.

ExDoc

Yes, the great benefit here is joining forces across the whole BEAM ecosystem, and I think that is a very, very big point, worth the hassle of dealing with tooling for a while.

Docs in code or separately

Here I agree there’s a difference in the purpose of the documentation. One example I come back to very often is gen_statem (because I’m a great fan of it!): compare the reference manual docs (Erlang -- gen_statem) with the user guide docs (Erlang -- gen_statem Behaviour).

The first is about function details, and that fits perfectly as a header on the very functions I’m interested in. The maintainer of a function must know its contract and how to use it, and be able to explain that right next to the function. Details that don’t belong in the documentation can simply be comments in the code: the internal hows of a function are not something its users need to read, only its maintainers, and maintainers are precisely the ones best served by good old comments inside the function.

The second is a user guide on how to use the whole thing, and that would be a lot of clutter inside the module doc. Fortunately, ExDoc supports adding extras; see how rebar3_ex_doc itself does it: rebar3_ex_doc/rebar.config at cb6bc51928c0b4efd2b1081781466b9822f8e403 · starbelly/rebar3_ex_doc · GitHub
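For illustration, a minimal rebar.config sketch of that split, assuming the rebar3_ex_doc plugin and its `extras`, `main`, and `source_url` options. The guide file name and the repository URL are hypothetical placeholders, not taken from any real project:

```erlang
%% rebar.config (sketch; file names and URL are hypothetical)

%% Fetch rebar3_ex_doc as a project plugin, the same way Hex deps are fetched.
{project_plugins, [rebar3_ex_doc]}.

{ex_doc, [
    %% Extra pages rendered alongside the per-module reference docs,
    %% e.g. a README plus a separate user guide.
    {extras, ["README.md", "guides/user_guide.md"]},
    %% Page used as the landing page of the generated docs.
    {main, "README.md"},
    %% Used to build links from the docs back to the source code.
    {source_url, "https://github.com/example/my_app"}
]}.
```

With something like this in place, `rebar3 ex_doc` generates the reference docs from the modules and renders the guide as a separate page, so the user guide never has to live inside a module doc.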

specs

Removing -specs? Wait, why? I hadn’t seen that one and I’d strongly vote against it, but maybe there’s a good reason. Dunno, why is it happening?

What (I think) we could do

Split the PR. One PR migrating XML to Markdown: is there any tool that would build documentation that looks the same as what we already have, but from docs written in Markdown as annotations on functions? To me, that would be by far the largest benefit. It makes reading and writing documentation a lot easier and more pleasant.
Use ExDoc, keeping the distinction between reference manuals and user guides as it already exists, and get a single tool that benefits the whole BEAM ecosystem. Over time, contribute to ExDoc to make it more powerful, more extensible, and whatever else might be needed, and again have the whole BEAM ecosystem benefit from it. Making it extensible might eventually also help address custom formatting as desired by the OTP team.
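As a rough sketch of what Markdown docs written as annotations on functions could look like, here is a hypothetical module using the `-moduledoc` and `-doc` attributes proposed in EEP 59. The attribute names come from that EEP; the module, function, and doc text are made up for illustration:

```erlang
-module(doc_example).
-moduledoc "A small, hypothetical module for basic arithmetic.".

-export([add/2]).

%% User-facing reference docs, written in Markdown, sit right above the
%% function (and its spec) they describe.
-doc "Adds `One` and `Two`. Markdown such as `code` or *emphasis* is allowed.".
-spec add(One :: number(), Two :: number()) -> number().
add(One, Two) ->
    %% Implementation notes for maintainers stay ordinary comments,
    %% so they never end up in the generated user docs.
    One + Two.
```

The design point is the split described above: the contract lives in `-doc` next to the code, while the "internal hows" stay as plain comments. On releases that implement EEP 59, the shell's `h/2` helper should be able to render such docs as well.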

mmin · January 22, 2024, 2:20pm

+1

Dumping specs may be fine if you have a single clause, but take a look at the specs of erlang:system_flag/2, for example, and then at the current docs of that function. Basically, anything that uses multiple clauses in its specs will become very hard to read. Type specs are definitely the way to go, but there have been situations (in OTP too) where specs were not improved because that would have made the docs harder to read. The point is that specs shouldn’t be removed, but they shouldn’t just be dumped either; they should be parsed to produce more readable docs.
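To make “spec dumping” concrete, here is a hypothetical function (not from OTP) whose spec has several clauses, in the same style as erlang:system_flag/2. A tool that copies the spec verbatim would print all three clauses, with their `when` constraints, as one block; hand-written docs could instead describe each flag in its own paragraph:

```erlang
-module(flag_example).
-export([set_flag/2]).

%% Hypothetical flag setter with a multi-clause spec. As a toy, it just
%% returns the value it was given instead of storing it anywhere.
-spec set_flag(depth, Depth) -> Depth when Depth :: non_neg_integer();
              (mode, Mode) -> Mode when Mode :: normal | verbose;
              (limits, {Min, Max}) -> {Min, Max}
                  when Min :: non_neg_integer(), Max :: non_neg_integer().
set_flag(depth, Depth) when is_integer(Depth), Depth >= 0 ->
    Depth;
set_flag(mode, Mode) when Mode =:= normal; Mode =:= verbose ->
    Mode;
set_flag(limits, {Min, Max})
  when is_integer(Min), is_integer(Max), Min >= 0, Max >= Min ->
    {Min, Max}.
```

Even with only three clauses, the dumped spec is already harder to scan than a short per-flag table; with the dozens of clauses a function like erlang:system_flag/2 has, the difference grows accordingly.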

Maria-12648430 · January 22, 2024, 2:54pm

@MononcQc… consider me confused. While I mostly agree with everything you said, I don’t see how it relates to what I said earlier. My entire point was that I see no benefit in putting user-targeted documentation in the source code (instead of keeping it separate): not for the users, not for the developers, not for the ~~documentation guys~~ technical writers, no matter how many people do or don’t fill these roles.

MononcQc · January 22, 2024, 3:32pm

Yeah, I didn’t mean it as a rebuttal of your entire set of arguments, nor to single you out personally (I do know you are a contributor familiar with how the documentation cycle for Erlang/OTP works), but to make sure the roles were clear, because they change which workflow the arguments actually apply to.

Specifically, user-targeted documentation, where the user is another engineer looking for a reference (eg. function references for interface calls) can very well fit within the code. It’s the one that often specifies arguments, may give examples, and explains behavior for the person calling the code. This operates at a conceptual level that is nicely coupled with code, and even with tests to make sure the documentation is still relevant.

This, however, is distinct from user-targeted documentation where the user is a beginner or someone shopping around (“how do I even get this set up? How do I know this is for me?”), where you’d usually have the content of a README. This documentation typically lives outside the code (even though it may refer to some code or provide sample modules/functions with their own documentation), because much of its content deals with context only loosely connected to the implementation.

Even in the case of rebar3, we often find ourselves with multiple layers of documentation:

a website with tutorials and high level descriptions that provide context in a way the tool or a single readme can’t
a reference documentation for commands invoked from the command line (rebar3 help)
a reference documentation for the configuration values users have to write (a sample file and a dedicated section on the website)
code-level documentation, which you may want to break down into module-level documentation expressing intent and implementation details for API-level calls, and function-level documentation for specific arguments or gotchas.
project-level documentation in issues, pull requests, or git history that also provides extra context for decision-making.

My current preference for high-level end-user tutorials is AsciiDoctor, as the best trade-off among all the high-level flexibility I’ve seen in various tools (HTML, proprietary XML formats, LaTeX, etc.) when writing books, websites, PDFs, and so on. The flexibility I want in serving the user there requires something better than, for example, Markdown (which has no proper standard, can’t even do proper call-outs, and has a tough time importing fragments of a document without extensions) or LaTeX (every tool I tried for generating anything but PDF would break down in incredible ways if you used underscores or dashes in code where it didn’t expect them).

For reference documentation, the EDoc format is adequate. It does provide the extra flexibility you’d want to generate acceptable higher-level docs, but mostly at the level of a fancy README and not at the level of a functional website, which is why higher-level projects like the Erlang documentation ended up specifying extra formats like XML.

However, it struggles a bit with code-level documentation, because that’s where you start having overlap with the reference documentation, and you find yourself having to read and write HTML/XML, or some variant thereof hybridized with EDoc, when you end up wanting to cover both. See the recon doc, for example: if you’re editing the code, you’re dealing with a mix of XML, EDoc, and code-level formatting all in one:

%%% <dl>
%%%     <dt>1. State information</dt>
...
%%%     <dd>Functions to access node statistics, in a manner somewhat similar
%%%         to what <a href="https://github.com/ferd/vmstats">vmstats</a>
%%%         provides as a library. There are 3 of them:
%%%         {@link node_stats_print/2}, which displays them,
%%%         {@link node_stats_list/2}, which returns them in a list, and
%%%         {@link node_stats/4}, which provides a fold-like interface
%%%         for stats gathering. For CPU usage specifically, see
%%%         {@link scheduler_usage/1}.</dd>
...
%%% </dl>

Funnily enough, this list can’t even be represented in Markdown, which has no definition lists. It can be handled in other formats (such as AsciiDoctor), but people would typically work around it by writing it like this in the Markdown flavor supported by ExDoc:

%%% 1. **State information**
%%%    - Functions to access node statistics, in a manner somewhat similar
%%%      to what [vmstats](https://github.com/ferd/vmstats) provides as a library.
%%%      There are 3 of them: `node_stats_print/2`, which displays them,
%%%      `node_stats_list/2`, which returns them in a list, and `node_stats/4`, which
%%%      provides a fold-like interface for stats gathering. For CPU usage
%%%      specifically, see `scheduler_usage/1`

Do note that the functions would be linked properly here. So what’s interesting about that format (not my favorite, but the one other folks in the Docs and Build & Packaging working group pushed for) is that it removes the noise and impedance mismatch when bits of your documentation overlap.

The other thing, though, is that this Markdown documentation, which makes code-level comments subjectively easier to read while still providing acceptable (though neither identical nor ideal) reference documentation, also lines up with some of the user-level tools (especially if you use static site generators) and with project-level documentation (if you use GitHub, GitLab, or Atlassian, to name a few).

So sure, the users might not be the same across all types of docs, but when the authors and developers are the same people, and also part of the user base, I find it really hard to say this isn’t actually a better workflow, even if some of the end users have a tougher time with it. The expectation here is that whoever acts as technical writer is the one who will bear the overhead (if any) of matching the new, less expressive syntax to the requirements of the end-user guides, through either better tooling or extensions to existing tools.

(Case in point for this last one: I migrated recon to ex_doc, so its documentation is now auto-published on hex.pm and shown in their interface each time I release a new version of the library on Hex, but I also still maintain an ad-hoc script to convert it back into the format required by the old documentation site I set up back when EDoc was too weak to support this well.)

robert · January 22, 2024, 4:04pm

How does anyone expect Erlang to have a healthy community if discussions about changes such as the ones above are first to be held in walled foundations guarded by membership fees? This is contrary to most healthy PL communities out there. Perhaps that is one thing to adopt from said communities before we think about which flavor-of-the-day markup they use for their documentation.

This change is bewildering to me (not so much the format as the implementation choice), so I will not even try to say anything about it.

MononcQc · January 22, 2024, 4:17pm

You can join the foundation without membership fees, and if you give, say, 5 hours a month to any working group by participating, you get to be a voting member right away. Simply getting involved makes it free to be a fully qualified member of the foundation. It’s something informally called “sweat equity”, and it’s part of how we can recognize the efforts of people supporting the foundation without them spending a cent of actual money, only their time.

The foundation is not a secret cabal, it’s the people who were already working on all this stuff getting together to build a legal means of funding some of the efforts and organizing them, and opening up the process to any person willing to contribute.

robert · January 22, 2024, 4:38pm

You are saying that as if it changes any of what I said.

Your point is that good ideas and constructive discussions can only be had by those who get involved in some secret committee for five hours a month or shell out $99. My point is that you are wrong, and if you are looking for proof, you are very welcome to survey the PL community.

MononcQc · January 22, 2024, 5:29pm

From a practical point of view, if you want to get involved without spending any time talking to the people doing the work, you’re most likely to end up having to argue in GitHub issues or in forums once it’s a done deal (as is the case today, but also as was the case 10 years ago on the mailing lists), but sure, why not.

I don’t know what to say at this point, because many years ago, having a foundation that can fund projects was one of the things missing for a healthy community. I don’t mean this as a personal point, but I’ve been around here for 15 years, and whenever a decision annoys someone (whether it’s the introduction of Unicode support in code, removing parametrized modules, changing docs, introducing maps, etc.), someone who is unhappy says this is the end of the community and a sign of how bad things have gotten.

I have no skin in this docs thing aside from having written a lot of docs in the past, but it feels like people’s criterion for a good process is whatever ends up with a result they like, and a bad process is whatever ends up with a result they dislike.

kuna.prime · January 22, 2024, 7:26pm

exactly

I agree that this is a huge benefit for the entire ecosystem, and if OTP chooses to package and distribute the ExDoc escript, #2 would be completely moot.

As to whether this is only a temporary visual regression… I’m not so sure about that. Some of the issues are really not only about design/HTML/CSS.
For example, a major regression is “spec dumping”, and given that even the official Elixir documentation does that (and no one seems to be bothered by it), is it realistic to believe that will change?

If Erlang didn’t already have amazing documentation, this would not be an issue, but the fact that it does makes this a big regression.

Maybe to clarify here: I’m taking two viewpoints while evaluating this PR:

That of a library author: I want to be consistent with OTP in my projects, but I don’t want additional dependencies that don’t come with OTP or my distro just to produce my documentation (compiling ExDoc or searching online for an escript is annoying at best).
That of a developer using a library: I don’t care which tool produces what I’m reading, but I don’t want to read something worse than before (the solution is not to file an issue with the ExDoc project if my reading experience is worse now; I don’t care about the tool, so I’m taking issue with whoever distributes my documentation).

Sure, most of us here are probably in both roles most of the time, but we need to separate these concerns, as anything else is, at minimum, hostile to new users (in both of these roles).

NelsonVides · January 22, 2024, 10:05pm

I’m sorry, but I need to jump in here. This is quite an attack. At the risk of further digressing from the point of the thread: EEF meetings are open. I’ve dropped into meetings of groups I’m not a member of and don’t collaborate with, just because I wanted to follow what they were doing. I have also unmuted and talked. No secret committee, super easy to enter, all public. You get voting rights by either paying or helping; what else do you expect?

Anyway. Fred answered perfectly already.

NelsonVides · January 22, 2024, 10:17pm

Integration into the BEAM ecosystem is, I think, a very, very important point to keep in mind, and one of the biggest wins of this change.

And well, yes, OTP has amazing docs already, and maybe this feels like a regression to some, but the problem is that while OTP’s docs are amazing to read, they’re terrible to write and change. Some have argued that there are better options than Markdown, and there probably are, but it still wins over XML by a long shot, and it’s a sufficiently ubiquitous choice across the whole IT world to be reliable enough.

As much as we want readable docs (obviously!), we also want docs that are easy to contribute to and easy to maintain, which is what has always been the problem. One could say we can’t have it all, so I’d say we can talk about how to balance things out. Docs that are easy to maintain and contribute to have a better chance of becoming readable than readable but unmaintainable docs have of getting contributions, IMO.

Also, maybe a bit of a contentious argument, but Elixir has made a huge impact on the community, among many other reasons, through its super smooth documentation. It can’t be that bad after all, I’d say.

Besides, docs I can just read in the source code without any pretty-printing are already a massive win for me. They can also be pretty-printed in the shell, the way the shell already prints type specs, making development easier.

About the library author point: valid enough. I’d love to have docs like OTP’s own, but that is currently not easily achievable; with this change it will be. Installing GitHub - starbelly/rebar3_ex_doc: rebar3 plugin for generating docs with ex_doc as a rebar3 project plugin is all it takes. It works just like installing Hex dependencies, so I’d say that’s already the lowest entry barrier possible.

Maria-12648430 · January 23, 2024, 6:41am

No problem, I didn’t take it badly; I just had the feeling that you had missed my point, or I yours, and that we were talking at cross purposes.

Maria-12648430 · January 23, 2024, 8:15am

I have been wondering about that for a while now. What exactly do you mean when you say “spec dumping”? I get the impression that you think specs are about to be removed from the language, or at least from the docs, but AFAICT this is not even remotely part of the PR, or of any discussion outside it.

mmin · January 23, 2024, 8:19am

Spec dumping = copying specs from the code to docs, literally. I wrote about it here.

Maria-12648430 · January 23, 2024, 8:45am

Ah, thanks, now I got it

nzok · January 23, 2024, 11:56am

As someone who has been involved with Erlang since the 1990s, I have to say that this is the first I have ever heard about what it takes to have voting rights in the EEF, or indeed what the EEF does. One of the advantages of something like the old mailing list – which I sorely miss – is that you could automatically post a brief “get involved with the EEF” message to the list every two or three months so that eventually even the dumbest reader (embarrassed grin) would notice it. The first I heard about this proposal, for example, was in my mailbox a few minutes before I made my first reply. Otherwise the decision would have been done and dusted without any input from me.

By the way, contrary to what someone said here recently, there is a standard for Markdown.
It’s called CommonMark (commonmark.org). The website claims that CommonMark has been adopted by Discourse, GitHub, GitLab, Reddit, Qt, Stack Overflow / Stack Exchange, and Swift, and there is an Elixir wrapper for one of the C implementations, so there’s already support in the BEAM world. Now, I hate Markdown in all its variations for its complexity compared with Scribe and Lout, and I’m happy to criticise it till the cows come home. But I can no longer criticise it for having no standard, only an ultra-sloppy spec with no two implementations agreeing.

I have spent a fair bit of time building an experimental Smalltalk-via-C compiler (yes, a dumb static compiler can give commercial best-of-breed dynamic compilers a run for their money) and an extensive library for it. As things have turned out, the user documentation and the internal documentation have approximately 1% overlap. The user documentation is a huge amount of work, and the tools have nothing to do with it. There is just a lot that needs to be said and explained clearly. Some of the documentation in the code takes the form of annotations like

<compatibility: #{system}>
<see: #{selector} [in: {class}]>
<compositionOf: #{selector} and: #{selector} for: {speed|safety|correctness|space}>
<complexity: ‘{formula}’>
– even if all input numbers are exact, the answer might not be, e.g., sqrt
<overrideFor: #(speed|space|safety|correctness|testing)> – overriding a method normally shouldn’t change the semantics, it’s done to change the pragmatics. Why is this method overriding? Not part of user documentation; the user shouldn’t know the method is overridden.
– is called indirectly
<requirement: ‘{file or url}’>
<supportFor: #{selector}> – needed for some other selector, this interface is not meant for general use

The file that implements run-time access to these (and other) annotations is nearly 500 raw lines, only 100 of which are actual code. 50 lines are comments attached to specific functions. 350 lines are about the whole file. Of those, about 150 lines describe the annotations listed above (and several others) and what they mean, and the rest is mostly comparisons with other dialects and future directions.

Writing those 350 lines was hard work. But they are in plain text. Markup was never an issue. At some point I’ll transfer the 150 lines to the user manual and add some text explaining how to use the run-time facilities. This lot will be in LaTeX, but the effort of adding LaTeX markup will be TRIVIAL in comparison to the effort of WRITING GOOD PROSE. (By the way, if you think such annotations sound a lot like @Annotations in Java, Smalltalk systems had annotations before Java did.)

To put it bluntly, I don’t care what markup system is used for writing documentation. I’d prefer Scribe; I’m used to LaTeX; I’ll use Troff or Lout if paid to. NOT ONE OF THEM is going to make my life measurably easier when writing documentation for people who don’t yet know what I’m talking about. Pretending that any markup system is going to make it easy is disingenuous; expecting that any markup system will do so is hopelessly naive.

The documentation system for Haskell is called Haddock. When it comes to Haskell, I often struggle. I’m a “set theory” person, not a “category theory” person, and Haskellers do love their category theory. So I pretty much count as a “beginner” in the Haskell world, though I’ve been using it for, ooh, 25 years. What’s the feature of Haddock I like best? Is it the fonts? Is it turning ASCII keywords into Unicode? Is it colouring? There are two features: cross-references, and the (source) button. Time after time I have to turn from the documentation to the (source) in order to understand what the documentation is talking about.
Just yesterday, for example, I stumbled across something that sounded interesting, but I had real trouble understanding the documentation. Turning to the (source) button, I found that it was useful to me, and the basic idea was stunningly simple. Now if you have to turn to the code to explain the documentation, the documentation tools clearly aren’t where the real problems lie. (Haddock is one of those Javadocy works-from-comments-in-the-code systems, and strikes me as an excellent one of its kind.) The problem is the expository prose.

Do the proposed new Erlang documentation tools provide automatic linking:

from a module or type or function to its source code?
from a module or function to its test cases?
from a module or function to its requirements document?
from a module or function to its bug reports?

If so, wonderful. If not, you’re not helping.

peerst · January 24, 2024, 1:11pm

That’s totally not true. Basic membership in the EEF is free, and with that you can join any WG, get access to the EEF member Slack, and participate in anything going on. It’s all open and transparent.

Please stop spreading the misinformation that it costs something.

peerst · January 24, 2024, 1:18pm

One more thing here: these “voting rights” that have been mentioned have nothing to do with voting on OTP features or the future of Erlang. It’s basically only the right to vote for the Board members who steer the EEF and decide how the money is spent.

These voting rights are not at all necessary to participate in EEF Working Groups or anywhere else. As @MononcQc mentioned, if you participate a certain amount you even get these voting rights without paying. But apart from the Board elections, they are not necessary for anything.

peerst · January 24, 2024, 1:26pm

EEF is actually sponsoring Erlangforums.com (see our logo below). But I guess we need to explore how we can make more noise here since we obviously seem to be missing parts of the community.

peerst · January 24, 2024, 1:35pm

That would probably not be so easy. Is there even one “best” way here, or just a multitude of tastes clashing with each other? How could this be done objectively?

We could only try to find a UX expert and have them make suggestions for better formatting, but then again, many will dislike the result, because everyone is their own readability expert in their own subjective world (rightfully so, because “I can’t read that well” is certainly true for each individual).

If someone has a good proposal for how this can be improved by spending a reasonable amount of money, we (the EEF) can consider it. Suggestions where the resulting design is not the EEF’s fault are preferred.