Start of indices - is there a rationale behind it?

Hi,

As I understand, in Erlang, by default (except the ones of the array module and the positions in binaries, which are zero-based), indices start at 1.

For example a = element(1,{a,b,c}).

Nevertheless, string:substr/3 (1-based) has been replaced (in OTP 20.0) with string:slice/3, which apparently is 0-based:

> string:substr("Hello World", 4, 5).
"lo Wo"

> string:slice("Hello World", 4, 5).
"o Wor"

Is there a rationale for that? Isn’t it unnecessarily confusing?

(moreover the doc of string:slice/3 is not explicit about that; this can only be guessed based on the typing and the examples)

Thanks,
Best regards,

Olivier.

Not to mention:

> lists:sublist("Hello World", 4, 5).
"lo Wo"

Hey Olivier!

In Erlang there isn't really a built-in array type. What you mostly have are lists and tuples, and even strings are just lists of integers (character codes). That's why most things in the standard library are 1-based. For example, element(1, {a,b,c}) returns a, which feels natural because we usually think in terms of "first", "second", "third" element.

But when the string module was rewritten in OTP 20, a new function, string:slice/3, was introduced, and that one is 0-based. The reason is that it was designed with binaries and UTF-8 in mind, and the maintainers wanted to align with conventions from most other programming languages. In Python, Java, C, and many others, slicing starts from zero, so Erlang followed that pattern here. As a result, the old substr/3 stayed 1-based, while the new slice/3 is 0-based.

So there is a rationale: for slicing and working with byte positions, zero-based indexing is often more convenient and familiar to people coming from outside the Erlang world. But for long-time Erlang developers, it's understandably confusing because almost everywhere else in the language indexing is 1-based. The documentation doesn't spell this out very clearly - the only hint is in the typespec, where Start is a non_neg_integer(), which implies it can be zero. So yes, it's a deliberate decision to follow a "global convention", even at the cost of breaking Erlang's internal consistency.

Thanks for your answer, @vkatsuba; the rationale behind design decisions is often interesting (and too rarely reported). I nevertheless wish Erlang had a single "indexing convention" (and I was fine with the 1-based one) - but it is not too serious an issue. The slice/3 documentation could be a tad clearer as well. Thanks again!

Erlang/OTP is an open-source project. If you have ideas for improving the documentation, I’m pretty sure the @erlang_core_team would be glad to review and accept contributions! Don’t hesitate to make changes and contribute to Erlang/OTP. Some time ago, a great article was published on this topic: https://medium.com/erlang-battleground/all-you-need-to-know-to-start-contributing-to-erlang-2fcd5748319e. If you have any questions, feel free to ask in the forum, join the Erlang Slack, or just DM me directly.

There is a very simple explanation for this: indices refer to an element at a certain position, whereas string slicing operates on positions between elements:

1> string:slice("Hello World", 0, 1).
"H"
2> [lists:nth(1, "Hello World")].
"H"

You can see the string slice API values as cursor positions, e.g. “take all characters between cursor position 0 and 1.” It doesn’t refer to the elements themselves.
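To make the cursor picture concrete, here is a minimal sketch. The module name cursor_demo and the helpers between/3 and nth_as_slice/2 are made up for illustration (they are not part of OTP); only string:slice/3 is the real API.

```erlang
-module(cursor_demo).
-export([between/3, nth_as_slice/2]).

%% Hypothetical helper: the characters lying between cursor positions
%% From and To, where cursor 0 sits before the first character.
between(String, From, To) when is_integer(From), is_integer(To),
                               From >= 0, From =< To ->
    string:slice(String, From, To - From).

%% Hypothetical helper: the N-th element (1-based, as with lists:nth/2)
%% is exactly what lies between cursor N-1 and cursor N.
nth_as_slice(String, N) when is_integer(N), N >= 1 ->
    between(String, N - 1, N).
```

With this, cursor_demo:nth_as_slice("Hello World", 4) and [lists:nth(4, "Hello World")] give the same one-character string, so the two numbering schemes describe the same thing from different angles.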

Lisp uses zero origin for lists and vectors. There is no connection between what sequence representation you are using and where you start numbering.

I designed the “string” interface in Quintus Prolog, which had to interoperate with Xerox Lisp. The basis of the whole thing was the relation
ABC = A++B++C & |A| = Drop & |B| = Take & |C| = Rest
which unites substring, concatenation, and searching.
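As a rough illustration of that relation in Erlang terms, here is a one-directional sketch. The module abc_relation and the function split3/3 are hypothetical names; it only handles flat lists, it requires Drop + Take not to exceed the length of the list, and unlike a Prolog relation it cannot be run "backwards" to search.

```erlang
-module(abc_relation).
-export([split3/3]).

%% Hypothetical helper: split Whole into {A, B, C} such that
%% Whole =:= A ++ B ++ C, length(A) =:= Drop and length(B) =:= Take.
%% lists:split/2 raises badarg if a count exceeds the remaining length.
split3(Whole, Drop, Take) when is_list(Whole),
                               is_integer(Drop), Drop >= 0,
                               is_integer(Take), Take >= 0 ->
    {A, BC} = lists:split(Drop, Whole),
    {B, C} = lists:split(Take, BC),
    {A, B, C}.
```

For "Hello World" with Drop = 4 and Take = 5 this yields the three fragments "Hell", "o Wor" and "ld", and Rest is simply length(C). No index appears anywhere, only counts.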

The relevance of this here is that Prolog was and remained a consistent 1-origin language. This fundamental string relation does not mention ANY index whatsoever. All the numeric arguments are COUNTS: lengths of one fragment of the whole or another. This turned out to be a stunningly effective approach. It virtually eliminated off-by-one errors.

And this is what is happening with string:slice/[2,3]. The second argument is a LENGTH, not an index. It should be described as slice(String, Drop, Take), where Drop is the number of characters you do not want and Take is the number of characters you do want. This has, or should have, nothing to do with indices.
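A minimal sketch of that Drop/Take reading, assuming flat lists of code points only. The module name drop_take is hypothetical and this is not how OTP implements string:slice/3 (the real function also accepts binaries and nested chardata, and counts grapheme clusters rather than code points).

```erlang
-module(drop_take).
-export([slice/3]).

%% Hypothetical re-statement of slice/3 in terms of counts:
%% throw away Drop characters, then keep at most Take characters.
slice(String, Drop, Take) when is_list(String),
                               is_integer(Drop), Drop >= 0,
                               is_integer(Take), Take >= 0 ->
    take(Take, drop(Drop, String)).

%% Skip up to N leading elements.
drop(0, Rest) -> Rest;
drop(_, []) -> [];
drop(N, [_ | T]) -> drop(N - 1, T).

%% Keep up to N leading elements.
take(0, _) -> [];
take(_, []) -> [];
take(N, [H | T]) -> [H | take(N - 1, T)].
```

For plain ASCII input such as "Hello World", drop_take:slice(S, 4, 5) and string:slice(S, 4, 5) agree, and neither needs the notion of a "first index".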

Turn to APL, where the index origin can be set to 0 or 1 at run time, but TAKE ↑ DROP ↓ ARRAY works independently of the index origin.

One reason it is important to understand this parameter as a count, not an index, is that if you used the AWK-style substr(s, index, length) you might, on occasion, need to pass length(s)+1, which is not the index of anything.
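One place where this shows up is taking the last N characters of a string. In the sketch below (suffix_demo and last_n/2 are hypothetical names, not OTP functions), the count passed to string:slice/2 ranges from 0 to the length of the string, and every value in that range is a meaningful "number of characters to drop". With a 1-based start index, the N = 0 case would need length + 1 instead.

```erlang
-module(suffix_demo).
-export([last_n/2]).

%% Hypothetical helper: the last N characters of String
%% (the whole string if it is shorter than N).
last_n(String, N) when is_integer(N), N >= 0 ->
    Len = string:length(String),
    %% Number of leading characters to throw away: always in 0..Len.
    Drop = max(0, Len - N),
    string:slice(String, Drop).
```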

I don’t know why the designers of the string module chose the convention they did, but out of all the conventions I’ve had to use in the past, this one is arguably the best, and has the merit - unlike its Scheme rival - of not depending on indices or index origin at all.

Rewrite the documentation to talk about LENGTHS not positions and everything becomes clear.

That’s a very good clarification! I was looking at this mostly from the historical/user-facing angle: OTP 20 introduced “slice/3” alongside the older “substr/3”, and since the docs show Start as 0 it looks like Erlang suddenly has a “0-based” API in contrast to the usual 1-based world.