Maximum number of parallel processes

Abdelghani · August 5, 2022, 3:57pm

Hi everybody,
Consider a pool of processes that handle a Mnesia table, where each request is routed to a process by hashing the key. I think the maximum number of these processes for the best performance is the number of cores, because any additional process will have to wait when all cores are busy, so the processes would run concurrently rather than in parallel.
Is that true or not?
Thank you.
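As an illustration of the key-based routing described in the question, here is a minimal Python sketch that maps a key to one of a fixed number of pool workers. The function name `worker_index` is hypothetical. In Erlang the same idea is usually written with `erlang:phash2(Key, PoolSize)`, which returns an integer in `0..PoolSize-1`. The sketch only shows that routing is deterministic per key; it says nothing about how many workers are optimal.

```python
import zlib


def worker_index(key: str, pool_size: int) -> int:
    """Map a key to a worker slot in [0, pool_size).

    zlib.crc32 is used instead of Python's built-in hash() because str
    hashes are randomized per interpreter run, while routing must be stable
    so that the same key always reaches the same worker.
    """
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return zlib.crc32(key.encode("utf-8")) % pool_size


if __name__ == "__main__":
    # The same key is always routed to the same worker.
    print(worker_index("user:42", 8) == worker_index("user:42", 8))  # True
```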

kokolegorille · August 5, 2022, 4:44pm

It’s the number of schedulers that is tied to the number of cores (by default, the BEAM starts one scheduler per logical processor).

As for processes, you can have many, like really a lot.

Abdelghani · August 5, 2022, 5:08pm

Yes, I know, but the SMP behaviour will put Erlang processes one per thread until the number of threads (cores) is reached, and the next one will become the second process in the first thread’s queue. I should learn more about ERTS.

AstonJ · August 6, 2022, 2:18pm

Try not to worry too much about performance until you encounter issues (unless you have a real need to - such as when evaluating porting a large scale app that is already in production and encountering issues).

Erlang has been around for decades, and over that time the Erlang/OTP team and numerous members of the community have been working on performance tweaks, which is why the Erlang VM is one of the leaders in the field.

Erlang/OTP is smart. I can’t remember which book I read it in now (probably Programming Erlang), but if a process is taking too long to do something, it will be paused and moved to the back of the queue. This is one of the things that makes Erlang so highly available and resilient.

You should have a look at our Books section and check out some of the reviews.

As a general tip, and as others have mentioned to you previously, try not to be too theoretical; get out there and start building apps, and only concern yourself with issues you are experiencing or think you are likely to encounter depending on what you’re building… there’s no need to stress over some of the details until you really need to.

Abdelghani · August 6, 2022, 2:33pm

Thank you @AstonJ, but sorry, I think you misunderstood my question; I hope you can get an idea of what I mean. (ERTS is surely smart, but if you don’t use it the right way, it can end up not being so smart.)

Abdelghani · August 6, 2022, 4:50pm

@AstonJ thank you so much for your help. I want to create a GitHub account and publish a little database server within a week, but how can I test its performance in terms of scalability and fault tolerance? Your opinion interests me a lot.

rvirding · August 6, 2022, 10:43pm

One thing to be very much aware of is that the BEAM puts a lot of effort into making sure that processes will not block the system, even if they do a lot of continual work!

For example, after 4000 reductions (function calls) a process is automatically rescheduled, and its scheduler will take the next process in its run-queue and execute that. There is never a need to explicitly try to make a process yield in some way. Also, processes suspend when waiting for messages and are rescheduled when a message arrives or the receive times out, so there is no busy wait.
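To make the reduction-based preemption concrete, here is a toy single-scheduler model in Python. It is a sketch under simplifying assumptions, not how the BEAM is implemented: each process is just a count of reductions it still needs, every turn runs for at most a fixed budget (4000, the figure from the post), and a process with work left goes to the back of the run queue. The real scheduler also has process priorities, charges reductions for more than plain function calls, and removes processes blocked in `receive` from the run queue entirely. The name `run_round_robin` is hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple

REDUCTION_BUDGET = 4000  # per time slice, as described in the post


def run_round_robin(work: List[Tuple[str, int]],
                    budget: int = REDUCTION_BUDGET) -> List[Tuple[str, int]]:
    """Toy model of reduction-based preemption on one scheduler.

    `work` lists (process_name, total_reductions_needed). Each turn, the
    process at the head of the run queue runs for at most `budget`
    reductions; if it still has work left it is put at the back of the
    queue. Returns the sequence of (name, reductions_used) slices.
    """
    run_queue: Deque[List] = deque([name, remaining] for name, remaining in work)
    slices = []
    while run_queue:
        proc = run_queue.popleft()
        used = min(budget, proc[1])
        proc[1] -= used
        slices.append((proc[0], used))
        if proc[1] > 0:
            run_queue.append(proc)  # preempted: back of the run queue
    return slices


if __name__ == "__main__":
    # "b" gets to run after a's first slice instead of waiting for "a" to finish.
    print(run_round_robin([("a", 10000), ("b", 3000)]))
    # [('a', 4000), ('b', 3000), ('a', 4000), ('a', 2000)]
```

The point of the model is that a long-running process cannot monopolize a scheduler: the short job "b" completes after a single slice of "a".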

Also processes are automatically load-balanced over all the schedulers so no scheduler will sit dormant while the other schedulers are doing a lot of work.

These are some of the reasons why it is perfectly reasonable to run systems with hundreds of thousands or even millions of processes. This is why the most important thing when structuring the system is to look at the concurrency inherent in the problem and your solution, and from that work out which processes you need and what they should do.

EDIT: One of the major requirements we had from the very beginning when developing Erlang was that the system should never block.

Abdelghani · August 6, 2022, 11:29pm

Thank you sir for your answer; it’s a great honor to talk with one of the creators of Erlang (in fact, I only knew of Joe).
That is a convincing answer, and I have completely understood what you said. My first thought was that the Erlang SMP architecture (for multi-core machines) tries to distribute and balance newly created processes over the available cores (schedulers). For example, if we have 4 cores and we spawn 6 processes, ERTS would place the first four processes one per scheduler, the 5th process would become the second process in the first scheduler’s queue, and the 6th the second in the second scheduler’s queue. So when I talked about fixing the number of workers at the number of cores, I meant that any processes beyond that would always be waiting to be scheduled. That was my first thinking.
That is exactly the standard parallel model that most other languages use (if we have just one Erlang process per core, why use Erlang processes at all? We could just use the associated thread to do the job). This is where the concurrency idea comes in: load each thread with a large number of lightweight processes to get more scalability and performance.
Thank you for your help Mr @rvirding.

AstonJ · August 7, 2022, 12:16pm

Fantastic post - thank you Robert!

I have bookmarked it and will now refer to it whenever anybody asks about performance.

Actually, it’s given me an idea for another thread, will post it now and include what you said!

rvirding · August 7, 2022, 2:26pm

No, it doesn’t quite work like that in the BEAM. In the BEAM a new process is spawned/started on the same scheduler as the spawning process. It is then up to the built-in load-balancing mechanism in the BEAM to distribute the Erlang processes over all the available schedulers. There is no way for me to specify on which scheduler I want a process to run and it will be moved between the schedulers as the load-balancing mechanism sees fit.
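The difference between the placement model assumed earlier in the thread (new processes dealt out round-robin at spawn time) and what is described here (spawn locally, then let load balancing spread the work) can be sketched with a toy Python model. This is an illustration under stated assumptions, not the BEAM’s algorithm: run queues are plain lists, and `balance` is a hypothetical pass that moves processes from the longest queue to the shortest until lengths differ by at most one. The real mechanism (periodic migration based on measured load, plus work stealing by schedulers that run out of work, tunable with emulator flags such as `+scl` and `+sub`) is considerably more involved. The names `spawn_on_parent` and `balance` are hypothetical.

```python
from typing import List


def spawn_on_parent(queues: List[List[str]], parent_sched: int, name: str) -> None:
    """A new process starts on the same scheduler as the process that spawned it."""
    queues[parent_sched].append(name)


def balance(queues: List[List[str]]) -> None:
    """Toy balancing pass: move processes from the longest run queue to the
    shortest until the queue lengths differ by at most one."""
    while True:
        longest = max(range(len(queues)), key=lambda i: len(queues[i]))
        shortest = min(range(len(queues)), key=lambda i: len(queues[i]))
        if len(queues[longest]) - len(queues[shortest]) <= 1:
            return
        queues[shortest].append(queues[longest].pop())


if __name__ == "__main__":
    queues: List[List[str]] = [[] for _ in range(4)]  # four schedulers
    for i in range(6):
        spawn_on_parent(queues, 0, f"p{i}")  # all spawned from scheduler 0
    print([len(q) for q in queues])  # [6, 0, 0, 0]
    balance(queues)
    print([len(q) for q in queues])  # [2, 2, 1, 1]
```

Right after spawning, all six processes sit on scheduler 0; only the balancing step spreads them out, and in the real BEAM they can keep moving afterwards as the load changes.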

Each BEAM scheduler runs on its own OS thread, and it is generally up to the OS to move the scheduler threads around on the machine’s cores. On some OSes it is possible to specify this, but I have never tried doing that. I just see the load on my Mac cores change when I am running Erlang stuff. There are options that give you some control over the load-balancing mechanism, e.g. how eager the BEAM is to load-balance. An interesting feature is that if the BEAM finds there is very little to do, it can move processes off some schedulers and “put those schedulers to sleep”. When the load goes up again, these schedulers will be reawakened and start working.

As I said, there is an awful lot of really smart things going on in the BEAM. With all the things the BEAM does, the best way to view it is as an OS for a specific language, Erlang, with a specific set of necessary features: processes, message handling, error handling and … .

EDIT: Try reading blogs and conference talks by Björn Gustavsson and Lukas Larsson.

Abdelghani · August 7, 2022, 4:47pm

Thank you Mr @rvirding for all this accurate information.
In fact, I previously thought that each process would be spawned on its own scheduler and stay there; I didn’t realize that the load-balancing mechanism kicks in after spawning.
Understanding the BEAM in depth will help more and more in developing high-level applications; that’s why I am interested in it.
Thank you too for your suggestions; I will enjoy them a lot, since watching Erlang conference talks on YouTube has become my new hobby.

my greetings,
ABDELGHANI