Erlang REST API Design
(newbie questions)
This is a follow-up to a question I posted a few weeks ago.
So I’m trying to wrap my head around Erlang and how to use it on problems I’m familiar with, such as REST micro-services. I’ll lay out the problem domain that’s in my head, and then ask for comments as to whether I’m on the right track with how to approach this in Erlang…
My apologies for this being such a long question…
Main Question
When is it advantageous to have a long-lived process vs. just spinning up an
ephemeral process?
Problem in traditional micro-services architecture
Before Erlang, I would have designed something like this:
# load balancer exposes public micro-services S1-S(N)
LB -> S1, S2, S3
A micro-service would be composed of an API plane, a Control Plane (CP), and
some state:
S1 = {
CP -> control plane endpoints (e.g. info(), etc)
API -> application endpoints (e.g. get_resource_foo(), etc.)
}
Each micro-service owns some sort of state, be it in a Redis Cache, an S3
Bucket, or a database.
S1 -> postgres database
S2 -> redis cache
S3 -> s3 bucket
When a client request arrives, it is serviced by the micro-service. The
micro-service calls other micro-services for things like authorization, and to
get various resources:
client -> [request] ->
LB -> S1.get_resource_foo() -> S2.is_client_authorized()
S3.get_resource_bar()
calculate foo
return foo
Erlang Approaches
Single Erlang Node
What I’ve been struggling with is how this sort of thing maps to Erlang. Here’s
my thought process on that.
- consolidate all the micro-services into a single Erlang node
- replace Redis cache with ETS
- use gproc to register micro-service endpoints as well known long-lived
processes
This looks like:
LB -> Erlang Node -> (gproc) -> P1, P2, P3
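To make the "replace Redis cache with ETS" bullet concrete, this is roughly what I have in mind; a minimal sketch where the module name, table name, and options are placeholders I made up:

    %% cache.erl -- placeholder module; table name and options are guesses
    -module(cache).
    -export([init/0, store/2, fetch/1]).

    init() ->
        %% public named table so any process on this node can read/write;
        %% read_concurrency helps when reads dominate. Note the table dies
        %% with whichever process called init/0 and therefore owns it.
        ets:new(cache, [named_table, public, set, {read_concurrency, true}]),
        ok.

    store(Key, Value) ->
        true = ets:insert(cache, {Key, Value}),
        ok.

    fetch(Key) ->
        case ets:lookup(cache, Key) of
            [{Key, Value}] -> {ok, Value};
            []             -> not_found
        end.
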
Now when a client request arrives, we send a message to the well known
long-lived process:
client -> [request] -> LB -> Erlang Node ->
webserver ->
P1.get_resource_foo() ->
gproc -> P2 ! is_client_authorized()
gproc -> P3 ! get_resource_bar()
calculate foo
return foo
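Concretely, I picture each of the well known processes as a gen_server registered through gproc’s {via, ...} mechanism, something like this sketch (the module, gproc key, and function names are all made up):

    %% p1_foo.erl -- sketch of a long-lived, gproc-registered service
    -module(p1_foo).
    -behaviour(gen_server).
    -export([start_link/0, get_resource_foo/1]).
    -export([init/1, handle_call/3, handle_cast/2]).

    -define(NAME, {n, l, p1_foo}).   %% gproc local unique name

    start_link() ->
        gen_server:start_link({via, gproc, ?NAME}, ?MODULE, [], []).

    %% API called by the webserver handler
    get_resource_foo(Id) ->
        gen_server:call({via, gproc, ?NAME}, {get_resource_foo, Id}).

    init([]) ->
        {ok, #{}}.

    handle_call({get_resource_foo, Id}, _From, State) ->
        %% here P1 would consult P2/P3, ETS, etc.
        {reply, {ok, {foo, Id}}, State};
    handle_call(_Other, _From, State) ->
        {reply, {error, unknown_call}, State}.

    handle_cast(_Msg, State) ->
        {noreply, State}.
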
In turn, the web-server and P1-P3 would just spawn ephemeral processes (E) to
handle their requests.
... Erlang Node -> webserver spawn E1 -> P1 ! get_resource_foo()
P1 spawn E2 -> P2 ! is_client_authorized()
P2 spawn E3 -> P3 ! get_resource_bar()
P1 spawn E4 -> calculate foo
return foo
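For example, in the p1_foo sketch above, P1 could handle each call in an ephemeral process and reply later with gen_server:reply/2, so the long-lived process itself never blocks (p2_auth, p3_bar, and calculate_foo are again names I’m assuming):

    %% replaces the handle_call clause in the p1_foo sketch above
    handle_call({get_resource_foo, Id}, From, State) ->
        spawn_link(fun() ->
            true = p2_auth:is_client_authorized(Id),    %% assumed P2 API
            Bar  = p3_bar:get_resource_bar(Id),         %% assumed P3 API
            gen_server:reply(From, {ok, calculate_foo(Bar)})
        end),
        %% {noreply, ...} means "the spawned process will reply later"
        {noreply, State}.

(On the webserver side I’m assuming something like Cowboy, where each connection/request already gets its own process, so E1 is essentially free.)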
This seems great, but where are the bottlenecks?
Multiple Erlang Nodes
If I have one Erlang node running the web-server with API and CP endpoints, and
several other nodes for processing:
- Node1 : web-server
- Node2 : worker
- Node3 : worker
Now say I have shared state in ETS across this small cluster. I could
distribute processes in various ways:
- Option1: node1 contains web-server endpoints AND well known long-lived processes P1-P3, and ephemeral worker processes get created on node2-node3
- Option2: node1 contains web-server and P1-P3 and all ephemeral processes are created on node2-3
- Option3: same as Option2 but spin up ephemeral processes randomly on nodes 2-3
- Option4: same as Option3 but spin up ephemeral processes on nodes 1-3 that have the least load
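For options 1-4, the main thing I’d be writing by hand is the node-selection logic. A minimal sketch of option 3 (random placement on the peer nodes), where the module and helper names are my own:

    %% worker_spawn.erl -- sketch of option 3: random placement of ephemeral workers
    -module(worker_spawn).
    -export([spawn_worker/3]).

    spawn_worker(M, F, A) ->
        Node = case nodes() of
                   []    -> node();   %% not clustered yet: fall back to the local node
                   Peers -> lists:nth(rand:uniform(length(Peers)), Peers)
               end,
        %% fire-and-forget; if I needed the result I’d probably reach for
        %% erpc:call/4 or spawn_monitor instead
        spawn(Node, M, F, A).
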
All of these options make it feel like I’m overthinking it. A simpler approach would be to just make all the nodes identical and only spin up processes on the local node, never on remote ones:
- Option5: all nodes the same, each running the web-server, P1-P3, and ephemeral processes on the local node
No Long-Lived Processes
This brings me to one of my core questions… why have long-lived processes at all? In this case, the problem could be simplified by just invoking the MFA directly:
... Erlang Node -> webserver spawn E1 ->
spawn E2 (Module1:get_resource_foo:arguments)
spawn E3 (Module2:is_client_authorized:arguments)
spawn E4 (Module3:get_resource_bar:arguments)
spawn E5 (Module4:calculate_foo:arguments)
return foo
Now the problem looks a lot simpler:
- Nodes 1-3 : web-server, code for Modules 1-4, sharing ETS
client -> [request] ->
LB -> Node(1-3) ->
webserver ->
webserver spawn E1 ->
spawn E2 (Module1:get_resource_foo:arguments)
spawn E3 (Module2:is_client_authorized:arguments)
spawn E4 (Module3:get_resource_bar:arguments)
spawn E5 (Module4:calculate_foo:arguments)
return foo
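In code, I picture this stateless version as the handler just fanning the work out to plain module functions and collecting tagged replies. A sketch (get_foo stands in for Module1:get_resource_foo, the other module names mirror the diagram above, and the async/await helpers are my own):

    %% foo_handler.erl -- sketch of the "no long-lived processes" version;
    %% module2/module3/module4 mirror Modules 2-4 above and are assumptions.
    -module(foo_handler).
    -export([get_foo/1]).

    get_foo(ReqId) ->
        Auth = async(module2, is_client_authorized, [ReqId]),
        Bar  = async(module3, get_resource_bar, [ReqId]),
        true = await(Auth),
        module4:calculate_foo(await(Bar)).

    %% run {M,F,A} in an ephemeral linked process; tag the reply with a unique ref
    async(M, F, A) ->
        Parent = self(),
        Ref = make_ref(),
        spawn_link(fun() -> Parent ! {Ref, apply(M, F, A)} end),
        Ref.

    await(Ref) ->
        receive
            {Ref, Result} -> Result
        after 5000 ->                  %% arbitrary timeout
            exit(timeout)
        end.
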
So my primary questions are:
- Am I composing this problem in Erlang correctly?
- When do you use long-lived well known (e.g. registered with gproc) processes vs. ephemeral processes?
- How do you compose your nodes and distribute work to them? In this last case, I’m using the LB to randomly spread the load across the nodes.
I have other questions that I’ll keep on the back-burner, but mention here in case anyone knows:
- how well does ETS handle network partitions?
- if you join a node to an already-running cluster, is that node available before ETS has finished hydrating on the new node? What happens if you query ETS before it’s been populated/hydrated?
Thank you in advance!