I’m hoping someone who’s familiar with mnesia can help me solve this problem.
Several years ago, we received two Dell servers. We set up our Erlang-based process on both. One was configured as a back-up and the other as the primary. The project uses mnesia. The system was working well – keeping the databases in sync and when the primary was brought down, the system would switch to the other.
Unfortunately, the Intel chipset in these machines had an erratum. Once every few days, the USB bus would generate an interrupt storm and cause the primary to become unresponsive for 30 seconds – long enough for the system to switch to the other machine. Then the primary would resume and chaos would ensue.
We hoped a fix would be made available, so we kept the secondary system off. Our system ran for many years in this configuration. But now we’ve purchased new hardware and we want to retire those machines. I installed Linux and Erlang on the new hardware and I’m trying to get it to rejoin the mnesia cluster, but I don’t know how to do it. The new system doesn’t have the schema that was on the old system, so mnesia on the primary system won’t let it join.
I’ve looked at StackOverflow and none of the answers worked for this situation. The mnesia documentation didn’t work because the primary has the secondary still in its schema, but I can’t remove it. I even tried some brute force techniques: shutting down the primary, copying the database directory to the new machine, renaming the secondary machine, and starting it on the new machine. That didn’t work either.
So I’m asking for help to:
Remove the secondary machine from the primary’s schema (it seems part of the problem is that it’s configured to have a secondary, but the secondary doesn’t have enough to validate itself.)
Add the secondary system to the mnesia cluster.
If anyone knows the steps to do this, or even a link to documentation that explains it, I’d greatly appreciate it.
Can you even join the cluster? Can you ping the primary node from the new one?
If yes, I think you need to call mnesia:del_table_copy/2 for each shared table on the primary node, with the old secondary node as the second argument. Then call mnesia:add_table_copy/3 to add a replica on the new secondary node.
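A minimal sketch of that suggestion, to be run on the primary. The module name replica_swap and both function names are made up for illustration. It assumes the new node is already part of the Mnesia cluster, and whether del_table_copy/2 succeeds while the old node is down is exactly the open question here, so try it on a test cluster first.

```erlang
-module(replica_swap).
-export([tables_on/1, move_replicas/2]).

%% Returns [{Table, StorageType}] for every non-schema table that has a
%% replica on Node, according to the local schema.
tables_on(Node) ->
    [{Tab, Type}
     || Tab <- mnesia:system_info(tables),
        Tab =/= schema,
        Type <- [ram_copies, disc_copies, disc_only_copies],
        lists:member(Node, mnesia:table_info(Tab, Type))].

%% Drops each replica held by OldNode, then adds a replica of the same
%% storage type on NewNode. Crashes on the first call that does not
%% return {atomic, ok}, so a failure is not silently skipped.
move_replicas(OldNode, NewNode) ->
    lists:foreach(
      fun({Tab, Type}) ->
              {atomic, ok} = mnesia:del_table_copy(Tab, OldNode),
              {atomic, ok} = mnesia:add_table_copy(Tab, NewNode, Type)
      end,
      tables_on(OldNode)).
```

Calling replica_swap:tables_on(OldNode) on its own is read-only, so it is a safe way to see what would be touched before changing anything.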
I think you need to call mnesia:del_table_copy/2 for each shared table on the primary node
I will give this a try the next time we can schedule down-time. I thought I tried this and it failed because the other node needed to be responding. But maybe that was a different set of functions I saw on StackOverflow.
I’m a bit doubtful this will work, though, because the primary still has the secondary machine’s hostname in its db_nodes parameter. It seems to really want to update the secondary’s schema when these changes are made. And my secondary doesn’t have a valid schema.
IMO, try to replicate the issue from scratch and then solve it (unless down-time is really cheap!).
Have you tried calling mnesia:del_table_copy(schema, BadNode)? The docs say: “This function can also be used to delete a replica of the table named schema. The Mnesia node is then removed. Notice that Mnesia must be stopped on the node first.”
Also, on the other hand, if the old secondary is gone, can’t you give the new secondary the same name as the old one?
The new secondary has the same name as the old secondary. But it doesn’t have the schema with the “secret cookie” that was on the old secondary so the primary won’t talk to it. The comment “Notice that Mnesia must be stopped on the node first.” makes me think that mnesia:del_table_copy(schema, BadNode) will fail because it needs to talk to the secondary and modify the schema.
But you’re right, I’ll try to replicate the issue on a different machine. This primary machine really needs to be running 24/7 and I can’t risk corrupting the database.
I just made a small demo with 2 nodes, A (primary) and B (secondary). I started them both and connected them. Then, on A, I ran mnesia:create_schema([node(), B]). Then I killed B and ran mnesia:del_table_copy(schema, B) (with the mnesia application running on A!). It returned {atomic, ok} and removed B from db_nodes.
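For anyone repeating that experiment, here is a small wrapper around the same call. The module name drop_node and the error tuples are my own invention. It only adds a guard for the documented requirement that Mnesia must be stopped on the node being removed.

```erlang
-module(drop_node).
-export([drop/1]).

%% Run on a node where Mnesia is running (the primary).
%% Refuses to act if Mnesia still appears to be running on BadNode.
drop(BadNode) ->
    case lists:member(BadNode, mnesia:system_info(running_db_nodes)) of
        true ->
            {error, {mnesia_still_running, BadNode}};
        false ->
            case mnesia:del_table_copy(schema, BadNode) of
                {atomic, ok} ->
                    %% Return the remaining db_nodes so the caller can
                    %% confirm BadNode is gone.
                    {ok, mnesia:system_info(db_nodes)};
                {aborted, Reason} ->
                    {error, Reason}
            end
    end.
```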
You can start mnesia (without a schema) on the new node with extra_db_nodes set to the primary node.
Since this is a do-once installation: mnesia:start([{extra_db_nodes, [PrimaryNode]}]).
That will connect mnesia on the nodes and put the new node in the schema as a diskless node, i.e. its schema is ram_copies.
You can then change that to disc_copies with mnesia:change_table_copy_type(schema, node(), disc_copies). **
After that you add all the tables you want a copy of with: [{atomic, ok} = mnesia:add_table_copy(Tab, node(), Storage) || {Tab,Storage} <- TabsAndStorageOnNewMachine].
** This is not tested. You may need to delete the BadNode as described above before changing the new node to disc_copies.
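Putting those steps together as a rough sketch, to be run on the new node. The module name join_cluster and the error tuples are hypothetical. Instead of passing extra_db_nodes to mnesia:start/1, this version uses the documented mnesia:change_config(extra_db_nodes, Nodes) after a plain start, which reports which nodes it actually connected to. Like the steps above, it has not been tested against a production cluster.

```erlang
-module(join_cluster).
-export([join/2]).

%% PrimaryNode:    node name of the running primary.
%% TabsAndStorage: [{Table, ram_copies | disc_copies | disc_only_copies}].
%% The new node must not have a schema on disc yet.
join(PrimaryNode, TabsAndStorage) ->
    %% Distribution has to work before Mnesia can connect at all.
    case net_adm:ping(PrimaryNode) of
        pang ->
            {error, {not_connected, PrimaryNode}};
        pong ->
            ok = mnesia:start(),
            case mnesia:change_config(extra_db_nodes, [PrimaryNode]) of
                {ok, [_ | _]} -> copy_tables(TabsAndStorage);
                Other -> {error, {mnesia_not_joined, Other}}
            end
    end.

copy_tables(TabsAndStorage) ->
    %% Make the local schema persistent, then pull the table replicas.
    {atomic, ok} = mnesia:change_table_copy_type(schema, node(), disc_copies),
    [{atomic, ok} = mnesia:add_table_copy(Tab, node(), Storage)
     || {Tab, Storage} <- TabsAndStorage],
    ok.
```

The ping check comes first on purpose: if the nodes cannot reach each other at the distribution level, nothing is started or changed.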
We’ve allocated time today to try these ideas. I’ve been able to get them to work on a test system so we’re going to apply them to the running system. I’ll post the results and close out this discussion if everything works.
In my test nodes, I was able to build a cluster, remove a node, and then reattach the node.
The exact same steps on the operational system weren’t completely successful. I was able to remove the old, obsolete node using mnesia:del_table_copy(schema, OldNode) (however, the removed node was still in mnesia:system_info(extra_db_nodes) on the production node.)
Then, on the fresh, new system, I tried mnesia:start([extra_db_nodes, ProductionNode]). Using tcpdump, I saw a flurry of traffic between the nodes, but they didn’t connect. Each reports only itself in mnesia:system_info(db_nodes) and mnesia:system_info(running_db_nodes). Trying to use change_table_copy_type on the schemas results in {aborted,{badarg,schema,unknown}}.
They see each other in epmd’s registry and they are both started with the same cookie, so I think they’re set up correctly in distributed mode.
I’m partially successful! Thank you!
Anyone have any ideas on how to get them over this final hurdle?
Yes. We’re currently running OTP/21. Once I get the cluster working, we’ll start migrating to OTP/26 (with JIT!)
How did you verify that? Does net_adm:ping/1 return pong?
I used tcpdump to see chit chat between the two. Didn’t know about net_adm:ping/1. Running that returns the value pang, so I don’t have them set up correctly. Once I have that working, I’ll try again to get the database to sync.
More info (with anonymized host names and port numbers):
No firewalls running on either machine.
Both machines are running OTP/21 (although at different patch levels)
From the new node, net_adm:names('this_host') returns a list of one element referring to the current VM’s node.
net_adm:names('other_host') returns a list of one element referring to the VM on the other machine (my operational machine), so the epmd protocol seems to be working across nodes.
On the operational machine, netstat -a shows the VM uses the port reported by net_adm:names/1. netstat also shows the VM bound to 0.0.0.0:PORT, so it’s listening on all interfaces.
Both VMs are started with -sname and both have the same cookie.
tcpdump on the operational machine shows traffic from the other node first goes to the epmd port and then to the VM’s port.
Yes, I can ping each machine from the other. And ~/.erlang.cookie is 400.
Since epmd interaction is working, I feel I have something configured wrong with my Erlang VM instance. The ~/.erlang.cookie permission was something I didn’t know about, thanks. Unfortunately it’s set correctly.
The primary node may have been started with dist_auto_connect set to never. You can check by calling application:get_all_env(kernel). If there is no dist_auto_connect entry there, then that’s fine. If it’s set to never, you must connect the new node explicitly by calling net_kernel:connect_node/1 before trying to ping the node.
If net_kernel:allow/1 was called at some point in the past, the new node won’t be able to connect to the primary even if the cookies are the same. You can check this by calling net_kernel:allowed(), which is unfortunately undocumented (although it is listed under the documented API exports in the source code). If it returns {ok, []}, then allow/1 was never called. If the list is not empty, you need to call net_kernel:allow/1 on the primary node for the new node before it can connect.
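A quick way to run the first of those checks from the shell, as a sketch. The module name dist_check and its function names are hypothetical. It only uses application:get_env/2, net_kernel:connect_node/1 and net_adm:ping/1, and leaves out the allowed/0 check because that function may not exist in older releases.

```erlang
-module(dist_check).
-export([auto_connect/0, try_connect/1]).

%% Returns the configured dist_auto_connect value, or the atom
%% 'default' when the kernel parameter is not set at all.
auto_connect() ->
    case application:get_env(kernel, dist_auto_connect) of
        undefined -> default;
        {ok, Value} -> Value
    end.

%% Attempts an explicit connection first, then pings.
%% Returns {ConnectResult, pong | pang}.
try_connect(Node) ->
    Connected = net_kernel:connect_node(Node),
    {Connected, net_adm:ping(Node)}.
```

Run dist_check:auto_connect() on the primary and dist_check:try_connect(PrimaryNode) on the new node. A pang in the second element means the problem is below Mnesia, in distribution or networking.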
dist_auto_connect is not in the environment of the active, mnesia node. In OTP/21, net_adm:allowed/0 is not defined, so that can’t be the problem either.
You’re still stuck on this? If you’ve done all of those things, then you have networking issues. Each host runs one epmd process listening on the well-known port:
$ grep epmd /etc/services
epmd 4369/tcp # Erlang Port Mapper Daemon
epmd 4369/udp # Erlang Port Mapper Daemon
Each node listens on a randomly selected port for distribution, which is controlled by the kernel application environment variables:
You can choose the distribution port range with inet_dist_listen_min and inet_dist_listen_max. You’re probably running in an environment which is blocking the distribution traffic, so select the port(s) and/or interface to use for distribution explicitly and configure your environment to allow that traffic.
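As an illustration, a sys.config fragment that pins the distribution port range might look like the following. The range 9100 to 9105 is an arbitrary example, not something from this thread; pick ports your network actually allows.

```erlang
%% sys.config (port numbers are placeholders)
[
 {kernel, [
   %% Lowest and highest port the node may listen on for distribution.
   {inet_dist_listen_min, 9100},
   {inet_dist_listen_max, 9105}
 ]}
].
```

The node would then be started with something like erl -sname app -config sys.config (the node name app is a placeholder), and the same values can be given on the command line as -kernel inet_dist_listen_min 9100 inet_dist_listen_max 9105. The epmd port 4369 still has to be reachable as well.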