Is it safe to share Fd from prim_file:open(Filename, [read]) between multiple processes?

Hi,

My system has multiple processes on a single node, all reading data from a single immutable file. I tried to pass the Fd I got from prim_file:open(Filename, [read]) to other processes, but those processes couldn’t call prim_file:pread: it failed with a not_on_controlling_process error.
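
Roughly like this (a minimal sketch, untested; "data.bin" stands in for my real file):

    {ok, Fd} = prim_file:open("data.bin", [read]),
    Parent = self(),
    spawn(fun() ->
              %% This pread runs in a process that did not open the
              %% file, so it fails with the
              %% not_on_controlling_process error:
              Parent ! {read, prim_file:pread(Fd, 0, 100)}
          end),
    receive {read, Res} -> Res after 1000 -> timeout end.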

But then I did this:

    {file_descriptor, prim_file, Data} = Fd,
    %% Rewrite the owner field so the calling process is treated as the owner:
    Fd1 = {file_descriptor, prim_file, Data#{owner => self()}}.

And reading started working. I’ve tested it with different read sizes and different offsets in parallel, and it always worked correctly. It is definitely an undocumented hack, but it works perfectly.

Is there anything I am missing? Perhaps there is some scenario that will lead to an unexpected exception, a VM crash, or corrupted data?

I don’t know of any concrete reason why this won’t work, but you’re clearly going down the undocumented & unsupported path.

Is there a particular reason you can’t use a normal IoDev (file:open not raw)?

prim_file + sharing across processes is not something I’d expect to work

file:open spawns a process, and preads become requests to that process, removing any ability to have parallel reads from this file and making access sequential.
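
For example (untested sketch): the IoDev can be shared freely between processes, but every pread turns into a message round-trip to the IoDev’s server process:

    {ok, Dev} = file:open("data.bin", [read, binary]),
    Parent = self(),
    %% All four preads are accepted from any process, but they are
    %% served one at a time by the process behind Dev:
    [spawn(fun() ->
               Parent ! {N, file:pread(Dev, N * 4096, 4096)}
           end) || N <- lists:seq(0, 3)],
    [receive {N, Res} -> Res end || N <- lists:seq(0, 3)].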

I mean, another solution is to open the file schedulers_online times, store those Fd’s in some persistent_term tuple or ETS, and access them by scheduler id. This way we can have parallel access, but there is still the problem of an unnecessary request to the io process. Something like the sketch below.
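
A rough sketch of that idea (untested; the fd_pool module name is made up):

    -module(fd_pool).
    -export([init/1, pread/2]).

    %% Open one IoDev per scheduler and stash the tuple in persistent_term.
    init(Path) ->
        N = erlang:system_info(schedulers_online),
        Devs = [begin
                    {ok, Dev} = file:open(Path, [read, binary]),
                    Dev
                end || _ <- lists:seq(1, N)],
        persistent_term:put(?MODULE, list_to_tuple(Devs)).

    %% Pick the IoDev that matches the caller's current scheduler,
    %% so concurrent readers spread across the pool.
    pread(Offset, Len) ->
        Devs = persistent_term:get(?MODULE),
        Dev = element(erlang:system_info(scheduler_id), Devs),
        file:pread(Dev, Offset, Len).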

The documentation for file:open/2 with Mode raw explicitly says:

Only the Erlang process that opened the file can use it.

and that’s just a redirect to the undocumented prim_file.

I’d open the file in each process that needs it, but with file:open/2 raw, as that bypasses the (infamous) file server. Since you only do reads, that should be fine.
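
I.e. something like this sketch (untested; the function names and "data.bin" are placeholders):

    %% Each reader process opens its own raw Fd once and keeps it
    %% for its lifetime, so no handle is ever shared:
    start_reader() ->
        spawn(fun() ->
                  {ok, Fd} = file:open("data.bin", [read, raw, binary]),
                  reader_loop(Fd)
              end).

    reader_loop(Fd) ->
        receive
            {pread, From, Offset, Len} ->
                From ! {pread_result, file:pread(Fd, Offset, Len)},
                reader_loop(Fd);
            stop ->
                file:close(Fd)
        end.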

I assume the file is too large to read upfront into memory?

Indeed, like Mikael mentioned, it won’t work for raw file handles. Each raw fd can only be controlled by one process at a time, even though at the lower level the OS file API functions can technically be thread-safe and callable from multiple threads, as is the case for pread [1].

I recently implemented a subset of file operations like that as a NIF, to allow access from multiple parallel processes in CouchDB: Implement parallel preads by nickva · Pull Request #5399 · apache/couchdb · GitHub. It resulted in a decent speedup in highly concurrent benchmarks. Feel free to copy or use parts of it if you want.

[1] pread(2) - Linux manual page

The problem is that it actually works in every test I do. Why do you think it won’t work? What scenario do you have in mind?

The problem is that it actually works in every test I do. Why do you think it won’t work? What scenario do you have in mind?

Because process ownership is there for a reason: safety. The closing state and the cleanup when the owner dies will break. On POSIX systems, at least, file descriptors are just plain integers. Without managing ownership and lifetimes properly, you might find that file descriptors close and reopen, and the same fd you stashed away in one process could now be reading from a completely random file, or not even a file: it could be a socket or anything else that can have a descriptor.


I understand what you’re describing here, but how is it different if I close some Fd, then open some socket (in this process) which gets the same id, and then I accidentally pread the old Fd? It seems to me that this issue would still occur (even in other programming languages), and in that case ownership provides no safety mechanism.

What you’re missing is that the implementation safely wraps that number in a manner that won’t lead to resource leaks or problems with identifier reuse … but it assumes that only one process accesses the file at a time.

If the owning process is killed (asynchronously) while an operation is in flight, the underlying file descriptor is closed after the operation completes (think exit(Pid) during a long-running read). This is accomplished by atomically compare-exchanging a simple flag. If several operations are in flight, they will all be told to close the underlying file descriptor, thus risking closure of an unrelated descriptor that just so happened to reuse the same number.

If you run the debug emulator, you will probably trigger assertions with this hack.


Thanks, now I see the issue. It sounds like a very rare bug, given that the preconditions are:

  1. Multiple file operations being interrupted at the same time
  2. Another descriptor being created right in between the close operations, and that descriptor getting the exact same number

I’ve read the NIF provided by @nickva and it looks promising, but I still don’t feel like a NIF is the right solution. I can think of this solution:

Just let other processes do preads from the same Fd, and have code that disables auto-closing of the descriptor when a process (not the owner) exits during a long-running pread (or any other thread-safe file operation). If you think that is a correct way to approach the problem, I would love to make my first contribution to the OTP source code 🙂

But if you have any other approach in mind, please do share.

I understand what you’re describing here, but how is it different if I close some Fd, then open some socket (in this process) which gets the same id, and then I accidentally pread the old Fd?

The Erlang Fd is not a plain integer fd from the libc/POSIX interface; it’s a resource that has associated locks and contains the integer fd handle as well. If the controlling process dies, the Fd switches to the closing/closed states until the cleanup finishes. The actual integer file descriptor stays open and only closes later, once there is no chance anyone can read from it any longer.

If you open some new file or socket, it will only be able to reuse the integer fd after the Erlang Fd handle has performed its cleanup, closed the descriptor, and released it, so it’s safe.

It seems to me that this issue would still occur (even in other programming languages), and in that case ownership provides no safety mechanism.

No, it’s impossible for this to occur if the kernel, the libc library, and the Erlang runtime are implemented correctly. That’s why the safety mechanism is there.


The rarity of it is irrelevant; once it happens, there is no way for the emulator to recover from that condition. It’s at least as bad as memory corruption.

Funnily enough, that is practically guaranteed to happen, given how the numbers are compacted. In practice, all that is needed to trigger this is that the process that owns the file is killed while two operations are in flight, and that another descriptor is created between the closures.

I’ve had “redesign the file interface” in my backlog for a long time; the new fs module wouldn’t have any of these issues.


That’s nice to hear, but I would still like to do a patch with what I’ve described above, if you could please point me to the code where this fclose happens.

UPD: I think I found the owner_death_callback function.

Just FYI, we won’t accept that patch; doing so would effectively document and support prim_file, and that’s not something we want.

Gotcha, it sounds like I should write a NIF or use the one provided by @nickva. Thanks for the answers. It’s sad that there’s no way I can contribute to the project, but I guess it’s for the best.

Raising this issue is a contribution in and of itself, and it’s not that you can’t contribute this functionality, it’s just that the current file module (and especially the internal prim_file) is the wrong place for various reasons. I wouldn’t add it there, myself; we need a new interface to make this work well.
