Allocate the poll array once and re-use it. If the caller increases the
number of file descriptors supported by the process, try to account for
it by re-sizing the poll array.
Caveats:
The poll array will only grow. Because the offset into the array is used
for indexing the fd (similar to how the FD_SET macros work with
select(2)), decreasing the array with existing fd's may result in the
fd's exceeding the size of the array.
Using sysconf(_SC_OPEN_MAX) is not portable. On Linux, it seems to call
getrlimit(RLIMIT_NOFILE).
The value of maxfd is checked in the event loop rather than doing it in
the call to setrlimit because there may be other methods for changing
the limit, for example, using prctl.
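A minimal sketch of the grow-only resize described above. The function
name pollfd_grow and its signature are hypothetical, not taken from the
source; unused slots are set to fd -1, which poll(2) ignores.

```c
#include <poll.h>
#include <stdlib.h>

/* Grow-only resize of a poll array indexed by fd.
 * Returns 0 on success, -1 if realloc fails.
 * A request that is not larger than the current size is ignored. */
int pollfd_grow(struct pollfd **fds, size_t *nfds, size_t maxfd)
{
    struct pollfd *p;
    size_t i;

    if (maxfd <= *nfds)
        return 0;

    p = realloc(*fds, maxfd * sizeof(struct pollfd));
    if (p == NULL)
        return -1;

    /* Mark the new slots unused: poll(2) skips entries with fd < 0. */
    for (i = *nfds; i < maxfd; i++) {
        p[i].fd = -1;
        p[i].events = 0;
        p[i].revents = 0;
    }

    *fds = p;
    *nfds = maxfd;
    return 0;
}
```

The event loop could call this on each iteration with the current
descriptor limit, so that a limit raised by any means is picked up.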
Base the default number of child processes per fork on the number of
available file descriptors. The maximum number of processes per fork is
an unsigned 2 byte integer.
If this value exceeds the RLIMIT_NOFILE, {error,emfile} will be
returned to the caller.
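One way the default could be derived, as a sketch only. The constants
RESERVED_FDS and FDS_PER_CHILD and the function default_maxchild are
assumptions for illustration; the source does not state the formula.

```c
#include <stdint.h>
#include <sys/resource.h>

/* Assumed values, not taken from the source: descriptors reserved by
 * the process itself and descriptors consumed by each child. */
#define RESERVED_FDS 4
#define FDS_PER_CHILD 4

/* Derive a default child limit from the descriptor limit and clamp it
 * to the range of an unsigned 2 byte integer. */
uint16_t default_maxchild(rlim_t nofile)
{
    rlim_t n;

    if (nofile <= RESERVED_FDS)
        return 0;

    n = (nofile - RESERVED_FDS) / FDS_PER_CHILD;

    return n > UINT16_MAX ? UINT16_MAX : (uint16_t)n;
}
```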
To remove the arbitrary limit on file descriptors and, consequently, on
the number of forked processes, use poll(2) instead of select(2).
Using poll(2) has a downside: it restricts which types of file
descriptors can be monitored (for example, devices on Mac OS X). This
matters because allowing the user to add fd's to the event loop is on
the roadmap.
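For illustration, a small poll(2) based wait on a single descriptor.
Unlike select(2), nothing here depends on the numeric value of the fd
being below FD_SETSIZE. The helper read_when_ready is hypothetical and
is not part of the port.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable, then read from it.
 * Returns the result of read(2), or -1 with errno set to EAGAIN if the
 * timeout expired before any data arrived. */
ssize_t read_when_ready(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd;
    int rv;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rv = poll(&pfd, 1, timeout_ms);
    if (rv < 0)
        return -1;

    if (rv == 0) {
        errno = EAGAIN;
        return -1;
    }

    /* Readable, hung up or in error: let read(2) report the result. */
    return read(fd, buf, len);
}
```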
Scrub the fd array inherited from the parent before realloc(), in case
the number of fd's in the child has shrunk. Purely cosmetic, nothing
sensitive in this array.
dup the control fd to the "well known" file descriptor 3. This
simplifies checks in the code (reserved fd's are under 4).
Since dup'ing the fd doesn't copy the close on exec flag to the new fd,
perform the fcntl() operation after the fd is dup'ed.
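A sketch of the ordering described above: dup2(2) first, then set
FD_CLOEXEC on the new descriptor. The helper name dup_cloexec is
hypothetical; the port would pass 3 as the target fd.

```c
#include <fcntl.h>
#include <unistd.h>

/* Duplicate oldfd onto newfd, then mark newfd close-on-exec.
 * dup2(2) clears FD_CLOEXEC on the new descriptor, so the flag must be
 * set after the dup. Returns newfd on success, -1 on error. */
int dup_cloexec(int oldfd, int newfd)
{
    int flags;

    if (dup2(oldfd, newfd) < 0)
        return -1;

    flags = fcntl(newfd, F_GETFD);
    if (flags < 0)
        return -1;

    if (fcntl(newfd, F_SETFD, flags | FD_CLOEXEC) < 0)
        return -1;

    return newfd;
}
```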
readdir/2 lists files within a directory. This duplicates some of the
functionality of the file and filelib modules in the stdlib but, since
the process may be running as a different user than beam and may be
running in a different mount space or chroot, the stdlib may not have
access to the requested files.
Since the main purpose of this function is to support enumerating
cgroups, opendir(3) and closedir(3) aren't accessible from erlang
(similar to setns, but differing from support for open, close, read,
write).
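A rough sketch of listing a directory in a single call, so that
opendir(3) and closedir(3) never need to be exposed separately. The
function list_dir and its callback signature are assumptions for
illustration, not the port's actual interface.

```c
#include <dirent.h>
#include <stddef.h>

/* Open path, call fn(name, arg) for every entry, then close it.
 * fn may be NULL to only count entries.
 * Returns the number of entries visited, or -1 if the directory could
 * not be opened. Errors from readdir(3) are treated as end of
 * directory in this sketch. */
long list_dir(const char *path,
              void (*fn)(const char *name, void *arg), void *arg)
{
    DIR *dirp;
    struct dirent *de;
    long n = 0;

    dirp = opendir(path);
    if (dirp == NULL)
        return -1;

    while ((de = readdir(dirp)) != NULL) {
        if (fn != NULL)
            fn(de->d_name, arg);
        n++;
    }

    (void)closedir(dirp);
    return n;
}
```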
The maximum number of forks a process can have is limited by the
available file descriptors, with the upper boundary set by the number of
fd's select() can handle.
This option is provided because limiting the number of child processes
per fork can't easily be done by other means. For example, setrlimit()
with RLIMIT_NPROC applies to all of the user's processes and
RLIMIT_NOFILE will interfere with normal file operations. Useful for
testing and for preventing accidental fork bombs.
The maxchild option applies to child processes. The value in the
current process is unchanged.
Free memory on error. Be cautious about errno being overwritten by
intervening functions.
Lots more cleanup to be done, for example, check for double frees.
Since there may be multiple message headers, a read may result in a
valid packet that the next process in the fork chain rejects as too
large.
Calculate the message buffer size for the read based on the length of
the fork chain.
write(2) may return success after writing less than the full buffer
supplied in the argument. From the Linux write(2) man page:
    The number of bytes written may be less than count if, for example,
    there is insufficient space on the underlying physical medium, or the
    RLIMIT_FSIZE resource limit is encountered (see setrlimit(2)), or the
    call was interrupted by a signal handler after having written less
    than count bytes. (See also pipe(7).)
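A common way to handle short writes is to loop until the whole buffer
has been written. The helper write_all below is a generic sketch, not
the code used in the port.

```c
#include <errno.h>
#include <unistd.h>

/* Write the whole buffer, retrying after short writes and EINTR.
 * Returns len on success, -1 on error (errno is set by write(2)). */
ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t off = 0;

    while (off < len) {
        ssize_t n = write(fd, p + off, len - off);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        off += (size_t)n;
    }

    return (ssize_t)off;
}
```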
Begin support of cgroups by adding mkdir and rmdir. Instead of providing
an inflexible interface to cgroups within the port, support the
primitives so that the logic can be done within erlang.
This functionality duplicates the file module in the standard library.
It is unfortunately necessary because the port requires superuser
privileges to write to the cgroup filesystem.
The process environment can be set up in a few ways:
* using the {env, [{Key, Val}]} option in alcove_drv:start/1
  Global: will affect all future spawned processes.
* at exec, using execve/5
  Per process: will affect the child process.
* using getenv(3), setenv(3) and unsetenv(3)
Exit status is disabled by default and can be enabled per process by
using setopt.
    Port = alcove_drv:start([exit_status, termsig]),
    {ok, Child1} = alcove:fork(Port),
    {ok, Child2} = alcove:fork(Port),
    ok = alcove:exit(Port, [Child1], 1),
    {exit_status, 1} = alcove:event(Port, [Child1]),
    ok = alcove:kill(Port, [Child2], 9),
    {termsig, 9} = alcove:event(Port, [Child2]).
The exit status is spoofed so the event will appear to come from the
child. The parent will still receive SIGCHLD ({signal,17} on Linux).
Add exit(3) for testing.
Confirm events are delivered properly, for example, to the parent after
the child has exited.
The event extraction in alcove_drv is still messy, redundant and error
prone. Needs to be cleaned up.
Accidentally broke the maxchild option. maxchild is used to size an
array of file descriptors, so resetting this value will either crash
the process (read outside the bounds of the array) or cause it to leak
fd's.
So punt for now and remove support for changing maxchild. I guess the
simplest way to handle this would be to:
* get the new value of maxchild
* allocate an array of maxchild entries
* copy PIDs from old to new
* if there is not enough space, free new array and return badarg
* otherwise, point the state to the new array and free the old array
Instead of crashing the caller by returning badarg, we could synthesize
an errno value and return {error,enomem} or simply leave the maxchild
unchanged and return ok.
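The steps listed above could be sketched as follows. This is a
simplified illustration: the state is reduced to a plain array of
pid_t where 0 marks an unused slot, and the name children_resize is
hypothetical.

```c
#include <stdlib.h>
#include <sys/types.h>

/* Resize the child table to newmax entries.
 * Assumption: a slot holding 0 is unused.
 * Returns 0 on success. Returns -1 and leaves the old table untouched
 * if allocation fails or the live PIDs do not fit in newmax slots. */
int children_resize(pid_t **children, size_t *maxchild, size_t newmax)
{
    pid_t *next;
    size_t i;
    size_t n = 0;

    if (newmax == 0)
        return -1;

    next = calloc(newmax, sizeof(pid_t));
    if (next == NULL)
        return -1;

    /* Copy live PIDs from the old array, compacting as we go. */
    for (i = 0; i < *maxchild; i++) {
        if ((*children)[i] == 0)
            continue;

        if (n == newmax) {
            /* Not enough space: discard the new array. */
            free(next);
            return -1;
        }

        next[n++] = (*children)[i];
    }

    /* Point the state at the new array and release the old one. */
    free(*children);
    *children = next;
    *maxchild = newmax;
    return 0;
}
```

The caller decides how to map -1 to the Erlang side (badarg, a
synthesized {error,enomem}, or silently keeping the old value).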
Getting an integer value from erl_interface is tricky: the macros are
undocumented and the behaviour is not what you'd expect. A small value is
sent from the Erlang side as an integer. An unsigned integer is a value
exceeding a signed int. A long long is a value exceeding an unsigned
int, etc.
Test that the value is an integer and is one of the expected types. In
the case of setrlimit, if the executable is compiled with support for
large files, the value of cur/max are stored in 8 bytes.
If a long long is passed into cur/max where they are stored in 32 bits,
it should be truncated.
The intent of using a signed integer was to allow disabling fork depth,
but it's simpler to use an unsigned integer and set a high value, or set
it to -1 and let the value overflow and wrap.
Instead of using a case statement, simplify the lookup by searching
through an array.
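A minimal sketch of a table-driven lookup. The struct, the CONST macro
and constant_lookup are hypothetical names; the table uses a few signal
constants from <signal.h> purely as sample data.

```c
#include <signal.h>
#include <string.h>

struct constant {
    const char *name;
    long long val;
};

/* Stringify the macro name and pair it with its value. */
#define CONST(x) { #x, x }

/* Sample table, terminated by a NULL name. */
static const struct constant signals[] = {
    CONST(SIGHUP),
    CONST(SIGINT),
    CONST(SIGKILL),
    CONST(SIGTERM),
    { NULL, 0 }
};

/* Linear search through a NULL-terminated table.
 * Returns 0 and stores the value in *val if name is found, -1 if not. */
int constant_lookup(const struct constant *table, const char *name,
                    long long *val)
{
    for (; table->name != NULL; table++) {
        if (strcmp(table->name, name) == 0) {
            *val = table->val;
            return 0;
        }
    }

    return -1;
}
```

Adding a constant then means adding one line to the table rather than
another branch to a case statement.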
Do not modify the name of constants: previously, the names were
downcased and the leading namespace removed, e.g., PR_CAPBSET_READ
would become 'get_capbset_read' and SIG_TERM would be represented by
'term'. Constants are now atoms mirroring the name in the header file:
'PR_CAPBSET_READ', 'SIG_TERM'.
This change will cause issues with non-portable naming of constants. For
example, Linux uses MS_ to preface mount constants and FreeBSD uses
MNT_. A future change will add portable representations of the atom.
Something like 'rdonly' for MS_RDONLY and MNT_RDONLY.
Set the exec() status of the child before reading from the child's
stdout/stderr: the exec status determines the message type (proxy or
stdout) returned by the port.
Begin recording the exit status of the child. Currently it is just a
non-zero number. It should be set to the actual exit status of the
child. The value should be reported to the erlang side as well:
{exit_status, integer()}
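A sketch of how the actual exit status could be recovered from the
value returned by waitpid(2). The functions decode_status and
demo_exit_status are hypothetical; the second exists only to exercise
the first.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Decode a status from waitpid(2).
 * Returns 0 and stores the exit code in *code if the child exited,
 * 1 and stores the signal number if it was killed by a signal,
 * -1 for any other state (for example, stopped). */
int decode_status(int status, int *code)
{
    if (WIFEXITED(status)) {
        *code = WEXITSTATUS(status);
        return 0;
    }

    if (WIFSIGNALED(status)) {
        *code = WTERMSIG(status);
        return 1;
    }

    return -1;
}

/* Fork a child that exits with the status `want`, reap it and decode
 * the result. Returns the value of decode_status or -1 on error. */
int demo_exit_status(int want, int *code)
{
    pid_t pid;
    int status = 0;

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0)
        _exit(want);

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return decode_status(status, code);
}
```

A return of 0 would correspond to {exit_status, integer()} on the
Erlang side; how the signal case is reported is left open here.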