Sending data to the stdin of a non-existent process will also return
'badpid'. Mark the type as a control message rather than the result of
a call.
Probably there only needs to be 2 types of messages: sync (calls) and
async (events). The control message type might be removed and these
events labelled as events.
I go back and forth with this: be explicit or brief. Especially with
strcmp() where I always WTF.
The code is simpler and easier to read when the checks are explicit. For
example:
// NULL pointer or integer?
if (!x) return;
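A minimal sketch of the explicit style, for comparison. The function
names (same_name, is_unset) are hypothetical and are not taken from the
alcove source:

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if both strings are equal, 0 otherwise. The explicit
 * "== 0" avoids the "!strcmp(a, b)" form, which reads like "not
 * equal" at a glance. */
int same_name(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return 0;

    return strcmp(a, b) == 0;
}

/* The comparison states the type of x: an integer, not a pointer. */
int is_unset(int x)
{
    return x == 0;
}
```

With the comparison written out, the reader does not need to look up
the declaration of x to know what is being tested.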
Return {error, unsupported} if an atom is used as an argument and the
constant the atom represents does not exist on the current platform.
The previous behaviour was inconsistent and non-deterministic. The
constant might:
* return {error,einval}. System return values could not be distinguished
from alcove return values.
* cause an exception
* be silently ignored
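One way to keep the two kinds of failure apart is to resolve the name
to a constant before the system call is made. The sketch below assumes
a simple name/value table; the type and function names (lookup_status,
lookup_constant) are hypothetical, not the alcove implementation:

```c
#include <stddef.h>
#include <string.h>
#include <sys/resource.h>

/* Hypothetical lookup result: "not defined on this platform" is kept
 * separate from any value or error the system call could produce. */
enum lookup_status { LOOKUP_OK = 0, LOOKUP_UNSUPPORTED = 1 };

struct constant {
    const char *name;
    int value;
};

/* Entries are only compiled in if the platform defines the macro. */
static const struct constant rlimit_constants[] = {
#ifdef RLIMIT_NOFILE
    {"rlimit_nofile", RLIMIT_NOFILE},
#endif
#ifdef RLIMIT_NPROC
    {"rlimit_nproc", RLIMIT_NPROC},
#endif
    {NULL, 0}
};

/* Look up a constant by its lower case name. On LOOKUP_UNSUPPORTED,
 * *value is left untouched and the system call should not be made. */
enum lookup_status lookup_constant(const char *name, int *value)
{
    const struct constant *c;

    for (c = rlimit_constants; c->name != NULL; c++) {
        if (strcmp(c->name, name) == 0) {
            *value = c->value;
            return LOOKUP_OK;
        }
    }

    return LOOKUP_UNSUPPORTED;
}
```

The caller would translate LOOKUP_UNSUPPORTED into {error, unsupported}
so it can never be confused with an errno value such as einval.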
Move out the clone constants to a header since fork() does not need
them. Include sched.h with the constants to ensure the CLONE_ constants
are defined.
Leave common utility functions in c_src/sys/alcove_fork.(c|h). Maybe
these files should be renamed "fork_common.c" or "alcove_fork_common.c".
Move alcove_setfd() to alcove.c since the pid_foreach() is also there.
These common functions should also eventually be moved out to another
file.
Support building with rebar3 and rebar2 using the makefile generated
from "rebar3 new cmake".
Do a straightforward port of the compiler options from the rebar port
compiler. The optimizations/warnings can be re-enabled later.
Use the fancy pants incremental building of each compilation unit from
the template. It's a lot slower than compiling in one go but is slightly
faster when running any rebar command (rebar runs make on each command
which recreates the target. The Makefile needs to be fixed).
Use POSIX 200112L/C99 mode on Solaris to suppress compilation warnings.
"-D__EXTENSIONS__=1" is required for NSIG (max number of signals). This
also pulls in an ERR macro which necessitates renaming the goto label
from ERR -> ERROR.
A message sent to a non-existent Unix PID would be discarded, causing
the calling Erlang process to block forever in receive.
This change modifies the behaviour of the port to return "badpid" if an
element in the fork path is not found. Since any element of the path may
be invalid, an event (as opposed to a call reply) is generated.
Currently any "badpid" event will cause the Erlang process to exit.
Since only one call can be active at a time, it shouldn't interfere with
other operations but the mailbox should pattern match on the list
prefix (i.e., a call to [1,2,3,-1,4] would generate badpid for
[1,2,3,-1]).
Also define the behaviour when writing to invalid PIDs (<= 0). Since 0
is used to initialize the list of child PIDs in the port, passing in a
PID of 0 would match the first available slot in the PID table, causing
the port to write to fd 0.
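A sketch of the lookup with the invalid PID check, assuming unused
slots in the table are marked with a PID of 0 as described above. The
struct layout and the name find_child are hypothetical:

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical child table entry. Unused slots have pid == 0. */
struct child {
    pid_t pid;
    int fdin;
};

/* Return the slot for pid, or NULL if the pid is invalid or unknown.
 * Rejecting pid <= 0 before the search means a request for PID 0 can
 * never match an unused (zero initialized) slot. */
struct child *find_child(struct child *table, size_t len, pid_t pid)
{
    size_t i;

    if (pid <= 0)
        return NULL;

    for (i = 0; i < len; i++) {
        if (table[i].pid == pid)
            return &table[i];
    }

    return NULL;
}
```

A NULL result for any element of the fork path is the case that would
be reported as badpid.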
Use the same names for signal actions as the C headers. Since the
actions are atoms, the names are lower cased.
Invent the name 'sig_catch' to denote signals that are caught and passed
to the Erlang side as messages.
Add the 'sig_' prefix to all the actions to avoid quoting 'catch', since it
is a reserved keyword.
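The mapping between the lower cased names and the C handlers could be
sketched as follows. The names sig_dfl and sig_ign follow from lower
casing SIG_DFL and SIG_IGN; the handler forward_signal and both
conversion functions are hypothetical placeholders:

```c
#include <signal.h>
#include <string.h>

typedef void (*sig_handler)(int);

/* Hypothetical stand-in for the handler that forwards a caught
 * signal to the Erlang side. */
void forward_signal(int signum)
{
    (void)signum;
}

/* Action name -> handler. Returns SIG_ERR for unknown names. */
sig_handler sigaction_from_name(const char *name)
{
    if (strcmp(name, "sig_dfl") == 0)
        return SIG_DFL;
    if (strcmp(name, "sig_ign") == 0)
        return SIG_IGN;
    if (strcmp(name, "sig_catch") == 0)
        return forward_signal;

    return SIG_ERR;
}

/* Handler -> action name. Any handler other than the default or
 * ignore actions is reported as caught. */
const char *sigaction_to_name(sig_handler handler)
{
    if (handler == SIG_DFL)
        return "sig_dfl";
    if (handler == SIG_IGN)
        return "sig_ign";

    return "sig_catch";
}
```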
Like Linux and Solaris, poll(2) on OpenBSD will return EINVAL if the
number of file descriptors argument passed to poll is above the maximum
number of fd's allowed by setrlimit(2).
By inspection, NetBSD behaves in a similar way to FreeBSD (nfd is
allowed to exceed maximum fd).
setpriority(2) is useful for limiting the impact of processes running in
a sandbox. It is also possible to exec the sandbox process using nice:
alcove:execvp(Drv, Path, "/usr/bin/nice", ["/usr/bin/nice", "-n",
"19", "sandbox"]).
The advantages of supporting set/getpriority natively in alcove are:
* nice doesn't need to exist in the sandbox chroot
* better error values (nice will return errors as a binary string)
* priorities can apply to fork trees
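For reference, a sketch of the underlying calls showing where the
better error values come from: the errno value is available directly
instead of being parsed out of the output of nice. The wrapper names
are hypothetical:

```c
#include <errno.h>
#include <sys/resource.h>

/* Set the nice value of the calling process. Returns 0 on success
 * or the errno value on failure. */
int set_own_priority(int prio)
{
    if (setpriority(PRIO_PROCESS, 0, prio) != 0)
        return errno;

    return 0;
}

/* Read the nice value of the calling process into *prio. Since
 * getpriority(2) can legitimately return -1, errno has to be cleared
 * before the call and checked afterwards. */
int get_own_priority(int *prio)
{
    errno = 0;
    *prio = getpriority(PRIO_PROCESS, 0);

    if (*prio == -1 && errno != 0)
        return errno;

    return 0;
}
```

Raising the nice value (lowering the priority) does not require
privileges, so this can be done after the sandbox has dropped them.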
Returning a "plain" error tuple for unsupported calls breaks the type
spec for functions:
On BSD:
-spec setproctitle(alcove_drv:ref(),iodata()) -> 'ok'.
On Linux and Solaris:
-spec setproctitle(alcove_drv:ref(),iodata()) -> {'error','unsupported'}.
So the type spec would need to be amended to be the union of both return
values and portable code would have to test for both cases, with the
effect that any call in the future might return 'unsupported' if alcove
were ported to a new OS.
Attempting to make an unsupported call will now result in an "undefined
function" exception:
1> catch alcove:setproctitle(P, "foo").
{'EXIT',{undef,[{alcove,call,
[<0.46.0>,setproctitle,["foo"]],
[{file,"src/alcove.erl"},{line,426}]},
{erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,661}]},
{erl_eval,expr,5,[{file,"erl_eval.erl"},{line,434}]},
{shell,exprs,7,[{file,"shell.erl"},{line,684}]},
{shell,eval_exprs,7,[{file,"shell.erl"},{line,639}]},
{shell,eval_loop,3,[{file,"shell.erl"},{line,624}]}]}}
2> os:type().
{unix,linux}
This still places a burden on the caller. Portable code either needs to
hardcode supported functions by OS:
% And imagine all the error checking this code is not doing ...
proctitle(Name) ->
case os:type() of
{unix,linux} -> alcove:prctl(...);
{unix,BSD} where BSD = ... -> alcove:setproctitle(...)
end.
Or run through sequences of awkward try/catch statements:
proctitle(Name) ->
try alcove:setproctitle(...) of
ok -> ok
catch
error:undef ->
try alcove:prctl(...) of
{ok,_,_,_,_,_} -> ok
catch
error:undef ->
% Does Solaris even?
...
end
end.
Dealing with portability should be the job of a higher level library
built on top of alcove.
The interaction between poll(2) and setrlimit(RLIMIT_NOFILE) differs
between BSD and Linux/Solaris.
On BSD systems, if RLIMIT_NOFILE is set to 0, poll(2) will continue to
work with open file descriptors.
On Linux/Solaris, poll(2) requires RLIMIT_NOFILE to be at least the
highest open file descriptor + 1. If the nfd parameter to poll(2)
exceeds the current soft limit, poll(2) returns EINVAL, causing the
alcove process to exit.
Setting RLIMIT_NOFILE to 0 is useful for creating rlimit sandboxes. The
caller allocates whatever resources are required, drops privileges, then
sets limits on resources. For example, to prevent file creation, forking
of new processes and opening file descriptors, the following works on
BSD:
ok = alcove:setrlimit(Drv, [Child], rlimit_fsize,
#alcove_rlimit{cur = 0, max = 0}),
ok = alcove:setrlimit(Drv, [Child], rlimit_nproc,
#alcove_rlimit{cur = 0, max = 0}),
ok = alcove:setrlimit(Drv, [Child], rlimit_nofile,
#alcove_rlimit{cur = 0, max = 0}).
If an open file descriptor is closed, it cannot be re-opened:
1> alcove:open(P, [F], "/etc/passwd", [o_rdonly], 0).
{ok,6}
2> alcove:setrlimit(P, [F], rlimit_nofile, #alcove_rlimit{cur = 0, max = 0}).
ok
3> alcove:read(P, [F], 6, 10).
{ok,<<"# $FreeBSD">>}
4> alcove:close(P, [F], 6).
ok
5> alcove:open(P, [F], "/etc/passwd", [o_rdonly], 0).
{error,emfile}
On a Linux or Solaris system, RLIMIT_NOFILE would have to be set to 7.
The array holding the file descriptors will grow if RLIMIT_NOFILE is
increased but does not shrink (processes/file descriptors may exist and
would be leaked). Since BSD systems can poll any open file descriptor,
pass the whole array into poll(2). For Linux/Solaris, check the highest
currently opened fd is below RLIMIT_NOFILE and use the value of
RLIMIT_NOFILE as the number of file descriptors argument.
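The choice of the nfds argument can be sketched as a small helper. The
function name, the nfds_limited flag (set on Linux/Solaris, clear on
BSD) and the clamp to the array length are assumptions made for
illustration:

```c
#include <stddef.h>
#include <sys/resource.h>

/* Choose the number of file descriptors to pass to poll(2).
 *
 * fdset_len:    number of entries in the pollfd array
 * maxfd:        highest currently open fd
 * nofile:       current soft limit for RLIMIT_NOFILE
 * nfds_limited: non-zero if poll(2) rejects nfds > RLIMIT_NOFILE
 *
 * Returns 0 and sets *nfds on success, or -1 if an open fd is not
 * below the limit and so cannot be polled. */
int poll_nfds(size_t fdset_len, int maxfd, rlim_t nofile,
        int nfds_limited, size_t *nfds)
{
    if (nfds_limited == 0) {
        /* BSD: any open fd can be polled, pass the whole array. */
        *nfds = fdset_len;
        return 0;
    }

    if (maxfd >= 0 && (rlim_t)maxfd >= nofile)
        return -1;

    /* Never report more entries than the array holds. */
    *nfds = nofile < fdset_len ? (size_t)nofile : fdset_len;
    return 0;
}
```

With the highest open fd at 6, a limit of 7 succeeds and a limit of 6
fails, matching the /etc/passwd example above.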
Processes have 2 ways of notifying the parent of exit: termination
signal and exit status. Since termination signals are enabled by
default, enable exit status for consistency. Exit status can be disabled
by using:
alcove:setopt(Drv, ForkPath, exit_status, 0)
alcove is usually called from sudo which sets uid, euid and suid
appropriately. However, if run with the setuid bit, the behaviour might
be more mysterious as shown in "Setuid Demystified":
http://www.usenix.org/events/sec02/full_papers/chen/chen.pdf
setresuid(2) is a more predictable interface. But of course it is not
supported by Solaris.
Alcove will exit on fatal errors like resource allocation failures. Use
the value of errno as the exit value (which will be sent as a message if
the exit_status option is used).
The reason for the failure is sent to stderr using the err macros and so
will be in the format:
<<"alcove: ", Reason/binary>>
Mount filesystems using the Solaris-specific version of the mount
interface. Solaris adds options and options length parameters to mount.
The options parameter takes a NUL-terminated string of comma-separated
arguments.
On other Unixes the options are either included in the mount flags
(MS_NOEXEC, ...) or in the data argument (<<"size=128M">> for tmpfs).
The behaviour of mount(2) on Solaris is bizarre: the options argument is
input/output, with the mount options placed in the buffer on return.
If MS_OPTIONSTR is present in the mount flags and the options buffer
is too small, the mount call returns -1 and errno is set to EOVERFLOW
but the mount actually succeeds! A more robust interface might truncate
the options to the size of the buffer, possibly setting the options
length to the required length, and return 0.
Surprisingly, the options buffer can also be too large. This is so
weird, it must be a bug in alcove. If the buffer exceeds a certain size,
mount returns -1 with errno set to EINVAL. The mount fails in this case:
{error,einval} = alcove:mount(Drv, [Child], "swap", Dir, "tmpfs",
[ms_optionstr], <<>>, <<"size=16m", 0:(1024*8)>>).
Since Solaris has this extra argument, mount/6,7 has to be extended to
mount/7,8, forcing all platforms to pass in an options parameter. This
parameter is ignored on all platforms except Solaris. The result is that
the mount interface is not the Linux mount(2) interface, it is some weird
hybrid that is awkward to use on all platforms (the interface does not
map to the mount(2) man page on any platform).
The value of the option parameter is also not returned to the caller on
Solaris. Options to fix this include:
* breaking out Solaris mount(2) to a Solaris-specific call (mountext or
whatever)
* checking if opt is non-NULL on return and opt len > 0. If so, return:
{ok, binary()}
This extends the mount/7,8 type to:
ok | {ok,binary()} | {error,posix()}