Use the environment variable ALCOVE_TEST_STREAM_MAGIC_SLEEP to control
how long the shell process waits before exiting. The default is 0
(no wait).
The underlying issue is that a parent process is exiting before relaying
all the stdout from the child shell process. While this problem needs
to be fixed, it is interrupting the stress tests which are turning up
some new, interesting failures.
Linux has a resource limit for scheduling priority (RLIMIT_NICE). If the
value is 0, processes that have increased their nice value (i.e., set it
to a positive integer between 1 and 19) may not decrease it. If the value
is 20, the user may lower the nice value back to 0.
Uniformly test for this resource limit across all OSes (even though
only Linux seems to support it) and set it down to 0 for the test.
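A minimal C sketch of the "test for the limit, then set it to 0" idea, assuming the check is done at compile time with `#ifdef RLIMIT_NICE`. The function name `clamp_nice_limit` is hypothetical and is not taken from alcove.

```c
#include <sys/resource.h>

/*
 * Set the soft scheduling priority limit to 0 where the platform
 * defines RLIMIT_NICE.
 * Returns 0 on success, 1 if the limit is not available, -1 on error.
 */
int clamp_nice_limit(void) {
#ifdef RLIMIT_NICE
    struct rlimit rl;

    if (getrlimit(RLIMIT_NICE, &rl) < 0)
        return -1;

    /* Lowering the soft limit does not require privileges. */
    rl.rlim_cur = 0;

    return setrlimit(RLIMIT_NICE, &rl) < 0 ? -1 : 0;
#else
    return 1;
#endif
}
```

On systems without the limit the function reports 1, so a caller can treat "unsupported" and "set to 0" the same way.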
Use the same names for signal actions as the C headers. Since the
actions are atoms, the names are lower cased.
Invent the name 'sig_catch' to denote signals that are caught and passed
to the Erlang side as messages.
Add the 'sig_' prefix to all the actions to avoid quoting 'catch', since it
is a reserved keyword.
Forked/exec'ed executables will sometimes receive SIGPIPE, causing the
tests to fail. On OpenBSD, the "yes" command relies on SIGPIPE to exit,
so re-enable the signal for that test.
There is a race condition in the stream test. Occasionally, when the
shell process exits, a process higher up in the chain is killed before
it has completely relayed the data.
Work around the test failure for now by adding a delay.
Allow any Erlang process to send messages into the port, similar to the
way port drivers can respond directly to a process. The gen_server uses
the Unix PID fork path as the key to map the port response to the Erlang
PID.
The current implementation is an experiment only and has a race
condition. Process A calls into the gen_server and blocks. Process B calls
into the port for the same Unix fork path and blocks. The response for
both requests will be sent to process B and process A will block forever.
In other words, the last erlang process to make a request to the fork
path becomes the controlling process. If the erlang process dies, any
data generated by the Unix process is sent to the process that started
the gen_server.
What the behaviour should be needs to be defined. For example, using the
gen_tcp behaviour of controlling_process would be problematic: since the
response always goes to the controlling process, a call from another
erlang process would hang. After the Unix process has called exec(),
allowing multiple processes to send in data may make sense.
Making the controlling process the only process privileged to talk to
the fork path would have some weird side effects:
* data for the fork path would have to be serialized through the
  controlling process
* if a non-owning process sends a message to the fork path, either we
  have to extend the type spec for each call to include
  {error,not_owner} or we send a badsig exception and the client is
  forced to wrap each call in a try/catch
There should also be a concept of linking between unix and erlang
processes:
* processes are unlinked
  * erlang process dies: stdout/stderr from the unix process should
    be dropped
  * unix process dies: controlling process gets the normal exit
    messages (exit_status, termsig)
* processes are linked
  * erlang process dies: unix process gets a SIGKILL
  * unix process dies: erlang process gets an exit(kill)
* erlang process monitors unix process (?)
  * unix process dies: erlang process gets a 'DOWN' message
To test the concept works, modify the tcplxc example to talk directly to
the port from each erlang container process (the example needs much more
cleanup and should be converted to an OTP process).
Handle framing in erlang code rather than relying on the {packet,2}
option to open_port/2. This change simplifies the port protocol codec
and should eventually allow easily calling exec() in the port.
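The {packet,2} framing being replaced is a 2-byte big-endian length header followed by the payload. A small C sketch of that wire format, as the port side might encode and decode it; the function names are hypothetical and this is not alcove's actual codec.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Write a 2-byte big-endian length header followed by the payload.
 * Returns the total frame size, or 0 if the payload is too large for
 * the header or the output buffer.
 */
size_t frame_encode(uint8_t *out, size_t cap,
                    const uint8_t *payload, size_t len) {
    if (len > UINT16_MAX || cap < len + 2)
        return 0;

    out[0] = (uint8_t)(len >> 8);
    out[1] = (uint8_t)(len & 0xff);
    memcpy(out + 2, payload, len);

    return len + 2;
}

/*
 * Try to extract one frame from the first avail bytes of buf.
 * Returns the number of bytes consumed and sets *payload and *len,
 * or returns 0 if the frame is still incomplete.
 */
size_t frame_decode(const uint8_t *buf, size_t avail,
                    const uint8_t **payload, size_t *len) {
    size_t n;

    if (avail < 2)
        return 0;

    n = ((size_t)buf[0] << 8) | buf[1];
    if (avail < n + 2)
        return 0;

    *payload = buf + 2;
    *len = n;

    return n + 2;
}
```

Returning 0 for an incomplete frame lets the reader keep buffering until the whole message has arrived, which is the part that moves out of open_port/2 once framing is done by hand.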
As on Linux and Solaris, poll(2) on OpenBSD returns EINVAL if the nfds
argument exceeds the maximum number of file descriptors allowed by
setrlimit(2) (RLIMIT_NOFILE).
By inspection, NetBSD behaves like FreeBSD (nfds is allowed to exceed
the maximum fd).
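The difference can be probed with a short C function that lowers the soft RLIMIT_NOFILE, calls poll(2) with a larger nfds and reports the resulting errno. This is a sketch for experimentation only; the function name and the array size of 32 are arbitrary choices, not taken from alcove.

```c
#include <errno.h>
#include <poll.h>
#include <sys/resource.h>

#define PROBE_NFDS 32

/*
 * Call poll(2) with nfds above the soft RLIMIT_NOFILE.
 * Returns 0 if poll accepted the call, the errno value if poll failed
 * (EINVAL on systems that enforce the limit), or -1 if the limit could
 * not be changed.
 */
int poll_over_limit_errno(void) {
    struct rlimit saved, low;
    struct pollfd fds[PROBE_NFDS];
    int i, err;

    if (getrlimit(RLIMIT_NOFILE, &saved) < 0)
        return -1;

    low = saved;
    if (low.rlim_cur > PROBE_NFDS / 2)
        low.rlim_cur = PROBE_NFDS / 2;
    if (setrlimit(RLIMIT_NOFILE, &low) < 0)
        return -1;

    /* Negative descriptors are ignored by poll(2). */
    for (i = 0; i < PROBE_NFDS; i++) {
        fds[i].fd = -1;
        fds[i].events = POLLIN;
        fds[i].revents = 0;
    }

    err = poll(fds, PROBE_NFDS, 0) < 0 ? errno : 0;

    /* Restore the original soft limit. */
    (void)setrlimit(RLIMIT_NOFILE, &saved);

    return err;
}
```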
An alcove system process may exit unexpectedly during a call, for
example, if poll(2) exits due to an error. In this situation, the caller
will hang in receive.
setpriority(2) is useful for limiting the impact of processes running in
a sandbox. It is also possible to exec the sandbox process using nice:
alcove:execvp(Drv, Path, "/usr/bin/nice", ["/usr/bin/nice", "-n",
"19", "sandbox"]).
The advantages of supporting set/getpriority natively in alcove are:
* nice doesn't need to exist in the sandbox chroot
* better error values (nice will return errors as a binary string)
* priorities can apply to fork trees
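At the C level, the native calls look roughly like the sketch below. The wrapper name `set_nice` is hypothetical. Note that getpriority(2) can legitimately return -1, so errno has to be cleared before the call and checked afterwards.

```c
#include <errno.h>
#include <sys/resource.h>

/*
 * Set the nice value of the calling process and read it back.
 * Returns 0 and stores the resulting nice value in *result on success,
 * -1 with errno set on failure.
 */
int set_nice(int nice_value, int *result) {
    int prio;

    if (setpriority(PRIO_PROCESS, 0, nice_value) < 0)
        return -1;

    /* -1 is a valid priority: only errno distinguishes an error. */
    errno = 0;
    prio = getpriority(PRIO_PROCESS, 0);
    if (prio == -1 && errno != 0)
        return -1;

    *result = prio;
    return 0;
}
```

Failures come back as errno values such as EACCES or EPERM rather than as text printed by nice(1), which is the "better error values" point above.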
Returning a "plain" error tuple for unsupported calls breaks the type
spec for functions:
On BSD:
-spec setproctitle(alcove_drv:ref(),iodata()) -> 'ok'.
On Linux and Solaris:
-spec setproctitle(alcove_drv:ref(),iodata()) -> {'error','unsupported'}.
So the type spec would need to be amended to be the union of both return
values and portable code would have to test for both cases, with the
effect that any call in the future might return 'unsupported' if alcove
were ported to a new OS.
Attempting to make an unsupported call will now result in an "undefined
function" exception:
1> catch alcove:setproctitle(P, "foo").
{'EXIT',{undef,[{alcove,call,
[<0.46.0>,setproctitle,["foo"]],
[{file,"src/alcove.erl"},{line,426}]},
{erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,661}]},
{erl_eval,expr,5,[{file,"erl_eval.erl"},{line,434}]},
{shell,exprs,7,[{file,"shell.erl"},{line,684}]},
{shell,eval_exprs,7,[{file,"shell.erl"},{line,639}]},
{shell,eval_loop,3,[{file,"shell.erl"},{line,624}]}]}}
2> os:type().
{unix,linux}
This still places a burden on the caller. Portable code either needs to
hardcode supported functions by OS:
% And imagine all the error checking this code is not doing ...
proctitle(Name) ->
    case os:type() of
        {unix,linux} -> alcove:prctl(...);
        {unix,BSD} when BSD =:= ... -> alcove:setproctitle(...)
    end.
Or run through sequences of awkward try/catch statements:
proctitle(Name) ->
    try alcove:setproctitle(...) of
        ok -> ok
    catch
        error:undef ->
            try alcove:prctl(...) of
                {ok,_,_,_,_,_} -> ok
            catch
                error:undef ->
                    % Does Solaris even?
                    ...
            end
    end.
Dealing with portability should be the job of a higher level library
built on top of alcove.
The interaction between poll(2) and setrlimit(RLIMIT_NOFILE) differs
between BSD and Linux/Solaris.
On BSD systems, if RLIMIT_NOFILE is set to 0, poll(2) will continue to
work with open file descriptors.
On Linux/Solaris, poll(2) requires RLIMIT_NOFILE to be at least the
highest open file descriptor + 1. If the nfds parameter to poll(2)
exceeds the current soft limit, poll(2) returns EINVAL, causing the
alcove process to exit.
Setting RLIMIT_NOFILE to 0 is useful for creating rlimit sandboxes. The
caller allocates whatever resources are required, drops privileges, then
sets limits on resources. For example, to prevent file creation, forking
of new processes and opening file descriptors, the following works on
BSD:
ok = alcove:setrlimit(Drv, [Child], rlimit_fsize,
#alcove_rlimit{cur = 0, max = 0}),
ok = alcove:setrlimit(Drv, [Child], rlimit_nproc,
#alcove_rlimit{cur = 0, max = 0}),
ok = alcove:setrlimit(Drv, [Child], rlimit_nofile,
#alcove_rlimit{cur = 0, max = 0}).
If an open file descriptor is closed, it cannot be re-opened:
1> alcove:open(P, [F], "/etc/passwd", [o_rdonly], 0).
{ok,6}
2> alcove:setrlimit(P, [F], rlimit_nofile, #alcove_rlimit{cur = 0, max = 0}).
ok
3> alcove:read(P, [F], 6, 10).
{ok,<<"# $FreeBSD">>}
4> alcove:close(P, [F], 6).
ok
5> alcove:open(P, [F], "/etc/passwd", [o_rdonly], 0).
{error,emfile}
On a Linux or Solaris system, RLIMIT_NOFILE would have to be set to 7.
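The same effect can be seen directly in C. The sketch below applies the limits in a forked child so the calling process is not affected, then reports the errno from a subsequent open(2). The function name is hypothetical, and /dev/null is just a convenient file that exists.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * In a child process, set the file size, process count and open file
 * limits to 0, then attempt to open a file.
 * Returns the child's errno from open(2) (0 if the open succeeded),
 * 255 if a limit could not be set, or -1 on fork/wait failure.
 */
int open_errno_in_rlimit_sandbox(void) {
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        struct rlimit zero;
        int fd;

        zero.rlim_cur = 0;
        zero.rlim_max = 0;

        if (setrlimit(RLIMIT_FSIZE, &zero) < 0)
            _exit(255);
#ifdef RLIMIT_NPROC
        if (setrlimit(RLIMIT_NPROC, &zero) < 0)
            _exit(255);
#endif
        if (setrlimit(RLIMIT_NOFILE, &zero) < 0)
            _exit(255);

        /* No new descriptor can be allocated once the limit is 0. */
        fd = open("/dev/null", O_RDONLY);
        _exit(fd < 0 ? errno : 0);
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```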
The array holding the file descriptors will grow if RLIMIT_NOFILE is
increased but does not shrink (processes/file descriptors may exist and
would be leaked). Since BSD systems can poll any open file descriptor,
pass the whole array into poll(2). For Linux/Solaris, check that the
highest currently open fd is below RLIMIT_NOFILE and use the value of
RLIMIT_NOFILE as the number of file descriptors argument.
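The choice of the nfds argument described above can be written as a small pure function. This is a sketch, not alcove's code: the name `poll_nfds`, the `bsd_semantics` flag and the clamp to the array length are assumptions made for the example.

```c
#include <stddef.h>

/*
 * Choose the nfds argument for poll(2).
 *
 * array_len:     number of entries in the pollfd array
 * highest_fd:    highest currently open descriptor, or -1 if none
 * nofile_cur:    current soft RLIMIT_NOFILE
 * bsd_semantics: non-zero if poll(2) accepts nfds above the limit
 *
 * Returns the nfds value, or -1 if an open descriptor is at or above
 * the limit on a system that enforces it.
 */
long poll_nfds(size_t array_len, int highest_fd,
               unsigned long nofile_cur, int bsd_semantics) {
    if (bsd_semantics)
        return (long)array_len;

    if (highest_fd >= 0 && (unsigned long)highest_fd >= nofile_cur)
        return -1;

    /* Assumption: never pass more entries than the array holds. */
    return (long)(nofile_cur < array_len ? nofile_cur : array_len);
}
```

With descriptor 6 open, a limit of 7 is the smallest value that passes the check, matching the Linux/Solaris case above.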
On ARM, dialyzer crashes when compiling native code. Add support for
dialyzer flags via an environment variable to work around this:
DIALYZER_FLAGS=--no_native make dialyzer
Since exit_status is now enabled by default, a test would occasionally
fail depending on how quickly the message was received from the port.
Fix by waiting for the exit_status event to be received.
Processes have two ways of notifying the parent of exit: termination
signal and exit status. Since termination signals are enabled by
default, enable exit status for consistency. Exit status can be disabled
by using:
alcove:setopt(Drv, ForkPath, exit_status, 0)
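The two notifications correspond to the two cases a parent sees from waitpid(2). A C sketch, with a hypothetical `run_child` helper and result struct, showing how a parent tells them apart:

```c
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

struct child_result {
    int exited;      /* 1 if the child called exit */
    int exit_status; /* valid when exited is 1, otherwise -1 */
    int signaled;    /* 1 if the child was killed by a signal */
    int termsig;     /* valid when signaled is 1, otherwise -1 */
};

/*
 * Fork a child that either kills itself with sig (if sig > 0) or
 * exits with code, then classify how it terminated.
 * Returns 0 on success, -1 on fork/wait failure.
 */
int run_child(int sig, int code, struct child_result *res) {
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        if (sig > 0)
            kill(getpid(), sig);
        _exit(code);
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    res->exited = WIFEXITED(status) ? 1 : 0;
    res->exit_status = res->exited ? WEXITSTATUS(status) : -1;
    res->signaled = WIFSIGNALED(status) ? 1 : 0;
    res->termsig = res->signaled ? WTERMSIG(status) : -1;

    return 0;
}
```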
alcove is usually called from sudo which sets uid, euid and suid
appropriately. However, if run with the setuid bit, the behaviour might
be more mysterious as shown in "Setuid Demystified":
http://www.usenix.org/events/sec02/full_papers/chen/chen.pdf
setresuid(2) is a more predictable interface. But of course it is not
supported by Solaris.
Alcove will exit on fatal errors like resource allocation failures. Use
the value of errno as the exit value (which will be sent as a message if
the exit_status option is used).
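A C sketch of the pattern, using err(3) in a forked child so that the caller can observe the exit status. The helper name `fatal_exit_status` and the message text are hypothetical. err(3) also writes a message prefixed with the program name to stderr.

```c
#include <err.h>
#include <errno.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Simulate a fatal error in a child process: the child exits through
 * err(3) using the errno value as its exit status.
 * Returns the child's exit status, or -1 on fork/wait failure.
 */
int fatal_exit_status(int errnum) {
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        errno = errnum;
        /* Prints "<progname>: fatal: <strerror(errno)>" and exits. */
        err(errnum, "fatal");
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Only the low 8 bits of the exit value reach the parent, so this works for errno values below 256.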
The reason for the failure is sent to stderr using the err macros and so
will be in the format:
<<"alcove: ", Reason/binary>>
On a single CPU Linux test system, the last process in the fork chain may
not have exited when a process earlier in the chain has returned
{error,esrch} from a kill signal. Test that the entire fork chain has
exited by waiting for the last element in the chain.
Instead of sleeping with a magic constant, poll the child of the child
of the process that called exec. Since the process midway in the chain
called exec, we won't be sent termination messages from the child procs.
Interestingly, on OpenBSD, the child of the process calling exec is
reliably terminated. On Linux, FreeBSD, NetBSD and Solaris, the child
process becomes a zombie and, consequently, sending signal 0 returns
ok. Presumably this occurs because the parent did not call waitpid, but
it could point to another bug somewhere.