Enforce the use of the fork path. Having an optional fork path was nice
when working in the shell:

    {ok, Child} = alcove:fork(Drv).

Instead of:

    {ok, Child} = alcove:fork(Drv, []). % port process forks

However it introduced a few problems:

* made the interface inconsistent and ambiguous

    alcove:kill(Drv, Pid, 9)
    % vs
    alcove:kill(Drv, [], Pid, 9) % the port process is sending the signal

* calls could not have optional arguments
Whether or not calls should have optional arguments is an open question,
but with an optional fork path the arities would conflict.
For example, the last argument to mount/8 is used only by Solaris:
% arity 6
-spec mount(alcove_drv:ref(),iodata(),iodata(),iodata(),uint64_t() | [constant()],iodata()) -> 'ok' | {'error', file:posix() | 'unsupported'}.
% arity 7
-spec mount(alcove_drv:ref(),iodata(),iodata(),iodata(),uint64_t() | [constant()],iodata(),iodata()) -> 'ok' | {'error', file:posix() | 'unsupported'}.
% arity 7
-spec mount(alcove_drv:ref(),fork_path(),iodata(),iodata(),iodata(),uint64_t() | [constant()],iodata()) -> 'ok' | {'error', file:posix() | 'unsupported'}.
% arity 8
-spec mount(alcove_drv:ref(),fork_path(),iodata(),iodata(),iodata(),uint64_t() | [constant()],iodata(),iodata()) -> 'ok' | {'error', file:posix() | 'unsupported'}.
* because of the ambiguity in arity, calls can't have an optional
timeout
call/5 sets a timeout of 'infinity'. The timeout isn't accessible from
the "named" functions (e.g., fork, chdir, ...), so users who need a
timeout have to resort to the call/5 interface. An unfortunate side
effect of using call/5 is that dialyzer won't be able to type check the
arguments.
The result of removing the functions without the fork path is that the
code ends up being simpler and more consistent.
Move out the clone constants to a header since fork() does not need
them. Include sched.h with the constants to ensure the CLONE_ constants
are defined.
A message sent to a non-existent Unix PID would be discarded, causing
the calling erlang process to block forever in receive.
This change modifies the behaviour of the port to return "badpid" if an
element in the fork path is not found. Since any element of the path may
be invalid, an event (as opposed to a call reply) is generated.
Currently any "badpid" event will cause the erlang process to exit.
Since only one call can be active at a time, this shouldn't interfere
with other operations, but the receive should pattern match on the
fork path prefix (i.e., a call to [1,2,3,-1,4] would generate badpid
for [1,2,3,-1]).
Also define the behaviour when writing to invalid PIDs (<= 0). Since 0
is used to initialize the list of child PIDs in the port, passing in a
PID of 0 would match the first available slot in the PID table, causing
the port to write to fd 0.
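The PID 0 problem described above can be sketched in C as follows. This is a minimal illustration, not the port's actual code: the `child_t` layout, `MAXCHILD` and the function names are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical child table: a pid of 0 marks a free slot. */
#define MAXCHILD 4

typedef struct {
    pid_t pid;
    int fdin; /* parent's write end of the child's stdin */
} child_t;

/* Return the slot holding pid, or NULL if pid is not a known child.
 * PIDs <= 0 are rejected up front: without this check a lookup for 0
 * would match the first free slot, whose fdin is also 0. */
static const child_t *child_lookup(const child_t *tab, size_t n, pid_t pid)
{
    size_t i;

    assert(tab != NULL);

    if (pid <= 0)
        return NULL;

    for (i = 0; i < n; i++) {
        if (tab[i].pid == pid)
            return &tab[i];
    }

    return NULL;
}

/* Look up pid in a table where only slot 1 is in use (PID 1234).
 * Returns the fd to write to, or -1 if the PID is not a child. */
int child_lookup_demo(pid_t pid)
{
    child_t tab[MAXCHILD] = {{0, 0}, {1234, 7}, {0, 0}, {0, 0}};
    const child_t *c = child_lookup(tab, MAXCHILD, pid);

    return c == NULL ? -1 : c->fdin;
}
```

With the guard in place a lookup for 0 or a negative PID fails like any other unknown PID, which is the point where a "badpid" event could be generated.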
Use the environment ALCOVE_TEST_STREAM_MAGIC_SLEEP to control the length
of time the shell process will wait before exiting. The default is 0
(no wait).
The underlying issue is that a parent process is exiting before relaying
all the stdout from the child shell process. While this problem needs
to be fixed, it is interrupting the stress tests which are turning up
some new, interesting failures.
Linux has a resource limit for scheduling priority (RLIMIT_NICE). If the
value is 0, processes that have increased their nice value (i.e., set it
to a positive integer between 1 and 19) may not decrease it. If the
value is 20, the user may lower the nice value back to 0.
Uniformly test for this resource limit across all OSes (even though
only linux seems to support it) and set it down to 0 for the test.
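A rough C sketch of "test for the limit and set it down to 0" is shown here. The function names are made up for the example; the only platform facts relied on are getrlimit(2)/setrlimit(2) and the Linux RLIMIT_NICE constant, which is guarded with #ifdef so the code still compiles elsewhere.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/resource.h>

/* Lower the soft RLIMIT_NICE limit to 0.
 * Returns 0 on success, -1 if getrlimit/setrlimit fails and 1 if the
 * platform does not define RLIMIT_NICE. */
int nice_limit_to_zero(void)
{
#ifdef RLIMIT_NICE
    struct rlimit rl;

    if (getrlimit(RLIMIT_NICE, &rl) < 0)
        return -1;

    rl.rlim_cur = 0; /* lowering the soft limit needs no privileges */

    return setrlimit(RLIMIT_NICE, &rl) < 0 ? -1 : 0;
#else
    return 1;
#endif
}

/* Read back the soft limit; -1 on error or if unsupported. */
long nice_limit_soft(void)
{
#ifdef RLIMIT_NICE
    struct rlimit rl;

    if (getrlimit(RLIMIT_NICE, &rl) < 0)
        return -1;

    return (long)rl.rlim_cur;
#else
    return -1;
#endif
}
```

On Linux the effective ceiling for the nice value is 20 minus the soft limit, so a soft limit of 0 means a process that has raised its nice value cannot lower it again.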
Use the same names for signal actions as the C headers. Since the
actions are atoms, the names are lower cased.
Invent the name 'sig_catch' to denote signals that are caught and passed
to the erlang side as messages.
Add the 'sig_' prefix to all the actions to avoid quoting 'catch', since it
is a reserved keyword.
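On the C side, the three action names could map to handlers roughly like this. It is a sketch under assumptions: `forward_signal` stands in for whatever code relays a caught signal to the erlang side, and the lookup function is hypothetical.

```c
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical handler standing in for the code that forwards a
 * caught signal to the erlang side. */
static volatile sig_atomic_t last_signal = 0;

static void forward_signal(int signum)
{
    last_signal = signum;
}

typedef void (*sig_handler)(int);

/* Map an action name to a handler. Returns 0 on success and -1 for an
 * unknown name. */
int action_from_name(const char *name, sig_handler *handler)
{
    assert(name != NULL && handler != NULL);

    if (strcmp(name, "sig_dfl") == 0)
        *handler = SIG_DFL;
    else if (strcmp(name, "sig_ign") == 0)
        *handler = SIG_IGN;
    else if (strcmp(name, "sig_catch") == 0)
        *handler = forward_signal;
    else
        return -1;

    return 0;
}

/* 1 if name maps to the expected handler, 0 otherwise. */
int action_is(const char *name, sig_handler expected)
{
    sig_handler h;

    return action_from_name(name, &h) == 0 && h == expected;
}
```

Only 'sig_catch' has no counterpart in the C headers, which is why the name had to be invented.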
Forked/exec'ed executables will sometimes get a SIGPIPE, causing the
tests to fail. On OpenBSD, the "yes" command relies on SIGPIPE to
exit, so re-enable SIGPIPE for that test.
There is a race condition in the stream test. Occasionally, when the
shell process exits, a process higher up in the chain is killed before
it has completely relayed the data.
Work around the test failure for now by adding a delay.
Handle framing in erlang code rather than relying on the {packet,2}
option to open_port/2. This change simplifies the port protocol codec
and should eventually allow easily calling exec() in the port.
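For reference, the wire format that {packet,2} provides is a 2-byte big-endian length followed by the payload. The sketch below shows that framing in C. The function names are illustrative and this is not the port's codec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDRLEN 2 /* same 2-byte big-endian length header as {packet,2} */

/* Write header + payload into out. Returns the frame size, or 0 if the
 * payload exceeds 65535 bytes or out is too small. */
size_t frame_encode(uint8_t *out, size_t outlen, const uint8_t *payload,
        size_t len)
{
    if (len > UINT16_MAX || outlen < HDRLEN + len)
        return 0;

    out[0] = (uint8_t)(len >> 8);
    out[1] = (uint8_t)(len & 0xff);
    memcpy(out + HDRLEN, payload, len);

    return HDRLEN + len;
}

/* Try to extract one frame from the start of buf. Returns the number
 * of bytes consumed, or 0 if buf does not yet hold a complete frame. */
size_t frame_decode(const uint8_t *buf, size_t buflen,
        const uint8_t **payload, size_t *len)
{
    size_t n;

    if (buflen < HDRLEN)
        return 0;

    n = ((size_t)buf[0] << 8) | buf[1];

    if (buflen < HDRLEN + n)
        return 0;

    *payload = buf + HDRLEN;
    *len = n;

    return HDRLEN + n;
}

/* Round trip "abc" through encode and decode; returns 1 on success. */
int frame_roundtrip_demo(void)
{
    uint8_t buf[16];
    const uint8_t *p = NULL;
    size_t plen = 0;
    size_t n = frame_encode(buf, sizeof(buf), (const uint8_t *)"abc", 3);

    /* A truncated buffer must not yield a frame. */
    if (n != 5 || frame_decode(buf, n - 1, &p, &plen) != 0)
        return 0;

    return frame_decode(buf, n, &p, &plen) == 5 && plen == 3
        && memcmp(p, "abc", 3) == 0;
}
```

A decoder of this shape has to cope with partial reads, which is the work {packet,2} used to do on the erlang side.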
Since exit_status is now enabled by default, a test would occasionally
fail depending on how quickly the message was received from the port.
Fix by waiting for the exit_status event to be received.
On a single CPU Linux test system, the last process in the fork chain may
not have exited when a process earlier in the chain has returned
{error,esrch} from a kill signal. Test that the entire fork chain has
exited by waiting for the last element in the chain.
Instead of sleeping with a magic constant, poll the child of the child
of the process that called exec. Since the process midway in the chain
called exec, we won't be sent termination messages from the child procs.
Interestingly, on OpenBSD, the child of the process calling exec is
reliably terminated. On Linux, FreeBSD, NetBSD and Solaris, the child
process becomes a zombie and consequently, sending a signal of 0 returns
ok. Presumably this occurs because the parent did not call waitpid, but
it could point to another bug somewhere.
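The zombie explanation can be checked in isolation with a small C program. This sketch uses only fork, waitid, waitpid and kill. The function name is made up, and it says nothing about why OpenBSD behaves differently in the exec case.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately, then probe it with signal 0
 * before and after reaping it.
 *
 * Returns 1 if the unreaped (zombie) child still answers signal 0 and
 * the probe fails with ESRCH after waitpid, 0 if not, -1 on error. */
int zombie_probe_demo(void)
{
    siginfo_t info;
    int before, after, after_errno;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0)
        _exit(0);

    /* Wait until the child has exited, but leave it unreaped. */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) < 0)
        return -1;

    before = kill(pid, 0);

    /* Reap the zombie. */
    if (waitpid(pid, NULL, 0) != pid)
        return -1;

    after = kill(pid, 0);
    after_errno = errno;

    return before == 0 && after == -1 && after_errno == ESRCH;
}
```

So a successful signal 0 only shows that the PID is still allocated, not that the process is running. A poll loop based on it ends once the parent has reaped the child.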
Mount filesystems using the Solaris-specific version of the mount
interface. Solaris adds an options and options length parameter to
mount which takes a NULL terminated string of comma separated arguments.
On other Unix'es the options are either included in the mount flags
(MS_NOEXEC, ...) or in the data argument (<<"size=128M">> for tmpfs).
The behaviour of mount(2) on Solaris is bizarre: the options argument is
input/output, with the mount options placed in the buffer on return.
If MS_OPTIONSTR is present in the mount flags and the options buffer
is too small, the mount call returns -1 and errno is set to EOVERFLOW
but the mount actually succeeds! A more robust interface might truncate
the options to the size of the buffer, possibly setting the options
length to the required length, and return 0.
Surprisingly, the options buffer can also be too large. This is so
weird, it must be a bug in alcove. If the buffer exceeds a certain size,
mount returns -1 with errno set to EINVAL. The mount fails in this case:
{error,einval} = alcove:mount(Drv, [Child], "swap", Dir, "tmpfs",
[ms_optionstr], <<>>, <<"size=16m", 0:(1024*8)>>).
Since Solaris has this extra argument, mount/6,7 has to be extended to
mount/7,8, forcing all platforms to pass in an options parameter. This
parameter is ignored on all platforms except Solaris. The result is that
the mount interface is not the Linux mount(2) interface, it is some weird
hybrid that is awkward to use on all platforms (the interface does not
map to the mount(2) man page on any platform).
The value of the option parameter is also not returned to the caller on
Solaris. Options to fix this include:
* breaking out Solaris mount(2) to a Solaris specific call (mountext or
whatever)
* checking if opt is non-NULL on return and opt len > 0. If so, return:
{ok, binary()}
This extends the mount/7,8 type to:
ok | {ok,binary()} | {error,posix()}
Compile and run on Solaris (tested on SmartOS). The basic privsep
sandbox works. Support for creating Solaris containers is planned.
TODO:
* support options in mount. Flags like MS_NOEXEC are passed as strings.
* clean up compiler warnings
* add chroot test: requires a statically linked binary
* makefile: the "which" command does not return an error on Solaris
Set the parent's side of the child's fds (stdin, stdout, stderr, fdctl)
to close on exec. This prevents fd's from leaking when the process calls
exec.
Without setting the fd's to close on exec, the child processes spawned
by the exec'ed process would not be terminated until the process exited:
port -> alcove1 -> alcove2 -> alcove3 -> alcove4
exec(alcove2, "/bin/cat")
port -> alcove1 -> /bin/cat -> alcove3 -> alcove4
With this change, the child process exits but becomes a zombie, since
the parent will not have called waitpid to reap the process:
port -> alcove1 -> /bin/cat -> alcove3 (defunct)
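Marking the parent's side of a descriptor close-on-exec comes down to one fcntl(2) call. A minimal sketch follows. The helper names are assumptions, and the pipe in the demo merely stands in for one of the child's stdio pipes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Mark fd close-on-exec, preserving other descriptor flags.
 * Returns 0 on success, -1 on error (errno set by fcntl). */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD, 0);

    if (flags < 0)
        return -1;

    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}

/* 1 if fd is close-on-exec, 0 if not, -1 on error. */
int is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD, 0);

    if (flags < 0)
        return -1;

    return (flags & FD_CLOEXEC) != 0;
}

/* Create a pipe, mark only the parent's end and report the flag on
 * both ends as (parent_end << 1) | child_end; -1 on error. */
int cloexec_demo(void)
{
    int fd[2];
    int r;

    if (pipe(fd) < 0)
        return -1;

    /* fd[1] plays the parent's side of the child's stdin; fd[0] would
     * be dup'ed onto the child's fd 0 and has to survive exec. */
    if (set_cloexec(fd[1]) < 0)
        r = -1;
    else
        r = (is_cloexec(fd[1]) << 1) | is_cloexec(fd[0]);

    (void)close(fd[0]);
    (void)close(fd[1]);

    return r;
}
```

Once the exec'ed program no longer holds the parent's ends of the pipes, its former children see their descriptors close and can exit.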
Remove the check for signals from the code dealing with the control fd.
This ensures a successful call to execvp/3,4 and execve/4,5 that
immediately dies with a signal will return ok, then generate a termsig
event.
It's possible that a non-returning call may be killed by a signal. If
that occurs and the child process exits, there is no way to distinguish
a successful call from a terminated one. In this case, ok will be
returned to the caller, followed by a termsig event.
Signal handling has some problems:
* If a child process receives a signal and exits, the parent detects
the child's control fd has been closed and sends a spurious event
({alcove_event, [pid()], ok}) to the erlang side.
* When the erlang side makes a call, it sends a message into the port
then blocks in receive.
If the child process doing the operation gets a signal, the situation
becomes murky. The parent may:
* notice the control fd closing and return 'ok'
For example, if the process doing a getcwd/2 received a signal:
{ok, Cwd} = alcove:getcwd(Drv, [Pid]) % badmatch: ok
Even worse if the expected return value is 'ok'.
* send no response back to the erlang side. The erlang process will
block forever.
Make the signal handling more deterministic by:
* checking if the child has exited due to receiving a signal when the
control fd is closed
* enabling termsig notification by default
* killing the erlang process if a signal is received during a call
(similar to calls with a timeout set)
Killing the erlang process is still not correct. Since the erlang
process is linked to the gen_server, the port will be killed.
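Checking whether a child exited because of a signal is done with the status returned by waitpid(2). A sketch of that check, with hypothetical function names:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap pid and classify how it ended.
 * Returns the signal number if the child was killed by a signal, 0 if
 * it exited normally, -1 on error. */
int child_termsig(pid_t pid)
{
    int status = 0;

    if (waitpid(pid, &status, 0) != pid)
        return -1;

    if (WIFSIGNALED(status))
        return WTERMSIG(status);

    return WIFEXITED(status) ? 0 : -1;
}

/* Fork a child that either kills itself with signum or, if signum is
 * 0, exits normally; return what the parent observes. */
int termsig_demo(int signum)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        if (signum != 0) {
            (void)signal(signum, SIG_DFL);
            (void)raise(signum);
        }
        _exit(0);
    }

    return child_termsig(pid);
}
```

A non-zero result is the value that would be reported in a termsig event. A result of 0 corresponds to the plain "control fd closed" case.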
Tests on OpenBSD sometimes fail because the random PID generation
collides with the test PID. Since a pid_t is an int32, choose some
pseudo-random large number as the test PID.
It's possible that the test can still fail since there is a race condition
between testing the PID and using it.
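The "test the PID" step amounts to sending signal 0 and checking for ESRCH. A minimal C sketch (the function name is an assumption):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* 1 if no process with this PID exists right now, 0 if one exists
 * (including the EPERM case: the process exists but belongs to
 * someone else), -1 for an invalid argument. */
int pid_is_unused(pid_t pid)
{
    if (pid <= 0)
        return -1; /* 0 and negative values address process groups */

    if (kill(pid, 0) == 0)
        return 0;

    return errno == ESRCH;
}
```

The answer is only valid at the instant of the call: another process may be assigned the PID immediately afterwards, which is the race mentioned above.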
Tests on OpenBSD sometimes fail because the random PID generation
collides with the test PID. Since a pid_t is an int32, choose some large
number as the test PID.
It's possible to get an unused PID, either by running through the PIDs
sequentially or generating a random number:
get_unused_pid(Drv) ->
PID = crypto:rand_uniform(16#afffffff, 16#ffffffff),
case alcove:kill(
Pass in an integer or a list of integers/atoms for the mount flags
parameter.
The mount flags argument is an unsigned integer and the decode function
returns an int. While this is probably ok, there are a number of
typecasts in the lookup code that need to be cleaned up.
Accept an integer or a list of atoms/integers for the flags argument. If
a list is passed, the values are OR'ed together. The values for atoms are
looked up from the constants defined in the system header files.
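The lookup-and-OR step might look like the following in C. This is a sketch assuming Linux (the MS_* constants come from <sys/mount.h>). The table holds only four entries and all function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mount.h> /* Linux: MS_RDONLY, MS_NOSUID, MS_NODEV, MS_NOEXEC */

struct flag_name {
    const char *name;
    unsigned long value;
};

/* A few entries only; the names are lower-cased header constants. */
static const struct flag_name mount_flags[] = {
    {"ms_rdonly", MS_RDONLY},
    {"ms_nosuid", MS_NOSUID},
    {"ms_nodev", MS_NODEV},
    {"ms_noexec", MS_NOEXEC},
};

/* Look up one name. Returns 0 and stores the value, or -1 if unknown. */
static int mount_flag_lookup(const char *name, unsigned long *value)
{
    size_t i;

    for (i = 0; i < sizeof(mount_flags) / sizeof(mount_flags[0]); i++) {
        if (strcmp(mount_flags[i].name, name) == 0) {
            *value = mount_flags[i].value;
            return 0;
        }
    }

    return -1;
}

/* OR together the values of n names. Returns 0 on success, -1 if any
 * name is unknown (flags is left untouched in that case). */
int mount_flags_or(const char *const *names, size_t n, unsigned long *flags)
{
    unsigned long acc = 0;
    size_t i;

    assert(flags != NULL);

    for (i = 0; i < n; i++) {
        unsigned long v;

        if (mount_flag_lookup(names[i], &v) < 0)
            return -1;

        acc |= v;
    }

    *flags = acc;
    return 0;
}

/* Usage: OR'ed flags for two names, or -1 if either is unknown. */
long mount_flags_demo(const char *a, const char *b)
{
    const char *const names[] = {a, b};
    unsigned long flags = 0;

    if (mount_flags_or(names, 2, &flags) < 0)
        return -1;

    return (long)flags;
}
```

Keeping the accumulator unsigned throughout avoids the int/unsigned casts mentioned earlier.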
Accept and return signals as names. For example, if a signal is trapped,
the process will receive:
{signal,'SIGCHLD'}
Versus:
{signal,17} % on linux
Benefits:
* similar to the way errno is returned, e.g., {errno,enametoolong}
* better portability: integer values differ by platform
* library user does not need to look up the signal value before calling
sigaction/3,4 or pattern matching the process mailbox
Problems:
* sigaction/3,4 will throw badarg if the name is unknown
Probably it should return an error ({error,einval}).
* not all functions dealing with signals have been changed to use atoms,
e.g., kill/3,4
* all functions with mapping of constants -> integers should take
constants:
clone_define
mount_define
file_define
prctl_define
rlimit_define
syscall_define
If this is done, alcove:define/2,3 and the *_define/*_constant functions
should be removed.
* should the constants be upper or lower case atoms?
atoms are typically lowercased, e.g., {error,ebadf} not
{error,'EBADF'}
For example:
PR_SET_PDEATHSIG -> pr_set_pdeathsig
* if the interface accepts constants, should support for integers be
removed?
1. No way to pass in a "raw" signum if the signal does not have a name
2. Signals without a name will be sent to the process mailbox as:
{signal,unknown}
If the caller has trapped 58 and 59, there won't be a way to
distinguish these signals.
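The name/number mapping and the "unknown" problem from the last point can be illustrated with a small table in C. The table is deliberately incomplete and the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

struct sig_name {
    const char *name;
    int signum;
};

/* A handful of POSIX signals; a full table would be generated from
 * the system headers. */
static const struct sig_name sig_names[] = {
    {"SIGHUP", SIGHUP},   {"SIGINT", SIGINT},   {"SIGKILL", SIGKILL},
    {"SIGPIPE", SIGPIPE}, {"SIGTERM", SIGTERM}, {"SIGCHLD", SIGCHLD},
};

#define SIG_NAMES_LEN (sizeof(sig_names) / sizeof(sig_names[0]))

/* Name -> number; -1 if the name is not in the table. */
int signal_from_name(const char *name)
{
    size_t i;

    assert(name != NULL);

    for (i = 0; i < SIG_NAMES_LEN; i++) {
        if (strcmp(sig_names[i].name, name) == 0)
            return sig_names[i].signum;
    }

    return -1;
}

/* Number -> name; every signal missing from the table collapses to
 * the same "unknown" string. */
const char *signal_to_name(int signum)
{
    size_t i;

    for (i = 0; i < SIG_NAMES_LEN; i++) {
        if (sig_names[i].signum == signum)
            return sig_names[i].name;
    }

    return "unknown";
}
```

Signals 58 and 59 both come back as "unknown" here, so a caller trapping both cannot tell them apart. That is the argument for keeping integer support alongside the names.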
Use sigaction/3,4 for setting the SIGCHLD handler. The default action is
to ignore SIGCHLD, i.e., not notify the caller of SIGCHLD events.
Notification can be enabled by setting the handler to 'trap':
SIGCHLD = alcove:define(Drv, 'SIGCHLD'),
alcove:sigaction(Drv, SIGCHLD, trap).
On start up, beam ignores SIGPIPE. Ignored signals are not reset in the
child when exec is called. See sigaction(2):
A child created via fork(2) inherits a copy of its parent's signal
dispositions. During an execve(2), the dispositions of handled signals
are reset to the default; the dispositions of ignored signals are left
unchanged.
This could be considered to be a bug in beam. Usually daemon processes
will ignore SIGPIPE anyway.
Processes that launch a shell should not ignore SIGPIPE, since filter
programs may ignore write failures.
Setting SIGPIPE to the default handler seems to be the most predictable
behaviour for alcove. The downside is that it forces the caller to
manually ignore SIGPIPE on startup:
SIGPIPE = alcove:define(Drv, 'SIGPIPE'),
alcove:sigaction(Drv, SIGPIPE, ign).
The alternative is to ignore SIGPIPE and reset SIGPIPE to the defaults
after calling fork() or before calling exec but this would interfere
with any custom signal handlers the user has set up.
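Resetting SIGPIPE to the default disposition in the port would be a single sigaction(2) call. A sketch of setting and querying the disposition (helper names are assumptions, not alcove functions):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Install handler (SIG_DFL or SIG_IGN) for SIGPIPE.
 * Returns 0 on success, -1 on error. */
int sigpipe_set(void (*handler)(int))
{
    struct sigaction act;

    memset(&act, 0, sizeof(act));
    act.sa_handler = handler;
    (void)sigemptyset(&act.sa_mask);

    return sigaction(SIGPIPE, &act, NULL);
}

/* 1 if SIGPIPE is currently ignored, 0 if not, -1 on error. */
int sigpipe_is_ignored(void)
{
    struct sigaction act;

    if (sigaction(SIGPIPE, NULL, &act) < 0)
        return -1;

    return act.sa_handler == SIG_IGN;
}
```

Because an ignored disposition survives both fork and exec, querying it at port startup shows what was inherited from beam.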
The OpenBSD version of the yes utility doesn't check for write errors:
    if (argc > 1)
        for (;;)
            puts(argv[1]);
    else
        for (;;)
            puts("y");
When the stream test completes, yes is still running in the background.
On OpenBSD, use the jot utility instead.
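For contrast, a writer that does check for errors stops as soon as the reader goes away, even with SIGPIPE ignored. The sketch below is illustrative only: it is not the source of yes or jot, and the function names are made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Write line to fd up to max times, stopping at the first failed
 * write. Returns 0 if every write succeeded, otherwise the errno of
 * the failed write. Short writes are not retried in this sketch. */
int write_lines(int fd, const char *line, unsigned long max)
{
    size_t len = strlen(line);
    unsigned long i;

    for (i = 0; i < max; i++) {
        if (write(fd, line, len) < 0)
            return errno;
    }

    return 0;
}

/* Write to a pipe whose read end is closed, with SIGPIPE ignored so
 * the failure surfaces as EPIPE rather than a fatal signal.
 * Returns the result of write_lines, or -1 on setup failure. */
int closed_pipe_demo(void)
{
    int fd[2];
    int err;
    void (*old)(int);

    if (pipe(fd) < 0)
        return -1;

    old = signal(SIGPIPE, SIG_IGN);
    if (old == SIG_ERR) {
        (void)close(fd[0]);
        (void)close(fd[1]);
        return -1;
    }

    (void)close(fd[0]);

    err = write_lines(fd[1], "y\n", 10);

    (void)signal(SIGPIPE, old);
    (void)close(fd[1]);

    return err;
}
```

A loop that ignores the return value, like the puts() loops above, keeps running after the first EPIPE, which matches yes being left in the background when SIGPIPE is ignored.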