Sending data to the stdin of a non-existent process will also return
'badpid'. Mark the type as a control message rather than the result of
a call.
Probably there only needs to be 2 types of messages: sync (calls) and
async (events). The control message type might be removed and these
events labelled as events.
Make the result of alcove_cgroup:set/6 more consistent:
* return ok on success
* return an errno tuple if open() or write() fails
* return {error,enoent} if the cgroup does not exist, instead of []
In the case of a partial write, currently the code will crash, taking
care to close the fd in the unix child process first. The code could
select on the fd and attempt another write.
Enforce the use of the fork path. Having an optional fork path was nice
when working in the shell:
{ok, Child} = alcove:fork(Drv).
Instead of:
{ok, Child} = alcove:fork(Drv, []). % port process forks
However it introduced a few problems:
* made the interface inconsistent and ambiguous
alcove:kill(Drv, Pid, 9)
% vs
alcove:kill(Drv, [], Pid, 9) % the port process is sending the signal
* calls could not have optional arguments
Whether or not calls should have optional arguments is an open question,
but the optional fork path would have conflicting arities.
For example, the last argument to mount/8 is used only by Solaris:
% arity 6
-spec mount(alcove_drv:ref(),iodata(),iodata(),iodata(),uint64_t() | [constant()],iodata()) -> 'ok' | {'error', file:posix() | 'unsupported'}.
% arity 7
-spec mount(alcove_drv:ref(),iodata(),iodata(),iodata(),uint64_t() | [constant()],iodata(),iodata()) -> 'ok' | {'error', file:posix() | 'unsupported'}.
% arity 7
-spec mount(alcove_drv:ref(),fork_path(),iodata(),iodata(),iodata(),uint64_t() | [constant()],iodata()) -> 'ok' | {'error', file:posix() | 'unsupported'}.
% arity 8
-spec mount(alcove_drv:ref(),fork_path(),iodata(),iodata(),iodata(),uint64_t() | [constant()],iodata(),iodata()) -> 'ok' | {'error', file:posix() | 'unsupported'}.
* because of the ambiguity in arity, each call can't have an optional
timeout
call/5 sets a timeout of 'infinity'. The timeout isn't accessible from
the "named" functions (e.g., fork, chdir, ...), which means that users
who need the timeout will have to resort to using the call/5
interface. An unfortunate side effect of using call/5 is that dialyzer
won't be able to type check the arguments.
The result of removing the functions without the fork path is that the
code ends up being simpler and more consistent.
Return {error, unsupported} if an atom is used as an argument and the
constant the atom represents does not exist on the current platform.
The previous behaviour was inconsistent and non-deterministic. The
constant might:
* return {error,einval}. System return values could not be distinguished
from alcove return values.
* cause an exception
* be silently ignored
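A minimal sketch of the lookup rule described above. The module name,
function name and the map used as a constant table are hypothetical;
in alcove the values come from the system header files, not from a map.

```erlang
-module(constant_lookup).
-export([resolve/2]).

%% Hypothetical table: constant name => integer value on this platform.
-spec resolve(atom() | integer(), #{atom() => integer()}) ->
    {ok, integer()} | {error, unsupported}.
resolve(Value, _Table) when is_integer(Value) ->
    %% Integers are passed through unchanged.
    {ok, Value};
resolve(Name, Table) when is_atom(Name) ->
    case maps:find(Name, Table) of
        {ok, Value} -> {ok, Value};
        %% The constant does not exist on this platform.
        error -> {error, unsupported}
    end.
```

Because the error is distinct from any errno, the caller can tell an
alcove failure apart from a system return value such as {error,einval}.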
If the alcove port exits, cause the gen_server to crash.
The reasons for crashing:
1. The type specs for all calls were implicitly extended to include
{error,closed} on port failure.
2. There is no error recovery the calling process could do (except
cleaning up state). The caller can still catch errors, just like when
dealing with a normal port.
3. Asynchronous messages into the port (stdin) had to be handled as calls,
rather than casts.
A message sent to a non-existent Unix PID would be discarded, causing
the calling erlang process to block forever in receive.
This change modifies the behaviour of the port to return "badpid" if an
element in the fork path is not found. Since any element of the path may
be invalid, an event (as opposed to a call reply) is generated.
Currently any "badpid" event will cause the erlang process to exit.
Since only one call can be active at a time, it shouldn't interfere with
other operations but the mailbox should pattern match on the list
prefix (i.e., a call to [1,2,3,-1,4] would generate badpid for
[1,2,3,-1]).
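The prefix match can be sketched with lists:prefix/2. The module and
function names are made up for illustration and are not part of alcove.

```erlang
-module(badpid_prefix).
-export([affects/2]).

%% True if a badpid event reported for BadPath invalidates a call that
%% was made to CallPath, i.e. BadPath is a prefix of CallPath.
-spec affects([integer()], [integer()]) -> boolean().
affects(BadPath, CallPath) when is_list(BadPath), is_list(CallPath) ->
    lists:prefix(BadPath, CallPath).
```

For the example above, affects([1,2,3,-1], [1,2,3,-1,4]) is true, while
an unrelated path such as [1,2,4] does not match.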
Also define the behaviour when writing to invalid PIDs (<= 0). Since 0
is used to initialize the list of child PIDs in the port, passing in a
PID of 0 would match the first available slot in the PID table, causing
the port to write to fd 0.
The port will accept negative PIDs. A PID is usually defined as a signed
32-bit integer.
For fork paths, values will always be positive (as returned by fork;
fork errors will return a tuple rather than -1). If the user passes a
negative number as a component of the fork path, the message will be
discarded.
Allow any erlang processes to send messages into the port, similar
to the way port drivers can respond directly to a process. The Unix PID
fork path is used as the key to map the port response to the Erlang PID
by the gen_server.
The current implementation is an experiment only and has a race
condition. Process A calls into the gen_server and blocks. Process B calls
into the port for the same Unix fork path and blocks. The response for
both requests will be sent to process B and process A will block forever.
In other words, the last erlang process to make a request to the fork
path becomes the controlling process. If the erlang process dies, any
data generated by the Unix process is sent to the process that started
the gen_server.
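The "last caller wins" behaviour can be modelled with a map keyed by
the fork path. This is only an illustration of the race, not the
gen_server code; the module and function names are hypothetical.

```erlang
-module(path_owner).
-export([request/3, route/3]).

%% Record Caller as the process waiting on ForkPath. A later request
%% for the same path silently replaces the earlier caller.
request(ForkPath, Caller, Owners) when is_list(ForkPath) ->
    maps:put(ForkPath, Caller, Owners).

%% Pick the process that receives a response for ForkPath. Default
%% stands in for the process that started the gen_server.
route(ForkPath, Owners, Default) ->
    maps:get(ForkPath, Owners, Default).
```

If process A and then process B call request/3 for the same path, every
response is routed to B and A never receives a reply.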
What the behaviour should be needs to be defined. For example, using the
gen_tcp behaviour of controlling_process would be problematic: since the
response always goes to the controlling process, a call from another
erlang process would hang. After the Unix process has called exec(),
allowing multiple processes to send in data may make sense.
Making the controlling process the only process privileged to talk to
the fork path would have some weird side effects:
* data for the fork path would have to be serialized through the
controlling process
* if a non-owning process sends a message to the fork path, either we
have to extend the type spec for each call to include
{error,not_owner} or we send a badsig exception and the client is
forced to wrap each call in a try/catch
There should also be a concept of linking between unix and erlang
processes:
* processes are unlinked
  * erlang process dies: stdout/stderr from the unix process should
    be dropped
  * unix process dies: controlling process gets the normal exit
    messages (exit_status, termsig)
* processes are linked
  * erlang process dies: unix process gets a SIGKILL
  * unix process dies: erlang process gets an exit(kill)
* erlang process monitors unix process (?)
  * unix process dies: erlang process gets a 'DOWN' message
To test the concept works, modify the tcplxc example to talk directly to
the port from each erlang container process (the example needs much more
cleanup and should be converted to an OTP process).
Handle framing in erlang code rather than relying on the {packet,2}
option to open_port/2. This change simplifies the port protocol codec
and should eventually allow easily calling exec() in the port.
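A sketch of what framing in erlang code could look like, assuming the
same 2 byte big-endian length header that {packet,2} uses. The module
and function names are hypothetical.

```erlang
-module(frame_codec).
-export([encode/1, decode/1]).

%% Prefix a payload with a 2 byte big-endian length.
encode(Payload) when is_binary(Payload), byte_size(Payload) =< 16#FFFF ->
    <<(byte_size(Payload)):16, Payload/binary>>.

%% Split a buffer into the complete frames it holds and the trailing
%% bytes of an incomplete frame (to be kept until more data arrives).
decode(Buffer) when is_binary(Buffer) ->
    decode(Buffer, []).

decode(<<Len:16, Payload:Len/binary, Rest/binary>>, Acc) ->
    decode(Rest, [Payload | Acc]);
decode(Rest, Acc) ->
    {lists:reverse(Acc), Rest}.
```

The port would be opened in stream mode and the gen_server would append
incoming data to the leftover buffer before calling decode/1 again.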
An alcove system process may exit unexpectedly during a call, for
example, if poll(2) exits due to an error. In this situation, the caller
will hang in receive.
setpriority(2) is useful for limiting the impact of processes running in
a sandbox. It is also possible to exec the sandbox process using nice:
alcove:execvp(Drv, Path, "/usr/bin/nice", ["/usr/bin/nice", "-n",
"19", "sandbox"]).
The advantages of supporting set/getpriority natively in alcove are:
* nice doesn't need to exist in the sandbox chroot
* better error values (nice will return errors as a binary string)
* priorities can apply to fork trees
Returning a "plain" error tuple for unsupported calls breaks the type
spec for functions:
On BSD:
-spec setproctitle(alcove_drv:ref(),iodata()) -> 'ok'.
On Linux and Solaris:
-spec setproctitle(alcove_drv:ref(),iodata()) -> {'error','unsupported'}.
So the type spec would need to be amended to be the union of both return
values and portable code would have to test for both cases, with the
effect that any call in the future might return 'unsupported' if alcove
were ported to a new OS.
Attempting to make an unsupported call will now result in an "undefined
function" exception:
1> catch alcove:setproctitle(P, "foo").
{'EXIT',{undef,[{alcove,call,
[<0.46.0>,setproctitle,["foo"]],
[{file,"src/alcove.erl"},{line,426}]},
{erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,661}]},
{erl_eval,expr,5,[{file,"erl_eval.erl"},{line,434}]},
{shell,exprs,7,[{file,"shell.erl"},{line,684}]},
{shell,eval_exprs,7,[{file,"shell.erl"},{line,639}]},
{shell,eval_loop,3,[{file,"shell.erl"},{line,624}]}]}}
2> os:type().
{unix,linux}
This still places a burden on the caller. Portable code either needs to
hardcode supported functions by OS:
% And imagine all the error checking this code is not doing ...
proctitle(Name) ->
    case os:type() of
        {unix,linux} -> alcove:prctl(...);
        {unix,BSD} when BSD =:= ... -> alcove:setproctitle(...)
    end.
Or run through sequences of awkward try/catch statements:
proctitle(Name) ->
    try alcove:setproctitle(...) of
        ok -> ok
    catch
        error:undef ->
            try alcove:prctl(...) of
                {ok,_,_,_,_,_} -> ok
            catch
                error:undef ->
                    % Does Solaris even?
                    ...
            end
    end.
Dealing with portability should be the job of a higher level library
built on top of alcove.
Minimize the arguments for the port. These options can be set using
alcove:setopt/3,4.
Command line arguments should be reserved for options that can be set
only at start up.
Generate an event when the child process closes stdout or stderr,
similar to the control fd. The idea is to try simplifying the C code
by moving some of the logic into the gen_server.
After exec() is called, wait for the control fd to be closed. If a
signal is received, it will be queued in the process mailbox.
This change may seem racy because, during the window between the erlang
process requesting exec() and the unix process performing the exec(),
the unix process could receive a signal. If this occurred, the control
fd will become ready and the parent will send an fdctl_closed event on
behalf of the child.
There is a race condition in signal handling that gives invalid
responses to calls on some systems:
in function alcove_seccomp_tests:'-kill/1-fun-2-'/2 (test/alcove_seccomp_tests.erl, line 76)
**error:{assertEqual_failed,[{module,alcove_seccomp_tests},
{line,76},
{expression,"Reply1"},
{expected,{'EXIT',{termsig,sigsys}}},
{value,ok}]}
alcove_seccomp_tests:77: kill...*failed*
in function alcove_seccomp_tests:'-kill/1-fun-4-'/2 (test/alcove_seccomp_tests.erl, line 77)
**error:{assertEqual_failed,[{module,alcove_seccomp_tests},
{line,77},
{expression,"Reply2"},
{expected,{error,esrch}},
{value,ok}]}
The failed tests indicate the parent process has seen that the child
process' control fd has been closed but the child process has either not
yet terminated or the termination status has not been propagated to the
parent.
There are 2 types of calls: calls that return a value and calls that
never return such as execve(2), execvp(2) and exit(2).
For calls that do not return, the closing of the child's control fd
indicates success.
For all other calls, the child should not exit. If the parent generates
a "fdctl_closed" event on behalf of the child, the erlang process blocks
until a termsig event is received.
Instead of replying with a fake response to calls that do not return
(exit, exec), send an event indicating the child's control fd has
closed and allow the erlang driver to interpret the meaning of the
event.
Signal handling has some problems:
* If a child process receives a signal and exits, the parent detects
the child's control fd has been closed and sends a spurious event
({alcove_event, [pid()], ok}) to the erlang side.
* When the erlang side makes a call, it sends a message into the port
then blocks in receive.
If the child process doing the operation gets a signal, the situation
becomes murky. The parent may:
* notice the control fd closing and return 'ok'
For example, if the process doing a getcwd/2 received a signal:
{ok, Cwd} = alcove:getcwd(Drv, [Pid]) % badmatch: ok
Even worse if the expected return value is 'ok'.
* send no response back to the erlang side. The erlang process will
block forever.
Make the signal handling more deterministic by:
* checking if the child has exited due to receiving a signal when the
control fd is closed
* enabling termsig notification by default
* killing the erlang process if a signal is received during a call
(similar to calls with a timeout set)
Killing the erlang process is still not correct. Since the erlang
process is linked to the gen_server, the port will be killed.
Accept an integer or a list of atoms/integers for the flags argument. If
a list is passed, the values are OR'ed together. The values for atoms are
looked up from the constants defined in the system header files.
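A sketch of the flag handling described above. The module name, the
constant names and the map used as a lookup table are hypothetical; the
real values are taken from the system headers by the port.

```erlang
-module(flag_list).
-export([to_integer/2]).

%% Accept either an integer or a list of atoms/integers. List elements
%% are OR'ed together; atoms are looked up in Table first.
to_integer(Flags, _Table) when is_integer(Flags) ->
    {ok, Flags};
to_integer(Flags, Table) when is_list(Flags) ->
    fold(Flags, Table, 0).

fold([], _Table, Acc) ->
    {ok, Acc};
fold([Flag | Rest], Table, Acc) when is_integer(Flag) ->
    fold(Rest, Table, Acc bor Flag);
fold([Flag | Rest], Table, Acc) when is_atom(Flag) ->
    case maps:find(Flag, Table) of
        {ok, Value} -> fold(Rest, Table, Acc bor Value);
        %% Unknown constant on this platform.
        error -> {error, unsupported}
    end.
```

With a table of #{o_wronly => 1, o_creat => 64} (illustrative values),
to_integer([o_wronly, o_creat], Table) returns {ok, 65}.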
Accept and return signals as names. For example, if a signal is trapped,
the process will receive:
{signal,'SIGCHLD'}
Versus:
{signal,17} % on linux
Benefits:
* similar to the way errno is returned, e.g., {errno,enametoolong}
* better portability: integer values differ by platform
* library user does not need to look up the signal value before calling
sigaction/3,4 or pattern matching the process mailbox
Problems:
* sigaction/3,4 will throw badarg if the name is unknown
Probably it should return an error ({error,einval}).
* not all functions dealing with signals have been changed to use atoms,
e.g., kill/3,4
* all functions with mapping of constants -> integers should take
constants:
clone_define
mount_define
file_define
prctl_define
rlimit_define
syscall_define
If this is done, alcove:define/2,3 and the *_define/*_constant functions
should be removed.
* should the constants be upper or lower case atoms?
atoms are typically lowercased, e.g., {error,ebadf} not
{error,'EBADF'}
For example:
PR_SET_PDEATHSIG -> pr_set_pdeathsig
* if the interface accepts constants, should support for integers be
removed?
1. No way to pass in a "raw" signum if the signal does not have a name
2. Signals without a name will be sent to the process mailbox as:
{signal,unknown}
If the caller has trapped 58 and 59, there won't be a way to
distinguish these signals.
Use sigaction/3,4 for setting the SIGCHLD handler. The default action is
to ignore SIGCHLD, i.e., not notify the caller of SIGCHLD events.
Notification can be enabled by setting the handler to 'trap':
SIGCHLD = alcove:define(Drv, 'SIGCHLD'),
alcove:sigaction(Drv, SIGCHLD, trap).
Messages longer than can fit in a 2 byte header will cause the port to
exit (not sure what the behaviour of the erlang side is when a message
bigger than the header can represent is sent ... truncate?)
Be friendlier to the caller and return badarg. This will still crash the
process (unless badarg is caught).
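A sketch of the length check, assuming the check happens on the erlang
side before the message is written to the port. The module and function
names are hypothetical.

```erlang
-module(packet2).
-export([frame/1]).

%% Largest payload a 2 byte length header can describe.
-define(MAX_LEN, 16#FFFF).

%% Return the message with its 2 byte header, or raise badarg if the
%% message is too long for the header to represent.
frame(Msg) ->
    Size = iolist_size(Msg),
    case Size =< ?MAX_LEN of
        true -> [<<Size:16>>, Msg];
        false -> erlang:error(badarg, [Msg])
    end.
```

A caller that wants to survive an oversized message can wrap the call in
try/catch and match on error:badarg.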
Replace use of a list to pass arguments to a call with a tuple. Parsing
the elements in the external term format is less error prone. A list, in
the term format, can be:
* a string (a char buf)
* a list header followed by terms, terminated with an empty list
* a list header followed by 1 element, followed by a list header ...,
terminated by an empty list
Since the calls take a fixed number of arguments (none of the vararg
versions are used), a tuple might be a better erlang representation.
It also fits in better with the usage of tuples in other functional
languages, where a tuple can hold any types but a list contains the
same type.
alcove:call(Port, fork, []) -> fork()
alcove:call(Port, exit, [1]) -> exit(1)
alcove:call(Port, execvp, ["/bin/echo", ["/bin/echo",
"foobar"]]) -> execvp("/bin/echo", argv)
alcove:call(Drv, open, ["/tmp/foo", 0, 0]) ->
open("/tmp/foo", 0, 0)
Would become:
alcove:call(Port, fork, {}) -> fork()
alcove:call(Port, exit, {1}) -> exit(1)
alcove:call(Port, execvp, {"/bin/echo", ["/bin/echo",
"foobar"]}) -> execvp("/bin/echo", argv)
alcove:call(Drv, open, {"/tmp/foo", 0, 0}) ->
open("/tmp/foo", 0, 0)
However that exposes the caller to the dreaded single element tuple.
Simplify the C port code by converting the argument list to a tuple.
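One way to keep the list interface for callers while sending a tuple to
the port is to convert just before encoding. This is a sketch only: the
{Call, Args} layout and the module name are assumptions, not the actual
alcove wire format.

```erlang
-module(call_args).
-export([encode/2]).

%% Callers pass arguments as a list; the port sees a fixed-size tuple,
%% which has a single representation in the external term format.
encode(Call, Args) when is_atom(Call), is_list(Args) ->
    term_to_binary({Call, list_to_tuple(Args)}).
```

The caller never has to write a single element tuple such as {1}, and
the C code only has to decode a tuple header followed by its elements.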
Convert to static buffers for encoding/decoding the erlang external term
format using ei.
Use of pthreads when fork/exec is involved is risky. After a process
calls fork(), threads may be holding a lock and cause a deadlock. So the
operations that can be performed after a fork (before exec) are
basically limited to the same operations that can be done in a signal
handler. Operations that may be unsafe include malloc and the f* stream
functions.
alcove relied on erl_interface for encoding/decoding the external term
format. Internally, erl_interface uses pthreads.
Since the interesting part of alcove is the set of operations that can
be done post-fork and pre-exec, remove the dependency on erl_interface. The
change was done manually and was pretty massive for something done
by hand. alcove compiles and all the tests pass but there will be
regressions, especially considering not all calls are tested. For example,
cgroups are likely broken.
These changes have been tested only on Linux/32-bit/ARM.
This change had the benefit of simplifying the code and applying limits
to some of the inputs.
ei was used for convenience. It may be better to move to a custom
protocol or build a custom library for decoding the ETF.
ei/the term format has many problems:
* lists of small integers are encoded as strings
A list may be magically converted to a string if the list consists only
of integers 0-255. Interestingly, 0 is considered to be a valid string
element even though C strings are NUL-terminated. So [0] is encoded as a string.
C code must catch both cases and possibly normalize the string into a
list. Even trickier since a list is terminated by an empty list.
* integer overflows
All ei functions dealing with static buffers take a pointer to an
offset. This index is a signed integer which can overflow.
Not a problem for alcove since messages are limited to lengths that can be
represented in 2 bytes.
* buffer overflows
The ei functions do not take a length. While the size of certain types
(strings, binaries, atoms) can be queried using ei_get_type(), other
types report a size of 0. Since the size of these encodings is not
known to the caller, it is questionable what is an appropriately sized
buffer.
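The first problem in the list above (lists of small integers encoded as
strings) can be observed from the erlang side by inspecting the tag
byte produced by term_to_binary/1. The module name is hypothetical; the
tag values are those of the external term format.

```erlang
-module(etf_list).
-export([kind/1]).

%% Tags from the external term format.
-define(NIL_EXT, 106).
-define(STRING_EXT, 107).
-define(LIST_EXT, 108).

%% Report how a list is represented on the wire. 131 is the version
%% byte that precedes every encoded term.
kind(List) when is_list(List) ->
    case term_to_binary(List) of
        <<131, ?NIL_EXT>> -> nil;
        <<131, ?STRING_EXT, _/binary>> -> string;
        <<131, ?LIST_EXT, _/binary>> -> list
    end.
```

kind([0]) and kind([1,2,3]) both report string, while kind([1,2,256])
reports list, so a decoder has to handle both forms for the same
argument.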
The term encoding changes in this commit are sketchy and need to be
cleaned up. Some of the functions encode a complete term, others encode
at an offset and others return an allocated buffer.
To ensure consistency, exit if a timeout is set and reached. Otherwise,
late messages may arrive in the queue, messing up subsequent calls:
1> {ok,P} = alcove_drv:start().
{ok,<0.45.0>}
2> catch alcove:call(P, [], getpid, [], 0).
{'EXIT',timeout}
3> alcove:version(P).
2897
4> flush().
Shell got {alcove_call,<0.45.0>,[],<<"0.6.1">>}
ok
Using timeouts might be needed if it is not known whether the port is
still running in the event loop.
Remove the cast functions, since they are dangerous and can be emulated
by using call/5.