303 Commits

Author SHA1 Message Date
Michael Santos
0fe9e6c15e unshare: remove debug printf 2015-08-08 15:43:58 -04:00
Michael Santos
2379cd04a2 Change 'badpid' to a control message
Sending data to the stdin of a non-existent process will also return
'badpid'. Mark the type as a control message rather than the result of
a call.

Probably there only needs to be 2 types of messages: sync (calls) and
async (events). The control message type might be removed and these
events labelled as events.
2015-08-03 09:36:30 -04:00
Michael Santos
535eeb9530 readdir: do not filter paths
Return the result of readdir(3) without removing the magic "." and ".."
directories.
2015-07-25 11:54:39 -04:00
Michael Santos
d5e80fac33 setns/4: support setting namespace 2015-07-24 10:52:06 -04:00
Michael Santos
aa306a6ab2 Be explicit with checks
I go back and forth with this: be explicit or brief. Especially with
strcmp() where I always WTF.

The code is simpler and easier to read when the checks are explicit. For
example:

    // NULL pointer or integer?
    if (!x) return;
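
A minimal sketch of the explicit style, for comparison. It is not taken from the alcove sources: the helper name `is_stdin_name` and its argument are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if name is exactly "stdin", else 0. */
int is_stdin_name(const char *name)
{
    /* Explicit pointer check instead of "if (!name)". */
    if (name == NULL)
        return 0;

    /* strcmp() returns 0 on a match, so "if (!strcmp(...))" reads as the
     * opposite of what it means. Compare against 0 explicitly. */
    if (strcmp(name, "stdin") == 0)
        return 1;

    return 0;
}
```

The reader no longer has to work out whether the tested value is a pointer, an integer or a comparison result.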
2015-07-20 10:41:46 -04:00
Michael Santos
ca0d6f12b4 constants: return {error, unsupported}
Return {error, unsupported} if an atom is used as an argument and the
constant the atom represents does not exist on the current platform.

The previous behaviour was inconsistent and non-deterministic. The
constant might:

* return {error,einval}. System return values could not be distinguished
  from alcove return values.

* cause an exception

* be silently ignored
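
One way to keep "constant not available here" distinct from system errors is to return a separate status from the lookup. The following is only a sketch under that assumption: the type, table and function names are hypothetical and the table entries are placeholders, not alcove's constants.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup status, kept separate from the constant's value so
 * a missing constant cannot be mistaken for a system return value. */
enum const_status { CONST_OK = 0, CONST_UNSUPPORTED = 1 };

struct const_entry {
    const char *name;
    long value;
};

/* Illustrative table: the names and values are placeholders. */
static const struct const_entry const_table[] = {
    {"example_a", 1},
    {"example_b", 2},
};

/* Look up name and store its value in *value.
 * *value is left untouched when the constant is not found. */
enum const_status const_lookup(const char *name, long *value)
{
    size_t i;

    if (name == NULL || value == NULL)
        return CONST_UNSUPPORTED;

    for (i = 0; i < sizeof(const_table) / sizeof(const_table[0]); i++) {
        if (strcmp(const_table[i].name, name) == 0) {
            *value = const_table[i].value;
            return CONST_OK;
        }
    }

    /* A caller would map this status to {error, unsupported}. */
    return CONST_UNSUPPORTED;
}
```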
2015-07-18 10:09:41 -04:00
Michael Santos
6a400730c6 alloc: call per file 2015-07-17 06:58:34 -04:00
Michael Santos
2c9633ff27 limit: call per file 2015-07-16 09:36:09 -04:00
Michael Santos
5d88ec52a0 exec: call per file 2015-07-15 09:39:05 -04:00
Michael Santos
07cc6f79dd mount: call per file 2015-07-14 05:34:57 -04:00
Michael Santos
5984f08dd2 signal: call per file
c_src/sys/alcove_signal.c has a function to map signal numbers to
constants used by the signal handler in the event loop.
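
A mapping of this kind might look like the sketch below. It is an illustration only: the function name `signal_name` and the choice of signals are assumptions, not the contents of alcove_signal.c.

```c
#include <signal.h>
#include <stddef.h>

/* Hypothetical mapping from a signal number to a lower-case constant name.
 * Returns NULL for signals that are not in the table. */
const char *signal_name(int signum)
{
    switch (signum) {
    case SIGHUP:  return "sighup";
    case SIGINT:  return "sigint";
    case SIGKILL: return "sigkill";
    case SIGTERM: return "sigterm";
    case SIGCHLD: return "sigchld";
    default:      return NULL;
    }
}
```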
2015-07-13 10:13:04 -04:00
Michael Santos
eebe8ccc7d proc: call per file 2015-07-12 08:58:48 -04:00
Michael Santos
a5bf4e282a alcove_file: call per file 2015-07-11 09:57:06 -04:00
Michael Santos
014411ba34 syscalls: rename files 2015-07-10 10:04:46 -04:00
Michael Santos
2650d22c4d utsname: calls per file 2015-07-09 08:57:19 -04:00
Michael Santos
8a2f876f7d alcove_env: call per file 2015-07-08 09:36:50 -04:00
Michael Santos
8f5c6d8550 alcove_fork.h: remove alcove_clone_constants 2015-07-06 13:48:03 -04:00
Michael Santos
5d03da9a2c Fix clone_define/2,3
Move out the clone constants to a header since fork() does not need
them. Include sched.h with the constants to ensure the CLONE_ constants
are defined.
2015-07-06 10:12:30 -04:00
Michael Santos
c7bd453e68 fork: individual calls per file
Leave common utility functions in c_src/sys/alcove_fork.(c|h). Maybe
these files should be renamed "fork_common.c" or "alcove_fork_common.c".

Move alcove_setfd() to alcove.c since the pid_foreach() is also there.
These common functions should also eventually be moved out to another
file.
2015-07-05 11:59:55 -04:00
Michael Santos
bded978e4b mkdir: add missing header 2015-07-04 07:27:48 -04:00
Michael Santos
97fa5d5379 alcove_dir: separate files for each call 2015-07-04 07:19:48 -04:00
Michael Santos
febe77b585 cred: split out calls into separate files
Begin code re-organization by moving functions to one call per file.
2015-07-03 10:40:41 -04:00
Michael Santos
672af8a39f Port to rebar3
Support building with rebar3 and rebar2 using the makefile generated
from "rebar3 new cmake".

Do a straightforward port of the compiler options from the rebar port
compiler. The optimizations/warnings can be re-enabled later.

Use the fancy pants incremental building of each compilation unit from
the template. It's a lot slower than compiling in one go but is slightly
faster when running any rebar command (rebar runs make on each command
which recreates the target. The Makefile needs to be fixed).
2015-07-02 10:23:08 -04:00
Michael Santos
f060d44937 errno_id/2,3: convert errno integer to atom 2015-06-16 16:54:13 -04:00
Michael Santos
d81dcb688a Fix compilation errors on Solaris
Use POSIX 200112L/C99 mode on Solaris to suppress compilation warnings.

"-D__EXTENSIONS__=1" is required for NSIG (max number of signals). This
also pulls in an ERR macro which necessitates renaming the goto label
from ERR -> ERROR.
2015-02-21 09:21:48 -05:00
Michael Santos
3b15781433 Use a new message type for stdio events
Make a new message type (alcove_ctl) for informing the erlang process
when a stdio descriptor has been closed.
2015-01-01 17:09:48 -05:00
Michael Santos
5ea775fced badpid: event -> call
Although badpid may be returned by any process in the fork chain,
it is a synchronous response to a call like badarg and undef.
2014-12-28 10:38:38 -05:00
Michael Santos
2ba7caaf76 Exit with badpid for invalid OS PID in fork path
A message sent to a non-existent Unix PID would be discarded, causing
the calling erlang process to block forever in receive.

This change modifies the behaviour of the port to return "badpid" if an
element in the fork path is not found. Since any element of the path may
be invalid, an event (as opposed to a call reply) is generated.
Currently any "badpid" event will cause the erlang process to exit.
Since only one call can be active at a time, it shouldn't interfere with
other operations but the mailbox should pattern match on the list
prefix (i.e., a call to [1,2,3,-1,4] would generate badpid for
[1,2,3,-1]).

Also define the behaviour when writing to invalid PIDs (<= 0). Since 0
is used to initialize the list of child PIDs in the port, passing in a
PID of 0 would match the first available slot in the PID table, causing
the port to write to fd 0.
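
The PID 0 problem can be illustrated with a small lookup sketch. The struct layout and the names `struct child` and `child_find` are hypothetical. The only assumption carried over from the text is that a pid of 0 marks a free slot.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical child table entry: a pid of 0 marks a free slot. */
struct child {
    pid_t pid;
    int fdin;
};

/* Return the slot holding pid, or NULL if the pid is invalid or unknown. */
struct child *child_find(struct child *tab, size_t n, pid_t pid)
{
    size_t i;

    /* Without this check, a pid of 0 would match the first free slot and
     * the caller would write to that slot's zero-initialized fd. */
    if (pid <= 0)
        return NULL;

    for (i = 0; i < n; i++) {
        if (tab[i].pid == pid)
            return &tab[i];
    }

    return NULL;
}
```

A NULL result is the point at which a port could reply with badpid instead of discarding the message.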
2014-12-27 15:42:10 -05:00
Michael Santos
34b1760e87 Rename signal actions to reflect the POSIX naming
Use the same names for signal actions as the C headers. Since the
actions are atoms, the names are lower cased.

Invent the name 'sig_catch' to denote signals that are caught and passed
to the erlang side as messages.

Add the 'sig_' prefix to all the actions to avoid quoting 'catch', since it
is a reserved keyword.
2014-12-11 18:11:14 -05:00
Michael Santos
a2f812a9b6 openbsd: fix poll(2) with setrlimit(RLIMIT_NOFILE)
Like Linux and Solaris, poll(2) on OpenBSD will return EINVAL if the
number of file descriptors argument passed to poll is above the maximum
number of fd's allowed by setrlimit(2).

By inspection, NetBSD behaves in a similar way to FreeBSD (nfd is
allowed to exceed maximum fd).
2014-12-04 10:37:42 -05:00
Michael Santos
af4a17e5f1 Fix compilation on non-Linux systems 2014-12-01 11:54:21 -05:00
Michael Santos
b5f8fd36ff Add setpriority(2)/getpriority(2)
setpriority(2) is useful for limiting the impact of processes running in
a sandbox. It is also possible to exec the sandbox process using nice:

    alcove:execvp(Drv, Path, "/usr/bin/nice", ["/usr/bin/nice", "-n",
            "19", "sandbox"]).

The advantages of supporting set/getpriority natively in alcove are:

* nice doesn't need to exist in the sandbox chroot
* better error values (nice will return errors as a binary string)
* priorities can apply to fork trees
2014-12-01 11:20:48 -05:00
Michael Santos
a6e0c5de49 unsupported calls: use undef function exception
Returning a "plain" error tuple for unsupported calls breaks the type
spec for functions:

    On BSD:
        -spec setproctitle(alcove_drv:ref(),iodata()) -> 'ok'.

    On Linux and Solaris:
        -spec setproctitle(alcove_drv:ref(),iodata()) -> {'error','unsupported'}.

So the type spec would need to be amended to be the union of both return
values and portable code would have to test for both cases, with the
effect that any call in the future might return 'unsupported' if alcove
were ported to a new OS.

Attempting to make an unsupported call will now result in an "undefined
function" exception:

    1> catch alcove:setproctitle(P, "foo").
    {'EXIT',{undef,[{alcove,call,
                            [<0.46.0>,setproctitle,["foo"]],
                            [{file,"src/alcove.erl"},{line,426}]},
                    {erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,661}]},
                    {erl_eval,expr,5,[{file,"erl_eval.erl"},{line,434}]},
                    {shell,exprs,7,[{file,"shell.erl"},{line,684}]},
                    {shell,eval_exprs,7,[{file,"shell.erl"},{line,639}]},
                    {shell,eval_loop,3,[{file,"shell.erl"},{line,624}]}]}}
    2> os:type().
    {unix,linux}

This still places a burden on the caller. Portable code either needs to
hardcode supported functions by OS:

    % And imagine all the error checking this code is not doing ...
    proctitle(Name) ->
        case os:type() of
            {unix,linux} -> alcove:prctl(...);
            {unix,BSD} when BSD =:= ... -> alcove:setproctitle(...)
        end.

Or run through sequences of awkward try/catch statements:

    proctitle(Name) ->
        try alcove:setproctitle(...) of
            ok -> ok
        catch
            error:undef ->
                try alcove:prctl(...) of
                    {ok,_,_,_,_,_} -> ok
                catch
                    error:undef ->
                        % Does Solaris even?
                        ...
                end
        end.

Dealing with portability should be the job of a higher level library
built on top of alcove.
2014-11-30 09:35:06 -05:00
Michael Santos
8e1074fbd6 linux/solaris: fail if RLIMIT_NOFILE is less than openfd
The interaction between poll(2) and setrlimit(RLIMIT_NOFILE) differs
between BSD and Linux/Solaris.

On BSD systems, if RLIMIT_NOFILE is set to 0, poll(2) will continue to
work with open file descriptors.

On Linux/Solaris, poll(2) requires RLIMIT_NOFILE to be at least the
highest open file descriptor + 1. If the nfd parameter to poll(2)
exceeds the current soft limit, poll(2) returns EINVAL, causing the
alcove process to exit.

Setting RLIMIT_NOFILE to 0 is useful for creating rlimit sandboxes. The
caller allocates whatever resources are required, drops privileges, then
sets limits on resources. For example, to prevent file creation, forking
of new processes and opening file descriptors, the following works on
BSD:

    ok = alcove:setrlimit(Drv, [Child], rlimit_fsize,
            #alcove_rlimit{cur = 0, max = 0}),
    ok = alcove:setrlimit(Drv, [Child], rlimit_nproc,
            #alcove_rlimit{cur = 0, max = 0}),
    ok = alcove:setrlimit(Drv, [Child], rlimit_nofile,
            #alcove_rlimit{cur = 0, max = 0}).

If an open file descriptor is closed, it cannot be re-opened:

    1> alcove:open(P, [F], "/etc/passwd", [o_rdonly], 0).
    {ok,6}
    2> alcove:setrlimit(P, [F], rlimit_nofile, #alcove_rlimit{cur = 0, max = 0}).
    ok
    3> alcove:read(P, [F], 6, 10).
    {ok,<<"# $FreeBSD">>}
    4> alcove:close(P, [F], 6).
    ok
    5> alcove:open(P, [F], "/etc/passwd", [o_rdonly], 0).
    {error,emfile}

On a Linux or Solaris system, RLIMIT_NOFILE would have to be set to 7.

The array holding the file descriptors will grow if RLIMIT_NOFILE is
increased but does not shrink (processes/file descriptors may exist and
would be leaked). Since BSD systems can poll any open file descriptor,
pass the whole array into poll(2). For Linux/Solaris, check that the
highest currently opened fd is below RLIMIT_NOFILE and use the value of
RLIMIT_NOFILE as the number of file descriptors argument.
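
The rule for choosing the poll(2) count can be sketched as a pure function. This is an illustration under stated assumptions, not the code from this commit: the name `poll_nfds`, its parameters and the `bounded` flag are hypothetical. The extra cap at the array size is a defensive addition.

```c
/* Hypothetical helper: choose the nfds argument for poll(2).
 *
 * nalloc:  number of entries in the pollfd array
 * maxfd:   highest currently open fd, or -1 if none are open
 * nofile:  current soft RLIMIT_NOFILE
 * bounded: non-zero where poll(2) rejects nfds above RLIMIT_NOFILE
 *          (Linux/Solaris behaviour as described above)
 *
 * Returns the nfds value to pass to poll(2), or -1 if an open fd lies
 * at or above the limit and polling cannot work. */
long poll_nfds(long nalloc, long maxfd, long nofile, int bounded)
{
    /* BSD behaviour: any open fd can be polled, pass the whole array. */
    if (bounded == 0)
        return nalloc;

    /* The highest open fd must be below RLIMIT_NOFILE. */
    if (maxfd >= nofile)
        return -1;

    /* Use RLIMIT_NOFILE as the count, capped at the array size. */
    return nofile < nalloc ? nofile : nalloc;
}
```

With fd 6 open, as in the shell session above, a limit of 7 is the smallest value that passes the check on a bounded platform.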
2014-11-29 15:06:40 -05:00
Michael Santos
9a0b6821c8 Notify of exit status by default
Processes have 2 ways of notifying the parent of exit: termination
signal and exit status. Since termination signals are enabled by
default, enable exit status for consistency. Exit status can be disabled
by using:

    alcove:setopt(Drv, ForkPath, exit_status, 0)
2014-11-22 15:56:58 -05:00
Michael Santos
8ab1b8908b Add pivot_root(2) 2014-11-15 10:17:46 -05:00
Michael Santos
5f1aac3eff Add MIN() macro 2014-11-11 13:17:26 -05:00
Michael Santos
1af5057213 Crash on poll(2) failures 2014-11-10 10:28:07 -05:00
Michael Santos
edcdea530c Namespace functions callable from erlang 2014-11-09 08:33:45 -05:00
Michael Santos
8dd0c78869 Add doc for setres([ug])id/getres([ug])id 2014-11-09 08:33:45 -05:00
Michael Santos
9500eb0169 Add getresgid(2) 2014-11-07 16:31:41 -05:00
Michael Santos
9cc0bf74f4 Add setresgid(2) 2014-11-06 15:47:26 -05:00
Michael Santos
2426063ac7 Add getresuid(2) 2014-11-04 13:47:40 -05:00
Michael Santos
33285e1dd8 Fix initialization warnings on OpenBSD 2014-11-03 11:35:43 -05:00
Michael Santos
d608a18a4b Add setresuid/4,5
alcove is usually called from sudo which sets uid, euid and suid
appropriately. However, if run with the setuid bit, the behaviour might
be more mysterious as shown in "Setuid Demystified":

http://www.usenix.org/events/sec02/full_papers/chen/chen.pdf

setresuid(2) is a more predictable interface. But of course it is not
supported by Solaris.
2014-11-02 09:27:08 -05:00
Michael Santos
b50a592858 Add support for notifications when stdin is closed 2014-11-01 16:29:08 -04:00
Michael Santos
c010cf65d4 Term must begin at start of buffer
Initialize the index passed into the macro to 0. These macros should be
removed in the future.
2014-11-01 16:29:08 -04:00
Michael Santos
79ad18051c Rename fd constants to match stdio naming 2014-10-31 15:12:27 -04:00
Michael Santos
96aae9e59c Use the errno for the exit value
Alcove will exit on fatal errors like resource allocation failures. Use
the value of errno as the exit value (which will be sent as a message if
the exit_status option is used).

The reason for the failure is sent to stderr using the err macros and so
will be in the format:

    <<"alcove: ", Reason/binary>>
2014-10-30 15:18:10 -04:00
Michael Santos
244c4f6285 Support mount(2) on Solaris
Mount filesystems using the Solaris-specific version of the mount
interface. Solaris adds an options parameter and an options length
parameter to mount. The options parameter takes a NUL-terminated string
of comma-separated arguments.

On other Unixes the options are either included in the mount flags
(MS_NOEXEC, ...) or in the data argument (<<"size=128M">> for tmpfs).

The behaviour of mount(2) on Solaris is bizarre: the options argument is
input/output, with the mount options placed in the buffer on return.

If MS_OPTIONSTR is present in the mount flags and the options buffer
is too small, the mount call returns -1 and errno is set to EOVERFLOW
but the mount actually succeeds! A more robust interface might truncate
the options to the size of the buffer, possibly setting the options
length to the required length, and return 0.

Surprisingly, the options buffer can also be too large. This is so
weird, it must be a bug in alcove. If the buffer exceeds a certain size,
mount returns -1 with errno set to EINVAL. The mount fails in this case:

    {error,einval} = alcove:mount(Drv, [Child], "swap", Dir, "tmpfs",
            [ms_optionstr], <<>>, <<"size=16m", 0:(1024*8)>>).

Since Solaris has this extra argument, mount/6,7 has to be extended to
mount/7,8, forcing all platforms to pass in an options parameter. This
parameter is ignored on all platforms except Solaris. The result is that
the mount interface is not the Linux mount(2) interface, it is some weird
hybrid that is awkward to use on all platforms (the interface does not
map to the mount(2) man page on any platform).

The value of the option parameter is also not returned to the caller on
Solaris. Options to fix this include:

* breaking out Solaris mount(2) to a Solaris specific call (mountext or
  whatever)

* checking if opt is non-NULL on return and opt len > 0. If so, return:

    {ok, binary()}

  This extends the mount/7,8 type to:

    ok | {ok,binary()} | {error,posix()}
2014-10-26 10:42:54 -04:00