The macro should be a "pure" function. Remove the call to the port. This
forces the caller to catch the case where a syscall does not exist on a
particular platform.
Move the BPF generation macros to the header file. This needs further
cleanup:
* some of the macros require the port pid, others do not
* some of the macros refer to functions
* some of the raw BPF statements should be wrapped in a function or
macro
Return information about the length of unallocated binaries so the
caller can return a list including the contents of the pointers, rather
than a binary including the pointers themselves.
ERL_LIBS was pulling in the bpf functions from procket. Copy the bpf
functions from procket into alcove_seccomp instead. Amazingly, seccomp
mode worked out of the box. Still needs a lot of cleanup, as well as a
higher-level, simpler, friendlier library to wrap it all up.
The biggest change to follow will be altering the return value of prctl.
Right now, passing in a struct (represented as a list of binaries and tuples)
will return a binary. This isn't very useful because the binary contains
the pointer to the value, not the value itself. With seccomp mode, this
value isn't updated anyway.
prctl/6,7 will change to return the contents of the pointer. So if a
struct is passed in like:
[<<"some data">>, {ptr, 6}, {ptr, <<"stuff">>}, <<"foo">>]
The return value will look like:
[<<"some data">>, {ptr, <<1,2,3,4,5,6>>}, {ptr, <<"stuff">>}, <<"foo">>]
Provide an interface for creating buffers with embedded pointers for
seccomp mode. Seccomp mode uses bpf programs which look like:
struct sock_fprog {
    unsigned short len;
    struct sock_filter *filter;
};
The caller can generate the structure using a list:
[
    % 8 instructions in filter
    <<8:2/native-unsigned-integer-unit:8>>,
    % Pad
    <<0:16>>,
    % Generate a binary containing the BPF instructions
    {ptr, bpf_filter()}
]
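The width of the pad between len and the filter pointer depends on the platform ABI: the 16-bit pad in the list above fits a pointer aligned to 4 bytes, while a pointer aligned to 8 bytes would typically need 6 bytes. A minimal sketch for checking the value on a given platform follows. The struct fprog_layout and fprog_pad_bytes() names are hypothetical; the struct only mirrors the layout quoted above so the example builds without the Linux headers.

```c
#include <stddef.h>

/* Local mirror of the sock_fprog layout quoted above (hypothetical name);
 * a void pointer stands in for struct sock_filter *. */
struct fprog_layout {
    unsigned short len;   /* number of BPF instructions */
    void *filter;         /* pointer to the instruction array */
};

/* Bytes of padding the compiler inserts between len and filter. */
static size_t fprog_pad_bytes(void)
{
    return offsetof(struct fprog_layout, filter) - sizeof(unsigned short);
}
```

The caller would emit a zero binary of fprog_pad_bytes() bytes in place of a hard-coded pad.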
Rough, buggy code to get started testing seccomp mode. In particular,
the contents of the allocated buffers are currently write-only. Some of
the prctl() calls are probably in/out.
Since stdout is used for control and for data, add a new message type
(proxy) to distinguish them.
When a child calls exec():
* if the call is unsuccessful, the child writes a term to stdout which
the parent proxies
* if the call is successful, no value is returned but the child will close
the control socket on exec. The parent will "spoof" an "ok" response.
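Closing the control descriptor on a successful exec is usually arranged with the close-on-exec flag. The sketch below assumes that mechanism; the set_cloexec() helper name is hypothetical and not taken from the alcove source.

```c
#include <fcntl.h>

/* Mark fd close-on-exec so a successful exec() drops it in the child.
 * Returns 0 on success, -1 on error (errno set by fcntl). */
static int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;

    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}
```

With the flag set, the parent sees the control descriptor close only when exec() succeeds, which is the point at which it can synthesize the "ok" response.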
Allocate the message headers on the heap to make it easier to layer them.
If the child process has not exec'ed, use the packet length to avoid
having multiple writes coalesce into a single read.
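To illustrate why a length prefix keeps coalesced writes apart, here is a minimal sketch of splitting a read buffer into frames. It assumes a 2-byte big-endian length header (as with an Erlang port opened with {packet, 2}); the header size and the next_frame() name are assumptions, not details stated here.

```c
#include <stddef.h>
#include <stdint.h>

/* Extract one length-prefixed frame from buf.
 * Returns the number of bytes consumed (header + payload), or 0 if the
 * buffer does not yet hold a complete frame. */
static size_t next_frame(const uint8_t *buf, size_t avail,
                         const uint8_t **payload, size_t *payload_len)
{
    size_t len;

    if (avail < 2)
        return 0;               /* header incomplete */

    len = ((size_t)buf[0] << 8) | buf[1];

    if (avail - 2 < len)
        return 0;               /* payload incomplete */

    *payload = buf + 2;
    *payload_len = len;
    return 2 + len;
}
```

Calling next_frame() repeatedly, advancing by the returned count each time, recovers each write as a separate message even when a single read() returned several of them.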
Before checking if an fd is ready, ensure the fd is a non-negative value. On
FreeBSD, invalid fds cause FD_ISSET() to segfault.
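A guard of this kind might look like the sketch below. It assumes the intent is to reject negative descriptors, and it also rejects values at or above FD_SETSIZE, which are equally out of range for an fd_set. The fd_is_ready() name is hypothetical.

```c
#include <sys/select.h>

/* Return 1 if fd is in range and set in the fd_set, 0 otherwise.
 * Out-of-range descriptors are never passed to FD_ISSET(). */
static int fd_is_ready(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return 0;

    return FD_ISSET(fd, set) ? 1 : 0;
}
```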
Modify the message format to include the PID. Tests pass, but this
requires much more work. The message format is now:
* erlang -> port: make a call
    2:ALCOVE_MSG_CALL
    2:call
    *:arg
* port -> erlang: call results
    2:ALCOVE_MSG_CALL
    *:arg
* erlang -> port: write to child's stdin
    2:ALCOVE_MSG_CHILDIN
    4:pid
* port -> erlang: write from child's stdout
    2:ALCOVE_MSG_CHILDOUT
    4:pid
    *:data
* port -> erlang: write from child's stderr
    2:ALCOVE_MSG_CHILDERR
    4:pid
    *:data
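As a rough illustration of the child output layout above (2-byte type, 4-byte pid, then data), the following sketch packs one message into a buffer. Big-endian byte order, the numeric value of EX_MSG_CHILDOUT and the encode_child_msg() name are all assumptions for the example; the real constants and byte order are not given here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical value; stands in for ALCOVE_MSG_CHILDOUT. */
#define EX_MSG_CHILDOUT 2

/* Pack type (2 bytes), pid (4 bytes) and data into buf, big-endian.
 * Returns the number of bytes written, or 0 if buf is too small. */
static size_t encode_child_msg(uint8_t *buf, size_t buflen, uint16_t type,
                               uint32_t pid, const void *data, size_t len)
{
    if (buflen < 6 || len > buflen - 6)
        return 0;

    buf[0] = (uint8_t)(type >> 8);
    buf[1] = (uint8_t)(type & 0xff);
    buf[2] = (uint8_t)(pid >> 24);
    buf[3] = (uint8_t)((pid >> 16) & 0xff);
    buf[4] = (uint8_t)((pid >> 8) & 0xff);
    buf[5] = (uint8_t)(pid & 0xff);

    if (len > 0)
        memcpy(buf + 6, data, len);

    return 6 + len;
}
```

Carrying the pid in every child message lets the Erlang side route output from several forked processes arriving over the same port.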
alcove acts as a proxy between the Erlang VM and a forked process. It can
enforce restrictions on the child process, such as dropping privileges,
setting resource limits and chroot'ing.
The goal is to support Linux namespaces, seccomp mode and cgroups so
the port process can run in an application container.
alcove should be portable, though, and a subset of the features should
work on any unix, possibly even supporting the sandboxing mechanisms on
other platforms.