44 Commits

Author SHA1 Message Date
Michael Santos
572742b376 alcove_cgroup:set/6: return value
Make the result of alcove_cgroup:set/6 more consistent:

* return ok on success
* return an errno tuple if open() or write() fails
* return {error,enoent} if the cgroup does not exist, instead of []

In the case of a partial write, currently the code will crash, taking
care to close the fd in the unix child process first. The code could
select on the fd and attempt another write.
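
A minimal sketch of the retry idea in C, assuming a blocking fd (so no
select() is needed). The name write_all is hypothetical and is not taken
from the alcove source:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all `len` bytes of `buf` to `fd`, retrying after partial writes
 * and EINTR. Returns 0 on success, -1 on error with errno set.
 * Assumes a blocking fd; a non-blocking fd would also need to wait
 * (e.g. with select) when write() fails with EAGAIN. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data was written */
            return -1;      /* real error: the caller closes the fd */
        }

        p += n;             /* partial write: skip what was written */
        len -= (size_t)n;
    }

    return 0;
}
```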
2015-07-29 11:47:44 -04:00
Michael Santos
f9ac18d0c1 examples: set MS_PRIVATE on container mounts
Ubuntu 15.04 sets the MS_SHARED flag on system mounts. This flag is
inherited by container mounts, so the container filesystem is visible
to the global namespace.

Use the MS_PRIVATE flag on container mounts to prevent this issue.
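
At the system call level, changing the propagation type looks roughly
like the sketch below. The function name is hypothetical, and adding
MS_REC (to cover mounts below the target) is an assumption, not
something this commit states:

```c
#include <stddef.h>
#include <sys/mount.h>

/* Mark the mount at `target` as private so mount events no longer
 * propagate to or from peer mounts. MS_REC (an assumption here) applies
 * the change to every mount below `target` as well.
 * Linux only; requires CAP_SYS_ADMIN.
 * Returns 0 on success, -1 on error with errno set. */
int make_mount_private(const char *target)
{
    /* source, filesystem type and data are ignored for propagation changes */
    return mount(NULL, target, NULL, MS_REC | MS_PRIVATE, NULL);
}
```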
2015-07-28 07:55:32 -04:00
Michael Santos
d755ada152 Remove function versions with optional fork path
Enforce the use of the fork path. Having an optional fork path was nice
when working in the shell:

    {ok, Child} = alcove:fork(Drv).

Instead of:

    {ok, Child} = alcove:fork(Drv, []). % port process forks

However it introduced a few problems:

* made the interface inconsistent and ambiguous

    alcove:kill(Drv, Pid, 9)
    % vs
    alcove:kill(Drv, [], Pid, 9) % the port process is sending the signal

* calls could not have optional arguments

  Whether or not calls should have optional arguments is an open
  question, but with an optional fork path the arities would conflict.

  For example, the last argument to mount/8 is used only by Solaris:

  % arity 6
  -spec mount(alcove_drv:ref(),iodata(),iodata(),iodata(),uint64_t() | [constant()],iodata()) -> 'ok' | {'error', file:posix() | 'unsupported'}.
  % arity 7
  -spec mount(alcove_drv:ref(),iodata(),iodata(),iodata(),uint64_t() | [constant()],iodata(),iodata()) -> 'ok' | {'error', file:posix() | 'unsupported'}.
  % arity 7
  -spec mount(alcove_drv:ref(),fork_path(),iodata(),iodata(),iodata(),uint64_t() | [constant()],iodata()) -> 'ok' | {'error', file:posix() | 'unsupported'}.
  % arity 8
  -spec mount(alcove_drv:ref(),fork_path(),iodata(),iodata(),iodata(),uint64_t() | [constant()],iodata(),iodata()) -> 'ok' | {'error', file:posix() | 'unsupported'}.

* because of the ambiguity in arity, each call can't have an optional
  timeout

  call/5 sets a timeout of 'infinity'. The timeout isn't accessible from
  the "named" functions (e.g., fork, chdir, ...), which means that users
  who need the timeout will have to resort to using the call/5
  interface. An unfortunate side effect of using call/5 is that dialyzer
  won't be able to type check the arguments.

The result of removing the functions without the fork path is that the
code ends up being simpler and more consistent.
2015-07-19 09:41:51 -04:00
Michael Santos
c8a4aabf90 alcove_drv: add start_link/0,1
Export start_link/0,1. start/0,1 is no longer an alias for start_link
and is mainly useful for testing.
2014-12-20 16:49:13 -05:00
Michael Santos
71117bcefd Multiplex erlang process access to the port
Allow any Erlang process to send messages into the port, similar
to the way port drivers can respond directly to a process. The Unix PID
fork path is used as the key to map the port response to the Erlang PID
by the gen_server.

The current implementation is an experiment only and has a race
condition. Process A calls into the gen_server and blocks. Process B calls
into the port for the same Unix fork path and blocks. The response for
both requests will be sent to process B and process A will block forever.

In other words, the last erlang process to make a request to the fork
path becomes the controlling process. If the erlang process dies, any
data generated by the Unix process is sent to the process that started
the gen_server.

What the behaviour should be needs to be defined. For example, using the
gen_tcp behaviour of controlling_process would be problematic: since the
response always goes to the controlling process, a call from another
erlang process would hang. After the Unix process has called exec(),
allowing multiple processes to send in data may make sense.

Making the controlling process the only process privileged to talk to
the fork path would have some weird side effects:

* data for the fork path would have to be serialized through the
  controlling process

* if a non-owning process sends a message to the fork path, either we
  have to extend the type spec for each call to include
  {error,not_owner} or we send a badsig exception and the client is
  forced to wrap each call in a try/catch

There should also be a concept of linking between unix and erlang
processes:

* processes are unlinked

    * erlang process dies: stdout/stderr from the unix process should
      be dropped

    * unix process dies: controlling process gets the normal exit
      messages (exit_status, termsig)

* processes are linked

    * erlang process dies: unix process gets a SIGKILL

    * unix process dies: erlang process gets an exit(kill)

* erlang process monitors unix process (?)

    * unix process dies: erlang process gets a 'DOWN' message

To test the concept works, modify the tcplxc example to talk directly to
the port from each erlang container process (the example needs much more
cleanup and should be converted to an OTP process).
2014-12-08 11:56:45 -05:00
Michael Santos
244c4f6285 Support mount(2) on Solaris
Mount filesystems using the Solaris-specific version of the mount
interface. Solaris adds options and options length parameters to mount;
the options parameter takes a NUL-terminated string of comma-separated
arguments.

On other Unixes the options are either included in the mount flags
(MS_NOEXEC, ...) or in the data argument (<<"size=128M">> for tmpfs).

The behaviour of mount(2) on Solaris is bizarre: the options argument is
input/output, with the mount options placed in the buffer on return.

If MS_OPTIONSTR is present in the mount flags and the options buffer
is too small, the mount call returns -1 and errno is set to EOVERFLOW
but the mount actually succeeds! A more robust interface might truncate
the options to the size of the buffer, possibly setting the options
length to the required length, and return 0.

Surprisingly, the options buffer can also be too large. This is so
weird, it must be a bug in alcove. If the buffer exceeds a certain size,
mount returns -1 with errno set to EINVAL. The mount fails in this case:

    {error,einval} = alcove:mount(Drv, [Child], "swap", Dir, "tmpfs",
            [ms_optionstr], <<>>, <<"size=16m", 0:(1024*8)>>).

Since Solaris has this extra argument, mount/6,7 has to be extended to
mount/7,8, forcing all platforms to pass in an options parameter. This
parameter is ignored on all platforms except Solaris. The result is that
the mount interface is not the Linux mount(2) interface, it is some weird
hybrid that is awkward to use on all platforms (the interface does not
map to the mount(2) man page on any platform).

The value of the option parameter is also not returned to the caller on
Solaris. Options to fix this include:

* breaking out Solaris mount(2) to a Solaris specific call (mountext or
  whatever)

* checking if opt is non-NULL on return and opt len > 0. If so, return:

    {ok, binary()}

  This extends the mount/7,8 type to:

    ok | {ok,binary()} | {error,posix()}
2014-10-26 10:42:54 -04:00
Michael Santos
03fd99abb6 Remove command line switches
Minimize the arguments for the port. These options can be set using
alcove:setopt/3,4.

Command line arguments should be reserved for options that can be set
only at start up.
2014-10-22 10:22:36 -04:00
Michael Santos
131fd966e9 open/4,5: make close on exec optional
open(2) on Linux, OpenBSD, FreeBSD and NetBSD supports the O_CLOEXEC
flag. Close on exec can be set by the caller.

This change allows passing privileged file descriptors to an unprivileged
process, by:

* opening the fd as root
* dropping privs
* calling exec

fcntl(2) can be introduced later if there are any platforms that do not
support the O_CLOEXEC flag.
2014-10-11 11:13:26 -04:00
Michael Santos
d706dbcd23 tcplxc: fix fd leak 2014-10-01 11:31:20 -04:00
Michael Santos
489069f934 mount: convert mountflags to list
Pass in an integer or a list of integers/atoms for the mount flags
parameter.

The mount flags parameter is an unsigned integer and the decode function
returns an int. While this is probably ok, there are a number of typecasts
in the lookup code that need to be cleaned up.
2014-09-20 15:23:37 -04:00
Michael Santos
d1c78b831a clone/2,3: use a list of atoms/integers for flags
Accept an integer or a list of atoms/integers for the flags argument. If
a list is passed, the values are OR'ed together. The values for atoms are
looked up from the constants defined in the system header files.
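
The lookup and OR step might look like the following sketch. The table,
the lowercase names and the function name are illustrative assumptions,
not the actual alcove lookup code:

```c
#include <linux/sched.h>   /* CLONE_* constants (Linux only) */
#include <stddef.h>
#include <string.h>

struct flag_name {
    const char *name;
    unsigned long value;
};

/* Hypothetical name table; the values come from the system headers. */
static const struct flag_name clone_flags[] = {
    { "clone_newns",  CLONE_NEWNS  },
    { "clone_newpid", CLONE_NEWPID },
    { "clone_newnet", CLONE_NEWNET },
    { "clone_newuts", CLONE_NEWUTS },
    { "clone_newipc", CLONE_NEWIPC },
};

/* OR together the values of `n` flag names into `*out`.
 * Returns 0 on success, -1 if a name is not in the table. */
int clone_flags_from_names(const char *const *names, size_t n,
                           unsigned long *out)
{
    const size_t count = sizeof(clone_flags) / sizeof(clone_flags[0]);
    unsigned long flags = 0;

    for (size_t i = 0; i < n; i++) {
        size_t j;

        for (j = 0; j < count; j++) {
            if (strcmp(names[i], clone_flags[j].name) == 0) {
                flags |= clone_flags[j].value;
                break;
            }
        }

        if (j == count)
            return -1;      /* unknown flag name */
    }

    *out = flags;
    return 0;
}
```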
2014-09-20 15:13:56 -04:00
Michael Santos
b6f63b732b examples: use atoms instead of integers 2014-09-20 11:46:51 -04:00
Michael Santos
85acc907b2 Lowercase all the atoms
Use lowercase atoms for the constants taken from header files.
2014-09-15 07:31:09 -04:00
Michael Santos
d425f54097 tcplxc: drain any pending stdio before exiting 2014-07-12 11:36:35 -04:00
Michael Santos
d426a1ef45 tcplxc: cleanup mounts 2014-05-26 11:04:48 -04:00
Michael Santos
80b1b84745 tcplxc: set temp directory 2014-05-23 10:33:43 -04:00
Michael Santos
844afd21d9 tcplxc: don't unmount pts in container 2014-05-23 09:54:53 -04:00
Michael Santos
0b97186624 tcplxc: mount /dev/pts
Allow ttys inside the container using /dev/pts.
2014-05-19 13:13:14 -04:00
Michael Santos
602046d536 tcplxc: set default environment variables
Default environment variables can be overridden by the user.

Rename the proplist keyword from 'env' to 'environ', since it clashes
with the 'env' option to alcove_drv:start/1. These options should be
namespaced or put into another proplist to prevent this.
2014-05-17 10:34:04 -04:00
Michael Santos
1255a32f93 tcplxc: allow creating system files
Mount /etc as a tmpfs filesystem. By default, create passwd and group
files. These can be overwritten by the user.

System files are created while running inside the read-only mount
namespace, so attempts to write to the bind mounted system directories
will fail.
2014-05-17 10:34:03 -04:00
Michael Santos
a2b32f56f7 tcplxc: properly set the flags on bind mounts
Setting the mount flags on bind mounts requires two calls: the first to
perform the mount, the second to remount the filesystem with the
appropriate flags. According to the man page, the bind flag is required
in both mounts:

    Note that behavior of the remount operation depends on the /etc/mtab
    file. The first command stores the 'bind' flag to the /etc/mtab
    file and the second command reads the flag from the file.  If you
    have a system without the /etc/mtab file or if you explicitly
    define source and target for the remount command (then mount(8)
    does not read /etc/mtab), then you have to use bind flag (or option)
    for the remount command too. For example:

        mount --bind olddir newdir
        mount -o remount,ro,bind olddir newdir
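
The same two steps expressed as mount(2) calls, as a sketch. The
function name is hypothetical and MS_RDONLY stands in for whichever
flags the container needs:

```c
#include <stddef.h>
#include <sys/mount.h>

/* Bind mount `src` on `dst`, then remount it read-only.
 * The flags passed with the initial MS_BIND call are not applied to the
 * new mount, so a second call with MS_REMOUNT | MS_BIND sets them.
 * Linux only; requires CAP_SYS_ADMIN.
 * Returns 0 on success, -1 on error with errno set. */
int bind_mount_readonly(const char *src, const char *dst)
{
    /* step 1: create the bind mount */
    if (mount(src, dst, NULL, MS_BIND, NULL) < 0)
        return -1;

    /* step 2: remount, repeating the bind flag, to apply MS_RDONLY */
    return mount(NULL, dst, NULL, MS_REMOUNT | MS_BIND | MS_RDONLY, NULL);
}
```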
2014-05-17 10:34:03 -04:00
Michael Santos
da221aa79c tcplxc: pass in files to write to the chroot 2014-05-16 11:08:00 -04:00
Michael Santos
5a6bbd5875 tcplxc: allowing running any executable 2014-05-16 11:08:00 -04:00
Michael Santos
1d3c9f1727 examples: exec bash, mount dirs for a full OS image 2014-05-14 10:56:32 -04:00
Michael Santos
74433b1ebd tcplxc: log the remote IP address/port 2014-05-11 15:55:05 -04:00
Michael Santos
d2060d9924 examples: #rlimit{} -> #alcove_rlimit{} 2014-05-09 14:15:36 -04:00
Michael Santos
6d3f13f8b4 tcplxc: close the socket if the shell exits 2014-05-06 08:21:02 -04:00
Michael Santos
d8af759ec5 tcplxc: remove cgroup on exit
Destroy the cgroup when the container exits. Use a constant name for the
container hostname ("alcove" + os pid). If leaking the pid is a concern,
the code could generate random bytes on startup and hash(pid, bytes).
2014-05-05 10:59:42 -04:00
Michael Santos
2f51b52b5e tcplxc: create a cgroup namespace per container 2014-05-04 09:53:05 -04:00
Michael Santos
b68086154f cgroup: use an iolist() for the namespace path
Use a list of binaries as the namespace:

    [<<"alcove">>, <<"guest1234">>] % <<"alcove/guest1234">>

is converted to:

    <<"/sys/fs/cgroup/blkio/alcove/guest1234">>
    <<"/sys/fs/cgroup/cpu/alcove/guest1234">>
    <<"/sys/fs/cgroup/cpuacct/alcove/guest1234">>
    <<"/sys/fs/cgroup/cpuset/alcove/guest1234">>
2014-05-04 09:53:05 -04:00
Michael Santos
6655928fda tcplxc: fix process leak on error 2014-05-03 12:23:47 -04:00
Michael Santos
8d1b387792 tcplxc: restrict the number of processes
With the current cgroup limits, a fork bomb causes the container cgroup
limit to be exceeded and the fork bomb is killed. Works well.

Add a resource limit on the number of processes. It is sort of redundant
given the cgroup limit but is useful on systems without cgroup support.
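
A sketch of the underlying resource limit call, assuming a platform that
provides RLIMIT_NPROC (Linux and the BSDs). The function name is
hypothetical:

```c
#include <sys/resource.h>

/* Limit the number of processes the calling user may create.
 * The requested value is capped at the current hard limit, since an
 * unprivileged process cannot raise it.
 * Returns 0 on success, -1 on error with errno set. */
int limit_processes(rlim_t max_procs)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NPROC, &rl) < 0)
        return -1;

    if (max_procs > rl.rlim_max)
        max_procs = rl.rlim_max;

    rl.rlim_cur = max_procs;    /* soft limit: enforced on fork() */
    rl.rlim_max = max_procs;    /* hard limit: cannot be raised again */

    return setrlimit(RLIMIT_NPROC, &rl);
}
```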
2014-05-03 11:50:17 -04:00
Michael Santos
53cb7a2ab8 tcplxc: allow cgroup limits to fail
Not all the cgroups may exist. For example, the stock raspbian doesn't
have cpuset. The cgroup code could be smarter about this, but it'd
complicate the example.
2014-05-02 11:03:02 -04:00
Michael Santos
828ba5c260 tcplxc: fix license 2014-05-01 11:40:12 -04:00
Michael Santos
a07e723d5c example: create a Linux container per connection
Rough working code for creating a Linux container, restricted by
cgroups:

    erl: tcplxc:start().

    shell: nc localhost 31337

Multiple containers are supervised by one port in this example.
2014-05-01 10:37:29 -04:00
Michael Santos
51c7f15e38 gpioled: pass the internal state using a record 2014-04-28 09:07:25 -04:00
Michael Santos
27b0416288 Revert "examples: clean up gpioled"
This reverts commit 999de650dc.

unexport vs export
2014-04-26 13:04:30 -04:00
Michael Santos
999de650dc examples: clean up gpioled 2014-04-26 11:42:26 -04:00
Michael Santos
e2c6582661 examples: tcpsh
Bind a shell, running in a namespace, to a socket. Runs one Erlang port
process per container (so a minimum of 2 Unix processes per connection).
2014-04-26 11:26:07 -04:00
Michael Santos
b62e73d2e3 Simplify uid generation 2014-04-24 08:18:50 -04:00
Michael Santos
26a207d876 Stream messages from port through a proxy process
Convert the binary format messages from the port into erlang terms using
a gen_server.
2014-04-22 14:16:06 -04:00
Michael Santos
78fa3f077a Add an example of interacting with GPIO
Tested on a beaglebone black and on a raspberry pi.
2014-04-21 11:26:03 -04:00
Michael Santos
32d5e5f98e Fix license 2014-04-21 11:07:43 -04:00
Michael Santos
882494c3bc Begin adding documentation and examples 2014-04-18 18:02:44 -04:00