Default environment variables can be overridden by the user.
Rename the proplist keyword from 'env' to 'environ', since it clashes
with the 'env' option to alcove_drv:start/1. These options should be
namespaced or put into another proplist to prevent this.
Mount /etc as a tmpfs filesystem. By default, create passwd and group
files. These can be overwritten by the user.
System files are created while running inside the read-only mount
namespace, so attempts to write to the bind-mounted system directories
will fail.
Setting the mount flags on a bind mount requires two calls: the first
performs the mount, the second remounts the filesystem with the
appropriate flags. According to the mount(8) man page, the bind flag is
required in both calls:

    Note that behavior of the remount operation depends on the /etc/mtab
    file. The first command stores the 'bind' flag to the /etc/mtab
    file and the second command reads the flag from the file. If you
    have a system without the /etc/mtab file or if you explicitly
    define source and target for the remount command (then mount(8)
    does not read /etc/mtab), then you have to use bind flag (or option)
    for the remount command too. For example:

        mount --bind olddir newdir
        mount -o remount,ro,bind olddir newdir

Destroy the cgroup when the container exits. Use a constant name for the
container hostname ("alcove" + os pid). If leaking the pid is a concern,
the code could generate random bytes on startup and hash(pid, bytes).
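
A minimal sketch of the hash(pid, bytes) idea, assuming SHA-256 and a
16-byte salt generated once at startup. The module and function names
(guest_name, new_salt/0, hostname/2) are hypothetical and not part of
the existing code:

```erlang
-module(guest_name).
-export([new_salt/0, hostname/2]).

%% Hypothetical helper: random bytes generated once on startup.
new_salt() ->
    crypto:strong_rand_bytes(16).

%% Build <<"alcove", Hex/binary>> where Hex is the first 8 bytes of
%% sha256(pid, salt) in lowercase hex. Truncating the digest is an
%% assumption made here to keep the hostname short.
hostname(Pid, Salt) when is_integer(Pid), is_binary(Salt) ->
    Digest = crypto:hash(sha256, [integer_to_binary(Pid), Salt]),
    <<Short:8/binary, _/binary>> = Digest,
    Hex = << <<(hex(N))>> || <<N:4>> <= Short >>,
    <<"alcove", Hex/binary>>.

%% Map a 4-bit value to its lowercase hex character.
hex(N) when N < 10 -> $0 + N;
hex(N) -> $a + N - 10.
```

The name stays constant for a given pid within one run of the VM, but
the pid cannot be read back from the hostname without the salt.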
Use a list of binaries as the namespace:

    [<<"alcove">>, <<"guest1234">>] % <<"alcove/guest1234">>

is converted to:

    <<"/sys/fs/cgroup/blkio/alcove/guest1234">>
    <<"/sys/fs/cgroup/cpu/alcove/guest1234">>
    <<"/sys/fs/cgroup/cpuacct/alcove/guest1234">>
    <<"/sys/fs/cgroup/cpuset/alcove/guest1234">>

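
The conversion above could be sketched roughly as follows. The module
name (cgroup_path) and the expand/1,2 functions are hypothetical, and
the controller list is simply the four shown in the example:

```erlang
-module(cgroup_path).
-export([expand/1, expand/2]).

%% Controllers taken from the example; other systems may differ.
-define(CONTROLLERS, [<<"blkio">>, <<"cpu">>, <<"cpuacct">>, <<"cpuset">>]).

%% Expand a namespace (a list of binaries) into one path per controller.
expand(Namespace) ->
    expand(?CONTROLLERS, Namespace).

expand(Controllers, Namespace) when is_list(Namespace) ->
    [filename:join([<<"/sys/fs/cgroup">>, C | Namespace])
     || C <- Controllers].
```

Keeping the namespace as a list avoids splitting and re-joining path
strings when nesting cgroups.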
With the current cgroup limits, a fork bomb exceeds the container's
cgroup limit and is killed, which works well.
Add a resource limit on the number of processes. It is largely redundant
given the cgroup limit, but is useful on systems without cgroup support.
Not all cgroups may exist. For example, stock Raspbian doesn't have
cpuset. The cgroup code could be smarter about this, but that would
complicate the example.
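
One way the code could be smarter, sketched under the assumption that a
missing controller simply has no directory under the cgroup root. The
module name (cgroup_probe) and available/2 are hypothetical:

```erlang
-module(cgroup_probe).
-export([available/2]).

%% Return only the controllers that exist as directories under Root.
%% Root and the controller names should be the same type (all strings
%% or all binaries).
available(Root, Controllers) ->
    [C || C <- Controllers, filelib:is_dir(filename:join(Root, C))].
```

For example, cgroup_probe:available("/sys/fs/cgroup", ["cpu", "cpuset"])
would drop "cpuset" on a system that lacks it.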
Rough working code for creating a Linux container, restricted by
cgroups:

    erl: tcplxc:start().
    shell: nc localhost 31337

Multiple containers are supervised by one port in this example.