Deep dive into process security on Linux
2023-10-11
In this post, I'll tell you about 3 ways of securing processes on Linux: Seccomp, AppArmor and Linux Capabilities.
I'll use examples and make them easy to reproduce on your computer using Docker, so you can get your hands dirty and get a feel for the tooling.
How Seccomp, AppArmor and Capabilities compare
The resource I found most useful for understanding when to use each of these was this answer from Security Stack Exchange answering the question “Docker: when to use apparmor vs seccomp vs --cap-drop”. I recommend reading that to get a summary of their differences, before taking the deep dive into each of them below.
Seccomp
Seccomp allows disabling system calls, using either a blacklist or a whitelist, as well as logging which system calls have been… well, called. This helps reduce the attack surface available to exploits that target flaws in specific system calls.
First, I'll try changing the ownership of /home without any Seccomp profile:
docker run --rm -t -i busybox sh \
-c "chown root:root /home"
# no output
That worked fine. Now, I'll try the same thing with the chown(2) system call disabled using Seccomp. For this, I'll use the following blacklist Seccomp profile, named seccomp.json:
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "architectures": [
    "SCMP_ARCH_X86_64"
  ],
  "syscalls": [
    {
      "name": "chown",
      "action": "SCMP_ACT_ERRNO",
      "args": []
    }
  ]
}
Then, I'll pass it to Docker and run the chown command again:
docker run --rm -t -i --security-opt seccomp=seccomp.json busybox sh \
-c "chown root:root /home"
# chown: /home: Operation not permitted
This time, I got an error, which goes to show that Seccomp has prevented chown(2).
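To double-check that a filter is attached at all, the Seccomp field in /proc/<pid>/status can help: 0 means disabled, 1 means strict mode and 2 means filter mode. Here's a small sketch that reads it for the current shell. The function name seccomp_mode_name is my own, not part of any tool.

```shell
#!/bin/sh
# Translate the numeric "Seccomp:" field of /proc/<pid>/status into a name.
# seccomp_mode_name is a hypothetical helper written for this post.
seccomp_mode_name() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "strict" ;;
        2) echo "filter" ;;
        *) echo "unknown" ;;
    esac
}

# Report the mode of the current process, if /proc is available.
if [ -r /proc/self/status ]; then
    mode=$(awk '/^Seccomp:/ {print $2}' /proc/self/status)
    echo "Seccomp mode: $(seccomp_mode_name "$mode")"
fi
```

Run inside a container started with --security-opt seccomp=seccomp.json, I'd expect this to report filter mode.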
In the example above, I used a blacklist, with the defaultAction set to SCMP_ACT_ALLOW. It's also possible to create whitelists by setting the defaultAction to SCMP_ACT_ERRNO and then adding items with action set to SCMP_ACT_ALLOW in the syscalls list. That can prove a bit more involved but results in a much safer configuration.
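Writing a whitelist by hand gets repetitive, so here's a minimal sketch of a shell helper that prints one in the same format as the profile above. The function name make_whitelist_profile and the three system calls in the usage line are my own picks for illustration. A real container needs a much longer list.

```shell
#!/bin/sh
# Hypothetical helper: print a whitelist Seccomp profile that allows only
# the system call names passed as arguments and denies everything else.
make_whitelist_profile() {
    printf '{\n'
    printf '  "defaultAction": "SCMP_ACT_ERRNO",\n'
    printf '  "architectures": ["SCMP_ARCH_X86_64"],\n'
    printf '  "syscalls": [\n'
    first=1
    for name in "$@"; do
        # Separate entries with a comma, but not before the first one.
        if [ "$first" -eq 1 ]; then first=0; else printf ',\n'; fi
        printf '    {"name": "%s", "action": "SCMP_ACT_ALLOW", "args": []}' "$name"
    done
    printf '\n  ]\n}\n'
}

# Usage: print a (far too short) whitelist to standard output.
make_whitelist_profile read write exit_group
```

Redirecting the output to a file gives something that can be passed to --security-opt seccomp=, once the list of system calls is complete enough for the program to start.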
Seccomp is fairly coarse. I can ask it to turn a system call on or off, and the args field can compare plain numeric arguments against fixed values. But it can't follow pointers, so I can't ask it to allow a system call only for specific file paths. That's a job for AppArmor.
AppArmor
AppArmor helps implement more fine-grained access control. Reading through parts of man apparmor.d is a good way to understand what rules AppArmor supports. Here are just some of the things it's capable of:
- restrict the use of specific mount options for devices;
- disable access to specific files and directories;
- choose what signals can be issued or handled by a process;
- allow or disallow using sockets with specific IP addresses.
Let's get our hands dirty with an AppArmor profile. First, I need to create the profile. I could use the aa-easyprof or aa-logprof commands, but here's a sample I've hand-built and saved in a file called apparmor.profile:
#include <tunables/global>
profile read-only-etc {
  #include <abstractions/base>
  allow file /bin/ls ix,
  allow file /etc/ r,
}
The <abstractions/base> include (see code in GitLab) makes adding other rules easier, by giving read access to shared libraries, as well as write access to some special files such as /dev/null or /dev/urandom. The allow file ... rules allow ls /etc/ to work correctly.
By default, anything not listed in the profile is disallowed. So with this profile, reading user data from somewhere inside /home/ will result in an error.
Next, I load the profile into the Linux kernel:
sudo apparmor_parser -r apparmor.profile
Finally, I can use it when running a program:
docker run --rm -t -i --security-opt "apparmor=read-only-etc" busybox sh \
-c "ls /home/; ls /etc/; cat /etc/passwd;"
# ls: can't open '/home/': Permission denied
# group hosts mtab nsswitch.conf resolv.conf
# hostname localtime network passwd shadow
# sh: cat: Permission denied
We can see how AppArmor has allowed me to list files in /etc/ with the program /bin/ls, but I can't access the contents of the files because the profile allows neither running /bin/cat nor reading the files below /etc/. Here's a new version of the profile that allows reading the files in /etc/:
#include <tunables/global>
profile read-only-etc {
  #include <abstractions/base>
  allow file /bin/ls ix,
  allow file /etc/ r,
  allow file /bin/cat ix, # this is new
  allow file /etc/** r, # this is new
}
Now, when I run the previous commands again, I can see the contents of /etc/passwd:
docker run --rm -t -i --security-opt "apparmor=read-only-etc" busybox sh \
-c "ls /home/; ls /etc/; cat /etc/passwd;"
# ls: can't open '/home/': Permission denied
# group hostname hosts localtime mtab
# network nsswitch.conf passwd resolv.conf shadow
# root:x:0:0:root:/root:/bin/sh
# daemon:x:1:1:daemon:/usr/sbin:/bin/false
# ...
# nobody:x:65534:65534:nobody:/home:/bin/false
Since I'm done with AppArmor for now, I can remove the profile:
sudo apparmor_parser -R apparmor.profile
Linux Capabilities
Linux Capabilities help do things as root, without having to give the process the full power of root. Said another way, they mitigate the impact of a compromised process running as root.
Let's look at a few examples. In a container, I'll try changing the ownership of /home and then try deleting /home. We'll see the impact capabilities can have for such a use case.
First, I'll use my personal Linux user with id -u and id -g:
docker run --rm -u `id -u`:`id -g` -t -i busybox sh \
-c "ls -ld /home; chown root:root /home; rm -rf /home; ls -ld /home"
# drwxr-xr-x 2 nobody nobody 4096 Jul 17 18:30 /home
# chown: /home: Operation not permitted
# rm: can't remove '/home': Permission denied
# drwxr-xr-x 2 nobody nobody 4096 Jul 17 18:30 /home
That didn't work, which is to be expected, since my Linux user is not root and doesn't own the /home directory. It can neither change its ownership nor remove it. Here's the same test running as root:
docker run --rm -t -i busybox sh \
-c "ls -ld /home; chown root:root /home; rm -rf /home; ls -ld /home"
# drwxr-xr-x 2 nobody nobody 4096 Jul 17 18:30 /home
# ls: /home: No such file or directory
That worked fine. As expected, root can do anything, including changing the ownership of and deleting the /home directory. Finally, let's drop the CHOWN Linux capability and run as root again:
docker run --rm -t -i --cap-drop CHOWN busybox sh \
-c "ls -ld /home; chown root:root /home; rm -rf /home; ls -ld /home"
# drwxr-xr-x 2 nobody nobody 4096 Jul 17 18:30 /home
# chown: /home: Operation not permitted
# ls: /home: No such file or directory
Even though we're running as root, we can't do everything we want. We can remove /home, but we can't change its ownership. Cool, right?
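To see which capabilities a process actually holds, the CapEff line in /proc/<pid>/status gives the effective set as a hexadecimal bit mask. CAP_CHOWN is capability number 0, so it's the lowest bit. Here's a rough sketch that checks that bit. The function name has_cap_chown is made up for this post, and it only looks at this one capability.

```shell
#!/bin/sh
# Hypothetical helper: succeed if CAP_CHOWN (capability number 0, i.e. the
# lowest bit) is set in a hexadecimal capability mask given without "0x".
has_cap_chown() {
    [ $(( 0x$1 & 1 )) -eq 1 ]
}

# Check the effective capabilities of the current process, if /proc exists.
if [ -r /proc/self/status ]; then
    cap_eff=$(awk '/^CapEff:/ {print $2}' /proc/self/status)
    if has_cap_chown "$cap_eff"; then
        echo "CAP_CHOWN: present"
    else
        echo "CAP_CHOWN: absent"
    fi
fi
```

Run as root in a container started with --cap-drop CHOWN, I'd expect it to print "absent", and "present" without that flag.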
Seccomp, AppArmor and Capabilities together
I've talked about how to use each of these separately. But how would I combine their strengths for defense in depth?
Let's discuss an example we might see for real:
Before my webserver starts, I want to make sure its configuration belongs to the right user and group. I'm not sure what the user and group may be before this operation.
Therefore, I want to run a privileged process, but only with permission to change file ownership under /etc/nginx/.
Nothing more, nothing less.
We'd run as root, allow only the CHOWN capability, allow only the system calls used by chown using Seccomp and allow access only to /etc/nginx/ using AppArmor. Let's see how that might go with Docker.
First, we need a Seccomp profile. Ideally, I want a whitelist profile because it's safer. But how do I know the list of system calls I want to allow for the command chown, since I haven't written that command's code myself?
I'll get to that in a minute. For now, I'll just use this profile below in a file called seccomp.json:
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": [
    "SCMP_ARCH_X86_64"
  ],
  "syscalls": [
    {"name":"access", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"arch_prctl", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"brk", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"capget", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"capset", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"chdir", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"clone", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"close", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"close_range", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"connect", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"epoll_ctl", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"execve", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"exit_group", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"fchownat", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"faccessat2", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"fcntl", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"fstat", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"fstatfs", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"futex", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"getcwd", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"getdents64", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"getegid", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"geteuid", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"getgid", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"getpgrp", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"getpid", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"getppid", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"getrandom", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"getuid", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"ioctl", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"kill", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"lseek", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"mmap", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"mprotect", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"munmap", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"newfstatat", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"openat", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"prctl", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"pread64", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"prlimit64", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"pselect6", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"read", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"readlink", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"rseq", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"rt_sigaction", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"rt_sigprocmask", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"rt_sigreturn", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"setgid", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"setgroups", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"setpgid", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"setresgid", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"setresuid", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"setuid", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"set_robust_list", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"set_tid_address", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"signalfd4", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"socket", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"uname", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"wait4", "action": "SCMP_ACT_ALLOW", "args": []},
    {"name":"write", "action": "SCMP_ACT_ALLOW", "args": []}
  ]
}
Then we need an AppArmor profile, saved in apparmor.profile:
#include <tunables/global>
profile chown-etc {
  #include <abstractions/base>
  allow file /bin/chown ix,
  allow file /etc/nginx/ rw,
  allow file /etc/nginx/** rw,
  capability chown,
}
Finally, after loading the AppArmor profile with apparmor_parser -r as before, we can run a container that drops all capabilities but CHOWN, using the profiles prepared above:
docker run --rm -t -i \
--security-opt apparmor=chown-etc \
--security-opt seccomp=seccomp.json \
--cap-drop=ALL --cap-add=CHOWN \
--entrypoint /bin/chown nginx \
-R 1000:1000 /etc/nginx/
This runs without any errors. The command chown runs as root, but with very limited permissions as to what it can actually do.
How to create the Seccomp profile
Back to the Seccomp profile. How did I find out which system calls to whitelist?
First, I built a profile that did nothing but log system calls:
{
  "defaultAction": "SCMP_ACT_LOG",
  "architectures": [
    "SCMP_ARCH_X86_64"
  ],
  "syscalls": [
  ]
}
Then, I ran the container in one terminal and viewed logs from /var/log/audit/audit.log in another. It looked something like this. I had to install auditd for this audit log to be available.
After adding all the listed system calls to the seccomp.json profile with their action set to SCMP_ACT_ALLOW, I could switch the defaultAction from SCMP_ACT_LOG to SCMP_ACT_ERRNO and I was done.
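Picking the system calls out of the audit log by eye is tedious, so here's a rough sketch of how that step could be scripted. It assumes the log contains type=SECCOMP records with a syscall=<number> field. The sample records below are made up for illustration, not copied from my logs, and the function name seccomp_syscall_numbers is my own.

```shell
#!/bin/sh
# Hypothetical helper: read audit records on standard input and print the
# unique syscall numbers found in type=SECCOMP records, sorted numerically.
seccomp_syscall_numbers() {
    sed -n 's/^type=SECCOMP .* syscall=\([0-9][0-9]*\) .*/\1/p' | sort -n -u
}

# Made-up sample records, for illustration only.
sample='type=SECCOMP msg=audit(1697000000.101:201): auid=1000 uid=0 gid=0 ses=1 pid=4242 comm="chown" exe="/bin/chown" sig=0 arch=c000003e syscall=260 compat=0 ip=0x7f0000000000 code=0x7ffc0000
type=SYSCALL msg=audit(1697000000.102:202): arch=c000003e syscall=59 success=yes exit=0 pid=4242 comm="chown"
type=SECCOMP msg=audit(1697000000.103:203): auid=1000 uid=0 gid=0 ses=1 pid=4242 comm="chown" exe="/bin/chown" sig=0 arch=c000003e syscall=231 compat=0 ip=0x7f0000000000 code=0x7ffc0000
type=SECCOMP msg=audit(1697000000.104:204): auid=1000 uid=0 gid=0 ses=1 pid=4242 comm="chown" exe="/bin/chown" sig=0 arch=c000003e syscall=260 compat=0 ip=0x7f0000000000 code=0x7ffc0000'

printf '%s\n' "$sample" | seccomp_syscall_numbers
# prints:
# 231
# 260
```

On a real machine, the input would be /var/log/audit/audit.log, which usually needs root to read. The output is numbers rather than names, so they still have to be translated for the profile. The ausyscall tool that ships with the audit package should be able to do that.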
There was a little oddity, though. For the profile to work correctly with Docker, I needed to allow the futex, capget, epoll_ctl, fstat, setgid, setuid, faccessat2, rt_sigreturn and setgroups system calls. Podman didn't need these system calls for the program to run correctly.
I hope you enjoyed this deep dive into Seccomp, AppArmor and Linux Capabilities. We've gone over their differences and strengths, and also run some code to see them in action. We've also combined them in the spirit of defense in depth.
Using them can be daunting and requires ongoing maintenance. For some projects, it may be too much. For others, it may be warranted to mitigate the impact of an attack.