
Deep dive into process security on Linux

2023-10-11

In this post, I'll tell you about 3 ways of securing processes on Linux: Seccomp, AppArmor and Linux Capabilities.

I'll use examples and make them easy to reproduce on your computer using Docker, so you can get your hands dirty and get a feel for the tooling.

How Seccomp, AppArmor and Capabilities compare

The resource I found most useful for understanding when to use each of these was this Security Stack Exchange answer to the question “Docker: when to use apparmor vs seccomp vs --cap-drop”. I recommend reading it for a summary of their differences before taking the deep dive into each of them below.

Seccomp

Seccomp allows disabling system calls, using either a blacklist or a whitelist, as well as logging which system calls have been… well, called. This shrinks the attack surface available to exploits that target flaws in specific system calls.

First, I'll try changing the owner of /home without any custom Seccomp profile:

docker run --rm -t -i busybox sh \
  -c "chown root:root /home"

# no output

That worked fine. And now, I'll try the same thing but disabling the chown(2) system call using Seccomp. For this, I'll use the following blacklist Seccomp profile named seccomp.json:

{
	"defaultAction": "SCMP_ACT_ALLOW",
	"architectures": [
		"SCMP_ARCH_X86_64"
	],
	"syscalls": [
		{
			"name": "chown",
			"action": "SCMP_ACT_ERRNO",
			"args": []
		}
	]
}

Then, I'll pass it to Docker and run the chown command again:

docker run --rm -t -i --security-opt seccomp=seccomp.json busybox sh \
  -c "chown root:root /home"

# chown: /home: Operation not permitted

This time, I got an error, which goes to show that Seccomp has prevented chown(2).
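
Another way to confirm that a filter is in place is the Seccomp field of /proc/<pid>/status, where 0 means disabled, 1 means strict mode and 2 means a filter is attached. Here is a minimal Python sketch that reads it; the function name seccomp_mode is my own, not part of any tool. Keep in mind that Docker attaches its default profile unless told otherwise, so this shows that some filter is active, not which one.

```python
# Meaning of the "Seccomp:" field in /proc/<pid>/status (see proc(5)).
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text: str) -> str:
    """Return the Seccomp mode named in the text of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        # "Seccomp_filters:" is a different field and must not match here.
        if line.startswith("Seccomp:"):
            return SECCOMP_MODES.get(int(line.split(":", 1)[1]), "unknown")
    return "unknown"  # field missing, e.g. a kernel built without Seccomp
```

Passing it the contents of /proc/self/status, read from inside the container, should report filter when a profile is applied.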

In the example above, I used a blacklist, with the defaultAction set to SCMP_ACT_ALLOW. It's also possible to create whitelists by setting the defaultAction to SCMP_ACT_ERRNO and then adding items with action set to SCMP_ACT_ALLOW in the syscalls list. That can prove a bit more involved but results in a much safer configuration.

Seccomp is fairly coarse. I can ask it to turn a system call on or off, and it can compare the raw values of a call's arguments, but it can't follow pointers. Since a path is passed as a pointer, I can't ask it to allow chown(2) only on specific files. That's a job for AppArmor.
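
For what it's worth, the args list that stays empty in the profile above can hold comparisons on a system call's integer arguments. The sketch below builds one such rule in Python. The helper name allow_when_arg_equals is hypothetical, and the socket example is my own illustration rather than something this post's profiles use.

```python
import json

AF_UNIX = 1  # numeric value of AF_UNIX on Linux


def allow_when_arg_equals(syscall: str, index: int, value: int) -> dict:
    """Build one rule that allows `syscall` only if argument `index` equals `value`."""
    return {
        "name": syscall,
        "action": "SCMP_ACT_ALLOW",
        "args": [{"index": index, "value": value, "op": "SCMP_CMP_EQ"}],
    }


# In a profile whose defaultAction is SCMP_ACT_ERRNO, this rule permits
# socket(AF_UNIX, ...) and rejects every other address family.
rule = allow_when_arg_equals("socket", 0, AF_UNIX)
print(json.dumps(rule, indent=2))
```

This only works because the address family is a plain integer. A file path is a pointer into the process's memory, which the filter never reads.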

AppArmor

AppArmor helps implement more fine-grained access control. Reading through parts of man apparmor.d is a good way to understand what rules AppArmor supports. Here are just some of the things it's capable of:

restrict the use of specific mount options for devices;
disable access to specific files and directories;
choose what signals can be issued or handled by a process;
allow or disallow using sockets with specific IP addresses.

Let's get our hands dirty with an AppArmor profile. First, I need to create the profile. I could use the aa-easyprof or aa-logprof commands, but here's a sample I've built by hand and saved in a file called apparmor.profile:

#include <tunables/global>
profile read-only-etc {
    #include <abstractions/base>
    allow file /bin/ls ix,
    allow file /etc/ r,
}

The <abstractions/base> include (see code in GitLab) makes adding other rules easier by giving read access to shared libraries, as well as access to some special files like /dev/null or /dev/urandom. The allow file ... rules allow ls /etc/ to work correctly.

By default, anything not listed in the profile is disallowed. So with this profile, reading user data from somewhere inside /home/ will result in an error.

Next, I load the profile into the Linux kernel:

sudo apparmor_parser -r apparmor.profile

Finally, I can use it when running a program:

docker run --rm -t -i --security-opt "apparmor=read-only-etc" busybox sh \
  -c "ls /home/; ls /etc/; cat /etc/passwd;"

# ls: can't open '/home/': Permission denied
# group          hosts          mtab           nsswitch.conf  resolv.conf
# hostname       localtime      network        passwd         shadow
# sh: cat: Permission denied

We can see that AppArmor has allowed me to list the files in /etc/ with /bin/ls, but not to read their contents: the profile neither lets the shell execute /bin/cat nor grants read access to the files under /etc/. Here's a new version of the profile that allows both:

#include <tunables/global>
profile read-only-etc {
    #include <abstractions/base>
    allow file /bin/ls ix,
    allow file /etc/ r,
    allow file /bin/cat ix,   # this is new
    allow file /etc/** r,     # this is new
}

Now, when I run the previous commands again, I can see the contents of /etc/passwd:

docker run --rm -t -i --security-opt "apparmor=read-only-etc" busybox sh \
  -c "ls /home/; ls /etc/; cat /etc/passwd;"
# ls: can't open '/home/': Permission denied
# group          hostname       hosts          localtime      mtab
# network        nsswitch.conf  passwd         resolv.conf    shadow
# root:x:0:0:root:/root:/bin/sh
# daemon:x:1:1:daemon:/usr/sbin:/bin/false
# ...
# nobody:x:65534:65534:nobody:/home:/bin/false
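
To double-check which profile a process is actually confined by, the kernel exposes its AppArmor label in /proc/<pid>/attr/current, for example a profile name followed by the mode in parentheses, or just unconfined. Here is a small Python sketch that splits such a label. The function name parse_confinement is an assumption of mine, not an AppArmor API.

```python
from typing import Optional, Tuple


def parse_confinement(label: str) -> Tuple[str, Optional[str]]:
    """Split an AppArmor label such as 'read-only-etc (enforce)' into (profile, mode)."""
    label = label.strip()
    if label.endswith(")") and " (" in label:
        # Split on the last " (" so profile names containing spaces survive.
        profile, mode = label[:-1].rsplit(" (", 1)
        return profile, mode
    return label, None  # e.g. "unconfined" carries no mode
```

Reading the file from the host for the container's PID avoids having to grant the confined process access to it.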

Since I'm done with AppArmor for now, I can remove the profile:

sudo apparmor_parser -R apparmor.profile

Linux Capabilities

Linux Capabilities make it possible to do things that normally require root, without giving the process the full power of root. Said another way, they mitigate the impact of a compromised process running as root.

Let's look at a few examples. In a container, I'll try changing the owner of /home and then try deleting /home. We'll see the impact capabilities can have for such a use case.

First, I'll use my personal Linux user with id -u and id -g:

docker run --rm -u `id -u`:`id -g` -t -i busybox sh \
  -c "ls -ld /home; chown root:root /home; rm -rf /home; ls -ld /home"

# drwxr-xr-x    2 nobody   nobody        4096 Jul 17 18:30 /home
# chown: /home: Operation not permitted
# rm: can't remove '/home': Permission denied
# drwxr-xr-x    2 nobody   nobody        4096 Jul 17 18:30 /home

That didn't work, which is to be expected, since my Linux user is not root and doesn't own the /home directory. It can neither change its owner nor remove it. Here's the same test running as root:

docker run --rm -t -i busybox sh \
  -c "ls -ld /home; chown root:root /home; rm -rf /home; ls -ld /home"

# drwxr-xr-x    2 nobody   nobody        4096 Jul 17 18:30 /home
# ls: /home: No such file or directory

That worked fine. As expected, root can do anything, including changing the owner of the /home directory and deleting it. Finally, let's drop the CHOWN Linux capability and run as root again:

docker run --rm -t -i --cap-drop CHOWN busybox sh \
  -c "ls -ld /home; chown root:root /home; rm -rf /home; ls -ld /home"

# drwxr-xr-x    2 nobody   nobody        4096 Jul 17 18:30 /home
# chown: /home: Operation not permitted
# ls: /home: No such file or directory

Even though we're using root, we can't do everything we want. We can remove /home, but we can't change its owner. Cool, right?
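
To see which capabilities a process really holds, the CapEff line of /proc/<pid>/status carries them as a hexadecimal bitmask. Here is a minimal Python sketch that decodes it, assuming the bit positions from <linux/capability.h>. The function names decode_caps and effective_caps are my own.

```python
# Capability names indexed by bit position, as defined in <linux/capability.h>.
CAP_NAMES = [
    "CHOWN", "DAC_OVERRIDE", "DAC_READ_SEARCH", "FOWNER", "FSETID", "KILL",
    "SETGID", "SETUID", "SETPCAP", "LINUX_IMMUTABLE", "NET_BIND_SERVICE",
    "NET_BROADCAST", "NET_ADMIN", "NET_RAW", "IPC_LOCK", "IPC_OWNER",
    "SYS_MODULE", "SYS_RAWIO", "SYS_CHROOT", "SYS_PTRACE", "SYS_PACCT",
    "SYS_ADMIN", "SYS_BOOT", "SYS_NICE", "SYS_RESOURCE", "SYS_TIME",
    "SYS_TTY_CONFIG", "MKNOD", "LEASE", "AUDIT_WRITE", "AUDIT_CONTROL",
    "SETFCAP", "MAC_OVERRIDE", "MAC_ADMIN", "SYSLOG", "WAKE_ALARM",
    "BLOCK_SUSPEND", "AUDIT_READ", "PERFMON", "BPF", "CHECKPOINT_RESTORE",
]


def decode_caps(mask_hex: str) -> list:
    """Return the names of the capabilities set in a hexadecimal bitmask."""
    mask = int(mask_hex, 16)
    return [name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)]


def effective_caps(status_text: str) -> list:
    """Decode the CapEff line from the text of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return decode_caps(line.split(":", 1)[1].strip())
    return []  # no CapEff line found
```

Run against the status file of a container started with --cap-drop CHOWN, the resulting list should no longer contain CHOWN.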

Seccomp, AppArmor and Capabilities together

I've talked about how to use each of these separately. But how would I combine their strengths for defense in depth?

Let's discuss an example we might see for real:

Before my webserver starts, I want to make sure its configuration belongs to the right user and group. I'm not sure what the user and group may be before this operation.

Therefore, I want to run a privileged process, but only with permission to change file ownership under /etc/nginx/.

Nothing more, nothing less.

We'd run as root, allow only the CHOWN capability, allow only the system calls used by chown using Seccomp, and restrict file access to /etc/nginx/ using AppArmor. Let's see how that might go with Docker.

First, we need a Seccomp profile. Ideally, I want a whitelist profile because it's safer. But how do I know the list of system calls I want to allow for the command chown, since I haven't written that command's code myself?

I'll get to that in a minute. For now, I'll just use this profile below in a file called seccomp.json:

{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": [
    "SCMP_ARCH_X86_64"
  ],
  "syscalls": [
      {"name":"access", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"arch_prctl", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"brk", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"capget", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"capset", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"chdir", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"clone", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"close", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"close_range", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"connect", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"epoll_ctl", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"execve", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"exit_group", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"fchownat", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"faccessat2", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"fcntl", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"fstat", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"fstatfs", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"futex", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"getcwd", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"getdents64", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"getegid", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"geteuid", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"getgid", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"getpgrp", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"getpid", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"getppid", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"getrandom", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"getuid", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"ioctl", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"kill", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"lseek", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"mmap", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"mprotect", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"munmap", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"newfstatat", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"openat", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"prctl", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"pread64", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"prlimit64", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"pselect6", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"read", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"readlink", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"rseq", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"rt_sigaction", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"rt_sigprocmask", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"rt_sigreturn", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"setgid", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"setgroups", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"setpgid", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"setresgid", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"setresuid", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"setuid", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"set_robust_list", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"set_tid_address", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"signalfd4", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"socket", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"uname", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"wait4", "action": "SCMP_ACT_ALLOW", "args": []},
      {"name":"write", "action": "SCMP_ACT_ALLOW", "args": []}
  ]
}
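
Writing dozens of near-identical lines by hand invites typos and duplicates, so a profile like this can also be generated. Here is a minimal Python sketch, assuming the same profile layout as above. The function name build_allowlist is hypothetical and the list of names is shortened for illustration.

```python
import json


def build_allowlist(names):
    """Return a Seccomp profile that denies everything except `names`."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": ["SCMP_ARCH_X86_64"],
        "syscalls": [
            {"name": name, "action": "SCMP_ACT_ALLOW", "args": []}
            for name in sorted(set(names))  # drop duplicates, keep a stable order
        ],
    }


# Shortened list for illustration; "read" is repeated on purpose.
profile = build_allowlist(["write", "read", "fchownat", "read"])
print(json.dumps(profile, indent=2))
```

Redirecting the output to seccomp.json gives a file that can be passed to --security-opt seccomp= as before.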

Then we need an AppArmor profile, saved in apparmor.profile:

#include <tunables/global>
profile chown-etc {
    #include <abstractions/base>
    allow file /bin/chown ix,
    allow file /etc/nginx/ rw,
    allow file /etc/nginx/** rw,
    capability chown,
}

Finally, we can run a container dropping all capabilities but CHOWN with the profiles prepared above:

docker run --rm -t -i \
  --security-opt apparmor=chown-etc \
  --security-opt seccomp=seccomp.json \
  --cap-drop=ALL --cap-add=CHOWN \
  --entrypoint /bin/chown nginx \
  -R 1000:1000 /etc/nginx/

This runs without any errors. The command chown runs as root, but with very limited permissions as to what it can actually do.

How to create the Seccomp profile

Back to the Seccomp profile. How did I find out which system calls to whitelist?

First, I built a profile that did nothing but log system calls:

{
  "defaultAction": "SCMP_ACT_LOG",
  "architectures": [
    "SCMP_ARCH_X86_64"
  ],
  "syscalls": [
  ]
}

Then, I ran the container in one terminal and watched /var/log/audit/audit.log in another. I had to install auditd for this audit log to be available. Every system call made by the container showed up there as a log record.

After adding all the listed system calls to the seccomp.json profile, I could replace SCMP_ACT_LOG with SCMP_ACT_ERRNO and I was done.
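
Collecting the system calls from the log by hand gets tedious, so here is a rough Python sketch of how it could be automated. It assumes the relevant records contain type=SECCOMP and a syscall=<number> field, and it only ships a tiny, partial x86_64 number-to-name table for illustration. The names logged_syscalls and to_names are my own.

```python
import re

# Partial x86_64 table, for illustration only; extend it for real use.
X86_64_SYSCALLS = {
    0: "read", 1: "write", 3: "close", 59: "execve",
    231: "exit_group", 257: "openat", 260: "fchownat",
}

_SYSCALL_FIELD = re.compile(r"\bsyscall=(\d+)\b")


def logged_syscalls(audit_text: str) -> list:
    """Return the sorted, de-duplicated syscall numbers found in SECCOMP records."""
    numbers = set()
    for line in audit_text.splitlines():
        if "type=SECCOMP" not in line:
            continue  # ignore unrelated audit records
        match = _SYSCALL_FIELD.search(line)
        if match:
            numbers.add(int(match.group(1)))
    return sorted(numbers)


def to_names(numbers, table=X86_64_SYSCALLS):
    """Translate numbers to names, keeping unknown ones visible as 'syscall_<n>'."""
    return [table.get(n, f"syscall_{n}") for n in numbers]
```

Feeding it the text of /var/log/audit/audit.log after a run yields the candidate list for the whitelist. Anything that comes back as syscall_<n> still needs to be looked up in the kernel's system call table for the architecture.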

There was a little oddity, though. For the profile to work correctly with Docker, I needed to allow the futex, capget, epoll_ctl, fstat, setgid, setuid, faccessat2, rt_sigreturn and setgroups system calls. Podman didn't need these system calls for the program to run correctly.

I hope you enjoyed this deep dive into Seccomp, AppArmor and Linux Capabilities. We've gone over their differences and strengths, and also run some code to see them in action. We've also combined them in the spirit of defense in depth.

Using them can be daunting, and the profiles require maintenance. For some projects, it may be too much. For others, it may be warranted to mitigate the impact of an attack.
