syd — seccomp-bpf and seccomp-notify based application sandbox
syd [-hvb] [--dry-run] [--dump <fd|path|tmp>] [--export <bpf|pfc:filename>] [--memaccess 0..3] [--arch arch...] [--config pathspec...] [--magic command...] [--lock] [--chroot directory] [--chdir <directory|tmp>] [--env var...] [--env var=val...] [--ionice class:data] [--nice level] [--background] [--stdout logfile] [--stderr logfile] [--startas name] [--umask mode] [--uid user-id] [--gid group-id] {command [arg...]}
syd [--export <bpf|pfc:filename>] [--arch arch...] [--config pathspec...] [--magic command...] {noexec}
syd --test
sydbox is a seccomp(2) based sandboxing utility for modern Linux (>= 5.6) machines that restricts unwanted process access to filesystem and network resources.
sydbox requires neither root access nor ptrace(2) rights, and it does not depend on any specific Linux kernel option to function. The only dependency is libseccomp, which is available on many different architectures, including x86, x86_64, x32, arm, aarch64, mips, mips64 and others. This makes it very easy for a regular user to use. This is the motto of SydBox: bring easy, simple, flexible and powerful security to the Linux user!
The basic idea of sydbox is to run a command under certain restrictions. These restrictions define which system calls the command is permitted to run and which argument values are permitted for a given system call. The restrictions may be applied in two ways. seccomp-bpf can be used to apply simple Secure Computing user filters so that sandboxing runs fully in kernel space. seccomp-notify can be used to run sandboxing in kernel space and fall back to user space to dereference pointer arguments of system calls, which are one of pathname, UNIX socket address, IPv4 or IPv6 network address. Dynamic decisions are then made using `rsync`-like wildcards such as `allowlist/write+/home/sydbox/***` or `allowlist/write+/run/user/*/pulse` for pathnames, and using CIDR notation such as `allowlist/network/connect+inet:127.0.0.1/8@9050` or `allowlist/network/connect+inet6:::1/8@9050` for IPv4 and IPv6 addresses. On an access violation sydbox performs an action, which by default is denying the system call with an appropriate error (usually **permission denied**). Alternatively it can kill the process running the system call, or kill all processes at once with SIGKILL.
seccomp-bpf filters are extremely fast and secure yet somewhat limited. The limitation stems from the fact that a seccomp-bpf filter may not dereference a pointer in a system call argument. This means, for example, that one may not check whether a path name argument is under a certain directory tree. However, one may check whether a file opening call is for reading or for writing. This matters from a security point of view because dereferencing a pointer is subject to the Time-of-Check-to-Time-of-Use problem, TOCTOU for short. This means using seccomp-notify is never completely secure. Use it at your own risk.
To be able to use sydbox, you need a recent Linux kernel with the system calls pidfd_getfd, pidfd_send_signal, process_vm_readv and process_vm_writev. The Secure Computing facility of the Linux kernel should support the SECCOMP_USER_NOTIF_FLAG_CONTINUE operation. It is recommended to enable the kernel configuration option CONFIG_CROSS_MEMORY_ATTACH. Linux-5.11 or later is recommended.
Run syd --test to verify that all the requirements are met.
The following options are understood:
-h, --help
-v, --version
-b, --bpf-only
allow, deny or bpf.
See the section called “Sandboxing” for more information.
--dry-run
-d to get an overview of what the traced process is doing without interfering with its processing.
-d <fd[0-9]+|path|tmp>, --dump=<fd[0-9]+|path|tmp>
Dump system call information to the given file descriptor, path, or temporary file. Use a number to dump to a file descriptor, e.g. 2 for standard error; use a string to write the dump to a path; and use tmp to write the dump to a temporary file. In the latter case, Sydbox prints the path of the temporary file to standard error on start and exit.
-e <bpf|pfc>:filename, --export=<bpf|pfc>:filename
Export the seccomp filters in the given format. The format must be exactly one of bpf for Berkeley Packet Filter or pfc for Pseudo Filter Code. The output of bpf is suitable for loading into the kernel, while the output of pfc is human readable and is intended primarily as a debugging tool for developers using libseccomp. If a filename is given after the format name and a colon, the seccomp filters are written to that file. If no filename is given, they are written to standard error.
If you just want to inspect the seccomp filters and not execute a process, pass the special string noexec as the command, e.g. syd -e pfc:out noexec. In this case SydBox exits with either the numeric value of the environment variable SYDBOX_NOEXEC, or 0 if the variable is not set. The exit happens after preparing all the requested restrictions and right before process execution.
-M 0..3, --memaccess=0..3
Mode on using cross memory attach or /proc/pid/mem. Cross memory attach requires a Linux kernel with the CONFIG_CROSS_MEMORY_ATTACH option enabled. Default mode is 0. Modes 2 and 3 may run into too many processes errors. Use another mode or adapt the sysctl fs.nr_open as necessary if this is the case.
-a arch, --arch=arch
Filter system calls for the given architecture. Available architectures are native, x86_64, x86, x32, arm, aarch64, mips, mips64, ppc, ppc64, ppc64le, s390, s390x, parisc, parisc64, and riscv64. The default is native. This option may be repeated.
-c pathspec, --config=pathspec
pathspec to the configuration file, may be repeated. See the section called “Configuration” for more information.
-m magic, --magic=magic
-l, --lock
-C directory, --chroot=directory
-D directory, --chdir=directory
Change directory to this directory before starting the program. The path given to chdir should be relative to the chroot. If the special string tmp is given, sydbox creates a temporary directory in a secure manner and changes directory to it. If read sandboxing is one of allow or deny, this directory acts as the obligatory prefix for all directory changing system calls, which means the process is not allowed to leave this directory tree. This is functionally similar to a chroot but more practical to handle.
-E var=val, --env=var=val
var=val in the environment for command, may be repeated.
-I class[:data], --ionice=class[:data]
0 for none, 1 for real time, 2 for best effort, and 3 for idle. Data can be from 0 to 7 inclusive.
-N level, --nice=level
-20 is the highest priority, and 19 is the lowest priority. The default niceness for a process is inherited from its parent process and is usually 0.
-B, --background
-1 logfile, --stdout=logfile
--background. The logfile must be an absolute pathname, but relative to the path optionally given with --chroot. The logfile can also be a named pipe.
-2 logfile, --stderr=logfile
--background. The logfile must be an absolute pathname, but relative to the path optionally given with --chroot. The logfile can also be a named pipe.
-A name, --startas=name
-K mode, --umask=mode
-U user-id, --uid=user-id
-G group-id, --gid=group-id
-t, --test
There are four sandboxing types:
Read sandboxing
Write sandboxing
execve(2) sandboxing
Network sandboxing
Sandboxing may have four states:
off
Sandboxing is off, none of the relevant system calls are checked and all access is allowed.
bpf
Sandboxing is initialized at startup, tracing happens in kernel space. The action for the system call is deny with errno EPERM.
deny
Sandboxing defaults to deny, allowlists can be used to allow access.
allow
Sandboxing defaults to allow, denylists can be used to deny access.
In addition, there are filters for every sandboxing to prevent Sydbox from reporting an access violation. Note, access is still denied in such cases.
This sandboxing checks certain system calls for filesystem write access. If a system call tries to write, modify or change attributes of a path, this attempt is reported and the system call is denied. There are two ways to customize this behaviour. Sydbox may be configured to "allowlist" some path patterns. If the path argument of the system call which is subject to be modified matches a pattern in the list of allowlisted path patterns, this attempt is not denied. Additionally, Sydbox may be configured to "filter" some path patterns. In this case a match will prevent Sydbox from reporting a warning about the access violation, the system call is still denied though.
The filtered system calls are: access(2), faccessat(2), faccessat2(2), chmod(2), fchmodat(2), chown(2), chown32(2), lchown(2), lchown32(2), fchownat(2), open(2), openat(2), openat2(2), creat(2), mkdir(2), mkdirat(2), mknod(2), mknodat(2), rmdir(2), truncate(2), truncate64(2), mount(2), umount(2), umount2(2), utime(2), utimes(2), utimensat(2), futimesat(2), unlink(2), unlinkat(2), link(2), linkat(2), rename(2), renameat(2), renameat2(2), symlink(2), symlinkat(2), setxattr(2), lsetxattr(2), removexattr(2), and lremovexattr(2).
This sandboxing checks certain system calls for filesystem read access. If a system call tries to read a path, this attempt is reported and the system call is denied. See the section called “Write Sandboxing” for more information on how to customize this behaviour.
The filtered system calls are: access(2), chdir(2), fchdir(2), faccessat(2), faccessat2(2), open(2), openat(2), openat2(2), listxattr(2), and llistxattr(2).
This sandboxing denies execve(2), and execveat(2) calls in case the path argument does not match one of the allowlisted patterns. Note, all exec(3) family functions are sandboxed because these functions are just wrappers of either one of execve(2) or execveat(2) system calls.
This sandboxing exposes a way to prevent unwanted network calls. The filtered system calls are: bind(2), connect(2), sendto(2), recvmsg(2), and sendmsg(2). To increase usability, these system calls are filtered in two groups: bind and connect. bind(2) belongs to the first group, whereas the other system calls belong to the connect group.
Sydbox is configured through the so-called magic commands. There are three ways to supply magic commands:
Sydbox may be configured using a configuration file. The path to the configuration file is specified using the -c command line switch or the SYDBOX_CONFIG environment variable. More than one configuration file may be specified this way. However, only the initial configuration file can change the core configuration. If the path to the configuration file is prefixed with the character '@', Sydbox looks for this configuration file under $sharedir/sydbox/ where $sharedir is usually /usr/share. The command line switch has precedence over the SYDBOX_CONFIG environment variable.
Sydbox may be configured using magic stat(2) calls during runtime. This is achieved by calling the stat() system call on the special path /dev/sydbox followed by the magic command. Note that runtime configuration is only possible if the magic lock is unset. The system call stat() was chosen as the magic call because it is practical to invoke using builtin shell commands like:
test -e /dev/sydbox/core/sandbox/read:deny
which enables read sandboxing for a shell running under Sydbox. It is also possible to query certain values using the return value of the magic stat(2):
test -e '/dev/sydbox/core/sandbox/read?' && echo "read sandboxing on" || echo "read sandboxing off"
Some of these shell builtins may actually call the lstat(2) or newfstatat(2) system calls instead of stat(2), thus Sydbox makes sure to check lstat() and newfstatat() system calls for magic commands as well.
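The magic stat(2) calls described above can be wrapped in small shell helpers. The following is a minimal sketch, not part of sydbox: the function names syd_magic_path, syd_magic and syd_query and the SYD_PREFIX variable are hypothetical, and the sketch assumes the default /dev/sydbox prefix. Outside of Sydbox the special path does not exist, so the query simply fails.

```sh
# Hypothetical helpers; none of these names are provided by sydbox.
# Assumption: the magic prefix is the default /dev/sydbox.
SYD_PREFIX=${SYD_PREFIX:-/dev/sydbox}

# Print the special path for a magic command such as core/sandbox/read:deny
syd_magic_path() {
    printf '%s/%s\n' "$SYD_PREFIX" "$1"
}

# Issue a magic command; test -e makes the shell call stat(2) or a variant.
syd_magic() {
    test -e "$(syd_magic_path "$1")"
}

# Query a setting by appending '?'; the exit status carries the answer.
syd_query() {
    syd_magic "$1?"
}

if syd_query core/sandbox/read; then
    echo "read sandboxing on"
else
    echo "read sandboxing off (or not running under Sydbox)"
fi
```

Under Sydbox with the magic lock unset, syd_magic core/sandbox/read:deny would have the same effect as the test -e command shown earlier.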
Inspection (dry run, sandbox mode = dump) behaves identically to off for magic stat(2).
Every magic command accepts an argument of a certain type. The available types are listed below:
boolean
A boolean type may have one of the two values, true or false. To specify boolean values when supplying magic commands to Sydbox, you may also use true or false. In addition you can use the short forms t or f, and you can also use 1 or 0.
integer
This type represents the basic integer type.
string
This type represents the basic string type.
string-array
This type represents a list of strings. Other types aren't allowed within this type.
command
This is a special type which is used to make sydbox execute certain functions. It is meant to be used as a basic form of interprocess communication to work around some tracing limitations. Magic commands of this type can only be used with the magic stat(2) system call.
As mentioned in the section called “Configuration”, Sydbox may be configured using the so-called magic commands. The format of the magic commands is simple:
${PREFIX}/section/of/option${OPERATION_CHARACTER}value
where ${PREFIX} is /dev/sydbox by default (may be altered at compile-time using the SYDBOX_MAGIC_PREFIX definition). This prefix is only required for magic stat(), not for the -m command line switch. ${OPERATION_CHARACTER} determines the operation of the magic command. Possible values are listed below:
The configuration file format of sydbox is simple. It is just a way to supply many magic commands in a convenient way. All empty lines and lines starting with the number sign '#' are ignored. All the other lines are treated as if they were supplied to Sydbox via the -m command line switch.
Configuration file naming of sydbox follows a scheme which makes it possible to extract the magic command API version from the file name. A sydbox configuration file must have the extension "syd-" followed by the API version (e.g. "syd-2" for API version 2).
The current magic command API version of sydbox is 2.
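As an illustration of the format and naming rules, the sketch below writes a small sample configuration and shows which lines would count as magic commands. The file name example.syd-2 and the chosen rules are only examples; the filtering with grep mimics the documented rule (empty lines and lines starting with '#' are ignored) and is not how Sydbox itself parses the file.

```sh
# Write a sample configuration; name and rules are illustrative only.
confdir=$(mktemp -d)
conf=$confdir/example.syd-2
cat > "$conf" <<'EOF'
# Sample sydbox configuration (magic command API version 2)

core/sandbox/read:allow
core/sandbox/write:deny
denylist/read+/etc/shadow
allowlist/write+/tmp/***
EOF

# Lines treated as magic commands: everything except empty lines
# and lines starting with the number sign.
effective=$(grep -v -e '^$' -e '^#' "$conf")
printf '%s\n' "$effective"

# The API version is the part of the extension after "syd-".
api=${conf##*.syd-}
echo "API version: $api"

rm -rf "$confdir"
```

Such a file would be passed with the -c switch, for example syd -c ./example.syd-2 -- command.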
Sydbox recognizes the following magic commands:
core/sandbox/exec
type: string
default: false
query: yes
A string specifying how the execve(2) system call should be sandboxed. See the section called “execve(2) Sandboxing” for more information.
core/sandbox/read
type: string
default: bpf
query: yes
A string specifying how read sandboxing should be done. See the section called “Read Sandboxing” for more information.
core/sandbox/write
type: string
default: bpf
query: yes
A string specifying how write sandboxing should be done. See the section called “Write Sandboxing” for more information.
core/sandbox/network
type: string
default: bpf
query: yes
A string specifying how network sandboxing should be done. See the section called “Network Sandboxing” for more information.
core/restrict/general
type: integer
default: 0
An integer specifying the level of permitted system calls. Level 0 performs the default restrictions of SydBox where there is a list of system calls which are denylisted and are denied unconditionally with the errno ECANCELED. These restrictions are present to improve the security of SydBox and are applied regardless of the restrict level.
The denylisted system calls in Level 0 are acct(2), add_key(2), adjtimex(2), afs_syscall(2), chroot(2), finit_module(2), fsmount(2), get_kernel_syms(2), init_module(2), kexec_file_load(2), kexec_load(2), keyctl(2), mount(2), move_mount(2), nfsservctl(2), pidfd_getfd(2), pivot_root(2), pkey_alloc(2), pkey_free(2), pkey_mprotect(2), process_vm_readv(2), process_vm_writev(2), ptrace(2), quotactl(2), reboot(2), request_key(2), security(2), setdomainname(2), sethostname(2), swapoff(2), swapon(2), umount(2), umount2(2), unshare(2), uselib(2), vm86(2), vm86old(2), vserver(2).
Level 1 is strict and resembles the first version of the Secure Computing Mode. Level 2 is less strict than Level 1. Both Level 1 and Level 2 permit only read access to the filesystem. Level 3 is identical to Level 2 except it permits write access to the filesystem.
The permitted system calls in Level 1 are arch_prctl(2), close(2), dup(2), dup2(2), execve(2), execveat(2), exit(2), exit_group(2), getpid(2), set_tid_address(2), read(2), readv(2), preadv(2), preadv2(2), write(2), writev(2), pwritev(2), pwritev2(2), open(2), openat(2), stat(2), fstat(2), lstat(2), newfstatat(2), sigreturn(2), brk(2), mmap(2), mmap2(2), and munmap(2). Only read-only open calls are permitted.
The permitted system calls in Level 2 and Level 3 are access(2), brk(2), clock_gettime(2), close(2), clone(2), dup(2), dup2(2), execve(2), execveat(2), epoll_create(2), epoll_wait(2), epoll_pwait(2), eventfd2(2), fork(2), vfork(2), clone3(2), pipe(2), pipe2(2), fcntl(2), fstat(2), fsync(2), futex(2), getdents(2), getegid(2), geteuid(2), getgid(2), getpgrp(2), getpid(2), getppid(2), getpgid(2), getrlimit(2), gettimeofday(2), gettid(2), getuid(2), lseek(2), _llseek(2), lstat(2), mlockall(2), mmap(2), mmap2(2), munmap(2), nanosleep(2), newfstatat(2), open(2), openat(2), prlimit(2), pselect6(2), read(2), rt_sigaction(2), rt_sigprocmask(2), rt_sigreturn(2), sched_getaffinity(2), sched_yield(2), sendmsg(2), set_robust_list(2), setpgid(2), setrlimit(2), shutdown(2), sigaltstack(2), sigreturn(2), stat(2), uname(2), wait4(2), write(2), writev(2), exit_group(2), exit(2), madvise(2), getrandom(2), sysinfo(2), recv(2), send(2), bind(2), listen(2), connect(2), getsockname(2), getpeername(2), recvmsg(2), recvfrom(2), sendto(2), readlink(2), readlinkat(2), select(2), poll(2), arch_prctl(2), membarrier(2), and set_tid_address(2).
In addition, Level 3 permits the system calls chmod(2), fchmod(2), fchmodat(2), chown(2), chown32(2), lchown(2), lchown32(2), fchownat(2), creat(2), mkdir(2), mkdirat(2), mknod(2), mknodat(2), rmdir(2), truncate(2), truncate64(2), link(2), linkat(2), unlink(2), unlinkat(2), rename(2), renameat(2), renameat2(2), symlink(2), symlinkat(2), utime(2), utimes(2), utimensat(2), futimesat(2), setxattr(2), lsetxattr(2), removexattr(2), lremovexattr(2), and openat2(2) as well.
core/restrict/identity_change
type: boolean
default: true
A boolean specifying whether user and group identity changes should be restricted. In this mode, user identity changes to user ids less than or equal to 11 are not permitted. This is usually the inclusive range between the root and operator users. Check the file /etc/passwd to see which range of users is covered on your system. The limit is 14 for group identity changes, meaning group identity changes to a group id less than or equal to 14 are not permitted. This is usually the inclusive range between the root and uucp groups. Check the file /etc/group to see which range of groups is covered on your system.
There is a second mode of action with this option: if one of the options --uid or --gid is given, SydBox configures the sandbox in such a way that only changes to the given user identity and/or group identity are possible. E.g. run SydBox with --uid $(id -u nginx) so that processes under SydBox are able to change their user identity to the nginx user. Any other user identity change is prohibited.
core/restrict/io_control
type: boolean
default: false
A boolean specifying whether ioctl calls should be restricted. In this mode only a subset of ioctl requests is allowed. The permitted ioctl requests are TCGETS, TIOCGLCKTRMIOS, TIOCGWINSZ, TIOCSWINSZ, FIONREAD, TIOCINQ, TIOCOUTQ, TCFLSH, TIOCSTI, TIOCSCTTY, TIOCNOTTY, TIOCGPGRP, TIOCSPGRP, TIOCGSID, TIOCEXCL, TIOCGEXCL, TIOCNXCL, TIOCGETD, TIOCSETD, TIOCPKT, TIOCGPKT, TIOCSPTLCK, TIOCGPTLCK, TIOCGPTPEER, TIOCGSOFTCAR, TIOCSSOFTCAR, KDGETLED, KDSETLED, KDGKBLED, KDSKBLED, KDGKBTYPE, KDGETMODE, KDSETMODE, KDMKTONE, KIOCSOUND, GIO_CMAP, PIO_CMAP, GIO_FONT, PIO_FONT, GIO_FONTX, PIO_FONTX, PIO_FONTRESET, GIO_SCRNMAP, PIO_SCRNMAP, GIO_UNISCRNMAP, PIO_UNISCRNMAP, GIO_UNIMAP, PIO_UNIMAP, PIO_UNIMAPCLR, KDGKBMODE, KDSKBMODE, KDGKBMETA, KDSKBMETA, KDGKBENT, KDSKBENT, KDGKBSENT, KDSKBSENT, KDGKBDIACR, KDGETKEYCODE, KDSETKEYCODE, KDSIGACCEPT, VT_OPENQRY, VT_GETMODE, VT_SETMODE, VT_GETSTATE, VT_RELDISP, VT_ACTIVATE, VT_WAITACTIVE, VT_DISALLOCATE, VT_RESIZE, and VT_RESIZEX.
This option requires core/restrict/general to be non-zero.
core/restrict/memory_map
type: boolean
default: false
A boolean specifying whether memory mapping should be restricted. In this mode, only a subset of readable, writable and executable memory mappings is allowed. Shared memory mappings are not allowed. Memory mappings which are both writable and executable are not allowed. There are many more restrictions. Check the filter_mmap and filter_mmap2 functions in the file src/syscall-filter.c of the sydbox source code for a complete list of restrictions. This option filters the mmap and mmap2 system calls. The set of options restricted for memory mappings is borrowed from the sandbox of the Tor project. This option requires core/restrict/general to be non-zero.
This setting is meant as a protection against TOCTOU attacks. However, it should be noted that such attack vectors are inevitable if seccomp user notifications are enabled. See the section called “Security” for more information.
core/restrict/shared_memory_writable
type: boolean
default: false
A boolean specifying whether writable shared memory mappings should be forbidden. This option filters mmap(2) and mmap2(2) system calls with PROT_WRITE given as the memory protection mode and MAP_SHARED given as the sharing mode. This option has precedence over the option core/restrict/memory_map. If both are enabled, only the restrictions given by this option are applied. Note, though, that the option core/restrict/memory_map includes the restrictions of this option and many more, so using it instead is recommended.
This setting is meant as a protection against TOCTOU attacks. However, it should be noted that such attack vectors are inevitable if seccomp user notifications are enabled. See the section called “Security” for more information.
core/allowlist/per_process_directories
type: boolean
default: true
A boolean specifying whether per-process directories like /proc/$pid should automatically be allowlisted.
core/allowlist/successful_bind
type: boolean
default: true
A boolean specifying whether the socket address arguments of successful bind(2) calls should be allowlisted for connect(2), sendto(2), recvmsg(2), and sendmsg(2) system calls. These socket addresses are allowlisted globally and not per-process for usability reasons. Thus, for example, a process which forks to call bind(2) will have its socket address allowlisted for its parent as well.
core/allowlist/unsupported_socket_families
type: boolean
default: true
A boolean specifying whether unknown socket families should be allowed access when network sandboxing is on.
core/violation/decision
type: string
default: deny, or bpf if -b is given.
A string specifying the decision to take when an access violation occurs. Possible values are kill, killall and deny. The default is deny, which means to deny the system call and resume execution.
core/violation/exit_code
type: integer
default: -1
An integer specifying the exit code in case core/violation/decision is killall. As a special case, if this integer is equal to zero, sydbox exits with 128 added to the eldest process' exit value in case an access violation has occurred. This special case is meant for program tests to check whether an access violation has occurred using the exit code.
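For a test script, the special case can be checked with simple shell arithmetic. The helper below is a hypothetical sketch (syd_check_status is not part of sydbox). It assumes that core/violation/exit_code is set to 0 and that the eldest process itself exits with a value below 128; a status of 128 or more can also arise in other ways, for example when the shell reports a process killed by a signal, so the result should be treated as a hint only.

```sh
# Hypothetical helper: interpret the exit status of a sydbox run when
# core/violation/exit_code is 0.
# Assumption: the command's own exit value is below 128.
syd_check_status() {
    status=$1
    if [ "$status" -ge 128 ]; then
        echo "violation (command exit value: $((status - 128)))"
        return 1
    fi
    echo "no violation (command exit value: $status)"
}

# Example with fixed numbers instead of a real sydbox run:
syd_check_status 0
syd_check_status 129 || true
```

In a real test this would follow the sydbox invocation and be called as syd_check_status $?.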
core/violation/raise_fail
type: boolean
default: false
A boolean specifying whether certain failures like errors during path resolution should be treated as access violations. Note this is just a switch for reporting, the access to the system call is denied nevertheless.
core/violation/raise_safe
type: boolean
default: false
A boolean specifying whether certain violations which are considered safe should be reported. For example, mkdir(2) is a system call which fails when asked to create an existing directory. In this special case, sydbox denies the system call with EEXIST for consistency and does not raise an access violation in case core/violation/raise_safe is set to false. Other examples are the access(2) system call, which is silently denied with EACCES, and the listxattr(2) and llistxattr(2) system calls, which are silently denied with ENOTSUP if this option is set to false.
core/trace/magic_lock
type: string
default: off
A string specifying the state of the magic lock. Possible values are on, off and exec. If the magic lock is on, no magic commands are allowed. Note, the magic lock is tracked per-process. If exec is specified, the magic lock is set to on when the process returns from the system call execve(2).
core/trace/memory_access
type: integer
default: 0
Mode on using cross memory attach or /proc/pid/mem. Cross memory attach requires a Linux kernel with the CONFIG_CROSS_MEMORY_ATTACH option enabled. Default mode is 0. This option is functionally identical to the -M command line switch.
core/trace/use_toolong_hack
type: boolean
default: false
A boolean specifying whether sydbox should use a hack to determine working directories under a path longer than PATH_MAX.
core/match/case_sensitive
type: boolean
default: true
A boolean specifying the case sensitivity of pattern matching.
See the section called “Pattern Matching” for more information.
core/match/no_wildcard
type: string
default: literal
A string specifying how to match patterns with no '*' or '?' characters in them. Possible values are literal and prefix. With literal such patterns are matched literally, whereas with prefix Sydbox appends /*** to the end of such patterns to make them a prefix match. Implemented mostly to provide compatibility with sydbox-0 patterns.
See the section called “Pattern Matching” for more information.
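The effect of the prefix mode can be sketched in shell as follows. This only illustrates the rule stated above (append /*** when the pattern contains neither '*' nor '?'); the function name syd_prefix_pattern is hypothetical and the code is not taken from Sydbox.

```sh
# Hypothetical illustration of core/match/no_wildcard:prefix.
# Patterns without '*' or '?' get /*** appended; others are unchanged.
syd_prefix_pattern() {
    case $1 in
        *[*?]*) printf '%s\n' "$1" ;;
        *)      printf '%s/***\n' "$1" ;;
    esac
}

syd_prefix_pattern /home/sydbox
syd_prefix_pattern '/run/user/*/pulse'
```

With literal mode, the first pattern would instead match only the path /home/sydbox itself.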
exec/kill_if_match
type: string-array
default: [empty array]
This setting specifies a list of path patterns. If one of these patterns matches the resolved path of an execve(2) system call, the process in question is killed. See the section called “Pattern Matching” for more information on wildmatch patterns.
The initial execve(2) is not checked. Thus, if sydbox is called like:
$> sydbox -m exec/kill_if_match+/bin/sh -- /bin/sh
she will execute the /bin/sh command.
filter/exec
type: string-array
default: [empty array]
Specifies a list of path patterns to filter for execve(2) sandboxing. See the section called “execve(2) Sandboxing” and the section called “Pattern Matching”.
filter/read
type: string-array
default: [empty array]
Specifies a list of path patterns to filter for read sandboxing. See the section called “Read Sandboxing” and the section called “Pattern Matching”.
filter/write
type: string-array
default: [empty array]
Specifies a list of path patterns to filter for write sandboxing. See the section called “Write Sandboxing” and the section called “Pattern Matching”.
filter/network
type: string-array
default: [empty array]
Specifies a list of network addresses to filter for network sandboxing. See the section called “Network Sandboxing” and the section called “Address Matching”.
allowlist/exec
type: string-array
default: [empty array]
Specifies a list of path patterns to allowlist for execve(2) sandboxing. See the section called “execve(2) Sandboxing” and the section called “Pattern Matching”.
allowlist/read
type: string-array
default: [empty array]
Specifies a list of path patterns to allowlist for read sandboxing. See the section called “Read Sandboxing” and the section called “Pattern Matching”.
allowlist/write
type: string-array
default: [empty array]
Specifies a list of path patterns to allowlist for write sandboxing. See the section called “Write Sandboxing” and the section called “Pattern Matching”.
allowlist/network/bind
type: string-array
default: [empty array]
Specifies a list of network addresses to allowlist for bind(2) network sandboxing. See the section called “Network Sandboxing” and the section called “Address Matching”.
allowlist/network/connect
type: string-array
default: [empty array]
Specifies a list of network addresses to allowlist for connect(2) and sendto(2) network sandboxing. See the section called “Network Sandboxing” and the section called “Address Matching”.
denylist/exec
type: string-array
default: [empty array]
Specifies a list of path patterns to denylist for execve(2) sandboxing. See the section called “execve(2) Sandboxing” and the section called “Pattern Matching”.
denylist/read
type: string-array
default: [empty array]
Specifies a list of path patterns to denylist for read sandboxing. See the section called “Read Sandboxing” and the section called “Pattern Matching”.
denylist/write
type: string-array
default: [empty array]
Specifies a list of path patterns to denylist for write sandboxing. See the section called “Write Sandboxing” and the section called “Pattern Matching”.
denylist/network/bind
type: string-array
default: [empty array]
Specifies a list of network addresses to denylist for bind(2) network sandboxing. See the section called “Network Sandboxing” and the section called “Address Matching”.
denylist/network/connect
type: string-array
default: [empty array]
Specifies a list of network addresses to denylist for connect(2) and sendto(2) network sandboxing. See the section called “Network Sandboxing” and the section called “Address Matching”.
cmd/exec
type: command
default: none
Makes sydbox execute an external command without sandboxing. The program name and arguments must be separated with the US (unit separator, octal: 037) character. sydfmt(1) may be used to do this. Consult its manual page for more information. This command can only be used with the magic stat(2) system call.
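To make the separator rule concrete, the sketch below builds such a path by hand. It is a hypothetical stand-in for illustration only (syd_exec_path is not part of sydbox, and sydfmt(1) remains the supported tool); it assumes the default /dev/sydbox prefix and the '!' operation character seen in the stat output of the invocation examples.

```sh
# Hypothetical helper: join a program name and its arguments with the
# US character (octal 037) and prepend the cmd/exec magic path.
# Assumptions: default prefix /dev/sydbox, operation character '!'.
syd_exec_path() {
    path='/dev/sydbox/cmd/exec!'
    sep=$(printf '\037')
    first=1
    for arg in "$@"; do
        if [ "$first" -eq 1 ]; then
            path=$path$arg
            first=0
        else
            path=$path$sep$arg
        fi
    done
    printf '%s\n' "$path"
}

# Show the result with the separator made visible as '|'.
syd_exec_path echo hello world | tr '\037' '|'
```

Under Sydbox, the path would be handed to a stat call, for example stat "$(syd_exec_path echo hello world)".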
Sydbox uses shell-style pattern matching for allowlists and filters. The wildmatching code is borrowed from rsync and behaves like:
A '*' matches any path component, but it stops at slashes.
Use '**' to match anything, including slashes.
A '?' matches any character except a slash (/).
A "[" introduces a character class, such as [a-z] or [[:alpha:]].
In a wildcard pattern, a backslash can be used to escape a wildcard character, but it is matched literally when no wildcards are present.
A trailing "dir_name/***" will match both the directory (as if "dir_name/" had been specified) and everything in the directory (as if "dir_name/**" had been specified).
Sydbox checks patterns from multiple sources. There is no precedence between different sources, and the last matching pattern decides the outcome.
Sydbox has a simple address scheme to match network addresses. The addresses can be in the following forms:
Specifies a UNIX socket path, ${PATTERN} specifies a path pattern. See the section called “Pattern Matching” for more information on path patterns.
Specifies an abstract UNIX socket path, ${PATTERN} specifies a path pattern. See the section called “Pattern Matching” for more information on path patterns.
Specifies an IPv4 address. For more information, read the paragraph below.
Specifies an IPv6 address. For more information, read the paragraph below.
/${NETMASK} may be omitted from inet: and inet6: addresses, and ${PORT_RANGE} can be given in two forms: either an integer or a service name from the services(5) database, either as a single entity or as a range in the form BEGIN-END.
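The pieces of an inet: or inet6: address can be assembled as in the following sketch. The function name syd_inet_addr is hypothetical; the layout (family, colon, address, optional /NETMASK, '@', port or port range) is inferred from examples such as inet:127.0.0.1/8@9050 elsewhere in this manual.

```sh
# Hypothetical helper: compose an address from its parts.
# Arguments: family (inet or inet6), address, netmask (may be empty),
# port or port range (integer, service name, or BEGIN-END).
syd_inet_addr() {
    family=$1
    addr=$2
    netmask=$3
    port_range=$4
    if [ -n "$netmask" ]; then
        printf '%s:%s/%s@%s\n' "$family" "$addr" "$netmask" "$port_range"
    else
        printf '%s:%s@%s\n' "$family" "$addr" "$port_range"
    fi
}

syd_inet_addr inet 127.0.0.1 8 9050
syd_inet_addr inet6 ::1 "" 8000-8080
```

The result can be appended to a magic command, e.g. -m "allowlist/network/connect+$(syd_inet_addr inet 127.0.0.1 8 9050)".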
In addition there are some aliases, you may use instead of specifying an address:
LOOPBACK
Expanded to inet:127.0.0.0/8
LOOPBACK6
Expanded to inet6:::1/8
LOCAL
Expanded to four addresses as defined in RFC1918:
inet:127.0.0.0/8
inet:10.0.0.0/8
inet:172.16.0.0/12
inet:192.168.0.0/16
LOCAL6
Expanded to four addresses:
inet6:::1
inet6:fe80::/7
inet6:fc00::/7
inet6:fec0::/7
Below are examples of invocation and configuration of Sydbox.
Below are some invocation examples:
Allow all reads, deny read access to /etc/shadow:
$> syd -E LC_ALL=POSIX \
    -m core/sandbox/read:allow \
    -m denylist/read+/etc/shadow \
    -- /bin/sh -c 'cat /etc/shadow'
sydbox@3141592653: -- Access Violation! --
sydbox@3141592653: process id=20926 (abi=0 name:`cat')
sydbox@3141592653: cwd: `/home/alip'
sydbox@3141592653: cmdline: `cat /etc/shadow'
sydbox@3141592653: open(`/etc/shadow')
cat: /etc/shadow: Operation not permitted
$>
Deny all reads and writes, allow read access to /dev/zero and write access to /dev/full. The dd executable is not static in this case, so also allow read access to /lib64, from which it loads its shared libraries. On the system of the author the dd binary links only to libraries under /lib64; use ldd to check the linked libraries on your system. Note the quoting to escape shell expansion.
$> syd -E LC_ALL=POSIX \
    -m core/sandbox/read:deny \
    -m core/sandbox/write:deny \
    -m 'allowlist/read+/lib64/***' \
    -m allowlist/read+/dev/zero \
    -m allowlist/write+/dev/full \
    -- dd if=/dev/zero of=/dev/full count=1
dd: writing to '/dev/full': No space left on device
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000447024 s, 0.0 kB/s
$>
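The library directories for such an allowlist can be derived from ldd output. The sketch below is one possible approach and not part of sydbox: the function name syd_lib_allowlist is hypothetical, and the sample input imitates typical ldd output so that the example runs without a particular binary.

```sh
# Hypothetical helper: read ldd output on standard input and print one
# allowlist/read magic command per distinct library directory.
syd_lib_allowlist() {
    awk '$2 == "=>" && $3 ~ /^\// { print $3; next }
         $1 ~ /^\//               { print $1 }' |
        while IFS= read -r lib; do dirname "$lib"; done |
        sort -u |
        while IFS= read -r dir; do printf 'allowlist/read+%s/***\n' "$dir"; done
}

# Sample input in the style of ldd output (addresses are made up).
syd_lib_allowlist <<'EOF'
	linux-vdso.so.1 (0x00007ffd00000000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f0000000000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f0000200000)
EOF
```

On a real system the input would come from ldd, e.g. ldd "$(command -v dd)" | syd_lib_allowlist, and each printed line would be passed to syd with a quoted -m argument.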
Kill common bittorrent applications:
The initial execve is not checked.
$> syd -E LC_ALL=POSIX \
    -m exec/kill_if_match+/usr/bin/ktorrent \
    -m exec/kill_if_match+/usr/bin/rtorrent \
    -- /bin/sh -c ktorrent
sydbox@3141592653: callback_exec: kill_if_match pattern=`/usr/bin/ktorrent' matches execve path=`/usr/bin/ktorrent'
sydbox@3141592653: callback_exec: killing process:3097 [abi:0 cwd:`/home/alip']
Execute a command outside the sandbox using the cmd/exec magic command:
$> syd -- sh -c 'stat "$(./syd-format exec echo hello world)"'
hello world
  File: ‘/dev/sydbox/cmd/exec!echo\037hello\037world’
  Size: 0  Blocks: 0  IO Block: 512  character special file
Device: 0h/0d  Inode: 0  Links: 0  Device type: 1,3
Access: (0666/crw-rw-rw-)  Uid: ( 0/ root)  Gid: ( 0/ root)
$>
Sydbox dumps information about the traced process tree to standard error upon receiving the SIGUSR1 signal. Send SIGUSR2 signal for more verbose process information.
Report bugs by direct mail to <alip@exherbo.org>
Refer to BUGS on http://git.exherbo.org/sydbox-1.git/tree/BUGS for more information on providing information with bug reports.
Attaching poems encourages consideration tremendously.
If you run SydBox with the --bpf-only option (-b for short), or if you set all sandboxing modes to exactly one of bpf and off, all system call sandboxing happens in kernel space and this approach is secure.
Otherwise SydBox must dereference pointer arguments, which is known to be insecure because it makes TOCTOU (time-of-check to time-of-use) attacks possible.
sydfmt(1), strace(1), seccomp(2), seccomp_init(3), seccomp_load(3), seccomp_attr_set(3), seccomp_rule_add(3)
SPDX-License-Identifier: GPL-2.0-only
Copyright © 2010, 2011, 2012, 2013, 2014, 2015, 2018, 2020, 2021 Ali Polatel <alip@exherbo.org>