linux/kernel
YiFei Zhu f9d480b6ff seccomp/cache: Lookup syscall allowlist bitmap for fast path
The overhead of running Seccomp filters has been part of some past
discussions [1][2][3]. Oftentimes, the filters have a large number
of instructions that check syscall numbers one by one and jump based
on that. Some users chain multiple BPF filters, which further increases
the overhead. A recent paper [6] comprehensively measures the Seccomp
overhead and shows that it is non-negligible and has a non-trivial
impact on application performance.
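To make that cost concrete, here is a plain C model of a filter written as one compare-and-jump per allowed syscall number. It is not actual BPF, and the function name and step counter are hypothetical. The work grows with the number of entries checked before a match, or with the whole list length on a miss.

```c
#include <stdbool.h>
#include <stddef.h>

/* Models a filter that compares nr against each allowed number in turn,
 * the way a chain of "jeq nr, ALLOW" instructions would. *steps reports
 * how many comparisons ran before a decision was reached. */
static bool linear_filter_allows(const int *allowed, size_t n, int nr,
				 size_t *steps)
{
	*steps = 0;
	for (size_t i = 0; i < n; i++) {
		(*steps)++;
		if (allowed[i] == nr)
			return true;
	}
	return false; /* fell off the end of the chain: not allowed */
}
```

A syscall near the end of a long list, or one that is not in the list at all, pays for every comparison on every call. A bitmap lookup is a single bit test, whatever the list length.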

We observed that some common filters, such as docker's [4] or
systemd's [5], make most decisions based only on the syscall
number. As past discussions concluded, a bitmap in which each
bit represents a syscall makes the most sense for these filters.
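The following userspace C sketch illustrates such a bitmap. It keeps one bit per syscall number and tests it in constant time. The table size SKETCH_NR_SYSCALLS and the helper names are hypothetical, not the kernel's. With 512 entries, the whole bitmap fits in 64 bytes.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical syscall-table size; real tables differ per architecture. */
#define SKETCH_NR_SYSCALLS 512
#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITMAP_WORDS \
	((SKETCH_NR_SYSCALLS + SKETCH_BITS_PER_WORD - 1) / SKETCH_BITS_PER_WORD)

/* One bit per syscall number: bit n set means "syscall n is always allowed". */
struct syscall_bitmap {
	unsigned long words[SKETCH_BITMAP_WORDS];
};

/* Mark syscall nr as always allowed; out-of-range numbers are ignored. */
static void bitmap_allow(struct syscall_bitmap *bm, unsigned int nr)
{
	if (nr < SKETCH_NR_SYSCALLS)
		bm->words[nr / SKETCH_BITS_PER_WORD] |=
			1UL << (nr % SKETCH_BITS_PER_WORD);
}

/* Test the bit for nr; out-of-range numbers never hit the fast path. */
static bool bitmap_is_allowed(const struct syscall_bitmap *bm, unsigned int nr)
{
	if (nr >= SKETCH_NR_SYSCALLS)
		return false;
	return (bm->words[nr / SKETCH_BITS_PER_WORD] >>
		(nr % SKETCH_BITS_PER_WORD)) & 1UL;
}
```

For example, suppose a program calls bitmap_allow() for 0 and 1 (read and write on x86-64). A later check of 59 (execve on x86-64) still returns false, so that syscall would go through the full filter.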

The fast (common) path for seccomp should be that the filter permits
the syscall to pass through, and failing seccomp is expected to be
an exceptional case; it is not expected for userspace to call a
denylisted syscall over and over.

When it can be concluded that an allow must occur for the given
architecture and syscall pair (this determination is introduced in
the next commit), seccomp will immediately allow the syscall,
bypassing further BPF execution.
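The ordering above can be sketched as a hypothetical userspace model. It is not the kernel code. In this model, run_filters_slow() stands in for BPF execution and cache_says_allow() stands in for the bitmap lookup. The property the sketch illustrates is that a cache hit can only produce an allow. A miss never decides anything by itself and always falls through to the filters.

```c
#include <stdbool.h>

/* Hypothetical return codes; the real values live in <linux/seccomp.h>. */
enum sketch_action { SKETCH_ALLOW, SKETCH_DENY };

/* Counts how often the slow path runs, so the bypass is observable. */
static unsigned int slow_path_runs;

/* Hypothetical stand-in for running the full BPF filter chain:
 * here it allows syscall numbers below 10 and denies the rest. */
static enum sketch_action run_filters_slow(int nr)
{
	slow_path_runs++;
	return nr < 10 ? SKETCH_ALLOW : SKETCH_DENY;
}

/* Hypothetical cache: an allow is known in advance only for nr 0..3.
 * It must be a subset of what run_filters_slow() would allow anyway. */
static bool cache_says_allow(int nr)
{
	return nr >= 0 && nr < 4;
}

/* Fast path first: a cache hit returns allow without any BPF execution. */
static enum sketch_action seccomp_decide(int nr)
{
	if (cache_says_allow(nr))
		return SKETCH_ALLOW;
	return run_filters_slow(nr);
}
```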

Each architecture number has its own bitmap. The architecture
number in seccomp_data is first checked against the defined
architecture number constants. On a match, seccomp tests the bit in
that architecture's bitmap whose index is the syscall number. If
the bit is set, seccomp returns allow. The bitmaps are all clear in
this patch and will be initialized in the next commit.
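Per-architecture bitmaps matter because the same number names different syscalls on different ABIs. For example, write is 1 on x86-64 but 4 on i386. Below is a minimal sketch of the lookup. It assumes 64-entry tables, each stored in a single uint64_t. The struct and function names are hypothetical. The two arch constants copy the values of AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 from <linux/audit.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values of AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 from <linux/audit.h>,
 * copied here so the sketch builds without kernel headers. */
#define SKETCH_ARCH_NATIVE 0xC000003EU
#define SKETCH_ARCH_COMPAT 0x40000003U

/* Hypothetical table sizes, one bit per syscall number. */
#define SKETCH_NATIVE_NR 64
#define SKETCH_COMPAT_NR 64

/* Minimal stand-in for the nr/arch fields of struct seccomp_data. */
struct sketch_data {
	int nr;
	uint32_t arch;
};

/* One allow bitmap per architecture, all clear until filled in. */
struct sketch_cache {
	uint64_t allow_native;
	uint64_t allow_compat;
};

static bool test_allow_bit(uint64_t bitmap, int nr, int nr_max)
{
	if (nr < 0 || nr >= nr_max)
		return false; /* out of range: fall back to the filters */
	return (bitmap >> nr) & 1U;
}

/* Pick the bitmap matching sd->arch, then test the bit indexed by sd->nr. */
static bool cache_check_allow(const struct sketch_cache *c,
			      const struct sketch_data *sd)
{
	if (sd->arch == SKETCH_ARCH_NATIVE)
		return test_allow_bit(c->allow_native, sd->nr, SKETCH_NATIVE_NR);
	if (sd->arch == SKETCH_ARCH_COMPAT)
		return test_allow_bit(c->allow_compat, sd->nr, SKETCH_COMPAT_NR);
	return false; /* unknown arch: never take the fast path */
}
```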

When only one architecture exists, the check against the
architecture number is skipped, as suggested by Kees Cook [7].
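Below is a sketch of the single-architecture variant, using the same kind of hypothetical names and a 64-entry table. As we read it, the reasoning is that with only one syscall ABI, sd->arch cannot hold any other value. Comparing it would therefore be redundant work on the hot path, leaving only a range check and one bit test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-architecture build: only one syscall table exists. */
#define SKETCH_NATIVE_NR 64

/* Minimal stand-in for the nr/arch fields of struct seccomp_data. */
struct sketch_data {
	int nr;
	uint32_t arch;
};

/* With a single architecture, sd->arch is never compared:
 * the lookup is a range check plus one bit test. */
static bool cache_check_allow_native_only(uint64_t allow_native,
					  const struct sketch_data *sd)
{
	if (sd->nr < 0 || sd->nr >= SKETCH_NATIVE_NR)
		return false;
	return (allow_native >> sd->nr) & 1U;
}
```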

[1] https://lore.kernel.org/linux-security-module/c22a6c3cefc2412cad00ae14c1371711@huawei.com/T/
[2] https://lore.kernel.org/lkml/202005181120.971232B7B@keescook/T/
[3] https://github.com/seccomp/libseccomp/issues/116
[4] ae0ef82b90/profiles/seccomp/default.json
[5] 6743a1caf4/src/shared/seccomp-util.c (L270)
[6] Draco: Architectural and Operating System Support for System Call Security
    https://tianyin.github.io/pub/draco.pdf, MICRO-53, Oct. 2020
[7] https://lore.kernel.org/bpf/202010091614.8BB0EB64@keescook/

Co-developed-by: Dimitrios Skarlatos <dskarlat@cs.cmu.edu>
Signed-off-by: Dimitrios Skarlatos <dskarlat@cs.cmu.edu>
Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
Reviewed-by: Jann Horn <jannh@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/10f91a367ec4fcdea7fc3f086de3f5f13a4a7436.1602431034.git.yifeifz2@illinois.edu
2020-11-20 11:16:34 -08:00
bpf bpf: Update verification logic for LSM programs 2020-11-06 13:15:21 -08:00
cgroup kernel/: fix repeated words in comments 2020-10-16 11:11:19 -07:00
configs compiler: remove CONFIG_OPTIMIZE_INLINING entirely 2020-04-07 10:43:42 -07:00
debug kdb: Fix pager search for multi-line strings 2020-10-01 14:44:08 +01:00
dma swiotlb: remove the tbl_dma_addr argument to swiotlb_tbl_map_single 2020-11-02 10:10:39 -05:00
entry entry: Fix the incorrect ordering of lockdep and RCU check 2020-11-04 18:06:14 +01:00
events perf: Tweak perf_event_attr::exclusive semantics 2020-11-09 18:12:36 +01:00
gcov gcov: add support for GCC 10.1 2020-09-11 09:33:54 -07:00
irq A set of fixes for interrupt chip drivers: 2020-11-08 09:52:57 -08:00
kcsan kernel/: fix repeated words in comments 2020-10-16 11:11:19 -07:00
livepatch kernel/: fix repeated words in comments 2020-10-16 11:11:19 -07:00
locking lockdep: Avoid to modify chain keys in validate_chain() 2020-11-10 18:38:38 +01:00
power PM: sleep: fix typo in kernel/power/process.c 2020-10-27 19:11:44 +01:00
printk printk: ringbuffer: Replace zero-length array with flexible-array member 2020-10-30 16:57:42 -05:00
rcu arm64 fixes for -rc4 2020-11-13 09:23:10 -08:00
sched A set of scheduler fixes: 2020-11-15 09:39:35 -08:00
time time: Prevent undefined behaviour in timespec64_to_ns() 2020-10-26 11:48:11 +01:00
trace tracing: Make -ENOMEM the default error for parse_synth_field() 2020-11-02 15:58:32 -05:00
.gitignore
acct.c kernel: acct.c: fix some kernel-doc nits 2020-10-16 11:11:19 -07:00
async.c treewide: Remove uninitialized_var() usage 2020-07-16 12:35:15 -07:00
audit_fsnotify.c fsnotify: create method handle_inode_event() in fsnotify_operations 2020-07-27 23:25:50 +02:00
audit_tree.c 2020-08-06 19:29:51 -07:00
audit_watch.c fsnotify: create method handle_inode_event() in fsnotify_operations 2020-07-27 23:25:50 +02:00
audit.c audit: Remove redundant null check 2020-08-26 09:10:39 -04:00
audit.h audit: change unnecessary globals into statics 2020-08-17 20:26:58 -04:00
auditfilter.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
auditsc.c audit/stable-5.9 PR 20200803 2020-08-04 14:20:26 -07:00
backtracetest.c treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD() 2020-07-30 11:15:58 -07:00
bounds.c
capability.c LSM: Signal to SafeSetID when setting group IDs 2020-10-13 09:17:34 -07:00
compat.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
configs.c
context_tracking.c context_tracking: Ensure that the critical path cannot be instrumented 2020-06-11 15:14:36 +02:00
cpu_pm.c notifier: Fix broken error handling pattern 2020-09-01 09:58:03 +02:00
cpu.c The changes in this cycle are: 2020-06-03 13:06:42 -07:00
crash_core.c kdump: append kernel build-id string to VMCOREINFO 2020-08-12 10:58:01 -07:00
crash_dump.c crash_dump: Remove no longer used saved_max_pfn 2020-04-15 11:21:54 +02:00
cred.c exec: Teach prepare_exec_creds how exec treats uids & gids 2020-05-20 14:44:21 -05:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c don't dump the threads that had been already exiting when zapped. 2020-10-28 16:39:49 -04:00
extable.c kernel/extable.c: use address-of operator on section symbols 2020-04-07 10:43:42 -07:00
fail_function.c
fork.c fork: fix copy_process(CLONE_PARENT) race with the exiting ->real_parent 2020-11-08 11:18:39 -08:00
freezer.c
futex.c futex: Don't enable IRQs unconditionally in put_pi_state() 2020-11-09 14:30:30 +01:00
gen_kheaders.sh kbuild: add variables for compression tools 2020-06-06 23:42:01 +09:00
groups.c LSM: Signal to SafeSetID when setting group IDs 2020-10-13 09:17:34 -07:00
hung_task.c kernel/hung_task.c: make type annotations consistent 2020-11-02 12:14:19 -08:00
iomem.c
irq_work.c irq_work, smp: Allow irq_work on call_single_queue 2020-05-28 10:54:15 +02:00
jump_label.c kernel/: fix repeated words in comments 2020-10-16 11:11:19 -07:00
kallsyms.c treewide: Convert macro and uses of __section(foo) to __section("foo") 2020-10-25 14:51:49 -07:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: make some symbols static 2020-08-12 10:58:02 -07:00
kexec_core.c kernel/: fix repeated words in comments 2020-10-16 11:11:19 -07:00
kexec_elf.c
kexec_file.c kernel/resource: move and rename IORESOURCE_MEM_DRIVER_MANAGED 2020-10-16 11:11:18 -07:00
kexec_internal.h
kexec.c LSM: Introduce kernel_post_load_data() hook 2020-10-05 13:37:03 +02:00
kheaders.c
kmod.c kmod: remove redundant "be an" in the comment 2020-08-12 10:58:01 -07:00
kprobes.c kprobes: Tell lockdep about kprobe nesting 2020-11-04 09:46:06 -05:00
ksysfs.c
kthread.c kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled 2020-11-02 12:14:19 -08:00
latencytop.c sysctl: pass kernel pointers to ->proc_handler 2020-04-27 02:07:40 -04:00
Makefile Kbuild updates for v5.10 2020-10-22 13:13:57 -07:00
module_signature.c
module_signing.c
module-internal.h
module.c Modules updates for v5.10 2020-10-22 13:08:57 -07:00
notifier.c notifier: Fix broken error handling pattern 2020-09-01 09:58:03 +02:00
nsproxy.c nsproxy: support CLONE_NEWTIME with setns() 2020-07-08 11:14:22 +02:00
padata.c padata: fix possible padata_works_lock deadlock 2020-09-04 17:51:55 +10:00
panic.c panic: don't dump stack twice on warn 2020-11-14 11:26:04 -08:00
params.c params: Replace zero-length array with flexible-array member 2020-10-29 17:22:59 -05:00
pid_namespace.c kernel/: fix repeated words in comments 2020-10-16 11:11:19 -07:00
pid.c pid: move pidfd_get_pid() to pid.c 2020-10-18 09:27:10 -07:00
profile.c
ptrace.c
range.c kernel.h: split out min()/max() et al. helpers 2020-10-16 11:11:19 -07:00
reboot.c reboot: fix overflow parsing reboot cpu number 2020-11-14 11:26:03 -08:00
regset.c regset: kill ->get() 2020-07-27 14:31:12 -04:00
relay.c kernel/relay.c: drop unneeded initialization 2020-10-16 11:11:22 -07:00
resource.c kernel/resource: make iomem_resource implicit in release_mem_region_adjustable() 2020-10-16 11:11:18 -07:00
rseq.c
scftorture.c scftorture: Add cond_resched() to test loop 2020-08-24 18:38:38 -07:00
scs.c mm: memcontrol: account kernel stack per node 2020-08-07 11:33:25 -07:00
seccomp.c seccomp/cache: Lookup syscall allowlist bitmap for fast path 2020-11-20 11:16:34 -08:00
signal.c ptrace: fix task_join_group_stop() for the case when current is traced 2020-11-02 12:14:19 -08:00
smp.c Merge tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-10-18 14:34:50 -07:00
smpboot.c
smpboot.h
softirq.c softirq: Add debug check to __raise_softirq_irqoff() 2020-09-16 15:18:56 +02:00
stackleak.c stackleak: let stack_erasing_sysctl take a kernel pointer buffer 2020-09-19 13:13:39 -07:00
stacktrace.c stacktrace: Remove reliable argument from arch_stack_walk() callback 2020-09-18 14:24:16 +01:00
static_call.c static_call: Fix return type of static_call_init 2020-10-02 21:18:25 +02:00
stop_machine.c stop_machine, rcu: Mark functions as notrace 2020-10-26 12:12:27 +01:00
sys_ni.c mm/madvise: introduce process_madvise() syscall: an external memory hinting API 2020-10-18 09:27:10 -07:00
sys.c kernel/sys.c: fix prototype of prctl_get_tid_address() 2020-10-25 11:44:16 -07:00
sysctl-test.c
sysctl.c mm: allow a controlled amount of unfairness in the page lock 2020-09-17 10:26:41 -07:00
task_work.c task_work: cleanup notification modes 2020-10-17 15:05:30 -06:00
taskstats.c taskstats: move specifying netlink policy back to ops 2020-10-02 19:11:12 -07:00
test_kprobes.c
torture.c torture: Dump ftrace at shutdown only if requested 2020-06-29 12:01:45 -07:00
tracepoint.c tracepoint: Replace zero-length array with flexible-array member 2020-10-29 17:22:59 -05:00
tsacct.c
ucount.c ucount: Make sure ucounts in /proc/sys/user don't regress again 2020-04-07 21:51:27 +02:00
uid16.c
uid16.h
umh.c usermodehelper: reset umask to default before executing user process 2020-10-06 10:31:52 -07:00
up.c
user_namespace.c kernel/: fix repeated words in comments 2020-10-16 11:11:19 -07:00
user-return-notifier.c
user.c user.c: make uidhash_table static 2020-06-04 19:06:24 -07:00
usermode_driver.c umd: Stop using split_argv 2020-07-07 11:58:59 -05:00
utsname_sysctl.c sysctl: pass kernel pointers to ->proc_handler 2020-04-27 02:07:40 -04:00
utsname.c nsproxy: add struct nsset 2020-05-09 13:57:12 +02:00
watch_queue.c watch_queue: Limit the number of watches a user can hold 2020-08-17 09:39:18 -07:00
watchdog_hld.c
watchdog.c kernel/watchdog: fix watchdog_allowed_mask not used warning 2020-11-14 11:26:03 -08:00
workqueue_internal.h
workqueue.c workqueue: fix a kernel-doc warning 2020-10-16 07:28:20 +02:00