linux/kernel
Song Liu 615755a77b bpf: extend stackmap to save binary_build_id+offset instead of address
Currently, bpf stackmap store address for each entry in the call trace.
To map these addresses to user space files, it is necessary to maintain
the mapping from these virtual address to symbols in the binary. Usually,
the user space profiler (such as perf) has to scan /proc/pid/maps at the
beginning of profiling, and monitor mmap2() calls afterwards. Given the
cost of maintaining the address map, this solution is not practical for
system wide profiling that is always on.

This patch tries to solve this problem with a variation of stackmap. This
variation is enabled by flag BPF_F_STACK_BUILD_ID. Instead of storing
addresses, the variation stores ELF file build_id + offset.

Build ID is a 20-byte unique identifier for ELF files. The following
command shows the Build ID of /bin/bash:

  [user@]$ readelf -n /bin/bash
  ...
    Build ID: XXXXXXXXXX
  ...

With BPF_F_STACK_BUILD_ID, bpf_get_stackid() tries to parse Build ID
for each entry in the call trace, and translate it into the following
struct:

  struct bpf_stack_build_id_offset {
          __s32           status;
          unsigned char   build_id[BPF_BUILD_ID_SIZE];
          union {
                  __u64   offset;
                  __u64   ip;
          };
  };

The search of build_id is limited to the first page of the file, and this
page should be in page cache. Otherwise, we fallback to store ip for this
entry (ip field in struct bpf_stack_build_id_offset). This requires the
build_id to be stored in the first page. A quick survey of binary and
dynamic library files in a few different systems shows that almost all
binary and dynamic library files have build_id in the first page.

Build_id is only meaningful for user stack. If a kernel stack is added to
a stackmap with BPF_F_STACK_BUILD_ID, it will automatically fallback to
only store ip (status == BPF_STACK_BUILD_ID_IP). Similarly, if build_id
lookup failed for some reason, it will also fallback to store ip.

User space can access struct bpf_stack_build_id_offset with bpf
syscall BPF_MAP_LOOKUP_ELEM. It is necessary for user space to
maintain mapping from build id to binary files. This mostly static
mapping is much easier to maintain than per process address maps.

Note: Stackmap with build_id only works in non-nmi context at this time.
This is because we need to take mm->mmap_sem for find_vma(). If this
changes, we would like to allow build_id lookup in nmi context.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-15 01:09:28 +01:00
..
bpf bpf: extend stackmap to save binary_build_id+offset instead of address 2018-03-15 01:09:28 +01:00
cgroup kernel/cpuset: current_cpuset_is_being_rebound can be boolean 2018-02-06 18:32:47 -08:00
configs KVM changes for 4.16 2018-02-10 13:16:35 -08:00
debug signal: Simplify and fix kdb_send_sig 2018-01-03 18:01:08 -06:00
events vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
gcov License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
irq genirq/matrix: Handle CPU offlining proper 2018-02-22 22:05:43 +01:00
livepatch Merge branch 'for-4.16/remove-immediate' into for-linus 2018-01-31 16:36:38 +01:00
locking locking/qspinlock: Ensure node->count is updated before initialising node 2018-02-13 14:50:14 +01:00
power x86/power: Fix swsusp_arch_resume prototype 2018-02-02 23:33:50 +01:00
printk Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk 2018-03-01 10:06:39 -08:00
rcu SCSI misc on 20180131 2018-01-31 11:23:28 -08:00
sched sched/cpufreq: Remove unused SUGOV_KTHREAD_PRIORITY macro 2018-02-13 13:04:03 +01:00
time timers: Forward timer base before migrating timers 2018-02-28 23:34:33 +01:00
trace bpf: add support to read sample address in bpf program 2018-03-08 02:22:34 +01:00
.gitignore
acct.c kernel/acct.c: fix the acct->needcheck check in check_free_space() 2018-01-04 16:45:09 -08:00
async.c kernel/async.c: revert "async: simplify lowest_in_progress()" 2018-02-06 18:32:44 -08:00
audit_fsnotify.c
audit_tree.c Merge branch 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2017-11-14 14:08:20 -08:00
audit_watch.c audit/stable-4.13 PR 20170816 2017-08-16 16:48:34 -07:00
audit.c net: Convert audit_net_ops 2018-02-13 10:36:06 -05:00
audit.h audit/stable-4.15 PR 20171113 2017-11-15 13:28:48 -08:00
auditfilter.c audit: filter PATH records keyed on filesystem magic 2017-11-10 16:08:56 -05:00
auditsc.c audit/stable-4.15 PR 20171113 2017-11-15 13:28:48 -08:00
backtracetest.c
bounds.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
capability.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
compat.c cpumask: make cpumask_size() return "unsigned int" 2018-02-06 18:32:45 -08:00
configs.c
context_tracking.c
cpu_pm.c PM / CPU: replace raw_notifier with atomic_notifier 2017-07-31 13:09:49 +02:00
cpu.c Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-12-31 12:30:34 -08:00
crash_core.c kdump: write correct address of mem_section into vmcoreinfo 2018-01-13 10:42:48 -08:00
crash_dump.c
cred.c
delayacct.c delayacct: Account blkio completion on the correct task 2018-01-16 03:29:36 +01:00
dma.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
elfcore.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
exec_domain.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
exit.c kernel/exit.c: export abort() to modules 2018-01-04 16:45:09 -08:00
extable.c extable: Make init_kernel_text() global 2018-02-21 16:54:06 +01:00
fail_function.c error-injection: Support fault injection framework 2018-01-12 17:33:38 -08:00
fork.c include/linux/sched/mm.h: re-inline mmdrop() 2018-02-21 15:35:42 -08:00
freezer.c
futex_compat.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
futex.c pids: introduce find_get_task_by_vpid() helper 2018-02-06 18:32:46 -08:00
groups.c kernel: make groups_sort calling a responsibility group_info allocators 2017-12-14 16:00:49 -08:00
hung_task.c
irq_work.c irq/work: Improve the flag definitions 2018-01-08 19:43:15 +01:00
jump_label.c extable: Make init_kernel_text() global 2018-02-21 16:54:06 +01:00
kallsyms.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk 2018-02-01 13:36:15 -08:00
kcmp.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: detect double association with a single task 2018-02-06 18:32:46 -08:00
kexec_core.c x86/mm, kexec: Allow kexec to be used with SME 2017-07-18 11:38:04 +02:00
kexec_file.c resource: Provide resource struct in resource walk callback 2017-11-07 15:35:57 +01:00
kexec_internal.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
kexec.c kdump: protect vmcoreinfo data under the crash memory 2017-07-12 16:26:00 -07:00
kmod.c kmod: move #ifdef CONFIG_MODULES wrapper to Makefile 2017-09-08 18:26:51 -07:00
kprobes.c kprobes: Propagate error from disarm_kprobe_ftrace() 2018-02-16 09:12:58 +01:00
ksysfs.c kexec: move vmcoreinfo out of the kernel's .bss section 2017-07-12 16:25:59 -07:00
kthread.c treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts 2017-11-21 16:35:54 -08:00
latencytop.c
Makefile error-injection: Support fault injection framework 2018-01-12 17:33:38 -08:00
memremap.c memremap: fix softlockup reports at teardown 2018-03-02 19:34:50 -08:00
module_signing.c
module-internal.h
module.c Modules updates for v4.16 2018-02-07 14:29:34 -08:00
notifier.c
nsproxy.c
padata.c padata: add SPDX identifier 2018-01-05 18:43:00 +11:00
panic.c kernel/panic.c: add TAINT_AUX 2017-11-17 16:10:04 -08:00
params.c kernel/params.c: improve STANDARD_PARAM_DEF readability 2017-10-03 17:54:26 -07:00
pid_namespace.c pid: remove pidhash 2017-11-17 16:10:04 -08:00
pid.c pids: introduce find_get_task_by_vpid() helper 2018-02-06 18:32:46 -08:00
profile.c
ptrace.c pids: introduce find_get_task_by_vpid() helper 2018-02-06 18:32:46 -08:00
range.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
reboot.c kernel/reboot.c: add devm_register_reboot_notifier() 2017-11-17 16:10:04 -08:00
relay.c kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE 2018-02-21 15:35:43 -08:00
resource.c Merge branch 'akpm' (patches from Andrew) 2018-02-06 22:15:42 -08:00
seccomp.c - Fix seccomp GET_METADATA to deal with field sizes correctly (Tycho Andersen) 2018-02-22 10:50:24 -08:00
signal.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching 2018-01-31 13:02:18 -08:00
smp.c smp/core: Use lockdep to assert IRQs are disabled/enabled 2017-11-08 11:13:50 +01:00
smpboot.c watchdog/core, powerpc: Lock cpus across reconfiguration 2017-10-04 10:53:54 +02:00
smpboot.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
softirq.c softirq: Eliminate cond_resched_rcu_qs() in favor of cond_resched() 2017-12-04 10:28:58 -08:00
stacktrace.c
stop_machine.c
sys_ni.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sys.c fix typo in assignment of fs default overflow gid 2017-12-14 16:01:45 -06:00
sysctl_binary.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sysctl.c pipe: reject F_SETPIPE_SZ with size over UINT_MAX 2018-02-06 18:32:47 -08:00
task_work.c locking/barriers: Convert users of lockless_dereference() to READ_ONCE() 2017-12-17 13:57:15 +01:00
taskstats.c pids: introduce find_get_task_by_vpid() helper 2018-02-06 18:32:46 -08:00
test_kprobes.c kprobes: Disable the jprobes test code 2017-10-20 11:02:54 +02:00
torture.c torture: Save a line in stutter_wait(): while -> for 2017-12-11 09:18:30 -08:00
tracepoint.c tracepoint: Remove smp_read_barrier_depends() from comment 2017-12-04 10:52:56 -08:00
tsacct.c
ucount.c
uid16.c kernel: make groups_sort calling a responsibility group_info allocators 2017-12-14 16:00:49 -08:00
umh.c kernel/umh.c: optimize 'proc_cap_handler()' 2017-11-17 16:10:01 -08:00
up.c smp: Avoid using two cache lines for struct call_single_data 2017-08-29 15:14:38 +02:00
user_namespace.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2017-11-16 12:20:15 -08:00
user-return-notifier.c
user.c efivarfs: Limit the rate for non-root to read files 2018-02-22 10:21:02 -08:00
utsname_sysctl.c
utsname.c
watchdog_hld.c Merge branch 'linus' into core/urgent, to pick up dependent commits 2017-11-04 08:53:04 +01:00
watchdog.c Merge branch 'linus' into sched/core, to pick up fixes 2017-11-08 10:17:15 +01:00
workqueue_internal.h Merge branch 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2017-11-06 12:26:49 -08:00
workqueue.c Fixes for 4.16. I contains fixes for deadlock on runtime suspend on few 2018-02-22 08:39:26 +10:00