perf trace


  • raw_syscalls:sys_enter and sys_exit
  • strace like
  • perf targets: system wide, CPU, cgroups, etc
  • Much lower overhead than strace
  • Mix and match with other tracepoints, kprobes, uprobes, etc
  • But only integer args

Syscall pointer args: 1st "solution"


  • kprobes at strategic locations
  • getname_flags() for filenames
  • Needs to combine syscall event with kprobes one
  • fragile: probe location changes over time

Augmented Syscalls: eBPF


  • Hooks into raw_syscalls:sys_enter
  • Appends pointer contents to existing payload
  • Existing beautifiers changed to use it
  • When available, continue working without it
  • No need for probe:vfs_getname
  • Filters
  • Extend the kernel without writing "kernel" code

Augmented Syscalls


  • bpf-output event
  • bpf_perf_event_output
  • bpf_probe_read
  • bpf_probe_read_str

Augmenting syscalls


struct bpf_map SEC("maps") __augmented__ = {
       .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
}
int syscall_enter(syscall)(struct syscall_enter_open_args *args)
{
	struct augmented_open_args augmented;
	probe_read(&augmented.args, sizeof(augmented.args), args);
	str_len = probe_read_str(&augmented.filename.value,
				 sizeof(augmented.filename.value),
				 args->filename_ptr);
	perf_event_output(args, &__augmented__, BPF_F_CURRENT_CPU,
			  &augmented, sizeof(augmented));
	return 0;
}
					

Augmented Syscalls


# cd tools/perf/examples/bpf/
# perf trace -e augmented_syscalls.c cat hello.c 
#include <stdio.h>

int syscall_enter(openat)(void *args)
{
	puts("Hello, world\n");
	return 0;
}

license(GPL);
 0.000 cat/31285 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC)
 0.029 cat/31285 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC)
 0.250 cat/31285 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC)
 0.300 cat/31285 openat(dfd: CWD, filename: hello.c)
#
					

Further details: maps


  • syscall numbers loaded from userspace
  • syscall arg types in another map
  • together with how many bytes to copy
  • types: kernel BTF section
  • syscalls to trace: in a map too (done)
  • One kernel .o

Difficulties: real or feared


  • clang optimizations versus validator
  • size constraints
  • error reporting (EPERM? ENOMEM?)
  • string operations

Difficulties: real


  • barriers to avoid combining ctx accesses
  • size constraints: improve libbpf/verifier messages
  • clang-tidy

TODO


  • BTF struct auto-beautifier in 'perf trace'
  • mapping of --filter-pids to BPF (done)
  • syscall arg filters (should be done)
  • Merge Jiri Olsa's 'perf bpf'
  • Userspace/stap VM, access to perf userspace functions