linux/kernel
Steven Rostedt f431b634f2 tracing/syscalls: Allow archs to ignore tracing compat syscalls
The tracing of ia32 compat system calls has been a bit of a pain as they
use different system call numbers than the 64bit equivalents.

I wrote a simple 'lls' program that lists files. I compiled it as a i686
ELF binary and ran it under a x86_64 box. This is the result:

echo 0 > /debug/tracing/tracing_on
echo 1 > /debug/tracing/events/syscalls/enable
echo 1 > /debug/tracing/tracing_on ; ./lls ; echo 0 > /debug/tracing/tracing_on

grep lls /debug/tracing/trace

[.. skipping calls before TS_COMPAT is set ...]

             lls-1127  [005] d...   936.409188: sys_recvfrom(fd: 0, ubuf: 4d560fc4, size: 0, flags: 8048034, addr: 8, addr_len: f7700420)
             lls-1127  [005] d...   936.409190: sys_recvfrom -> 0x8a77000
             lls-1127  [005] d...   936.409211: sys_lgetxattr(pathname: 0, name: 1000, value: 3, size: 22)
             lls-1127  [005] d...   936.409215: sys_lgetxattr -> 0xf76ff000
             lls-1127  [005] d...   936.409223: sys_dup2(oldfd: 4d55ae9b, newfd: 4)
             lls-1127  [005] d...   936.409228: sys_dup2 -> 0xfffffffffffffffe
             lls-1127  [005] d...   936.409236: sys_newfstat(fd: 4d55b085, statbuf: 80000)
             lls-1127  [005] d...   936.409242: sys_newfstat -> 0x3
             lls-1127  [005] d...   936.409243: sys_removexattr(pathname: 3, name: ffcd0060)
             lls-1127  [005] d...   936.409244: sys_removexattr -> 0x0
             lls-1127  [005] d...   936.409245: sys_lgetxattr(pathname: 0, name: 19614, value: 1, size: 2)
             lls-1127  [005] d...   936.409248: sys_lgetxattr -> 0xf76e5000
             lls-1127  [005] d...   936.409248: sys_newlstat(filename: 3, statbuf: 19614)
             lls-1127  [005] d...   936.409249: sys_newlstat -> 0x0
             lls-1127  [005] d...   936.409262: sys_newfstat(fd: f76fb588, statbuf: 80000)
             lls-1127  [005] d...   936.409279: sys_newfstat -> 0x3
             lls-1127  [005] d...   936.409279: sys_close(fd: 3)
             lls-1127  [005] d...   936.421550: sys_close -> 0x200
             lls-1127  [005] d...   936.421558: sys_removexattr(pathname: 3, name: ffcd00d0)
             lls-1127  [005] d...   936.421560: sys_removexattr -> 0x0
             lls-1127  [005] d...   936.421569: sys_lgetxattr(pathname: 4d564000, name: 1b1abc, value: 5, size: 802)
             lls-1127  [005] d...   936.421574: sys_lgetxattr -> 0x4d564000
             lls-1127  [005] d...   936.421575: sys_capget(header: 4d70f000, dataptr: 1000)
             lls-1127  [005] d...   936.421580: sys_capget -> 0x0
             lls-1127  [005] d...   936.421580: sys_lgetxattr(pathname: 4d710000, name: 3000, value: 3, size: 812)
             lls-1127  [005] d...   936.421589: sys_lgetxattr -> 0x4d710000
             lls-1127  [005] d...   936.426130: sys_lgetxattr(pathname: 4d713000, name: 2abc, value: 3, size: 32)
             lls-1127  [005] d...   936.426141: sys_lgetxattr -> 0x4d713000
             lls-1127  [005] d...   936.426145: sys_newlstat(filename: 3, statbuf: f76ff3f0)
             lls-1127  [005] d...   936.426146: sys_newlstat -> 0x0
             lls-1127  [005] d...   936.431748: sys_lgetxattr(pathname: 0, name: 1000, value: 3, size: 22)

Obviously I'm not calling newfstat with a fd of 4d55b085. The calls are
obviously incorrect, and confusing.

Other efforts have been made to fix this:

https://lkml.org/lkml/2012/3/26/367

But the real solution is to rewrite the syscall internals and come up
with a fixed solution. One that doesn't require all the kluge that the
current solution has.

Thus for now, instead of outputting incorrect data, simply ignore them.
With this patch the changes now have:

 #> grep lls /debug/tracing/trace
 #>

Compat system calls simply are not traced. If users need compat
syscalls, then they should just use the raw syscall tracepoints.

For an architecture to make their compat syscalls ignored, it must
define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS (done in asm/ftrace.h) and also
define an arch_trace_is_compat_syscall() function that will return true
if the current task should ignore tracing the syscall.

I want to stress that this change does not affect actual syscalls in any
way, shape or form. It is only used within the tracing system and
doesn't interfere with the syscall logic at all. The changes are
consolidated nicely into trace_syscalls.c and asm/ftrace.h.

I had to make one small modification to asm/thread_info.h and that was
to remove the include of asm/ftrace.h. As asm/ftrace.h required the
current_thread_info() it was causing include hell. That include was
added back in 2008 when the function graph tracer was added:

 commit caf4b323 "tracing, x86: add low level support for ftrace return tracing"

It does not need to be included there.

Link: http://lkml.kernel.org/r/1360703939.21867.99.camel@gandalf.local.home

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-02-12 17:46:28 -05:00
..
debug module: add new state MODULE_STATE_UNFORMED. 2013-01-12 13:27:05 +10:30
events uprobes: remove redundant check 2013-01-24 16:40:15 -03:00
gcov
irq irq: tsk->comm is an array 2012-12-18 15:02:11 -08:00
power Merge branch 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2012-12-12 08:18:24 -08:00
sched wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task 2013-01-22 10:08:17 -08:00
time Merge tag 'kvm-3.8-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm 2012-12-13 15:31:08 -08:00
trace tracing/syscalls: Allow archs to ignore tracing compat syscalls 2013-02-12 17:46:28 -05:00
.gitignore
acct.c vfs: make path_openat take a struct filename pointer 2012-10-12 20:15:09 -04:00
async.c async: fix __lowest_in_progress() 2013-01-22 16:21:24 -08:00
audit_tree.c audit: catch possible NULL audit buffers 2013-01-11 14:54:55 -08:00
audit_watch.c audit: catch possible NULL audit buffers 2013-01-11 14:54:55 -08:00
audit.c kernel/audit.c: avoid negative sleep durations 2013-01-11 14:54:56 -08:00
audit.h audit: optimize audit_compare_dname_path 2012-10-12 00:32:02 -04:00
auditfilter.c audit: fix auditfilter.c kernel-doc warnings 2013-01-10 14:35:23 -08:00
auditsc.c audit: catch possible NULL audit buffers 2013-01-11 14:54:55 -08:00
backtracetest.c
bounds.c
capability.c userns: Teach inode_capable to understand inodes whose uids map to other namespaces. 2012-05-15 14:59:24 -07:00
cgroup_freezer.c cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free() 2012-11-19 08:13:38 -08:00
cgroup.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-12-17 20:58:12 -08:00
compat.c x32: fix sigtimedwait 2012-12-26 01:15:03 -05:00
configs.c
context_tracking.c context_tracking: New context tracking susbsystem 2012-11-30 11:40:07 -08:00
cpu_pm.c kernel/cpu_pm.c: fix various typos 2012-05-31 17:49:27 -07:00
cpu.c Merge branch 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-12-11 19:56:33 -08:00
cpuset.c cpuset: use N_MEMORY instead N_HIGH_MEMORY 2012-12-12 17:38:32 -08:00
crash_dump.c
cred.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2012-12-18 10:55:28 -08:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2012-12-17 15:44:47 -08:00
extable.c extable: Skip sorting if sorted at build time. 2012-04-19 15:06:55 -07:00
fork.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2013-01-20 13:58:48 -08:00
freezer.c freezer: change ptrace_stop/do_signal_stop to use freezable_schedule() 2012-10-26 14:27:49 -07:00
futex_compat.c
futex.c futex: avoid wake_futex() for a PI futex_q 2012-11-26 17:41:24 -08:00
groups.c userns: Convert in_group_p and in_egroup_p to use kgid_t 2012-05-03 03:29:33 -07:00
hrtimer.c hrtimer: Update hrtimer base offsets each hrtimer_interrupt 2012-07-11 23:34:39 +02:00
hung_task.c hung task debugging: Inject NMI when hung and going to panic 2012-04-25 12:39:25 +02:00
irq_work.c irq_work: fix compile failure on tile from missing include 2012-04-13 13:15:16 -04:00
itimer.c itimer: Use printk_once instead of WARN_ONCE 2012-04-10 11:00:30 +02:00
jump_label.c jump_label: Export jump_label_rate_limit() 2012-08-06 19:00:35 +03:00
kallsyms.c vsprintf: fix %ps on non symbols when using kallsyms 2012-05-29 16:22:32 -07:00
kcmp.c kcmp: include linux/ptrace.h 2012-12-20 17:40:19 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking: Adjust spin lock inlining Kconfig options 2012-09-13 17:56:13 +02:00
Kconfig.preempt
kexec.c kdump: remove unneeded include 2012-10-06 03:05:19 +09:00
kfifo.c [media] kernel:kfifo: export __kfifo_max_r symbol 2012-04-11 18:24:37 -03:00
kmod.c Bury the conditionals from kernel_thread/kernel_execve series 2012-12-19 18:07:38 -05:00
kprobes.c kprobes/x86: Move ftrace-based kprobe code into kprobes-ftrace.c 2013-01-21 13:22:36 -05:00
ksysfs.c Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-12-11 18:10:49 -08:00
kthread.c kthread: use N_MEMORY instead N_HIGH_MEMORY 2012-12-12 17:38:33 -08:00
latencytop.c
lglock.c brlocks/lglocks: turn into functions 2012-05-29 23:28:41 -04:00
lockdep_internals.h
lockdep_proc.c lockdep: Use KSYM_NAME_LEN'ed buffer for __get_key_name() 2012-10-24 12:39:09 +02:00
lockdep_states.h
lockdep.c lockdep: Check if nested lock is actually held 2012-09-13 17:00:44 +02:00
Makefile Nothing all that exciting; a new module-from-fd syscall for those who want 2012-12-19 07:55:08 -08:00
modsign_certificate.S MODSIGN: Avoid using .incbin in C source 2012-12-14 13:06:44 +10:30
modsign_pubkey.c keys: use keyring_alloc() to create module signing keyring 2012-12-20 17:40:21 -08:00
module_signing.c MODSIGN: Don't use enum-type bitfields in module signature info block 2012-12-05 11:27:24 +10:30
module-internal.h MODSIGN: Move the magic string to the end of a module and eliminate the search 2012-10-19 17:30:40 -07:00
module.c module: fix missing module_mutex unlock 2013-01-20 20:22:58 -08:00
mutex-debug.c
mutex-debug.h
mutex.c
mutex.h
notifier.c
nsproxy.c userns: Implement unshare of the user namespace 2012-11-20 04:18:14 -08:00
padata.c padata: use __this_cpu_read per-cpu helper 2012-12-06 17:16:23 +08:00
panic.c panic: fix a possible deadlock in panic() 2012-07-30 17:25:13 -07:00
params.c params: replace printk(KERN_<LVL>...) with pr_<lvl>(...) 2012-05-04 17:28:18 -07:00
pid_namespace.c pidns: Stop pid allocation when init dies 2012-12-25 16:10:05 -08:00
pid.c pidns: Stop pid allocation when init dies 2012-12-25 16:10:05 -08:00
posix-cpu-timers.c A few /dev/random improvements for the v3.8 merge window. 2012-12-19 20:23:37 -08:00
posix-timers.c
printk.c printk: fix incorrect length from print_time() when seconds > 99999 2013-01-04 16:11:48 -08:00
profile.c profiling: Remove unused timer hook 2013-01-24 15:37:26 +01:00
ptrace.c ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL 2013-01-22 10:08:00 -08:00
range.c
rcu.h rcu: Add a module parameter to force use of expedited RCU primitives 2012-10-23 14:54:08 -07:00
rcupdate.c rcu: Add a module parameter to force use of expedited RCU primitives 2012-10-23 14:54:08 -07:00
rcutiny_plugin.h rcu: Add a module parameter to force use of expedited RCU primitives 2012-10-23 14:54:08 -07:00
rcutiny.c rcu: Fix TINY_RCU rcu_is_cpu_rrupt_from_idle check 2012-11-13 14:08:34 -08:00
rcutorture.c Merge branches 'urgent.2012.10.27a', 'doc.2012.11.16a', 'fixes.2012.11.13a', 'srcu.2012.10.27a', 'stall.2012.11.13a', 'tracing.2012.11.08a' and 'idle.2012.10.24a' into HEAD 2012-11-16 09:59:58 -08:00
rcutree_plugin.h rcu: Separate accounting of callbacks from callback-free CPUs 2012-11-16 10:05:57 -08:00
rcutree_trace.c rcu: Separate accounting of callbacks from callback-free CPUs 2012-11-16 10:05:57 -08:00
rcutree.c context_tracking: New context tracking susbsystem 2012-11-30 11:40:07 -08:00
rcutree.h rcu: Separate accounting of callbacks from callback-free CPUs 2012-11-16 10:05:57 -08:00
relay.c splice: fix racy pipe->buffers uses 2012-06-13 21:16:42 +02:00
res_counter.c res_counter: return amount of charges after res_counter_uncharge() 2012-12-18 15:02:12 -08:00
resource.c kernel/resource.c: fix stack overflow in __reserve_region_with_split() 2012-10-06 03:05:31 +09:00
rtmutex_common.h
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c
rtmutex.h
rwsem.c lockdep, rwsem: provide down_write_nest_lock() 2013-01-11 14:54:55 -08:00
seccomp.c seccomp: Make syscall skipping and nr changes more consistent 2012-10-02 21:14:29 +10:00
semaphore.c semaphore: fix improper comment reference to mutex 2012-04-05 17:15:55 -07:00
signal.c ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL 2013-01-22 10:08:00 -08:00
smp.c smp: Remove ipi_call_lock[_irq]()/ipi_call_unlock[_irq]() 2012-06-05 17:27:14 +02:00
smpboot.c hotplug: Fix UP bug in smpboot hotplug code 2012-08-13 17:01:07 +02:00
smpboot.h smpboot: Provide infrastructure for percpu hotplug threads 2012-08-13 17:01:07 +02:00
softirq.c cputime: Specialize irq vtime hooks 2012-10-29 21:31:32 +01:00
spinlock.c
srcu.c Merge branches 'urgent.2012.10.27a', 'doc.2012.11.16a', 'fixes.2012.11.13a', 'srcu.2012.10.27a', 'stall.2012.11.13a', 'tracing.2012.11.08a' and 'idle.2012.10.24a' into HEAD 2012-11-16 09:59:58 -08:00
stacktrace.c
stop_machine.c
sys_ni.c module: add syscall to load module from fd 2012-12-14 13:05:22 +10:30
sys.c cputime: Rename thread_group_times to thread_group_cputime_adjusted 2012-11-28 17:07:57 +01:00
sysctl_binary.c pidns: Use task_active_pid_ns where appropriate 2012-11-19 05:59:09 -08:00
sysctl.c Automatic NUMA Balancing V11 2012-12-16 15:18:08 -08:00
task_work.c task_work: task_work_add() should not succeed after exit_task_work() 2012-09-13 16:47:34 +02:00
taskstats.c taskstats: cgroupstats_user_cmd() may leak on error 2012-10-06 03:05:31 +09:00
test_kprobes.c
time.c time: Move update_vsyscall definitions to timekeeper_internal.h 2012-09-24 12:38:06 -04:00
timeconst.pl
timer.c timers: Fix endless looping between cascade() and internal_add_timer() 2012-10-09 21:27:14 +02:00
tracepoint.c
tsacct.c userns: Convert taskstats to handle the user and pid namespaces. 2012-09-18 01:01:32 -07:00
uid16.c userns: Convert setting and getting uid and gid system calls to use kuid and kgid 2012-05-03 03:28:41 -07:00
up.c
user_namespace.c userns: Fix typo in description of the limitation of userns_install 2012-12-14 18:36:36 -08:00
user-return-notifier.c
user.c proc: Usable inode numbers for the namespace file descriptors. 2012-11-20 04:19:49 -08:00
utsname_sysctl.c
utsname.c userns: Require CAP_SYS_ADMIN for most uses of setns. 2012-12-14 16:12:03 -08:00
wait.c propagate name change to comments in kernel source 2012-12-06 10:39:54 +01:00
watchdog.c watchdog: Fix disable/enable regression 2012-12-19 12:10:33 -08:00
workqueue_sched.h
workqueue.c Merge branch 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2012-12-12 08:15:13 -08:00