License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side-by-side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0                                               11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note", otherwise it was "GPL-2.0". The results were:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         270
GPL-2.0+ WITH Linux-syscall-note                        169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)      21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)      17
LGPL-2.1+ WITH Linux-syscall-note                        15
GPL-1.0+ WITH Linux-syscall-note                         14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)      5
LGPL-2.0+ WITH Linux-syscall-note                         4
LGPL-2.1 WITH Linux-syscall-note                          3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)                3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)               1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
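The first heuristic in the list above (no license traces found by either scanner) can be sketched roughly as follows. This is only an illustration, not the script used to generate the patches: the function name `default_spdx_id` and the plain substring test for `/uapi/` are assumptions made for this example.

```c
#include <string.h>

/*
 * Hypothetical helper: pick the default SPDX identifier for a file in
 * which neither scanner found any license text. Paths containing a
 * "/uapi/" component get the syscall-note exception; everything else
 * falls back to the license of the top-level COPYING file.
 */
static const char *default_spdx_id(const char *path)
{
	if (strstr(path, "/uapi/") != NULL)
		return "GPL-2.0 WITH Linux-syscall-note";
	return "GPL-2.0";
}
```

Under these assumptions, `default_spdx_id("kernel/sys_ni.c")` would select the plain "GPL-2.0" identifier.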
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
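The header versus source distinction mentioned above could look roughly like the sketch below. It is not the script Thomas and Greg wrote; `spdx_tag_line` is a hypothetical helper, and the rule it applies (block comment for `.h`, `//` comment for everything else) is an assumption for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the SPDX tag line for a file.
 * Assumption for this sketch: ".h" files get a block comment and every
 * other file gets a "//" comment, as seen at the top of .c sources.
 * Returns the number of characters snprintf() would have written.
 */
static int spdx_tag_line(const char *path, const char *id,
			 char *buf, size_t len)
{
	const char *dot = strrchr(path, '.');

	if (dot != NULL && strcmp(dot, ".h") == 0)
		return snprintf(buf, len, "/* SPDX-License-Identifier: %s */", id);
	return snprintf(buf, len, "// SPDX-License-Identifier: %s", id);
}
```

A real tool would also need to handle Makefiles, assembly and scripts, which use other comment syntaxes; those are left out here.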
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 17:07:57 +03:00
// SPDX-License-Identifier: GPL-2.0
2005-04-17 02:20:36 +04:00
# include <linux/linkage.h>
# include <linux/errno.h>
# include <asm/unistd.h>
2007-10-17 10:29:25 +04:00
/* we can't #include <linux/syscalls.h> here,
   but tell gcc to not warn with -Wmissing-prototypes  */
asmlinkage long sys_ni_syscall ( void ) ;
2005-04-17 02:20:36 +04:00
/*
 * Non-implemented system calls get redirected here.
*/
asmlinkage long sys_ni_syscall ( void )
{
return - ENOSYS ;
}
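The redirection to this stub can be pictured with weak symbols. The following is a user-space sketch only, assuming GCC or Clang on an ELF target; `demo_ni_syscall`, `demo_cond_syscall` and `demo_sys_absent` are names invented for this example, and the kernel's own cond_syscall() macro is architecture-specific and differs in detail.

```c
#include <errno.h>

/* Fallback that every unimplemented entry point resolves to. */
long demo_ni_syscall(void)
{
	return -ENOSYS;
}

/*
 * Hypothetical stand-in for cond_syscall(): define the symbol as a weak
 * alias of the fallback. A strong definition of the same name elsewhere
 * in the program would take precedence at link time.
 */
#define demo_cond_syscall(name) \
	long name(void) __attribute__((weak, alias("demo_ni_syscall")))

demo_cond_syscall(demo_sys_absent);
```

Because no strong definition of `demo_sys_absent` exists in this sketch, calling it runs the fallback and yields `-ENOSYS`.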
2018-03-06 21:53:01 +03:00
/*
 * This list is kept in the same order as include/uapi/asm-generic/unistd.h.
 * Architecture specific entries go below, followed by deprecated or obsolete
 * system calls.
*/
cond_syscall ( sys_io_setup ) ;
cond_syscall ( compat_sys_io_setup ) ;
cond_syscall ( sys_io_destroy ) ;
cond_syscall ( sys_io_submit ) ;
cond_syscall ( compat_sys_io_submit ) ;
cond_syscall ( sys_io_cancel ) ;
cond_syscall ( sys_io_getevents ) ;
cond_syscall ( compat_sys_io_getevents ) ;
/* fs/xattr.c */
/* fs/dcache.c */
/* fs/dcookies.c */
2005-04-17 02:20:36 +04:00
cond_syscall ( sys_lookup_dcookie ) ;
2013-02-26 03:42:04 +04:00
cond_syscall ( compat_sys_lookup_dcookie ) ;
2018-03-06 21:53:01 +03:00
/* fs/eventfd.c */
cond_syscall ( sys_eventfd2 ) ;
/* fs/eventpoll.c */
cond_syscall ( sys_epoll_create1 ) ;
cond_syscall ( sys_epoll_ctl ) ;
cond_syscall ( sys_epoll_pwait ) ;
cond_syscall ( compat_sys_epoll_pwait ) ;
/* fs/fcntl.c */
/* fs/inotify_user.c */
cond_syscall ( sys_inotify_init1 ) ;
cond_syscall ( sys_inotify_add_watch ) ;
cond_syscall ( sys_inotify_rm_watch ) ;
/* fs/ioctl.c */
/* fs/ioprio.c */
cond_syscall ( sys_ioprio_set ) ;
cond_syscall ( sys_ioprio_get ) ;
/* fs/locks.c */
cond_syscall ( sys_flock ) ;
/* fs/namei.c */
/* fs/namespace.c */
/* fs/nfsctl.c */
/* fs/open.c */
/* fs/pipe.c */
/* fs/quota.c */
cond_syscall ( sys_quotactl ) ;
/* fs/readdir.c */
/* fs/read_write.c */
/* fs/sendfile.c */
/* fs/select.c */
/* fs/signalfd.c */
cond_syscall ( sys_signalfd4 ) ;
cond_syscall ( compat_sys_signalfd4 ) ;
/* fs/splice.c */
/* fs/stat.c */
/* fs/sync.c */
/* fs/timerfd.c */
cond_syscall ( sys_timerfd_create ) ;
cond_syscall ( sys_timerfd_settime ) ;
cond_syscall ( compat_sys_timerfd_settime ) ;
cond_syscall ( sys_timerfd_gettime ) ;
cond_syscall ( compat_sys_timerfd_gettime ) ;
/* fs/utimes.c */
/* kernel/acct.c */
cond_syscall ( sys_acct ) ;
/* kernel/capability.c */
cond_syscall ( sys_capget ) ;
cond_syscall ( sys_capset ) ;
/* kernel/exec_domain.c */
/* kernel/exit.c */
/* kernel/fork.c */
/* kernel/futex.c */
cond_syscall ( sys_futex ) ;
cond_syscall ( compat_sys_futex ) ;
cond_syscall ( sys_set_robust_list ) ;
cond_syscall ( compat_sys_set_robust_list ) ;
cond_syscall ( sys_get_robust_list ) ;
cond_syscall ( compat_sys_get_robust_list ) ;
/* kernel/hrtimer.c */
/* kernel/itimer.c */
/* kernel/kexec.c */
2005-06-26 01:57:52 +04:00
cond_syscall ( sys_kexec_load ) ;
cond_syscall ( compat_sys_kexec_load ) ;
2018-03-06 21:53:01 +03:00
/* kernel/module.c */
2005-04-17 02:20:36 +04:00
cond_syscall ( sys_init_module ) ;
cond_syscall ( sys_delete_module ) ;
2018-03-06 21:53:01 +03:00
/* kernel/posix-timers.c */
/* kernel/printk.c */
cond_syscall ( sys_syslog ) ;
/* kernel/ptrace.c */
/* kernel/sched/core.c */
/* kernel/signal.c */
/* kernel/sys.c */
cond_syscall ( sys_setregid ) ;
cond_syscall ( sys_setgid ) ;
cond_syscall ( sys_setreuid ) ;
cond_syscall ( sys_setuid ) ;
cond_syscall ( sys_setresuid ) ;
cond_syscall ( sys_getresuid ) ;
cond_syscall ( sys_setresgid ) ;
cond_syscall ( sys_getresgid ) ;
cond_syscall ( sys_setfsuid ) ;
cond_syscall ( sys_setfsgid ) ;
cond_syscall ( sys_setgroups ) ;
cond_syscall ( sys_getgroups ) ;
/* kernel/time.c */
/* kernel/timer.c */
/* ipc/mqueue.c */
cond_syscall ( sys_mq_open ) ;
cond_syscall ( compat_sys_mq_open ) ;
cond_syscall ( sys_mq_unlink ) ;
cond_syscall ( sys_mq_timedsend ) ;
cond_syscall ( compat_sys_mq_timedsend ) ;
cond_syscall ( sys_mq_timedreceive ) ;
cond_syscall ( compat_sys_mq_timedreceive ) ;
cond_syscall ( sys_mq_notify ) ;
cond_syscall ( compat_sys_mq_notify ) ;
cond_syscall ( sys_mq_getsetattr ) ;
cond_syscall ( compat_sys_mq_getsetattr ) ;
/* ipc/msg.c */
cond_syscall ( sys_msgget ) ;
cond_syscall ( sys_msgctl ) ;
cond_syscall ( compat_sys_msgctl ) ;
cond_syscall ( sys_msgrcv ) ;
cond_syscall ( compat_sys_msgrcv ) ;
cond_syscall ( sys_msgsnd ) ;
cond_syscall ( compat_sys_msgsnd ) ;
/* ipc/sem.c */
cond_syscall ( sys_semget ) ;
cond_syscall ( sys_semctl ) ;
cond_syscall ( compat_sys_semctl ) ;
cond_syscall ( sys_semtimedop ) ;
cond_syscall ( compat_sys_semtimedop ) ;
cond_syscall ( sys_semop ) ;
/* ipc/shm.c */
cond_syscall ( sys_shmget ) ;
cond_syscall ( sys_shmctl ) ;
cond_syscall ( compat_sys_shmctl ) ;
cond_syscall ( sys_shmat ) ;
cond_syscall ( compat_sys_shmat ) ;
cond_syscall ( sys_shmdt ) ;
/* net/socket.c */
cond_syscall ( sys_socket ) ;
2005-04-17 02:20:36 +04:00
cond_syscall ( sys_socketpair ) ;
cond_syscall ( sys_bind ) ;
cond_syscall ( sys_listen ) ;
cond_syscall ( sys_accept ) ;
cond_syscall ( sys_connect ) ;
cond_syscall ( sys_getsockname ) ;
cond_syscall ( sys_getpeername ) ;
cond_syscall ( sys_setsockopt ) ;
2007-10-29 10:54:39 +03:00
cond_syscall ( compat_sys_setsockopt ) ;
2005-04-17 02:20:36 +04:00
cond_syscall ( sys_getsockopt ) ;
2007-10-29 10:54:39 +03:00
cond_syscall ( compat_sys_getsockopt ) ;
2018-03-06 21:53:01 +03:00
cond_syscall ( sys_sendto ) ;
2005-04-17 02:20:36 +04:00
cond_syscall ( sys_shutdown ) ;
2018-03-06 21:53:01 +03:00
cond_syscall ( sys_recvfrom ) ;
cond_syscall ( compat_sys_recvfrom ) ;
2005-04-17 02:20:36 +04:00
cond_syscall ( sys_sendmsg ) ;
2007-10-29 10:54:39 +03:00
cond_syscall ( compat_sys_sendmsg ) ;
2005-04-17 02:20:36 +04:00
cond_syscall ( sys_recvmsg ) ;
2007-10-29 10:54:39 +03:00
cond_syscall ( compat_sys_recvmsg ) ;
2018-03-06 21:53:01 +03:00
/* mm/filemap.c */
/* mm/nommu.c, also with MMU */
cond_syscall ( sys_mremap ) ;
/* security/keys/keyctl.c */
2005-04-17 02:20:36 +04:00
cond_syscall ( sys_add_key ) ;
cond_syscall ( sys_request_key ) ;
cond_syscall ( sys_keyctl ) ;
cond_syscall ( compat_sys_keyctl ) ;
2018-03-06 21:53:01 +03:00
/* arch/example/kernel/sys_example.c */
2006-04-11 09:53:06 +04:00
2018-03-06 21:53:01 +03:00
/* mm/fadvise.c */
cond_syscall ( sys_fadvise64_64 ) ;
/* mm/, CONFIG_MMU only */
cond_syscall ( sys_swapon ) ;
cond_syscall ( sys_swapoff ) ;
2006-04-11 09:53:06 +04:00
cond_syscall ( sys_mprotect ) ;
cond_syscall ( sys_msync ) ;
cond_syscall ( sys_mlock ) ;
cond_syscall ( sys_munlock ) ;
cond_syscall ( sys_mlockall ) ;
cond_syscall ( sys_munlockall ) ;
cond_syscall ( sys_mincore ) ;
cond_syscall ( sys_madvise ) ;
cond_syscall ( sys_remap_file_pages ) ;
2018-03-06 21:53:01 +03:00
cond_syscall ( sys_mbind ) ;
cond_syscall ( compat_sys_mbind ) ;
cond_syscall ( sys_get_mempolicy ) ;
cond_syscall ( compat_sys_get_mempolicy ) ;
cond_syscall ( sys_set_mempolicy ) ;
cond_syscall ( compat_sys_set_mempolicy ) ;
cond_syscall ( sys_migrate_pages ) ;
2006-11-03 09:07:24 +03:00
cond_syscall ( compat_sys_migrate_pages ) ;
2018-03-06 21:53:01 +03:00
cond_syscall ( sys_move_pages ) ;
cond_syscall ( compat_sys_move_pages ) ;
[PATCH] BLOCK: Make it possible to disable the block layer [try #6]
Make it possible to disable the block layer. Not all embedded devices require
it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
the block layer to be present.
This patch does the following:
(*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
support.
(*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
an item that uses the block layer. This includes:
(*) Block I/O tracing.
(*) Disk partition code.
(*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.
(*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
block layer to do scheduling. Some drivers that use SCSI facilities -
such as USB storage - end up disabled indirectly from this.
(*) Various block-based device drivers, such as IDE and the old CDROM
drivers.
(*) MTD blockdev handling and FTL.
(*) JFFS - which uses set_bdev_super(), something it could avoid doing by
taking a leaf out of JFFS2's book.
(*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
however, still used in places, and so is still available.
(*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
parts of linux/fs.h.
(*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.
(*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.
(*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
is not enabled.
(*) fs/no-block.c is created to hold out-of-line stubs and things that are
required when CONFIG_BLOCK is not set:
(*) Default blockdev file operations (to give error ENODEV on opening).
(*) Makes some /proc changes:
(*) /proc/devices does not list any blockdevs.
(*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.
(*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.
(*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
given a command other than Q_SYNC or if a special device is specified.
(*) In init/do_mounts.c, no reference is made to the blockdev routines if
CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.
(*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
error ENOSYS by way of cond_syscall if so).
(*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
CONFIG_BLOCK is not set, since they can't then happen.
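The sys_quotactl() rule from the list above can be modelled in a few lines. This is a sketch under stated assumptions, not the code from the patch: `demo_quotactl_noblock` and `DEMO_Q_SYNC` are hypothetical names, and the constant is a placeholder rather than the real Q_SYNC value.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_Q_SYNC 1	/* placeholder value, not the real Q_SYNC */

/*
 * Hypothetical model of the quotactl behaviour with the block layer
 * compiled out: only a sync request with no special device can proceed.
 * Returns 0 where the real call would carry on, -ENODEV otherwise.
 */
static int demo_quotactl_noblock(int cmd, const char *special)
{
	if (cmd != DEMO_Q_SYNC || special != NULL)
		return -ENODEV;
	return 0;
}
```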
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 22:45:40 +04:00
perf: Do the big rename: Performance Counters -> Performance Events
Bye-bye Performance Counters, welcome Performance Events!
In the past few months the perfcounters subsystem has outgrown its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.
Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.
All in all, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)
The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.
Thanks go to Stephane Eranian who first observed this misnomer and
suggested a rename.
User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)
This patch has been generated via the following script:
FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
sed -i \
-e 's/PERF_EVENT_/PERF_RECORD_/g' \
-e 's/PERF_COUNTER/PERF_EVENT/g' \
-e 's/perf_counter/perf_event/g' \
-e 's/nb_counters/nb_events/g' \
-e 's/swcounter/swevent/g' \
-e 's/tpcounter_event/tp_event/g' \
$FILES
for N in $(find . -name perf_counter.[ch]); do
M=$(echo $N | sed 's/perf_counter/perf_event/g')
mv $N $M
done
FILES=$(find . -name perf_event.*)
sed -i \
-e 's/COUNTER_MASK/REG_MASK/g' \
-e 's/COUNTER/EVENT/g' \
-e 's/\<event\>/event_id/g' \
-e 's/counter/event/g' \
-e 's/Counter/Event/g' \
$FILES
... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.
Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.
( NOTE: 'counters' are still the proper terminology when we deal
with hardware registers - and these sed scripts are a bit
over-eager in renaming them. I've undone some of that, but
in case there's something left where 'counter' would be
better than 'event' we can undo that on an individual basis
instead of touching an otherwise nicely automated patch. )
Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-21 14:02:48 +04:00
cond_syscall ( sys_perf_event_open ) ;
2018-03-06 21:53:01 +03:00
cond_syscall ( sys_accept4 ) ;
cond_syscall ( sys_recvmmsg ) ;
cond_syscall ( compat_sys_recvmmsg ) ;
/*
 * Architecture specific syscalls: see further below
*/
2009-12-18 05:24:25 +03:00
2018-03-06 21:53:01 +03:00
/* fanotify */
2009-12-18 05:24:25 +03:00
cond_syscall ( sys_fanotify_init ) ;
2009-12-18 05:24:26 +03:00
cond_syscall ( sys_fanotify_mark ) ;
2011-01-29 16:13:26 +03:00
/* open by handle */
cond_syscall ( sys_name_to_handle_at ) ;
2011-01-29 16:13:26 +03:00
cond_syscall ( sys_open_by_handle_at ) ;
cond_syscall ( compat_sys_open_by_handle_at ) ;
2012-06-01 03:26:44 +04:00
2018-03-06 21:53:01 +03:00
cond_syscall ( sys_sendmmsg ) ;
cond_syscall ( compat_sys_sendmmsg ) ;
cond_syscall ( sys_process_vm_readv ) ;
cond_syscall ( compat_sys_process_vm_readv ) ;
cond_syscall ( sys_process_vm_writev ) ;
cond_syscall ( compat_sys_process_vm_writev ) ;
2012-06-01 03:26:44 +04:00
/* compare kernel pointers */
cond_syscall ( sys_kcmp ) ;
2014-06-26 03:08:24 +04:00
2018-03-06 21:53:01 +03:00
cond_syscall ( sys_finit_module ) ;
2014-06-26 03:08:24 +04:00
/* operate on Secure Computing state */
cond_syscall ( sys_seccomp ) ;
2014-09-26 11:16:58 +04:00
2018-03-06 21:53:01 +03:00
cond_syscall ( sys_memfd_create ) ;
2014-09-26 11:16:58 +04:00
/* access BPF programs and maps */
cond_syscall ( sys_bpf ) ;
syscalls: implement execveat() system call
This patchset adds execveat(2) for x86, and is derived from Meredydd
Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).
The primary aim of adding an execveat syscall is to allow an
implementation of fexecve(3) that does not rely on the /proc filesystem,
at least for executables (rather than scripts). The current glibc version
of fexecve(3) is implemented via /proc, which causes problems in sandboxed
or otherwise restricted environments.
Given the desire for a /proc-free fexecve() implementation, HPA suggested
(https://lkml.org/lkml/2006/7/11/556) that an execveat(2) syscall would be
an appropriate generalization.
Also, having a new syscall means that it can take a flags argument without
back-compatibility concerns. The current implementation just defines the
AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other flags could be
added in future -- for example, flags for new namespaces (as suggested at
https://lkml.org/lkml/2006/7/11/474).
Related history:
- https://lkml.org/lkml/2006/12/27/123 is an example of someone
realizing that fexecve() is likely to fail in a chroot environment.
- http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
documenting the /proc requirement of fexecve(3) in its manpage, to
"prevent other people from wasting their time".
- https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
problem where a process that did setuid() could not fexecve()
because it no longer had access to /proc/self/fd; this has since
been fixed.
This patch (of 4):
Add a new execveat(2) system call. execveat() is to execve() as openat()
is to open(): it takes a file descriptor that refers to a directory, and
resolves the filename relative to that.
In addition, if the filename is empty and AT_EMPTY_PATH is specified,
execveat() executes the file to which the file descriptor refers. This
replicates the functionality of fexecve(), which is a system call in other
UNIXen, but in Linux glibc it depends on opening "/proc/self/fd/<fd>" (and
so relies on /proc being mounted).
The filename fed to the executed program as argv[0] (or the name of the
script fed to a script interpreter) will be of the form "/dev/fd/<fd>"
(for an empty filename) or "/dev/fd/<fd>/<filename>", effectively
reflecting how the executable was found. This does however mean that
execution of a script in a /proc-less environment won't work; also, script
execution via an O_CLOEXEC file descriptor fails (as the file will not be
accessible after exec).
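From user space, the AT_EMPTY_PATH case described above might be used as in the sketch below. It assumes a Linux system whose headers define `SYS_execveat`; `demo_fexecve` is a hypothetical wrapper name, and the fallback definition of `AT_EMPTY_PATH` is only there in case the libc headers do not expose it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>        /* AT_EMPTY_PATH (with _GNU_SOURCE) */
#include <sys/syscall.h>  /* SYS_execveat */
#include <unistd.h>       /* syscall() */

#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000	/* value from the Linux UAPI headers */
#endif

/*
 * Hypothetical wrapper: execute the file that fd refers to, without
 * going through /proc. On success this does not return; on failure it
 * returns -1 with errno set.
 */
static int demo_fexecve(int fd, char *const argv[], char *const envp[])
{
	return (int)syscall(SYS_execveat, fd, "", argv, envp, AT_EMPTY_PATH);
}
```

As the message notes, a descriptor opened with O_CLOEXEC would not work for scripts, since the interpreter could no longer reach the file after exec.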
Based on patches by Meredydd Luff.
Signed-off-by: David Drysdale <drysdale@google.com>
Cc: Meredydd Luff <meredydd@senatehouse.org>
Cc: Shuah Khan <shuah.kh@samsung.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Rich Felker <dalias@aerifal.cx>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 03:57:29 +03:00
/* execveat */
cond_syscall ( sys_execveat ) ;
sys_membarrier(): system-wide memory barrier (generic, x86)
Here is an implementation of a new system call, sys_membarrier(), which
executes a memory barrier on all threads running on the system. It is
implemented by calling synchronize_sched(). It can be used to
distribute the cost of user-space memory barriers asymmetrically by
transforming pairs of memory barriers into pairs consisting of
sys_membarrier() and a compiler barrier. For synchronization primitives
that distinguish between read-side and write-side (e.g. userspace RCU
[1], rwlocks), the read-side can be accelerated significantly by moving
the bulk of the memory barrier overhead to the write-side.
The existing applications of which I am aware that would be improved by
this system call are as follows:
* Through Userspace RCU library (http://urcu.so)
- DNS server (Knot DNS) https://www.knot-dns.cz/
- Network sniffer (http://netsniff-ng.org/)
- Distributed object storage (https://sheepdog.github.io/sheepdog/)
- User-space tracing (http://lttng.org)
- Network storage system (https://www.gluster.org/)
- Virtual routers (https://events.linuxfoundation.org/sites/events/files/slides/DPDK_RCU_0MQ.pdf)
- Financial software (https://lkml.org/lkml/2015/3/23/189)
Those projects use RCU in userspace to increase read-side speed and
scalability compared to locking. Especially in the case of RCU used by
libraries, sys_membarrier can speed up the read-side by moving the bulk of
the memory barrier cost to synchronize_rcu().
* Direct users of sys_membarrier
- core dotnet garbage collector (https://github.com/dotnet/coreclr/issues/198)
Microsoft core dotnet GC developers are planning to use the mprotect()
side-effect of issuing memory barriers through IPIs as a way to implement
Windows FlushProcessWriteBuffers() on Linux. They are referring to
sys_membarrier in their github thread, specifically stating that
sys_membarrier() is what they are looking for.
To explain the benefit of this scheme, let's introduce two example threads:
Thread A (non-frequent, e.g. executing liburcu synchronize_rcu())
Thread B (frequent, e.g. executing liburcu
rcu_read_lock()/rcu_read_unlock())
In a scheme where all smp_mb() in thread A are ordering memory accesses
with respect to smp_mb() present in Thread B, we can change each
smp_mb() within Thread A into calls to sys_membarrier() and each
smp_mb() within Thread B into compiler barriers "barrier()".
Before the change, we had, for each smp_mb() pair:
Thread A                    Thread B
previous mem accesses       previous mem accesses
smp_mb()                    smp_mb()
following mem accesses      following mem accesses
After the change, these pairs become:
Thread A                    Thread B
prev mem accesses           prev mem accesses
sys_membarrier()            barrier()
follow mem accesses         follow mem accesses
As we can see, there are two possible scenarios: either Thread B memory
accesses do not happen concurrently with Thread A accesses (1), or they
do (2).
1) Non-concurrent Thread A vs Thread B accesses:
Thread A                    Thread B
prev mem accesses
sys_membarrier()
follow mem accesses
                            prev mem accesses
                            barrier()
                            follow mem accesses
In this case, thread B accesses will be weakly ordered. This is OK,
because at that point, thread A is not particularly interested in
ordering them with respect to its own accesses.
2) Concurrent Thread A vs Thread B accesses
Thread A                    Thread B
prev mem accesses           prev mem accesses
sys_membarrier()            barrier()
follow mem accesses         follow mem accesses
In this case, thread B accesses, which are ensured to be in program
order thanks to the compiler barrier, will be "upgraded" to full
smp_mb() by synchronize_sched().
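In user-space C, the pairing from the tables above could be approximated as below. This is a sketch, not liburcu's implementation: the `demo_*` names are made up, C11 `atomic_signal_fence()` stands in for the kernel's barrier(), and the system call is invoked through syscall(2) on the assumption that the installed headers provide `__NR_membarrier` and `MEMBARRIER_CMD_SHARED`.

```c
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/membarrier.h>

/* Thread B (frequent path): compiler barrier only, no CPU fence. */
static inline void demo_reader_barrier(void)
{
	atomic_signal_fence(memory_order_seq_cst);
}

/*
 * Thread A (infrequent path): ask the kernel to issue a memory barrier
 * on all running threads. Returns 0 on success and -1 on failure. If
 * this fails, the asymmetric scheme must not be used at all: both sides
 * then need a full fence such as
 * atomic_thread_fence(memory_order_seq_cst).
 */
static inline int demo_writer_barrier(void)
{
	return (int)syscall(__NR_membarrier, MEMBARRIER_CMD_SHARED, 0);
}
```

The design point is that the expensive operation sits only on the side that runs rarely, which is what makes the read side cheap.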
* Benchmarks
On Intel Xeon E5405 (8 cores)
(one thread is calling sys_membarrier, the other 7 threads are busy
looping)
1000 non-expedited sys_membarrier calls in 33s = 33 milliseconds/call.
* User-space user of this system call: Userspace RCU library
Both the signal-based and the sys_membarrier userspace RCU schemes
permit us to remove the memory barrier from the userspace RCU
rcu_read_lock() and rcu_read_unlock() primitives, thus significantly
accelerating them. These memory barriers are replaced by compiler
barriers on the read-side, and all matching memory barriers on the
write-side are turned into an invocation of a memory barrier on all
active threads in the process. By letting the kernel perform this
synchronization rather than dumbly sending a signal to every thread of
the process (as we currently do), we diminish the number of unnecessary wake
ups and only issue the memory barriers on active threads. Non-running
threads do not need to execute such barrier anyway, because these are
implied by the scheduler context switches.
Results in liburcu:
Operations in 10s, 6 readers, 2 writers:
memory barriers in reader:    1701557485 reads, 2202847 writes
signal-based scheme:          9830061167 reads,    6700 writes
sys_membarrier:               9952759104 reads,     425 writes
sys_membarrier (dyn. check):  7970328887 reads,     425 writes
The dynamic sys_membarrier availability check adds some overhead to
the read-side compared to the signal-based scheme, but besides that,
sys_membarrier slightly outperforms the signal-based scheme. However,
this non-expedited sys_membarrier implementation has a much slower grace
period than signal and memory barrier schemes.
Besides diminishing the number of wake-ups, one major advantage of the
membarrier system call over the signal-based scheme is that it does not
need to reserve a signal. This plays much more nicely with libraries,
and with processes injected into for tracing purposes, for which we
cannot expect that signals will be unused by the application.
An expedited version of this system call can be added later on to speed
up the grace period. Its implementation will likely depend on reading
the cpu_curr()->mm without holding each CPU's rq lock.
This patch adds the system call to x86 and to asm-generic.
[1] http://urcu.so
membarrier(2) man page:
MEMBARRIER(2) Linux Programmer's Manual MEMBARRIER(2)
NAME
membarrier - issue memory barriers on a set of threads
SYNOPSIS
#include <linux/membarrier.h>
int membarrier(int cmd, int flags);
DESCRIPTION
The cmd argument is one of the following:
MEMBARRIER_CMD_QUERY
Query the set of supported commands. It returns a bitmask of
supported commands.
MEMBARRIER_CMD_SHARED
Execute a memory barrier on all threads running on the system.
Upon return from system call, the caller thread is ensured that
all running threads have passed through a state where all memory
accesses to user-space addresses match program order between
entry to and return from the system call (non-running threads
are de facto in such a state). This covers threads from all
processes running on the system. This command returns 0.
The flags argument needs to be 0. For future extensions.
All memory accesses performed in program order from each targeted
thread are guaranteed to be ordered with respect to sys_membarrier(). If
we use the semantic "barrier()" to represent a compiler barrier forcing
memory accesses to be performed in program order across the barrier,
and smp_mb() to represent explicit memory barriers forcing full memory
ordering across the barrier, we have the following ordering table for
each pair of barrier(), sys_membarrier() and smp_mb():
The pair ordering is detailed as (O: ordered, X: not ordered):
                       barrier()   smp_mb()   sys_membarrier()
    barrier()             X           X            O
    smp_mb()              X           O            O
    sys_membarrier()      O           O            O
RETURN VALUE
On success, MEMBARRIER_CMD_QUERY returns a bitmask of supported commands
and MEMBARRIER_CMD_SHARED returns zero. On error, -1 is returned,
and errno is set appropriately. For a given command, with flags
argument set to 0, this system call is guaranteed to always return the
same value until reboot.
ERRORS
ENOSYS System call is not implemented.
EINVAL Invalid arguments.
Linux 2015-04-15 MEMBARRIER(2)
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Nicholas Miell <nmiell@comcast.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Pranith Kumar <bobby.prani@gmail.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-11 23:07:39 +03:00
2018-03-06 21:53:01 +03:00
cond_syscall(sys_userfaultfd);
sys_membarrier(): system-wide memory barrier (generic, x86)
Here is an implementation of a new system call, sys_membarrier(), which
executes a memory barrier on all threads running on the system. It is
implemented by calling synchronize_sched(). It can be used to
distribute the cost of user-space memory barriers asymmetrically by
transforming pairs of memory barriers into pairs consisting of
sys_membarrier() and a compiler barrier. For synchronization primitives
that distinguish between read-side and write-side (e.g. userspace RCU
[1], rwlocks), the read-side can be accelerated significantly by moving
the bulk of the memory barrier overhead to the write-side.
The existing applications of which I am aware that would be improved by
this system call are as follows:
* Through Userspace RCU library (http://urcu.so)
- DNS server (Knot DNS) https://www.knot-dns.cz/
- Network sniffer (http://netsniff-ng.org/)
- Distributed object storage (https://sheepdog.github.io/sheepdog/)
- User-space tracing (http://lttng.org)
- Network storage system (https://www.gluster.org/)
- Virtual routers (https://events.linuxfoundation.org/sites/events/files/slides/DPDK_RCU_0MQ.pdf)
- Financial software (https://lkml.org/lkml/2015/3/23/189)
Those projects use RCU in userspace to increase read-side speed and
scalability compared to locking. Especially in the case of RCU used by
libraries, sys_membarrier can speed up the read-side by moving the bulk of
the memory barrier cost to synchronize_rcu().
* Direct users of sys_membarrier
- core dotnet garbage collector (https://github.com/dotnet/coreclr/issues/198)
Microsoft core dotnet GC developers are planning to use the mprotect()
side-effect of issuing memory barriers through IPIs as a way to implement
Windows FlushProcessWriteBuffers() on Linux. They are referring to
sys_membarrier in their github thread, specifically stating that
sys_membarrier() is what they are looking for.
To explain the benefit of this scheme, let's introduce two example threads:
Thread A (non-frequent, e.g. executing liburcu synchronize_rcu())
Thread B (frequent, e.g. executing liburcu
rcu_read_lock()/rcu_read_unlock())
In a scheme where all smp_mb() in thread A are ordering memory accesses
with respect to smp_mb() present in Thread B, we can change each
smp_mb() within Thread A into calls to sys_membarrier() and each
smp_mb() within Thread B into compiler barriers "barrier()".
Before the change, we had, for each smp_mb() pair:
    Thread A                    Thread B
    previous mem accesses       previous mem accesses
    smp_mb()                    smp_mb()
    following mem accesses      following mem accesses
After the change, these pairs become:
    Thread A                    Thread B
    prev mem accesses           prev mem accesses
    sys_membarrier()            barrier()
    follow mem accesses         follow mem accesses
As we can see, there are two possible scenarios: either Thread B memory
accesses do not happen concurrently with Thread A accesses (1), or they
do (2).
1) Non-concurrent Thread A vs Thread B accesses:
    Thread A                    Thread B
    prev mem accesses
    sys_membarrier()
    follow mem accesses
                                prev mem accesses
                                barrier()
                                follow mem accesses
In this case, thread B accesses will be weakly ordered. This is OK,
because at that point, thread A is not particularly interested in
ordering them with respect to its own accesses.
2) Concurrent Thread A vs Thread B accesses
    Thread A                    Thread B
    prev mem accesses           prev mem accesses
    sys_membarrier()            barrier()
    follow mem accesses         follow mem accesses
In this case, thread B accesses, which are ensured to be in program
order thanks to the compiler barrier, will be "upgraded" to full
smp_mb() by synchronize_sched().
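The transformation above can be sketched in user-space C as follows. This
is only an illustration under stated assumptions, not code from liburcu or
from this patch: the variables x and y and the two function names are
hypothetical, barrier() is spelled as an empty asm with a memory clobber,
and the system call is reached through syscall(2) because a libc wrapper
may not exist.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/membarrier.h>

/* Compiler barrier: emits no instruction, only forbids the compiler from
 * moving memory accesses across it. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical shared variables used by both threads. */
static volatile int x, y;

/* Thread B (frequent side): the smp_mb() has become barrier(). */
static int thread_b_fast_path(void)
{
	x = 1;		/* prev mem access */
	barrier();	/* was smp_mb() before the change */
	return y;	/* follow mem access */
}

/*
 * Thread A (infrequent side): the smp_mb() has become sys_membarrier().
 * Returns the value read from x, or -1 if the system call failed. In
 * that case the scheme is not usable and BOTH threads must keep their
 * smp_mb().
 */
static int thread_a_slow_path(void)
{
	y = 1;		/* prev mem access */
	if (syscall(__NR_membarrier, MEMBARRIER_CMD_SHARED, 0) != 0)
		return -1;
	return x;	/* follow mem access */
}
```

With the original smp_mb() pair, the two threads cannot both read 0. The
intent of the scheme is to keep that property while Thread B pays only for
a compiler barrier.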
* Benchmarks
On Intel Xeon E5405 (8 cores)
(one thread is calling sys_membarrier, the other 7 threads are busy
looping)
1000 non-expedited sys_membarrier calls in 33s = 33 milliseconds/call.
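A measurement of this kind could be reproduced with a small timing loop
such as the one below. This is a hedged sketch, not the benchmark program
used for the figure above: the function name membarrier_ms_per_call() is
hypothetical, and the busy-looping threads on the other CPUs are left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/membarrier.h>

/*
 * Issue ncalls (> 0) MEMBARRIER_CMD_SHARED calls back to back and return
 * the average latency in milliseconds per call, or -1.0 if a call fails
 * (for example ENOSYS on a kernel without the system call).
 */
static double membarrier_ms_per_call(int ncalls)
{
	struct timespec t0, t1;
	double elapsed_ms;
	int i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < ncalls; i++) {
		if (syscall(__NR_membarrier, MEMBARRIER_CMD_SHARED, 0) != 0)
			return -1.0;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	elapsed_ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
		     (t1.tv_nsec - t0.tv_nsec) / 1e6;
	return elapsed_ms / ncalls;
}
```

Calling membarrier_ms_per_call(1000) corresponds to the run reported
above, where 33 s / 1000 calls gives 33 ms per call.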
* User-space user of this system call: Userspace RCU library
Both the signal-based and the sys_membarrier userspace RCU schemes
permit us to remove the memory barrier from the userspace RCU
rcu_read_lock() and rcu_read_unlock() primitives, thus significantly
accelerating them. These memory barriers are replaced by compiler
barriers on the read-side, and all matching memory barriers on the
write-side are turned into an invocation of a memory barrier on all
active threads in the process. By letting the kernel perform this
synchronization rather than dumbly sending a signal to every thread of
the process (as we currently do), we diminish the number of unnecessary
wake-ups and only issue the memory barriers on active threads. Non-running
threads do not need to execute such barriers anyway, because these are
implied by the scheduler context switches.
Results in liburcu:
Operations in 10s, 6 readers, 2 writers:
    memory barriers in reader:    1701557485 reads, 2202847 writes
    signal-based scheme:          9830061167 reads,    6700 writes
    sys_membarrier:               9952759104 reads,     425 writes
    sys_membarrier (dyn. check):  7970328887 reads,     425 writes
The dynamic sys_membarrier availability check adds some overhead to
the read-side compared to the signal-based scheme, but besides that,
sys_membarrier slightly outperforms the signal-based scheme. However,
this non-expedited sys_membarrier implementation has a much slower grace
period than the signal-based and memory-barrier schemes.
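The "dyn. check" row refers to testing at run time whether the system call
is available. One way such a check could look is sketched below. This is
an assumption-laden illustration, not liburcu's actual code: the names
has_sys_membarrier, rcu_init_sketch(), reader_barrier() and
writer_barrier() are hypothetical, and smp_mb() is approximated with the
compiler's __sync_synchronize() builtin.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/membarrier.h>

#define barrier() __asm__ __volatile__("" ::: "memory")
#define smp_mb()  __sync_synchronize()	/* stand-in for a full barrier */

/* Set once at initialization, then only read. */
static int has_sys_membarrier;

static void rcu_init_sketch(void)
{
	int mask = (int)syscall(__NR_membarrier, MEMBARRIER_CMD_QUERY, 0);

	/* mask is -1 on error (e.g. ENOSYS), else a bitmask of commands. */
	has_sys_membarrier = mask > 0 && (mask & MEMBARRIER_CMD_SHARED);
}

/* Read-side: the extra branch is the cost of the dynamic check. */
static inline void reader_barrier(void)
{
	if (has_sys_membarrier)
		barrier();
	else
		smp_mb();
}

/* Write-side: must make the same choice as the readers. */
static int writer_barrier(void)
{
	if (has_sys_membarrier)
		return (int)syscall(__NR_membarrier, MEMBARRIER_CMD_SHARED, 0);
	smp_mb();
	return 0;
}
```

Both sides consult the same flag, so a kernel without sys_membarrier falls
back to the plain smp_mb() pairing on readers and writers alike.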
Besides diminishing the number of wake-ups, one major advantage of the
membarrier system call over the signal-based scheme is that it does not
need to reserve a signal. This plays much more nicely with libraries,
and with processes that are injected into for tracing purposes, where we
cannot expect any given signal to be unused by the application.
An expedited version of this system call can be added later on to speed
up the grace period. Its implementation will likely depend on reading
the cpu_curr()->mm without holding each CPU's rq lock.
This patch adds the system call to x86 and to asm-generic.
[1] http://urcu.so
membarrier(2) man page:
MEMBARRIER(2) Linux Programmer's Manual MEMBARRIER(2)
NAME
membarrier - issue memory barriers on a set of threads
SYNOPSIS
#include <linux/membarrier.h>
int membarrier(int cmd, int flags);
DESCRIPTION
The cmd argument is one of the following:
MEMBARRIER_CMD_QUERY
Query the set of supported commands. It returns a bitmask of
supported commands.
MEMBARRIER_CMD_SHARED
Execute a memory barrier on all threads running on the system.
Upon return from system call, the caller thread is ensured that
all running threads have passed through a state where all memory
accesses to user-space addresses match program order between
entry to and return from the system call (non-running threads
are de facto in such a state). This covers threads from all processes
running on the system. This command returns 0.
The flags argument must be 0. It is reserved for future extensions.
All memory accesses performed in program order from each targeted
thread are guaranteed to be ordered with respect to sys_membarrier(). If
we use the semantic "barrier()" to represent a compiler barrier forcing
memory accesses to be performed in program order across the barrier,
and smp_mb() to represent explicit memory barriers forcing full memory
ordering across the barrier, we have the following ordering table for
each pair of barrier(), sys_membarrier() and smp_mb():
The pair ordering is detailed as (O: ordered, X: not ordered):
                       barrier()   smp_mb()   sys_membarrier()
    barrier()             X           X            O
    smp_mb()              X           O            O
    sys_membarrier()      O           O            O
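For readers who prefer code to tables, the pair ordering can be written as
a small lookup. This is purely illustrative: the enum and the function
pair_is_ordered() are hypothetical names and not part of the kernel or of
the man page.

```c
#include <assert.h>

enum barrier_kind { BARRIER = 0, SMP_MB = 1, SYS_MEMBARRIER = 2 };

/* Rows and columns follow the table: O = ordered, X = not ordered. */
static const char ordering_table[3][3] = {
	/*                      barrier() smp_mb() sys_membarrier() */
	/* barrier()        */ { 'X',     'X',     'O' },
	/* smp_mb()         */ { 'X',     'O',     'O' },
	/* sys_membarrier() */ { 'O',     'O',     'O' },
};

/* Returns 1 if the pair (a, b) is ordered according to the table. */
static int pair_is_ordered(enum barrier_kind a, enum barrier_kind b)
{
	return ordering_table[a][b] == 'O';
}
```

The table is symmetric, and every pair that involves sys_membarrier() is
ordered, which is what allows the other side to use barrier() alone.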
RETURN VALUE
On success, MEMBARRIER_CMD_QUERY returns a bitmask of supported commands
and MEMBARRIER_CMD_SHARED returns zero. On error, -1 is returned,
and errno is set appropriately. For a given command, with flags
argument set to 0, this system call is guaranteed to always return the
same value until reboot.
ERRORS
ENOSYS System call is not implemented.
EINVAL Invalid arguments.
Linux 2015-04-15 MEMBARRIER(2)
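A minimal user-space sketch of the interface described in the man page
follows. It assumes the system call is invoked through syscall(2), since a
C library wrapper may not be available. The helper names do_membarrier(),
membarrier_shared_supported() and membarrier_shared() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/membarrier.h>

/* Thin wrapper around the raw system call. */
static int do_membarrier(int cmd, int flags)
{
	return (int)syscall(__NR_membarrier, cmd, flags);
}

/*
 * Returns 1 if MEMBARRIER_CMD_SHARED is reported as supported, 0 if it
 * is not or if the query itself fails (e.g. errno == ENOSYS).
 */
static int membarrier_shared_supported(void)
{
	int mask = do_membarrier(MEMBARRIER_CMD_QUERY, 0);

	if (mask < 0)
		return 0;
	return (mask & MEMBARRIER_CMD_SHARED) != 0;
}

/* Issues the barrier; returns 0 on success, -1 with errno set on error. */
static int membarrier_shared(void)
{
	return do_membarrier(MEMBARRIER_CMD_SHARED, 0);
}
```

Because the return value for a given command is stable until reboot, the
query only needs to be done once, typically at library initialization.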
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Nicholas Miell <nmiell@comcast.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Pranith Kumar <bobby.prani@gmail.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-11 23:07:39 +03:00
/* membarrier */
cond_syscall(sys_membarrier);
2016-09-12 23:38:42 +03:00
2018-03-06 21:53:01 +03:00
cond_syscall(sys_mlock2);
cond_syscall(sys_copy_file_range);
2016-09-12 23:38:42 +03:00
/* memory protection keys */
cond_syscall(sys_pkey_mprotect);
cond_syscall(sys_pkey_alloc);
cond_syscall(sys_pkey_free);
2018-03-06 21:53:01 +03:00
/*
 * Architecture specific weak syscall entries.
 */
/* pciconfig: alpha, arm, arm64, ia64, sparc */
cond_syscall(sys_pciconfig_read);
cond_syscall(sys_pciconfig_write);
cond_syscall(sys_pciconfig_iobase);
/* sys_socketcall: arm, mips, x86, ... */
cond_syscall(sys_socketcall);
cond_syscall(compat_sys_socketcall);
/* compat syscalls for arm64, x86, ... */
cond_syscall(compat_sys_sysctl);
cond_syscall(compat_sys_fanotify_mark);
/* x86 */
cond_syscall(sys_vm86old);
cond_syscall(sys_modify_ldt);
cond_syscall(compat_sys_quotactl32);
cond_syscall(sys_vm86);
cond_syscall(sys_kexec_file_load);
/* s390 */
cond_syscall(sys_s390_pci_mmio_read);
cond_syscall(sys_s390_pci_mmio_write);
cond_syscall(compat_sys_s390_ipc);
/* powerpc */
cond_syscall(ppc_rtas);
cond_syscall(sys_spu_run);
cond_syscall(sys_spu_create);
cond_syscall(sys_subpage_prot);
/*
 * Deprecated system calls which are still defined in
 * include/uapi/asm-generic/unistd.h and wanted by >= 1 arch
 */
/* __ARCH_WANT_SYSCALL_NO_FLAGS */
cond_syscall(sys_epoll_create);
cond_syscall(sys_inotify_init);
cond_syscall(sys_eventfd);
cond_syscall(sys_signalfd);
cond_syscall(compat_sys_signalfd);
/* __ARCH_WANT_SYSCALL_OFF_T */
cond_syscall(sys_fadvise64);
/* __ARCH_WANT_SYSCALL_DEPRECATED */
cond_syscall(sys_epoll_wait);
cond_syscall(sys_recv);
cond_syscall(compat_sys_recv);
cond_syscall(sys_send);
cond_syscall(sys_bdflush);
cond_syscall(sys_uselib);
/*
 * The syscalls below are not found in include/uapi/asm-generic/unistd.h
 */
/* obsolete: SGETMASK_SYSCALL */
cond_syscall(sys_sgetmask);
cond_syscall(sys_ssetmask);
/* obsolete: SYSFS_SYSCALL */
cond_syscall(sys_sysfs);
/* obsolete: __ARCH_WANT_SYS_IPC */
cond_syscall(sys_ipc);
cond_syscall(compat_sys_ipc);
/* obsolete: UID16 */
cond_syscall(sys_chown16);
cond_syscall(sys_fchown16);
cond_syscall(sys_getegid16);
cond_syscall(sys_geteuid16);
cond_syscall(sys_getgid16);
cond_syscall(sys_getgroups16);
cond_syscall(sys_getresgid16);
cond_syscall(sys_getresuid16);
cond_syscall(sys_getuid16);
cond_syscall(sys_lchown16);
cond_syscall(sys_setfsgid16);
cond_syscall(sys_setfsuid16);
cond_syscall(sys_setgid16);
cond_syscall(sys_setgroups16);
cond_syscall(sys_setregid16);
cond_syscall(sys_setresgid16);
cond_syscall(sys_setresuid16);
cond_syscall(sys_setreuid16);
cond_syscall(sys_setuid16);