linux/kernel
Kees Cook 800179c9b8 fs: add link restrictions
This adds symlink and hardlink restrictions to the Linux VFS.

Symlinks:

A long-standing class of security issues is the symlink-based
time-of-check-time-of-use race, most commonly seen in world-writable
directories like /tmp. The common method of exploitation of this flaw
is to cross privilege boundaries when following a given symlink (i.e. a
root process follows a symlink belonging to another user). For a likely
incomplete list of hundreds of examples across the years, please see:
http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp

The solution is to permit symlinks to only be followed when outside
a sticky world-writable directory, or when the uid of the symlink and
follower match, or when the directory owner matches the symlink's owner.

Some pointers to the history of earlier discussion that I could find:

 1996 Aug, Zygo Blaxell
  http://marc.info/?l=bugtraq&m=87602167419830&w=2
 1996 Oct, Andrew Tridgell
  http://lkml.indiana.edu/hypermail/linux/kernel/9610.2/0086.html
 1997 Dec, Albert D Cahalan
  http://lkml.org/lkml/1997/12/16/4
 2005 Feb, Lorenzo Hernández García-Hierro
  http://lkml.indiana.edu/hypermail/linux/kernel/0502.0/1896.html
 2010 May, Kees Cook
  https://lkml.org/lkml/2010/5/30/144

Past objections and rebuttals could be summarized as:

 - Violates POSIX.
   - POSIX didn't consider this situation and it's not useful to follow
     a broken specification at the cost of security.
 - Might break unknown applications that use this feature.
   - Applications that break because of the change are easy to spot and
     fix. Applications that are vulnerable to symlink ToCToU by not having
     the change aren't. Additionally, no applications have yet been found
     that rely on this behavior.
 - Applications should just use mkstemp() or O_CREATE|O_EXCL.
   - True, but applications are not perfect, and new software is written
     all the time that makes these mistakes; blocking this flaw at the
     kernel is a single solution to the entire class of vulnerability.
 - This should live in the core VFS.
   - This should live in an LSM. (https://lkml.org/lkml/2010/5/31/135)
 - This should live in an LSM.
   - This should live in the core VFS. (https://lkml.org/lkml/2010/8/2/188)

Hardlinks:

On systems that have user-writable directories on the same partition
as system files, a long-standing class of security issues is the
hardlink-based time-of-check-time-of-use race, most commonly seen in
world-writable directories like /tmp. The common method of exploitation
of this flaw is to cross privilege boundaries when following a given
hardlink (i.e. a root process follows a hardlink created by another
user). Additionally, an issue exists where users can "pin" a potentially
vulnerable setuid/setgid file so that an administrator will not actually
upgrade a system fully.

The solution is to permit hardlinks to only be created when the user is
already the existing file's owner, or if they already have read/write
access to the existing file.

Many Linux users are surprised when they learn they can link to files
they have no access to, so this change appears to follow the doctrine
of "least surprise". Additionally, this change does not violate POSIX,
which states "the implementation may require that the calling process
has permission to access the existing file"[1].

This change is known to break some implementations of the "at" daemon,
though the version used by Fedora and Ubuntu has been fixed[2] for
a while. Otherwise, the change has been undisruptive while in use in
Ubuntu for the last 1.5 years.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/linkat.html
[2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279

This patch is based on the patches in Openwall and grsecurity, along with
suggestions from Al Viro. I have added a sysctl to enable the protected
behavior, and documentation.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-29 21:37:58 +04:00
..
debug KGDB/KDB regression fixes 2012-04-04 17:26:08 -07:00
events perf: Use css_tryget() to avoid propping up css refcount 2012-06-18 11:45:57 +02:00
gcov gcov: disable CONSTRUCTORS for UML 2011-07-26 16:49:45 -07:00
irq merge task_work and rcu_head, get rid of separate allocation for keyring case 2012-07-22 23:57:56 +04:00
power PM / Hibernate: Use get_gendisk to verify partition if resume_file is integer format 2012-05-18 20:44:59 +02:00
sched Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-06-08 14:59:29 -07:00
time timekeeping: Provide hrtimer update function 2012-07-11 23:34:39 +02:00
trace Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2012-07-03 15:45:10 -07:00
.gitignore
acct.c Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-01-08 12:19:57 -08:00
async.c kernel/async: remove redundant declaration. 2012-01-13 09:32:18 +10:30
audit_tree.c VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors 2012-07-14 16:37:27 +04:00
audit_watch.c get rid of kern_path_parent() 2012-07-14 16:35:02 +04:00
audit.c constify path argument of audit_log_d_path() 2012-03-20 21:29:40 -04:00
audit.h audit: remove AUDIT_SETUP_CONTEXT as it isn't used 2012-01-17 16:16:57 -05:00
auditfilter.c audit: allow interfield comparison in audit rules 2012-01-17 16:17:01 -05:00
auditsc.c seccomp: remove duplicated failure logging 2012-04-14 11:13:20 +10:00
backtracetest.c
bounds.c
capability.c userns: Teach inode_capable to understand inodes whose uids map to other namespaces. 2012-05-15 14:59:24 -07:00
cgroup_freezer.c cgroup: convert all non-memcg controllers to the new cftype interface 2012-04-01 12:09:55 -07:00
cgroup.c VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
compat.c new helper: sigsuspend() 2012-05-21 23:52:30 -04:00
configs.c kernel/configs.c: include MODULE_*() when CONFIG_IKCONFIG_PROC=n 2011-07-25 20:57:15 -07:00
cpu_pm.c kernel/cpu_pm.c: fix various typos 2012-05-31 17:49:27 -07:00
cpu.c kernel/cpu.c: document clear_tasks_mm_cpumask() 2012-05-31 17:49:30 -07:00
cpuset.c Merge branch 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2012-05-22 17:40:19 -07:00
crash_dump.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
cred.c keys: kill task_struct->replacement_session_keyring 2012-05-23 22:11:41 -04:00
delayacct.c
dma.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
elfcore.c
exec_domain.c
exit.c move exit_task_work() past exit_files() et.al. 2012-07-22 23:57:57 +04:00
extable.c extable: Skip sorting if sorted at build time. 2012-04-19 15:06:55 -07:00
fork.c trim task_work: get rid of hlist 2012-07-22 23:57:55 +04:00
freezer.c PM / Freezer: Remove references to TIF_FREEZE in comments 2012-03-04 23:08:54 +01:00
futex_compat.c futex: Mark get_robust_list as deprecated 2012-03-29 11:37:17 +02:00
futex.c futex: Mark get_robust_list as deprecated 2012-03-29 11:37:17 +02:00
groups.c userns: Convert in_group_p and in_egroup_p to use kgid_t 2012-05-03 03:29:33 -07:00
hrtimer.c hrtimer: Update hrtimer base offsets each hrtimer_interrupt 2012-07-11 23:34:39 +02:00
hung_task.c hung task debugging: Inject NMI when hung and going to panic 2012-04-25 12:39:25 +02:00
irq_work.c irq_work: fix compile failure on tile from missing include 2012-04-13 13:15:16 -04:00
itimer.c itimer: Use printk_once instead of WARN_ONCE 2012-04-10 11:00:30 +02:00
jump_label.c static keys: Inline the static_key_enabled() function 2012-02-28 20:01:08 +01:00
kallsyms.c vsprintf: fix %ps on non symbols when using kallsyms 2012-05-29 16:22:32 -07:00
kcmp.c syscalls, x86: add __NR_kcmp syscall 2012-05-31 17:49:32 -07:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/kconfig: Simplify INLINE_SPIN_UNLOCK usage 2012-03-23 13:18:57 +01:00
Kconfig.preempt locking/kconfig: Simplify INLINE_SPIN_UNLOCK usage 2012-03-23 13:18:57 +01:00
kexec.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-28 17:19:28 -07:00
kfifo.c [media] kernel:kfifo: export __kfifo_max_r symbol 2012-04-11 18:24:37 -03:00
kmod.c kmod.c: fix kernel-doc warning 2012-05-31 17:49:28 -07:00
kprobes.c kprobes: return proper error code from register_kprobe() 2012-03-05 15:49:42 -08:00
ksysfs.c kernel: ksysfs.c is implicitly using stat.h 2011-10-31 09:20:13 -04:00
kthread.c freezer: kill unused set_freezable_with_signal() 2011-11-23 09:28:17 -08:00
latencytop.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
lglock.c brlocks/lglocks: turn into functions 2012-05-29 23:28:41 -04:00
lockdep_internals.h
lockdep_proc.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
lockdep_states.h
lockdep.c lockdep: Add CPU-idle/offline warning to lockdep-RCU splat 2012-02-21 09:06:06 -08:00
Makefile Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-06-01 10:34:35 -07:00
module.c Guard check in module loader against integer overflow 2012-05-23 22:28:53 +09:30
mutex-debug.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
mutex-debug.h
mutex.c sched/rt: Use schedule_preempt_disabled() 2012-03-01 10:28:03 +01:00
mutex.h
notifier.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
nsproxy.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
padata.c padata: Fix cpu hotplug 2012-03-29 19:52:46 +08:00
panic.c kdump: Execute kmsg_dump(KMSG_DUMP_PANIC) after smp_send_stop() 2012-05-18 14:02:10 +02:00
params.c params: replace printk(KERN_<LVL>...) with pr_<lvl>(...) 2012-05-04 17:28:18 -07:00
pid_namespace.c pidns: guarantee that the pidns init will be the last pidns process reaped 2012-06-20 14:39:36 -07:00
pid.c mm: add a low limit to alloc_large_system_hash 2012-05-24 00:28:21 -04:00
posix-cpu-timers.c [S390] cputime: add sparse checking and cleanup 2011-12-15 14:56:19 +01:00
posix-timers.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
printk.c kmsg: merge continuation records while printing 2012-07-09 12:15:42 -07:00
profile.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
ptrace.c userns: Convert ptrace, kill, set_priority permission checks to work with kuids and kgids 2012-05-03 03:28:51 -07:00
range.c range: fix bogus misuse of module.h to get printk() 2011-10-31 09:20:11 -04:00
rcu.h rcu: Allow nesting of rcu_idle_enter() and rcu_idle_exit() 2012-02-21 09:06:12 -08:00
rcupdate.c rcu: Make exit_rcu() more precise and consolidate 2012-05-02 14:48:27 -07:00
rcutiny_plugin.h rcu: Make exit_rcu() more precise and consolidate 2012-05-02 14:48:27 -07:00
rcutiny.c rcu: Add RCU_NONIDLE() for idle-loop RCU read-side critical sections 2012-02-21 09:06:13 -08:00
rcutorture.c rcu: Add rcutorture test for call_srcu() 2012-04-30 10:48:26 -07:00
rcutree_plugin.h rcu: Precompute RCU_FAST_NO_HZ timer offsets 2012-06-06 20:43:28 -07:00
rcutree_trace.c rcu: Make rcu_barrier() less disruptive 2012-05-09 14:27:54 -07:00
rcutree.c rcu: Stop rcu_do_batch() from multiplexing the "count" variable 2012-06-25 12:35:25 -07:00
rcutree.h rcu: Move RCU_FAST_NO_HZ per-CPU variables to rcu_dynticks structure 2012-06-06 20:43:28 -07:00
relay.c splice: fix racy pipe->buffers uses 2012-06-13 21:16:42 +02:00
res_counter.c rescounters: add res_counter_uncharge_until() 2012-05-29 16:22:27 -07:00
resource.c kernel/resource.c: correct the comment of allocate_resource() 2012-05-31 17:49:26 -07:00
rtmutex_common.h
rtmutex-debug.c lockdep, rtmutex, bug: Show taint flags on error 2011-12-06 08:16:49 +01:00
rtmutex-debug.h
rtmutex-tester.c rtmutex-tester: convert sysdev_class to a regular subsystem 2011-12-14 14:54:22 -08:00
rtmutex.c Revert "rcu: Permit rt_mutex_unlock() with irqs disabled" 2011-12-11 10:33:18 -08:00
rtmutex.h
rwsem.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
seccomp.c seccomp: fix build warnings when there is no CONFIG_SECCOMP_FILTER 2012-04-18 12:24:52 +10:00
semaphore.c semaphore: fix improper comment reference to mutex 2012-04-05 17:15:55 -07:00
signal.c signal: make sure we don't get stopped with pending task_work 2012-07-22 23:57:54 +04:00
smp.c smp: Implement kick_all_cpus_sync() 2012-05-08 12:35:06 +02:00
smpboot.c smpboot, idle: Fix comment mismatch over idle_threads_init() 2012-05-24 22:58:08 +02:00
smpboot.h smp: Fix idle_thread_init() inline stub 2012-05-04 12:52:25 +02:00
softirq.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-03-20 10:32:09 -07:00
spinlock.c locking/kconfig: Simplify INLINE_SPIN_UNLOCK usage 2012-03-23 13:18:57 +01:00
srcu.c rcu: Implement per-domain single-threaded call_srcu() state machine 2012-04-30 10:48:25 -07:00
stacktrace.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
stop_machine.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
sys_ni.c syscalls, x86: add __NR_kcmp syscall 2012-05-31 17:49:32 -07:00
sys.c c/r: prctl: less paranoid prctl_set_mm_exe_file() 2012-07-11 16:04:43 -07:00
sysctl_binary.c binary_sysctl(): fix memory leak 2011-12-20 10:25:04 -08:00
sysctl.c fs: add link restrictions 2012-07-29 21:37:58 +04:00
task_work.c deal with task_work callbacks adding more work 2012-07-22 23:57:57 +04:00
taskstats.c Make TASKSTATS require root access 2011-09-19 17:04:37 -07:00
test_kprobes.c
time.c time: Remove bogus comments 2012-03-15 18:17:55 -07:00
timeconst.pl
timer.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2012-05-23 17:42:39 -07:00
tracepoint.c static keys: Introduce 'struct static_key', static_key_true()/false() and static_key_slow_[inc|dec]() 2012-02-24 10:05:59 +01:00
tsacct.c [S390] cputime: add sparse checking and cleanup 2011-12-15 14:56:19 +01:00
uid16.c userns: Convert setting and getting uid and gid system calls to use kuid and kgid 2012-05-03 03:28:41 -07:00
up.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
user_namespace.c userns: Store uid and gid values in struct cred with kuid_t and kgid_t types 2012-05-03 03:28:38 -07:00
user-return-notifier.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
user.c userns: Silence silly gcc warning. 2012-05-19 15:44:40 -06:00
utsname_sysctl.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
utsname.c userns: Use cred->user_ns instead of cred->user->user_ns 2012-04-07 16:55:51 -07:00
wait.c lockdep/waitqueues: Add better annotation 2011-12-21 10:07:39 +01:00
watchdog.c watchdog: Quiet down the boot messages 2012-06-14 12:20:50 +02:00
workqueue_sched.h
workqueue.c lockdep: fix oops in processing workqueue 2012-05-15 08:08:31 -07:00