Commit Graph

1950 Commits

Author SHA1 Message Date
Linus Torvalds
3c4cfadef6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking changes from David S Miller:

 1) Remove the ipv4 routing cache.  Now lookups go directly into the FIB
    trie and use prebuilt routes cached there.

    No more garbage collection, no more rDOS attacks on the routing
    cache.  Instead we now get predictable and consistent performance,
    no matter what the pattern of traffic we service.

    This has been almost 2 years in the making.  Special thanks to
    Julian Anastasov, Eric Dumazet, Steffen Klassert, and others who
    have helped along the way.

    I'm sure that with a change of this magnitude there will be some
    kind of fallout, but such things ought the be simple to fix at this
    point.  Luckily I'm not European so I'll be around all of August to
    fix things :-)

    The major stages of this work here are each fronted by a forced
    merge commit whose commit message contains a top-level description
    of the motivations and implementation issues.

 2) Pre-demux of established ipv4 TCP sockets, saves a route demux on
    input.

 3) TCP SYN/ACK performance tweaks from Eric Dumazet.

 4) Add namespace support for netfilter L4 conntrack helpers, from Gao
    Feng.

 5) Add config mechanism for Energy Efficient Ethernet to ethtool, from
    Yuval Mintz.

 6) Remove quadratic behavior from /proc/net/unix, from Eric Dumazet.

 7) Support for connection tracker helpers in userspace, from Pablo
    Neira Ayuso.

 8) Allow userspace driven TX load balancing functions in TEAM driver,
    from Jiri Pirko.

 9) Kill off NLMSG_PUT and RTA_PUT macros, more gross stuff with
    embedded gotos.

10) TCP Small Queues, essentially minimize the amount of TCP data queued
    up in the packet scheduler layer.  Whereas the existing BQL (Byte
    Queue Limits) limits the pkt_sched --> netdevice queuing levels,
    this controls the TCP --> pkt_sched queueing levels.

    From Eric Dumazet.

11) Reduce the number of get_page/put_page ops done on SKB fragments,
    from Alexander Duyck.

12) Implement protection against blind resets in TCP (RFC 5961), from
    Eric Dumazet.

13) Support the client side of TCP Fast Open, basically the ability to
    send data in the SYN exchange, from Yuchung Cheng.

    Basically, the sender queues up data with a sendmsg() call using
    MSG_FASTOPEN, then they do the connect() which emits the queued up
    fastopen data.

14) Avoid all the problems we get into in TCP when timers or PMTU events
    hit a locked socket.  The TCP Small Queues changes added a
    tcp_release_cb() that allows us to queue work up to the
    release_sock() caller, and that's what we use here too.  From Eric
    Dumazet.

15) Zero copy on TX support for TUN driver, from Michael S. Tsirkin.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1870 commits)
  genetlink: define lockdep_genl_is_held() when CONFIG_LOCKDEP
  r8169: revert "add byte queue limit support".
  ipv4: Change rt->rt_iif encoding.
  net: Make skb->skb_iif always track skb->dev
  ipv4: Prepare for change of rt->rt_iif encoding.
  ipv4: Remove all RTCF_DIRECTSRC handliing.
  ipv4: Really ignore ICMP address requests/replies.
  decnet: Don't set RTCF_DIRECTSRC.
  net/ipv4/ip_vti.c: Fix __rcu warnings detected by sparse.
  ipv4: Remove redundant assignment
  rds: set correct msg_namelen
  openvswitch: potential NULL deref in sample()
  tcp: dont drop MTU reduction indications
  bnx2x: Add new 57840 device IDs
  tcp: avoid oops in tcp_metrics and reset tcpm_stamp
  niu: Change niu_rbr_fill() to use unlikely() to check niu_rbr_add_page() return value
  niu: Fix to check for dma mapping errors.
  net: Fix references to out-of-scope variables in put_cmsg_compat()
  net: ethernet: davinci_emac: add pm_runtime support
  net: ethernet: davinci_emac: Remove unnecessary #include
  ...
2012-07-24 10:01:50 -07:00
Linus Torvalds
e05644e17e Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "Nothing groundbreaking for this kernel, just cleanups and fixes, and a
  couple of Smack enhancements."

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (21 commits)
  Smack: Maintainer Record
  Smack: don't show empty rules when /smack/load or /smack/load2 is read
  Smack: user access check bounds
  Smack: onlycap limits on CAP_MAC_ADMIN
  Smack: fix smack_new_inode bogosities
  ima: audit is compiled only when enabled
  ima: ima_initialized is set only if successful
  ima: add policy for pseudo fs
  ima: remove unused cleanup functions
  ima: free securityfs violations file
  ima: use full pathnames in measurement list
  security: Fix nommu build.
  samples: seccomp: add .gitignore for untracked executables
  tpm: check the chip reference before using it
  TPM: fix memleak when register hardware fails
  TPM: chip disabled state erronously being reported as error
  MAINTAINERS: TPM maintainers' contacts update
  Merge branches 'next-queue' and 'next' into next
  Remove unused code from MPI library
  Revert "crypto: GnuPG based MPI lib - additional sources (part 4)"
  ...
2012-07-23 18:49:06 -07:00
Linus Torvalds
a66d2c8f7e Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull the big VFS changes from Al Viro:
 "This one is *big* and changes quite a few things around VFS.  What's in there:

   - the first of two really major architecture changes - death to open
     intents.

     The former is finally there; it was very long in making, but with
     Miklos getting through really hard and messy final push in
     fs/namei.c, we finally have it.  Unlike his variant, this one
     doesn't introduce struct opendata; what we have instead is
     ->atomic_open() taking preallocated struct file * and passing
     everything via its fields.

     Instead of returning struct file *, it returns -E...  on error, 0
     on success and 1 in "deal with it yourself" case (e.g.  symlink
     found on server, etc.).

     See comments before fs/namei.c:atomic_open().  That made a lot of
     goodies finally possible and quite a few are in that pile:
     ->lookup(), ->d_revalidate() and ->create() do not get struct
     nameidata * anymore; ->lookup() and ->d_revalidate() get lookup
     flags instead, ->create() gets "do we want it exclusive" flag.

     With the introduction of new helper (kern_path_locked()) we are rid
     of all struct nameidata instances outside of fs/namei.c; it's still
     visible in namei.h, but not for long.  Come the next cycle,
     declaration will move either to fs/internal.h or to fs/namei.c
     itself.  [me, miklos, hch]

   - The second major change: behaviour of final fput().  Now we have
     __fput() done without any locks held by caller *and* not from deep
     in call stack.

     That obviously lifts a lot of constraints on the locking in there.
     Moreover, it's legal now to call fput() from atomic contexts (which
     has immediately simplified life for aio.c).  We also don't need
     anti-recursion logics in __scm_destroy() anymore.

     There is a price, though - the damn thing has become partially
     asynchronous.  For fput() from normal process we are guaranteed
     that pending __fput() will be done before the caller returns to
     userland, exits or gets stopped for ptrace.

     For kernel threads and atomic contexts it's done via
     schedule_work(), so theoretically we might need a way to make sure
     it's finished; so far only one such place had been found, but there
     might be more.

     There's flush_delayed_fput() (do all pending __fput()) and there's
     __fput_sync() (fput() analog doing __fput() immediately).  I hope
     we won't need them often; see warnings in fs/file_table.c for
     details.  [me, based on task_work series from Oleg merged last
     cycle]

   - sync series from Jan

   - large part of "death to sync_supers()" work from Artem; the only
     bits missing here are exofs and ext4 ones.  As far as I understand,
     those are going via the exofs and ext4 trees resp.; once they are
     in, we can put ->write_super() to the rest, along with the thread
     calling it.

   - preparatory bits from unionmount series (from dhowells).

   - assorted cleanups and fixes all over the place, as usual.

  This is not the last pile for this cycle; there's at least jlayton's
  ESTALE work and fsfreeze series (the latter - in dire need of fixes,
  so I'm not sure it'll make the cut this cycle).  I'll probably throw
  symlink/hardlink restrictions stuff from Kees into the next pile, too.
  Plus there's a lot of misc patches I hadn't thrown into that one -
  it's large enough as it is..."

* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (127 commits)
  ext4: switch EXT4_IOC_RESIZE_FS to mnt_want_write_file()
  btrfs: switch btrfs_ioctl_balance() to mnt_want_write_file()
  switch dentry_open() to struct path, make it grab references itself
  spufs: shift dget/mntget towards dentry_open()
  zoran: don't bother with struct file * in zoran_map
  ecryptfs: don't reinvent the wheels, please - use struct completion
  don't expose I_NEW inodes via dentry->d_inode
  tidy up namei.c a bit
  unobfuscate follow_up() a bit
  ext3: pass custom EOF to generic_file_llseek_size()
  ext4: use core vfs llseek code for dir seeks
  vfs: allow custom EOF in generic_file_llseek code
  vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
  vfs: Remove unnecessary flushing of block devices
  vfs: Make sys_sync writeout also block device inodes
  vfs: Create function for iterating over block devices
  vfs: Reorder operations during sys_sync
  quota: Move quota syncing to ->sync_fs method
  quota: Split dquot_quota_sync() to writeback and cache flushing part
  vfs: Move noop_backing_dev_info check from sync into writeback
  ...
2012-07-23 12:27:27 -07:00
Al Viro
765927b2d5 switch dentry_open() to struct path, make it grab references itself
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-23 00:01:29 +04:00
Al Viro
d35abdb288 hold task_lock around checks in keyctl
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:01 +04:00
Al Viro
67d1214551 merge task_work and rcu_head, get rid of separate allocation for keyring case
task_work and rcu_head are identical now; merge them (calling the result
struct callback_head, rcu_head #define'd to it), kill separate allocation
in security/keys since we can just use cred->rcu now.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:57:56 +04:00
Al Viro
41f9d29f09 trimming task_work: kill ->data
get rid of the only user of ->data; this is _not_ the final variant - in the
end we'll have task_work and rcu_head identical and just use cred->rcu,
at which point the separate allocation will be gone completely.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:57:54 +04:00
David S. Miller
abaa72d7fd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
2012-07-19 11:17:30 -07:00
Linus Torvalds
e2f3b78557 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull SELinux regression fixes from James Morris.

Andrew Morton has a box that hit that open perms problem.

I also renamed the "epollwakeup" selinux name for the new capability to
be "block_suspend", to match the rename done by commit d9914cf661
("PM: Rename CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND").

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  SELinux: do not check open perms if they are not known to policy
  SELinux: include definition of new capabilities
2012-07-18 13:42:44 -07:00
Eric Paris
3d2195c332 SELinux: do not check open perms if they are not known to policy
When I introduced open perms policy didn't understand them and I
implemented them as a policycap.  When I added the checking of open perm
to truncate I forgot to conditionalize it on the userspace defined
policy capability.  Running an old policy with a new kernel will not
check open on open(2) but will check it on truncate.  Conditionalize the
truncate check the same as the open check.

Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: stable@vger.kernel.org # 3.4.x
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-07-16 11:41:47 +10:00
Eric Paris
64919e6091 SELinux: include definition of new capabilities
The kernel has added CAP_WAKE_ALARM and CAP_EPOLLWAKEUP.  We need to
define these in SELinux so they can be mediated by policy.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-07-16 11:40:31 +10:00
Rafal Krypa
65ee7f45cf Smack: don't show empty rules when /smack/load or /smack/load2 is read
This patch removes empty rules (i.e. with access set to '-') from the
rule list presented to user space.

Smack by design never removes labels nor rules from its lists. Access
for a rule may be set to '-' to effectively disable it. Such rules would
show up in the listing generated when /smack/load or /smack/load2 is
read. This may cause clutter if many rules were disabled.

As a rule with access set to '-' is equivalent to no rule at all, they
may be safely hidden from the listing.

Targeted for git://git.gitorious.org/smack-next/kernel.git

Signed-off-by: Rafal Krypa <r.krypa@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2012-07-13 15:49:24 -07:00
Casey Schaufler
3518721a89 Smack: user access check bounds
Some of the bounds checking used on the /smack/access
interface was lost when support for long labels was
added. No kernel access checks are affected, however
this is a case where /smack/access could be used
incorrectly and fail to detect the error. This patch
reintroduces the original checks.

Targeted for git://git.gitorious.org/smack-next/kernel.git

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2012-07-13 15:49:24 -07:00
Casey Schaufler
1880eff77e Smack: onlycap limits on CAP_MAC_ADMIN
Smack is integrated with the POSIX capabilities scheme,
using the capabilities CAP_MAC_OVERRIDE and CAP_MAC_ADMIN to
determine if a process is allowed to ignore Smack checks or
change Smack related data respectively. Smack provides an
additional restriction that if an onlycap value is set
by writing to /smack/onlycap only tasks with that Smack
label are allowed to use CAP_MAC_OVERRIDE.

This change adds CAP_MAC_ADMIN as a capability that is affected
by the onlycap mechanism.

Targeted for git://git.gitorious.org/smack-next/kernel.git

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2012-07-13 15:49:23 -07:00
Casey Schaufler
eb982cb4cf Smack: fix smack_new_inode bogosities
In January of 2012 Al Viro pointed out three bits of code that
he titled "new_inode_smack bogosities". This patch repairs these
errors.

1. smack_sb_kern_mount() included a NULL check that is impossible.
   The check and NULL case are removed.
2. smack_kb_kern_mount() included pointless locking. The locking is
   removed. Since this is the only place that lock was used the lock
   is removed from the superblock_smack structure.
3. smk_fill_super() incorrectly and unnecessarily set the Smack label
   for the smackfs root inode. The assignment has been removed.

Targeted for git://gitorious.org/smack-next/kernel.git

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2012-07-13 15:49:23 -07:00
Dmitry Kasatkin
417c6c8ee2 ima: audit is compiled only when enabled
IMA auditing code was compiled even when CONFIG_AUDIT was not enabled.
This patch compiles auditing code only when possible and enabled.

Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2012-07-05 16:43:59 -04:00
Dmitry Kasatkin
7ff2267af5 ima: ima_initialized is set only if successful
Set ima_initialized only if initialization was successful.

Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2012-07-05 16:43:57 -04:00
Dmitry Kasatkin
8445d64dd7 ima: add policy for pseudo fs
Exclude DEVPTS and BINFMT filesystems from the measurement policy.

Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2012-07-05 16:42:33 -04:00
David S. Miller
c90a9bb907 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-07-05 03:44:25 -07:00
Paul Mundt
75331a597c security: Fix nommu build.
The security + nommu configuration presently blows up with an undefined
reference to BDI_CAP_EXEC_MAP:

security/security.c: In function 'mmap_prot':
security/security.c:687:36: error: dereferencing pointer to incomplete type
security/security.c:688:16: error: 'BDI_CAP_EXEC_MAP' undeclared (first use in this function)
security/security.c:688:16: note: each undeclared identifier is reported only once for each function it appears in

include backing-dev.h directly to fix it up.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-07-03 21:41:03 +10:00
Dmitry Kasatkin
c7de7adc18 ima: remove unused cleanup functions
IMA cannot be used as module and does not need __exit functions.
Removed them.

Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2012-07-02 16:43:30 -04:00
Dmitry Kasatkin
0ea4f8ae41 ima: free securityfs violations file
On ima_fs_init() error, free securityfs violations file.

Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
2012-07-02 16:43:30 -04:00
Mimi Zohar
08e1b76ae3 ima: use full pathnames in measurement list
The IMA measurement list contains filename hints, which can be
ambigious without the full pathname.  This patch replaces the
filename hint with the full pathname, simplifying for userspace
the correlating of file hash measurements with files.

Change log v1:
- Revert to short filenames, when full pathname is longer than IMA
  measurement buffer size. (Based on Dmitry's review)

Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2012-07-02 16:43:29 -04:00
Paul Mundt
659b5e7652 security: Fix nommu build.
The security + nommu configuration presently blows up with an undefined
reference to BDI_CAP_EXEC_MAP:

security/security.c: In function 'mmap_prot':
security/security.c:687:36: error: dereferencing pointer to incomplete type
security/security.c:688:16: error: 'BDI_CAP_EXEC_MAP' undeclared (first use in this function)
security/security.c:688:16: note: each undeclared identifier is reported only once for each function it appears in

include backing-dev.h directly to fix it up.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-07-02 23:56:04 +10:00
Pablo Neira Ayuso
a31f2d17b3 netlink: add netlink_kernel_cfg parameter to netlink_kernel_create
This patch adds the following structure:

struct netlink_kernel_cfg {
        unsigned int    groups;
        void            (*input)(struct sk_buff *skb);
        struct mutex    *cb_mutex;
};

That can be passed to netlink_kernel_create to set optional configurations
for netlink kernel sockets.

I've populated this structure by looking for NULL and zero parameters at the
existing code. The remaining parameters that always need to be set are still
left in the original interface.

That includes optional parameters for the netlink socket creation. This allows
easy extensibility of this interface in the future.

This patch also adapts all callers to use this new interface.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-29 16:46:02 -07:00
David S. Miller
01f534d0ae selinux: netlink: Move away from NLMSG_PUT().
And use nlmsg_data() while we're here too.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-26 21:54:06 -07:00
James Morris
66dd07b88a Merge commit 'v3.5-rc2' into next 2012-06-10 22:52:10 +10:00
Alban Crequy
2597a8344c netfilter: selinux: switch hook PFs to nfproto
This patch is a cleanup. Use NFPROTO_* for consistency with other
netfilter code.

Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Reviewed-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Reviewed-by: Vincent Sanders <vincent.sanders@collabora.co.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:43 +02:00
Linus Torvalds
1193755ac6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs changes from Al Viro.
 "A lot of misc stuff.  The obvious groups:
   * Miklos' atomic_open series; kills the damn abuse of
     ->d_revalidate() by NFS, which was the major stumbling block for
     all work in that area.
   * ripping security_file_mmap() and dealing with deadlocks in the
     area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in
     general.
   * ->encode_fh() switched to saner API; insane fake dentry in
     mm/cleancache.c gone.
   * assorted annotations in fs (endianness, __user)
   * parts of Artem's ->s_dirty work (jff2 and reiserfs parts)
   * ->update_time() work from Josef.
   * other bits and pieces all over the place.

  Normally it would've been in two or three pull requests, but
  signal.git stuff had eaten a lot of time during this cycle ;-/"

Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the
'truncate_range' inode method was removed by the VM changes, the VFS
update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due
to sparse fix added twice, with other changes nearby).

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits)
  nfs: don't open in ->d_revalidate
  vfs: retry last component if opening stale dentry
  vfs: nameidata_to_filp(): don't throw away file on error
  vfs: nameidata_to_filp(): inline __dentry_open()
  vfs: do_dentry_open(): don't put filp
  vfs: split __dentry_open()
  vfs: do_last() common post lookup
  vfs: do_last(): add audit_inode before open
  vfs: do_last(): only return EISDIR for O_CREAT
  vfs: do_last(): check LOOKUP_DIRECTORY
  vfs: do_last(): make ENOENT exit RCU safe
  vfs: make follow_link check RCU safe
  vfs: do_last(): use inode variable
  vfs: do_last(): inline walk_component()
  vfs: do_last(): make exit RCU safe
  vfs: split do_lookup()
  Btrfs: move over to use ->update_time
  fs: introduce inode operation ->update_time
  reiserfs: get rid of resierfs_sync_super
  reiserfs: mark the superblock as dirty a bit later
  ...
2012-06-01 10:34:35 -07:00
Al Viro
98de59bfe4 take calculation of final prot in security_mmap_file() into a helper
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 10:37:17 -04:00
Al Viro
8b3ec6814c take security_mmap_file() outside of ->mmap_sem
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 10:37:01 -04:00
Linus Torvalds
fb21affa49 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull second pile of signal handling patches from Al Viro:
 "This one is just task_work_add() series + remaining prereqs for it.

  There probably will be another pull request from that tree this
  cycle - at least for helpers, to get them out of the way for per-arch
  fixes remaining in the tree."

Fix trivial conflict in kernel/irq/manage.c: the merge of Andrew's pile
had brought in commit 97fd75b7b8 ("kernel/irq/manage.c: use the
pr_foo() infrastructure to prefix printks") which changed one of the
pr_err() calls that this merge moves around.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  keys: kill task_struct->replacement_session_keyring
  keys: kill the dummy key_replace_session_keyring()
  keys: change keyctl_session_to_parent() to use task_work_add()
  genirq: reimplement exit_irq_thread() hook via task_work_add()
  task_work_add: generic process-context callbacks
  avr32: missed _TIF_NOTIFY_RESUME on one of do_notify_resume callers
  parisc: need to check NOTIFY_RESUME when exiting from syscall
  move key_repace_session_keyring() into tracehook_notify_resume()
  TIF_NOTIFY_RESUME is defined on all targets now
2012-05-31 18:47:30 -07:00
Christopher Yeoh
ac34ebb3a6 aio/vfs: cleanup of rw_copy_check_uvector() and compat_rw_copy_check_uvector()
A cleanup of rw_copy_check_uvector and compat_rw_copy_check_uvector after
changes made to support CMA in an earlier patch.

Rather than having an additional check_access parameter to these
functions, the first paramater type is overloaded to allow the caller to
specify CHECK_IOVEC_ONLY which means check that the contents of the iovec
are valid, but do not check the memory that they point to.  This is used
by process_vm_readv/writev where we need to validate that a iovec passed
to the syscall is valid but do not want to check the memory that it points
to at this point because it refers to an address space in another process.

Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:32 -07:00
Boaz Harrosh
81ab6e7b26 kmod: convert two call sites to call_usermodehelper_fns()
Both kernel/sys.c && security/keys/request_key.c where inlining the exact
same code as call_usermodehelper_fns(); So simply convert these sites to
directly use call_usermodehelper_fns().

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:28 -07:00
Andrew Morton
4f1c28d241 security/keys/keyctl.c: suppress memory allocation failure warning
This allocation may be large.  The code is probing to see if it will
succeed and if not, it falls back to vmalloc().  We should suppress any
page-allocation failure messages when the fallback happens.

Reported-by: Dave Jones <davej@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:26 -07:00
Al Viro
e5467859f7 split ->file_mmap() into ->mmap_addr()/->mmap_file()
... i.e. file-dependent and address-dependent checks.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-31 13:11:54 -04:00
Al Viro
d007794a18 split cap_mmap_addr() out of cap_file_mmap()
... switch callers.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-31 13:10:54 -04:00
Al Viro
cc1dad7183 selinuxfs snprintf() misuses
a) %d does _not_ produce a page worth of output
b) snprintf() doesn't return negatives - it used to in old glibc, but
that's the kernel...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:33 -04:00
David Howells
423b978802 KEYS: Fix some sparse warnings
Fix some sparse warnings in the keyrings code:

 (1) compat_keyctl_instantiate_key_iov() should be static.

 (2) There were a couple of places where a pointer was being compared against
     integer 0 rather than NULL.

 (3) keyctl_instantiate_key_common() should not take a __user-labelled iovec
     pointer as the caller must have copied the iovec to kernel space.

 (4) __key_link_begin() takes and __key_link_end() releases
     keyring_serialise_link_sem under some circumstances and so this should be
     declared.

     Note that adding __acquires() and __releases() for this doesn't help cure
     the warnings messages - something only commenting out both helps.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-05-25 20:51:42 +10:00
Oleg Nesterov
413cd3d9ab keys: change keyctl_session_to_parent() to use task_work_add()
Change keyctl_session_to_parent() to use task_work_add() and move
key_replace_session_keyring() logic into task_work->func().

Note that we do task_work_cancel() before task_work_add() to ensure that
only one work can be pending at any time.  This is important, we must not
allow user-space to abuse the parent's ->task_works list.

The callback, replace_session_keyring(), checks PF_EXITING.  I guess this
is not really needed but looks better.

As a side effect, this fixes the (unlikely) race.  The callers of
key_replace_session_keyring() and keyctl_session_to_parent() lack the
necessary barriers, the parent can miss the request.

Now we can remove task_struct->replacement_session_keyring and related
code.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Smith <dsmith@redhat.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-23 22:11:23 -04:00
Al Viro
1227dd773d TIF_NOTIFY_RESUME is defined on all targets now
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-23 22:09:19 -04:00
Linus Torvalds
644473e9c6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace enhancements from Eric Biederman:
 "This is a course correction for the user namespace, so that we can
  reach an inexpensive, maintainable, and reasonably complete
  implementation.

  Highlights:
   - Config guards make it impossible to enable the user namespace and
     code that has not been converted to be user namespace safe.

   - Use of the new kuid_t type ensures the if you somehow get past the
     config guards the kernel will encounter type errors if you enable
     user namespaces and attempt to compile in code whose permission
     checks have not been updated to be user namespace safe.

   - All uids from child user namespaces are mapped into the initial
     user namespace before they are processed.  Removing the need to add
     an additional check to see if the user namespace of the compared
     uids remains the same.

   - With the user namespaces compiled out the performance is as good or
     better than it is today.

   - For most operations absolutely nothing changes performance or
     operationally with the user namespace enabled.

   - The worst case performance I could come up with was timing 1
     billion cache cold stat operations with the user namespace code
     enabled.  This went from 156s to 164s on my laptop (or 156ns to
     164ns per stat operation).

   - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
     Most uid/gid setting system calls treat these value specially
     anyway so attempting to use -1 as a uid would likely cause
     entertaining failures in userspace.

   - If setuid is called with a uid that can not be mapped setuid fails.
     I have looked at sendmail, login, ssh and every other program I
     could think of that would call setuid and they all check for and
     handle the case where setuid fails.

   - If stat or a similar system call is called from a context in which
     we can not map a uid we lie and return overflowuid.  The LFS
     experience suggests not lying and returning an error code might be
     better, but the historical precedent with uids is different and I
     can not think of anything that would break by lying about a uid we
     can't map.

   - Capabilities are localized to the current user namespace making it
     safe to give the initial user in a user namespace all capabilities.

  My git tree covers all of the modifications needed to convert the core
  kernel and enough changes to make a system bootable to runlevel 1."

Fix up trivial conflicts due to nearby independent changes in fs/stat.c

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
  userns:  Silence silly gcc warning.
  cred: use correct cred accessor with regards to rcu read lock
  userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
  userns: Convert cgroup permission checks to use uid_eq
  userns: Convert tmpfs to use kuid and kgid where appropriate
  userns: Convert sysfs to use kgid/kuid where appropriate
  userns: Convert sysctl permission checks to use kuid and kgids.
  userns: Convert proc to use kuid/kgid where appropriate
  userns: Convert ext4 to user kuid/kgid where appropriate
  userns: Convert ext3 to use kuid/kgid where appropriate
  userns: Convert ext2 to use kuid/kgid where appropriate.
  userns: Convert devpts to use kuid/kgid where appropriate
  userns: Convert binary formats to use kuid/kgid where appropriate
  userns: Add negative depends on entries to avoid building code that is userns unsafe
  userns: signal remove unnecessary map_cred_ns
  userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
  userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
  userns: Convert stat to return values mapped from kuids and kgids
  userns: Convert user specfied uids and gids in chown into kuids and kgid
  userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
  ...
2012-05-23 17:42:39 -07:00
Linus Torvalds
88d6ae8dc3 Merge branch 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "cgroup file type addition / removal is updated so that file types are
  added and removed instead of individual files so that dynamic file
  type addition / removal can be implemented by cgroup and used by
  controllers.  blkio controller changes which will come through block
  tree are dependent on this.  Other changes include res_counter cleanup
  and disallowing kthread / PF_THREAD_BOUND threads to be attached to
  non-root cgroups.

  There's a reported bug with the file type addition / removal handling
  which can lead to oops on cgroup umount.  The issue is being looked
  into.  It shouldn't cause problems for most setups and isn't a
  security concern."

Fix up trivial conflict in Documentation/feature-removal-schedule.txt

* 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
  res_counter: Account max_usage when calling res_counter_charge_nofail()
  res_counter: Merge res_counter_charge and res_counter_charge_nofail
  cgroups: disallow attaching kthreadd or PF_THREAD_BOUND threads
  cgroup: remove cgroup_subsys->populate()
  cgroup: get rid of populate for memcg
  cgroup: pass struct mem_cgroup instead of struct cgroup to socket memcg
  cgroup: make css->refcnt clearing on cgroup removal optional
  cgroup: use negative bias on css->refcnt to block css_tryget()
  cgroup: implement cgroup_rm_cftypes()
  cgroup: introduce struct cfent
  cgroup: relocate __d_cgrp() and __d_cft()
  cgroup: remove cgroup_add_file[s]()
  cgroup: convert memcg controller to the new cftype interface
  memcg: always create memsw files if CONFIG_CGROUP_MEM_RES_CTLR_SWAP
  cgroup: convert all non-memcg controllers to the new cftype interface
  cgroup: relocate cftype and cgroup_subsys definitions in controllers
  cgroup: merge cft_release_agent cftype array into the base files array
  cgroup: implement cgroup_add_cftypes() and friends
  cgroup: build list of all cgroups under a given cgroupfs_root
  cgroup: move cgroup_clear_directory() call out of cgroup_populate_dir()
  ...
2012-05-22 17:40:19 -07:00
Linus Torvalds
cb60e3e65c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "New notable features:
   - The seccomp work from Will Drewry
   - PR_{GET,SET}_NO_NEW_PRIVS from Andy Lutomirski
   - Longer security labels for Smack from Casey Schaufler
   - Additional ptrace restriction modes for Yama by Kees Cook"

Fix up trivial context conflicts in arch/x86/Kconfig and include/linux/filter.h

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits)
  apparmor: fix long path failure due to disconnected path
  apparmor: fix profile lookup for unconfined
  ima: fix filename hint to reflect script interpreter name
  KEYS: Don't check for NULL key pointer in key_validate()
  Smack: allow for significantly longer Smack labels v4
  gfp flags for security_inode_alloc()?
  Smack: recursive tramsmute
  Yama: replace capable() with ns_capable()
  TOMOYO: Accept manager programs which do not start with / .
  KEYS: Add invalidation support
  KEYS: Do LRU discard in full keyrings
  KEYS: Permit in-place link replacement in keyring list
  KEYS: Perform RCU synchronisation on keys prior to key destruction
  KEYS: Announce key type (un)registration
  KEYS: Reorganise keys Makefile
  KEYS: Move the key config into security/keys/Kconfig
  KEYS: Use the compat keyctl() syscall wrapper on Sparc64 for Sparc32 compat
  Yama: remove an unused variable
  samples/seccomp: fix dependencies on arch macros
  Yama: add additional ptrace scopes
  ...
2012-05-21 20:27:36 -07:00
James Morris
ff2bb047c4 Merge branch 'master' of git://git.infradead.org/users/eparis/selinux into next
Per pull request, for 3.5.
2012-05-22 11:21:06 +10:00
John Johansen
cffee16e8b apparmor: fix long path failure due to disconnected path
BugLink: http://bugs.launchpad.net/bugs/955892

All failures from __d_path where being treated as disconnected paths,
however __d_path can also fail when the generated pathname is too long.

The initial ENAMETOOLONG error was being lost, and ENAMETOOLONG was only
returned if the subsequent dentry_path call resulted in that error.  Other
wise if the path was split across a mount point such that the dentry_path
fit within the buffer when the __d_path did not the failure was treated
as a disconnected path.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2012-05-18 11:09:52 -07:00
John Johansen
bf83208e0b apparmor: fix profile lookup for unconfined
BugLink: http://bugs.launchpad.net/bugs/978038

also affects apparmor portion of
BugLink: http://bugs.launchpad.net/bugs/987371

The unconfined profile is not stored in the regular profile list, but
change_profile and exec transitions may want access to it when setting
up specialized transitions like switch to the unconfined profile of a
new policy namespace.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2012-05-18 11:09:28 -07:00
Mimi Zohar
fbbb456347 ima: fix filename hint to reflect script interpreter name
When IMA was first upstreamed, the bprm filename and interp were
always the same.  Currently, the bprm->filename and bprm->interp
are the same, except for when only bprm->interp contains the
interpreter name.  So instead of using the bprm->filename as
the IMA filename hint in the measurement list, we could replace
it with bprm->interp, but this feels too fragil.

The following patch is not much better, but at least there is some
indication that sometimes we're passing the filename and other times
the interpreter name.

Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-05-16 10:36:41 +10:00
James Morris
12fa8a2732 Merge branch 'for-1205' of http://git.gitorious.org/smack-next/kernel into next
Pull request from Casey.
2012-05-16 01:11:29 +10:00
David Howells
b404aef72f KEYS: Don't check for NULL key pointer in key_validate()
Don't bother checking for NULL key pointer in key_validate() as all of the
places that call it will crash anyway if the relevant key pointer is NULL by
the time they call key_validate().  Therefore, the checking must be done prior
to calling here.

Whilst we're at it, simplify the key_validate() function a bit and mark its
argument const.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-05-16 00:54:33 +10:00
Casey Schaufler
f7112e6c9a Smack: allow for significantly longer Smack labels v4
V4 updated to current linux-security#next
Targeted for git://gitorious.org/smack-next/kernel.git

Modern application runtime environments like to use
naming schemes that are structured and generated without
human intervention. Even though the Smack limit of 23
characters for a label name is perfectly rational for
human use there have been complaints that the limit is
a problem in environments where names are composed from
a set or sources, including vendor, author, distribution
channel and application name. Names like

	softwarehouse-pgwodehouse-coolappstore-mellowmuskrats

are becoming harder to avoid. This patch introduces long
label support in Smack. Labels are now limited to 255
characters instead of the old 23.

The primary reason for limiting the labels to 23 characters
was so they could be directly contained in CIPSO category sets.
This is still done were possible, but for labels that are too
large a mapping is required. This is perfectly safe for communication
that stays "on the box" and doesn't require much coordination
between boxes beyond what would have been required to keep label
names consistent.

The bulk of this patch is in smackfs, adding and updating
administrative interfaces. Because existing APIs can't be
changed new ones that do much the same things as old ones
have been introduced.

The Smack specific CIPSO data representation has been removed
and replaced with the data format used by netlabel. The CIPSO
header is now computed when a label is imported rather than
on use. This results in improved IP performance. The smack
label is now allocated separately from the containing structure,
allowing for larger strings.

Four new /smack interfaces have been introduced as four
of the old interfaces strictly required labels be specified
in fixed length arrays.

The access interface is supplemented with the check interface:
	access  "Subject                 Object                  rwxat"
	access2 "Subject Object rwaxt"

The load interface is supplemented with the rules interface:
	load   "Subject                 Object                  rwxat"
	load2  "Subject Object rwaxt"

The load-self interface is supplemented with the self-rules interface:
	load-self   "Subject                 Object                  rwxat"
	load-self2  "Subject Object rwaxt"

The cipso interface is supplemented with the wire interface:
	cipso  "Subject                  lvl cnt  c1  c2 ..."
	cipso2 "Subject lvl cnt  c1  c2 ..."

The old interfaces are maintained for compatibility.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2012-05-14 22:48:38 -07:00
Tetsuo Handa
ceffec5541 gfp flags for security_inode_alloc()?
Dave Chinner wrote:
> Yes, because you have no idea what the calling context is except
> for the fact that is from somewhere inside filesystem code and the
> filesystem could be holding locks. Therefore, GFP_NOFS is really the
> only really safe way to allocate memory here.

I see. Thank you.

I'm not sure, but can call trace happen where somewhere inside network
filesystem or stackable filesystem code with locks held invokes operations that
involves GFP_KENREL memory allocation outside that filesystem?
----------
[PATCH] SMACK: Fix incorrect GFP_KERNEL usage.

new_inode_smack() which can be called from smack_inode_alloc_security() needs
to use GFP_NOFS like SELinux's inode_alloc_security() does, for
security_inode_alloc() is called from inode_init_always() and
inode_init_always() is called from xfs_inode_alloc() which is using GFP_NOFS.

smack_inode_init_security() needs to use GFP_NOFS like
selinux_inode_init_security() does, for initxattrs() callback function (e.g.
btrfs_initxattrs()) which is called from security_inode_init_security() is
using GFP_NOFS.

smack_audit_rule_match() needs to use GFP_ATOMIC, for
security_audit_rule_match() can be called from audit_filter_user_rules() and
audit_filter_user_rules() is called from audit_filter_user() with RCU read lock
held.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Casey Schaufler <cschaufler@cschaufler-intel.(none)>
2012-05-14 22:47:44 -07:00
Casey Schaufler
2267b13a7c Smack: recursive tramsmute
The transmuting directory feature of Smack requires that
the transmuting attribute be explicitly set in all cases.
It seems the users of this facility would expect that the
transmuting attribute be inherited by subdirectories that
are created in a transmuting directory. This does not seem
to add any additional complexity to the understanding of
how the system works.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2012-05-14 22:45:17 -07:00
Kees Cook
2cc8a71641 Yama: replace capable() with ns_capable()
When checking capabilities, the question we want to be asking is "does
current() have the capability in the child's namespace?"

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-05-15 10:27:57 +10:00
Tetsuo Handa
77b513dda9 TOMOYO: Accept manager programs which do not start with / .
The pathname of /usr/sbin/tomoyo-editpolicy seen from Ubuntu 12.04 Live CD is
squashfs:/usr/sbin/tomoyo-editpolicy rather than /usr/sbin/tomoyo-editpolicy .
Therefore, we need to accept manager programs which do not start with / .

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-05-15 10:24:29 +10:00
David Howells
fd75815f72 KEYS: Add invalidation support
Add support for invalidating a key - which renders it immediately invisible to
further searches and causes the garbage collector to immediately wake up,
remove it from keyrings and then destroy it when it's no longer referenced.

It's better not to do this with keyctl_revoke() as that marks the key to start
returning -EKEYREVOKED to searches when what is actually desired is to have the
key refetched.

To invalidate a key the caller must be granted SEARCH permission by the key.
This may be too strict.  It may be better to also permit invalidation if the
caller has any of READ, WRITE or SETATTR permission.

The primary use for this is to evict keys that are cached in special keyrings,
such as the DNS resolver or an ID mapper.

Signed-off-by: David Howells <dhowells@redhat.com>
2012-05-11 10:56:56 +01:00
David Howells
31d5a79d7f KEYS: Do LRU discard in full keyrings
Do an LRU discard in keyrings that are full rather than returning ENFILE.  To
perform this, a time_t is added to the key struct and updated by the creation
of a link to a key and by a key being found as the result of a search.  At the
completion of a successful search, the keyrings in the path between the root of
the search and the first found link to it also have their last-used times
updated.

Note that discarding a link to a key from a keyring does not necessarily
destroy the key as there may be references held by other places.

An alternate discard method that might suffice is to perform FIFO discard from
the keyring, using the spare 2-byte hole in the keylist header as the index of
the next link to be discarded.

This is useful when using a keyring as a cache for DNS results or foreign
filesystem IDs.


This can be tested by the following.  As root do:

	echo 1000 >/proc/sys/kernel/keys/root_maxkeys

	kr=`keyctl newring foo @s`
	for ((i=0; i<2000; i++)); do keyctl add user a$i a $kr; done

Without this patch ENFILE should be reported when the keyring fills up.  With
this patch, the keyring discards keys in an LRU fashion.  Note that the stored
LRU time has a granularity of 1s.

After doing this, /proc/key-users can be observed and should show that most of
the 2000 keys have been discarded:

	[root@andromeda ~]# cat /proc/key-users
	    0:   517 516/516 513/1000 5249/20000

The "513/1000" here is the number of quota-accounted keys present for this user
out of the maximum permitted.

In /proc/keys, the keyring shows the number of keys it has and the number of
slots it has allocated:

	[root@andromeda ~]# grep foo /proc/keys
	200c64c4 I--Q--     1 perm 3b3f0000     0     0 keyring   foo: 509/509

The maximum is (PAGE_SIZE - header) / key pointer size.  That's typically 509
on a 64-bit system and 1020 on a 32-bit system.

Signed-off-by: David Howells <dhowells@redhat.com>
2012-05-11 10:56:56 +01:00
David Howells
233e4735f2 KEYS: Permit in-place link replacement in keyring list
Make use of the previous patch that makes the garbage collector perform RCU
synchronisation before destroying defunct keys.  Key pointers can now be
replaced in-place without creating a new keyring payload and replacing the
whole thing as the discarded keys will not be destroyed until all currently
held RCU read locks are released.

If the keyring payload space needs to be expanded or contracted, then a
replacement will still need allocating, and the original will still have to be
freed by RCU.

Signed-off-by: David Howells <dhowells@redhat.com>
2012-05-11 10:56:56 +01:00
David Howells
65d87fe68a KEYS: Perform RCU synchronisation on keys prior to key destruction
Make the keys garbage collector invoke synchronize_rcu() prior to destroying
keys with a zero usage count.  This means that a key can be examined under the
RCU read lock in the safe knowledge that it won't get deallocated until after
the lock is released - even if its usage count becomes zero whilst we're
looking at it.

This is useful in keyring search vs key link.  Consider a keyring containing a
link to a key.  That link can be replaced in-place in the keyring without
requiring an RCU copy-and-replace on the keyring contents without breaking a
search underway on that keyring when the displaced key is released, provided
the key is actually destroyed only after the RCU read lock held by the search
algorithm is released.

This permits __key_link() to replace a key without having to reallocate the key
payload.  A key gets replaced if a new key being linked into a keyring has the
same type and description.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
2012-05-11 10:56:56 +01:00
David Howells
1eb1bcf5bf KEYS: Announce key type (un)registration
Announce the (un)registration of a key type in the core key code rather than
in the callers.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
2012-05-11 10:56:56 +01:00
David Howells
9f7ce8e249 KEYS: Reorganise keys Makefile
Reorganise the keys directory Makefile to put all the core bits together and
the type-specific bits after.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
2012-05-11 10:56:56 +01:00
David Howells
f0894940ae KEYS: Move the key config into security/keys/Kconfig
Move the key config into security/keys/Kconfig as there are going to be a lot
of key-related options.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
2012-05-11 10:56:56 +01:00
Pablo Neira Ayuso
d16cf20e2f netfilter: remove ip_queue support
This patch removes ip_queue support which was marked as obsolete
years ago. The nfnetlink_queue modules provides more advanced
user-space packet queueing mechanism.

This patch also removes capability code included in SELinux that
refers to ip_queue. Otherwise, we break compilation.

Several warning has been sent regarding this to the mailing list
in the past month without anyone rising the hand to stop this
with some strong argument.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08 20:25:42 +02:00
James Morris
898bfc1d46 Linux 3.4-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEcBAABAgAGBQJPnb50AAoJEHm+PkMAQRiGAE0H/A4zFZIUGmF3miKPDYmejmrZ
 oVDYxVAu6JHjHWhu8E3VsinvyVscowjV8dr15eSaQzmDmRkSHAnUQ+dB7Di7jLC2
 MNopxsWjwyZ8zvvr3rFR76kjbWKk/1GYytnf7GPZLbJQzd51om2V/TY/6qkwiDSX
 U8Tt7ihSgHAezefqEmWp2X/1pxDCEt+VFyn9vWpkhgdfM1iuzF39MbxSZAgqDQ/9
 JJrBHFXhArqJguhENwL7OdDzkYqkdzlGtS0xgeY7qio2CzSXxZXK4svT6FFGA8Za
 xlAaIvzslDniv3vR2ZKd6wzUwFHuynX222hNim3QMaYdXm012M+Nn1ufKYGFxI0=
 =4d4w
 -----END PGP SIGNATURE-----

Merge tag 'v3.4-rc5' into next

Linux 3.4-rc5

Merge to pull in prerequisite change for Smack:
86812bb0de

Requested by Casey.
2012-05-04 12:46:40 +10:00
Eric W. Biederman
18815a1808 userns: Convert capabilities related permsion checks
- Use uid_eq when comparing kuids
  Use gid_eq when comparing kgids
- Use make_kuid(user_ns, 0) to talk about the user_namespace root uid

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-03 03:28:40 -07:00
Eric W. Biederman
078de5f706 userns: Store uid and gid values in struct cred with kuid_t and kgid_t types
cred.h and a few trivial users of struct cred are changed.  The rest of the users
of struct cred are left for other patches as there are too many changes to make
in one go and leave the change reviewable.  If the user namespace is disabled and
CONFIG_UIDGID_STRICT_TYPE_CHECKS are disabled the code will contiue to compile
and behave correctly.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-03 03:28:38 -07:00
Eric W. Biederman
ae2975bc34 userns: Convert group_info values from gid_t to kgid_t.
As a first step to converting struct cred to be all kuid_t and kgid_t
values convert the group values stored in group_info to always be
kgid_t values.   Unless user namespaces are used this change should
have no effect.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-03 03:27:21 -07:00
Eric W. Biederman
783291e690 userns: Simplify the user_namespace by making userns->creator a kuid.
- Transform userns->creator from a user_struct reference to a simple
  kuid_t, kgid_t pair.

  In cap_capable this allows the check to see if we are the creator of
  a namespace to become the classic suser style euid permission check.

  This allows us to remove the need for a struct cred in the mapping
  functions and still be able to dispaly the user namespace creators
  uid and gid as 0.

- Remove the now unnecessary delayed_work in free_user_ns.

  All that is left for free_user_ns to do is to call kmem_cache_free
  and put_user_ns.  Those functions can be called in any context
  so call them directly from free_user_ns removing the need for delayed work.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-04-26 02:00:59 -07:00
Dan Carpenter
08162e6a23 Yama: remove an unused variable
GCC complains that we don't use "one" any more after 389da25f93 "Yama:
add additional ptrace scopes".

security/yama/yama_lsm.c:322:12: warning: ?one? defined but not used
	[-Wunused-variable]

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-23 17:20:22 +10:00
Kees Cook
389da25f93 Yama: add additional ptrace scopes
This expands the available Yama ptrace restrictions to include two more
modes. Mode 2 requires CAP_SYS_PTRACE for PTRACE_ATTACH, and mode 3
completely disables PTRACE_ATTACH (and locks the sysctl).

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-19 13:39:56 +10:00
Jonghwan Choi
51b79bee62 security: fix compile error in commoncap.c
Add missing "personality.h"
security/commoncap.c: In function 'cap_bprm_set_creds':
security/commoncap.c:510: error: 'PER_CLEAR_ON_SETID' undeclared (first use in this function)
security/commoncap.c:510: error: (Each undeclared identifier is reported only once
security/commoncap.c:510: error: for each function it appears in.)

Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-19 12:56:39 +10:00
Eric Paris
d52fc5dde1 fcaps: clear the same personality flags as suid when fcaps are used
If a process increases permissions using fcaps all of the dangerous
personality flags which are cleared for suid apps should also be cleared.
Thus programs given priviledge with fcaps will continue to have address space
randomization enabled even if the parent tried to disable it to make it
easier to attack.

Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-18 12:37:56 +10:00
Casey Schaufler
86812bb0de Smack: move label list initialization
A kernel with Smack enabled will fail if tmpfs has xattr support.

Move the initialization of predefined Smack label
list entries to the LSM initialization from the
smackfs setup. This became an issue when tmpfs
acquired xattr support, but was never correct.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-18 12:02:28 +10:00
John Johansen
c29bceb396 Fix execve behavior apparmor for PR_{GET,SET}_NO_NEW_PRIVS
Add support for AppArmor to explicitly fail requested domain transitions
if NO_NEW_PRIVS is set and the task is not unconfined.

Transitions from unconfined are still allowed because this always results
in a reduction of privileges.

Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>

v18: new acked-by, new description
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-14 11:13:18 +10:00
Andy Lutomirski
259e5e6c75 Add PR_{GET,SET}_NO_NEW_PRIVS to prevent execve from granting privs
With this change, calling
  prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)
disables privilege granting operations at execve-time.  For example, a
process will not be able to execute a setuid binary to change their uid
or gid if this bit is set.  The same is true for file capabilities.

Additionally, LSM_UNSAFE_NO_NEW_PRIVS is defined to ensure that
LSMs respect the requested behavior.

To determine if the NO_NEW_PRIVS bit is set, a task may call
  prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
It returns 1 if set and 0 if it is not set. If any of the arguments are
non-zero, it will return -1 and set errno to -EINVAL.
(PR_SET_NO_NEW_PRIVS behaves similarly.)

This functionality is desired for the proposed seccomp filter patch
series.  By using PR_SET_NO_NEW_PRIVS, it allows a task to modify the
system call behavior for itself and its child tasks without being
able to impact the behavior of a more privileged task.

Another potential use is making certain privileged operations
unprivileged.  For example, chroot may be considered "safe" if it cannot
affect privileged tasks.

Note, this patch causes execve to fail when PR_SET_NO_NEW_PRIVS is
set and AppArmor is in use.  It is fixed in a subsequent patch.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>

v18: updated change desc
v17: using new define values as per 3.4
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-14 11:13:18 +10:00
Kees Cook
923e9a1399 Smack: build when CONFIG_AUDIT not defined
This fixes builds where CONFIG_AUDIT is not defined and
CONFIG_SECURITY_SMACK=y.

This got introduced by the stack-usage reducation commit 48c62af68a
("LSM: shrink the common_audit_data data union").

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-10 16:14:40 -07:00
Eric Paris
c737f8284c SELinux: remove unused common_audit_data in flush_unauthorized_files
We don't need this variable and it just eats stack space.  Remove it.

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-04-09 12:23:57 -04:00
Wanlong Gao
562c99f20d SELinux: avc: remove the useless fields in avc_add_callback
avc_add_callback now just used for registering reset functions
in initcalls, and the callback functions just did reset operations.
So, reducing the arguments to only one event is enough now.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2012-04-09 12:23:44 -04:00
Wanlong Gao
0b36e44cc6 SELinux: replace weak GFP_ATOMIC to GFP_KERNEL in avc_add_callback
avc_add_callback now only called from initcalls, so replace the
weak GFP_ATOMIC to GFP_KERNEL, and mark this function __init
to make a warning when not been called from initcalls.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2012-04-09 12:23:07 -04:00
Eric Paris
899838b25f SELinux: unify the selinux_audit_data and selinux_late_audit_data
We no longer need the distinction.  We only need data after we decide to do an
audit.  So turn the "late" audit data into just "data" and remove what we
currently have as "data".

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-04-09 12:23:06 -04:00
Eric Paris
1d34929271 SELinux: remove auditdeny from selinux_audit_data
It's just takin' up space.

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-04-09 12:23:05 -04:00
Eric Paris
50c205f5e5 LSM: do not initialize common_audit_data to 0
It isn't needed.  If you don't set the type of the data associated with
that type it is a pretty obvious programming bug.  So why waste the cycles?

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-04-09 12:23:04 -04:00
Eric Paris
07f62eb66c LSM: BUILD_BUG_ON if the common_audit_data union ever grows
We did a lot of work to shrink the common_audit_data.  Add a BUILD_BUG_ON
so future programers (let's be honest, probably me) won't do something
foolish like make it large again!

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-04-09 12:23:03 -04:00
Eric Paris
b466066f9b LSM: remove the task field from common_audit_data
There are no legitimate users.  Always use current and get back some stack
space for the common_audit_data.

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-04-09 12:23:03 -04:00
Eric Paris
0972c74ecb apparmor: move task from common_audit_data to apparmor_audit_data
apparmor is the only LSM that uses the common_audit_data tsk field.
Instead of making all LSMs pay for the stack space move the aa usage into
the apparmor_audit_data.

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-04-09 12:23:02 -04:00
Eric Paris
bd5e50f9c1 LSM: remove the COMMON_AUDIT_DATA_INIT type expansion
Just open code it so grep on the source code works better.

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-04-09 12:23:01 -04:00
Eric Paris
d4cf970d07 SELinux: move common_audit_data to a noinline slow path function
selinux_inode_has_perm is a hot path.  Instead of declaring the
common_audit_data on the stack move it to a noinline function only used in
the rare case we need to send an audit message.

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-04-09 12:23:00 -04:00
Eric Paris
602a8dd6ea SELinux: remove inode_has_perm_noadp
Both callers could better be using file_has_perm() to get better audit
results.

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-04-09 12:23:00 -04:00
Eric Paris
2e33405785 SELinux: delay initialization of audit data in selinux_inode_permission
We pay a rather large overhead initializing the common_audit_data.
Since we only need this information if we actually emit an audit
message there is little need to set it up in the hot path.  This patch
splits the functionality of avc_has_perm() into avc_has_perm_noaudit(),
avc_audit_required() and slow_avc_audit().  But we take care of setting
up to audit between required() and the actual audit call.  Thus saving
measurable time in a hot path.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Eric Paris <eparis@redhat.com>
2012-04-09 12:22:59 -04:00
Eric Paris
154c50ca4e SELinux: if sel_make_bools errors don't leave inconsistent state
We reset the bool names and values array to NULL, but do not reset the
number of entries in these arrays to 0.  If we error out and then get back
into this function we will walk these NULL pointers based on the belief
that they are non-zero length.

Signed-off-by: Eric Paris <eparis@redhat.com>
cc: stable@kernel.org
2012-04-09 12:22:58 -04:00
Eric Paris
92ae9e82d9 SELinux: remove needless sel_div function
I'm not really sure what the idea behind the sel_div function is, but it's
useless.  Since a and b are both unsigned, it's impossible for a % b < 0.
That means that part of the function never does anything.  Thus it's just a
normal /.  Just do that instead.  I don't even understand what that operation
was supposed to mean in the signed case however....

If it was signed:
sel_div(-2, 4) == ((-2 / 4) - ((-2 % 4) < 0))
		  ((0)      - ((-2)     < 0))
		  ((0)      - (1))
		  (-1)

What actually happens:
sel_div(-2, 4) == ((18446744073709551614 / 4) - ((18446744073709551614 % 4) < 0))
		  ((4611686018427387903)      - ((2 < 0))
		  (4611686018427387903        - 0)
		  ((unsigned int)4611686018427387903)
		  (4294967295)

Neither makes a whole ton of sense to me.  So I'm getting rid of the
function entirely.

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-04-09 12:22:57 -04:00
Eric Paris
bb7081ab93 SELinux: possible NULL deref in context_struct_to_string
It's possible that the caller passed a NULL for scontext.  However if this
is a defered mapping we might still attempt to call *scontext=kstrdup().
This is bad.  Instead just return the len.

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-04-09 12:22:56 -04:00
Eric Paris
d6ea83ec68 SELinux: audit failed attempts to set invalid labels
We know that some yum operation is causing CAP_MAC_ADMIN failures.  This
implies that an RPM is laying down (or attempting to lay down) a file with
an invalid label.  The problem is that we don't have any information to
track down the cause.  This patch will cause such a failure to report the
failed label in an SELINUX_ERR audit message.  This is similar to the
SELINUX_ERR reports on invalid transitions and things like that.  It should
help run down problems on what is trying to set invalid labels in the
future.

Resulting records look something like:
type=AVC msg=audit(1319659241.138:71): avc:  denied  { mac_admin } for pid=2594 comm="chcon" capability=33 scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=capability2
type=SELINUX_ERR msg=audit(1319659241.138:71): op=setxattr invalid_context=unconfined_u:object_r:hello:s0
type=SYSCALL msg=audit(1319659241.138:71): arch=c000003e syscall=188 success=no exit=-22 a0=a2c0e0 a1=390341b79b a2=a2d620 a3=1f items=1 ppid=2519 pid=2594 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=1 comm="chcon" exe="/usr/bin/chcon" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
type=CWD msg=audit(1319659241.138:71):  cwd="/root" type=PATH msg=audit(1319659241.138:71): item=0 name="test" inode=785879 dev=fc:03 mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:admin_home_t:s0

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-04-09 12:22:56 -04:00
Eric Paris
83d498569e SELinux: rename dentry_open to file_open
dentry_open takes a file, rename it to file_open

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-04-09 12:22:50 -04:00
Eric Paris
95dbf73931 SELinux: check OPEN on truncate calls
In RH BZ 578841 we realized that the SELinux sandbox program was allowed to
truncate files outside of the sandbox.  The reason is because sandbox
confinement is determined almost entirely by the 'open' permission.  The idea
was that if the sandbox was unable to open() files it would be unable to do
harm to those files.  This turns out to be false in light of syscalls like
truncate() and chmod() which don't require a previous open() call.  I looked
at the syscalls that did not have an associated 'open' check and found that
truncate(), did not have a seperate permission and even if it did have a
separate permission such a permission owuld be inadequate for use by
sandbox (since it owuld have to be granted so liberally as to be useless).
This patch checks the OPEN permission on truncate.  I think a better solution
for sandbox is a whole new permission, but at least this fixes what we have
today.

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-04-09 12:22:49 -04:00
Eric Paris
eed7795d0a SELinux: add default_type statements
Because Fedora shipped userspace based on my development tree we now
have policy version 27 in the wild defining only default user, role, and
range.  Thus to add default_type we need a policy.28.

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-04-09 12:22:48 -04:00
Eric Paris
aa893269de SELinux: allow default source/target selectors for user/role/range
When new objects are created we have great and flexible rules to
determine the type of the new object.  We aren't quite as flexible or
mature when it comes to determining the user, role, and range.  This
patch adds a new ability to specify the place a new objects user, role,
and range should come from.  For users and roles it can come from either
the source or the target of the operation.  aka for files the user can
either come from the source (the running process and todays default) or
it can come from the target (aka the parent directory of the new file)

examples always are done with
directory context: system_u:object_r:mnt_t:s0-s0:c0.c512
process context: unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

[no rule]
	unconfined_u:object_r:mnt_t:s0   test_none
[default user source]
	unconfined_u:object_r:mnt_t:s0   test_user_source
[default user target]
	system_u:object_r:mnt_t:s0       test_user_target
[default role source]
	unconfined_u:unconfined_r:mnt_t:s0 test_role_source
[default role target]
	unconfined_u:object_r:mnt_t:s0   test_role_target
[default range source low]
	unconfined_u:object_r:mnt_t:s0 test_range_source_low
[default range source high]
	unconfined_u:object_r:mnt_t:s0:c0.c1023 test_range_source_high
[default range source low-high]
	unconfined_u:object_r:mnt_t:s0-s0:c0.c1023 test_range_source_low-high
[default range target low]
	unconfined_u:object_r:mnt_t:s0 test_range_target_low
[default range target high]
	unconfined_u:object_r:mnt_t:s0:c0.c512 test_range_target_high
[default range target low-high]
	unconfined_u:object_r:mnt_t:s0-s0:c0.c512 test_range_target_low-high

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-04-09 12:22:47 -04:00
Eric Paris
72e8c8593f SELinux: loosen DAC perms on reading policy
There is no reason the DAC perms on reading the policy file need to be root
only.  There are selinux checks which should control this access.

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-04-09 12:22:36 -04:00
Eric Paris
47a93a5bcb SELinux: allow seek operations on the file exposing policy
sesearch uses:
lseek(3, 0, SEEK_SET)                   = -1 ESPIPE (Illegal seek)

Make that work.

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-04-09 12:22:30 -04:00
Eric W. Biederman
aeb3ae9da9 userns: Add an explicit reference to the parent user namespace
I am about to remove the struct user_namespace reference from struct user_struct.
So keep an explicit track of the parent user namespace.

Take advantage of this new reference and replace instances of user_ns->creator->user_ns
with user_ns->parent.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-04-07 16:55:52 -07:00
Eric W. Biederman
0093ccb68f cred: Refcount the user_ns pointed to by the cred.
struct user_struct will shortly loose it's user_ns reference
so make the cred user_ns reference a proper reference complete
with reference counting.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-04-07 16:55:52 -07:00
Eric W. Biederman
c4a4d60379 userns: Use cred->user_ns instead of cred->user->user_ns
Optimize performance and prepare for the removal of the user_ns reference
from user_struct.  Remove the slow long walk through cred->user->user_ns and
instead go straight to cred->user_ns.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-04-07 16:55:51 -07:00
Linus Torvalds
b61c37f579 lsm_audit: don't specify the audit pre/post callbacks in 'struct common_audit_data'
It just bloats the audit data structure for no good reason, since the
only time those fields are filled are just before calling the
common_lsm_audit() function, which is also the only user of those
fields.

So just make them be the arguments to common_lsm_audit(), rather than
bloating that structure that is passed around everywhere, and is
initialized in hot paths.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-03 09:49:59 -07:00
Eric Paris
3f0882c482 SELinux: do not allocate stack space for AVC data unless needed
Instead of declaring the entire selinux_audit_data on the stack when we
start an operation on declare it on the stack if we are going to use it.
We know it's usefulness at the end of the security decision and can declare
it there.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-03 09:49:41 -07:00
Eric Paris
f8294f1144 SELinux: remove avd from slow_avc_audit()
We don't use the argument, so remove it.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-03 09:49:10 -07:00
Eric Paris
7f6a47cf14 SELinux: remove avd from selinux_audit_data
We do not use it.  Remove it.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-03 09:49:10 -07:00
Eric Paris
48c62af68a LSM: shrink the common_audit_data data union
After shrinking the common_audit_data stack usage for private LSM data I'm
not going to shrink the data union.  To do this I'm going to move anything
larger than 2 void * ptrs to it's own structure and require it to be declared
separately on the calling stack.  Thus hot paths which don't need more than
a couple pointer don't have to declare space to hold large unneeded
structures.  I could get this down to one void * by dealing with the key
struct and the struct path.  We'll see if that is helpful after taking care of
networking.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-03 09:49:10 -07:00
Eric Paris
3b3b0e4fc1 LSM: shrink sizeof LSM specific portion of common_audit_data
Linus found that the gigantic size of the common audit data caused a big
perf hit on something as simple as running stat() in a loop.  This patch
requires LSMs to declare the LSM specific portion separately rather than
doing it in a union.  Thus each LSM can be responsible for shrinking their
portion and don't have to pay a penalty just because other LSMs have a
bigger space requirement.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-03 09:48:40 -07:00
Tejun Heo
4baf6e3325 cgroup: convert all non-memcg controllers to the new cftype interface
Convert debug, freezer, cpuset, cpu_cgroup, cpuacct, net_prio, blkio,
net_cls and device controllers to use the new cftype based interface.
Termination entry is added to cftype arrays and populate callbacks are
replaced with cgroup_subsys->base_cftypes initializations.

This is functionally identical transformation.  There shouldn't be any
visible behavior change.

memcg is rather special and will be converted separately.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Vivek Goyal <vgoyal@redhat.com>
2012-04-01 12:09:55 -07:00
Linus Torvalds
8bb1f22952 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull second try at vfs part d#2 from Al Viro:
 "Miklos' first series (with do_lookup() rewrite split into edible
  chunks) + assorted bits and pieces.

  The 'untangling of do_lookup()' series is is a splitup of what used to
  be a monolithic patch from Miklos, so this series is basically "how do
  I convince myself that his patch is correct (or find a hole in it)".
  No holes found and I like the resulting cleanup, so in it went..."

Changes from try 1: Fix a boot problem with selinux, and commit messages
prettied up a bit.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (24 commits)
  vfs: fix out-of-date dentry_unhash() comment
  vfs: split __lookup_hash
  untangling do_lookup() - take __lookup_hash()-calling case out of line.
  untangling do_lookup() - switch to calling __lookup_hash()
  untangling do_lookup() - merge d_alloc_and_lookup() callers
  untangling do_lookup() - merge failure exits in !dentry case
  untangling do_lookup() - massage !dentry case towards __lookup_hash()
  untangling do_lookup() - get rid of need_reval in !dentry case
  untangling do_lookup() - eliminate a loop.
  untangling do_lookup() - expand the area under ->i_mutex
  untangling do_lookup() - isolate !dentry stuff from the rest of it.
  vfs: move MAY_EXEC check from __lookup_hash()
  vfs: don't revalidate just looked up dentry
  vfs: fix d_need_lookup/d_revalidate order in do_lookup
  ext3: move headers to fs/ext3/
  migrate ext2_fs.h guts to fs/ext2/ext2.h
  new helper: ext2_image_size()
  get rid of pointless includes of ext2_fs.h
  ext2: No longer export ext2_fs.h to user space
  mtdchar: kill persistently held vfsmount
  ...
2012-03-31 13:42:57 -07:00
Al Viro
2f99c36986 get rid of pointless includes of ext2_fs.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31 16:03:15 -04:00
Al Viro
a1c2aa1e86 selinuxfs: merge dentry allocation into sel_make_dir()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31 16:03:15 -04:00
Linus Torvalds
cdb0f9a1ad selinux: inline avc_audit() and avc_has_perm_noaudit() into caller
Now that all the slow-path code is gone from these functions, we can
inline them into the main caller - avc_has_perm_flags().

Now the compiler can see that 'avc' is allocated on the stack for this
case, which helps register pressure a bit.  It also actually shrinks the
total stack frame, because the stack frame that avc_has_perm_flags()
always needed (for that 'avc' allocation) is now sufficient for the
inlined functions too.

Inlining isn't bad - but mindless inlining of cold code (see the
previous commit) is.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-31 11:24:22 -07:00
Linus Torvalds
a554bea899 selinux: don't inline slow-path code into avc_has_perm_noaudit()
The selinux AVC paths remain some of the hottest (and deepest) codepaths
at filename lookup time, and we make it worse by having the slow path
cases take up I$ and stack space even when they don't trigger.  Gcc
tends to always want to inline functions that are just called once -
never mind that this might make for slower and worse code in the caller.

So this tries to improve on it a bit by making the slow-path cases
explicitly separate functions that are marked noinline, causing gcc to
at least no longer allocate stack space for them unless they are
actually called.  It also seems to help register allocation a tiny bit,
since gcc now doesn't take the slow case code into account.

Uninlining the slow path may also allow us to inline the remaining hot
path into the one caller that actually matters: avc_has_perm_flags().
I'll have to look at that separately, but both avc_audit() and
avc_has_perm_noaudit() are now small and lean enough that inlining them
may make sense.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-31 11:24:22 -07:00
Linus Torvalds
a591afc01d Merge branch 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x32 support for x86-64 from Ingo Molnar:
 "This tree introduces the X32 binary format and execution mode for x86:
  32-bit data space binaries using 64-bit instructions and 64-bit kernel
  syscalls.

  This allows applications whose working set fits into a 32 bits address
  space to make use of 64-bit instructions while using a 32-bit address
  space with shorter pointers, more compressed data structures, etc."

Fix up trivial context conflicts in arch/x86/{Kconfig,vdso/vma.c}

* 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
  x32: Fix alignment fail in struct compat_siginfo
  x32: Fix stupid ia32/x32 inversion in the siginfo format
  x32: Add ptrace for x32
  x32: Switch to a 64-bit clock_t
  x32: Provide separate is_ia32_task() and is_x32_task() predicates
  x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls
  x86/x32: Fix the binutils auto-detect
  x32: Warn and disable rather than error if binutils too old
  x32: Only clear TIF_X32 flag once
  x32: Make sure TS_COMPAT is cleared for x32 tasks
  fs: Remove missed ->fds_bits from cessation use of fd_set structs internally
  fs: Fix close_on_exec pointer in alloc_fdtable
  x32: Drop non-__vdso weak symbols from the x32 VDSO
  x32: Fix coding style violations in the x32 VDSO code
  x32: Add x32 VDSO support
  x32: Allow x32 to be configured
  x32: If configured, add x32 system calls to system call tables
  x32: Handle process creation
  x32: Signal-related system calls
  x86: Add #ifdef CONFIG_COMPAT to <asm/sys_ia32.h>
  ...
2012-03-29 18:12:23 -07:00
Linus Torvalds
0195c00244 Disintegrate and delete asm/system.h
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAT3NKzROxKuMESys7AQKElw/+JyDxJSlj+g+nymkx8IVVuU8CsEwNLgRk
 8KEnRfLhGtkXFLSJYWO6jzGo16F8Uqli1PdMFte/wagSv0285/HZaKlkkBVHdJ/m
 u40oSjgT013bBh6MQ0Oaf8pFezFUiQB5zPOA9QGaLVGDLXCmgqUgd7exaD5wRIwB
 ZmyItjZeAVnDfk1R+ZiNYytHAi8A5wSB+eFDCIQYgyulA1Igd1UnRtx+dRKbvc/m
 rWQ6KWbZHIdvP1ksd8wHHkrlUD2pEeJ8glJLsZUhMm/5oMf/8RmOCvmo8rvE/qwl
 eDQ1h4cGYlfjobxXZMHqAN9m7Jg2bI946HZjdb7/7oCeO6VW3FwPZ/Ic75p+wp45
 HXJTItufERYk6QxShiOKvA+QexnYwY0IT5oRP4DrhdVB/X9cl2MoaZHC+RbYLQy+
 /5VNZKi38iK4F9AbFamS7kd0i5QszA/ZzEzKZ6VMuOp3W/fagpn4ZJT1LIA3m4A9
 Q0cj24mqeyCfjysu0TMbPtaN+Yjeu1o1OFRvM8XffbZsp5bNzuTDEvviJ2NXw4vK
 4qUHulhYSEWcu9YgAZXvEWDEM78FXCkg2v/CrZXH5tyc95kUkMPcgG+QZBB5wElR
 FaOKpiC/BuNIGEf02IZQ4nfDxE90QwnDeoYeV+FvNj9UEOopJ5z5bMPoTHxm4cCD
 NypQthI85pc=
 =G9mT
 -----END PGP SIGNATURE-----

Merge tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system

Pull "Disintegrate and delete asm/system.h" from David Howells:
 "Here are a bunch of patches to disintegrate asm/system.h into a set of
  separate bits to relieve the problem of circular inclusion
  dependencies.

  I've built all the working defconfigs from all the arches that I can
  and made sure that they don't break.

  The reason for these patches is that I recently encountered a circular
  dependency problem that came about when I produced some patches to
  optimise get_order() by rewriting it to use ilog2().

  This uses bitops - and on the SH arch asm/bitops.h drags in
  asm-generic/get_order.h by a circuituous route involving asm/system.h.

  The main difficulty seems to be asm/system.h.  It holds a number of
  low level bits with no/few dependencies that are commonly used (eg.
  memory barriers) and a number of bits with more dependencies that
  aren't used in many places (eg.  switch_to()).

  These patches break asm/system.h up into the following core pieces:

    (1) asm/barrier.h

        Move memory barriers here.  This already done for MIPS and Alpha.

    (2) asm/switch_to.h

        Move switch_to() and related stuff here.

    (3) asm/exec.h

        Move arch_align_stack() here.  Other process execution related bits
        could perhaps go here from asm/processor.h.

    (4) asm/cmpxchg.h

        Move xchg() and cmpxchg() here as they're full word atomic ops and
        frequently used by atomic_xchg() and atomic_cmpxchg().

    (5) asm/bug.h

        Move die() and related bits.

    (6) asm/auxvec.h

        Move AT_VECTOR_SIZE_ARCH here.

  Other arch headers are created as needed on a per-arch basis."

Fixed up some conflicts from other header file cleanups and moving code
around that has happened in the meantime, so David's testing is somewhat
weakened by that.  We'll find out anything that got broken and fix it..

* tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits)
  Delete all instances of asm/system.h
  Remove all #inclusions of asm/system.h
  Add #includes needed to permit the removal of asm/system.h
  Move all declarations of free_initmem() to linux/mm.h
  Disintegrate asm/system.h for OpenRISC
  Split arch_align_stack() out from asm-generic/system.h
  Split the switch_to() wrapper out of asm-generic/system.h
  Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h
  Create asm-generic/barrier.h
  Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h
  Disintegrate asm/system.h for Xtensa
  Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt]
  Disintegrate asm/system.h for Tile
  Disintegrate asm/system.h for Sparc
  Disintegrate asm/system.h for SH
  Disintegrate asm/system.h for Score
  Disintegrate asm/system.h for S390
  Disintegrate asm/system.h for PowerPC
  Disintegrate asm/system.h for PA-RISC
  Disintegrate asm/system.h for MN10300
  ...
2012-03-28 15:58:21 -07:00
David Howells
9ffc93f203 Remove all #inclusions of asm/system.h
Remove all #inclusions of asm/system.h preparatory to splitting and killing
it.  Performed with the following command:

perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`

Signed-off-by: David Howells <dhowells@redhat.com>
2012-03-28 18:30:03 +01:00
John Johansen
0421ea91dd apparmor: Fix change_onexec when called from a confined task
Fix failure in aa_change_onexec api when the request is made from a confined
task.  This failure was caused by two problems

 The AA_MAY_ONEXEC perm was not being mapped correctly for this case.

 The executable name was being checked as second time instead of using the
 requested onexec profile name, which may not be the same as the exec
 profile name. This mistake can not be exploited to grant extra permission
 because of the above flaw where the ONEXEC permission was not being mapped
 so it will not be granted.

BugLink: http://bugs.launchpad.net/bugs/963756

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-03-28 01:00:05 +11:00
David Howells
778aae84ef SELinux: selinux/xfrm.h needs net/flow.h
selinux/xfrm.h needs to #include net/flow.h or else suffer:

In file included from security/selinux/ss/services.c:69:0:
security/selinux/include/xfrm.h: In function 'selinux_xfrm_notify_policyload':
security/selinux/include/xfrm.h:53:14: error: 'flow_cache_genid' undeclared (first use in this function)
security/selinux/include/xfrm.h:53:14: note: each undeclared identifier is reported only once for each function it appears in

Signed-off-by: David Howells <dhowells@redhat.com>
2012-03-26 16:38:47 +01:00
Oleg Nesterov
9d944ef32e usermodehelper: kill umh_wait, renumber UMH_* constants
No functional changes.  It is not sane to use UMH_KILLABLE with enum
umh_wait, but obviously we do not want another argument in
call_usermodehelper_* helpers.  Kill this enum, use the plain int.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:41 -07:00
Oleg Nesterov
70834d3070 usermodehelper: use UMH_WAIT_PROC consistently
A few call_usermodehelper() callers use the hardcoded constant instead of
the proper UMH_WAIT_PROC, fix them.

Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Januszewski <spock@gentoo.org>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:41 -07:00
Linus Torvalds
f63d395d47 NFS client updates for Linux 3.4
New features include:
 - Add NFS client support for containers.
   This should enable most of the necessary functionality, including
   lockd support, and support for rpc.statd, NFSv4 idmapper and
   RPCSEC_GSS upcalls into the correct network namespace from
   which the mount system call was issued.
 - NFSv4 idmapper scalability improvements
   Base the idmapper cache on the keyring interface to allow concurrent
   access to idmapper entries. Start the process of migrating users from
   the single-threaded daemon-based approach to the multi-threaded
   request-key based approach.
 - NFSv4.1 implementation id.
   Allows the NFSv4.1 client and server to mutually identify each other
   for logging and debugging purposes.
 - Support the 'vers=4.1' mount option for mounting NFSv4.1 instead of
   having to use the more counterintuitive 'vers=4,minorversion=1'.
 - SUNRPC tracepoints.
   Start the process of adding tracepoints in order to improve debugging
   of the RPC layer.
 - pNFS object layout support for autologin.
 
 Important bugfixes include:
 - Fix a bug in rpc_wake_up/rpc_wake_up_status that caused them to fail
   to wake up all tasks when applied to priority waitqueues.
 - Ensure that we handle read delegations correctly, when we try to
   truncate a file.
 - A number of fixes for NFSv4 state manager loops (mostly to do with
   delegation recovery).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPalZbAAoJEGcL54qWCgDyCi4P+QHcmzQhJO7HWx3Pzjs67bFT
 xMSYaKHGWS4AJKUBVl5OKBxUExfrMHBNbElV3IKUIwBlDx8RVtnwfptKSe146iki
 dn4TrRO5es8nmI4hRDcGMlzJDZq4y0Qg//qiUFmojiNW/Avw0ljfMoVUejJJ09FV
 oeDk4EGtcxkEyH+g48ZjYbyspRnG8qtD3atf70Z3lYE0ELdG/B5Dyzw1RDrA5p73
 xJX3lqy8p/4ROzw/dmNoxdAXOrr3Q4/T58Bvp/lUglPy/EHyPmWzFoH0MU0C/PFu
 5VnAl6QDbNCTcIw9FvJlX/mIyErpNG9eKzUskUc9L9SA+B+J/i4rIap4KATRN3nH
 7QhE5qUacPuJnvxml7MPmlQTuft3fkAQ7NhKIWrbRi1QS9FmJC5NxctIb8loqlFn
 yIXdKeLfMshB+NyuFS9uzStX7SmV3eMgVd+5ZxRjYxm+PKJLw2KXeudArL6M5mHK
 3QeKZpqwaYQ3RfaTNpvAp0doiXHCO5UbWfI0Pe8xQs/QcMCNReffqV2G4IJKFAu6
 WpoN2UDQC9LCBifLw2nS7kku8+ZVXLQU8OC1NVl3TG15xD9cNLXuk3/y5llPGq4O
 odo52uLFpJohbDaHMj5RTKOfchTQCm2iyuVmxZEeAySypMSiAXmW7COSKHs/HxI1
 VBm+EI00Pvmm5+fUjIlp
 =LuHE
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates for Linux 3.4 from Trond Myklebust:
 "New features include:
   - Add NFS client support for containers.

     This should enable most of the necessary functionality, including
     lockd support, and support for rpc.statd, NFSv4 idmapper and
     RPCSEC_GSS upcalls into the correct network namespace from which
     the mount system call was issued.

   - NFSv4 idmapper scalability improvements

     Base the idmapper cache on the keyring interface to allow
     concurrent access to idmapper entries.  Start the process of
     migrating users from the single-threaded daemon-based approach to
     the multi-threaded request-key based approach.

   - NFSv4.1 implementation id.

     Allows the NFSv4.1 client and server to mutually identify each
     other for logging and debugging purposes.

   - Support the 'vers=4.1' mount option for mounting NFSv4.1 instead of
     having to use the more counterintuitive 'vers=4,minorversion=1'.

   - SUNRPC tracepoints.

     Start the process of adding tracepoints in order to improve
     debugging of the RPC layer.

   - pNFS object layout support for autologin.

  Important bugfixes include:

   - Fix a bug in rpc_wake_up/rpc_wake_up_status that caused them to
     fail to wake up all tasks when applied to priority waitqueues.

   - Ensure that we handle read delegations correctly, when we try to
     truncate a file.

   - A number of fixes for NFSv4 state manager loops (mostly to do with
     delegation recovery)."

* tag 'nfs-for-3.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (224 commits)
  NFS: fix sb->s_id in nfs debug prints
  xprtrdma: Remove assumption that each segment is <= PAGE_SIZE
  xprtrdma: The transport should not bug-check when a dup reply is received
  pnfs-obj: autologin: Add support for protocol autologin
  NFS: Remove nfs4_setup_sequence from generic rename code
  NFS: Remove nfs4_setup_sequence from generic unlink code
  NFS: Remove nfs4_setup_sequence from generic read code
  NFS: Remove nfs4_setup_sequence from generic write code
  NFS: Fix more NFS debug related build warnings
  SUNRPC/LOCKD: Fix build warnings when CONFIG_SUNRPC_DEBUG is undefined
  nfs: non void functions must return a value
  SUNRPC: Kill compiler warning when RPC_DEBUG is unset
  SUNRPC/NFS: Add Kbuild dependencies for NFS_DEBUG/RPC_DEBUG
  NFS: Use cond_resched_lock() to reduce latencies in the commit scans
  NFSv4: It is not safe to dereference lsp->ls_state in release_lockowner
  NFS: ncommit count is being double decremented
  SUNRPC: We must not use list_for_each_entry_safe() in rpc_wake_up()
  Try using machine credentials for RENEW calls
  NFSv4.1: Fix a few issues in filelayout_commit_pagelist
  NFSv4.1: Clean ups and bugfixes for the pNFS read/writeback/commit code
  ...
2012-03-23 08:53:47 -07:00
Linus Torvalds
48aab2f79d security: optimize avc_audit() common path
avc_audit() did a lot of jumping around and had a big stack frame, all
for the uncommon case.

Split up the uncommon case (which we really can't make go fast anyway)
into its own slow function, and mark the conditional branches
appropriately for the common likely case.

This causes avc_audit() to no longer show up as one of the hottest
functions on the branch profiles (the new "perf -b" thing), and makes
the cycle profiles look really nice and dense too.

The whole audit path is still annoyingly very much one of the biggest
costs of name lookup, so these things are worth optimizing for.  I wish
we could just tell people to turn it off, but realistically we do need
it: we just need to make sure that the overhead of the necessary evil is
as low as possible.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-22 17:01:41 -07:00
Linus Torvalds
e2a0883e40 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 1 from Al Viro:
 "This is _not_ all; in particular, Miklos' and Jan's stuff is not there
  yet."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
  ext4: initialization of ext4_li_mtx needs to be done earlier
  debugfs-related mode_t whack-a-mole
  hfsplus: add an ioctl to bless files
  hfsplus: change finder_info to u32
  hfsplus: initialise userflags
  qnx4: new helper - try_extent()
  qnx4: get rid of qnx4_bread/qnx4_getblk
  take removal of PF_FORKNOEXEC to flush_old_exec()
  trim includes in inode.c
  um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it
  um: embed ->stub_pages[] into mmu_context
  gadgetfs: list_for_each_safe() misuse
  ocfs2: fix leaks on failure exits in module_init
  ecryptfs: make register_filesystem() the last potential failure exit
  ntfs: forgets to unregister sysctls on register_filesystem() failure
  logfs: missing cleanup on register_filesystem() failure
  jfs: mising cleanup on register_filesystem() failure
  make configfs_pin_fs() return root dentry on success
  configfs: configfs_create_dir() has parent dentry in dentry->d_parent
  configfs: sanitize configfs_create()
  ...
2012-03-21 13:36:41 -07:00
Linus Torvalds
3556485f15 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates for 3.4 from James Morris:
 "The main addition here is the new Yama security module from Kees Cook,
  which was discussed at the Linux Security Summit last year.  Its
  purpose is to collect miscellaneous DAC security enhancements in one
  place.  This also marks a departure in policy for LSM modules, which
  were previously limited to being standalone access control systems.
  Chromium OS is using Yama, and I believe there are plans for Ubuntu,
  at least.

  This patchset also includes maintenance updates for AppArmor, TOMOYO
  and others."

Fix trivial conflict in <net/sock.h> due to the jumo_label->static_key
rename.

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (38 commits)
  AppArmor: Fix location of const qualifier on generated string tables
  TOMOYO: Return error if fails to delete a domain
  AppArmor: add const qualifiers to string arrays
  AppArmor: Add ability to load extended policy
  TOMOYO: Return appropriate value to poll().
  AppArmor: Move path failure information into aa_get_name and rename
  AppArmor: Update dfa matching routines.
  AppArmor: Minor cleanup of d_namespace_path to consolidate error handling
  AppArmor: Retrieve the dentry_path for error reporting when path lookup fails
  AppArmor: Add const qualifiers to generated string tables
  AppArmor: Fix oops in policy unpack auditing
  AppArmor: Fix error returned when a path lookup is disconnected
  KEYS: testing wrong bit for KEY_FLAG_REVOKED
  TOMOYO: Fix mount flags checking order.
  security: fix ima kconfig warning
  AppArmor: Fix the error case for chroot relative path name lookup
  AppArmor: fix mapping of META_READ to audit and quiet flags
  AppArmor: Fix underflow in xindex calculation
  AppArmor: Fix dropping of allowed operations that are force audited
  AppArmor: Add mising end of structure test to caps unpacking
  ...
2012-03-21 13:25:04 -07:00
Linus Torvalds
9f3938346a Merge branch 'kmap_atomic' of git://github.com/congwang/linux
Pull kmap_atomic cleanup from Cong Wang.

It's been in -next for a long time, and it gets rid of the (no longer
used) second argument to k[un]map_atomic().

Fix up a few trivial conflicts in various drivers, and do an "evil
merge" to catch some new uses that have come in since Cong's tree.

* 'kmap_atomic' of git://github.com/congwang/linux: (59 commits)
  feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal
  highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]
  drbd: remove the second argument of k[un]map_atomic()
  zcache: remove the second argument of k[un]map_atomic()
  gma500: remove the second argument of k[un]map_atomic()
  dm: remove the second argument of k[un]map_atomic()
  tomoyo: remove the second argument of k[un]map_atomic()
  sunrpc: remove the second argument of k[un]map_atomic()
  rds: remove the second argument of k[un]map_atomic()
  net: remove the second argument of k[un]map_atomic()
  mm: remove the second argument of k[un]map_atomic()
  lib: remove the second argument of k[un]map_atomic()
  power: remove the second argument of k[un]map_atomic()
  kdb: remove the second argument of k[un]map_atomic()
  udf: remove the second argument of k[un]map_atomic()
  ubifs: remove the second argument of k[un]map_atomic()
  squashfs: remove the second argument of k[un]map_atomic()
  reiserfs: remove the second argument of k[un]map_atomic()
  ocfs2: remove the second argument of k[un]map_atomic()
  ntfs: remove the second argument of k[un]map_atomic()
  ...
2012-03-21 09:40:26 -07:00
Al Viro
40ffe67d2e switch unix_sock to struct path
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20 21:29:41 -04:00
Linus Torvalds
0d9cabdcce Merge branch 'for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup changes from Tejun Heo:
 "Out of the 8 commits, one fixes a long-standing locking issue around
  tasklist walking and others are cleanups."

* 'for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Walk task list under tasklist_lock in cgroup_enable_task_cg_list
  cgroup: Remove wrong comment on cgroup_enable_task_cg_list()
  cgroup: remove cgroup_subsys argument from callbacks
  cgroup: remove extra calls to find_existing_css_set
  cgroup: replace tasklist_lock with rcu_read_lock
  cgroup: simplify double-check locking in cgroup_attach_proc
  cgroup: move struct cgroup_pidlist out from the header file
  cgroup: remove cgroup_attach_task_current_cg()
2012-03-20 18:11:21 -07:00
Cong Wang
c58e0377d6 tomoyo: remove the second argument of k[un]map_atomic()
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:28 +08:00
James Morris
09f61cdbb3 Merge branch 'for-security' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor into next 2012-03-20 12:52:17 +11:00
Tetsuo Handa
7e570145cb AppArmor: Fix location of const qualifier on generated string tables
Signed-off-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: John Johansen <john.johansen@canonical.com>
2012-03-19 18:22:46 -07:00
Tetsuo Handa
7d7473dbdb TOMOYO: Return error if fails to delete a domain
Call sequence:
tomoyo_write_domain() --> tomoyo_delete_domain()

In 'tomoyo_delete_domain', return -EINTR if locking attempt is
interrupted by signal.

At present it returns success to its caller 'tomoyo_write_domain()'
even though domain is not deleted. 'tomoyo_write_domain()' assumes
domain is deleted and returns success to its caller. This is wrong behaviour.

'tomoyo_write_domain' should return error from tomoyo_delete_domain() to its
caller.

Signed-off-by: Santosh Nayak <santoshprasadnayak@gmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-03-20 12:06:50 +11:00
James Morris
b01d3fb921 Merge branch 'for-security' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor into next 2012-03-15 14:43:02 +11:00
Jan Engelhardt
2d4cee7e3a AppArmor: add const qualifiers to string arrays
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: John Johansen <john.johansen@canonical.com>
2012-03-14 19:09:13 -07:00
John Johansen
ad5ff3db53 AppArmor: Add ability to load extended policy
Add the base support for the new policy extensions. This does not bring
any additional functionality, or change current semantics.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
2012-03-14 19:09:03 -07:00
Tetsuo Handa
6041e8346f TOMOYO: Return appropriate value to poll().
"struct file_operations"->poll() expects "unsigned int" return value.
All files in /sys/kernel/security/tomoyo/ directory other than
/sys/kernel/security/tomoyo/query and /sys/kernel/security/tomoyo/audit should
return POLLIN | POLLRDNORM | POLLOUT | POLLWRNORM rather than -ENOSYS.
Also, /sys/kernel/security/tomoyo/query and /sys/kernel/security/tomoyo/audit
should return POLLOUT | POLLWRNORM rather than 0 when there is no data to read.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-03-15 12:29:18 +11:00
John Johansen
57fa1e1809 AppArmor: Move path failure information into aa_get_name and rename
Move the path name lookup failure messages into the main path name lookup
routine, as the information is useful in more than just aa_path_perm.

Also rename aa_get_name to aa_path_name as it is not getting a reference
counted object with a corresponding put fn.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
2012-03-14 06:15:25 -07:00
John Johansen
0fe1212d05 AppArmor: Update dfa matching routines.
Update aa_dfa_match so that it doesn't result in an input string being
walked twice (once to get its length and another time to match)

Add a single step functions
  aa_dfa_next

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
2012-03-14 06:15:24 -07:00
John Johansen
3372b68a3c AppArmor: Minor cleanup of d_namespace_path to consolidate error handling
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
2012-03-14 06:15:23 -07:00
John Johansen
fbba8d89ac AppArmor: Retrieve the dentry_path for error reporting when path lookup fails
When __d_path and d_absolute_path fail due to the name being outside of
the current namespace no name is reported.  Use dentry_path to provide
some hint as to which file was being accessed.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
2012-03-14 06:15:22 -07:00
John Johansen
33e521acff AppArmor: Add const qualifiers to generated string tables
Signed-off-by: John Johansen <john.johansen@canonical.com>
2012-03-14 06:15:12 -07:00
John Johansen
b1b4bc2ed9 AppArmor: Fix oops in policy unpack auditing
Post unpacking of policy a verification pass is made on x transition
indexes.  When this fails a call to audit_iface is made resulting in an
oops, because audit_iface is expecting a valid buffer position but
since the failure comes from post unpack verification there is none.

Make the position argument optional so that audit_iface can be called
from post unpack verification.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2012-03-14 06:15:02 -07:00
John Johansen
ef9a762279 AppArmor: Fix error returned when a path lookup is disconnected
The returning of -ESATLE when a path lookup fails as disconnected is wrong.
Since AppArmor is rejecting the access return -EACCES instead.

This also fixes a bug in complain (learning) mode where disconnected paths
are denied because -ESTALE errors are not ignored causing failures that
can change application behavior.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2012-03-14 06:14:52 -07:00
Dan Carpenter
f67dabbdde KEYS: testing wrong bit for KEY_FLAG_REVOKED
The test for "if (cred->request_key_auth->flags & KEY_FLAG_REVOKED) {"
should actually testing that the (1 << KEY_FLAG_REVOKED) bit is set.
The current code actually checks for KEY_FLAG_DEAD.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-03-07 11:12:06 +11:00
Bryan Schumaker
59e6b9c113 Created a function for setting timeouts on keys
The keyctl_set_timeout function isn't exported to other parts of the
kernel, but I want to use it for the NFS idmapper.  I already have the
key, but I wanted a generic way to set the timeout.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-01 16:50:31 -05:00
Tetsuo Handa
df91e49477 TOMOYO: Fix mount flags checking order.
Userspace can pass in arbitrary combinations of MS_* flags to mount().

If both MS_BIND and one of MS_SHARED/MS_PRIVATE/MS_SLAVE/MS_UNBINDABLE are
passed, device name which should be checked for MS_BIND was not checked because
MS_SHARED/MS_PRIVATE/MS_SLAVE/MS_UNBINDABLE had higher priority than MS_BIND.

If both one of MS_BIND/MS_MOVE and MS_REMOUNT are passed, device name which
should not be checked for MS_REMOUNT was checked because MS_BIND/MS_MOVE had
higher priority than MS_REMOUNT.

Fix these bugs by changing priority to MS_REMOUNT -> MS_BIND ->
MS_SHARED/MS_PRIVATE/MS_SLAVE/MS_UNBINDABLE -> MS_MOVE as with do_mount() does.

Also, unconditionally return -EINVAL if more than one of
MS_SHARED/MS_PRIVATE/MS_SLAVE/MS_UNBINDABLE is passed so that TOMOYO will not
generate inaccurate audit logs, for commit 7a2e8a8f "VFS: Sanity check mount
flags passed to change_mnt_propagation()" clarified that these flags must be
exclusively passed.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-03-01 10:23:19 +11:00
Randy Dunlap
a69f158902 security: fix ima kconfig warning
Fix IMA kconfig warning on non-X86 architectures:

warning: (IMA) selects TCG_TIS which has unmet direct dependencies
(TCG_TPM && X86)

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Rajiv Andrade <srajiv@linux.vnet.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-02-28 11:01:15 +11:00
John Johansen
28042fabf4 AppArmor: Fix the error case for chroot relative path name lookup
When a chroot relative pathname lookup fails it is falling through to
do a d_absolute_path lookup.  This is incorrect as d_absolute_path should
only be used to lookup names for namespace absolute paths.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
2012-02-27 11:38:23 -08:00
John Johansen
38305a4bab AppArmor: fix mapping of META_READ to audit and quiet flags
The mapping of AA_MAY_META_READ for the allow mask was also being mapped
to the audit and quiet masks. This would result in some operations being
audited when the should not.

This flaw was hidden by the previous audit bug which would drop some
messages that where supposed to be audited.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
2012-02-27 11:38:22 -08:00
John Johansen
8b964eae20 AppArmor: Fix underflow in xindex calculation
If the xindex value stored in the accept tables is 0, the extraction of
that value will result in an underflow (0 - 4).

In properly compiled policy this should not happen for file rules but
it may be possible for other rule types in the future.

To exploit this underflow a user would have to be able to load a corrupt
policy, which requires CAP_MAC_ADMIN, overwrite system policy in kernel
memory or know of a compiler error resulting in the flaw being present
for loaded policy (no such flaw is known at this time).

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
2012-02-27 11:38:21 -08:00
John Johansen
ade3ddc01e AppArmor: Fix dropping of allowed operations that are force audited
The audit permission flag, that specifies an audit message should be
provided when an operation is allowed, was being ignored in some cases.

This is because the auto audit mode (which determines the audit mode from
system flags) was incorrectly assigned the same value as audit mode. The
shared value would result in messages that should be audited going through
a second evaluation as to whether they should be audited based on the
auto audit, resulting in some messages being dropped.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
2012-02-27 11:38:21 -08:00
John Johansen
cdbd2884df AppArmor: Add mising end of structure test to caps unpacking
The unpacking of struct capsx is missing a check for the end of the
caps structure.  This can lead to unpack failures depending on what else
is packed into the policy file being unpacked.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
2012-02-27 11:38:20 -08:00
Kees Cook
d384b0a1a3 AppArmor: export known rlimit names/value mappings in securityfs
Since the parser needs to know which rlimits are known to the kernel,
export the list via a mask file in the "rlimit" subdirectory in the
securityfs "features" directory.

Signed-off-by: Kees Cook <kees@ubuntu.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
2012-02-27 11:38:19 -08:00
Kees Cook
a9bf8e9fd5 AppArmor: add "file" details to securityfs
Create the "file" directory in the securityfs for tracking features
related to files.

Signed-off-by: Kees Cook <kees@ubuntu.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
2012-02-27 11:38:18 -08:00
Kees Cook
e74abcf335 AppArmor: add initial "features" directory to securityfs
This adds the "features" subdirectory to the AppArmor securityfs
to display boolean features flags and the known capability mask.

Signed-off-by: Kees Cook <kees@ubuntu.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
2012-02-27 11:38:17 -08:00
Kees Cook
9acd494be9 AppArmor: refactor securityfs to use structures
Use a file tree structure to represent the AppArmor securityfs.

Signed-off-by: Kees Cook <kees@ubuntu.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
2012-02-27 11:38:09 -08:00
David Howells
1fd36adcd9 Replace the fd_sets in struct fdtable with an array of unsigned longs
Replace the fd_sets in struct fdtable with an array of unsigned longs and then
use the standard non-atomic bit operations rather than the FD_* macros.

This:

 (1) Removes the abuses of struct fd_set:

     (a) Since we don't want to allocate a full fd_set the vast majority of the
     	 time, we actually, in effect, just allocate a just-big-enough array of
     	 unsigned longs and cast it to an fd_set type - so why bother with the
     	 fd_set at all?

     (b) Some places outside of the core fdtable handling code (such as
     	 SELinux) want to look inside the array of unsigned longs hidden inside
     	 the fd_set struct for more efficient iteration over the entire set.

 (2) Eliminates the use of FD_*() macros in the kernel completely.

 (3) Permits the __FD_*() macros to be deleted entirely where not exposed to
     userspace.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/20120216174954.23314.48147.stgit@warthog.procyon.org.uk
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
2012-02-19 10:30:57 -08:00
Eric Paris
b0d5de4d58 IMA: fix audit res field to indicate 1 for success and 0 for failure
The audit res field ususally indicates success with a 1 and 0 for a
failure.  So make IMA do it the same way.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2012-02-16 12:01:42 +11:00
Kees Cook
bf06189e4d Yama: add PR_SET_PTRACER_ANY
For a process to entirely disable Yama ptrace restrictions, it can use
the special PR_SET_PTRACER_ANY pid to indicate that any otherwise allowed
process may ptrace it. This is stronger than calling PR_SET_PTRACER with
pid "1" because it includes processes in external pid namespaces. This is
currently needed by the Chrome renderer, since its crash handler (Breakpad)
runs external to the renderer's pid namespace.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <jmorris@namei.org>
2012-02-16 10:25:18 +11:00
Al Viro
4040153087 security: trim security.h
Trim security.h

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Morris <jmorris@namei.org>
2012-02-14 10:45:42 +11:00
Al Viro
191c542442 mm: collapse security_vm_enough_memory() variants into a single function
Collapse security_vm_enough_memory() variants into a single function.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Morris <jmorris@namei.org>
2012-02-14 10:45:39 +11:00
Kees Cook
2d514487fa security: Yama LSM
This adds the Yama Linux Security Module to collect DAC security
improvements (specifically just ptrace restrictions for now) that have
existed in various forms over the years and have been carried outside the
mainline kernel by other Linux distributions like Openwall and grsecurity.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
2012-02-10 09:18:52 +11:00
Kees Cook
1a2a4d06e1 security: create task_free security callback
The current LSM interface to cred_free is not sufficient for allowing
an LSM to track the life and death of a task. This patch adds the
task_free hook so that an LSM can clean up resources on task death.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <jmorris@namei.org>
2012-02-10 09:14:51 +11:00
James Morris
9e3ff38647 Merge branch 'next-queue' into next 2012-02-09 17:02:34 +11:00
Li Zefan
761b3ef50e cgroup: remove cgroup_subsys argument from callbacks
The argument is not used at all, and it's not necessary, because
a specific callback handler of course knows which subsys it
belongs to.

Now only ->pupulate() takes this argument, because the handlers of
this callback always call cgroup_add_file()/cgroup_add_files().

So we reduce a few lines of code, though the shrinking of object size
is minimal.

 16 files changed, 113 insertions(+), 162 deletions(-)

   text    data     bss     dec     hex filename
5486240  656987 7039960 13183187         c928d3 vmlinux.o.orig
5486170  656987 7039960 13183117         c9288d vmlinux.o

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-02-02 09:20:22 -08:00
Linus Torvalds
7908b3ef68 Merge git://git.samba.org/sfrench/cifs-2.6
* git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Rename *UCS* functions to *UTF16*
  [CIFS] ACL and FSCACHE support no longer EXPERIMENTAL
  [CIFS] Fix build break with multiuser patch when LANMAN disabled
  cifs: warn about impending deprecation of legacy MultiuserMount code
  cifs: fetch credentials out of keyring for non-krb5 auth multiuser mounts
  cifs: sanitize username handling
  keys: add a "logon" key type
  cifs: lower default wsize when unix extensions are not used
  cifs: better instrumentation for coalesce_t2
  cifs: integer overflow in parse_dacl()
  cifs: Fix sparse warning when calling cifs_strtoUCS
  CIFS: Add descriptions to the brlock cache functions
2012-01-23 08:59:49 -08:00
Dmitry Kasatkin
4c2c392763 ima: policy for RAMFS
Don't measure ramfs files.

Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
2012-01-19 21:30:21 -05:00
Fabio Estevam
f4a0391dfa ima: fix Kconfig dependencies
Fix the following build warning:
warning: (IMA) selects TCG_TPM which has unmet direct dependencies
(HAS_IOMEM && EXPERIMENTAL)

Suggested-by: Rajiv Andrade <srajiv@linux.vnet.ibm.com>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Rajiv Andrade <srajiv@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
2012-01-19 21:30:09 -05:00
Mimi Zohar
f6b24579d0 keys: fix user_defined key sparse messages
Replace the rcu_assign_pointer() calls with rcu_assign_keypointer().

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2012-01-19 16:16:29 +11:00
Mimi Zohar
3db59dd933 ima: fix cred sparse warning
Fix ima_policy.c sparse "warning: dereference of noderef expression"
message, by accessing cred->uid using current_cred().

Changelog v1:
- Change __cred to just cred (based on David Howell's comment)

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2012-01-19 15:59:11 +11:00
David Howells
700920eb5b KEYS: Allow special keyrings to be cleared
The kernel contains some special internal keyrings, for instance the DNS
resolver keyring :

2a93faf1 I-----     1 perm 1f030000     0     0 keyring   .dns_resolver: empty

It would occasionally be useful to allow the contents of such keyrings to be
flushed by root (cache invalidation).

Allow a flag to be set on a keyring to mark that someone possessing the
sysadmin capability can clear the keyring, even without normal write access to
the keyring.

Set this flag on the special keyrings created by the DNS resolver, the NFS
identity mapper and the CIFS identity mapper.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2012-01-19 14:38:51 +11:00
Jeff Layton
9f6ed2ca25 keys: add a "logon" key type
For CIFS, we want to be able to store NTLM credentials (aka username
and password) in the keyring. We do not, however want to allow users
to fetch those keys back out of the keyring since that would be a
security risk.

Unfortunately, due to the nuances of key permission bits, it's not
possible to do this. We need to grant search permissions so the kernel
can find these keys, but that also implies permissions to read the
payload.

Resolve this by adding a new key_type. This key type is essentially
the same as key_type_user, but does not define a .read op. This
prevents the payload from ever being visible from userspace. This
key type also vets the description to ensure that it's "qualified"
by checking to ensure that it has a ':' in it that is preceded by
other characters.

Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-01-17 22:39:40 -06:00
Linus Torvalds
a25a2b8409 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  integrity: digital signature config option name change
  lib: Removed MPILIB, MPILIB_EXTRA, and SIGNATURE prompts
  lib: MPILIB Kconfig description update
  lib: digital signature dependency fix
  lib: digital signature config option name change
  encrypted-keys: fix rcu and sparse messages
  keys: fix trusted/encrypted keys sparse rcu_assign_pointer messages
  KEYS: Add missing smp_rmb() primitives to the keyring search code
  TOMOYO: Accept \000 as a valid character.
  security: update MAINTAINERS file with new git repo
2012-01-17 16:43:39 -08:00
Linus Torvalds
f429ee3b80 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit: (29 commits)
  audit: no leading space in audit_log_d_path prefix
  audit: treat s_id as an untrusted string
  audit: fix signedness bug in audit_log_execve_info()
  audit: comparison on interprocess fields
  audit: implement all object interfield comparisons
  audit: allow interfield comparison between gid and ogid
  audit: complex interfield comparison helper
  audit: allow interfield comparison in audit rules
  Kernel: Audit Support For The ARM Platform
  audit: do not call audit_getname on error
  audit: only allow tasks to set their loginuid if it is -1
  audit: remove task argument to audit_set_loginuid
  audit: allow audit matching on inode gid
  audit: allow matching on obj_uid
  audit: remove audit_finish_fork as it can't be called
  audit: reject entry,always rules
  audit: inline audit_free to simplify the look of generic code
  audit: drop audit_set_macxattr as it doesn't do anything
  audit: inline checks for not needing to collect aux records
  audit: drop some potentially inadvisable likely notations
  ...

Use evil merge to fix up grammar mistakes in Kconfig file.

Bad speling and horrible grammar (and copious swearing) is to be
expected, but let's keep it to commit messages and comments, rather than
expose it to users in config help texts or printouts.
2012-01-17 16:41:31 -08:00
Dmitry Kasatkin
f1be242c95 integrity: digital signature config option name change
Similar to SIGNATURE, rename INTEGRITY_DIGSIG to INTEGRITY_SIGNATURE.

Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: James Morris <jmorris@namei.org>
2012-01-18 10:46:27 +11:00
Dmitry Kasatkin
5e8898e97a lib: digital signature config option name change
It was reported that DIGSIG is confusing name for digital signature
module. It was suggested to rename DIGSIG to SIGNATURE.

Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: James Morris <jmorris@namei.org>
2012-01-18 10:46:21 +11:00
Mimi Zohar
6ac6172a93 encrypted-keys: fix rcu and sparse messages
Enabling CONFIG_PROVE_RCU and CONFIG_SPARSE_RCU_POINTER resulted in
"suspicious rcu_dereference_check() usage!" and "incompatible types
in comparison expression (different address spaces)" messages.

Access the masterkey directly when holding the rwsem.

Changelog v1:
- Use either rcu_read_lock()/rcu_derefence_key()/rcu_read_unlock()
or remove the unnecessary rcu_derefence() - David Howells

Reported-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2012-01-18 10:41:30 +11:00
Mimi Zohar
ee0b31a25a keys: fix trusted/encrypted keys sparse rcu_assign_pointer messages
Define rcu_assign_keypointer(), which uses the key payload.rcudata instead
of payload.data, to resolve the CONFIG_SPARSE_RCU_POINTER message:
"incompatible types in comparison expression (different address spaces)"

Replace the rcu_assign_pointer() calls in encrypted/trusted keys with
rcu_assign_keypointer().

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2012-01-18 10:41:29 +11:00
David Howells
efde8b6e16 KEYS: Add missing smp_rmb() primitives to the keyring search code
Add missing smp_rmb() primitives to the keyring search code.

When keyring payloads are appended to without replacement (thus using up spare
slots in the key pointer array), an smp_wmb() is issued between the pointer
assignment and the increment of the key count (nkeys).

There should be corresponding read barriers between the read of nkeys and
dereferences of keys[n] when n is dependent on the value of nkeys.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2012-01-18 10:41:27 +11:00
Tetsuo Handa
25add8cf99 TOMOYO: Accept \000 as a valid character.
TOMOYO 2.5 in Linux 3.2 and later handles Unix domain socket's address.
Thus, tomoyo_correct_word2() needs to accept \000 as a valid character, or
TOMOYO 2.5 cannot handle Unix domain's abstract socket address.

Reported-by: Steven Allen <steven@stebalien.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
CC: stable@vger.kernel.org [3.2+]
Signed-off-by: James Morris <jmorris@namei.org>
2012-01-18 10:40:59 +11:00
Kees Cook
c158a35c8a audit: no leading space in audit_log_d_path prefix
audit_log_d_path() injects an additional space before the prefix,
which serves no purpose and doesn't mix well with other audit_log*()
functions that do not sneak extra characters into the log.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17 16:17:04 -05:00
Kees Cook
41fdc3054e audit: treat s_id as an untrusted string
The use of s_id should go through the untrusted string path, just to be
extra careful.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17 16:17:03 -05:00
Linus Torvalds
c49c41a413 Merge branch 'for-linus' of git://selinuxproject.org/~jmorris/linux-security
* 'for-linus' of git://selinuxproject.org/~jmorris/linux-security:
  capabilities: remove __cap_full_set definition
  security: remove the security_netlink_recv hook as it is equivalent to capable()
  ptrace: do not audit capability check when outputing /proc/pid/stat
  capabilities: remove task_ns_* functions
  capabitlies: ns_capable can use the cap helpers rather than lsm call
  capabilities: style only - move capable below ns_capable
  capabilites: introduce new has_ns_capabilities_noaudit
  capabilities: call has_ns_capability from has_capability
  capabilities: remove all _real_ interfaces
  capabilities: introduce security_capable_noaudit
  capabilities: reverse arguments to security_capable
  capabilities: remove the task from capable LSM hook entirely
  selinux: sparse fix: fix several warnings in the security server cod
  selinux: sparse fix: fix warnings in netlink code
  selinux: sparse fix: eliminate warnings for selinuxfs
  selinux: sparse fix: declare selinux_disable() in security.h
  selinux: sparse fix: move selinux_complete_init
  selinux: sparse fix: make selinux_secmark_refcount static
  SELinux: Fix RCU deref check warning in sel_netport_insert()

Manually fix up a semantic mis-merge wrt security_netlink_recv():

 - the interface was removed in commit fd77846152 ("security: remove
   the security_netlink_recv hook as it is equivalent to capable()")

 - a new user of it appeared in commit a38f7907b9 ("crypto: Add
   userspace configuration API")

causing no automatic merge conflict, but Eric Paris pointed out the
issue.
2012-01-14 18:36:33 -08:00
Rusty Russell
90ab5ee941 module_param: make bool parameters really bool (drivers & misc)
module_param(bool) used to counter-intuitively take an int.  In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.

It's time to remove the int/unsigned int option.  For this version
it'll simply give a warning, but it'll break next kernel version.

Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13 09:32:20 +10:30
Linus Torvalds
e7691a1ce3 Merge branch 'for-linus' of git://selinuxproject.org/~jmorris/linux-security
* 'for-linus' of git://selinuxproject.org/~jmorris/linux-security: (32 commits)
  ima: fix invalid memory reference
  ima: free duplicate measurement memory
  security: update security_file_mmap() docs
  selinux: Casting (void *) value returned by kmalloc is useless
  apparmor: fix module parameter handling
  Security: tomoyo: add .gitignore file
  tomoyo: add missing rcu_dereference()
  apparmor: add missing rcu_dereference()
  evm: prevent racing during tfm allocation
  evm: key must be set once during initialization
  mpi/mpi-mpow: NULL dereference on allocation failure
  digsig: build dependency fix
  KEYS: Give key types their own lockdep class for key->sem
  TPM: fix transmit_cmd error logic
  TPM: NSC and TIS drivers X86 dependency fix
  TPM: Export wait_for_stat for other vendor specific drivers
  TPM: Use vendor specific function for status probe
  tpm_tis: add delay after aborting command
  tpm_tis: Check return code from getting timeouts/durations
  tpm: Introduce function to poll for result of self test
  ...

Fix up trivial conflict in lib/Makefile due to addition of CONFIG_MPI
and SIGSIG next to CONFIG_DQL addition.
2012-01-10 21:51:23 -08:00
Al Viro
3e25eb9c4b securityfs: fix object creation races
inode needs to be fully set up before we feed it to d_instantiate().
securityfs_create_file() does *not* do so; it sets ->i_fop and
->i_private only after we'd exposed the inode.  Unfortunately,
that's done fairly deep in call chain, so the amount of churn
is considerable.  Helper functions killed by substituting into
their solitary call sites, dead code removed.  We finally can
bury default_file_ops, now that the final value of ->i_fop is
available (and assigned) at the point where inode is allocated.

Reviewed-by: James Morris <jmorris@namei.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-10 10:20:35 -05:00
Linus Torvalds
db0c2bf69a Merge branch 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
* 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
  cgroup: fix to allow mounting a hierarchy by name
  cgroup: move assignement out of condition in cgroup_attach_proc()
  cgroup: Remove task_lock() from cgroup_post_fork()
  cgroup: add sparse annotation to cgroup_iter_start() and cgroup_iter_end()
  cgroup: mark cgroup_rmdir_waitq and cgroup_attach_proc() as static
  cgroup: only need to check oldcgrp==newgrp once
  cgroup: remove redundant get/put of task struct
  cgroup: remove redundant get/put of old css_set from migrate
  cgroup: Remove unnecessary task_lock before fetching css_set on migration
  cgroup: Drop task_lock(parent) on cgroup_fork()
  cgroups: remove redundant get/put of css_set from css_set_check_fetched()
  resource cgroups: remove bogus cast
  cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task()
  cgroup, cpuset: don't use ss->pre_attach()
  cgroup: don't use subsys->can_attach_task() or ->attach_task()
  cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
  cgroup: improve old cgroup handling in cgroup_attach_proc()
  cgroup: always lock threadgroup during migration
  threadgroup: extend threadgroup_lock() to cover exit and exec
  threadgroup: rename signal->threadgroup_fork_lock to ->group_rwsem
  ...

Fix up conflict in kernel/cgroup.c due to commit e0197aae59: "cgroups:
fix a css_set not found bug in cgroup_attach_proc" that already
mentioned that the bug is fixed (differently) in Tejun's cgroup
patchset. This one, in other words.
2012-01-09 12:59:24 -08:00
James Morris
8fcc995495 Merge branch 'next' into for-linus
Conflicts:
	security/integrity/evm/evm_crypto.c

Resolved upstream fix vs. next conflict manually.

Signed-off-by: James Morris <jmorris@namei.org>
2012-01-09 12:16:48 +11:00
Linus Torvalds
972b2c7199 Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits)
  reiserfs: Properly display mount options in /proc/mounts
  vfs: prevent remount read-only if pending removes
  vfs: count unlinked inodes
  vfs: protect remounting superblock read-only
  vfs: keep list of mounts for each superblock
  vfs: switch ->show_options() to struct dentry *
  vfs: switch ->show_path() to struct dentry *
  vfs: switch ->show_devname() to struct dentry *
  vfs: switch ->show_stats to struct dentry *
  switch security_path_chmod() to struct path *
  vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb
  vfs: trim includes a bit
  switch mnt_namespace ->root to struct mount
  vfs: take /proc/*/mounts and friends to fs/proc_namespace.c
  vfs: opencode mntget() mnt_set_mountpoint()
  vfs: spread struct mount - remaining argument of next_mnt()
  vfs: move fsnotify junk to struct mount
  vfs: move mnt_devname
  vfs: move mnt_list to struct mount
  vfs: switch pnode.h macros to struct mount *
  ...
2012-01-08 12:19:57 -08:00
Al Viro
cdcf116d44 switch security_path_chmod() to struct path *
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06 23:16:53 -05:00
Al Viro
d8c9584ea2 vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06 23:16:53 -05:00
Al Viro
ece2ccb668 Merge branches 'vfsmount-guts', 'umode_t' and 'partitions' into Z 2012-01-06 23:15:54 -05:00
Eric Paris
fd77846152 security: remove the security_netlink_recv hook as it is equivalent to capable()
Once upon a time netlink was not sync and we had to get the effective
capabilities from the skb that was being received.  Today we instead get
the capabilities from the current task.  This has rendered the entire
purpose of the hook moot as it is now functionally equivalent to the
capable() call.

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-05 18:53:01 -05:00
Eric Paris
69f594a389 ptrace: do not audit capability check when outputing /proc/pid/stat
Reading /proc/pid/stat of another process checks if one has ptrace permissions
on that process.  If one does have permissions it outputs some data about the
process which might have security and attack implications.  If the current
task does not have ptrace permissions the read still works, but those fields
are filled with inocuous (0) values.  Since this check and a subsequent denial
is not a violation of the security policy we should not audit such denials.

This can be quite useful to removing ptrace broadly across a system without
flooding the logs when ps is run or something which harmlessly walks proc.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
2012-01-05 18:53:00 -05:00
Eric Paris
2920a8409d capabilities: remove all _real_ interfaces
The name security_real_capable and security_real_capable_noaudit just don't
make much sense to me.  Convert them to use security_capable and
security_capable_noaudit.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
2012-01-05 18:52:55 -05:00
Eric Paris
c7eba4a975 capabilities: introduce security_capable_noaudit
Exactly like security_capable except don't audit any denials.  This is for
places where the kernel may make decisions about what to do if a task has a
given capability, but which failing that capability is not a sign of a
security policy violation.  An example is checking if a task has
CAP_SYS_ADMIN to lower it's likelyhood of being killed by the oom killer.
This check is not a security violation if it is denied.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
2012-01-05 18:52:54 -05:00
Eric Paris
b7e724d303 capabilities: reverse arguments to security_capable
security_capable takes ns, cred, cap.  But the LSM capable() hook takes
cred, ns, cap.  The capability helper functions also take cred, ns, cap.
Rather than flip argument order just to flip it back, leave them alone.
Heck, this should be a little faster since argument will be in the right
place!

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-05 18:52:53 -05:00
Eric Paris
6a9de49115 capabilities: remove the task from capable LSM hook entirely
The capabilities framework is based around credentials, not necessarily the
current task.  Yet we still passed the current task down into LSMs from the
security_capable() LSM hook as if it was a meaningful portion of the security
decision.  This patch removes the 'generic' passing of current and instead
forces individual LSMs to use current explicitly if they think it is
appropriate.  In our case those LSMs are SELinux and AppArmor.

I believe the AppArmor use of current is incorrect, but that is wholely
unrelated to this patch.  This patch does not change what AppArmor does, it
just makes it clear in the AppArmor code that it is doing it.

The SELinux code still uses current in it's audit message, which may also be
wrong and needs further investigation.  Again this is NOT a change, it may
have always been wrong, this patch just makes it clear what is happening.

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-05 18:52:53 -05:00
James Morris
2653812e14 selinux: sparse fix: fix several warnings in the security server cod
Fix several sparse warnings in the SELinux security server code.

Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-05 18:52:52 -05:00
James Morris
02f5daa563 selinux: sparse fix: fix warnings in netlink code
Fix sparse warnings in SELinux Netlink code.

Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-05 18:52:51 -05:00