37517 Commits

Author SHA1 Message Date
Peter Feiner
68b5a65248 mm: softdirty: respect VM_SOFTDIRTY in PTE holes
After a VMA is created with the VM_SOFTDIRTY flag set, /proc/pid/pagemap
should report that the VMA's virtual pages are soft-dirty until
VM_SOFTDIRTY is cleared (i.e., by the next write of "4" to
/proc/pid/clear_refs).  However, pagemap ignores the VM_SOFTDIRTY flag
for virtual addresses that fall in PTE holes (i.e., virtual addresses
that don't have a PMD, PUD, or PGD allocated yet).

To observe this bug, use mmap to create a VMA large enough such that
there's a good chance that the VMA will occupy an unused PMD, then test
the soft-dirty bit on its pages.  In practice, I found that a VMA that
covered a PMD's worth of address space was big enough.

This patch adds the necessary VMA lookup to the PTE hole callback in
/proc/pid/pagemap's page walk and sets soft-dirty according to the VMAs'
VM_SOFTDIRTY flag.

Signed-off-by: Peter Feiner <pfeiner@google.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Hugh Dickins <hughd@google.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06 18:01:22 -07:00
Rafael Aquini
cc7452b6dc mm: export NR_SHMEM via sysinfo(2) / si_meminfo() interfaces
Historically, we exported shared pages to userspace via sysinfo(2)
sharedram and /proc/meminfo's "MemShared" fields.  With the advent of
tmpfs, from kernel v2.4 onward, that old way for accounting shared mem
was deemed inaccurate and we started to export a hard-coded 0 for
sysinfo.sharedram.  Later on, during the 2.6 timeframe, "MemShared" got
re-introduced to /proc/meminfo re-branded as "Shmem", but we're still
reporting sysinfo.sharedmem as that old hard-coded zero, which makes the
"shared memory" report inconsistent across interfaces.

This patch leverages the addition of explicit accounting for pages used
by shmem/tmpfs -- "4b02108 mm: oom analysis: add shmem vmstat" -- in
order to make the users of sysinfo(2) and si_meminfo*() friends aware of
that vmstat entry and make them report it consistently across the
interfaces, as well to make sysinfo(2) returned data consistent with our
current API documentation states.

Signed-off-by: Rafael Aquini <aquini@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06 18:01:19 -07:00
Fabian Frederick
1b7f8ba603 fs/ocfs2/slot_map.c: replace count*size kzalloc by kcalloc
kcalloc manages count*sizeof overflow.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06 18:01:13 -07:00
Tariq Saeed
bba1cb17d9 ocfs2: race between umount and unfinished remastering during recovery
Orabug: 19074140

When umount is issued during recovery on the new master that has not
finished remastering locks, it triggers BUG() in
dlm_send_mig_lockres_msg().  Here is the situation:

 1) node A has a lock on resource X mastered by node B.

 2) node B dies ->  node A sets recovering flag for res X

 3) Node C becomes the new master for resources owned by the
    dead node and is remastering locks of the dead node but
    has not finished the remastering process yet.

 4) umount is issued on node C.

 5) During processing of umount, ignoring unfished recovery,
    node C attempts to migrate resource X to node A.

 6) node A finds res X in DLM_LOCK_RES_RECOVERING state, considers
    it a logic error and sends back -EFAULT.

 7) node C asserts BUG() upon seeing EFAULT resp from node B.

Fix is to delay migrating res X till remastering is finished at which
point recovering flag will be cleared on both A and C.

Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06 18:01:13 -07:00
Xue jiufei
7567c14883 ocfs2: remove conversion of total_backoff in dlm_join_domain()
The unit of total_backoff is msecs not jiffies, so no need to do the
conversion.  Otherwise, the join timeout is not 90 sec.

Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06 18:01:13 -07:00
Yingtai Xie
981035b47d ocfs2: correctly check the return value of ocfs2_search_extent_list
ocfs2_search_extent_list may return -1, so we should check the return
value in ocfs2_split_and_insert, otherwise it may cause array index out of
bound.

And ocfs2_search_extent_list can only return value less than
el->l_next_free_rec, so check if it is equal or larger than
le16_to_cpu(el->l_next_free_rec) is meaningless.

Signed-off-by: Yingtai Xie <xieyingtai@huawei.com>
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06 18:01:13 -07:00
Fabian Frederick
c811f5f41e fs/squashfs/super.c: logging cleanup
- Convert printk to pr_foo()
- Add pr_fmt for future logging entries
- Coalesce formats

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06 18:01:13 -07:00
Fabian Frederick
14694888db fs/squashfs/file_direct.c: replace count*size kmalloc by kmalloc_array
kmalloc_array() manages count*sizeof overflow.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06 18:01:13 -07:00
Fabian Frederick
2502722dde ntfs: kernel-doc warning fixes
cached_page and lru_pvec were removed from ntfs_attr_extend_initialized
in commit 2ec93b0bf35f ("ntfs: clean up ntfs_attr_extend_initialized")

lru_pvec has been removed from __ntfs_grab_cache_pages in commit
4c99000ac47c ("ntfs: use add_to_page_cache_lru()")

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06 18:01:12 -07:00
Fabian Frederick
2122da26c3 fs/logfs/readwrite.c: kernel-doc warning fixes
s/-/:/ and fix variable names.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Joern Engel <joern@logfs.org>
Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06 18:01:12 -07:00
Jan Kara
5838d4442b fanotify: fix double free of pending permission events
Commit 85816794240b ("fanotify: Fix use after free for permission
events") introduced a double free issue for permission events which are
pending in group's notification queue while group is being destroyed.
These events are freed from fanotify_handle_event() but they are not
removed from groups notification queue and thus they get freed again
from fsnotify_flush_notify().

Fix the problem by removing permission events from notification queue
before freeing them if we skip processing access response.  Also expand
comments in fanotify_release() to explain group shutdown in detail.

Fixes: 85816794240b9659e66e4d9b0df7c6e814e5f603
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Douglas Leeder <douglas.leeder@sophos.com>
Tested-by: Douglas Leeder <douglas.leeder@sophos.com>
Reported-by: Heinrich Schuchard <xypron.glpk@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06 18:01:12 -07:00
Jan Kara
8ba8fa9170 fsnotify: rename event handling functions
Rename fsnotify_add_notify_event() to fsnotify_add_event() since the
"notify" part is duplicit.  Rename fsnotify_remove_notify_event() and
fsnotify_peek_notify_event() to fsnotify_remove_first_event() and
fsnotify_peek_first_event() respectively since "notify" part is duplicit
and they really look at the first event in the queue.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06 18:01:12 -07:00
Fabian Frederick
3e58406484 fs/fscache: make ctl_table static
fscache_sysctls and fscache_sysctls_root are only used in main.c

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: David Howells <dhowells@redhat.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06 18:01:12 -07:00
Linus Torvalds
ae045e2455 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Steady transitioning of the BPF instructure to a generic spot so
      all kernel subsystems can make use of it, from Alexei Starovoitov.

   2) SFC driver supports busy polling, from Alexandre Rames.

   3) Take advantage of hash table in UDP multicast delivery, from David
      Held.

   4) Lighten locking, in particular by getting rid of the LRU lists, in
      inet frag handling.  From Florian Westphal.

   5) Add support for various RFC6458 control messages in SCTP, from
      Geir Ola Vaagland.

   6) Allow to filter bridge forwarding database dumps by device, from
      Jamal Hadi Salim.

   7) virtio-net also now supports busy polling, from Jason Wang.

   8) Some low level optimization tweaks in pktgen from Jesper Dangaard
      Brouer.

   9) Add support for ipv6 address generation modes, so that userland
      can have some input into the process.  From Jiri Pirko.

  10) Consolidate common TCP connection request code in ipv4 and ipv6,
      from Octavian Purdila.

  11) New ARP packet logger in netfilter, from Pablo Neira Ayuso.

  12) Generic resizable RCU hash table, with intial users in netlink and
      nftables.  From Thomas Graf.

  13) Maintain a name assignment type so that userspace can see where a
      network device name came from (enumerated by kernel, assigned
      explicitly by userspace, etc.) From Tom Gundersen.

  14) Automatic flow label generation on transmit in ipv6, from Tom
      Herbert.

  15) New packet timestamping facilities from Willem de Bruijn, meant to
      assist in measuring latencies going into/out-of the packet
      scheduler, latency from TCP data transmission to ACK, etc"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1536 commits)
  cxgb4 : Disable recursive mailbox commands when enabling vi
  net: reduce USB network driver config options.
  tg3: Modify tg3_tso_bug() to handle multiple TX rings
  amd-xgbe: Perform phy connect/disconnect at dev open/stop
  amd-xgbe: Use dma_set_mask_and_coherent to set DMA mask
  net: sun4i-emac: fix memory leak on bad packet
  sctp: fix possible seqlock seadlock in sctp_packet_transmit()
  Revert "net: phy: Set the driver when registering an MDIO bus device"
  cxgb4vf: Turn off SGE RX/TX Callback Timers and interrupts in PCI shutdown routine
  team: Simplify return path of team_newlink
  bridge: Update outdated comment on promiscuous mode
  net-timestamp: ACK timestamp for bytestreams
  net-timestamp: TCP timestamping
  net-timestamp: SCHED timestamp on entering packet scheduler
  net-timestamp: add key to disambiguate concurrent datagrams
  net-timestamp: move timestamp flags out of sk_flags
  net-timestamp: extend SCM_TIMESTAMPING ancillary data struct
  cxgb4i : Move stray CPL definitions to cxgb4 driver
  tcp: reduce spurious retransmits due to transient SACK reneging
  qlcnic: Initialize dcbnl_ops before register_netdev
  ...
2014-08-06 09:38:14 -07:00
Linus Torvalds
bb2cbf5e93 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "In this release:

   - PKCS#7 parser for the key management subsystem from David Howells
   - appoint Kees Cook as seccomp maintainer
   - bugfixes and general maintenance across the subsystem"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (94 commits)
  X.509: Need to export x509_request_asymmetric_key()
  netlabel: shorter names for the NetLabel catmap funcs/structs
  netlabel: fix the catmap walking functions
  netlabel: fix the horribly broken catmap functions
  netlabel: fix a problem when setting bits below the previously lowest bit
  PKCS#7: X.509 certificate issuer and subject are mandatory fields in the ASN.1
  tpm: simplify code by using %*phN specifier
  tpm: Provide a generic means to override the chip returned timeouts
  tpm: missing tpm_chip_put in tpm_get_random()
  tpm: Properly clean sysfs entries in error path
  tpm: Add missing tpm_do_selftest to ST33 I2C driver
  PKCS#7: Use x509_request_asymmetric_key()
  Revert "selinux: fix the default socket labeling in sock_graft()"
  X.509: x509_request_asymmetric_keys() doesn't need string length arguments
  PKCS#7: fix sparse non static symbol warning
  KEYS: revert encrypted key change
  ima: add support for measuring and appraising firmware
  firmware_class: perform new LSM checks
  security: introduce kernel_fw_from_file hook
  PKCS#7: Missing inclusion of linux/err.h
  ...
2014-08-06 08:06:39 -07:00
Linus Torvalds
e7fda6c4c3 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer and time updates from Thomas Gleixner:
 "A rather large update of timers, timekeeping & co

   - Core timekeeping code is year-2038 safe now for 32bit machines.
     Now we just need to fix all in kernel users and the gazillion of
     user space interfaces which rely on timespec/timeval :)

   - Better cache layout for the timekeeping internal data structures.

   - Proper nanosecond based interfaces for in kernel users.

   - Tree wide cleanup of code which wants nanoseconds but does hoops
     and loops to convert back and forth from timespecs.  Some of it
     definitely belongs into the ugly code museum.

   - Consolidation of the timekeeping interface zoo.

   - A fast NMI safe accessor to clock monotonic for tracing.  This is a
     long standing request to support correlated user/kernel space
     traces.  With proper NTP frequency correction it's also suitable
     for correlation of traces accross separate machines.

   - Checkpoint/restart support for timerfd.

   - A few NOHZ[_FULL] improvements in the [hr]timer code.

   - Code move from kernel to kernel/time of all time* related code.

   - New clocksource/event drivers from the ARM universe.  I'm really
     impressed that despite an architected timer in the newer chips SoC
     manufacturers insist on inventing new and differently broken SoC
     specific timers.

[ Ed. "Impressed"? I don't think that word means what you think it means ]

   - Another round of code move from arch to drivers.  Looks like most
     of the legacy mess in ARM regarding timers is sorted out except for
     a few obnoxious strongholds.

   - The usual updates and fixlets all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
  timekeeping: Fixup typo in update_vsyscall_old definition
  clocksource: document some basic timekeeping concepts
  timekeeping: Use cached ntp_tick_length when accumulating error
  timekeeping: Rework frequency adjustments to work better w/ nohz
  timekeeping: Minor fixup for timespec64->timespec assignment
  ftrace: Provide trace clocks monotonic
  timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC
  seqcount: Add raw_write_seqcount_latch()
  seqcount: Provide raw_read_seqcount()
  timekeeping: Use tk_read_base as argument for timekeeping_get_ns()
  timekeeping: Create struct tk_read_base and use it in struct timekeeper
  timekeeping: Restructure the timekeeper some more
  clocksource: Get rid of cycle_last
  clocksource: Move cycle_last validation to core code
  clocksource: Make delta calculation a function
  wireless: ath9k: Get rid of timespec conversions
  drm: vmwgfx: Use nsec based interfaces
  drm: i915: Use nsec based interfaces
  timekeeping: Provide ktime_get_raw()
  hangcheck-timer: Use ktime_get_ns()
  ...
2014-08-05 17:46:42 -07:00
Jeff Mahoney
27d0e5bc85 reiserfs: fix corruption introduced by balance_leaf refactor
Commits f1f007c308e (reiserfs: balance_leaf refactor, pull out
balance_leaf_insert_left) and cf22df182bf (reiserfs: balance_leaf
refactor, pull out balance_leaf_paste_left) missed that the `body'
pointer was getting repositioned. Subsequent users of the pointer
would expect it to be repositioned, and as a result, parts of the
tree would get overwritten. The most common observed corruption
is indirect block pointers being overwritten.

Since the body value isn't actually used anymore in the called routines,
we can pass back the offset it should be shifted. We constify the body
and ih pointers in the balance_leaf as a mostly-free preventative measure.

Cc: <stable@vger.kernel.org> # 3.16
Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2014-08-05 23:18:38 +02:00
Jeff Layton
14a571a8ec nfsd: add some comments to the nfsd4 object definitions
Add some comments that describe what each of these objects is, and how
they related to one another.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 16:09:20 -04:00
Jeff Layton
b687f6863e nfsd: remove the client_mutex and the nfs4_lock/unlock_state wrappers
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 15:00:54 -04:00
Steve French
f29ebb47d5 Add worker function to set allocation size
Adds setinfo worker function for SMB2/SMB3 support of SET_ALLOCATION_INFORMATION

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>
2014-08-05 12:53:37 -05:00
Jeff Layton
74cf76df0f nfsd: remove nfs4_lock_state: nfs4_state_shutdown_net
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:20 -04:00
Jeff Layton
dab6ef2415 nfsd: remove nfs4_lock_state: nfs4_laundromat
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:20 -04:00
Trond Myklebust
05149dd4dc nfsd: Remove nfs4_lock_state(): reclaim_complete()
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:19 -04:00
Trond Myklebust
cb86fb1428 nfsd: Remove nfs4_lock_state(): setclientid, setclientid_confirm, renew
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:18 -04:00
Trond Myklebust
3974552dce nfsd: Remove nfs4_lock_state(): exchange_id, create/destroy_session()
Also destroy_clientid and bind_conn_to_session.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:17 -04:00
Trond Myklebust
3234975f47 nfsd: Remove nfs4_lock_state(): nfsd4_open and nfsd4_open_confirm
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:16 -04:00
Trond Myklebust
084d4d4549 nfsd: Remove nfs4_lock_state(): nfsd4_delegreturn()
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:15 -04:00
Trond Myklebust
36626a2ecf nfsd: Remove nfs4_lock_state(): nfsd4_open_downgrade + nfsd4_close
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:14 -04:00
Trond Myklebust
2dd7f2ad4e nfsd: Remove nfs4_lock_state(): nfsd4_lock/locku/lockt()
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:13 -04:00
Trond Myklebust
51f5e78355 nfsd: Remove nfs4_lock_state(): nfsd4_release_lockowner
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:12 -04:00
Trond Myklebust
e7d5dc19ce nfsd: Remove nfs4_lock_state(): nfsd4_test_stateid/nfsd4_free_stateid
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:12 -04:00
Trond Myklebust
c2d1d6a8f0 nfsd: Remove nfs4_lock_state(): nfs4_preprocess_stateid_op()
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:11 -04:00
Jeff Layton
285abdee53 nfsd: remove old fault injection infrastructure
Remove the old nfsd_for_n_state function and move nfsd_find_client
higher up into the file to get rid of forward declaration. Remove
the struct nfsd_fault_inject_op arguments from the operations as
they are no longer needed by any of them.

Finally, remove the old "standard" get and set routines, which
also eliminates the client_mutex from this code.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:10 -04:00
Jeff Layton
98d5c7c5bd nfsd: add more granular locking to *_delegations fault injectors
...instead of relying on the client_mutex.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:09 -04:00
Jeff Layton
82e05efaec nfsd: add more granular locking to forget_openowners fault injector
...instead of relying on the client_mutex.

Also, fix up the printk output that is generated when the file is read.
It currently says that it's reporting the number of open files, but
it's actually reporting the number of openowners.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:08 -04:00
Jeff Layton
016200c373 nfsd: add more granular locking to forget_locks fault injector
...instead of relying on the client_mutex.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:07 -04:00
Jeff Layton
3738d50e7f nfsd: add a list_head arg to nfsd_foreach_client_lock
In a later patch, we'll want to collect the locks onto a list for later
destruction. If "func" is defined and "collect" is defined, then we'll
add the lock stateid to the list.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:06 -04:00
Jeff Layton
69fc9edf98 nfsd: add nfsd_inject_forget_clients
...which uses the client_lock for protection instead of client_mutex.
Also remove nfsd_forget_client as there are no more callers.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:05 -04:00
Jeff Layton
a0926d1527 nfsd: add a forget_client set_clnt routine
...that relies on the client_lock instead of client_mutex.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:04 -04:00
Jeff Layton
7ec0e36f1a nfsd: add a forget_clients "get" routine with proper locking
Add a new "get" routine for forget_clients that relies on the
client_lock instead of the client_mutex.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:04 -04:00
Jeff Layton
c96223d3b6 nfsd: abstract out the get and set routines into the fault injection ops
Now that we've added more granular locking in other places, it's time
to address the fault injection code. This code is currently quite
reliant on the client_mutex for protection. Start to change this by
adding a new set of fault injection op vectors.

For now they all use the legacy ones. In later patches we'll add new
routines that can deal with more granular locking.

Also, move some of the printk routines into the callers to make the
results of the operations more uniform.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:02 -04:00
Jeff Layton
294ac32e99 nfsd: protect clid and verifier generation with client_lock
The clid counter is a global counter currently. Move it to be a per-net
property so that it can be properly protected by the nn->client_lock
instead of relying on the client_mutex.

The verifier generator is also potentially racy if there are two
simultaneous callers. Generate the verifier when we generate the clid
value, so it's also created under the client_lock. With this, there's
no need to keep two counters as they'd always be in sync anyway, so
just use the clientid_counter for both.

As Trond points out, what would be best is to eventually move this
code to use IDR instead of the hash tables. That would also help ensure
uniqueness, but that's probably best done as a separate project.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:02 -04:00
Jeff Layton
fd699b8a48 nfsd: don't destroy clients that are busy
It's possible that we'll have an in-progress call on some of the clients
while a rogue EXCHANGE_ID or DESTROY_CLIENTID call comes in. Be sure to
try and mark the client expired first, so that the refcount is
respected.

This will only be a problem once the client_mutex is removed.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:55:01 -04:00
Kinglong Mee
fb94d766af NFSD: Put the reference of nfs4_file when freeing stid
After testing nfs4 lock, I restart the nfsd service, got messages as,

[ 5677.403419] nfsd: last server has exited, flushing export cache
[ 5677.463728] =============================================================================
[ 5677.463942] BUG nfsd4_files (Tainted: G    B      OE): Objects remaining in nfsd4_files on kmem_cache_close()
[ 5677.464055] -----------------------------------------------------------------------------

[ 5677.464203] INFO: Slab 0xffffea0000233400 objects=28 used=1 fp=0xffff880008cd3d98 flags=0x3ffc0000004080
[ 5677.464318] CPU: 0 PID: 3772 Comm: rmmod Tainted: G    B      OE 3.16.0-rc2+ #29
[ 5677.464420] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 5677.464538]  0000000000000000 0000000036af2c9f ffff88000ce97d68 ffffffff816eacfa
[ 5677.464643]  ffffea0000233400 ffff88000ce97e40 ffffffff811cda44 ffffffff00000020
[ 5677.464774]  ffff88000ce97e50 ffff88000ce97e00 656a624f00000008 616d657220737463
[ 5677.464875] Call Trace:
[ 5677.464925]  [<ffffffff816eacfa>] dump_stack+0x45/0x56
[ 5677.464983]  [<ffffffff811cda44>] slab_err+0xb4/0xe0
[ 5677.465040]  [<ffffffff811d0457>] ? __kmalloc+0x117/0x290
[ 5677.465099]  [<ffffffff81100eec>] ? on_each_cpu_cond+0xac/0xf0
[ 5677.465158]  [<ffffffff811d1bc0>] ? kmem_cache_close+0x110/0x2e0
[ 5677.465218]  [<ffffffff811d1be0>] kmem_cache_close+0x130/0x2e0
[ 5677.465279]  [<ffffffff8135a0c1>] ? kobject_cleanup+0x91/0x1b0
[ 5677.465338]  [<ffffffff811d22be>] __kmem_cache_shutdown+0xe/0x10
[ 5677.465399]  [<ffffffff8119bd28>] kmem_cache_destroy+0x48/0x100
[ 5677.465466]  [<ffffffffa05ef78d>] nfsd4_free_slabs+0x2d/0x50 [nfsd]
[ 5677.465530]  [<ffffffffa05fa987>] exit_nfsd+0x34/0x6ad [nfsd]
[ 5677.465589]  [<ffffffff81104ac2>] SyS_delete_module+0x162/0x200
[ 5677.465649]  [<ffffffff81013b69>] ? do_notify_resume+0x59/0x90
[ 5677.465759]  [<ffffffff816f2369>] system_call_fastpath+0x16/0x1b
[ 5677.465822] INFO: Object 0xffff880008cd0000 @offset=0
[ 5677.465882] INFO: Allocated in nfsd4_process_open1+0x61/0x350 [nfsd] age=7599 cpu=0 pid=3253
[ 5677.466115]  __slab_alloc+0x3b0/0x4b1
[ 5677.466166]  kmem_cache_alloc+0x1e4/0x240
[ 5677.466220]  nfsd4_process_open1+0x61/0x350 [nfsd]
[ 5677.466276]  nfsd4_open+0xee/0x860 [nfsd]
[ 5677.466329]  nfsd4_proc_compound+0x4d7/0x7f0 [nfsd]
[ 5677.466384]  nfsd_dispatch+0xbb/0x200 [nfsd]
[ 5677.466447]  svc_process_common+0x453/0x6f0 [sunrpc]
[ 5677.466506]  svc_process+0x103/0x170 [sunrpc]
[ 5677.466559]  nfsd+0x117/0x190 [nfsd]
[ 5677.466609]  kthread+0xd8/0xf0
[ 5677.466656]  ret_from_fork+0x7c/0xb0
[ 5677.466775] kmem_cache_destroy nfsd4_files: Slab cache still has objects
[ 5677.466839] CPU: 0 PID: 3772 Comm: rmmod Tainted: G    B      OE 3.16.0-rc2+ #29
[ 5677.466937] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 5677.467049]  0000000000000000 0000000036af2c9f ffff88000ce97eb0 ffffffff816eacfa
[ 5677.467150]  ffff880020bb2d00 ffff88000ce97ed0 ffffffff8119bdd9 0000000000000000
[ 5677.467250]  ffffffffa06065c0 ffff88000ce97ee0 ffffffffa05ef78d ffff88000ce97ef0
[ 5677.467351] Call Trace:
[ 5677.467397]  [<ffffffff816eacfa>] dump_stack+0x45/0x56
[ 5677.467454]  [<ffffffff8119bdd9>] kmem_cache_destroy+0xf9/0x100
[ 5677.467516]  [<ffffffffa05ef78d>] nfsd4_free_slabs+0x2d/0x50 [nfsd]
[ 5677.467579]  [<ffffffffa05fa987>] exit_nfsd+0x34/0x6ad [nfsd]
[ 5677.467639]  [<ffffffff81104ac2>] SyS_delete_module+0x162/0x200
[ 5677.467765]  [<ffffffff81013b69>] ? do_notify_resume+0x59/0x90
[ 5677.467826]  [<ffffffff816f2369>] system_call_fastpath+0x16/0x1b

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Fixes: 11b9164adad7 "nfsd: Add a struct nfs4_file field to struct nfs4_stid"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-08-05 10:53:36 -04:00
Linus Torvalds
8e099d1e8b Bug fixes and clean ups for the 3.17 merge window
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJT4Eh0AAoJENNvdpvBGATwmaAP/0W9B/VPY4rSnanpXvsYlone
 wjIjh11NZdU3daW/c/VhgEIO81rFaoFoQ/dN+pIo7uHG8JRqZjyHXZY2eVF5SFFp
 FQyGEGMRhE+u78IDg3U99OlAUTo8SAHjHVlYUBVpT9lazMLWRPqP7uHJbXow5ijK
 /bXSVY+6fOWY1/yruCZv1nRtg9JNZgCc1LOPDqn6K16jItBKfYBvVbpw6hise2v2
 rPfcSlKJ5Wzo/PgNX+IR9nnUOXzpbdM2CZbxy0qB2jZYirzE6VNeHhc1JWxQWNB1
 Dg3j/ynEWPs03+9ywLcQ0kEQvUXhQQlMGmfPkgzWeAUQDUv4QAeYmFiRhc/EgJWY
 othDmKVqy0Pn9rmGCOMg/TJFH0Mz/c7PTxFVF1onxs2Sqvl3yCdZANT+ie5UoC9m
 zkUHdY3HkARiK/I6d5CCJzvHMxWNyf6bAJmoR6L/SaOPXebc4cfFtjgV01kbTWv+
 rW9MCj3TiIC9MpWdVJADcmc2w3cY0L/NUbFWHZhMSDFiuJvcLUw5afaJTvKEPuKp
 WnnYICPj6wMP7Gy/isTxBGGi0UjPm67DHGLpG+syPDi1RxU6Vw3p9PjbdvCRj7so
 UD3xzntCHOzVcgAxE92V4pMZAajtv0sfIBVzMm7k8iUXzb7Er1c5dV6ROkOzguq3
 Ogj/c2JHSSB4TSYdVxsG
 =hf9n
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Bug fixes and clean ups for the 3.17 merge window"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix ext4_discard_allocated_blocks() if we can't allocate the pa struct
  ext4: fix COLLAPSE RANGE test for bigalloc file systems
  ext4: check inline directory before converting
  ext4: fix incorrect locking in move_extent_per_page
  ext4: use correct depth value
  ext4: add i_data_sem sanity check
  ext4: fix wrong size computation in ext4_mb_normalize_request()
  ext4: make ext4_has_inline_data() as a inline function
  ext4: remove readpage() check in ext4_mmap_file()
  ext4: fix punch hole on files with indirect mapping
  ext4: remove metadata reservation checks
  ext4: rearrange initialization to fix EXT4FS_DEBUG
2014-08-04 20:46:54 -07:00
Linus Torvalds
b54ecfb702 Merge tag 'for-f2fs-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "This series includes patches to:
   - add nobarrier mount option
   - support tmpfile and rename2
   - enhance the fdatasync behavior
   - fix the error path
   - fix the recovery routine
   - refactor a part of the checkpoint procedure
   - reduce some lock contentions"

* tag 'for-f2fs-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits)
  f2fs: use for_each_set_bit to simplify the code
  f2fs: add f2fs_balance_fs for expand_inode_data
  f2fs: invalidate xattr node page when evict inode
  f2fs: avoid skipping recover_inline_xattr after recover_inline_data
  f2fs: add tracepoint for f2fs_direct_IO
  f2fs: reduce competition among node page writes
  f2fs: fix coding style
  f2fs: remove redundant lines in allocate_data_block
  f2fs: add tracepoint for f2fs_issue_flush
  f2fs: avoid retrying wrong recovery routine when error was occurred
  f2fs: test before set/clear bits
  f2fs: fix wrong condition for unlikely
  f2fs: enable in-place-update for fdatasync
  f2fs: skip unnecessary data writes during fsync
  f2fs: add info of appended or updated data writes
  f2fs: use radix_tree for ino management
  f2fs: add infra for ino management
  f2fs: punch the core function for inode management
  f2fs: add nobarrier mount option
  f2fs: fix to put root inode in error path of fill_super
  ...
2014-08-04 20:30:07 -07:00
Linus Torvalds
29b88e23a9 Driver core patches for 3.17-rc1
Here's the big driver-core pull request for 3.17-rc1.
 
 Largest thing in here is the dma-buf rework and fence code, that touched
 many different subsystems so it was agreed it should go through this
 tree to handle merge issues.  There's also some firmware loading
 updates, as well as tests added, and a few other tiny changes, the
 changelog has the details.
 
 All have been in linux-next for a long time.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlPf1XcACgkQMUfUDdst+ylREACdHLXBa02yLrRzbrONJ+nARuFv
 JuQAoMN49PD8K9iMQpXqKBvZBsu+iCIY
 =w8OJ
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here's the big driver-core pull request for 3.17-rc1.

  Largest thing in here is the dma-buf rework and fence code, that
  touched many different subsystems so it was agreed it should go
  through this tree to handle merge issues.  There's also some firmware
  loading updates, as well as tests added, and a few other tiny changes,
  the changelog has the details.

  All have been in linux-next for a long time"

* tag 'driver-core-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (32 commits)
  ARM: imx: Remove references to platform_bus in mxc code
  firmware loader: Fix _request_firmware_load() return val for fw load abort
  platform: Remove most references to platform_bus device
  test: add firmware_class loader test
  doc: fix minor typos in firmware_class README
  staging: android: Cleanup style issues
  Documentation: devres: Sort managed interfaces
  Documentation: devres: Add devm_kmalloc() et al
  fs: debugfs: remove trailing whitespace
  kernfs: kernel-doc warning fix
  debugfs: Fix corrupted loop in debugfs_remove_recursive
  stable_kernel_rules: Add pointer to netdev-FAQ for network patches
  driver core: platform: add device binding path 'driver_override'
  driver core/platform: remove unused implicit padding in platform_object
  firmware loader: inform direct failure when udev loader is disabled
  firmware: replace ALIGN(PAGE_SIZE) by PAGE_ALIGN
  firmware: read firmware size using i_size_read()
  firmware loader: allow disabling of udev as firmware loader
  reservation: add suppport for read-only access using rcu
  reservation: update api and add some helpers
  ...

Conflicts:
	drivers/base/platform.c
2014-08-04 18:34:04 -07:00
Linus Torvalds
98959948a7 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:

 - Move the nohz kick code out of the scheduler tick to a dedicated IPI,
   from Frederic Weisbecker.

  This necessiated quite some background infrastructure rework,
  including:

   * Clean up some irq-work internals
   * Implement remote irq-work
   * Implement nohz kick on top of remote irq-work
   * Move full dynticks timer enqueue notification to new kick
   * Move multi-task notification to new kick
   * Remove unecessary barriers on multi-task notification

 - Remove proliferation of wait_on_bit() action functions and allow
   wait_on_bit_action() functions to support a timeout.  (Neil Brown)

 - Another round of sched/numa improvements, cleanups and fixes.  (Rik
   van Riel)

 - Implement fast idling of CPUs when the system is partially loaded,
   for better scalability.  (Tim Chen)

 - Restructure and fix the CPU hotplug handling code that may leave
   cfs_rq and rt_rq's throttled when tasks are migrated away from a dead
   cpu.  (Kirill Tkhai)

 - Robustify the sched topology setup code.  (Peterz Zijlstra)

 - Improve sched_feat() handling wrt.  static_keys (Jason Baron)

 - Misc fixes.

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
  sched/fair: Fix 'make xmldocs' warning caused by missing description
  sched: Use macro for magic number of -1 for setparam
  sched: Robustify topology setup
  sched: Fix sched_setparam() policy == -1 logic
  sched: Allow wait_on_bit_action() functions to support a timeout
  sched: Remove proliferation of wait_on_bit() action functions
  sched/numa: Revert "Use effective_load() to balance NUMA loads"
  sched: Fix static_key race with sched_feat()
  sched: Remove extra static_key*() function indirection
  sched/rt: Fix replenish_dl_entity() comments to match the current upstream code
  sched: Transform resched_task() into resched_curr()
  sched/deadline: Kill task_struct->pi_top_task
  sched: Rework check_for_tasks()
  sched/rt: Enqueue just unthrottled rt_rq back on the stack in __disable_runtime()
  sched/fair: Disable runtime_enabled on dying rq
  sched/numa: Change scan period code to match intent
  sched/numa: Rework best node setting in task_numa_migrate()
  sched/numa: Examine a task move when examining a task swap
  sched/numa: Simplify task_numa_compare()
  sched/numa: Use effective_load() to balance NUMA loads
  ...
2014-08-04 16:23:30 -07:00
Scott Mayhew
71a6ec8ac5 nfs: reject changes to resvport and sharecache during remount
Commit c8e47028 made it possible to change resvport/noresvport and
sharecache/nosharecache via a remount operation, neither of which should be
allowed.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Fixes: c8e47028 (nfs: Apply NFS_MOUNT_CMP_FLAGMASK to nfs_compare_remount_data)
Cc: stable@vger.kernel.org # 3.16+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-04 17:41:52 -04:00
Kinglong Mee
5b53dc88b0 NFS: Avoid infinite loop when RELEASE_LOCKOWNER getting expired error
Fix Commit 60ea681299 (NFS: Migration support for RELEASE_LOCKOWNER)
If getting expired error, client will enter a infinite loop as,

client                            server
   RELEASE_LOCKOWNER(old clid) ----->
                <--- expired error
   RENEW(old clid)             ----->
                <--- expired error
   SETCLIENTID                 ----->
                <--- a new clid
   SETCLIENTID_CONFIRM (new clid) -->
                <--- ok
   RELEASE_LOCKOWNER(old clid) ----->
                <--- expired error
   RENEW(new clid)             ----->
                <-- ok
   RELEASE_LOCKOWNER(old clid) ----->
                <--- expired error
   RENEW(new clid)             ----->
                <-- ok
                ... ...

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
[Trond: replace call to nfs4_async_handle_error() with
 nfs4_schedule_lease_recovery()]
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-08-04 16:51:38 -04:00