Commit Graph

42399 Commits

Author SHA1 Message Date
Al Viro
cadfbb6ec2 dax_io(): don't let non-error value escape via retval instead of EFAULT
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-11 09:36:57 -07:00
Vivek Goyal
dbd3ca5075 fs/block_dev.c: Remove WARN_ON() when inode writeback fails
If a block device is hot removed and later last reference to device
is put, we try to writeback the dirty inode. But device is gone and
that writeback fails.

Currently we do a WARN_ON() which does not seem to be the right thing.
Convert it to a ratelimited kernel warning.

Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
[jmoyer@redhat.com: get rid of unnecessary name initialization, 80 cols]
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-11 09:36:57 -07:00
Linus Torvalds
3419b45039 Merge branch 'for-4.4/io-poll' of git://git.kernel.dk/linux-block
Pull block IO poll support from Jens Axboe:
 "Various groups have been doing experimentation around IO polling for
  (really) fast devices.  The code has been reviewed and has been
  sitting on the side for a few releases, but this is now good enough
  for coordinated benchmarking and further experimentation.

  Currently O_DIRECT sync read/write are supported.  A framework is in
  the works that allows scalable stats tracking so we can auto-tune
  this.  And we'll add libaio support as well soon.  Fow now, it's an
  opt-in feature for test purposes"

* 'for-4.4/io-poll' of git://git.kernel.dk/linux-block:
  direct-io: be sure to assign dio->bio_bdev for both paths
  directio: add block polling support
  NVMe: add blk polling support
  block: add block polling support
  blk-mq: return tag/queue combo in the make_request_fn handlers
  block: change ->make_request_fn() and users to return a queue cookie
2015-11-10 17:23:49 -08:00
Linus Torvalds
01504f5e9e This pull request includes the following UBI/UBIFS changes:
* access time support for UBIFS by Dongsheng Yang
 * random cleanups and bug fixes all over the place
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJWQmnVAAoJEEtJtSqsAOnWpnkP/0pUN+Cx58QJoJXxaSgClH/l
 QwYGznz4Fv3QuSafXJZy8CPzsTasaOOtoVKJPfPTvP8Drrl3FFbtdhqlqdx1YKqw
 DuKeViwmE/HF49a5pxgtR8032l35s3zZUWwd2yKA3B6SVEIsjAoWlIutyzKA/DiR
 R4mGVsfRevfzq1+l2YXGGZ8PJXMtRKnFTjzmIUJY0OeT3xCvl/uUFCid2x693aut
 GpJ6AMKGYcU+BuQJmb89wvdRYCHPdYEJr7TWq+GBxU3gDPlLY9302IhPq9ftv8Lp
 ClA0X3GZqVPR921XR+aA4ysRbfENT+UBewiBt2OEwQLnI6jxEn9EqsgErATxpOYy
 PEduYuU56mqc67EE4r7+YHfkidMSa8KrR9s+GpzR6JStW9HNGFDfGfCAK7BcdvWT
 +TuHfNk787Gbl4TO41iFI87s6zNQszYOpxBTEFYHx0FQWLNxD5c79UkisL+biqvJ
 bX8+7pClJQXcY0CXqe56GVQLC5v4HbvMmnstiNk52ihxEB0NT6htzWe/GPb/+UZ1
 K+kOAT54DPeag3FK4J1WZ57cYyJHdbyASB5vKvb+A1IwzrYFo8P35GekoWNK7eAJ
 Od5mb++7pTEFSH7oSwsprPOmrHLRmXgRG8yhLRfqX1ZwFeyFdqGQxjce9O/QB1jo
 jr4dxAY6Gnjdo1q07QSI
 =9n2S
 -----END PGP SIGNATURE-----

Merge tag 'upstream-4.4-rc1' of git://git.infradead.org/linux-ubifs

Pull UBI/UBIFS updates from Richard Weinberger:

 - access time support for UBIFS by Dongsheng Yang

 - random cleanups and bug fixes all over the place

* tag 'upstream-4.4-rc1' of git://git.infradead.org/linux-ubifs:
  ubifs: introduce UBIFS_ATIME_SUPPORT to ubifs
  ubifs: make ubifs_[get|set]xattr atomic
  UBIFS: Delete unnecessary checks before the function call "iput"
  UBI: Remove in vain semicolon
  UBI: Fastmap: Fix PEB array type
  UBIFS: Fix possible memory leak in ubifs_readdir()
  fs/ubifs: remove unnecessary new_valid_dev check
  ubi: fastmap: Implement produce_free_peb()
  UBIFS: print verbose message when rescanning a corrupted node
  UBIFS: call dbg_is_power_cut() instead of reading c->dbg->pc_happened
  UBI: drop null test before destroy functions
  UBI: Update comments to reflect UBI_METAONLY flag
  UBI: Fix debug message
  UBI: Fix typo in comment
  UBI: Fastmap: Simplify expression
  UBIFS: fix a typo in comment of ubifs_budget_req
  UBIFS: use kmemdup rather than duplicating its implementation
2015-11-10 16:35:06 -08:00
Linus Torvalds
264015f8a8 libnvdimm for 4.4:
1/ Add support for the ACPI 6.0 NFIT hot add mechanism to process
    updates of the NFIT at runtime.
 
 2/ Teach the coredump implementation how to filter out DAX mappings.
 
 3/ Introduce NUMA hints for allocations made by the pmem driver, and as
    a side effect all devm allocations now hint their NUMA node by
    default.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWQX2sAAoJEB7SkWpmfYgCWsEQAK7w/xM9zClVY/DDlFJxFtYq
 DZJ4faPj+E3FMTiJIEDzjtRgQvOFE+wmJtntYsCqKH/QZmpnyk9jeT/CbJzEEL2k
 WsAk+qHGLcVUlSb36blwN1RFzYqC+IDYThewJqUvxDbOwL1AbiibbX7gplzZHLhW
 +rj3ScVlSNOPRDgGGpkAeLNNsttuKvsGo7nB/JZopm0tV6g14rSK09wQbVhv6S6T
 Lu7VGYqnJlkteL9YlzRiROf9hW2ZFCMGJz1YZydPTy3aX3hGTBX4w2qvmsPwBIKP
 kW/gCNisVJGk1cZCk4joSJ8i/b3x3fE0zdZ5waivJ5jDvYbUUfyk0KtJkfw207Rl
 14yWitUC6aeVuCeOqXHgsjRi+1QVN9Pg7i49xgGiUN1igQiUYRTgQPWZxDv6Zo/s
 USrLFQBaRd+hJw+dl7A47lJ3mUF96tPCoQb4LCQ7DVsg5U4J2TvqXLH9Gek/CCZ4
 QsMkZDTQlZw4+JEDlzBgg/L7xVty8DadplTADMdjaRhFU3y8zKNJ85Ileokt7KVt
 IsBT4+S5HeZLvinZY95932DwAmFp1DtsyENd1BUXL06ddyvlQrFJ6NQaXji4fuDc
 EVQmMoTAqDujZFupMAux9vkUBDFj/hmaVD5F7j3+MWP87OCritw/IZn+2LgTaKoX
 EmttaYrDr2jJwIaGyw+H
 =a2/L
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "Outside of the new ACPI-NFIT hot-add support this pull request is more
  notable for what it does not contain, than what it does.  There were a
  handful of development topics this cycle, dax get_user_pages, dax
  fsync, and raw block dax, that need more more iteration and will wait
  for 4.5.

  The patches to make devm and the pmem driver NUMA aware have been in
  -next for several weeks.  The hot-add support has not, but is
  contained to the NFIT driver and is passing unit tests.  The coredump
  support is straightforward and was looked over by Jeff.  All of it has
  received a 0day build success notification across 107 configs.

  Summary:

   - Add support for the ACPI 6.0 NFIT hot add mechanism to process
     updates of the NFIT at runtime.

   - Teach the coredump implementation how to filter out DAX mappings.

   - Introduce NUMA hints for allocations made by the pmem driver, and
     as a side effect all devm allocations now hint their NUMA node by
     default"

* tag 'libnvdimm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  coredump: add DAX filtering for FDPIC ELF coredumps
  coredump: add DAX filtering for ELF coredumps
  acpi: nfit: Add support for hot-add
  nfit: in acpi_nfit_init, break on a 0-length table
  pmem, memremap: convert to numa aware allocations
  devm_memremap_pages: use numa_mem_id
  devm: make allocations numa aware by default
  devm_memremap: convert to return ERR_PTR
  devm_memunmap: use devres_release()
  pmem: kill memremap_pmem()
  x86, mm: quiet arch_add_memory()
2015-11-10 12:07:22 -08:00
Linus Torvalds
d55fc37856 Merge branch 'i2c/for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:

 - New drivers: UniPhier (with and without FIFO)

 - some drivers got some bigger rework: ismt, designware, img-scb (rcar
   had to be reverted because issues were showing up just lately)

 - ACPI: reworked the device scanning and added support for muxes

... and quite a lot of driver bugfixes and cleanups this time.  All
files touched outside of the i2c realm have proper acks.

* 'i2c/for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (70 commits)
  i2c: rcar: Revert the latest refactoring series
  i2c: pnx: remove superfluous assignment
  MAINTAINERS: i2c: drop i2c-pnx maintainer
  MAINTAINERS: i2c: mark also subdirectories as maintained
  i2c: cadence: enable driver for ARM64
  i2c: i801: Document Intel DNV and Broxton
  i2c: at91: manage unexpected RXRDY flag when starting a transfer
  i2c: pnx: Use setup_timer instead of open coding it
  i2c: add ACPI support for I2C mux ports
  acpi: add acpi_preset_companion() stub
  i2c: pxa: Add support for pxa910/988 & new configuration features
  i2c: au1550: Convert to devm_kzalloc and devm_ioremap_resource
  i2c-dev: Fix I2C_SLAVE ioctl comment
  i2c-dev: Fix typo in ioctl name reference
  i2c: sirf: tune the divider to make i2c bus freq more accurate
  i2c: imx: Use -ENXIO as error in the NACK case
  i2c: i801: Add support for Intel Broxton
  i2c: i801: Add support for Intel DNV
  i2c: mediatek: add i2c resume support
  i2c: imx: implement bus recovery
  ...
2015-11-10 11:58:25 -08:00
Jens Axboe
c1c534609f direct-io: be sure to assign dio->bio_bdev for both paths
btrfs sets ->submit_io(), and we failed to set the block dev for
that path. That resulted in a potential NULL dereference when
we later wait for IO in dio_await_one().

Reported-by: kernel test robot <ying.huang@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-10 10:14:38 -07:00
Linus Torvalds
bd4f203e43 Merge branch 'akpm' (patches from Andrew)
Merge third patch-bomb from Andrew Morton:
 "We're pretty much done over here - I'm still waiting for a nouveau
  merge so I can cleanly finish up Christoph's dma-mapping rework.

   - bunch of small misc stuff

   - fold abs64() into abs(), remove abs64()

   - new_valid_dev() cleanups

   - binfmt_elf_fdpic feature work"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (24 commits)
  fs/binfmt_elf_fdpic.c: provide NOMMU loader for regular ELF binaries
  fs/stat.c: remove unnecessary new_valid_dev() check
  fs/reiserfs/namei.c: remove unnecessary new_valid_dev() check
  fs/nilfs2/namei.c: remove unnecessary new_valid_dev() check
  fs/ncpfs/dir.c: remove unnecessary new_valid_dev() check
  fs/jfs: remove unnecessary new_valid_dev() checks
  fs/hpfs/namei.c: remove unnecessary new_valid_dev() check
  fs/f2fs/namei.c: remove unnecessary new_valid_dev() check
  fs/ext2/namei.c: remove unnecessary new_valid_dev() check
  fs/exofs/namei.c: remove unnecessary new_valid_dev() check
  fs/btrfs/inode.c: remove unnecessary new_valid_dev() check
  fs/9p: remove unnecessary new_valid_dev() checks
  include/linux/kdev_t.h: old/new_valid_dev() can return bool
  include/linux/kdev_t.h: remove unused huge_valid_dev()
  kmap_atomic_to_page() has no users, remove it
  drivers/scsi/cxgbi: fix build with EXTRA_CFLAGS
  dma: remove external references to dma_supported
  Documentation/sysctl/vm.txt: fix misleading code reference of overcommit_memory
  remove abs64()
  kernel.h: make abs() work with 64-bit types
  ...
2015-11-09 21:05:13 -08:00
Linus Torvalds
e6604ecb70 NFS client updates for Linux 4.4
Highlights include:
 
 Features:
 - RDMA client backchannel from Chuck
 - Support for NFSv4.2 file CLONE using the btrfs ioctl
 
 Bugfixes + cleanups
 - Move socket data receive out of the bottom halves and into a workqueue
 - Refactor NFSv4 error handling so synchronous and asynchronous RPC handles
   errors identically.
 - Fix a panic when blocks or object layouts reads return a bad data length
 - Fix nfsroot so it can handle a 1024 byte long path.
 - Fix bad usage of page offset in bl_read_pagelist
 - Various NFSv4 callback cleanups+fixes
 - Fix GETATTR bitmap verification
 - Support hexadecimal number for sunrpc debug sysctl files
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWQPMXAAoJEGcL54qWCgDy6ZUQAL32vpgyMXe7R4jcxoQxm52+
 tn8FrY8aBZAqucvQsIGCrYfE01W/s8goDTQdZODn0MCcoor12BTPVYNIR42/J/no
 MNnRTDF0dJ4WG+inX9G87XGG6sFN3wDaQcCaexknkQZlFNF4KthxojzR2BgjmRVI
 p3WKkLSNTt6DYQQ8eDetvKoDT0AjR/KCYm89tiE8GMhKYcaZl6dTazJxwOcp2CX9
 YDW6+fQbsv8qp5v2ay03e88O/DSmcNRFoxy/KUGT9OwJgdN08IN8fTt6GG38yycT
 D9tb9uObBRcll4PnucouadBcykGr6jAP0z8HklE266LH1dwYLOHQoDFdgAs0QGtq
 nlySiKvToj6CYXonXoPOjZF3P/lxlkj5ViZ2enBxgxrPmyWl172cUSa6NTXOMO46
 kPpxw50xa1gP5kkBVwIZ6XZuzl/5YRhB3BRP3g6yuJCbAwVBJvawYU7riC+6DEB9
 zygVfm21vi9juUQXJ37zXVRBTtoFhFjuSxcAYxc63o181lWYShKQ3IiRYg+zTxnq
 7DOhXa0ZNGvMgJJi0tH9Es3/S6TrGhyKh5gKY/o2XUjY0hCSsCSdP6jw6Mb9Ax1s
 0LzByHAikxBKPt2OFeoUgwycI2xqow4iAfuFk071iP7n0nwC804cUHSkGxW67dBZ
 Ve5Skkg1CV+oWQYxGmGZ
 =py1V
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  New features:
   - RDMA client backchannel from Chuck
   - Support for NFSv4.2 file CLONE using the btrfs ioctl

  Bugfixes + cleanups:
   - Move socket data receive out of the bottom halves and into a
     workqueue
   - Refactor NFSv4 error handling so synchronous and asynchronous RPC
     handles errors identically.
   - Fix a panic when blocks or object layouts reads return a bad data
     length
   - Fix nfsroot so it can handle a 1024 byte long path.
   - Fix bad usage of page offset in bl_read_pagelist
   - Various NFSv4 callback cleanups+fixes
   - Fix GETATTR bitmap verification
   - Support hexadecimal number for sunrpc debug sysctl files"

* tag 'nfs-for-4.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (53 commits)
  Sunrpc: Supports hexadecimal number for sysctl files of sunrpc debug
  nfs: Fix GETATTR bitmap verification
  nfs: Remove unused xdr page offsets in getacl/setacl arguments
  fs/nfs: remove unnecessary new_valid_dev check
  SUNRPC: fix variable type
  NFS: Enable client side NFSv4.1 backchannel to use other transports
  pNFS/flexfiles: Add support for FF_FLAGS_NO_IO_THRU_MDS
  pNFS/flexfiles: When mirrored, retry failed reads by switching mirrors
  SUNRPC: Remove the TCP-only restriction in bc_svc_process()
  svcrdma: Add backward direction service for RPC/RDMA transport
  xprtrdma: Handle incoming backward direction RPC calls
  xprtrdma: Add support for sending backward direction RPC replies
  xprtrdma: Pre-allocate Work Requests for backchannel
  xprtrdma: Pre-allocate backward rpc_rqst and send/receive buffers
  SUNRPC: Abstract backchannel operations
  xprtrdma: Saving IRQs no longer needed for rb_lock
  xprtrdma: Remove reply tasklet
  xprtrdma: Use workqueue to process RPC/RDMA replies
  xprtrdma: Replace send and receive arrays
  xprtrdma: Refactor reply handler error handling
  ...
2015-11-09 18:11:22 -08:00
Linus Torvalds
9d74288ca7 GFS2: merge window
Here is a list of patches we've accumulated for GFS2 for the current upstream
 merge window. There are only six patches this time:
 
 1. A cleanup patch from Andreas to remove the gl_spin #define in favor
    of its value for the sake of clarity.
 2. A fix from Andy Price to mark the inode dirty during fallocate.
 3. A fix from Andy Price to set s_mode on mount failures to prevent
    a stack trace.
 4. A patch from me to prevent a kernel BUG() in trans_add_meta/trans_add_data
    due to uninitialized storage.
 5. A patch from me to protecting our freeing of the in-core directory
    hash table to prevent double-free.
 6. A fix for a page/block rounding problem that resulted in a metadata
    coherency problem when the block size != page size.
 
 I've got a lot more patches in various stages of review and testing,
 but I'm afraid they'll have to wait until the next merge window. So
 next time we're likely to have a lot more.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWQMSyAAoJENeLYdPf93o7k+EH/inFFqkYxLCyXZngihTHdZvS
 tYAwYxPJw6UgSrZ1dY6iwcmhy6YgT7a98RJdPA3Kj0SvJxQVBiJ5uc0VKpK0bj72
 l7pVPkMEWCHs8u8RAIGfnik8y6IxOP35+EN0U/3ZLMG1Gc+Tmq9M8KLlnhfX980q
 oaniaDJAUaSSW8RxD2AKabxjoJ0DKnE6MtDHsL/JWhp1j5co5BbbwOzBmBa2mLCI
 RQ8YEvjqjtgm91g33pkxXJVMjAkFqLjRSVfomd5MSQWRUb+eGIpd3LFThHzVfm55
 2f4j2kd2V4i7rTrh8Q3RdoYRoFgpXyxXQ3R2UYL59b7B2DvGTyAKyDxfU237ZXQ=
 =yanT
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Bob Peterson:
 "Here is a list of patches we've accumulated for GFS2 for the current
  upstream merge window.  There are only six patches this time:

   1. A cleanup patch from Andreas to remove the gl_spin #define in favor
      of its value for the sake of clarity.
   2. A fix from Andy Price to mark the inode dirty during fallocate.
   3. A fix from Andy Price to set s_mode on mount failures to prevent a
      stack trace.
   4  A patch from me to prevent a kernel BUG() in trans_add_meta/trans_add_data
      due to uninitialized storage.
   5. A patch from me to protecting our freeing of the in-core directory
      hash table to prevent double-free.
   6. A fix for a page/block rounding problem that resulted in a metadata
      coherency problem when the block size != page size"

  I've got a lot more patches in various stages of review and testing,
  but I'm afraid they'll have to wait until the next merge window.  So
  next time we're likely to have a lot more"

* tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  GFS2: Fix rgrp end rounding problem for bsize < page size
  GFS2: Protect freeing directory hash table with i_lock spin_lock
  gfs2: Remove gl_spin define
  gfs2: Add missing else in trans_add_meta/data
  GFS2: Set s_mode before parsing mount options
  GFS2: fallocate: do not rely on file_update_time to mark the inode dirty
2015-11-09 18:01:23 -08:00
Linus Torvalds
123a28d8b5 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2 fix from Jan Kara:
 "Fix for DAX on ext2"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2: Add locking for DAX faults
2015-11-09 17:38:34 -08:00
Linus Torvalds
e4da7e9a54 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu/coldfire fix from Greg Ungerer:
 "Only a single patch, fixes brk area setup problem in nommu
  environments"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  fs/binfmt_elf_fdpic.c: fix brk area overlap with stack on NOMMU
2015-11-09 16:22:26 -08:00
Rich Felker
1bde925d23 fs/binfmt_elf_fdpic.c: provide NOMMU loader for regular ELF binaries
The ELF binary loader in binfmt_elf.c requires an MMU, making it
impossible to use regular ELF binaries on NOMMU archs.  However, the FDPIC
ELF loader in binfmt_elf_fdpic.c is fully capable as a loader for plain
ELF, which requires constant displacements between LOAD segments, since it
already supports FDPIC ELF files flagged as needing constant displacement.

This patch adjusts the FDPIC ELF loader to accept non-FDPIC ELF files on
NOMMU archs.  They are treated identically to FDPIC ELF files with the
constant-displacement flag bit set, except for personality, which must
match the ABI of the program being loaded; the PER_LINUX_FDPIC personality
controls how the kernel interprets function pointers passed to sigaction.

Files that do not set a stack size requirement explicitly are given a
default stack size (matching the amount of committed stack the normal ELF
loader for MMU archs would give them) rather than being rejected; this is
necessary because plain ELF files generally do not declare stack
requirements in theit program headers.

Only ET_DYN (PIE) format ELF files are supported, since loading at a fixed
virtual address is not possible on NOMMU.

This patch was developed and tested on J2 (SH2-compatible) but should
be usable immediately on all archs where binfmt_elf_fdpic is
available. Moreover, by providing dummy definitions of the
elf_check_fdpic() and elf_check_const_displacement() macros for archs
which lack an FDPIC ABI, it should be possible to enable building of
binfmt_elf_fdpic on all other NOMMU archs and thereby give them ELF
binary support, but I have not yet tested this.

The motivation for using binfmt_elf_fdpic.c rather than adapting
binfmt_elf.c to NOMMU is that the former already has all the necessary
code to work properly on NOMMU and has already received widespread
real-world use and testing. I hope this is not controversial.

I'm not really happy with having to unset the FDPIC_FUNCPTRS
personality bit when loading non-FDPIC ELF. This bit should really
reset automatically on execve, since otherwise, executing non-ELF
binaries (e.g. bFLT) from an FDPIC process will leave the personality
in the wrong state and severely break signal handling. But that's a
separate, existing bug and I don't know the right place to fix it.

Signed-off-by: Rich Felker <dalias@libc.org>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Oleg Endo <oleg.endo@t-online.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Yaowei Bai
28f65708a5 fs/stat.c: remove unnecessary new_valid_dev() check
new_valid_dev() always returns 1, so the !new_valid_dev() check is not
needed.  Remove it.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Yaowei Bai
3cc5d9a905 fs/reiserfs/namei.c: remove unnecessary new_valid_dev() check
new_valid_dev() always returns 1, so the !new_valid_dev() check is not
needed.  Remove it.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Yaowei Bai
3348a172be fs/nilfs2/namei.c: remove unnecessary new_valid_dev() check
new_valid_dev() always returns 1, so the !new_valid_dev() check is not
needed.  Remove it.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Yaowei Bai
4467e29f0c fs/ncpfs/dir.c: remove unnecessary new_valid_dev() check
new_valid_dev() always returns 1, so the !new_valid_dev() check is not
needed.  Remove it.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Petr Vandrovec <petr@vandrovec.name>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Yaowei Bai
a2a1704409 fs/jfs: remove unnecessary new_valid_dev() checks
new_valid_dev() always returns 1, so the !new_valid_dev() checks are not
needed.  Remove them.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Yaowei Bai
fdca5e6a6d fs/hpfs/namei.c: remove unnecessary new_valid_dev() check
new_valid_dev() always returns 1, so the !new_valid_dev() check is not
needed.  Remove it.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Yaowei Bai
a8415e4b13 fs/f2fs/namei.c: remove unnecessary new_valid_dev() check
new_valid_dev() always returns 1, so the !new_valid_dev() check is not
needed.  Remove it.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Changman Lee <cm224.lee@samsung.com>
Cc: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Yaowei Bai
d7df00072e fs/ext2/namei.c: remove unnecessary new_valid_dev() check
new_valid_dev() always returns 1, so the !new_valid_dev() check is not
needed.  Remove it.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jan Kara <jack@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Yaowei Bai
5738939289 fs/exofs/namei.c: remove unnecessary new_valid_dev() check
new_valid_dev() always returns 1, so the !new_valid_dev() check is not
needed.  Remove it.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Acked-by: Boaz Harrosh <ooo@electrozaur.com>
Cc: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Yaowei Bai
7cac0a8599 fs/btrfs/inode.c: remove unnecessary new_valid_dev() check
new_valid_dev() always returns 1, so the !new_valid_dev() check is not
needed.  Remove it.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Yaowei Bai
349c7037b1 fs/9p: remove unnecessary new_valid_dev() checks
new_valid_dev() always returns 1, so the !new_valid_dev() check is not
needed.  Remove it.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Andrew Morton
79211c8ed1 remove abs64()
Switch everything to the new and more capable implementation of abs().
Mainly to give the new abs() a bit of a workout.

Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Randy Dunlap
dbce03b9e3 fs/writeback.c: fix kernel-doc warnings
Fix kernel-doc warnings in fs/fs-writeback.c by moving a #define macro to
after the function's opening brace.  Also #undef this macro at the end of
the function.

  ../fs/fs-writeback.c:1984: warning: Excess function parameter 'inode' description in 'I_DIRTY_INODE'
  ../fs/fs-writeback.c:1984: warning: Excess function parameter 'flags' description in 'I_DIRTY_INODE'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Randy Dunlap
30fdc8ee0e fs/inode.c: fix kernel-doc warning
Fix kernel-doc warning in fs/inode.c:

  ../fs/inode.c:1606: warning: No description found for parameter 'inode'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Ross Zwisler
ab27a8d04b coredump: add DAX filtering for FDPIC ELF coredumps
Add explicit filtering for DAX mappings to FDPIC ELF coredump.  This is
useful because DAX mappings have the potential to be very large.

This patch has only been compile tested.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-11-09 13:29:54 -05:00
Ross Zwisler
5037835c1f coredump: add DAX filtering for ELF coredumps
Add two new flags to the existing coredump mechanism for ELF files to
allow us to explicitly filter DAX mappings.  This is desirable because
DAX mappings, like hugetlb mappings, have the potential to be very
large.

Update the coredump_filter documentation in
Documentation/filesystems/proc.txt so that it addresses the new DAX
coredump flags.  Also update the documented default value of
coredump_filter to be consistent with the core(5) man page.  The
documentation being updated talks about bit 4, Dump ELF headers, which
is enabled if CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is turned on in the
kernel config.  This kernel config option defaults to "y" if both ELF
binaries and coredump are enabled.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-11-09 13:29:54 -05:00
Bob Peterson
31dddd9eb9 GFS2: Fix rgrp end rounding problem for bsize < page size
This patch fixes a bug introduced by commit 7005c3e. That patch
tries to map a vm range for resource groups, but the calculation
breaks down when the block size is less than the page size.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-11-09 09:38:02 -06:00
Linus Torvalds
ad804a0b2a Merge branch 'akpm' (patches from Andrew)
Merge second patch-bomb from Andrew Morton:

 - most of the rest of MM

 - procfs

 - lib/ updates

 - printk updates

 - bitops infrastructure tweaks

 - checkpatch updates

 - nilfs2 update

 - signals

 - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
   dma-debug, dma-mapping, ...

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
  ipc,msg: drop dst nil validation in copy_msg
  include/linux/zutil.h: fix usage example of zlib_adler32()
  panic: release stale console lock to always get the logbuf printed out
  dma-debug: check nents in dma_sync_sg*
  dma-mapping: tidy up dma_parms default handling
  pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
  kexec: use file name as the output message prefix
  fs, seqfile: always allow oom killer
  seq_file: reuse string_escape_str()
  fs/seq_file: use seq_* helpers in seq_hex_dump()
  coredump: change zap_threads() and zap_process() to use for_each_thread()
  coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
  signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
  signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
  signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
  signals: kill block_all_signals() and unblock_all_signals()
  nilfs2: fix gcc uninitialized-variable warnings in powerpc build
  nilfs2: fix gcc unused-but-set-variable warnings
  MAINTAINERS: nilfs2: add header file for tracing
  nilfs2: add tracepoints for analyzing reading and writing metadata files
  ...
2015-11-07 14:32:45 -08:00
Linus Torvalds
75021d2859 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial updates from Jiri Kosina:
 "Trivial stuff from trivial tree that can be trivially summed up as:

   - treewide drop of spurious unlikely() before IS_ERR() from Viresh
     Kumar

   - cosmetic fixes (that don't really affect basic functionality of the
     driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek

   - various comment / printk fixes and updates all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  bcache: Really show state of work pending bit
  hwmon: applesmc: fix comment typos
  Kconfig: remove comment about scsi_wait_scan module
  class_find_device: fix reference to argument "match"
  debugfs: document that debugfs_remove*() accepts NULL and error values
  net: Drop unlikely before IS_ERR(_OR_NULL)
  mm: Drop unlikely before IS_ERR(_OR_NULL)
  fs: Drop unlikely before IS_ERR(_OR_NULL)
  drivers: net: Drop unlikely before IS_ERR(_OR_NULL)
  drivers: misc: Drop unlikely before IS_ERR(_OR_NULL)
  UBI: Update comments to reflect UBI_METAONLY flag
  pktcdvd: drop null test before destroy functions
2015-11-07 13:05:44 -08:00
Jens Axboe
15c4f638f3 directio: add block polling support
This adds support for sync O_DIRECT read/write poll support.

Signed-off-by: Jens Axboe <axboe@fb.com>
[hch: split from a larger patch, minor updates]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
2015-11-07 10:40:47 -07:00
Dongsheng Yang
8c1c5f2638 ubifs: introduce UBIFS_ATIME_SUPPORT to ubifs
To make ubifs support atime flexily, this commit introduces
a Kconfig option named as UBIFS_ATIME_SUPPORT.

With UBIFS_ATIME_SUPPORT=n:
	ubifs keeps the full compatibility to no_atime from
the start of ubifs.

=================UBIFS_ATIME_SUPPORT=n=======================
-o - no atime
-o atime - no atime
-o noatime - no atime
-o relatime - no atime
-o strictatime - no atime
-o lazyatime - no atime

With UBIFS_ATIME_SUPPORT=y:
	ubifs supports the atime same with other main stream
file systems.
=================UBIFS_ATIME_SUPPORT=y=======================
-o - default behavior (relatime currently)
-o atime - atime support
-o noatime - no atime support
-o relatime - relative atime support
-o strictatime - strict atime support
-o lazyatime - lazy atime support

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
2015-11-07 11:35:08 +01:00
Dongsheng Yang
ab92a20bce ubifs: make ubifs_[get|set]xattr atomic
This commit make the ubifs_[get|set]xattr protected by ui_mutex.

Originally, there is a possibility that ubifs_getxattr to get
a wrong value.

  P1                                  P2
----------                  	----------
ubifs_getxattr                      ubifs_setxattr
					- kfree()
- memcpy()
					- kmemdup()

Then ubifs_getxattr() would get a non-sense data. To solve this
problem, this commit make the xattr of ubifs_inode updated in
atomic.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2015-11-07 11:33:17 +01:00
Greg Thelen
0f930902eb fs, seqfile: always allow oom killer
Since 5cec38ac86 ("fs, seq_file: fallback to vmalloc instead of oom kill
processes") seq_buf_alloc() avoids calling the oom killer for PAGE_SIZE or
smaller allocations; but larger allocations can use the oom killer via
vmalloc().  Thus reads of small files can return ENOMEM, but larger files
use the oom killer to avoid ENOMEM.

The effect of this bug is that reads from /proc and other virtual
filesystems can return ENOMEM instead of the preferred behavior - oom
killing something (possibly the calling process).  I don't know of anyone
except Google who has noticed the issue.

I suspect the fix is more needed in smaller systems where there isn't any
reclaimable memory.  But these seem like the kinds of systems which
probably don't use the oom killer for production situations.

Memory overcommit requires use of the oom killer to select a victim
regardless of file size.

Enable oom killer for small seq_buf_alloc() allocations.

Fixes: 5cec38ac86 ("fs, seq_file: fallback to vmalloc instead of oom kill processes")
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Andy Shevchenko
25c6bb76ea seq_file: reuse string_escape_str()
strint_escape_str() escapes input string by given criteria.  In case of
seq_escape() the criteria is to convert some characters to their octal
representation.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Andy Shevchenko
8b91a318e4 fs/seq_file: use seq_* helpers in seq_hex_dump()
This improves code readability.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Oleg Nesterov
d61ba58953 coredump: change zap_threads() and zap_process() to use for_each_thread()
Change zap_threads() paths to use for_each_thread() rather than
while_each_thread().

While at it, change zap_threads() to avoid the nested if's to make the
code more readable and lessen the indentation.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Kyle Walker <kwalker@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Stanislav Kozina <skozina@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Oleg Nesterov
5fa534c987 coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
task_will_free_mem() is wrong in many ways, and in particular the
SIGNAL_GROUP_COREDUMP check is not reliable: a task can participate in the
coredumping without SIGNAL_GROUP_COREDUMP bit set.

change zap_threads() paths to always set SIGNAL_GROUP_COREDUMP even if
other CLONE_VM processes can't react to SIGKILL.  Fortunately, at least
oom-kill case if fine; it kills all tasks sharing the same mm, so it
should also kill the process which actually dumps the core.

The change in prepare_signal() is not strictly necessary, it just ensures
that the patch does not bring another subtle behavioural change.  But it
reminds us that this SIGNAL_GROUP_EXIT/COREDUMP case needs more changes.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Kyle Walker <kwalker@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Stanislav Kozina <skozina@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Oleg Nesterov
9317bb9696 signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
jffs2_garbage_collect_thread() does allow_signal(SIGCONT) for no reason,
SIGCONT will wake a stopped task up even if it is ignored.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Oleg Nesterov
9a13049e83 signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
jffs2_garbage_collect_thread() can race with SIGCONT and sleep in
TASK_STOPPED state after it was already sent. Add the new helper,
kernel_signal_stop(), which does this correctly.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Oleg Nesterov
be0e6f290f signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
1. Rename dequeue_signal_lock() to kernel_dequeue_signal(). This
   matches another "for kthreads only" kernel_sigaction() helper.

2. Remove the "tsk" and "mask" arguments, they are always current
   and current->blocked. And it is simply wrong if tsk != current.

3. We could also remove the 3rd "siginfo_t *info" arg but it looks
   potentially useful. However we can simplify the callers if we
   change kernel_dequeue_signal() to accept info => NULL.

4. Remove _irqsave, it is never called from atomic context.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Ryusuke Konishi
4f05028f8d nilfs2: fix gcc uninitialized-variable warnings in powerpc build
Some false positive warnings are reported for powerpc build.

The following warnings are reported in
 http://kisskb.ellerman.id.au/kisskb/buildresult/12519703/

   CC      fs/nilfs2/super.o
 fs/nilfs2/super.c: In function 'nilfs_resize_fs':
 fs/nilfs2/super.c:376:2: warning: 'blocknr' may be used uninitialized in this function [-Wuninitialized]
 fs/nilfs2/super.c:362:11: note: 'blocknr' was declared here
   CC      fs/nilfs2/recovery.o
 fs/nilfs2/recovery.c: In function 'nilfs_salvage_orphan_logs':
 fs/nilfs2/recovery.c:631:21: warning: 'sum' may be used uninitialized in this function [-Wuninitialized]
 fs/nilfs2/recovery.c:585:32: note: 'sum' was declared here
 fs/nilfs2/recovery.c: In function 'nilfs_search_super_root':
 fs/nilfs2/recovery.c:873:11: warning: 'sum' may be used uninitialized in this function [-Wuninitialized]

Another similar warning is reported in
 http://kisskb.ellerman.id.au/kisskb/buildresult/12520079/

   CC      fs/nilfs2/btree.o
 fs/nilfs2/btree.c: In function 'nilfs_btree_convert_and_insert':
 include/asm-generic/bitops/non-atomic.h:105:20: warning: 'bh' may be used uninitialized in this function [-Wuninitialized]
 fs/nilfs2/btree.c:1859:22: note: 'bh' was declared here

This cleans out these warnings by forcing the variables to be initialized.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Ryusuke Konishi
09ef29e0f6 nilfs2: fix gcc unused-but-set-variable warnings
Fix the following build warnings:

 $ make W=1
 [...]
   CC [M]  fs/nilfs2/btree.o
 fs/nilfs2/btree.c: In function 'nilfs_btree_split':
 fs/nilfs2/btree.c:923:8: warning: variable 'newptr' set but not used [-Wunused-but-set-variable]
   __u64 newptr;
         ^
 fs/nilfs2/btree.c:922:8: warning: variable 'newkey' set but not used [-Wunused-but-set-variable]
   __u64 newkey;
         ^
   CC [M]  fs/nilfs2/dat.o
 fs/nilfs2/dat.c: In function 'nilfs_dat_prepare_end':
 fs/nilfs2/dat.c:158:8: warning: variable 'start' set but not used [-Wunused-but-set-variable]
   __u64 start;
         ^
   CC [M]  fs/nilfs2/segment.o
 fs/nilfs2/segment.c: In function 'nilfs_segctor_do_immediate_flush':
 fs/nilfs2/segment.c:2433:6: warning: variable 'err' set but not used [-Wunused-but-set-variable]
   int err;
       ^
   CC [M]  fs/nilfs2/sufile.o
 fs/nilfs2/sufile.c: In function 'nilfs_sufile_alloc':
 fs/nilfs2/sufile.c:320:27: warning: variable 'ncleansegs' set but not used [-Wunused-but-set-variable]
   unsigned long nsegments, ncleansegs, nsus, cnt;
                            ^
   CC [M]  fs/nilfs2/alloc.o
 fs/nilfs2/alloc.c: In function 'nilfs_palloc_prepare_alloc_entry':
 fs/nilfs2/alloc.c:478:38: warning: variable 'groups_per_desc_block' set but not used [-Wunused-but-set-variable]
   unsigned long n, entries_per_group, groups_per_desc_block;
                                       ^

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Hitoshi Mitake
a9cd207c23 nilfs2: add tracepoints for analyzing reading and writing metadata files
This patch adds tracepoints for analyzing requests of reading and writing
metadata files.  The tracepoints cover every in-place mdt files (cpfile,
sufile, and datfile).

Example of tracing mdt_insert_new_block():
              cp-14635 [000] ...1 30598.199309: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 155
              cp-14635 [000] ...1 30598.199520: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 5
              cp-14635 [000] ...1 30598.200828: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 253

Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: TK Kato <TK.Kato@wdc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Hitoshi Mitake
83eec5e6dd nilfs2: add tracepoints for analyzing sufile manipulation
This patch adds tracepoints which would be useful for analyzing segment
usage from a perspective of high level sufile manipulation (check, alloc,
free).  sufile is an important in-place updated metadata file, so
analyzing the behavior would be useful for performance turning.

example of usage (a case of allocation):

$ sudo bin/tpoint nilfs2:nilfs2_segment_usage_allocated
Tracing nilfs2:nilfs2_segment_usage_allocated. Ctrl-C to end.
        segctord-17800 [002] ...1 10671.867294: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 2
        segctord-17800 [002] ...1 10675.073477: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 3

Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benixon Dhas <benixon.dhas@wdc.com>
Cc: TK Kato <TK.Kato@wdc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Hitoshi Mitake
44fda11460 nilfs2: add a tracepoint for transaction events
This patch adds a tracepoint for transaction events of nilfs.  With the
tracepoint, these events can be tracked: begin, abort, commit, trylock,
lock, and unlock.  Basically, these events have corresponding functions
e.g.  begin event corresponds nilfs_transaction_begin().  The unlock event
is an exception.  It corresponds to the iteration in
nilfs_transaction_lock().

Only one tracepoint is introcued: nilfs2_transaction_transition.  The
above events are distinguished with newly introduced enum.  With this
tracepoint, we can analyse a critical section of segment constructoin.

Sample output by tpoint of perf-tools:
              cp-4457  [000] ...1    63.266220: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 1 flags = 9 state = BEGIN
              cp-4457  [000] ...1    63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
              cp-4457  [000] ...1    63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
        segctord-4371  [001] ...1    68.261196: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK
        segctord-4371  [001] ...1    68.261280: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = LOCK
        segctord-4371  [001] ...1    68.261877: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 1 flags = 10 state = BEGIN
        segctord-4371  [001] ...1    68.262116: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = COMMIT
        segctord-4371  [001] ...1    68.265032: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = UNLOCK
        segctord-4371  [001] ...1   132.376847: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK

This patch also does trivial cleaning of comma usage in collection stage
transition event for consistent coding style.

Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Hitoshi Mitake
5849770383 nilfs2: add a tracepoint for tracking stage transition of segment construction
This patch adds a tracepoint for tracking stage transition of block
collection in segment construction.  With the tracepoint, we can analysis
the behavior of segment construction in depth.  It would be useful for
bottleneck detection and debugging, etc.

The tracepoint is created with the standard trace API of linux (like ext3,
ext4, f2fs and btrfs).  So we can analysis with existing tools easily.  Of
course, more detailed analysis will be possible if we can create nilfs
specific analysis tools.

Below is an example of event dump with Brendan Gregg's perf-tools
(https://github.com/brendangregg/perf-tools).  Time consumption between
each stage can be obtained.

$ sudo bin/tpoint nilfs2:nilfs2_collection_stage_transition
Tracing nilfs2:nilfs2_collection_stage_transition. Ctrl-C to end.
        segctord-14875 [003] ...1 28311.067794: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_INIT
        segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_GC
        segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_FILE
        segctord-14875 [003] ...1 28311.068486: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_IFILE
        segctord-14875 [003] ...1 28311.068540: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_CPFILE
        segctord-14875 [003] ...1 28311.068561: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SUFILE
        segctord-14875 [003] ...1 28311.068565: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DAT
        segctord-14875 [003] ...1 28311.068573: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SR
        segctord-14875 [003] ...1 28311.068574: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DONE

For capturing transition correctly, this patch adds wrappers for the
member scnt of nilfs_cstage.  With this change, every transition of the
stage can produce trace event in a correct manner.

Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Ryusuke Konishi
d0c14a9ee7 nilfs2: free unused dat file blocks during garbage collection
As a nilfs2 volume ages, the amount of available disk space decreases
little by little due to bloat of DAT (disk address translation) metadata
file.  Even if we delete all files in a file system and free their block
addresses from the DAT file through a garbage collection, empty DAT blocks
are not freed.

This fixes the issue by extending the deallocator of block addresses so
that empty data blocks and empty bitmap blocks of DAT are deleted.

The following comparison shows the effect of this patch.  Each shows disk
amount information of a nilfs2 volume that we cleaned out by deleting all
files and running gc after having filled 90% of its capacity.

Before:
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sda1      500105212  3022844 472072192   1% /test

After:
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sda1      500105212    16380 475078656   1% /test

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00