51787 Commits

Author SHA1 Message Date
Linus Torvalds
808eb24e0e New in this version:
- Refactor the incore extent map manipulations to use a cursor instead of
   directly modifying extent data.
 - Refactor the incore extent map cursor to use an in-memory btree instead
   of a single high-order allocation.  This eliminates a major source of
   complaints about insufficient memory when opening a heavily fragmented
   file into a system whose memory is also heavily fragmented.
 - Fix a longstanding bug where deleting a file with a complex extended
   attribute btree incorrectly handled memory pointers, which could lead
   to memory corruption.
 - Improve metadata validation to eliminate crashing problems found while
   fuzzing xfs.
 - Move the error injection tag definitions into libxfs to be shared with
   userspace components.
 - Fix some log recovery bugs where we'd underflow log block position
   vector and incorrectly fail log recovery.
 - Drain the buffer lru after log recovery to force recovered buffers back
   through the verifiers after mount.  On a v4 filesystem the log never
   attaches verifiers during log replay (v5 does), so we could end up with
   buffers marked verified but without having ever been verified.
 - Fix various other bugs.
 - Introduce the first part of a new online fsck tool.  The new fsck tool
   will be able to iterate every piece of metadata in the filesystem to
   look for obvious errors and corruptions.  In the next release cycle
   the checking will be extended to cross-reference with the other fs
   metadata, so this feature should only be used by the developers in the
   mean time.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCgAGBQJaBdbZAAoJEPh/dxk0SrTrKoUP/RroXfZX3PSn3Z0Qo99E6Ev9
 +Z3CoJSSfXPJtSPBh6mUgonzzpKMqoN3kj8ZezYRLaeSEo+36ZkBtdLOb/8PydOZ
 4agNvtGDhwt88+1vSAccbT6l4wB/Z16NfzGaVN4dioHF1LpC4rORqdEuoq5xXxzo
 JVjuwTbz8uPSCpTTukzll9XFghvvj+YXm20MgEOCJiR5uULlGW5gZ38mNCmS76Bk
 Nks5dNSmNzlGwIpwsVmthd0s0jwj8WeQPnUOv27naRm4J6GOvB5gE8vn15e07AHT
 EqeTTHy25lnJhmpazphvDwbN3B6UdWCHGoG8ll2B+45pZegS7SKt4G6b4ittHq9x
 +ErCHFElrNCO77QDQmQoXHy6+DJV/Rdnyb5K575rA91TAb0q2C7OP6vQt6oV0rDM
 obZ7M3MvW9jBVn9A07Hdsk4+J2/SYW0jf5Dv4O69U1KuvZYUES2B++PL+u7pdTpy
 JPg1+pWO+AgxRKQNviFFzRwQDPE3JSp854TCE/5D/59h2ZeSWg+g4ZH5jcLjKwKM
 +uHbJgqOdgk2/WPHiEFCOouom3RUxdE1Yg7S87sbaQC4iU5oWWQ8Kenl2AUyNQEN
 yaU/leq6rqX3Z2z+T70ujWSvh5xl07YHLW3LJszZMi4w+i8C7c0lIX9F8CNu26Cf
 yJApOvMWhhY3Mf7Gn1l5
 =vQrJ
 -----END PGP SIGNATURE-----

Merge tag 'xfs-4.15-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Darrick Wong:
 "xfs: great scads of new stuff for 4.15.

  This merge cycle, we're making some substantive changes to XFS. The
  in-core extent mappings have been refactored to use proper iterators
  and a btree to handle heavily fragmented files without needing
  high-order memory allocations; some important log recovery bug fixes;
  and the first part of the online fsck functionality.

  (The online fsck feature is disabled by default and more pieces of it
  will be coming in future release cycles.)

  This giant pile of patches has been run through a full xfstests run
  over the weekend and through a quick xfstests run against this
  morning's master, with no major failures reported.

  New in this version:

   - Refactor the incore extent map manipulations to use a cursor
     instead of directly modifying extent data.

   - Refactor the incore extent map cursor to use an in-memory btree
     instead of a single high-order allocation. This eliminates a major
     source of complaints about insufficient memory when opening a
     heavily fragmented file into a system whose memory is also heavily
     fragmented.

   - Fix a longstanding bug where deleting a file with a complex
     extended attribute btree incorrectly handled memory pointers, which
     could lead to memory corruption.

   - Improve metadata validation to eliminate crashing problems found
     while fuzzing xfs.

   - Move the error injection tag definitions into libxfs to be shared
     with userspace components.

   - Fix some log recovery bugs where we'd underflow log block position
     vector and incorrectly fail log recovery.

   - Drain the buffer lru after log recovery to force recovered buffers
     back through the verifiers after mount. On a v4 filesystem the log
     never attaches verifiers during log replay (v5 does), so we could
     end up with buffers marked verified but without having ever been
     verified.

   - Fix various other bugs.

   - Introduce the first part of a new online fsck tool. The new fsck
     tool will be able to iterate every piece of metadata in the
     filesystem to look for obvious errors and corruptions. In the next
     release cycle the checking will be extended to cross-reference with
     the other fs metadata, so this feature should only be used by the
     developers in the mean time"

* tag 'xfs-4.15-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (131 commits)
  xfs: on failed mount, force-reclaim inodes after unmounting quota controls
  xfs: check the uniqueness of the AGFL entries
  xfs: remove u_int* type usage
  xfs: handle zero entries case in xfs_iext_rebalance_leaf
  xfs: add comments documenting the rebalance algorithm
  xfs: trivial indentation fixup for xfs_iext_remove_node
  xfs: remove a superflous assignment in xfs_iext_remove_node
  xfs: add some comments to xfs_iext_insert/xfs_iext_insert_node
  xfs: fix number of records handling in xfs_iext_split_leaf
  fs/xfs: Remove NULL check before kmem_cache_destroy
  xfs: only check da node header padding on v5 filesystems
  xfs: fix btree scrub deref check
  xfs: fix uninitialized return values in scrub code
  xfs: pass inode number to xfs_scrub_ino_set_{preen,warning}
  xfs: refactor the directory data block bestfree checks
  xfs: mark xlog_verify_dest_ptr STATIC
  xfs: mark xlog_recover_check_summary STATIC
  xfs: mark xfs_btree_check_lblock and xfs_btree_check_ptr static
  xfs: remove unreachable error injection code in xfs_qm_dqget
  xfs: remove unused debug counts for xfs_lock_inodes
  ...
2017-11-14 13:15:12 -08:00
Linus Torvalds
ae9a8c4bdc Add support for online resizing of file systems with bigalloc. Fix a
two data corruption bugs involving DAX, as well as a corruption bug
 after a crash during a racing fallocate and delayed allocation.
 Finally, a number of cleanups and optimizations.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAloJCiEACgkQ8vlZVpUN
 gaOahAgAhcgdPagn/B5w+6vKFdH+hOJLKyGI0adGDyWD9YBXN0wFQvliVgXrTKei
 hxW2GdQGc6yHv9mOjvD+4Fn2AnTZk8F3GtG6zdqRM08JGF/IN2Jax2boczG/XnUz
 rT9cd3ic2Ff0KaUX+Yos55QwomTh5CAeRPgvB69o9D6L4VJzTlsWKSOBR19FmrSG
 NDmzZibgWmHcqzW9Bq8ZrXXx+KB42kUlc8tYYm2n6MTaE0LMvp3d9XcFcnm/I7Bk
 MGa2d3/3FArGD6Rkl/E82MXMSElOHJnY6jGYSDaadUeMI5FXkA6tECOSJYXqShdb
 ZJwkOBwfv2lbYZJxIBJTy/iA6zdsoQ==
 =ZzaJ
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:

 - Add support for online resizing of file systems with bigalloc

 - Fix a two data corruption bugs involving DAX, as well as a corruption
   bug after a crash during a racing fallocate and delayed allocation.

 - Finally, a number of cleanups and optimizations.

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: improve smp scalability for inode generation
  ext4: add support for online resizing with bigalloc
  ext4: mention noload when recovering on read-only device
  Documentation: fix little inconsistencies
  ext4: convert timers to use timer_setup()
  jbd2: convert timers to use timer_setup()
  ext4: remove duplicate extended attributes defs
  ext4: add ext4_should_use_dax()
  ext4: add sanity check for encryption + DAX
  ext4: prevent data corruption with journaling + DAX
  ext4: prevent data corruption with inline data + DAX
  ext4: fix interaction between i_size, fallocate, and delalloc after a crash
  ext4: retry allocations conservatively
  ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA
  ext4: Add iomap support for inline data
  iomap: Add IOMAP_F_DATA_INLINE flag
  iomap: Switch from blkno to disk offset
2017-11-14 12:59:42 -08:00
Linus Torvalds
32190f0afb fscrypt: lots of cleanups, mostly courtesy by Eric Biggers
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAloI8AUACgkQ8vlZVpUN
 gaMdjgf8CCW7UhPjoZYwF8sUNtAaX9+JZT1maOcXUhpJ3vRQiRn+AzRH6yBYMm79
 +NZBwVlk4dlEe55Wh4yFIStMAstqzCrke4C9CSbExjgHNsJdU4znyYuLRMbLfyO0
 6c4NObiAIKJdW1/te1aN90keGC6min8pBZot+FqZsRr+Kq2+IOtM43JAv7efOLev
 v3LCjUf9JKxatoB8tgw4AJRa1p18p7D2APWTG05VlFq63TjhVIYNvvwcQlizLwGY
 cuEq3X59FbFdX06fJnucujU3WP3ES4/3rhufBK4NNaec5e5dbnH2KlAx7J5SyMIZ
 0qUFB/dmXDSb3gsfScSGo1F71Ad0CA==
 =asAm
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt

Pull fscrypt updates from Ted Ts'o:
 "Lots of cleanups, mostly courtesy by Eric Biggers"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt:
  fscrypt: lock mutex before checking for bounce page pool
  fscrypt: add a documentation file for filesystem-level encryption
  ext4: switch to fscrypt_prepare_setattr()
  ext4: switch to fscrypt_prepare_lookup()
  ext4: switch to fscrypt_prepare_rename()
  ext4: switch to fscrypt_prepare_link()
  ext4: switch to fscrypt_file_open()
  fscrypt: new helper function - fscrypt_prepare_setattr()
  fscrypt: new helper function - fscrypt_prepare_lookup()
  fscrypt: new helper function - fscrypt_prepare_rename()
  fscrypt: new helper function - fscrypt_prepare_link()
  fscrypt: new helper function - fscrypt_file_open()
  fscrypt: new helper function - fscrypt_require_key()
  fscrypt: remove unneeded empty fscrypt_operations structs
  fscrypt: remove ->is_encrypted()
  fscrypt: switch from ->is_encrypted() to IS_ENCRYPTED()
  fs, fscrypt: add an S_ENCRYPTED inode flag
  fscrypt: clean up include file mess
2017-11-14 11:35:15 -08:00
Linus Torvalds
37dc79565c Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "Here is the crypto update for 4.15:

  API:

   - Disambiguate EBUSY when queueing crypto request by adding ENOSPC.
     This change touches code outside the crypto API.
   - Reset settings when empty string is written to rng_current.

  Algorithms:

   - Add OSCCA SM3 secure hash.

  Drivers:

   - Remove old mv_cesa driver (replaced by marvell/cesa).
   - Enable rfc3686/ecb/cfb/ofb AES in crypto4xx.
   - Add ccm/gcm AES in crypto4xx.
   - Add support for BCM7278 in iproc-rng200.
   - Add hash support on Exynos in s5p-sss.
   - Fix fallback-induced error in vmx.
   - Fix output IV in atmel-aes.
   - Fix empty GCM hash in mediatek.

  Others:

   - Fix DoS potential in lib/mpi.
   - Fix potential out-of-order issues with padata"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (162 commits)
  lib/mpi: call cond_resched() from mpi_powm() loop
  crypto: stm32/hash - Fix return issue on update
  crypto: dh - Remove pointless checks for NULL 'p' and 'g'
  crypto: qat - Clean up error handling in qat_dh_set_secret()
  crypto: dh - Don't permit 'key' or 'g' size longer than 'p'
  crypto: dh - Don't permit 'p' to be 0
  crypto: dh - Fix double free of ctx->p
  hwrng: iproc-rng200 - Add support for BCM7278
  dt-bindings: rng: Document BCM7278 RNG200 compatible
  crypto: chcr - Replace _manual_ swap with swap macro
  crypto: marvell - Add a NULL entry at the end of mv_cesa_plat_id_table[]
  hwrng: virtio - Virtio RNG devices need to be re-registered after suspend/resume
  crypto: atmel - remove empty functions
  crypto: ecdh - remove empty exit()
  MAINTAINERS: update maintainer for qat
  crypto: caam - remove unused param of ctx_map_to_sec4_sg()
  crypto: caam - remove unneeded edesc zeroization
  crypto: atmel-aes - Reset the controller before each use
  crypto: atmel-aes - properly set IV after {en,de}crypt
  hwrng: core - Reset user selected rng by writing "" to rng_current
  ...
2017-11-14 10:52:09 -08:00
Jan Kara
838bee9e75 Merge udf, isofs, quota, ext2 changes for 4.15-rc1. 2017-11-14 11:09:53 +01:00
Linus Torvalds
fb0255fb29 TTY/Serial patches for 4.15-rc1
Here is the big tty/serial driver pull request for 4.15-rc1.
 
 Lots of serial driver updates in here, some small vt cleanups, and a
 raft of SPDX and license boilerplate cleanups, messing up the diffstat a
 bit.
 
 Nothing major, with no realy functional changes except better hardware
 support for some platforms.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWgnD+w8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynAmgCfSSr/9qiCE0vfP5eVYjddzxfWyZ4AoMbKORZC
 5x2KVW0Btrbs3WmnD7ZU
 =PSea
 -----END PGP SIGNATURE-----

Merge tag 'tty-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial updates from Greg KH:
 "Here is the big tty/serial driver pull request for 4.15-rc1.

  Lots of serial driver updates in here, some small vt cleanups, and a
  raft of SPDX and license boilerplate cleanups, messing up the diffstat
  a bit.

  Nothing major, with no realy functional changes except better hardware
  support for some platforms.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (110 commits)
  tty: ehv_bytechan: fix spelling mistake
  tty: serial: meson: allow baud-rates lower than 9600
  serial: 8250_fintek: Fix crash with baud rate B0
  serial: 8250_fintek: Disable delays for ports != 0
  serial: 8250_fintek: Return -EINVAL on invalid configuration
  tty: Remove redundant license text
  tty: serdev: Remove redundant license text
  tty: hvc: Remove redundant license text
  tty: serial: Remove redundant license text
  tty: add SPDX identifiers to all remaining files in drivers/tty/
  tty: serial: jsm: remove redundant pointer ts
  tty: serial: jsm: add space before the open parenthesis '('
  tty: serial: jsm: fix coding style
  tty: serial: jsm: delete space between function name and '('
  tty: serial: jsm: add blank line after declarations
  tty: serial: jsm: change the type of local variable
  tty: serial: imx: remove dead code imx_dma_rxint
  tty: serial: imx: disable ageing timer interrupt if dma in use
  serial: 8250: fix potential deadlock in rs485-mode
  serial: m32r_sio: Drop redundant .data assignment
  ...
2017-11-13 21:05:31 -08:00
Chao Yu
812c60564c f2fs: inject fault in inc_valid_node_count
This patch adds missing fault injection in inc_valid_node_count.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-13 20:21:35 -08:00
Chao Yu
28cfafb738 f2fs: fix to clear FI_NO_PREALLOC
We need to clear FI_NO_PREALLOC flag in error path of f2fs_file_write_iter,
otherwise we will lose the chance to preallocate blocks in latter write()
at one time.

Fixes: dc91de78e5e1 ("f2fs: do not preallocate blocks which has wrong buffer")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-13 20:21:22 -08:00
Jaegeuk Kim
2c8a4a2823 f2fs: expose quota information in debugfs
This patch shows # of dirty pages and # of hidden quota files.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-13 18:29:01 -08:00
Yunlei He
12f9ef379a f2fs: separate nat entry mem alloc from nat_tree_lock
This patch splits memory allocation part in nat_entry to avoid lock contention.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-13 18:28:48 -08:00
Linus Torvalds
2bcc673101 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "Yet another big pile of changes:

   - More year 2038 work from Arnd slowly reaching the point where we
     need to think about the syscalls themself.

   - A new timer function which allows to conditionally (re)arm a timer
     only when it's either not running or the new expiry time is sooner
     than the armed expiry time. This allows to use a single timer for
     multiple timeout requirements w/o caring about the first expiry
     time at the call site.

   - A new NMI safe accessor to clock real time for the printk timestamp
     work. Can be used by tracing, perf as well if required.

   - A large number of timer setup conversions from Kees which got
     collected here because either maintainers requested so or they
     simply got ignored. As Kees pointed out already there are a few
     trivial merge conflicts and some redundant commits which was
     unavoidable due to the size of this conversion effort.

   - Avoid a redundant iteration in the timer wheel softirq processing.

   - Provide a mechanism to treat RTC implementations depending on their
     hardware properties, i.e. don't inflict the write at the 0.5
     seconds boundary which originates from the PC CMOS RTC to all RTCs.
     No functional change as drivers need to be updated separately.

   - The usual small updates to core code clocksource drivers. Nothing
     really exciting"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (111 commits)
  timers: Add a function to start/reduce a timer
  pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday()
  timer: Prepare to change all DEFINE_TIMER() callbacks
  netfilter: ipvs: Convert timers to use timer_setup()
  scsi: qla2xxx: Convert timers to use timer_setup()
  block/aoe: discover_timer: Convert timers to use timer_setup()
  ide: Convert timers to use timer_setup()
  drbd: Convert timers to use timer_setup()
  mailbox: Convert timers to use timer_setup()
  crypto: Convert timers to use timer_setup()
  drivers/pcmcia: omap1: Fix error in automated timer conversion
  ARM: footbridge: Fix typo in timer conversion
  drivers/sgi-xp: Convert timers to use timer_setup()
  drivers/pcmcia: Convert timers to use timer_setup()
  drivers/memstick: Convert timers to use timer_setup()
  drivers/macintosh: Convert timers to use timer_setup()
  hwrng/xgene-rng: Convert timers to use timer_setup()
  auxdisplay: Convert timers to use timer_setup()
  sparc/led: Convert timers to use timer_setup()
  mips: ip22/32: Convert timers to use timer_setup()
  ...
2017-11-13 17:56:58 -08:00
Dan Williams
aaa422c4c3 fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core
While reviewing whether MAP_SYNC should strengthen its current guarantee
of syncing writes from the initiating process to also include
third-party readers observing dirty metadata, Dave pointed out that the
check of IOMAP_WRITE is misplaced.

The policy of what to with IOMAP_F_DIRTY should be separated from the
generic filesystem mechanism of reporting dirty metadata. Move this
policy to the fs-dax core to simplify the per-filesystem iomap handlers,
and further centralize code that implements the MAP_SYNC policy. This
otherwise should not change behavior, it just makes it easier to change
behavior in the future.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-11-13 16:38:44 -08:00
Linus Torvalds
3e2014637c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main updates in this cycle were:

   - Group balancing enhancements and cleanups (Brendan Jackman)

   - Move CPU isolation related functionality into its separate
     kernel/sched/isolation.c file, with related 'housekeeping_*()'
     namespace and nomenclature et al. (Frederic Weisbecker)

   - Improve the interactive/cpu-intense fairness calculation (Josef
     Bacik)

   - Improve the PELT code and related cleanups (Peter Zijlstra)

   - Improve the logic of pick_next_task_fair() (Uladzislau Rezki)

   - Improve the RT IPI based balancing logic (Steven Rostedt)

   - Various micro-optimizations:

   - better !CONFIG_SCHED_DEBUG optimizations (Patrick Bellasi)

   - better idle loop (Cheng Jian)

   - ... plus misc fixes, cleanups and updates"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  sched/core: Optimize sched_feat() for !CONFIG_SCHED_DEBUG builds
  sched/sysctl: Fix attributes of some extern declarations
  sched/isolation: Document isolcpus= boot parameter flags, mark it deprecated
  sched/isolation: Add basic isolcpus flags
  sched/isolation: Move isolcpus= handling to the housekeeping code
  sched/isolation: Handle the nohz_full= parameter
  sched/isolation: Introduce housekeeping flags
  sched/isolation: Split out new CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL
  sched/isolation: Rename is_housekeeping_cpu() to housekeeping_cpu()
  sched/isolation: Use its own static key
  sched/isolation: Make the housekeeping cpumask private
  sched/isolation: Provide a dynamic off-case to housekeeping_any_cpu()
  sched/isolation, watchdog: Use housekeeping_cpumask() instead of ad-hoc version
  sched/isolation: Move housekeeping related code to its own file
  sched/idle: Micro-optimize the idle loop
  sched/isolcpus: Fix "isolcpus=" boot parameter handling when !CONFIG_CPUMASK_OFFSTACK
  x86/tsc: Append the 'tsc=' description for the 'tsc=unstable' boot parameter
  sched/rt: Simplify the IPI based RT balancing logic
  block/ioprio: Use a helper to check for RT prio
  sched/rt: Add a helper to test for a RT task
  ...
2017-11-13 13:37:52 -08:00
Linus Torvalds
8e9a2dba86 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core locking updates from Ingo Molnar:
 "The main changes in this cycle are:

   - Another attempt at enabling cross-release lockdep dependency
     tracking (automatically part of CONFIG_PROVE_LOCKING=y), this time
     with better performance and fewer false positives. (Byungchul Park)

   - Introduce lockdep_assert_irqs_enabled()/disabled() and convert
     open-coded equivalents to lockdep variants. (Frederic Weisbecker)

   - Add down_read_killable() and use it in the VFS's iterate_dir()
     method. (Kirill Tkhai)

   - Convert remaining uses of ACCESS_ONCE() to
     READ_ONCE()/WRITE_ONCE(). Most of the conversion was Coccinelle
     driven. (Mark Rutland, Paul E. McKenney)

   - Get rid of lockless_dereference(), by strengthening Alpha atomics,
     strengthening READ_ONCE() with smp_read_barrier_depends() and thus
     being able to convert users of lockless_dereference() to
     READ_ONCE(). (Will Deacon)

   - Various micro-optimizations:

        - better PV qspinlocks (Waiman Long),
        - better x86 barriers (Michael S. Tsirkin)
        - better x86 refcounts (Kees Cook)

   - ... plus other fixes and enhancements. (Borislav Petkov, Juergen
     Gross, Miguel Bernal Marin)"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
  locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE
  rcu: Use lockdep to assert IRQs are disabled/enabled
  netpoll: Use lockdep to assert IRQs are disabled/enabled
  timers/posix-cpu-timers: Use lockdep to assert IRQs are disabled/enabled
  sched/clock, sched/cputime: Use lockdep to assert IRQs are disabled/enabled
  irq_work: Use lockdep to assert IRQs are disabled/enabled
  irq/timings: Use lockdep to assert IRQs are disabled/enabled
  perf/core: Use lockdep to assert IRQs are disabled/enabled
  x86: Use lockdep to assert IRQs are disabled/enabled
  smp/core: Use lockdep to assert IRQs are disabled/enabled
  timers/hrtimer: Use lockdep to assert IRQs are disabled/enabled
  timers/nohz: Use lockdep to assert IRQs are disabled/enabled
  workqueue: Use lockdep to assert IRQs are disabled/enabled
  irq/softirqs: Use lockdep to assert IRQs are disabled/enabled
  locking/lockdep: Add IRQs disabled/enabled assertion APIs: lockdep_assert_irqs_enabled()/disabled()
  locking/pvqspinlock: Implement hybrid PV queued/unfair locks
  locking/rwlocks: Fix comments
  x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized
  block, locking/lockdep: Assign a lock_class per gendisk used for wait_for_completion()
  workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes
  ...
2017-11-13 12:38:26 -08:00
Martin Brandenburg
db0267e7af orangefs: call op_release sooner when creating inodes
Prevents holding an unnecessary op while the kernel processes another op
and yields the CPU.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-11-13 15:10:18 -05:00
Martin Brandenburg
a55f2d8615 orangefs: stop setting atime on inode dirty
The previous code path was to mark the inode dirty, let
orangefs_inode_dirty set a flag in our private inode, then later during
inode release call orangefs_flush_inode which notices the flag and
writes the atime out.

The code path worked almost identically for mtime, ctime, and mode
except that those flags are set explicitly and not as side effects of
dirty.

Now orangefs_flush_inode is removed.  Marking an inode dirty does not
imply an atime update.  Any place where flags were set before is now
an explicit call to orangefs_inode_setattr.  Since OrangeFS does not
utilize inode writeback, the attribute change should be written out
immediately.

Fixes generic/120.

In namei.c, there are several places where the directory mtime and ctime
are set, but only the mtime is sent to the server.  These don't seem
right, but I've left them as is for now.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-11-13 15:10:11 -05:00
Jérémy Lefaure
296200d3bb orangefs: use ARRAY_SIZE
Using the ARRAY_SIZE macro improves the readability of the code.

Found with Coccinelle with the following semantic patch:
@r depends on (org || report)@
type T;
T[] E;
position p;
@@
(
 (sizeof(E)@p /sizeof(*E))
|
 (sizeof(E)@p /sizeof(E[...]))
|
 (sizeof(E)@p /sizeof(T))
)

Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-11-13 15:09:58 -05:00
Jeff Layton
933f7ac1a1 orangefs: remove initialization of i_version
...as it's completely unused.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-11-13 15:09:33 -05:00
Linus Torvalds
b33e3cc5c9 Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem integrity updates from James Morris:
 "There is a mixture of bug fixes, code cleanup, preparatory code for
  new functionality and new functionality.

  Commit 26ddabfe96bb ("evm: enable EVM when X509 certificate is
  loaded") enabled EVM without loading a symmetric key, but was limited
  to defining the x509 certificate pathname at build. Included in this
  set of patches is the ability of enabling EVM, without loading the EVM
  symmetric key, from userspace. New is the ability to prevent the
  loading of an EVM symmetric key."

* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  ima: Remove redundant conditional operator
  ima: Fix bool initialization/comparison
  ima: check signature enforcement against cmdline param instead of CONFIG
  module: export module signature enforcement status
  ima: fix hash algorithm initialization
  EVM: Only complain about a missing HMAC key once
  EVM: Allow userspace to signal an RSA key has been loaded
  EVM: Include security.apparmor in EVM measurements
  ima: call ima_file_free() prior to calling fasync
  integrity: use kernel_read_file_from_path() to read x509 certs
  ima: always measure and audit files in policy
  ima: don't remove the securityfs policy file
  vfs: fix mounting a filesystem with i_version
2017-11-13 10:41:25 -08:00
David Howells
98bf40cd99 afs: Protect call->state changes against signals
Protect call->state changes against the call being prematurely terminated
due to a signal.

What can happen is that a signal causes afs_wait_for_call_to_complete() to
abort an afs_call because it's not yet complete whilst afs_deliver_to_call()
is delivering data to that call.

If the data delivery causes the state to change, this may overwrite the state
of the afs_call, making it not-yet-complete again - but no further
notifications will be forthcoming from AF_RXRPC as the rxrpc call has been
aborted and completed, so kAFS will just hang in various places waiting for
that call or on page bits that need clearing by that call.

A tracepoint to monitor call state changes is also provided.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:21 +00:00
David Howells
13524ab3c6 afs: Trace page dirty/clean
Add a trace event that logs the dirtying and cleaning of pages attached to
AFS inodes.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:21 +00:00
David Howells
1cf7a1518a afs: Implement shared-writeable mmap
Implement shared-writeable mmap for AFS.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:21 +00:00
David Howells
4343d00872 afs: Get rid of the afs_writeback record
Get rid of the afs_writeback record that kAFS is using to match keys with
writes made by that key.

Instead, keep a list of keys that have a file open for writing and/or
sync'ing and iterate through those.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:20 +00:00
David Howells
215804a992 afs: Introduce a file-private data record
Introduce a file-private data record for kAFS and put the key into it
rather than storing the key in file->private_data.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:20 +00:00
Marc Dionne
83732ec514 afs: Use a dynamic port if 7001 is in use
It is not required that the afs client operate on port 7001.
The port could be in use because another kernel or userspace
client has already bound to it.

If the port is in use, just fallback to using a dynamic port.

Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:20 +00:00
David Howells
dab17c1add afs: Fix directory read/modify race
Because parsing of the directory wasn't being done under any sort of lock,
the pages holding the directory content can get invalidated whilst the
parsing is ongoing.

Further, the directory page check function gets called outside of the page
lock, so if the page gets cleared or updated, this may return reports of
bad magic numbers in the directory page.

Also, the directory may change size whilst checking and parsing are
ongoing, so more care needs to be taken here.

Fix this by:

 (1) Perform the page check from the page filling function before we set
     PageUptodate and drop the page lock.

 (2) Check for the file having shrunk and the page having been abandoned
     before checking the page contents.

 (3) Lock the page whilst parsing it for the directory iterator.

Whilst we're at it, add a tracepoint to report check failure.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:20 +00:00
David Howells
2c099014a0 afs: Trace the sending of pages
Add a pair of tracepoints to log the sending of pages for an FS.StoreData
or FS.StoreData64 operation.

Tracepoint afs_send_pages notes each set of pages added to the operation.
There may be several of these per operation as we get up at most 8
contiguous pages in one go because the bvec we're using is on the stack.

Tracepoint afs_sent_pages notes the end of adding data from a whole run of
pages to the operation and the completion of the request phase.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:19 +00:00
David Howells
025db80c9e afs: Trace the initiation and completion of client calls
Add tracepoints to trace the initiation and completion of client calls
within the kafs filesystem.

The afs_make_vl_call tracepoint watches calls to the volume location
database server.

The afs_make_fs_call tracepoint watches calls to the file server.

The afs_call_done tracepoint watches for call completion.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:19 +00:00
David Howells
1199db6035 afs: Fix total-length calculation for multiple-page send
Fix the total-length calculation in afs_make_call() when the operation
being dispatched has data from a series of pages attached.

Despite the patched code looking like that it should reduce mathematically
to the current code, it doesn't because the 32-bit unsigned arithmetic
being used to calculate the page-offset-difference doesn't correctly extend
to a 64-bit value when the result is effectively negative.

Without this, some FS.StoreData operations that span multiple pages fail,
reporting too little or too much data.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:19 +00:00
David Howells
5f0fc8ba6a afs: Only progress call state at end of Tx phase from rxrpc callback
Only progress the AFS call state at the end of Tx phase from the callback
passed to rxrpc_kernel_send_data() rather than setting it before the last
data send call.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:19 +00:00
David Howells
bf99a53ce2 afs: Make use of the YFS service upgrade to fully support IPv6
YFS VL servers offer an upgraded Volume Location service that can return
IPv6 addresses to fileservers and volume servers in addition to IPv4
addresses using the YFSVL.GetEndpoints operation which we should use if
it's available.

To this end:

 (1) Make rxrpc_kernel_recv_data() return the call's current service ID so
     that the caller can detect service upgrade and see what the service
     was upgraded to.

 (2) When we see a VL server address we haven't seen before, send a
     VL.GetCapabilities operation to it with the service upgrade bit set.

     If we get an upgrade to the YFS VL service, change the service ID in
     the address list for that address to use the upgraded service and set
     a flag to note that this appears to be a YFS-compatible server.

 (3) If, when a server's addresses are being looked up, we note that we
     previously detected a YFS-compatible server, then send the
     YFSVL.GetEndpoints operation rather than VL.GetAddrsU.

 (4) Build a fileserver address list from the reply of YFSVL.GetEndpoints,
     including both IPv4 and IPv6 addresses.  Volume server addresses are
     discarded.

 (5) The address list is sorted by address and port now, instead of just
     address.  This allows multiple servers on the same host sitting on
     different ports.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:19 +00:00
David Howells
d2ddc776a4 afs: Overhaul volume and server record caching and fileserver rotation
The current code assumes that volumes and servers are per-cell and are
never shared, but this is not enforced, and, indeed, public cells do exist
that are aliases of each other.  Further, an organisation can, say, set up
a public cell and a private cell with overlapping, but not identical, sets
of servers.  The difference is purely in the database attached to the VL
servers.

The current code will malfunction if it sees a server in two cells as it
assumes global address -> server record mappings and that each server is in
just one cell.

Further, each server may have multiple addresses - and may have addresses
of different families (IPv4 and IPv6, say).

To this end, the following structural changes are made:

 (1) Server record management is overhauled:

     (a) Server records are made independent of cell.  The namespace keeps
     	 track of them, volume records have lists of them and each vnode
     	 has a server on which its callback interest currently resides.

     (b) The cell record no longer keeps a list of servers known to be in
     	 that cell.

     (c) The server records are now kept in a flat list because there's no
     	 single address to sort on.

     (d) Server records are now keyed by their UUID within the namespace.

     (e) The addresses for a server are obtained with the VL.GetAddrsU
     	 rather than with VL.GetEntryByName, using the server's UUID as a
     	 parameter.

     (f) Cached server records are garbage collected after a period of
     	 non-use and are counted out of existence before purging is allowed
     	 to complete.  This protects the work functions against rmmod.

     (g) The servers list is now in /proc/fs/afs/servers.

 (2) Volume record management is overhauled:

     (a) An RCU-replaceable server list is introduced.  This tracks both
     	 servers and their coresponding callback interests.

     (b) The superblock is now keyed on cell record and numeric volume ID.

     (c) The volume record is now tied to the superblock which mounts it,
     	 and is activated when mounted and deactivated when unmounted.
     	 This makes it easier to handle the cache cookie without causing a
     	 double-use in fscache.

     (d) The volume record is loaded from the VLDB using VL.GetEntryByNameU
     	 to get the server UUID list.

     (e) The volume name is updated if it is seen to have changed when the
     	 volume is updated (the update is keyed on the volume ID).

 (3) The vlocation record is got rid of and VLDB records are no longer
     cached.  Sufficient information is stored in the volume record, though
     an update to a volume record is now no longer shared between related
     volumes (volumes come in bundles of three: R/W, R/O and backup).

and the following procedural changes are made:

 (1) The fileserver cursor introduced previously is now fleshed out and
     used to iterate over fileservers and their addresses.

 (2) Volume status is checked during iteration, and the server list is
     replaced if a change is detected.

 (3) Server status is checked during iteration, and the address list is
     replaced if a change is detected.

 (4) The abort code is saved into the address list cursor and -ECONNABORTED
     returned in afs_make_call() if a remote abort happened rather than
     translating the abort into an error message.  This allows actions to
     be taken depending on the abort code more easily.

     (a) If a VMOVED abort is seen then this is handled by rechecking the
     	 volume and restarting the iteration.

     (b) If a VBUSY, VRESTARTING or VSALVAGING abort is seen then this is
         handled by sleeping for a short period and retrying and/or trying
         other servers that might serve that volume.  A message is also
         displayed once until the condition has cleared.

     (c) If a VOFFLINE abort is seen, then this is handled as VBUSY for the
     	 moment.

     (d) If a VNOVOL abort is seen, the volume is rechecked in the VLDB to
     	 see if it has been deleted; if not, the fileserver is probably
     	 indicating that the volume couldn't be attached and needs
     	 salvaging.

     (e) If statfs() sees one of these aborts, it does not sleep, but
     	 rather returns an error, so as not to block the umount program.

 (5) The fileserver iteration functions in vnode.c are now merged into
     their callers and more heavily macroised around the cursor.  vnode.c
     is removed.

 (6) Operations on a particular vnode are serialised on that vnode because
     the server will lock that vnode whilst it operates on it, so a second
     op sent will just have to wait.

 (7) Fileservers are probed with FS.GetCapabilities before being used.
     This is where service upgrade will be done.

 (8) A callback interest on a fileserver is set up before an FS operation
     is performed and passed through to afs_make_call() so that it can be
     set on the vnode if the operation returns a callback.  The callback
     interest is passed through to afs_iget() also so that it can be set
     there too.

In general, record updating is done on an as-needed basis when we try to
access servers, volumes or vnodes rather than offloading it to work items
and special threads.

Notes:

 (1) Pre AFS-3.4 servers are no longer supported, though this can be added
     back if necessary (AFS-3.4 was released in 1998).

 (2) VBUSY is retried forever for the moment at intervals of 1s.

 (3) /proc/fs/afs/<cell>/servers no longer exists.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:19 +00:00
David Howells
9cc6fc50f7 afs: Move server rotation code into its own file
Move server rotation code into its own file.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:19 +00:00
David Howells
8b2a464ced afs: Add an address list concept
Add an RCU replaceable address list structure to hold a list of server
addresses.  The list also holds the

To this end:

 (1) A cell's VL server address list can be loaded directly via insmod or
     echo to /proc/fs/afs/cells or dynamically from a DNS query for AFSDB
     or SRV records.

 (2) Anyone wanting to use a cell's VL server address must wait until the
     cell record comes online and has tried to obtain some addresses.

 (3) An FS server's address list, for the moment, has a single entry that
     is the key to the server list.  This will change in the future when a
     server is instead keyed on its UUID and the VL.GetAddrsU operation is
     used.

 (4) An 'address cursor' concept is introduced to handle iteration through
     the address list.  This is passed to the afs_make_call() as, in the
     future, stuff (such as abort code) that doesn't outlast the call will
     be returned in it.

In the future, we might want to annotate the list with information about
how each address fares.  We might then want to propagate such annotations
over address list replacement.

Whilst we're at it, we allow IPv6 addresses to be specified in
colon-delimited lists by enclosing them in square brackets.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:18 +00:00
David Howells
989782dcdc afs: Overhaul cell database management
Overhaul the way that the in-kernel AFS client keeps track of cells in the
following manner:

 (1) Cells are now held in an rbtree to make walking them quicker and RCU
     managed (though this is probably overkill).

 (2) Cells now have a manager work item that:

     (A) Looks after fetching and refreshing the VL server list.

     (B) Manages cell record lifetime, including initialising and
     	 destruction.

     (B) Manages cell record caching whereby threads are kept around for a
     	 certain time after last use and then destroyed.

     (C) Manages the FS-Cache index cookie for a cell.  It is not permitted
     	 for a cookie to be in use twice, so we have to be careful to not
     	 allow a new cell record to exist at the same time as an old record
     	 of the same name.

 (3) Each AFS network namespace is given a manager work item that manages
     the cells within it, maintaining a single timer to prod cells into
     updating their DNS records.

     This uses the reduce_timer() facility to make the timer expire at the
     soonest timed event that needs happening.

 (4) When a module is being unloaded, cells and cell managers are now
     counted out using dec_after_work() to make sure the module text is
     pinned until after the data structures have been cleaned up.

 (5) Each cell's VL server list is now protected by a seqlock rather than a
     semaphore.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:18 +00:00
David Howells
be080a6f43 afs: Overhaul permit caching
Overhaul permit caching in AFS by making it per-vnode and sharing permit
lists where possible.

When most of the fileserver operations are called, they return a status
structure indicating the (revised) details of the vnode or vnodes involved
in the operation.  This includes the access mark derived from the ACL
(named CallerAccess in the protocol definition file).  This is cacheable
and if the ACL changes, the server will tell us that it is breaking the
callback promise, at which point we can discard the currently cached
permits.

With this patch, the afs_permits structure has, at the end, an array of
{ key, CallerAccess } elements, sorted by key pointer.  This is then cached
in a hash table so that it can be shared between vnodes with the same
access permits.

Permit lists can only be shared if they contain the exact same set of
key->CallerAccess mappings.

Note that that table is global rather than being per-net_ns.  If the keys
in a permit list cross net_ns boundaries, there is no problem sharing the
cached permits, since the permits are just integer masks.

Since permit lists pin keys, the permit cache also makes it easier for a
future patch to find all occurrences of a key and remove them by means of
setting the afs_permits::invalidated flag and then clearing the appropriate
key pointer.  In such an event, memory barriers will need adding.

Lastly, the permit caching is skipped if the server has sent either a
vnode-specific or an entire-server callback since the start of the
operation.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:18 +00:00
David Howells
c435ee3455 afs: Overhaul the callback handling
Overhaul the AFS callback handling by the following means:

 (1) Don't give up callback promises on vnodes that we are no longer using,
     rather let them just expire on the server or let the server break
     them.  This is actually more efficient for the server as the callback
     lookup is expensive if there are lots of extant callbacks.

 (2) Only give up the callback promises we have from a server when the
     server record is destroyed.  Then we can just give up *all* the
     callback promises on it in one go.

 (3) Servers can end up being shared between cells if cells are aliased, so
     don't add all the vnodes being backed by a particular server into a
     big FID-indexed tree on that server as there may be duplicates.

     Instead have each volume instance (~= superblock) register an interest
     in a server as it starts to make use of it and use this to allow the
     processor for callbacks from the server to find the superblock and
     thence the inode corresponding to the FID being broken by means of
     ilookup_nowait().

 (4) Rather than iterating over the entire callback list when a mass-break
     comes in from the server, maintain a counter of mass-breaks in
     afs_server (cb_seq) and make afs_validate() check it against the copy
     in afs_vnode.

     It would be nice not to have to take a read_lock whilst doing this,
     but that's tricky without using RCU.

 (5) Save a ref on the fileserver we're using for a call in the afs_call
     struct so that we can access its cb_s_break during call decoding.

 (6) Write-lock around callback and status storage in a vnode and read-lock
     around getattr so that we don't see the status mid-update.

This has the following consequences:

 (1) Data invalidation isn't seen until someone calls afs_validate() on a
     vnode.  Unfortunately, we need to use a key to query the server, but
     getting one from a background thread is tricky without caching loads
     of keys all over the place.

 (2) Mass invalidation isn't seen until someone calls afs_validate().

 (3) Callback breaking is going to hit the inode_hash_lock quite a bit.
     Could this be replaced with rcu_read_lock() since inodes are destroyed
     under RCU conditions.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:18 +00:00
David Howells
d0676a1678 afs: Rename struct afs_call server member to cm_server
Rename the server member of struct afs_call to cm_server as we're only
going to be using it for incoming calls for the Cache Manager service.
This makes it easier to differentiate from the pointer to the target server
for the client, which will point to a different structure to allow for
callback handling.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:18 +00:00
David Howells
03dc2cfca5 afs: Fix the afs_uuid struct to make the char-sized fields signed
In AFS's encoding of a UUID, the eight 'char' fields are all signed, so
represent them with __s8 rather than __u8.  This makes the compiler
sign-extend them correctly when XDR-encoding them.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:18 +00:00
David Howells
f4b3526d83 afs: Connect up the CB.ProbeUuid
The handler for the CB.ProbeUuid operation in the cache manager is
implemented, but isn't listed in the switch-statement of operation
selection, so won't be used.  Fix this by adding it.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:18 +00:00
David Howells
33cd7f2b76 afs: Potentially return call->reply[0] from afs_make_call()
If call->ret_reply0 is set, return call->reply[0] on success.  Change the
return type of afs_make_call() to long so that this can be passed back
without bit loss and then cast to a pointer if required.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:17 +00:00
David Howells
97e3043ad8 afs: Condense afs_call's reply{,2,3,4} into an array
Condense struct afs_call's reply anchor members - reply{,2,3,4} - into an
array.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:17 +00:00
David Howells
f780c8ea0e afs: Consolidate abort_to_error translators
The AFS abort code space is shared across all services, so there's no need
for separate abort_to_error translators for each service.

Consolidate them into a single function and remove the function pointers
for them.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:17 +00:00
David Howells
3838d3ecde afs: Allow IPv6 address specification of VL servers
Allow VL server specifications to be given IPv6 addresses as well as IPv4
addresses, for example as:

	echo add foo.org 1111:2222:3333:0:4444:5555:6666:7777 >/proc/fs/afs/cells

Note that ':' is the expected separator for separating IPv4 addresses, but
if a ',' is detected or no '.' is detected in the string, the delimiter is
switched to ','.

This also works with DNS AFSDB or SRV record strings fetched by upcall from
userspace.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:17 +00:00
David Howells
4d9df9868f afs: Keep and pass sockaddr_rxrpc addresses rather than in_addr
Keep and pass sockaddr_rxrpc addresses around rather than keeping and
passing in_addr addresses to allow for the use of IPv6 and non-standard
port numbers in future.

This also allows the port and service_id fields to be removed from the
afs_call struct.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:17 +00:00
David Howells
ad6a942a9e afs: Update the cache index structure
Update the cache index structure in the following ways:

 (1) Don't use the volume name followed by the volume type as levels in the
     cache index.  Volumes can be renamed.  Use the volume ID instead.

 (2) Don't store the VLDB data for a volume in the tree.  If the volume
     database should be cached locally, then it should be done in a separate
     tree.

 (3) Expand the volume ID stored in the cache to 64 bits.

 (4) Expand the file/vnode ID stored in the cache to 96 bits.

 (5) Increment the cache structure version number to 1.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:17 +00:00
David Howells
91a90380ef afs: Add some protocol defs
Add some protocol definitions, including max field lengths, flag defs, an
XDR-encoded UUID def, more VL operation IDs and more fileserver abort
codes.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:17 +00:00
David Howells
9ed900b116 afs: Push the net ns pointer to more places
Push the network namespace pointer to more places in AFS, including the
afs_server structure (which doesn't hold a ref on the netns).

In particular, afs_put_cell() now takes requires a net ns parameter so that
it can safely alter the netns after decrementing the cell usage count - the
cell will be deallocated by a background thread after being cached for a
period, which means that it's not safe to access it after reducing its
usage count.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:17 +00:00
David Howells
49566f6f06 afs: Note the cell in the superblock info also
Keep a reference to the cell in the superblock info structure in addition
to the volume and net pointers.  This will make it easier to clean up in a
future patch in which afs_put_volume() will need the cell pointer.

Whilst we're at it, make the cell and volume getting functions return a
pointer to the object got to make the call sites look neater.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:16 +00:00
David Howells
59fa1c4a9f afs: Fix server reaping
Fix server reaping and make sure it's all done before we start trying to
purge cells, given that servers currently pin cells.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:16 +00:00