75548 Commits

Author SHA1 Message Date
Jens Axboe
9666d4206e io_uring: fail links if msg-ring doesn't succeeed
We must always call req_set_fail() if the request is failed, otherwise
we won't sever links for dependent chains correctly.

Fixes: 4f57f06ce218 ("io_uring: add support for IORING_OP_MSG_RING command")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-29 10:51:08 -06:00
Linus Torvalds
1930a6e739 ptrace: Cleanups for v5.18
This set of changes removes tracehook.h, moves modification of all of
 the ptrace fields inside of siglock to remove races, adds a missing
 permission check to ptrace.c
 
 The removal of tracehook.h is quite significant as it has been a major
 source of confusion in recent years.  Much of that confusion was
 around task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled
 making the semantics clearer).
 
 For people who don't know tracehook.h is a vestiage of an attempt to
 implement uprobes like functionality that was never fully merged, and
 was later superseeded by uprobes when uprobes was merged.  For many
 years now we have been removing what tracehook functionaly a little
 bit at a time.  To the point where now anything left in tracehook.h is
 some weird strange thing that is difficult to understand.
 
 Eric W. Biederman (15):
       ptrace: Move ptrace_report_syscall into ptrace.h
       ptrace/arm: Rename tracehook_report_syscall report_syscall
       ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
       ptrace: Remove arch_syscall_{enter,exit}_tracehook
       ptrace: Remove tracehook_signal_handler
       task_work: Remove unnecessary include from posix_timers.h
       task_work: Introduce task_work_pending
       task_work: Call tracehook_notify_signal from get_signal on all architectures
       task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
       signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
       resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
       resume_user_mode: Move to resume_user_mode.h
       tracehook: Remove tracehook.h
       ptrace: Move setting/clearing ptrace_message into ptrace_stop
       ptrace: Return the signal to continue with from ptrace_stop
 
 Jann Horn (1):
       ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
 
 Yang Li (1):
       ptrace: Remove duplicated include in ptrace.c
 
  MAINTAINERS                          |   1 -
  arch/Kconfig                         |   5 +-
  arch/alpha/kernel/ptrace.c           |   5 +-
  arch/alpha/kernel/signal.c           |   4 +-
  arch/arc/kernel/ptrace.c             |   5 +-
  arch/arc/kernel/signal.c             |   4 +-
  arch/arm/kernel/ptrace.c             |  12 +-
  arch/arm/kernel/signal.c             |   4 +-
  arch/arm64/kernel/ptrace.c           |  14 +--
  arch/arm64/kernel/signal.c           |   4 +-
  arch/csky/kernel/ptrace.c            |   5 +-
  arch/csky/kernel/signal.c            |   4 +-
  arch/h8300/kernel/ptrace.c           |   5 +-
  arch/h8300/kernel/signal.c           |   4 +-
  arch/hexagon/kernel/process.c        |   4 +-
  arch/hexagon/kernel/signal.c         |   1 -
  arch/hexagon/kernel/traps.c          |   6 +-
  arch/ia64/kernel/process.c           |   4 +-
  arch/ia64/kernel/ptrace.c            |   6 +-
  arch/ia64/kernel/signal.c            |   1 -
  arch/m68k/kernel/ptrace.c            |   5 +-
  arch/m68k/kernel/signal.c            |   4 +-
  arch/microblaze/kernel/ptrace.c      |   5 +-
  arch/microblaze/kernel/signal.c      |   4 +-
  arch/mips/kernel/ptrace.c            |   5 +-
  arch/mips/kernel/signal.c            |   4 +-
  arch/nds32/include/asm/syscall.h     |   2 +-
  arch/nds32/kernel/ptrace.c           |   5 +-
  arch/nds32/kernel/signal.c           |   4 +-
  arch/nios2/kernel/ptrace.c           |   5 +-
  arch/nios2/kernel/signal.c           |   4 +-
  arch/openrisc/kernel/ptrace.c        |   5 +-
  arch/openrisc/kernel/signal.c        |   4 +-
  arch/parisc/kernel/ptrace.c          |   7 +-
  arch/parisc/kernel/signal.c          |   4 +-
  arch/powerpc/kernel/ptrace/ptrace.c  |   8 +-
  arch/powerpc/kernel/signal.c         |   4 +-
  arch/riscv/kernel/ptrace.c           |   5 +-
  arch/riscv/kernel/signal.c           |   4 +-
  arch/s390/include/asm/entry-common.h |   1 -
  arch/s390/kernel/ptrace.c            |   1 -
  arch/s390/kernel/signal.c            |   5 +-
  arch/sh/kernel/ptrace_32.c           |   5 +-
  arch/sh/kernel/signal_32.c           |   4 +-
  arch/sparc/kernel/ptrace_32.c        |   5 +-
  arch/sparc/kernel/ptrace_64.c        |   5 +-
  arch/sparc/kernel/signal32.c         |   1 -
  arch/sparc/kernel/signal_32.c        |   4 +-
  arch/sparc/kernel/signal_64.c        |   4 +-
  arch/um/kernel/process.c             |   4 +-
  arch/um/kernel/ptrace.c              |   5 +-
  arch/x86/kernel/ptrace.c             |   1 -
  arch/x86/kernel/signal.c             |   5 +-
  arch/x86/mm/tlb.c                    |   1 +
  arch/xtensa/kernel/ptrace.c          |   5 +-
  arch/xtensa/kernel/signal.c          |   4 +-
  block/blk-cgroup.c                   |   2 +-
  fs/coredump.c                        |   1 -
  fs/exec.c                            |   1 -
  fs/io-wq.c                           |   6 +-
  fs/io_uring.c                        |  11 +-
  fs/proc/array.c                      |   1 -
  fs/proc/base.c                       |   1 -
  include/asm-generic/syscall.h        |   2 +-
  include/linux/entry-common.h         |  47 +-------
  include/linux/entry-kvm.h            |   2 +-
  include/linux/posix-timers.h         |   1 -
  include/linux/ptrace.h               |  81 ++++++++++++-
  include/linux/resume_user_mode.h     |  64 ++++++++++
  include/linux/sched/signal.h         |  17 +++
  include/linux/task_work.h            |   5 +
  include/linux/tracehook.h            | 226 -----------------------------------
  include/uapi/linux/ptrace.h          |   2 +-
  kernel/entry/common.c                |  19 +--
  kernel/entry/kvm.c                   |   9 +-
  kernel/exit.c                        |   3 +-
  kernel/livepatch/transition.c        |   1 -
  kernel/ptrace.c                      |  47 +++++---
  kernel/seccomp.c                     |   1 -
  kernel/signal.c                      |  62 +++++-----
  kernel/task_work.c                   |   4 +-
  kernel/time/posix-cpu-timers.c       |   1 +
  mm/memcontrol.c                      |   2 +-
  security/apparmor/domain.c           |   1 -
  security/selinux/hooks.c             |   1 -
  85 files changed, 372 insertions(+), 495 deletions(-)
 
 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmJCQkoACgkQC/v6Eiaj
 j0DCWQ/5AZVFU+hX32obUNCLackHTwgcCtSOs3JNBmNA/zL/htPiYYG0ghkvtlDR
 Dw5J5DnxC6P7PVAdAqrpvx2uX2FebHYU0bRlyLx8LYUEP5dhyNicxX9jA882Z+vw
 Ud0Ue9EojwGWS76dC9YoKUj3slThMATbhA2r4GVEoof8fSNJaBxQIqath44t0FwU
 DinWa+tIOvZANGBZr6CUUINNIgqBIZCH/R4h6ArBhMlJpuQ5Ufk2kAaiWFwZCkX4
 0LuuAwbKsCKkF8eap5I2KrIg/7zZVgxAg9O3cHOzzm8OPbKzRnNnQClcDe8perqp
 S6e/f3MgpE+eavd1EiLxevZ660cJChnmikXVVh8ZYYoefaMKGqBaBSsB38bNcLjY
 3+f2dB+TNBFRnZs1aCujK3tWBT9QyjZDKtCBfzxDNWBpXGLhHH6j6lA5Lj+Cef5K
 /HNHFb+FuqedlFZh5m1Y+piFQ70hTgCa2u8b+FSOubI2hW9Zd+WzINV0ANaZ2LvZ
 4YGtcyDNk1q1+c87lxP9xMRl/xi6rNg+B9T2MCo4IUnHgpSVP6VEB3osgUmrrrN0
 eQlUI154G/AaDlqXLgmn1xhRmlPGfmenkxpok1AuzxvNJsfLKnpEwQSc13g3oiZr
 disZQxNY0kBO2Nv3G323Z6PLinhbiIIFez6cJzK5v0YJ2WtO3pY=
 =uEro
 -----END PGP SIGNATURE-----

Merge tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull ptrace cleanups from Eric Biederman:
 "This set of changes removes tracehook.h, moves modification of all of
  the ptrace fields inside of siglock to remove races, adds a missing
  permission check to ptrace.c

  The removal of tracehook.h is quite significant as it has been a major
  source of confusion in recent years. Much of that confusion was around
  task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled making the
  semantics clearer).

  For people who don't know tracehook.h is a vestiage of an attempt to
  implement uprobes like functionality that was never fully merged, and
  was later superseeded by uprobes when uprobes was merged. For many
  years now we have been removing what tracehook functionaly a little
  bit at a time. To the point where anything left in tracehook.h was
  some weird strange thing that was difficult to understand"

* tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ptrace: Remove duplicated include in ptrace.c
  ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
  ptrace: Return the signal to continue with from ptrace_stop
  ptrace: Move setting/clearing ptrace_message into ptrace_stop
  tracehook: Remove tracehook.h
  resume_user_mode: Move to resume_user_mode.h
  resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
  signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
  task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
  task_work: Call tracehook_notify_signal from get_signal on all architectures
  task_work: Introduce task_work_pending
  task_work: Remove unnecessary include from posix_timers.h
  ptrace: Remove tracehook_signal_handler
  ptrace: Remove arch_syscall_{enter,exit}_tracehook
  ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
  ptrace/arm: Rename tracehook_report_syscall report_syscall
  ptrace: Move ptrace_report_syscall into ptrace.h
2022-03-28 17:29:53 -07:00
Steve French
fdf59eb548 smb3: cleanup and clarify status of tree connections
Currently the way the tid (tree connection) status is tracked
is confusing.  The same enum is used for structs cifs_tcon
and cifs_ses and TCP_Server_info, but each of these three has
different states that they transition among.  The current
code also unnecessarily uses camelCase.

Convert from use of statusEnum to a new tid_status_enum for
tree connections.  The valid states for a tid are:

        TID_NEW = 0,
        TID_GOOD,
        TID_EXITING,
        TID_NEED_RECON,
        TID_NEED_TCON,
        TID_IN_TCON,
        TID_NEED_FILES_INVALIDATE, /* unused, considering removing in future */
        TID_IN_FILES_INVALIDATE

It also removes CifsNeedTcon, CifsInTcon, CifsNeedFilesInvalidate and
CifsInFilesInvalidate from the statusEnum used for session and
TCP_Server_Info since they are not relevant for those.

A follow on patch will fix the places where we use the
tcon->need_reconnect flag to be more consistent with the tid->status.

Also fixes a bug that was:
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-28 17:07:30 -05:00
Linus Torvalds
266d17a8c0 Driver core changes for 5.18-rc1
Here is the set of driver core changes for 5.18-rc1.
 
 Not much here, primarily it was a bunch of cleanups and small updates:
 	- kobj_type cleanups for default_groups
 	- documentation updates
 	- firmware loader minor changes
 	- component common helper added and take advantage of it in many
 	  drivers (the largest part of this pull request).
 
 There will be a merge conflict in drivers/power/supply/ab8500_chargalg.c
 with your tree, the merge conflict should be easy (take all the
 changes).
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYkG6PA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylMFwCfSIyAU4oLEgj+/Rfmx4o45cAVIWMAnit3zbdU
 wUUCGqKcOnTJEcW6dMPh
 =1VVi
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the set of driver core changes for 5.18-rc1.

  Not much here, primarily it was a bunch of cleanups and small updates:

   - kobj_type cleanups for default_groups

   - documentation updates

   - firmware loader minor changes

   - component common helper added and take advantage of it in many
     drivers (the largest part of this pull request).

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'driver-core-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (54 commits)
  Documentation: update stable review cycle documentation
  drivers/base/dd.c : Remove the initial value of the global variable
  Documentation: update stable tree link
  Documentation: add link to stable release candidate tree
  devres: fix typos in comments
  Documentation: add note block surrounding security patch note
  samples/kobject: Use sysfs_emit instead of sprintf
  base: soc: Make soc_device_match() simpler and easier to read
  driver core: dd: fix return value of __setup handler
  driver core: Refactor sysfs and drv/bus remove hooks
  driver core: Refactor multiple copies of device cleanup
  scripts: get_abi.pl: Fix typo in help message
  kernfs: fix typos in comments
  kernfs: remove unneeded #if 0 guard
  ALSA: hda/realtek: Make use of the helper component_compare_dev_name
  video: omapfb: dss: Make use of the helper component_compare_dev
  power: supply: ab8500: Make use of the helper component_compare_dev
  ASoC: codecs: wcd938x: Make use of the helper component_compare/release_of
  iommu/mediatek: Make use of the helper component_compare/release_of
  drm: of: Make use of the helper component_release_of
  ...
2022-03-28 12:41:28 -07:00
Darrick J. Wong
85bcfa26f9 xfs: don't report reserved bnobt space as available
On a modern filesystem, we don't allow userspace to allocate blocks for
data storage from the per-AG space reservations, the user-controlled
reservation pool that prevents ENOSPC in the middle of internal
operations, or the internal per-AG set-aside that prevents unwanted
filesystem shutdowns due to ENOSPC during a bmap btree split.

Since we now consider freespace btree blocks as unavailable for
allocation for data storage, we shouldn't report those blocks via statfs
either.  This makes the numbers that we return via the statfs f_bavail
and f_bfree fields a more conservative estimate of actual free space.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-03-28 08:39:10 -07:00
Darrick J. Wong
82be38bcf8 xfs: fix overfilling of reserve pool
Due to cycling of m_sb_lock, it's possible for multiple callers of
xfs_reserve_blocks to race at changing the pool size, subtracting blocks
from fdblocks, and actually putting it in the pool.  The result of all
this is that we can overfill the reserve pool to hilarious levels.

xfs_mod_fdblocks, when called with a positive value, already knows how
to take freed blocks and either fill the reserve until it's full, or put
them in fdblocks.  Use that instead of setting m_resblks_avail directly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-03-28 08:39:02 -07:00
Darrick J. Wong
0baa2657dc xfs: always succeed at setting the reserve pool size
Nowadays, xfs_mod_fdblocks will always choose to fill the reserve pool
with freed blocks before adding to fdblocks.  Therefore, we can change
the behavior of xfs_reserve_blocks slightly -- setting the target size
of the pool should always succeed, since a deficiency will eventually
be made up as blocks get freed.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-03-28 08:38:56 -07:00
Darrick J. Wong
15f04fdc75 xfs: remove infinite loop when reserving free block pool
Infinite loops in kernel code are scary.  Calls to xfs_reserve_blocks
should be rare (people should just use the defaults!) so we really don't
need to try so hard.  Simplify the logic here by removing the infinite
loop.

Cc: Brian Foster <bfoster@redhat.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-03-28 08:38:50 -07:00
Darrick J. Wong
c8c5682597 xfs: don't include bnobt blocks when reserving free block pool
xfs_reserve_blocks controls the size of the user-visible free space
reserve pool.  Given the difference between the current and requested
pool sizes, it will try to reserve free space from fdblocks.  However,
the amount requested from fdblocks is also constrained by the amount of
space that we think xfs_mod_fdblocks will give us.  If we forget to
subtract m_allocbt_blks before calling xfs_mod_fdblocks, it will will
return ENOSPC and we'll hang the kernel at mount due to the infinite
loop.

In commit fd43cf600cf6, we decided that xfs_mod_fdblocks should not hand
out the "free space" used by the free space btrees, because some portion
of the free space btrees hold in reserve space for future btree
expansion.  Unfortunately, xfs_reserve_blocks' estimation of the number
of blocks that it could request from xfs_mod_fdblocks was not updated to
include m_allocbt_blks, so if space is extremely low, the caller hangs.

Fix this by creating a function to estimate the number of blocks that
can be reserved from fdblocks, which needs to exclude the set-aside and
m_allocbt_blks.

Found by running xfs/306 (which formats a single-AG 20MB filesystem)
with an fstests configuration that specifies a 1k blocksize and a
specially crafted log size that will consume 7/8 of the space (17920
blocks, specifically) in that AG.

Cc: Brian Foster <bfoster@redhat.com>
Fixes: fd43cf600cf6 ("xfs: set aside allocation btree blocks from block reservation")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-03-28 08:38:43 -07:00
Trond Myklebust
7c9d845f06 NFSv4/pNFS: Fix another issue with a list iterator pointing to the head
In nfs4_callback_devicenotify(), if we don't find a matching entry for
the deviceid, we're left with a pointer to 'struct nfs_server' that
actually points to the list of super blocks associated with our struct
nfs_client.
Furthermore, even if we have a valid pointer, nothing pins the super
block, and so the struct nfs_server could end up getting freed while
we're using it.

Since all we want is a pointer to the struct pnfs_layoutdriver_type,
let's skip all the iteration over super blocks, and just use APIs to
find the layout driver directly.

Reported-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Fixes: 1be5683b03a7 ("pnfs: CB_NOTIFY_DEVICEID")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-28 08:36:34 -04:00
Linus Torvalds
7001052160 Add support for Intel CET-IBT, available since Tigerlake (11th gen), which is a
coarse grained, hardware based, forward edge Control-Flow-Integrity mechanism
 where any indirect CALL/JMP must target an ENDBR instruction or suffer #CP.
 
 Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation is
 limited to 2 instructions (and typically fewer) on branch targets not starting
 with ENDBR. CET-IBT also limits speculation of the next sequential instruction
 after the indirect CALL/JMP [1].
 
 CET-IBT is fundamentally incompatible with retpolines, but provides, as
 described above, speculation limits itself.
 
 [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEv3OU3/byMaA0LqWJdkfhpEvA5LoFAmI/LI8VHHBldGVyekBp
 bmZyYWRlYWQub3JnAAoJEHZH4aRLwOS6ZnkP/2QCgQLTu6oRxv9O020CHwlaSEeD
 1Hoy3loum5q5hAi1Ik3dR9p0H5u64c9qbrBVxaFoNKaLt5GKrtHaDSHNk2L/CFHX
 urpH65uvTLxbyZzcahkAahoJ71XU+m7PcrHLWMunw9sy10rExYVsUOlFyoyG6XCF
 BDCNZpdkC09ZM3vwlWGMZd5Pp+6HcZNPyoV9tpvWAS2l+WYFWAID7mflbpQ+tA8b
 y/hM6b3Ud0rT2ubuG1iUpopgNdwqQZ+HisMPGprh+wKZkYwS2l8pUTrz0MaBkFde
 go7fW16kFy2HQzGm6aIEBmfcg0palP/mFVaWP0zS62LwhJSWTn5G6xWBr3yxSsht
 9gWCiI0oDZuTg698MedWmomdG2SK6yAuZuqmdKtLLoWfWgviPEi7TDFG/cKtZdAW
 ag8GM8T4iyYZzpCEcWO9GWbjo6TTGq30JBQefCBG47GjD0csv2ubXXx0Iey+jOwT
 x3E8wnv9dl8V9FSd/tMpTFmje8ges23yGrWtNpb5BRBuWTeuGiBPZED2BNyyIf+T
 dmewi2ufNMONgyNp27bDKopY81CPAQq9cVxqNm9Cg3eWPFnpOq2KGYEvisZ/rpEL
 EjMQeUBsy/C3AUFAleu1vwNnkwP/7JfKYpN00gnSyeQNZpqwxXBCKnHNgOMTXyJz
 beB/7u2KIUbKEkSN
 =jZfK
 -----END PGP SIGNATURE-----

Merge tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 CET-IBT (Control-Flow-Integrity) support from Peter Zijlstra:
 "Add support for Intel CET-IBT, available since Tigerlake (11th gen),
  which is a coarse grained, hardware based, forward edge
  Control-Flow-Integrity mechanism where any indirect CALL/JMP must
  target an ENDBR instruction or suffer #CP.

  Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation
  is limited to 2 instructions (and typically fewer) on branch targets
  not starting with ENDBR. CET-IBT also limits speculation of the next
  sequential instruction after the indirect CALL/JMP [1].

  CET-IBT is fundamentally incompatible with retpolines, but provides,
  as described above, speculation limits itself"

[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html

* tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
  kvm/emulate: Fix SETcc emulation for ENDBR
  x86/Kconfig: Only allow CONFIG_X86_KERNEL_IBT with ld.lld >= 14.0.0
  x86/Kconfig: Only enable CONFIG_CC_HAS_IBT for clang >= 14.0.0
  kbuild: Fixup the IBT kbuild changes
  x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy
  x86: Remove toolchain check for X32 ABI capability
  x86/alternative: Use .ibt_endbr_seal to seal indirect calls
  objtool: Find unused ENDBR instructions
  objtool: Validate IBT assumptions
  objtool: Add IBT/ENDBR decoding
  objtool: Read the NOENDBR annotation
  x86: Annotate idtentry_df()
  x86,objtool: Move the ASM_REACHABLE annotation to objtool.h
  x86: Annotate call_on_stack()
  objtool: Rework ASM_REACHABLE
  x86: Mark __invalid_creds() __noreturn
  exit: Mark do_group_exit() __noreturn
  x86: Mark stop_this_cpu() __noreturn
  objtool: Ignore extra-symbol code
  objtool: Rename --duplicate to --lto
  ...
2022-03-27 10:17:23 -07:00
Steve French
be13500043 smb3: move defines for query info and query fsinfo to smbfs_common
Includes moving to common code (from cifs and ksmbd protocol related
headers)
- query and query directory info levels and structs
- set info structs
- SMB2 lock struct and flags
- SMB2 echo req

Also shorten a few flag names (e.g. SMB2_LOCKFLAG_EXCLUSIVE_LOCK
to SMB2_LOCKFLAG_EXCLUSIVE)

Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-26 23:09:51 -05:00
Steve French
15e7b6d753 smb3: move defines for ioctl protocol header and SMB2 sizes to smbfs_common
The definitions for the ioctl SMB3 request and response as well
as length of various fields defined in the protocol documentation
were duplicated in fs/ksmbd and fs/cifs.  Move these to the common
code in fs/smbfs_common/smb2pdu.h

Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-26 23:09:20 -05:00
Linus Torvalds
a060c9409e Fix buffered write page prefaulting
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmI90BMUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTpdexAApvfewSSFZ4J4mGqrGJcn1+iTkX1t
 ruDuDKOoRXbk5I+RCJKDJireIvdzElNE4f4rvgSk0Z+JMBaYcq4wI6tkuVqCs7uo
 ozrnY+Qfh2a9DA5+ghnNIdaow4v6FSrV6vvYA0ibf5D75r034j5PLnu9Ktj+rg6S
 9dFe5lTHNaBcEg9VMChT2RU9ysmdlnFL155J2iJUWCnhDcCUW15YZeRRa2ECZTcZ
 FHrgNsoFuAzsbAqV58bwirEumbZbWjfCU0blrAu00rKuaWgba0fTXegWAqHBrpii
 4hg3MJmttpg/hAUdxJFVCeE6Q/XR9nPkajIzmLxzw06Sb+YJMYpR64mno5vcQc64
 eDFGmeG75XHEfjfzVb2rG1oTUy8x3E5TzsmGp+FTs9fxhwkFSbPhx2QReX/m1nvP
 1X3JFrORTg3+lE2HBANftxWPMqNJmiuKEbQQvC8y0lgXMA3yGilnwG9eXYandKU1
 AEXK8B0mOPvE+DSnmGYRmw/vMXFarZ7Ua3dFPFgWic7ai8J5UsPCgv/I1cB4Ku/X
 ZzOR8hbgVxDzDKC4nby3g4FSB5olRkFRYw+M+oinfJ3WfIoL0N11NxAI8OpXcVl2
 Nr8/KnhK60CjgjnMAF871AS/qJnyjxYuPj+vDq46VoRQnFKoj0O/rLeokpEMGUET
 sitD2bxtoGrlpq8=
 =5EUY
 -----END PGP SIGNATURE-----

Merge tag 'write-page-prefaulting' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull iomap fixlet from Andreas Gruenbacher:
 "Fix buffered write page prefaulting.

  I forgot to send it the previous merge window. I've only improved the
  patch description since"

* tag 'write-page-prefaulting' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  fs/iomap: Fix buffered write page prefaulting
2022-03-26 12:41:52 -07:00
Linus Torvalds
752d422e74 for-5.18/alloc-cleanups-2022-03-25
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmI+UKAQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpkf8D/0WpfCaiaS3U269b3rX/PyM7LuQUAD9xMQI
 GUXX17VQvMPFvbDUgwWdkb+XqKvmJtPDPdpGqwzidWgzwjpJq/KsZkWILG8XpjKh
 ceMJ8qv6fhINAoyL+uB2Wne6YtLB0E9kqSGy72dv95TntUhajw9T7sqHGLN+HKo3
 j70AzFMTwighMMbZqAUoP5coreHVrz4TApUKocQVu2bWVvvMkBj94tEWlVoGblsE
 67bPRhCfvun6iO94uZErh4W9IeauyhKYQVBre+93loUerhEDWvLTOAZYftW1uPfG
 TheVSrYoJ3l+KcRxa1YSloAhmb3OX/YOYaEm4lr+sLg9yl6/aY6GvnfzwjL9o3kf
 mGp969Ro06e1DMgmyCfokar0jNVAqYMZN69eXONe2haU/J+RwX9qvkMs2EaPtwkO
 7VQQY8x/FOrjobsYnNQU70sm8i7zu7piEbs9iTwILlFSzN85sl8aNlnOQYJWNzvq
 7BUwry0eH2H6yLbLR+F9tw3IIbo/V/BWNf6ZqBojwIb2sDwP1jghPKWKyTBKjBfF
 TwjSLzMmwDaa27gt+zAizEJwmx9FGCMy4O527cnwF/+9nGJ6l8hZQtTXttgyPY87
 o00/c5tgvJ15JQdsTGLeN6+nzlxylF6mVTWwUWW96E08YsARcXNbDZYf3UVXgoH/
 YW7uBHaByQ==
 =k4vt
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18/alloc-cleanups-2022-03-25' of git://git.kernel.dk/linux-block

Pull bio allocation fix from Jens Axboe:
 "We got some reports of users seeing:

	Unexpected gfp: 0x2 (__GFP_HIGHMEM). Fixing up to gfp: 0x1192888

  which is a regression caused by the bio allocation cleanups"

* tag 'for-5.18/alloc-cleanups-2022-03-25' of git://git.kernel.dk/linux-block:
  fs: do not pass __GFP_HIGHMEM to bio_alloc in do_mpage_readpage
2022-03-26 11:59:30 -07:00
Linus Torvalds
561593a048 for-5.18/write-streams-2022-03-18
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmI1AHwQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgplPjEACVJzKg5NkxpdkDThvq5tejws9KxB/4mHJg
 NoDMcv1TF+Orsd/HNW6XrgYnbU0ObHom3568/xb8BNegRVFe7V4ME/4IYNRyGOmV
 qbfciu04L1UkJhI52CIidkOioFABL3r1zgLCIz5vk0Cv9X7Le9x0UabHxJf7u9S+
 Z3lNdyxezN0SGx8VT86l/7lSoHtG3VHO9IsQCuNGF02SB+6uGpXBlptbEoQ4nTxd
 T7/H9FNOe2Wf7eKvcOOds8UlvZYAfYcY0GcRrIOXdHIy25mKFWwn5cDgFTMOH5ID
 xXpm+JFkDkrfSW1o4FFPxbN9Z6RbVXbGCsrXlIragLO2MJQdXiIUxS1OPT5oAado
 H9MlX6QtkwziLW9zUWa/N/jmRjc2vzHAxD6JFg/wXxNdtY0kd8TQpaxwTB8mVDPN
 VCGutt7lJS1CQInQ+ppzbdqzzuLHC1RHAyWSmfUE9rb8cbjxtJBnSIorYRLUesMT
 GRwqVTXW0osxSgCb1iDiBCJANrX1yPZcemv4Wh1gzbT6IE9sWxWXsE5sy9KvswNc
 M+E4nu/TYYTfkynItJjLgmDLOoi+V0FBY6ba0mRPBjkriSP4AVlwsZLGVsAHQzuA
 o5paW1GjRCCwhIQ6+AzZIoOz6wqvprBlUgUkUneyYAQ2ZKC3pZi8zPnpoVdFucVa
 VaTzP71C1Q==
 =efaq
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18/write-streams-2022-03-18' of git://git.kernel.dk/linux-block

Pull NVMe write streams removal from Jens Axboe:
 "This removes the write streams support in NVMe. No vendor ever really
  shipped working support for this, and they are not interested in
  supporting it.

  With the NVMe support gone, we have nothing in the tree that supports
  this. Remove passing around of the hints.

  The only discussion point in this patchset imho is the fact that the
  file specific write hint setting/getting fcntl helpers will now return
  -1/EINVAL like they did before we supported write hints. No known
  applications use these functions, I only know of one prototype that I
  help do for RocksDB, and that's not used. That said, with a change
  like this, it's always a bit controversial. Alternatively, we could
  just make them return 0 and pretend it worked. It's placement based
  hints after all"

* tag 'for-5.18/write-streams-2022-03-18' of git://git.kernel.dk/linux-block:
  fs: remove fs.f_write_hint
  fs: remove kiocb.ki_hint
  block: remove the per-bio/request write hint
  nvme: remove support or stream based temperature hint
2022-03-26 11:51:46 -07:00
Trond Myklebust
d02d81efc7 NFS: Don't loop forever in nfs_do_recoalesce()
If __nfs_pageio_add_request() fails to add the request, it will return
with either desc->pg_error < 0, or mirror->pg_recoalesce will be set, so
we are guaranteed either to exit the function altogether, or to loop.

However if there is nothing left in mirror->pg_list to coalesce, we must
exit, so make sure that we clear mirror->pg_recoalesce every time we
loop.

Reported-by: Olga Kornievskaia <aglo@umich.edu>
Fixes: 70536bf4eb07 ("NFS: Clean up reset of the mirror accounting variables")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-25 21:51:03 -04:00
Linus Torvalds
a452c4eb40 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmI7PZQACgkQnJ2qBz9k
 QNnW+QgA6c0mVip5ESE+RgRMLnajAFs0kmOzGEXpoa+QXvxrU3GuY5ssVAM4MlSx
 yzCnhlhHua3tci7rhTQ0pPrxBXStIxf/EqKB8y4ylwZhZAP3XdStvTsBizt1966m
 1GyQiAHQFLsIsbbXfXcAVClYBHSkD8zAElFLvVB08q8zFXMpkC+2oh/gpwAlOPYz
 uczYatJV4edx57E3yX2lQJfDZK8I3OeDKeladDzCliTim8zR8sZTfZU5seiM7rHU
 2XexWsM8tvUn33pvw6JE0GwOhp0ILBgMWxtA3Z1OVW+6+sngIQ4NsRRq/MaRy/ht
 zfOnNtPVyIf7DTgla0mqqPNM7/qZmQ==
 =AjOm
 -----END PGP SIGNATURE-----

Merge tag 'fs_for_v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull reiserfs updates from Jan Kara:
 "The biggest change in this pull is the addition of a deprecation
  message about reiserfs with the outlook that we'd eventually be able
  to remove it from the kernel. Because it is practically unmaintained
  and untested and odd enough that people don't want to bother with it
  anymore...

  Otherwise there are small udf and ext2 fixes"

* tag 'fs_for_v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: remove redundant assignment of variable etype
  reiserfs: Deprecate reiserfs
  ext2: correct max file size computing
  reiserfs: get rid of AOP_FLAG_CONT_EXPAND flag
2022-03-25 17:38:15 -07:00
Linus Torvalds
a8988507e5 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmI7PPUACgkQnJ2qBz9k
 QNm1ygf/WcyxdM4+FTrhVVFGKPriqM8Ftx2r88/sod8CPUZrmCtLbCtO4rVBAEip
 c1eVRFEcZV+A/pt/UVvIklb+/Vlc1RmP1pEBzOcDaDvOMcnBoV43rTHxwxrX1YdX
 Ykr6C37d+51FQu/cuVC/LHx2+MhIkAe2ebHbBZNVpk+s98iY9xjo+0cRaDbzgkXq
 NqySxDghZnRO4wvYbYYrStJ/XLJePjrBU1n18T0m7wpJ5UuyAnXlVwTZgkVhdNLy
 EnBUIRvhXxr3WzfH6I6G7fOHTSbYuho9CSK3K49TdiqAtoxrlWgX2LWE5WRzhHWd
 nlTu64JWeSMTIFCxxinxSs7F9nUiDg==
 =6VY+
 -----END PGP SIGNATURE-----

Merge tag 'fsnotify_for_v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fsnotify updates from Jan Kara:
 "A few fsnotify improvements and cleanups"

* tag 'fsnotify_for_v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fsnotify: remove redundant parameter judgment
  fsnotify: optimize FS_MODIFY events with no ignored masks
  fsnotify: fix merge with parent's ignored mask
2022-03-25 16:58:51 -07:00
Pavel Begunkov
c86d18f4aa io_uring: fix memory leak of uid in files registration
When there are no files for __io_sqe_files_scm() to process in the
range, it'll free everything and return. However, it forgets to put uid.

Fixes: 08a451739a9b5 ("io_uring: allow sparse fixed file sets")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/accee442376f33ce8aaebb099d04967533efde92.1648226048.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-25 16:07:06 -06:00
Linus Torvalds
50560ce6a0 Kbuild -std=gnu11 updates for v5.18
Linus pointed out the benefits of C99 some years ago, especially variable
 declarations in loops [1]. At that time, we were not ready for the
 migration due to old compilers.
 
 Recently, Jakob Koschel reported a bug in list_for_each_entry(), which
 leaks the invalid pointer out of the loop [2]. In the discussion, we
 agreed that the time had come. Now that GCC 5.1 is the minimum compiler
 version, there is nothing to prevent us from going to -std=gnu99, or even
 straight to -std=gnu11.
 
 Discussions for a better list iterator implementation are ongoing, but
 this patch set must land first.
 
 [1] https://lore.kernel.org/all/CAHk-=wgr12JkKmRd21qh-se-_Gs69kbPgR9x4C+Es-yJV2GLkA@mail.gmail.com/
 [2] https://lore.kernel.org/lkml/86C4CE7D-6D93-456B-AA82-F8ADEACA40B7@gmail.com/
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmI9JqMVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsG3dkP/Ar7r8m4hc60kJE8JfXaxDpGOGka
 2yVm0EPfwV1lFGq440p4mqKc1iRTVLNMPsyaG/ZhriIp8PlSUjXLW290Sty6Z8Pd
 zcxwDg09ZXkMoDX+lc2Wr9F0wpswWJjqU/TzGLP5/qkVMe46KheXIQSPJAp8tVUt
 u2of/MTgTVMa4r7Iex/+NFWCPr4lTkWkSfzVN/Jd1r91UOyzy4E1VFRNlXIk/Fc9
 BFa67k0SHx/3FFElfwzFaejYUZjHjNzK3E1Zq8Q1vkWUxrzeEnzqTEiP7QaAi4Sa
 7MbqyqQvNoPw3uvKu5kwjDE+LHMEPTsmuaKVFpAc+qCpMtZCI6g9Q48pzQsWBMO2
 hZlEmYR9Zk0TpJp1flpOnNzoy7xPzNs0rcB3PaSOZyv+dTqtJ981IP+r4RNVlwje
 y3N9vq4RSAj/kAE/wi6FiPc/8vfbY71PbEXmg8556+kn3ne6aXl13ZrXIxz8w5jK
 bIgIFmrEPH7941KvFjoXhaFp/qv9hvLpWhQZu7CFRaj5V28qqUQ5TQFJREPePRtJ
 RFPEuOJqEGMxW/xbhcfrA1AO/y9Grxbe65e8Mph4YCfWpWaUww6vN01LC+k6UgDm
 Yq2u+wSFjWpRxOEPLWNsjnrZZgfdjk22O+TNOMs92X8/gXinmu3kZG5IUavahg7+
 J0SsIjIXhmLGKdDm
 =KMDk
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-gnu11-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild update for C11 language base from Masahiro Yamada:
 "Kbuild -std=gnu11 updates for v5.18

  Linus pointed out the benefits of C99 some years ago, especially
  variable declarations in loops [1]. At that time, we were not ready
  for the migration due to old compilers.

  Recently, Jakob Koschel reported a bug in list_for_each_entry(), which
  leaks the invalid pointer out of the loop [2]. In the discussion, we
  agreed that the time had come. Now that GCC 5.1 is the minimum
  compiler version, there is nothing to prevent us from going to
  -std=gnu99, or even straight to -std=gnu11.

  Discussions for a better list iterator implementation are ongoing, but
  this patch set must land first"

[1] https://lore.kernel.org/all/CAHk-=wgr12JkKmRd21qh-se-_Gs69kbPgR9x4C+Es-yJV2GLkA@mail.gmail.com/
[2] https://lore.kernel.org/lkml/86C4CE7D-6D93-456B-AA82-F8ADEACA40B7@gmail.com/

* tag 'kbuild-gnu11-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  Kbuild: use -std=gnu11 for KBUILD_USERCFLAGS
  Kbuild: move to -std=gnu11
  Kbuild: use -Wdeclaration-after-statement
  Kbuild: add -Wno-shift-negative-value where -Wextra is used
2022-03-25 11:48:01 -07:00
Steve French
113be37d87 [smb3] move more common protocol header definitions to smbfs_common
We have duplicated definitions for various SMB3 PDUs in
fs/ksmbd and fs/cifs.  Some had already been moved to
fs/smbfs_common/smb2pdu.h

Move definitions for
- error response
- query info and various related protocol flags
- various lease handling flags and the create lease context

to smbfs_common/smb2pdu.h to reduce code duplication

Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-25 10:40:56 -05:00
Andreas Gruenbacher
631f871f07 fs/iomap: Fix buffered write page prefaulting
When part of the user buffer passed to generic_perform_write() or
iomap_file_buffered_write() cannot be faulted in for reading, the entire
write currently fails.  The correct behavior would be to write all the
data that can be written, up to the point of failure.

Commit a6294593e8a1 ("iov_iter: Turn iov_iter_fault_in_readable into
fault_in_iov_iter_readable") gave us the information needed, so fix the
page prefaulting in generic_perform_write() and iomap_write_iter() to
only bail out when no pages could be faulted in.

We already factor in that pages that are faulted in may no longer be
resident by the time they are accessed.  Paging out pages has the same
effect as not faulting in those pages in the first place, so the code
can already deal with that.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-03-25 15:14:03 +01:00
Pavel Begunkov
8197b053a8 io_uring: fix put_kbuf without proper locking
io_put_kbuf_comp() should only be called while holding
->completion_lock, however there is no such assumption in io_clean_op()
and thus it can corrupt ->io_buffer_comp. Take the lock there, and
workaround the only user of io_clean_op() calling it with locks. Not
the prettiest solution, but it's easier to refactor it for-next.

Fixes: cc3cec8367cba ("io_uring: speedup provided buffer handling")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/743e2130b73ec6d48c4c5dd15db896c433431e6d.1648212967.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-25 07:43:53 -06:00
Pavel Begunkov
ab0ac0959b io_uring: fix invalid flags for io_put_kbuf()
io_req_complete_failed() doesn't require callers to hold ->uring_lock,
use IO_URING_F_UNLOCKED version of io_put_kbuf(). The only affected
place is the fail path of io_apoll_task_func(). Also add a lockdep
annotation to catch such bugs in the future.

Fixes: 3b2b78a8eb7cc ("io_uring: extend provided buf return to fails")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ccf602dbf8df3b6a8552a262d8ee0a13a086fbc7.1648212967.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-25 07:43:53 -06:00
Pavel Begunkov
41cdcc2202 io_uring: improve req fields comments
Move a misplaced comment about req->creds and add a line with
assumptions about req->link.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1e51d1e6b1f3708c2d4127b4e371f9daa4c5f859.1648209006.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-25 06:37:52 -06:00
Dylan Yudaken
52dd86406d io_uring: enable EPOLLEXCLUSIVE for accept poll
When polling sockets for accept, use EPOLLEXCLUSIVE. This is helpful
when multiple accept SQEs are submitted.

For O_NONBLOCK sockets multiple queued SQEs would previously have all
completed at once, but most with -EAGAIN as the result. Now only one
wakes up and completes.

For sockets without O_NONBLOCK there is no user facing change, but
internally the extra requests would previously be queued onto a worker
thread as they would wake up with no connection waiting, and be
punted. Now they do not wake up unnecessarily.

Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220325093755.4123343-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-25 06:30:23 -06:00
Linus Torvalds
85c7000fda The highlights are:
- several changes to how snap context and snap realms are tracked
   (Xiubo Li).  In particular, this should resolve a long-standing
   issue of high kworker CPU usage and various stalls caused by
   needless iteration over all inodes in the snap realm.
 
 - async create fixes to address hangs in some edge cases (Jeff Layton)
 
 - support for getvxattr MDS op for querying server-side xattrs, such
   as file/directory layouts and ephemeral pins (Milind Changire)
 
 - average latency is now maintained for all metrics (Venky Shankar)
 
 - some tweaks around handling inline data to make it fit better with
   netfs helper library (David Howells)
 
 Also a couple of memory leaks got plugged along with a few assorted
 fixups.  Last but not least, Xiubo has stepped up to serve as a CephFS
 co-maintainer.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmI8qE4THGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi9UvB/kBZt4mkyjqRC+KJ5rukw5q7lyAYGC1
 QYVbuTuIJydyDvqQp9pXYFndlj10pb7ULnUlQlcBfntVAr9s7xx7ZKrKciE48MPT
 vLiJmq3MpEedM4oE4FgcJbmHtltDgZWvOxXB7renpHNeHuPeezNpKzaKQXGUHUDo
 +7cX5XWBzZk+AYbEvxQUsjDozcgDp31qf015mAX3r0P7XFkBB7xwZA7sb7Cw1GEr
 S6ZdlNoFcWUq0ULUdh2C7l5a2mKQnVnpOO3TMjE6tSqJ74iozRy4tO9aFgj99NEn
 D1rQbCLr3JPfY//JFyqEOIDYf3hepOMmEkoGHOFukckFKUe3yfJJxa3r
 =ESr5
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-5.18-rc1' of https://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "The highlights are:

   - several changes to how snap context and snap realms are tracked
     (Xiubo Li). In particular, this should resolve a long-standing
     issue of high kworker CPU usage and various stalls caused by
     needless iteration over all inodes in the snap realm.

   - async create fixes to address hangs in some edge cases (Jeff
     Layton)

   - support for getvxattr MDS op for querying server-side xattrs, such
     as file/directory layouts and ephemeral pins (Milind Changire)

   - average latency is now maintained for all metrics (Venky Shankar)

   - some tweaks around handling inline data to make it fit better with
     netfs helper library (David Howells)

  Also a couple of memory leaks got plugged along with a few assorted
  fixups. Last but not least, Xiubo has stepped up to serve as a CephFS
  co-maintainer"

* tag 'ceph-for-5.18-rc1' of https://github.com/ceph/ceph-client: (27 commits)
  ceph: fix memory leak in ceph_readdir when note_last_dentry returns error
  ceph: uninitialized variable in debug output
  ceph: use tracked average r/w/m latencies to display metrics in debugfs
  ceph: include average/stdev r/w/m latency in mds metrics
  ceph: track average r/w/m latency
  ceph: use ktime_to_timespec64() rather than jiffies_to_timespec64()
  ceph: assign the ci only when the inode isn't NULL
  ceph: fix inode reference leakage in ceph_get_snapdir()
  ceph: misc fix for code style and logs
  ceph: allocate capsnap memory outside of ceph_queue_cap_snap()
  ceph: do not release the global snaprealm until unmounting
  ceph: remove incorrect and unused CEPH_INO_DOTDOT macro
  MAINTAINERS: add Xiubo Li as cephfs co-maintainer
  ceph: eliminate the recursion when rebuilding the snap context
  ceph: do not update snapshot context when there is no new snapshot
  ceph: zero the dir_entries memory when allocating it
  ceph: move to a dedicated slabcache for ceph_cap_snap
  ceph: add getvxattr op
  libceph: drop else branches in prepare_read_data{,_cont}
  ceph: fix comments mentioning i_mutex
  ...
2022-03-24 18:32:48 -07:00
Linus Torvalds
b1b07ba356 New code for 5.18:
- Fix some incorrect mapping state being passed to iomap during COW
  - Don't create bogus selinux audit messages when deciding to degrade
    gracefully due to lack of privilege
  - Fix setattr implementation to use VFS helpers so that we drop setgid
    consistently with the other filesystems
  - Fix link/unlink/rename to check quota limits
  - Constify xfs_name_dotdot to prevent abuse of in-kernel symbols
  - Fix log livelock between the AIL and inodegc threads during recovery
  - Fix a log stall when the AIL races with pushers
  - Fix stalls in CIL flushes due to pinned inode cluster buffers during
    recovery
  - Fix log corruption due to incorrect usage of xfs_is_shutdown vs
    xlog_is_shutdown because during an induced fs shutdown, AIL writeback
    must continue until the log is shut down, even if the filesystem has
    already shut down
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmI3UQ8ACgkQ+H93GTRK
 tOttpBAAjz05QkClIYKlHREiWt4a2a0FnRBvkz2fvZb0Nk3kST/drJ14VAb1Q3g1
 HvvOGwssyJShkbd6DBGTqrp9AzZ0mfhSYPpARtCNwOvUNV4tXwLdiBZ86H4/l3qy
 seS1tXW4pUKm6xhN9fnMw1sk4oJmPXq67JW+1h35oDaE6tpdd9lehFMpMxaTMjED
 4WsU0UwKCWr4l1lKP1EXvh+ENJeo0QZpccNHV3ChLnWx0mPMRpf4tQP2A01oBPkh
 oXHSms0TV7FHDIv+Zil3kBgkfh266WhcPdZ4pfIqyQc51LzbgrxEqcrZHGI5XJjB
 zcQd8RkzOI68XIDchijOorOkQZmDEDIGlCgSY6q/JV4N2iFyF84hHedk2le5j97l
 Jmout2Xng3bgstl4963IjXk8SvPTebnI76a62XoEcHnf4KBROLKIU7wNCh5LXNvk
 CI6AWJGy6EAGEc3BHyPFZxZF72D9rUmRbPIJBc7rBhJgMoIy4L4sFlvVtyppbERb
 gMH9PNTLjY5/abQkV044iLOsEDh5FEPBIehoaENNo252vj2R/WsAAQL13l0cMsrN
 vAryGdAEelZEyg62k+HT4W87zjH0Kgtgli6zMdx7akYdjKbMOIZALEu61iBH7KT+
 heAz8pU/krTy+Q583XU13eCWnY1wPrHRMwdGF+i4WUGiKuJSoYg=
 =uHT0
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.18-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Darrick Wong:
 "The biggest change this cycle is bringing XFS' inode attribute setting
  code back towards alignment with what the VFS does. IOWs, setgid bit
  handling should be a closer match with ext4 and btrfs behavior.

  The rest of the branch is bug fixes around the filesystem -- patching
  gaps in quota enforcement, removing bogus selinux audit messages, and
  fixing log corruption and problems with log recovery. There will be a
  second pull request later on in the merge window with more bug fixes.

  Dave Chinner will be taking over as XFS maintainer for one release
  cycle, starting from the day 5.18-rc1 drops until 5.19-rc1 is tagged
  so that I can focus on starting a massive design review for the
  (feature complete after five years) online repair feature.

  Summary:

   - Fix some incorrect mapping state being passed to iomap during COW

   - Don't create bogus selinux audit messages when deciding to degrade
     gracefully due to lack of privilege

   - Fix setattr implementation to use VFS helpers so that we drop
     setgid consistently with the other filesystems

   - Fix link/unlink/rename to check quota limits

   - Constify xfs_name_dotdot to prevent abuse of in-kernel symbols

   - Fix log livelock between the AIL and inodegc threads during
     recovery

   - Fix a log stall when the AIL races with pushers

   - Fix stalls in CIL flushes due to pinned inode cluster buffers
     during recovery

   - Fix log corruption due to incorrect usage of xfs_is_shutdown vs
     xlog_is_shutdown because during an induced fs shutdown, AIL
     writeback must continue until the log is shut down, even if the
     filesystem has already shut down"

* tag 'xfs-5.18-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: xfs_is_shutdown vs xlog_is_shutdown cage fight
  xfs: AIL should be log centric
  xfs: log items should have a xlog pointer, not a mount
  xfs: async CIL flushes need pending pushes to be made stable
  xfs: xfs_ail_push_all_sync() stalls when racing with updates
  xfs: check buffer pin state after locking in delwri_submit
  xfs: log worker needs to start before intent/unlink recovery
  xfs: constify xfs_name_dotdot
  xfs: constify the name argument to various directory functions
  xfs: reserve quota for target dir expansion when renaming files
  xfs: reserve quota for dir expansion when linking/unlinking files
  xfs: refactor user/group quota chown in xfs_setattr_nonsize
  xfs: use setattr_copy to set vfs inode attributes
  xfs: don't generate selinux audit messages for capability testing
  xfs: add missing cmap->br_state = XFS_EXT_NORM update
2022-03-24 18:28:01 -07:00
Linus Torvalds
f0614eefbf dax for 5.18
- Fix a crash due to a missing rcu_barrier() in dax_fs_exit()
 
 - Fix two miscellaneous doc issues
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYjp4zgAKCRDfioYZHlFs
 ZwgDAQD/jJE/uyFozwZR/YD8G293Hzs/acPUEQ/09t06W68SIAEAtwBr3WHKiFJL
 M+utFlm05fShOQl3lGwqiPS5/mQvzwc=
 =7AnI
 -----END PGP SIGNATURE-----

Merge tag 'dax-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull DAX updates from Dan Williams:
 "Andrew has been shepherding major dax features that touch the core -mm
  through his tree, but I still collect the dax updates that are core-mm
  independent.

   - Fix a crash due to a missing rcu_barrier() in dax_fs_exit()

   - Fix two miscellaneous doc issues"

* tag 'dax-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: Fix missing kdoc for dax_device
  dax: make sure inodes are flushed before destroy cache
  fsdax: fix function description
2022-03-24 18:12:09 -07:00
Jens Axboe
34d2bfe7d4 io_uring: improve task work cache utilization
While profiling task_work intensive workloads, I noticed that most of
the time in tctx_task_work() is spending stalled on loading 'req'. This
is one of the unfortunate side effects of using linked lists,
particularly when they end up being passe around.

Prefetch the next request, if there is one. There's a sufficient amount
of work in between that this makes it available for the next loop.

While fiddling with the cache layout, move the link outside of the
hot completion cacheline. It's rarely used in hot workloads, so better
to bring in kbuf which is used for networked loads with provided buffers.

This reduces tctx_task_work() overhead from ~3% to 1-1.5% in my testing.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-24 17:09:26 -06:00
Andreas Gruenbacher
3bde4c4858 gfs2: Make sure not to return short direct writes
When direct writes fail with -ENOTBLK because we're writing into a
hole (gfs2_iomap_begin()) or because of a page invalidation failure
(iomap_dio_rw()), we're falling back to buffered writes.  In that case,
when we lose the inode glock in gfs2_file_buffered_write(), we want to
re-acquire it instead of returning a short write.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-03-24 23:40:43 +01:00
Andreas Gruenbacher
11661835f9 gfs2: Remove dead code in gfs2_file_read_iter
Function iomap_dio_rw() only returns -ENOTBLK for write requests and
gfs2_file_direct_read() no longer returns -ENOTBLK since commit
1d45bb7f9d2a5 ("gfs2: Use iomap for stuffed direct I/O reads"), so there
is no need to check for -ENOTBLK in gfs2_file_read_iter() anymore.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-03-24 23:40:36 +01:00
Andreas Gruenbacher
46f3e0421c gfs2: Fix gfs2_file_buffered_write endless loop workaround
Since commit 554c577cee95b, gfs2_file_buffered_write() can accidentally
return a truncated iov_iter, which might confuse callers.  Fix that.

Fixes: 554c577cee95b ("gfs2: Prevent endless loops in gfs2_file_buffered_write")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-03-24 23:38:11 +01:00
Dylan Yudaken
a73825ba70 io_uring: fix async accept on O_NONBLOCK sockets
Do not set REQ_F_NOWAIT if the socket is non blocking. When enabled this
causes the accept to immediately post a CQE with EAGAIN, which means you
cannot perform an accept SQE on a NONBLOCK socket asynchronously.

By removing the flag if there is no pending accept then poll is armed as
usual and when a connection comes in the CQE is posted.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220324143435.2875844-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-24 16:10:29 -06:00
Linus Torvalds
52deda9551 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "Various misc subsystems, before getting into the post-linux-next
  material.

  41 patches.

  Subsystems affected by this patch series: procfs, misc, core-kernel,
  lib, checkpatch, init, pipe, minix, fat, cgroups, kexec, kdump,
  taskstats, panic, kcov, resource, and ubsan"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits)
  Revert "ubsan, kcsan: Don't combine sanitizer with kcov on clang"
  kernel/resource: fix kfree() of bootmem memory again
  kcov: properly handle subsequent mmap calls
  kcov: split ioctl handling into locked and unlocked parts
  panic: move panic_print before kmsg dumpers
  panic: add option to dump all CPUs backtraces in panic_print
  docs: sysctl/kernel: add missing bit to panic_print
  taskstats: remove unneeded dead assignment
  kasan: no need to unset panic_on_warn in end_report()
  ubsan: no need to unset panic_on_warn in ubsan_epilogue()
  panic: unset panic_on_warn inside panic()
  docs: kdump: add scp example to write out the dump file
  docs: kdump: update description about sysfs file system support
  arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
  x86/setup: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
  riscv: mm: init: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
  kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible
  cgroup: use irqsave in cgroup_rstat_flush_locked().
  fat: use pointer to simple type in put_user()
  minix: fix bug when opening a file with O_DIRECT
  ...
2022-03-24 14:14:07 -07:00
Linus Torvalds
3ce62cf4dc flexible-array transformations for 5.18-rc1
Hi Linus,
 
 Please, pull the following treewide patch that replaces zero-length arrays with
 flexible-array members. This patch has been baking in linux-next for a
 whole development cycle.
 
 Thanks
 --
 Gustavo
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAmI6GIUACgkQRwW0y0cG
 2zFLWw/+OB1gZeQD3boKpUMntWnn6wjhUxdrO8CYkpzG+B+8TFECXNjy8HV1CSiw
 GKKRndYELOyYaD5o/F2vtPe10iPHbrdIlMFRPBRoht0/cvSZgzHlfT8EjWQwerYY
 dieztUFKjeSj0MXivdNDnKOTm8o9cz8KmCrWFP+My37Fasn/9+nBX8iNVIvAX4xy
 T+IVmjtDifQUsTs298UGnBvDeuZOiGHhXXU5rq6lIX0Rl554OsWZW94d6jUPj/h7
 t1v6jdojNuyaMKn45/xnPj9VvmDiSu3K67m3fjRdzLPDOhISjr2fw4KEUOKdsebh
 yJ9t5u8IufyPbm9kyI+rZt+T8ZlV2/qt2+mt6QgtDMnWrs+4nU15JY0SHImMSBZQ
 rBEZcQlrIcGJ+CsNB8Y7jIGYO0SSkhodAvfl0LRA0AbTqLGqq0OkAQS5D52r3H2r
 uz6xdYb7kG43XaRyaAIPqhZsp/jk2NrXvEvin2tSaXZFR1cxp+oxcV2UajmnOU6i
 EIBS4PzJnYx2RZRa+h8YbBa/+D4N6+fj/tjmwBawiUBPjjaLAsGFNwUHqvBoD05S
 bk6oXi654NBwVjsknZ0grVz0TtSvdZ3uJL5FZApTOHITqH8vlxlNefmHri4vZRZO
 NN7NIQ0yaUCnorzMg+vP8ZtflhQwrMJbjwIS9YD0RHd7MBhYX8k=
 =xZD2
 -----END PGP SIGNATURE-----

Merge tag 'flexible-array-transformations-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

Pull flexible-array transformations from Gustavo Silva:
 "Treewide patch that replaces zero-length arrays with flexible-array
  members.

  This has been baking in linux-next for a whole development cycle"

* tag 'flexible-array-transformations-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
  treewide: Replace zero-length arrays with flexible-array members
2022-03-24 11:39:32 -07:00
Linus Torvalds
2e2d4650b3 fs.rt.v5.18
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYjr50AAKCRCRxhvAZXjc
 oroBAQC0daGrkl6XJirLzyMVjNWWbuSEdB+Q3ipMrwySxjYmtwD/XSPGAKGBbHQz
 13EtvTMT8FlqSyDIKHYBM5uDyrWQZgs=
 =uJdq
 -----END PGP SIGNATURE-----

Merge tag 'fs.rt.v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull mount attributes PREEMPT_RT update from Christian Brauner:
 "This contains Sebastian's fix to make changing mount
  attributes/getting write access compatible with CONFIG_PREEMPT_RT.

  The change only applies when users explicitly opt-in to real-time via
  CONFIG_PREEMPT_RT otherwise things are exactly as before. We've waited
  quite a long time with this to make sure folks could take a good look"

* tag 'fs.rt.v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  fs/namespace: Boost the mount_lock.lock owner instead of spinning on PREEMPT_RT.
2022-03-24 10:06:43 -07:00
Linus Torvalds
15f2e3d6c1 fs.v5.18
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYjrxKAAKCRCRxhvAZXjc
 oi77AQDdLziGyeKSHlyPw1xeWiqsZVRLhhD07F8kVaAkjg3ZWwEAs3jL0Pzfnd7J
 IIx//6YK46cbfc471FdVuUDfMjkfVAw=
 =VaFd
 -----END PGP SIGNATURE-----

Merge tag 'fs.v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull mount_setattr updates from Christian Brauner:
 "This contains a few more patches to massage the mount_setattr()
  codepaths and one minor fix to reuse a helper we added some time back.

  The final two patches do similar cleanups in different ways. One patch
  is mine and the other is Al's who was nice enough to give me a branch
  for it.

  Since his came in later and my branch had been sitting in -next for
  quite some time we just put his on top instead of swap them"

* tag 'fs.v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  mount_setattr(): clean the control flow and calling conventions
  fs: clean up mount_setattr control flow
  fs: don't open-code mnt_hold_writers()
  fs: simplify check in mount_setattr_commit()
  fs: add mnt_allow_writers() and simplify mount_setattr_prepare()
2022-03-24 09:55:15 -07:00
Olga Kornievskaia
1d15d121cc NFSv4.1: don't retry BIND_CONN_TO_SESSION on session error
There is no reason to retry the operation if a session error had
occurred in such case result structure isn't filled out.

Fixes: dff58530c4ca ("NFSv4.1: fix handling of backchannel binding in BIND_CONN_TO_SESSION")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-24 12:06:07 -04:00
Jakob Koschel
3de24f3d70 NFS: replace usage of found with dedicated list iterator variable
To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.

To *never* use the list iterator variable after the loop it was
concluded to use a separate iterator variable instead of a
found boolean [1].

This removes the need to use a found variable and simply checking if
the variable was set, can determine if the break/goto was hit.

Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-24 12:06:07 -04:00
Jens Axboe
7ef66d186e io_uring: remove IORING_CQE_F_MSG
This was introduced with the message ring opcode, but isn't strictly
required for the request itself. The sender can encode what is needed
in user_data, which is passed to the receiver. It's unclear if having
a separate flag that essentially says "This CQE did not originate from
an SQE on this ring" provides any real utility to applications. While
we can always re-introduce a flag to provide this information, we cannot
take it away at a later point in time.

Remove the flag while we still can, before it's in a released kernel.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-24 06:53:18 -06:00
Helge Deller
2cd50532ce fat: use pointer to simple type in put_user()
The put_user(val,ptr) macro wants a pointer to a simple type, but in
fat_ioctl_filldir() the d_name field references an "array of chars".  Be
more accurate and explicitly give the pointer to the first character of
the d_name[] array.

I noticed that issue while trying to optimize the parisc put_user()
macro and used an intermediate variable to store the pointer.  In that
case I got this error:

  In file included from include/linux/uaccess.h:11,
                   from include/linux/compat.h:17,
                   from fs/fat/dir.c:18:
  fs/fat/dir.c: In function `fat_ioctl_filldir':
  fs/fat/dir.c:725:33: error: invalid initializer
    725 |                 if (put_user(0, d2->d_name)                     ||         \
        |                                 ^~
  include/asm/uaccess.h:152:33: note: in definition of macro `__put_user'
    152 |         __typeof__(ptr) __ptr = ptr;                            \
        |                                 ^~~
  fs/fat/dir.c:759:1: note: in expansion of macro `FAT_IOCTL_FILLDIR_FUNC'
    759 | FAT_IOCTL_FILLDIR_FUNC(fat_ioctl_filldir, __fat_dirent)

Andreas Schwab <schwab@linux-m68k.org> suggested to use

   __typeof__(&*(ptr)) __ptr = ptr;

instead.  This works, but nevertheless it's probably reasonable to fix
the original caller too.

Link: https://lkml.kernel.org/r/Ygo+A9MREmC1H3kr@p100
Signed-off-by: Helge Deller <deller@gmx.de>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: David Laight <David.Laight@aculab.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:34 -07:00
Qinghua Jin
9ce3c0d26c minix: fix bug when opening a file with O_DIRECT
Testcase:
1. create a minix file system and mount it
2. open a file on the file system with O_RDWR|O_CREAT|O_TRUNC|O_DIRECT
3. open fails with -EINVAL but leaves an empty file behind. All other
   open() failures don't leave the failed open files behind.

It is hard to check the direct_IO op before creating the inode.  Just as
ext4 and btrfs do, this patch will resolve the issue by allowing to
create the file with O_DIRECT but returning error when writing the file.

Link: https://lkml.kernel.org/r/20220107133626.413379-1-qhjin.dev@gmail.com
Signed-off-by: Qinghua Jin <qhjin.dev@gmail.com>
Reported-by: Colin Ian King <colin.king@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:34 -07:00
Andrei Vagin
aeb213cdde fs/pipe.c: local vars have to match types of proper pipe_inode_info fields
head, tail, ring_size are declared as unsigned int, so all local
variables that operate with these fields have to be unsigned to avoid
signed integer overflow.

Right now, it isn't an issue because the maximum pipe size is limited by
1U<<31.

Link: https://lkml.kernel.org/r/20220106171946.36128-1-avagin@gmail.com
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Suggested-by: Dmitry Safonov <0x7f454c46@gmail.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:34 -07:00
Andrei Vagin
5a519c8fe4 fs/pipe: use kvcalloc to allocate a pipe_buffer array
Right now, kcalloc is used to allocate a pipe_buffer array.  The size of
the pipe_buffer struct is 40 bytes.  kcalloc allows allocating reliably
chunks with sizes less or equal to PAGE_ALLOC_COSTLY_ORDER (3).  It
means that the maximum pipe size is 3.2MB in this case.

In CRIU, we use pipes to dump processes memory.  CRIU freezes a target
process, injects a parasite code into it and then this code splices
memory into pipes.  If a maximum pipe size is small, we need to do many
iterations or create many pipes.

kvcalloc attempt to allocate physically contiguous memory, but upon
failure, fall back to non-contiguous (vmalloc) allocation and so it
isn't limited by PAGE_ALLOC_COSTLY_ORDER.

The maximum pipe size for non-root users is limited by the
/proc/sys/fs/pipe-max-size sysctl that is 1MB by default, so only the
root user will be able to trigger vmalloc allocations.

Link: https://lkml.kernel.org/r/20220104171058.22580-1-avagin@gmail.com
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:34 -07:00
Yang Li
e9f5d1017c proc/vmcore: fix vmcore_alloc_buf() kernel-doc comment
Fix a spelling problem to remove warnings found by running
scripts/kernel-doc, which is caused by using 'make W=1'.

  fs/proc/vmcore.c:492: warning: Function parameter or member 'size' not described in 'vmcore_alloc_buf'
  fs/proc/vmcore.c:492: warning: Excess function parameter 'sizez' description in 'vmcore_alloc_buf'

Link: https://lkml.kernel.org/r/20220129011449.105278-1-yang.lee@linux.alibaba.com
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:33 -07:00
David Hildenbrand
5039b17036 proc/vmcore: fix possible deadlock on concurrent mmap and read
Lockdep noticed that there is chance for a deadlock if we have concurrent
mmap, concurrent read, and the addition/removal of a callback.

As nicely explained by Boqun:
 "Lockdep warned about the above sequences because rw_semaphore is a
  fair read-write lock, and the following can cause a deadlock:

	TASK 1			TASK 2		TASK 3
	======			======		======
	down_write(mmap_lock);
				down_read(vmcore_cb_rwsem)
						down_write(vmcore_cb_rwsem); // blocked
	down_read(vmcore_cb_rwsem); // cannot get the lock because of the fairness
				down_read(mmap_lock); // blocked

  IOW, a reader can block another read if there is a writer queued by
  the second reader and the lock is fair"

To fix this, convert to srcu to make this deadlock impossible.  We need
srcu as our callbacks can sleep.  With this change, I cannot trigger any
lockdep warnings.

    ======================================================
    WARNING: possible circular locking dependency detected
    5.17.0-0.rc0.20220117git0c947b893d69.68.test.fc36.x86_64 #1 Not tainted
    ------------------------------------------------------
    makedumpfile/542 is trying to acquire lock:
    ffffffff832d2eb8 (vmcore_cb_rwsem){.+.+}-{3:3}, at: mmap_vmcore+0x340/0x580

    but task is already holding lock:
    ffff8880af226438 (&mm->mmap_lock#2){++++}-{3:3}, at: vm_mmap_pgoff+0x84/0x150

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&mm->mmap_lock#2){++++}-{3:3}:
           lock_acquire+0xc3/0x1a0
           __might_fault+0x4e/0x70
           _copy_to_user+0x1f/0x90
           __copy_oldmem_page+0x72/0xc0
           read_from_oldmem+0x77/0x1e0
           read_vmcore+0x2c2/0x310
           proc_reg_read+0x47/0xa0
           vfs_read+0x101/0x340
           __x64_sys_pread64+0x5d/0xa0
           do_syscall_64+0x43/0x90
           entry_SYSCALL_64_after_hwframe+0x44/0xae

    -> #0 (vmcore_cb_rwsem){.+.+}-{3:3}:
           validate_chain+0x9f4/0x2670
           __lock_acquire+0x8f7/0xbc0
           lock_acquire+0xc3/0x1a0
           down_read+0x4a/0x140
           mmap_vmcore+0x340/0x580
           proc_reg_mmap+0x3e/0x90
           mmap_region+0x504/0x880
           do_mmap+0x38a/0x520
           vm_mmap_pgoff+0xc1/0x150
           ksys_mmap_pgoff+0x178/0x200
           do_syscall_64+0x43/0x90
           entry_SYSCALL_64_after_hwframe+0x44/0xae

    other info that might help us debug this:

     Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(&mm->mmap_lock#2);
                                   lock(vmcore_cb_rwsem);
                                   lock(&mm->mmap_lock#2);
      lock(vmcore_cb_rwsem);

     *** DEADLOCK ***

    1 lock held by makedumpfile/542:
     #0: ffff8880af226438 (&mm->mmap_lock#2){++++}-{3:3}, at: vm_mmap_pgoff+0x84/0x150

    stack backtrace:
    CPU: 0 PID: 542 Comm: makedumpfile Not tainted 5.17.0-0.rc0.20220117git0c947b893d69.68.test.fc36.x86_64 #1
    Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    Call Trace:
     __lock_acquire+0x8f7/0xbc0
     lock_acquire+0xc3/0x1a0
     down_read+0x4a/0x140
     mmap_vmcore+0x340/0x580
     proc_reg_mmap+0x3e/0x90
     mmap_region+0x504/0x880
     do_mmap+0x38a/0x520
     vm_mmap_pgoff+0xc1/0x150
     ksys_mmap_pgoff+0x178/0x200
     do_syscall_64+0x43/0x90

Link: https://lkml.kernel.org/r/20220119193417.100385-1-david@redhat.com
Fixes: cc5f2704c934 ("proc/vmcore: convert oldmem_pfn_is_ram callback to more generic vmcore callbacks")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Baoquan He <bhe@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:33 -07:00
Hao Lee
3a72917ccf proc: alloc PATH_MAX bytes for /proc/${pid}/fd/ symlinks
It's not a standard approach that use __get_free_page() to alloc path
buffer directly.  We'd better use kmalloc and PATH_MAX.

	PAGE_SIZE is different on different archs. An unlinked file
	with very long canonical pathname will readlink differently
	because "(deleted)" eats into a buffer.	--adobriyan

[akpm@linux-foundation.org: remove now-unneeded cast]

Link: https://lkml.kernel.org/r/Ye1fCxyZZ0I5lgOL@localhost.localdomain
Signed-off-by: Hao Lee <haolee.swjtu@gmail.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:33 -07:00
Linus Torvalds
194dfe88d6 asm-generic updates for 5.18
There are three sets of updates for 5.18 in the asm-generic tree:
 
  - The set_fs()/get_fs() infrastructure gets removed for good. This
    was already gone from all major architectures, but now we can
    finally remove it everywhere, which loses some particularly
    tricky and error-prone code.
    There is a small merge conflict against a parisc cleanup, the
    solution is to use their new version.
 
  - The nds32 architecture ends its tenure in the Linux kernel. The
    hardware is still used and the code is in reasonable shape, but
    the mainline port is not actively maintained any more, as all
    remaining users are thought to run vendor kernels that would never
    be updated to a future release.
    There are some obvious conflicts against changes to the removed
    files.
 
  - A series from Masahiro Yamada cleans up some of the uapi header
    files to pass the compile-time checks.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmI69BsACgkQmmx57+YA
 GNn/zA//f4d5VTT0ThhRxRWTu9BdThGHoB8TUcY7iOhbsWu0X/913NItRC3UeWNl
 IdmisaXgVtirg1dcC2pWUmrcHdoWOCEGfK4+Zr2NhSWfuZDWvODHK9pGWk4WLnhe
 cQgUNBvIuuAMryGtrOBwHPO4TpfCyy2ioeVP36ZfcsWXdDxTrqfaq/56mk3sxIP6
 sUTk1UEjut9NG4C9xIIvcSU50R3l6LryQE/H9kyTLtaSvfvTOvprcVYCq0GPmSzo
 DtQ1Wwa9zbJ+4EqoMiP5RrgQwWvOTg2iRByLU8ytwlX3e/SEF0uihvMv1FQbL8zG
 G8RhGUOKQSEhaBfc3lIkm8GpOVPh0uHzB6zhn7daVmAWtazRD2Nu59BMjipa+ims
 a8Z58iHH7jRAnKeEkVZqXKb1CEiUxaQx/IeVPzN4QlwMhDtwrI76LY7ZJ1zCqTGY
 ENG0yRLav1XselYBslOYXGtOEWcY5EZPWqLyWbp4P9vz2g0Fe0gZxoIOvPmNQc89
 QnfXpCt7vm/DGkyO255myu08GOLeMkisVqUIzLDB9avlym5mri7T7vk9abBa2YyO
 CRpTL5gl1/qKPWuH1UI5mvhT+sbbBE2SUHSuy84btns39ZKKKynwCtdu+hSQkKLE
 h9pV30Gf1cLTD4JAE0RWlUgOmbBLVp34loTOexQj4MrLM1noOnw=
 =vtCN
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic updates from Arnd Bergmann:
 "There are three sets of updates for 5.18 in the asm-generic tree:

   - The set_fs()/get_fs() infrastructure gets removed for good.

     This was already gone from all major architectures, but now we can
     finally remove it everywhere, which loses some particularly tricky
     and error-prone code. There is a small merge conflict against a
     parisc cleanup, the solution is to use their new version.

   - The nds32 architecture ends its tenure in the Linux kernel.

     The hardware is still used and the code is in reasonable shape, but
     the mainline port is not actively maintained any more, as all
     remaining users are thought to run vendor kernels that would never
     be updated to a future release.

   - A series from Masahiro Yamada cleans up some of the uapi header
     files to pass the compile-time checks"

* tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (27 commits)
  nds32: Remove the architecture
  uaccess: remove CONFIG_SET_FS
  ia64: remove CONFIG_SET_FS support
  sh: remove CONFIG_SET_FS support
  sparc64: remove CONFIG_SET_FS support
  lib/test_lockup: fix kernel pointer check for separate address spaces
  uaccess: generalize access_ok()
  uaccess: fix type mismatch warnings from access_ok()
  arm64: simplify access_ok()
  m68k: fix access_ok for coldfire
  MIPS: use simpler access_ok()
  MIPS: Handle address errors for accesses above CPU max virtual user address
  uaccess: add generic __{get,put}_kernel_nofault
  nios2: drop access_ok() check from __put_user()
  x86: use more conventional access_ok() definition
  x86: remove __range_not_ok()
  sparc64: add __{get,put}_kernel_nofault()
  nds32: fix access_ok() checks in get/put_user
  uaccess: fix nios2 and microblaze get_user_8()
  sparc64: fix building assembly files
  ...
2022-03-23 18:03:08 -07:00