69451 Commits

Author SHA1 Message Date
Steve French
40f077a02b cifs: clarify hostname vs ip address in /proc/fs/cifs/DebugData
/proc/fs/cifs/DebugData called the ip address for server sessions
"Name" which is confusing since it is not a hostname. Change
this field name to "Address" and for the list of servers add
new field "Hostname" which is populated from the hostname used
to connect to the server.  See below. And also don't print
[NONE] when the interface list is empty as it is not clear
what 'NONE' referred to.

Servers:
1) ConnectionId: 0x1 Hostname: localhost
Number of credits: 389 Dialect 0x311
TCP status: 1 Instance: 1
Local Users To Server: 1 SecMode: 0x1 Req On Wire: 0
In Send: 0 In MaxReq Wait: 0

	Sessions:
	1) Address: 127.0.0.1
...

Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-02-22 21:20:44 -06:00
Steve French
b438fcf128 cifs: change confusing field serverName (to ip_addr)
ses->serverName is not the server name, but the string form
of the ip address of the server.  Change the name to ip_addr
to avoid confusion (and fix the array length to match
maximum length of ipv6 address).

Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-02-22 21:20:43 -06:00
Linus Torvalds
e913a8cdc2 Fixes around VM_FPNMAP and follow_pfn
- replace mm/frame_vector.c by get_user_pages in misc/habana and
   drm/exynos drivers, then move that into media as it's sole user
 - close race in generic_access_phys
 - s390 pci ioctl fix of this series landed in 5.11 already
 - properly revoke iomem mappings (/dev/mem, pci files)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEb4nG6jLu8Y5XI+PfTA9ye/CYqnEFAmAzgywACgkQTA9ye/CY
 qnFPbA//RUHB5bD7vwnEglfJhonKSi/Vt3dNQwUI+pCFK8muWvvPyTkGXKjjT2dI
 uAOY2F23wymtIexV3fNLgnMez7kMcupOLkdxJic4GiO+HJn1jnkshdX7/dGtUW7O
 G3yfnf/D27i912tT3j6PN7dVnasAYYtndCgImM027Zigzn4ibY+02tnzd5XTj1F8
 yq8Swx88oqF8v10HxfpF3RLShqT3S17mFmd9dTv0GkZX497Pe75O44XcXzkD33Bj
 wasH2Tz8gMEQx6TNAGlJe13dzDHReh2cG0z2r+6PTA6KnaMMxbEIImHNuhWOmHb/
 nf8Jpu9uMOLzB+3hG3TzISTDBhAgPfoJ8Ov40VJCWMtCVBnyMyPJr28Oobb8Dj3V
 SXvjSVlLeobOLt+E9vAS+Rmas07LCGBdNP9sexxV7S/sveSQ5W+FptaQW03EghwA
 nBYEUC68WqpX99lJCFPmv5zmy5xkecjpU6mLHZljtV1ORzktqWZdVhmC8njHMAMY
 Hi/emnPxEX1FpOD38rr7F9KUUSsy4t/ZaCgVaLcxCcbglCHXSHC41R09p9TBRSJo
 G6Lksjyj4aa+UL5dZDAtLY0shg0bv2u93dGQNaDAC+uzj6D0ErBBzDK570zBKjp/
 75+nqezJlD0d7I6rOl6FwiEYeSrYXJxYEveKVUr8CnH6sfeBlwo=
 =lQoR
 -----END PGP SIGNATURE-----

Merge tag 'topic/iomem-mmap-vs-gup-2021-02-22' of git://anongit.freedesktop.org/drm/drm

Pull follow_pfn() updates from Daniel Vetter:
 "Fixes around VM_FPNMAP and follow_pfn:

   - replace mm/frame_vector.c by get_user_pages in misc/habana and
     drm/exynos drivers, then move that into media as it's sole user

   - close race in generic_access_phys

   - s390 pci ioctl fix of this series landed in 5.11 already

   - properly revoke iomem mappings (/dev/mem, pci files)"

* tag 'topic/iomem-mmap-vs-gup-2021-02-22' of git://anongit.freedesktop.org/drm/drm:
  PCI: Revoke mappings like devmem
  PCI: Also set up legacy files only after sysfs init
  sysfs: Support zapping of binary attr mmaps
  resource: Move devmem revoke code to resource framework
  /dev/mem: Only set filp->f_mapping
  PCI: Obey iomem restrictions for procfs mmap
  mm: Close race in generic_access_phys
  media: videobuf2: Move frame_vector into media subsystem
  mm/frame-vector: Use FOLL_LONGTERM
  misc/habana: Use FOLL_LONGTERM for userptr
  misc/habana: Stop using frame_vector helpers
  drm/exynos: Use FOLL_LONGTERM for g2d cmdlists
  drm/exynos: Stop using frame_vector helpers
2021-02-22 17:45:02 -08:00
Linus Torvalds
4b5f9254e4 kconfig for kcmp syscall
drm userspaces uses this, systemd uses this, makes sense to pull it
 out from the checkpoint-restore bundle. Kees reviewed this from
 security pov and is happy with the final version.
 
 LWN coverage: https://lwn.net/Articles/845448/
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEb4nG6jLu8Y5XI+PfTA9ye/CYqnEFAmAzaXIACgkQTA9ye/CY
 qnH5FQ//eL/7a/PDICuCRIN2p2aQwHoe9d12q+01RolAgce6F9mR9SFiKGSCR+t7
 daw4G/BaGxSYzvz1IqWbXDMhN87jAXV/IGs9k4OkSIcbnDmMY78EKMZB1c1t7AZo
 zmeAuQvmTAcBogTwC6IE9N1JwhH3fmudq4p8zZ4zLojJNSPjrwCvF/xQI/Yaw52V
 CTfni8mrjYJ+pZ1qn9XP3IceAFEEI27ubZj2DJU+5xpRJAdIAobo0XbVOf8XQ0uc
 /BRLyXFS66EDsY1wWHT6y6UXDNZgbLic0olC1aielaBJh+Wq6bQHgephxpasU5y7
 cZX7XTX2N1q8j8NmgzWLYRgERqtXv0CPHKdimTs8SaUcPDGhxcnwPR6hmdQEC+i6
 IjntWMERjfuyD+s6qVuc7s8WS7+Ry9OxgdVskHASqGpBvsSliXN1o02Am6WUuGsB
 HZxTjCe967FyL4LGU0YjobMTUUSWfYQkOBKABlvYUySNZ0ZHnSygHIWiWjC6b89A
 KmXiHJoocNfDlKwX6bf3OWQ+dGGFu2wo5wYzldIiqYJVidp50xdOosdRE1R6WwuG
 IOLCdNKdqDgtig+90/fFZ06liXZvqUdDafWgUs/g6lLquFrcq5aAIiSdR6PcPKB0
 MwfWcCglLtYrxgDHvNaqnW18yRQq2TGbe+A65aXzLPp45pKP8Hk=
 =uiSj
 -----END PGP SIGNATURE-----

Merge tag 'topic/kcmp-kconfig-2021-02-22' of git://anongit.freedesktop.org/drm/drm

Pull kcmp kconfig update from Daniel Vetter:
 "Make the kcmp syscall available independently of checkpoint/restore.

  drm userspaces uses this, systemd uses this, so makes sense to pull it
  out from the checkpoint-restore bundle.

  Kees reviewed this from security pov and is happy with the final
  version"

Link: https://lwn.net/Articles/845448/

* tag 'topic/kcmp-kconfig-2021-02-22' of git://anongit.freedesktop.org/drm/drm:
  kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE
2021-02-22 17:15:30 -08:00
Linus Torvalds
7c70f3a748 Optimization:
- Cork the socket while there are queued replies
 
 Fixes:
 
 - DRC shutdown ordering
 - svc_rdma_accept() lockdep splat
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmAsA80ACgkQM2qzM29m
 f5erXA/+MrR3ZtwK2eaTITu13TzzTrMURbp/n0wCCW/Ls1YMb6bn9ggtBwu2W5Cn
 Vb0RO9OLcmoI6CjqPh0CTUvvZspMYOAX4W1jQecKt2ml075APdlqUcv9YWPUQqVJ
 qTg8HxDymvHvY3I3FcBxhzofmGzF8AOmQZJw9uI5Wt/ivBfqGWcAGlxyRmB3mdsm
 cJRK0Sy7QMn2LefMcpMEeSbPA049/NZNRp6fcXnpPQFer42thoosYsNhTlAJfCXC
 C5S0z3/T6rpuJucV9la/WkpUA0YhWbPEHWNdAB5tzSqmoEo4LpzJzjv7uyQU4oue
 QlmChIz9qasgTI/BnCkBIzPD99S4UQcXjX0BnNinkQ77e6+b/vdAR+T+NLHJdkAf
 +7Xz6T9aZNaz2R49CjYl6/kG0rlNkjUzyURRYs/9zEBhogMPH/N4T7Z2M+ljCkeb
 tc3OaFDXZ2rfr7EKBGsfnEKINM1gpYipzILkr8GSHUMZLzOB/64upKySaJVjCGXj
 7Sf1w+vJUWwYc+FqFvbaR4ybr01VIfdsecpn1TtY870zG1JzimzAHVZk1/xC9+CX
 J+lVOXbjawDl1Et3V3fWq6Y7mhAWves/NKPcbSug9sFc4qRHEmPbAq/RRtlsjQcn
 foMr5R8qd8OwEamVypZ2nIFxq4q3b742AS8lZhaK+DyZKq3oLac=
 =+R4U
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull more nfsd updates from Chuck Lever:
 "Here are a few additional NFSD commits for the merge window:

 Optimization:
   - Cork the socket while there are queued replies

  Fixes:
   - DRC shutdown ordering
   - svc_rdma_accept() lockdep splat"

* tag 'nfsd-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Further clean up svc_tcp_sendmsg()
  SUNRPC: Remove redundant socket flags from svc_tcp_sendmsg()
  SUNRPC: Use TCP_CORK to optimise send performance on the server
  svcrdma: Hold private mutex while invoking rdma_accept()
  nfsd: register pernet ops last, unregister first
2021-02-22 13:29:55 -08:00
Linus Torvalds
20bf195e93 With netfs helper library and fscache rework delayed, just a few cap
handling improvements to avoid grabbing mmap_lock in some code paths
 and deal with capsnaps better and a mount option cleanup.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmAzuGwTHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzixPqB/9kQxU8IkCF0wOm+dm0tBW3PjYxBFuz
 HryHU6WJHDbX9/enH6PgMj6ZpRwxgzDq8xUpmRKVeaPflej9PnfQyH/On+vQWRUX
 WyWyBx0QqbrKYvYK0cCjHzVC5kbtBA8C/1OSSs5EkJIh518RBMkeru9pYL7+TI5x
 zeQVXzOJB2Bz7y8Odd2RjlkAkix/J1m0LIggRaoWrTygz93PKXfjzhDpa4KC4WZj
 W6LjnYPpYjo34poKx/3N3ZSgGP+Y3F7ZDeNfSnPB2WKs7vzcYUCpWXBSHnHTz+lK
 H2O5GdmxQ6BFp4SZvYtf5e78igH/m/QmzAYGW2EmmKttOcyrb2282snb
 =8MQu
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-5.12-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "With netfs helper library and fscache rework delayed, just a few cap
  handling improvements to avoid grabbing mmap_lock in some code paths
  and deal with capsnaps better and a mount option cleanup"

* tag 'ceph-for-5.12-rc1' of git://github.com/ceph/ceph-client:
  ceph: defer flushing the capsnap if the Fb is used
  libceph: remove osdtimeout option entirely
  libceph: deprecate [no]cephx_require_signatures options
  ceph: allow queueing cap/snap handling after putting cap references
  ceph: clean up inode work queueing
  ceph: fix flush_snap logic after putting caps
2021-02-22 13:27:51 -08:00
Linus Torvalds
9fe1904626 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmAzo34ACgkQnJ2qBz9k
 QNmsUwf8DFq8Uu2PI2BFOzHkEG6F3y+/KPpja9k08q3A1NSM28RYBaFeWc9wGImZ
 jtu3k1+8TiK51OkYGxa5LeIKpaMZrylEGXhdYTyfBJiJSHrjApWiq1jsCvtxk/xt
 3pjI9+OItwmZVo/INYAWS8+QdweX87PkaZtKi0//pqgFdnsjMCKDUxkCIB3IEigk
 I7orTiBpTSgP3iwcuRhchyyCFjIeoW+L2nbNuv8CYjXo9WIAF5ypQx+r1T2f1Ieu
 Vt9u41gwRUYfn3a5YdKMJZgAkcv7a4QYP4+tbSnD9Wl3jtorCBgTC6EDUyGNWqdr
 lqRIJ0jp1ET387J/YAGCGFsdz1AIjw==
 =YTNN
 -----END PGP SIGNATURE-----

Merge tag 'fs_for_v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull isofs, udf, and quota updates from Jan Kara:
 "Several udf, isofs, and quota fixes"

* tag 'fs_for_v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  parser: Fix kernel-doc markups
  udf: handle large user and group ID
  isofs: handle large user and group ID
  parser: add unsigned int parser
  udf: fix silent AED tagLocation corruption
  isofs: release buffer head before return
  quota: Fix memory leak when handling corrupted quota file
2021-02-22 13:25:37 -08:00
Linus Torvalds
db99038542 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmAzolwACgkQnJ2qBz9k
 QNnwAQgAhw7PYZgRGnhm/VEDBD1EiPqNIhV3+EuUcNHlrNERx0jPN3bcoXmJD7FE
 PCCwbsYtQyqjYFipuzvBnUur5s7CxrwyDhvE8bgYdOB43Gy94awwvwF+JbMnBaPj
 gZSvArKD7ISAUpt560jtH5KedNAZnDkPITePME2GQsOpZ9SHHjsJEhSheTaHk0t1
 03Odx6gK5CcRvRD4KQYTa/hvZH95dVHSdakgFODAUoyfR65KMLhBihNOVHZsEVEZ
 S99j0YBY15nxS8ygo+Iz3Y3KOzy9G1xRAzk3wSeDGzhNRfzYP/IIZWWY/KWowmvH
 afx0pa0KiYjgqDpDjsyYPOJ2Ul4cPA==
 =gXlh
 -----END PGP SIGNATURE-----

Merge tag 'fsnotify_for_v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fsnotify update from Jan Kara:
 "Make inotify groups be charged against appropriate memcgs"

* tag 'fsnotify_for_v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  inotify, memcg: account inotify instances to kmemcg
2021-02-22 13:23:29 -08:00
Linus Torvalds
d61c6a58ae \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmAzoWUACgkQnJ2qBz9k
 QNnFgQgAlng0JOzeCQvLpwweqFl0FCxYbOsZXC1xDyvfX3TiA6A6oiOR4tx3uhQN
 cOQmJXaiMn4oCXjD1j6WZwGfy23yx0XchaoFK9jy2IqodaB/zUjkiWYYqt0G3XIX
 ud35mxjLAGS12BCD0c+vHy2RMsUFl5ep+5aBHRHZJJhCcYbl7e5ctXZ3xB1Q0mgI
 r639gD8JhH3ICdu9W0NaMvqOrVhJFNmhSGATKL/N96+oKub2x2ycYE4L2OXegxy3
 mnFf26LjA8jt7K+KfHloTvkC6D4HVnnvKFvKiIbGKafiWhAE7q57ZO6BPCMajGue
 3UHIhWGmwKXRU72+nW6N+089GbcO/g==
 =1e+z
 -----END PGP SIGNATURE-----

Merge tag 'lazytime_for_v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull lazytime updates from Jan Kara:
 "Cleanups of the lazytime handling in the writeback code making rules
  for calling ->dirty_inode() filesystem handlers saner"

* tag 'lazytime_for_v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext4: simplify i_state checks in __ext4_update_other_inode_time()
  gfs2: don't worry about I_DIRTY_TIME in gfs2_fsync()
  fs: improve comments for writeback_single_inode()
  fs: drop redundant check from __writeback_single_inode()
  fs: clean up __mark_inode_dirty() a bit
  fs: pass only I_DIRTY_INODE flags to ->dirty_inode
  fs: don't call ->dirty_inode for lazytime timestamp updates
  fat: only specify I_DIRTY_TIME when needed in fat_update_time()
  fs: only specify I_DIRTY_TIME when needed in generic_update_time()
  fs: correctly document the inode dirty flags
2021-02-22 13:17:39 -08:00
Linus Torvalds
c63dca9e23 Description for this pull request:
- Improve file deletion performance with dirsync mount option.
 - fix shift-out-of-bounds in exfat_fill_super() generated by syzkaller.
 -----BEGIN PGP SIGNATURE-----
 
 iQJMBAABCgA2FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmAzA7MYHG5hbWphZS5q
 ZW9uQHNhbXN1bmcuY29tAAoJEGcL+wNRRCEIYKMP/1i4uV1KebeQTkIiNlPMae+H
 hq2TgkrOLp7gKY+G8VoSrYIG9U5fEGh+bo6UwIXQpGDrnFb9AWAE0YoyCUV0sTGS
 6CQmmjl1Cq4gwmBs2yjkbmGa54d2NRExZc2zMewDxnWRQjIvqJftxh1qowjwN13j
 arfjvWGJ61Je/S1jQ0vS0+nC5f4Mcy+60JyAQB4gxX5327o1ZVohs16jYTTZuS4+
 PsARwM+q3wYQ/THeTf01E/a77IfsS49EtR1aTo8/9fCfQCxKdYT0wkJQQ/J0yhTL
 l6LMp1PNHV3iePRPZWAF15n8eB2dXGfcaathgtecY+9arKJVDnAoALt91gxN1bxE
 ZZZWkOlviN4FmdQvKNtVaBhdAxNqybe1P0yLvH1Vm2nY0oyY+qgQ79A/xNWcBvhg
 Q6Y1fV08KLWRPsafoZVntb3pd/DT1ekeedrFuqPe3c6kTziFn1GmUhb+6DGCEoFs
 2sbuxATOJCdNL4GHLt0CRNQYU2Dm6g0yuQ3qodu4wcco9yUgiNXe2sM6NpMXaTH7
 XUlgsaxl8zO5TKrVfre93gIP48HRf2/gZu8ejXs2rUEIzhSy3rsIMyt2DhTUe9Db
 CW2SMGQADG9Aiod4gRaunvMrTE3S7NMXLsOG5K6eThar8fOnqQ/F0NHi5DUWuFdw
 17G5gwimXbZxW0GhTvWH
 =kzJV
 -----END PGP SIGNATURE-----

Merge tag 'exfat-for-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat

Pull exfat updates from Namjae Jeon:

 - improve file deletion performance with dirsync mount option

 - fix shift-out-of-bounds in exfat_fill_super() reported by syzkaller

* tag 'exfat-for-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: improve performance of exfat_free_cluster when using dirsync mount option
  exfat: fix shift-out-of-bounds in exfat_fill_super()
2021-02-22 13:15:50 -08:00
Linus Torvalds
0f3d950ddd zonefs changes for 5.12-rc1
Two patches in this pull request:
 * A fix that did not make it in time for 5.11, to correct the file size
   initialization of full sequential zone, from Shin'ichiro
 * Add file operation tracepoints to help with debugging, from Johannes
 
 Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCYDLicAAKCRDdoc3SxdoY
 dpN4AQCh1EbZ3NvHRJ4bMWnNQ3OsdkTWYix7ur3C/5Ft7oQbKQEAge6cUgIjEkrD
 u3znsZSYE/RM+LxrhE1RquTwERkSeQk=
 =StDj
 -----END PGP SIGNATURE-----

Merge tag 'zonefs-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs

Pull zonefs updates from Damien Le Moal:
 "Two changes:

   - A fix that did not make it in time for 5.11, to correct the file
     size initialization of full sequential zone, from Shin'ichiro

   - Add file operation tracepoints to help with debugging, from
     Johannes"

* tag 'zonefs-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: Fix file size of zones in full condition
  zonefs: add tracepoints for file operations
2021-02-22 13:13:51 -08:00
Linus Torvalds
250a25e7a1 Merge branch 'work.audit' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull RCU-safe common_lsm_audit() from Al Viro:
 "Make common_lsm_audit() non-blocking and usable from RCU pathwalk
  context.

  We don't really need to grab/drop dentry in there - rcu_read_lock() is
  enough. There's a couple of followups using that to simplify the
  logics in selinux, but those hadn't soaked in -next yet, so they'll
  have to go in next window"

* 'work.audit' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  make dump_common_audit_data() safe to be called from RCU pathwalk
  new helper: d_find_alias_rcu()
2021-02-22 13:05:30 -08:00
Linus Torvalds
205f92d7f2 Merge branch 'work.d_name' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull d_name whack-a-mole from Al Viro:
 "A bunch of places that play with ->d_name in printks instead of using
  proper formats..."

* 'work.d_name' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  orangefs_file_mmap(): use %pD
  cifs_debug: use %pd instead of messing with ->d_name
  erofs: use %pd instead of messing with ->d_name
  cramfs: use %pD instead of messing with file_dentry()->d_name
2021-02-22 13:03:30 -08:00
Andreas Gruenbacher
2129b42888 gfs2: Per-revoke accounting in transactions
In the log, revokes are stored as a revoke descriptor (struct
gfs2_log_descriptor), followed by zero or more additional revoke blocks
(struct gfs2_meta_header).  On filesystems with a blocksize of 4k, the
revoke descriptor contains up to 503 revokes, and the metadata blocks
contain up to 509 revokes each.  We've so far been reserving space for
revokes in transactions in block granularity, so a lot more space than
necessary was being allocated and then released again.

This patch switches to assigning revokes to transactions individually
instead.  Initially, space for the revoke descriptor is reserved and
handed out to transactions.  When more revokes than that are reserved,
additional revoke blocks are added.  When the log is flushed, the space
for the additional revoke blocks is released, but we keep the space for
the revoke descriptor block allocated.

Transactions may still reserve more revokes than they will actually need
in the end, but now we won't overshoot the target as much, and by only
returning the space for excess revokes at log flush time, we further
reduce the amount of contention between processes.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-02-22 21:16:23 +01:00
Andreas Gruenbacher
fe3e397668 gfs2: Rework the log space allocation logic
The current log space allocation logic is hard to understand or extend.
The principle it that when the log is flushed, we may or may not have a
transaction active that has space allocated in the log.  To deal with
that, we set aside a magical number of blocks to be used in case we
don't have an active transaction.  It isn't clear that the pool will
always be big enough.  In addition, we can't return unused log space at
the end of a transaction, so the number of blocks allocated must exactly
match the number of blocks used.

Simplify this as follows:
 * When transactions are allocated or merged, always reserve enough
   blocks to flush the transaction (err on the safe side).
 * In gfs2_log_flush, return any allocated blocks that haven't been used.
 * Maintain a pool of spare blocks big enough to do one log flush, as
   before.
 * In gfs2_log_flush, when we have no active transaction, allocate a
   suitable number of blocks.  For that, use the spare pool when
   called from logd, and leave the pool alone otherwise.  This means
   that when the log is almost full, logd will still be able to do one
   more log flush, which will result in more log space becoming
   available.

This will make the log space allocator code easier to work with in
the future.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-02-22 21:16:22 +01:00
Andreas Gruenbacher
71b219f4e5 gfs2: Minor calc_reserved cleanup
No functional change.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-02-22 21:16:22 +01:00
Linus Torvalds
0e63a5c6ba It has been a relatively quiet cycle in docsland.
- As promised, the minimum Sphinx version to build the docs is now 1.7,
    and we have dropped support for Python 2 entirely.  That allowed the
    removal of a bunch of compatibility code.
 
  - A set of treewide warning fixups from Mauro that I applied after it
    became clear nobody else was going to deal with them.
 
  - The automarkup mechanism can now create cross-references from relative
    paths to RST files.
 
  - More translations, typo fixes, and warning fixes.
 
 No conflicts with any other tree as far as I know.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmAq4EUPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YTIAH/1I5MlVQwuvNKjwCAEdmltQgHv6SmXSpDkrp
 SGuviWVXxqz8dTyo7C2R12qE/7nP8zGAmclNdX78ynl5qWaj05lQsjBgMYSoQO/F
 +akyLQSL8/8SQrtDPPBcboPuIz9DzkX51kkQthvCf0puJi0ScKVHO9Sk9SKUgDoK
 cnCE9VwpGL7YX/ee2wt91UYREijgJ9P7eQ6rqKvUZ5Itu9ikfu9vQU41GR9tOXDK
 MQK+k38pWdl8wRgTgA0pkVhMf1G732bxTTicvFHXcyqmCkh7++m2+ysT8O+SBBMX
 e5BbP0yysSqThjwFHOW5PWM1AWD5iVz+pnwJwEaJ4K76tJJOw9M=
 =bcDk
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.12' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It has been a relatively quiet cycle in docsland.

   - As promised, the minimum Sphinx version to build the docs is now
     1.7, and we have dropped support for Python 2 entirely. That
     allowed the removal of a bunch of compatibility code.

   - A set of treewide warning fixups from Mauro that I applied after it
     became clear nobody else was going to deal with them.

   - The automarkup mechanism can now create cross-references from
     relative paths to RST files.

   - More translations, typo fixes, and warning fixes"

* tag 'docs-5.12' of git://git.lwn.net/linux: (75 commits)
  docs: kernel-hacking: be more civil
  docs: Remove the Microsoft rhetoric
  Documentation/admin-guide: kernel-parameters: Update nohlt section
  doc/admin-guide: fix spelling mistake: "perfomance" -> "performance"
  docs: Document cross-referencing using relative path
  docs: Enable usage of relative paths to docs on automarkup
  docs: thermal: fix spelling mistakes
  Documentation: admin-guide: Update kvm/xen config option
  docs: Make syscalls' helpers naming consistent
  coding-style.rst: Avoid comma statements
  Documentation: /proc/loadavg: add 3 more field descriptions
  Documentation/submitting-patches: Add blurb about backtraces in commit messages
  Docs: drop Python 2 support
  Move our minimum Sphinx version to 1.7
  Documentation: input: define ABS_PRESSURE/ABS_MT_PRESSURE resolution as grams
  scripts/kernel-doc: add internal hyperlink to DOC: sections
  Update Documentation/admin-guide/sysctl/fs.rst
  docs: Update DTB format references
  docs: zh_CN: add iio index.rst translation
  docs/zh_CN: add iio ep93xx_adc.rst translation
  ...
2021-02-22 10:57:46 -08:00
Johannes Thumshirn
6e37d24599 btrfs: zoned: fix deadlock on log sync
Lockdep with fstests test case btrfs/041 detected a unsafe locking
scenario when we allocate the log node on a zoned filesystem.

btrfs/041
 ============================================
 WARNING: possible recursive locking detected
 5.11.0-rc7+ #939 Not tainted
 --------------------------------------------
 xfs_io/698 is trying to acquire lock:
 ffff88810cd673a0 (&root->log_mutex){+.+.}-{3:3}, at: btrfs_sync_log+0x3d1/0xee0 [btrfs]

 but task is already holding lock:
 ffff88810b0fc3a0 (&root->log_mutex){+.+.}-{3:3}, at: btrfs_sync_log+0x313/0xee0 [btrfs]

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&root->log_mutex);
   lock(&root->log_mutex);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 2 locks held by xfs_io/698:
  #0: ffff88810cd66620 (sb_internal){.+.+}-{0:0}, at: btrfs_sync_file+0x2c3/0x570 [btrfs]
  #1: ffff88810b0fc3a0 (&root->log_mutex){+.+.}-{3:3}, at: btrfs_sync_log+0x313/0xee0 [btrfs]

 stack backtrace:
 CPU: 0 PID: 698 Comm: xfs_io Not tainted 5.11.0-rc7+ #939
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
 Call Trace:
  dump_stack+0x77/0x97
  __lock_acquire.cold+0xb9/0x32a
  lock_acquire+0xb5/0x400
  ? btrfs_sync_log+0x3d1/0xee0 [btrfs]
  __mutex_lock+0x7b/0x8d0
  ? btrfs_sync_log+0x3d1/0xee0 [btrfs]
  ? btrfs_sync_log+0x3d1/0xee0 [btrfs]
  ? find_first_extent_bit+0x9f/0x100 [btrfs]
  ? __mutex_unlock_slowpath+0x35/0x270
  btrfs_sync_log+0x3d1/0xee0 [btrfs]
  btrfs_sync_file+0x3a8/0x570 [btrfs]
  __x64_sys_fsync+0x34/0x60
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

This happens, because we are taking the ->log_mutex albeit it has already
been locked.

Also while at it, fix the bogus unlock of the tree_log_mutex in the error
handling.

Fixes: 3ddebf27fcd3 ("btrfs: zoned: reorder log node allocation on zoned filesystem")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 18:08:48 +01:00
Josef Bacik
95c85fba1f btrfs: avoid double put of block group when emptying cluster
It's wrong calling btrfs_put_block_group in
__btrfs_return_cluster_to_free_space if the block group passed is
different than the block group the cluster represents. As this means the
cluster doesn't have a reference to the passed block group. This results
in double put and a use-after-free bug.

Fix this by simply bailing if the block group we passed in does not
match the block group on the cluster.

Fixes: fa9c0d795f7b ("Btrfs: rework allocation clustering")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 18:07:45 +01:00
Filipe Manana
3660d0bcdb btrfs: fix stale data exposure after cloning a hole with NO_HOLES enabled
When using the NO_HOLES feature, if we clone a file range that spans only
a hole into a range that is at or beyond the current i_size of the
destination file, we end up not setting the full sync runtime flag on the
inode. As a result, if we then fsync the destination file and have a power
failure, after log replay we can end up exposing stale data instead of
having a hole for that range.

The conditions for this to happen are the following:

1) We have a file with a size of, for example, 1280K;

2) There is a written (non-prealloc) extent for the file range from 1024K
   to 1280K with a length of 256K;

3) This particular file extent layout is durably persisted, so that the
   existing superblock persisted on disk points to a subvolume root where
   the file has that exact file extent layout and state;

4) The file is truncated to a smaller size, to an offset lower than the
   start offset of its last extent, for example to 800K. The truncate sets
   the full sync runtime flag on the inode;

6) Fsync the file to log it and clear the full sync runtime flag;

7) Clone a region that covers only a hole (implicit hole due to NO_HOLES)
   into the file with a destination offset that starts at or beyond the
   256K file extent item we had - for example to offset 1024K;

8) Since the clone operation does not find extents in the source range,
   we end up in the if branch at the bottom of btrfs_clone() where we
   punch a hole for the file range starting at offset 1024K by calling
   btrfs_replace_file_extents(). There we end up not setting the full
   sync flag on the inode, because we don't know we are being called in
   a clone context (and not fallocate's punch hole operation), and
   neither do we create an extent map to represent a hole because the
   requested range is beyond eof;

9) A further fsync to the file will be a fast fsync, since the clone
   operation did not set the full sync flag, and therefore it relies on
   modified extent maps to correctly log the file layout. But since
   it does not find any extent map marking the range from 1024K (the
   previous eof) to the new eof, it does not log a file extent item
   for that range representing the hole;

10) After a power failure no hole for the range starting at 1024K is
   punched and we end up exposing stale data from the old 256K extent.

Turning this into exact steps:

  $ mkfs.btrfs -f -O no-holes /dev/sdi
  $ mount /dev/sdi /mnt

  # Create our test file with 3 extents of 256K and a 256K hole at offset
  # 256K. The file has a size of 1280K.
  $ xfs_io -f -s \
              -c "pwrite -S 0xab -b 256K 0 256K" \
              -c "pwrite -S 0xcd -b 256K 512K 256K" \
              -c "pwrite -S 0xef -b 256K 768K 256K" \
              -c "pwrite -S 0x73 -b 256K 1024K 256K" \
              /mnt/sdi/foobar

  # Make sure it's durably persisted. We want the last committed super
  # block to point to this particular file extent layout.
  sync

  # Now truncate our file to a smaller size, falling within a position of
  # the second extent. This sets the full sync runtime flag on the inode.
  # Then fsync the file to log it and clear the full sync flag from the
  # inode. The third extent is no longer part of the file and therefore
  # it is not logged.
  $ xfs_io -c "truncate 800K" -c "fsync" /mnt/foobar

  # Now do a clone operation that only clones the hole and sets back the
  # file size to match the size it had before the truncate operation
  # (1280K).
  $ xfs_io \
        -c "reflink /mnt/foobar 256K 1024K 256K" \
        -c "fsync" \
        /mnt/foobar

  # File data before power failure:
  $ od -A d -t x1 /mnt/foobar
  0000000 ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
  *
  0262144 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  0524288 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
  *
  0786432 ef ef ef ef ef ef ef ef ef ef ef ef ef ef ef ef
  *
  0819200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  1310720

  <power fail>

  # Mount the fs again to replay the log tree.
  $ mount /dev/sdi /mnt

  # File data after power failure:
  $ od -A d -t x1 /mnt/foobar
  0000000 ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
  *
  0262144 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  0524288 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
  *
  0786432 ef ef ef ef ef ef ef ef ef ef ef ef ef ef ef ef
  *
  0819200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  1048576 73 73 73 73 73 73 73 73 73 73 73 73 73 73 73 73
  *
  1310720

The range from 1024K to 1280K should correspond to a hole but instead it
points to stale data, to the 256K extent that should not exist after the
truncate operation.

The issue does not exists when not using NO_HOLES, because for that case
we use file extent items to represent holes, these are found and copied
during the loop that iterates over extents at btrfs_clone(), and that
causes btrfs_replace_file_extents() to be called with a non-NULL
extent_info argument and therefore set the full sync runtime flag on the
inode.

So fix this by making the code that deals with a trailing hole during
cloning, at btrfs_clone(), to set the full sync flag on the inode, if the
range starts at or beyond the current i_size.

A test case for fstests will follow soon.

Backporting notes: for kernel 5.4 the change goes to ioctl.c into
btrfs_clone before the last call to btrfs_punch_hole_range.

CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 18:07:45 +01:00
Josef Bacik
1119a72e22 btrfs: tree-checker: do not error out if extent ref hash doesn't match
The tree checker checks the extent ref hash at read and write time to
make sure we do not corrupt the file system.  Generally extent
references go inline, but if we have enough of them we need to make an
item, which looks like

key.objectid	= <bytenr>
key.type	= <BTRFS_EXTENT_DATA_REF_KEY|BTRFS_TREE_BLOCK_REF_KEY>
key.offset	= hash(tree, owner, offset)

However if key.offset collide with an unrelated extent reference we'll
simply key.offset++ until we get something that doesn't collide.
Obviously this doesn't match at tree checker time, and thus we error
while writing out the transaction.  This is relatively easy to
reproduce, simply do something like the following

  xfs_io -f -c "pwrite 0 1M" file
  offset=2

  for i in {0..10000}
  do
	  xfs_io -c "reflink file 0 ${offset}M 1M" file
	  offset=$(( offset + 2 ))
  done

  xfs_io -c "reflink file 0 17999258914816 1M" file
  xfs_io -c "reflink file 0 35998517829632 1M" file
  xfs_io -c "reflink file 0 53752752058368 1M" file

  btrfs filesystem sync

And the sync will error out because we'll abort the transaction.  The
magic values above are used because they generate hash collisions with
the first file in the main subvol.

The fix for this is to remove the hash value check from tree checker, as
we have no idea which offset ours should belong to.

Reported-by: Tuomas Lähdekorpi <tuomas.lahdekorpi@gmail.com>
Fixes: 0785a9aacf9d ("btrfs: tree-checker: Add EXTENT_DATA_REF check")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add comment]
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 18:07:44 +01:00
Filipe Manana
dd0734f2a8 btrfs: fix race between swap file activation and snapshot creation
When creating a snapshot we check if the current number of swap files, in
the root, is non-zero, and if it is, we error out and warn that we can not
create the snapshot because there are active swap files.

However this is racy because when a task started activation of a swap
file, another task might have started already snapshot creation and might
have seen the counter for the number of swap files as zero. This means
that after the swap file is activated we may end up with a snapshot of the
same root successfully created, and therefore when the first write to the
swap file happens it has to fall back into COW mode, which should never
happen for active swap files.

Basically what can happen is:

1) Task A starts snapshot creation and enters ioctl.c:create_snapshot().
   There it sees that root->nr_swapfiles has a value of 0 so it continues;

2) Task B enters btrfs_swap_activate(). It is not aware that another task
   started snapshot creation but it did not finish yet. It increments
   root->nr_swapfiles from 0 to 1;

3) Task B checks that the file meets all requirements to be an active
   swap file - it has NOCOW set, there are no snapshots for the inode's
   root at the moment, no file holes, no reflinked extents, etc;

4) Task B returns success and now the file is an active swap file;

5) Task A commits the transaction to create the snapshot and finishes.
   The swap file's extents are now shared between the original root and
   the snapshot;

6) A write into an extent of the swap file is attempted - there is a
   snapshot of the file's root, so we fall back to COW mode and therefore
   the physical location of the extent changes on disk.

So fix this by taking the snapshot lock during swap file activation before
locking the extent range, as that is the order in which we lock these
during buffered writes.

Fixes: ed46ff3d42378 ("Btrfs: support swap files")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 18:07:35 +01:00
Filipe Manana
195a49eaf6 btrfs: fix race between writes to swap files and scrub
When we active a swap file, at btrfs_swap_activate(), we acquire the
exclusive operation lock to prevent the physical location of the swap
file extents to be changed by operations such as balance and device
replace/resize/remove. We also call there can_nocow_extent() which,
among other things, checks if the block group of a swap file extent is
currently RO, and if it is we can not use the extent, since a write
into it would result in COWing the extent.

However we have no protection against a scrub operation running after we
activate the swap file, which can result in the swap file extents to be
COWed while the scrub is running and operating on the respective block
group, because scrub turns a block group into RO before it processes it
and then back again to RW mode after processing it. That means an attempt
to write into a swap file extent while scrub is processing the respective
block group, will result in COWing the extent, changing its physical
location on disk.

Fix this by making sure that block groups that have extents that are used
by active swap files can not be turned into RO mode, therefore making it
not possible for a scrub to turn them into RO mode. When a scrub finds a
block group that can not be turned to RO due to the existence of extents
used by swap files, it proceeds to the next block group and logs a warning
message that mentions the block group was skipped due to active swap
files - this is the same approach we currently use for balance.

Fixes: ed46ff3d42378 ("Btrfs: support swap files")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 18:07:15 +01:00
Filipe Manana
20903032cd btrfs: avoid checking for RO block group twice during nocow writeback
During the nocow writeback path, we currently iterate the rbtree of block
groups twice: once for checking if the target block group is RO with the
call to btrfs_extent_readonly()), and once again for getting a nocow
reference on the block group with a call to btrfs_inc_nocow_writers().

Since btrfs_inc_nocow_writers() already returns false when the target
block group is RO, remove the call to btrfs_extent_readonly(). Not only
we avoid searching the blocks group rbtree twice, it also helps reduce
contention on the lock that protects it (specially since it is a spin
lock and not a read-write lock). That may make a noticeable difference
on very large filesystems, with thousands of allocated block groups.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 17:15:55 +01:00
Nikolay Borisov
3c17916510 btrfs: fix race between extent freeing/allocation when using bitmaps
During allocation the allocator will try to allocate an extent using
cluster policy. Once the current cluster is exhausted it will remove the
entry under btrfs_free_cluster::lock and subsequently acquire
btrfs_free_space_ctl::tree_lock to dispose of the already-deleted entry
and adjust btrfs_free_space_ctl::total_bitmap. This poses a problem
because there exists a race condition between removing the entry under
one lock and doing the necessary accounting holding a different lock
since extent freeing only uses the 2nd lock. This can result in the
following situation:

T1:                                    T2:
btrfs_alloc_from_cluster               insert_into_bitmap <holds tree_lock>
 if (entry->bytes == 0)                   if (block_group && !list_empty(&block_group->cluster_list)) {
    rb_erase(entry)

 spin_unlock(&cluster->lock);
   (total_bitmaps is still 4)           spin_lock(&cluster->lock);
                                         <doesn't find entry in cluster->root>
 spin_lock(&ctl->tree_lock);             <goes to new_bitmap label, adds
<blocked since T2 holds tree_lock>       <a new entry and calls add_new_bitmap>
					    recalculate_thresholds  <crashes,
                                              due to total_bitmaps
					      becoming 5 and triggering
					      an ASSERT>

To fix this ensure that once depleted, the cluster entry is deleted when
both cluster lock and tree locks are held in the allocator (T1), this
ensures that even if there is a race with a concurrent
insert_into_bitmap call it will correctly find the entry in the cluster
and add the new space to it.

CC: <stable@vger.kernel.org> # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 17:15:31 +01:00
Qu Wenruo
04d4ba4c90 btrfs: make check_compressed_csum() to be subpage compatible
Currently check_compressed_csum() completely relies on sectorsize ==
PAGE_SIZE to do checksum verification for compressed extents.

To make it subpage compatible, this patch will:
- Do extra calculation for the csum range
  Since we have multiple sectors inside a page, we need to only hash
  the range we want, not the full page anymore.

- Do sector-by-sector hash inside the page

With this patch and previous conversion on
btrfs_submit_compressed_read(), now we can read subpage compressed
extents properly, and do proper csum verification.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 17:15:27 +01:00
Qu Wenruo
be6a13613f btrfs: make btrfs_submit_compressed_read() subpage compatible
For compressed read, we always submit page read using page size.  This
doesn't work well with subpage, as for subpage one page can contain
several sectors.  Such submission will read range out of what we want,
and cause problems.

Thankfully to make it subpage compatible, we only need to change how the
last page of the compressed extent is read.

Instead of always adding a full page to the compressed read bio, if we're
at the last page, calculate the size using compressed length, so that we
only add part of the range into the compressed read bio.

Since we are here, also change the PAGE_SIZE used in
lookup_extent_mapping() to sectorsize.
This modification won't cause any functional change, as
lookup_extent_mapping() can handle the case where the search range is
larger than found extent range.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 17:15:25 +01:00
Ira Weiny
d70cef0d46 btrfs: fix raid6 qstripe kmap
When a qstripe is required an extra page is allocated and mapped.  There
were 3 problems:

1) There is no corresponding call of kunmap() for the qstripe page.
2) There is no reason to map the qstripe page more than once if the
   number of bits set in rbio->dbitmap is greater than one.
3) There is no reason to map the parity page and unmap it each time
   through the loop.

The page memory can continue to be reused with a single mapping on each
iteration by raid6_call.gen_syndrome() without remapping.  So map the
page for the duration of the loop.

Similarly, improve the algorithm by mapping the parity page just 1 time.

Fixes: 5a6ac9eacb49 ("Btrfs, raid56: support parity scrub on raid56")
CC: stable@vger.kernel.org # 4.4.x: c17af96554a8: btrfs: raid56: simplify tracking of Q stripe presence
CC: stable@vger.kernel.org # 4.4.x
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22 17:15:21 +01:00
Pavel Begunkov
8e5c66c485 io_uring: clear request count when freeing caches
BUG: KASAN: double-free or invalid-free in io_req_caches_free.constprop.0+0x3ce/0x530 fs/io_uring.c:8709

Workqueue: events_unbound io_ring_exit_work
Call Trace:
 [...]
 __cache_free mm/slab.c:3424 [inline]
 kmem_cache_free_bulk+0x4b/0x1b0 mm/slab.c:3744
 io_req_caches_free.constprop.0+0x3ce/0x530 fs/io_uring.c:8709
 io_ring_ctx_free fs/io_uring.c:8764 [inline]
 io_ring_exit_work+0x518/0x6b0 fs/io_uring.c:8846
 process_one_work+0x98d/0x1600 kernel/workqueue.c:2275
 worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
 kthread+0x3b1/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

Freed by task 11900:
 [...]
 kmem_cache_free_bulk+0x4b/0x1b0 mm/slab.c:3744
 io_req_caches_free.constprop.0+0x3ce/0x530 fs/io_uring.c:8709
 io_uring_flush+0x483/0x6e0 fs/io_uring.c:9237
 filp_close+0xb4/0x170 fs/open.c:1286
 close_files fs/file.c:403 [inline]
 put_files_struct fs/file.c:418 [inline]
 put_files_struct+0x1d0/0x350 fs/file.c:415
 exit_files+0x7e/0xa0 fs/file.c:435
 do_exit+0xc27/0x2ae0 kernel/exit.c:820
 do_group_exit+0x125/0x310 kernel/exit.c:922
 [...]

io_req_caches_free() doesn't zero submit_state->free_reqs, so io_uring
considers just freed requests to be good and sound and will reuse or
double free them. Zero the counter.

Reported-by: syzbot+30b4936dcdb3aafa4fb4@syzkaller.appspotmail.com
Fixes: 41be53e94fb04 ("io_uring: kill cached requests from exiting task closing the ring")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-22 06:31:31 -07:00
Hyeongseok Kim
f728760aa9 exfat: improve performance of exfat_free_cluster when using dirsync mount option
There are stressful update of cluster allocation bitmap when using
dirsync mount option which is doing sync buffer on every cluster bit
clearing. This could result in performance degradation when deleting
big size file.
Fix to update only when the bitmap buffer index is changed would make
less disk access, improving performance especially for truncate operation.

Testing with Samsung 256GB sdcard, mounted with dirsync option
(mount -t exfat /dev/block/mmcblk0p1 /temp/mount -o dirsync)

Remove 4GB file, blktrace result.
[Before] : 39 secs.
Total (blktrace):
 Reads Queued:      0,        0KiB   Writes Queued:      32775,    16387KiB
 Read Dispatches:   0,        0KiB   Write Dispatches:   32775,    16387KiB
 Reads Requeued:    0                Writes Requeued:        0
 Reads Completed:   0,        0KiB   Writes Completed:   32775,    16387KiB
 Read Merges:       0,        0KiB   Write Merges:           0,        0KiB
 IO unplugs:        2                Timer unplugs:          0

[After] : 1 sec.
Total (blktrace):
 Reads Queued:      0,        0KiB   Writes Queued:         13,        6KiB
 Read Dispatches:   0,        0KiB   Write Dispatches:      13,        6KiB
 Reads Requeued:    0                Writes Requeued:        0
 Reads Completed:   0,        0KiB   Writes Completed:      13,        6KiB
 Read Merges:       0,        0KiB   Write Merges:           0,        0KiB
 IO unplugs:        1                Timer unplugs:          0

Signed-off-by: Hyeongseok Kim <hyeongseok@gmail.com>
Acked-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2021-02-22 09:55:14 +09:00
Namjae Jeon
78c276f549 exfat: fix shift-out-of-bounds in exfat_fill_super()
syzbot reported a warning which could cause shift-out-of-bounds issue.

Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x183/0x22e lib/dump_stack.c:120
 ubsan_epilogue lib/ubsan.c:148 [inline]
 __ubsan_handle_shift_out_of_bounds+0x432/0x4d0 lib/ubsan.c:395
 exfat_read_boot_sector fs/exfat/super.c:471 [inline]
 __exfat_fill_super fs/exfat/super.c:556 [inline]
 exfat_fill_super+0x2acb/0x2d00 fs/exfat/super.c:624
 get_tree_bdev+0x406/0x630 fs/super.c:1291
 vfs_get_tree+0x86/0x270 fs/super.c:1496
 do_new_mount fs/namespace.c:2881 [inline]
 path_mount+0x1937/0x2c50 fs/namespace.c:3211
 do_mount fs/namespace.c:3224 [inline]
 __do_sys_mount fs/namespace.c:3432 [inline]
 __se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3409
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

exfat specification describe sect_per_clus_bits field of boot sector
could be at most 25 - sect_size_bits and at least 0. And sect_size_bits
can also affect this calculation, It also needs validation.
This patch add validation for sect_per_clus_bits and sect_size_bits
field of boot sector.

Fixes: 719c1e182916 ("exfat: add super block operations")
Cc: stable@vger.kernel.org # v5.9+
Reported-by: syzbot+da4fe66aaadd3c2e2d1c@syzkaller.appspotmail.com
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2021-02-22 09:55:13 +09:00
Linus Torvalds
d1fec2214b selinux/stable-5.12 PR 20210215
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmAqwVUUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXP1nw//bbmtBhpaG+RnmPrSGZgy3gqbB3gU
 ggJ5UKNvYclrej2dur3EHXPEB0YWDv2D2OgChfTAu+T7sc2sBF3bz9qAu1a556mV
 JdfID8aoUwSk+oN7AKcwbdua+wLhXppAnYSKaknR+tjmWzvVKBDkrOovl52oR6L8
 Wx3YHCy7yPO79wqGqoWLCI7aI8ByfovoyOf6Xr/sPl+gMuBvbFoJeO1Pa9YNoI0z
 noGT1h6vLjgyvegqMX5lCkh1sUlcOsmXkAksw1FyEAfJfr0MPLLkVoTaBAook5NO
 X7VEhv845CjfIfoCXDdIHzriDWHp3tEDMSQaLwU3QSjfsbyNVh4ggwuHZYqrR9dL
 DerCa+89XYdCldrBzBeRs3Qd/6bZtHpd62pHDgn+NwMdjEckCHh41t2f2odD+Rdy
 2Fv+50C3m+7JjUawKhzgWR3BYJhafiKKUiWc2GBm1cBSr7+vSKokDG27gJmtNCoE
 TedSlQTPyi47zjZMnf/laSqGEUG9xz79xAiDPDP5yuxbDvN5andRYHmhI4thbGcq
 5DsVx5DDWaXtJxRVlsTgTeyvjdp61Rbvj8jvbbD/St+8PNsbpFOerbjSaidovfJK
 Y0YrkL/sKcGkM8HbQCcl1DKd4l1EfDIKUch078LQJHetuHh4L89U+r5uqZRsgZYD
 /EWeEw56llrepMQ=
 =5fVL
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20210215' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:
 "We've got a good handful of patches for SELinux this time around; with
  everything passing the selinux-testsuite and applying cleanly to your
  tree as of a few minutes ago. The highlights are:

   - Add support for labeling anonymous inodes, and extend this new
     support to userfaultfd.

   - Fallback to SELinux genfs file labeling if the filesystem does not
     have xattr support. This is useful for virtiofs which can vary in
     its xattr support depending on the backing filesystem.

   - Classify and handle MPTCP the same as TCP in SELinux.

   - Ensure consistent behavior between inode_getxattr and
     inode_listsecurity when the SELinux policy is not loaded. This
     fixes a known problem with overlayfs.

   - A couple of patches to prune some unused variables from the SELinux
     code, mark private variables as static, and mark other variables as
     __ro_after_init or __read_mostly"

* tag 'selinux-pr-20210215' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  fs: anon_inodes: rephrase to appropriate kernel-doc
  userfaultfd: use secure anon inodes for userfaultfd
  selinux: teach SELinux about anonymous inodes
  fs: add LSM-supporting anon-inode interface
  security: add inode_init_security_anon() LSM hook
  selinux: fall back to SECURITY_FS_USE_GENFS if no xattr support
  selinux: mark selinux_xfrm_refcount as __read_mostly
  selinux: mark some global variables __ro_after_init
  selinux: make selinuxfs_mount static
  selinux: drop the unnecessary aurule_callback variable
  selinux: remove unused global variables
  selinux: fix inconsistency between inode_getxattr and inode_listsecurity
  selinux: handle MPTCP consistently with TCP
2021-02-21 16:54:54 -08:00
Jens Axboe
843bbfd49f io-wq: make io_wq_fork_thread() available to other users
We want to use this in io_uring proper as well, for the SQPOLL thread.
Rename it from fork_thread() to io_wq_fork_thread(), and make it
available through the io-wq.h header.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-21 17:25:22 -07:00
Jens Axboe
bf1daa4bfc io-wq: only remove worker from free_list, if it was there
If the worker isn't on the free_list, don't attempt to delete it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-21 17:25:22 -07:00
Jens Axboe
4379bf8bd7 io_uring: remove io_identity
We are no longer grabbing state, so no need to maintain an IO identity
that we COW if there are changes.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-21 17:25:22 -07:00
Jens Axboe
44526bedc2 io_uring: remove any grabbing of context
The async workers are siblings of the task itself, so by definition we
have all the state that we need. Remove any of the state grabbing that
we have, and requests flagging what they need.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-21 17:25:22 -07:00
Jens Axboe
c6d77d92b7 io-wq: worker idling always returns false
Remove the bool return, and the checking for it in the caller.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-21 17:25:22 -07:00
Jens Axboe
3bfe610669 io-wq: fork worker threads from original task
Instead of using regular kthread kernel threads, create kernel threads
that are like a real thread that the task would create. This ensures that
we get all the context that we need, without having to carry that state
around. This greatly reduces the code complexity, and the risk of missing
state for a given request type.

With the move away from kthread, we can also dump everything related to
assigned state to the new threads.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-21 17:25:22 -07:00
Jens Axboe
958234d5ec io-wq: don't pass 'wqe' needlessly around
Just grab it from the worker itself, which we're already passing in.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-21 17:25:22 -07:00
Jens Axboe
5aa75ed5b9 io_uring: tie async worker side to the task context
Move it outside of the io_ring_ctx, and tie it to the io_uring task
context.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-21 17:25:22 -07:00
Jens Axboe
3b094e727d io-wq: get rid of wq->use_refs
We don't support attach anymore, so doesn't make sense to carry the
use_refs reference count. Get rid of it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-21 17:25:22 -07:00
Jens Axboe
d25e3a3de0 io_uring: disable io-wq attaching
Moving towards making the io_wq per ring per task, so we can't really
share it between rings. Which is fine, since we've now dropped some
of that fat from it.

Retain compatibility with how attaching works, so that any attempt to
attach to an fd that doesn't exist, or isn't an io_uring fd, will fail
like it did before.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-21 17:25:22 -07:00
Jens Axboe
1cbd9c2bcf io-wq: don't create any IO workers upfront
When the manager thread starts up, it creates a worker per node for
the given context. Just let these get created dynamically, like we do
for adding further workers.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-21 17:25:22 -07:00
Jens Axboe
7c25c0d16e io_uring: remove the need for relying on an io-wq fallback worker
We hit this case when the task is exiting, and we need somewhere to
do background cleanup of requests. Instead of relying on the io-wq
task manager to do this work for us, just stuff it somewhere where
we can safely run it ourselves directly.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-21 17:25:22 -07:00
Jens Axboe
2713154906 Merge branch 'for-5.12/io_uring' into io_uring-worker.v3
* for-5.12/io_uring: (21 commits)
  io_uring: run task_work on io_uring_register()
  io_uring: fix leaving invalid req->flags
  io_uring: wait potential ->release() on resurrect
  io_uring: keep generic rsrc infra generic
  io_uring: zero ref_node after killing it
  io_uring: make the !CONFIG_NET helpers a bit more robust
  io_uring: don't hold uring_lock when calling io_run_task_work*
  io_uring: fail io-wq submission from a task_work
  io_uring: don't take uring_lock during iowq cancel
  io_uring: fail links more in io_submit_sqe()
  io_uring: don't do async setup for links' heads
  io_uring: do io_*_prep() early in io_submit_sqe()
  io_uring: split sqe-prep and async setup
  io_uring: don't submit link on error
  io_uring: move req link into submit_state
  io_uring: move io_init_req() into io_submit_sqe()
  io_uring: move io_init_req()'s definition
  io_uring: don't duplicate ->file check in sfr
  io_uring: keep io_*_prep() naming consistent
  io_uring: kill fictitious submit iteration index
  ...
2021-02-21 17:22:53 -07:00
Pavel Begunkov
b6c23dd5a4 io_uring: run task_work on io_uring_register()
Do run task_work before io_uring_register(), that might make a first
quiesce round much nicer. We generally do that for any syscall invocation
to avoid spurious -EINTR/-ERESTARTSYS, for task_work that we generate.
This patch brings io_uring_register() inline with the two other io_uring
syscalls.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-21 17:18:56 -07:00
Linus Torvalds
66f73fb3fa This pull request contains changes (actually just fixes) for UBIFS
JFFS2:
 - Fix for use-after-free in jffs2_sum_write_data()
 - Fix for out-of-bounds access in jffs2_zlib_compress()
 
 UBI:
 - Remove dead/useless code
 
 UBIFS:
 - Fix for a memory leak in ubifs_init_authentication()
 - Fix for high stack usage
 - Fix for a off-by-one error in xattrs code
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmAyuh8WHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wT+bD/9Q2Ar9yMX9drPyAnyb3vudE8c8
 l0RdNLyBSL87pskpszNZR2+o8Yi3vjlbGWq5i97JsP/7UOb4Gc/MfXPYJteP1xUN
 S46EZwgcZa4XCgMSSdMk/NZl7bVdbwjvcGjw1CA4RdPkwt8s2jwYdS+hPrHu6o87
 3xkP7kWShs/2KhUyvodZgAu6SYcTW+OjiKwdAIKxa1Ak9YUMGzsSHqCbl19he5MG
 hMxFZIqRZ2zZUfFeYXffVApJI8eBEKVud2qtNA/A6eGsy5Wx3c4F/bxG/uWdoJPp
 n5CmFRc6UGh8teA43aag5BnLv8sR9bC1Kf3lQX4nwfpBSzE7LwIMN7SVpL0JH5vT
 dJdwn37JYL/RQjmjTk++O/sSgeg9jJWMG+VOSmuKWPgP6xAYEVXqWg9njuV3wl9W
 NFBoybP82IyVHcthOcTrY8dx0F7A4q+3PkMy+7cikO2fYKVvJjdKgTp4pcVnGCR3
 IadXzNRdYrLPvYwf25D2AyETwQQxcmh/Ox7ZOhkUXuFQ/KnhU0yqbO3cTTB1A/mO
 jY2SPtXXeUZwgGpGc8Lyr8/KGZ6tJX/3jswwmg+XvdegBLRogqty8XOcwUuJszCh
 1kDAKs2LJ6UaMYyhV6Jxr4c+wgHoKJG+voY+oTkrUP4Lt0hQVELCylEkX2uJo60Y
 x4Gic/YbRUwnfjlAcg==
 =xorv
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull JFFS2/UBIFS and UBI updates from Richard Weinberger:
 "JFFS2:
   - Fix for use-after-free in jffs2_sum_write_data()
   - Fix for out-of-bounds access in jffs2_zlib_compress()

  UBI:
   - Remove dead/useless code

  UBIFS:
   - Fix for a memory leak in ubifs_init_authentication()
   - Fix for high stack usage
   - Fix for a off-by-one error in xattrs code"

* tag 'for-linus-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  ubifs: Fix error return code in alloc_wbufs()
  jffs2: check the validity of dstlen in jffs2_zlib_compress()
  ubifs: Fix off-by-one error
  ubifs: replay: Fix high stack usage, again
  ubifs: Fix memleak in ubifs_init_authentication
  jffs2: fix use after free in jffs2_sum_write_data()
  ubi: eba: Delete useless kfree code
  ubi: remove dead code in validate_vid_hdr()
2021-02-21 13:57:08 -08:00
Linus Torvalds
04471d3f18 This pull request contains the following changes for UML:
- Many cleanups and fixes for our virtio code
 - Add support for a pseudo RTC
 - Fix for a possible jailbreak
 - Minor fixes (spelling, header files)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmAyu0oWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wd5jD/9sb/5xYhXCSfTdPS/eIrWvBQoc
 B8rxLfRpYW1Yvzz4R60/vKe8/td5I1/AvlprLp/1AJeawl49vCbSOqwdjn+58Uqb
 rlagZ2Ikilfn5lVIsxPf8fjbleonvBe8qVA30gJKgCYdYuAcXLs404jZ8MMvwZ0g
 t4G7BUc7bq19+UVF06kwefzC64c1WgPiHTmO6DT6RcXoFKq/x6i1FN4QnMEoiKQi
 SsficYHo7FsIhJZKtgTfujzEInLyMMuZ9mTJU/3wwUveLWArG0NRtIttC6FPvhi4
 xx5RlTfC/Jzoqi9Qo14bLqV6KcOU/J7Oi4bDpYyhNggF/QfhnNgT8MGPwx5f+Gso
 8OJg3MsZw70480EBH7/xLSdhZ3ih178Rr/asmiJkwLOYm5zJ22yqtx/jXQBlGOz3
 FHPgTMJcRMzosGqSrhl+KxFdrK1uSLbcFZS3Sp8PUGdtgPPu19Proz2SPdHzt1Mj
 QJY30nRKKUoTLnRYnQV3VSa7uZXGVAK+HtkpRl/ubTKbGcSF8rdl4fYhOPnmAsKQ
 F4HBXwqKBht7eKN2BsNNTLz86OFBopn8eFqq8XxwOqF9O7DZitU0sOboWJyMUY2u
 /QzKxtSAUnNg6Ab+whKhAvkwktJ7IrVJh1JENWDy0pRoCGdF135ajic0bpFDCjqV
 ohOT/Ol+p4/ClLgxiw==
 =e5Qa
 -----END PGP SIGNATURE-----

Merge tag 'for-linux-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml

Pull UML updates from Richard Weinberger:

 - Many cleanups and fixes for our virtio code

 - Add support for a pseudo RTC

 - Fix for a possible jailbreak

 - Minor fixes (spelling, header files)

* tag 'for-linux-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: irq.h: include <asm-generic/irq.h>
  um: io.h: include <linux/types.h>
  um: add a pseudo RTC
  um: remove process stub VMA
  um: rework userspace stubs to not hard-code stub location
  um: separate child and parent errors in clone stub
  um: defer killing userspace on page table update failures
  um: mm: check more comprehensively for stub changes
  um: print register names in wait_for_stub
  um: hostfs: use a kmem cache for inodes
  mm: Remove arch_remap() and mm-arch-hooks.h
  um: fix spelling mistake in Kconfig "privleges" -> "privileges"
  um: virtio: allow devices to be configured for wakeup
  um: time-travel: rework interrupt handling in ext mode
  um: virtio: disable VQs during suspend
  um: virtio: fix handling of messages without payload
  um: virtio: clean up a comment
2021-02-21 13:53:00 -08:00
Linus Torvalds
df24212a49 s390 updates for the 5.12 merge window
- Convert to using the generic entry infrastructure.
 
 - Add vdso time namespace support.
 
 - Switch s390 and alpha to 64-bit ino_t. As discussed here
   lkml.kernel.org/r/YCV7QiyoweJwvN+m@osiris
 
 - Get rid of expensive stck (store clock) usages where possible. Utilize
   cpu alternatives to patch stckf when supported.
 
 - Make tod_clock usage less error prone by converting it to a union and
   rework code which is using it.
 
 - Machine check handler fixes and cleanups.
 
 - Drop couple of minor inline asm optimizations to fix clang build.
 
 - Default configs changes notably to make libvirt happy.
 
 - Various changes to rework and improve qdio code.
 
 - Other small various fixes and improvements all over the code.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmAyzcwACgkQjYWKoQLX
 FBjjMwgAmeY3oMkj93bnUF/OnbYTJQ0ZHmlyeboKt7SnFyvNpOVGyRfl7+fPHsNu
 +t9QZQk0f7fSxbcC04gz0ZMw1YbTjWihgZJsN6s+qtrRsv/kVqKr7kvhFrcs8uSZ
 rLiwIRWGVAbprnJZWCNqaGpKkOM0wPYZ5W3Mtnoxe4nTM2LwSu2RWI8ibTGYLQPy
 FybKos2hYOFBTGQdrxmg1zAvpE8DJg4qQNLhYvnmHd8Bw/FNBmoyhx8rS8z06NmS
 dWMk7pfvQaslIIaFC3Yo7/sJVa/JJH33FlBonc+MSO8OZz5O6vG4bk9ZHq6DfHUH
 V1I38xiBdYdSXDq8QqT3N9d+CtjeMQ==
 =Lt/v
 -----END PGP SIGNATURE-----

Merge tag 's390-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Vasily Gorbik:

 - Convert to using the generic entry infrastructure.

 - Add vdso time namespace support.

 - Switch s390 and alpha to 64-bit ino_t. As discussed at

     https://lore.kernel.org/linux-mm/YCV7QiyoweJwvN+m@osiris/

 - Get rid of expensive stck (store clock) usages where possible.
   Utilize cpu alternatives to patch stckf when supported.

 - Make tod_clock usage less error prone by converting it to a union and
   rework code which is using it.

 - Machine check handler fixes and cleanups.

 - Drop couple of minor inline asm optimizations to fix clang build.

 - Default configs changes notably to make libvirt happy.

 - Various changes to rework and improve qdio code.

 - Other small various fixes and improvements all over the code.

* tag 's390-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (68 commits)
  s390/qdio: remove 'merge_pending' mechanism
  s390/qdio: improve handling of PENDING buffers for QEBSM devices
  s390/qdio: rework q->qdio_error indication
  s390/qdio: inline qdio_kick_handler()
  s390/time: remove get_tod_clock_ext()
  s390/crypto: use store_tod_clock_ext()
  s390/hypfs: use store_tod_clock_ext()
  s390/debug: use union tod_clock
  s390/kvm: use union tod_clock
  s390/vdso: use union tod_clock
  s390/time: convert tod_clock_base to union
  s390/time: introduce new store_tod_clock_ext()
  s390/time: rename store_tod_clock_ext() and use union tod_clock
  s390/time: introduce union tod_clock
  s390,alpha: switch to 64-bit ino_t
  s390: split cleanup_sie
  s390: use r13 in cleanup_sie as temp register
  s390: fix kernel asce loading when sie is interrupted
  s390: add stack for machine check handler
  s390: use WRITE_ONCE when re-allocating async stack
  ...
2021-02-21 13:40:06 -08:00
Linus Torvalds
3e10585335 x86:
- Support for userspace to emulate Xen hypercalls
 - Raise the maximum number of user memslots
 - Scalability improvements for the new MMU.  Instead of the complex
   "fast page fault" logic that is used in mmu.c, tdp_mmu.c uses an
   rwlock so that page faults are concurrent, but the code that can run
   against page faults is limited.  Right now only page faults take the
   lock for reading; in the future this will be extended to some
   cases of page table destruction.  I hope to switch the default MMU
   around 5.12-rc3 (some testing was delayed due to Chinese New Year).
 - Cleanups for MAXPHYADDR checks
 - Use static calls for vendor-specific callbacks
 - On AMD, use VMLOAD/VMSAVE to save and restore host state
 - Stop using deprecated jump label APIs
 - Workaround for AMD erratum that made nested virtualization unreliable
 - Support for LBR emulation in the guest
 - Support for communicating bus lock vmexits to userspace
 - Add support for SEV attestation command
 - Miscellaneous cleanups
 
 PPC:
 - Support for second data watchpoint on POWER10
 - Remove some complex workarounds for buggy early versions of POWER9
 - Guest entry/exit fixes
 
 ARM64
 - Make the nVHE EL2 object relocatable
 - Cleanups for concurrent translation faults hitting the same page
 - Support for the standard TRNG hypervisor call
 - A bunch of small PMU/Debug fixes
 - Simplification of the early init hypercall handling
 
 Non-KVM changes (with acks):
 - Detection of contended rwlocks (implemented only for qrwlocks,
   because KVM only needs it for x86)
 - Allow __DISABLE_EXPORTS from assembly code
 - Provide a saner follow_pfn replacements for modules
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmApSRgUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOc7wf9FnlinKoTFaSk7oeuuhF/CoCVwSFs
 Z9+A2sNI99tWHQxFR6dyDkEFeQoXnqSxfLHtUVIdH/JnTg0FkEvFz3NK+0PzY1PF
 PnGNbSoyhP58mSBG4gbBAxdF3ZJZMB8GBgYPeR62PvMX2dYbcHqVBNhlf6W4MQK4
 5mAUuAnbf19O5N267sND+sIg3wwJYwOZpRZB7PlwvfKAGKf18gdBz5dQ/6Ej+apf
 P7GODZITjqM5Iho7SDm/sYJlZprFZT81KqffwJQHWFMEcxFgwzrnYPx7J3gFwRTR
 eeh9E61eCBDyCTPpHROLuNTVBqrAioCqXLdKOtO5gKvZI3zmomvAsZ8uXQ==
 =uFZU
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "x86:

   - Support for userspace to emulate Xen hypercalls

   - Raise the maximum number of user memslots

   - Scalability improvements for the new MMU.

     Instead of the complex "fast page fault" logic that is used in
     mmu.c, tdp_mmu.c uses an rwlock so that page faults are concurrent,
     but the code that can run against page faults is limited. Right now
     only page faults take the lock for reading; in the future this will
     be extended to some cases of page table destruction. I hope to
     switch the default MMU around 5.12-rc3 (some testing was delayed
     due to Chinese New Year).

   - Cleanups for MAXPHYADDR checks

   - Use static calls for vendor-specific callbacks

   - On AMD, use VMLOAD/VMSAVE to save and restore host state

   - Stop using deprecated jump label APIs

   - Workaround for AMD erratum that made nested virtualization
     unreliable

   - Support for LBR emulation in the guest

   - Support for communicating bus lock vmexits to userspace

   - Add support for SEV attestation command

   - Miscellaneous cleanups

  PPC:

   - Support for second data watchpoint on POWER10

   - Remove some complex workarounds for buggy early versions of POWER9

   - Guest entry/exit fixes

  ARM64:

   - Make the nVHE EL2 object relocatable

   - Cleanups for concurrent translation faults hitting the same page

   - Support for the standard TRNG hypervisor call

   - A bunch of small PMU/Debug fixes

   - Simplification of the early init hypercall handling

  Non-KVM changes (with acks):

   - Detection of contended rwlocks (implemented only for qrwlocks,
     because KVM only needs it for x86)

   - Allow __DISABLE_EXPORTS from assembly code

   - Provide a saner follow_pfn replacements for modules"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (192 commits)
  KVM: x86/xen: Explicitly pad struct compat_vcpu_info to 64 bytes
  KVM: selftests: Don't bother mapping GVA for Xen shinfo test
  KVM: selftests: Fix hex vs. decimal snafu in Xen test
  KVM: selftests: Fix size of memslots created by Xen tests
  KVM: selftests: Ignore recently added Xen tests' build output
  KVM: selftests: Add missing header file needed by xAPIC IPI tests
  KVM: selftests: Add operand to vmsave/vmload/vmrun in svm.c
  KVM: SVM: Make symbol 'svm_gp_erratum_intercept' static
  locking/arch: Move qrwlock.h include after qspinlock.h
  KVM: PPC: Book3S HV: Fix host radix SLB optimisation with hash guests
  KVM: PPC: Book3S HV: Ensure radix guest has no SLB entries
  KVM: PPC: Don't always report hash MMU capability for P9 < DD2.2
  KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path
  KVM: PPC: remove unneeded semicolon
  KVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLB
  KVM: PPC: Book3S HV: No need to clear radix host SLB before loading HPT guest
  KVM: PPC: Book3S HV: Fix radix guest SLB side channel
  KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support
  KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR
  KVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWR
  ...
2021-02-21 13:31:43 -08:00