Commit Graph

83857 Commits

Author SHA1 Message Date
Linus Torvalds
d2c5231581 Fourteen hotfixes, eleven of which are cc:stable. The remainder pertain
to issues which were introduced after 6.5.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZRmSDAAKCRDdBJ7gKXxA
 jlSaAQCe3SnBdjRmuzbp5iIfNJOY7GXLN4NwMsArRUxRGY27IwD+KWhXZP/ydVnt
 ZgS4x9rmarHuh5Pxds+6SRGhihRz/Ak=
 =sf/5
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2023-10-01-08-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "Fourteen hotfixes, eleven of which are cc:stable. The remainder
  pertain to issues which were introduced after 6.5"

* tag 'mm-hotfixes-stable-2023-10-01-08-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  Crash: add lock to serialize crash hotplug handling
  selftests/mm: fix awk usage in charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh that may cause error
  mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified
  mm/damon/vaddr-test: fix memory leak in damon_do_test_apply_three_regions()
  mm, memcg: reconsider kmem.limit_in_bytes deprecation
  mm: zswap: fix potential memory corruption on duplicate store
  arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries
  mm: hugetlb: add huge page size param to set_huge_pte_at()
  maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW states
  maple_tree: add mas_is_active() to detect in-tree walks
  nilfs2: fix potential use after free in nilfs_gccache_submit_read_data()
  mm: abstract moving to the next PFN
  mm: report success more often from filemap_map_folio_range()
  fs: binfmt_elf_efpic: fix personality for ELF-FDPIC
2023-10-01 13:33:25 -07:00
Linus Torvalds
3b347e4032 Tracing fixes for v6.6-rc3:
- Make sure 32 bit applications using user events have aligned access when
   running on a 64 bit kernel.
 
 - Add cond_resched in the loop that handles converting enums in print_fmt
   string is trace events.
 
 - Fix premature wake ups of polling processes in the tracing ring buffer. When
   a task polls waiting for a percentage of the ring buffer to be filled, the
   writer still will wake it up at every event. Add the polling's percentage to
   the "shortest_full" list to tell the writer when to wake it up.
 
 - For eventfs dir lookups on dynamic events, an event system's only event could
   be removed, leaving its dentry with no children. This is totally legitimate.
   But in eventfs_release() it must not access the children array, as it is only
   allocated when the dentry has children.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZRiI2xQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qlvoAQDKbevbqA0C8lEV1rbVh4Q9Rnq580rz
 EAyEO/RrSOwE9AEA2z+Q597mDjEiqQBvqTjBkS+0xZ7AUQYZRWgTHRIbegg=
 =tqOM
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Make sure 32-bit applications using user events have aligned access
   when running on a 64-bit kernel.

 - Add cond_resched in the loop that handles converting enums in
   print_fmt string is trace events.

 - Fix premature wake ups of polling processes in the tracing ring
   buffer. When a task polls waiting for a percentage of the ring buffer
   to be filled, the writer still will wake it up at every event. Add
   the polling's percentage to the "shortest_full" list to tell the
   writer when to wake it up.

 - For eventfs dir lookups on dynamic events, an event system's only
   event could be removed, leaving its dentry with no children. This is
   totally legitimate. But in eventfs_release() it must not access the
   children array, as it is only allocated when the dentry has children.

* tag 'trace-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  eventfs: Test for dentries array allocated in eventfs_release()
  tracing/user_events: Align set_bit() address for all archs
  tracing: relax trace_event_eval_update() execution with cond_resched()
  ring-buffer: Update "shortest_full" in polling
2023-09-30 18:19:02 -07:00
Steven Rostedt (Google)
2598bd3ca8 eventfs: Test for dentries array allocated in eventfs_release()
The dcache_dir_open_wrapper() could be called when a dynamic event is
being deleted leaving a dentry with no children. In this case the
dlist->dentries array will never be allocated. This needs to be checked
for in eventfs_release(), otherwise it will trigger a NULL pointer
dereference.

Link: https://lore.kernel.org/linux-trace-kernel/20230930090106.1c3164e9@rorschach.local.home

Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: ef36b4f928 ("eventfs: Remember what dentries were created on dir open")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-09-30 16:26:04 -04:00
Linus Torvalds
25d48d570e Bug fixes for 6.6-rc4:
* Handle a race between writing and shrinking block devices by
    returning EIO.
  * Fix a typo in a comment.
 
 Signed-off-by: Darrick J. Wong <djwong@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZRWpqAAKCRBKO3ySh0YR
 puinAP46EI8AvxQOid2ukGIEP09ZdhYNcJkWsigZ8k7Z/wcqagD/UVliaDPGC2kk
 rlrPN6jmNyqDzAP6muBmqu2v44GVWwY=
 =7nql
 -----END PGP SIGNATURE-----

Merge tag 'iomap-6.6-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull iomap fixes from Darrick Wong:

 - Handle a race between writing and shrinking block devices by
   returning EIO

 - Fix a typo in a comment

* tag 'iomap-6.6-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: Spelling s/preceeding/preceding/g
  iomap: add a workaround for racy i_size updates on block devices
2023-09-30 11:01:38 -07:00
Linus Torvalds
ae21363998 nfsd-6.6 fixes:
- Fix NFSv4 READ corner case
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmUYSnEACgkQM2qzM29m
 f5dtSxAAk/n3CzeIJFALIcP+JmHW6Zh2KbpygcqhxFoO1FAyRBJgOIEhtdD/c65Q
 fzfCQDZUzdTSirklBS5m0lIMhqaYMe1LUC8OIq/hh309vQ6WnlLltIU3fsaZFva5
 kAaDeI/oGq2YnbjQENWCGHvKC7RR1jOTWASI/+52oYg/fwRkt14so/wDl54mxhNP
 q6Gt0Xw/mXkslAkQNQ+7a5eAa4TtepzkjkNWL4hSqboHQ4QFT2tmXqDV3E+V9tzX
 F/tlEN3EJo3nDNcUYAh6ec/YXo0tBh4DsnJxA9bRYloU40EUGP19ewXLEQgXPw9M
 IgSNluxTmDTQGz6kSUkQpepevWDUN0+fDt1kVqACIRXuvuMcGuDUCbD4VMa8HNnc
 bCMg8R/SkrBnzTamHJpPhbV/F/C0Iwp80RpqDMAnt4splrQUCJeB+Wl1qJAnYUW5
 TdJxSJKCvypMIcNCd1ZMSVbeByUgZ/qXKKsS2YNS4l8+DGdnEJQ2mnFKw3Nk5bEF
 byWoyW6RMXCIwaqQa8hNsO5kW4MTxURxLXcgqzN7+5L2ht7pAYpNjnOc0vIBPe5t
 9z+/dSk8a9QXBwRTxPL4QOgludXPqvokGBtCIV1LD5xQA9OZuXivnBz2DWmCt+IR
 GOYfjs3gSlvVrv7+EGffYlOh9f7GN/wlUfUxERKcB6HPweGOtqs=
 =aD10
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fix from Chuck Lever:

 - Fix NFSv4 READ corner case

* tag 'nfsd-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: Fix zero NFSv4 READ results when RQ_SPLICE_OK is not set
2023-09-30 09:44:48 -07:00
Linus Torvalds
ba77f7a63f smb3 client fix for password freeing potential oops
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmUXmlQACgkQiiy9cAdy
 T1Gh8Av8DprSP5ARIljIzuzPL60R/TF8WvWTuQZ93+KzGkM0eMXSvZ6VH0T3WXGs
 amfgwZFw1TOiQ6j3OqxX7ppd2PeooY65mX2tKVEQ27POQXB3VyUy2gBSPCTsDckD
 uOFX55GoPzkxxRgTWO7UHAbBzrZDJ8geH12Z0z9EXgJJjnJhkQi7dPkA+OusE3y0
 wo4AjKOk+7HOFiuvG3p1XZKYHeB36PD+xATqRBxJAaUGkgV76stFEK/7lNWzxg/t
 NdiHQG5ILZP6L2RZrCVX88Et3HggGV5AF3GpDb8ZNfD7xQcu8lgVzHlw2E6CA5DS
 WFR6zrBiBsz9mPrS9cWY83C+aL8MHeumalBjLKQ5krL+IpC46H7a12yAWt5Yk6Q7
 zdRQ9TtE8CMozqEJiaCFgL1Hz1CuTCvK974q+p5RG8OejRLzgmzFKYp8k7+L0qrc
 tZXQ16McYkBRcRIgaOk0i8GpJ44qNdhu/KTI5iKNqH0ScfHn56CIBiwSmZHiX75t
 zcCdF0bz
 =KBcN
 -----END PGP SIGNATURE-----

Merge tag '6.6-rc3-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fix from Steve French:
 "Fix for password freeing potential oops (also for stable)"

* tag '6.6-rc3-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
  fs/smb/client: Reset password pointer to NULL
2023-09-30 09:39:23 -07:00
Pan Bian
7ee29facd8 nilfs2: fix potential use after free in nilfs_gccache_submit_read_data()
In nilfs_gccache_submit_read_data(), brelse(bh) is called to drop the
reference count of bh when the call to nilfs_dat_translate() fails.  If
the reference count hits 0 and its owner page gets unlocked, bh may be
freed.  However, bh->b_page is dereferenced to put the page after that,
which may result in a use-after-free bug.  This patch moves the release
operation after unlocking and putting the page.

NOTE: The function in question is only called in GC, and in combination
with current userland tools, address translation using DAT does not occur
in that function, so the code path that causes this issue will not be
executed.  However, it is possible to run that code path by intentionally
modifying the userland GC library or by calling the GC ioctl directly.

[konishi.ryusuke@gmail.com: NOTE added to the commit log]
Link: https://lkml.kernel.org/r/1543201709-53191-1-git-send-email-bianpan2016@163.com
Link: https://lkml.kernel.org/r/20230921141731.10073-1-konishi.ryusuke@gmail.com
Fixes: a3d93f709e ("nilfs2: block cache for garbage collection")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reported-by: Ferry Meng <mengferry@linux.alibaba.com>
Closes: https://lkml.kernel.org/r/20230818092022.111054-1-mengferry@linux.alibaba.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:46 -07:00
Greg Ungerer
7c31515857 fs: binfmt_elf_efpic: fix personality for ELF-FDPIC
The elf-fdpic loader hard sets the process personality to either
PER_LINUX_FDPIC for true elf-fdpic binaries or to PER_LINUX for normal ELF
binaries (in this case they would be constant displacement compiled with
-pie for example).  The problem with that is that it will lose any other
bits that may be in the ELF header personality (such as the "bug
emulation" bits).

On the ARM architecture the ADDR_LIMIT_32BIT flag is used to signify a
normal 32bit binary - as opposed to a legacy 26bit address binary.  This
matters since start_thread() will set the ARM CPSR register as required
based on this flag.  If the elf-fdpic loader loses this bit the process
will be mis-configured and crash out pretty quickly.

Modify elf-fdpic loader personality setting so that it preserves the upper
three bytes by using the SET_PERSONALITY macro to set it.  This macro in
the generic case sets PER_LINUX and preserves the upper bytes. 
Architectures can override this for their specific use case, and ARM does
exactly this.

The problem shows up quite easily running under qemu using the ARM
architecture, but not necessarily on all types of real ARM hardware.  If
the underlying ARM processor does not support the legacy 26-bit addressing
mode then everything will work as expected.

Link: https://lkml.kernel.org/r/20230907011808.2985083-1-gerg@kernel.org
Fixes: 1bde925d23 ("fs/binfmt_elf_fdpic.c: provide NOMMU loader for regular ELF binaries")
Signed-off-by: Greg Ungerer <gerg@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:45 -07:00
Linus Torvalds
9f3ebbef74 Two SMB3 server fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmUXOMIACgkQiiy9cAdy
 T1G+XQv9Fj1kWJRPHih1wTRFHAgysHRtFw04KvW9SLmpGkPLBslJm3Fg1yPiXytc
 nfqZNCPS/tuWIdqc9YRJqEWKfCO6X/+0IESnN6Wl4jIqMSviL/Hg3DzXZr5YCsAy
 1vJ+DYmQkmWNgZ8grnFjKCSezTrAb+b+VLZqsx7dzT8NhTRxdKoTBDS31jTGMswV
 OIQ1b/aLv9hgUS08wqzSKveMq8DkX66UbkSakM+tVImu32eh7u1HG89P7y3e/dr3
 lGwd/Pq+IIiyAgZ0uoPdI9hQ2+Md6JfhVFMiMTLUfwh1LCDgrYnYrOkuZWYvDr9z
 t2Y0+IwEkljk7HcFaL0NKPJW2beG4eNh/t2t6ff6vK8MhzlXp3KM5yVBlXNYc7hA
 JfsMxVIFhUobeRKbQY9S6BstHyo19pdfeHDm/+RicIhRfOo++7kWYzwqKsD0pvLC
 wcr3CBLqqsPXamRwUBbxnMASjYVmoz4nSXusXLDxmSWK39NCjEIz3YeFZfcAdoou
 jnvMikMA
 =jim3
 -----END PGP SIGNATURE-----

Merge tag '6.6-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:
 "Two SMB3 server fixes for null pointer dereferences:

   - invalid SMB3 request case (fixes issue found in testing the read
     compound patch)

   - iovec error case in response processing"

* tag '6.6-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: check iov vector index in ksmbd_conn_write()
  ksmbd: return invalid parameter error response if smb2 request is invalid
2023-09-29 16:51:38 -07:00
Linus Torvalds
14c06b913d A series that fixes an involved "double watch error" deadlock in RBD
marked for stable and two cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmUW5QwTHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHziw7+B/99D/BRIJUaCz8hm2xMZC3Yu6Cvi2de
 YlgHZBuHm5lmihzITEdoHTmWlIpgGchqjTaikCvVooKEe1w4sNr7nYFiXUFVw9sf
 W/I06dtlJlj2f4oyK91i4sIzpQKbXZznFDpTHThjRJt+uyUp3RYVbrCDMmGAnJv3
 foppstycm5fe2Y2e/RgNyYOHY+EAjvS5UrpvT3lAX+iw5KXR1pyMBrFo8iICLPlZ
 TIPP4mwlVwb/WB1rGnxaK65RJxGuXLwuMWdLF9kq1ZeCld6owPH3x2RDax2+vMvA
 bZxI6gQymU2SquKiNseYF3kQ+2KdC5mfjmkoncOH79Je4JPHf7xa4LYZ
 =7/Ez
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-6.6-rc4' of https://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "A series that fixes an involved 'double watch error' deadlock in RBD
  marked for stable and two cleanups"

* tag 'ceph-for-6.6-rc4' of https://github.com/ceph/ceph-client:
  rbd: take header_rwsem in rbd_dev_refresh() only when updating
  rbd: decouple parent info read-in from updating rbd_dev
  rbd: decouple header read-in from updating rbd_dev->header
  rbd: move rbd_dev_refresh() definition
  Revert "ceph: make members in struct ceph_mds_request_args_ext a union"
  ceph: remove unnecessary check for NULL in parse_longname()
2023-09-29 16:46:24 -07:00
Linus Torvalds
10c0b6ba25 Bug fixes for 6.6-rc4:
* Include modifications made to commit "xfs: load uncached unlinked inodes
   into memory on demand" (Commit ID: 68b957f64f)
   which address review comments provided by Dave Chinner.
 
 Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZRPtowAKCRAH7y4RirJu
 9EcNAQDnuVtf89FL0Qqqtho5TeK2UO4JhEcTWI4Wj1d9w7h4lAEA5ZTYu8oJDg0k
 zoTXgr9sbpzcf53fgY0hwqPVjdV8dwU=
 =WkWe
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fix from Chandan Babu:

 - fix for commit 68b957f64f ("xfs: load uncached unlinked inodes into
   memory on demand") which address review comments provided by Dave
   Chinner

* tag 'xfs-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix reloading entire unlinked bucket lists
2023-09-29 16:41:25 -07:00
Quang Le
e6e43b8aa7 fs/smb/client: Reset password pointer to NULL
Forget to reset ctx->password to NULL will lead to bug like double free

Cc: stable@vger.kernel.org
Cc: Willy Tarreau <w@1wt.eu>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Quang Le <quanglex97@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-09-28 14:49:51 -05:00
Geert Uytterhoeven
684f7e6d28 iomap: Spelling s/preceeding/preceding/g
Fix a misspelling of "preceding".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-09-28 09:26:58 -07:00
Chuck Lever
0d32a6bbb8 NFSD: Fix zero NFSv4 READ results when RQ_SPLICE_OK is not set
nfsd4_encode_readv() uses xdr->buf->page_len as a starting point for
the nfsd_iter_read() sink buffer -- page_len is going to be offset
by the parts of the COMPOUND that have already been encoded into
xdr->buf->pages.

However, that value must be captured /before/
xdr_reserve_space_vec() advances page_len by the expected size of
the read payload. Otherwise, the whole front part of the first
page of the payload in the reply will be uninitialized.

Mantas hit this because sec=krb5i forces RQ_SPLICE_OK off, which
invokes the readv part of the nfsd4_encode_read() path. Also,
older Linux NFS clients appear to send shorter READ requests
for files smaller than a page, whereas newer clients just send
page-sized requests and let the server send as many bytes as
are in the file.

Reported-by: Mantas Mikulėnas <grawity@gmail.com>
Closes: https://lore.kernel.org/linux-nfs/f1d0b234-e650-0f6e-0f5d-126b3d51d1eb@gmail.com/
Fixes: 703d752155 ("NFSD: Hoist rq_vec preparation into nfsd_read() [step two]")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-09-28 10:34:28 -04:00
Linus Torvalds
cac405a3bf for-6.6-rc3-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmURvloACgkQxWXV+ddt
 WDt+CQ/+NgBtQn7eyABsdHzXWPxpFyGZrdw5ldKnly3G+WDW2GKMaZ6CpDuEZGNQ
 vMAkSGX5LIHXvO79pDnGG0i+bRINWrc5HZVZ/p5Da6wplBTgIPlbLmxaZX9MJLbx
 j7Oz37GXiQJY8BxnVCnsb+bhhTrTbO9HFUQr/nxefIvu22OBdL1WXYcfuBOeEsFG
 qr/aeC52YqCVgXvt+8a5DqAKE0NWc4PFMFUMo4vlf1xuL652fvff7xiup1CAIgBh
 qsCa17E7q+qjri2phAhbFNadfpH5wGfyjTWScOlaFuXjRhW2v2oqz3WU5IQj4dmu
 PI+k++PLUzIxT0IcjD1YbZzRFaEI6fR2W0GA4LK08fjVehh2ao5jOjtRgLl8HlqG
 qC5fslAPzUxRmwMmCjSGfXF14sgtyLy8eVWf69xn06/1cbEmfHDrWNXP1QHuq6eT
 Jqy8Ywia3jRzzfZ1utABJPLBW4hFQKkyobtyd67fxslUFmtuLvLqGTiOdmVFiD9K
 o+BF2xjEz2n8O1+aRZk5SFNC9zcaASaRg/wQrhvSI9qxM18fh4TXgKQOniLzAK7v
 lZc+JkegFW4CVquCUpmbsdZAOpVNRXfPOJIt/w6G+oRbaiTvPUnrH+uyq8IGREbw
 E7d8XIP0qlF0DQBGK4Mw/riZz/e5MmEKNjza6M+fj2uglpfWTv4=
 =6WEW
 -----END PGP SIGNATURE-----

Merge tag 'for-6.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - delayed refs fixes:
     - fix race when refilling delayed refs block reserve
     - prevent transaction block reserve underflow when starting
       transaction
     - error message and value adjustments

 - fix build warnings with CONFIG_CC_OPTIMIZE_FOR_SIZE and
   -Wmaybe-uninitialized

 - fix for smatch report where uninitialized data from invalid extent
   buffer range could be returned to the caller

 - fix numeric overflow in statfs when calculating lower threshold
   for a full filesystem

* tag 'for-6.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: initialize start_slot in btrfs_log_prealloc_extents
  btrfs: make sure to initialize start and len in find_free_dev_extent
  btrfs: reset destination buffer when read_extent_buffer() gets invalid range
  btrfs: properly report 0 avail for very full file systems
  btrfs: log message if extent item not found when running delayed extent op
  btrfs: remove redundant BUG_ON() from __btrfs_inc_extent_ref()
  btrfs: return -EUCLEAN for delayed tree ref with a ref count not equals to 1
  btrfs: prevent transaction block reserve underflow when starting transaction
  btrfs: fix race when refilling delayed refs block reserve
2023-09-26 09:44:08 -07:00
Linus Torvalds
84422aee15 v6.6-rc4.vfs.fixes
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZRKHuAAKCRCRxhvAZXjc
 ohOLAQDU9Fxq5UdqCdmsyi/b24XJFZlQhcVIZy2Hrhcor9TiVQEAjuECGlxFPSgj
 atVOWLdugDJquiHextqTEMgIecJpNw4=
 =uINF
 -----END PGP SIGNATURE-----

Merge tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:
 "This contains the usual miscellaneous fixes and cleanups for vfs and
  individual fses:

  Fixes:
   - Revert ki_pos on error from buffered writes for direct io fallback
   - Add missing documentation for block device and superblock handling
     for changes merged this cycle
   - Fix reiserfs flexible array usage
   - Ensure that overlayfs sets ctime when setting mtime and atime
   - Disable deferred caller completions with overlayfs writes until
     proper support exists

  Cleanups:
   - Remove duplicate initialization in pipe code
   - Annotate aio kioctx_table with __counted_by"

* tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
  overlayfs: set ctime when setting mtime and atime
  ntfs3: put resources during ntfs_fill_super()
  ovl: disable IOCB_DIO_CALLER_COMP
  porting: document superblock as block device holder
  porting: document new block device opening order
  fs/pipe: remove duplicate "offset" initializer
  fs-writeback: do not requeue a clean inode having skipped pages
  aio: Annotate struct kioctx_table with __counted_by
  direct_write_fallback(): on error revert the ->ki_pos update from buffered write
  reiserfs: Replace 1-element array with C99 style flex-array
2023-09-26 08:50:30 -07:00
Christoph Hellwig
381c043233 iomap: add a workaround for racy i_size updates on block devices
A szybot reproducer that does write I/O while truncating the size of a
block device can end up in clean_bdev_aliases, which tries to clean the
bdev aliases that it uses.  This is because iomap_to_bh automatically
sets the BH_New flag when outside of i_size.  For block devices updates
to i_size are racy and we can hit this case in a tiny race window,
leading to the eventual clean_bdev_aliases call.  Fix this by erroring
out of > i_size I/O on block devices.

Reported-by: syzbot+1fa947e7f09e136925b8@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: syzbot+1fa947e7f09e136925b8@syzkaller.appspotmail.com
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-09-25 08:55:00 -07:00
Jeff Layton
03dbab3bba
overlayfs: set ctime when setting mtime and atime
Nathan reported that he was seeing the new warning in
setattr_copy_mgtime pop when starting podman containers. Overlayfs is
trying to set the atime and mtime via notify_change without also
setting the ctime.

POSIX states that when the atime and mtime are updated via utimes() that
we must also update the ctime to the current time. The situation with
overlayfs copy-up is analogies, so add ATTR_CTIME to the bitmask.
notify_change will fill in the value.

Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Message-Id: <20230913-ctime-v1-1-c6bc509cbc27@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-09-25 14:53:54 +02:00
Christian Brauner
493c71926c
ntfs3: put resources during ntfs_fill_super()
During ntfs_fill_super() some resources are allocated that we need to
cleanup in ->put_super() such as additional inodes. When
ntfs_fill_super() fails these resources need to be cleaned up as well.

Reported-by: syzbot+2751da923b5eb8307b0b@syzkaller.appspotmail.com
Fixes: 78a06688a4 ("ntfs3: drop inode references in ntfs_put_super()")
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-09-25 14:12:42 +02:00
Jens Axboe
2d1b3bbc3d
ovl: disable IOCB_DIO_CALLER_COMP
overlayfs copies the kiocb flags when it sets up a new kiocb to handle
a write, but it doesn't properly support dealing with the deferred
caller completions of the kiocb. This means it doesn't get the final
write completion value, and hence will complete the write with '0' as
the result.

We could support the caller completions in overlayfs, but for now let's
just disable them in the generated write kiocb.

Reported-by: Zorro Lang <zlang@redhat.com>
Link: https://lore.kernel.org/io-uring/20230924142754.ejwsjen5pvyc32l4@dell-per750-06-vm-08.rhts.eng.pek2.redhat.com/
Fixes: 8c052fb300 ("iomap: support IOCB_DIO_CALLER_COMP")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Message-Id: <71897125-e570-46ce-946a-d4729725e28f@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-09-25 11:37:28 +02:00
Darrick J. Wong
537c013b14 xfs: fix reloading entire unlinked bucket lists
During review of the patcheset that provided reloading of the incore
iunlink list, Dave made a few suggestions, and I updated the copy in my
dev tree.  Unfortunately, I then got distracted by ... who even knows
what ... and forgot to backport those changes from my dev tree to my
release candidate branch.  I then sent multiple pull requests with stale
patches, and that's what was merged into -rc3.

So.

This patch re-adds the use of an unlocked iunlink list check to
determine if we want to allocate the resources to recreate the incore
list.  Since lost iunlinked inodes are supposed to be rare, this change
helps us avoid paying the transaction and AGF locking costs every time
we open any inode.

This also re-adds the shutdowns on failure, and re-applies the
restructuring of the inner loop in xfs_inode_reload_unlinked_bucket, and
re-adds a requested comment about the quotachecking code.

Retain the original RVB tag from Dave since there's no code change from
the last submission.

Fixes: 68b957f64f ("xfs: load uncached unlinked inodes into memory on demand")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-09-24 18:12:13 -07:00
Linus Torvalds
5edc6bb321 Tracing fixes for 6.6-rc2:
- Fix the "bytes" output of the per_cpu stat file
   The tracefs/per_cpu/cpu*/stats "bytes" was giving bogus values as the
   accounting was not accurate. It is suppose to show how many used bytes are
   still in the ring buffer, but even when the ring buffer was empty it would
   still show there were bytes used.
 
 - Fix a bug in eventfs where reading a dynamic event directory (open) and then
   creating a dynamic event that goes into that diretory screws up the accounting.
   On close, the newly created event dentry will get a "dput" without ever having
   a "dget" done for it. The fix is to allocate an array on dir open to save what
   dentries were actually "dget" on, and what ones to "dput" on close.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZQ9wihQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6quz4AP4vSFohvmAcTzC+sKP7gMLUvEmqL76+
 1pixXrQOIP5BrQEApUW3VnjqYgjZJR2ne0N4MvvmYElm/ylBhDd4JRrD3g8=
 =X9wd
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix the "bytes" output of the per_cpu stat file

   The tracefs/per_cpu/cpu*/stats "bytes" was giving bogus values as the
   accounting was not accurate. It is suppose to show how many used
   bytes are still in the ring buffer, but even when the ring buffer was
   empty it would still show there were bytes used.

 - Fix a bug in eventfs where reading a dynamic event directory (open)
   and then creating a dynamic event that goes into that diretory screws
   up the accounting.

   On close, the newly created event dentry will get a "dput" without
   ever having a "dget" done for it. The fix is to allocate an array on
   dir open to save what dentries were actually "dget" on, and what ones
   to "dput" on close.

* tag 'trace-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  eventfs: Remember what dentries were created on dir open
  ring-buffer: Fix bytes info in per_cpu buffer stats
2023-09-24 13:55:34 -07:00
Linus Torvalds
85eba5f175 13 hotfixes, 10 of which pertain to post-6.5 issues. The other 3 are
cc:stable.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZQ8hRwAKCRDdBJ7gKXxA
 jlK9AQDzT/FUQV3kIshsV1IwAKFcg7gtcFSN0vs+pV+e1+4tbQD/Z2OgfGFFsCSP
 X6uc2cYHc9DG5/o44iFgadW8byMssQs=
 =w+St
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2023-09-23-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "13 hotfixes, 10 of which pertain to post-6.5 issues. The other three
  are cc:stable"

* tag 'mm-hotfixes-stable-2023-09-23-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  proc: nommu: fix empty /proc/<pid>/maps
  filemap: add filemap_map_order0_folio() to handle order0 folio
  proc: nommu: /proc/<pid>/maps: release mmap read lock
  mm: memcontrol: fix GFP_NOFS recursion in memory.high enforcement
  pidfd: prevent a kernel-doc warning
  argv_split: fix kernel-doc warnings
  scatterlist: add missing function params to kernel-doc
  selftests/proc: fixup proc-empty-vm test after KSM changes
  revert "scripts/gdb/symbols: add specific ko module load command"
  selftests: link libasan statically for tests with -fsanitize=address
  task_work: add kerneldoc annotation for 'data' argument
  mm: page_alloc: fix CMA and HIGHATOMIC landing on the wrong buddy list
  sh: mm: re-add lost __ref to ioremap_prot() to fix modpost warning
2023-09-23 11:51:16 -07:00
Linus Torvalds
8565bdf8cd Six smb3 client fixes, including three for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmUOSHkACgkQiiy9cAdy
 T1HAswwAmCPPUWgIiR4XqWiXOmWh60+4Q7xsmSVvRAixvTf79Tkif/1AmVYhnYls
 QvnbT5TIFPtFbnWHOIYSPxkjBlVRNTgSviSdOebZAv4aP3DJ+HhWqnv39ti7DFMt
 qbNn9j53czq5eLChkIiCfqbeg4I8ra+vqZeeDs5TaRFqMLZ5KlJGlVJjnf88/o8m
 cCjifqq53Bq1pZeaR1Q9aiwuzsSKazs4odEeFvwFSw0tsdjEj4Rk5vJqDcMnt460
 yIXzoUv3iGJ2oz7qi5uysGPOpLAbDqwbJ5+127qja3cbgHg3MJqwotheRuULYX8G
 Dz9hhgu5oGlvSO+fZXnsT2bQL+oqYUyql9EU5NBTF2gPf5M3E1vQaMNPj1cueNAg
 dfa3x3Ez/M1XuH2aptK3ePuPIQIZgMjpMC7BPBKaIvxtGcNxEIuL0s+TEQbjJ8R1
 /ybJv2i+LYfCrnjTDvH0Y4lTS40AHIcOwywQd6rbMuStK71+B7/bNYJkdKKK9PEF
 L39ZwXDj
 =Q4Rt
 -----END PGP SIGNATURE-----

Merge tag '6.6-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
 "Six smb3 client fixes, including three for stable, from the SMB
  plugfest (testing event) this week:

   - Reparse point handling fix (found when investigating dir
     enumeration when fifo in dir)

   - Fix excessive thread creation for dir lease cleanup

   - UAF fix in negotiate path

   - remove duplicate error message mapping and fix confusing warning
     message

   - add dynamic trace point to improve debugging RDMA connection
     attempts"

* tag '6.6-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: fix confusing debug message
  smb: client: handle STATUS_IO_REPARSE_TAG_NOT_HANDLED
  smb3: remove duplicate error mapping
  cifs: Fix UAF in cifs_demultiplex_thread()
  smb3: do not start laundromat thread when dir leases  disabled
  smb3: Add dynamic trace points for RDMA (smbdirect) reconnect
2023-09-23 11:34:48 -07:00
Linus Torvalds
59c376d636 Fixes for 6.6-rc3:
* Return EIO on bad inputs to iomap_to_bh instead of BUGging, to deal
   less poorly with block device io racing with block device resizing.
 * Fix a stale page data exposure bug introduced in 6.6-rc1 when
   unsharing a file range that is not in the page cache.
 
 Signed-off-by: Darrick J. Wong <djwong@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZQsCFAAKCRBKO3ySh0YR
 pkhZAP9VHpjBn95MEai0dxAVjAi8IDcfwdzBuifBWlkQwnt6MAEAiHfHEDfN23o9
 4Xg9EDqa8IOSYwxphJYnYG73Luvi5QQ=
 =Xtnv
 -----END PGP SIGNATURE-----

Merge tag 'iomap-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull iomap fixes from Darrick Wong:

 - Return EIO on bad inputs to iomap_to_bh instead of BUGging, to deal
   less poorly with block device io racing with block device resizing

 - Fix a stale page data exposure bug introduced in 6.6-rc1 when
   unsharing a file range that is not in the page cache

* tag 'iomap-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: convert iomap_unshare_iter to use large folios
  iomap: don't skip reading in !uptodate folios when unsharing a range
  iomap: handle error conditions more gracefully in iomap_to_bh
2023-09-23 09:56:40 -07:00
Linus Torvalds
3abc79dce6 Bug fixes for 6.6-rc3:
* Fix an integer overflow bug when processing an fsmap call.
 
  * Fix crash due to CPU hot remove event racing with filesystem mount
    operation.
 
  * During read-only mount, XFS does not allow the contents of the log to be
    recovered when there are one or more unrecognized rcompat features in the
    primary superblock, since the log might have intent items which the kernel
    does not know how to process.
 
  * During recovery of log intent items, XFS now reserves log space sufficient
    for one cycle of a permanent transaction to execute. Otherwise, this could
    lead to livelocks due to non-availability of log space.
 
  * On an fs which has an ondisk unlinked inode list, trying to delete a file
    or allocating an O_TMPFILE file can cause the fs to the shutdown if the
    first inode in the ondisk inode list is not present in the inode cache.
    The bug is solved by explicitly loading the first inode in the ondisk
    unlinked inode list into the inode cache if it is not already cached.
 
    A similar problem arises when the uncached inode is present in the middle
    of the ondisk unlinked inode list. This second bug is triggered when
    executing operations like quotacheck and bulkstat. In this case, XFS now
    reads in the entire ondisk unlinked inode list.
 
  * Enable LARP mode only on recent v5 filesystems.
 
  * Fix a out of bounds memory access in scrub.
 
  * Fix a performance bug when locating the tail of the log during mounting a
    filesystem.
 
 Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZQkx4QAKCRAH7y4RirJu
 9HrTAQD6QhvHkS43vueGOb4WISZPG/jMKJ/FjvwLZrIZ0erbJwEAtRWhClwFv3NZ
 exJFtsmxrKC6Vifuo0pvfoCiK5mUvQ8=
 =SrJR
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.6-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Chandan Babu:

 - Fix an integer overflow bug when processing an fsmap call

 - Fix crash due to CPU hot remove event racing with filesystem mount
   operation

 - During read-only mount, XFS does not allow the contents of the log to
   be recovered when there are one or more unrecognized rcompat features
   in the primary superblock, since the log might have intent items
   which the kernel does not know how to process

 - During recovery of log intent items, XFS now reserves log space
   sufficient for one cycle of a permanent transaction to execute.
   Otherwise, this could lead to livelocks due to non-availability of
   log space

 - On an fs which has an ondisk unlinked inode list, trying to delete a
   file or allocating an O_TMPFILE file can cause the fs to the shutdown
   if the first inode in the ondisk inode list is not present in the
   inode cache. The bug is solved by explicitly loading the first inode
   in the ondisk unlinked inode list into the inode cache if it is not
   already cached

   A similar problem arises when the uncached inode is present in the
   middle of the ondisk unlinked inode list. This second bug is
   triggered when executing operations like quotacheck and bulkstat. In
   this case, XFS now reads in the entire ondisk unlinked inode list

 - Enable LARP mode only on recent v5 filesystems

 - Fix a out of bounds memory access in scrub

 - Fix a performance bug when locating the tail of the log during
   mounting a filesystem

* tag 'xfs-6.6-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: use roundup_pow_of_two instead of ffs during xlog_find_tail
  xfs: only call xchk_stats_merge after validating scrub inputs
  xfs: require a relatively recent V5 filesystem for LARP mode
  xfs: make inode unlinked bucket recovery work with quotacheck
  xfs: load uncached unlinked inodes into memory on demand
  xfs: reserve less log space when recovering log intent items
  xfs: fix log recovery when unknown rocompat bits are set
  xfs: reload entire unlinked bucket lists
  xfs: allow inode inactivation during a ro mount log recovery
  xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list
  xfs: remove CPU hotplug infrastructure
  xfs: remove the all-mounts list
  xfs: use per-mount cpumask to track nonempty percpu inodegc lists
  xfs: fix an agbno overflow in __xfs_getfsmap_datadev
  xfs: fix per-cpu CIL structure aggregation racing with dying cpus
  xfs: fix select in config XFS_ONLINE_SCRUB_STATS
2023-09-22 16:32:19 -07:00
Steven Rostedt (Google)
ef36b4f928 eventfs: Remember what dentries were created on dir open
Using the following code with libtracefs:

	int dfd;

	// create the directory events/kprobes/kp1
	tracefs_kprobe_raw(NULL, "kp1", "schedule_timeout", "time=$arg1");

	// Open the kprobes directory
	dfd = tracefs_instance_file_open(NULL, "events/kprobes", O_RDONLY);

	// Do a lookup of the kprobes/kp1 directory (by looking at enable)
	tracefs_file_exists(NULL, "events/kprobes/kp1/enable");

	// Now create a new entry in the kprobes directory
	tracefs_kprobe_raw(NULL, "kp2", "schedule_hrtimeout", "expires=$arg1");

	// Do another lookup to create the dentries
	tracefs_file_exists(NULL, "events/kprobes/kp2/enable"))

	// Close the directory
	close(dfd);

What happened above, the first open (dfd) will call
dcache_dir_open_wrapper() that will create the dentries and up their ref
counts.

Now the creation of "kp2" will add another dentry within the kprobes
directory.

Upon the close of dfd, eventfs_release() will now do a dput for all the
entries in kprobes. But this is where the problem lies. The open only
upped the dentry of kp1 and not kp2. Now the close is decrementing both
kp1 and kp2, which causes kp2 to get a negative count.

Doing a "trace-cmd reset" which deletes all the kprobes cause the kernel
to crash! (due to the messed up accounting of the ref counts).

To solve this, save all the dentries that are opened in the
dcache_dir_open_wrapper() into an array, and use this array to know what
dentries to do a dput on in eventfs_release().

Since the dcache_dir_open_wrapper() calls dcache_dir_open() which uses the
file->private_data, we need to also add a wrapper around dcache_readdir()
that uses the cursor assigned to the file->private_data. This is because
the dentries need to also be saved in the file->private_data. To do this
create the structure:

  struct dentry_list {
	void		*cursor;
	struct dentry	**dentries;
  };

Which will hold both the cursor and the dentries. Some shuffling around is
needed to make sure that dcache_dir_open() and dcache_readdir() only see
the cursor.

Link: https://lore.kernel.org/linux-trace-kernel/20230919211804.230edf1e@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20230922163446.1431d4fa@gandalf.local.home

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ajay Kaher <akaher@vmware.com>
Fixes: 6394044955 ("eventfs: Implement eventfs lookup, read, open functions")
Reported-by: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-09-22 16:58:11 -04:00
Namjae Jeon
73f949ea87 ksmbd: check iov vector index in ksmbd_conn_write()
If ->iov_idx is zero, This means that the iov vector for the response
was not added during the request process. In other words, it means that
there is a problem in generating a response, So this patch return as
an error to avoid NULL pointer dereferencing problem.

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-09-21 14:41:06 -05:00
Namjae Jeon
f2f11fca5d ksmbd: return invalid parameter error response if smb2 request is invalid
If smb2 request from client is invalid, The following kernel oops could
happen. The patch e2b76ab8b5: "ksmbd: add support for read compound"
leads this issue. When request is invalid, It doesn't set anything in
the response buffer. This patch add missing set invalid parameter error
response.

[  673.085542] ksmbd: cli req too short, len 184 not 142. cmd:5 mid:109
[  673.085580] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  673.085591] #PF: supervisor read access in kernel mode
[  673.085600] #PF: error_code(0x0000) - not-present page
[  673.085608] PGD 0 P4D 0
[  673.085620] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  673.085631] CPU: 3 PID: 1039 Comm: kworker/3:0 Not tainted 6.6.0-rc2-tmt #16
[  673.085643] Hardware name: AZW U59/U59, BIOS JTKT001 05/05/2022
[  673.085651] Workqueue: ksmbd-io handle_ksmbd_work [ksmbd]
[  673.085719] RIP: 0010:ksmbd_conn_write+0x68/0xc0 [ksmbd]
[  673.085808] RAX: 0000000000000000 RBX: ffff88811ade4f00 RCX: 0000000000000000
[  673.085817] RDX: 0000000000000000 RSI: ffff88810c2a9780 RDI: ffff88810c2a9ac0
[  673.085826] RBP: ffffc900005e3e00 R08: 0000000000000000 R09: 0000000000000000
[  673.085834] R10: ffffffffa3168160 R11: 63203a64626d736b R12: ffff8881057c8800
[  673.085842] R13: ffff8881057c8820 R14: ffff8882781b2380 R15: ffff8881057c8800
[  673.085852] FS:  0000000000000000(0000) GS:ffff888278180000(0000) knlGS:0000000000000000
[  673.085864] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  673.085872] CR2: 0000000000000000 CR3: 000000015b63c000 CR4: 0000000000350ee0
[  673.085883] Call Trace:
[  673.085890]  <TASK>
[  673.085900]  ? show_regs+0x6a/0x80
[  673.085916]  ? __die+0x25/0x70
[  673.085926]  ? page_fault_oops+0x154/0x4b0
[  673.085938]  ? tick_nohz_tick_stopped+0x18/0x50
[  673.085954]  ? __irq_work_queue_local+0xba/0x140
[  673.085967]  ? do_user_addr_fault+0x30f/0x6c0
[  673.085979]  ? exc_page_fault+0x79/0x180
[  673.085992]  ? asm_exc_page_fault+0x27/0x30
[  673.086009]  ? ksmbd_conn_write+0x68/0xc0 [ksmbd]
[  673.086067]  ? ksmbd_conn_write+0x46/0xc0 [ksmbd]
[  673.086123]  handle_ksmbd_work+0x28d/0x4b0 [ksmbd]
[  673.086177]  process_one_work+0x178/0x350
[  673.086193]  ? __pfx_worker_thread+0x10/0x10
[  673.086202]  worker_thread+0x2f3/0x420
[  673.086210]  ? _raw_spin_unlock_irqrestore+0x27/0x50
[  673.086222]  ? __pfx_worker_thread+0x10/0x10
[  673.086230]  kthread+0x103/0x140
[  673.086242]  ? __pfx_kthread+0x10/0x10
[  673.086253]  ret_from_fork+0x39/0x60
[  673.086263]  ? __pfx_kthread+0x10/0x10
[  673.086274]  ret_from_fork_asm+0x1b/0x30

Fixes: e2b76ab8b5 ("ksmbd: add support for read compound")
Reported-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-09-21 14:41:06 -05:00
Linus Torvalds
b5cbe7c00a v6.6-rc3.vfs.ctime.revert
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZQsZLQAKCRCRxhvAZXjc
 op0vAP96hkSUnmXmxTr8GHId3yfElN8ZZ3aSfePeBdljjKEZVAEA2+cbHLy4GqRi
 TpjP1HNIdmtbVSC2ZnrgqkbwGageQgg=
 =s92y
 -----END PGP SIGNATURE-----

Merge tag 'v6.6-rc3.vfs.ctime.revert' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull finegrained timestamp reverts from Christian Brauner:
 "Earlier this week we sent a few minor fixes for the multi-grained
  timestamp work in [1]. While we were polishing those up after Linus
  realized that there might be a nicer way to fix them we received a
  regression report in [2] that fine grained timestamps break gnulib
  tests and thus possibly other tools.

  The kernel will elide fine-grain timestamp updates when no one is
  actively querying for them to avoid performance impacts. So a sequence
  like write(f1) stat(f2) write(f2) stat(f2) write(f1) stat(f1) may
  result in timestamp f1 to be older than the final f2 timestamp even
  though f1 was last written too but the second write didn't update the
  timestamp.

  Such plotholes can lead to subtle bugs when programs compare
  timestamps. For example, the nap() function in [2] will estimate that
  it needs to wait one ns on a fine-grain timestamp enabled filesytem
  between subsequent calls to observe a timestamp change. But in general
  we don't update timestamps with more than one jiffie if we think that
  no one is actively querying for fine-grain timestamps to avoid
  performance impacts.

  While discussing various fixes the decision was to go back to the
  drawing board and ultimately to explore a solution that involves only
  exposing such fine-grained timestamps to nfs internally and never to
  userspace.

  As there are multiple solutions discussed the honest thing to do here
  is not to fix this up or disable it but to cleanly revert. The general
  infrastructure will probably come back but there is no reason to keep
  this code in mainline.

  The general changes to timestamp handling are valid and a good cleanup
  that will stay. The revert is fully bisectable"

Link: https://lore.kernel.org/all/20230918-hirte-neuzugang-4c2324e7bae3@brauner [1]
Link: https://lore.kernel.org/all/bf0524debb976627693e12ad23690094e4514303.camel@linuxfromscratch.org [2]

* tag 'v6.6-rc3.vfs.ctime.revert' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  Revert "fs: add infrastructure for multigrain timestamps"
  Revert "btrfs: convert to multigrain timestamps"
  Revert "ext4: switch to multigrain timestamps"
  Revert "xfs: switch to multigrain timestamps"
  Revert "tmpfs: add support for multigrain timestamps"
2023-09-21 10:15:26 -07:00
Josef Bacik
b4c639f699 btrfs: initialize start_slot in btrfs_log_prealloc_extents
Jens reported a compiler warning when using
CONFIG_CC_OPTIMIZE_FOR_SIZE=y that looks like this

  fs/btrfs/tree-log.c: In function ‘btrfs_log_prealloc_extents’:
  fs/btrfs/tree-log.c:4828:23: warning: ‘start_slot’ may be used
  uninitialized [-Wmaybe-uninitialized]
   4828 |                 ret = copy_items(trans, inode, dst_path, path,
	|                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   4829 |                                  start_slot, ins_nr, 1, 0);
	|                                  ~~~~~~~~~~~~~~~~~~~~~~~~~
  fs/btrfs/tree-log.c:4725:13: note: ‘start_slot’ was declared here
   4725 |         int start_slot;
	|             ^~~~~~~~~~

The compiler is incorrect, as we only use this code when ins_len > 0,
and when ins_len > 0 we have start_slot properly initialized.  However
we generally find the -Wmaybe-uninitialized warnings valuable, so
initialize start_slot to get rid of the warning.

Reported-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-21 18:52:23 +02:00
Josef Bacik
20218dfbaa btrfs: make sure to initialize start and len in find_free_dev_extent
Jens reported a compiler error when using CONFIG_CC_OPTIMIZE_FOR_SIZE=y
that looks like this

  In function ‘gather_device_info’,
      inlined from ‘btrfs_create_chunk’ at fs/btrfs/volumes.c:5507:8:
  fs/btrfs/volumes.c:5245:48: warning: ‘dev_offset’ may be used uninitialized [-Wmaybe-uninitialized]
   5245 |                 devices_info[ndevs].dev_offset = dev_offset;
	|                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
  fs/btrfs/volumes.c: In function ‘btrfs_create_chunk’:
  fs/btrfs/volumes.c:5196:13: note: ‘dev_offset’ was declared here
   5196 |         u64 dev_offset;

This occurs because find_free_dev_extent is responsible for setting
dev_offset, however if we get an -ENOMEM at the top of the function
we'll return without setting the value.

This isn't actually a problem because we will see the -ENOMEM in
gather_device_info() and return and not use the uninitialized value,
however we also just don't want the compiler warning so rework the code
slightly in find_free_dev_extent() to make sure it's always setting
*start and *len to avoid the compiler warning.

Reported-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-21 18:52:20 +02:00
Steve French
c8ebf077fb smb3: fix confusing debug message
The message said it was an invalid mode, when it was intentionally
not set.  Fix confusing message logged to dmesg.

Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-09-20 19:50:05 -05:00
Paulo Alcantara
7fb77d9c87 smb: client: handle STATUS_IO_REPARSE_TAG_NOT_HANDLED
Fix missing set of cifs_open_info_data::reparse_point when SMB2_CREATE
request fails with STATUS_IO_REPARSE_TAG_NOT_HANDLED.

Fixes: 5f71ebc412 ("smb: client: parse reparse point flag in create response")
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-09-20 16:12:09 -05:00
Steve French
6e2e27e47c smb3: remove duplicate error mapping
In status_to_posix_error STATUS_IO_REPARSE_TAG_NOT_HANDLED was mapped
to both -EOPNOTSUPP and also to -EIO but the later one (-EIO) is
ignored. Remove the duplicate.

Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-09-20 16:04:51 -05:00
Qu Wenruo
74ee79142c btrfs: reset destination buffer when read_extent_buffer() gets invalid range
Commit f98b6215d7 ("btrfs: extent_io: do extra check for extent buffer
read write functions") changed how we handle invalid extent buffer range
for read_extent_buffer().

Previously if the range is invalid we just set the destination to zero,
but after the patch we do nothing and error out.

This can lead to smatch static checker errors like:

  fs/btrfs/print-tree.c:186 print_uuid_item() error: uninitialized symbol 'subvol_id'.
  fs/btrfs/tests/extent-io-tests.c:338 check_eb_bitmap() error: uninitialized symbol 'has'.
  fs/btrfs/tests/extent-io-tests.c:353 check_eb_bitmap() error: uninitialized symbol 'has'.
  fs/btrfs/uuid-tree.c:203 btrfs_uuid_tree_remove() error: uninitialized symbol 'read_subid'.
  fs/btrfs/uuid-tree.c:353 btrfs_uuid_tree_iterate() error: uninitialized symbol 'subid_le'.
  fs/btrfs/uuid-tree.c:72 btrfs_uuid_tree_lookup() error: uninitialized symbol 'data'.
  fs/btrfs/volumes.c:7415 btrfs_dev_stats_value() error: uninitialized symbol 'val'.

Fix those warnings by reverting back to the old memset() behavior.
By this we keep the static checker happy and would still make a lot of
noise when such invalid ranges are passed in.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: f98b6215d7 ("btrfs: extent_io: do extra check for extent buffer read write functions")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-20 20:44:57 +02:00
Josef Bacik
58bfe2ccec btrfs: properly report 0 avail for very full file systems
A user reported some issues with smaller file systems that get very
full.  While investigating this issue I noticed that df wasn't showing
100% full, despite having 0 chunk space and having < 1MiB of available
metadata space.

This turns out to be an overflow issue, we're doing:

  total_available_metadata_space - SZ_4M < global_block_rsv_size

to determine if there's not enough space to make metadata allocations,
which overflows if total_available_metadata_space is < 4M.  Fix this by
checking to see if our available space is greater than the 4M threshold.
This makes df properly report 100% usage on the file system.

CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-20 20:44:40 +02:00
Filipe Manana
8ec0a4a577 btrfs: log message if extent item not found when running delayed extent op
When running a delayed extent operation, if we don't find the extent item
in the extent tree we just return -EIO without any logged message. This
indicates some bug or possibly a memory or fs corruption, so the return
value should not be -EIO but -EUCLEAN instead, and since it's not expected
to ever happen, print an informative error message so that if it happens
we have some idea of what went wrong, where to look at.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-20 20:42:58 +02:00
Filipe Manana
d2f79e6385 btrfs: remove redundant BUG_ON() from __btrfs_inc_extent_ref()
At __btrfs_inc_extent_ref() we are doing a BUG_ON() if we are dealing with
a tree block reference that has a reference count that is different from 1,
but we have already dealt with this case at run_delayed_tree_ref(), making
it useless. So remove the BUG_ON().

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-20 20:42:47 +02:00
Filipe Manana
1bf76df3fe btrfs: return -EUCLEAN for delayed tree ref with a ref count not equals to 1
When running a delayed tree reference, if we find a ref count different
from 1, we return -EIO. This isn't an IO error, as it indicates either a
bug in the delayed refs code or a memory corruption, so change the error
code from -EIO to -EUCLEAN. Also tag the branch as 'unlikely' as this is
not expected to ever happen, and change the error message to print the
tree block's bytenr without the parenthesis (and there was a missing space
between the 'block' word and the opening parenthesis), for consistency as
that's the style we used everywhere else.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-20 20:42:33 +02:00
Filipe Manana
a7ddeeb079 btrfs: prevent transaction block reserve underflow when starting transaction
When starting a transaction, with a non-zero number of items, we reserve
metadata space for that number of items and for delayed refs by doing a
call to btrfs_block_rsv_add(), with the transaction block reserve passed
as the block reserve argument. This reserves metadata space and adds it
to the transaction block reserve. Later we migrate the space we reserved
for delayed references from the transaction block reserve into the delayed
refs block reserve, by calling btrfs_migrate_to_delayed_refs_rsv().

btrfs_migrate_to_delayed_refs_rsv() decrements the number of bytes to
migrate from the source block reserve, and this however may result in an
underflow in case the space added to the transaction block reserve ended
up being used by another task that has not reserved enough space for its
own use - examples are tasks doing reflinks or hole punching because they
end up calling btrfs_replace_file_extents() -> btrfs_drop_extents() and
may need to modify/COW a variable number of leaves/paths, so they keep
trying to use space from the transaction block reserve when they need to
COW an extent buffer, and may end up trying to use more space then they
have reserved (1 unit/path only for removing file extent items).

This can be avoided by simply reserving space first without adding it to
the transaction block reserve, then add the space for delayed refs to the
delayed refs block reserve and finally add the remaining reserved space
to the transaction block reserve. This also makes the code a bit shorter
and simpler. So just do that.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-20 20:42:18 +02:00
Filipe Manana
2ed45c0f18 btrfs: fix race when refilling delayed refs block reserve
If we have two (or more) tasks attempting to refill the delayed refs block
reserve we can end up with the delayed block reserve being over reserved,
that is, with a reserved space greater than its size. If this happens, we
are holding to more reserved space than necessary for a while.

The race happens like this:

1) The delayed refs block reserve has a size of 8M and a reserved space of
   6M for example;

2) Task A calls btrfs_delayed_refs_rsv_refill();

3) Task B also calls btrfs_delayed_refs_rsv_refill();

4) Task A sees there's a 2M difference between the size and the reserved
   space of the delayed refs rsv, so it will reserve 2M of space by
   calling btrfs_reserve_metadata_bytes();

5) Task B also sees that 2M difference, and like task A, it reserves
   another 2M of metadata space;

6) Both task A and task B increase the reserved space of block reserve
   by 2M, by calling btrfs_block_rsv_add_bytes(), so the block reserve
   ends up with a size of 8M and a reserved space of 10M;

7) The extra, over reserved space will eventually be freed by some task
   calling btrfs_delayed_refs_rsv_release() -> btrfs_block_rsv_release()
   -> block_rsv_release_bytes(), as there we will detect the over reserve
   and release that space.

So fix this by checking if we still need to add space to the delayed refs
block reserve after reserving the metadata space, and if we don't, just
release that space immediately.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-20 20:42:08 +02:00
Linus Torvalds
a229cf67ab for-6.6-rc2-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmULIZUACgkQxWXV+ddt
 WDv77Q//ZiKpmevPQmQfUtmV8WwMfD2a9zRlKBpGggwtrD4mf3CYRLnOpTm81MPO
 vFIuYacBn+9UXqp2j/IbvNWfQAPQNVDxSPXx66uba93RJc+bB1J3TydxcEyJ7fr4
 dwhLLk01jttfk0+rnjF34fmXiHSTtI6D2WeaLCzUbaPLw4SZ+ul+GAdeF3P174iO
 OMNBUln7hK00Q7j8kFf4j6SW1yIIKMTl6MfOFJYanIqzx51PYFFVtKwoCr0Vt53v
 ZHbgrK582ZJO6pKF9kJF/1tqrY9/Df8jzgSypK8pew/SukMOrf7iVwrmhietuhKA
 92j5sxKhCRyq6Qg6ZwC0jyk+oMqrT8r+q3r38a5qDJx/9Q279vkXBqQnACfLjmnH
 6+sNdkY5/uBWnDMh/+d6yBtfbdW5DtuET4McYpJt1Nk2St/f3UzPaL4LcNkDXNPk
 t1Q4W4v0KS1V8TbsLfdD629CMghxQNKVs1XqyCAbUq9ub4LE2CtL3lDm730qZoZt
 +LM7+sAxEOJC6yqYfdEbcIc8l27Hl5nZEzamcvMrRz61N85/8Jx4Sq2b6VSE9TCE
 hNEWAL5sOjhuhmUPhatYC+KO1P6NDP+Yg99yZCZIT9s/P1oK5H+aETshWX+lvJ+Q
 Ai+qzKvp2ERHFcE+R5qIXs/uX7azpzjqsRZxY2/zdp70ugQDSXE=
 =0eEg
 -----END PGP SIGNATURE-----

Merge tag 'for-6.6-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few more followup fixes to the directory listing.

  People have noticed different behaviour compared to other filesystems
  after changes in 6.5. This is now unified to more "logical" and
  expected behaviour while still within POSIX. And a few more fixes for
  stable.

   - change behaviour of readdir()/rewinddir() when new directory
     entries are created after opendir(), properly tracking the last
     entry

   - fix race in readdir when multiple threads can set the last entry
     index for a directory

  Additionally:

   - use exclusive lock when direct io might need to drop privs and call
     notify_change()

   - don't clear uptodate bit on page after an error, this may lead to a
     deadlock in subpage mode

   - fix waiting pattern when multiple readers block on Merkle tree
     data, switch to folios"

* tag 'for-6.6-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix race between reading a directory and adding entries to it
  btrfs: refresh dir last index during a rewinddir(3) call
  btrfs: set last dir index to the current last index when opening dir
  btrfs: don't clear uptodate on write errors
  btrfs: file_remove_privs needs an exclusive lock in direct io write
  btrfs: convert btrfs_read_merkle_tree_page() to use a folio
2023-09-20 11:03:45 -07:00
Christian Brauner
647aa76828
Revert "fs: add infrastructure for multigrain timestamps"
This reverts commit ffb6cf19e0.

Users reported regressions due to enabling multi-grained timestamps
unconditionally. As no clear consensus on a solution has come up and the
discussion has gone back to the drawing board revert the infrastructure
changes for. If it isn't code that's here to stay, make it go away.

Message-ID: <20230920-keine-eile-c9755b5825db@brauner>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-09-20 18:05:31 +02:00
Christian Brauner
efd34f0316
Revert "btrfs: convert to multigrain timestamps"
This reverts commit 50e9ceef1d.

Users reported regressions due to enabling multi-grained timestamps
unconditionally. As no clear consensus on a solution has come up and the
discussion has gone back to the drawing board revert the infrastructure
changes for. If it isn't code that's here to stay, make it go away.

Message-ID: <20230920-keine-eile-c9755b5825db@brauner>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-09-20 18:05:31 +02:00
Christian Brauner
50ec1d721e
Revert "ext4: switch to multigrain timestamps"
This reverts commit 0269b58586.

Users reported regressions due to enabling multi-grained timestamps
unconditionally. As no clear consensus on a solution has come up and the
discussion has gone back to the drawing board revert the infrastructure
changes for. If it isn't code that's here to stay, make it go away.

Message-ID: <20230920-keine-eile-c9755b5825db@brauner>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-09-20 18:05:31 +02:00
Christian Brauner
f798accd59
Revert "xfs: switch to multigrain timestamps"
This reverts commit e44df26647.

Users reported regressions due to enabling multi-grained timestamps
unconditionally. As no clear consensus on a solution has come up and the
discussion has gone back to the drawing board revert the infrastructure
changes for. If it isn't code that's here to stay, make it go away.

Message-ID: <20230920-keine-eile-c9755b5825db@brauner>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-09-20 18:05:31 +02:00
Max Kellermann
ae81711c1e
fs/pipe: remove duplicate "offset" initializer
This code duplication was introduced by commit a194dfe6e6 ("pipe:
Rearrange sequence in pipe_write() to preallocate slot"), but since
the pipe's mutex is locked, nobody else can modify the value
meanwhile.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Message-Id: <20230919074045.1066796-1-max.kellermann@ionos.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-09-20 14:22:01 +02:00
Chunhai Guo
be049c3a08
fs-writeback: do not requeue a clean inode having skipped pages
When writing back an inode and performing an fsync on it concurrently, a
deadlock issue may arise as shown below. In each writeback iteration, a
clean inode is requeued to the wb->b_dirty queue due to non-zero
pages_skipped, without anything actually being written. This causes an
infinite loop and prevents the plug from being flushed, resulting in a
deadlock. We now avoid requeuing the clean inode to prevent this issue.

    wb_writeback        fsync (inode-Y)
blk_start_plug(&plug)
for (;;) {
  iter i-1: some reqs with page-X added into plug->mq_list // f2fs node page-X with PG_writeback
                        filemap_fdatawrite
                          __filemap_fdatawrite_range // write inode-Y with sync_mode WB_SYNC_ALL
                           do_writepages
                            f2fs_write_data_pages
                             __f2fs_write_data_pages // wb_sync_req[DATA]++ for WB_SYNC_ALL
                              f2fs_write_cache_pages
                               f2fs_write_single_data_page
                                f2fs_do_write_data_page
                                 f2fs_outplace_write_data
                                  f2fs_update_data_blkaddr
                                   f2fs_wait_on_page_writeback
                                     wait_on_page_writeback // wait for f2fs node page-X
  iter i:
    progress = __writeback_inodes_wb(wb, work)
    . writeback_sb_inodes
    .   __writeback_single_inode // write inode-Y with sync_mode WB_SYNC_NONE
    .   . do_writepages
    .   .   f2fs_write_data_pages
    .   .   .  __f2fs_write_data_pages // skip writepages due to (wb_sync_req[DATA]>0)
    .   .   .   wbc->pages_skipped += get_dirty_pages(inode) // wbc->pages_skipped = 1
    .   if (!(inode->i_state & I_DIRTY_ALL)) // i_state = I_SYNC | I_SYNC_QUEUED
    .    total_wrote++;  // total_wrote = 1
    .   requeue_inode // requeue inode-Y to wb->b_dirty queue due to non-zero pages_skipped
    if (progress) // progress = 1
      continue;
  iter i+1:
      queue_io
      // similar process with iter i, infinite for-loop !
}
blk_finish_plug(&plug)   // flush plug won't be called

Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230916045131.957929-1-guochunhai@vivo.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-09-20 14:22:01 +02:00
Kees Cook
db7fcc884d
aio: Annotate struct kioctx_table with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct kioctx_table.

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-aio@kvack.org
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Message-Id: <20230915201413.never.881-kees@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-09-20 14:22:01 +02:00