Just a few syzbot fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmZQk0cACgkQE6szbY3K
bna7gA/+MSY3I95CwaJ4bBq5SCxOaRcrX099LFh8Zrj+OF+DWE2PtVo1LhhgnYrQ
KpZrS2Q9Qgb2yVqYzOY6LBfH4il1O/WwvloMG0MbuYiQFu9/JL/6CEK9uFyiGmaC
fdiFEN3u+8AK6phTFaqUU2ncG0XFQ1Ple5zmFXo4Y3ZJeNaubJeEDac+kbRvOwYh
rQ6Iy0FNoQymv0BzmuM7g2NsbhdAgHTv7rhGbfpNBZv3lu0yDXsfZZgWTr2oXMSP
FMhm4bcTGAFp5hbwq9k56ND8oSFpamsH7SwS4bDlEe1CNOfMI1JjnrvSEuDrocAE
1Jn2J2Gv9NXnEHKamVzzpUILG67buEtYzJyDQk51N4kulgThdpRzjm+11ylD5U0U
wzIK1HXsKHtRdUiIhQGLCLW61FXM+0QBIk2eXhPq88jsM2zTL7iMbXR3P/nvgUDy
8ia8g5Q+nKxcb223M8WmK0rBwlaNasE/hXiFT54ntt8bK5nmVJjPMxVXUmYth3hw
7STkuT0k5jVsMG1NqLkg+wSupj1AuWbD2hIcas7GkxarEYAULbQcClHYGpMll3Tw
+pJfLjAtBOkcE4TwWDLflVBhwWtdmPNhk51Q3iLVRp0Gm7t0rhE2vE6TjpsIFnrg
rUAgaqQqQ2WXfsRaGa2wx0tRKoW+8Iigq13ndn1AZIrfEtQkYUs=
=vuNC
-----END PGP SIGNATURE-----
Merge tag 'bcachefs-2024-05-24' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Nothing exciting, just syzbot fixes (except for the one
FMODE_CAN_ODIRECT patch).
Looks like syzbot reports have slowed down; this is all catch up from
two weeks of conferences.
Next hardening project is using Thomas's error injection tooling to
torture test repair"
* tag 'bcachefs-2024-05-24' of https://evilpiepirate.org/git/bcachefs:
bcachefs: Fix race path in bch2_inode_insert()
bcachefs: Ensure we're RW before journalling
bcachefs: Fix shutdown ordering
bcachefs: Fix unsafety in bch2_dirent_name_bytes()
bcachefs: Fix stack oob in __bch2_encrypt_bio()
bcachefs: Fix btree_trans leak in bch2_readahead()
bcachefs: Fix bogus verify_replicas_entry() assert
bcachefs: Check for subvolumes with bogus snapshot/inode fields
bcachefs: bch2_checksum() returns 0 for unknown checksum type
bcachefs: Fix bch2_alloc_ciphers()
bcachefs: Add missing guard in bch2_snapshot_has_children()
bcachefs: Fix missing parens in drop_locks_do()
bcachefs: Improve bch2_assert_pos_locked()
bcachefs: Fix shift overflows in replicas.c
bcachefs: Fix shift overflow in btree_lost_data()
bcachefs: Fix ref in trans_mark_dev_sbs() error path
bcachefs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method
bcachefs: Fix rcu splat in check_fix_ptrs()
Bug fixes:
- The eventfs directories need to have unique inode numbers. Make sure that
they do not get the default file inode number.
- Update the inode uid and gid fields on remount.
When a remount happens where a uid and/or gid is specified, all the tracefs
files and directories should get the specified uid and/or gid. But this
can be sporadic when some uids were assigned already. There's already
a list of inodes that are allocated. Just update their uid and gid fields
at the time of remount.
- Update the eventfs_inodes on remount from the top level "events" descriptor.
There was a bug where not all the eventfs files or directories were
getting updated on remount. One fix was to clear the SAVED_UID/GID
flags from the inode list while iterating over the inodes during
the remount. But because the eventfs inodes can be freed when the last
reference is released, not all the eventfs_inodes were being updated.
This led the ownership selftest to fail if it was run a second
time (the first time would leave eventfs_inodes with no corresponding
tracefs_inode).
Instead, for eventfs_inodes, only process the "events" eventfs_inode
from the list iteration, as it is guaranteed to have a tracefs_inode
(it's never freed while the "events" directory exists). As it has
a list of its children, and the children have a list of their children,
just iterate all the eventfs_inodes from the "events" descriptor and
it is guaranteed to get all of them.
- Clear the EVENT_INODE flag from the tracefs_drop_inode() callback.
Currently the EVENT_INODE flag is cleared in the tracefs_d_iput()
callback. But this is the wrong location. The d_iput() callback is
called when a dentry releases its reference to the inode. There could
be a case where two dentries have the same inode, and the flag will
be cleared prematurely. The flag needs to be cleared when the last
reference of the inode is dropped, and that happens in the inode's
drop_inode() callback handler.
Cleanups:
- Consolidate the creation of a tracefs_inode for an eventfs_inode.
A tracefs_inode is created for both files and directories of the
eventfs system. It is open coded. Instead, consolidate it into a
single eventfs_get_inode() function call.
- Remove the eventfs getattr and permission callbacks.
The permissions for the eventfs files and directories are updated
when the inodes are created, on remount, and when the user sets
them (via setattr). The inodes hold the current permissions so
there is no need to have custom getattr or permission callbacks,
as they are more likely to make them incorrect. The inode's
permissions are updated when they should be updated. Remove the
getattr and permission inode callbacks.
- Do not update eventfs_inode attributes on creation of inodes.
The eventfs_inode attribute field is used to store the permissions
of the directories and files for when their corresponding inodes
are freed and are created again. But when the inodes are created,
the eventfs_inode attributes are recalculated. The
recalculation should only happen when the permissions change for
a given file or directory. Currently, the attributes are just
being set to their current values, so this is not a bug, but
it's unnecessary and error-prone. Stop doing that.
- The events directory inode is created once when the events directory
is created and deleted when it is deleted. It is now updated on
remount and when the user changes the permissions. There's no need
to use the eventfs_inode of the events directory to store the
events directory permissions. But using it to store the default
permissions for the files within the directory that have not been
updated by the user can simplify the code.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZk+0ohQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qtWOAQCSdEsWYiNcFBqvKp1kSI+dH1sKfur3
CAoe1trzDEdv/gEAsFkophR9OBzO193in4ZQYNKdEDfeaicEaDctzLxlkwY=
=9zqq
-----END PGP SIGNATURE-----
Merge tag 'trace-tracefs-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracefs/eventfs updates from Steven Rostedt:
"Bug fixes:
- The eventfs directories need to have unique inode numbers. Make
sure that they do not get the default file inode number.
- Update the inode uid and gid fields on remount.
When a remount happens where a uid and/or gid is specified, all the
tracefs files and directories should get the specified uid and/or
gid. But this can be sporadic when some uids were assigned already.
There's already a list of inodes that are allocated. Just update
their uid and gid fields at the time of remount.
- Update the eventfs_inodes on remount from the top level "events"
descriptor.
There was a bug where not all the eventfs files or directories
were getting updated on remount. One fix was to clear the
SAVED_UID/GID flags from the inode list while iterating over the
inodes during the remount. But because the eventfs inodes can be
freed when the last reference is released, not all the
eventfs_inodes were being updated. This led the ownership
selftest to fail if it was run a second time (the first time would
leave eventfs_inodes with no corresponding tracefs_inode).
Instead, for eventfs_inodes, only process the "events"
eventfs_inode from the list iteration, as it is guaranteed to have
a tracefs_inode (it's never freed while the "events" directory
exists). As it has a list of its children, and the children have a
list of their children, just iterate all the eventfs_inodes from
the "events" descriptor and it is guaranteed to get all of them.
- Clear the EVENT_INODE flag from the tracefs_drop_inode() callback.
Currently the EVENT_INODE flag is cleared in the tracefs_d_iput()
callback. But this is the wrong location. The d_iput() callback is
called when a dentry releases its reference to the inode. There
could be a case where two dentries have the same inode, and the
flag will be cleared prematurely. The flag needs to be cleared when
the last reference of the inode is dropped, and that happens in the
inode's drop_inode() callback handler.
Cleanups:
- Consolidate the creation of a tracefs_inode for an eventfs_inode.
A tracefs_inode is created for both files and directories of the
eventfs system. It is open coded. Instead, consolidate it into a
single eventfs_get_inode() function call.
- Remove the eventfs getattr and permission callbacks.
The permissions for the eventfs files and directories are updated
when the inodes are created, on remount, and when the user sets
them (via setattr). The inodes hold the current permissions so
there is no need to have custom getattr or permission callbacks, as
they are more likely to make them incorrect. The inode's
permissions are updated when they should be updated. Remove the
getattr and permission inode callbacks.
- Do not update eventfs_inode attributes on creation of inodes.
The eventfs_inode attribute field is used to store the permissions
of the directories and files for when their corresponding inodes
are freed and are created again. But when the inodes are created,
the eventfs_inode attributes are recalculated. The
recalculation should only happen when the permissions change for a
given file or directory. Currently, the attributes are just
being set to their current values, so this is not a bug, but it's
unnecessary and error-prone. Stop doing that.
- The events directory inode is created once when the events
directory is created and deleted when it is deleted. It is now
updated on remount and when the user changes the permissions.
There's no need to use the eventfs_inode of the events directory to
store the events directory permissions. But using it to store the
default permissions for the files within the directory that have
not been updated by the user can simplify the code"
* tag 'trace-tracefs-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Do not use attributes for events directory
eventfs: Cleanup permissions in creation of inodes
eventfs: Remove getattr and permission callbacks
eventfs: Consolidate the eventfs_inode update in eventfs_get_inode()
tracefs: Clear EVENT_INODE flag in tracefs_drop_inode()
eventfs: Update all the eventfs_inodes from the events descriptor
tracefs: Update inode permissions on remount
eventfs: Keep the directories from having the same inode number as files
Highlights include:
Stable fixes:
- nfs: fix undefined behavior in nfs_block_bits()
- NFSv4.2: Fix READ_PLUS when server doesn't support OP_READ_PLUS
Bugfixes:
- Fix mixing of the lock/nolock and local_lock mount options
- NFSv4: Fixup smatch warning for ambiguous return
- NFSv3: Fix remount when using the legacy binary mount api
- SUNRPC: Fix the handling of expired RPCSEC_GSS contexts
- SUNRPC: fix the NFSACL RPC retries when soft mounts are enabled
- rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL
Features and cleanups:
- NFSv3: Use the atomic_open API to fix open(O_CREAT|O_TRUNC)
- pNFS/filelayout: Specify the layout segment range in LAYOUTGET
- pNFS: rework pnfs_generic_pg_check_layout to check IO range
- NFSv2: Turn off enabling of NFS v2 by default
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmZPpYMACgkQZwvnipYK
APITOw//acjE9YTZcST9kgkf2bfwuHFcdxvMZAr4MV0YsfqMesU2MYmaK/5YMLyo
iNCHjLmlfE2iLAUqvFtakc1F3guACJqqFfMdnMHa1MwPznrL3yNNClGnBamovbPd
XK2MBgpQBXb+xLxqH0A2TtOK2ofk0CFzb3x9eaziox8omBM2j3v6ZARsDHYehuhM
Hig8IxW/kZ7kx5jxqSVktrgW3gDKqIuLssF6fJVINzh45jHC5QO98cuSwetx6Mi1
Aw04HbOE6B66ORrzC1wyGN3PwOkTW2kgAiyB6UNNt+Hnvr0RD5TEqf3s3mzmhP9N
7LJ3H1Okxdcpn0G/bR4LBUg26r5BWxhfPiTYG/l9vAQk65yt2LO1kFzXbECBEfaG
ctGG7/7mMLVPs05kIFYm5S0cIYW2dYNuE20JY50LMaCIopjThdfruQj3yR4xibSt
bHrAbG9wW4qg/cgx860t5h7nbZnD5OOYIqKOCDRNrUfP7P0mK/tD49HggLjDo47M
vIMlYS3bTNSF7uEPTrv6bFr8XOD1I3BVXDQwGaJMZ8zyhkUIQtKO70+i4xM1E/Wl
Jw5Z6NpM8saDD449ZqX4IRUPDAhvz4v00QqD3Tqr4MHEc5sWi898S7XcJgL3bEai
QMJmBkAK8aDAP/suPw8VQc9wqplFNlB+QEh87p2WO+yRoEucn+A=
=HMSC
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-6.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Stable fixes:
- nfs: fix undefined behavior in nfs_block_bits()
- NFSv4.2: Fix READ_PLUS when server doesn't support OP_READ_PLUS
Bugfixes:
- Fix mixing of the lock/nolock and local_lock mount options
- NFSv4: Fixup smatch warning for ambiguous return
- NFSv3: Fix remount when using the legacy binary mount api
- SUNRPC: Fix the handling of expired RPCSEC_GSS contexts
- SUNRPC: fix the NFSACL RPC retries when soft mounts are enabled
- rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL
Features and cleanups:
- NFSv3: Use the atomic_open API to fix open(O_CREAT|O_TRUNC)
- pNFS/filelayout: Specify the layout segment range in LAYOUTGET
- pNFS: rework pnfs_generic_pg_check_layout to check IO range
- NFSv2: Turn off enabling of NFS v2 by default"
* tag 'nfs-for-6.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: fix undefined behavior in nfs_block_bits()
pNFS: rework pnfs_generic_pg_check_layout to check IO range
pNFS/filelayout: check layout segment range
pNFS/filelayout: fixup pNfs allocation modes
rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL
NFS: Don't enable NFS v2 by default
NFS: Fix READ_PLUS when server doesn't support OP_READ_PLUS
sunrpc: fix NFSACL RPC retry on soft mount
SUNRPC: fix handling expired GSS context
nfs: keep server info for remounts
NFSv4: Fixup smatch warning for ambiguous return
NFS: make sure lock/nolock overriding local_lock mount option
NFS: add atomic_open for NFSv3 to handle O_TRUNC correctly.
pNFS/filelayout: Specify the layout segment range in LAYOUTGET
pNFS/filelayout: Remove the whole file layout requirement
The __assign_str() macro logic of the TRACE_EVENT() macro was optimized so
that it no longer needs the second argument. The __assign_str() is always
matched with a __string() field that takes a field name and the source for
that field:
__string(field, source)
The TRACE_EVENT() macro logic will save off the source value and then use
that value to copy into the ring buffer via the __assign_str(). Before
commit c1fa617cae ("tracing: Rework __assign_str() and __string() to not
duplicate getting the string"), the __assign_str() needed the second
argument, which would perform the same logic as the __string() source
parameter did. Not only did this add overhead, it was also error-prone:
if the __assign_str() source produced something different, enough space
might not have been allocated for the string in the ring buffer (as the
__string() source was used to determine how much to allocate).
Now that the __assign_str() just uses the same string that was used in
__string(), it no longer needs the source parameter. It can now be removed.
-----BEGIN PGP SIGNATURE-----
iIkEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZk9RMBQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qur+AP9jbSYaGhzZdJ7a3HGA8M4l6JNju8nC
GcX1JpJT4z1qvgD3RkoNvP87etDAUAqmbVhVWnUHCY/vTqr9uB/gqmG6Ag==
=Y+6f
-----END PGP SIGNATURE-----
Merge tag 'trace-assign-str-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing cleanup from Steven Rostedt:
"Remove second argument of __assign_str()
The __assign_str() macro logic of the TRACE_EVENT() macro was
optimized so that it no longer needs the second argument. The
__assign_str() is always matched with a __string() field that takes a
field name and the source for that field:
__string(field, source)
The TRACE_EVENT() macro logic will save off the source value and then
use that value to copy into the ring buffer via the __assign_str().
Before commit c1fa617cae ("tracing: Rework __assign_str() and
__string() to not duplicate getting the string"), the __assign_str()
needed the second argument, which would perform the same logic as the
__string() source parameter did. Not only did this add overhead, it
was also error-prone: if the __assign_str() source produced something
different, enough space might not have been allocated for the string
in the ring buffer (as the __string() source was used to determine how
much to allocate).
Now that the __assign_str() just uses the same string that was used in
__string(), it no longer needs the source parameter. It can now be
removed"
* tag 'trace-assign-str-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/treewide: Remove second parameter of __assign_str()
Several new features here:
- virtio-net is finally supported in vduse.
- Virtio (balloon and mem) interaction with suspend is improved
- vhost-scsi now handles signals better/faster.
Fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmZN570PHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRp2JUH/1K3fZOHymop6Y5Z3USFS7YdlF+dniedY/vg
TKyWERkXOlxq1d9DVxC0mN7tk72DweuWI0YJjLXofrEW1VuW29ecSbyFXxpeWJls
b7ErffxDAFRas5jkMCngD8TuFnbEegU0mGP5kbiHpEndBydQ2hH99Gg0x7swW+cE
xsvU5zonCCLwLGIP2DrVrn9qGOHtV6o8eZfVKDVXfvicn3lFBkUSxlwEYsO9RMup
aKxV4FT2Pb1yBicwBK4TH1oeEXqEGy1YLEn+kAHRbgoC/5L0/LaiqrkzwzwwOIPj
uPGkacf8CIbX0qZo5EzD8kvfcYL1xhU3eT9WBmpp2ZwD+4bINd4=
=nax1
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"Several new features here:
- virtio-net is finally supported in vduse
- virtio (balloon and mem) interaction with suspend is improved
- vhost-scsi now handles signals better/faster
And fixes, cleanups all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits)
virtio-pci: Check if is_avq is NULL
virtio: delete vq in vp_find_vqs_msix() when request_irq() fails
MAINTAINERS: add Eugenio Pérez as reviewer
vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API
vp_vdpa: don't allocate unused msix vectors
sound: virtio: drop owner assignment
fuse: virtio: drop owner assignment
scsi: virtio: drop owner assignment
rpmsg: virtio: drop owner assignment
nvdimm: virtio_pmem: drop owner assignment
wifi: mac80211_hwsim: drop owner assignment
vsock/virtio: drop owner assignment
net: 9p: virtio: drop owner assignment
net: virtio: drop owner assignment
net: caif: virtio: drop owner assignment
misc: nsm: drop owner assignment
iommu: virtio: drop owner assignment
drm/virtio: drop owner assignment
gpio: virtio: drop owner assignment
firmware: arm_scmi: virtio: drop owner assignment
...
The top "events" directory has a static inode (it's created when the
directory is created and removed when the directory is removed). There's
no need to use the events ei->attr to determine its permissions. But it is
used to save the permissions the "events" directory had when it was
created, as those are needed for the default permissions of the files and
directories underneath it.
For example:
# cd /sys/kernel/tracing
# mkdir instances/foo
# chown 1001 instances/foo/events
The files under instances/foo/events should still have the same owner as
instances/foo (which the instances/foo/events ei->attr will hold), but the
events directory now has owner 1001.
Link: https://lore.kernel.org/lkml/20240522165032.104981011@goodmis.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The permissions being set during the creation of the inodes were updating
eventfs_inode attributes as well. Those attributes should only be touched
by the setattr or remount operations, not during the creation of inodes.
The eventfs_inode attributes should only be used to set the inodes and
should not be modified during the inode creation.
Simplify the code and fix the situation by:
1) Removing the eventfs_find_events() and doing a simple lookup for
the events descriptor in eventfs_get_inode().
2) Removing update_events_attr(), as the attributes should only be used
to update the inode and should not be modified here.
3) Adding update_inode_attr(), which uses the attributes to determine what
the inode permissions should be.
4) Removing the parent_inode of the eventfs_root_inode structure, as it
is no longer needed.
Now on creation, the inode gets the proper permissions without causing
side effects to the ei->attr field.
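A rough userspace model of what update_inode_attr() is described as doing, assuming the saved attributes carry "set by the user" bits above the permission bits. The flag names, both structs, and model_update_inode_attr() are hypothetical, not the tracefs code; the property being illustrated is that the attributes are only read (const) during inode creation, with the events directory's attributes supplying the defaults.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "set by the user" bits stored above the mode bits. */
#define MODEL_SAVE_MODE	0x10000u
#define MODEL_SAVE_UID	0x20000u
#define MODEL_SAVE_GID	0x40000u
#define MODEL_MODE_MASK	(MODEL_SAVE_MODE - 1)

struct model_attr { unsigned int mode, uid, gid; };	/* like ei->attr */
struct model_vfs_inode { unsigned int mode, uid, gid; };

/*
 * Derive an inode's permissions from saved attributes, falling back to
 * defaults. Both attribute arguments are const: creation only reads them.
 */
static void model_update_inode_attr(struct model_vfs_inode *inode,
				    unsigned int default_mode,
				    const struct model_attr *attr,
				    const struct model_attr *events_attr)
{
	assert(events_attr != NULL);	/* the events defaults always exist */

	inode->mode = (attr && (attr->mode & MODEL_SAVE_MODE)) ?
		      (attr->mode & MODEL_MODE_MASK) : default_mode;
	inode->uid = (attr && (attr->mode & MODEL_SAVE_UID)) ?
		     attr->uid : events_attr->uid;
	inode->gid = (attr && (attr->mode & MODEL_SAVE_GID)) ?
		     attr->gid : events_attr->gid;
}
```

A file whose mode was changed by the user keeps that mode, while its owner still falls back to whatever the events directory recorded.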
Link: https://lore.kernel.org/lkml/20240522165031.944088388@goodmis.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Now that inodes have their permissions updated on remount, the only other
places to update the inode permissions are when they are created and in
the setattr callback. The getattr and permission callbacks are not needed,
as the inodes should already have their proper settings.
Remove the callbacks, as this not only simplifies the code but also allows
more flexibility in fixing the inconsistencies in various corner cases
(like changing the permission of an instance directory).
Link: https://lore.kernel.org/lkml/20240522165031.782066021@goodmis.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
To simplify the code, create an eventfs_get_inode() that is used when an
eventfs file or directory is created. Have it update the appropriate flags
of the internal tracefs_inode in this function and update the inode's
mode as well.
Link: https://lore.kernel.org/lkml/20240522165031.624864160@goodmis.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
When the inode is being dropped from the dentry, the TRACEFS_EVENT_INODE
flag needs to be cleared to prevent a remount from calling
eventfs_remount() on the tracefs_inode private data. There's a window
between when the inode is dropped (and the dentry freed) and when the inode
is actually freed. If a remount happens between the two, the eventfs_inode
could be accessed after it is freed (only the dentry keeps a ref count on
it).
Currently the TRACEFS_EVENT_INODE flag is cleared from the dentry iput()
function. But this is incorrect, as it is possible that the inode has
another reference to it. The flag should only be cleared when the inode is
really being dropped and has no more references. That happens in the
drop_inode callback of the inode, as that gets called when the last
reference of the inode is released.
Remove the tracefs_d_iput() function and move its logic to the more
appropriate tracefs_drop_inode() callback function.
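The difference between the two placements can be modeled with a plain reference count. This is a hypothetical userspace sketch, not tracefs code: struct model_inode and both helpers are invented for illustration, and the bool stands in for TRACEFS_EVENT_INODE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified inode: a refcount plus the event flag. */
struct model_inode {
	int refcount;
	bool event_inode;	/* stands in for TRACEFS_EVENT_INODE */
};

/* Old placement: clear the flag whenever any dentry lets go. */
static void model_d_iput_old(struct model_inode *inode)
{
	inode->event_inode = false;	/* premature if others still hold it */
	inode->refcount--;
}

/* New placement: drop a reference; clear only when the last one goes. */
static void model_iput_new(struct model_inode *inode)
{
	assert(inode->refcount > 0);
	if (--inode->refcount == 0)
		inode->event_inode = false;	/* the drop_inode() step */
}
```

With two dentries sharing one inode, the old path clears the flag while a reference is still outstanding; the new path keeps it until the count reaches zero.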
Link: https://lore.kernel.org/linux-trace-kernel/20240523051539.908205106@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Fixes: baa23a8d43 ("tracefs: Reset permissions on remount if permissions are options")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The change to update the permissions of the eventfs_inode had the
misconception that using the tracefs_inode would find all the
eventfs_inodes that have been updated and reset them on remount.
The problem with this approach is that the eventfs_inodes are freed when
they are no longer used (basically the reason the eventfs system exists).
When they are freed, the updated eventfs_inodes are not reset on a remount
because their tracefs_inodes have been freed.
Instead, since the events directory eventfs_inode always has a
tracefs_inode pointing to it (it is not freed when finished), and the
events directory has a link to all its children, have the
eventfs_remount() function only operate on the events eventfs_inode and
have it descend into its children, updating their uids and gids.
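A minimal sketch of that descent, assuming a simplified node type: struct model_ei and model_eventfs_remount() are hypothetical stand-ins for eventfs_inode and eventfs_remount(), and a NULL-terminated array replaces the kernel's child list. Because the walk starts at the events node, it also reaches children whose tracefs_inodes are already gone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified eventfs_inode with links to its children. */
struct model_ei {
	unsigned int uid, gid;
	int saved_uid, saved_gid;	/* user set these explicitly */
	struct model_ei **children;	/* NULL-terminated, may be NULL */
};

/* Reset ownership on remount, descending into all children. */
static void model_eventfs_remount(struct model_ei *ei,
				  int update_uid, unsigned int uid,
				  int update_gid, unsigned int gid)
{
	assert(ei != NULL);
	if (update_uid) {
		ei->uid = uid;
		ei->saved_uid = 0;	/* the remount value wins */
	}
	if (update_gid) {
		ei->gid = gid;
		ei->saved_gid = 0;
	}
	for (struct model_ei **c = ei->children; c && *c; c++)
		model_eventfs_remount(*c, update_uid, uid, update_gid, gid);
}
```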
Link: https://lore.kernel.org/all/CAK7LNARXgaWw3kH9JgrnH4vK6fr8LDkNKf3wq8NhMWJrVwJyVQ@mail.gmail.com/
Link: https://lore.kernel.org/linux-trace-kernel/20240523051539.754424703@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: baa23a8d43 ("tracefs: Reset permissions on remount if permissions are options")
Reported-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
When a remount happens, if a gid or uid is specified, update the inodes to
have the same gid and uid. This will allow the simplification of the
permissions logic for the dynamically created files and directories.
Link: https://lore.kernel.org/linux-trace-kernel/20240523051539.592429986@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Fixes: baa23a8d43 ("tracefs: Reset permissions on remount if permissions are options")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The directories require unique inode numbers, but all the eventfs files
have the same inode number. Prevent the directories from having the same
inode numbers as the files as that can confuse some tooling.
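One way to keep directory numbers clear of the shared file number is to allocate lazily and skip the reserved value. This is a sketch under assumptions: MODEL_FILE_INO, the counter, and model_dir_ino() are hypothetical, and a real allocator would also have to cope with wraparound.

```c
#include <assert.h>

/* Hypothetical fixed inode number shared by every eventfs file. */
#define MODEL_FILE_INO 1u

static unsigned int next_ino_counter;	/* stand-in for a global allocator */

static unsigned int model_get_next_ino(void)
{
	return ++next_ino_counter;
}

struct model_dir { unsigned int ino; };	/* 0 means "not assigned yet" */

/* Give a directory a stable number that never collides with files. */
static unsigned int model_dir_ino(struct model_dir *dir)
{
	if (!dir->ino) {
		dir->ino = model_get_next_ino();
		if (dir->ino == MODEL_FILE_INO)	/* skip the file number */
			dir->ino = model_get_next_ino();
	}
	assert(dir->ino != MODEL_FILE_INO);
	return dir->ino;
}
```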
Link: https://lore.kernel.org/linux-trace-kernel/20240523051539.428826685@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Fixes: 834bf76add ("eventfs: Save directory inodes in the eventfs_inode structure")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Bergmann which enables a number of additional build-time warnings. We
fixed all the fallout we could find; there may still be a few
stragglers.
- Samuel Holland has developed the series "Unified cross-architecture
kernel-mode FPU API". This does a lot of consolidation of
per-architecture kernel-mode FPU usage and enables the use of newer AMD
GPUs on RISC-V.
- Tao Su has fixed some selftests build warnings in the series
"Selftests: Fix compilation warnings due to missing _GNU_SOURCE
definition".
- This pull also includes a nilfs2 fixup from Ryusuke Konishi.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZk6OSAAKCRDdBJ7gKXxA
jpTGAP9hQaZ+g7CO38hKQAtEI8rwcZJtvUAP84pZEGMjYMGLxQD/S8z1o7UHx61j
DUbnunbOkU/UcPx3Fs/gp4KcJARMEgs=
=EPi9
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more non-mm updates from Andrew Morton:
- A series ("kbuild: enable more warnings by default") from Arnd
Bergmann which enables a number of additional build-time warnings. We
fixed all the fallout we could find; there may still be a few
stragglers.
- Samuel Holland has developed the series "Unified cross-architecture
kernel-mode FPU API". This does a lot of consolidation of
per-architecture kernel-mode FPU usage and enables the use of newer
AMD GPUs on RISC-V.
- Tao Su has fixed some selftests build warnings in the series
"Selftests: Fix compilation warnings due to missing _GNU_SOURCE
definition".
- This pull also includes a nilfs2 fixup from Ryusuke Konishi.
* tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits)
nilfs2: make block erasure safe in nilfs_finish_roll_forward()
selftests/harness: use 1024 in place of LINE_MAX
Revert "selftests/harness: remove use of LINE_MAX"
selftests/fpu: allow building on other architectures
selftests/fpu: move FP code to a separate translation unit
drm/amd/display: use ARCH_HAS_KERNEL_FPU_SUPPORT
drm/amd/display: only use hard-float, not altivec on powerpc
riscv: add support for kernel-mode FPU
x86: implement ARCH_HAS_KERNEL_FPU_SUPPORT
powerpc: implement ARCH_HAS_KERNEL_FPU_SUPPORT
LoongArch: implement ARCH_HAS_KERNEL_FPU_SUPPORT
lib/raid6: use CC_FLAGS_FPU for NEON CFLAGS
arm64: crypto: use CC_FLAGS_FPU for NEON CFLAGS
arm64: implement ARCH_HAS_KERNEL_FPU_SUPPORT
ARM: crypto: use CC_FLAGS_FPU for NEON CFLAGS
ARM: implement ARCH_HAS_KERNEL_FPU_SUPPORT
arch: add ARCH_HAS_KERNEL_FPU_SUPPORT
x86/fpu: fix asm/fpu/types.h include guard
kbuild: enable -Wcast-function-type-strict unconditionally
kbuild: enable -Wformat-truncation on clang
...
__destroy_new_inode() is appropriate when we have _just_ allocated the
inode, but not when it's been fully initialized and on i_sb_list.
Reported-by: syzbot+a0ddc9873c280a4cb18f@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With the rework of how the __string() handles dynamic strings where it
saves off the source string in a field of the helper structure [1], the
assignment of that value to the trace event field is stored in the helper
value and does not need to be passed in again.
This means that with:
__string(field, mystring)
the matching __assign_str(field, mystring) no longer needs its second
parameter, which is unused. With this, __assign_str()
will now only get a single parameter.
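As a rough illustration of why the second parameter became redundant, here is a userspace model of the two phases, assuming a simplified helper structure. None of these names (struct str_helper, model_string(), model_assign_str(), record_event()) are the kernel's real macros or types; the point is only that the first phase evaluates the source once and saves it, so the copy phase reuses it and the reservation always matches the copy.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical userspace model of the two TRACE_EVENT() phases.
 * These are NOT the kernel's macros; names are illustrative only.
 */

/* State filled in by the "__string(field, source)" phase. */
struct str_helper {
	const char *src;	/* source string saved off for later */
	size_t len;		/* bytes to reserve, including the NUL */
};

/* Phase 1: evaluate the source once, save it, and size the record. */
static size_t model_string(struct str_helper *h, const char *source)
{
	h->src = source ? source : "(null)";	/* placeholder for NULL */
	h->len = strlen(h->src) + 1;
	return h->len;
}

/* Phase 2: copy from the saved source; no second argument needed. */
static void model_assign_str(char *dst, const struct str_helper *h)
{
	memcpy(dst, h->src, h->len);
}

/* Record one event into a caller-provided slot of the "ring buffer". */
static char *record_event(const char *source, char *slot, size_t slot_size)
{
	struct str_helper h;
	size_t need = model_string(&h, source);

	/* The reservation is sized from phase 1, so the copy always fits. */
	assert(need <= slot_size);
	model_assign_str(slot, &h);
	return slot;
}
```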
There are over 700 users of __assign_str() and because coccinelle does not
handle the TRACE_EVENT() macro I ended up using the following sed script:
git grep -l __assign_str | while read -r a ; do
sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' "$a" > /tmp/test-file;
mv /tmp/test-file "$a";
done
I then searched for __assign_str() calls that did not end with ';', as those
were multi-line assignments that the sed script above would fail to catch.
Note, the same updates will need to be done for:
__assign_str_len()
__assign_rel_str()
__assign_rel_str_len()
I tested this with both an allmodconfig and an allyesconfig (build only for both).
[1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/
Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts.
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for
Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs
Tested-by: Guenter Roeck <linux@roeck-us.net>
The btree key cache uses the srcu struct created/destroyed by
btree_iter.c; btree_iter needs to be exited last.
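The rule amounts to tearing down in the reverse order of setup, so a dependency outlives its users. The sketch below is hypothetical (struct model_fs and its helpers are invented) and only models the ordering, with a bool standing in for the SRCU struct.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two subsystems. */
struct model_fs {
	bool srcu_alive;	/* owned by the btree_iter layer */
	bool key_cache_alive;	/* uses srcu, including during teardown */
};

static void iter_init(struct model_fs *c) { c->srcu_alive = true; }
static void iter_exit(struct model_fs *c) { c->srcu_alive = false; }

static void key_cache_init(struct model_fs *c)
{
	assert(c->srcu_alive);	/* the dependency must already exist */
	c->key_cache_alive = true;
}

static void key_cache_exit(struct model_fs *c)
{
	assert(c->srcu_alive);	/* using it after iter_exit() would be a bug */
	c->key_cache_alive = false;
}

/* Tear down in reverse order of setup: the dependency goes last. */
static void model_fs_exit(struct model_fs *c)
{
	key_cache_exit(c);
	iter_exit(c);
}
```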
Reported-by: syzbot+3af9daea347788b15213@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
verify_replicas_entry() is only for newly created replicas entries -
existing entries on disk may have unknown data types, and we have real
verifiers for them.
Reported-by: syzbot+73414091bd382684ee2b@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Here is the small set of driver core and kernfs changes for 6.10-rc1.
Nothing major here at all, just a small set of changes for some driver
core apis, and minor fixups. Included in here are:
- sysfs_bin_attr_simple_read() helper added and used
- device_show_string() helper added and used
All usages of these were acked by the various maintainers. Also in here
are:
- kernfs minor cleanup
- removed unused functions
- typo fix in documentation
- pay attention to sysfs_create_link() failures in module.c finally.
All of these have been in linux-next for a very long time with no
reported problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZk3+hQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylfTwCfUyHWkDZuZ7ehdtjzfmcd4EKZBK8An3AAV99G
ox8PXMxuFTaUEdT/69FQ
=2sEo
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the small set of driver core and kernfs changes for 6.10-rc1.
Nothing major here at all, just a small set of changes for some driver
core apis, and minor fixups. Included in here are:
- sysfs_bin_attr_simple_read() helper added and used
- device_show_string() helper added and used
All usages of these were acked by the various maintainers. Also in
here are:
- kernfs minor cleanup
- removed unused functions
- typo fix in documentation
- pay attention to sysfs_create_link() failures in module.c finally
All of these have been in linux-next for a very long time with no
reported problems"
* tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
device property: Fix a typo in the description of device_get_child_node_count()
kernfs: mount: Remove unnecessary ‘NULL’ values from knparent
scsi: Use device_show_string() helper for sysfs attributes
platform/x86: Use device_show_string() helper for sysfs attributes
perf: Use device_show_string() helper for sysfs attributes
IB/qib: Use device_show_string() helper for sysfs attributes
hwmon: Use device_show_string() helper for sysfs attributes
driver core: Add device_show_string() helper for sysfs attributes
treewide: Use sysfs_bin_attr_simple_read() helper
sysfs: Add sysfs_bin_attr_simple_read() helper
module: don't ignore sysfs_create_link() failures
driver core: Remove unused platform_notify, platform_notify_remove
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZk3w7AAKCRDh3BK/laaZ
PFlfAQD7puZ3BowTZ6cTCJ0Zg6U9wNMszlYHl3WyDnYDicaiVgD9HTqlv1pbNoFh
e5hyXI4vgaMZkYfpET0zBhVkirSKEg4=
=1vRI
-----END PGP SIGNATURE-----
Merge tag 'ovl-update-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs updates from Miklos Szeredi:
- Add tmpfile support
- Clean up include
* tag 'ovl-update-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: remove duplicate included header
ovl: remove upper umask handling from ovl_create_upper()
ovl: implement tmpfile
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZk231AAKCRDh3BK/laaZ
PKarAP9yWQ6HfqiWdMlW5gO9FLpm0Cbey6DAs1cisAdw86uLbwEAqpYKPXIcOaHX
IIzCu5R5tbwNHUxADtzznC9r9o/ZHQY=
=KiAB
-----END PGP SIGNATURE-----
Merge tag 'fuse-update-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
- Add fs-verity support (Richard Fung)
- Add multi-queue support to virtio-fs (Peter-Jan Gootzen)
- Fix a bug in NOTIFY_RESEND handling (Hou Tao)
- page -> folio cleanup (Matthew Wilcox)
* tag 'fuse-update-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
virtio-fs: add multi-queue support
virtio-fs: limit number of request queues
fuse: clear FR_SENT when re-adding requests into pending list
fuse: set FR_PENDING atomically in fuse_resend()
fuse: Add initial support for fs-verity
fuse: Convert fuse_readpages_end() to use folio_end_read()
Our applications, built on Elasticsearch[0], frequently create and
delete files. These applications operate within containers, some with a
memory limit exceeding 100GB. Over prolonged periods, the accumulation
of negative dentries within these containers can amount to tens of
gigabytes.
Upon container exit, directories are deleted. However, due to the
numerous associated dentries, this process can be time-consuming. Our
users have expressed frustration with this prolonged exit duration,
which constitutes our first issue.
Simultaneously, other processes may attempt to access the parent
directory of the Elasticsearch directories. Since the task responsible
for deleting the dentries holds the inode lock, processes attempting
directory lookup experience significant delays. This issue, our second
problem, is easily demonstrated:
- Task 1 generates negative dentries:
$ pwd
~/test
$ mkdir es && cd es/ && ./create_and_delete_files.sh
[ After generating tens of GB of dentries ]
$ cd ~/test && rm -rf es
[ It will take a long duration to finish ]
- Task 2 attempts to look up the 'test/' directory
$ pwd
~/test
$ ls
The 'ls' command in Task 2 experiences prolonged execution as Task 1
is deleting the dentries.
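The commit does not include create_and_delete_files.sh. A hypothetical C equivalent of such a workload might look like the sketch below: each file is created and then unlinked, which before this change could leave a negative dentry behind for every name. The function names and the /tmp location are assumptions; reproducing the reported problem would need a count large enough to accumulate gigabytes of dentries.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for create_and_delete_files.sh: create and unlink
 * 'count' files in 'dir'. Returns the number of files processed, or -1.
 */
long create_and_delete_files(const char *dir, long count)
{
	char path[4096];

	for (long i = 0; i < count; i++) {
		int fd;

		snprintf(path, sizeof(path), "%s/file-%ld", dir, i);
		fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
		if (fd < 0)
			return -1;
		close(fd);
		if (unlink(path) < 0)	/* the name may now be a negative dentry */
			return -1;
	}
	return count;
}

/*
 * Run the workload in a fresh temporary directory, then remove it;
 * rmdir() only succeeds if the directory is empty again.
 */
long demo_in_tmpdir(long count)
{
	char dir[] = "/tmp/negdentry-XXXXXX";
	long n;

	if (!mkdtemp(dir))
		return -1;
	n = create_and_delete_files(dir, count);
	if (rmdir(dir) < 0)
		return -1;
	return n;
}
```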
We've devised a solution to address both issues by deleting the associated
dentry when removing a file. Interestingly, we've noted that a similar
patch was proposed years ago[1], although it was rejected citing the
absence of tangible issues caused by negative dentries. Given our
current challenges, we're resubmitting the proposal. All relevant
stakeholders from previous discussions have been included for reference.
Some alternative solutions are also under discussion[2][3], such as
shrinking child dentries outside of the parent inode lock or even
asynchronously shrinking child dentries. However, given the
straightforward nature of the current solution, I believe this approach
is still necessary.
[ NOTE! This is a pretty fundamental change in how we deal with
unlinking dentries, and it doesn't change the fact that you can have
lots of negative dentries from just doing negative lookups.
But the kernel test robot is at least initially happy with this from a
performance angle, so I'm applying this ASAP just to get more testing
and as a "known fix for an issue people hit in real life".
Put another way: we should still look at the alternatives, and this
patch may get reverted if somebody finds a performance regression on
some other load. - Linus ]
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://github.com/elastic/elasticsearch [0]
Link: https://patchwork.kernel.org/project/linux-fsdevel/patch/1502099673-31620-1-git-send-email-wangkai86@huawei.com [1]
Link: https://lore.kernel.org/linux-fsdevel/20240511200240.6354-2-torvalds@linux-foundation.org/ [2]
Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wjEMf8Du4UFzxuToGDnF3yLaMcrYeyNAaH1NJWa6fwcNQ@mail.gmail.com/ [3]
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Waiman Long <longman@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Wangkai <wangkai86@huawei.com>
Cc: Colin Walters <walters@verbum.org>
Tested-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/all/202405221518.ecea2810-oliver.sang@intel.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
virtio core already sets the .owner, so driver does not need to.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Message-Id: <20240331-module-owner-virtio-v2-24-98f04bfaf46a@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
This removes the signal/coredump hacks added for vhost_tasks in:
Commit f9010dbdce ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
When that patch was added, vhost_tasks did not handle SIGKILL and would
try to ignore/clear the signal and continue on until the device's close
function was called. In the previous patches vhost_tasks and the vhost
drivers were converted to support SIGKILL by cleaning themselves up and
exiting. The hacks are no longer needed so this removes them.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20240316004707.45557-10-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZkzp/gAKCRBZ7Krx/gZQ
63KFAQCsKv3XdcF+2BO+QuwPvR6eAvDxFjrFEcQFyyOXgFVLaAD/UMM0HcEFWxBb
PCPvyKVP22wF9PbodkrKJn8DRdtRZwM=
=jvWv
-----END PGP SIGNATURE-----
Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
"Assorted commits that had missed the last merge window..."
* tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
remove call_{read,write}_iter() functions
do_dentry_open(): kill inode argument
kernel_file_open(): get rid of inode argument
get_file_rcu(): no need to check for NULL separately
fd_is_open(): move to fs/file.c
close_on_exec(): pass files_struct instead of fdtable
Replacement of bdev->bd_inode with sane(r) set of primitives.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZkwjlgAKCRBZ7Krx/gZQ
66OmAP9nhZLASn/iM2+979I6O0GW+vid+uLh48uW3d+LbsmVIgD9GYpR+cuLQ/xj
mJESWfYKOVSpFFSrqlzKg9PQlU/GFgs=
=6LRp
-----END PGP SIGNATURE-----
Merge tag 'pull-bd_inode-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull bdev bd_inode updates from Al Viro:
"Replacement of bdev->bd_inode with sane(r) set of primitives by me and
Yu Kuai"
* tag 'pull-bd_inode-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RIP ->bd_inode
dasd_format(): killing the last remaining user of ->bd_inode
nilfs_attach_log_writer(): use ->bd_mapping->host instead of ->bd_inode
block/bdev.c: use the knowledge of inode/bdev coallocation
gfs2: more obvious initializations of mapping->host
fs/buffer.c: massage the remaining users of ->bd_inode to ->bd_mapping
blk_ioctl_{discard,zeroout}(): we only want ->bd_inode->i_mapping here...
grow_dev_folio(): we only want ->bd_inode->i_mapping there
use ->bd_mapping instead of ->bd_inode->i_mapping
block_device: add a pointer to struct address_space (page cache of bdev)
missing helpers: bdev_unhash(), bdev_drop()
block: move two helpers into bdev.c
block2mtd: prevent direct access of bd_inode
dm-vdo: use bdev_nr_bytes(bdev) instead of i_size_read(bdev->bd_inode)
blkdev_write_iter(): saner way to get inode and bdev
bcachefs: remove dead function bdev_sectors()
ext4: remove block_device_ejected()
erofs_buf: store address_space instead of inode
erofs: switch erofs_bread() to passing offset instead of block number
Getting rid of bogus set_blocksize() uses, switching it
to struct file * and verifying that the caller has the device
opened exclusively.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZkwkfQAKCRBZ7Krx/gZQ
62C3AQDW5vuXNx2+KDPma5YStjFpPLC0xtSyAS5D3YANjtyRFgD/TOcCarq7rvBt
KubxHVFsfW+eu6ASeaoMRB83w5OIzwk=
=Liix
-----END PGP SIGNATURE-----
Merge tag 'pull-set_blocksize' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs blocksize updates from Al Viro:
"This gets rid of bogus set_blocksize() uses, switches it over
to be based on a 'struct file *' and verifies that the caller
has the device opened exclusively"
* tag 'pull-set_blocksize' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
make set_blocksize() fail unless block device is opened exclusive
set_blocksize(): switch to passing struct file *
btrfs_get_bdev_and_sb(): call set_blocksize() only for exclusive opens
swsusp: don't bother with setting block size
zram: don't bother with reopening - just use O_EXCL for open
swapon(2): open swap with O_EXCL
swapon(2)/swapoff(2): don't bother with block size
pktcdvd: sort set_blocksize() calls out
bcache_register(): don't bother with set_blocksize()
pidfs started using much saner inodes in commit b28ddcc32d ("pidfs:
convert to path_from_stashed() helper"), but that exposed the fact that
lsof had some knowledge of just how odd our old anon_inode usage was.
For example, legacy anon_inodes hadn't even initialized the inode type
in the inode mode, so everything had a type of zero.
So sane tools like 'stat' would report these files as "weird file", but
'lsof' instead used that (together with the name of the link in proc) to
notice that it's an anonymous inode, and used it to detect pidfd files.
Let's keep our new, sane internal inode model, but mask the file type
bits at 'stat()' time in the getattr() function we already have, and
make the dentry name match what lsof expects too.
This keeps our internal models sane, but should make user space see the
same old odd behavior.
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/all/a15b1050-4b52-4740-a122-a4d055c17f11@kernel.org/
Link: https://github.com/lsof-org/lsof/issues/317
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Seth Forshee <sforshee@kernel.org>
Cc: Tycho Andersen <tycho@tycho.pizza>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shifting *signed int* typed constant 1 left by 31 bits causes undefined
behavior. Specify the correct *unsigned long* type by using 1UL instead.
Found by Linux Verification Center (linuxtesting.org) with the Svace static
analysis tool.
Cc: stable@vger.kernel.org
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
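For illustration, a flag mask at bit 31 written the safe way; DEMO_FLAG_31 and has_flag_31() are made-up names. Since unsigned long is at least 32 bits wide, 1UL << 31 is always well defined, whereas 1 << 31 overflows a 32-bit int.

```c
/* 1UL is unsigned long, so this shift is well defined; (1 << 31) is not. */
#define DEMO_FLAG_31 (1UL << 31)

/* Returns nonzero if bit 31 is set in 'flags'. */
int has_flag_31(unsigned long flags)
{
	return (flags & DEMO_FLAG_31) != 0;
}
```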
All callers of pnfs_generic_pg_check_layout() also want to do a call to
check that the layout's range covers the IO range. Merge the functionality
of the pnfs_generic_pg_check_range() into that of
pnfs_generic_pg_check_layout().
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
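The range test that now lives inside pnfs_generic_pg_check_layout() can be pictured as below. This is a simplified sketch with hypothetical names (struct demo_range, layout_covers_io()); it assumes the NFSv4 convention that an all-ones length means "to end of file", and it saturates the end offset instead of letting it wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout segment range; UINT64_MAX length means "to EOF". */
struct demo_range {
	uint64_t offset;
	uint64_t length;
};

/* End offset of a range, saturating at UINT64_MAX instead of wrapping. */
static uint64_t range_end(const struct demo_range *r)
{
	if (r->length > UINT64_MAX - r->offset)
		return UINT64_MAX;
	return r->offset + r->length;
}

/* True if the layout range fully covers [io_start, io_start + io_len). */
bool layout_covers_io(const struct demo_range *lseg, uint64_t io_start,
		      uint64_t io_len)
{
	struct demo_range io = { io_start, io_len };

	return lseg->offset <= io_start && range_end(&io) <= range_end(lseg);
}
```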
Before doing the IO, check that we have the layout covering the range of
IO.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Change leftover allocation flags.
Fixes: a245832aaa ("pNFS/files: Ensure pNFS allocation modes are consistent with nfsiod")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
In this round, we've tried to address some performance issues on zoned storage
such as direct IO and write_hints. In addition, we've migrated some IO paths
to use folios. Meanwhile, there are multiple bug fixes in the compression paths,
sanity check conditions, and error handlers.
Enhancement:
- allow direct io of pinned files for zoned storage
- assign the write hint per stream by default
- convert read paths and test_writeback to folio
- avoid allocating WARM_DATA segment for direct IO
Bug fix:
- fix false alarm on invalid block address
- fix to add missing iput() in gc_data_segment()
- fix to release node block count in error path of f2fs_new_node_page()
- compress: don't allow unaligned truncation on released compress inode
- compress: fix to cover {reserve,release}_compress_blocks() w/ cp_rwsem lock
- compress: fix error path of inc_valid_block_count()
- compress: fix to update i_compr_blocks correctly
- fix block migration when section is not aligned to pow2
- don't trigger OPU on pinfile for direct IO
- fix to do sanity check on i_xattr_nid in sanity_check_inode()
- write missing last sum blk of file pinning section
- clear writeback when compression failed
- fix to adjust the appropriate defragment pg_end
As usual, there are several minor code clean-ups, and fixes to manage missing
corner cases in the error paths.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmZLpYcACgkQQBSofoJI
UNJQTw/+NaY7a1EgkMUpBAzxrJMKHcuBtyG42QKqgk6new0XejQGjPHojL2nPrw/
t5G9TsbZbkHNMuhAkkTZMH+DFg92QYhByJlq79fxzya0XyGH4OaY1i4u67FLu0Qz
PS/UKRkEI2B9lH+bGwa//XNMDSnzcao46bNi1SFbCNPGzU1cS35uOy/YgAdFlqTM
WKJmM/AcNir4xtL30tBCVU//0OTtzT8+5YFVyPTeFR4WACsF6eTJAre9938xw1Ef
p6ed6Wl2GYehqgFrAdAF07veZ1hVDSRAAB/1Mu1WKnNp57VBRjJW3DFDyApf+fIe
2KJIDJd9/ece3dycuiZP/LXPV0sODqOI1/5s9RbFVq/QAhTSME5xq8hNXTejdl28
PV6M2tKcTKMRpykppQg/K/N9PaO5Q6oFz0xlrOsrGoAhT1YnZfJi/DmzCZCCwYxW
jyZor/r+849yDDdjhB94ZaByvj5S3OVqgsaunnbMBcGy+DDe0rUMXvRzVK4gTcCF
lSTSp895BggWXLyPuXVNTjC4GIbzVbEDaHILPicfbqi0h5OCXG8YybKHiRs+ss6z
ZrKJQxSVVvhjyHTVcBhb/Nc1s7Fm7DkX+KjV9GV3gwzB+AlVIgPlwyMTc2fZp3ST
dUbmBR5+g4UUz2v4v4ZStAGy9eUFktO89u/roet8/74ppklj73E=
=3mwj
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-6.10.rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've tried to address some performance issues on zoned
storage such as direct IO and write_hints. In addition, we've migrated
some IO paths to use folios. Meanwhile, there are multiple bug fixes in
the compression paths, sanity check conditions, and error handlers.
Enhancements:
- allow direct io of pinned files for zoned storage
- assign the write hint per stream by default
- convert read paths and test_writeback to folio
- avoid allocating WARM_DATA segment for direct IO
Bug fixes:
- fix false alarm on invalid block address
- fix to add missing iput() in gc_data_segment()
- fix to release node block count in error path of
f2fs_new_node_page()
- compress:
- don't allow unaligned truncation on released compress inode
- cover {reserve,release}_compress_blocks() w/ cp_rwsem lock
- fix error path of inc_valid_block_count()
- fix to update i_compr_blocks correctly
- fix block migration when section is not aligned to pow2
- don't trigger OPU on pinfile for direct IO
- fix to do sanity check on i_xattr_nid in sanity_check_inode()
- write missing last sum blk of file pinning section
- clear writeback when compression failed
- fix to adjust the appropriate defragment pg_end
As usual, there are several minor code clean-ups, and fixes to manage
missing corner cases in the error paths"
* tag 'f2fs-for-6.10.rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (50 commits)
f2fs: initialize last_block_in_bio variable
f2fs: Add inline to f2fs_build_fault_attr() stub
f2fs: fix some ambiguous comments
f2fs: fix to add missing iput() in gc_data_segment()
f2fs: allow dirty sections with zero valid block for checkpoint disabled
f2fs: compress: don't allow unaligned truncation on released compress inode
f2fs: fix to release node block count in error path of f2fs_new_node_page()
f2fs: compress: fix to cover {reserve,release}_compress_blocks() w/ cp_rwsem lock
f2fs: compress: fix error path of inc_valid_block_count()
f2fs: compress: fix typo in f2fs_reserve_compress_blocks()
f2fs: compress: fix to update i_compr_blocks correctly
f2fs: check validation of fault attrs in f2fs_build_fault_attr()
f2fs: fix to limit gc_pin_file_threshold
f2fs: remove unused GC_FAILURE_PIN
f2fs: use f2fs_{err,info}_ratelimited() for cleanup
f2fs: fix block migration when section is not aligned to pow2
f2fs: zone: fix to don't trigger OPU on pinfile for direct IO
f2fs: fix to do sanity check on i_xattr_nid in sanity_check_inode()
f2fs: fix to avoid allocating WARM_DATA segment for direct IO
f2fs: remove redundant parameter in is_next_segment_free()
...
* Introduce Parent Pointer extended attribute for inodes.
* Online Repair
- Implement atomic file content exchanges i.e. exchange ranges of bytes
between two files atomically.
- Create temporary files to repair file-based metadata. This uses atomic
file content exchange facility to swap file fork mappings between the
temporary file and the metadata inode.
- Allow callers of directory/xattr code to set an explicit owner number to
be written into the header fields of any new blocks that are created.
This is required to avoid walking every block of the new structure and
modifying their ownership during online repair.
- Repair
- Extended attributes
- Inode unlinked state
- Directories
- Symbolic links
- AGI's unlinked inode list.
- Parent pointers.
- Move orphan files to the lost and found directory.
- Fixes for Inode repair functionality.
- Introduce a new sub-AG FITRIM implementation to reduce the duration for
which the AGF lock is held.
- Updates for the design documentation.
- Use Parent Pointers to assist in checking directories, parent pointers,
extended attributes, and link counts.
* Bring back delalloc support for realtime devices which have an extent size
that is equal to the filesystem's block size.
* Improve performance of log incompat feature handling.
* Fixes
- Prevent userspace from reading invalid file data due to an incorrect
update of the file size when performing a non-atomic clone operation.
- Minor fixes to online repair.
- Fix confusing return values from xfs_bmapi_write().
- Fix an out of bounds access due to incorrect h_size during log recovery.
- Defer upgrading the extent counters in xfs_reflink_end_cow_extent() until
we know we are going to modify the extent mapping.
- Remove racy access to if_bytes check in xfs_reflink_end_cow_extent().
- Fix sparse warnings.
* Cleanups
- Hold inode locks on all files involved in a rename until the completion
of the operation. This is in preparation for the parent pointers patchset
where parent pointers are applied in a separate chained update from the
actual directory update.
- Compile out v4 support when disabled.
- Cleanup xfs_extent_busy_clear().
- Remove unused flags and fields from struct xfs_da_args.
- Remove definitions of unused functions.
- Improve extended attribute validation.
- Add higher level directory operations helpers to remove duplication of
code.
- Cleanup quota (un)reservation interfaces.
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZjZC0wAKCRAH7y4RirJu
9HsCAPoCQvmPefDv56aMb5JEQNpv9dPz2Djj14hqLytQs5P/twD+LF5NhJgQNDUo
Lwnb0tmkAhmG9Y4CCiN1FwSj1rq59gE=
=2hXB
-----END PGP SIGNATURE-----
Merge tag 'xfs-6.10-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Chandan Babu:
"Online repair feature continues to be expanded. Also, we now support
delayed allocation for realtime devices which have an extent size that
is equal to the filesystem's block size.
New code:
- Introduce Parent Pointer extended attribute for inodes
- Bring back delalloc support for realtime devices which have an
extent size that is equal to the filesystem's block size
- Improve performance of log incompat feature handling
Online Repair:
- Implement atomic file content exchanges i.e. exchange ranges of
bytes between two files atomically
- Create temporary files to repair file-based metadata. This uses
atomic file content exchange facility to swap file fork mappings
between the temporary file and the metadata inode
- Allow callers of directory/xattr code to set an explicit owner
number to be written into the header fields of any new blocks that
are created. This is required to avoid walking every block of the
new structure and modifying their ownership during online repair
- Repair more data structures:
- Extended attributes
- Inode unlinked state
- Directories
- Symbolic links
- AGI's unlinked inode list
- Parent pointers
- Move orphan files to the lost and found directory
- Fixes for Inode repair functionality
- Introduce a new sub-AG FITRIM implementation to reduce the duration
for which the AGF lock is held
- Updates for the design documentation
- Use Parent Pointers to assist in checking directories, parent
pointers, extended attributes, and link counts
Fixes:
- Prevent userspace from reading invalid file data due to an incorrect
update of the file size when performing a non-atomic clone operation
- Minor fixes to online repair
- Fix confusing return values from xfs_bmapi_write()
- Fix an out of bounds access due to incorrect h_size during log
recovery
- Defer upgrading the extent counters in xfs_reflink_end_cow_extent()
until we know we are going to modify the extent mapping
- Remove racy access to if_bytes check in
xfs_reflink_end_cow_extent()
- Fix sparse warnings
Cleanups:
- Hold inode locks on all files involved in a rename until the
completion of the operation. This is in preparation for the parent
pointers patchset where parent pointers are applied in a separate
chained update from the actual directory update
- Compile out v4 support when disabled
- Cleanup xfs_extent_busy_clear()
- Remove unused flags and fields from struct xfs_da_args
- Remove definitions of unused functions
- Improve extended attribute validation
- Add higher level directory operations helpers to remove duplication
of code
- Cleanup quota (un)reservation interfaces"
* tag 'xfs-6.10-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (221 commits)
xfs: simplify iext overflow checking and upgrade
xfs: remove a racy if_bytes check in xfs_reflink_end_cow_extent
xfs: upgrade the extent counters in xfs_reflink_end_cow_extent later
xfs: xfs_quota_unreserve_blkres can't fail
xfs: consolidate the xfs_quota_reserve_blkres definitions
xfs: clean up buffer allocation in xlog_do_recovery_pass
xfs: fix log recovery buffer allocation for the legacy h_size fixup
xfs: widen flags argument to the xfs_iflags_* helpers
xfs: minor cleanups of xfs_attr3_rmt_blocks
xfs: create a helper to compute the blockcount of a max sized remote value
xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function
xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c
xfs: do not allocate the entire delalloc extent in xfs_bmapi_write
xfs: fix xfs_bmap_add_extent_delay_real for partial conversions
xfs: remove the xfs_iext_peek_prev_extent call in xfs_bmapi_allocate
xfs: pass the actual offset and len to allocate to xfs_bmapi_allocate
xfs: don't open code XFS_FILBLKS_MIN in xfs_bmapi_write
xfs: lift a xfs_valid_startblock into xfs_bmapi_allocate
xfs: remove the unusued tmp_logflags variable in xfs_bmapi_allocate
xfs: fix error returns from xfs_bmapi_write
...
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmZLMxUACgkQnJ2qBz9k
QNnnXAgA1MfUPm6b4AE3y3EEWUSTQd2THQSGUg/ZLeJW3zE5nNaW74BPYKTSTreY
Bmx0QVrgzJX9EJe2UmFsnPiPEzznn1RIDdPwCQEZpjEt+/iX3/+5+z+aDK57mWpJ
5Lxzv/Ji1iNlYzdDti8MIc9edj923A7JpQQQ7Hz6ldmuc5EHXLrcTTzytPIDU5cp
6RKY7W0sntslTjKvVkZaEqugPtQpe064Sq7a0jtjLz2r+tyxDXakWNGrdQIYDTLR
UGwsFFZL5JbnR12oCMKJ3Vh3YjA5gmyxlbHCghDtGkXrSS1mseumJYsRFazE7y+8
Fp1fbObLe1z22N742r1aNE8vtXLGUw==
=nUBh
-----END PGP SIGNATURE-----
Merge tag 'fs_for_v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull isofs, udf, quota, ext2, and reiserfs updates from Jan Kara:
- convert isofs to the new mount API
- cleanup isofs Makefile
- udf conversion to folios
- some other small udf cleanups and fixes
- ext2 cleanups
- removal of reiserfs .writepage method
- update reiserfs README file
* tag 'fs_for_v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
isofs: Use *-y instead of *-objs in Makefile
ext2: Remove LEGACY_DIRECT_IO dependency
isofs: Remove calls to set/clear the error flag
ext2: Remove call to folio_set_error()
udf: Use a folio in udf_write_end()
udf: Convert udf_page_mkwrite() to use a folio
udf: Convert udf_symlink_getattr() to use a folio
udf: Convert udf_adinicb_readpage() to udf_adinicb_read_folio()
udf: Convert udf_expand_file_adinicb() to use a folio
udf: Convert udf_write_begin() to use a folio
udf: Convert udf_symlink_filler() to use a folio
reiserfs: Trim some README bits
quota: fix to propagate error of mark_dquot_dirty() to caller
reiserfs: Convert to writepages
udf: udftime: prevent overflow in udf_disk_stamp_to_time()
ext2: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method
udf: replace deprecated strncpy/strcpy with strscpy
udf: Remove second semicolon
isofs: convert isofs to use the new mount API
fs: quota: use group allocation of per-cpu counters API
This reverts commit e659522446.
These kinds of patches are only making the code worse.
Compilers don't care about the unnecessary check, but removing it makes
the code less obvious to a human. The declaration of 'len' is more than
80 lines earlier, so a human won't easily see that 'len' is of an
unsigned type, so to a human the range check that checks against zero is
much more explicit and obvious.
Any tool that complains about a range check like this just because the
variable is unsigned is actively detrimental, and should be ignored.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
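A small example of the kind of check being defended; len_in_range() is a made-up function. With an unsigned len the "len < 0" half is always false and the compiler simply drops it (GCC's -Wtype-limits, enabled by -Wextra, will warn about it), but a reader still sees both bounds at the point of use.

```c
#include <stdbool.h>

/*
 * 'len' is unsigned, so "len < 0" can never be true. The explicit lower
 * bound is kept anyway because it documents the valid range for a reader
 * who cannot see the declaration of 'len'.
 */
bool len_in_range(unsigned int len, unsigned int max)
{
	if (len < 0 || len > max)
		return false;
	return true;
}
```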
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmZLJS0ACgkQnJ2qBz9k
QNmFlggAlIg5oDZfOhJur6h3Icldrl2DsnKer0CAP7TFK+GfkFTEb25paoydBEu4
Y0VzZ3n3EqhmsJ8P515k1UPPPXlqqZwSRWGAek0FDhQCXhqEYxiWwf9U343hJNBS
rya4Rnwc1pxqmJU2hrY5R5kEbugUFAIL+qNXzhhLpWonYiy/ya7P5n/qz5F5HJH2
FufRRaPHcHFfk1u0+PvFrk019AS9C6Y3bkcUGtbpdwmFsuN3D4HKuLEkr1+C9Apb
NmkoAwCiSobQhAxGDr6Szqu6r1VCuM+n/O9fqLknnL9u0jm95AmGdIMOdQ/ofx6d
xn3mfRp8gUbPD8PubHhQsMjCmSjGwg==
=kwWW
-----END PGP SIGNATURE-----
Merge tag 'fsnotify_for_v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
- reduce overhead of fsnotify infrastructure when no permission events
are in use
- a few small cleanups
* tag 'fsnotify_for_v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: fix UAF from FS_ERROR event on a shutting down filesystem
fsnotify: optimize the case of no permission event watchers
fsnotify: use an enum for group priority constants
fsnotify: move s_fsnotify_connectors into fsnotify_sb_info
fsnotify: lazy attach fsnotify_sb_info state to sb
fsnotify: create helper fsnotify_update_sb_watchers()
fsnotify: pass object pointer and type to fsnotify mark helpers
fanotify: merge two checks regarding add of ignore mark
fsnotify: create a wrapper fsnotify_find_inode_mark()
fsnotify: create helpers to get sb and connp from object
fsnotify: rename fsnotify_{get,put}_sb_connectors()
fsnotify: Avoid -Wflex-array-member-not-at-end warning
fanotify: remove unneeded sub-zero check for unsigned value
This came up during one of the Bake-a-thon discussions. NFS v2 support
was dropped from nfs-utils/mount.nfs in December 2021. Let's turn it
off by default in the kernel too, since this means there isn't a way
to mount and test it.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Reviewed-by: Jeffrey Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Olga showed me a case where the client was sending multiple READ_PLUS
calls to the server in parallel, and the server replied
NFS4ERR_OPNOTSUPP to each. The client would fall back to READ for the
first reply, but fail to retry the other calls.
I fix this by removing the test for NFS_CAP_READ_PLUS in
nfs4_read_plus_not_supported(). This allows us to reschedule any
READ_PLUS call that has a NFS4ERR_OPNOTSUPP return value, even after the
capability has been cleared.
Reported-by: Olga Kornievskaia <kolga@netapp.com>
Fixes: c567552612 ("NFS: Add READ_PLUS data segment support")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
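A simplified model of the failure and the fix, using hypothetical names: can_read_plus plays the role of NFS_CAP_READ_PLUS, and old_logic reproduces the capability test that was removed. With several READ_PLUS calls in flight, the old test lets only the first NFS4ERR_OPNOTSUPP reply fall back to READ.

```c
#include <stdbool.h>

/* Hypothetical reply status for the sketch. */
enum demo_status { DEMO_OK, DEMO_OPNOTSUPP };

/* Hypothetical per-server state. */
struct demo_server {
	bool can_read_plus;	/* plays the role of NFS_CAP_READ_PLUS */
};

/*
 * Should a failed READ_PLUS be rescheduled as a plain READ?
 * old_logic mimics the removed capability test: only a reply that still
 * sees the capability set is rescheduled.
 */
bool should_retry_as_read(struct demo_server *srv, enum demo_status st,
			  bool old_logic)
{
	if (st != DEMO_OPNOTSUPP)
		return false;
	if (old_logic && !srv->can_read_plus)
		return false;		/* later parallel replies just fail */
	srv->can_read_plus = false;	/* stop issuing READ_PLUS */
	return true;
}

/* n READ_PLUS calls were in flight and all got NFS4ERR_OPNOTSUPP. */
int retried_calls(int n, bool old_logic)
{
	struct demo_server srv = { .can_read_plus = true };
	int retried = 0;

	for (int i = 0; i < n; i++)
		if (should_retry_as_read(&srv, DEMO_OPNOTSUPP, old_logic))
			retried++;
	return retried;
}
```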
With newer kernels that use fs_context for nfs mounts, remounts fail with
-EINVAL.
$ mount -t nfs -o nolock 10.0.0.1:/tmp/test /mnt/test/
$ mount -t nfs -o remount /mnt/test/
mount: mounting 10.0.0.1:/tmp/test on /mnt/test failed: Invalid argument
For remounts, the nfs server address and port are populated by
nfs_init_fs_context and later overwritten with 0x00 bytes by
nfs23_parse_monolithic. The remount then fails as the server address is
invalid.
Fix this by not overwriting nfs server info in nfs23_parse_monolithic if
we're doing a remount.
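A minimal model of the fix follows. It assumes a context whose address and port were filled in at init time and a legacy data blob that arrives zeroed on remount. The structs mount_ctx and legacy_data are hypothetical and do not match the real nfs_fs_context or nfs_mount_data layouts:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the mount context; fields are illustrative. */
struct mount_ctx {
	bool is_remount;
	char server_addr[16];
	unsigned short port;
};

/* Hypothetical legacy binary mount data; zero bytes when omitted. */
struct legacy_data {
	char addr[16];
	unsigned short port;
};

static void parse_monolithic(struct mount_ctx *ctx,
			     const struct legacy_data *data)
{
	/* Other options would be parsed here for both cases. */

	/* Server info is only taken from the blob on a fresh mount, so a
	 * remount keeps what was populated when the context was set up. */
	if (!ctx->is_remount) {
		memcpy(ctx->server_addr, data->addr, sizeof(ctx->server_addr));
		ctx->port = data->port;
	}
}
```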
Fixes: f2aedb713c ("NFS: Add fs_context support.")
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Dan Carpenter reports a smatch warning for nfs4_try_migration() when a memory
allocation failure results in a zero return value. In this case, a
transient allocation failure will likely be retried the next time the
server responds with NFS4ERR_MOVED.
We can fix up the smatch warning with a small refactor: attempt all three
allocations before testing and returning on a failure.
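The "allocate everything, then test once" pattern is sketched below in userspace C, using malloc() in place of the kernel allocators. The migration_bufs type and the -ENOMEM return are assumptions of this sketch. As described above, the kernel path itself returns zero on this failure:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical holder for the three buffers a migration attempt needs. */
struct migration_bufs {
	void *page;
	void *locations;
	void *fattr;
};

static void free_migration_bufs(struct migration_bufs *b)
{
	/* free(NULL) is a no-op, so a partial allocation is safe here. */
	free(b->page);
	free(b->locations);
	free(b->fattr);
	b->page = b->locations = b->fattr = NULL;
}

static int alloc_migration_bufs(struct migration_bufs *b, size_t page_sz,
				size_t loc_sz, size_t fattr_sz)
{
	/* Attempt all three allocations first... */
	b->page = malloc(page_sz);
	b->locations = malloc(loc_sz);
	b->fattr = malloc(fattr_sz);

	/* ...then test them together, with a single failure path. */
	if (!b->page || !b->locations || !b->fattr) {
		free_migration_bufs(b);
		return -ENOMEM; /* assumed error convention for this sketch */
	}
	return 0;
}
```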
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: c3ed222745 ("NFSv4: Fix free of uninitialized nfs4_label on referral lookup.")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Currently, the lock/nolock and local_lock mount options can override
each other's NFS_MOUNT_LOCAL_FLOCK and NFS_MOUNT_LOCAL_FCNTL flags, so
the result depends on the order in which they are passed:
  mount -o vers=3,local_lock=all,lock:
    local_lock=none
  mount -o vers=3,lock,local_lock=all:
    local_lock=all
This patch makes lock/nolock override the local_lock option,
as nfs(5) suggests.
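One way to make the precedence independent of option order is to remember lock/nolock separately and apply it after all options have been seen. The sketch below shows that approach. It is a hypothetical parser with illustrative flag values, not the kernel's fs_context code. It assumes that nolock implies local flock and POSIX locking, and that lock implies neither:

```c
#include <string.h>

/* Illustrative flag values; the kernel defines its own. */
#define LOCAL_FLOCK 0x1u
#define LOCAL_FCNTL 0x2u

enum lock_opt { LOCK_UNSET, LOCK_ON, LOCK_OFF };

/* Parse a comma-separated option string so that lock/nolock always
 * wins over local_lock=, whatever order the options appear in. */
static unsigned int parse_local_flags(const char *opts)
{
	char buf[256];
	unsigned int local = 0;
	enum lock_opt lock = LOCK_UNSET;

	strncpy(buf, opts, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';

	for (char *tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
		if (strcmp(tok, "lock") == 0)
			lock = LOCK_ON;
		else if (strcmp(tok, "nolock") == 0)
			lock = LOCK_OFF;
		else if (strcmp(tok, "local_lock=all") == 0)
			local = LOCAL_FLOCK | LOCAL_FCNTL;
		else if (strcmp(tok, "local_lock=flock") == 0)
			local = LOCAL_FLOCK;
		else if (strcmp(tok, "local_lock=posix") == 0)
			local = LOCAL_FCNTL;
		else if (strcmp(tok, "local_lock=none") == 0)
			local = 0;
		/* other options (vers=3, ...) are ignored in this model */
	}

	/* Apply lock/nolock last so it takes precedence. */
	if (lock == LOCK_ON)
		return 0;
	if (lock == LOCK_OFF)
		return LOCAL_FLOCK | LOCAL_FCNTL;
	return local;
}
```

With this ordering, both command lines from the example resolve to local_lock=none.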
Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
With two clients, each with NFSv3 mounts of the same directory, the sequence:
    client1                     client2
    ls -l afile
                                echo hello there > afile
    echo HELLO > afile
    cat afile
will show
    HELLO
    there
because the O_TRUNC requested in the final 'echo' doesn't take effect.
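The stale "there" follows directly from writing "HELLO\n" over "hello there\n" at offset 0 without truncating. The arithmetic can be checked with the small model below. It treats the file as a byte buffer through a hypothetical file_model type and says nothing about how the NFS client is implemented:

```c
#include <stdbool.h>
#include <string.h>

/* Models the file contents on the server as a byte buffer. */
struct file_model {
	char data[64];
	size_t len;
};

/* Write `s` at offset 0, like `echo s > afile`, either honouring
 * O_TRUNC (trunc == true) or having it silently dropped. */
static void write_at_start(struct file_model *f, const char *s, bool trunc)
{
	size_t n = strlen(s);

	if (n > sizeof(f->data))
		n = sizeof(f->data);
	if (trunc)
		f->len = 0;
	memcpy(f->data, s, n);
	if (n > f->len)
		f->len = n;
}
```

Writing "hello there\n" (12 bytes) and then "HELLO\n" (6 bytes) without truncation leaves 12 bytes, "HELLO\nthere\n", which is the output shown above. With truncation, only "HELLO\n" remains.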
This is because the "Negative dentry, just create a file" section in
lookup_open() assumes that the file *does* get created since the dentry
was negative, so it sets FMODE_CREATED, and this causes do_open() to
clear O_TRUNC and so the file doesn't get truncated.
Even mounting with -o lookupcache=none does not help as
nfs_neg_need_reval() always returns false if LOOKUP_CREATE is set.
This patch fixes the problem by providing an atomic_open inode operation
for NFSv3 (and v2). The code is largely the code from the branch in
lookup_open() when atomic_open is not provided. The significant change
is that the O_TRUNC flag is passed to a new nfs_do_create(), which adds
'trunc' handling to nfs_create().
With this change we also optimise away an unnecessary LOOKUP before the
file is created.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Move from only requesting full file layout segments to requesting layout
segments that match our I/O size. This means the server is still free to
return a full file layout if it wants, but partial layouts will no
longer cause an error.
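The difference between the two request styles, and the check that a returned segment is usable, can be modelled with plain offset/length ranges. This sketch assumes the pNFS convention that an all-ones length means "to end of file". The range type and helper names are hypothetical and are not the client's pnfs_layout_range code:

```c
#include <stdbool.h>
#include <stdint.h>

/* An all-ones length conventionally means "to end of file". */
#define LENGTH_EOF UINT64_MAX

struct range {
	uint64_t offset;
	uint64_t length;
};

/* Old style (modelled): always ask for the whole file. */
static struct range request_whole_file(void)
{
	return (struct range){ 0, LENGTH_EOF };
}

/* New style (modelled): ask only for the bytes this I/O touches. */
static struct range request_for_io(uint64_t pos, uint64_t count)
{
	return (struct range){ pos, count };
}

/* End of a range, saturating instead of overflowing. */
static uint64_t range_end(struct range r)
{
	if (r.length == LENGTH_EOF || r.offset > UINT64_MAX - r.length)
		return UINT64_MAX;
	return r.offset + r.length;
}

/* A returned segment is usable if it covers the I/O, whether or not
 * it spans the whole file. */
static bool covers(struct range seg, struct range io)
{
	return seg.offset <= io.offset && range_end(seg) >= range_end(io);
}
```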
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Layout segments have been supported in pNFS for years, so remove the
requirement that the server always sends whole file layouts.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
This fixes an assertion pop in btree_iter.c; the assertion checks that
callers don't forget to pass a snapshot ID when iterating over snapshots btrees.
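The kind of invariant that assertion enforces is sketched below. The iter_model type, iter_init() and the use of 0 as the "no snapshot passed" sentinel are all assumptions of this sketch. It is not bcachefs's btree_iter code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical iterator model: a snapshots btree needs a snapshot ID
 * to decide which version of each key is visible. */
struct iter_model {
	bool btree_has_snapshots;
	uint32_t snapshot; /* 0 assumed to mean "not passed" */
};

static struct iter_model iter_init(bool btree_has_snapshots, uint32_t snapshot)
{
	/* Iterating a snapshots btree without a snapshot ID is a caller bug. */
	assert(!btree_has_snapshots || snapshot != 0);
	return (struct iter_model){ btree_has_snapshots, snapshot };
}
```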
Reported-by: syzbot+0dfe05235e38653e2aee@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>