IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmTvFssACgkQnJ2qBz9k
QNl7HggAuY154urYJdh7M+mbKDSywhcK0YT5pNNkcXVpv/t2c073Ce57+ObDCBaS
xetyFgH2XlvuAJ4dWmRDwBEzJ0jquKzvYJEMiXAexgy47ctnNPx5kLPsXpt3g+2q
pro7sK1b5BmX/zrgOontbJ8/YAwX85XToD4Cv5XyNSx/ex6/zsd5FProfdiY/HAt
qAcv7NkNTBbJBEBHhBNQSL2wOj3LzQV1U8v0XEcsBvTUxlX2jH8J4CsuFIotXqCF
37SNvZPk2c04HbaLgyU4Ura69qD0fn4vTMocuCoaf0CN2PL5jblRAwsAO2bfSqJE
AxZFq3afI0YV3Y9OrVlzHtSALuiZMQ==
=QPEQ
-----END PGP SIGNATURE-----
Merge tag 'for_v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, quota, and udf updates from Jan Kara:
- fixes for possible use-after-free issues with quota when racing with
chown
- fixes for ext2 crashing when xattr allocation races with another
block allocation to the same file from page writeback code
- fix for block number overflow in ext2
- marking of reiserfs as obsolete in MAINTAINERS
- assorted minor cleanups
* tag 'for_v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2: Fix kernel-doc warnings
ext2: improve consistency of ext2_fsblk_t datatype usage
ext2: dump current reservation window info
ext2: fix race between setxattr and write back
ext2: introduce new flags argument for ext2_new_blocks()
ext2: remove ext2_new_block()
ext2: fix datatype of block number in ext2_xattr_set2()
udf: Drop pointless aops assignment
quota: use lockdep_assert_held_write in dquot_load_quota_sb
MAINTAINERS: change reiserfs status to obsolete
udf: Fix -Wstringop-overflow warnings
quota: simplify drop_dquot_ref()
quota: fix dqput() to follow the guarantees dquot_srcu should provide
quota: add new helper dquot_active()
quota: rename dquot_active() to inode_quota_active()
quota: factor out dquot_write_dquot()
ext2: remove redundant assignment to variable desc and variable best_desc
Currently we check whether superblock has ->quota_read and ->quota_write
operations to check whether filesystem supports quotas. However for
example for shmfs we will not read or write dquots so check whether
quota operations are set in the superblock instead.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Message-Id: <20230725144510.253763-4-cem@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Use lockdep_assert_held_write to assert and self-document the locking
state in dquot_load_quota_sb instead of hand-crafting it with a trylock.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230802115439.2145212-2-hch@lst.de>
As Honza said, remove_inode_dquot_ref() currently does not release the
last dquot reference but instead adds the dquot to tofree_head list. This
is because dqput() can sleep while dropping of the last dquot reference
(writing back the dquot and calling ->release_dquot()) and that must not
happen under dq_list_lock. Now that dqput() queues the final dquot cleanup
into a workqueue, remove_inode_dquot_ref() can call dqput() unconditionally
and we can significantly simplify it.
Here we open code the simplified code of remove_inode_dquot_ref() into
remove_dquot_ref() and remove the function put_dquot_list() which is no
longer used.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230630110822.3881712-6-libaokun1@huawei.com>
The dquot_mark_dquot_dirty() using dquot references from the inode
should be protected by dquot_srcu. quota_off code takes care to call
synchronize_srcu(&dquot_srcu) to not drop dquot references while they
are used by other users. But dquot_transfer() breaks this assumption.
We call dquot_transfer() to drop the last reference of dquot and add
it to free_dquots, but there may still be other users using the dquot
at this time, as shown in the function graph below:
cpu1 cpu2
_________________|_________________
wb_do_writeback CHOWN(1)
...
ext4_da_update_reserve_space
dquot_claim_block
...
dquot_mark_dquot_dirty // try to dirty old quota
test_bit(DQ_ACTIVE_B, &dquot->dq_flags) // still ACTIVE
if (test_bit(DQ_MOD_B, &dquot->dq_flags))
// test no dirty, wait dq_list_lock
...
dquot_transfer
__dquot_transfer
dqput_all(transfer_from) // rls old dquot
dqput // last dqput
dquot_release
clear_bit(DQ_ACTIVE_B, &dquot->dq_flags)
atomic_dec(&dquot->dq_count)
put_dquot_last(dquot)
list_add_tail(&dquot->dq_free, &free_dquots)
// add the dquot to free_dquots
if (!test_and_set_bit(DQ_MOD_B, &dquot->dq_flags))
add dqi_dirty_list // add released dquot to dirty_list
This can cause various issues, such as dquot being destroyed by
dqcache_shrink_scan() after being added to free_dquots, which can trigger
a UAF in dquot_mark_dquot_dirty(); or after dquot is added to free_dquots
and then to dirty_list, it is added to free_dquots again after
dquot_writeback_dquots() is executed, which causes the free_dquots list to
be corrupted and triggers a UAF when dqcache_shrink_scan() is called for
freeing dquot twice.
As Honza said, we need to fix dquot_transfer() to follow the guarantees
dquot_srcu should provide. But calling synchronize_srcu() directly from
dquot_transfer() is too expensive (and mostly unnecessary). So we add
dquot whose last reference should be dropped to the new global dquot
list releasing_dquots, and then queue work item which would call
synchronize_srcu() and after that perform the final cleanup of all the
dquots on releasing_dquots.
Fixes: 4580b30ea8 ("quota: Do not dirty bad dquots")
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230630110822.3881712-5-libaokun1@huawei.com>
Add new helper function dquot_active() to make the code more concise.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230630110822.3881712-4-libaokun1@huawei.com>
Now we have a helper function dquot_dirty() to determine if dquot has
DQ_MOD_B bit. dquot_active() can easily be misunderstood as a helper
function to determine if dquot has DQ_ACTIVE_B bit. So we avoid this by
renaming it to inode_quota_active() and later on we will add the helper
function dquot_active() to determine if dquot has DQ_ACTIVE_B bit.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230630110822.3881712-3-libaokun1@huawei.com>
Refactor out dquot_write_dquot() to reduce duplicate code.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230630110822.3881712-2-libaokun1@huawei.com>
When add_dquot_ref() fails (usually due to IO error or ENOMEM), we want
to disable quotas we are trying to enable. However dquot_disable() call
was passed just the flags we are enabling so in case flags ==
DQUOT_USAGE_ENABLED dquot_disable() call will just fail with EINVAL
instead of properly disabling quotas. Fix the problem by always passing
DQUOT_LIMITS_ENABLED | DQUOT_USAGE_ENABLED to dquot_disable() in this
case.
Reported-and-tested-by: Ye Bin <yebin10@huawei.com>
Reported-by: syzbot+e633c79ceaecbf479854@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230605140731.2427629-2-yebin10@huawei.com>
register_sysctl_init() also prints information that may be useful for
further debugging when we fail to register sysctl table. Use it when
registering fs_dqstats_table.
Suggested-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
dquot_set_dqinfo() ignores the return code from the ->write_info
call, which means that quotacalls like Q_SETINFO never see the error.
This doesn't seem right, so fix that.
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230227120216.31306-2-frank.li@vivo.com>
There is no need to declare two tables to just create directories,
this can be easily be done with a prefix path with register_sysctl().
Simplify this registration.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230310231206.3952808-4-mcgrof@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
- Some kmemleak fixes from Patrick Wang and Waiman Long
- DAMON updates from SeongJae Park
- memcg debug/visibility work from Roman Gushchin
- vmalloc speedup from Uladzislau Rezki
- more folio conversion work from Matthew Wilcox
- enhancements for coherent device memory mapping from Alex Sierra
- addition of shared pages tracking and CoW support for fsdax, from
Shiyang Ruan
- hugetlb optimizations from Mike Kravetz
- Mel Gorman has contributed some pagealloc changes to improve latency
and realtime behaviour.
- mprotect soft-dirty checking has been improved by Peter Xu
- Many other singleton patches all over the place
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYuravgAKCRDdBJ7gKXxA
jpqSAQDrXSdII+ht9kSHlaCVYjqRFQz/rRvURQrWQV74f6aeiAD+NHHeDPwZn11/
SPktqEUrF1pxnGQxqLh1kUFUhsVZQgE=
=w/UH
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Most of the MM queue. A few things are still pending.
Liam's maple tree rework didn't make it. This has resulted in a few
other minor patch series being held over for next time.
Multi-gen LRU still isn't merged as we were waiting for mapletree to
stabilize. The current plan is to merge MGLRU into -mm soon and to
later reintroduce mapletree, with a view to hopefully getting both
into 6.1-rc1.
Summary:
- The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
- Some kmemleak fixes from Patrick Wang and Waiman Long
- DAMON updates from SeongJae Park
- memcg debug/visibility work from Roman Gushchin
- vmalloc speedup from Uladzislau Rezki
- more folio conversion work from Matthew Wilcox
- enhancements for coherent device memory mapping from Alex Sierra
- addition of shared pages tracking and CoW support for fsdax, from
Shiyang Ruan
- hugetlb optimizations from Mike Kravetz
- Mel Gorman has contributed some pagealloc changes to improve
latency and realtime behaviour.
- mprotect soft-dirty checking has been improved by Peter Xu
- Many other singleton patches all over the place"
[ XFS merge from hell as per Darrick Wong in
https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ]
* tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits)
tools/testing/selftests/vm/hmm-tests.c: fix build
mm: Kconfig: fix typo
mm: memory-failure: convert to pr_fmt()
mm: use is_zone_movable_page() helper
hugetlbfs: fix inaccurate comment in hugetlbfs_statfs()
hugetlbfs: cleanup some comments in inode.c
hugetlbfs: remove unneeded header file
hugetlbfs: remove unneeded hugetlbfs_ops forward declaration
hugetlbfs: use helper macro SZ_1{K,M}
mm: cleanup is_highmem()
mm/hmm: add a test for cross device private faults
selftests: add soft-dirty into run_vmtests.sh
selftests: soft-dirty: add test for mprotect
mm/mprotect: fix soft-dirty check in can_change_pte_writable()
mm: memcontrol: fix potential oom_lock recursion deadlock
mm/gup.c: fix formatting in check_and_migrate_movable_page()
xfs: fail dax mount if reflink is enabled on a partition
mm/memcontrol.c: remove the redundant updating of stats_flush_threshold
userfaultfd: don't fail on unrecognized features
hugetlb_cgroup: fix wrong hugetlb cgroup numa stat
...
Currently shrinkers are anonymous objects. For debugging purposes they
can be identified by count/scan function names, but it's not always
useful: e.g. for superblock's shrinkers it's nice to have at least an
idea of to which superblock the shrinker belongs.
This commit adds names to shrinkers. register_shrinker() and
prealloc_shrinker() functions are extended to take a format and arguments
to master a name.
In some cases it's not possible to determine a good name at the time when
a shrinker is allocated. For such cases shrinker_debugfs_rename() is
provided.
The expected format is:
<subsystem>-<shrinker_type>[:<instance>]-<id>
For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair.
After this change the shrinker debugfs directory looks like:
$ cd /sys/kernel/debug/shrinker/
$ ls
dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42
mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43
mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44
rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49
sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13
sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36
sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19
sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10
sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9
sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37
sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38
sb-dax-11 sb-proc-45 sb-tmpfs-35
sb-debugfs-7 sb-proc-46 sb-tmpfs-40
[roman.gushchin@linux.dev: fix build warnings]
Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Now that we introduced new infrastructure to increase the type safety
for filesystems supporting idmapped mounts port the first part of the
vfs over to them.
This ports the attribute changes codepaths to rely on the new better
helpers using a dedicated type.
Before this change we used to take a shortcut and place the actual
values that would be written to inode->i_{g,u}id into struct iattr. This
had the advantage that we moved idmappings mostly out of the picture
early on but it made reasoning about changes more difficult than it
should be.
The filesystem was never explicitly told that it dealt with an idmapped
mount. The transition to the value that needed to be stored in
inode->i_{g,u}id appeared way too early and increased the probability of
bugs in various codepaths.
We know place the same value in struct iattr no matter if this is an
idmapped mount or not. The vfs will only deal with type safe
vfs{g,u}id_t. This makes it massively safer to perform permission checks
as the type will tell us what checks we need to perform and what helpers
we need to use.
Fileystems raising FS_ALLOW_IDMAP can't simply write ia_vfs{g,u}id to
inode->i_{g,u}id since they are different types. Instead they need to
use the dedicated vfs{g,u}id_to_k{g,u}id() helpers that map the
vfs{g,u}id into the filesystem.
The other nice effect is that filesystems like overlayfs don't need to
care about idmappings explicitly anymore and can simply set up struct
iattr accordingly directly.
Link: https://lore.kernel.org/lkml/CAHk-=win6+ahs1EwLkcq8apqLi_1wXFWbrPf340zYEhObpz4jA@mail.gmail.com [1]
Link: https://lore.kernel.org/r/20220621141454.2914719-9-brauner@kernel.org
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Port the is_quota_modification() and dqout_transfer() helper to type
safe vfs{g,u}id_t. Since these helpers are only called by a few
filesystems don't introduce a new helper but simply extend the existing
helpers to pass down the mount's idmapping.
Note, that this is a non-functional change, i.e. nothing will have
happened here or at the end of this series to how quota are done! This
a change necessary because we will at the end of this series make
ownership changes easier to reason about by keeping the original value
in struct iattr for both non-idmapped and idmapped mounts.
For now we always pass the initial idmapping which makes the idmapping
functions these helpers call nops.
This is done because we currently always pass the actual value to be
written to i_{g,u}id via struct iattr. While this allowed us to treat
the {g,u}id values in struct iattr as values that can be directly
written to inode->i_{g,u}id it also increases the potential for
confusion for filesystems.
Now that we are have dedicated types to prevent this confusion we will
ultimately only map the value from the idmapped mount into a filesystem
value that can be written to inode->i_{g,u}id when the filesystem
actually updates the inode. So pass down the initial idmapping until we
finished that conversion at which point we pass down the mount's
idmapping.
Since struct iattr uses an anonymous union with overlapping types as
supported by the C standard, filesystems that haven't converted to
ia_vfs{g,u}id won't see any difference and things will continue to work
as before. In other words, no functional changes intended with this
change.
Link: https://lore.kernel.org/r/20220621141454.2914719-7-brauner@kernel.org
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Earlier we introduced new helpers to abstract ownership update and
remove code duplication. This converts all filesystems supporting
idmapped mounts to make use of these new helpers.
For now we always pass the initial idmapping which makes the idmapping
functions these helpers call nops.
This is done because we currently always pass the actual value to be
written to i_{g,u}id via struct iattr. While this allowed us to treat
the {g,u}id values in struct iattr as values that can be directly
written to inode->i_{g,u}id it also increases the potential for
confusion for filesystems.
Now that we are have dedicated types to prevent this confusion we will
ultimately only map the value from the idmapped mount into a filesystem
value that can be written to inode->i_{g,u}id when the filesystem
actually updates the inode. So pass down the initial idmapping until we
finished that conversion at which point we pass down the mount's
idmapping.
No functional changes intended.
Link: https://lore.kernel.org/r/20220621141454.2914719-6-brauner@kernel.org
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
As described in commit 02117b8ae9 ("f2fs: Set GF_NOFS in
read_cache_page_gfp while doing f2fs_quota_read"), we must not enter
filesystem reclaim while holding the dq_lock. Prevent this more generally
by using memalloc_nofs_save() while holding the lock.
Link: https://lore.kernel.org/r/20220605143815.2330891-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Strangely, dquot_quota_sync ignores the return code from the ->sync_fs
call, which means that quotacalls like Q_SYNC never see the error. This
doesn't seem right, so fix that.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Move most of the block related definition out of fs.h into more suitable
headers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Instead of having all the sysctl handlers deal with user pointers, which
is rather hairy in terms of the BPF interaction, copy the input to and
from userspace in common code. This also means that the strings are
always NUL-terminated by the common code, making the API a little bit
safer.
As most handler just pass through the data to one of the common handlers
a lot of the changes are mechnical.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Anything that walks all inodes on sb->s_inodes list without rescheduling
risks softlockups.
Previous efforts were made in 2 functions, see:
c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
ac05fbb inode: don't softlockup when evicting inodes
but there hasn't been an audit of all walkers, so do that now. This
also consistently moves the cond_resched() calls to the bottom of each
loop in cases where it already exists.
One loop remains: remove_dquot_ref(), because I'm not quite sure how
to deal with that one w/o taking the i_lock.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull vfs d_inode/d_flags memory ordering fixes from Al Viro:
"Fallout from tree-wide audit for ->d_inode/->d_flags barriers use.
Basically, the problem is that negative pinned dentries require
careful treatment - unless ->d_lock is locked or parent is held at
least shared, another thread can make them positive right under us.
Most of the uses turned out to be safe - the main surprises as far as
filesystems are concerned were
- race in dget_parent() fastpath, that might end up with the caller
observing the returned dentry _negative_, due to insufficient
barriers. It is positive in memory, but we could end up seeing the
wrong value of ->d_inode in CPU cache. Fixed.
- manual checks that result of lookup_one_len_unlocked() is positive
(and rejection of negatives). Again, insufficient barriers (we
might end up with inconsistent observed values of ->d_inode and
->d_flags). Fixed by switching to a new primitive that does the
checks itself and returns ERR_PTR(-ENOENT) instead of a negative
dentry. That way we get rid of boilerplate converting negatives
into ERR_PTR(-ENOENT) in the callers and have a single place to
deal with the barrier-related mess - inside fs/namei.c rather than
in every caller out there.
The guts of pathname resolution *do* need to be careful - the race
found by Ritesh is real, as well as several similar races.
Fortunately, it turns out that we can take care of that with fairly
local changes in there.
The tree-wide audit had not been fun, and I hate the idea of repeating
it. I think the right approach would be to annotate the places where
we are _not_ guaranteed ->d_inode/->d_flags stability and have sparse
catch regressions. But I'm still not sure what would be the least
invasive way of doing that and it's clearly the next cycle fodder"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs/namei.c: fix missing barriers when checking positivity
fix dget_parent() fastpath race
new helper: lookup_positive_unlocked()
fs/namei.c: pull positivity check into follow_managed()
Most of the callers of lookup_one_len_unlocked() treat negatives are
ERR_PTR(-ENOENT). Provide a helper that would do just that. Note
that a pinned positive dentry remains positive - it's ->d_inode is
stable, etc.; a pinned _negative_ dentry can become positive at any
point as long as you are not holding its parent at least shared.
So using lookup_one_len_unlocked() needs to be careful;
lookup_positive_unlocked() is safer and that's what the callers
end up open-coding anyway.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Quota statistics counted as 64-bit per-cpu counter. Reading sums per-cpu
fractions as signed 64-bit int, filters negative values and then reports
lower half as signed 32-bit int.
Result may looks like:
fs.quota.allocated_dquots = 22327
fs.quota.cache_hits = -489852115
fs.quota.drops = -487288718
fs.quota.free_dquots = 22083
fs.quota.lookups = -486883485
fs.quota.reads = 22327
fs.quota.syncs = 335064
fs.quota.writes = 3088689
Values bigger than 2^31-1 reported as negative.
All counters except "allocated_dquots" and "free_dquots" are monotonic,
thus they should be reported as is without filtering negative values.
Kernel doesn't have generic helper for 64-bit sysctl yet,
let's use at least unsigned long.
Link: https://lore.kernel.org/r/157337934693.2078.9842146413181153727.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jan Kara <jack@suse.cz>
Make dquot_get_state() gracefully handle a situation when there are no
quota files present even though quotas are enabled.
Signed-off-by: Jan Kara <jack@suse.cz>
Quota on and quota off are protected by s_umount semaphore held in
exclusive mode since commit 7d6cd73d33 "quota: Hold s_umount in
exclusive mode when enabling / disabling quotas". This makes it
impossible for dquot_disable() to race with other enabling or disabling
of quotas. Simplify the cleanup done by dquot_disable() based on this
fact and also remove some stale comments. As a bonus this cleanup makes
dquot_disable() properly handle a case when there are no quota inodes.
Signed-off-by: Jan Kara <jack@suse.cz>
Now dquot_enable() has only two internal callers and both of them just
need to update quota flags and don't need most of checks. Just drop
dquot_enable() and fold necessary functionality into the two calling
places.
Signed-off-by: Jan Kara <jack@suse.cz>
Rename vfs_load_quota_inode() to dquot_load_quota_inode() to be
consistent with naming of other functions used for enabling quota
accounting from filesystems. Also export the function and add some
sanity checks to assure filesystems are calling the function properly.
Signed-off-by: Jan Kara <jack@suse.cz>
We already have quota inode loaded when resuming quotas. Use
vfs_load_quota() to avoid some pointless churn with the quota inode.
Signed-off-by: Jan Kara <jack@suse.cz>
Factor out setting up of quota inode and eventual error cleanup from
vfs_load_quota_inode(). This will simplify situation for filesystems
that don't have any quota inodes.
Signed-off-by: Jan Kara <jack@suse.cz>
There is a race window where quota was redirted once we drop dq_list_lock inside dqput(),
but before we grab dquot->dq_lock inside dquot_release()
TASK1 TASK2 (chowner)
->dqput()
we_slept:
spin_lock(&dq_list_lock)
if (dquot_dirty(dquot)) {
spin_unlock(&dq_list_lock);
dquot->dq_sb->dq_op->write_dquot(dquot);
goto we_slept
if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) {
spin_unlock(&dq_list_lock);
dquot->dq_sb->dq_op->release_dquot(dquot);
dqget()
mark_dquot_dirty()
dqput()
goto we_slept;
}
So dquot dirty quota will be released by TASK1, but on next we_sleept loop
we detect this and call ->write_dquot() for it.
XFSTEST: 440a80d4cb
Link: https://lore.kernel.org/r/20191031103920.3919-2-dmonakhov@openvz.org
CC: stable@vger.kernel.org
Signed-off-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Signed-off-by: Jan Kara <jack@suse.cz>
It is meaningless to increase DQST_LOOKUPS number while iterating
over dirty/inuse list, so just avoid it.
Link: https://lore.kernel.org/r/20190926083408.4269-1-cgxu519@zoho.com.cn
Signed-off-by: Chengguang Xu <cgxu519@zoho.com.cn>
Signed-off-by: Jan Kara <jack@suse.cz>
We reset time limit when current usage is smaller
or equal to soft limit in other place, so follow
this rule in do_set_dqblk().
Signed-off-by: Chengguang Xu <cgxu519@zoho.com.cn>
Link: https://lore.kernel.org/r/20190724053216.19392-1-cgxu519@zoho.com.cn
Signed-off-by: Jan Kara <jack@suse.cz>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAl0mED8ACgkQnJ2qBz9k
QNnZ9wgAmC+eP8m6jB38HM7gZ+fWGEX3+FvnjbMbnXmNJTsnYWYC1VIRZhwKZb4b
42OGinfLq5tZMY/whrFBdB/c4UbVhAMhd1aFTpM2n5A6FR12YZxaLZEC+MLy3T7z
VU8m4uWDn80OvlUByo4Bylh+Icj78m8tLgj8SHSWxoh/DlGVKSLj9OKufV9Laens
YxubcUxE5sEEu8IVQen84283oJoizmeQf+f9yogAKIaskDLBzxqBIZwEACEUUchz
kEWRiHwS+Ou8EUHuwXqdKKksQgoLHEdxz2szYK1xSQ1wPmxMKPG5DqbQZv2QUBD0
Ek5T5YP4Tmph4s14n+jKDhakAJcqIQ==
=HWaa
-----END PGP SIGNATURE-----
Merge tag 'for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, udf and quota updates from Jan Kara:
- some ext2 fixes and cleanups
- a fix of udf bug when extending files
- a fix of quota Q_XGETQSTAT[V] handling
* tag 'for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fix incorrect final NOT_ALLOCATED (hole) extent length
ext2: Use kmemdup rather than duplicating its implementation
quota: honor quota type in Q_XGETQSTAT[V] calls
ext2: Always brelse bh on failure in ext2_iget()
ext2: add missing brelse() in ext2_iget()
ext2: Fix a typo in ext2_getattr argument
ext2: fix a typo in comment
ext2: add missing brelse() in ext2_new_inode()
ext2: optimize ext2_xattr_get()
ext2: introduce new helper for xattr entry comparison
ext2: merge xattr next entry check to ext2_xattr_entry_valid()
ext2: code cleanup for ext2_preread_inode()
ext2: code cleanup by using test_opt() and clear_opt()
doc: ext2: update description of quota options for ext2
ext2: Strengthen xattr block checks
ext2: Merge loops in ext2_xattr_set()
ext2: introduce helper for xattr entry validation
ext2: introduce helper for xattr header validation
quota: add dqi_dirty_list description to comment of Dquot List Management
Run below script as root, dquot_add_space will return -EDQUOT since
__dquot_transfer call dquot_add_space with flags=0, and dquot_add_space
think it's a preallocation. Fix it by set flags as DQUOT_SPACE_WARN.
mkfs.ext4 -O quota,project /dev/vdb
mount -o prjquota /dev/vdb /mnt
setquota -P 23 1 1 0 0 /dev/vdb
dd if=/dev/zero of=/mnt/test-file bs=4K count=1
chattr -p 23 test-file
Fixes: 7b9ca4c61b ("quota: Reduce contention on dq_data_lock")
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Actually there are four lists for dquot management, so add
the description of dqui_dirty_list to comment.
Signed-off-by: Chengguang Xu <cgxu519@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
When we fail from allocating inode/space, we back out
the change we already did. In a special case which has
exceeded soft limit by the change, we should also check
time limit and reset it properly.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Local variable *reserved* of remove_dquot_ref() is only used if
define CONFIG_QUOTA_DEBUG, but not ebraced in CONFIG_QUOTA_DEBUG
macro, which leads to unused-but-set-variable warning when compiling.
This patch ebrace it into CONFIG_QUOTA_DEBUG macro like what is done
in add_dquot_ref().
Signed-off-by: Jiang Biao <benbjiang@tencent.com>
Signed-off-by: Jan Kara <jack@suse.cz>
We need to check return code only when calling ->read_dqblk(),
so fix it properly.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Jan Kara <jack@suse.cz>