linux/fs
Filipe Manana 2f6397e448 btrfs: don't refill whole delayed refs block reserve when starting transaction
Since commit 28270e25c6 ("btrfs: always reserve space for delayed refs
when starting transaction") we started not only to reserve metadata space
for the delayed refs a caller of btrfs_start_transaction() might generate
but also to try to fully refill the delayed refs block reserve, because
there are several case where we generate delayed refs and haven't reserved
space for them, relying on the global block reserve. Relying too much on
the global block reserve is not always safe, and can result in hitting
-ENOSPC during transaction commits or worst, in rare cases, being unable
to mount a filesystem that needs to do orphan cleanup or anything that
requires modifying the filesystem during mount, and has no more
unallocated space and the metadata space is nearly full. This was
explained in detail in that commit's change log.

However the gap between the reserved amount and the size of the delayed
refs block reserve can be huge, so attempting to reserve space for such
a gap can result in allocating many metadata block groups that end up
not being used. After a recent patch, with the subject:

  "btrfs: add new unused block groups to the list of unused block groups"

We started to add new block groups that are unused to the list of unused
block groups, to avoid having them around for a very long time in case
they are never used, because a block group is only added to the list of
unused block groups when we deallocate the last extent or when mounting
the filesystem and the block group has 0 bytes used. This is not a problem
introduced by the commit mentioned earlier, it always existed as our
metadata space reservations are, most of the time, pessimistic and end up
not using all the space they reserved, so we can occasionally end up with
one or two unused metadata block groups for a long period. However after
that commit mentioned earlier, we are just more pessimistic in the
metadata space reservations when starting a transaction and therefore the
issue is more likely to happen.

This however is not always enough because we might create unused metadata
block groups when reserving metadata space at a high rate if there's
always a gap in the delayed refs block reserve and the cleaner kthread
isn't triggered often enough or is busy with other work (running delayed
iputs, cleaning deleted roots, etc), not to mention the block group's
allocated space is only usable for a new block group after the transaction
used to remove it is committed.

A user reported that he's getting a lot of allocated metadata block groups
but the usage percentage of metadata space was very low compared to the
total allocated space, specially after running a series of block group
relocations.

So for now stop trying to refill the gap in the delayed refs block reserve
and reserve space only for the delayed refs we are expected to generate
when starting a transaction.

CC: stable@vger.kernel.org # 6.7+
Reported-by: Ivan Shapovalov <intelfx@intelfx.name>
Link: https://lore.kernel.org/linux-btrfs/9cdbf0ca9cdda1b4c84e15e548af7d7f9f926382.camel@intelfx.name/
Link: https://lore.kernel.org/linux-btrfs/CAL3q7H6802ayLHUJFztzZAVzBLJAGdFx=6FHNNy87+obZXXZpQ@mail.gmail.com/
Tested-by: Ivan Shapovalov <intelfx@intelfx.name>
Reported-by: Heddxh <g311571057@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAE93xANEby6RezOD=zcofENYZOT-wpYygJyauyUAZkLv6XVFOA@mail.gmail.com/
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-02-13 18:39:09 +01:00
..
9p Bunch of small fixes: 2023-11-04 09:20:04 -10:00
adfs adfs: convert to new timestamp accessors 2023-10-18 13:26:18 +02:00
affs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
afs afs: Fix refcount underflow from error handling race 2023-12-11 15:40:41 -08:00
autofs autofs: add: new_inode check in autofs_fill_super() 2023-11-20 14:56:36 +01:00
bcachefs bcachefs: Close journal entry if necessary when flushing all pins 2023-12-10 16:53:46 -05:00
befs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
bfs bfs: convert to new timestamp accessors 2023-10-18 13:26:19 +02:00
btrfs btrfs: don't refill whole delayed refs block reserve when starting transaction 2024-02-13 18:39:09 +01:00
cachefiles - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
ceph Two items: 2023-11-10 09:52:56 -08:00
coda coda: convert to new timestamp accessors 2023-10-18 13:26:19 +02:00
configfs configfs: convert to new timestamp accessors 2023-10-18 13:26:19 +02:00
cramfs vfs-6.7.ctime 2023-10-30 09:47:13 -10:00
crypto This update includes the following changes: 2023-11-02 16:15:30 -10:00
debugfs Revert "debugfs: annotate debugfs handlers vs. removal with lockdep" 2023-12-04 07:43:16 +01:00
devpts devpts: convert to new timestamp accessors 2023-10-18 13:26:20 +02:00
dlm dlm: slow down filling up processing queue 2023-10-12 15:21:00 -05:00
ecryptfs fs: Pass AT_GETATTR_NOSEC flag to getattr interface function 2023-11-18 14:54:07 +01:00
efivarfs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
efs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
erofs MAINTAINERS: erofs: add EROFS webpage 2023-11-17 19:55:46 +08:00
exfat exfat: fix ctime is not updated 2023-11-03 22:24:11 +09:00
exportfs fs: fix build error with CONFIG_EXPORTFS=m or not defined 2023-10-28 16:16:19 +02:00
ext2 ext2: Fix ki_pos update for DIO buffered-io fallback case 2023-11-22 10:17:10 +01:00
ext4 ext4: fix warning in ext4_dio_write_end_io() 2023-11-30 23:29:34 -05:00
f2fs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
fat vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
freevxfs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
fscache fscache: Use clear_and_wake_up_bit() in fscache_create_volume_work() 2023-01-30 12:51:54 +00:00
fuse fuse: disable FOPEN_PARALLEL_DIRECT_WRITES with FUSE_DIRECT_IO_ALLOW_MMAP 2023-12-04 10:19:32 +01:00
gfs2 gfs2 fixes 2023-11-07 11:54:17 -08:00
hfs vfs-6.7.ctime 2023-10-30 09:47:13 -10:00
hfsplus vfs-6.7.ctime 2023-10-30 09:47:13 -10:00
hostfs hostfs: convert to new timestamp accessors 2023-10-18 14:08:22 +02:00
hpfs hpfs: convert to new timestamp accessors 2023-10-18 14:08:22 +02:00
hugetlbfs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
iomap Many singleton patches against the MM code. The patch series which are 2023-11-02 19:38:47 -10:00
isofs isofs: convert to new timestamp accessors 2023-10-18 14:08:22 +02:00
jbd2 jbd2: fix soft lockup in journal_finish_inode_data_buffers() 2023-12-12 10:25:46 -05:00
jffs2 vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
jfs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
kernfs Driver core changes for 6.7-rc1 2023-11-03 15:15:47 -10:00
lockd SUNRPC: change how svc threads are asked to exit. 2023-10-16 12:44:04 -04:00
minix minix: convert to new timestamp accessors 2023-10-18 14:08:23 +02:00
netfs netfs: Only call folio_start_fscache() one time for each folio 2023-09-18 12:03:46 -07:00
nfs NFS client updates for Linux 6.7 2023-11-08 13:39:16 -08:00
nfs_common NFSv4.2: remove MODULE_LICENSE in non-modules 2023-04-13 13:13:52 -07:00
nfsd nfsd-6.7 fixes: 2023-11-18 11:23:32 -08:00
nilfs2 nilfs2: prevent WARNING in nilfs_sufile_set_segment_usage() 2023-12-06 16:12:50 -08:00
nls nls: Hide new NLS_UCS2_UTILS 2023-08-31 12:07:34 -05:00
notify vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
ntfs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
ntfs3 vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
ocfs2 As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
omfs omfs: convert to new timestamp accessors 2023-10-18 14:08:25 +02:00
openpromfs openpromfs: convert to new timestamp accessors 2023-10-18 14:08:25 +02:00
orangefs vfs-6.7.ctime 2023-10-30 09:47:13 -10:00
overlayfs vfs-6.7-rc3.fixes 2023-11-24 09:45:40 -08:00
proc mm/pagemap: fix wr-protect even if PM_SCAN_WP_MATCHING not set 2023-12-06 16:12:45 -08:00
pstore pstore updates for v6.7-rc1 2023-10-30 19:26:39 -10:00
qnx4 qnx4: convert to new timestamp accessors 2023-10-18 14:08:26 +02:00
qnx6 qnx6: convert to new timestamp accessors 2023-10-18 14:08:26 +02:00
quota Many singleton patches against the MM code. The patch series which are 2023-11-02 19:38:47 -10:00
ramfs ramfs: convert to new timestamp accessors 2023-10-18 14:08:26 +02:00
reiserfs Many singleton patches against the MM code. The patch series which are 2023-11-02 19:38:47 -10:00
romfs vfs-6.7.ctime 2023-10-30 09:47:13 -10:00
smb four import client fixes addressing potential overflows, all marked for stable as well 2023-12-14 19:57:42 -08:00
squashfs squashfs: squashfs_read_data need to check if the length is 0 2023-12-06 16:12:45 -08:00
sysfs kernfs: sysfs: support custom llseek method for sysfs entries 2023-10-05 13:42:11 +02:00
sysv sysv: convert to new timestamp accessors 2023-10-18 14:08:28 +02:00
tracefs eventfs: Make sure that parent->d_inode is locked in creating files/dirs 2023-11-22 18:37:33 -05:00
ubifs This pull request contains updates for UBI and UBIFS 2023-11-05 08:28:32 -10:00
udf \n 2023-11-02 08:19:51 -10:00
ufs fix ufs_get_locked_folio() breakage 2023-12-13 11:14:09 -05:00
unicode unicode: remove MODULE_LICENSE in non-modules 2023-04-13 13:13:54 -07:00
vboxsf vboxsf: convert to new timestamp accessors 2023-10-18 14:08:29 +02:00
verity fsverity: skip PKCS#7 parser when keyring is empty 2023-08-20 10:33:43 -07:00
xfs Code changes for 6.7-rc2: 2023-11-25 08:57:09 -08:00
zonefs zonefs: convert to new timestamp accessors 2023-10-18 14:08:29 +02:00
aio.c aio: Annotate struct kioctx_table with __counted_by 2023-09-20 14:22:01 +02:00
anon_inodes.c treewide: mark stuff as __ro_after_init 2023-10-18 14:43:23 -07:00
attr.c fs: convert core infrastructure to new timestamp accessors 2023-10-18 13:26:15 +02:00
bad_inode.c fs: convert core infrastructure to new timestamp accessors 2023-10-18 13:26:15 +02:00
binfmt_elf_fdpic.c execve updates for v6.7-rc1 2023-10-30 19:28:19 -10:00
binfmt_elf_test.c
binfmt_elf.c binfmt_elf: Only report padzero() errors when PROT_WRITE 2023-10-03 19:48:44 -07:00
binfmt_flat.c
binfmt_misc.c execve updates for v6.7-rc1 2023-10-30 19:28:19 -10:00
binfmt_script.c
buffer.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
char_dev.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
compat_binfmt_elf.c
coredump.c v6.5/vfs.misc 2023-06-26 09:50:21 -07:00
d_path.c fs: d_path: include internal.h 2023-05-17 09:16:59 +02:00
dax.c mm: convert DAX lock/unlock page to lock/unlock folio 2023-10-04 10:32:20 -07:00
dcache.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
direct-io.c treewide: mark stuff as __ro_after_init 2023-10-18 14:43:23 -07:00
drop_caches.c fs: drop_caches: draining pages before dropping caches 2023-08-18 10:12:11 -07:00
eventfd.c eventfd: prevent underflow for eventfd semaphores 2023-07-11 11:41:34 +02:00
eventpoll.c treewide: mark stuff as __ro_after_init 2023-10-18 14:43:23 -07:00
exec.c mm/mremap: allow moves within the same VMA for stack moves 2023-10-04 10:32:20 -07:00
fcntl.c treewide: mark stuff as __ro_after_init 2023-10-18 14:43:23 -07:00
fhandle.c exportfs: add helpers to check if filesystem can encode/decode file handles 2023-10-24 17:57:45 +02:00
file_table.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
file.c file, i915: fix file reference for mmap_singleton() 2023-10-25 22:17:04 +02:00
filesystems.c
fs_context.c fs: factor out vfs_parse_monolithic_sep() helper 2023-10-12 18:53:36 +03:00
fs_parser.c
fs_pin.c
fs_struct.c kill do_each_thread() 2023-08-21 13:46:25 -07:00
fs_types.c
fs-writeback.c vfs-6.7.misc 2023-10-30 09:14:19 -10:00
fsopen.c fsconfig: ensure that dirfd is set to aux 2023-09-22 14:09:06 +02:00
init.c fs: add a new SB_I_NOUMASK flag 2023-10-19 11:02:47 +02:00
inode.c filemap: add a per-mapping stable writes flag 2023-11-20 15:05:18 +01:00
internal.h fs: store real path instead of fake path in backing file f_path 2023-10-19 11:03:15 +02:00
ioctl.c v6.6-vfs.super 2023-08-28 11:04:18 -07:00
Kconfig mm/hugetlb: have CONFIG_HUGETLB_PAGE select CONFIG_XARRAY_MULTI 2023-12-06 16:12:49 -08:00
Kconfig.binfmt riscv: support the elf-fdpic binfmt loader 2023-08-23 14:17:43 -07:00
kernel_read_file.c fs: Fix kernel-doc warnings 2023-08-19 12:12:12 +02:00
libfs.c libfs: getdents() should return 0 after reaching EOD 2023-11-20 15:34:22 +01:00
locks.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
Makefile bcachefs: Initial commit 2023-10-22 17:08:07 -04:00
mbcache.c mbcache: dynamically allocate the mbcache shrinker 2023-10-04 10:32:25 -07:00
mnt_idmapping.c fs: export mnt_idmap_get/mnt_idmap_put 2023-11-03 23:28:33 +01:00
mount.h
mpage.c buffer: remove folio_create_empty_buffers() 2023-10-25 16:47:10 -07:00
namei.c vfs-6.7.misc 2023-10-30 09:14:19 -10:00
namespace.c fs: indicate request originates from old mount API 2023-12-15 20:27:03 +01:00
nsfs.c fs: convert core infrastructure to new timestamp accessors 2023-10-18 13:26:15 +02:00
open.c fs: store real path instead of fake path in backing file f_path 2023-10-19 11:03:15 +02:00
pipe.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
pnode.c fs: allow to mount beneath top mount 2023-05-19 04:30:22 +02:00
pnode.h fs: allow to mount beneath top mount 2023-05-19 04:30:22 +02:00
posix_acl.c fs: convert to ctime accessor functions 2023-07-13 10:28:04 +02:00
proc_namespace.c tty, proc, kernfs, random: Use copy_splice_read() 2023-05-24 08:42:16 -06:00
read_write.c fs: Fix one kernel-doc comment 2023-08-15 08:32:45 +02:00
readdir.c vfs: get rid of old '->iterate' directory operation 2023-08-06 15:08:35 +02:00
remap_range.c fs: use UB-safe check for signed addition overflow in remap_verify_area 2023-05-24 11:03:59 +02:00
select.c
seq_file.c
signalfd.c
splice.c - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
stack.c fs: convert core infrastructure to new timestamp accessors 2023-10-18 13:26:15 +02:00
stat.c fs: Pass AT_GETATTR_NOSEC flag to getattr interface function 2023-11-18 14:54:07 +01:00
statfs.c statfs: enforce statfs[64] structure initialization 2023-05-17 15:20:17 +02:00
super.c overlayfs update for 6.7-rc1 2023-11-07 11:46:31 -08:00
sync.c
sysctls.c sysctl: Refactor base paths registrations 2023-05-23 21:43:26 -07:00
timerfd.c
userfaultfd.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
utimes.c fs.idmapped.v6.3 2023-02-20 11:53:11 -08:00
xattr.c xattr: make the xattr array itself const 2023-10-09 16:24:16 +02:00