writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs

The cgwb cleanup routine tries to release a dying cgwb by switching its
attached inodes to a live cgwb.  It fetches those inodes from the
wb->b_attached list only, overlooking the fact that inodes with only dirty
timestamps reside on the wb->b_dirty_time list when lazytime is enabled.
As a result, large numbers of zombie memory cgroups can accumulate when
lazytime is enabled, because inodes with dirty timestamps cannot be
switched to a live cgwb for a long time.

It is reasonable not to switch the cgwb for inodes with dirty data, as
doing so could break the bandwidth restrictions.  However, since the
writeback of inode metadata is not accounted for, also switch inodes
with dirty timestamps to avoid zombie memory and block cgroups when
lazytime is enabled.
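
The two-list scan described above can be modelled in user space.  The
following is only a minimal sketch, not the kernel code: struct toy_inode,
gather(), gather_all() and MAX_PER_BATCH are hypothetical stand-ins for
struct inode, isw_prepare_wbs_switch(), the caller in
cleanup_offline_cgwb() and WB_MAX_INODES_PER_ISW - 1, and the
"switchable" flag stands in for inode_prepare_wbs_switch() succeeding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical batch cap; stands in for WB_MAX_INODES_PER_ISW - 1. */
#define MAX_PER_BATCH 4

/* Toy inode on a singly linked list (the kernel uses list_head instead). */
struct toy_inode {
	int id;
	bool switchable;	/* models inode_prepare_wbs_switch() succeeding */
	struct toy_inode *next;
};

/*
 * Append the ids of switchable inodes from one list to the batch.
 * Returns true when the batch is full, meaning the caller must come
 * back later for the inodes that were not collected.
 */
static bool gather(struct toy_inode *head, int *batch, int *nr)
{
	for (struct toy_inode *i = head; i; i = i->next) {
		if (!i->switchable)
			continue;
		assert(*nr < MAX_PER_BATCH);
		batch[(*nr)++] = i->id;
		if (*nr >= MAX_PER_BATCH)
			return true;
	}
	return false;
}

/*
 * Scan the "attached" list first; scan the "dirty time" list only if
 * the batch still has room.  Both lists share one counter.
 */
static bool gather_all(struct toy_inode *attached,
		       struct toy_inode *dirty_time, int *batch, int *nr)
{
	bool restart = gather(attached, batch, nr);

	if (!restart)
		restart = gather(dirty_time, batch, nr);
	return restart;
}
```

Sharing a single counter between the two scans keeps the batch bounded
no matter which list fills it, and a full batch from the first list
skips the second scan entirely until the next pass.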

Fixes: c22d70a162 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231014125511.102978-1-jefflexu@linux.alibaba.com
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Jingbo Xu 2023-10-14 20:55:11 +08:00 committed by Christian Brauner
parent e311ba29a5
commit 6654408a33

@@ -613,6 +613,24 @@ out_free:
 	kfree(isw);
 }
+static bool isw_prepare_wbs_switch(struct inode_switch_wbs_context *isw,
+				   struct list_head *list, int *nr)
+{
+	struct inode *inode;
+	list_for_each_entry(inode, list, i_io_list) {
+		if (!inode_prepare_wbs_switch(inode, isw->new_wb))
+			continue;
+		isw->inodes[*nr] = inode;
+		(*nr)++;
+		if (*nr >= WB_MAX_INODES_PER_ISW - 1)
+			return true;
+	}
+	return false;
+}
 /**
  * cleanup_offline_cgwb - detach associated inodes
  * @wb: target wb
@@ -625,7 +643,6 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb)
 {
 	struct cgroup_subsys_state *memcg_css;
 	struct inode_switch_wbs_context *isw;
-	struct inode *inode;
 	int nr;
 	bool restart = false;
@@ -647,17 +664,17 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb)
 	nr = 0;
 	spin_lock(&wb->list_lock);
-	list_for_each_entry(inode, &wb->b_attached, i_io_list) {
-		if (!inode_prepare_wbs_switch(inode, isw->new_wb))
-			continue;
-		isw->inodes[nr++] = inode;
-		if (nr >= WB_MAX_INODES_PER_ISW - 1) {
-			restart = true;
-			break;
-		}
-	}
+	/*
+	 * In addition to the inodes that have completed writeback, also switch
+	 * cgwbs for those inodes only with dirty timestamps. Otherwise, those
+	 * inodes won't be written back for a long time when lazytime is
+	 * enabled, and thus pinning the dying cgwbs. It won't break the
+	 * bandwidth restrictions, as writeback of inode metadata is not
+	 * accounted for.
+	 */
+	restart = isw_prepare_wbs_switch(isw, &wb->b_attached, &nr);
+	if (!restart)
+		restart = isw_prepare_wbs_switch(isw, &wb->b_dirty_time, &nr);
 	spin_unlock(&wb->list_lock);
 	/* no attached inodes? bail out */