linux

iv/linux

Ziwei Dai 5da7cb193d rcu/kvfree: Avoid freeing new kfree_rcu() memory after old grace period

Memory passed to kvfree_rcu() that is to be freed is tracked by a per-CPU kfree_rcu_cpu structure, which in turn contains pointers to kvfree_rcu_bulk_data structures that contain pointers to memory that has not yet been handed to RCU, along with a kfree_rcu_cpu_work structure that tracks the memory that has already been handed to RCU.

These structures track three categories of memory: (1) Memory for kfree(), (2) Memory for kvfree(), and (3) Memory for both that arrived during an OOM episode. The first two categories are tracked in a cache-friendly manner involving a dynamically allocated page of pointers (the aforementioned kvfree_rcu_bulk_data structures), while the third uses a simple (but decidedly cache-unfriendly) linked list through the rcu_head structures in each block of memory.

On a given CPU, these three categories are handled as a unit, with that CPU's kfree_rcu_cpu_work structure having one pointer for each of the three categories. Clearly, new memory for a given category cannot be placed in the corresponding kfree_rcu_cpu_work structure until any old memory has had its grace period elapse and thus has been removed. And the kfree_rcu_monitor() function does in fact check for this.

Except that the kfree_rcu_monitor() function checks these pointers one at a time. This means that if the previous kfree_rcu() memory passed to RCU had only category 1 and the current one has only category 2, the kfree_rcu_monitor() function will send that current category-2 memory along immediately. This can result in memory being freed too soon, that is, out from under unsuspecting RCU readers.

To see this, consider the following sequence of events, in which:

o Task A on CPU 0 calls rcu_read_lock(), then uses "from_cset", then is preempted.

o CPU 1 calls kfree_rcu(cset, rcu_head) in order to free "from_cset" after a later grace period. Except that "from_cset" is freed right after the previous grace period ended, so that "from_cset" is immediately freed. Task A resumes and references "from_cset"'s member, after which nothing good happens.

In full detail:

CPU 0                                   CPU 1
----------------------                  ----------------------
count_memcg_event_mm()
|rcu_read_lock() <---
|mem_cgroup_from_task()
|// css_set_ptr is the "from_cset" mentioned on CPU 1
|css_set_ptr = rcu_dereference((task)->cgroups)
|// Hard irq comes, current task is scheduled out.

                                        cgroup_attach_task()
                                        |cgroup_migrate()
                                        |cgroup_migrate_execute()
                                        |css_set_move_task(task, from_cset, to_cset, true)
                                        |cgroup_move_task(task, to_cset)
                                        |rcu_assign_pointer(.., to_cset)
                                        |...
                                        |cgroup_migrate_finish()
                                        |put_css_set_locked(from_cset)
                                        |from_cset->refcount return 0
                                        |kfree_rcu(cset, rcu_head) // free from_cset after new gp
                                        |add_ptr_to_bulk_krc_lock()
                                        |schedule_delayed_work(&krcp->monitor_work, ..)

                                        kfree_rcu_monitor()
                                        |krcp->bulk_head[0]'s work attached to krwp->bulk_head_free[]
                                        |queue_rcu_work(system_wq, &krwp->rcu_work)
                                        |if rwork->rcu.work is not in WORK_STRUCT_PENDING_BIT state,
                                        |call_rcu(&rwork->rcu, rcu_work_rcufn) <--- request new gp

                                        // There is a previous call_rcu(.., rcu_work_rcufn)
                                        // gp end, rcu_work_rcufn() is called.
                                        rcu_work_rcufn()
                                        |__queue_work(.., rwork->wq, &rwork->work);

                                        |kfree_rcu_work()
                                        |krwp->bulk_head_free[0] bulk is freed before new gp end!!!
                                        |The "from_cset" is freed before new gp end.

// the task resumes some time later.
|css_set_ptr->subsys[(subsys_id) <--- Caused kernel crash, because css_set_ptr is freed.

This commit therefore causes kfree_rcu_monitor() to refrain from moving kfree_rcu() memory to the kfree_rcu_cpu_work structure until the RCU grace period has completed for all three categories.

v2: Use helper function instead of inserted code block at kfree_rcu_monitor().

Fixes: `34c8817455` ("rcu: Support kfree_bulk() interface in kfree_rcu()")
Fixes: `5f3c8d6204` ("rcu/tree: Maintain separate array for vmalloc ptrs")
Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Ziwei Dai <ziwei.dai@unisoc.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Tested-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

2023-04-06 10:04:23 -07:00
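
The difference between the per-category check and the all-categories check described in the commit message can be illustrated with a small user-space model. This is only a sketch, not the kernel code: struct work_model, can_attach_per_category(), can_attach_all_idle(), and hazard_per_category() are hypothetical names invented for illustration, and each category is reduced to a single pointer that is non-NULL while old memory is still waiting for its grace period.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Three categories, as in the commit message:
 * 0 = kfree() bulk, 1 = kvfree() bulk, 2 = OOM-time linked list. */
#define N_CATEGORIES 3

/* Hypothetical stand-in for kfree_rcu_cpu_work: one slot per category,
 * non-NULL while memory already handed to RCU has not yet been freed. */
struct work_model {
	const void *in_flight[N_CATEGORIES];
};

/* Behavior described as buggy: only the slot for the category being
 * moved is examined, so other categories' pending memory is ignored. */
static bool can_attach_per_category(const struct work_model *w, int cat)
{
	assert(cat >= 0 && cat < N_CATEGORIES);
	return w->in_flight[cat] == NULL;
}

/* Behavior described as the fix: new memory may be attached only when
 * every category's slot is empty, i.e. all old grace periods are done. */
static bool can_attach_all_idle(const struct work_model *w)
{
	for (int i = 0; i < N_CATEGORIES; i++) {
		if (w->in_flight[i] != NULL)
			return false;
	}
	return true;
}

/* True when the per-category check would let new memory ride along
 * with an older, already-requested grace period. */
static bool hazard_per_category(const struct work_model *w, int cat)
{
	return can_attach_per_category(w, cat) && !can_attach_all_idle(w);
}
```

With only category 0 occupied, for example in_flight = { &old_block, NULL, NULL }, can_attach_per_category(&w, 1) returns true while can_attach_all_idle(&w) returns false, which corresponds to the category-1-then-category-2 case in the commit message.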
Kconfig	printk changes for 6.2	2022-12-12 09:01:36 -08:00
Kconfig.debug	rcu: Allow up to five minutes expedited RCU CPU stall-warning timeouts	2023-01-09 12:09:52 -08:00
Makefile	rcuperf: Change rcuperf to rcuscale	2020-08-24 18:39:24 -07:00
rcu_segcblist.c	rcu: Throttle callback invocation based on number of ready callbacks	2023-01-03 17:28:34 -08:00
rcu_segcblist.h	rcu: Throttle callback invocation based on number of ready callbacks	2023-01-03 17:28:34 -08:00
rcu.h	Merge branch 'stall.2023.01.09a' into HEAD	2023-02-02 16:40:07 -08:00
rcuscale.c	rcu/rcuscale: Use call_rcu_hurry() for async reader test	2022-11-29 14:04:33 -08:00
rcutorture.c	rcutorture: Drop sparse lock-acquisition annotations	2023-01-05 12:10:35 -08:00
refscale.c	refscale: Add tests using SLAB_TYPESAFE_BY_RCU	2023-01-05 12:09:42 -08:00
srcutiny.c	srcu: Make Tiny synchronize_srcu() check for readers	2022-12-01 15:49:12 -08:00
srcutree.c	srcu: Update comment after the index flip	2023-01-03 17:49:23 -08:00
sync.c	rcu/sync: Use call_rcu_hurry() instead of call_rcu	2022-11-29 14:04:33 -08:00
tasks.h	rcu-tasks: Handle queue-shrink/callback-enqueue race condition	2023-01-03 17:52:17 -08:00
tiny.c	rcu: Refactor kvfree_call_rcu() and high-level helpers	2023-01-03 17:48:40 -08:00
tree_exp.h	rcu: Allow expedited RCU CPU stall warnings to dump task stacks	2023-01-03 17:47:44 -08:00
tree_nocb.h	rcu: Shrinker for lazy rcu	2022-11-29 14:02:52 -08:00
tree_plugin.h	rcu: Synchronize ->qsmaskinitnext in rcu_boost_kthread_setaffinity()	2022-10-18 14:59:57 -07:00
tree_stall.h	rcu: Allow up to five minutes expedited RCU CPU stall-warning timeouts	2023-01-09 12:09:52 -08:00
tree.c	rcu/kvfree: Avoid freeing new kfree_rcu() memory after old grace period	2023-04-06 10:04:23 -07:00
tree.h	rcu: Add RCU stall diagnosis information	2023-01-05 12:21:11 -08:00
update.c	Merge branch 'stall.2023.01.09a' into HEAD	2023-02-02 16:40:07 -08:00