Miles Chen
43729e6fea
mm/memcontrol.c: fix use after free in mem_cgroup_iter()
commit 54a83d6bcbf8f4700013766b974bf9190d40b689 upstream.
This patch is sent to report a use-after-free in mem_cgroup_iter() that
is still present after merging commit be2657752e9e ("mm: memcg: fix use
after free in mem_cgroup_iter()").
I work with the Android kernel trees (4.9 and 4.14), and commit
be2657752e9e ("mm: memcg: fix use after free in mem_cgroup_iter()") has
been merged into them. However, I can still observe the use-after-free
issues addressed by commit be2657752e9e (on low-end devices, a few times
this month).
backtrace:
css_tryget <- crash here
mem_cgroup_iter
shrink_node
shrink_zones
do_try_to_free_pages
try_to_free_pages
__perform_reclaim
__alloc_pages_direct_reclaim
__alloc_pages_slowpath
__alloc_pages_nodemask
To debug, I poisoned mem_cgroup before freeing it:
static void __mem_cgroup_free(struct mem_cgroup *memcg)
{
	for_each_node(node)
		free_mem_cgroup_per_node_info(memcg, node);
	free_percpu(memcg->stat);
+	/* poison memcg before freeing it */
+	memset(memcg, 0x78, sizeof(struct mem_cgroup));
	kfree(memcg);
}
The coredump shows that the memcg at position=0xdbbc2a00 has been freed:
(gdb) p/x ((struct mem_cgroup_per_node *)0xe5009e00)->iter[8]
$13 = {position = 0xdbbc2a00, generation = 0x2efd}
0xdbbc2a00: 0xdbbc2e00 0x00000000 0xdbbc2800 0x00000100
0xdbbc2a10: 0x00000200 0x78787878 0x00026218 0x00000000
0xdbbc2a20: 0xdcad6000 0x00000001 0x78787800 0x00000000
0xdbbc2a30: 0x78780000 0x00000000 0x0068fb84 0x78787878
0xdbbc2a40: 0x78787878 0x78787878 0x78787878 0xe3fa5cc0
0xdbbc2a50: 0x78787878 0x78787878 0x00000000 0x00000000
0xdbbc2a60: 0x00000000 0x00000000 0x00000000 0x00000000
0xdbbc2a70: 0x00000000 0x00000000 0x00000000 0x00000000
0xdbbc2a80: 0x00000000 0x00000000 0x00000000 0x00000000
0xdbbc2a90: 0x00000001 0x00000000 0x00000000 0x00100000
0xdbbc2aa0: 0x00000001 0xdbbc2ac8 0x00000000 0x00000000
0xdbbc2ab0: 0x00000000 0x00000000 0x00000000 0x00000000
0xdbbc2ac0: 0x00000000 0x00000000 0xe5b02618 0x00001000
0xdbbc2ad0: 0x00000000 0x78787878 0x78787878 0x78787878
0xdbbc2ae0: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2af0: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b00: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b10: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b20: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b30: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b40: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b50: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b60: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b70: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b80: 0x78787878 0x78787878 0x00000000 0x78787878
0xdbbc2b90: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2ba0: 0x78787878 0x78787878 0x78787878 0x78787878
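As a reading aid for the dump above, here is a minimal userspace sketch
of how the poison can be recognized: memset() with 0x78 leaves
0x78787878 in every untouched 32-bit word. The names poison_word() and
count_poisoned_words() are hypothetical helpers for illustration, not
kernel functions. Words that were partially overwritten after the free
(such as 0x78787800) are deliberately not counted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POISON_BYTE 0x78u	/* value passed to memset() in the debug patch */

/* Build the 32-bit pattern that memset(p, POISON_BYTE, n) leaves behind. */
uint32_t poison_word(void)
{
	return POISON_BYTE * 0x01010101u;
}

/* Count the 32-bit words in a dump that still hold the full poison pattern. */
size_t count_poisoned_words(const uint32_t *words, size_t n)
{
	size_t i, hits = 0;

	assert(words != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (words[i] == poison_word())
			hits++;
	return hits;
}
```

A high count inside an object that an iterator still points to is a
strong hint that the object was freed while the pointer was still
cached.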
In the reclaim path, try_to_free_pages() does not set up
sc.target_mem_cgroup, and sc is passed to do_try_to_free_pages(), ...,
shrink_node().
In mem_cgroup_iter(), root is set to root_mem_cgroup because
sc->target_mem_cgroup is NULL. It is possible to assign a memcg to
root_mem_cgroup.nodeinfo.iter in mem_cgroup_iter().
try_to_free_pages
	struct scan_control sc = {...}, target_mem_cgroup is 0x0;
do_try_to_free_pages
shrink_zones
shrink_node
	mem_cgroup *root = sc->target_mem_cgroup;
	memcg = mem_cgroup_iter(root, NULL, &reclaim);
mem_cgroup_iter()
	if (!root)
		root = root_mem_cgroup;
	...
	css = css_next_descendant_pre(css, &root->css);
	memcg = mem_cgroup_from_css(css);
	cmpxchg(&iter->position, pos, memcg);
My device uses memcg non-hierarchical mode. When we release a memcg,
invalidate_reclaim_iterators() reaches only dead_memcg and its parents.
If non-hierarchical mode is used, invalidate_reclaim_iterators() never
reaches root_mem_cgroup.
static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
{
	struct mem_cgroup *memcg = dead_memcg;
	for (; memcg; memcg = parent_mem_cgroup(memcg))
	...
}
So the use after free scenario looks like:
CPU1                                            CPU2

try_to_free_pages
do_try_to_free_pages
shrink_zones
shrink_node
mem_cgroup_iter()
    if (!root)
        root = root_mem_cgroup;
    ...
    css = css_next_descendant_pre(css, &root->css);
    memcg = mem_cgroup_from_css(css);
    cmpxchg(&iter->position, pos, memcg);

                                                invalidate_reclaim_iterators(memcg);
                                                ...
                                                __mem_cgroup_free()
                                                    kfree(memcg);

try_to_free_pages
do_try_to_free_pages
shrink_zones
shrink_node
mem_cgroup_iter()
    if (!root)
        root = root_mem_cgroup;
    ...
    mz = mem_cgroup_nodeinfo(root, reclaim->pgdat->node_id);
    iter = &mz->iter[reclaim->priority];
    pos = READ_ONCE(iter->position);
    css_tryget(&pos->css) <- use after free
To avoid this, we should also invalidate root_mem_cgroup.nodeinfo.iter
in invalidate_reclaim_iterators().
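The idea can be illustrated with a small userspace model. This is a
sketch under stated assumptions, not the kernel patch itself: struct
memcg_model, clear_iters(), invalidate_iters() and NR_PRIO are
hypothetical names, only one node is modelled, and a plain
compare-and-clear stands in for the cmpxchg() on iter->position. A NULL
parent models what the parent walk sees in non-hierarchical mode.

```c
#include <assert.h>
#include <stddef.h>

#define NR_PRIO 13	/* assumed number of per-priority iterator slots */

/* Simplified userspace stand-in for struct mem_cgroup (one node only). */
struct memcg_model {
	struct memcg_model *parent;		/* NULL: no hierarchical parent */
	struct memcg_model *iter_pos[NR_PRIO];	/* cached iterator positions */
};

/* Clear every cached position in @from that still points at @dead. */
void clear_iters(struct memcg_model *from, const struct memcg_model *dead)
{
	int i;

	for (i = 0; i < NR_PRIO; i++)
		if (from->iter_pos[i] == dead)
			from->iter_pos[i] = NULL;	/* stands in for cmpxchg() */
}

/* Walk @dead and its ancestors, then handle @root if the walk missed it. */
void invalidate_iters(struct memcg_model *dead, struct memcg_model *root)
{
	struct memcg_model *memcg = dead;
	struct memcg_model *last;

	assert(dead != NULL && root != NULL);

	do {
		clear_iters(memcg, dead);
		last = memcg;
	} while ((memcg = memcg->parent) != NULL);

	/* Non-hierarchical mode: the parent walk stopped short of the root. */
	if (last != root)
		clear_iters(root, dead);
}
```

With a child whose parent is NULL and a root whose iter_pos[8] points at
that child, the parent walk alone would leave the stale pointer in
place; the extra step for the root clears it before the child is freed.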
[cai@lca.pw: fix -Wparentheses compilation warning]
Link: http://lkml.kernel.org/r/1564580753-17531-1-git-send-email-cai@lca.pw
Link: http://lkml.kernel.org/r/20190730015729.4406-1-miles.chen@mediatek.com
Fixes: 5ac8fb31ad2e ("mm: memcontrol: convert reclaim iterator to simple css refcounting")
Signed-off-by: Miles Chen <miles.chen@mediatek.com>
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-25 10:51:40 +02:00