linux/net/core
Vasily Averin 425b9c7f51 memcg: accounting for objects allocated for new netdevice
Creating a new netdevice allocates at least ~50Kb of memory for various
kernel objects, but only ~5Kb of them are accounted to memcg. As a result,
creating an unlimited number of netdevice inside a memcg-limited container
does not fall within memcg restrictions, consumes a significant part
of the host's memory, can cause global OOM and lead to random kills of
host processes.

The main consumers of non-accounted memory are:
 ~10Kb   80+ kernfs nodes
 ~6Kb    ipv6_add_dev() allocations
  6Kb    __register_sysctl_table() allocations
  4Kb    neigh_sysctl_register() allocations
  4Kb    __devinet_sysctl_register() allocations
  4Kb    __addrconf_sysctl_register() allocations

Accounting of these objects allows to increase the share of memcg-related
memory up to 60-70% (~38Kb accounted vs ~54Kb total for dummy netdevice
on typical VM with default Fedora 35 kernel) and this should be enough
to somehow protect the host from misuse inside container.

Other related objects are quite small and may not be taken into account
to minimize the expected performance degradation.

It should be separately mentonied ~300 bytes of percpu allocation
of struct ipstats_mib in snmp6_alloc_dev(), on huge multi-cpu nodes
it can become the main consumer of memory.

This patch does not enables kernfs accounting as it affects
other parts of the kernel and should be discussed separately.
However, even without kernfs, this patch significantly improves the
current situation and allows to take into account more than half
of all netdevice allocations.

Signed-off-by: Vasily Averin <vvs@openvz.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/354a0a5f-9ec3-a25c-3215-304eab2157bc@openvz.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-04 19:16:46 -07:00
..
bpf_sk_storage.c bpf: Compute map_btf_id during build time 2022-04-26 11:35:21 -07:00
datagram.c net: inline skb_zerocopy_iter_dgram 2022-04-30 12:58:44 +01:00
dev_addr_lists_test.c net: kunit: add a test for dev_addr_lists 2021-11-20 12:25:57 +00:00
dev_addr_lists.c net: extract a few internals from netdevice.h 2022-04-07 20:32:09 -07:00
dev_ioctl.c net: extract a few internals from netdevice.h 2022-04-07 20:32:09 -07:00
dev.c netdev: reshuffle netif_napi_add() APIs to allow dropping weight 2022-05-03 17:26:10 -07:00
dev.h net: extract a few internals from netdevice.h 2022-04-07 20:32:09 -07:00
devlink.c devlink: introduce line card device info infrastructure 2022-04-25 10:42:28 +01:00
drop_monitor.c drop_monitor: remove quadratic behavior 2022-02-23 12:39:58 +00:00
dst_cache.c wireguard: device: reset peer src endpoint when netns exits 2021-11-29 19:50:45 -08:00
dst.c net: dst: add net device refcount tracking to dst_entry 2021-12-06 16:05:10 -08:00
failover.c net: failover: add net device refcount tracker 2021-12-06 16:06:02 -08:00
fib_notifier.c
fib_rules.c fib: expand fib_rule_policy 2021-12-16 07:18:35 -08:00
filter.c Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2022-04-27 17:09:32 -07:00
flow_dissector.c flow_dissector: Add number of vlan tags dissector 2022-04-20 11:09:13 +01:00
flow_offload.c flow_offload: add reoffload process to update hw_count 2021-12-19 14:08:48 +00:00
gen_estimator.c net: sched: Remove Qdisc::running sequence counter 2021-10-18 12:54:41 +01:00
gen_stats.c net: stats: Read the statistics in ___gnet_stats_copy_basic() instead of adding. 2021-10-21 12:47:56 +01:00
gro_cells.c net: add per-cpu storage and net->core_stats 2022-03-11 23:17:24 -08:00
gro.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-03-10 17:16:56 -08:00
hwbm.c
link_watch.c net: extract a few internals from netdevice.h 2022-04-07 20:32:09 -07:00
lwt_bpf.c bpf, lwt: Fix crash when using bpf_skb_set_tunnel_key() from bpf_xmit lwt hook 2022-04-22 17:45:25 +02:00
lwtunnel.c lwtunnel: Validate RTA_ENCAP_TYPE attribute length 2021-12-31 14:31:59 +00:00
Makefile net: kunit: add a test for dev_addr_lists 2021-11-20 12:25:57 +00:00
neighbour.c memcg: accounting for objects allocated for new netdevice 2022-05-04 19:16:46 -07:00
net_namespace.c net: initialize init_net earlier 2022-02-06 11:04:29 +00:00
net-procfs.c net: extract a few internals from netdevice.h 2022-04-07 20:32:09 -07:00
net-sysfs.c net: extract a few internals from netdevice.h 2022-04-07 20:32:09 -07:00
net-sysfs.h
net-traces.c tcp: add tracepoint for checksum errors 2021-05-14 15:26:03 -07:00
netclassid_cgroup.c bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode 2021-09-13 16:35:58 -07:00
netevent.c net: core: Correct function name netevent_unregister_notifier() in the kerneldoc 2021-03-28 17:56:56 -07:00
netpoll.c netpoll: add net device refcount tracker to struct netpoll 2021-12-06 16:06:02 -08:00
netprio_cgroup.c bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode 2021-09-13 16:35:58 -07:00
of_net.c Revert "of: net: support NVMEM cells with MAC in text format" 2022-01-12 14:14:36 +00:00
page_pool.c net: page_pool: introduce ethtool stats 2022-04-15 10:43:47 +01:00
pktgen.c proc: remove PDE_DATA() completely 2022-01-22 08:33:37 +02:00
ptp_classifier.c ptp: Add generic PTP is_sync() function 2022-03-07 11:31:34 +00:00
request_sock.c
rtnetlink.c rtnl: move rtnl_newlink_create() 2022-05-02 15:14:20 +02:00
scm.c memcg: enable accounting for scm_fp_list objects 2021-07-20 06:00:38 -07:00
secure_seq.c net: align static siphash keys 2021-11-16 19:07:54 -08:00
selftests.c net: core: constify mac addrs in selftests 2021-10-24 13:59:44 +01:00
skbuff.c tcp: optimise skb_zerocopy_iter_stream() 2022-05-02 14:36:24 -07:00
skmsg.c bpf, sockmap: Fix memleak in tcp_bpf_sendmsg while sk msg is full 2022-03-15 16:43:31 +01:00
sock_destructor.h skb_expand_head() adjust skb->truesize incorrectly 2021-10-22 12:35:51 -07:00
sock_diag.c net: Don't include filter.h from net/sock.h 2021-12-29 08:48:14 -08:00
sock_map.c bpf: Compute map_btf_id during build time 2022-04-26 11:35:21 -07:00
sock_reuseport.c tcp: Add stats for socket migration. 2021-06-23 12:56:08 -07:00
sock.c sock: optimise sock_def_write_space barriers 2022-05-01 12:19:01 +01:00
stream.c net: stream: don't purge sk_error_queue in sk_stream_kill_queues() 2021-10-16 09:06:09 +01:00
sysctl_net_core.c net: sysctl: introduce sysctl SYSCTL_THREE 2022-05-03 10:15:06 +02:00
timestamping.c
tso.c
utils.c net: core: Use csum_replace_by_diff() and csum_sub() instead of opencoding 2022-02-21 11:40:44 +00:00
xdp.c Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2022-03-22 11:18:49 -07:00