linux/net
Jesper Dangaard Brouer 44768decb7 page_pool: handle page recycle for NUMA_NO_NODE condition
The check in pool_page_reusable (page_to_nid(page) == pool->p.nid) is
not valid if page_pool was configured with pool->p.nid = NUMA_NO_NODE.

The goal of the NUMA changes in commit d5394610b1 ("page_pool: Don't
recycle non-reusable pages"), were to have RX-pages that belongs to the
same NUMA node as the CPU processing RX-packet during softirq/NAPI. As
illustrated by the performance measurements.

This patch moves the NAPI checks out of fast-path, and at the same time
solves the NUMA_NO_NODE issue.

First realize that alloc_pages_node() with pool->p.nid = NUMA_NO_NODE
will lookup current CPU nid (Numa ID) via numa_mem_id(), which is used
as the the preferred nid.  It is only in rare situations, where
e.g. NUMA zone runs dry, that page gets doesn't get allocated from
preferred nid.  The page_pool API allows drivers to control the nid
themselves via controlling pool->p.nid.

This patch moves the NAPI check to when alloc cache is refilled, via
dequeuing/consuming pages from the ptr_ring. Thus, we can allow placing
pages from remote NUMA into the ptr_ring, as the dequeue/consume step
will check the NUMA node. All current drivers using page_pool will
alloc/refill RX-ring from same CPU running softirq/NAPI process.

Drivers that control the nid explicitly, also use page_pool_update_nid
when changing nid runtime.  To speed up transision to new nid the alloc
cache is now flushed on nid changes.  This force pages to come from
ptr_ring, which does the appropate nid check.

For the NUMA_NO_NODE case, when a NIC IRQ is moved to another NUMA
node, we accept that transitioning the alloc cache doesn't happen
immediately. The preferred nid change runtime via consulting
numa_mem_id() based on the CPU processing RX-packets.

Notice, to avoid stressing the page buddy allocator and avoid doing too
much work under softirq with preempt disabled, the NUMA check at
ptr_ring dequeue will break the refill cycle, when detecting a NUMA
mismatch. This will cause a slower transition, but its done on purpose.

Fixes: d5394610b1 ("page_pool: Don't recycle non-reusable pages")
Reported-by: Li RongQing <lirongqing@baidu.com>
Reported-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-02 15:37:52 -08:00
..
6lowpan
9p 9p pull request for inclusion in 5.4 2019-09-27 15:10:34 -07:00
802 treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
8021q net: vlan: Use the PHY time stamping interface. 2019-12-25 19:51:33 -08:00
appletalk appletalk: enforce CAP_NET_RAW for raw sockets 2019-09-24 16:37:18 +02:00
atm netdev: pass the stuck queue to the timeout handler 2019-12-12 21:38:57 -08:00
ax25 net: use helpers to change sk_ack_backlog 2019-11-06 16:14:48 -08:00
batman-adv treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
bluetooth netdev: pass the stuck queue to the timeout handler 2019-12-12 21:38:57 -08:00
bpf bpf: Allow to change skb mark in test_run 2019-12-18 17:05:58 -08:00
bpfilter
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-12-31 13:37:13 -08:00
caif Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-11-02 13:54:56 -07:00
can can: j1939: j1939_sk_bind(): take priv after lock is held 2019-12-08 11:52:02 +01:00
ceph libceph, rbd, ceph: convert to use the new mount API 2019-11-27 22:28:37 +01:00
core page_pool: handle page recycle for NUMA_NO_NODE condition 2020-01-02 15:37:52 -08:00
dcb
dccp treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
decnet net: add bool confirm_neigh parameter for dst_ops.update_pmtu 2019-12-24 22:28:54 -08:00
dns_resolver
dsa net: dsa: Deny PTP on master if switch supports it 2019-12-28 11:43:41 -08:00
ethernet net: add annotations on hh->hh_len lockless accesses 2019-11-07 20:07:30 -08:00
ethtool ethtool: provide link state with LINKSTATE_GET request 2019-12-27 16:40:02 -08:00
hsr hsr: fix slab-out-of-bounds Read in hsr_debugfs_rename() 2019-12-30 20:36:27 -08:00
ieee802154 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-11-02 13:54:56 -07:00
ife net: Fix Kconfig indentation 2019-09-26 08:56:17 +02:00
ipv4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-12-31 13:37:13 -08:00
ipv6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-12-31 13:37:13 -08:00
iucv treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
kcm kcm: disable preemption in kcm_parse_func_strparser() 2019-09-27 10:27:14 +02:00
key
l2tp net: ipv6: add net argument to ip6_dst_lookup_flow 2019-12-04 12:27:12 -08:00
l3mdev
lapb
llc llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c) 2019-12-20 21:19:36 -08:00
mac80211 mac80211: Turn AQL into an NL80211_EXT_FEATURE 2019-12-13 10:34:04 +01:00
mac802154
mpls net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup 2019-12-04 12:27:13 -08:00
ncsi net/ncsi: Fix gma flag setting after response 2019-12-30 20:34:17 -08:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-12-31 13:37:13 -08:00
netlabel netlabel: remove redundant assignment to pointer iter 2019-09-01 11:45:02 -07:00
netlink treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
netrom net: core: add generic lockdep keys 2019-10-24 14:53:48 -07:00
nfc net: nfc: nci: fix a possible sleep-in-atomic-context bug in nci_uart_tty_receive() 2019-12-18 11:57:33 -08:00
nsh
openvswitch openvswitch: New MPLS actions for layer 2 tunnelling 2019-12-24 22:24:45 -08:00
packet af_packet: refactoring code for prb_calc_retire_blk_tmo 2019-12-26 15:19:59 -08:00
phonet net: use skb_queue_empty_lockless() in poll() handlers 2019-10-28 13:33:41 -07:00
psample net: psample: fix skb_over_panic 2019-11-26 14:40:13 -08:00
qrtr net: qrtr: Simplify 'qrtr_tun_release()' 2019-10-30 17:58:23 -07:00
rds Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-11-16 21:51:42 -08:00
rfkill rfkill: Fix incorrect check to avoid NULL pointer dereference 2019-12-16 10:15:49 +01:00
rose net: use helpers to change sk_ack_backlog 2019-11-06 16:14:48 -08:00
rxrpc RxRPC fixes 2019-12-24 16:12:47 -08:00
sched Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-12-31 13:37:13 -08:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-12-31 13:37:13 -08:00
smc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-12-22 15:15:05 -08:00
strparser
sunrpc This is a relatively quiet cycle for nfsd, mainly various bugfixes. 2019-12-07 16:56:00 -08:00
switchdev
tipc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-12-22 15:15:05 -08:00
tls net/tls: add helper for testing if socket is RX offloaded 2019-12-19 17:46:51 -08:00
unix Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-12-22 15:15:05 -08:00
vmw_vsock Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-12-22 15:15:05 -08:00
wimax wimax: no need to check return value of debugfs_create functions 2019-08-10 15:25:47 -07:00
wireless Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-12-22 15:15:05 -08:00
x25 net/x25: add new state X25_STATE_5 2019-12-09 10:28:43 -08:00
xdp Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2019-12-27 14:20:10 -08:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2019-11-25 20:02:57 -08:00
compat.c y2038: socket: use __kernel_old_timespec instead of timespec 2019-11-15 14:38:29 +01:00
Kconfig ethtool: introduce ethtool netlink interface 2019-12-27 16:40:01 -08:00
Makefile ethtool: move to its own directory 2019-12-12 17:07:05 -08:00
socket.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-12-22 15:15:05 -08:00
sysctl_net.c