linux/net
Jakub Kicinski 6025b9135f net: dqs: add NIC stall detector based on BQL
softnet_data->time_squeeze is sometimes used as a proxy for
host overload or as an indication of scheduling problems. In practice
this statistic is very noisy and its units are hard to grasp:
e.g. are 10 squeezes a second to be expected, or is that high?

Delaying network (NAPI) processing leads to drops on NIC queues
but also to RTT bloat, impacting pacing and CA decisions.
Stalls are a little hard to detect on the Rx side, because
there may simply not have been any packets received in a given
period of time. Packet timestamps help a little bit, but
again we don't know whether packets are stale because we're
not keeping up or because someone (*cough* cgroups)
disabled IRQs for a long time.

We can, however, use Tx as a proxy for Rx stalls. Most drivers
use combined Rx+Tx NAPIs, so if Tx gets starved, so will Rx.
On the Tx side we know exactly when packets get queued
and completed, so there is no uncertainty.

This patch adds stall checks to BQL. Why BQL? Because
it's a convenient place to add such checks: it is already
called by most drivers, and it has copious free space
in its structures (this patch adds no extra cache
references or dirtying to the fast path).

The algorithm takes one parameter, the max delay AKA stall
threshold, and increments a counter whenever NAPI got delayed
for at least that amount of time. It also records the length
of the longest stall.

To be precise, every time NAPI has not polled for at least
stall_thrs, we check whether any Tx packets were queued
between the last NAPI run and now - stall_thrs/2.
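A minimal userspace sketch of the check described above, to make the
window arithmetic concrete. It is not the kernel's BQL code: the struct,
the function names stall_queued() and stall_reaped(), the 256-tick
history length and the trick of tagging each history slot with its
tick are all hypothetical choices for illustration. The sketch also
assumes ticks are monotonic and never wrap, which a real time counter
cannot assume.

```c
#include <assert.h>

/* Hypothetical history length in ticks; older queue times are forgotten. */
#define HIST_TICKS 256UL

struct stall_det {
	/* Slot t % HIST_TICKS holds t + 1 for the most recent tick t at
	 * which a packet was queued into that slot (0 = never used). */
	unsigned long queued_at[HIST_TICKS];
	unsigned long last_reap;  /* tick of the last completion (NAPI poll) */
	unsigned long stall_thrs; /* threshold in ticks, 0 disables detection */
	unsigned long stall_cnt;  /* number of stalls detected */
	unsigned long stall_max;  /* longest stall seen, in ticks */
};

/* Tx path (hypothetical hook): record that a packet was queued at @now. */
static void stall_queued(struct stall_det *d, unsigned long now)
{
	d->queued_at[now % HIST_TICKS] = now + 1;
}

/* Completion path (hypothetical hook, runs from NAPI poll): check for a
 * stall since the previous poll, then record this poll. */
static void stall_reaped(struct stall_det *d, unsigned long now)
{
	/* The sketch assumes monotonic, non-wrapping ticks. */
	assert(now >= d->last_reap);

	if (d->stall_thrs && now - d->last_reap >= d->stall_thrs) {
		unsigned long start = d->last_reap + 1;
		unsigned long end = now - d->stall_thrs / 2;
		unsigned long t;

		/* Only the last HIST_TICKS ticks of history are reliable. */
		if (now - start >= HIST_TICKS)
			start = now - HIST_TICKS + 1;

		/* Look for the oldest packet queued in [start, end]. Packets
		 * queued after `end` are too recent to count as stalled. */
		for (t = start; t <= end; t++) {
			if (d->queued_at[t % HIST_TICKS] == t + 1) {
				d->stall_cnt++;
				if (now - t > d->stall_max)
					d->stall_max = now - t;
				break;
			}
		}
	}
	d->last_reap = now;
}
```

Here the stall length is measured from the oldest packet found in the
window, i.e. how long that packet actually waited for a completion. That
is one reasonable reading of "length of the longest stall", not
necessarily the exact definition the patch uses.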

Unlike the classic Tx watchdog, this mechanism does not
ignore stalls caused by Tx being disabled or by loss of link.
I don't think the check is worth the complexity, and
a stall is a stall, whether due to host overload, flow
control or link down; it doesn't matter much to the application.

We have been running this detector in production at Meta
for 2 years, with a threshold of 8ms. It's the lowest
value at which false positives become rare. There's still
a constant stream of reported stalls (especially without
the ksoftirqd deferral patches reverted); those who like
their stall metrics to be 0 may prefer a higher value.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:23:26 +00:00
6lowpan net: fill in MODULE_DESCRIPTION()s for 6LoWPAN 2024-02-09 14:12:01 -08:00
9p
802
8021q rtnetlink: prepare nla_put_iflink() to run under RCU 2024-02-26 11:46:12 +00:00
appletalk
atm net: fill in MODULE_DESCRIPTION()s for mpoa 2024-02-09 14:12:01 -08:00
ax25
batman-adv This cleanup patchset includes the following patches: 2024-02-02 12:44:16 +00:00
bluetooth Bluetooth: Enforce validation on max value of connection interval 2024-02-28 09:44:11 -05:00
bpf net: move skbuff_cache(s) to net_hotdata 2024-03-07 21:12:42 -08:00
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-02-29 14:24:56 -08:00
caif
can linux-can-next-for-6.9-20240220 2024-02-20 15:32:45 +01:00
ceph libceph: just wait for more data to be available on the socket 2024-02-07 14:43:29 +01:00
core net: dqs: add NIC stall detector based on BQL 2024-03-08 10:23:26 +00:00
dcb
dccp net: dccp: Simplify the allocation of slab caches in dccp_ackvec_init 2024-02-02 12:19:26 +00:00
devlink devlink: fix port dump cmd type 2024-02-21 17:11:04 -08:00
dns_resolver
dsa net: dsa: Leverage core stats allocator 2024-03-07 20:37:13 -08:00
ethernet
ethtool ethtool: remove ethtool_eee_use_linkmodes 2024-03-06 20:40:20 -08:00
handshake net/handshake: Fix handshake_req_destroy_test1 2024-02-08 18:32:29 -08:00
hsr Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-02-29 14:24:56 -08:00
ieee802154 rtnetlink: prepare nla_put_iflink() to run under RCU 2024-02-26 11:46:12 +00:00
ife
ipv4 net: introduce include/net/rps.h 2024-03-07 21:12:43 -08:00
ipv6 net: introduce include/net/rps.h 2024-03-07 21:12:43 -08:00
iucv net/af_iucv: fix virtual vs physical address confusion 2024-02-22 18:28:13 -08:00
kcm net: kcm: Simplify the allocation of slab caches 2024-02-21 11:28:57 +00:00
key net: fill in MODULE_DESCRIPTION()s for af_key 2024-02-09 14:12:01 -08:00
l2tp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-02-22 15:29:26 -08:00
l3mdev
lapb
llc llc: call sock_orphan() at release time 2024-01-30 13:49:09 +01:00
mac80211 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-02-29 14:24:56 -08:00
mac802154
mctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-02-29 14:24:56 -08:00
mpls mpls: Do not orphan the skb 2024-03-07 20:42:13 -08:00
mptcp mptcp: drop lookup_by_id in lookup_addr 2024-03-06 20:24:10 -08:00
ncsi
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-03-07 10:29:36 -08:00
netlabel netlabel: remove impossible return value in netlbl_bitmap_walk 2024-02-28 19:37:34 -08:00
netlink genetlink: fit NLMSG_DONE into same read() as families 2024-03-06 08:07:45 +00:00
netrom netrom: Fix data-races around sysctl_net_busy_read 2024-03-07 10:36:58 +01:00
nfc nfc: core: make nfc_class constant 2024-03-05 11:21:18 -08:00
nsh
openvswitch net: openvswitch: limit the number of recursions from action sets 2024-02-09 12:54:38 -08:00
packet net: Re-use and set mono_delivery_time bit for userspace tstamp packets 2024-03-05 13:41:16 +01:00
phonet phonet/pep: fix racy skb_queue_empty() use 2024-02-22 09:05:50 +01:00
psample
qrtr
rds Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-03-07 10:29:36 -08:00
rfkill
rose
rxrpc rxrpc: Extract useful fields from a received ACK to skb priv data 2024-03-05 23:35:26 +00:00
sched net: move dev_tx_weight to net_hotdata 2024-03-07 21:12:42 -08:00
sctp net: introduce include/net/rps.h 2024-03-07 21:12:43 -08:00
smc net/smc: reduce rtnl pressure in smc_pnet_create_pnetids_list() 2024-03-05 15:49:35 +01:00
strparser
sunrpc NFSv4.1: Assign the right value for initval and retries for rpc timeout 2024-01-29 13:39:48 -05:00
switchdev net: bridge: switchdev: Skip MDB replays of deferred events on offload 2024-02-16 09:36:37 +00:00
tipc tipc: Cleanup tipc_nl_bearer_add() error paths 2024-02-15 13:18:19 +01:00
tls tls: fix use-after-free on failed backlog decryption 2024-02-29 09:07:16 -08:00
unix Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-02-22 15:29:26 -08:00
vmw_vsock sock_diag: add module pointer to "struct sock_diag_handler" 2024-01-23 15:13:54 +01:00
wireless Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-02-29 14:24:56 -08:00
x25 net: x25: remove dead links from Kconfig 2024-03-07 20:24:35 -08:00
xdp bpf-next-for-netdev 2024-03-02 20:50:59 -08:00
xfrm net: move netdev_max_backlog to net_hotdata 2024-03-07 21:12:42 -08:00
compat.c
devres.c
Kconfig net: bql: allow the config to be disabled 2024-02-18 10:19:21 +00:00
Kconfig.debug
Makefile af_unix: Remove CONFIG_UNIX_SCM. 2024-01-31 16:41:16 -08:00
socket.c net: remove SLAB_MEM_SPREAD flag usage 2024-02-28 19:29:46 -08:00
sysctl_net.c