Go to file
Fred Lotter 28abe57962 nfp: flower: cmsg rtnl locks can timeout reify messages
Flower control message replies are handled in different locations. The truly
high priority replies are handled in the BH (tasklet) context, while the
remaining replies are handled in a predefined Linux work queue. The work
queue handler orders replies into high and low priority groups, and always
start servicing the high priority replies within the received batch first.

Reply Type:			Rtnl Lock:	Handler:

CMSG_TYPE_PORT_MOD		no		BH tasklet (mtu)
CMSG_TYPE_TUN_NEIGH		no		BH tasklet
CMSG_TYPE_FLOW_STATS		no		BH tasklet
CMSG_TYPE_PORT_REIFY		no		WQ high
CMSG_TYPE_PORT_MOD		yes		WQ high (link/mtu)
CMSG_TYPE_MERGE_HINT		yes		WQ low
CMSG_TYPE_NO_NEIGH		no		WQ low
CMSG_TYPE_ACTIVE_TUNS		no		WQ low
CMSG_TYPE_QOS_STATS		no		WQ low
CMSG_TYPE_LAG_CONFIG		no		WQ low

A subset of control messages can block waiting for an rtnl lock (from both
work queue priority groups). The rtnl lock is heavily contended for by
external processes such as systemd-udevd, systemd-network and libvirtd,
especially during netdev creation, such as when flower VFs and representors
are instantiated.

Kernel netlink instrumentation shows that external processes (such as
systemd-udevd) often use successive rtnl_trylock() sequences, which can result
in an rtnl_lock() blocked control message to starve for longer periods of time
during rtnl lock contention, i.e. netdev creation.

In the current design a single blocked control message will block the entire
work queue (both priorities), and introduce a latency which is
nondeterministic and dependent on system wide rtnl lock usage.

In some extreme cases, one blocked control message at exactly the wrong time,
just before the maximum number of VFs are instantiated, can block the work
queue for long enough to prevent VF representor REIFY replies from getting
handled in time for the 40ms timeout.

The firmware will deliver the total maximum number of REIFY message replies in
around 300us.

Only REIFY and MTU update messages require replies within a timeout period (of
40ms). The MTU-only updates are already done directly in the BH (tasklet)
handler.

Move the REIFY handler down into the BH (tasklet) in order to resolve timeouts
caused by a blocked work queue waiting on rtnl locks.

Signed-off-by: Fred Lotter <frederik.lotter@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07 18:05:50 +02:00
arch Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-09-01 11:21:57 -07:00
block block: remove REQ_NOWAIT_INLINE 2019-08-15 11:09:16 -06:00
certs Revert "Merge tag 'keys-acl-20190703' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs" 2019-07-10 18:43:43 -07:00
crypto USB / PHY patches for 5.3-rc1 2019-07-11 15:40:06 -07:00
Documentation Char/Misc driver fixes for 5.3-rc7 2019-09-02 09:30:34 -07:00
drivers nfp: flower: cmsg rtnl locks can timeout reify messages 2019-09-07 18:05:50 +02:00
fs a few small SMB3 fixes, and a larger one to fix various older string handling functions 2019-08-29 17:51:23 -07:00
include isdn/capi: check message length in capi_write() 2019-09-07 17:44:25 +02:00
init Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
ipc Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
kernel bpf: fix precision tracking of stack slots 2019-09-05 14:06:58 +02:00
lib Partially revert "kfifo: fix kfifo_alloc() and kfifo_init()" 2019-08-30 18:47:15 -07:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
mm mm: memcontrol: fix percpu vmstats and vmevents flush 2019-08-30 18:00:50 -07:00
net net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list 2019-09-07 17:58:48 +02:00
samples auxdisplay: Fix a typo in cfag12864b-example.c 2019-08-08 20:00:18 +02:00
scripts SPDX fixes for 5.3-rc5 2019-08-18 09:26:16 -07:00
security keys: ensure that ->match_free() is called in request_key_and_link() 2019-08-30 11:10:55 -07:00
sound sound fixes for 5.3-rc7 2019-08-27 10:42:03 -07:00
tools Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec 2019-09-06 15:09:16 +02:00
usr kbuild: enable arch/s390/include/uapi/asm/zcrypt.h for uapi header test 2019-07-23 10:45:46 +02:00
virt arm64 fixes for -rc7 2019-08-28 10:37:21 -07:00
.clang-format Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-04-17 11:26:25 -07:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes
.gitignore .gitignore: Add compilation database file 2019-07-27 12:18:19 +09:00
.mailmap mailmap: add aliases for Dmitry Safonov 2019-08-30 18:00:50 -07:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS Remove references to dead website. 2019-07-19 12:22:04 -07:00
Kbuild Kbuild updates for v5.1 2019-03-10 17:48:21 -07:00
Kconfig docs: kbuild: convert docs to ReST and rename to *.rst 2019-06-14 14:21:21 -06:00
MAINTAINERS MAINTAINERS: add myself as maintainer for xilinx axiethernet driver 2019-09-06 15:17:16 +02:00
Makefile Linux 5.3-rc7 2019-09-02 09:57:40 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.