linux

iv/linux

Jason A. Donenfeld 8b5553ace8 wireguard: queueing: get rid of per-peer ring buffers

Having two ring buffers per-peer means that every peer results in two
massive ring allocations. On an 8-core x86_64 machine, this commit
reduces the per-peer allocation from 18,688 bytes to 1,856 bytes, which
is a 90% reduction. Ninety percent! With some single-machine
deployments approaching 500,000 peers, we're talking about a reduction
from 7 gigs of memory down to 700 megs of memory.
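
The arithmetic behind these figures can be checked with a small sketch. The
helper names below are hypothetical and not part of the commit. The message
does not state the exact peer count behind the "7 gigs" figure, so any count
passed to total_bytes() is an assumption; only the two per-peer sizes come
from the text above.

```c
#include <stdint.h>

/* Total bytes consumed by `peers` peers at `per_peer_bytes` each. */
static inline uint64_t total_bytes(uint64_t peers, uint64_t per_peer_bytes)
{
	return peers * per_peer_bytes;
}

/* Percentage saved when going from `before` bytes to `after` bytes. */
static inline double reduction_percent(uint64_t before, uint64_t after)
{
	return 100.0 * (double)(before - after) / (double)before;
}
```

For example, reduction_percent(18688, 1856) comes out just above 90, and an
assumed 400,000 peers gives roughly 7.5 GB before and 742 MB after.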

In order to get rid of these per-peer allocations, this commit switches
to using a list-based queueing approach. Currently GSO fragments are
chained together using the skb->next pointer (the skb_list_* singly
linked list approach), so we form the per-peer queue around the unused
skb->prev pointer (which sort of makes sense because the links are
pointing backwards). Use of skb_queue_* is not possible here, because
that is based on doubly linked lists and spinlocks. Multiple cores can
write into the queue at any given time, because its writes occur in the
start_xmit path or in the udp_recv path. But reads happen in a single
workqueue item per-peer, amounting to a multi-producer, single-consumer
paradigm.

The MPSC queue is implemented locklessly and never blocks. However, it
is not linearizable (though it is serializable), with a very tight and
unlikely race on writes, which, when hit (some tiny fraction of the
0.15% of partial adds on a fully loaded 16-core x86_64 system), causes
the queue reader to terminate early. However, because every packet sent
queues up the same workqueue item after it is fully added, the worker
resumes again, and stopping early isn't actually a problem, since at
that point the packet wouldn't have yet been added to the encryption
queue. These properties allow us to avoid disabling interrupts or
spinning. The design is based on Dmitry Vyukov's algorithm [1].
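
For readers unfamiliar with the algorithm in [1], here is a minimal userspace
sketch of an intrusive MPSC queue in that style. It is not the WireGuard
code: it assumes C11 atomics instead of kernel primitives, uses a dedicated
next pointer instead of skb->prev, and all names (mpsc_node, mpsc_queue,
mpsc_push, mpsc_pop) are hypothetical. The comment in mpsc_push() marks the
window in which a concurrent reader sees a broken chain and returns early.

```c
#include <stdatomic.h>
#include <stddef.h>

struct mpsc_node {
	_Atomic(struct mpsc_node *) next;
};

struct mpsc_queue {
	_Atomic(struct mpsc_node *) head; /* producers exchange into this */
	struct mpsc_node *tail;           /* touched by the consumer only */
	struct mpsc_node stub;            /* placeholder so the list is never empty */
};

static inline void mpsc_init(struct mpsc_queue *q)
{
	atomic_store_explicit(&q->stub.next, NULL, memory_order_relaxed);
	atomic_store_explicit(&q->head, &q->stub, memory_order_relaxed);
	q->tail = &q->stub;
}

/* Safe to call from many producers at once; never blocks or spins. */
static inline void mpsc_push(struct mpsc_queue *q, struct mpsc_node *n)
{
	struct mpsc_node *prev;

	atomic_store_explicit(&n->next, NULL, memory_order_relaxed);
	prev = atomic_exchange_explicit(&q->head, n, memory_order_acq_rel);
	/* Until the store below lands, n is not reachable from the tail. */
	atomic_store_explicit(&prev->next, n, memory_order_release);
}

/* Single consumer only. NULL means empty or a producer is mid-push. */
static inline struct mpsc_node *mpsc_pop(struct mpsc_queue *q)
{
	struct mpsc_node *tail = q->tail;
	struct mpsc_node *next =
		atomic_load_explicit(&tail->next, memory_order_acquire);

	if (tail == &q->stub) {
		if (!next)
			return NULL;
		q->tail = next;
		tail = next;
		next = atomic_load_explicit(&tail->next, memory_order_acquire);
	}
	if (next) {
		q->tail = next;
		return tail;
	}
	if (tail != atomic_load_explicit(&q->head, memory_order_acquire))
		return NULL; /* partial add in flight: stop early */
	/* Re-insert the stub so the last real node gains a successor. */
	mpsc_push(q, &q->stub);
	next = atomic_load_explicit(&tail->next, memory_order_acquire);
	if (next) {
		q->tail = next;
		return tail;
	}
	return NULL;
}
```

In this sketch an early NULL from mpsc_pop() is harmless only if, as
described above, the producer schedules the consumer again once its push has
completed.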

Performance-wise, ordinarily list-based queues aren't preferable to
ringbuffers, because of cache misses when following pointers around.
However, we *already* have to follow the adjacent pointers when working
through fragments, so there shouldn't actually be any change there. A
potential downside is that dequeueing is a bit more complicated, but the
ptr_ring structure used prior had a spinlock when dequeueing, so all in
all the difference appears to be a wash.

Actually, from profiling, the biggest performance hit, by far, of this
commit winds up being atomic_add_unless(count, 1, max) and
atomic_dec(count), which account for the majority of CPU time, according
to perf. In that sense, the previous ring buffer was superior in that it
could check if it was full by head==tail, which the list-based approach
cannot do.
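
The bounded counter described here can be sketched in userspace as follows.
This assumes C11 atomics and mirrors the semantics of the kernel's
atomic_add_unless() and atomic_dec() rather than using them; the helper
names counter_inc_unless() and counter_dec() are hypothetical. It shows why
every enqueue and dequeue pays for an atomic read-modify-write on a shared
counter.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Increment *count unless it already equals max; true if incremented. */
static inline bool counter_inc_unless(atomic_int *count, int max)
{
	int cur = atomic_load_explicit(count, memory_order_relaxed);

	do {
		if (cur == max)
			return false; /* queue is full, reject the packet */
		/* On failure, cur is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak_explicit(count, &cur, cur + 1,
							memory_order_acq_rel,
							memory_order_relaxed));
	return true;
}

/* Called by the consumer after taking one item off the queue. */
static inline void counter_dec(atomic_int *count)
{
	atomic_fetch_sub_explicit(count, 1, memory_order_release);
}
```

A ring buffer can instead detect fullness by comparing its head and tail
indices, without this extra shared counter.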

But all in all, this enables us to get massive memory savings, allowing
WireGuard to scale for real-world deployments, without taking much of a
performance hit.

[1] http://www.1024cores.net/home/lock-free-algorithms/queues/intrusive-mpsc-node-based-queue

Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2021-02-23 15:59:34 -08:00

arch

The performance event updates for v5.12 are:

2021-02-21 12:49:32 -08:00

block

for-5.12/block-2021-02-17

2021-02-21 11:02:48 -08:00

certs

.gitignore: add SPDX License Identifier

2020-03-25 11:50:48 +01:00

crypto

X.509: Fix crash caused by NULL pointer

2021-01-20 11:33:51 -08:00

Documentation

The performance event updates for v5.12 are:

2021-02-21 12:49:32 -08:00

drivers

wireguard: queueing: get rid of per-peer ring buffers

2021-02-23 15:59:34 -08:00

These changes fix MM (soft-)dirty bit management in the procfs code & clean up the API.

2021-02-21 12:19:56 -08:00

include

net: icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending

2021-02-23 11:29:52 -08:00

init

Scheduler updates for v5.12:

2021-02-21 12:35:04 -08:00

ipc

Merge branch 'akpm' (patches from Andrew)

2020-12-15 12:53:37 -08:00

kernel

The performance event updates for v5.12 are:

2021-02-21 12:49:32 -08:00

lib

Scheduler updates for v5.12:

2021-02-21 12:35:04 -08:00

LICENSES

LICENSES: Add the CC-BY-4.0 license

2020-12-08 10:33:27 -07:00

These changes fix MM (soft-)dirty bit management in the procfs code & clean up the API.

2021-02-21 12:19:56 -08:00

net

net: qrtr: Fix memory leak in qrtr_tun_open

2021-02-23 15:38:22 -08:00

samples

samples: bpf: Remove unneeded semicolon

2021-02-02 21:37:59 -08:00

scripts

These are the v5.12 updates for the locking subsystem:

2021-02-21 12:12:01 -08:00

security

cap: fix conversions on getxattr

2021-01-28 10:22:48 +01:00

sound

Merge branches 'acpi-misc', 'acpi-cppc', 'acpi-docs', 'acpi-config' and 'acpi-apei'

2021-02-15 17:04:40 +01:00

tools

wireguard: selftests: test multiple parallel streams

2021-02-23 15:54:07 -08:00

usr

arch: ia64: Remove rest of perfmon support

2021-01-22 12:12:20 +05:30

virt

KVM/arm64 fixes for 5.11, take #2

2021-01-25 18:52:01 -05:00

.clang-format

clang-format: Update with the latest for_each macro list

2021-01-29 15:00:23 +01:00

.cocciconfig

…

.get_maintainer.ignore

Opt out of scripts/get_maintainer.pl

2019-05-16 10:53:40 -07:00

.gitattributes

.gitattributes: use 'dts' diff driver for dts files

2019-12-04 19:44:11 -08:00

.gitignore

.gitignore: docs: ignore sphinx_*/ directories

2020-09-10 10:44:31 -06:00

.mailmap

MAINTAINERS: update Andrey Ryabinin's email address

2021-02-09 17:26:44 -08:00

COPYING

COPYING: state that all contributions really are covered by this file

2020-02-10 13:32:20 -08:00

CREDITS

MAINTAINERS: dccp: move Gerrit Renker to CREDITS

2021-01-14 10:53:49 -08:00

Kbuild

kbuild: rename hostprogs-y/always to hostprogs/always-y

2020-02-04 01:53:07 +09:00

Kconfig

kbuild: ensure full rebuild when the compiler is updated

2020-05-12 13:28:33 +09:00

MAINTAINERS

These are the v5.12 updates for the locking subsystem:

2021-02-21 12:12:01 -08:00

Makefile

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

2021-02-20 17:45:32 -08:00

README

Drop all 00-INDEX files from Documentation/

2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.

Languages

C 97.6%

Assembly 1%

Shell 0.5%

Python 0.3%

Makefile 0.3%