IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Use the libbpf skeleton facility and other utilities provided by XDP
samples helper.
One important note:
The XDP samples helper handles ownership of installed XDP programs on
devices, including responding to SIGINT and SIGTERM, so drop the code
here and use the helpers we provide going forward for all xdp_redirect*
conversions.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-17-memxor@gmail.com
Use the libbpf skeleton facility and other utilities provided by XDP
samples helper.
A lot of the code in xdp_monitor and xdp_redirect_cpu has been moved to
the xdp_sample_user.o helper, so we remove the duplicate functions here
that are no longer needed.
Thanks to BPF skeleton, we no longer depend on order of tracepoints to
uninstall them on startup. Instead, the sample mask is used to install
the needed tracepoints.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-15-memxor@gmail.com
We already moved all the functionality it provided in XDP samples helper
userspace and kernel BPF object, so just delete the unneeded code.
We also add generation of BPF skeleton and compilation using clang
-target bpf for files ending with .bpf.c suffix (to denote that they use
vmlinux.h).
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-14-memxor@gmail.com
This adds support for retrieval and printing for devmap_xmit total and
mutli mode tracepoint. For multi mode, we keep a hash map entry for each
redirection stream, such that we can dynamically add and remove entries
on output.
The from_match and to_match will be set by individual samples when
setting up the XDP program on these devices.
The multi mode tracepoint is also handy for xdp_redirect_map_multi,
where up to 32 devices can be specified.
Also add samples_init_pre_load macro to finally set up the resized maps
and mmap them in place for low overhead stats retrieval.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-12-memxor@gmail.com
This adds support for the devmap_xmit tracepoint, and its multi device
variant that can be used to obtain streams for each individual
net_device to net_device redirection. This is useful for decomposing
total xmit stats in xdp_monitor.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-11-memxor@gmail.com
This consolidates retrieval and printing into the XDP sample helper. For
the kthread stats, it expands xdp_stats separately with its own per-CPU
stats. For cpumap enqueue, we display FROM->TO stats also with its
per-CPU stats.
The help out explains in detail the various aspects of the output.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-10-memxor@gmail.com
These are invoked in two places, when the XDP frame or SKB (for generic
XDP) enqueued to the ptr_ring (cpumap_enqueue) and when kthread processes
the frame after invoking the CPUMAP program for it (returning stats for
the batch).
We use cpumap_map_id to filter on the map_id as a way to avoid printing
incorrect stats for parallel sessions of xdp_redirect_cpu.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-9-memxor@gmail.com
This adds the shared BPF file that will be used going forward for
sharing tracepoint programs among XDP redirect samples.
Since vmlinux.h conflicts with tools/include for READ_ONCE/WRITE_ONCE
and ARRAY_SIZE, they are copied in to xdp_sample.bpf.h along with other
helpers that will be required.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-5-memxor@gmail.com
This file implements some common helpers to consolidate differences in
features and functionality between the various XDP samples and give them
a consistent look, feel, and reporting capabilities.
This commit only adds support for receive statistics, which does not
rely on any tracepoint, but on the XDP program installed on the device
by each XDP redirect sample.
Some of the key features are:
* A concise output format accompanied by helpful text explaining its
fields.
* An elaborate output format building upon the concise one, and folding
out details in case of errors and staying out of view otherwise.
* Printing driver names for devices redirecting packets.
* Getting mac address for interface.
* Printing summarized total statistics for the entire session.
* Ability to dynamically switch between concise and verbose mode, using
SIGQUIT (Ctrl + \).
In later patches, the support will be extended for each tracepoint with
its own custom output in concise and verbose mode.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-4-memxor@gmail.com
cookie_uid_helper_example.c: In function ‘main’:
cookie_uid_helper_example.c:178:69: warning: ‘ -j ACCEPT’ directive
writing 10 bytes into a region of size between 8 and 58
[-Wformat-overflow=]
178 | sprintf(rules, "iptables -A OUTPUT -m bpf --object-pinned %s -j ACCEPT",
| ^~~~~~~~~~
/home/kkd/src/linux/samples/bpf/cookie_uid_helper_example.c:178:9: note:
‘sprintf’ output between 53 and 103 bytes into a destination of size 100
178 | sprintf(rules, "iptables -A OUTPUT -m bpf --object-pinned %s -j ACCEPT",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
179 | file);
| ~~~~~
Fix by using snprintf and a sufficiently sized buffer.
tracex4_user.c:35:15: warning: ‘write’ reading 12 bytes from a region of
size 11 [-Wstringop-overread]
35 | key = write(1, "\e[1;1H\e[2J", 12); /* clear screen */
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
Use size as 11.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-2-memxor@gmail.com
There's a few cases that a string that is to be recorded in a trace event,
does not have a terminating 'nul' character, and instead, the tracepoint
passes in the length of the string to record.
Add two helper macros to the trace event code that lets this work easier,
than tricks with "%.*s" logic.
__string_len() which is similar to __string() for declaration, but takes a
length argument.
__assign_str_len() which is similar to __assign_str() for assiging the
string, but it too takes a length argument.
Note, the TRACE_EVENT() macro will allocate the location on the ring
buffer to 'len + 1', that will be used to store the string into. It is a
requirement that the 'len' used for this is a most the length of the
string being recorded.
This string can still use __get_str() just like strings created with
__string() can use to retrieve the string.
Link: https://lore.kernel.org/linux-nfs/20210513105018.7539996a@gandalf.local.home/
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Currently, "sample04" and "sample05" are not working properly when
running with an IPv6 option("-6"). The commit 0f06a6787e ("samples:
Add an IPv6 "-6" option to the pktgen scripts") has omitted the addition
of this option at "sample04" and "sample05".
In order to support IPv6 option, this commit adds logic related to IPv6
option.
Fixes: 0f06a6787e ("samples: Add an IPv6 "-6" option to the pktgen scripts")
Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All pktgen samples can use the environment variable instead of option
parameters(eg. $DEV is able to use instead of '-i' option).
This is results of running sample as root and user:
// running as root
# DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample01_simple.sh -v -n 1
Running... ctrl^C to stop
// running as normal user
$ DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample01_simple.sh -v -n 1
[...]
ERROR: Please specify output device
This results show the sample doesn't work properly when the sample runs
as normal user. Because the sample is restarted by the function
(root_check_run_with_sudo) to run with sudo. In this process, the
environment variable of normal user doesn't propagate to sudo.
It can be solved by using "-E"(--preserve-env) option of "sudo", which
preserve normal user's existing environment variables. So this commit
adds "-E" option in the function (root_check_run_with_sudo).
Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The xdpsock sample application is a useful base for experiment's around
AF_XDP sockets. Compiling the sample outside of the kernel tree is made
harder then it has to be as the sample includes two headers and that are
not installed by 'make install_header' nor are usually part of
distributions kernel headers.
The first header asm/barrier.h is not used and can just be dropped.
The second linux/compiler.h are only needed for the decorator __force
and are only used in ip_fast_csum(), csum_fold() and
csum_tcpudp_nofold(). These functions are copied verbatim from
include/asm-generic/checksum.h and lib/checksum.c. While it's fine to
copy and use these functions in the sample application the decorator
brings no value and can be dropped together with the include.
With this change it's trivial to compile the xdpsock sample outside the
kernel tree from xdpsock_user.c and xdpsock.h.
$ gcc -o xdpsock xdpsock_user.c -lbpf -lpthread
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Link: https://lore.kernel.org/bpf/20210806122855.26115-2-simon.horman@corigine.com
The original mei driver communication was strictly write command and
receive response flow, the completion of write was determined when
response was ready using select(). This paradigm is a long time not
true. There can be write without a response and an unsolicited read.
The driver is capable of handling those.
Adjust also the sample code and remove select() on read() from the
write flow. Add select to the read flow to showcase how to do the
read with a timeout.
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Link: https://lore.kernel.org/r/20210801072532.8668-1-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current behavior of 'tracex7' doesn't consist with other bpf samples
tracex{1..6}. Other samples do not require any argument to run with, but
tracex7 should be run with btrfs device argument. (it should be executed
with test_override_return.sh)
Currently, tracex7 doesn't have any description about how to run this
program and raises an unexpected error. And this result might be
confusing since users might not have a hunch about how to run this
program.
// Current behavior
# ./tracex7
sh: 1: Syntax error: word unexpected (expecting ")")
// Fixed behavior
# ./tracex7
ERROR: Run with the btrfs device argument!
In order to fix this error, this commit adds logic to report a message
and exit when running this program with a missing argument.
Additionally in test_override_return.sh, there is a problem with
multiple directory(tmpmnt) creation. So in this commit adds a line with
removing the directory with every execution.
Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210727041056.23455-1-claudiajkang@gmail.com
Commit e4f291b3f7 ("kdb: Simplify kdb commands registration")
allowed registration of pre-allocated kdb commands with pointer to
struct kdbtab_t. Lets switch other users as well to register pre-
allocated kdb commands via:
- Changing prototype for kdb_register() to pass a pointer to struct
kdbtab_t instead.
- Embed kdbtab_t structure in kdb_macro_t rather than individual params.
With these changes kdb_register_flags() becomes redundant and hence
removed. Also, since we have switched all users to register
pre-allocated commands, "is_dynamic" flag in struct kdbtab_t becomes
redundant and hence removed as well.
Suggested-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20210712134620.276667-3-sumit.garg@linaro.org
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2021-07-15
The following pull-request contains BPF updates for your *net-next* tree.
We've added 45 non-merge commits during the last 15 day(s) which contain
a total of 52 files changed, 3122 insertions(+), 384 deletions(-).
The main changes are:
1) Introduce bpf timers, from Alexei.
2) Add sockmap support for unix datagram socket, from Cong.
3) Fix potential memleak and UAF in the verifier, from He.
4) Add bpf_get_func_ip helper, from Jiri.
5) Improvements to generic XDP mode, from Kumar.
6) Support for passing xdp_md to XDP programs in bpf_prog_run, from Zvi.
===================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from Jakub Kicinski.
"Including fixes from bpf and netfilter.
Current release - regressions:
- sock: fix parameter order in sock_setsockopt()
Current release - new code bugs:
- netfilter: nft_last:
- fix incorrect arithmetic when restoring last used
- honor NFTA_LAST_SET on restoration
Previous releases - regressions:
- udp: properly flush normal packet at GRO time
- sfc: ensure correct number of XDP queues; don't allow enabling the
feature if there isn't sufficient resources to Tx from any CPU
- dsa: sja1105: fix address learning getting disabled on the CPU port
- mptcp: addresses a rmem accounting issue that could keep packets in
subflow receive buffers longer than necessary, delaying MPTCP-level
ACKs
- ip_tunnel: fix mtu calculation for ETHER tunnel devices
- do not reuse skbs allocated from skbuff_fclone_cache in the napi
skb cache, we'd try to return them to the wrong slab cache
- tcp: consistently disable header prediction for mptcp
Previous releases - always broken:
- bpf: fix subprog poke descriptor tracking use-after-free
- ipv6:
- allocate enough headroom in ip6_finish_output2() in case
iptables TEE is used
- tcp: drop silly ICMPv6 packet too big messages to avoid
expensive and pointless lookups (which may serve as a DDOS
vector)
- make sure fwmark is copied in SYNACK packets
- fix 'disable_policy' for forwarded packets (align with IPv4)
- netfilter: conntrack:
- do not renew entry stuck in tcp SYN_SENT state
- do not mark RST in the reply direction coming after SYN packet
for an out-of-sync entry
- mptcp: cleanly handle error conditions with MP_JOIN and syncookies
- mptcp: fix double free when rejecting a join due to port mismatch
- validate lwtstate->data before returning from skb_tunnel_info()
- tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path
- mt76: mt7921: continue to probe driver when fw already downloaded
- bonding: fix multiple issues with offloading IPsec to (thru?) bond
- stmmac: ptp: fix issues around Qbv support and setting time back
- bcmgenet: always clear wake-up based on energy detection
Misc:
- sctp: move 198 addresses from unusable to private scope
- ptp: support virtual clocks and timestamping
- openvswitch: optimize operation for key comparison"
* tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (158 commits)
net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave()
sfc: add logs explaining XDP_TX/REDIRECT is not available
sfc: ensure correct number of XDP queues
sfc: fix lack of XDP TX queues - error XDP TX failed (-22)
net: fddi: fix UAF in fza_probe
net: dsa: sja1105: fix address learning getting disabled on the CPU port
net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload
net: Use nlmsg_unicast() instead of netlink_unicast()
octeontx2-pf: Fix uninitialized boolean variable pps
ipv6: allocate enough headroom in ip6_finish_output2()
net: hdlc: rename 'mod_init' & 'mod_exit' functions to be module-specific
net: bridge: multicast: fix MRD advertisement router port marking race
net: bridge: multicast: fix PIM hello router port marking race
net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340
dsa: fix for_each_child.cocci warnings
virtio_net: check virtqueue_add_sgs() return value
mptcp: properly account bulk freed memory
selftests: mptcp: fix case multiple subflows limited by server
mptcp: avoid processing packet if a subflow reset
mptcp: fix syncookie process if mptcp can not_accept new subflow
...
Experience from production shows queue size of 192 is too small, as
this caused packet drops during cpumap-enqueue on RX-CPU. This can be
diagnosed with xdp_monitor sample program.
This bpftrace program was used to diagnose the problem in more detail:
bpftrace -e '
tracepoint:xdp:xdp_cpumap_kthread { @deq_bulk = lhist(args->processed,0,10,1); @drop_net = lhist(args->drops,0,10,1) }
tracepoint:xdp:xdp_cpumap_enqueue { @enq_bulk = lhist(args->processed,0,10,1); @enq_drops = lhist(args->drops,0,10,1); }'
Watch out for the @enq_drops counter. The @drop_net counter can happen
when netstack gets invalid packets, so don't despair it can be
natural, and that counter will likely disappear in newer kernels as it
was a source of confusion (look at netstat info for reason of the
netstack @drop_net counters).
The production system was configured with CPU power-saving C6 state.
Learn more in this blogpost[1].
And wakeup latency in usec for the states are:
# grep -H . /sys/devices/system/cpu/cpu0/cpuidle/*/latency
/sys/devices/system/cpu/cpu0/cpuidle/state0/latency:0
/sys/devices/system/cpu/cpu0/cpuidle/state1/latency:2
/sys/devices/system/cpu/cpu0/cpuidle/state2/latency:10
/sys/devices/system/cpu/cpu0/cpuidle/state3/latency:133
Deepest state take 133 usec to wakeup from (133/10^6). The link speed
is 25Gbit/s ((25*10^9/8) in bytes/sec). How many bytes can arrive with
in 133 usec at this speed: (25*10^9/8)*(133/10^6) = 415625 bytes. With
MTU size packets this is 275 packets, and with minimum Ethernet (incl
intergap overhead) 84 bytes it is 4948 packets. Clearly default queue
size is too small.
Setting default cpumap queue to 2048 as worst-case (small packet) at
10Gbit/s is 1979 packets with 133 usec wakeup time, +64 packet before
kthread wakeup call (due to xdp_do_flush) worst-case 2043 packets.
Thus, if a packet burst on RX-CPU will enqueue packets to a remote
cpumap CPU that is in deep-sleep state it can overrun the cpumap queue.
The production system was also configured to avoid deep-sleep via:
tuned-adm profile network-latency
[1] https://jeremyeder.com/2013/08/30/oh-did-you-expect-the-cpu/
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/162523477604.786243.13372630844944530891.stgit@firesoul
Execute the following command and exit, then execute it again, the following
error will be reported:
$ sudo ./samples/bpf/xdpsock -i ens4f2 -M
^C
$ sudo ./samples/bpf/xdpsock -i ens4f2 -M
libbpf: elf: skipping unrecognized data section(16) .eh_frame
libbpf: elf: skipping relo section(17) .rel.eh_frame for section(16) .eh_frame
libbpf: Kernel error message: XDP program already attached
ERROR: link set xdp fd failed
Commit c9d27c9e8d ("samples: bpf: Do not unload prog within xdpsock") removed
the unloading prog code because of the presence of bpf_link. This is fine if
XDP_SHARED_UMEM is disabled, but if it is enabled, unloading the prog is still
needed.
Fixes: c9d27c9e8d ("samples: bpf: Do not unload prog within xdpsock")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210628091815.2373487-1-wanghai38@huawei.com
The samples/bpf Makefile currently compiles BPF files in a way that will
produce an .eh_frame section, which will in turn confuse libbpf and produce
errors when loading BPF programs, like:
libbpf: elf: skipping unrecognized data section(32) .eh_frame
libbpf: elf: skipping relo section(33) .rel.eh_frame for section(32) .eh_frame
Fix this by instruction Clang not to produce this section, as it's useless
for BPF anyway.
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210705103841.180260-1-toke@redhat.com
Pull VFIO updates from Alex Williamson:
- Module reference fixes, structure renaming (Max Gurtovoy)
- Export and use common pci_dev_trylock() (Luis Chamberlain)
- Enable direct mdev device creation and probing by parent (Christoph
Hellwig & Jason Gunthorpe)
- Fix mdpy error path leak (Colin Ian King)
- Fix mtty list entry leak (Jason Gunthorpe)
- Enforce mtty device limit (Alex Williamson)
- Resolve concurrent vfio-pci mmap faults (Alex Williamson)
* tag 'vfio-v5.14-rc1' of git://github.com/awilliam/linux-vfio:
vfio/pci: Handle concurrent vma faults
vfio/mtty: Enforce available_instances
vfio/mtty: Delete mdev_devices_list
vfio: use the new pci_dev_trylock() helper to simplify try lock
PCI: Export pci_dev_trylock() and pci_dev_unlock()
vfio/mdpy: Fix memory leak of object mdev_state->vconfig
vfio/iommu_type1: rename vfio_group struck to vfio_iommu_group
vfio/mbochs: Convert to use vfio_register_group_dev()
vfio/mdpy: Convert to use vfio_register_group_dev()
vfio/mtty: Convert to use vfio_register_group_dev()
vfio/mdev: Allow the mdev_parent_ops to specify the device driver to bind
vfio/mdev: Remove CONFIG_VFIO_MDEV_DEVICE
driver core: Export device_driver_attach()
driver core: Don't return EPROBE_DEFER to userspace during sysfs bind
driver core: Flow the return code from ->probe() through to sysfs bind
driver core: Better distinguish probe errors in really_probe
driver core: Pull required checks into driver_probe_device()
vfio/platform: remove unneeded parent_module attribute
vfio: centralize module refcount in subsystem layer
Pull networking updates from Jakub Kicinski:
"Core:
- BPF:
- add syscall program type and libbpf support for generating
instructions and bindings for in-kernel BPF loaders (BPF loaders
for BPF), this is a stepping stone for signed BPF programs
- infrastructure to migrate TCP child sockets from one listener to
another in the same reuseport group/map to improve flexibility
of service hand-off/restart
- add broadcast support to XDP redirect
- allow bypass of the lockless qdisc to improving performance (for
pktgen: +23% with one thread, +44% with 2 threads)
- add a simpler version of "DO_ONCE()" which does not require jump
labels, intended for slow-path usage
- virtio/vsock: introduce SOCK_SEQPACKET support
- add getsocketopt to retrieve netns cookie
- ip: treat lowest address of a IPv4 subnet as ordinary unicast
address allowing reclaiming of precious IPv4 addresses
- ipv6: use prandom_u32() for ID generation
- ip: add support for more flexible field selection for hashing
across multi-path routes (w/ offload to mlxsw)
- icmp: add support for extended RFC 8335 PROBE (ping)
- seg6: add support for SRv6 End.DT46 behavior
- mptcp:
- DSS checksum support (RFC 8684) to detect middlebox meddling
- support Connection-time 'C' flag
- time stamping support
- sctp: packetization Layer Path MTU Discovery (RFC 8899)
- xfrm: speed up state addition with seq set
- WiFi:
- hidden AP discovery on 6 GHz and other HE 6 GHz improvements
- aggregation handling improvements for some drivers
- minstrel improvements for no-ack frames
- deferred rate control for TXQs to improve reaction times
- switch from round robin to virtual time-based airtime scheduler
- add trace points:
- tcp checksum errors
- openvswitch - action execution, upcalls
- socket errors via sk_error_report
Device APIs:
- devlink: add rate API for hierarchical control of max egress rate
of virtual devices (VFs, SFs etc.)
- don't require RCU read lock to be held around BPF hooks in NAPI
context
- page_pool: generic buffer recycling
New hardware/drivers:
- mobile:
- iosm: PCIe Driver for Intel M.2 Modem
- support for Qualcomm MSM8998 (ipa)
- WiFi: Qualcomm QCN9074 and WCN6855 PCI devices
- sparx5: Microchip SparX-5 family of Enterprise Ethernet switches
- Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)
- NXP SJA1110 Automotive Ethernet 10-port switch
- Qualcomm QCA8327 switch support (qca8k)
- Mikrotik 10/25G NIC (atl1c)
Driver changes:
- ACPI support for some MDIO, MAC and PHY devices from Marvell and
NXP (our first foray into MAC/PHY description via ACPI)
- HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx
- Mellanox/Nvidia NIC (mlx5)
- NIC VF offload of L2 bridging
- support IRQ distribution to Sub-functions
- Marvell (prestera):
- add flower and match all
- devlink trap
- link aggregation
- Netronome (nfp): connection tracking offload
- Intel 1GE (igc): add AF_XDP support
- Marvell DPU (octeontx2): ingress ratelimit offload
- Google vNIC (gve): new ring/descriptor format support
- Qualcomm mobile (rmnet & ipa): inline checksum offload support
- MediaTek WiFi (mt76)
- mt7915 MSI support
- mt7915 Tx status reporting
- mt7915 thermal sensors support
- mt7921 decapsulation offload
- mt7921 enable runtime pm and deep sleep
- Realtek WiFi (rtw88)
- beacon filter support
- Tx antenna path diversity support
- firmware crash information via devcoredump
- Qualcomm WiFi (wcn36xx)
- Wake-on-WLAN support with magic packets and GTK rekeying
- Micrel PHY (ksz886x/ksz8081): add cable test support"
* tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits)
tcp: change ICSK_CA_PRIV_SIZE definition
tcp_yeah: check struct yeah size at compile time
gve: DQO: Fix off by one in gve_rx_dqo()
stmmac: intel: set PCI_D3hot in suspend
stmmac: intel: Enable PHY WOL option in EHL
net: stmmac: option to enable PHY WOL with PMT enabled
net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del}
net: use netdev_info in ndo_dflt_fdb_{add,del}
ptp: Set lookup cookie when creating a PTP PPS source.
net: sock: add trace for socket errors
net: sock: introduce sk_error_report
net: dsa: replay the local bridge FDB entries pointing to the bridge dev too
net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev
net: dsa: include fdb entries pointing to bridge in the host fdb list
net: dsa: include bridge addresses which are local in the host fdb list
net: dsa: sync static FDB entries on foreign interfaces to hardware
net: dsa: install the host MDB and FDB entries in the master's RX filter
net: dsa: reference count the FDB addresses at the cross-chip notifier level
net: dsa: introduce a separate cross-chip notifier type for host FDBs
net: dsa: reference count the MDB entries at the cross-chip notifier level
...
Pull documentation updates from Jonathan Corbet:
"This was a reasonably active cycle for documentation; this includes:
- Some kernel-doc cleanups. That script is still regex onslaught from
hell, but it has gotten a little better.
- Improvements to the checkpatch docs, which are also used by the
tool itself.
- A major update to the pathname lookup documentation.
- Elimination of :doc: markup, since our automarkup magic can create
references from filenames without all the extra noise.
- The flurry of Chinese translation activity continues.
Plus, of course, the usual collection of updates, typo fixes, and
warning fixes"
* tag 'docs-5.14' of git://git.lwn.net/linux: (115 commits)
docs: path-lookup: use bare function() rather than literals
docs: path-lookup: update symlink description
docs: path-lookup: update get_link() ->follow_link description
docs: path-lookup: update WALK_GET, WALK_PUT desc
docs: path-lookup: no get_link()
docs: path-lookup: update i_op->put_link and cookie description
docs: path-lookup: i_op->follow_link replaced with i_op->get_link
docs: path-lookup: Add macro name to symlink limit description
docs: path-lookup: remove filename_mountpoint
docs: path-lookup: update do_last() part
docs: path-lookup: update path_mountpoint() part
docs: path-lookup: update path_to_nameidata() part
docs: path-lookup: update follow_managed() part
docs: Makefile: Use CONFIG_SHELL not SHELL
docs: Take a little noise out of the build process
docs: x86: avoid using ReST :doc:`foo` markup
docs: virt: kvm: s390-pv-boot.rst: avoid using ReST :doc:`foo` markup
docs: userspace-api: landlock.rst: avoid using ReST :doc:`foo` markup
docs: trace: ftrace.rst: avoid using ReST :doc:`foo` markup
docs: trace: coresight: coresight.rst: avoid using ReST :doc:`foo` markup
...
Daniel Borkmann says:
====================
pull-request: bpf-next 2021-06-28
The following pull-request contains BPF updates for your *net-next* tree.
We've added 37 non-merge commits during the last 12 day(s) which contain
a total of 56 files changed, 394 insertions(+), 380 deletions(-).
The main changes are:
1) XDP driver RCU cleanups, from Toke Høiland-Jørgensen and Paul E. McKenney.
2) Fix bpf_skb_change_proto() IPv4/v6 GSO handling, from Maciej Żenczykowski.
3) Fix false positive kmemleak report for BPF ringbuf alloc, from Rustam Kovhaev.
4) Fix x86 JIT's extable offset calculation for PROBE_LDX NULL, from Ravi Bangoria.
5) Enable libbpf fallback probing with tracing under RHEL7, from Jonathan Edwards.
6) Clean up x86 JIT to remove unused cnt tracking from EMIT macro, from Jiri Olsa.
7) Netlink cleanups for libbpf to please Coverity, from Kumar Kartikeya Dwivedi.
8) Allow to retrieve ancestor cgroup id in tracing programs, from Namhyung Kim.
9) Fix lirc BPF program query to use user-provided prog_cnt, from Sean Young.
10) Add initial libbpf doc including generated kdoc for its API, from Grant Seltzer.
11) Make xdp_rxq_info_unreg_mem_model() more robust, from Jakub Kicinski.
12) Fix up bpfilter startup log-level to info level, from Gary Lin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull perf events updates from Ingo Molnar:
- Platform PMU driver updates:
- x86 Intel uncore driver updates for Skylake (SNR) and Icelake (ICX) servers
- Fix RDPMC support
- Fix [extended-]PEBS-via-PT support
- Fix Sapphire Rapids event constraints
- Fix :ppp support on Sapphire Rapids
- Fix fixed counter sanity check on Alder Lake & X86_FEATURE_HYBRID_CPU
- Other heterogenous-PMU fixes
- Kprobes:
- Remove the unused and misguided kprobe::fault_handler callbacks.
- Warn about kprobes taking a page fault.
- Fix the 'nmissed' stat counter.
- Misc cleanups and fixes.
* tag 'perf-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix task context PMU for Hetero
perf/x86/intel: Fix instructions:ppp support in Sapphire Rapids
perf/x86/intel: Add more events requires FRONTEND MSR on Sapphire Rapids
perf/x86/intel: Fix fixed counter check warning for some Alder Lake
perf/x86/intel: Fix PEBS-via-PT reload base value for Extended PEBS
perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task
kprobes: Do not increment probe miss count in the fault handler
x86,kprobes: WARN if kprobes tries to handle a fault
kprobes: Remove kprobe::fault_handler
uprobes: Update uprobe_write_opcode() kernel-doc comment
perf/hw_breakpoint: Fix DocBook warnings in perf hw_breakpoint
perf/core: Fix DocBook warnings
perf/core: Make local function perf_pmu_snapshot_aux() static
perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on ICX
perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on SNR
perf/x86/intel/uncore: Generalize I/O stacks to PMON mapping procedure
perf/x86/intel/uncore: Drop unnecessary NULL checks after container_of()