IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Ong Boon Leong says:
====================
stmmac: Add XDP support
This is the v4 patch series for adding XDP native support to stmmac.
Changes in v4:
5/6: Move TX clean timer setup to the end of NAPI RX process and
group it under stmmac_finalize_xdp_rx().
Also, fixed stmmac_xdp_xmit_back() returns STMMAC_XDP_CONSUMED
if XDP buffer conversion to XDP frame fails.
6/6: Move xdp_do_flush(0 into stmmac_finalize_xdp_rx() and combine
the XDP verdict of XDP TX and XDP REDIRECT together.
I retested the patch series on the 'xdp2' and 'xdp_redirect' related to
changes above and found the result to be satisfactory.
History of previous patch series:
v3: https://patchwork.kernel.org/project/netdevbpf/cover/20210331154135.8507-1-boon.leong.ong@intel.com/
v2: https://patchwork.kernel.org/project/netdevbpf/list/?series=457757
v1: https://patchwork.kernel.org/project/netdevbpf/list/?series=457139
It will be great if community can help to test or review the v4 series
and provide me any input if any.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the support of XDP_REDIRECT to another remote cpu for
further action. It also implements ndo_xdp_xmit ops, enabling the driver
to transmit packets forwarded to it by XDP program running on another
interface.
This patch has been tested using "xdp_redirect_cpu" for XDP_REDIRECT
+ drop testing. It also been tested with "xdp_redirect" sample app
which can be used to exercise ndo_xdp_xmit ops. The burst traffics are
generated using pktgen_sample03_burst_single_flow.sh in samples/pktgen
directory.
v4: Move xdp_do_flush() processing into stmmac_finalize_xdp_rx() and
combined the XDP verdict of XDP TX and REDIRECT together.
v3: Added 'nq->trans_start = jiffies' to avoid TX time-out as we are
sharing TX queue between slow path and XDP. Thanks to Jakub Kicinski
for point out.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for XDP_TX action which enables XDP program to
transmit back received frames.
This patch has been tested with the "xdp2" app located in samples/bpf
dir. The DUT receives burst traffic packet generated using pktgen script
'pktgen_sample03_burst_single_flow.sh'.
v4: Moved stmmac_tx_timer_arm() to be done once at the end of NAPI RX.
Fixed stmmac_xdp_xmit_back() to return STMMAC_XDP_CONSUMED if
XDP buffer to frame conversion fails. Thanks to Jakub's input.
v3: Added 'nq->trans_start = jiffies' to avoid TX time-out as we are
sharing TX queue between slow path and XDP. Thanks to Jakub Kicinski
for pointing out.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the initial XDP support to stmmac driver. It supports
XDP_PASS, XDP_DROP and XDP_ABORTED actions. Upcoming patches will add
support for XDP_TX and XDP_REDIRECT.
To support XDP headroom, this patch adds page_offset into RX buffer and
change the dma_sync_single_for_device|cpu(). The DMA address used for
RX operation are changed to take into page_offset too. As page_pool
can handle dma_sync_single_for_device() on behalf of driver with
PP_FLAG_DMA_SYNC_DEV flag, we skip doing that in stmmac driver.
Current stmmac driver supports split header support (SPH) in RX but
the flexibility of splitting header and payload at different position
makes it very complex to be supported for XDP processing. In addition,
jumbo frame is not supported in XDP to keep the initial codes simple.
This patch has been tested with the sample app "xdp1" located in
samples/bpf directory for both SKB and Native (XDP) mode. The burst
traffic generated using pktgen_sample03_burst_single_flow.sh in
samples/pktgen directory.
Changes in v3:
- factor in xdp header and tail adjustment done by XDP program.
Thanks to Jakub Kicinski for pointing out the gap in v2.
Changes in v2:
- fix for "warning: variable 'len' set but not used" reported by lkp.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch organizes TX tail pointer update into a new function called
stmmac_flush_tx_descriptors() so that we can reuse it in stmmac_xmit(),
stmmac_tso_xmit() and up-coming XDP implementation.
Changes to v2:
- Fix for warning: unused variable ‘desc_size’
https://patchwork.hopto.org/static/nipa/457321/12170149/build_32bit/stderr
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SPH functionality splits header and payload according to split mode and
offsef fields (SPLM and SPLOFST). It is beneficials for Linux network
stack RX processing however it adds a lot of complexity in XDP
processing.
So, this patch makes the split-header (SPH) capability of the controller
is stored in "priv->sph_cap" and the enabling/disabling of SPH is decided
by "priv->sph".
This is to prepare initial XDP enabling for stmmac to disable the use of
SPH whenever XDP is enabled.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Certain platform likes Intel mGBE has independent hardware IRQ resources
for TX and RX DMA operation. In preparation to support XDP TX, we add IRQ
affinity hint to group both RX and TX queue of the same queue ID to the
same CPU.
Changes in v2:
- IRQ affinity hint need to set to null before IRQ is released.
Thanks to issue reported by Song, Yoong Siang.
Reported-by: Song, Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Group all the often used fields in the first cache line,
to reduce cache line misses.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Order fields to increase locality for most used protocols.
udplite and icmp are moved at the end.
Same for proc_net_devsnmp6 which is not used in fast path.
This potentially saves one cache line miss for typical TCP/UDP over IPv4/IPv6.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the "type_a->nfcid_len" is too large then it would lead to memory
corruption in pn533_target_found_type_a() when we do:
memcpy(nfc_tgt->nfcid1, tgt_type_a->nfcid_data, nfc_tgt->nfcid1_len);
Fixes: c3b1e1e8a76f ("NFC: Export NFCID1 from pn533")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei says:
====================
dpaa2-eth: add rx copybreak support
DMA unmapping, allocating a new buffer and DMA mapping it back on the
refill path is really not that efficient. Proper buffer recycling (page
pool, flipping the page and using the other half) cannot be done for
DPAA2 since it's not a ring based controller but it rather deals with
multiple queues which all get their buffers from the same buffer pool on
Rx.
To circumvent these limitations, add support for Rx copybreak in
dpaa2-eth.
Below you can find a summary of the tests that were run to end up
with the default rx copybreak value of 512.
A bit about the setup - a LS2088A SoC, 8 x Cortex A72 @ 1.8GHz, IPfwd
zero loss test @ 20Gbit/s throughput. I tested multiple frame sizes to
get an idea where is the break even point.
Here are 2 sets of results, (1) is the baseline and (2) is just
allocating a new skb for all frames sizes received (as if the copybreak
was even to the MTU). All numbers are in Mpps.
64 128 256 512 640 768 896
(1) 3.23 3.23 3.24 3.21 3.1 2.76 2.71
(2) 3.95 3.88 3.79 3.62 3.3 3.02 2.65
It seems that even for 512 bytes frame sizes it's comfortably better when
allocating a new skb. After that, we see diminishing rewards or even worse.
Changes in v2:
- properly marked dpaa2_eth_copybreak as static
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
It's useful, especially for debugging purposes, to have the Rx copybreak
value changeable at runtime. Export it as an ethtool tunable.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
DMA unmapping, allocating a new buffer and DMA mapping it back on the
refill path is really not that efficient. Proper buffer recycling (page
pool, flipping the page and using the other half) cannot be done for
DPAA2 since it's not a ring based controller but it rather deals with
multiple queues which all get their buffers from the same buffer pool on
Rx.
To circumvent these limitations, add support for Rx copybreak. For small
sized packets instead of creating a skb around the buffer in which the
frame was received, allocate a new sk buffer altogether, copy the
contents of the frame and release the initial page back into the buffer
pool.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename the dpaa2_eth_xdp_release_buf function into dpaa2_eth_recycle_buf
since in the next patches we'll be using the same recycle mechanism for
the normal stack path beside for XDP_DROP.
Also, rename the array which holds the buffers to be recycled so that it
does not have any reference to XDP.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mat Martineau says:
====================
MPTCP: Miscellaneous changes
Here is a collection of patches from the MPTCP tree:
Patches 1 and 2 add some helpful MIB counters for connection
information.
Patch 3 cleans up some unnecessary checks.
Patch 4 is a new feature, support for the MP_TCPRST option. This option
is used when resetting one subflow within a MPTCP connection, and
provides a reason code that the recipient can use when deciding how to
adapt to the lost subflow.
Patches 5-7 update the existing MPTCP selftests to improve timeout
handling and to share better information when tests fail.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Very occasionally, MPTCP selftests fail. Yeah, I saw that at least once!
Here we provide more details in case of errors with mptcp_join.sh script
like it was done with mptcp_connect.sh, see
commit 767389c8dd55 ("selftests: mptcp: dump more info on errors")
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Not to be impacted by packets sent between sub-tests.
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'mptcp_connect' already has a timeout for poll() but in some cases, it
is not enough.
With "timeout" tool, we will force the command to fail if it doesn't
finish on time. Thanks to that, the script will continue and display
details about the current state before marking the test as failed.
Displaying this state is very important to be able to understand the
issue. Best to have our CI reporting the issue than just "the test
hanged".
Note that in mptcp_connect.sh, we were using a long timeout to validate
the fact we cannot create a socket if a sysctl is set. We don't need
this timeout.
In diag.sh, we want to send signals to mptcp_connect instances that have
been started in the netns. But we cannot send this signal to 'timeout'
otherwise that will stop the timeout and messages telling us SIGUSR1 has
been received will be printed. Instead of trying to find the right PID
and storing them in an array, we can simply use the output of
'ip netns pids' which is all the PIDs we want to send signal to.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/160
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MPTCP reset option allows to carry a mptcp-specific error code that
provides more information on the nature of a connection reset.
Reset option data received gets stored in the subflow context so it can
be sent to userspace via the 'subflow closed' netlink event.
When a subflow is closed, the desired error code that should be sent to
the peer is also placed in the subflow context structure.
If a reset is sent before subflow establishment could complete, e.g. on
HMAC failure during an MP_JOIN operation, the mptcp skb extension is
used to store the reset information.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we explicitly check for the first subflow being
NULL in a couple of places, even if we don't need any
special actions in such scenario.
Just drop the unneeded checks, to avoid confusion.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are not currently tracking the active MPTCP connection
attempts. Let's add the related counters.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the MPTCP protocol is unable to create a new token,
the socket fallback to plain TCP, let's keep track
of such events via a specific MIB.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson says:
====================
ionic: add PTP and hw clock support
This patchset adds support for accessing the DSC hardware clock and
for offloading PTP timestamping.
Tx packet timestamping happens through a separate Tx queue set up with
expanded completion descriptors that can report the timestamp.
Rx timestamping can happen either on all queues, or on a separate
timestamping queue when specific filtering is requested. Again, the
timestamps are reported with the expanded completion descriptors.
The timestamping offload ability is advertised but not enabled until an
OS service asks for it. At that time the driver's queues are reconfigured
to use the different completion descriptors and the private processing
queues as needed.
Reading the raw clock value comes through a new pair of values in the
device info registers in BAR0. These high and low values are interpreted
with help from new clock mask, mult, and shift values in the device
identity information.
First we add the ability to detect new queue features, then the handling
of the new descriptor sizes. After adding the new interface structures,
we start adding the support code, saving the advertising to the stack
for last.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Let the network stack know we've got support for timestamping
the packets.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the new hwstamp stats to our ethtool stats output.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the get_ts_info() callback for ethtool support of
timestamping information.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Tx and Rx timestamped packets are handled through separate
queues. Here we set them up, service them, and tear them down
along with the normal Tx and Rx queues.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
We do hardware timestamping through a separate Tx queue,
and optionally through a separate Rx queue. These queues
are allocated, freed, and tracked separately from the basic
queue arrays.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add handling of the new Rx packet classification filter type.
This simple bit of classification allows for steering packets
to a separate Rx queue for processing.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
These are changes to compile and link the new code, but no
new feature support is available or advertised yet.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds the file of code for supporting Tx and Rx hardware
timestamps and the raw clock interface, but does not yet link
it in for compiling or use.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Split the wait part out of adminq_post_wait() into a separate
function so that a caller can have finer grain control over
the sequencing of operations and locking.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The interface for hardware timestamping includes a new FW
request, device identity fields, Tx and Rx queue feature bits, a
new Rx filter type, the beginnings of Rx packet classifications,
and hardware timestamp registers.
If the IONIC_ETH_HW_TIMESTAMP bit is shown in the
ionic_lif_config features bit string, then we have support
for the hw clock registers. If the IONIC_RXQ_F_HWSTAMP and
IONIC_TXQ_F_HWSTAMP features are shown in the ionic_q_identity
features, then the queues can support HW timestamps on packets.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparating for hardware timestamping, we need to support
large Tx and Rx completion descriptors. Here we add the new
queue feature ids and handling for the completion descriptor
sizes.
We only are adding support for the Rx 2x sized completion
descriptors in the general Rx queues for now as we will be
using it for PTP Rx support, and we don't have an immediate
use for the large descriptors in the general Tx queues yet;
it will be used in a special Tx queues added in one of the
next few patches.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add queue feature extensions to prepare for features that
can be queue specific, in addition to the general queue
features already defined. While we're here, change the
existing feature ids from #defines to enum.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2021-04-01
The following pull-request contains BPF updates for your *net-next* tree.
We've added 68 non-merge commits during the last 7 day(s) which contain
a total of 70 files changed, 2944 insertions(+), 1139 deletions(-).
The main changes are:
1) UDP support for sockmap, from Cong.
2) Verifier merge conflict resolution fix, from Daniel.
3) xsk selftests enhancements, from Maciej.
4) Unstable helpers aka kernel func calling, from Martin.
5) Batches ops for LPM map, from Pedro.
6) Fix race in bpf_get_local_storage, from Yonghong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
All Gigabit PHYs use the same register layout as far as fetching
statistics goes. Fast Ethernet PHYs do not all support statistics, and
the BCM54616S would require some switching between the coper and fiber
modes to fetch the appropriate statistics which is not supported yet.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
If there is overlapp between ip_local_port_range and ip_local_reserved_ports with a huge reserved block, it will affect probability of selecting ephemeral ports, see file net/ipv4/inet_hashtables.c:723
int __inet_hash_connect(
...
for (i = 0; i < remaining; i += 2, port += 2) {
if (unlikely(port >= high))
port -= remaining;
if (inet_is_local_reserved_port(net, port))
continue;
E.g. if there is reserved block of 10000 ports, two ports right after this block will be 5000 more likely selected than others.
If this was intended, we can/should add note into documentation as proposed in this commit, otherwise we should think about different solution. One option could be mapping table of continuous port ranges. Second option could be letting user to modify step (port+=2) in above loop, e.g. using new sysctl parameter.
Signed-off-by: Otto Hollmann <otto.hollmann@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix some typos.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Lu Wei <luwei32@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct smc_clc_msg_local is declared twice. One is declared at
301st line. The blew one is not needed. Remove the duplicate.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct ctl_table_header is declared twice. One is declared
at 46th line. The blew one is not needed. Remove the duplicate.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit d2a029bde37b ("stmmac: pci: add MSI support for Intel Quark
X1000") introduced a pci_enable_msi() call in stmmac_pci.c.
With the commit 58da0cfa6cf1 ("net: stmmac: create dwmac-intel.c to
contain all Intel platform"), Intel Quark platform related codes
have been moved to the newly created driver.
Removing this unnecessary pci_enable_msi() call as there are no other
devices that uses stmmac-pci and need MSI to be enabled.
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update dwmac-intel to use managed function, i.e. pcim_enable_device().
This will allow devres framework to call resource free function for us.
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The logic in rt6_age_examine_exception is confusing. The commit is
to refactor the code.
Signed-off-by: Xu Jia <xujia39@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When enabling a bearer by name, we don't sanity check its name with
higher slot in bearer list. This may have the effect that the name
of an already enabled bearer bypasses the check.
To fix the above issue, we just perform an extra checking with all
existing bearers.
Fixes: cb30a63384bc9 ("tipc: refactor function tipc_enable_bearer()")
Cc: stable@vger.kernel.org
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2021-03-31
This series contains updates to ice driver only.
Benita adds support for XPS.
Ani moves netdev registration to the end of probe to prevent use before
the interface is ready and moves up an error check to possibly avoid
an unneeded call. He also consolidates the VSI state and flag fields to
a single field.
Dan changes the segment where package information is pulled.
Paul S ensures correct ITR values are set when increasing ring size.
Paul G rewords a link misconfiguration message as this could be
expected.
Bruce removes setting an unnecessary AQ flag and corrects a memory
allocation call. Also fixes checkpatch issues for 'COMPLEX_MACRO'.
Qi aligns PTYPE bitmap naming by adding 'ptype' prefix to the bitmaps
missing it.
Brett removes limiting Rx queue mapping to RSS size as there is not a
dependency on this. He also refactors RSS configuration by introducing
individual functions for LUT and key configuration and by passing a
structure containing pertinent information instead of individual
arguments.
Tony corrects a comment block to follow netdev style.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang says:
====================
From: Cong Wang <cong.wang@bytedance.com>
We have thousands of services connected to a daemon on every host
via AF_UNIX dgram sockets, after they are moved into VM, we have to
add a proxy to forward these communications from VM to host, because
rewriting thousands of them is not practical. This proxy uses an
AF_UNIX socket connected to services and a UDP socket to connect to
the host. It is inefficient because data is copied between kernel
space and user space twice, and we can not use splice() which only
supports TCP. Therefore, we want to use sockmap to do the splicing
without going to user-space at all (after the initial setup).
Currently sockmap only fully supports TCP, UDP is partially supported
as it is only allowed to add into sockmap. This patchset, as the second
part of the original large patchset, extends sockmap with:
1) cross-protocol support with BPF_SK_SKB_VERDICT; 2) full UDP support.
On the high level, ->read_sock() is required for each protocol to support
sockmap redirection, and in order to do sock proto update, a new ops
->psock_update_sk_prot() is introduced, which is also required. And the
BPF ->recvmsg() is also needed to replace the original ->recvmsg() to
retrieve skmsg. To make life easier, we have to get rid of lock_sock()
in sk_psock_handle_skb(), otherwise we would have to implement
->sendmsg_locked() on top of ->sendmsg(), which is ugly.
Please see each patch for more details.
To see the big picture, the original patchset is available here:
https://github.com/congwang/linux/tree/sockmap
this patchset is also available:
https://github.com/congwang/linux/tree/sockmap2
---
v8: get rid of 'offset' in udp_read_sock()
add checks for skb_verdict/stream_verdict conflict
add two cleanup patches for sock_map_link()
add a new test case
v7: use work_mutex to protect psock->work
return err in udp_read_sock()
add patch 6/13
clean up test case
v6: get rid of sk_psock_zap_ingress()
add rcu work patch
v5: use INDIRECT_CALL_2() for function pointers
use ingress_lock to fix a race condition found by Jacub
rename two helper functions
v4: get rid of lock_sock() in sk_psock_handle_skb()
get rid of udp_sendmsg_locked()
remove an empty line
update cover letter
v3: export tcp/udp_update_proto()
rename sk->sk_prot->psock_update_sk_prot()
improve changelogs
v2: separate from the original large patchset
rebase to the latest bpf-next
split UDP test case
move inet_csk_has_ulp() check to tcp_bpf.c
clean up udp_read_sock()
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>