IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Fix the following out-of-bounds warning:
net/core/flow_dissector.c:835:3: warning: 'memcpy' offset [33, 48] from the object at 'flow_keys' is out of the bounds of referenced subobject 'ipv6_src' with type '__u32[4]' {aka 'unsigned int[4]'} at offset 16 [-Warray-bounds]
The problem is that the original code is trying to copy data into a
couple of struct members adjacent to each other in a single call to
memcpy(). So, the compiler legitimately complains about it. As these
are just a couple of members, fix this by copying each one of them in
separate calls to memcpy().
This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().
Link: https://github.com/KSPP/linux/issues/109
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
ethtool: add uAPI for reading standard stats
Continuing the effort of providing a unified access method
to standard stats, and explicitly tying the definitions to
the standards this series adds an API for general stats
which do no fit into more targeted control APIs.
There is nothing clever here, just a netlink API for dumping
statistics defined by standards and RFCs which today end up
in ethtool -S under infinite variations of names.
This series adds basic IEEE stats (for PHY, MAC, Ctrl frames)
and RMON stats. AFAICT other RFCs only duplicate the IEEE
stats.
This series does _not_ add a netlink API to read driver-defined
stats. There seems to be little to gain from moving that part
to netlink.
The netlink message format is very simple, and aims to allow
adding stats and groups with no changes to user tooling (which
IIUC is expected for ethtool).
On user space side we can re-use -S, and make it dump
standard stats if --groups are defined.
$ ethtool -S eth0 --groups eth-phy eth-mac eth-ctrl rmon
Stats for eth0:
eth-phy-SymbolErrorDuringCarrier: 0
eth-mac-FramesTransmittedOK: 0
eth-mac-FrameTooLongErrors: 0
eth-ctrl-MACControlFramesTransmitted: 0
eth-ctrl-MACControlFramesReceived: 1
eth-ctrl-UnsupportedOpcodesReceived: 0
rmon-etherStatsUndersizePkts: 0
rmon-etherStatsJabbers: 0
rmon-rx-etherStatsPkts64Octets: 1
rmon-rx-etherStatsPkts128to255Octets: 0
rmon-rx-etherStatsPkts1024toMaxOctets: 1
rmon-tx-etherStatsPkts64Octets: 1
rmon-tx-etherStatsPkts128to255Octets: 0
rmon-tx-etherStatsPkts1024toMaxOctets: 1
v1:
Driver support for mlxsw, mlx5 and bnxt included.
Compared to the RFC I went ahead with wrapping the stats into
a 1:1 nest. Now IDs of stats can start from 0, at a cost of
slightly "careful" u64 alignment handling.
v2:
Add missing kdoc in patch 5.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Most of the names seem to strongly correlate with names from
the standard and RFC. Whether ..+good_frames are indeed Frames..OK
I'm the least sure of.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw has nicely grouped stats, add support for standard uAPI.
I'm guessing the register access part. Compile tested only.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most devices maintain RMON (RFC 2819) stats - particularly
the "histogram" of packets received by size. Unlike other
RFCs which duplicate IEEE stats, the short/oversized frame
counters in RMON don't seem to match IEEE stats 1-to-1 either,
so expose those, too. Do not expose basic packet, CRC errors
etc - those are already otherwise covered.
Because standard defines packet ranges only up to 1518, and
everything above that should theoretically be "oversized"
- devices often create their own ranges.
Going beyond what the RFC defines - expose the "histogram"
in the Tx direction (assume for now that the ranges will
be the same).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Number of devices maintains the standard-based MAC control
counters for control frames. Add a API for those.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most of the MAC statistics are included in
struct rtnl_link_stats64, but some fields
are aggregated. Besides it's good to expose
these clearly hardware stats separately.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an interface for reading standard stats, including
stats which don't have a corresponding control interface.
Start with IEEE 802.3 PHY stats. There seems to be only
one stat to expose there.
Define API to not require user space changes when new
stats or groups are added. Groups are based on bitset,
stats have a string set associated.
v1: wrap stats in a nest
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the lack of expectations for switching NICs explicit,
describe the new stats.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following out-of-bounds warning:
net/sctp/sm_make_chunk.c:3150:4: warning: 'memcpy' offset [17, 28] from the object at 'addr' is out of the bounds of referenced subobject 'v4' with type 'struct sockaddr_in' at offset 0 [-Warray-bounds]
This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().
Link: https://github.com/KSPP/linux/issues/109
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patchset introduces updates to mlx5e netdev driver.
1) Tariq refactors TLS offloads and adds resiliency against RX resync
failures
2) Maxim reduces code duplications by unifying channels reset flow
regardless if channels are closed or open
3) Aya Enhances TX/RX health reporters diagnostics to expose the
internal clock time-stamping format
4) Moshe adds support for ethtool extended link state, to show the reason
for link down
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmB53AUACgkQSD+KveBX
+j6rzAf+JwJG9G7GSj3a/xird4dlgt4xPbRLB19pTw19ZyHZyujDxdN4QM3r5hTk
5ua1PnhYYaUcyPFvdgR9J0cIJ3QRaxZ+q/XnkE9Yo0eZ1DJ0SL/n6rxEQpcxpee1
XP7qjJu3leVwh5mVW2uOx/ClrL9vYb/fG3Q00j59rUB+i9bZszXZgZ99hJvYBFTB
k7W/9X6BNxuLlEg/Ui9L499aDWHRcIY5J2ku+1v/8paJZltk+IFv5glYszylE++M
l68drIy3dIjl/Sxj6WR2rHTBus6AIFxWFH8C2L7uqGl97BPjS80snMPIefLJhW+y
bQvzMDtfKDmIpvEIdzHPuEhEdqqteg==
=YCy6
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2021-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2021-04-16
This patchset introduces updates to mlx5e netdev driver.
1) Tariq refactors TLS offloads and adds resiliency against RX resync
failures
2) Maxim reduces code duplications by unifying channels reset flow
regardless if channels are closed or open
3) Aya Enhances TX/RX health reporters diagnostics to expose the
internal clock time-stamping format
4) Moshe adds support for ethtool extended link state, to show the reason
for link down
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil says:
====================
net: gianfar: Drop GFAR_MQ_POLLING support
Drop long time obsolete "per NAPI multi-queue" support in gianfar,
and related (and undocumented) device tree properties.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
These are very old properties that were used by the "gianfar" ethernet
driver. They don't have documented bindings and are obsolete.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gianfar used to enable all 8 Rx queues (DMA rings) per
ethernet device, even though the controller can only
support 2 interrupt lines at most. This meant that
multiple Rx queues would have to be grouped per NAPI poll
routine, and the CPU would have to split the budget and
service them in a round robin manner. The overhead of
this scheme proved to outweight the potential benefits.
The alternative was to introduce the "Single Queue" polling
mode, supporting one Rx queue per NAPI, which became the
default packet processing option and helped improve the
performance of the driver.
MQ_POLLING also relies on undocumeted device tree properties
to specify how to map the 8 Rx and Tx queues to a given
interrupt line (aka "interrupt group"). Using module parameters
to enable this mode wasn't an option either. Long story short,
MQ_POLLING became obsolete, now it is just dead code, and no
one asked for it so far.
For the Tx queues, multi-queue support (more than 1 Tx queue
per CPU) could be revisited by adding tc MQPRIO support, but
again, one has to consider that there are only 2 interrupt lines.
So the NAPI poll routine would have to service multiple Tx rings.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent patch that tied enabling of veth NAPI to the GRO flag also has
the nice side effect that a veth device can be the target of an
XDP_REDIRECT without an XDP program needing to be loaded on the peer
device. However, the patch adding this extra NAPI mode didn't actually
change the check in veth_xdp_xmit() to also look at the new NAPI pointer,
so let's fix that.
Fixes: 6788fa154546 ("veth: allow enabling NAPI even without XDP")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__ipv6_dev_mc_dec() internally uses sleepable functions so that caller
must not acquire atomic locks. But caller, which is addrconf_verify_rtnl()
acquires rcu_read_lock_bh().
So this warning occurs in the __ipv6_dev_mc_dec().
Test commands:
ip netns add A
ip link add veth0 type veth peer name veth1
ip link set veth1 netns A
ip link set veth0 up
ip netns exec A ip link set veth1 up
ip a a 2001:db8::1/64 dev veth0 valid_lft 2 preferred_lft 1
Splat looks like:
============================
WARNING: suspicious RCU usage
5.12.0-rc6+ #515 Not tainted
-----------------------------
kernel/sched/core.c:8294 Illegal context switch in RCU-bh read-side
critical section!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
4 locks held by kworker/4:0/1997:
#0: ffff88810bd72d48 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at:
process_one_work+0x761/0x1440
#1: ffff888105c8fe00 ((addr_chk_work).work){+.+.}-{0:0}, at:
process_one_work+0x795/0x1440
#2: ffffffffb9279fb0 (rtnl_mutex){+.+.}-{3:3}, at:
addrconf_verify_work+0xa/0x20
#3: ffffffffb8e30860 (rcu_read_lock_bh){....}-{1:2}, at:
addrconf_verify_rtnl+0x23/0xc60
stack backtrace:
CPU: 4 PID: 1997 Comm: kworker/4:0 Not tainted 5.12.0-rc6+ #515
Workqueue: ipv6_addrconf addrconf_verify_work
Call Trace:
dump_stack+0xa4/0xe5
___might_sleep+0x27d/0x2b0
__mutex_lock+0xc8/0x13f0
? lock_downgrade+0x690/0x690
? __ipv6_dev_mc_dec+0x49/0x2a0
? mark_held_locks+0xb7/0x120
? mutex_lock_io_nested+0x1270/0x1270
? lockdep_hardirqs_on_prepare+0x12c/0x3e0
? _raw_spin_unlock_irqrestore+0x47/0x50
? trace_hardirqs_on+0x41/0x120
? __wake_up_common_lock+0xc9/0x100
? __wake_up_common+0x620/0x620
? memset+0x1f/0x40
? netlink_broadcast_filtered+0x2c4/0xa70
? __ipv6_dev_mc_dec+0x49/0x2a0
__ipv6_dev_mc_dec+0x49/0x2a0
? netlink_broadcast_filtered+0x2f6/0xa70
addrconf_leave_solict.part.64+0xad/0xf0
? addrconf_join_solict.part.63+0xf0/0xf0
? nlmsg_notify+0x63/0x1b0
__ipv6_ifa_notify+0x22c/0x9c0
? inet6_fill_ifaddr+0xbe0/0xbe0
? lockdep_hardirqs_on_prepare+0x12c/0x3e0
? __local_bh_enable_ip+0xa5/0xf0
? ipv6_del_addr+0x347/0x870
ipv6_del_addr+0x3b1/0x870
? addrconf_ifdown+0xfe0/0xfe0
? rcu_read_lock_any_held.part.27+0x20/0x20
addrconf_verify_rtnl+0x8a9/0xc60
addrconf_verify_work+0xf/0x20
process_one_work+0x84c/0x1440
In order to avoid this problem, it uses rcu_read_unlock_bh() for
a short time. RCU is used for avoiding freeing
ifp(struct *inet6_ifaddr) while ifp is being used. But this will
not be released even if rcu_read_unlock_bh() is used.
Because before rcu_read_unlock_bh(), it uses in6_ifa_hold(ifp).
So this is safe.
Fixes: 63ed8de4be81 ("mld: add mc_lock for protecting per-interface mld data")
Suggested-by: Eric Dumazet <edumazet@google.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder says:
====================
net: ipa: allow different firmware names
Add the ability to define a "firmware-name" property in the IPA DT
node, specifying an alternate name to use for the firmware file.
Used only if the AP (Trust Zone) does early IPA initialization.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
IPA initialization includes loading some firmware. This step is
done either by the modem or by the AP under Trust Zone. If the
AP loads firmware, the name of the firmware file is currently
hard-coded ("ipa_fws.mdt").
Add the ability to specify the relative path of the firmware file to
use in a property in the Device Tree IPA node. If the property is
not found (or if any other error occurs attempting to get it), fall
back to using a default relative path.
Use the "old" fixed name as the default. Rename the symbol that
represents this default to emphasize its purpose.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new optional firmware-name property to the IPA DT node. It
is used only if the modem is not doing early initialization (i.e.,
if the modem-init property is not present). Its value is the name
of the firmware file to use; if it's not specified, a default name
("ipa_fws.mdt") is used.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In page_to_skb(), if we have enough tailroom to save skb_shared_info, we
can use build_skb to create skb directly. No need to alloc for
additional space. And it can save a 'frags slot', which is very friendly
to GRO.
Here, if the payload of the received package is too small (less than
GOOD_COPY_LEN), we still choose to copy it directly to the space got by
napi_alloc_skb. So we can reuse these pages.
Testing Machine:
The four queues of the network card are bound to the cpu1.
Test command:
for ((i=0;i<5;++i)); do sockperf tp --ip 192.168.122.64 -m 1000 -t 150& done
The size of the udp package is 1000, so in the case of this patch, there
will always be enough tailroom to use build_skb. The sent udp packet
will be discarded because there is no port to receive it. The irqsoftd
of the machine is 100%, we observe the received quantity displayed by
sar -n DEV 1:
no build_skb: 956864.00 rxpck/s
build_skb: 1158465.00 rxpck/s
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MHI WWWAN control driver allows MHI QCOM-based modems to expose
different modem control protocols/ports via the WWAN framework, so that
userspace modem tools or daemon (e.g. ModemManager) can control WWAN
config and state (APN config, SMS, provider selection...). A QCOM-based
modem can expose one or several of the following protocols:
- AT: Well known AT commands interactive protocol (microcom, minicom...)
- MBIM: Mobile Broadband Interface Model (libmbim, mbimcli)
- QMI: QCOM MSM/Modem Interface (libqmi, qmicli)
- QCDM: QCOM Modem diagnostic interface (libqcdm)
- FIREHOSE: XML-based protocol for Modem firmware management
(qmi-firmware-update)
Note that this patch is mostly a rework of the earlier MHI UCI
tentative that was a generic interface for accessing MHI bus from
userspace. As suggested, this new version is WWAN specific and is
dedicated to only expose channels used for controlling a modem, and
for which related opensource userpace support exist.
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change introduces initial support for a WWAN framework. Given the
complexity and heterogeneity of existing WWAN hardwares and interfaces,
there is no strict definition of what a WWAN device is and how it should
be represented. It's often a collection of multiple devices that perform
the global WWAN feature (netdev, tty, chardev, etc).
One usual way to expose modem controls and configuration is via high
level protocols such as the well known AT command protocol, MBIM or
QMI. The USB modems started to expose them as character devices, and
user daemons such as ModemManager learnt to use them.
This initial version adds the concept of WWAN port, which is a logical
pipe to a modem control protocol. The protocols are rawly exposed to
user via character device, allowing straigthforward support in existing
tools (ModemManager, ofono...). The WWAN core takes care of the generic
part, including character device management, and relies on port driver
operations to receive/submit protocol data.
Since the different devices exposing protocols for a same WWAN hardware
do not necessarily know about each others (e.g. two different USB
interfaces, PCI/MHI channel devices...) and can be created/removed in
different orders, the WWAN core ensures that all WAN ports contributing
to the 'whole' WWAN feature are grouped under the same virtual WWAN
device, relying on the provided parent device (e.g. mhi controller,
USB device). It's a 'trick' I copied from Johannes's earlier WWAN
subsystem proposal.
This initial version is purposely minimalist, it's essentially moving
the generic part of the previously proposed mhi_wwan_ctrl driver inside
a common WWAN framework, but the implementation is open and flexible
enough to allow extension for further drivers.
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add parser entries for different IPv4 IHL values.
Each entry will set the L4 header offset according to the IPv4 IHL field.
L3 header offset will set during the parsing of the IPv4 protocol.
Because of missed parser support for IP header length > 20, RX IPv4 checksum HW offload fails
and skb->ip_summed set to CHECKSUM_NONE(checksum done by Network stack).
This patch adds RX IPv4 checksum HW offload capability for frames with IP header length > 20.
v1 --> v2
- Improve commit message.
Suggested-by: Dana Vardi <danat@marvell.com>
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hayes Wang says:
====================
r8152: support new chips
Support new RTL8153 and RTL8156 series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The vendor mode is not always at config #1, so it is necessary to
set the correct configuration number.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support new firmware type and method for RTL8156 series.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support RTL8153C, RTL8153D, RTL8156A, and RTL8156B. The RTL8156A
and RTL8156B are the 2.5G ethernet.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The different chips may have different requests when changing mtu.
Therefore, add a new help function of rtl_ops to change mtu. Besides,
reset the tx/rx after changing mtu.
Additionally, add mtu_to_size() and size_to_mtu() macros to simplify
the code.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use bits operations to record and check the firmware.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the maximum inter frame gap time (144ns) for speed 10M/half and
100M/half. It improves the performance for those speeds. And, there
is no effect for the other speeds.
For 10M/half and 100M/half, the fast inter frame gap time let the
device couldn't use the feature of the aggregation effectively,
because the transfer would be completed fastly. Therefore, use the
maximum value to improve the effect of the aggregation. However, you
may not feel the improvement for fast CPUs, because they compensate
for the effect of the aggregation.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The intention is for the loop to timeout if the body does not succeed.
The current logic calls time_is_before_jiffies(timeout) which is false
until after the timeout, so the loop body never executes.
Fix by using readl_poll_timeout as a more standard and less error-prone
solution.
Fixes: ba37b7caf1ed ("net: ethernet: mtk_eth_soc: add support for initializing the PPE")
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Cc: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mat Martineau says:
====================
mptcp: Improve socket option handling
MPTCP sockets have previously had limited socket option support. The
architecture of MPTCP sockets (one userspace-facing MPTCP socket that
manages one or more in-kernel TCP subflow sockets) adds complexity for
passing options through to lower levels. This patch set adds MPTCP
support for socket options commonly used with TCP.
Patch 1 reverts an interim socket option fix (a socket option blocklist)
that was merged in the net tree for v5.12.
Patch 2 moves the socket option code to a separate file, with no
functional changes.
Patch 3 adds an allowlist for socket options that are known to function
with MPTCP. Later patches in this set add more allowed options.
Patches 4 and 5 add infrastructure for syncing MPTCP-level options with
the TCP subflows.
Patches 6-12 add support for specific socket options.
Patch 13 adds a socket option self test.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend mptcp_connect tool with SO_MARK support (-M <value>) and
add a test case that checks that the packet mark gets copied to all
subflows.
This is done by only allowing packets with either skb->mark 1 or 2
via iptables.
DROP rule packet counter is checked; if its not zero, print an error
message and fail the test case.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP_CONGESTION is set for all subflows.
The mptcp socket gains icsk_ca_ops too so it can be used to keep the
authoritative state that should be set on new/future subflows.
TCP_INFO will return first subflow only.
The out-of-tree kernel has a MPTCP_INFO getsockopt, this could be added
later on.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Handle SO_DEBUG and set it on all subflows.
Ignore those values not implemented on TCP sockets.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replicate to all subflows.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Value is synced to all subflows.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to PRIORITY/KEEPALIVE: needs to be mirrored to all subflows.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to previous patch: needs to be mirrored to all subflows.
Device bind is simpler: it is only done on the initial (listener) sk.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
start with something simple: both take an integer value, both
need to be mirrored to all subflows.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni suggested to avoid re-syncing new subflows because
they inherit options from listener. In case options were set on
listener but are not set on mptcp-socket there is no need to
do any synchronisation for new subflows.
This change sets sockopt_seq of new mptcp sockets to the seq of
the mptcp listener sock.
Subflow sequence is set to the embedded tcp listener sk.
Add a comment explaing why sk_state is involved in sockopt_seq
generation.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Handle following cases:
1. setsockopt is called with multiple subflows.
Change might have to be mirrored to all of them.
This is done directly in process context/setsockopt call.
2. Outgoing subflow is created after one or several setsockopt()
calls have been made. Old setsockopt changes should be
synced to the new socket.
3. Incoming subflow, after setsockopt call(s).
Cases 2 and 3 are handled right after the join list is spliced to the conn
list.
Not all sockopt values can be just be copied by value, some require
helper calls. Those can acquire socket lock (which can sleep).
If the join->conn list splicing is done from preemptible context,
synchronization can be done right away, otherwise its deferred to work
queue.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unrolling mcast state at msk dismantel time is bug prone, as
syzkaller reported:
======================================================
WARNING: possible circular locking dependency detected
5.11.0-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor905/8822 is trying to acquire lock:
ffffffff8d678fe8 (rtnl_mutex){+.+.}-{3:3}, at: ipv6_sock_mc_close+0xd7/0x110 net/ipv6/mcast.c:323
but task is already holding lock:
ffff888024390120 (sk_lock-AF_INET6){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1600 [inline]
ffff888024390120 (sk_lock-AF_INET6){+.+.}-{0:0}, at: mptcp6_release+0x57/0x130 net/mptcp/protocol.c:3507
which lock already depends on the new lock.
Instead we can simply forbid any mcast-related setsockopt.
Let's do the same with all other non supported sockopts.
Fixes: 717e79c867ca5 ("mptcp: Add setsockopt()/getsockopt() socket operations")
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MPTCP sockopt implementation is going to be much
more big and complex soon. Let's move it to a different
source file.
No functional change intended.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change reverts commit 86581852d771 ("mptcp: forbit mcast-related sockopt on MPTCP sockets").
As announced in the cover letter of the mentioned patch above, the
following commits introduce a larger MPTCP sockopt implementation
refactor.
This time, we switch from a blocklist to an allowlist. This is safer for
the future where new sockoptions could be added while not being fully
supported with MPTCP sockets and thus causing unstabilities.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tx queue cleanup happens in interrupt handler on same core as rx queue
processing. Both can take considerable amount of processing in high
packet-per-second scenarios.
Sending big amounts of packets can stall the rx processing which is
unfair and also can lead to out-of-memory condition since
__dev_kfree_skb_irq queues the skbs for later kfree in softirq which
is not allowed to happen with heavy load in interrupt handler.
This puts tx cleanup in its own napi and enables threaded napi to
allow the rx/tx queue processing to happen on different cores.
The ability to sustain equal amounts of tx/rx traffic increased:
from 280Kpps to 1130Kpps on Threadripper 3960X with upcoming
Mikrotik 10/25G NIC,
from 520Kpps to 850Kpps on Intel i3-3320 with Mikrotik RB44Ge adapter.
Signed-off-by: Gatis Peisenieks <gatis@mikrotik.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean says:
====================
Pass the BR_FDB_LOCAL information to switchdev drivers
Bridge FDB entries with the is_local flag are entries which are
terminated locally and not forwarded. Switchdev drivers might want to be
notified of these addresses so they can trap them. If they don't program
these entries to hardware, there is no guarantee that they will do the
right thing with these entries, and they won't be, let's say, flooded.
Ideally none of the switchdev drivers should ignore these entries, but
having access to the is_local bit is the bare minimum change that should
be done in the bridge layer, before this is even possible.
These 2 changes are extracted from the larger "RX filtering in DSA"
series:
https://patchwork.kernel.org/project/netdevbpf/patch/20210224114350.2791260-8-olteanv@gmail.com/https://patchwork.kernel.org/project/netdevbpf/patch/20210224114350.2791260-9-olteanv@gmail.com/
and submitted separately, because they touch all switchdev drivers,
while the rest is mostly specific to DSA.
This change is not a functional one, in the sense that everybody still
ignores the local FDB entries, but this will be changed by further
patches at least for DSA.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
As explained in bugfix commit 6ab4c3117aec ("net: bridge: don't notify
switchdev for local FDB addresses") as well as in this discussion:
https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/
the switchdev notifiers for FDB entries managed to have a zero-day bug,
which was that drivers would not know what to do with local FDB entries,
because they were not told that they are local. The bug fix was to
simply not notify them of those addresses.
Let us now add the 'is_local' bit to bridge FDB entries, and make all
drivers ignore these entries by their own choice.
Co-developed-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>