IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct hwc_dma_buf.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Long Li <longli@microsoft.com>
Cc: Ajay Sharma <sharmaajay@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230922172858.3822653-9-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct ipa_power.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20230922172858.3822653-8-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct mana_rxq.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Long Li <longli@microsoft.com>
Cc: Ajay Sharma <sharmaajay@microsoft.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230922172858.3822653-7-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct rcb_common_cb.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230922172858.3822653-6-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct enetc_int_vector.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230922172858.3822653-5-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct ppe_common_cb.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230922172858.3822653-4-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct ip6_sf_socklist.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230922172858.3822653-3-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct ip_sf_socklist.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230922172858.3822653-2-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct fib_info.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: David Ahern <dsahern@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230922172858.3822653-1-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata says:
====================
mlxsw: Provide enhancements and new feature
Vadim Pasternak writes:
Patch #1 - Optimize transaction size for efficient retrieval of module
data.
Patch #3 - Enable thermal zone binding with new cooling device.
Patch #4 - Employ standard macros for dividing buffer into the chunks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Use standard macro DIV_ROUND_UP() to determine the number of chunks
required for a given buffer.
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the list of allowed external cooling devices for thermal zone
binding to include devices of type "emc2305".
The motivation is to provide support for the system SN2201, which is
equipped with the Spectrum-1 ASIC.
The system's airflow control is managed by the EMC2305 RPM-based PWM
Fan Speed Controller as the cooling device.
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MTBR register is used to read temperatures from multiple sensors in
one transaction, but the driver only reads from a single sensor in each
transaction.
Rrestrict the payload size of the MTBR register to prevent the
transmission of redundant data to the firmware.
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
inet: more data-race fixes
This series fixes some existing data-races on inet fields:
inet->mc_ttl, inet->pmtudisc, inet->tos, inet->uc_index,
inet->mc_index and inet->mc_addr.
While fixing them, we convert eight socket options
to lockless implementation.
v2: addressed David Ahern feedback on ("inet: implement lockless IP_TOS")
Added David Reviewed-by: tag on other patches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing annotations to inet->mc_index and inet->mc_addr
to fix data-races.
getsockopt(IP_MULTICAST_IF) can be lockless.
setsockopt() side is left for later.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current implementation is already lockless, because the socket
lock is released before reading socket fields.
Add missing READ_ONCE() annotations.
Note that corresponding WRITE_ONCE() are needed, the order
of the patches do not really matter.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing READ_ONCE() annotations when reading inet->uc_index
Implementing getsockopt(IP_UNICAST_IF) locklessly seems possible,
the setsockopt() part might not be possible at the moment.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_dst_get() does not require socket lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
inet->inet_opt being RCU protected, we can use RCU instead
of locking the socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some reads of inet->tos are racy.
Add needed READ_ONCE() annotations and convert IP_TOS option lockless.
v2: missing changes in include/net/route.h (David Ahern)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
inet->pmtudisc can be read locklessly.
Implement proper lockless reads and writes to inet->pmtudisc
ip_sock_set_mtu_discover() can now be called from arbitrary
contexts.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
inet->mc_ttl can be read locklessly.
Implement proper lockless reads and writes to inet->mc_ttl
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
net: more data-races fixes and lockless socket options
This is yet another round of data-races fixes,
and lockless socket options.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This field can be read or written without socket lock being held.
Add annotations to avoid load-store tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This field can be read or written without socket lock being held.
Add annotations to avoid load-store tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk->sk_txrehash readers are already safe against
concurrent change of this field.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SO_MAX_PACING_RATE setsockopt() does not need to hold
the socket lock, because sk->sk_pacing_rate readers
can run fine if the value is changed by other threads,
after adding READ_ONCE() accessors.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Setting sk->sk_ll_usec, sk_prefer_busy_poll and sk_busy_poll_budget
do not require the socket lock, readers are lockless anyway.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This options can not be set and return -ENOPROTOOPT,
no need to acqure socket lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sock->flags are atomic, no need to hold the socket lock
in sk_setsockopt() for SO_PASSCRED, SO_PASSPIDFD and SO_PASSSEC.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a followup of 8bf43be799d4 ("net: annotate data-races
around sk->sk_priority").
sk->sk_priority can be read and written without holding the socket lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
do_execute_actions() function can be called recursively multiple
times while executing actions that require pipeline forking or
recirculations. It may also be re-entered multiple times if the packet
leaves openvswitch module and re-enters it through a different port.
Currently, there is a 256-byte array allocated on stack in this
function that is supposed to hold NSH header. Compilers tend to
pre-allocate that space right at the beginning of the function:
a88: 48 81 ec b0 01 00 00 sub $0x1b0,%rsp
NSH is not a very common protocol, but the space is allocated on every
recursive call or re-entry multiplying the wasted stack space.
Move the stack allocation to push_nsh() function that is only used
if NSH actions are actually present. push_nsh() is also a simple
function without a possibility for re-entry, so the stack is returned
right away.
With this change the preallocated space is reduced by 256 B per call:
b18: 48 81 ec b0 00 00 00 sub $0xb0,%rsp
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Eelco Chaudron echaudro@redhat.com
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
net: use DEV_STATS_xxx() helpers in virtio_net and l2tp_eth
Inspired by another (minor) KCSAN syzbot report.
Both virtio_net and l2tp_eth can use DEV_STATS_xxx() helpers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Core networking has opt-in atomic variant of dev->stats,
simply use DEV_STATS_INC(), DEV_STATS_ADD() and DEV_STATS_READ().
v2: removed @priv local var in l2tp_eth_dev_recv() (Simon)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use DEV_STATS_INC() and DEV_STATS_READ() which provide
atomicity on paths that can be used concurrently.
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Companion of DEV_STATS_INC() & DEV_STATS_ADD().
This is going to be used in the series.
Use it in macsec_get_stats64().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch extends flower offload support for MPLS protocol.
Due to hardware limitation, currently driver supports lse
depth up to 4.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
During review of the patch that became 2e0ec0afa902 ("net: ethernet:
xilinx: Convert to platform remove callback returning void") in
net-next, Radhey Shyam Pandey pointed out that the change makes the
documentation about the return value obsolete. The patch was applied
without addressing this feedback, so here comes a fix in a separate
patch.
Fixes: 2e0ec0afa902 ("net: ethernet: xilinx: Convert to platform remove callback returning void")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch to napi_consume_skb() to take advantage of bulk free, and skb
reuse through skb cache in conjunction with napi_build_skb().
When parameter 'budget' = 0, indicating non-NAPI context,
dev_consume_skb_any() is called internally.
Signed-off-by: Sieng-Piaw Liew <liew.s.piaw@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
net_sched: sch_fq: round of improvements
For FQ tenth anniversary, it was time for making it faster.
The FQ part (as in Fair Queue) is rather expensive, because
we have to classify packets and store them in a per-flow structure,
and add this per-flow structure in a hash table. Then the RR lists
also add cache line misses.
Most fq qdisc are almost idle. Trying to share NIC bandwidth has
no benefits, thus the qdisc could behave like a FIFO.
This series brings a 5 % throughput increase in intensive
tcp_rr workload, and 13 % increase for (unpaced) UDP packets.
v2: removed an extra label (build bot).
Fix an accidental increase of stat_internal_packets counter
in fast path.
Added "constify qdisc_priv()" patch to allow fq_fastpath_check()
first parameter to be const.
typo on 'eligible' (Willem)
====================
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FQ performs garbage collection at enqueue time, and only
if number of flows is above a given threshold, which
is hit after the qdisc has been used a bit.
Since an RB-tree traversal is needed to locate a flow,
it makes sense to perform gc all the time, to keep
rb-trees smaller.
This reduces by 50 % average storage costs in FQ,
and avoids 1 cache line miss at enqueue time when
fast path added in prior patch can not be used.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCQ_F_CAN_BYPASS can be used by few qdiscs.
Idea is that if we queue a packet to an empty qdisc,
following dequeue() would pick it immediately.
FQ can not use the generic TCQ_F_CAN_BYPASS code,
because some additional checks need to be performed.
This patch adds a similar fast path to FQ.
Most of the time, qdisc is not throttled,
and many packets can avoid bringing/touching
at least four cache lines, and consuming 128bytes
of memory to store the state of a flow.
After this patch, netperf can send UDP packets about 13 % faster,
and pktgen goes 30 % faster (when FQ is in the way), on a fast NIC.
TCP traffic is also improved, thanks to a reduction of cache line misses.
I have measured a 5 % increase of throughput on a tcp_rr intensive workload.
tc -s -d qd sh dev eth1
...
qdisc fq 8004: parent 1:2 limit 10000p flow_limit 100p buckets 1024
orphan_mask 1023 quantum 3028b initial_quantum 15140b low_rate_threshold 550Kbit
refill_delay 40ms timer_slack 10us horizon 10s horizon_drop
Sent 5646784384 bytes 1985161 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
flows 122 (inactive 122 throttled 0)
gc 0 highprio 0 fastpath 659990 throttled 27762 latency 8.57us
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, when one fq qdisc has no more packets to send, it can still
have some flows stored in its RR lists (q->new_flows & q->old_flows)
This was a design choice, but what is a bit disturbing is that
the inactive_flows counter does not include the count of empty flows
in RR lists.
As next patch needs to know better if there are active flows,
this change makes inactive_flows exact.
Before the patch, following command on an empty qdisc could have returned:
lpaa17:~# tc -s -d qd sh dev eth1 | grep inactive
flows 1322 (inactive 1316 throttled 0)
flows 1330 (inactive 1325 throttled 0)
flows 1193 (inactive 1190 throttled 0)
flows 1208 (inactive 1202 throttled 0)
After the patch, we now have:
lpaa17:~# tc -s -d qd sh dev eth1 | grep inactive
flows 1322 (inactive 1322 throttled 0)
flows 1330 (inactive 1330 throttled 0)
flows 1193 (inactive 1193 throttled 0)
flows 1208 (inactive 1208 throttled 0)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to propagate const qualifiers, we change qdisc_priv()
to accept a possibly const argument.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
tcp: add tcp_delack_max()
First patches are adding const qualifiers to four existing helpers.
Third patch adds a much needed companion feature to RTAX_RTO_MIN.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
While BPF allows to set icsk->->icsk_delack_max
and/or icsk->icsk_rto_min, we have an ip route
attribute (RTAX_RTO_MIN) to be able to tune rto_min,
but nothing to consequently adjust max delayed ack,
which vary from 40ms to 200 ms (TCP_DELACK_{MIN|MAX}).
This makes RTAX_RTO_MIN of almost no practical use,
unless customers are in big trouble.
Modern days datacenter communications want to set
rto_min to ~5 ms, and the max delayed ack one jiffie
smaller to avoid spurious retransmits.
After this patch, an "rto_min 5" route attribute will
effectively lower max delayed ack timers to 4 ms.
Note in the following ss output, "rto:6 ... ato:4"
$ ss -temoi dst XXXXXX
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB 0 0 [2002:a05:6608:295::]:52950 [2002:a05:6608:297::]:41597
ino:255134 sk:1001 <->
skmem:(r0,rb1707063,t872,tb262144,f0,w0,o0,bl0,d0) ts sack
cubic wscale:8,8 rto:6 rtt:0.02/0.002 ato:4 mss:4096 pmtu:4500
rcvmss:536 advmss:4096 cwnd:10 bytes_sent:54823160 bytes_acked:54823121
bytes_received:54823120 segs_out:1370582 segs_in:1370580
data_segs_out:1370579 data_segs_in:1370578 send 16.4Gbps
pacing_rate 32.6Gbps delivery_rate 1.72Gbps delivered:1370579
busy:26920ms unacked:1 rcv_rtt:34.615 rcv_space:65920
rcv_ssthresh:65535 minrtt:0.015 snd_wnd:65536
While we could argue this patch fixes a bug with RTAX_RTO_MIN,
I do not add a Fixes: tag, so that we can soak it a bit before
asking backports to stable branches.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make clear these functions do not change any field from TCP socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both helpers only read fields from their socket argument.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
ice: add PTP auxiliary bus support
Michal Michalik says:
Auxiliary bus allows exchanging information between PFs, which allows
both fixing problems and simplifying new features implementation.
The auxiliary bus is enabled for all devices supported by ice driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>