IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Kuee reported a corner case where the tnum becomes constant after the call
to __reg_bound_offset(), but the register's bounds are not, that is, its
min bounds are still not equal to the register's max bounds.
This in turn allows to leak pointers through turning a pointer register as
is into an unknown scalar via adjust_ptr_min_max_vals().
Before:
func#0 @0
0: R1=ctx(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) R10=fp(off=0,imm=0,umax=0,var_off=(0x0; 0x0))
0: (b7) r0 = 1 ; R0_w=scalar(imm=1,umin=1,umax=1,var_off=(0x1; 0x0))
1: (b7) r3 = 0 ; R3_w=scalar(imm=0,umax=0,var_off=(0x0; 0x0))
2: (87) r3 = -r3 ; R3_w=scalar()
3: (87) r3 = -r3 ; R3_w=scalar()
4: (47) r3 |= 32767 ; R3_w=scalar(smin=-9223372036854743041,umin=32767,var_off=(0x7fff; 0xffffffffffff8000),s32_min=-2147450881)
5: (75) if r3 s>= 0x0 goto pc+1 ; R3_w=scalar(umin=9223372036854808575,var_off=(0x8000000000007fff; 0x7fffffffffff8000),s32_min=-2147450881,u32_min=32767)
6: (95) exit
from 5 to 7: R0=scalar(imm=1,umin=1,umax=1,var_off=(0x1; 0x0)) R1=ctx(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) R3=scalar(umin=32767,umax=9223372036854775807,var_off=(0x7fff; 0x7fffffffffff8000),s32_min=-2147450881) R10=fp(off=0,imm=0,umax=0,var_off=(0x0; 0x0))
7: (d5) if r3 s<= 0x8000 goto pc+1 ; R3=scalar(umin=32769,umax=9223372036854775807,var_off=(0x7fff; 0x7fffffffffff8000),s32_min=-2147450881,u32_min=32767)
8: (95) exit
from 7 to 9: R0=scalar(imm=1,umin=1,umax=1,var_off=(0x1; 0x0)) R1=ctx(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) R3=scalar(umin=32767,umax=32768,var_off=(0x7fff; 0x8000)) R10=fp(off=0,imm=0,umax=0,var_off=(0x0; 0x0))
9: (07) r3 += -32767 ; R3_w=scalar(imm=0,umax=1,var_off=(0x0; 0x0)) <--- [*]
10: (95) exit
What can be seen here is that R3=scalar(umin=32767,umax=32768,var_off=(0x7fff;
0x8000)) after the operation R3 += -32767 results in a 'malformed' constant, that
is, R3_w=scalar(imm=0,umax=1,var_off=(0x0; 0x0)). Intersecting with var_off has
not been done at that point via __update_reg_bounds(), which would have improved
the umax to be equal to umin.
Refactor the tnum <> min/max bounds information flow into a reg_bounds_sync()
helper and use it consistently everywhere. After the fix, bounds have been
corrected to R3_w=scalar(imm=0,umax=0,var_off=(0x0; 0x0)) and thus the register
is regarded as a 'proper' constant scalar of 0.
After:
func#0 @0
0: R1=ctx(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) R10=fp(off=0,imm=0,umax=0,var_off=(0x0; 0x0))
0: (b7) r0 = 1 ; R0_w=scalar(imm=1,umin=1,umax=1,var_off=(0x1; 0x0))
1: (b7) r3 = 0 ; R3_w=scalar(imm=0,umax=0,var_off=(0x0; 0x0))
2: (87) r3 = -r3 ; R3_w=scalar()
3: (87) r3 = -r3 ; R3_w=scalar()
4: (47) r3 |= 32767 ; R3_w=scalar(smin=-9223372036854743041,umin=32767,var_off=(0x7fff; 0xffffffffffff8000),s32_min=-2147450881)
5: (75) if r3 s>= 0x0 goto pc+1 ; R3_w=scalar(umin=9223372036854808575,var_off=(0x8000000000007fff; 0x7fffffffffff8000),s32_min=-2147450881,u32_min=32767)
6: (95) exit
from 5 to 7: R0=scalar(imm=1,umin=1,umax=1,var_off=(0x1; 0x0)) R1=ctx(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) R3=scalar(umin=32767,umax=9223372036854775807,var_off=(0x7fff; 0x7fffffffffff8000),s32_min=-2147450881) R10=fp(off=0,imm=0,umax=0,var_off=(0x0; 0x0))
7: (d5) if r3 s<= 0x8000 goto pc+1 ; R3=scalar(umin=32769,umax=9223372036854775807,var_off=(0x7fff; 0x7fffffffffff8000),s32_min=-2147450881,u32_min=32767)
8: (95) exit
from 7 to 9: R0=scalar(imm=1,umin=1,umax=1,var_off=(0x1; 0x0)) R1=ctx(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) R3=scalar(umin=32767,umax=32768,var_off=(0x7fff; 0x8000)) R10=fp(off=0,imm=0,umax=0,var_off=(0x0; 0x0))
9: (07) r3 += -32767 ; R3_w=scalar(imm=0,umax=0,var_off=(0x0; 0x0)) <--- [*]
10: (95) exit
Fixes: b03c9f9fdc37 ("bpf/verifier: track signed and unsigned min/max values")
Reported-by: Kuee K1r0a <liulin063@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220701124727.11153-2-daniel@iogearbox.net
Kuee reported a quirk in the jmp32's jeq/jne simulation, namely that the
register value does not match expectations for the fall-through path. For
example:
Before fix:
0: R1=ctx(off=0,imm=0) R10=fp0
0: (b7) r2 = 0 ; R2_w=P0
1: (b7) r6 = 563 ; R6_w=P563
2: (87) r2 = -r2 ; R2_w=Pscalar()
3: (87) r2 = -r2 ; R2_w=Pscalar()
4: (4c) w2 |= w6 ; R2_w=Pscalar(umin=563,umax=4294967295,var_off=(0x233; 0xfffffdcc),s32_min=-2147483085) R6_w=P563
5: (56) if w2 != 0x8 goto pc+1 ; R2_w=P571 <--- [*]
6: (95) exit
R0 !read_ok
After fix:
0: R1=ctx(off=0,imm=0) R10=fp0
0: (b7) r2 = 0 ; R2_w=P0
1: (b7) r6 = 563 ; R6_w=P563
2: (87) r2 = -r2 ; R2_w=Pscalar()
3: (87) r2 = -r2 ; R2_w=Pscalar()
4: (4c) w2 |= w6 ; R2_w=Pscalar(umin=563,umax=4294967295,var_off=(0x233; 0xfffffdcc),s32_min=-2147483085) R6_w=P563
5: (56) if w2 != 0x8 goto pc+1 ; R2_w=P8 <--- [*]
6: (95) exit
R0 !read_ok
As can be seen on line 5 for the branch fall-through path in R2 [*] is that
given condition w2 != 0x8 is false, verifier should conclude that r2 = 8 as
upper 32 bit are known to be zero. However, verifier incorrectly concludes
that r2 = 571 which is far off.
The problem is it only marks false{true}_reg as known in the switch for JE/NE
case, but at the end of the function, it uses {false,true}_{64,32}off to
update {false,true}_reg->var_off and they still hold the prior value of
{false,true}_reg->var_off before it got marked as known. The subsequent
__reg_combine_32_into_64() then propagates this old var_off and derives new
bounds. The information between min/max bounds on {false,true}_reg from
setting the register to known const combined with the {false,true}_reg->var_off
based on the old information then derives wrong register data.
Fix it by detangling the BPF_JEQ/BPF_JNE cases and updating relevant
{false,true}_{64,32}off tnums along with the register marking to known
constant.
Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: Kuee K1r0a <liulin063@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220701124727.11153-1-daniel@iogearbox.net
When a XSK pool gets mapped, xp_check_dma_contiguity() adds bit 0x1
to pages' DMA addresses that go in ascending order and at 4K stride.
The problem is that the bit does not get cleared before doing unmap.
As a result, a lot of warnings from iommu_dma_unmap_page() are seen
in dmesg, which indicates that lookups by iommu_iova_to_phys() fail.
Fixes: 2b43470add8c ("xsk: Introduce AF_XDP buffer allocation API")
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220628091848.534803-1-ivan.malov@oktetlabs.ru
The BPF subsystem consists of a large number of pieces. There is not a
single person that understands it all. Yet reviews are crucially important
for the BPF community to provide productive quality feedback to contributors
in a timely manner and therefore to ultimately expand the number of active
developers in the community.
So far, the BPF community had a two-stage review system, that is, a weekly
rotation among 7 developers (Alexei, Daniel, Andrii, Martin, Song, Yonghong,
John) as a first-level review of all inbound patches accompanied by a BPF CI
system which runs the in-tree BPF selftests to check for regressions for
every new patch, and then, a final check by Alexei, Daniel, Andrii to apply
the patches to either bpf or bpf-next trees.
This system worked well for the last ~3.5 years, but clearly reaches its
limits these days as it does not scale enough. Especially, as we also need
to allow enough room for every developer to contribute patches themselves,
integrate with their day to day job, and in particular avoid burnout. We
want to better scale both horizontally and vertically going forward.
On the horizontal scale, we are adding more developers (KP, Stan, Hao, Jiri)
to the overall core reviewer team, thus growing to 11 people in total. The
weekly rotation for the horizontal oncall reviewer is shortened to 1/2 week
(Mo - Wed and Thur - Fri). Instead of just patches, the coverage however
extends also generally to triage and reply to mailing list traffic (e.g. RFCs,
questions, etc).
On the vertical scale, there is clearly a need for deep expertise areas to
assign dedicated maintainer/reviewer teams that are responsible for code
reviews and help with design of individual building blocks. To some degree
we have been doing this implicitly, but the point is to formalize the teams
and commitment.
There is an overlap between areas and boundaries are intentionally grey. These
additional entries provide a guidance on who has to look at the patches. The
patch series which span multiple areas will be looked at by multiple people.
The vertical review with areas of deep expertise are bundled at the same time
with the horizontal side.
This patch cleans up a bit the BPF entries, adds mentioned developers to
the horizontal scale and creates new sub-entries with teams for developers
committing to the above outlined vertical scale. Also, pw.git tools we use
for BPF tree maintenance have been updated with a new pw-schedule script to
semi-automate vertical oncall review rotation.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <martin.lau@linux.dev>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Mykola Lysenko <mykolal@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Hao Luo <haoluo@google.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://git.kernel.org/pub/scm/linux/kernel/git/dborkman/pw.git
Link: https://lore.kernel.org/bpf/5bdc73e7f5a087299589944fa074563cdf2c2c1a.1656353995.git.daniel@iogearbox.net
As reported by Yuming, currently tc always show a latency of UINT_MAX
for netem Qdisc's on 32-bit platforms:
$ tc qdisc add dev dummy0 root netem latency 100ms
$ tc qdisc show dev dummy0
qdisc netem 8001: root refcnt 2 limit 1000 delay 275s 275s
^^^^^^^^^^^^^^^^
Let us take a closer look at netem_dump():
qopt.latency = min_t(psched_tdiff_t, PSCHED_NS2TICKS(q->latency,
UINT_MAX);
qopt.latency is __u32, psched_tdiff_t is signed long,
(psched_tdiff_t)(UINT_MAX) is negative for 32-bit platforms, so
qopt.latency is always UINT_MAX.
Fix it by using psched_time_t (u64) instead.
Note: confusingly, users have two ways to specify 'latency':
1. normally, via '__u32 latency' in struct tc_netem_qopt;
2. via the TCA_NETEM_LATENCY64 attribute, which is s64.
For the second case, theoretically 'latency' could be negative. This
patch ignores that corner case, since it is broken (i.e. assigning a
negative s64 to __u32) anyways, and should be handled separately.
Thanks Ted Lin for the analysis [1] .
[1] https://github.com/raspberrypi/linux/issues/3512
Reported-by: Yuming Chen <chenyuming.junnan@bytedance.com>
Fixes: 112f9cb65643 ("netem: convert to qdisc_watchdog_schedule_ns")
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://lore.kernel.org/r/20220616234336.2443-1-yepeilin.cs@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Function fallback_set_params() checks if the module type returned
by a driver is ETH_MODULE_SFF_8079 and in this case it assumes
that buffer returns a concatenated content of page A0h and A2h.
The check is wrong because the correct type is ETH_MODULE_SFF_8472.
Fixes: 96d971e307cc ("ethtool: Add fallback to get_module_eeprom from netlink command")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220616160856.3623273-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The bonding ARP monitor fails to decrement send_peer_notif, the
number of peer notifications (gratuitous ARP or ND) to be sent. This
results in a continuous series of notifications.
Correct this by decrementing the counter for each notification.
Reported-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Fixes: b0929915e035 ("bonding: Fix RTNL: assertion failed at net/core/rtnetlink.c for ab arp monitor")
Link: https://lore.kernel.org/netdev/b2fd4147-8f50-bebd-963a-1a3e8d1d9715@redhat.com/
Tested-by: Jonathan Toppins <jtoppins@redhat.com>
Reviewed-by: Jonathan Toppins <jtoppins@redhat.com>
Link: https://lore.kernel.org/r/9400.1655407960@famine
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf 2022-06-17
We've added 12 non-merge commits during the last 4 day(s) which contain
a total of 14 files changed, 305 insertions(+), 107 deletions(-).
The main changes are:
1) Fix x86 JIT tailcall count offset on BPF-2-BPF call, from Jakub Sitnicki.
2) Fix a kprobe_multi link bug which misplaces BPF cookies, from Jiri Olsa.
3) Fix an infinite loop when processing a module's BTF, from Kumar Kartikeya Dwivedi.
4) Fix getting a rethook only in RCU available context, from Masami Hiramatsu.
5) Fix request socket refcount leak in sk lookup helpers, from Jon Maxwell.
6) Fix xsk xmit behavior which wrongly adds skb to already full cq, from Ciara Loftus.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
rethook: Reject getting a rethook if RCU is not watching
fprobe, samples: Add use_trace option and show hit/missed counter
bpf, docs: Update some of the JIT/maintenance entries
selftest/bpf: Fix kprobe_multi bench test
bpf: Force cookies array to follow symbols sorting
ftrace: Keep address offset in ftrace_lookup_symbols
selftests/bpf: Shuffle cookies symbols in kprobe multi test
selftests/bpf: Test tail call counting with bpf2bpf and data on stack
bpf, x86: Fix tail call count offset calculation on bpf2bpf call
bpf: Limit maximum modifier chain length in btf_check_type_tags
bpf: Fix request_sock leak in sk lookup helpers
xsk: Fix generic transmit when completion queue reservation fails
====================
Link: https://lore.kernel.org/r/20220617202119.2421-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since the rethook_recycle() will involve the call_rcu() for reclaiming
the rethook_instance, the rethook must be set up at the RCU available
context (non idle). This rethook_recycle() in the rethook trampoline
handler is inevitable, thus the RCU available check must be done before
setting the rethook trampoline.
This adds a rcu_is_watching() check in the rethook_try_get() so that
it will return NULL if it is called when !rcu_is_watching().
Fixes: 54ecbe6f1ed5 ("rethook: Add a generic return hook")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/165461827269.280167.7379263615545598958.stgit@devnote2
Add use_trace option to use trace_printk() instead of pr_info()
so that the handler doesn't involve the RCU operations.
And show the hit and missed counter so that the user can check
how many times the probe handler hit and missed.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/165461826247.280167.11939123218334322352.stgit@devnote2
Various minor updates around some of the BPF-related entries:
JITs for ARM32/NFP/SPARC/X86-32 haven't seen updates in quite a while, thus
for now, mark them as 'Odd Fixes' until they become more actively developed.
JITs for POWERPC/S390 are in good shape and receive active development and
review, thus bump to 'Supported' similar as we have with X86-64/ARM64.
JITs for MIPS/RISC-V are in similar good shape as the ones mentioned above,
but looked after mostly in spare time, thus leave for now in 'Maintained' state.
Add Michael to PPC JIT given he's picking up the patches there, so it better
reflects today's state.
Also, I haven't done much reviewing around BPF sockmap/kTLS after John and I
did the big rework back in the days to integrate sockmap with kTLS.
These days, most of this is taken care by John, Jakub {Sitnicki,Kicinski} and
others in the community, so remove myself from these two.
Lastly, move all BPF-related entries into one place, that is, move the sockmap
one over near rest of BPF.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/f9b8a63a0b48dc764bd4c50f87632889f5813f69.1655494758.git.daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commit 8ff978b8b222 ("ipv4/raw: support binding to nonlocal addresses")
introduced a helper function to fold duplicated validity checks of bind
addresses into inet_addr_valid_or_nonlocal(). However, this caused an
unintended regression in ping_check_bind_addr(), which previously would
reject binding to multicast and broadcast addresses, but now these are
both incorrectly allowed as reported in [1].
This patch restores the original check. A simple reordering is done to
improve readability and make it evident that multicast and broadcast
addresses should not be allowed. Also, add an early exit for INADDR_ANY
which replaces lost behavior added by commit 0ce779a9f501 ("net: Avoid
unnecessary inet_addr_type() call when addr is INADDR_ANY").
Furthermore, this patch introduces regression selftests to catch these
specific cases.
[1] https://lore.kernel.org/netdev/CANP3RGdkAcDyAZoT1h8Gtuu0saq+eOrrTiWbxnOs+5zn+cpyKg@mail.gmail.com/
Fixes: 8ff978b8b222 ("ipv4/raw: support binding to nonlocal addresses")
Cc: Miaohe Lin <linmiaohe@huawei.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Riccardo Paolo Bestetti <pbl@bestov.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hulk Robot reports incorrect sp->rx_count_cooked value in decode_std_command().
This should be caused by the subtracting from sp->rx_count_cooked before.
It seems that sp->rx_count_cooked value is changed to 0, which bypassed the
previous judgment.
The situation is shown below:
(Thread 1) | (Thread 2)
decode_std_command() | resync_tnc()
... |
if (rest == 2) |
sp->rx_count_cooked -= 2; |
else if (rest == 3) | ...
| sp->rx_count_cooked = 0;
sp->rx_count_cooked -= 1; |
for (i = 0; i < sp->rx_count_cooked; i++) // report error
checksum += sp->cooked_buf[i];
sp->rx_count_cooked is a shared variable but is not protected by a lock.
The same applies to sp->rx_count. This patch adds a lock to fix the bug.
The fail log is shown below:
=======================================================================
UBSAN: array-index-out-of-bounds in drivers/net/hamradio/6pack.c:925:31
index 400 is out of range for type 'unsigned char [400]'
CPU: 3 PID: 7433 Comm: kworker/u10:1 Not tainted 5.18.0-rc5-00163-g4b97bac0756a #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Workqueue: events_unbound flush_to_ldisc
Call Trace:
<TASK>
dump_stack_lvl+0xcd/0x134
ubsan_epilogue+0xb/0x50
__ubsan_handle_out_of_bounds.cold+0x62/0x6c
sixpack_receive_buf+0xfda/0x1330
tty_ldisc_receive_buf+0x13e/0x180
tty_port_default_receive_buf+0x6d/0xa0
flush_to_ldisc+0x213/0x3f0
process_one_work+0x98f/0x1620
worker_thread+0x665/0x1080
kthread+0x2e9/0x3a0
ret_from_fork+0x1f/0x30
...
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Xu Jia <xujia39@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot found the following issue on:
==================================================================
BUG: KASAN: use-after-free in tipc_named_reinit+0x94f/0x9b0
net/tipc/name_distr.c:413
Read of size 8 at addr ffff88805299a000 by task kworker/1:9/23764
CPU: 1 PID: 23764 Comm: kworker/1:9 Not tainted
5.18.0-rc4-syzkaller-00878-g17d49e6e8012 #0
Hardware name: Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Workqueue: events tipc_net_finalize_work
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description.constprop.0.cold+0xeb/0x495
mm/kasan/report.c:313
print_report mm/kasan/report.c:429 [inline]
kasan_report.cold+0xf4/0x1c6 mm/kasan/report.c:491
tipc_named_reinit+0x94f/0x9b0 net/tipc/name_distr.c:413
tipc_net_finalize+0x234/0x3d0 net/tipc/net.c:138
process_one_work+0x996/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e9/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
</TASK>
[...]
==================================================================
In the commit
d966ddcc3821 ("tipc: fix a deadlock when flushing scheduled work"),
the cancel_work_sync() function just to make sure ONLY the work
tipc_net_finalize_work() is executing/pending on any CPU completed before
tipc namespace is destroyed through tipc_exit_net(). But this function
is not guaranteed the work is the last queued. So, the destroyed instance
may be accessed in the work which will try to enqueue later.
In order to completely fix, we re-order the calling of cancel_work_sync()
to make sure the work tipc_net_finalize_work() was last queued and it
must be completed by calling cancel_work_sync().
Reported-by: syzbot+47af19f3307fc9c5c82e@syzkaller.appspotmail.com
Fixes: d966ddcc3821 ("tipc: fix a deadlock when flushing scheduled work")
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 21a75f0915dd ("bonding: Fix ARP monitor validation"),
the bonding ARP / ND link monitors depend on the trans_start time to
determine link availability. NETIF_F_LLTX drivers must update trans_start
directly, which veth does not do. This prevents use of the ARP or ND link
monitors with veth interfaces in a bond.
Resolve this by having veth_xmit update the trans_start time.
Reported-by: Jonathan Toppins <jtoppins@redhat.com>
Tested-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Fixes: 21a75f0915dd ("bonding: Fix ARP monitor validation")
Link: https://lore.kernel.org/netdev/b2fd4147-8f50-bebd-963a-1a3e8d1d9715@redhat.com/
Signed-off-by: David S. Miller <davem@davemloft.net>
Even when the eth port is resticted to work with speeds not higher than 1G,
and so the eth driver is requesting the phy (via phylink) to advertise up
to 1000BASET support, the aquantia phy device is still advertising for 2.5G
and 5G speeds.
Clear these advertising defaults when requested.
Cc: Ondrej Spacek <ondrej.spacek@nxp.com>
Fixes: 09c4c57f7bc41 ("net: phy: aquantia: add support for auto-negotiation configuration")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20220610084037.7625-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Olsa says:
====================
hi,
there's bug in kprobe_multi link that makes cookies misplaced when
using symbols to attach. The reason is that we sort symbols by name
but not adjacent cookie values. Current test did not find it because
bpf_fentry_test* are already sorted by name.
v3 changes:
- fixed kprobe_multi bench test to filter out invalid entries
from available_filter_functions
v2 changes:
- rebased on top of bpf/master
- checking if cookies are defined later in swap function [Andrii]
- added acks
thanks,
jirka
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
With [1] the available_filter_functions file contains records
starting with __ftrace_invalid_address___ and marking disabled
entries.
We need to filter them out for the bench test to pass only
resolvable symbols to kernel.
[1] commit b39181f7c690 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function")
Fixes: b39181f7c690 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220615112118.497303-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When user specifies symbols and cookies for kprobe_multi link
interface it's very likely the cookies will be misplaced and
returned to wrong functions (via get_attach_cookie helper).
The reason is that to resolve the provided functions we sort
them before passing them to ftrace_lookup_symbols, but we do
not do the same sort on the cookie values.
Fixing this by using sort_r function with custom swap callback
that swaps cookie values as well.
Fixes: 0236fec57a15 ("bpf: Resolve symbols with ftrace_lookup_symbols for kprobe multi link")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220615112118.497303-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
We want to store the resolved address on the same index as
the symbol string, because that's the user (bpf kprobe link)
code assumption.
Also making sure we don't store duplicates that might be
present in kallsyms.
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Fixes: bed0d9a50dac ("ftrace: Add ftrace_lookup_symbols function")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220615112118.497303-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
There's a kernel bug that causes cookies to be misplaced and
the reason we did not catch this with this test is that we
provide bpf_fentry_test* functions already sorted by name.
Shuffling function bpf_fentry_test2 deeper in the list and
keeping the current cookie values as before will trigger
the bug.
The kernel fix is coming in following changes.
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220615112118.497303-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cover the case when tail call count needs to be passed from BPF function to
BPF function, and the caller has data on stack. Specifically when the size
of data allocated on BPF stack is not a multiple on 8.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220616162037.535469-3-jakub@cloudflare.com
On x86-64 the tail call count is passed from one BPF function to another
through %rax. Additionally, on function entry, the tail call count value
is stored on stack right after the BPF program stack, due to register
shortage.
The stored count is later loaded from stack either when performing a tail
call - to check if we have not reached the tail call limit - or before
calling another BPF function call in order to pass it via %rax.
In the latter case, we miscalculate the offset at which the tail call count
was stored on function entry. The JIT does not take into account that the
allocated BPF program stack is always a multiple of 8 on x86, while the
actual stack depth does not have to be.
This leads to a load from an offset that belongs to the BPF stack, as shown
in the example below:
SEC("tc")
int entry(struct __sk_buff *skb)
{
/* Have data on stack which size is not a multiple of 8 */
volatile char arr[1] = {};
return subprog_tail(skb);
}
int entry(struct __sk_buff * skb):
0: (b4) w2 = 0
1: (73) *(u8 *)(r10 -1) = r2
2: (85) call pc+1#bpf_prog_ce2f79bb5f3e06dd_F
3: (95) exit
int entry(struct __sk_buff * skb):
0xffffffffa0201788: nop DWORD PTR [rax+rax*1+0x0]
0xffffffffa020178d: xor eax,eax
0xffffffffa020178f: push rbp
0xffffffffa0201790: mov rbp,rsp
0xffffffffa0201793: sub rsp,0x8
0xffffffffa020179a: push rax
0xffffffffa020179b: xor esi,esi
0xffffffffa020179d: mov BYTE PTR [rbp-0x1],sil
0xffffffffa02017a1: mov rax,QWORD PTR [rbp-0x9] !!! tail call count
0xffffffffa02017a8: call 0xffffffffa02017d8 !!! is at rbp-0x10
0xffffffffa02017ad: leave
0xffffffffa02017ae: ret
Fix it by rounding up the BPF stack depth to a multiple of 8, when
calculating the tail call count offset on stack.
Fixes: ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220616162037.535469-2-jakub@cloudflare.com
Current release - regressions:
- Revert "net: Add a second bind table hashed by port and address",
needs more work
- amd-xgbe: use platform_irq_count(), static setup of IRQ resources
had been removed from DT core
- dts: at91: ksz9477_evb: add phy-mode to fix port/phy validation
Current release - new code bugs:
- hns3: modify the ring param print info
Previous releases - always broken:
- axienet: make the 64b addressable DMA depends on 64b architectures
- iavf: fix issue with MAC address of VF shown as zero
- ice: fix PTP TX timestamp offset calculation
- usb: ax88179_178a needs FLAG_SEND_ZLP
Misc:
- document some net.sctp.* sysctls
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmKrcx8ACgkQMUZtbf5S
IrsKjA/9Ho+cxnGAvx7ngQepqAU8RQDFy7sQoFHiGqs+jMeph/E81PM2QDBR9g9h
k/s7YRpLGuxWFT7KUJScNl0ZyPgSk5EHcqy202ToYyDQv+srLnh5bgbRykMF2Unc
D4mf63a2pNo9S0L1PmMz87p+XaWIwblqQ0wbl5F97e7eAWel+y7rPCBqR0lZ9Il7
w8rZp6iOVOhD495s1ikqOYUVCntepC9MQIo8iIE/WrREiOWmZNNbV8RzvuHRNQs6
j9eLsukKwTfekQbzR3SXbYxyjwRowAQ3bD5sEL3MuqflsxRpVm5lEqN0AuVlAo3C
IJZFSFqnusC4cSUYVdfWhYlx8om+uw4XKzfqQD/T7yobjoVA/Mmt/Uf7Mw6krR+g
bI+/bpgX7WpLYQNtBFAils5pY36pthN+zg9FuU0v7tNLgC3AmQqA8sRI/fRCVJFV
b1Wmk6Ldj1lCynX0KpzU6XSFGzP2Ht9CYReImiwvZbaABIoM14woHhRPrh8UGWIY
sdpLoR+XRyL/0N1W7l0FgbGm/zOaEbh8fo0ZGYHLukXPUby6osiV36frzjxOj/NO
DqNkPq4ajfWFvcWdqbfRKXwpLyM/Ki2WpQjvaNzDLOL74sDspr8wjnIOOLbuHv/8
NW6tcWwfIu9nkDJOpRedh+O2gj6FKdruobdKVgQd376J0kxWLv0=
=JSV9
-----END PGP SIGNATURE-----
Merge tag 'net-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Mostly driver fixes.
Current release - regressions:
- Revert "net: Add a second bind table hashed by port and address",
needs more work
- amd-xgbe: use platform_irq_count(), static setup of IRQ resources
had been removed from DT core
- dts: at91: ksz9477_evb: add phy-mode to fix port/phy validation
Current release - new code bugs:
- hns3: modify the ring param print info
Previous releases - always broken:
- axienet: make the 64b addressable DMA depends on 64b architectures
- iavf: fix issue with MAC address of VF shown as zero
- ice: fix PTP TX timestamp offset calculation
- usb: ax88179_178a needs FLAG_SEND_ZLP
Misc:
- document some net.sctp.* sysctls"
* tag 'net-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits)
net: axienet: add missing error return code in axienet_probe()
Revert "net: Add a second bind table hashed by port and address"
net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg
net: usb: ax88179_178a needs FLAG_SEND_ZLP
MAINTAINERS: add include/dt-bindings/net to NETWORKING DRIVERS
ARM: dts: at91: ksz9477_evb: fix port/phy validation
net: bgmac: Fix an erroneous kfree() in bgmac_remove()
ice: Fix memory corruption in VF driver
ice: Fix queue config fail handling
ice: Sync VLAN filtering features for DVM
ice: Fix PTP TX timestamp offset calculation
mlxsw: spectrum_cnt: Reorder counter pools
docs: networking: phy: Fix a typo
amd-xgbe: Use platform_irq_count()
octeontx2-vf: Add support for adaptive interrupt coalescing
xilinx: Fix build on x86.
net: axienet: Use iowrite64 to write all 64b descriptor pointers
net: axienet: make the 64b addresable DMA depends on 64b archectures
net: hns3: fix tm port shapping of fibre port is incorrect after driver initialization
net: hns3: fix PF rss size initialization bug
...
It should return error code in error path in axienet_probe().
Fixes: 00be43a74ca2 ("net: axienet: make the 64b addresable DMA depends on 64b archectures")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220616062917.3601-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Two fixes for rc1 PR.
BR, Jarkko
-----BEGIN PGP SIGNATURE-----
iIgEABYIADAWIQRE6pSOnaBC00OEHEIaerohdGur0gUCYqoqzBIcamFya2tvQGtl
cm5lbC5vcmcACgkQGnq6IXRrq9I3ngD/ZnreuHUG82mz8mLwiTx+cLF+4b4rGgBi
iOJUCx38KSYBALhk2ipFWJ7lCBoeokysx9S7bYVEKwQepn9jutMzmyUP
=/OV9
-----END PGP SIGNATURE-----
Merge tag 'tpmdd-next-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm fixes from Jarkko Sakkinen:
"Two fixes for this merge window"
* tag 'tpmdd-next-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
certs: fix and refactor CONFIG_SYSTEM_BLACKLIST_HASH_LIST build
certs/blacklist_hashes.c: fix const confusion in certs blacklist
Commit addf466389d9 ("certs: Check that builtin blacklist hashes are
valid") was applied 8 months after the submission.
In the meantime, the base code had been removed by commit b8c96a6b466c
("certs: simplify $(srctree)/ handling and remove config_filename
macro").
Fix the Makefile.
Create a local copy of $(CONFIG_SYSTEM_BLACKLIST_HASH_LIST). It is
included from certs/blacklist_hashes.c and also works as a timestamp.
Send error messages from check-blacklist-hashes.awk to stderr instead
of stdout.
Fixes: addf466389d9 ("certs: Check that builtin blacklist hashes are valid")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Mickaël Salaün <mic@linux.microsoft.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
This file fails to compile as follows:
CC certs/blacklist_hashes.o
certs/blacklist_hashes.c:4:1: error: ignoring attribute ‘section (".init.data")’ because it conflicts with previous ‘section (".init.rodata")’ [-Werror=attributes]
4 | const char __initdata *const blacklist_hashes[] = {
| ^~~~~
In file included from certs/blacklist_hashes.c:2:
certs/blacklist.h:5:38: note: previous declaration here
5 | extern const char __initconst *const blacklist_hashes[];
| ^~~~~~~~~~~~~~~~
Apply the same fix as commit 2be04df5668d ("certs/blacklist_nohashes.c:
fix const confusion in certs blacklist").
Fixes: 734114f8782f ("KEYS: Add a system blacklist keyring")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Mickaël Salaün <mic@linux.microsoft.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
On processing a module BTF of module built for an older kernel, we might
sometimes find that some type points to itself forming a loop. If such a
type is a modifier, btf_check_type_tags's while loop following modifier
chain will be caught in an infinite loop.
Fix this by defining a maximum chain length and bailing out if we spin
any longer than that.
Fixes: eb596b090558 ("bpf: Ensure type tags precede modifiers in BTF")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220615042151.2266537-1-memxor@gmail.com
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYqmpKwAKCRCRxhvAZXjc
ogvLAQCsgqKYjmqx1s9ta8PXH9qiTWLQh1/s3ONCAvSBe0rYRAD9HPwbUoxguqxr
T2RzjuX2+rqzA5qTErjQqVEftn7DgAo=
=+P6m
-----END PGP SIGNATURE-----
Merge tag 'fs.fixes.v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull vfs idmapping fix from Christian Brauner:
"This fixes an issue where we fail to change the group of a file when
the caller owns the file and is a member of the group to change to.
This is only relevant on idmapped mounts.
There's a detailed description in the commit message and regression
tests have been added to xfstests"
* tag 'fs.fixes.v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
fs: account for group membership
A customer reported a request_socket leak in a Calico cloud environment. We
found that a BPF program was doing a socket lookup with takes a refcnt on
the socket and that it was finding the request_socket but returning the parent
LISTEN socket via sk_to_full_sk() without decrementing the child request socket
1st, resulting in request_sock slab object leak. This patch retains the
existing behaviour of returning full socks to the caller but it also decrements
the child request_socket if one is present before doing so to prevent the leak.
Thanks to Curtis Taylor for all the help in diagnosing and testing this. And
thanks to Antoine Tenart for the reproducer and patch input.
v2 of this patch contains, refactor as per Daniel Borkmann's suggestions to
validate RCU flags on the listen socket so that it balances with bpf_sk_release()
and update comments as per Martin KaFai Lau's suggestion. One small change to
Daniels suggestion, put "sk = sk2" under "if (sk2 != sk)" to avoid an extra
instruction.
Fixes: f7355a6c0497 ("bpf: Check sk_fullsock() before returning from bpf_sk_lookup()")
Fixes: edbf8c01de5a ("bpf: add skc_lookup_tcp helper")
Co-developed-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Curtis Taylor <cutaylor-pub@yahoo.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/56d6f898-bde0-bb25-3427-12a330b29fb8@iogearbox.net
Link: https://lore.kernel.org/bpf/20220615011540.813025-1-jmaxwell37@gmail.com
The skb_recv_datagram() in ax25_recvmsg() will hold lock_sock
and block until it receives a packet from the remote. If the client
doesn`t connect to server and calls read() directly, it will not
receive any packets forever. As a result, the deadlock will happen.
The fail log caused by deadlock is shown below:
[ 369.606973] INFO: task ax25_deadlock:157 blocked for more than 245 seconds.
[ 369.608919] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 369.613058] Call Trace:
[ 369.613315] <TASK>
[ 369.614072] __schedule+0x2f9/0xb20
[ 369.615029] schedule+0x49/0xb0
[ 369.615734] __lock_sock+0x92/0x100
[ 369.616763] ? destroy_sched_domains_rcu+0x20/0x20
[ 369.617941] lock_sock_nested+0x6e/0x70
[ 369.618809] ax25_bind+0xaa/0x210
[ 369.619736] __sys_bind+0xca/0xf0
[ 369.620039] ? do_futex+0xae/0x1b0
[ 369.620387] ? __x64_sys_futex+0x7c/0x1c0
[ 369.620601] ? fpregs_assert_state_consistent+0x19/0x40
[ 369.620613] __x64_sys_bind+0x11/0x20
[ 369.621791] do_syscall_64+0x3b/0x90
[ 369.622423] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 369.623319] RIP: 0033:0x7f43c8aa8af7
[ 369.624301] RSP: 002b:00007f43c8197ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
[ 369.625756] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f43c8aa8af7
[ 369.626724] RDX: 0000000000000010 RSI: 000055768e2021d0 RDI: 0000000000000005
[ 369.628569] RBP: 00007f43c8197f00 R08: 0000000000000011 R09: 00007f43c8198700
[ 369.630208] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff845e6afe
[ 369.632240] R13: 00007fff845e6aff R14: 00007f43c8197fc0 R15: 00007f43c8198700
This patch replaces skb_recv_datagram() with an open-coded variant of it
releasing the socket lock before the __skb_wait_for_more_packets() call
and re-acquiring it after such call in order that other functions that
need socket lock could be executed.
what's more, the socket lock will be released only when recvmsg() will
block and that should produce nicer overall behavior.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Suggested-by: Thomas Osterried <thomas@osterried.de>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reported-by: Thomas Habets <thomas@@habets.se>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The extra byte inserted by usbnet.c when
(length % dev->maxpacket == 0) is causing problems to device.
This patch sets FLAG_SEND_ZLP to avoid this.
Tested with: 0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet
Problems observed:
======================================================================
1) Using ssh/sshfs. The remote sshd daemon can abort with the message:
"message authentication code incorrect"
This happens because the tcp message sent is corrupted during the
USB "Bulk out". The device calculate the tcp checksum and send a
valid tcp message to the remote sshd. Then the encryption detects
the error and aborts.
2) NETDEV WATCHDOG: ... (ax88179_178a): transmit queue 0 timed out
3) Stop normal work without any log message.
The "Bulk in" continue receiving packets normally.
The host sends "Bulk out" and the device responds with -ECONNRESET.
(The netusb.c code tx_complete ignore -ECONNRESET)
Under normal conditions these errors take days to happen and in
intense usage take hours.
A test with ping gives packet loss, showing that something is wrong:
ping -4 -s 462 {destination} # 462 = 512 - 42 - 8
Not all packets fail.
My guess is that the device tries to find another packet starting
at the extra byte and will fail or not depending on the next
bytes (old buffer content).
======================================================================
Signed-off-by: Jose Alonso <joalonsof@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-06-14
This series contains updates to ice driver only.
Michal fixes incorrect Tx timestamp offset calculation for E822 devices.
Roman enforces required VLAN filtering settings for double VLAN mode.
Przemyslaw fixes memory corruption issues with VFs by ensuring
queues are disabled in the error path of VF queue configuration and to
disabled VFs during reset.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Maintainers of the directory Documentation/devicetree/bindings/net
are also the maintainers of the corresponding directory
include/dt-bindings/net.
Add the file entry for include/dt-bindings/net to the appropriate
section in MAINTAINERS.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20220613121826.11484-1-lukas.bulwahn@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Latest drivers version requires phy-mode to be set. Otherwise we will
use "NA" mode and the switch driver will invalidate this port mode.
Fixes: 65ac79e18120 ("net: dsa: microchip: add the phylink get_caps")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20220610081621.584393-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
'bgmac' is part of a managed resource allocated with bgmac_alloc(). It
should not be freed explicitly.
Remove the erroneous kfree() from the .remove() function.
Fixes: 34a5102c3235 ("net: bgmac: allocate struct bgmac just once & don't copy it")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/a026153108dd21239036a032b95c25b5cece253b.1655153616.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit e81fb4198e27 ("netfs: Further cleanups after struct netfs_inode
wrapper introduced") changed the argument types and names, and actually
updated the comment too (although that was thanks to David Howells, not
me: my original patch only changed the code).
But the comment fixup didn't go quite far enough, and didn't change the
argument name in the comment, resulting in
include/linux/netfs.h:314: warning: Function parameter or member 'ctx' not described in 'netfs_inode_init'
include/linux/netfs.h:314: warning: Excess function parameter 'inode' description in 'netfs_inode_init'
during htmldoc generation.
Fixes: e81fb4198e27 ("netfs: Further cleanups after struct netfs_inode wrapper introduced")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Disable VF's RX/TX queues, when VIRTCHNL_OP_CONFIG_VSI_QUEUES fail.
Not disabling them might lead to scenario, where PF driver leaves VF
queues enabled, when VF's VSI failed queue config.
In this scenario VF should not have RX/TX queues enabled. If PF failed
to set up VF's queues, VF will reset due to TX timeouts in VF driver.
Initialize iterator 'i' to -1, so if error happens prior to configuring
queues then error path code will not disable queue 0. Loop that
configures queues will is using same iterator, so error path code will
only disable queues that were configured.
Fixes: 77ca27c41705 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap")
Suggested-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
VLAN filtering features, that is C-Tag and S-Tag, in DVM mode must be
both enabled or disabled.
In case of turning off/on only one of the features, another feature must
be turned off/on automatically with issuing an appropriate message to
the kernel log.
Fixes: 1babaf77f49d ("ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev")
Signed-off-by: Roman Storozhenko <roman.storozhenko@intel.com>
Co-developed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The offset was being incorrectly calculated for E822 - that led to
collisions in choosing TX timestamp register location when more than
one port was trying to use timestamping mechanism.
In E822 one quad is being logically split between ports, so quad 0 is
having trackers for ports 0-3, quad 1 ports 4-7 etc. Each port should
have separate memory location for tracking timestamps. Due to error for
example ports 1 and 2 had been assigned to quad 0 with same offset (0),
while port 1 should have offset 0 and 1 offset 16.
Fix it by correctly calculating quad offset.
Fixes: 3a7496234d17 ("ice: implement basic E822 PTP support")
Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
* Properly reset the SVE/SME flags on vcpu load
* Fix a vgic-v2 regression regarding accessing the pending
state of a HW interrupt from userspace (and make the code
common with vgic-v3)
* Fix access to the idreg range for protected guests
* Ignore 'kvm-arm.mode=protected' when using VHE
* Return an error from kvm_arch_init_vm() on allocation failure
* A bunch of small cleanups (comments, annotations, indentation)
RISC-V:
* Typo fix in arch/riscv/kvm/vmid.c
* Remove broken reference pattern from MAINTAINERS entry
x86-64:
* Fix error in page tables with MKTME enabled
* Dirty page tracking performance test extended to running a nested
guest
* Disable APICv/AVIC in cases that it cannot implement correctly
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmKjTIAUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNhPQgAiIVtp8aepujUM/NhkNyK3SIdLzlS
oZCZiS6bvaecKXi/QvhBU0EBxAEyrovk3lmVuYNd41xI+PDjyaA4SDIl5DnToGUw
bVPNFSYqjpF939vUUKjc0RCdZR4o5g3Od3tvWoHTHviS1a8aAe5o9pcpHpD0D6Mp
Gc/o58nKAOPl3htcFKmjymqo3Y6yvkJU9NB7DCbL8T5mp5pJ959Mw1/LlmBaAzJC
OofrynUm4NjMyAj/mAB1FhHKFyQfjBXLhiVlS0SLiiEA/tn9/OXyVFMKG+n5VkAZ
Q337GMFe2RikEIuMEr3Rc4qbZK3PpxHhaj+6MPRuM0ho/P4yzl2Nyb/OhA==
=h81Q
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"While last week's pull request contained miscellaneous fixes for x86,
this one covers other architectures, selftests changes, and a bigger
series for APIC virtualization bugs that were discovered during 5.20
development. The idea is to base 5.20 development for KVM on top of
this tag.
ARM64:
- Properly reset the SVE/SME flags on vcpu load
- Fix a vgic-v2 regression regarding accessing the pending state of a
HW interrupt from userspace (and make the code common with vgic-v3)
- Fix access to the idreg range for protected guests
- Ignore 'kvm-arm.mode=protected' when using VHE
- Return an error from kvm_arch_init_vm() on allocation failure
- A bunch of small cleanups (comments, annotations, indentation)
RISC-V:
- Typo fix in arch/riscv/kvm/vmid.c
- Remove broken reference pattern from MAINTAINERS entry
x86-64:
- Fix error in page tables with MKTME enabled
- Dirty page tracking performance test extended to running a nested
guest
- Disable APICv/AVIC in cases that it cannot implement correctly"
[ This merge also fixes a misplaced end parenthesis bug introduced in
commit 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC
ID or APIC base") pointed out by Sean Christopherson ]
Link: https://lore.kernel.org/all/20220610191813.371682-1-seanjc@google.com/
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (34 commits)
KVM: selftests: Restrict test region to 48-bit physical addresses when using nested
KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
KVM: selftests: Clean up LIBKVM files in Makefile
KVM: selftests: Link selftests directly with lib object files
KVM: selftests: Drop unnecessary rule for STATIC_LIBS
KVM: selftests: Add a helper to check EPT/VPID capabilities
KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h
KVM: selftests: Refactor nested_map() to specify target level
KVM: selftests: Drop stale function parameter comment for nested_map()
KVM: selftests: Add option to create 2M and 1G EPT mappings
KVM: selftests: Replace x86_page_size with PG_LEVEL_XX
KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE
KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/put
KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blocking
KVM: x86: disable preemption while updating apicv inhibition
KVM: x86: SVM: fix avic_kick_target_vcpus_fast
KVM: x86: SVM: remove avic's broken code that updated APIC ID
KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base
KVM: x86: document AVIC/APICv inhibit reasons
KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRs
...
Two points of potential failure in the generic transmit function are:
1. completion queue (cq) reservation failure.
2. skb allocation failure
Originally the cq reservation was performed first, followed by the skb
allocation. Commit 675716400da6 ("xdp: fix possible cq entry leak")
reversed the order because at the time there was no mechanism available
to undo the cq reservation which could have led to possible cq entry leaks
in the event of skb allocation failure. However if the skb allocation is
performed first and the cq reservation then fails, the xsk skb destructor
is called which blindly adds the skb address to the already full cq leading
to undefined behavior.
This commit restores the original order (cq reservation followed by skb
allocation) and uses the xskq_prod_cancel helper to undo the cq reserve
in event of skb allocation failure.
Fixes: 675716400da6 ("xdp: fix possible cq entry leak")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220614070746.8871-1-ciara.loftus@intel.com
Stale Data.
They are a class of MMIO-related weaknesses which can expose stale data
by propagating it into core fill buffers. Data which can then be leaked
using the usual speculative execution methods.
Mitigations include this set along with microcode updates and are
similar to MDS and TAA vulnerabilities: VERW now clears those buffers
too.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmKXMkMTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoWGPD/idalLIhhV5F2+hZIKm0WSnsBxAOh9K
7y8xBxpQQ5FUfW3vm7Pg3ro6VJp7w2CzKoD4lGXzGHriusn3qst3vkza9Ay8xu8g
RDwKe6hI+p+Il9BV9op3f8FiRLP9bcPMMReW/mRyYsOnJe59hVNwRAL8OG40PY4k
hZgg4Psfvfx8bwiye5efjMSe4fXV7BUCkr601+8kVJoiaoszkux9mqP+cnnB5P3H
zW1d1jx7d6eV1Y063h7WgiNqQRYv0bROZP5BJkufIoOHUXDpd65IRF3bDnCIvSEz
KkMYJNXb3qh7EQeHS53NL+gz2EBQt+Tq1VH256qn6i3mcHs85HvC68gVrAkfVHJE
QLJE3MoXWOqw+mhwzCRrEXN9O1lT/PqDWw8I4M/5KtGG/KnJs+bygmfKBbKjIVg4
2yQWfMmOgQsw3GWCRjgEli7aYbDJQjany0K/qZTq54I41gu+TV8YMccaWcXgDKrm
cXFGUfOg4gBm4IRjJ/RJn+mUv6u+/3sLVqsaFTs9aiib1dpBSSUuMGBh548Ft7g2
5VbFVSDaLjB2BdlcG7enlsmtzw0ltNssmqg7jTK/L7XNVnvxwUoXw+zP7RmCLEYt
UV4FHXraMKNt2ZketlomC8ui2hg73ylUp4pPdMXCp7PIXp9sVamRTbpz12h689VJ
/s55bWxHkR6S
=LBxT
-----END PGP SIGNATURE-----
Merge tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 MMIO stale data fixes from Thomas Gleixner:
"Yet another hw vulnerability with a software mitigation: Processor
MMIO Stale Data.
They are a class of MMIO-related weaknesses which can expose stale
data by propagating it into core fill buffers. Data which can then be
leaked using the usual speculative execution methods.
Mitigations include this set along with microcode updates and are
similar to MDS and TAA vulnerabilities: VERW now clears those buffers
too"
* tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation/mmio: Print SMT warning
KVM: x86/speculation: Disable Fill buffer clear within guests
x86/speculation/mmio: Reuse SRBDS mitigation for SBDS
x86/speculation/srbds: Update SRBDS mitigation selection
x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data
x86/speculation/mmio: Enable CPU Fill buffer clearing on idle
x86/bugs: Group MDS, TAA & Processor MMIO Stale Data mitigations
x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data
x86/speculation: Add a common function for MD_CLEAR mitigation update
x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug
Documentation: Add documentation for Processor MMIO Stale Data