IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
[ Upstream commit 6d47b53fb3f363a74538a1dbd09954af3d8d4131 ]
When failed to create xdp rxqs or fill rx channels in cpsw_ndo_open() for
opening device, napi isn't disabled. When open cpsw device next time, it
will report a invalid opcode issue. Compiled tested only.
Fixes: d354eb85d618 ("drivers: net: cpsw: dual_emac: simplify napi usage")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20221109011537.96975-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f4f4096b410e8d31c3f07f39de3b17d144edd53d ]
The pkt_reformat pointer being saved under flow_act and not
dest attribute in the termination table instance.
Fix the comparison pointers.
Also fix returning success if one pkt_reformat pointer is null
and the other is not.
Fixes: 249ccc3c95bd ("net/mlx5e: Add support for offloading traffic from uplink to uplink")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2808b37b59288ad8f1897e3546c2296df3384b65 ]
For a single CPU system, the kernel thread executing mlx5_cmd_flush()
never releases the CPU but calls down_trylock(&cmd→sem) in a busy loop.
On a single processor system, this leads to a deadlock as the kernel
thread which executes mlx5_cmd_invoke() never gets scheduled. Fix this,
by adding the cond_resched() call to the loop, allow the command
completion kernel thread to execute.
Fixes: 8e715cd613a1 ("net/mlx5: Set command entry semaphore up once got index free")
Signed-off-by: Alexander Schmidt <alexschm@de.ibm.com>
Signed-off-by: Roy Novich <royno@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 15f8f168952f54d3c86d734dc764f20844e423ac ]
Mlx5 LAG is initialized asynchronously on a workqueue which means that for
a brief moment after setting mlx5 UL representors as lower devices of a
bond netdevice the LAG itself is not fully initialized in the driver. When
adding such bond device to a bridge mlx5 bridge code will not consider it
as offload-capable, skip creating necessary bookkeeping and fail any
further bridge offload-related commands with it (setting VLANs, offloading
FDBs, etc.). In order to make the error explicit during bridge
initialization stage implement the code that detects such condition during
NETDEV_PRECHANGEUPPER event and returns an error.
Fixes: ff9b7521468b ("net/mlx5: Bridge, support LAG")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d38a648d2d6cc7bee11c6f533ff9426a00c2a74c ]
ipc_pcie_read_bios_cfg() is using the acpi_evaluate_dsm() to
obtain the wwan power state configuration from BIOS but is
not freeing the acpi_object. The acpi_evaluate_dsm() returned
acpi_object to be freed.
Free the acpi_object after use.
Fixes: 7e98d785ae61 ("net: iosm: entry point")
Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b06334919c7a068d54ba5b219c05e919d89943f7 ]
When failed to enable interrupts in nixge_open() for opening device,
napi isn't disabled. When open nixge device next time, it will reports
a invalid opcode issue. Fix it. Only be compiled, not be tested.
Fixes: 492caffa8a1a ("net: ethernet: nixge: Add support for National Instruments XGE netdev")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20221107101443.120205-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 519b58bbfa825f042fcf80261cc18e1e35f85ffd ]
When prestera_sdma_switch_init() failed, the memory pointed to by
sw->rxtx isn't released. Fix it. Only be compiled, not be tested.
Fixes: 501ef3066c89 ("net: marvell: prestera: Add driver for Prestera family ASIC devices")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Vadym Kochan <vadym.kochan@plvision.eu>
Link: https://lore.kernel.org/r/20221108025607.338450-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 03c1f1ef1584c981935fab2fa0c45d3e43e2c235 ]
syzbot reported a warning like below [1]:
WARNING: CPU: 3 PID: 9 at net/netfilter/nf_tables_api.c:10096 nf_tables_exit_net+0x71c/0x840
Modules linked in:
CPU: 2 PID: 9 Comm: kworker/u8:0 Tainted: G W 6.1.0-rc3-00072-g8e5423e991e8 #47
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
Workqueue: netns cleanup_net
RIP: 0010:nf_tables_exit_net+0x71c/0x840
...
Call Trace:
<TASK>
? __nft_release_table+0xfc0/0xfc0
ops_exit_list+0xb5/0x180
cleanup_net+0x506/0xb10
? unregister_pernet_device+0x80/0x80
process_one_work+0xa38/0x1730
? pwq_dec_nr_in_flight+0x2b0/0x2b0
? rwlock_bug.part.0+0x90/0x90
? _raw_spin_lock_irq+0x46/0x50
worker_thread+0x67e/0x10e0
? process_one_work+0x1730/0x1730
kthread+0x2e5/0x3a0
? kthread_complete_and_exit+0x40/0x40
ret_from_fork+0x1f/0x30
</TASK>
In nf_tables_exit_net(), there is a case where nft_net->commit_list is
empty but nft_net->module_list is not empty. Such a case occurs with
the following scenario:
1. nfnetlink_rcv_batch() is called
2. nf_tables_newset() returns -EAGAIN and NFNL_BATCH_FAILURE bit is
set to status
3. nf_tables_abort() is called with NFNL_ABORT_AUTOLOAD
(nft_net->commit_list is released, but nft_net->module_list is not
because of NFNL_ABORT_AUTOLOAD flag)
4. Jump to replay label
5. netlink_skb_clone() fails and returns from the function (this is
caused by fault injection in the reproducer of syzbot)
This patch fixes this issue by calling __nf_tables_abort() when
nft_net->module_list is not empty in nf_tables_exit_net().
Fixes: eb014de4fd41 ("netfilter: nf_tables: autoload modules from the abort path")
Link: https://syzkaller.appspot.com/bug?id=802aba2422de4218ad0c01b46c9525cc9d4e4aa3 [1]
Reported-by: syzbot+178efee9e2d7f87f5103@syzkaller.appspotmail.com
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 03832a32bf8ff0a8305d94ddd3979835a807248f ]
When type is NFNL_CB_MUTEX and -EAGAIN error occur in nfnetlink_rcv_msg(),
it does not execute nfnl_unlock(). That would trigger potential dead lock.
Fixes: 50f2db9e368f ("netfilter: nfnetlink: consolidate callback types")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 94d957ae513fc420d0a5a9bac815eb49ffebb56f ]
Commit 3af1dfdd51e06697 ("perf build: Move perf_dlfilters.h in the
source tree") moved perf_dlfilters.h to the include/perf/ directory
while include/perf is ignored because it has 'perf' in the name. Newly
created files in the include/perf/ directory will be ignored.
Testing:
Before:
$ touch tools/perf/include/perf/junk
$ git status | grep junk
$ git check-ignore -v tools/perf/include/perf/junk
tools/perf/.gitignore:6:perf tools/perf/include/perf/junk
After:
$ git status | grep junk
tools/perf/include/perf/junk
$ git check-ignore -v tools/perf/include/perf/junk
Add !include/perf/ to perf's .gitignore file.
Fixes: 3af1dfdd51e06697 ("perf build: Move perf_dlfilters.h in the source tree")
Signed-off-by: Donglin Peng <dolinux.peng@gmail.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221103092704.173391-1-dolinux.peng@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ad353b710c7493df3d4fc2d3a51819126bed2e81 ]
'perf stat' with CSV output option prints an extra empty string as first
field in metrics output line. Sample output below:
# ./perf stat -x, --per-socket -a -C 1 ls
S0,1,1.78,msec,cpu-clock,1785146,100.00,0.973,CPUs utilized
S0,1,26,,context-switches,1781750,100.00,0.015,M/sec
S0,1,1,,cpu-migrations,1780526,100.00,0.561,K/sec
S0,1,1,,page-faults,1779060,100.00,0.561,K/sec
S0,1,875807,,cycles,1769826,100.00,0.491,GHz
S0,1,85281,,stalled-cycles-frontend,1767512,100.00,9.74,frontend cycles idle
S0,1,576839,,stalled-cycles-backend,1766260,100.00,65.86,backend cycles idle
S0,1,288430,,instructions,1762246,100.00,0.33,insn per cycle
====> ,S0,1,,,,,,,2.00,stalled cycles per insn
The above command line uses field separator as "," via "-x," option and
per-socket option displays socket value as first field. But here the
last line for "stalled cycles per insn" has "," in the beginning.
Sample output using interval mode:
# ./perf stat -I 1000 -x, --per-socket -a -C 1 ls
0.001813453,S0,1,1.87,msec,cpu-clock,1872052,100.00,0.002,CPUs utilized
0.001813453,S0,1,2,,context-switches,1868028,100.00,1.070,K/sec
------
0.001813453,S0,1,85379,,instructions,1856754,100.00,0.32,insn per cycle
====> 0.001813453,,S0,1,,,,,,,1.34,stalled cycles per insn
Above result also has an extra CSV separator after
the timestamp. Patch addresses extra field separator
in the beginning of the metric output line.
The counter stats are displayed by function
"perf_stat__print_shadow_stats" in code
"util/stat-shadow.c". While printing the stats info
for "stalled cycles per insn", function "new_line_csv"
is used as new_line callback.
The new_line_csv function has check for "os->prefix"
and if prefix is not null, it will be printed along
with cvs separator.
Snippet from "new_line_csv":
if (os->prefix)
fprintf(os->fh, "%s%s", os->prefix, config->csv_sep);
Here os->prefix gets printed followed by ","
which is the cvs separator. The os->prefix is
used in interval mode option ( -I ), to print
time stamp on every new line. But prefix is
already set to contain CSV separator when used
in interval mode for CSV option.
Reference: Function "static void print_interval"
Snippet:
sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, config->csv_sep);
Also if prefix is not assigned (if not used with
-I option), it gets set to empty string.
Reference: function printout() in util/stat-display.c
Snippet:
.prefix = prefix ? prefix : "",
Since prefix already set to contain cvs_sep in interval
option, patch removes printing config->csv_sep in
new_line_csv function to avoid printing extra field.
After the patch:
# ./perf stat -x, --per-socket -a -C 1 ls
S0,1,2.04,msec,cpu-clock,2045202,100.00,1.013,CPUs utilized
S0,1,2,,context-switches,2041444,100.00,979.289,/sec
S0,1,0,,cpu-migrations,2040820,100.00,0.000,/sec
S0,1,2,,page-faults,2040288,100.00,979.289,/sec
S0,1,254589,,cycles,2036066,100.00,0.125,GHz
S0,1,82481,,stalled-cycles-frontend,2032420,100.00,32.40,frontend cycles idle
S0,1,113170,,stalled-cycles-backend,2031722,100.00,44.45,backend cycles idle
S0,1,88766,,instructions,2030942,100.00,0.35,insn per cycle
S0,1,,,,,,,1.27,stalled cycles per insn
Fixes: 92a61f6412d3a09d ("perf stat: Implement CSV metrics output")
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Reviewed-By: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221018085605.63834-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ce9e57feeed81d17d5e80ed86f516ff0d39c3867 ]
When failed to register irq in xgene_enet_open() for opening device,
napi isn't disabled. When open xgene device next time, it will reports
a invalid opcode issue. Fix it. Only be compiled, not be tested.
Fixes: aeb20b6b3f4e ("drivers: net: xgene: fix: ifconfig up/down crash")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20221107043032.357673-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3faf7e14ec0c3462c2d747fa6793b8645d1391df ]
If lapb_register() failed when lapb device goes to up for the first time,
the NAPI is not disabled. As a result, the invalid opcode issue is
reported when the lapb device goes to up for the second time.
The stack info is as follows:
[ 1958.311422][T11356] kernel BUG at net/core/dev.c:6442!
[ 1958.312206][T11356] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[ 1958.315979][T11356] RIP: 0010:napi_enable+0x16a/0x1f0
[ 1958.332310][T11356] Call Trace:
[ 1958.332817][T11356] <TASK>
[ 1958.336135][T11356] lapbeth_open+0x18/0x90
[ 1958.337446][T11356] __dev_open+0x258/0x490
[ 1958.341672][T11356] __dev_change_flags+0x4d4/0x6a0
[ 1958.345325][T11356] dev_change_flags+0x93/0x160
[ 1958.346027][T11356] devinet_ioctl+0x1276/0x1bf0
[ 1958.346738][T11356] inet_ioctl+0x1c8/0x2d0
[ 1958.349638][T11356] sock_ioctl+0x5d1/0x750
[ 1958.356059][T11356] __x64_sys_ioctl+0x3ec/0x1790
[ 1958.365594][T11356] do_syscall_64+0x35/0x80
[ 1958.366239][T11356] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 1958.377381][T11356] </TASK>
Fixes: 514e1150da9c ("net: x25: Queue received packets in the drivers instead of per-CPU queues")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20221107011445.207372-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ac2b9f34f02052709aea7b34bb2a165e1853eb41 ]
If device_register() fails, it should call put_device() to give
up reference, the name allocated in dev_set_name() can be freed
in callback function kobject_cleanup().
Fixes: 5b65781d06ea ("dmaengine: ti: k3-udma-glue: Add support for K3 PKTDMA")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com>
Link: https://lore.kernel.org/r/20221020062827.2914148-1-yangyingliang@huawei.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 081195d17a0c4c636da2b869bd5809d42e8cbb13 ]
A clk_prepare_enable() call in the probe is not balanced by a corresponding
clk_disable_unprepare() in the remove function.
Add the missing call.
Fixes: 3cd2c313f1d6 ("dmaengine: mv_xor_v2: Fix clock resource by adding a register clock")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/e9e3837a680c9bd2438e4db2b83270c6c052d005.1666640987.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b3d726cb8497c6b12106fd617d46eef11763ea86 ]
The first IRQ is required, but IRQs 1 through (nb_phy_chans - 1) are
optional, because on some platforms (e.g. PXA168) there is a single IRQ
shared between all channels.
This change inhibits a flood of "IRQ index # not found" messages at
startup. Tested on a PXA168-based device.
Fixes: 7723f4c5ecdb ("driver core: platform: Add an error message to platform_get_irq*()")
Signed-off-by: Doug Brown <doug@schmorgal.com>
Link: https://lore.kernel.org/r/20220906000709.52705-1-doug@schmorgal.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1c075b192fe41030457cd4a5f7dea730412bca40 ]
This is a follow-up for commit 974cb0e3e7c9 ("tipc: fix uninit-value
in tipc_nl_compat_name_table_dump") where it should have type casted
sizeof(..) to int to work when TLV_GET_DATA_LEN() returns a negative
value.
syzbot reported a call trace because of it:
BUG: KMSAN: uninit-value in ...
tipc_nl_compat_name_table_dump+0x841/0xea0 net/tipc/netlink_compat.c:934
__tipc_nl_compat_dumpit+0xab2/0x1320 net/tipc/netlink_compat.c:238
tipc_nl_compat_dumpit+0x991/0xb50 net/tipc/netlink_compat.c:321
tipc_nl_compat_recv+0xb6e/0x1640 net/tipc/netlink_compat.c:1324
genl_family_rcv_msg_doit net/netlink/genetlink.c:731 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:775 [inline]
genl_rcv_msg+0x103f/0x1260 net/netlink/genetlink.c:792
netlink_rcv_skb+0x3a5/0x6c0 net/netlink/af_netlink.c:2501
genl_rcv+0x3c/0x50 net/netlink/genetlink.c:803
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0xf3b/0x1270 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x1288/0x1440 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg net/socket.c:734 [inline]
Reported-by: syzbot+e5dbaaa238680ce206ea@syzkaller.appspotmail.com
Fixes: 974cb0e3e7c9 ("tipc: fix uninit-value in tipc_nl_compat_name_table_dump")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/ccd6a7ea801b15aec092c3b532a883b4c5708695.1667594933.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ed4314f7729714d788698ade4f9905ee5378ebc0 ]
There are two problems with meson8b_devm_clk_prepare_enable(),
introduced in commit a54dc4a49045 ("net: stmmac: dwmac-meson8b:
Make the clock enabling code re-usable"):
- It doesn't pass the clk argument, but instead always the
rgmii_tx_clk of the device.
- It silently ignores the return value of devm_add_action_or_reset().
The former didn't become an actual bug until another user showed up in
the next commit 9308c47640d5 ("net: stmmac: dwmac-meson8b: add support
for the RX delay configuration"). The latter means the callers could
end up with the clock not actually prepared/enabled.
Fixes: a54dc4a49045 ("net: stmmac: dwmac-meson8b: Make the clock enabling code re-usable")
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Link: https://lore.kernel.org/r/20221104083004.2212520-1-linux@rasmusvillemoes.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8aa59e355949442c408408c2d836e561794c40a1 ]
It causes NULL pointer dereference when testing as following:
(a) use syscall(__NR_socket, 0x10ul, 3ul, 0) to create netlink socket.
(b) use syscall(__NR_sendmsg, ...) to create bond link device and vxcan
link device, and bind vxcan device to bond device (can also use
ifenslave command to bind vxcan device to bond device).
(c) use syscall(__NR_socket, 0x1dul, 3ul, 1) to create CAN socket.
(d) use syscall(__NR_bind, ...) to bind the bond device to CAN socket.
The bond device invokes the can-raw protocol registration interface to
receive CAN packets. However, ml_priv is not allocated to the dev,
dev_rcv_lists is assigned to NULL in can_rx_register(). In this case,
it will occur the NULL pointer dereference issue.
The following is the stack information:
BUG: kernel NULL pointer dereference, address: 0000000000000008
PGD 122a4067 P4D 122a4067 PUD 1223c067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
RIP: 0010:can_rx_register+0x12d/0x1e0
Call Trace:
<TASK>
raw_enable_filters+0x8d/0x120
raw_enable_allfilters+0x3b/0x130
raw_bind+0x118/0x4f0
__sys_bind+0x163/0x1a0
__x64_sys_bind+0x1e/0x30
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
</TASK>
Fixes: 4e096a18867a ("net: introduce CAN specific pointer in the struct net_device")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Marc Kleine-Budde <mkl@pengutronix.de>
Link: https://lore.kernel.org/all/20221028085650.170470-1-shaozhengchao@huawei.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0c175da7b0378445f5ef53904247cfbfb87e0b78 ]
If setsockopt with option name of TCP_REPAIR_OPTIONS and opt_code
of TCPOPT_SACK_PERM is called to enable sack after data is sent
and dupacks are received , it will trigger a warning in function
tcp_verify_left_out() as follows:
============================================
WARNING: CPU: 8 PID: 0 at net/ipv4/tcp_input.c:2132
tcp_timeout_mark_lost+0x154/0x160
tcp_enter_loss+0x2b/0x290
tcp_retransmit_timer+0x50b/0x640
tcp_write_timer_handler+0x1c8/0x340
tcp_write_timer+0xe5/0x140
call_timer_fn+0x3a/0x1b0
__run_timers.part.0+0x1bf/0x2d0
run_timer_softirq+0x43/0xb0
__do_softirq+0xfd/0x373
__irq_exit_rcu+0xf6/0x140
The warning is caused in the following steps:
1. a socket named socketA is created
2. socketA enters repair mode without build a connection
3. socketA calls connect() and its state is changed to TCP_ESTABLISHED
directly
4. socketA leaves repair mode
5. socketA calls sendmsg() to send data, packets_out and sack_outs(dup
ack receives) increase
6. socketA enters repair mode again
7. socketA calls setsockopt with TCPOPT_SACK_PERM to enable sack
8. retransmit timer expires, it calls tcp_timeout_mark_lost(), lost_out
increases
9. sack_outs + lost_out > packets_out triggers since lost_out and
sack_outs increase repeatly
In function tcp_timeout_mark_lost(), tp->sacked_out will be cleared if
Step7 not happen and the warning will not be triggered. As suggested by
Denis and Eric, TCP_REPAIR_OPTIONS should be prohibited if data was
already sent.
socket-tcp tests in CRIU has been tested as follows:
$ sudo ./test/zdtm.py run -t zdtm/static/socket-tcp* --keep-going \
--ignore-taint
socket-tcp* represent all socket-tcp tests in test/zdtm/static/.
Fixes: b139ba4e90dc ("tcp: Repair connection-time negotiated parameters")
Signed-off-by: Lu Wei <luwei32@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cf53db768a8790fdaae2fa3a81322b080285f7e5 ]
A problem about modprobe vc4 failed is triggered with the following log
given:
[ 420.327987] Error: Driver 'vc4_hvs' is already registered, aborting...
[ 420.333904] failed to register platform driver vc4_hvs_driver [vc4]: -16
modprobe: ERROR: could not insert 'vc4': Device or resource busy
The reason is that vc4_drm_register() returns platform_driver_register()
directly without checking its return value, if platform_driver_register()
fails, it returns without unregistering all the vc4 drivers, resulting the
vc4 can never be installed later.
A simple call graph is shown as below:
vc4_drm_register()
platform_register_drivers() # all vc4 drivers are registered
platform_driver_register()
driver_register()
bus_add_driver()
priv = kzalloc(...) # OOM happened
# return without unregister drivers
Fixing this problem by checking the return value of
platform_driver_register() and do platform_unregister_drivers() if
error happened.
Fixes: c8b75bca92cb ("drm/vc4: Add KMS support for Raspberry Pi.")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20221103014705.109322-1-yuancan@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 668205b9c9f94d5ed6ab00cce9a46a654c2b5d16 ]
MHI driver registers network device without setting the
needs_free_netdev flag, and does NOT call free_netdev() when
unregisters network device, which causes a memory leak.
This patch sets needs_free_netdev to true when registers
network device, which makes netdev subsystem call free_netdev()
automatically after unregister_netdevice().
Fixes: aa730a9905b7 ("net: wwan: Add MHI MBIM network driver")
Signed-off-by: HW He <hw.he@mediatek.com>
Signed-off-by: Zhaoping Shu <zhaoping.shu@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f25caaca424703d5a0607310f0452f978f1f78d9 ]
IOSM driver registers network device without setting the
needs_free_netdev flag, and does NOT call free_netdev() when
unregisters network device, which causes a memory leak.
This patch sets needs_free_netdev to true when registers
network device, which makes netdev subsystem call free_netdev()
automatically after unregister_netdevice().
Fixes: 2a54f2c77934 ("net: iosm: net driver")
Signed-off-by: HW He <hw.he@mediatek.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Zhaoping Shu <zhaoping.shu@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 85cbaf032d3cd9f595152625eda5d4ecb1d6d78d ]
When following tests are performed, it will cause dev reference counting
leakage.
a)ip link add bond2 type bond mode balance-rr
b)ip link set bond2 up
c)ifenslave -f bond2 rose1
d)ip link del bond2
When new bond device is created, the default type of the bond device is
ether. And the bond device is up, bpq_device_event() receives the message
and creates a new bpq device. In this case, the reference count value of
dev is hold once. But after "ifenslave -f bond2 rose1" command is
executed, the type of the bond device is changed to rose. When the bond
device is unregistered, bpq_device_event() will not put the dev reference
count.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 531705a765493655472c993627106e19f7e5a6d2 ]
When following tests are performed, it will cause dev reference counting
leakage.
a)ip link add bond2 type bond mode balance-rr
b)ip link set bond2 up
c)ifenslave -f bond2 rose1
d)ip link del bond2
When new bond device is created, the default type of the bond device is
ether. And the bond device is up, lapbeth_device_event() receives the
message and creates a new lapbeth device. In this case, the reference
count value of dev is hold once. But after "ifenslave -f bond2 rose1"
command is executed, the type of the bond device is changed to rose. When
the bond device is unregistered, lapbeth_device_event() will not put the
dev reference count.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6973091d1b50ab4042f6a2d495f59e9db3662ab8 ]
When running under PV, the guest's TOD clock is under control of the
ultravisor and the hypervisor isn't allowed to change it. Hence, don't
allow userspace to change the guest's TOD clock by returning
-EOPNOTSUPP.
When userspace changes the guest's TOD clock, KVM updates its
kvm.arch.epoch field and, in addition, the epoch field in all state
descriptions of all VCPUs.
But, under PV, the ultravisor will ignore the epoch field in the state
description and simply overwrite it on next SIE exit with the actual
guest epoch. This leads to KVM having an incorrect view of the guest's
TOD clock: it has updated its internal kvm.arch.epoch field, but the
ultravisor ignores the field in the state description.
Whenever a guest is now waiting for a clock comparator, KVM will
incorrectly calculate the time when the guest should wake up, possibly
causing the guest to sleep for much longer than expected.
With this change, kvm_s390_set_tod() will now take the kvm->lock to be
able to call kvm_s390_pv_is_protected(). Since kvm_s390_set_tod_clock()
also takes kvm->lock, use __kvm_s390_set_tod_clock() instead.
The function kvm_s390_set_tod_clock is now unused, hence remove it.
Update the documentation to indicate the TOD clock attr calls can now
return -EOPNOTSUPP.
Fixes: 0f3035047140 ("KVM: s390: protvirt: Do only reset registers that are accessible")
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20221011160712.928239-2-nrb@linux.ibm.com
Message-Id: <20221011160712.928239-2-nrb@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 819b885cd886c193782891c4f51bbcab3de119a4 ]
With mt7621 soc_dev_attr fixed to register the soc as a device,
kernel will experience an oops in soc_device_match_attr
This quirk test was introduced in the staging driver in
commit 9445ccb3714c ("staging: mt7621-pci-phy: add quirks for 'E2'
revision using 'soc_device_attribute'"). The staging driver was removed,
and later re-added in commit d87da32372a0 ("phy: ralink: Add PHY driver
for MT7621 PCIe PHY") for kernel 5.11
Link: https://lore.kernel.org/lkml/26ebbed1-0fe9-4af9-8466-65f841d0b382@app.fastmail.com
Fixes: d87da32372a0 ("phy: ralink: Add PHY driver for MT7621 PCIe PHY")
Signed-off-by: John Thomson <git@johnthomson.fastmail.com.au>
Acked-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Link: https://lore.kernel.org/r/20221104205242.3440388-2-git@johnthomson.fastmail.com.au
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 46653972e3ea64f79e7f8ae3aa41a4d3fdb70a13 ]
Shifting signed 32-bit value by 31 bits is undefined, so changing
significant bit to unsigned. The UBSAN warning calltrace like below:
UBSAN: shift-out-of-bounds in security/commoncap.c:1252:2
left shift of 1 by 31 places cannot be represented in type 'int'
Call Trace:
<TASK>
dump_stack_lvl+0x7d/0xa5
dump_stack+0x15/0x1b
ubsan_epilogue+0xe/0x4e
__ubsan_handle_shift_out_of_bounds+0x1e7/0x20c
cap_task_prctl+0x561/0x6f0
security_task_prctl+0x5a/0xb0
__x64_sys_prctl+0x61/0x8f0
do_syscall_64+0x58/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
</TASK>
Fixes: e338d263a76a ("Add 64-bit capability support to the kernel")
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Acked-by: Andrew G. Morgan <morgan@kernel.org>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b7cbc6740bd6ad5d43345a2504f7e4beff0d709f ]
When the mac device gets removed, it leaves behind the ethernet device.
This will result in a segfault next time the ethernet device accesses
mac_dev. Remove the ethernet device when we get removed to prevent
this. This is not completely reversible, since some resources aren't
cleaned up properly, but that can be addressed later.
Fixes: 3933961682a3 ("fsl/fman: Add FMan MAC driver")
Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Link: https://lore.kernel.org/r/20221103182831.2248833-1-sean.anderson@seco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 02597d39145bb0aa81d04bf39b6a913ce9a9d465 ]
In the bnxt_en driver ndo_rx_flow_steer returns '0' whenever an entry
that we are attempting to steer is already found. This is not the
correct behavior. The return code should be the value/index that
corresponds to the entry. Returning zero all the time causes the
RFS records to be incorrect unless entry '0' is the correct one. As
flows migrate to different cores this can create entries that are not
correct.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Reported-by: Akshay Navgire <anavgire@purestorage.com>
Signed-off-by: Alex Barba <alex.barba@broadcom.com>
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 51afe9026d0c63263abe9840e629f118d7405b36 ]
In scenarios where multiple errors have occurred
for a SQ before SW starts handling error interrupt,
SQ_CTX[OP_INT] may get overwritten leading to
NIX_LF_SQ_OP_INT returning incorrect value.
To workaround this read LMT, MNQ and SQ individual
error status registers to determine the cause of error.
Fixes: 4ff7d1488a84 ("octeontx2-pf: Error handling support")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit af3826db74d184bc9c2c9d3ff34548e5f317a6f3 ]
Current driver uses software CQ head pointer to poll on CQE
header in memory to determine if CQE is valid. Software needs
to make sure, that the reads of the CQE do not get re-ordered
so much that it ends up with an inconsistent view of the CQE.
To ensure that DMB barrier after read to first CQE cacheline
and before reading of the rest of the CQE is needed.
But having barrier for every CQE read will impact the performance,
instead use hardware CQ head and tail pointers to find the
valid number of CQEs.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 51afe9026d0c ("octeontx2-pf: NIX TX overwrites SQ_CTX_HW_S[SQ_INT]")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aaab73f8fba4fd38f4d2617440d541a1c334e819 ]
macsec_add_rxsa and macsec_add_txsa copy the key to an on-stack
offloading context to pass it to the drivers, but leaves it there when
it's done. Clear it with memzero_explicit as soon as it's not needed
anymore.
Fixes: 3cf3227a21d1 ("net: macsec: hardware offloading infrastructure")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 80df4706357a5a06bbbc70273bf2611df1ceee04 ]
macsec_is_configured incorrectly uses secy->n_rx_sc to check if some
RXSCs exist. secy->n_rx_sc only counts the number of active RXSCs, but
there can also be inactive SCs as well, which may be stored in the
driver (in case we're disabling offloading), or would have to be
pushed to the device (in case we're trying to enable offloading).
As long as RXSCs active on creation and never turned off, the issue is
not visible.
Fixes: dcb780fb2795 ("net: macsec: add nla support for changing the offloading selection")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 73a4b31c9d11f98ae3bc5286d5382930adb0e9c7 ]
secy->n_rx_sc is supposed to be the number of _active_ rxsc's within a
secy. This is then used by macsec_send_sci to help decide if we should
add the SCI to the header or not.
This logic is currently broken when we create a new RXSC and turn it
off at creation, as create_rx_sc always sets ->active to true (and
immediately uses that to increment n_rx_sc), and only later
macsec_add_rxsc sets rx_sc->active.
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 93a30947821c203d08865c4e17ea181c9668ce52 ]
Currently we get an inconsistent state:
- netlink returns the error to userspace
- the RXSC is installed but not offloaded
Then the device could get confused when we try to add an RXSA, because
the RXSC isn't supposed to exist.
Fixes: 3cf3227a21d1 ("net: macsec: hardware offloading infrastructure")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9e4b7a99a03aefd37ba7bb1f022c8efab5019165 ]
Since commit 3dcbdb134f32 ("net: gso: Fix skb_segment splat when
splitting gso_size mangled skb having linear-headed frag_list"), it is
allowed to change gso_size of a GRO packet. However, that commit assumes
that "checking the first list_skb member suffices; i.e if either of the
list_skb members have non head_frag head, then the first one has too".
It turns out this assumption does not hold. We've seen BUG_ON being hit
in skb_segment when skbs on the frag_list had differing head_frag with
the vmxnet3 driver. This happens because __netdev_alloc_skb and
__napi_alloc_skb can return a skb that is page backed or kmalloced
depending on the requested size. As the result, the last small skb in
the GRO packet can be kmalloced.
There are three different locations where this can be fixed:
(1) We could check head_frag in GRO and not allow GROing skbs with
different head_frag. However, that would lead to performance
regression on normal forward paths with unmodified gso_size, where
!head_frag in the last packet is not a problem.
(2) Set a flag in bpf_skb_net_grow and bpf_skb_net_shrink indicating
that NETIF_F_SG is undesirable. That would need to eat a bit in
sk_buff. Furthermore, that flag can be unset when all skbs on the
frag_list are page backed. To retain good performance,
bpf_skb_net_grow/shrink would have to walk the frag_list.
(3) Walk the frag_list in skb_segment when determining whether
NETIF_F_SG should be cleared. This of course slows things down.
This patch implements (3). To limit the performance impact in
skb_segment, the list is walked only for skbs with SKB_GSO_DODGY set
that have gso_size changed. Normal paths thus will not hit it.
We could check only the last skb but since we need to walk the whole
list anyway, let's stay on the safe side.
Fixes: 3dcbdb134f32 ("net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/e04426a6a91baf4d1081e1b478c82b5de25fdf21.1667407944.git.jbenc@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f1db20814af532f85e091231223e5e4818e8464b ]
Some helper functions will allocate memory. To avoid memory leaks, the
verifier requires the eBPF program to release these memories by calling
the corresponding helper functions.
When a resource is released, all pointer registers corresponding to the
resource should be invalidated. The verifier use release_references() to
do this job, by apply __mark_reg_unknown() to each relevant register.
It will give these registers the type of SCALAR_VALUE. A register that
will contain a pointer value at runtime, but of type SCALAR_VALUE, which
may allow the unprivileged user to get a kernel pointer by storing this
register into a map.
Using __mark_reg_not_init() while NOT allow_ptr_leaks can mitigate this
problem.
Fixes: fd978bf7fd31 ("bpf: Add reference tracking to verifier")
Signed-off-by: Youlin Li <liulin063@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221103093440.3161-1-liulin063@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b239da34203f49c40b5d656220c39647c3ff0b3c ]
For a lot of use cases in future patches, we will want to modify the
state of registers part of some same 'group' (e.g. same ref_obj_id). It
won't just be limited to releasing reference state, but setting a type
flag dynamically based on certain actions, etc.
Hence, we need a way to easily pass a callback to the function that
iterates over all registers in current bpf_verifier_state in all frames
upto (and including) the curframe.
While in C++ we would be able to easily use a lambda to pass state and
the callback together, sadly we aren't using C++ in the kernel. The next
best thing to avoid defining a function for each case seems like
statement expressions in GNU C. The kernel already uses them heavily,
hence they can passed to the macro in the style of a lambda. The
statement expression will then be substituted in the for loop bodies.
Variables __state and __reg are set to current bpf_func_state and reg
for each invocation of the expression inside the passed in verifier
state.
Then, convert mark_ptr_or_null_regs, clear_all_pkt_pointers,
release_reference, find_good_pkt_pointers, find_equal_scalars to
use bpf_for_each_reg_in_vstate.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220904204145.3089-16-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stable-dep-of: f1db20814af5 ("bpf: Fix wrong reg type conversion in release_reference()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8bbabb3fddcd0f858be69ed5abc9b470a239d6f2 ]
Stanislav reported a lockdep warning, which is caused by the
cancel_work_sync() called inside sock_map_close(), as analyzed
below by Jakub:
psock->work.func = sk_psock_backlog()
ACQUIRE psock->work_mutex
sk_psock_handle_skb()
skb_send_sock()
__skb_send_sock()
sendpage_unlocked()
kernel_sendpage()
sock->ops->sendpage = inet_sendpage()
sk->sk_prot->sendpage = tcp_sendpage()
ACQUIRE sk->sk_lock
tcp_sendpage_locked()
RELEASE sk->sk_lock
RELEASE psock->work_mutex
sock_map_close()
ACQUIRE sk->sk_lock
sk_psock_stop()
sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED)
cancel_work_sync()
__cancel_work_timer()
__flush_work()
// wait for psock->work to finish
RELEASE sk->sk_lock
We can move the cancel_work_sync() out of the sock lock protection,
but still before saved_close() was called.
Fixes: 799aa7f98d53 ("skmsg: Avoid lock_sock() in sk_psock_backlog()")
Reported-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20221102043417.279409-1-xiyou.wangcong@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 697fb80a53642be624f5121b6ca9d66769c180e0 ]
syzbot reproduced the bug ...
BUG: sleeping function called from invalid context at kernel/workqueue.c:3010
... with the following stack trace fragment ...
start_flush_work kernel/workqueue.c:3010 [inline]
__flush_work+0x109/0xb10 kernel/workqueue.c:3074
__cancel_work_timer+0x3f9/0x570 kernel/workqueue.c:3162
sk_psock_stop+0x4cb/0x630 net/core/skmsg.c:802
sock_map_destroy+0x333/0x760 net/core/sock_map.c:1581
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1130
__tcp_close+0xd5b/0x12b0 net/ipv4/tcp.c:2897
tcp_close+0x29/0xc0 net/ipv4/tcp.c:2909
... introduced by d8616ee2affc. Do a quick trace of the code path and the
bug is obvious:
inet_csk_destroy_sock(sk)
sk_prot->destroy(sk); <--- sock_map_destroy
sk_psock_stop(, true); <--- true so cancel workqueue
cancel_work_sync() <--- splat, because *_bh_disable()
We can not call cancel_work_sync() from inside destroy path. So mark
the sk_psock_stop call to skip this cancel_work_sync(). This will avoid
the BUG, but means we may run sk_psock_backlog after or during the
destroy op. We zapped the ingress_skb queue in sk_psock_stop (safe to
do with local_bh_disable) so its empty and the sk_psock_backlog work
item will not find any pkts to process here. However, because we are
not going to wait for it or clear its ->state its possible it kicks off
or is already running. This should be 'safe' up until psock drops its
refcnt to psock->sk. The sock_put() that drops this reference is only
done at psock destroy time from sk_psock_destroy(). This is done through
workqueue when sk_psock_drop() is called on psock refnt reaches 0.
And importantly sk_psock_destroy() does a cancel_work_sync(). So trivial
fix works.
I've had hit or miss luck reproducing this caught it once or twice with
the provided reproducer when running with many runners. However, syzkaller
is very good at reproducing so relying on syzkaller to verify fix.
Fixes: d8616ee2affc ("bpf, sockmap: Fix sk->sk_forward_alloc warn_on in sk_stream_kill_queues")
Reported-by: syzbot+140186ceba0c496183bc@syzkaller.appspotmail.com
Suggested-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Yufen <wangyufen@huawei.com>
Link: https://lore.kernel.org/bpf/20220628035803.317876-1-john.fastabend@gmail.com
Stable-dep-of: 8bbabb3fddcd ("bpf, sock_map: Move cancel_work_sync() out of sock lock")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d8616ee2affcff37c5d315310da557a694a3303d ]
During TCP sockmap redirect pressure test, the following warning is triggered:
WARNING: CPU: 3 PID: 2145 at net/core/stream.c:205 sk_stream_kill_queues+0xbc/0xd0
CPU: 3 PID: 2145 Comm: iperf Kdump: loaded Tainted: G W 5.10.0+ #9
Call Trace:
inet_csk_destroy_sock+0x55/0x110
inet_csk_listen_stop+0xbb/0x380
tcp_close+0x41b/0x480
inet_release+0x42/0x80
__sock_release+0x3d/0xa0
sock_close+0x11/0x20
__fput+0x9d/0x240
task_work_run+0x62/0x90
exit_to_user_mode_prepare+0x110/0x120
syscall_exit_to_user_mode+0x27/0x190
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The reason we observed is that:
When the listener is closing, a connection may have completed the three-way
handshake but not accepted, and the client has sent some packets. The child
sks in accept queue release by inet_child_forget()->inet_csk_destroy_sock(),
but psocks of child sks have not released.
To fix, add sock_map_destroy to release psocks.
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220524075311.649153-1-wangyufen@huawei.com
Stable-dep-of: 8bbabb3fddcd ("bpf, sock_map: Move cancel_work_sync() out of sock lock")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b5bcb94b0954a026bbd671741fdb00e7141f9c91 ]
If hid_add_device() returns error, it should call hid_destroy_device()
to free hid_dev which is allocated in hid_allocate_device().
Fixes: 74c4fb058083 ("HID: hv_mouse: Properly add the hid device")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 34de8e6e0e1f66e431abf4123934a2581cb5f133 ]
When using bpftool to pin {PROG, MAP, LINK} without FILE,
segmentation fault will occur. The reson is that the lack
of FILE will cause strlen to trigger NULL pointer dereference.
The corresponding stacktrace is shown below:
do_pin
do_pin_any
do_pin_fd
mount_bpffs_for_pin
strlen(name) <- NULL pointer dereference
Fix it by adding validation to the common process.
Fixes: 75a1e792c335 ("tools: bpftool: Allow all prog/map handles for pinning objects")
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20221102084034.3342995-1-pulehui@huaweicloud.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 30ac96f7cc973bb850c718c9bbe1fdcedfbe826b ]
The TWT Information Frame Disabled bit of control field of TWT Setup
frame shall be set to 1 since handling TWT Information frame is not
supported by current mac80211 implementation.
Fixes: f5a4c24e689f ("mac80211: introduce individual TWT support in AP mode")
Signed-off-by: Howard Hsu <howard-yh.hsu@mediatek.com>
Link: https://lore.kernel.org/r/20221027015653.1448-1-howard-yh.hsu@mediatek.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8ec95b94716a1e4d126edc3fb2bc426a717e2dba ]
When running `test_sockmap` selftests, the following warning appears:
WARNING: CPU: 2 PID: 197 at net/core/stream.c:205 sk_stream_kill_queues+0xd3/0xf0
Call Trace:
<TASK>
inet_csk_destroy_sock+0x55/0x110
tcp_rcv_state_process+0xd28/0x1380
? tcp_v4_do_rcv+0x77/0x2c0
tcp_v4_do_rcv+0x77/0x2c0
__release_sock+0x106/0x130
__tcp_close+0x1a7/0x4e0
tcp_close+0x20/0x70
inet_release+0x3c/0x80
__sock_release+0x3a/0xb0
sock_close+0x14/0x20
__fput+0xa3/0x260
task_work_run+0x59/0xb0
exit_to_user_mode_prepare+0x1b3/0x1c0
syscall_exit_to_user_mode+0x19/0x50
do_syscall_64+0x48/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
The root case is in commit 84472b436e76 ("bpf, sockmap: Fix more uncharged
while msg has more_data"), where I used msg->sg.size to replace the tosend,
causing breakage:
if (msg->apply_bytes && msg->apply_bytes < tosend)
tosend = psock->apply_bytes;
Fixes: 84472b436e76 ("bpf, sockmap: Fix more uncharged while msg has more_data")
Reported-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/1667266296-8794-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>