1059050 Commits

Author SHA1 Message Date
Dan Carpenter
bce3680b48 drm/i915: unpin on error in intel_vgpu_shadow_mm_pin()
[ Upstream commit 3792fc508c095abd84b10ceae12bd773e61fdc36 ]

Call intel_vgpu_unpin_mm() on this error path.

Fixes: 418741480809 ("drm/i915/gvt: Adding ppgtt to GVT GEM context after shadow pdps settled.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/Y3OQ5tgZIVxyQ/WV@kili
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:16 +01:00
Namhyung Kim
da6a3653b8 perf stat: Fix handling of --for-each-cgroup with --bpf-counters to match non BPF mode
[ Upstream commit 54b353a20c7e8be98414754f5aff98c8a68fcc1f ]

The --for-each-cgroup can have the same cgroup multiple times, but this
confuses BPF counters (since they have the same cgroup id), making only
the last cgroup events to be counted.

Let's check the cgroup name before adding a new entry to the cgroups
list.

Before:

  $ sudo ./perf stat -a --bpf-counters --for-each-cgroup /,/ sleep 1

   Performance counter stats for 'system wide':

       <not counted> msec cpu-clock                        /
       <not counted>      context-switches                 /
       <not counted>      cpu-migrations                   /
       <not counted>      page-faults                      /
       <not counted>      cycles                           /
       <not counted>      instructions                     /
       <not counted>      branches                         /
       <not counted>      branch-misses                    /
            8,016.04 msec cpu-clock                        /                #    7.998 CPUs utilized
               6,152      context-switches                 /                #  767.461 /sec
                 250      cpu-migrations                   /                #   31.187 /sec
                 442      page-faults                      /                #   55.139 /sec
         613,111,487      cycles                           /                #    0.076 GHz
         280,599,604      instructions                     /                #    0.46  insn per cycle
          57,692,724      branches                         /                #    7.197 M/sec
           3,385,168      branch-misses                    /                #    5.87% of all branches

         1.002220125 seconds time elapsed

After it becomes similar to the non-BPF mode:

  $ sudo ./perf stat -a --bpf-counters --for-each-cgroup /,/  sleep 1

   Performance counter stats for 'system wide':

            8,013.38 msec cpu-clock                        /                #    7.998 CPUs utilized
               6,859      context-switches                 /                #  855.944 /sec
                 334      cpu-migrations                   /                #   41.680 /sec
                 345      page-faults                      /                #   43.053 /sec
         782,326,119      cycles                           /                #    0.098 GHz
         471,645,724      instructions                     /                #    0.60  insn per cycle
          94,963,430      branches                         /                #   11.851 M/sec
           3,685,511      branch-misses                    /                #    3.88% of all branches

         1.001864539 seconds time elapsed

Committer notes:

As a reminder, to test with BPF counters one has to use BUILD_BPF_SKEL=1
in the make command line and have clang/llvm installed when building
perf, otherwise the --bpf-counters option will not be available:

  # perf stat -a --bpf-counters --for-each-cgroup /,/ sleep 1
  Error: unknown option `bpf-counters'

   Usage: perf stat [<options>] [<command>]

      -a, --all-cpus        system-wide collection from all CPUs
  <SNIP>
  #

Fixes: bb1c15b60b981d10 ("perf stat: Support regex pattern in --for-each-cgroup")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: bpf@vger.kernel.org
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/r/20230104064402.1551516-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:16 +01:00
Szymon Heidrich
11cd4ec635 usb: rndis_host: Secure rndis_query check against int overflow
[ Upstream commit c7dd13805f8b8fc1ce3b6d40f6aff47e66b72ad2 ]

Variables off and len typed as uint32 in rndis_query function
are controlled by incoming RNDIS response message thus their
value may be manipulated. Setting off to a unexpectetly large
value will cause the sum with len and 8 to overflow and pass
the implemented validation step. Consequently the response
pointer will be referring to a location past the expected
buffer boundaries allowing information leakage e.g. via
RNDIS_OID_802_3_PERMANENT_ADDRESS OID.

Fixes: ddda08624013 ("USB: rndis_host, various cleanups")
Signed-off-by: Szymon Heidrich <szymon.heidrich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:16 +01:00
Geetha sowjanya
6ea5273c71 octeontx2-pf: Fix lmtst ID used in aura free
[ Upstream commit 4af1b64f80fbe1275fb02c5f1c0cef099a4a231f ]

Current code uses per_cpu pointer to get the lmtst_id mapped to
the core on which aura_free() is executed. Using per_cpu pointer
without preemption disable causing mismatch between lmtst_id and
core on which pointer gets freed. This patch fixes the issue by
disabling preemption around aura_free.

Fixes: ef6c8da71eaf ("octeontx2-pf: cn10K: Reserve LMTST lines per core")
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:15 +01:00
Daniil Tatianin
4e5f2c74cb drivers/net/bonding/bond_3ad: return when there's no aggregator
[ Upstream commit 9c807965483f42df1d053b7436eedd6cf28ece6f ]

Otherwise we would dereference a NULL aggregator pointer when calling
__set_agg_ports_ready on the line below.

Found by Linux Verification Center (linuxtesting.org) with the SVACE
static analysis tool.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:15 +01:00
Tetsuo Handa
8414983c2e fs/ntfs3: don't hold ni_lock when calling truncate_setsize()
[ Upstream commit 0226635c304cfd5c9db9b78c259cb713819b057e ]

syzbot is reporting hung task at do_user_addr_fault() [1], for there is
a silent deadlock between PG_locked bit and ni_lock lock.

Since filemap_update_page() calls filemap_read_folio() after calling
folio_trylock() which will set PG_locked bit, ntfs_truncate() must not
call truncate_setsize() which will wait for PG_locked bit to be cleared
when holding ni_lock lock.

Link: https://lore.kernel.org/all/00000000000060d41f05f139aa44@google.com/
Link: https://syzkaller.appspot.com/bug?extid=bed15dbf10294aa4f2ae [1]
Reported-by: syzbot <syzbot+bed15dbf10294aa4f2ae@syzkaller.appspotmail.com>
Debugged-by: Linus Torvalds <torvalds@linux-foundation.org>
Co-developed-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 4342306f0f0d ("fs/ntfs3: Add file operations and implementation")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:15 +01:00
Philipp Zabel
a23e8376e6 drm/imx: ipuv3-plane: Fix overlay plane width
[ Upstream commit 92d43bd3bc9728c1fb114d7011d46f5ea9489e28 ]

ipu_src_rect_width() was introduced to support odd screen resolutions
such as 1366x768 by internally rounding up primary plane width to a
multiple of 8 and compensating with reduced horizontal blanking.
This also caused overlay plane width to be rounded up, which was not
intended. Fix overlay plane width by limiting the rounding up to the
primary plane.

drm_rect_width(&new_state->src) >> 16 is the same value as
drm_rect_width(dst) because there is no plane scaling support.

Fixes: 94dfec48fca7 ("drm/imx: Add 8 pixel alignment fix")
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Link: https://lore.kernel.org/r/20221108141420.176696-1-p.zabel@pengutronix.de
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20221108141420.176696-1-p.zabel@pengutronix.de
Tested-by: Ian Ray <ian.ray@ge.com>
(cherry picked from commit 4333472f8d7befe62359fecb1083cd57a6e07bfc)
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:15 +01:00
Miaoqian Lin
a8f7fd322f perf tools: Fix resources leak in perf_data__open_dir()
[ Upstream commit 0a6564ebd953c4590663c9a3c99a3ea9920ade6f ]

In perf_data__open_dir(), opendir() opens the directory stream.  Add
missing closedir() to release it after use.

Fixes: eb6176709b235b96 ("perf data: Add perf_data__open_dir_data function")
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221229090903.1402395-1-linmq006@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:15 +01:00
Jozsef Kadlecsik
a1e1521b46 netfilter: ipset: Rework long task execution when adding/deleting entries
[ Upstream commit 5e29dc36bd5e2166b834ceb19990d9e68a734d7d ]

When adding/deleting large number of elements in one step in ipset, it can
take a reasonable amount of time and can result in soft lockup errors. The
patch 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of
consecutive elements to add/delete") tried to fix it by limiting the max
elements to process at all. However it was not enough, it is still possible
that we get hung tasks. Lowering the limit is not reasonable, so the
approach in this patch is as follows: rely on the method used at resizing
sets and save the state when we reach a smaller internal batch limit,
unlock/lock and proceed from the saved state. Thus we can avoid long
continuous tasks and at the same time removed the limit to add/delete large
number of elements in one step.

The nfnl mutex is held during the whole operation which prevents one to
issue other ipset commands in parallel.

Fixes: 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete")
Reported-by: syzbot+9204e7399656300bf271@syzkaller.appspotmail.com
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:15 +01:00
Jozsef Kadlecsik
6f19a38483 netfilter: ipset: fix hash:net,port,net hang with /0 subnet
[ Upstream commit a31d47be64b9b74f8cfedffe03e0a8a1f9e51f23 ]

The hash:net,port,net set type supports /0 subnets. However, the patch
commit 5f7b51bf09baca8e titled "netfilter: ipset: Limit the maximal range
of consecutive elements to add/delete" did not take into account it and
resulted in an endless loop. The bug is actually older but the patch
5f7b51bf09baca8e brings it out earlier.

Handle /0 subnets properly in hash:net,port,net set types.

Fixes: 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete")
Reported-by: Марк Коренберг <socketpair@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:15 +01:00
Horatiu Vultur
774d259749 net: sparx5: Fix reading of the MAC address
[ Upstream commit 588ab2dc25f60efeb516b4abedb6c551949cc185 ]

There is an issue with the checking of the return value of
'of_get_mac_address', which returns 0 on success and negative value on
failure. The driver interpretated the result the opposite way. Therefore
if there was a MAC address defined in the DT, then the driver was
generating a random MAC address otherwise it would use address 0.
Fix this by checking correctly the return value of 'of_get_mac_address'

Fixes: b74ef9f9cb91 ("net: sparx5: Do not use mac_addr uninitialized in mchp_sparx5_probe()")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:14 +01:00
Jamal Hadi Salim
04dc4003e5 net: sched: cbq: dont intepret cls results when asked to drop
[ Upstream commit caa4b35b4317d5147b3ab0fbdc9c075c7d2e9c12 ]

If asked to drop a packet via TC_ACT_SHOT it is unsafe to assume that
res.class contains a valid pointer

Sample splat reported by Kyle Zeng

[    5.405624] 0: reclassify loop, rule prio 0, protocol 800
[    5.406326] ==================================================================
[    5.407240] BUG: KASAN: slab-out-of-bounds in cbq_enqueue+0x54b/0xea0
[    5.407987] Read of size 1 at addr ffff88800e3122aa by task poc/299
[    5.408731]
[    5.408897] CPU: 0 PID: 299 Comm: poc Not tainted 5.10.155+ #15
[    5.409516] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.15.0-1 04/01/2014
[    5.410439] Call Trace:
[    5.410764]  dump_stack+0x87/0xcd
[    5.411153]  print_address_description+0x7a/0x6b0
[    5.411687]  ? vprintk_func+0xb9/0xc0
[    5.411905]  ? printk+0x76/0x96
[    5.412110]  ? cbq_enqueue+0x54b/0xea0
[    5.412323]  kasan_report+0x17d/0x220
[    5.412591]  ? cbq_enqueue+0x54b/0xea0
[    5.412803]  __asan_report_load1_noabort+0x10/0x20
[    5.413119]  cbq_enqueue+0x54b/0xea0
[    5.413400]  ? __kasan_check_write+0x10/0x20
[    5.413679]  __dev_queue_xmit+0x9c0/0x1db0
[    5.413922]  dev_queue_xmit+0xc/0x10
[    5.414136]  ip_finish_output2+0x8bc/0xcd0
[    5.414436]  __ip_finish_output+0x472/0x7a0
[    5.414692]  ip_finish_output+0x5c/0x190
[    5.414940]  ip_output+0x2d8/0x3c0
[    5.415150]  ? ip_mc_finish_output+0x320/0x320
[    5.415429]  __ip_queue_xmit+0x753/0x1760
[    5.415664]  ip_queue_xmit+0x47/0x60
[    5.415874]  __tcp_transmit_skb+0x1ef9/0x34c0
[    5.416129]  tcp_connect+0x1f5e/0x4cb0
[    5.416347]  tcp_v4_connect+0xc8d/0x18c0
[    5.416577]  __inet_stream_connect+0x1ae/0xb40
[    5.416836]  ? local_bh_enable+0x11/0x20
[    5.417066]  ? lock_sock_nested+0x175/0x1d0
[    5.417309]  inet_stream_connect+0x5d/0x90
[    5.417548]  ? __inet_stream_connect+0xb40/0xb40
[    5.417817]  __sys_connect+0x260/0x2b0
[    5.418037]  __x64_sys_connect+0x76/0x80
[    5.418267]  do_syscall_64+0x31/0x50
[    5.418477]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[    5.418770] RIP: 0033:0x473bb7
[    5.418952] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00
00 00 90 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2a 00 00
00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 18 89 54 24 0c 48 89 34
24 89
[    5.420046] RSP: 002b:00007fffd20eb0f8 EFLAGS: 00000246 ORIG_RAX:
000000000000002a
[    5.420472] RAX: ffffffffffffffda RBX: 00007fffd20eb578 RCX: 0000000000473bb7
[    5.420872] RDX: 0000000000000010 RSI: 00007fffd20eb110 RDI: 0000000000000007
[    5.421271] RBP: 00007fffd20eb150 R08: 0000000000000001 R09: 0000000000000004
[    5.421671] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[    5.422071] R13: 00007fffd20eb568 R14: 00000000004fc740 R15: 0000000000000002
[    5.422471]
[    5.422562] Allocated by task 299:
[    5.422782]  __kasan_kmalloc+0x12d/0x160
[    5.423007]  kasan_kmalloc+0x5/0x10
[    5.423208]  kmem_cache_alloc_trace+0x201/0x2e0
[    5.423492]  tcf_proto_create+0x65/0x290
[    5.423721]  tc_new_tfilter+0x137e/0x1830
[    5.423957]  rtnetlink_rcv_msg+0x730/0x9f0
[    5.424197]  netlink_rcv_skb+0x166/0x300
[    5.424428]  rtnetlink_rcv+0x11/0x20
[    5.424639]  netlink_unicast+0x673/0x860
[    5.424870]  netlink_sendmsg+0x6af/0x9f0
[    5.425100]  __sys_sendto+0x58d/0x5a0
[    5.425315]  __x64_sys_sendto+0xda/0xf0
[    5.425539]  do_syscall_64+0x31/0x50
[    5.425764]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[    5.426065]
[    5.426157] The buggy address belongs to the object at ffff88800e312200
[    5.426157]  which belongs to the cache kmalloc-128 of size 128
[    5.426955] The buggy address is located 42 bytes to the right of
[    5.426955]  128-byte region [ffff88800e312200, ffff88800e312280)
[    5.427688] The buggy address belongs to the page:
[    5.427992] page:000000009875fabc refcount:1 mapcount:0
mapping:0000000000000000 index:0x0 pfn:0xe312
[    5.428562] flags: 0x100000000000200(slab)
[    5.428812] raw: 0100000000000200 dead000000000100 dead000000000122
ffff888007843680
[    5.429325] raw: 0000000000000000 0000000000100010 00000001ffffffff
ffff88800e312401
[    5.429875] page dumped because: kasan: bad access detected
[    5.430214] page->mem_cgroup:ffff88800e312401
[    5.430471]
[    5.430564] Memory state around the buggy address:
[    5.430846]  ffff88800e312180: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[    5.431267]  ffff88800e312200: 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 fc
[    5.431705] >ffff88800e312280: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[    5.432123]                                   ^
[    5.432391]  ffff88800e312300: 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 fc
[    5.432810]  ffff88800e312380: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[    5.433229] ==================================================================
[    5.433648] Disabling lock debugging due to kernel taint

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Kyle Zeng <zengyhkyle@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:14 +01:00
Jamal Hadi Salim
f02327a487 net: sched: atm: dont intepret cls results when asked to drop
[ Upstream commit a2965c7be0522eaa18808684b7b82b248515511b ]

If asked to drop a packet via TC_ACT_SHOT it is unsafe to assume
res.class contains a valid pointer
Fixes: b0188d4dbe5f ("[NET_SCHED]: sch_atm: Lindent")

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:14 +01:00
Miaoqian Lin
95da1882ce gpio: sifive: Fix refcount leak in sifive_gpio_probe
[ Upstream commit 694175cd8a1643cde3acb45c9294bca44a8e08e9 ]

of_irq_find_parent() returns a node pointer with refcount incremented,
We should use of_node_put() on it when not needed anymore.
Add missing of_node_put() to avoid refcount leak.

Fixes: 96868dce644d ("gpio/sifive: Add GPIO driver for SiFive SoCs")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:14 +01:00
Xiubo Li
da9c9883ec ceph: switch to vfs_inode_has_locks() to fix file lock bug
[ Upstream commit 461ab10ef7e6ea9b41a0571a7fc6a72af9549a3c ]

For the POSIX locks they are using the same owner, which is the
thread id. And multiple POSIX locks could be merged into single one,
so when checking whether the 'file' has locks may fail.

For a file where some openers use locking and others don't is a
really odd usage pattern though. Locks are like stoplights -- they
only work if everyone pays attention to them.

Just switch ceph_get_caps() to check whether any locks are set on
the inode. If there are POSIX/OFD/FLOCK locks on the file at the
time, we should set CHECK_FILELOCK, regardless of what fd was used
to set the lock.

Fixes: ff5d913dfc71 ("ceph: return -EIO if read/write against filp that lost file locks")
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:14 +01:00
Jeff Layton
54e72ce5f1 filelock: new helper: vfs_inode_has_locks
[ Upstream commit ab1ddef98a715eddb65309ffa83267e4e84a571e ]

Ceph has a need to know whether a particular inode has any locks set on
it. It's currently tracking that by a num_locks field in its
filp->private_data, but that's problematic as it tries to decrement this
field when releasing locks and that can race with the file being torn
down.

Add a new vfs_inode_has_locks helper that just returns whether any locks
are currently held on the inode.

Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Stable-dep-of: 461ab10ef7e6 ("ceph: switch to vfs_inode_has_locks() to fix file lock bug")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:14 +01:00
Carlo Caione
f34b03ce3a drm/meson: Reduce the FIFO lines held when AFBC is not used
[ Upstream commit 3b754ed6d1cd90017e66e5cc16f3923e4a952ffc ]

Having a bigger number of FIFO lines held after vsync is only useful to
SoCs using AFBC to give time to the AFBC decoder to be reset, configured
and enabled again.

For SoCs not using AFBC this, on the contrary, is causing on some
displays issues and a few pixels vertical offset in the displayed image.

Conditionally increase the number of lines held after vsync only for
SoCs using AFBC, leaving the default value for all the others.

Fixes: 24e0d4058eff ("drm/meson: hold 32 lines after vsync to give time for AFBC start")
Signed-off-by: Carlo Caione <ccaione@baylibre.com>
Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Neil Armstrong <neil.armstrong@linaro.org>
[narmstrong: added fixes tag]
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20221216-afbc_s905x-v1-0-033bebf780d9@baylibre.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:14 +01:00
Maor Gottlieb
05a8410b0f RDMA/mlx5: Fix validation of max_rd_atomic caps for DC
[ Upstream commit 8de8482fe5732fbef4f5af82bc0c0362c804cd1f ]

Currently, when modifying DC, we validate max_rd_atomic user attribute
against the RC cap, validate against DC. RC and DC QP types have different
device limitations.

This can cause userspace created DC QPs to malfunction.

Fixes: c32a4f296e1d ("IB/mlx5: Add support for DC Initiator QP")
Link: https://lore.kernel.org/r/0c5aee72cea188c3bb770f4207cce7abc9b6fc74.1672231736.git.leonro@nvidia.com
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:13 +01:00
Shay Drory
8d89870d63 RDMA/mlx5: Fix mlx5_ib_get_hw_stats when used for device
[ Upstream commit 38b50aa44495d5eb4218f0b82fc2da76505cec53 ]

Currently, when mlx5_ib_get_hw_stats() is used for device (port_num = 0),
there is a special handling in order to use the correct counters, but,
port_num is being passed down the stack without any change.  Also, some
functions assume that port_num >=1. As a result, the following oops can
occur.

 BUG: unable to handle page fault for address: ffff89510294f1a8
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 PGD 0 P4D 0
 Oops: 0002 [#1] SMP
 CPU: 8 PID: 1382 Comm: devlink Tainted: G W          6.1.0-rc4_for_upstream_base_2022_11_10_16_12 #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 RIP: 0010:_raw_spin_lock+0xc/0x20
 Call Trace:
  <TASK>
  mlx5_ib_get_native_port_mdev+0x73/0xe0 [mlx5_ib]
  do_get_hw_stats.constprop.0+0x109/0x160 [mlx5_ib]
  mlx5_ib_get_hw_stats+0xad/0x180 [mlx5_ib]
  ib_setup_device_attrs+0xf0/0x290 [ib_core]
  ib_register_device+0x3bb/0x510 [ib_core]
  ? atomic_notifier_chain_register+0x67/0x80
  __mlx5_ib_add+0x2b/0x80 [mlx5_ib]
  mlx5r_probe+0xb8/0x150 [mlx5_ib]
  ? auxiliary_match_id+0x6a/0x90
  auxiliary_bus_probe+0x3c/0x70
  ? driver_sysfs_add+0x6b/0x90
  really_probe+0xcd/0x380
  __driver_probe_device+0x80/0x170
  driver_probe_device+0x1e/0x90
  __device_attach_driver+0x7d/0x100
  ? driver_allows_async_probing+0x60/0x60
  ? driver_allows_async_probing+0x60/0x60
  bus_for_each_drv+0x7b/0xc0
  __device_attach+0xbc/0x200
  bus_probe_device+0x87/0xa0
  device_add+0x404/0x940
  ? dev_set_name+0x53/0x70
  __auxiliary_device_add+0x43/0x60
  add_adev+0x99/0xe0 [mlx5_core]
  mlx5_attach_device+0xc8/0x120 [mlx5_core]
  mlx5_load_one_devl_locked+0xb2/0xe0 [mlx5_core]
  devlink_reload+0x133/0x250
  devlink_nl_cmd_reload+0x480/0x570
  ? devlink_nl_pre_doit+0x44/0x2b0
  genl_family_rcv_msg_doit.isra.0+0xc2/0x110
  genl_rcv_msg+0x180/0x2b0
  ? devlink_nl_cmd_region_read_dumpit+0x540/0x540
  ? devlink_reload+0x250/0x250
  ? devlink_put+0x50/0x50
  ? genl_family_rcv_msg_doit.isra.0+0x110/0x110
  netlink_rcv_skb+0x54/0x100
  genl_rcv+0x24/0x40
  netlink_unicast+0x1f6/0x2c0
  netlink_sendmsg+0x237/0x490
  sock_sendmsg+0x33/0x40
  __sys_sendto+0x103/0x160
  ? handle_mm_fault+0x10e/0x290
  ? do_user_addr_fault+0x1c0/0x5f0
  __x64_sys_sendto+0x25/0x30
  do_syscall_64+0x3d/0x90
  entry_SYSCALL_64_after_hwframe+0x46/0xb0

Fix it by setting port_num to 1 in order to get device status and remove
unused variable.

Fixes: aac4492ef23a ("IB/mlx5: Update counter implementation for dual port RoCE")
Link: https://lore.kernel.org/r/98b82994c3cd3fa593b8a75ed3f3901e208beb0f.1672231736.git.leonro@nvidia.com
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:13 +01:00
Miaoqian Lin
4d112f0016 net: phy: xgmiitorgmii: Fix refcount leak in xgmiitorgmii_probe
[ Upstream commit d039535850ee47079d59527e96be18d8e0daa84b ]

of_phy_find_device() return device node with refcount incremented.
Call put_device() to relese it when not needed anymore.

Fixes: ab4e6ee578e8 ("net: phy: xgmiitorgmii: Check phy_driver ready before accessing")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:13 +01:00
David Arinzon
e5fbeb3d16 net: ena: Update NUMA TPH hint register upon NUMA node update
[ Upstream commit a8ee104f986e720cea52133885cc822d459398c7 ]

The device supports a PCIe optimization hint, which indicates on
which NUMA the queue is currently processed. This hint is utilized
by PCIe in order to reduce its access time by accessing the
correct NUMA resources and maintaining cache coherence.

The driver calls the register update for the hint (called TPH -
TLP Processing Hint) during the NAPI loop.

Though the update is expected upon a NUMA change (when a queue
is moved from one NUMA to the other), the current logic performs
a register update when the queue is moved to a different CPU,
but the CPU is not necessarily in a different NUMA.

The changes include:
1. Performing the TPH update only when the queue has switched
a NUMA node.
2. Moving the TPH update call to be triggered only when NAPI was
scheduled from interrupt context, as opposed to a busy-polling loop.
This is due to the fact that during busy-polling, the frequency
of CPU switches for a particular queue is significantly higher,
thus, the likelihood to switch NUMA is much higher. Therefore,
providing the frequent updates to the device upon a NUMA update
are unlikely to be beneficial.

Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:13 +01:00
David Arinzon
7840b93cfd net: ena: Set default value for RX interrupt moderation
[ Upstream commit e712f3e4920b3a1a5e6b536827d118e14862896c ]

RX ring can be NULL in XDP use cases where only TX queues
are configured. In this scenario, the RX interrupt moderation
value sent to the device remains in its default value of 0.

In this change, setting the default value of the RX interrupt
moderation to be the same as of the TX.

Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action")
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:13 +01:00
David Arinzon
d09b7a9d2f net: ena: Fix rx_copybreak value update
[ Upstream commit c7062aaee099f2f43d6f07a71744b44b94b94b34 ]

Make the upper bound on rx_copybreak tighter, by
making sure it is smaller than the minimum of mtu and
ENA_PAGE_SIZE. With the current upper bound of mtu,
rx_copybreak can be larger than a page. Such large
rx_copybreak will not bring any performance benefit to
the user and therefore makes no sense.

In addition, the value update was only reflected in
the adapter structure, but not applied for each ring,
causing it to not take effect.

Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Osama Abboud <osamaabb@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:13 +01:00
David Arinzon
0e7ad9b006 net: ena: Use bitmask to indicate packet redirection
[ Upstream commit 59811faa2c54dbcf44d575b5a8f6e7077da88dc2 ]

Redirecting packets with XDP Redirect is done in two phases:
1. A packet is passed by the driver to the kernel using
   xdp_do_redirect().
2. After finishing polling for new packets the driver lets the kernel
   know that it can now process the redirected packet using
   xdp_do_flush_map().
   The packets' redirection is handled in the napi context of the
   queue that called xdp_do_redirect()

To avoid calling xdp_do_flush_map() each time the driver first checks
whether any packets were redirected, using
	xdp_flags |= xdp_verdict;
and
	if (xdp_flags & XDP_REDIRECT)
	    xdp_do_flush_map()

essentially treating XDP instructions as a bitmask, which isn't the case:
    enum xdp_action {
	    XDP_ABORTED = 0,
	    XDP_DROP,
	    XDP_PASS,
	    XDP_TX,
	    XDP_REDIRECT,
    };

Given the current possible values of xdp_action, the current design
doesn't have a bug (since XDP_REDIRECT = 100b), but it is still
flawed.

This patch makes the driver use a bitmask instead, to avoid future
issues.

Fixes: a318c70ad152 ("net: ena: introduce XDP redirect implementation")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:13 +01:00
David Arinzon
5d4964984b net: ena: Account for the number of processed bytes in XDP
[ Upstream commit c7f5e34d906320fdc996afa616676161c029cc02 ]

The size of packets that were forwarded or dropped by XDP wasn't added
to the total processed bytes statistic.

Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:13 +01:00
David Arinzon
f17d9aec07 net: ena: Don't register memory info on XDP exchange
[ Upstream commit 9c9e539956fa67efb8a65e32b72a853740b33445 ]

Since the queues aren't destroyed when we only exchange XDP programs,
there's no need to re-register them again.

Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:12 +01:00
David Arinzon
a4aa727ad0 net: ena: Fix toeplitz initial hash value
[ Upstream commit 332b49ff637d6c1a75b971022a8b992cf3c57db1 ]

On driver initialization, RSS hash initial value is set to zero,
instead of the default value. This happens because we pass NULL as
the RSS key parameter, which caused us to never initialize
the RSS hash value.

This patch fixes it by making sure the initial value is set, no matter
what the value of the RSS key is.

Fixes: 91a65b7d3ed8 ("net: ena: fix potential crash when rxfh key is NULL")
Signed-off-by: Nati Koler <nkoler@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:12 +01:00
Jiguang Xiao
0bec17f1ce net: amd-xgbe: add missed tasklet_kill
[ Upstream commit d530ece70f16f912e1d1bfeea694246ab78b0a4b ]

The driver does not call tasklet_kill in several places.
Add the calls to fix it.

Fixes: 85b85c853401 ("amd-xgbe: Re-issue interrupt if interrupt status not cleared")
Signed-off-by: Jiguang Xiao <jiguang.xiao@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:12 +01:00
Adham Faris
cb2f74685f net/mlx5e: Fix hw mtu initializing at XDP SQ allocation
[ Upstream commit 1e267ab88dc44c48f556218f7b7f14c76f7aa066 ]

Current xdp xmit functions logic (mlx5e_xmit_xdp_frame_mpwqe or
mlx5e_xmit_xdp_frame), validates xdp packet length by comparing it to
hw mtu (configured at xdp sq allocation) before xmiting it. This check
does not account for ethernet fcs length (calculated and filled by the
nic). Hence, when we try sending packets with length > (hw-mtu -
ethernet-fcs-size), the device port drops it and tx_errors_phy is
incremented. Desired behavior is to catch these packets and drop them
by the driver.

Fix this behavior in XDP SQ allocation function (mlx5e_alloc_xdpsq) by
subtracting ethernet FCS header size (4 Bytes) from current hw mtu
value, since ethernet FCS is calculated and written to ethernet frames
by the nic.

Fixes: d8bec2b29a82 ("net/mlx5e: Support bpf_xdp_adjust_head()")
Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:12 +01:00
Chris Mi
6c72abb78b net/mlx5e: Always clear dest encap in neigh-update-del
[ Upstream commit 2951b2e142ecf6e0115df785ba91e91b6da74602 ]

The cited commit introduced a bug for multiple encapsulations flow.
If one dest encap becomes invalid, the flow is set slow path flag.
But when other dests encap become invalid, they are not cleared due
to slow path flag of the flow. When neigh-update-add is running, it
will use invalid encap.

Fix it by checking slow path flag after clearing dest encap.

Fixes: 9a5f9cc794e1 ("net/mlx5e: Fix possible use-after-free deleting fdb rule")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:12 +01:00
Roi Dayan
b36783bc11 net/mlx5e: TC, Refactor mlx5e_tc_add_flow_mod_hdr() to get flow attr
[ Upstream commit ff99316700799b84e842f819a44db608557bae3e ]

In later commit we are going to instantiate multiple attr instances
for flow instead of single attr.
Make sure mlx5e_tc_add_flow_mod_hdr() use the correct attr and not flow->attr.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Stable-dep-of: 2951b2e142ec ("net/mlx5e: Always clear dest encap in neigh-update-del")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:12 +01:00
Dragos Tatulea
f8c10eeba3 net/mlx5e: IPoIB, Don't allow CQE compression to be turned on by default
[ Upstream commit b12d581e83e3ae1080c32ab83f123005bd89a840 ]

mlx5e_build_nic_params will turn CQE compression on if the hardware
capability is enabled and the slow_pci_heuristic condition is detected.
As IPoIB doesn't support CQE compression, make sure to disable the
feature in the IPoIB profile init.

Please note that the feature is not exposed to the user for IPoIB
interfaces, so it can't be subsequently turned on.

Fixes: b797a684b0dd ("net/mlx5e: Enable CQE compression when PCI is slower than link")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:12 +01:00
Shay Drory
7227bbb7c1 net/mlx5: Avoid recovery in probe flows
[ Upstream commit 9078e843efec530f279a155f262793c58b0746bd ]

Currently, recovery is done without considering whether the device is
still in probe flow.
This may lead to recovery before device have finished probed
successfully. e.g.: while mlx5_init_one() is running. Recovery flow is
using functionality that is loaded only by mlx5_init_one(), and there
is no point in running recovery without mlx5_init_one() finished
successfully.

Fix it by waiting for probe flow to finish and checking whether the
device is probed before trying to perform recovery.

Fixes: 51d138c2610a ("net/mlx5: Fix health error state handling")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:11 +01:00
Jiri Pirko
9369b9afa8 net/mlx5: Add forgotten cleanup calls into mlx5_init_once() error path
[ Upstream commit 2a35b2c2e6a252eda2134aae6a756861d9299531 ]

There are two cleanup calls missing in mlx5_init_once() error path.
Add them making the error path flow to be the same as
mlx5_cleanup_once().

Fixes: 52ec462eca9b ("net/mlx5: Add reserved-gids support")
Fixes: 7c39afb394c7 ("net/mlx5: PTP code migration to driver core section")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:11 +01:00
Moshe Shemesh
d966f2ee4b net/mlx5: E-Switch, properly handle ingress tagged packets on VST
[ Upstream commit 1f0ae22ab470946143485a02cc1cd7e05c0f9120 ]

Fix SRIOV VST mode behavior to insert cvlan when a guest tag is already
present in the frame. Previous VST mode behavior was to drop packets or
override existing tag, depending on the device version.

In this patch we fix this behavior by correctly building the HW steering
rule with a push vlan action, or for older devices we ask the FW to stack
the vlan when a vlan is already present.

Fixes: 07bab9502641 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes")
Fixes: dfcb1ed3c331 ("net/mlx5: E-Switch, Vport ingress/egress ACLs rules for VST mode")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:11 +01:00
Stefano Garzarella
6a37a01aba vdpa_sim: fix vringh initialization in vdpasim_queue_ready()
[ Upstream commit 794ec498c9fa79e6bfd71b931410d5897a9c00d4 ]

When we initialize vringh, we should pass the features and the
number of elements in the virtqueue negotiated with the driver,
otherwise operations with vringh may fail.

This was discovered in a case where the driver sets a number of
elements in the virtqueue different from the value returned by
.get_vq_num_max().

In vdpasim_vq_reset() is safe to initialize the vringh with
default values, since the virtqueue will not be used until
vdpasim_queue_ready() is called again.

Fixes: 2c53d0f64c06 ("vdpasim: vDPA device simulator")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20221110141335.62171-1-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:11 +01:00
Stefano Garzarella
e3462410c3 vhost: fix range used in translate_desc()
[ Upstream commit 98047313cdb46828093894d0ac8b1183b8b317f9 ]

vhost_iotlb_itree_first() requires `start` and `last` parameters
to search for a mapping that overlaps the range.

In translate_desc() we cyclically call vhost_iotlb_itree_first(),
incrementing `addr` by the amount already translated, so rightly
we move the `start` parameter passed to vhost_iotlb_itree_first(),
but we should hold the `last` parameter constant.

Let's fix it by saving the `last` parameter value before incrementing
`addr` in the loop.

Fixes: a9709d6874d5 ("vhost: convert pre sorted vhost memory array to interval tree")
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20221109102503.18816-3-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:11 +01:00
Stefano Garzarella
13871f60ec vringh: fix range used in iotlb_translate()
[ Upstream commit f85efa9b0f5381874f727bd98f56787840313f0b ]

vhost_iotlb_itree_first() requires `start` and `last` parameters
to search for a mapping that overlaps the range.

In iotlb_translate() we cyclically call vhost_iotlb_itree_first(),
incrementing `addr` by the amount already translated, so rightly
we move the `start` parameter passed to vhost_iotlb_itree_first(),
but we should hold the `last` parameter constant.

Let's fix it by saving the `last` parameter value before incrementing
`addr` in the loop.

Fixes: 9ad9c49cfe97 ("vringh: IOTLB support")
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20221109102503.18816-2-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:11 +01:00
Yuan Can
e05d4c8c28 vhost/vsock: Fix error handling in vhost_vsock_init()
[ Upstream commit 7a4efe182ca61fb3e5307e69b261c57cbf434cd4 ]

A problem about modprobe vhost_vsock failed is triggered with the
following log given:

modprobe: ERROR: could not insert 'vhost_vsock': Device or resource busy

The reason is that vhost_vsock_init() returns misc_register() directly
without checking its return value, if misc_register() failed, it returns
without calling vsock_core_unregister() on vhost_transport, resulting the
vhost_vsock can never be installed later.
A simple call graph is shown as below:

 vhost_vsock_init()
   vsock_core_register() # register vhost_transport
   misc_register()
     device_create_with_groups()
       device_create_groups_vargs()
         dev = kzalloc(...) # OOM happened
   # return without unregister vhost_transport

Fix by calling vsock_core_unregister() when misc_register() returns error.

Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Message-Id: <20221108101705.45981-1-yuancan@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:11 +01:00
ruanjinjie
586e6fd7d5 vdpa_sim: fix possible memory leak in vdpasim_net_init() and vdpasim_blk_init()
[ Upstream commit aeca7ff254843d49a8739f07f7dab1341450111d ]

Inject fault while probing module, if device_register() fails in
vdpasim_net_init() or vdpasim_blk_init(), but the refcount of kobject is
not decreased to 0, the name allocated in dev_set_name() is leaked.
Fix this by calling put_device(), so that name can be freed in
callback function kobject_cleanup().

(vdpa_sim_net)
unreferenced object 0xffff88807eebc370 (size 16):
  comm "modprobe", pid 3848, jiffies 4362982860 (age 18.153s)
  hex dump (first 16 bytes):
    76 64 70 61 73 69 6d 5f 6e 65 74 00 6b 6b 6b a5  vdpasim_net.kkk.
  backtrace:
    [<ffffffff8174f19e>] __kmalloc_node_track_caller+0x4e/0x150
    [<ffffffff81731d53>] kstrdup+0x33/0x60
    [<ffffffff83a5d421>] kobject_set_name_vargs+0x41/0x110
    [<ffffffff82d87aab>] dev_set_name+0xab/0xe0
    [<ffffffff82d91a23>] device_add+0xe3/0x1a80
    [<ffffffffa0270013>] 0xffffffffa0270013
    [<ffffffff81001c27>] do_one_initcall+0x87/0x2e0
    [<ffffffff813739cb>] do_init_module+0x1ab/0x640
    [<ffffffff81379d20>] load_module+0x5d00/0x77f0
    [<ffffffff8137bc40>] __do_sys_finit_module+0x110/0x1b0
    [<ffffffff83c4d505>] do_syscall_64+0x35/0x80
    [<ffffffff83e0006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

(vdpa_sim_blk)
unreferenced object 0xffff8881070c1250 (size 16):
  comm "modprobe", pid 6844, jiffies 4364069319 (age 17.572s)
  hex dump (first 16 bytes):
    76 64 70 61 73 69 6d 5f 62 6c 6b 00 6b 6b 6b a5  vdpasim_blk.kkk.
  backtrace:
    [<ffffffff8174f19e>] __kmalloc_node_track_caller+0x4e/0x150
    [<ffffffff81731d53>] kstrdup+0x33/0x60
    [<ffffffff83a5d421>] kobject_set_name_vargs+0x41/0x110
    [<ffffffff82d87aab>] dev_set_name+0xab/0xe0
    [<ffffffff82d91a23>] device_add+0xe3/0x1a80
    [<ffffffffa0220013>] 0xffffffffa0220013
    [<ffffffff81001c27>] do_one_initcall+0x87/0x2e0
    [<ffffffff813739cb>] do_init_module+0x1ab/0x640
    [<ffffffff81379d20>] load_module+0x5d00/0x77f0
    [<ffffffff8137bc40>] __do_sys_finit_module+0x110/0x1b0
    [<ffffffff83c4d505>] do_syscall_64+0x35/0x80
    [<ffffffff83e0006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

Fixes: 899c4d187f6a ("vdpa_sim_blk: add support for vdpa management tool")
Fixes: a3c06ae158dd ("vdpa_sim_net: Add support for user supported devices")

Signed-off-by: ruanjinjie <ruanjinjie@huawei.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20221110082348.4105476-1-ruanjinjie@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:10 +01:00
Miaoqian Lin
b63bc2db24 nfc: Fix potential resource leaks
[ Upstream commit df49908f3c52d211aea5e2a14a93bbe67a2cb3af ]

nfc_get_device() take reference for the device, add missing
nfc_put_device() to release it when not need anymore.
Also fix the style warnning by use error EOPNOTSUPP instead of
ENOTSUPP.

Fixes: 5ce3f32b5264 ("NFC: netlink: SE API implementation")
Fixes: 29e76924cf08 ("nfc: netlink: Add capability to reply to vendor_cmd with data")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:10 +01:00
Johnny S. Lee
945e58bdaf net: dsa: mv88e6xxx: depend on PTP conditionally
[ Upstream commit 30e725537546248bddc12eaac2fe0a258917f190 ]

PTP hardware timestamping related objects are not linked when PTP
support for MV88E6xxx (NET_DSA_MV88E6XXX_PTP) is disabled, therefore
NET_DSA_MV88E6XXX should not depend on PTP_1588_CLOCK_OPTIONAL
regardless of NET_DSA_MV88E6XXX_PTP.

Instead, condition more strictly on how NET_DSA_MV88E6XXX_PTP's
dependencies are met, making sure that it cannot be enabled when
NET_DSA_MV88E6XXX=y and PTP_1588_CLOCK=m.

In other words, this commit allows NET_DSA_MV88E6XXX to be built-in
while PTP_1588_CLOCK is a module, as long as NET_DSA_MV88E6XXX_PTP is
prevented from being enabled.

Fixes: e5f31552674e ("ethernet: fix PTP_1588_CLOCK dependencies")
Signed-off-by: Johnny S. Lee <foss@jsl.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:10 +01:00
Daniil Tatianin
95df720e64 qlcnic: prevent ->dcb use-after-free on qlcnic_dcb_enable() failure
[ Upstream commit 13a7c8964afcd8ca43c0b6001ebb0127baa95362 ]

adapter->dcb would get silently freed inside qlcnic_dcb_enable() in
case qlcnic_dcb_attach() would return an error, which always happens
under OOM conditions. This would lead to use-after-free because both
of the existing callers invoke qlcnic_dcb_get_info() on the obtained
pointer, which is potentially freed at that point.

Propagate errors from qlcnic_dcb_enable(), and instead free the dcb
pointer at callsite using qlcnic_dcb_free(). This also removes the now
unused qlcnic_clear_dcb_ops() helper, which was a simple wrapper around
kfree() also causing memory leaks for partially initialized dcb.

Found by Linux Verification Center (linuxtesting.org) with the SVACE
static analysis tool.

Fixes: 3c44bba1d270 ("qlcnic: Disable DCB operations from SR-IOV VFs")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:10 +01:00
Hawkins Jiawei
6c55953e23 net: sched: fix memory leak in tcindex_set_parms
[ Upstream commit 399ab7fe0fa0d846881685fd4e57e9a8ef7559f7 ]

Syzkaller reports a memory leak as follows:
====================================
BUG: memory leak
unreferenced object 0xffff88810c287f00 (size 256):
  comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
    [<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline]
    [<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline]
    [<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline]
    [<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline]
    [<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342
    [<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553
    [<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147
    [<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082
    [<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540
    [<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
    [<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
    [<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
    [<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline]
    [<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734
    [<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482
    [<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
    [<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622
    [<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline]
    [<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline]
    [<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648
    [<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    [<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
    [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
====================================

Kernel uses tcindex_change() to change an existing
filter properties.

Yet the problem is that, during the process of changing,
if `old_r` is retrieved from `p->perfect`, then
kernel uses tcindex_alloc_perfect_hash() to newly
allocate filter results, uses tcindex_filter_result_init()
to clear the old filter result, without destroying
its tcf_exts structure, which triggers the above memory leak.

To be more specific, there are only two source for the `old_r`,
according to the tcindex_lookup(). `old_r` is retrieved from
`p->perfect`, or `old_r` is retrieved from `p->h`.

  * If `old_r` is retrieved from `p->perfect`, kernel uses
tcindex_alloc_perfect_hash() to newly allocate the
filter results. Then `r` is assigned with `cp->perfect + handle`,
which is newly allocated. So condition `old_r && old_r != r` is
true in this situation, and kernel uses tcindex_filter_result_init()
to clear the old filter result, without destroying
its tcf_exts structure

  * If `old_r` is retrieved from `p->h`, then `p->perfect` is NULL
according to the tcindex_lookup(). Considering that `cp->h`
is directly copied from `p->h` and `p->perfect` is NULL,
`r` is assigned with `tcindex_lookup(cp, handle)`, whose value
should be the same as `old_r`, so condition `old_r && old_r != r`
is false in this situation, kernel ignores using
tcindex_filter_result_init() to clear the old filter result.

So only when `old_r` is retrieved from `p->perfect` does kernel use
tcindex_filter_result_init() to clear the old filter result, which
triggers the above memory leak.

Considering that there already exists a tc_filter_wq workqueue
to destroy the old tcindex_data by tcindex_partial_destroy_work()
at the end of tcindex_set_parms(), this patch solves
this memory leak bug by removing this old filter result
clearing part and delegating it to the tc_filter_wq workqueue.

Note that this patch doesn't introduce any other issues. If
`old_r` is retrieved from `p->perfect`, this patch just
delegates old filter result clearing part to the
tc_filter_wq workqueue; If `old_r` is retrieved from `p->h`,
kernel doesn't reach the old filter result clearing part, so
removing this part has no effect.

[Thanks to the suggestion from Jakub Kicinski, Cong Wang, Paolo Abeni
and Dmitry Vyukov]

Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()")
Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/
Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
Cc: Cong Wang <cong.wang@bytedance.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:10 +01:00
Jian Shen
d14a4b24d5 net: hns3: fix VF promisc mode not update when mac table full
[ Upstream commit 8ee57c7b8406c7aa8ca31e014440c87c6383f429 ]

Currently, it missed set HCLGE_VPORT_STATE_PROMISC_CHANGE
flag for VF when vport->overflow_promisc_flags changed.
So the VF won't check whether to update promisc mode in
this case. So add it.

Fixes: 1e6e76101fd9 ("net: hns3: configure promisc mode for VF asynchronously")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:10 +01:00
Jian Shen
7ed205b947 net: hns3: fix miss L3E checking for rx packet
[ Upstream commit 7d89b53cea1a702f97117fb4361523519bb1e52c ]

For device supports RXD advanced layout, the driver will
return directly if the hardware finish the checksum
calculate. It cause missing L3E checking for ip packets.
Fixes it.

Fixes: 1ddc028ac849 ("net: hns3: refactor out RX completion checksum")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:10 +01:00
Peng Li
47868cb77f net: hns3: extract macro to simplify ring stats update code
[ Upstream commit e6d72f6ac2ad4965491354d74b48e35a60abf298 ]

As the code to update ring stats is alike for different ring stats
type, this patch extract macro to simplify ring stats update code.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 7d89b53cea1a ("net: hns3: fix miss L3E checking for rx packet")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:09 +01:00
Hao Chen
7457c5a776 net: hns3: refactor hns3_nic_reuse_page()
[ Upstream commit e74a726da2c4dcedb8b0631f423d0044c7901a20 ]

Split rx copybreak handle into a separate function from function
hns3_nic_reuse_page() to improve code simplicity.

Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 7d89b53cea1a ("net: hns3: fix miss L3E checking for rx packet")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:09 +01:00
Jie Wang
4a6e9fb534 net: hns3: add interrupts re-initialization while doing VF FLR
[ Upstream commit 09e6b30eeb254f1818a008cace3547159e908dfd ]

Currently keep alive message between PF and VF may be lost and the VF is
unalive in PF. So the VF will not do reset during PF FLR reset process.
This would make the allocated interrupt resources of VF invalid and VF
would't receive or respond to PF any more.

So this patch adds VF interrupts re-initialization during VF FLR for VF
recovery in above cases.

Fixes: 862d969a3a4d ("net: hns3: do VF's pci re-initialization while PF doing FLR")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:09 +01:00
Jeff Layton
5e48ed805c nfsd: shut down the NFSv4 state objects before the filecache
[ Upstream commit 789e1e10f214c00ca18fc6610824c5b9876ba5f2 ]

Currently, we shut down the filecache before trying to clean up the
stateids that depend on it. This leads to the kernel trying to free an
nfsd_file twice, and a refcount overput on the nf_mark.

Change the shutdown procedure to tear down all of the stateids prior
to shutting down the filecache.

Reported-and-tested-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Fixes: 5e113224c17e ("nfsd: nfsd_file cache entries should be per net namespace")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12 11:59:09 +01:00