1136468 Commits

Author SHA1 Message Date
Daniel Wagner
94f5a06884 nvmet: fix invalid memory reference in nvmet_subsys_attr_qid_max_show
The item passed into nvmet_subsys_attr_qid_max_show is not a member of
struct nvmet_port, it is part of nvmet_subsys.  Hence, don't try to
dereference it as struct nvme_ctrl pointer.

Fixes: 3e980f5995e0 ("nvmet: Expose max queues to configfs")
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20220913064203.133536-1-dwagner@suse.de
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-10-19 12:43:13 +02:00
Sagi Grimberg
ddd2b8de9f nvmet: fix workqueue MEM_RECLAIM flushing dependency
The keep alive timer needs to stay on nvmet_wq, and not
modified to reschedule on the system_wq.

This fixes a warning:
------------[ cut here ]------------
workqueue: WQ_MEM_RECLAIM
nvmet-wq:nvmet_rdma_release_queue_work [nvmet_rdma] is flushing
!WQ_MEM_RECLAIM events:nvmet_keep_alive_timer [nvmet]
WARNING: CPU: 3 PID: 1086 at kernel/workqueue.c:2628
check_flush_dependency+0x16c/0x1e0

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: 8832cf922151 ("nvmet: use a private workqueue instead of the system workqueue")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-10-19 12:43:13 +02:00
Serge Semin
c94b7f9bab nvme-hwmon: kmalloc the NVME SMART log buffer
Recent commit 52fde2c07da6 ("nvme: set dma alignment to dword") has
caused a regression on our platform.

It turned out that the nvme_get_log() method invocation caused the
nvme_hwmon_data structure instance corruption.  In particular the
nvme_hwmon_data.ctrl pointer was overwritten either with zeros or with
garbage.  After some research we discovered that the problem happened
even before the actual NVME DMA execution, but during the buffer mapping.
Since our platform is DMA-noncoherent, the mapping implied the cache-line
invalidations or write-backs depending on the DMA-direction parameter.
In case of the NVME SMART log getting the DMA was performed
from-device-to-memory, thus the cache-invalidation was activated during
the buffer mapping.  Since the log-buffer isn't cache-line aligned, the
cache-invalidation caused the neighbour data to be discarded.  The
neighbouring data turned to be the data surrounding the buffer in the
framework of the nvme_hwmon_data structure.

In order to fix that we need to make sure that the whole log-buffer is
defined within the cache-line-aligned memory region so the
cache-invalidation procedure wouldn't involve the adjacent data. One of
the option to guarantee that is to kmalloc the DMA-buffer [1]. Seeing the
rest of the NVME core driver prefer that method it has been chosen to fix
this problem too.

Note after a deeper researches we found out that the denoted commit wasn't
a root cause of the problem. It just revealed the invalidity by activating
the DMA-based NVME SMART log getting performed in the framework of the
NVME hwmon driver. The problem was here since the initial commit of the
driver.

[1] Documentation/core-api/dma-api-howto.rst

Fixes: 400b6a7b13a3 ("nvme: Add hardware monitoring support")
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-10-19 12:43:13 +02:00
Christoph Hellwig
6b8cf94005 nvme-hwmon: consistently ignore errors from nvme_hwmon_init
An NVMe controller works perfectly fine even when the hwmon
initialization fails.  Stop returning errors that do not come from a
controller reset from nvme_hwmon_init to handle this case consistently.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
2022-10-19 12:42:58 +02:00
Christian König
7b476affcc drm/sched: add DRM_SCHED_FENCE_DONT_PIPELINE flag
Setting this flag on a scheduler fence prevents pipelining of jobs
depending on this fence. In other words we always insert a full CPU
round trip before dependent jobs are pushed to the pipeline.

Signed-off-by: Christian König <christian.koenig@amd.com>
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2113#note_1579296
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014081553.114899-1-christian.koenig@amd.com
2022-10-19 12:42:51 +02:00
Christoph Hellwig
6ff5ba9796 nvme: add Guenther as nvme-hwmon maintainer
Given that non of the overall NVMe maintainers knows this code very
deeply it probably makes sense to add Guenther as an additional
MAINTAINER for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Guenter Roeck <linux@roeck-us.net>
2022-10-19 12:42:43 +02:00
Russell King (Oracle)
d622f8477a nvme-apple: don't limit DMA segement size
NVMe uses PRPs for data transfers and has no specific limit for a single
DMA segement.  Limiting the size will cause problems because the block
layer assumes PRP-ish devices using a virt boundary mask don't have a
segment limit.  And while this is true, we also really need to tell the
DMA mapping layer about it, otherwise dma-debug will trip over it.

Fixes: 5bd2927aceba ("nvme-apple: Add initial Apple SoC NVMe driver")
Suggested-by: Sven Peter <sven@svenpeter.dev>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
[hch: rewrote the commit message based on the PCIe commit]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Curtin <ecurtin@redhat.com>
Reviewed-by: Sven Peter <sven@svenpeter.dev>
2022-10-19 10:36:39 +02:00
Xander Li
ac9b57d4e1 nvme-pci: disable write zeroes on various Kingston SSD
Kingston SSDs do support NVMe Write_Zeroes cmd but take long time to
process.  The firmware version is locked by these SSDs, we can not expect
firmware improvement, so disable Write_Zeroes cmd.

Signed-off-by: Xander Li <xander_li@kingston.com.tw>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-10-19 10:36:39 +02:00
Dan Carpenter
4739824e2d nvme: fix error pointer dereference in error handling
There is typo here so it releases the wrong variable.  "ctrl->admin_q"
was intended instead of "ctrl->fabrics_q".

Fixes: fe60e8c53411 ("nvme: add common helpers to allocate and free tagsets")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-10-19 10:36:39 +02:00
Pablo Neira Ayuso
96df8360db netfilter: nf_tables: relax NFTA_SET_ELEM_KEY_END set flags requirements
Otherwise EINVAL is bogusly reported to userspace when deleting a set
element. NFTA_SET_ELEM_KEY_END does not need to be set in case of:

- insertion: if not present, start key is used as end key.
- deletion: only start key needs to be specified, end key is ignored.

Hence, relax the sanity check.

Fixes: 88cccd908d51 ("netfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flags")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-10-19 08:46:48 +02:00
Guillaume Nault
1fcc064b30 netfilter: rpfilter/fib: Set ->flowic_uid correctly for user namespaces.
Currently netfilter's rpfilter and fib modules implicitely initialise
->flowic_uid with 0. This is normally the root UID. However, this isn't
the case in user namespaces, where user ID 0 is mapped to a different
kernel UID. By initialising ->flowic_uid with sock_net_uid(), we get
the root UID of the user namespace, thus keeping the same behaviour
whether or not we're running in a user namepspace.

Note, this is similar to commit 8bcfd0925ef1 ("ipv4: add missing
initialization for flowi4_uid"), which fixed the rp_filter sysctl.

Fixes: 622ec2c9d524 ("net: core: add UID to flows, rules, and routes")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-10-19 08:46:48 +02:00
Brett Creeley
aa1d7e1267 ionic: catch NULL pointer issue on reconfig
It's possible that the driver will dereference a qcq that doesn't exist
when calling ionic_reconfigure_queues(), which causes a page fault BUG.

If a reduction in the number of queues is followed by a different
reconfig such as changing the ring size, the driver can hit a NULL
pointer when trying to clean up non-existent queues.

Fix this by checking to make sure both the qcqs array and qcq entry
exists bofore trying to use and free the entry.

Fixes: 101b40a0171f ("ionic: change queue count with no reset")
Signed-off-by: Brett Creeley <brett@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Link: https://lore.kernel.org/r/20221017233123.15869-1-snelson@pensando.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-18 19:19:31 -07:00
Eric Dumazet
d8b57135fd net: hsr: avoid possible NULL deref in skb_clone()
syzbot got a crash [1] in skb_clone(), caused by a bug
in hsr_get_untagged_frame().

When/if create_stripped_skb_hsr() returns NULL, we must
not attempt to call skb_clone().

While we are at it, replace a WARN_ONCE() by netdev_warn_once().

[1]
general protection fault, probably for non-canonical address 0xdffffc000000000f: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000078-0x000000000000007f]
CPU: 1 PID: 754 Comm: syz-executor.0 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
RIP: 0010:skb_clone+0x108/0x3c0 net/core/skbuff.c:1641
Code: 93 02 00 00 49 83 7c 24 28 00 0f 85 e9 00 00 00 e8 5d 4a 29 fa 4c 8d 75 7e 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <0f> b6 04 02 4c 89 f2 83 e2 07 38 d0 7f 08 84 c0 0f 85 9e 01 00 00
RSP: 0018:ffffc90003ccf4e0 EFLAGS: 00010207

RAX: dffffc0000000000 RBX: ffffc90003ccf5f8 RCX: ffffc9000c24b000
RDX: 000000000000000f RSI: ffffffff8751cb13 RDI: 0000000000000000
RBP: 0000000000000000 R08: 00000000000000f0 R09: 0000000000000140
R10: fffffbfff181d972 R11: 0000000000000000 R12: ffff888161fc3640
R13: 0000000000000a20 R14: 000000000000007e R15: ffffffff8dc5f620
FS: 00007feb621e4700(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007feb621e3ff8 CR3: 00000001643a9000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
hsr_get_untagged_frame+0x4e/0x610 net/hsr/hsr_forward.c:164
hsr_forward_do net/hsr/hsr_forward.c:461 [inline]
hsr_forward_skb+0xcca/0x1d50 net/hsr/hsr_forward.c:623
hsr_handle_frame+0x588/0x7c0 net/hsr/hsr_slave.c:69
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
tun_rx_batched+0x4ab/0x7a0 drivers/net/tun.c:1544
tun_get_user+0x2686/0x3a00 drivers/net/tun.c:1995
tun_chr_write_iter+0xdb/0x200 drivers/net/tun.c:2025
call_write_iter include/linux/fs.h:2187 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x9e9/0xdd0 fs/read_write.c:584
ksys_write+0x127/0x250 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: f266a683a480 ("net/hsr: Better frame dispatch")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20221017165928.2150130-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-18 19:18:27 -07:00
Vikas Gupta
ba077d683d bnxt_en: fix memory leak in bnxt_nvm_test()
Free the kzalloc'ed buffer before returning in the success path.

Fixes: 5b6ff128fdf6 ("bnxt_en: implement callbacks for devlink selftests")
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/1666020742-25834-1-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-18 19:18:02 -07:00
Arunpravin Paneer Selvam
8273b40486 drm/amdgpu: Fix for BO move issue
A user reported a bug on CAPE VERDE system where uvd_v3_1
IP component failed to initialize as there is an issue with
BO move code from one memory to other.

In function amdgpu_mem_visible() called by amdgpu_bo_move(),
when there are no blocks to compare or if we have a single
block then break the loop.

Fixes: 312b4dc11d4f ("drm/amdgpu: Fix VRAM BO swap issue")
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:14:07 -04:00
YuBiao Wang
2abe92c7ad drm/amdgpu: dequeue mes scheduler during fini
[Why]
If mes is not dequeued during fini, mes will be in an uncleaned state
during reload, then mes couldn't receive some commands which leads to
reload failure.

[How]
Perform MES dequeue via MMIO after all the unmap jobs are done by mes
and before kiq fini.

v2: Move the dequeue operation inside kiq_hw_fini.

Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:13:59 -04:00
Kenneth Feng
5ce4726a13 drm/amd/pm: enable thermal alert on smu_v13_0_10
enable thermal alert on smu_v13_0_10

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:13:34 -04:00
Yifan Zha
97a3d6090f drm/amdgpu: Program GC registers through RLCG interface in gfx_v11/gmc_v11
[Why]
L1 blocks most of GC registers accessing by MMIO.

[How]
Use RLCG interface to program GC registers under SRIOV VF in full access time.

Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:13:24 -04:00
Nathan Chancellor
e688ba3e27 drm/amdkfd: Fix type of reset_type parameter in hqd_destroy() callback
When booting a kernel compiled with CONFIG_CFI_CLANG on a machine with
an RX 6700 XT, there is a CFI failure in kfd_destroy_mqd_cp():

  [   12.894543] CFI failure at kfd_destroy_mqd_cp+0x2a/0x40 [amdgpu] (target: hqd_destroy_v10_3+0x0/0x260 [amdgpu]; expected type: 0x8594d794)

Clang's kernel Control Flow Integrity (kCFI) makes sure that all
indirect call targets have a type that exactly matches the function
pointer prototype. In this case, hqd_destroy()'s third parameter,
reset_type, should have a type of 'uint32_t' but every implementation of
this callback has a third parameter type of 'enum kfd_preempt_type'.

Update the function pointer prototype to match reality so that there is
no more CFI violation.

Link: https://github.com/ClangBuiltLinux/linux/issues/1738
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:13:12 -04:00
Guenter Roeck
8a70b2d89e drm/amd/display: Increase frame size limit for display_mode_vba_util_32.o
Building 32-bit images may fail with the following error.

drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_util_32.c:
	In function ‘dml32_UseMinimumDCFCLK’:
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_util_32.c:3142:1:
	error: the frame size of 1096 bytes is larger than 1024 bytes

This is seen when building i386:allmodconfig with any of the following
compilers.

	gcc (Debian 12.2.0-3) 12.2.0
	gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

The problem is not seen if the compiler supports GCC_PLUGIN_LATENT_ENTROPY
because in that case CONFIG_FRAME_WARN is already set to 2048 even for
32-bit builds.

dml32_UseMinimumDCFCLK() was introduced with commit dda4fb85e433
("drm/amd/display: DML changes for DCN32/321"). It declares a large
number of local variables. Increase the frame size for the affected
file to 2048, similar to other files in the same directory, to enable
32-bit build tests with affected compilers.

Fixes: dda4fb85e433 ("drm/amd/display: DML changes for DCN32/321")
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reported-by: Łukasz Bartosik <ukaszb@google.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:13:03 -04:00
Tim Huang
31c261a7ff drm/amd/pm: add SMU IP v13.0.4 IF version define to V7
The pmfw has changed the driver interface version, so keep same with the
fw.

Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:12:51 -04:00
Tim Huang
853fdb4916 drm/amd/pm: update SMU IP v13.0.4 driver interface version
Update the SMU driver interface version to V7.

Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:12:44 -04:00
ZhenGuo Yin
5fa993737b drm/amd/pm: Init pm_attr_list when dpm is disabled
[Why]
In SRIOV multi-vf, dpm is always disabled, and pm_attr_list won't
be initialized. There will be a NULL pointer call trace after
removing the dpm check condition in amdgpu_pm_sysfs_fini.
BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:amdgpu_device_attr_remove_groups+0x20/0x90 [amdgpu]
Call Trace:
  <TASK>
  amdgpu_pm_sysfs_fini+0x2f/0x40 [amdgpu]
  amdgpu_device_fini_hw+0xdf/0x290 [amdgpu]

[How]
List pm_attr_list should be initialized when dpm is disabled.

Fixes: a6ad27cec585fe ("drm/amd/pm: Remove redundant check condition")
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:12:36 -04:00
Evan Quan
3059cd8c5f drm/amd/pm: disable cstate feature for gpu reset scenario
Suggested by PMFW team and same as what did for gfxoff feature.
This can address some Mode1Reset failures observed on SMU13.0.0.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:12:20 -04:00
Evan Quan
ba2f09960e drm/amd/pm: fulfill SMU13.0.7 cstate control interface
Fulfill the functionality for cstate control.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:12:14 -04:00
Evan Quan
528c0e66e0 drm/amd/pm: fulfill SMU13.0.0 cstate control interface
Fulfill the functionality for cstate control.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:12:01 -04:00
YiPeng Chai
3bd026c3e3 drm/amdgpu: Add sriov vf ras support in amdgpu_ras_asic_supported
V2:
Add sriov vf ras support in amdgpu_ras_asic_supported.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:11:48 -04:00
YiPeng Chai
001ebcf5b9 drm/amdgpu: Enable ras support for mp0 v13_0_0 and v13_0_10
V1:
Enable ras support for CHIP_IP_DISCOVERY asic type.

V2:
1. Change commit comment.
2. Enable ras support for mp0 v13_0_0 and v13_0_10.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:11:37 -04:00
YiPeng Chai
7cd3f6c3ac drm/amdgpu: Enable gmc soft reset on gmc_v11_0_3
Enable gmc soft reset on gmc_v11_0_3.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:09:48 -04:00
Likun Gao
b7a76a2914 drm/amdgpu: skip mes self test for gc 11.0.3
Temporary disable mes self teset for gc 11.0.3.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:09:41 -04:00
Kenneth Feng
f700486cd1 drm/amd/pm: skip loading pptable from driver on secure board for smu_v13_0_10
skip loading pptable from driver on secure board since it's loaded from psp.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Guan Yu <Guan.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:09:31 -04:00
Kenneth Feng
657e07221c drm/amd/amdgpu: enable gfx clock gating features on smu_v13_0_10
enable gfx clock gating features on smu_v13_0_10

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Jack Gui <Jack.Gui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:09:23 -04:00
Kenneth Feng
4c7f9a3c15 drm/amd/pm: remove the pptable id override on smu_v13_0_10
remove the pptable id override on smu_v13_0_10,
and the id is fetched from vbios now.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:09:17 -04:00
Kenneth Feng
4d72a4e4fb drm/amd/pm: temporarily disable thermal alert on smu_v13_0_10
temporarily disable thermal alert on smu_v13_0_10 due to kfd test fail.
will enable it again after confirming the thermal hardware setting.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:09:10 -04:00
Asher Song
4545ae2ed3 drm/amdgpu: Revert "drm/amdgpu: getting fan speed pwm for vega10 properly"
This reverts commit 16fb4dca95daa9d8e037201166a58de8284f4268.

Unfortunately, that commit causes fan monitors can't be read and written
properly.

Fixes: 16fb4dca95daa9 ("drm/amdgpu: getting fan speed pwm for vega10 properly")
Signed-off-by: Asher Song <Asher.Song@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:08:49 -04:00
Victor Zhao
a31e62873f drm/amdgpu: Refactor mode2 reset logic for v11.0.7
- refactor mode2 on v11.0.7 to align with aldebaran
- comment out using mode2 reset as default for now, will introduce
another controller to replace previous reset_level_mask

v2: squash in unused variable removal (Alex)

Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:08:40 -04:00
Victor Zhao
a340847b02 Revert "drm/amdgpu: let mode2 reset fallback to default when failure"
This reverts commit dac6b80818ac2353631c5a33d140d8d5508e2957.

This commit reverted the AMDGPU_SKIP_MODE2_RESET as it conflicts with
the original design of reset handler. Will redesign it.

Fixes: dac6b80818ac23 ("drm/amdgpu: let mode2 reset fallback to default when failure")
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:08:33 -04:00
Victor Zhao
afbaa15501 Revert "drm/amdgpu: add debugfs amdgpu_reset_level"
This reverts commit 5bd8d53f6fa53eab5433698d1362dae2aa53c1cc.

This commit breaks the reset logic for aldebaran, revert it for now.
Will move the mask inside the reset handler.

Fixes: 5bd8d53f6fa53e ("drm/amdgpu: add debugfs amdgpu_reset_level")
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:08:25 -04:00
Danijel Slivka
65f8682b9a drm/amdgpu: set vm_update_mode=0 as default for Sienna Cichlid in SRIOV case
For asic with VF MMIO access protection avoid using CPU for VM table updates.
CPU pagetable updates have issues with HDP flush as VF MMIO access protection
blocks write to mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL register
during sriov runtime.

v3: introduce virtualization capability flag AMDGPU_VF_MMIO_ACCESS_PROTECT
which indicates that VF MMIO write access is not allowed in sriov runtime

Signed-off-by: Danijel Slivka <danijel.slivka@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:07:58 -04:00
Nikos Tsironis
5434ee8d28 dm clone: Fix typo in block_device format specifier
Use %pg for printing the block device name, instead of %pd.

Fixes: 385411ffba0c ("dm: stop using bdevname")
Cc: stable@vger.kernel.org # v5.18+
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-10-18 17:17:48 -04:00
Genjian Zhang
99f4f5bcb9 dm: remove unnecessary assignment statement in alloc_dev()
Fixes: 74fe6ba923949 ("dm: convert to blk_alloc_disk/blk_cleanup_disk")
Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-10-18 17:17:48 -04:00
Milan Broz
dc3efedf9f dm verity: Add documentation for try_verify_in_tasklet option
Add documentation that was missing from commit 5721d4e5a9cd ("dm
verity: Add optional "try_verify_in_tasklet" feature").

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-10-18 17:17:48 -04:00
Shaomin Deng
48d1a964dc dm cache: delete the redundant word 'each' in comment
Signed-off-by: Shaomin Deng <dengshaomin@cdjrlc.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-10-18 17:17:47 -04:00
Jiangshan Yi
96fccdce97 dm raid: fix typo in analyse_superblocks code comment
Reported-by: k2ci <kernel-bot@kylinos.cn>
Signed-off-by: Jiangshan Yi <yijiangshan@kylinos.cn>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-10-18 17:17:47 -04:00
Nathan Huckleberry
afd41fff9c dm verity: enable WQ_HIGHPRI on verify_wq
WQ_HIGHPRI increases throughput and decreases disk latency when using
dm-verity. This is important in Android for camera startup speed.

The following tests were run by doing 60 seconds of random reads using
a dm-verity device backed by two ramdisks.

Without WQ_HIGHPRI
lat (usec): min=13, max=3947, avg=69.53, stdev=50.55
READ: bw=51.1MiB/s (53.6MB/s), 51.1MiB/s-51.1MiB/s (53.6MB/s-53.6MB/s)

With WQ_HIGHPRI:
lat (usec): min=13, max=7854, avg=31.15, stdev=30.42
READ: bw=116MiB/s (121MB/s), 116MiB/s-116MiB/s (121MB/s-121MB/s)

Further testing was done by measuring how long it takes to open a
camera on an Android device.

Without WQ_HIGHPRI
Total verity work queue wait times (ms):
880.960, 789.517, 898.852

With WQ_HIGHPRI:
Total verity work queue wait times (ms):
528.824, 439.191, 433.300

The average time to open the camera is reduced by 350ms (or 40-50%).

Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-10-18 17:17:47 -04:00
Jilin Yuan
cea446630f dm raid: delete the redundant word 'that' in comment
Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-10-18 17:17:47 -04:00
Mikulas Patocka
43e6c11182 dm: change from DMWARN to DMERR or DMCRIT for fatal errors
Change DMWARN to DMERR in cases when there is an unrecoverable error.
Change DMWARN to DMCRIT when handling of a case is unimplemented.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-10-18 17:16:00 -04:00
Linus Torvalds
aae703b02f for-6.1-rc1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmNNTxsACgkQxWXV+ddt
 WDs7QA//WaEPFWO/086pWBhlJF8k+QCNwM/EEKQL5x4hxC6pEGbHk04q63IY3XKh
 B9WEoxTfkxpHz8p+p9wDVRJl1Fdby/UFc1l2xTLcU273wL4Iweqf00N4WyEYmqQ3
 DtZcYmh4r7gZ9cTuHk8Ex+llhqAUiN7mY3FoEU8naZCE+Fdn/h3T8DT79V2XLgzv
 4f46ci0ao74o30EE7vc/Yw3gr1ouJJ4Ajw/UCEXUVC9tWOLcDNE6501AshT/ozDp
 m2tljY630QIayaMjtR+HCJHmdmB5bNGdE01Cssqc8M+M7AtQKvf+A/nQNTiI0UfK
 6ODdukvteTEEVKL2XHHkW6RWzR1rfhT1JOrl3YRKZwAKYsURXI/t+2UIjZtVstY9
 GRb3YGBDVggtbjyXxC04i4WyF3RoHRehGiF/G303BBGFMXfgZ17rvSp7DfL9KLcc
 VNycW17CcQLVZXueNWJrNSu2dQ0X8Lx0X+OTcsxRkNCJW+JQHffDl/TwMt0GtRoO
 Vhwjp8vUKuJDZbjvGXg0ZKrmk0T12+L8ubt5o5fQtMiFf+RGq77xzI1112ZIwsL0
 OtGOD3ShgKDvz24HoxSAVTbHq+/s+bmhIL/xU4QAeol3sOVPfx6b+KqcmTyG9E9u
 +gbqB9js/2vbDFNtmhOV8Fv1HbGT8bwtMCIlq5CzsiX+aT5rT88=
 =aPaQ
 -----END PGP SIGNATURE-----

Merge tag 'for-6.1-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - fiemap fixes:
      - add missing path cache update
      - fix processing of delayed data and tree refs during backref
        walking, this could lead to reporting incorrect extent sharing

 - fix extent range locking under heavy contention to avoid deadlocks

 - make it possible to test send v3 in debugging mode

 - update links in MAINTAINERS

* tag 'for-6.1-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  MAINTAINERS: update btrfs website links and files
  btrfs: ignore fiemap path cache if we have multiple leaves for a data extent
  btrfs: fix processing of delayed tree block refs during backref walking
  btrfs: fix processing of delayed data refs during backref walking
  btrfs: delete stale comments after merge conflict resolution
  btrfs: unlock locked extent area if we have contention
  btrfs: send: update command for protocol version check
  btrfs: send: allow protocol version 3 with CONFIG_BTRFS_DEBUG
  btrfs: add missing path cache update during fiemap
2022-10-18 11:25:50 -07:00
Babu Moger
67bf649344 x86/resctrl: Fix min_cbm_bits for AMD
AMD systems support zero CBM (capacity bit mask) for cache allocation.
That is reflected in rdt_init_res_defs_amd() by:

  r->cache.arch_has_empty_bitmaps = true;

However given the unified code in cbm_validate(), checking for:

  val == 0 && !arch_has_empty_bitmaps

is not enough because of another check in cbm_validate():

  if ((zero_bit - first_bit) < r->cache.min_cbm_bits)

The default value of r->cache.min_cbm_bits = 1.

Leading to:

  $ cd /sys/fs/resctrl
  $ mkdir foo
  $ cd foo
  $ echo L3:0=0 > schemata
    -bash: echo: write error: Invalid argument
  $ cat /sys/fs/resctrl/info/last_cmd_status
    Need at least 1 bits in the mask

Initialize the min_cbm_bits to 0 for AMD. Also, remove the default
setting of min_cbm_bits and initialize it separately.

After the fix:

  $ cd /sys/fs/resctrl
  $ mkdir foo
  $ cd foo
  $ echo L3:0=0 > schemata
  $ cat /sys/fs/resctrl/info/last_cmd_status
    ok

Fixes: 316e7f901f5a ("x86/resctrl: Add struct rdt_cache::arch_has_{sparse, empty}_bitmaps")
Co-developed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/lkml/20220517001234.3137157-1-eranian@google.com
2022-10-18 20:25:16 +02:00
Linus Torvalds
7ae460973d Changes since last update:
- Fix illegal unmapped accesses when initializing compressed inodes;
 
  - Fix up very rare hung on page lock after enabling compressed data
    deduplication;
 
  - Fix up inplace decompression success rate;
 
  - Take s_inode_list_lock to protect sb->s_inodes for fscache shared
    domain.
 -----BEGIN PGP SIGNATURE-----
 
 iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCY05u9xEceGlhbmdAa2Vy
 bmVsLm9yZwAKCRA5NzHcH7XmBHp1AQCXNLts5Af2PdRZUcG8RP0IAM0cZbXM9oZE
 e520B85VRQD8CuHE4xemQ6pN7iTcBRS/i60p5UF3pCr5PJ0WGcf5gwU=
 =/mdV
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-6.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:

 - Fix invalid unmapped accesses when initializing compressed inodes

 - Fix up very rare hung on page lock after enabling compressed data
   deduplication

 - Fix up inplace decompression success rate

 - Take s_inode_list_lock to protect sb->s_inodes for fscache shared
   domain

* tag 'erofs-for-6.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: protect s_inodes with s_inode_list_lock for fscache
  erofs: fix up inplace decompression success rate
  erofs: shouldn't churn the mapping page for duplicated copies
  erofs: fix illegal unmapped accesses in z_erofs_fill_inode_lazy()
2022-10-18 11:18:26 -07:00