7434 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Linus Torvalds
|
c2a96b7f18 |
Driver core changes for 6.11-rc1
Here is the big set of driver core changes for 6.11-rc1. Lots of stuff in here, with not a huge diffstat, but apis are evolving which required lots of files to be touched. Highlights of the changes in here are: - platform remove callback api final fixups (Uwe took many releases to get here, finally!) - Rust bindings for basic firmware apis and initial driver-core interactions. It's not all that useful for a "write a whole driver in rust" type of thing, but the firmware bindings do help out the phy rust drivers, and the driver core bindings give a solid base on which others can start their work. There is still a long way to go here before we have a multitude of rust drivers being added, but it's a great first step. - driver core const api changes. This reached across all bus types, and there are some fix-ups for some not-common bus types that linux-next and 0-day testing shook out. This work is being done to help make the rust bindings more safe, as well as the C code, moving toward the end-goal of allowing us to put driver structures into read-only memory. We aren't there yet, but are getting closer. - minor devres cleanups and fixes found by code inspection - arch_topology minor changes - other minor driver core cleanups All of these have been in linux-next for a very long time with no reported problems. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZqH+aQ8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ymoOQCfVBdLcBjEDAGh3L8qHRGMPy4rV2EAoL/r+zKm cJEYtJpGtWX6aAtugm9E =ZyJV -----END PGP SIGNATURE----- Merge tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big set of driver core changes for 6.11-rc1. Lots of stuff in here, with not a huge diffstat, but apis are evolving which required lots of files to be touched. Highlights of the changes in here are: - platform remove callback api final fixups (Uwe took many releases to get here, finally!) - Rust bindings for basic firmware apis and initial driver-core interactions. It's not all that useful for a "write a whole driver in rust" type of thing, but the firmware bindings do help out the phy rust drivers, and the driver core bindings give a solid base on which others can start their work. There is still a long way to go here before we have a multitude of rust drivers being added, but it's a great first step. - driver core const api changes. This reached across all bus types, and there are some fix-ups for some not-common bus types that linux-next and 0-day testing shook out. This work is being done to help make the rust bindings more safe, as well as the C code, moving toward the end-goal of allowing us to put driver structures into read-only memory. We aren't there yet, but are getting closer. - minor devres cleanups and fixes found by code inspection - arch_topology minor changes - other minor driver core cleanups All of these have been in linux-next for a very long time with no reported problems" * tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits) ARM: sa1100: make match function take a const pointer sysfs/cpu: Make crash_hotplug attribute world-readable dio: Have dio_bus_match() callback take a const * zorro: make match function take a const pointer driver core: module: make module_[add|remove]_driver take a const * driver core: make driver_find_device() take a const * driver core: make driver_[create|remove]_file take a const * firmware_loader: fix soundness issue in `request_internal` firmware_loader: annotate doctests as `no_run` devres: Correct code style for functions that return a pointer type devres: Initialize an uninitialized struct member devres: Fix memory leakage caused by driver API devm_free_percpu() devres: Fix devm_krealloc() wasting memory driver core: platform: Switch to use kmemdup_array() driver core: have match() callback in struct bus_type take a const * MAINTAINERS: add Rust device abstractions to DRIVER CORE device: rust: improve safety comments MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER firmware: rust: improve safety comments ... |
||
Linus Torvalds
|
3c3ff7be97 |
powerpc updates for 6.11
- Remove support for 40x CPUs & platforms. - Add support to the 64-bit BPF JIT for cpu v4 instructions. - Fix PCI hotplug driver crash on powernv. - Fix doorbell emulation for KVM on PAPR guests (nestedv2). - Fix KVM nested guest handling of some less used SPRs. - Online NUMA nodes with no CPU/memory if they have a PCI device attached. - Reduce memory overhead of enabling kfence on 64-bit Radix MMU kernels. - Reimplement the iommu table_group_ops for pseries for VFIO SPAPR TCE. Thanks to: Anjali K, Artem Savkov, Athira Rajeev, Breno Leitao, Brian King, Celeste Liu, Christophe Leroy, Esben Haabendal, Gaurav Batra, Gautam Menghani, Haren Myneni, Hari Bathini, Jeff Johnson, Krishna Kumar, Krzysztof Kozlowski, Nathan Lynch, Nicholas Piggin, Nick Bowler, Nilay Shroff, Rob Herring (Arm), Shawn Anastasio, Shivaprasad G Bhat, Sourabh Jain, Srikar Dronamraju, Timothy Pearson, Uwe Kleine-König, Vaibhav Jain. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmaaUNITHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgDA+D/4o7OZ+SY0plTlMKSy3hW/SRXVj/byA CCKdizNY+3Rf/+K7KhuLOUPXhZOemLPE0xfKS3ND4mIEKCswzzXqmi6kjPH0qd8q qUhkHbt/LNpNJzZOYYw+usaklMTMdZtAl/jD9WEvGwgu2EYHgrujRIq04kEI1b0e OPiRnXOZcfevRBepQmYZKHvFlCRRa5vvsQcvLfY64yFqD0AsKTHgIi/48Dn33pb2 hqHYyV1tZA3uT86Z1TgF1OG83VOSDsgc19Sb2xn14O9aJJ7lD2TOgVa4P4FfBlXA TXYYGQwK31ymGVWGcGfebVdC1ECeTem9n28vlk5I0NO9xNgPok/Ov4DAiZ+u1G0E 3CXRDx9Uz2yPcGBJI2dpxfp2iw83Ad2DtBzAdukMD36xnC7xfrQz+W9SQfbcPJ8e I5SMAstWuLNgrX7YkjAOnXh1N41kht/mdV6KHdcMxPc7jOtAD65gUOZcgwYLeXlT Av17Ax0PMbiQ1BpFe2KNr/0T9Ba5k5rN7oDSKncDAq4uX8LcZKHj4bSHT9KroT1C q+GERspoCYp2VDMO742Jm7KTmQDHsS5y4Q+iSdOR8cQBXF613FaryDxSoJZhg2pf C2zIVED13RGcjIFcWlv73iA6QpBsphM+WWFz7mjULyJhxFQwm6BYt+Wy6jFu84oH sOgvPH8YyaK2uA== =eHVd -----END PGP SIGNATURE----- Merge tag 'powerpc-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Remove support for 40x CPUs & platforms - Add support to the 64-bit BPF JIT for cpu v4 instructions - Fix PCI hotplug driver crash on powernv - Fix doorbell emulation for KVM on PAPR guests (nestedv2) - Fix KVM nested guest handling of some less used SPRs - Online NUMA nodes with no CPU/memory if they have a PCI device attached - Reduce memory overhead of enabling kfence on 64-bit Radix MMU kernels - Reimplement the iommu table_group_ops for pseries for VFIO SPAPR TCE Thanks to: Anjali K, Artem Savkov, Athira Rajeev, Breno Leitao, Brian King, Celeste Liu, Christophe Leroy, Esben Haabendal, Gaurav Batra, Gautam Menghani, Haren Myneni, Hari Bathini, Jeff Johnson, Krishna Kumar, Krzysztof Kozlowski, Nathan Lynch, Nicholas Piggin, Nick Bowler, Nilay Shroff, Rob Herring (Arm), Shawn Anastasio, Shivaprasad G Bhat, Sourabh Jain, Srikar Dronamraju, Timothy Pearson, Uwe Kleine-König, and Vaibhav Jain. * tag 'powerpc-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (57 commits) Documentation/powerpc: Mention 40x is removed powerpc: Remove 40x leftovers macintosh/therm_windtunnel: fix module unload. powerpc: Check only single values are passed to CPU/MMU feature checks powerpc/xmon: Fix disassembly CPU feature checks powerpc: Drop clang workaround for builtin constant checks powerpc64/bpf: jit support for signed division and modulo powerpc64/bpf: jit support for sign extended mov powerpc64/bpf: jit support for sign extended load powerpc64/bpf: jit support for unconditional byte swap powerpc64/bpf: jit support for 32bit offset jmp instruction powerpc/pci: Hotplug driver bridge support pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv powerpc/configs: Update defconfig with now user-visible CONFIG_FSL_IFC powerpc: add missing MODULE_DESCRIPTION() macros macintosh/mac_hid: add MODULE_DESCRIPTION() KVM: PPC: add missing MODULE_DESCRIPTION() macros powerpc/kexec: Use of_property_read_reg() powerpc/64s/radix/kfence: map __kfence_pool at page granularity powerpc/pseries/iommu: Define spapr_tce_table_group_ops only with CONFIG_IOMMU_API ... |
||
Christophe Leroy
|
3efe19a9b1 |
powerpc: Remove 40x leftovers
Remove stale references to 40x. Fixes: e939da89d024 ("powerpc: Remove 40x from Kconfig and defconfig") Fixes: 548f5244f106 ("powerpc/40x: Remove EP405") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/ab30ae302783d8617d407864b92db1b926ab5ab9.1720694914.git.christophe.leroy@csgroup.eu |
||
Jeff Johnson
|
9c5f64734f |
powerpc: add missing MODULE_DESCRIPTION() macros
With ARCH=powerpc, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/kernel/rtas_flash.o WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/sysdev/rtc_cmos_setup.o WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/pseries/papr_scm.o WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/spufs/spufs.o WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/cbe_thermal.o WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/cpufreq_spudemand.o WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/cbe_powerbutton.o Add the missing invocation of the MODULE_DESCRIPTION() macro to all files which have a MODULE_LICENSE(). This includes 85xx/t1042rdb_diu.c and chrp/nvram.c which, although they did not produce a warning with the powerpc allmodconfig configuration, may cause this warning with other configurations. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240615-md-powerpc-arch-powerpc-v1-1-ba4956bea47a@quicinc.com |
||
Shivaprasad G Bhat
|
af199e6ca2 |
powerpc/pseries/iommu: Define spapr_tce_table_group_ops only with CONFIG_IOMMU_API
The patch fixes the below warning: arch/powerpc/platforms/pseries/iommu.c:1824:37: warning: 'spapr_tce_table_group_ops' defined but not used Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202407020357.Hz8kQkKf-lkp@intel.com/ Fixes: b09c031d9433 ("powerpc/iommu: Move pSeries specific functions to pseries/iommu.c") Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/172008854222.784.13666247605789409729.stgit@linux.ibm.com |
||
Greg Kroah-Hartman
|
d69d804845 |
driver core: have match() callback in struct bus_type take a const *
In the match() callback, the struct device_driver * should not be changed, so change the function callback to be a const *. This is one step of many towards making the driver core safe to have struct device_driver in read-only memory. Because the match() callback is in all busses, all busses are modified to handle this properly. This does entail switching some container_of() calls to container_of_const() to properly handle the constant *. For some busses, like PCI and USB and HV, the const * is cast away in the match callback as those busses do want to modify those structures at this point in time (they have a local lock in the driver structure.) That will have to be changed in the future if they wish to have their struct device * in read-only-memory. Cc: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Alex Elder <elder@kernel.org> Acked-by: Sumit Garg <sumit.garg@linaro.org> Link: https://lore.kernel.org/r/2024070136-wrongdoer-busily-01e8@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Christophe Leroy
|
d5d1a1a55a |
powerpc/platforms: Move files from 4xx to 44x
Only 44x uses 4xx now, so only keep one directory. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240628121201.130802-7-mpe@ellerman.id.au |
||
Michael Ellerman
|
7bf5f0562b |
powerpc: Replace CONFIG_4xx with CONFIG_44x
Replace 4xx usage with 44x, and replace 4xx_SOC with 44x. Also, as pointed out by Christophe, if 44x || BOOKE can be simplified to just test BOOKE, because 44x always selects BOOKE. Retain the CONFIG_4xx symbol, as there are drivers that use it to mean 4xx || 44x, those will need updating before CONFIG_4xx can be removed. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240628121201.130802-6-mpe@ellerman.id.au |
||
Michael Ellerman
|
002b27a51b |
powerpc/4xx: Remove CONFIG_BOOKE_OR_40x
Now that 40x is gone, replace CONFIG_BOOKE_OR_40x by CONFIG_BOOKE. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240628121201.130802-5-mpe@ellerman.id.au |
||
Christophe Leroy
|
732b32daef |
powerpc: Remove core support for 40x
Now that 40x platforms have gone, remove support for 40x in the core of powerpc arch. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240628121201.130802-4-mpe@ellerman.id.au |
||
Michael Ellerman
|
e939da89d0 |
powerpc: Remove 40x from Kconfig and defconfig
Remove 40x from Kconfig, making the code unreachable. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240628121201.130802-3-mpe@ellerman.id.au |
||
Christophe Leroy
|
47d13a269b |
powerpc/40x: Remove 40x platforms.
40x platforms have been orphaned for many years. Remove them. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240628121201.130802-1-mpe@ellerman.id.au |
||
Nicholas Piggin
|
21a741eb75 |
powerpc/pseries: Fix scv instruction crash with kexec
kexec on pseries disables AIL (reloc_on_exc), required for scv instruction support, before other CPUs have been shut down. This means they can execute scv instructions after AIL is disabled, which causes an interrupt at an unexpected entry location that crashes the kernel. Change the kexec sequence to disable AIL after other CPUs have been brought down. As a refresher, the real-mode scv interrupt vector is 0x17000, and the fixed-location head code probably couldn't easily deal with implementing such high addresses so it was just decided not to support that interrupt at all. Fixes: 7fa95f9adaee ("powerpc/64s: system call support for scv/rfscv instructions") Cc: stable@vger.kernel.org # v5.9+ Reported-by: Sourabh Jain <sourabhjain@linux.ibm.com> Closes: https://lore.kernel.org/3b4b2943-49ad-4619-b195-bc416f1d1409@linux.ibm.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Sourabh Jain <sourabhjain@linux.ibm.com> Link: https://msgid.link/20240625134047.298759-1-npiggin@gmail.com Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> |
||
Shivaprasad G Bhat
|
f431a8cde7 |
powerpc/iommu: Reimplement the iommu_table_group_ops for pSeries
PPC64 IOMMU API defines iommu_table_group_ops which handles DMA windows for PEs, their ownership transfer, create/set/unset the TCE tables for the Dynamic DMA wundows(DDW). VFIOS uses these APIs for support on POWER. The commit 9d67c9433509 ("powerpc/iommu: Add "borrowing" iommu_table_group_ops") implemented partial support for this API with "borrow" mechanism wherein the DMA windows if created already by the host driver, they would be available for VFIO to use. Also, it didn't have the support to control/modify the window size or the IO page size. The current patch implements all the necessary iommu_table_group_ops APIs there by avoiding the "borrrowing". So, just the way it is on the PowerNV platform, with this patch the iommu table group ownership is transferred to the VFIO PPC subdriver, the iommu table, DMA windows creation/deletion all driven through the APIs. The pSeries uses the query-pe-dma-window, create-pe-dma-window and reset-pe-dma-window RTAS calls for DMA window creation, deletion and reset to defaul. The RTAs calls do show some minor differences to the way things are to be handled on the pSeries which are listed below. * On pSeries, the default DMA window size is "fixed" cannot be custom sized as requested by the user. For non-SRIOV VFs, It is fixed at 2GB and for SRIOV VFs, its variable sized based on the capacity assigned to it during the VF assignment to the LPAR. So, for the default DMA window alone the size if requested less than tce32_size, the smaller size is enforced using the iommu table->it_size. * The DMA start address for 32-bit window is 0, and for the 64-bit window in case of PowerNV is hardcoded to TVE select (bit 59) at 512PiB offset. This address is returned at the time of create_table() API call (even before the window is created), the subsequent set_window() call actually opens the DMA window. On pSeries, the DMA start address for 32-bit window is known from the 'ibm,dma-window' DT property. However, the 64-bit window start address is not known until the create-pe-dma RTAS call is made. So, the create_table() which returns the DMA window start address actually opens the DMA window and returns the DMA start address as returned by the Hypervisor for the create-pe-dma RTAS call. * The reset-pe-dma RTAS call resets the DMA windows and restores the default DMA window, however it does not clear the TCE table entries if there are any. In case of ownership transfer from platform domain which used direct mapping, the patch chooses remove-pe-dma instead of reset-pe for the 64-bit window intentionally so that the clear_dma_window() is called. Other than the DMA window management changes mentioned above, the patch also brings back the userspace view for the single level TCE as it existed before commit 090bad39b237a ("powerpc/powernv: Add indirect levels to it_userspace") along with the relavent refactoring. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/171923275958.1397.907964437142542242.stgit@linux.ibm.com |
||
Shivaprasad G Bhat
|
aed6e4946e |
powerpc/pseries/iommu: Use the iommu table[0] for IOV VF's DDW
This patch basically brings consistency with PowerNV approach to use the first freely available iommu table when the default window is removed. The pSeries iommu code convention has been that the table[0] is for the default 32 bit DMA window and the table[1] is for the 64 bit DDW. With VFs having only 1 DMA window, the default has to be removed for creating the larger DMA window. The existing code uses the table[1] for that, while marking the table[0] as NULL. This is fine as long as the host driver itself uses the device. For the VFIO user, on pSeries there is no way to skip table[0] as the VFIO subdriver uses the first freely available table. The window 0, when created as 64-bit DDW in that context would still be on table[0], as the maximum number of windows is 1. This won't have any impact for the host driver as the table is fetched from the device's iommu_table_base. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> [mpe: Rebase and resolve conflicts] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/171923272328.1397.1817843961216868850.stgit@linux.ibm.com |
||
Shivaprasad G Bhat
|
6af67f2ddf |
powerpc/pseries/iommu: Fix the VFIO_IOMMU_SPAPR_TCE_GET_INFO ioctl output
The ioctl VFIO_IOMMU_SPAPR_TCE_GET_INFO is not reporting the actuals on the platform as not all the details are correctly collected during the platform probe/scan into the iommu_table_group. Collect the information during the device setup time as the DMA window property is already looked up on parent nodes anyway. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/171923271138.1397.7908302630061814623.stgit@linux.ibm.com |
||
Shivaprasad G Bhat
|
b09c031d94 |
powerpc/iommu: Move pSeries specific functions to pseries/iommu.c
The PowerNV specific table_group_ops are defined in powernv/pci-ioda.c. The pSeries specific table_group_ops are sitting in the generic powerpc file. Move it to where it actually belong(pseries/iommu.c). The functions are currently defined even for CONFIG_PPC_POWERNV which are unused on PowerNV. Only code movement, no functional changes intended. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/171923269701.1397.15758640002786937132.stgit@linux.ibm.com |
||
Anjali K
|
1a14150e16 |
powerpc/pseries: Whitelist dtl slub object for copying to userspace
Reading the dispatch trace log from /sys/kernel/debug/powerpc/dtl/cpu-* results in a BUG() when the config CONFIG_HARDENED_USERCOPY is enabled as shown below. kernel BUG at mm/usercopy.c:102! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc scsi_transport_fc ibmveth pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse CPU: 27 PID: 1815 Comm: python3 Not tainted 6.10.0-rc3 #85 Hardware name: IBM,9040-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_042) hv:phyp pSeries NIP: c0000000005d23d4 LR: c0000000005d23d0 CTR: 00000000006ee6f8 REGS: c000000120c078c0 TRAP: 0700 Not tainted (6.10.0-rc3) MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 2828220f XER: 0000000e CFAR: c0000000001fdc80 IRQMASK: 0 [ ... GPRs omitted ... ] NIP [c0000000005d23d4] usercopy_abort+0x78/0xb0 LR [c0000000005d23d0] usercopy_abort+0x74/0xb0 Call Trace: usercopy_abort+0x74/0xb0 (unreliable) __check_heap_object+0xf8/0x120 check_heap_object+0x218/0x240 __check_object_size+0x84/0x1a4 dtl_file_read+0x17c/0x2c4 full_proxy_read+0x8c/0x110 vfs_read+0xdc/0x3a0 ksys_read+0x84/0x144 system_call_exception+0x124/0x330 system_call_vectored_common+0x15c/0x2ec --- interrupt: 3000 at 0x7fff81f3ab34 Commit 6d07d1cd300f ("usercopy: Restrict non-usercopy caches to size 0") requires that only whitelisted areas in slab/slub objects can be copied to userspace when usercopy hardening is enabled using CONFIG_HARDENED_USERCOPY. Dtl contains hypervisor dispatch events which are expected to be read by privileged users. Hence mark this safe for user access. Specify useroffset=0 and usersize=DISPATCH_LOG_BYTES to whitelist the entire object. Co-developed-by: Vishal Chourasia <vishalc@linux.ibm.com> Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com> Signed-off-by: Anjali K <anjalik@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240614173844.746818-1-anjalik@linux.ibm.com |
||
Haren Myneni
|
43ac9f5cd4 |
powerpc/pseries/vas: Use usleep_range() to support HCALL delay
VAS allocate, modify and deallocate HCALLs returns H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy delay and expects OS to reissue HCALL after that delay. But using msleep() will often sleep at least 20 msecs even though the hypervisor suggests OS reissue these HCALLs after 1 or 10msecs. The open and close VAS window functions hold mutex and then issue these HCALLs. So these operations can take longer than the necessary when multiple threads issue open or close window APIs simultaneously, especially might affect the performance in the case of repeat open/close APIs for each compression request. Multiple tasks can open / close VAS windows at the same time which depends on the available VAS credits. For example, 240 cores system provides 4800 VAS credits. It means 4800 tasks can execute open VAS windows HCALLs with the mutex. Since each msleep() will often sleep more than 20 msecs, some tasks are waiting more than 120 secs to acquire mutex. It can cause hung traces for these tasks in dmesg due to mutex contention around open/close HCALLs. Instead of msleep(), use usleep_range() to ensure sleep with the expected value before issuing HCALL again. So since each task sleep 10 msecs maximum, this patch allow more tasks can issue open/close VAS calls without any hung traces in the dmesg. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Suggested-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240116055910.421605-1-haren@linux.ibm.com |
||
Nilay Shroff
|
11981816e3 |
powerpc/numa: Online a node if PHB is attached.
In the current design, a numa-node is made online only if that node is attached to cpu/memory. With this design, if any PCI/IO device is found to be attached to a numa-node which is not online then the numa-node id of the corresponding PCI/IO device is set to NUMA_NO_NODE(-1). This design may negatively impact the performance of PCIe device if the numa-node assigned to PCIe device is -1 because in such case we may not be able to accurately calculate the distance between two nodes. The multi-controller NVMe PCIe disk has an issue with calculating the node distance if the PCIe NVMe controller is attached to a PCI host bridge which has numa-node id value set to NUMA_NO_NODE. This patch helps fix this ensuring that a cpu/memory less numa node is made online if it's attached to PCI host bridge. Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240517142531.3273464-3-nilay@linux.ibm.com |
||
Gaurav Batra
|
ff5163bb70 |
powerpc/pseries/iommu: Split Dynamic DMA Window to be used in Hybrid mode
Dynamic DMA Window (DDW) supports TCEs that are backed by 2MB page size. In most configurations, DDW is big enough to pre-map all of LPAR memory for IO. Pre-mapping of memory for DMA results in improvements in IO performance. Persistent memory, vPMEM, can be assigned to an LPAR as well. vPMEM is not contiguous with LPAR memory and usually is assigned at high memory addresses. This makes is not possible to pre-map both vPMEM and LPAR memory in the same DDW. For a dedicated adapter this limitation is not an issue. Dedicated adapters can have both Default DMA window, which is backed by 4K page size and a DDW backed by 2MB page size TCEs. In this scenario, LPAR memory is pre-mapped in the DDW. Any DMA going to the vPMEM is routed via dynamically allocated TCEs in the default window. The issue arises with SR-IOV adapters. There is only one DMA window - either Default or DDW. If an LPAR has vPMEM assigned, memory is not pre-mapped in the DDW since TCEs needs to be allocated for vPMEM as well. In this case, DDW is created and TCEs are dynamically allocated for both vPMEM and LPAR memory. Today, DDW is only used in single mode - direct mapped TCEs or dynamically mapped TCEs. This enhancement breaks a single DDW in 2 regions - 1. First region to pre-map LPAR memory 2. Second region to dynamically allocate TCEs for IO to vPMEM The DDW is split only if it is big enough to pre-map complete LPAR memory and still have some space left to dynamically map vPMEM. Maximum size possible DDW is created as permitted by the Hypervisor. Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240514014608.35537-1-gbatra@linux.ibm.com |
||
Nathan Lynch
|
12870ae381 |
powerpc/pseries/lparcfg: drop error message from guest name lookup
It's not an error or exceptional situation when the hosting environment does not expose a name for the LP/guest via RTAS or the device tree. This happens with qemu when run without the '-name' option. The message also lacks a newline. Remove it. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Fixes: eddaa9a40275 ("powerpc/pseries: read the lpar name from the firmware") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240524-lparcfg-updates-v2-1-62e2e9d28724@linux.ibm.com |
||
Linus Torvalds
|
d90be6e4aa |
Driver core changes for 6.10-rc1
Here is the small set of driver core and kernfs changes for 6.10-rc1. Nothing major here at all, just a small set of changes for some driver core apis, and minor fixups. Included in here are: - sysfs_bin_attr_simple_read() helper added and used - device_show_string() helper added and used All usages of these were acked by the various maintainers. Also in here are: - kernfs minor cleanup - removed unused functions - typo fix in documentation - pay attention to sysfs_create_link() failures in module.c finally. All of these have been in linux-next for a very long time with no reported problems. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZk3+hQ8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ylfTwCfUyHWkDZuZ7ehdtjzfmcd4EKZBK8An3AAV99G ox8PXMxuFTaUEdT/69FQ =2sEo -----END PGP SIGNATURE----- Merge tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the small set of driver core and kernfs changes for 6.10-rc1. Nothing major here at all, just a small set of changes for some driver core apis, and minor fixups. Included in here are: - sysfs_bin_attr_simple_read() helper added and used - device_show_string() helper added and used All usages of these were acked by the various maintainers. Also in here are: - kernfs minor cleanup - removed unused functions - typo fix in documentation - pay attention to sysfs_create_link() failures in module.c finally All of these have been in linux-next for a very long time with no reported problems" * tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: device property: Fix a typo in the description of device_get_child_node_count() kernfs: mount: Remove unnecessary ‘NULL’ values from knparent scsi: Use device_show_string() helper for sysfs attributes platform/x86: Use device_show_string() helper for sysfs attributes perf: Use device_show_string() helper for sysfs attributes IB/qib: Use device_show_string() helper for sysfs attributes hwmon: Use device_show_string() helper for sysfs attributes driver core: Add device_show_string() helper for sysfs attributes treewide: Use sysfs_bin_attr_simple_read() helper sysfs: Add sysfs_bin_attr_simple_read() helper module: don't ignore sysfs_create_link() failures driver core: Remove unused platform_notify, platform_notify_remove |
||
Linus Torvalds
|
61307b7be4 |
The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs. Notable series include: - Lucas Stach has provided some page-mapping cleanup/consolidation/maintainability work in the series "mm/treewide: Remove pXd_huge() API". - In the series "Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one test. - In their series "Memory allocation profiling" Kent Overstreet and Suren Baghdasaryan have contributed a means of determining (via /proc/allocinfo) whereabouts in the kernel memory is being allocated: number of calls and amount of memory. - Matthew Wilcox has provided the series "Various significant MM patches" which does a number of rather unrelated things, but in largely similar code sites. - In his series "mm: page_alloc: freelist migratetype hygiene" Johannes Weiner has fixed the page allocator's handling of migratetype requests, with resulting improvements in compaction efficiency. - In the series "make the hugetlb migration strategy consistent" Baolin Wang has fixed a hugetlb migration issue, which should improve hugetlb allocation reliability. - Liu Shixin has hit an I/O meltdown caused by readahead in a memory-tight memcg. Addressed in the series "Fix I/O high when memory almost met memcg limit". - In the series "mm/filemap: optimize folio adding and splitting" Kairui Song has optimized pagecache insertion, yielding ~10% performance improvement in one test. - Baoquan He has cleaned up and consolidated the early zone initialization code in the series "mm/mm_init.c: refactor free_area_init_core()". - Baoquan has also redone some MM initializatio code in the series "mm/init: minor clean up and improvement". - MM helper cleanups from Christoph Hellwig in his series "remove follow_pfn". - More cleanups from Matthew Wilcox in the series "Various page->flags cleanups". - Vlastimil Babka has contributed maintainability improvements in the series "memcg_kmem hooks refactoring". - More folio conversions and cleanups in Matthew Wilcox's series "Convert huge_zero_page to huge_zero_folio" "khugepaged folio conversions" "Remove page_idle and page_young wrappers" "Use folio APIs in procfs" "Clean up __folio_put()" "Some cleanups for memory-failure" "Remove page_mapping()" "More folio compat code removal" - David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb functions to work on folis". - Code consolidation and cleanup work related to GUP's handling of hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2". - Rick Edgecombe has developed some fixes to stack guard gaps in the series "Cover a guard gap corner case". - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series "mm/ksm: fix ksm exec support for prctl". - Baolin Wang has implemented NUMA balancing for multi-size THPs. This is a simple first-cut implementation for now. The series is "support multi-size THP numa balancing". - Cleanups to vma handling helper functions from Matthew Wilcox in the series "Unify vma_address and vma_pgoff_address". - Some selftests maintenance work from Dev Jain in the series "selftests/mm: mremap_test: Optimizations and style fixes". - Improvements to the swapping of multi-size THPs from Ryan Roberts in the series "Swap-out mTHP without splitting". - Kefeng Wang has significantly optimized the handling of arm64's permission page faults in the series "arch/mm/fault: accelerate pagefault when badaccess" "mm: remove arch's private VM_FAULT_BADMAP/BADACCESS" - GUP cleanups from David Hildenbrand in "mm/gup: consistently call it GUP-fast". - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to use struct vm_fault". - selftests build fixes from John Hubbard in the series "Fix selftests/mm build without requiring "make headers"". - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes the initialization code so that migration between different memory types works as intended. - David Hildenbrand has improved follow_pte() and fixed an errant driver in the series "mm: follow_pte() improvements and acrn follow_pte() fixes". - David also did some cleanup work on large folio mapcounts in his series "mm: mapcount for large folios + page_mapcount() cleanups". - Folio conversions in KSM in Alex Shi's series "transfer page to folio in KSM". - Barry Song has added some sysfs stats for monitoring multi-size THP's in the series "mm: add per-order mTHP alloc and swpout counters". - Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled and limit checking cleanups". - Matthew Wilcox has been looking at buffer_head code and found the documentation to be lacking. The series is "Improve buffer head documentation". - Multi-size THPs get more work, this time from Lance Yang. His series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes the freeing of these things. - Kemeng Shi has added more userspace-visible writeback instrumentation in the series "Improve visibility of writeback". - Kemeng Shi then sent some maintenance work on top in the series "Fix and cleanups to page-writeback". - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the series "Improve anon_vma scalability for anon VMAs". Intel's test bot reported an improbable 3x improvement in one test. - SeongJae Park adds some DAMON feature work in the series "mm/damon: add a DAMOS filter type for page granularity access recheck" "selftests/damon: add DAMOS quota goal test" - Also some maintenance work in the series "mm/damon/paddr: simplify page level access re-check for pageout" "mm/damon: misc fixes and improvements" - David Hildenbrand has disabled some known-to-fail selftests ni the series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL". - memcg metadata storage optimizations from Shakeel Butt in "memcg: reduce memory consumption by memcg stats". - DAX fixes and maintenance work from Vishal Verma in the series "dax/bus.c: Fixups for dax-bus locking". -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZkgQYwAKCRDdBJ7gKXxA jrdKAP9WVJdpEcXxpoub/vVE0UWGtffr8foifi9bCwrQrGh5mgEAx7Yf0+d/oBZB nvA4E0DcPrUAFy144FNM0NTCb7u9vAw= =V3R/ -----END PGP SIGNATURE----- Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: "The usual shower of singleton fixes and minor series all over MM, documented (hopefully adequately) in the respective changelogs. Notable series include: - Lucas Stach has provided some page-mapping cleanup/consolidation/ maintainability work in the series "mm/treewide: Remove pXd_huge() API". - In the series "Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one test. - In their series "Memory allocation profiling" Kent Overstreet and Suren Baghdasaryan have contributed a means of determining (via /proc/allocinfo) whereabouts in the kernel memory is being allocated: number of calls and amount of memory. - Matthew Wilcox has provided the series "Various significant MM patches" which does a number of rather unrelated things, but in largely similar code sites. - In his series "mm: page_alloc: freelist migratetype hygiene" Johannes Weiner has fixed the page allocator's handling of migratetype requests, with resulting improvements in compaction efficiency. - In the series "make the hugetlb migration strategy consistent" Baolin Wang has fixed a hugetlb migration issue, which should improve hugetlb allocation reliability. - Liu Shixin has hit an I/O meltdown caused by readahead in a memory-tight memcg. Addressed in the series "Fix I/O high when memory almost met memcg limit". - In the series "mm/filemap: optimize folio adding and splitting" Kairui Song has optimized pagecache insertion, yielding ~10% performance improvement in one test. - Baoquan He has cleaned up and consolidated the early zone initialization code in the series "mm/mm_init.c: refactor free_area_init_core()". - Baoquan has also redone some MM initializatio code in the series "mm/init: minor clean up and improvement". - MM helper cleanups from Christoph Hellwig in his series "remove follow_pfn". - More cleanups from Matthew Wilcox in the series "Various page->flags cleanups". - Vlastimil Babka has contributed maintainability improvements in the series "memcg_kmem hooks refactoring". - More folio conversions and cleanups in Matthew Wilcox's series: "Convert huge_zero_page to huge_zero_folio" "khugepaged folio conversions" "Remove page_idle and page_young wrappers" "Use folio APIs in procfs" "Clean up __folio_put()" "Some cleanups for memory-failure" "Remove page_mapping()" "More folio compat code removal" - David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb functions to work on folis". - Code consolidation and cleanup work related to GUP's handling of hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2". - Rick Edgecombe has developed some fixes to stack guard gaps in the series "Cover a guard gap corner case". - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series "mm/ksm: fix ksm exec support for prctl". - Baolin Wang has implemented NUMA balancing for multi-size THPs. This is a simple first-cut implementation for now. The series is "support multi-size THP numa balancing". - Cleanups to vma handling helper functions from Matthew Wilcox in the series "Unify vma_address and vma_pgoff_address". - Some selftests maintenance work from Dev Jain in the series "selftests/mm: mremap_test: Optimizations and style fixes". - Improvements to the swapping of multi-size THPs from Ryan Roberts in the series "Swap-out mTHP without splitting". - Kefeng Wang has significantly optimized the handling of arm64's permission page faults in the series "arch/mm/fault: accelerate pagefault when badaccess" "mm: remove arch's private VM_FAULT_BADMAP/BADACCESS" - GUP cleanups from David Hildenbrand in "mm/gup: consistently call it GUP-fast". - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to use struct vm_fault". - selftests build fixes from John Hubbard in the series "Fix selftests/mm build without requiring "make headers"". - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes the initialization code so that migration between different memory types works as intended. - David Hildenbrand has improved follow_pte() and fixed an errant driver in the series "mm: follow_pte() improvements and acrn follow_pte() fixes". - David also did some cleanup work on large folio mapcounts in his series "mm: mapcount for large folios + page_mapcount() cleanups". - Folio conversions in KSM in Alex Shi's series "transfer page to folio in KSM". - Barry Song has added some sysfs stats for monitoring multi-size THP's in the series "mm: add per-order mTHP alloc and swpout counters". - Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled and limit checking cleanups". - Matthew Wilcox has been looking at buffer_head code and found the documentation to be lacking. The series is "Improve buffer head documentation". - Multi-size THPs get more work, this time from Lance Yang. His series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes the freeing of these things. - Kemeng Shi has added more userspace-visible writeback instrumentation in the series "Improve visibility of writeback". - Kemeng Shi then sent some maintenance work on top in the series "Fix and cleanups to page-writeback". - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the series "Improve anon_vma scalability for anon VMAs". Intel's test bot reported an improbable 3x improvement in one test. - SeongJae Park adds some DAMON feature work in the series "mm/damon: add a DAMOS filter type for page granularity access recheck" "selftests/damon: add DAMOS quota goal test" - Also some maintenance work in the series "mm/damon/paddr: simplify page level access re-check for pageout" "mm/damon: misc fixes and improvements" - David Hildenbrand has disabled some known-to-fail selftests ni the series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL". - memcg metadata storage optimizations from Shakeel Butt in "memcg: reduce memory consumption by memcg stats". - DAX fixes and maintenance work from Vishal Verma in the series "dax/bus.c: Fixups for dax-bus locking"" * tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits) memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault selftests: cgroup: add tests to verify the zswap writeback path mm: memcg: make alloc_mem_cgroup_per_node_info() return bool mm/damon/core: fix return value from damos_wmark_metric_value mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED selftests: cgroup: remove redundant enabling of memory controller Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT Docs/mm/damon/design: use a list for supported filters Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file selftests/damon: classify tests for functionalities and regressions selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None' selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts selftests/damon/_damon_sysfs: check errors from nr_schemes file reads mm/damon/core: initialize ->esz_bp from damos_quota_init_priv() selftests/damon: add a test for DAMOS quota goal ... |
||
Linus Torvalds
|
ff2632d7d0 |
powerpc updates for 6.10
- Enable BPF Kernel Functions (kfuncs) in the powerpc BPF JIT. - Allow per-process DEXCR (Dynamic Execution Control Register) settings via prctl, notably NPHIE which controls hashst/hashchk for ROP protection. - Install powerpc selftests in sub-directories. Note this changes the way run_kselftest.sh needs to be invoked for powerpc selftests. - Change fadump (Firmware Assisted Dump) to better handle memory add/remove. - Add support for passing additional parameters to the fadump kernel. - Add support for updating the kdump image on CPU/memory add/remove events. - Other small features, cleanups and fixes. Thanks to: Andrew Donnellan, Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann, Benjamin Gray, Bjorn Helgaas, Christian Zigotzky, Christophe Jaillet, Christophe Leroy, Colin Ian King, Cédric Le Goater, Dr. David Alan Gilbert, Erhard Furtner, Frank Li, GUO Zihua, Ganesh Goudar, Geoff Levand, Ghanshyam Agrawal, Greg Kurz, Hari Bathini, Joel Stanley, Justin Stitt, Kunwu Chan, Li Yang, Lidong Zhong, Madhavan Srinivasan, Mahesh Salgaonkar, Masahiro Yamada, Matthias Schiffer, Naresh Kamboju, Nathan Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Miehlbradt, Ran Wang, Randy Dunlap, Ritesh Harjani, Sachin Sant, Shirisha Ganta, Shrikanth Hegde, Sourabh Jain, Stephen Rothwell, sundar, Thorsten Blum, Vaibhav Jain, Xiaowei Bao, Yang Li, Zhao Chenhui. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmZHLtwTHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgCGdD/0cqQkYl6+E0/K68Y7jnAWF+l0LNFlm /4jZ+zKXPiPhSdaQq4xo2ZjEooUPsm3c+AHidmrAtOMBULvv4pyciu61hrVu4Y2b aAudkBMUc+i/Lfaz7fq1KnN4LDFVm7xZZ+i/ju9tOBLMpOZ3YZ+YoOGA6nqsshJF XuB5h0T+H55he1wBpvyyrsUUyss53Mp3IsajxdwBOsUDDp0fSAg8SLEyhoiK3BsQ EjEa6iEqJSBheqFEXPvqsMuqM3k51CHe/pCOMODjo7P+u/MNrClZUscZKXGB5xq9 Bu3SPxIYfRmU4XE53517faElEPmlxSBrjQGCD1EGEVXGsjn6r7TD6R5voow3SoUq CLTy90KNNrS1cIqeomu6bJ/anzYrViqTdekImA7Vb+Ol8f+uT9l+l1D75eYOKPQ3 N0AHoa4rnWIb5kjCAjHaZ54O+B2q2tPlQqFUmt+BrvZyKS13zjE36stnArxP3MPC Xw6y3huX3AkZiJ4mQYRiBn//xGOLwrRCd/EoTDnoe08yq0Hoor6qIm4uEy2Nu3Kf 0mBsEOxMsmQd6NEq43B/sFgVbbxKhAyxfZ9gHqxDQZcgoxXcMesyj/n4+jM5sRYK zmavLlykM2Tjlh1evs8+e0mCEwDjDn2GRlqstJQTrmnGhbMKi3jvw9I7gGtZVqbS kAflTXzsIXvxBA== =GoCV -----END PGP SIGNATURE----- Merge tag 'powerpc-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Enable BPF Kernel Functions (kfuncs) in the powerpc BPF JIT. - Allow per-process DEXCR (Dynamic Execution Control Register) settings via prctl, notably NPHIE which controls hashst/hashchk for ROP protection. - Install powerpc selftests in sub-directories. Note this changes the way run_kselftest.sh needs to be invoked for powerpc selftests. - Change fadump (Firmware Assisted Dump) to better handle memory add/remove. - Add support for passing additional parameters to the fadump kernel. - Add support for updating the kdump image on CPU/memory add/remove events. - Other small features, cleanups and fixes. Thanks to Andrew Donnellan, Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann, Benjamin Gray, Bjorn Helgaas, Christian Zigotzky, Christophe Jaillet, Christophe Leroy, Colin Ian King, Cédric Le Goater, Dr. David Alan Gilbert, Erhard Furtner, Frank Li, GUO Zihua, Ganesh Goudar, Geoff Levand, Ghanshyam Agrawal, Greg Kurz, Hari Bathini, Joel Stanley, Justin Stitt, Kunwu Chan, Li Yang, Lidong Zhong, Madhavan Srinivasan, Mahesh Salgaonkar, Masahiro Yamada, Matthias Schiffer, Naresh Kamboju, Nathan Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Miehlbradt, Ran Wang, Randy Dunlap, Ritesh Harjani, Sachin Sant, Shirisha Ganta, Shrikanth Hegde, Sourabh Jain, Stephen Rothwell, sundar, Thorsten Blum, Vaibhav Jain, Xiaowei Bao, Yang Li, and Zhao Chenhui. * tag 'powerpc-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (85 commits) powerpc/fadump: Fix section mismatch warning powerpc/85xx: fix compile error without CONFIG_CRASH_DUMP powerpc/fadump: update documentation about bootargs_append powerpc/fadump: pass additional parameters when fadump is active powerpc/fadump: setup additional parameters for dump capture kernel powerpc/pseries/fadump: add support for multiple boot memory regions selftests/powerpc/dexcr: Fix spelling mistake "predicition" -> "prediction" KVM: PPC: Book3S HV nestedv2: Fix an error handling path in gs_msg_ops_kvmhv_nestedv2_config_fill_info() KVM: PPC: Fix documentation for ppc mmu caps KVM: PPC: code cleanup for kvmppc_book3s_irqprio_deliver KVM: PPC: Book3S HV nestedv2: Cancel pending DEC exception powerpc/xmon: Check cpu id in commands "c#", "dp#" and "dx#" powerpc/code-patching: Use dedicated memory routines for patching powerpc/code-patching: Test patch_instructions() during boot powerpc64/kasan: Pass virtual addresses to kasan_init_phys_region() powerpc: rename SPRN_HID2 define to SPRN_HID2_750FX powerpc: Fix typos powerpc/eeh: Fix spelling of the word "auxillary" and update comment macintosh/ams: Fix unused variable warning powerpc/Makefile: Remove bits related to the previous use of -mcmodel=large ... |
||
Linus Torvalds
|
c405aa3ea3 |
Updates for nvdimm for 6.10
Code cleanups, remove duplicate code, and updates for current interfaces. -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQSgX9xt+GwmrJEQ+euebuN7TNx1MQUCZkThOhQcaXJhLndlaW55 QGludGVsLmNvbQAKCRCebuN7TNx1MWUvAP9VuWLTc+bCguRMbArOsdE2T60780pR DT12KXfwm/FvKAD9Hyz8S2CkwvdlwC5hqH2AmdpzDhpJ1f9LBpcsb6GUKwk= =eZC/ -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull nvdimm updates from Ira Weiny: "The changes include removing duplicate code and updating the nvdimm tree to the current kernel interfaces such as using const for struct device_type and changing the platform remove callback signature" * tag 'libnvdimm-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: remove redundant assignment to variable rc ndtest: Convert to platform remove callback returning void nvdimm/btt: always set max_integrity_segments nvdimm: remove nd_integrity_init dax: constify the struct device_type usage powerpc/papr_scm: Move duplicate definitions to common header files |
||
Hari Bathini
|
7b090b6ff5 |
powerpc/85xx: fix compile error without CONFIG_CRASH_DUMP
Since commit 5c4233cc0920 ("powerpc/kdump: Split KEXEC_CORE and CRASH_DUMP dependency"), crashing_cpu is not available without CONFIG_CRASH_DUMP. Fix compile error on 64-BIT 85xx owing to this change. Fixes: 5c4233cc0920 ("powerpc/kdump: Split KEXEC_CORE and CRASH_DUMP dependency") Cc: stable@vger.kernel.org # v6.9+ Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Closes: https://lore.kernel.org/all/fa247ae4-5825-4dbe-a737-d93b7ab4d4b9@xenosoft.de/ Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240510080757.560159-1-hbathini@linux.ibm.com |
||
Hari Bathini
|
683eab94da |
powerpc/fadump: setup additional parameters for dump capture kernel
For fadump case, passing additional parameters to dump capture kernel helps in minimizing the memory footprint for it and also provides the flexibility to disable components/modules, like hugepages, that are hindering the boot process of the special dump capture environment. Set up a dedicated parameter area to be passed to the capture kernel. This area type is defined as RTAS_FADUMP_PARAM_AREA. Sysfs attribute '/sys/kernel/fadump/bootargs_append' is exported to the userspace to specify the additional parameters to be passed to the capture kernel Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240509115755.519982-3-hbathini@linux.ibm.com |
||
Hari Bathini
|
78d5cc15fb |
powerpc/pseries/fadump: add support for multiple boot memory regions
Currently, fadump on pseries assumes a single boot memory region even though f/w supports more than one boot memory region. Add support for more boot memory regions to make the implementation flexible for any enhancements that introduce other region types. For this, rtas memory structure for fadump is updated to have multiple boot memory regions instead of just one. Additionally, methods responsible for creating the fadump memory structure during both the first and second kernel boot have been modified to take these multiple boot memory regions into account. Also, a new callback has been added to the fadump_ops structure to get the maximum boot memory regions supported by the platform. Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240509115755.519982-2-hbathini@linux.ibm.com |
||
Matthias Schiffer
|
ad679719d7 |
powerpc: rename SPRN_HID2 define to SPRN_HID2_750FX
This register number is hardware-specific, rename it for clarity. FIXME comments are added in a few places where it seems like the wrong register is used. As I can't test this, only the rename is done with no functional change. Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240124105031.45734-1-matthias.schiffer@ew.tq-group.com |
||
Bjorn Helgaas
|
0ddbbb8960 |
powerpc: Fix typos
Fix typos, most reported by "codespell arch/powerpc". Only touches comments, no code changes. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240103231605.1801364-8-helgaas@kernel.org |
||
Naveen N Rao
|
c330b50d8c |
powerpc/Makefile: Remove bits related to the previous use of -mcmodel=large
All supported compilers today (gcc v5.1+ and clang v11+) have support for -mcmodel=medium. As such, NO_MINIMAL_TOC is no longer being set. Remove NO_MINIMAL_TOC as well as the fallback to -mminimal-toc. Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240110141237.3179199-1-naveen@kernel.org |
||
Kunwu Chan
|
2d8ebee0aa |
powerpc/pseries/pci: Code cleanup
This part was commented in about 19 years before. If there are no plans to enable this part code in the future, we can remove this dead code. Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240126025030.577795-1-chentao@kylinos.cn |
||
Kunwu Chan
|
66d8e646e8 |
powerpc/cell: Code cleanup for spufs_mfc_flush
This part was commented from commit a33a7d7309d7 ("[PATCH] spufs: implement mfc access for PPE-side DMA") in about 18 years before. If there are no plans to enable this part code in the future, we can remove this dead code. Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240126021258.574916-1-chentao@kylinos.cn |
||
Kunwu Chan
|
f3560a2ba5 |
powerpc/iommu: Code cleanup for cell/iommu.c
This part was commented from commit 165785e5c0be ("[POWERPC] Cell iommu support") in about 17 years before. If there are no plans to enable this part code in the future, we can remove this dead code. Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240125082637.532826-1-chentao@kylinos.cn |
||
Yang Li
|
554da5e0f7 |
powerpc/rtas: Add kernel-doc comments to smp_startup_cpu()
This commit adds kernel-doc style comments with complete parameter descriptions for the function smp_startup_cpu(). Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240408053109.96360-2-yang.lee@linux.alibaba.com |
||
Lidong Zhong
|
29247de4ad |
powerpc/pseries/vio: Don't return ENODEV if node or compatible missing
We noticed the following nuisance messages during boot process: vio vio: uevent: failed to send synthetic uevent vio 4000: uevent: failed to send synthetic uevent vio 4001: uevent: failed to send synthetic uevent vio 4002: uevent: failedto send synthetic uevent vio 4004: uevent: failed to send synthetic uevent It's caused by either vio_register_device_node() failing to set dev->of_node or the node is missing a "compatible" property. To match the definition of modalias in modalias_show(), remove the return of ENODEV in such cases. The failure messages is also suppressed with this change. Signed-off-by: Lidong Zhong <lidong.zhong@suse.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240411020450.12725-1-lidong.zhong@suse.com |
||
Sourabh Jain
|
c6c5b14dac |
powerpc: make fadump resilient with memory add/remove events
Due to changes in memory resources caused by either memory hotplug or online/offline events, the elfcorehdr, which describes the CPUs and memory of the crashed kernel to the kernel that collects the dump (known as second/fadump kernel), becomes outdated. Consequently, attempting dump collection with an outdated elfcorehdr can lead to failed or inaccurate dump collection. Memory hotplug or online/offline events is referred as memory add/remove events in reset of the commit message. The current solution to address the aforementioned issue is as follows: Monitor memory add/remove events in userspace using udev rules, and re-register fadump whenever there are changes in memory resources. This leads to the creation of a new elfcorehdr with updated system memory information. There are several notable issues associated with re-registering fadump for every memory add/remove events. 1. Bulk memory add/remove events with udev-based fadump re-registration can lead to race conditions and, more importantly, it creates a wide window during which fadump is inactive until all memory add/remove events are settled. 2. Re-registering fadump for every memory add/remove event is inefficient. 3. The memory for elfcorehdr is allocated based on the memblock regions available during early boot and remains fixed thereafter. However, if elfcorehdr is later recreated with additional memblock regions, its size will increase, potentially leading to memory corruption. Address the aforementioned challenges by shifting the creation of elfcorehdr from the first kernel (also referred as the crashed kernel), where it was created and frequently recreated for every memory add/remove event, to the fadump kernel. As a result, the elfcorehdr only needs to be created once, thus eliminating the necessity to re-register fadump during memory add/remove events. At present, the first kernel prepares fadump header and stores it in the fadump reserved area. The fadump header includes the start address of the elfcorehdr, crashing CPU details, and other relevant information. In the event of a crash in the first kernel, the second/fadump boots and accesses the fadump header prepared by the first kernel. It then performs the following steps in a platform-specific function [rtas|opal]_fadump_process: 1. Sanity check for fadump header 2. Update CPU notes in elfcorehdr Along with the above, update the setup_fadump()/fadump.c to create elfcorehdr and set its address to the global variable elfcorehdr_addr for the vmcore module to process it in the second/fadump kernel. Section below outlines the information required to create the elfcorehdr and the changes made to make it available to the fadump kernel if it's not already. To create elfcorehdr, the following crashed kernel information is required: CPU notes, vmcoreinfo, and memory ranges. At present, the CPU notes are already prepared in the fadump kernel, so no changes are needed in that regard. The fadump kernel has access to all crashed kernel memory regions, including boot memory regions that are relocated by firmware to fadump reserved areas, so no changes for that either. However, it is necessary to add new members to the fadump header, i.e., the 'fadump_crash_info_header' structure, in order to pass the crashed kernel's vmcoreinfo address and its size to fadump kernel. In addition to the vmcoreinfo address and size, there are a few other attributes also added to the fadump_crash_info_header structure. 1. version: It stores the fadump header version, which is currently set to 1. This provides flexibility to update the fadump crash info header in the future without changing the magic number. For each change in the fadump header, the version will be increased. This will help the updated kernel determine how to handle kernel dumps from older kernels. The magic number remains relevant for checking fadump header corruption. 2. pt_regs_sz/cpu_mask_sz: Store size of pt_regs and cpu_mask structure of first kernel. These attributes are used to prevent dump processing if the sizes of pt_regs or cpu_mask structure differ between the first and fadump kernels. Note: if either first/crashed kernel or second/fadump kernel do not have the changes introduced here then kernel fail to collect the dump and prints relevant error message on the console. Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240422195932.1583833-2-sourabhjain@linux.ibm.com |
||
Shrikanth Hegde
|
6d43416385 |
powerpc/pseries: Add failure related checks for h_get_mpp and h_get_ppp
Couple of Minor fixes: - hcall return values are long. Fix that for h_get_mpp, h_get_ppp and parse_ppp_data - If hcall fails, values set should be at-least zero. It shouldn't be uninitialized values. Fix that for h_get_mpp and h_get_ppp Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240412092047.455483-3-sshegde@linux.ibm.com |
||
Shrikanth Hegde
|
9c74ecfd0f |
powerpc/pseries: Add pool idle time at LPAR boot
When there are no options specified for lparstat, it is expected to give reports since LPAR(Logical Partition) boot. APP(Available Processor Pool) is an indicator of how many cores in the shared pool are free to use in Shared Processor LPAR(SPLPAR). APP is derived using pool_idle_time which is obtained using H_PIC call. The interval based reports show correct APP value while since boot report shows very high APP values. This happens because in that case APP is obtained by dividing pool idle time by LPAR uptime. Since pool idle time is reported by the PowerVM hypervisor since its boot, it need not align with LPAR boot. To fix that export boot pool idle time in lparcfg and powerpc-utils will use this info to derive APP as below for since boot reports. APP = (pool idle time - boot pool idle time) / (uptime * timebase) Results:: Observe APP values. ====================== Shared LPAR ================================ lparstat System Configuration type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00 reboot stress-ng --cpu=$(nproc) -t 600 sleep 600 So in this case app is expected to close to 37-6=31. ====== 6.9-rc1 and lparstat 1.3.10 ============= %user %sys %wait %idle physc %entc lbusy app vcsw phint ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 47.48 0.01 0.00 52.51 0.00 0.00 47.49 69099.72 541547 21 === With this patch and powerpc-utils patch to do the above equation === %user %sys %wait %idle physc %entc lbusy app vcsw phint ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 47.48 0.01 0.00 52.51 5.73 47.75 47.49 31.21 541753 21 ===================================================================== Note: physc, purr/idle purr being inaccurate is being handled in a separate patch in powerpc-utils tree. Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240412092047.455483-2-sshegde@linux.ibm.com |
||
Suren Baghdasaryan
|
8a2f118787 |
change alloc_pages name in dma_map_ops to avoid name conflicts
After redefining alloc_pages, all uses of that name are being replaced. Change the conflicting names to prevent preprocessor from replacing them when it's not intended. Link: https://lkml.kernel.org/r/20240321163705.3067592-18-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Shivaprasad G Bhat
|
dbc8fc9d6d |
powerpc/papr_scm: Move duplicate definitions to common header files
papr_scm and ndtest share common PDSM payload structs like nd_papr_pdsm_health. Presently these structs are duplicated across papr_pdsm.h and ndtest.h header files. Since 'ndtest' is essentially arch independent and can run on platforms other than PPC64, a way needs to be deviced to avoid redundancy and duplication of PDSM structs in future. So the patch proposes moving the PDSM header from arch/powerpc/include- -/uapi/ to the generic include/uapi/linux directory. Also, there are some #defines common between papr_scm and ndtest which are not exported to the user space. So, move them to a header file which can be shared across ndtest and papr_scm via newly introduced include/linux/papr_scm.h. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/170638176942.112443.2937254675538057083.stgit@ltcd48-lp2.aus.stglab.ibm.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> |
||
Gaurav Batra
|
49a940dbdc |
powerpc/pseries/iommu: LPAR panics during boot up with a frozen PE
At the time of LPAR boot up, partition firmware provides Open Firmware property ibm,dma-window for the PE. This property is provided on the PCI bus the PE is attached to. There are execptions where the partition firmware might not provide this property for the PE at the time of LPAR boot up. One of the scenario is where the firmware has frozen the PE due to some error condition. This PE is frozen for 24 hours or unless the whole system is reinitialized. Within this time frame, if the LPAR is booted, the frozen PE will be presented to the LPAR but ibm,dma-window property could be missing. Today, under these circumstances, the LPAR oopses with NULL pointer dereference, when configuring the PCI bus the PE is attached to. BUG: Kernel NULL pointer dereference on read at 0x000000c8 Faulting instruction address: 0xc0000000001024c0 Oops: Kernel access of bad area, sig: 7 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: Supported: Yes CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-150600.9-default #1 Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_023) hv:phyp pSeries NIP: c0000000001024c0 LR: c0000000001024b0 CTR: c000000000102450 REGS: c0000000037db5c0 TRAP: 0300 Not tainted (6.4.0-150600.9-default) MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 28000822 XER: 00000000 CFAR: c00000000010254c DAR: 00000000000000c8 DSISR: 00080000 IRQMASK: 0 ... NIP [c0000000001024c0] pci_dma_bus_setup_pSeriesLP+0x70/0x2a0 LR [c0000000001024b0] pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 Call Trace: pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 (unreliable) pcibios_setup_bus_self+0x1c0/0x370 __of_scan_bus+0x2f8/0x330 pcibios_scan_phb+0x280/0x3d0 pcibios_init+0x88/0x12c do_one_initcall+0x60/0x320 kernel_init_freeable+0x344/0x3e4 kernel_init+0x34/0x1d0 ret_from_kernel_user_thread+0x14/0x1c Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window") Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240422205141.10662-1-gbatra@linux.ibm.com |
||
Nayna Jain
|
784354349d |
powerpc/pseries: make max polling consistent for longer H_CALLs
Currently, plpks_confirm_object_flushed() function polls for 5msec in total instead of 5sec. Keep max polling time consistent for all the H_CALLs, which take longer than expected, to be 5sec. Also, make use of fsleep() everywhere to insert delay. Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com> Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform KeyStore") Signed-off-by: Nayna Jain <nayna@linux.ibm.com> Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240418031230.170954-1-nayna@linux.ibm.com |
||
Lukas Wunner
|
66bc1a1733 |
treewide: Use sysfs_bin_attr_simple_read() helper
Deduplicate ->read() callbacks of bin_attributes which are backed by a simple buffer in memory: Use the newly introduced sysfs_bin_attr_simple_read() helper instead, either by referencing it directly or by declaring such bin_attributes with BIN_ATTR_SIMPLE_RO() or BIN_ATTR_SIMPLE_ADMIN_RO(). Aside from a reduction of LoC, this shaves off a few bytes from vmlinux (304 bytes on an x86_64 allyesconfig). No functional change intended. Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Zhi Wang <zhiwang@kernel.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/92ee0a0e83a5a3f3474845db6c8575297698933a.1712410202.git.lukas@wunner.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Andy Shevchenko
|
676abf7c39 |
powerpc/52xx: Replace of_gpio.h by proper one
of_gpio.h is deprecated and subject to remove. The driver doesn't use it directly, replace it with what is really being used. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240313135645.2066362-1-andriy.shevchenko@linux.intel.com |
||
Geoff Levand
|
bfe51886ca |
powerpc: Fix PS3 allmodconfig warning
The struct ps3_notification_device in the ps3_probe_thread routine is too large to be on the stack, causing a warning for an allmodconfig build with clang. Change the struct ps3_notification_device from a variable on the stack to a dynamically allocated variable. Reported-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/d64f06f4-81ae-4ec5-ab3b-d7f7f091e0ac@infradead.org |
||
Hari Bathini
|
5c4233cc09 |
powerpc/kdump: Split KEXEC_CORE and CRASH_DUMP dependency
Remove CONFIG_CRASH_DUMP dependency on CONFIG_KEXEC. CONFIG_KEXEC_CORE was used at places where CONFIG_CRASH_DUMP or CONFIG_CRASH_RESERVE was appropriate. Replace with appropriate #ifdefs to support CONFIG_KEXEC and !CONFIG_CRASH_DUMP configuration option. Also, make CONFIG_FA_DUMP dependent on CONFIG_CRASH_DUMP to avoid unmet dependencies for FA_DUMP with !CONFIG_KEXEC_CORE configuration option. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240226103010.589537-4-hbathini@linux.ibm.com |
||
Linus Torvalds
|
66a27abac3 |
powerpc updates for 6.9
- Add AT_HWCAP3 and AT_HWCAP4 aux vector entries for future use by glibc. - Add support for recognising the Power11 architected and raw PVRs. - Add support for nr_cpus=n on the command line where the boot CPU is >= n. - Add ppcxx_allmodconfig targets for all 32-bit sub-arches. - Other small features, cleanups and fixes. Thanks to: Akanksha J N, Brian King, Christophe Leroy, Dawei Li, Geoff Levand, Greg Kroah-Hartman, Jan-Benedict Glaw, Kajol Jain, Kunwu Chan, Li zeming, Madhavan Srinivasan, Masahiro Yamada, Nathan Chancellor, Nicholas Piggin, Peter Bergner, Qiheng Lin, Randy Dunlap, Ricardo B. Marliere, Rob Herring, Sathvika Vasireddy, Shrikanth Hegde, Uwe Kleine-König, Vaibhav Jain, Wen Xiong. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmX01vgTHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgJ4bEACVsxXXjbjl+WKgWNjHsM7sVwUX/sSV z43iVycLPXDqochSkkgKjyIEFowaWhjgWVHFHmUXWxB5FjjFEEoH4FPo3VB0IY48 VoSFT6PhzqXDrGmt2fWsJ+k6zUyJZa8pNS38DHg1yuuYDAa0KWxd3E/x/r0qzsbr vcas1uWcDWgjoUDMBuJpyx0sYTl6+mR9HlZuM4+aNQdzhTFU/jK69hAN0RFvryes K2/fLgI0fgLZpQDogCn4HV1/4uixi1eEFlVNXkwvMYDpQVo2FqiBaWLF0hNLWNCk kvm/fYIJhdFoNlp38jVKv0KJnBhW7aAs3prF+8B3YL2B23rLnvA6ZLZKHcdBAeLb 8PJMRrbAbmVxOnVSAG0fgU+0dEdkJQ+0ABqa+usMOV7xIPg9uIui1YrKT1KVq6Fs KyGHM5EQuBC/P6bTsKO6X+1beY2QIfwWxaIkoo8pj6d0WU69qU4u+LzQiDO4XR0L UQQguB1Qo8yaip3rHXhuv0hlnMNVAVye56Zw63uq1MWGkewRKSkY91Ms02L+pXpF r6+96xoFB0ulKZFnyxyBdkj2iC0426fHtTiiJFfQ4R1fiibPKtAx9P59WYnqymVh QsSYqlgC2/jWzRgqJTweLp/XQK8fWqmFkNmCGDN1N9Sij9Xjx/8aZb5dvwJkSBnK rZ4ObxBoaCPbPA== =K9Ok -----END PGP SIGNATURE----- Merge tag 'powerpc-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add AT_HWCAP3 and AT_HWCAP4 aux vector entries for future use by glibc - Add support for recognising the Power11 architected and raw PVRs - Add support for nr_cpus=n on the command line where the boot CPU is >= n - Add ppcxx_allmodconfig targets for all 32-bit sub-arches - Other small features, cleanups and fixes Thanks to Akanksha J N, Brian King, Christophe Leroy, Dawei Li, Geoff Levand, Greg Kroah-Hartman, Jan-Benedict Glaw, Kajol Jain, Kunwu Chan, Li zeming, Madhavan Srinivasan, Masahiro Yamada, Nathan Chancellor, Nicholas Piggin, Peter Bergner, Qiheng Lin, Randy Dunlap, Ricardo B. Marliere, Rob Herring, Sathvika Vasireddy, Shrikanth Hegde, Uwe Kleine-König, Vaibhav Jain, and Wen Xiong. * tag 'powerpc-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (71 commits) powerpc/macio: Make remove callback of macio driver void returned powerpc/83xx: Fix build failure with FPU=n powerpc/64s: Fix get_hugepd_cache_index() build failure powerpc/4xx: Fix warp_gpio_leds build failure powerpc/amigaone: Make several functions static powerpc/embedded6xx: Fix no previous prototype for avr_uart_send() etc. macintosh/adb: make adb_dev_class constant powerpc: xor_vmx: Add '-mhard-float' to CFLAGS powerpc/fsl: Fix mfpmr() asm constraint error powerpc: Remove cpu-as-y completely powerpc/fsl: Modernise mt/mfpmr powerpc/fsl: Fix mfpmr build errors with newer binutils powerpc/64s: Use .machine power4 around dcbt powerpc/64s: Move dcbt/dcbtst sequence into a macro powerpc/mm: Code cleanup for __hash_page_thp powerpc/hv-gpci: Fix the H_GET_PERF_COUNTER_INFO hcall return value checks powerpc/irq: Allow softirq to hardirq stack transition powerpc: Stop using of_root powerpc/machdep: Define 'compatibles' property in ppc_md and use it of: Reimplement of_machine_is_compatible() using of_machine_compatible_match() ... |
||
Linus Torvalds
|
902861e34c |
- Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
from hotplugged memory rather than only from main memory. Series "implement "memmap on memory" feature on s390". - More folio conversions from Matthew Wilcox in the series "Convert memcontrol charge moving to use folios" "mm: convert mm counter to take a folio" - Chengming Zhou has optimized zswap's rbtree locking, providing significant reductions in system time and modest but measurable reductions in overall runtimes. The series is "mm/zswap: optimize the scalability of zswap rb-tree". - Chengming Zhou has also provided the series "mm/zswap: optimize zswap lru list" which provides measurable runtime benefits in some swap-intensive situations. - And Chengming Zhou further optimizes zswap in the series "mm/zswap: optimize for dynamic zswap_pools". Measured improvements are modest. - zswap cleanups and simplifications from Yosry Ahmed in the series "mm: zswap: simplify zswap_swapoff()". - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has contributed several DAX cleanups as well as adding a sysfs tunable to control the memmap_on_memory setting when the dax device is hotplugged as system memory. - Johannes Weiner has added the large series "mm: zswap: cleanups", which does that. - More DAMON work from SeongJae Park in the series "mm/damon: make DAMON debugfs interface deprecation unignorable" "selftests/damon: add more tests for core functionalities and corner cases" "Docs/mm/damon: misc readability improvements" "mm/damon: let DAMOS feeds and tame/auto-tune itself" - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs extension" Rakie Kim has developed a new mempolicy interleaving policy wherein we allocate memory across nodes in a weighted fashion rather than uniformly. This is beneficial in heterogeneous memory environments appearing with CXL. - Christophe Leroy has contributed some cleanup and consolidation work against the ARM pagetable dumping code in the series "mm: ptdump: Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute". - Luis Chamberlain has added some additional xarray selftesting in the series "test_xarray: advanced API multi-index tests". - Muhammad Usama Anjum has reworked the selftest code to make its human-readable output conform to the TAP ("Test Anything Protocol") format. Amongst other things, this opens up the use of third-party tools to parse and process out selftesting results. - Ryan Roberts has added fork()-time PTE batching of THP ptes in the series "mm/memory: optimize fork() with PTE-mapped THP". Mainly targeted at arm64, this significantly speeds up fork() when the process has a large number of pte-mapped folios. - David Hildenbrand also gets in on the THP pte batching game in his series "mm/memory: optimize unmap/zap with PTE-mapped THP". It implements batching during munmap() and other pte teardown situations. The microbenchmark improvements are nice. - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte mappings"). Kernel build times on arm64 improved nicely. Ryan's series "Address some contpte nits" provides some followup work. - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has fixed an obscure hugetlb race which was causing unnecessary page faults. He has also added a reproducer under the selftest code. - In the series "selftests/mm: Output cleanups for the compaction test", Mark Brown did what the title claims. - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring". - Even more zswap material from Nhat Pham. The series "fix and extend zswap kselftests" does as claimed. - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in our handling of DAX on archiecctures which have virtually aliasing data caches. The arm architecture is the main beneficiary. - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic improvements in worst-case mmap_lock hold times during certain userfaultfd operations. - Some page_owner enhancements and maintenance work from Oscar Salvador in his series "page_owner: print stacks and their outstanding allocations" "page_owner: Fixup and cleanup" - Uladzislau Rezki has contributed some vmalloc scalability improvements in his series "Mitigate a vmap lock contention". It realizes a 12x improvement for a certain microbenchmark. - Some kexec/crash cleanup work from Baoquan He in the series "Split crash out from kexec and clean up related config items". - Some zsmalloc maintenance work from Chengming Zhou in the series "mm/zsmalloc: fix and optimize objects/page migration" "mm/zsmalloc: some cleanup for get/set_zspage_mapping()" - Zi Yan has taught the MM to perform compaction on folios larger than order=0. This a step along the path to implementaton of the merging of large anonymous folios. The series is named "Enable >0 order folio memory compaction". - Christoph Hellwig has done quite a lot of cleanup work in the pagecache writeback code in his series "convert write_cache_pages() to an iterator". - Some modest hugetlb cleanups and speedups in Vishal Moola's series "Handle hugetlb faults under the VMA lock". - Zi Yan has changed the page splitting code so we can split huge pages into sizes other than order-0 to better utilize large folios. The series is named "Split a folio to any lower order folios". - David Hildenbrand has contributed the series "mm: remove total_mapcount()", a cleanup. - Matthew Wilcox has sought to improve the performance of bulk memory freeing in his series "Rearrange batched folio freeing". - Gang Li's series "hugetlb: parallelize hugetlb page init on boot" provides large improvements in bootup times on large machines which are configured to use large numbers of hugetlb pages. - Matthew Wilcox's series "PageFlags cleanups" does that. - Qi Zheng's series "minor fixes and supplement for ptdesc" does that also. S390 is affected. - Cleanups to our pagemap utility functions from Peter Xu in his series "mm/treewide: Replace pXd_large() with pXd_leaf()". - Nico Pache has fixed a few things with our hugepage selftests in his series "selftests/mm: Improve Hugepage Test Handling in MM Selftests". - Also, of course, many singleton patches to many things. Please see the individual changelogs for details. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZfJpPQAKCRDdBJ7gKXxA joxeAP9TrcMEuHnLmBlhIXkWbIR4+ki+pA3v+gNTlJiBhnfVSgD9G55t1aBaRplx TMNhHfyiHYDTx/GAV9NXW84tasJSDgA= =TG55 -----END PGP SIGNATURE----- Merge tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames from hotplugged memory rather than only from main memory. Series "implement "memmap on memory" feature on s390". - More folio conversions from Matthew Wilcox in the series "Convert memcontrol charge moving to use folios" "mm: convert mm counter to take a folio" - Chengming Zhou has optimized zswap's rbtree locking, providing significant reductions in system time and modest but measurable reductions in overall runtimes. The series is "mm/zswap: optimize the scalability of zswap rb-tree". - Chengming Zhou has also provided the series "mm/zswap: optimize zswap lru list" which provides measurable runtime benefits in some swap-intensive situations. - And Chengming Zhou further optimizes zswap in the series "mm/zswap: optimize for dynamic zswap_pools". Measured improvements are modest. - zswap cleanups and simplifications from Yosry Ahmed in the series "mm: zswap: simplify zswap_swapoff()". - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has contributed several DAX cleanups as well as adding a sysfs tunable to control the memmap_on_memory setting when the dax device is hotplugged as system memory. - Johannes Weiner has added the large series "mm: zswap: cleanups", which does that. - More DAMON work from SeongJae Park in the series "mm/damon: make DAMON debugfs interface deprecation unignorable" "selftests/damon: add more tests for core functionalities and corner cases" "Docs/mm/damon: misc readability improvements" "mm/damon: let DAMOS feeds and tame/auto-tune itself" - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs extension" Rakie Kim has developed a new mempolicy interleaving policy wherein we allocate memory across nodes in a weighted fashion rather than uniformly. This is beneficial in heterogeneous memory environments appearing with CXL. - Christophe Leroy has contributed some cleanup and consolidation work against the ARM pagetable dumping code in the series "mm: ptdump: Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute". - Luis Chamberlain has added some additional xarray selftesting in the series "test_xarray: advanced API multi-index tests". - Muhammad Usama Anjum has reworked the selftest code to make its human-readable output conform to the TAP ("Test Anything Protocol") format. Amongst other things, this opens up the use of third-party tools to parse and process out selftesting results. - Ryan Roberts has added fork()-time PTE batching of THP ptes in the series "mm/memory: optimize fork() with PTE-mapped THP". Mainly targeted at arm64, this significantly speeds up fork() when the process has a large number of pte-mapped folios. - David Hildenbrand also gets in on the THP pte batching game in his series "mm/memory: optimize unmap/zap with PTE-mapped THP". It implements batching during munmap() and other pte teardown situations. The microbenchmark improvements are nice. - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte mappings"). Kernel build times on arm64 improved nicely. Ryan's series "Address some contpte nits" provides some followup work. - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has fixed an obscure hugetlb race which was causing unnecessary page faults. He has also added a reproducer under the selftest code. - In the series "selftests/mm: Output cleanups for the compaction test", Mark Brown did what the title claims. - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring". - Even more zswap material from Nhat Pham. The series "fix and extend zswap kselftests" does as claimed. - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in our handling of DAX on archiecctures which have virtually aliasing data caches. The arm architecture is the main beneficiary. - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic improvements in worst-case mmap_lock hold times during certain userfaultfd operations. - Some page_owner enhancements and maintenance work from Oscar Salvador in his series "page_owner: print stacks and their outstanding allocations" "page_owner: Fixup and cleanup" - Uladzislau Rezki has contributed some vmalloc scalability improvements in his series "Mitigate a vmap lock contention". It realizes a 12x improvement for a certain microbenchmark. - Some kexec/crash cleanup work from Baoquan He in the series "Split crash out from kexec and clean up related config items". - Some zsmalloc maintenance work from Chengming Zhou in the series "mm/zsmalloc: fix and optimize objects/page migration" "mm/zsmalloc: some cleanup for get/set_zspage_mapping()" - Zi Yan has taught the MM to perform compaction on folios larger than order=0. This a step along the path to implementaton of the merging of large anonymous folios. The series is named "Enable >0 order folio memory compaction". - Christoph Hellwig has done quite a lot of cleanup work in the pagecache writeback code in his series "convert write_cache_pages() to an iterator". - Some modest hugetlb cleanups and speedups in Vishal Moola's series "Handle hugetlb faults under the VMA lock". - Zi Yan has changed the page splitting code so we can split huge pages into sizes other than order-0 to better utilize large folios. The series is named "Split a folio to any lower order folios". - David Hildenbrand has contributed the series "mm: remove total_mapcount()", a cleanup. - Matthew Wilcox has sought to improve the performance of bulk memory freeing in his series "Rearrange batched folio freeing". - Gang Li's series "hugetlb: parallelize hugetlb page init on boot" provides large improvements in bootup times on large machines which are configured to use large numbers of hugetlb pages. - Matthew Wilcox's series "PageFlags cleanups" does that. - Qi Zheng's series "minor fixes and supplement for ptdesc" does that also. S390 is affected. - Cleanups to our pagemap utility functions from Peter Xu in his series "mm/treewide: Replace pXd_large() with pXd_leaf()". - Nico Pache has fixed a few things with our hugepage selftests in his series "selftests/mm: Improve Hugepage Test Handling in MM Selftests". - Also, of course, many singleton patches to many things. Please see the individual changelogs for details. * tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits) mm/zswap: remove the memcpy if acomp is not sleepable crypto: introduce: acomp_is_async to expose if comp drivers might sleep memtest: use {READ,WRITE}_ONCE in memory scanning mm: prohibit the last subpage from reusing the entire large folio mm: recover pud_leaf() definitions in nopmd case selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements selftests/mm: skip uffd hugetlb tests with insufficient hugepages selftests/mm: dont fail testsuite due to a lack of hugepages mm/huge_memory: skip invalid debugfs new_order input for folio split mm/huge_memory: check new folio order when split a folio mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure mm: add an explicit smp_wmb() to UFFDIO_CONTINUE mm: fix list corruption in put_pages_list mm: remove folio from deferred split list before uncharging it filemap: avoid unnecessary major faults in filemap_fault() mm,page_owner: drop unnecessary check mm,page_owner: check for null stack_record before bumping its refcount mm: swap: fix race between free_swap_and_cache() and swapoff() mm/treewide: align up pXd_leaf() retval across archs mm/treewide: drop pXd_large() ... |