IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This reverts commit 67211aadcb4b968d0fdc57bc27240fa71500c2d4.
This is part of a revert of the following commits:
11ed8b8624b8 ("PCI: brcmstb: Do not turn off WOL regulators on suspend")
93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
67211aadcb4b ("PCI: brcmstb: Add mechanism to turn on subdev regulators")
830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
Cyril reported that 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup()
into two funcs"), which appeared in v5.17-rc1, broke booting on the
Raspberry Pi Compute Module 4. Apparently 830aa6f29f07 panics with an
Asynchronous SError Interrupt, and after further commits here is a black
screen on HDMI and no output on the serial console.
This does not seem to affect the Raspberry Pi 4 B.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
Link: https://lore.kernel.org/r/20220511201856.808690-4-helgaas@kernel.org
Reported-by: Cyril Brulebois <kibi@debian.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This reverts commit 93e41f3fca3d4a0f927b784012338c37f80a8a80.
This is part of a revert of the following commits:
11ed8b8624b8 ("PCI: brcmstb: Do not turn off WOL regulators on suspend")
93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
67211aadcb4b ("PCI: brcmstb: Add mechanism to turn on subdev regulators")
830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
Cyril reported that 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup()
into two funcs"), which appeared in v5.17-rc1, broke booting on the
Raspberry Pi Compute Module 4. Apparently 830aa6f29f07 panics with an
Asynchronous SError Interrupt, and after further commits here is a black
screen on HDMI and no output on the serial console.
This does not seem to affect the Raspberry Pi 4 B.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
Link: https://lore.kernel.org/r/20220511201856.808690-3-helgaas@kernel.org
Reported-by: Cyril Brulebois <kibi@debian.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This reverts commit 11ed8b8624b8085f706864b4addcd304b1e4fc38.
This is part of a revert of the following commits:
11ed8b8624b8 ("PCI: brcmstb: Do not turn off WOL regulators on suspend")
93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
67211aadcb4b ("PCI: brcmstb: Add mechanism to turn on subdev regulators")
830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
Cyril reported that 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup()
into two funcs"), which appeared in v5.17-rc1, broke booting on the
Raspberry Pi Compute Module 4. Apparently 830aa6f29f07 panics with an
Asynchronous SError Interrupt, and after further commits here is a black
screen on HDMI and no output on the serial console.
This does not seem to affect the Raspberry Pi 4 B.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
Link: https://lore.kernel.org/r/20220511201856.808690-2-helgaas@kernel.org
Reported-by: Cyril Brulebois <kibi@debian.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmKP/tQUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vzH0xAAojQowrSWzZ5FKTqI+L/L9ZXoAb+e
9IvQljKc9taJldmXp+EB9wkS/5B+VtQcC2qUQuWEQXUoECF8qHlcB4l+XQyd1tWO
O0vZxETH22xjLLrjG2F3l5rrfkJZAf2nEugwbDk97YEgiimeOiRcv3bx6AUCtj6I
rPJ13Fop3Jke7sQMcXYJe3gQLT1o1AKiQGghiCFNi/gzx2lXI6mmHBgLxFoiqcby
WpfXbvbJti95HRaahUR3HaDFfHj4HVkQNLlTtIykJ3Tl2/rOhWEJjI8JOIQpAA+M
WBrWw9rfgbScTiGV+dZ3h7hKiPnHKl9YETIX7L0oA2sj0jZcIs0d6mSBZx0kYuI9
eAlx+qSK9xpbQQr/fdYaUdF1q4QdtU0BYOvOWOzWsqYCECMRJ1PUHFSMbmR/+PNB
P5lHnAbggRSoxdAtwFYv1HTr+VpGH9S+5oxHCz3ohpMjYy6mkCZwHpZn3doaU3ci
KG6yIoVKftm3fZdtFvL03qHl/I8+X24ZhT/T/278PRGjkhSyr56hZo8hg0gqqTct
ngip8qNABmSbqpr73/W6Vl42zAbYtNk1BykYahbKupgW8FbT7hqaZTB05V87pVu+
Ko1aJM6VoOP9rMlKHI9ba8eYCzDrZbLZUFn7ljNPDpzutf0tAwtgwzvZBXN3za6+
Z9+D5dxmvrZEIbA=
=hEti
-----END PGP SIGNATURE-----
Merge tag 'pci-v5.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci updates from Bjorn Helgaas:
"Resource management:
- Restrict E820 clipping to PCI host bridge windows (Bjorn Helgaas)
- Log E820 clipping better (Bjorn Helgaas)
- Add kernel cmdline options to enable/disable E820 clipping (Hans de
Goede)
- Disable E820 reserved region clipping for IdeaPads, Yoga, Yoga
Slip, Acer Spin 5, Clevo Barebone systems where clipping leaves no
usable address space for touchpads, Thunderbolt devices, etc (Hans
de Goede)
- Disable E820 clipping by default starting in 2023 (Hans de Goede)
PCI device hotplug:
- Include files to remove implicit dependencies (Christophe Leroy)
- Only put Root Ports in D3 if they can signal and wake from D3 so
AMD Yellow Carp doesn't miss hotplug events (Mario Limonciello)
Power management:
- Define pci_restore_standard_config() only for CONFIG_PM_SLEEP since
it's unused otherwise (Krzysztof Kozlowski)
- Power up devices completely, including anything platform firmware
needs to do, during runtime resume (Rafael J. Wysocki)
- Move pci_resume_bus() to PM callbacks so we observe the required
bridge power-up delays (Rafael J. Wysocki)
- Drop unneeded runtime_d3cold device flag (Rafael J. Wysocki)
- Split pci_raw_set_power_state() between pci_power_up() and a new
pci_set_low_power_state() (Rafael J. Wysocki)
- Set current_state to D3cold if config read returns ~0, indicating
the device is not accessible (Rafael J. Wysocki)
- Do not call pci_update_current_state() from pci_power_up() so BARs
and ASPM config are restored correctly (Rafael J. Wysocki)
- Write 0 to PMCSR in pci_power_up() in all cases (Rafael J. Wysocki)
- Split pci_power_up() to pci_set_full_power_state() to avoid some
redundant operations (Rafael J. Wysocki)
- Skip restoring BARs if device is not in D0 (Rafael J. Wysocki)
- Rearrange and clarify pci_set_power_state() (Rafael J. Wysocki)
- Remove redundant BAR restores from pci_pm_thaw_noirq() (Rafael J.
Wysocki)
Virtualization:
- Acquire device lock before config space access lock to avoid AB/BA
deadlock with sriov_numvfs_store() (Yicong Yang)
Error handling:
- Clear MULTI_ERR_COR/UNCOR_RCV bits, which a race could previously
leave permanently set (Kuppuswamy Sathyanarayanan)
Peer-to-peer DMA:
- Whitelist Intel Skylake-E Root Ports regardless of which devfn they
are (Shlomo Pongratz)
ASPM:
- Override L1 acceptable latency advertised by Intel DG2 so ASPM L1
can be enabled (Mika Westerberg)
Cadence PCIe controller driver:
- Set up device-specific register to allow PTM Responder to be
enabled by the normal architected bit (Christian Gmeiner)
- Override advertised FLR support since the controller doesn't
implement FLR correctly (Parshuram Thombare)
Cadence PCIe endpoint driver:
- Correct bitmap size for the ob_region_map of outbound window usage
(Dan Carpenter)
Freescale i.MX6 PCIe controller driver:
- Fix PERST# assertion/deassertion so we observe the required delays
before accessing device (Francesco Dolcini)
Freescale Layerscape PCIe controller driver:
- Add "big-endian" DT property (Hou Zhiqiang)
- Update SCFG DT property (Hou Zhiqiang)
- Add "aer", "pme", "intr" DT properties (Li Yang)
- Add DT compatible strings for ls1028a (Xiaowei Bao)
Intel VMD host bridge driver:
- Assign VMD IRQ domain before enumeration to avoid IOMMU interrupt
remapping errors when MSI-X remapping is disabled (Nirmal Patel)
- Revert VMD workaround that kept MSI-X remapping enabled when IOMMU
remapping was enabled (Nirmal Patel)
Marvell MVEBU PCIe controller driver:
- Add of_pci_get_slot_power_limit() to parse the
'slot-power-limit-milliwatt' DT property (Pali Rohár)
- Add mvebu support for sending Set_Slot_Power_Limit message (Pali
Rohár)
MediaTek PCIe controller driver:
- Fix refcount leak in mtk_pcie_subsys_powerup() (Miaoqian Lin)
MediaTek PCIe Gen3 controller driver:
- Reset PHY and MAC at probe time (AngeloGioacchino Del Regno)
Microchip PolarFlare PCIe controller driver:
- Add chained_irq_enter()/chained_irq_exit() calls to mc_handle_msi()
and mc_handle_intx() to avoid lost interrupts (Conor Dooley)
- Fix interrupt handling race (Daire McNamara)
NVIDIA Tegra194 PCIe controller driver:
- Drop tegra194 MSI register save/restore, which is unnecessary since
the DWC core does it (Jisheng Zhang)
Qualcomm PCIe controller driver:
- Add SM8150 SoC DT binding and support (Bhupesh Sharma)
- Fix pipe clock imbalance (Johan Hovold)
- Fix runtime PM imbalance on probe errors (Johan Hovold)
- Fix PHY init imbalance on probe errors (Johan Hovold)
- Convert DT binding to YAML (Dmitry Baryshkov)
- Update DT binding to show that resets aren't required for
MSM8996/APQ8096 platforms (Dmitry Baryshkov)
- Add explicit register names per chipset in DT binding (Dmitry
Baryshkov)
- Add sc7280-specific clock and reset definitions to DT binding
(Dmitry Baryshkov)
Rockchip PCIe controller driver:
- Fix bitmap size when searching for free outbound region (Dan
Carpenter)
Rockchip DesignWare PCIe controller driver:
- Remove "snps,dw-pcie" from rockchip-dwc DT "compatible" property
because it's not fully compatible with rockchip (Peter Geis)
- Reset rockchip-dwc controller at probe (Peter Geis)
- Add rockchip-dwc INTx support (Peter Geis)
Synopsys DesignWare PCIe controller driver:
- Return error instead of success if DMA mapping of MSI area fails
(Jiantao Zhang)
Miscellaneous:
- Change pci_set_dma_mask() documentation references to
dma_set_mask() (Alex Williamson)"
* tag 'pci-v5.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (64 commits)
dt-bindings: PCI: qcom: Add schema for sc7280 chipset
dt-bindings: PCI: qcom: Specify reg-names explicitly
dt-bindings: PCI: qcom: Do not require resets on msm8996 platforms
dt-bindings: PCI: qcom: Convert to YAML
PCI: qcom: Fix unbalanced PHY init on probe errors
PCI: qcom: Fix runtime PM imbalance on probe errors
PCI: qcom: Fix pipe clock imbalance
PCI: qcom: Add SM8150 SoC support
dt-bindings: pci: qcom: Document PCIe bindings for SM8150 SoC
x86/PCI: Disable E820 reserved region clipping starting in 2023
x86/PCI: Disable E820 reserved region clipping via quirks
x86/PCI: Add kernel cmdline options to use/ignore E820 reserved regions
PCI: microchip: Fix potential race in interrupt handling
PCI/AER: Clear MULTI_ERR_COR/UNCOR_RCV bits
PCI: cadence: Clear FLR in device capabilities register
PCI: cadence: Allow PTM Responder to be enabled
PCI: vmd: Revert 2565e5b69c44 ("PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is enabled by IOMMU.")
PCI: vmd: Assign VMD IRQ domain before enumeration
PCI: Avoid pci_dev_lock() AB/BA deadlock with sriov_numvfs_store()
PCI: rockchip-dwc: Add legacy interrupt support
...
and alloc_contig_range alignment", from Zi Yan.
A series of z3fold cleanups and fixes from Miaohe Lin.
Some memcg selftests work from Michal Koutný <mkoutny@suse.com>
Some swap fixes and cleanups from Miaohe Lin.
Several individual minor fixups.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYpEE7QAKCRDdBJ7gKXxA
jlamAP9WmjNdx+5Pz5OkkaSjBO7y7vBrBTcQ9e5pz8bUWRoQhwEA+WtsssLmq9aI
7DBDmBKYCMTbzOQTqaMRHkB+JWZo+Ao=
=L3f1
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-05-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
- Two follow-on fixes for the post-5.19 series "Use pageblock_order for
cma and alloc_contig_range alignment", from Zi Yan.
- A series of z3fold cleanups and fixes from Miaohe Lin.
- Some memcg selftests work from Michal Koutný <mkoutny@suse.com>
- Some swap fixes and cleanups from Miaohe Lin
- Several individual minor fixups
* tag 'mm-stable-2022-05-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (25 commits)
mm/shmem.c: suppress shift warning
mm: Kconfig: reorganize misplaced mm options
mm: kasan: fix input of vmalloc_to_page()
mm: fix is_pinnable_page against a cma page
mm: filter out swapin error entry in shmem mapping
mm/shmem: fix infinite loop when swap in shmem error at swapoff time
mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range
mm/swapfile: fix lost swap bits in unuse_pte()
mm/swapfile: unuse_pte can map random data if swap read fails
selftests: memcg: factor out common parts of memory.{low,min} tests
selftests: memcg: remove protection from top level memcg
selftests: memcg: adjust expected reclaim values of protected cgroups
selftests: memcg: expect no low events in unprotected sibling
selftests: memcg: fix compilation
mm/z3fold: fix z3fold_page_migrate races with z3fold_map
mm/z3fold: fix z3fold_reclaim_page races with z3fold_free
mm/z3fold: always clear PAGE_CLAIMED under z3fold page lock
mm/z3fold: put z3fold page back into unbuddied list when reclaim or migration fails
revert "mm/z3fold.c: allow __GFP_HIGHMEM in z3fold_alloc"
mm/z3fold: throw warning on failure of trylock_page in z3fold_alloc
...
for -stable. The remainder address pre-5.19 issues and are cc:stable.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYpEC8gAKCRDdBJ7gKXxA
jlukAQDCaXF7YTBjpoaAl0zhSu+5h7CawiB6cnRlq87/uJ2S4QD/eLVX3zfxI2DX
YcOhc5H8BOgZ8ppD80Nv9qjmyvEWzAA=
=ZFFG
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2022-05-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"Six hotfixes.
The page_table_check one from Miaohe Lin is considered a minor thing
so it isn't marked for -stable. The remainder address pre-5.19 issues
and are cc:stable"
* tag 'mm-hotfixes-stable-2022-05-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/page_table_check: fix accessing unmapped ptep
kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add]
mm/page_alloc: always attempt to allocate at least one page during bulk allocation
hugetlb: fix huge_pmd_unshare address update
zsmalloc: fix races between asynchronous zspage free and page migration
Revert "mm/cma.c: remove redundant cma_mutex lock"
subsystems. Most notably some maintenance work in ocfs2 and initramfs.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYo/6xQAKCRDdBJ7gKXxA
jkD9AQCPczLBbRWpe1edL+5VHvel9ePoHQmvbHQnufdTh9rB5QEAu0Uilxz4q9cx
xSZypNhj2n9f8FCYca/ZrZneBsTnAA8=
=AJEO
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2022-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc updates from Andrew Morton:
"The non-MM patch queue for this merge window.
Not a lot of material this cycle. Many singleton patches against
various subsystems. Most notably some maintenance work in ocfs2
and initramfs"
* tag 'mm-nonmm-stable-2022-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (65 commits)
kcov: update pos before writing pc in trace function
ocfs2: dlmfs: fix error handling of user_dlm_destroy_lock
ocfs2: dlmfs: don't clear USER_LOCK_ATTACHED when destroying lock
fs/ntfs: remove redundant variable idx
fat: remove time truncations in vfat_create/vfat_mkdir
fat: report creation time in statx
fat: ignore ctime updates, and keep ctime identical to mtime in memory
fat: split fat_truncate_time() into separate functions
MAINTAINERS: add Muchun as a memcg reviewer
proc/sysctl: make protected_* world readable
ia64: mca: drop redundant spinlock initialization
tty: fix deadlock caused by calling printk() under tty_port->lock
relay: remove redundant assignment to pointer buf
fs/ntfs3: validate BOOT sectors_per_clusters
lib/string_helpers: fix not adding strarray to device's resource list
kernel/crash_core.c: remove redundant check of ck_cmdline
ELF, uapi: fixup ELF_ST_TYPE definition
ipc/mqueue: use get_tree_nodev() in mqueue_get_tree()
ipc: update semtimedop() to use hrtimer
ipc/sem: remove redundant assignments
...
When CRYPTO_LIB_POLY1305 is unset, CRYPTO_LIB_POLY1305_RSIZE
is still set in the Kconfig, cluttering things.
Fix this by making CRYPTO_LIB_POLY1305_RSIZE depend on
CRYPTO_LIB_POLY1305.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the arm64 build error which was caused by commit ae07562909f3 ("mm:
change huge_ptep_clear_flush() to return the original pte") interacting
with commit fb396bb459c1 ("arm64/hugetlb: Drop TLB flush from
get_clear_flush()"):
arch/arm64/mm/hugetlbpage.c: In function ‘huge_ptep_clear_flush’:
arch/arm64/mm/hugetlbpage.c:515:9: error: implicit declaration of function ‘get_clear_flush’; did you mean ‘ptep_clear_flush’? [-Werror=implicit-function-declaration]
515 | return get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
| ^~~~~~~~~~~~~~~
| ptep_clear_flush
Due to the new get_clear_contig() has dropped TLB flush, we should add
an explicit TLB flush in huge_ptep_clear_flush() to keep original
semantics when changing to use new get_clear_contig().
Fixes: fb396bb459c1 ("arm64/hugetlb: Drop TLB flush from get_clear_flush()").
Fixes: ae07562909f3 ("mm: change huge_ptep_clear_flush() to return the original pte")
Reported-and-tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pipe_resize_ring() needs to take the pipe->rd_wait.lock spinlock to
prevent post_one_notification() from trying to insert into the ring
whilst the ring is being replaced.
The occupancy check must be done after the lock is taken, and the lock
must be taken after the new ring is allocated.
The bug can lead to an oops looking something like:
BUG: KASAN: use-after-free in post_one_notification.isra.0+0x62e/0x840
Read of size 4 at addr ffff88801cc72a70 by task poc/27196
...
Call Trace:
post_one_notification.isra.0+0x62e/0x840
__post_watch_notification+0x3b7/0x650
key_create_or_update+0xb8b/0xd20
__do_sys_add_key+0x175/0x340
__x64_sys_add_key+0xbe/0x140
do_syscall_64+0x5c/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Reported by Selim Enes Karaduman @Enesdex working with Trend Micro Zero
Day Initiative.
Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17291
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm/shmem.c:1948 shmem_getpage_gfp() warn: should '(((1) << 12) / 512) << folio_order(folio)' be a 64 bit type?
On i386, so an unsigned long is 32-bit, but i_blocks is a 64-bit blkcnt_t.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Jessica Clarke <jrtc27@jrtc27.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
After commits 7b42f1041c98 ("mm: Kconfig: move swap and slab config
options to the MM section") and 519bcb797907 ("mm: Kconfig: group swap,
slab, hotplug and thp options into submenus") we now have nicely organized
mm related config options. I have noticed some that were still misplaced,
so this moves them from various places into the new structure:
VM_EVENT_COUNTERS, COMPAT_BRK, MMAP_ALLOW_UNINITIALIZED to mm/Kconfig and
general MM section.
SLUB_STATS to mm/Kconfig and the slab submenu.
DEBUG_SLAB, SLUB_DEBUG, SLUB_DEBUG_ON to mm/Kconfig.debug and the Kernel
hacking / Memory Debugging submenu.
Link: https://lkml.kernel.org/r/20220525112559.1139-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When print virtual mapping info for vmalloc address, it should pass
the addr not page, fix it.
Link: https://lkml.kernel.org/r/20220525120804.38155-1-wangkefeng.wang@huawei.com
Fixes: c056a364e954 ("kasan: print virtual mapping info in reports")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pages in the CMA area could have MIGRATE_ISOLATE as well as MIGRATE_CMA so
the current is_pinnable_page() could miss CMA pages which have
MIGRATE_ISOLATE. It ends up pinning CMA pages as longterm for the
pin_user_pages() API so CMA allocations keep failing until the pin is
released.
CPU 0 CPU 1 - Task B
cma_alloc
alloc_contig_range
pin_user_pages_fast(FOLL_LONGTERM)
change pageblock as MIGRATE_ISOLATE
internal_get_user_pages_fast
lockless_pages_from_mm
gup_pte_range
try_grab_folio
is_pinnable_page
return true;
So, pinned the page successfully.
page migration failure with pinned page
..
.. After 30 sec
unpin_user_page(page)
CMA allocation succeeded after 30 sec.
The CMA allocation path protects the migration type change race using
zone->lock but what GUP path need to know is just whether the page is on
CMA area or not rather than exact migration type. Thus, we don't need
zone->lock but just checks migration type in either of (MIGRATE_ISOLATE
and MIGRATE_CMA).
Adding the MIGRATE_ISOLATE check in is_pinnable_page could cause rejecting
of pinning pages on MIGRATE_ISOLATE pageblocks even though it's neither
CMA nor movable zone if the page is temporarily unmovable. However, such
a migration failure by unexpected temporal refcount holding is general
issue, not only come from MIGRATE_ISOLATE and the MIGRATE_ISOLATE is also
transient state like other temporal elevated refcount problem.
Link: https://lkml.kernel.org/r/20220524171525.976723-1-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When swap in shmem error at swapoff time, there would be a infinite loop
in the while loop in shmem_unuse_inode(). It's because swapin error is
deliberately ignored now and thus info->swapped will never reach 0. So we
can't escape the loop in shmem_unuse().
In order to fix the issue, swapin_error entry is stored in the mapping
when swapin error occurs. So the swapcache page can be freed and the user
won't end up with a permanently mounted swap because a sector is bad. If
the page is accessed later, the user process will be killed so that
corrupted data is never consumed. On the other hand, if the page is never
accessed, the user won't even notice it.
Link: https://lkml.kernel.org/r/20220519125030.21486-5-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reported-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: NeilBrown <neilb@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Once the MADV_FREE operation has succeeded, callers can expect they might
get zero-fill pages if accessing the memory again. Therefore it should be
safe to delete the hwpoison entry and swapin error entry. There is no
reason to kill the process if it has called MADV_FREE on the range.
Link: https://lkml.kernel.org/r/20220519125030.21486-4-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Suggested-by: Alistair Popple <apopple@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: NeilBrown <neilb@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This is observed by code review only but not any real report.
When we turn off swapping we could have lost the bits stored in the swap
ptes. The new rmap-exclusive bit is fine since that turned into a page
flag, but not for soft-dirty and uffd-wp. Add them.
Link: https://lkml.kernel.org/r/20220519125030.21486-3-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Suggested-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: NeilBrown <neilb@suse.de>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "A few fixup patches for mm", v4.
This series contains a few patches to avoid mapping random data if swap
read fails and fix lost swap bits in unuse_pte. Also we free hwpoison and
swapin error entry in madvise_free_pte_range and so on. More details can
be found in the respective changelogs.
This patch (of 5):
There is a bug in unuse_pte(): when swap page happens to be unreadable,
page filled with random data is mapped into user address space. In case
of error, a special swap entry indicating swap read fails is set to the
page table. So the swapcache page can be freed and the user won't end up
with a permanently mounted swap because a sector is bad. And if the page
is accessed later, the user process will be killed so that corrupted data
is never consumed. On the other hand, if the page is never accessed, the
user won't even notice it.
Link: https://lkml.kernel.org/r/20220519125030.21486-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20220519125030.21486-2-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Howells <dhowells@redhat.com>
Cc: NeilBrown <neilb@suse.de>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The memory protection test setup and runtime is almost equal for
memory.low and memory.min cases.
It makes modification of the common parts prone to mistakes, since the
protections are similar not only in setup but also in principle, factor
the common part out.
Past exceptions between the tests:
- missing memory.min is fine (kept),
- test_memcg_low protected orphaned pagecache (adapted like
test_memcg_min and we keep the processes of protected memory running).
The evaluation in two tests is different (OOM of allocator vs low events
of protégés), this is kept different.
Link: https://lkml.kernel.org/r/20220518161859.21565-6-mkoutny@suse.com
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
CC: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Richard Palethorpe <rpalethorpe@suse.de>
Cc: David Vernet <void@manifault.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The reclaim is triggered by memory limit in a subtree, therefore the
testcase does not need configured protection against external reclaim.
Also, correct respective comments.
Link: https://lkml.kernel.org/r/20220518161859.21565-5-mkoutny@suse.com
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: David Vernet <void@manifault.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Richard Palethorpe <rpalethorpe@suse.de>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The numbers are not easy to derive in a closed form (certainly mere
protections ratios do not apply), therefore use a simulation to obtain
expected numbers.
Link: https://lkml.kernel.org/r/20220518161859.21565-4-mkoutny@suse.com
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: David Vernet <void@manifault.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Richard Palethorpe <rpalethorpe@suse.de>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This is effectively a revert of commit cdc69458a5f3 ("cgroup: account for
memory_recursiveprot in test_memcg_low()"). The case test_memcg_low will
fail with memory_recursiveprot until resolved in reclaim code.
However, this patch preserves the existing helpers and variables for later
uses.
Link: https://lkml.kernel.org/r/20220518161859.21565-3-mkoutny@suse.com
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: David Vernet <void@manifault.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Richard Palethorpe <rpalethorpe@suse.de>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "memcontrol selftests fixups", v2.
Flushing the patches to make memcontrol selftests check the events
behavior we had consensus about (test_memcg_low fails).
(test_memcg_reclaim, test_memcg_swap_max fail for me now but it's present
even before the refactoring.)
The two bigger changes are:
- adjustment of the protected values to make tests succeed with the given
tolerance,
- both test_memcg_low and test_memcg_min check protection of memory in
populated cgroups (actually as per Documentation/admin-guide/cgroup-v2.rst
memory.min should not apply to empty cgroups, which is not the case
currently. Therefore I unified tests with the populated case in order to to
bring more broken tests).
This patch (of 5):
This fixes mis-applied changes from commit 72b1e03aa725 ("cgroup: account
for memory_localevents in test_memcg_oom_group_leaf_events()").
Link: https://lkml.kernel.org/r/20220518161859.21565-1-mkoutny@suse.com
Link: https://lkml.kernel.org/r/20220518161859.21565-2-mkoutny@suse.com
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: David Vernet <void@manifault.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Richard Palethorpe <rpalethorpe@suse.de>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Think about the below scenario:
CPU1 CPU2
z3fold_page_migrate z3fold_map
z3fold_page_trylock
...
z3fold_page_unlock
/* slots still points to old zhdr*/
get_z3fold_header
get slots from handle
get old zhdr from slots
z3fold_page_trylock
return *old* zhdr
encode_handle(new_zhdr, FIRST|LAST|MIDDLE)
put_page(page) /* zhdr is freed! */
but zhdr is still used by caller!
z3fold_map can map freed z3fold page and lead to use-after-free bug. To
fix it, we add PAGE_MIGRATED to indicate z3fold page is migrated and soon
to be released. So get_z3fold_header won't return such page.
Link: https://lkml.kernel.org/r/20220429064051.61552-10-linmiaohe@huawei.com
Fixes: 1f862989b04a ("mm/z3fold.c: support page migration")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Think about the below scenario:
CPU1 CPU2
z3fold_reclaim_page z3fold_free
spin_lock(&pool->lock) get_z3fold_header -- hold page_lock
kref_get_unless_zero
kref_put--zhdr->refcount can be 1 now
!z3fold_page_trylock
kref_put -- zhdr->refcount is 0 now
release_z3fold_page
WARN_ON(!list_empty(&zhdr->buddy)); -- we're on buddy now!
spin_lock(&pool->lock); -- deadlock here!
z3fold_reclaim_page might race with z3fold_free and will lead to pool lock
deadlock and zhdr buddy non-empty warning. To fix this, defer getting the
refcount until page_lock is held just like what __z3fold_alloc does. Note
this has the side effect that we won't break the reclaim if we meet a soon
to be released z3fold page now.
Link: https://lkml.kernel.org/r/20220429064051.61552-9-linmiaohe@huawei.com
Fixes: dcf5aedb24f8 ("z3fold: stricter locking and more careful reclaim")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Think about the below race window:
CPU1 CPU2
z3fold_reclaim_page z3fold_free
test_and_set_bit PAGE_CLAIMED
failed to reclaim page
z3fold_page_lock(zhdr);
add back to the lru list;
z3fold_page_unlock(zhdr);
get_z3fold_header
page_claimed=test_and_set_bit PAGE_CLAIMED
clear_bit(PAGE_CLAIMED, &page->private);
if (!page_claimed) /* it's false true */
free_handle is not called
free_handle won't be called in this case. So z3fold_buddy_slots will leak.
Fix it by always clear PAGE_CLAIMED under z3fold page lock.
Link: https://lkml.kernel.org/r/20220429064051.61552-8-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When doing z3fold page reclaim or migration, the page is removed from
unbuddied list. If reclaim or migration succeeds, it's fine as page is
released. But in case it fails, the page is not put back into unbuddied
list now. The page will be leaked until next compaction work, reclaim or
migration is done.
Link: https://lkml.kernel.org/r/20220429064051.61552-7-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Revert commit f1549cb5ab2b ("mm/z3fold.c: allow __GFP_HIGHMEM in
z3fold_alloc").
z3fold can't support GFP_HIGHMEM page now. page_address is used directly
at all places. Moreover, z3fold_header is on per cpu unbuddied list which
could be accessed anytime. So we should remove the support of GFP_HIGHMEM
allocation for z3fold.
Link: https://lkml.kernel.org/r/20220429064051.61552-6-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If trylock_page fails, the page won't be non-lru movable page. When this
page is freed via free_z3fold_page, it will trigger bug on PageMovable
check in __ClearPageMovable. Throw warning on failure of trylock_page to
guard against such rare case just as what zsmalloc does.
Link: https://lkml.kernel.org/r/20220429064051.61552-5-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently if z3fold couldn't find an unbuddied page it would first try to
pull a page off the stale list. But this approach is problematic. If
init z3fold page fails later, the page should be freed via
free_z3fold_page to clean up the relevant resource instead of using
__free_page directly. And if page is successfully reused, it will BUG_ON
later in __SetPageMovable because it's already non-lru movable page, i.e.
PAGE_MAPPING_MOVABLE is already set in page->mapping. In order to fix all
of these issues, we can simply remove the buggy use of stale list for
allocation because can_sleep should always be false and we never really
hit the reusing code path now.
Link: https://lkml.kernel.org/r/20220429064051.61552-4-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
alloc_slots could fail to allocate memory under heavy memory pressure. So
we should check zhdr->slots against NULL to avoid future null pointer
dereferencing.
Link: https://lkml.kernel.org/r/20220429064051.61552-3-linmiaohe@huawei.com
Fixes: fc5488651c7d ("z3fold: simplify freeing slots")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "A few fixup patches for z3fold".
This series contains a few fixup patches to fix sheduling while atomic,
fix possible null pointer dereferencing, fix various race conditions and
so on. More details can be found in the respective changelogs.
This patch (of 9):
z3fold's page_lock is always held when calling alloc_slots. So gfp should
be GFP_ATOMIC to avoid "scheduling while atomic" bug.
Link: https://lkml.kernel.org/r/20220429064051.61552-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20220429064051.61552-2-linmiaohe@huawei.com
Fixes: fc5488651c7d ("z3fold: simplify freeing slots")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In isolate_single_pageblock(), free pages are checked without holding zone
lock, but they can go away in split_free_page() when zone lock is held.
Check the free page and its order again in split_free_page() when zone lock
is held. Recheck the page if the free page is gone under zone lock.
In addition, in split_free_page(), the free page was deleted from the page
list without changing free page accounting. Add the missing free page
accounting code.
Fix the type of order parameter in split_free_page().
Link: https://lore.kernel.org/lkml/20220525103621.987185e2ca0079f7b97b856d@linux-foundation.org/
Link: https://lkml.kernel.org/r/20220526231531.2404977-2-zi.yan@sent.com
Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reported-by: Doug Berger <opendmb@gmail.com>
Link: https://lore.kernel.org/linux-mm/c3932a6f-77fe-29f7-0c29-fe6b1c67ab7b@gmail.com/
Cc: David Hildenbrand <david@redhat.com>
Cc: Qian Cai <quic_qiancai@quicinc.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Eric Ren <renzhengeek@gmail.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Michael Walle <michael@walle.cc>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
start_isolate_page_range() first isolates the first and the last
pageblocks in the range and ensure pages across range boundaries are split
during isolation. But it missed the case when the range is <= a pageblock
and the first and the last pageblocks are the same one, so the second
isolate_single_pageblock() will always fail. To fix it, skip the
pageblock isolation in second isolate_single_pageblock().
Link: https://lkml.kernel.org/r/20220526231531.2404977-1-zi.yan@sent.com
Fixes: 88ee134320b8 ("mm: fix a potential infinite loop in start_isolate_page_range()")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/linux-mm/ac65adc0-a7e4-cdfe-a0d8-757195b86293@samsung.com/
Reported-by: Michael Walle <michael@walle.cc>
Tested-by: Michael Walle <michael@walle.cc>
Link: https://lore.kernel.org/linux-mm/8ca048ca8b547e0dd1c95387ee05c23d@walle.cc/
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Hildenbrand <david@redhat.com>
Cc: Doug Berger <opendmb@gmail.com>
Cc: Eric Ren <renzhengeek@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Qian Cai <quic_qiancai@quicinc.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ptep is unmapped too early, so ptep could theoretically be accessed while
it's unmapped. This might become a problem if/when CONFIG_HIGHPTE becomes
available on riscv.
Fix it by deferring pte_unmap() until page table checking is done.
[akpm@linux-foundation.org: account for ptep alteration, per Matthew]
Link: https://lkml.kernel.org/r/20220526113350.30806-1-linmiaohe@huawei.com
Fixes: 80110bbfbba6 ("mm/page_table_check: check entries at pmd levels")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
symbols") [1], binutils (v2.36+) started dropping section symbols that
it thought were unused. This isn't an issue in general, but with
kexec_file.c, gcc is placing kexec_arch_apply_relocations[_add] into a
separate .text.unlikely section and the section symbol ".text.unlikely"
is being dropped. Due to this, recordmcount is unable to find a non-weak
symbol in .text.unlikely to generate a relocation record against.
Address this by dropping the weak attribute from these functions.
Instead, follow the existing pattern of having architectures #define the
name of the function they want to override in their headers.
[1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1
[akpm@linux-foundation.org: arch/s390/include/asm/kexec.h needs linux/module.h]
Link: https://lkml.kernel.org/r/20220519091237.676736-1-naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Peter Pavlisko reported the following problem on kernel bugzilla 216007.
When I try to extract an uncompressed tar archive (2.6 milion
files, 760.3 GiB in size) on newly created (empty) XFS file system,
after first low tens of gigabytes extracted the process hangs in
iowait indefinitely. One CPU core is 100% occupied with iowait,
the other CPU core is idle (on 2-core Intel Celeron G1610T).
It was bisected to c9fa563072e1 ("xfs: use alloc_pages_bulk_array() for
buffers") but XFS is only the messenger. The problem is that nothing is
waking kswapd to reclaim some pages at a time the PCP lists cannot be
refilled until some reclaim happens. The bulk allocator checks that there
are some pages in the array and the original intent was that a bulk
allocator did not necessarily need all the requested pages and it was best
to return as quickly as possible.
This was fine for the first user of the API but both NFS and XFS require
the requested number of pages be available before making progress. Both
could be adjusted to call the page allocator directly if a bulk allocation
fails but it puts a burden on users of the API. Adjust the semantics to
attempt at least one allocation via __alloc_pages() before returning so
kswapd is woken if necessary.
It was reported via bugzilla that the patch addressed the problem and that
the tar extraction completed successfully. This may also address bug
215975 but has yet to be confirmed.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216007
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215975
Link: https://lkml.kernel.org/r/20220526091210.GC3441@techsingularity.net
Fixes: 387ba26fb1cb ("mm/page_alloc: add a bulk page allocator")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@vger.kernel.org> [5.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The routine huge_pmd_unshare() is passed a pointer to an address
associated with an area which may be unshared. If unshare is successful
this address is updated to 'optimize' callers iterating over huge page
addresses. For the optimization to work correctly, address should be
updated to the last huge page in the unmapped/unshared area. However, in
the common case where the passed address is PUD_SIZE aligned, the address
is incorrectly updated to the address of the preceding huge page. That
wastes CPU cycles as the unmapped/unshared range is scanned twice.
Link: https://lkml.kernel.org/r/20220524205003.126184-1-mike.kravetz@oracle.com
Fixes: 39dde65c9940 ("shared page table for hugetlb page")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
set. This change improves DM's hipri bio polling (REQ_POLLED)
performance by 7 - 20% depending on the system.
- Update DM core to use jump_labels to further reduce cost of unlikely
branches for zoned block devices, dm-stats and swap_bios throttling.
- Various DM core changes to reduce bio-based DM overhead and simplify
IO accounting.
- Fundamental DM core improvements to dm_io reference counting and the
elimination of using bio_split()+bio_chain() -- instead DM's
bio-based IO accounting is updated to account that a split occurred.
- Improve DM core's abnormal bio processing to do less work.
- Improve DM core's hipri polling support to use a single list rather
than an hlist.
- Update DM core to pass NULL bdev to bio_alloc_clone() so that
initialization that isn't useful for DM can be elided.
- Add cond_resched to DM stats' various loops that loop over all
entries.
- Fix incorrect error code return from DM integrity's constructor.
- Make DM crypt's printing of the key constant-time.
- Update bio-based DM multipath to provide high-resolution timer to
the Historical Service Time (HST) path selector.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmKOiAwACgkQxSPxCi2d
A1qCxgf/VLmiywUR7zCIDiPyJkc547z2MlPVaTCzpclklyLZ5wgfTtqsb8RnEqRs
KIXrvKbakbfZtrG0k9ZYdbIIVO4jHQavCXtVZH0Mj4XqQArDhHxZzlBx8i6BirO5
aya7HKWHcXj+s7GF146hRcPtD53LbSNDrB/bVaiZcHxOemthSgtht2xwmZfzoCcS
JtorpWkA97lgDbdSAcR0kn1zkHDWFkQMJgnUL9FotnlhOVXZVUvAbgb4OjfxB3fG
bCYlytDhs+6VwO+/CZXcOA7LNNtkLJPEkxUxjrChS5do64SgJUlDvY66iB6+7K0w
0Jri5D/NZvjwYVeLgXn5UwyQJ75cjQ==
=zlaE
-----END PGP SIGNATURE-----
Merge tag 'for-5.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Enable DM core bioset's per-cpu bio cache if QUEUE_FLAG_POLL set.
This change improves DM's hipri bio polling (REQ_POLLED) performance
by 7 - 20% depending on the system.
- Update DM core to use jump_labels to further reduce cost of unlikely
branches for zoned block devices, dm-stats and swap_bios throttling.
- Various DM core changes to reduce bio-based DM overhead and simplify
IO accounting.
- Fundamental DM core improvements to dm_io reference counting and the
elimination of using bio_split()+bio_chain() -- instead DM's
bio-based IO accounting is updated to account that a split occurred.
- Improve DM core's abnormal bio processing to do less work.
- Improve DM core's hipri polling support to use a single list rather
than an hlist.
- Update DM core to pass NULL bdev to bio_alloc_clone() so that
initialization that isn't useful for DM can be elided.
- Add cond_resched to DM stats' various loops that loop over all
entries.
- Fix incorrect error code return from DM integrity's constructor.
- Make DM crypt's printing of the key constant-time.
- Update bio-based DM multipath to provide high-resolution timer to the
Historical Service Time (HST) path selector.
* tag 'for-5.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (26 commits)
dm: pass NULL bdev to bio_alloc_clone
dm cache metadata: remove unnecessary variable in __dump_mapping
dm mpath: provide high-resolution timer to HST for bio-based
dm crypt: make printing of the key constant-time
dm integrity: fix error code in dm_integrity_ctr()
dm stats: add cond_resched when looping over entries
dm: improve abnormal bio processing
dm: simplify bio-based IO accounting further
dm: put all polled dm_io instances into a single list
dm: improve dm_io reference counting
dm: don't grab target io reference in dm_zone_map_bio
dm: improve bio splitting and associated IO accounting
dm: switch to bdev based IO accounting interfaces
dm: pass dm_io instance to dm_io_acct directly
dm: don't pass bio to __dm_start_io_acct and dm_end_io_acct
dm: use bio_sectors in dm_aceept_partial_bio
dm: simplify basic targets
dm: conditionally enable branching for less used features
dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio
dm: move hot dm_io members to same cacheline as dm_target_io
...
Small collection of incremental improvement patches:
- Minor code cleanup patches, comment improvements, etc from static tools
- Clean the some of the kernel caps, reducing the historical stealth uAPI
leftovers
- Bug fixes and minor changes for rdmavt, hns, rxe, irdma
- Remove unimplemented cruft from rxe
- Reorganize UMR QP code in mlx5 to avoid going through the IB verbs layer
- flush_workqueue(system_unbound_wq) removal
- Ensure rxe waits for objects to be unused before allowing the core to
free them
- Several rc quality bug fixes for hfi1
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCYo+NxgAKCRCFwuHvBreF
YbqSAQDJ+QolaATUvOQUPLbuLopUCJLe95VS15Kl3SNXiVUUFAEA8DLL1s6+WShd
AgypUxGHipx5BAytrn45/WiwuDeEbQ8=
=jgTl
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"Small collection of incremental improvement patches:
- Minor code cleanup patches, comment improvements, etc from static
tools
- Clean the some of the kernel caps, reducing the historical stealth
uAPI leftovers
- Bug fixes and minor changes for rdmavt, hns, rxe, irdma
- Remove unimplemented cruft from rxe
- Reorganize UMR QP code in mlx5 to avoid going through the IB verbs
layer
- flush_workqueue(system_unbound_wq) removal
- Ensure rxe waits for objects to be unused before allowing the core
to free them
- Several rc quality bug fixes for hfi1"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (67 commits)
RDMA/rtrs-clt: Fix one kernel-doc comment
RDMA/hfi1: Remove all traces of diagpkt support
RDMA/hfi1: Consolidate software versions
RDMA/hfi1: Remove pointless driver version
RDMA/hfi1: Fix potential integer multiplication overflow errors
RDMA/hfi1: Prevent panic when SDMA is disabled
RDMA/hfi1: Prevent use of lock before it is initialized
RDMA/rxe: Fix an error handling path in rxe_get_mcg()
IB/core: Fix typo in comment
RDMA/core: Fix typo in comment
IB/hf1: Fix typo in comment
IB/qib: Fix typo in comment
IB/iser: Fix typo in comment
RDMA/mlx4: Avoid flush_scheduled_work() usage
IB/isert: Avoid flush_scheduled_work() usage
RDMA/mlx5: Remove duplicate pointer assignment in mlx5_ib_alloc_implicit_mr()
RDMA/qedr: Remove unnecessary synchronize_irq() before free_irq()
RDMA/hns: Use hr_reg_read() instead of remaining roce_get_xxx()
RDMA/hns: Use hr_reg_xxx() instead of remaining roce_set_xxx()
RDMA/irdma: Add SW mechanism to generate completions on error
...
We introduce "courteous server" in this release. Previously NFSD
would purge open and lock state for an unresponsive client after
one lease period (typically 90 seconds). Now, after one lease
period, another client can open and lock those files and the
unresponsive client's lease is purged; otherwise if the unrespon-
sive client's open and lock state is uncontended, the server retains
that open and lock state for up to 24 hours, allowing the client's
workload to resume after a lengthy network partition.
A longstanding issue with NFSv4 file creation is also addressed.
Previously a file creation can fail internally, returning an error
to the client, but leave the newly created file in place as an
artifact. The file creation code path has been reorganized so that
internal failures and race conditions are less likely to result in
an unwanted file creation.
A fault injector has been added to help exercise paths that are run
during kernel metadata cache invalidation. These caches contain
information maintained by user space about exported filesystems.
Many of our test workloads do not trigger cache invalidation.
There is one patch that is needed to support PREEMPT_RT and a fix
for an ancient "sleep while spin-locked" splat that seems to have
become easier to hit since v5.18-rc3.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmKPliAACgkQM2qzM29m
f5dB3BAAorPa2L8xu5P1Ge1oTNogNSOVRkLPDzEkfEwK07ZM2qvz78eMZGkMziJ/
strorvBWl3SWBlVtTePgNpJUjgYQ75MRRwaX7Qh2WuHeRKm1JlZm0/NId3+zKgbh
N40QI20jdswWcNDuhidxVFFWurd09GlcM4z1cu8gZLbfthkiUOjZoPiLkXeNcvhk
7wC9GiueWxHefYQQDAKh1nQS/L0GG1EkzJdJo7WUVAldZ9qVY9LpmJVMRqrBBbta
XrFYfpeY1zFFDY4Qolyz5PUJSeQuDj9PctlhoZ6B1hp56PD/6yaqVhYXiPxtlALj
tITtktfiekULZkgfvfvyzssCv+wkbYiaEBZcSSCauR7dkGOmBmajO+cf7vpsERgE
fbCU8DWGk78SMeehdCrO+26cV37VP+8c2t2Txq/rG5Eq4ZoCi++Hj5poRboFLqb+
oom+0Ee0LfcAKXkxH5gWTPTblHo49GzGitPZtRzTgZ9uFnVwvEaJ4+t0ij0J8JpL
HuVtWrg5/REhqpEvOSwF0sRmkYWLTu7KdueGn/iZ8xUi7GHEue01NsVkClohKJcR
WOjWrbNCNF/LJaG88MX0z5u7IO7s9bOHphd7PJ92vR+4YsehW3uRhk+rNi2ZBqQz
hzULfu8BiaicV9fdB/hDcMmKQD6U6due2AVVPtxTf5XY+CHQNRY=
=phE1
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"We introduce 'courteous server' in this release. Previously NFSD would
purge open and lock state for an unresponsive client after one lease
period (typically 90 seconds). Now, after one lease period, another
client can open and lock those files and the unresponsive client's
lease is purged; otherwise if the unresponsive client's open and lock
state is uncontended, the server retains that open and lock state for
up to 24 hours, allowing the client's workload to resume after a
lengthy network partition.
A longstanding issue with NFSv4 file creation is also addressed.
Previously a file creation can fail internally, returning an error to
the client, but leave the newly created file in place as an artifact.
The file creation code path has been reorganized so that internal
failures and race conditions are less likely to result in an unwanted
file creation.
A fault injector has been added to help exercise paths that are run
during kernel metadata cache invalidation. These caches contain
information maintained by user space about exported filesystems. Many
of our test workloads do not trigger cache invalidation.
There is one patch that is needed to support PREEMPT_RT and a fix for
an ancient 'sleep while spin-locked' splat that seems to have become
easier to hit since v5.18-rc3"
* tag 'nfsd-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (36 commits)
NFSD: nfsd_file_put() can sleep
NFSD: Add documenting comment for nfsd4_release_lockowner()
NFSD: Modernize nfsd4_release_lockowner()
NFSD: Fix possible sleep during nfsd4_release_lockowner()
nfsd: destroy percpu stats counters after reply cache shutdown
nfsd: Fix null-ptr-deref in nfsd_fill_super()
nfsd: Unregister the cld notifier when laundry_wq create failed
SUNRPC: Use RMW bitops in single-threaded hot paths
NFSD: Clean up the show_nf_flags() macro
NFSD: Trace filecache opens
NFSD: Move documenting comment for nfsd4_process_open2()
NFSD: Fix whitespace
NFSD: Remove dprintk call sites from tail of nfsd4_open()
NFSD: Instantiate a struct file when creating a regular NFSv4 file
NFSD: Clean up nfsd_open_verified()
NFSD: Remove do_nfsd_create()
NFSD: Refactor NFSv4 OPEN(CREATE)
NFSD: Refactor NFSv3 CREATE
NFSD: Refactor nfsd_create_setattr()
NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create()
...
Fixups and enhancements for OpenRISC:
- A few sparse warning fixups and other cleanups I noticed when working
on a recent TLB bug found on a new OpenRISC core bring up.
- A few fixup's from me and Jason A Donenfeld to help shutdown OpenRISC
platforms when running CI tests
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE2cRzVK74bBA6Je/xw7McLV5mJ+QFAmKP+24ACgkQw7McLV5m
J+Q59w//eiFQv/L//IPZ//fDhDKYILs0JYnJGmAY2IBFpI/m/KkSWOeo32L5qyXe
IlvbxyrM3RVzDQVS/q1DG7vz9HuEI52asuefPEUeCmDSEcZH6g8ZzRHOrYqqDcM6
nKBoi2CIv9BpfkOdoewgUdaMoijhIWsT5QSZpLjkfPZvc2E78tlHvWiaZffZAJiR
XliE16XeuuKN6/vu5MNK2qExwsO9ayAiylv1K6bfN/0KkJEysY2FscRifHCbwscs
bGi367C7XfHE8Tks31SyUDqNlD760FjjL+ygAMG63v20mswhQggwEb/dApjzHj1l
MwrRBOG6CGYHE+AATsQLrd4N+fM9NNMfIARhtnbQYQlicAQpwGWNIPbpe1WOnBcw
KfXaiJwUcntf3bfBLyfZVvWs0ZYMgfUD8NGX4nDtTomywdxSUlTPefj2KP7UKiaV
NRwCOgZaT5iRzuJT832wCIRU75CjbFLvZ3JgMVwZtAlmGKTUopAikxvHTKRDDybn
O9ghq1b34hJf0BmnPTWF0pELTu2z3Vpt8qrUdNtfy3ICNiMDJV0e3aUH1gFJ1LtA
++bh98wErqaxVsNzQXS7PUwiB2xmUB8YGMfJoQIN/spysJ3mAlOjVPxFKLVXHopi
aHatlCvIV/Lc4ygt7no8PHtjvt8JckwCRpG0BLnrw09TokAQkp0=
=JxMd
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of https://github.com/openrisc/linux
Pull OpenRISC updates from Stafford Horne:
- A few sparse warning fixups and other cleanups I noticed when working
on a recent TLB bug found on a new OpenRISC core bring up.
- A few fixup's from me and Jason A Donenfeld to help shutdown OpenRISC
platforms when running CI tests
* tag 'for-linus' of https://github.com/openrisc/linux:
openrisc: Allow power off handler overriding
openrisc: Remove unused IMMU tlb workardound
openrisc/fault: Fix symbol scope warnings
openrisc/delay: Add include to fix symbol not declared warning
openrisc/time: Fix symbol scope warnings
openrisc/traps: Declare unhandled_exception for asmlinkage
openrisc/traps: Remove die_if_kernel function
openrisc/traps: Declare file scope symbols as static
openrisc: Update litex defconfig to support glibc userland
openrisc: Pretty print show_registers memory dumps
openrisc: Add syscall details to emergency syscall debugging
openrisc: Add support for liteuart emergency printing
openrisc: Cleanup emergency print handling
openrisc: Add gcc machine instruction flag configuration
openrisc: define nop command for simulator reboot
openrisc: remove bogus nops and shutdowns
openrisc: fix typos in comments
As promised, for v5.19 I queued up quite a bit of work for modules, but
still with a pretty conservative eye. These changes have been soaking on
modules-next (and so linux-next) for quite some time, the code shift was
merged onto modules-next on March 22, and the last patch was queued on May
5th.
The following are the highlights of what bells and whistles we will get for
v5.19:
1) It was time to tidy up kernel/module.c and one way of starting with
that effort was to split it up into files. At my request Aaron Tomlin
spearheaded that effort with the goal to not introduce any
functional at all during that endeavour. The penalty for the split
is +1322 bytes total, +112 bytes in data, +1210 bytes in text while
bss is unchanged. One of the benefits of this other than helping
make the code easier to read and review is summoning more help on review
for changes with livepatching so kernel/module/livepatch.c is now
pegged as maintained by the live patching folks.
The before and after with just the move on a defconfig on x86-64:
$ size kernel/module.o
text data bss dec hex filename
38434 4540 104 43078 a846 kernel/module.o
$ size -t kernel/module/*.o
text data bss dec hex filename
4785 120 0 4905 1329 kernel/module/kallsyms.o
28577 4416 104 33097 8149 kernel/module/main.o
1158 8 0 1166 48e kernel/module/procfs.o
902 108 0 1010 3f2 kernel/module/strict_rwx.o
3390 0 0 3390 d3e kernel/module/sysfs.o
832 0 0 832 340 kernel/module/tree_lookup.o
39644 4652 104 44400 ad70 (TOTALS)
2) Aaron added module unload taint tracking (MODULE_UNLOAD_TAINT_TRACKING),
so to enable tracking unloaded modules which did taint the kernel.
3) Christophe Leroy added CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
which lets architectures to request having modules data in vmalloc
area instead of module area. There are three reasons why an
architecture might want this:
a) On some architectures (like book3s/32) it is not possible to protect
against execution on a page basis. The exec stuff can be mapped by
different arch segment sizes (on book3s/32 that is 256M segments). By
default the module area is in an Exec segment while vmalloc area is in
a NoExec segment. Using vmalloc lets you muck with module data as
NoExec on those architectures whereas before you could not.
b) By pushing more module data to vmalloc you also increase the
probability of module text to remain within a closer distance
from kernel core text and this reduces trampolines, this has been
reported on arm first and powerpc folks are following that lead.
c) Free'ing module_alloc() (Exec by default) area leaves this
exposed as Exec by default, some architectures have some
security enhancements to set this as NoExec on free, and splitting
module data with text let's future generic special allocators
be added to the kernel without having developers try to grok
the tribal knowledge per arch. Work like Rick Edgecombe's
permission vmalloc interface [0] becomes easier to address over
time.
[0] https://lore.kernel.org/lkml/20201120202426.18009-1-rick.p.edgecombe@intel.com/#r
4) Masahiro Yamada's symbol search enhancements
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmKOnHkSHG1jZ3JvZkBr
ZXJuZWwub3JnAAoJEM4jHQowkoinFw4P/1ADdvfj+b6wbAxou6tPa2ZKnx/ImEnE
0T1P/n2guWg+2Q8oYjqifTpadGzr8td4c/PaGb5UpfdEOdBIyIGklrVZpQ+xkqfT
X4KIvqsf4ajL24OKxOSNtvL8RXEIDUhJ4Veq6BImBk8CPrPjsUBlNyAIlvV0aom2
BsFROQ2pMTSCiFY47gkMKLBlBny1l7zktoF0lhWTzHimw8VSDbTJFlu+fZvspd0o
lCqiHTkpiBSJDSEEjqk0lT6wIb27fvdzjmjy+Ur71bBKiPIEPiL5XNUufkGe6oB3
mnTOPow+wPTQc0dtkTpCHQYXE/a70Sbkwp1JfkbSYeHzJLlFru/tkmKiwN0RUo9l
0mY7VPEKuQWmxsOkLqvwcPBGx5JOSWOJKrbgpFmH+RLgeEgEa8t7uQDURK2KeIj8
P7ZzN5M2klKIHHA4vjfekYOJAb1Tii9Ibp7iGeiYxf93mPJBqwvRwbtBXBZpB4ce
FoDrxwEq812KPW7P2O1kgOvq7Fn1KWh0wVeKc8iBGxFxJhzOQY86H1ZRWDLAxRss
Rr1PMLt2TbTLUBt7MzR4vrg0NoQvpLYyf2jGFjWyZDRHU8nLeHkOlQot3xRDAtq9
Bpx5mSlM9BGfPibd1Kw4BaxBha5vVCQ+AcleT+NWnCjw4I0wLoFi9RLUSyItn9No
tlHLgdrM2a54
=cxtr
-----END PGP SIGNATURE-----
Merge tag 'modules-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull modules updates from Luis Chamberlain:
- It was time to tidy up kernel/module.c and one way of starting with
that effort was to split it up into files. At my request Aaron Tomlin
spearheaded that effort with the goal to not introduce any functional
at all during that endeavour. The penalty for the split is +1322
bytes total, +112 bytes in data, +1210 bytes in text while bss is
unchanged. One of the benefits of this other than helping make the
code easier to read and review is summoning more help on review for
changes with livepatching so kernel/module/livepatch.c is now pegged
as maintained by the live patching folks.
The before and after with just the move on a defconfig on x86-64:
$ size kernel/module.o
text data bss dec hex filename
38434 4540 104 43078 a846 kernel/module.o
$ size -t kernel/module/*.o
text data bss dec hex filename
4785 120 0 4905 1329 kernel/module/kallsyms.o
28577 4416 104 33097 8149 kernel/module/main.o
1158 8 0 1166 48e kernel/module/procfs.o
902 108 0 1010 3f2 kernel/module/strict_rwx.o
3390 0 0 3390 d3e kernel/module/sysfs.o
832 0 0 832 340 kernel/module/tree_lookup.o
39644 4652 104 44400 ad70 (TOTALS)
- Aaron added module unload taint tracking (MODULE_UNLOAD_TAINT_TRACKING),
to enable tracking unloaded modules which did taint the kernel.
- Christophe Leroy added CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
which lets architectures to request having modules data in vmalloc
area instead of module area. There are three reasons why an
architecture might want this:
a) On some architectures (like book3s/32) it is not possible to
protect against execution on a page basis. The exec stuff can be
mapped by different arch segment sizes (on book3s/32 that is 256M
segments). By default the module area is in an Exec segment while
vmalloc area is in a NoExec segment. Using vmalloc lets you muck
with module data as NoExec on those architectures whereas before
you could not.
b) By pushing more module data to vmalloc you also increase the
probability of module text to remain within a closer distance
from kernel core text and this reduces trampolines, this has been
reported on arm first and powerpc folks are following that lead.
c) Free'ing module_alloc() (Exec by default) area leaves this
exposed as Exec by default, some architectures have some security
enhancements to set this as NoExec on free, and splitting module
data with text let's future generic special allocators be added
to the kernel without having developers try to grok the tribal
knowledge per arch. Work like Rick Edgecombe's permission vmalloc
interface [0] becomes easier to address over time.
[0] https://lore.kernel.org/lkml/20201120202426.18009-1-rick.p.edgecombe@intel.com/#r
- Masahiro Yamada's symbol search enhancements
* tag 'modules-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (33 commits)
module: merge check_exported_symbol() into find_exported_symbol_in_section()
module: do not binary-search in __ksymtab_gpl if fsa->gplok is false
module: do not pass opaque pointer for symbol search
module: show disallowed symbol name for inherit_taint()
module: fix [e_shstrndx].sh_size=0 OOB access
module: Introduce module unload taint tracking
module: Move module_assert_mutex_or_preempt() to internal.h
module: Make module_flags_taint() accept a module's taints bitmap and usable outside core code
module.h: simplify MODULE_IMPORT_NS
powerpc: Select ARCH_WANTS_MODULES_DATA_IN_VMALLOC on book3s/32 and 8xx
module: Remove module_addr_min and module_addr_max
module: Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
module: Introduce data_layout
module: Prepare for handling several RB trees
module: Always have struct mod_tree_root
module: Rename debug_align() as strict_align()
module: Rework layout alignment to avoid BUG_ON()s
module: Move module_enable_x() and frob_text() in strict_rwx.c
module: Make module_enable_x() independent of CONFIG_ARCH_HAS_STRICT_MODULE_RWX
module: Move version support into a separate file
...
For two kernel releases now kernel/sysctl.c has been being cleaned up
slowly, since the tables were grossly long, sprinkled with tons of #ifdefs and
all this caused merge conflicts with one susbystem or another.
This tree was put together to help try to avoid conflicts with these cleanups
going on different trees at time. So nothing exciting on this pull request,
just cleanups.
I actually had this sysctl-next tree up since v5.18 but I missed sending a
pull request for it on time during the last merge window. And so these changes
have been being soaking up on sysctl-next and so linux-next for a while.
The last change was merged May 4th.
Most of the compile issues were reported by 0day and fixed.
To help avoid a conflict with bpf folks at Daniel Borkmann's request
I merged bpf-next/pr/bpf-sysctl into sysctl-next to get the effor which
moves the BPF sysctls from kernel/sysctl.c to BPF core.
Possible merge conflicts and known resolutions as per linux-next:
bfp:
https://lkml.kernel.org/r/20220414112812.652190b5@canb.auug.org.au
rcu:
https://lkml.kernel.org/r/20220420153746.4790d532@canb.auug.org.au
powerpc:
https://lkml.kernel.org/r/20220520154055.7f964b76@canb.auug.org.au
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmKOq8ASHG1jZ3JvZkBr
ZXJuZWwub3JnAAoJEM4jHQowkoinDAkQAJVo5YVM9f74UwYp4PQhTpjxJBCjRoZD
z1u9bp5rMj2ujTC8Fr7VmzKaHrb8+r1C1WvCvZtIzemYNB4lZUrHpVDYfXuXiPRB
ihPmEjhlPO5PFBx6cVCpI3cu9bEhG00rLc1QXnABx/pXwNPcOTJAGZJVamZvqubk
chjgZrb7N+adHPfvS55v1+zpwdeKfpp5U3zuu5qlT/nn0GS0HCVzOj5fj4oC4wtJ
IqfUubo+FX50Ga58yQABWNrjaPD9Crykz5ohVazy3ElQl0hJ4VsK65ct3blqc2vz
1Bb8kPpWuv6aZ5nr1lCVE8qvF4ZIL33ySvpg5BSdWLQEDrBbSpzvJe9Yn7wgR+eq
y7fhpO24+zRM82EoDMEvyxX9u1n1RsvoXRtf3ds9BGf63MUxk8a1cgjlU6vuyO2U
JhDmfM1xzdKvPoY4COOnHzcAiIqzItTqKd09N5y0cahmYstROU8lvp9huhTAHqk1
SjQMbLIZG7OnX8ZeQcR1EB8sq/IOPZT48ejj0iJmQ8FyMaep71MOQLYyLPAq4lgh
JHXm8P6QdB57jfJbqAeNSyZoK0qdxOUR/83Zcah7Jjns6vkju1DNatEsaEEI2y2M
4n7/rkHeZ3TyFHBUX4e9FomKvGLsAalDBRiqsuxLSOPMU8rGrNLAslOAtKwvp90X
4ht3M2VP098l
=btwh
-----END PGP SIGNATURE-----
Merge tag 'sysctl-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull sysctl updates from Luis Chamberlain:
"For two kernel releases now kernel/sysctl.c has been being cleaned up
slowly, since the tables were grossly long, sprinkled with tons of
#ifdefs and all this caused merge conflicts with one susbystem or
another.
This tree was put together to help try to avoid conflicts with these
cleanups going on different trees at time. So nothing exciting on this
pull request, just cleanups.
Thanks a lot to the Uniontech and Huawei folks for doing some of this
nasty work"
* tag 'sysctl-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (28 commits)
sched: Fix build warning without CONFIG_SYSCTL
reboot: Fix build warning without CONFIG_SYSCTL
kernel/kexec_core: move kexec_core sysctls into its own file
sysctl: minor cleanup in new_dir()
ftrace: fix building with SYSCTL=y but DYNAMIC_FTRACE=n
fs/proc: Introduce list_for_each_table_entry for proc sysctl
mm: fix unused variable kernel warning when SYSCTL=n
latencytop: move sysctl to its own file
ftrace: fix building with SYSCTL=n but DYNAMIC_FTRACE=y
ftrace: Fix build warning
ftrace: move sysctl_ftrace_enabled to ftrace.c
kernel/do_mount_initrd: move real_root_dev sysctls to its own file
kernel/delayacct: move delayacct sysctls to its own file
kernel/acct: move acct sysctls to its own file
kernel/panic: move panic sysctls to its own file
kernel/lockdep: move lockdep sysctls to its own file
mm: move page-writeback sysctls to their own file
mm: move oom_kill sysctls to their own file
kernel/reboot: move reboot sysctls to its own file
sched: Move energy_aware sysctls to topology.c
...
- qcom: log pending irq during resume
minor cosmetic changes
- omap: use pm_runtime_resume_and_get
- imx: use pm_runtime_resume_and_get
remove redundant initializer
- mtk: added GCE header for MT8186
enable support for MT8186
- tegra: remove redundant NULL check
added hsp_sm_ops for send/recv api
support shared mailboxes
- stm: remove unsupported "wakeup" irq
- pcc: sanitize mbox allocated memory before use
- misc: documentation fixes for arm_mhu and qcom-ipcc
-
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE6EwehDt/SOnwFyTyf9lkf8eYP5UFAmKO3ZYACgkQf9lkf8eY
P5VYug/8DfK0Ang0Sgw7DkT0w1TVY5sAUvq9EgA5z65HL2Vf6hdCMN5JAAVOBuqg
L8BcKegci3/aa5rxbRXpjAcNAFVG9RiaZ6i0qrXL2QGpNGoqXDFH4z7thREjgCYz
LgwSILyNneOovEuNqCFas7zyJSzzaIUCOybJgHA8ZVpCOnCKNHW5mZhi8tlCIbJ9
aY0rLD7Kc9ZLQ3N4JfvjtevaQ5EQR8EpMCQSIksdm5U9k8ej8dJPfDibUzM0PICU
o1DAYG9WxsqibufkJjFYFKf3uxW6dlcWzhUB4QVP3gnmvSIicNfaHTNvOgYOeLYw
d9yXz/ixxakVH2zGTCz6WELG8YQSzvzkjwOFAeDI7vNYgKE23v6l0Hj7dnc+akYW
08wP4Nyo5HRUPk4KnIOiRpNMw7QQKfS7Lufp8L6uwHVlf0uLBvoT49Px9d5ZdIO9
4pR8DLfuSfhU5swuhepipCs7z+NGfvMp2eeqqv1urfvlgjh9d2Y2A/g5qhC4/Zjt
CK27rKIFTpL0Wn2r1pV6+1rLXc0x3o5CXJBFMNLYZEfMJ91ZvyS3InZqebIcUk7f
0yUzLGCSY0a86Xriq8lvkYs+roQxl4Gqm2Jwn9RQisQQ2Q1OOW0g2ZqaG7mZGQf3
v5/gq5w3x1rbHU5Cn8yuFMw8D9O0kJ6ExNdqHtfiNLkmcPMM8Vc=
=f51u
-----END PGP SIGNATURE-----
Merge tag 'mailbox-v5.19' of git://git.linaro.org/landing-teams/working/fujitsu/integration
Pull mailbox updates from Jassi Brar:
"api:
- hrtimer fix
qcom:
- log pending irq during resume
- minor cosmetic changes
omap:
- use pm_runtime_resume_and_get
imx:
- use pm_runtime_resume_and_get
- remove redundant initializer
mtk:
- added GCE header for MT8186
- enable support for MT8186
tegra:
- remove redundant NULL check
- added hsp_sm_ops for send/recv api
- support shared mailboxes
stm:
- remove unsupported "wakeup" irq
pcc:
- sanitize mbox allocated memory before use
misc:
- documentation fixes for arm_mhu and qcom-ipcc"
* tag 'mailbox-v5.19' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
mailbox: qcom-ipcc: Fix -Wunused-function with CONFIG_PM_SLEEP=n
mailbox: forward the hrtimer if not queued and under a lock
mailbox: qcom-ipcc: Log the pending interrupt during resume
mailbox: pcc: Fix an invalid-load caught by the address sanitizer
dt-bindings: mailbox: remove the IPCC "wakeup" IRQ
mailbox: correct kerneldoc
mailbox: omap: using pm_runtime_resume_and_get to simplify the code
mailbox:imx: using pm_runtime_resume_and_get
mailbox: mediatek: support mt8186 adsp mailbox
dt-bindings: mailbox: mtk,adsp-mbox: add mt8186 compatible name
mailbox: tegra-hsp: Add 128-bit shared mailbox support
dt-bindings: tegra186-hsp: add type for shared mailboxes
mailbox: tegra-hsp: Add tegra_hsp_sm_ops
dt-bindings: gce: add the GCE header file for MT8186
mailbox: remove an unneeded NULL check on list iterator
mailbox: imx: remove redundant initializer
dt-bindings: mailbox: qcom-ipcc: simplify the example
- use ioread()/iowrite() interfaces instead of raw inb()/outb() in drivers
- make irqchips immutable due to the new warning popping up when drivers try to
modify the irqchip structures
- add new compatibles to dt-bindings for realtek-otto, renesas-rcar and pca95xx
- add support for new models to gpio-rcar, gpio-pca953x & gpio-realtek-otto
- allow parsing of GPIO hogs represented as children nodes of gpio-uniphier
- define a set of common GPIO consumer strings in dt-bindings
- shrink code in gpio-ml-ioh by using more devres interfaces
- pass arguments to devm_kcalloc() in correct order in gpio-sim
- add new helpers for iterating over GPIO firmware nodes and descriptors to
gpiolib core and use it in several drivers
- drop unused syscon_regmap_lookup_by_compatible() function
- correct format specifiers and signedness of variables in GPIO ACPI
- drop unneeded error checks in gpio-ftgpio
- stop using the deprecated of_gpio.h header in gpio-zevio
- drop platform_data support in gpio-max732x
- simplify Kconfig dependencies in gpio-vf610
- use raw spinlocks where needed to make PREEMPT_RT happy
- fix return values in board files using gpio-pcf857x
- convert more drivers to using fwnode instead of of_node
- minor fixes and improvements in gpiolib core
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAmKOU9gACgkQEacuoBRx
13L6+g//axfOCo1VFtrIZR3Sh8F3Zt6t5DfdWn7DU5OqKL9xQzF7o6dB5nqOa4ua
k8cgXAlMj3EnlFAxWalArCw3Lu3ntInXfl2EAgFOfhya3DLCjQRayoz7EGTMlGrA
XLZWNc+kUEDNOfN0L+fLHRopgi9jOtlS4XODaVMJKH31jVxufAwoQrFZF4d7pMvW
XC4vuSYmRfLrNCm77CqznBjw5hD44v5bxxkGyGmKhE+VmuFcLX1feSTKkttZ+ZMC
CP/Rp0/KSzJU4/1+9uPPrNY8NJGsBN9Uo+BQzH6nuSQrrO2MuOj5JA6UqgR+MHjI
9a/b/iftiPnsxSzbE8PKj/jWcswmScp7tvGqwCa0Q7Fh502+p+8RtEVKugy5nYEG
xNPONhQusu21Hw2ySHZZVjuxfQKi09uDEIZN55V5etsURHXiUB6RtZJwlgXWOYp5
8/h/TPemAZsfvAs/9OZwck171oQBUnX1K+gdbQ/5t4QoW+VxQCuGP0uTPB5kTxSV
yfVjeD2tpEdpjEAwmKrSLug4xLRlB4ed17DeEstFbFARUdOQZSLBiXln2KBg/d6Y
ofnAPvqZyMf0/MFSKBoXIb/aKw8svbTKDwy0HvU3Tf0lOGix4F/w9ih6VzUo/yVt
Aj9oqQyfK9E7EvPEQnZqn3h/3fmXvqDGrryABmuOmiy3N6RFGho=
=tj/B
-----END PGP SIGNATURE-----
Merge tag 'gpio-updates-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio updates from Bartosz Golaszewski:
"We have lots of small changes all over the place, but no huge reworks
or new drivers:
- use ioread()/iowrite() interfaces instead of raw inb()/outb() in
drivers
- make irqchips immutable due to the new warning popping up when
drivers try to modify the irqchip structures
- add new compatibles to dt-bindings for realtek-otto, renesas-rcar
and pca95xx
- add support for new models to gpio-rcar, gpio-pca953x &
gpio-realtek-otto
- allow parsing of GPIO hogs represented as children nodes of
gpio-uniphier
- define a set of common GPIO consumer strings in dt-bindings
- shrink code in gpio-ml-ioh by using more devres interfaces
- pass arguments to devm_kcalloc() in correct order in gpio-sim
- add new helpers for iterating over GPIO firmware nodes and
descriptors to gpiolib core and use it in several drivers
- drop unused syscon_regmap_lookup_by_compatible() function
- correct format specifiers and signedness of variables in GPIO ACPI
- drop unneeded error checks in gpio-ftgpio
- stop using the deprecated of_gpio.h header in gpio-zevio
- drop platform_data support in gpio-max732x
- simplify Kconfig dependencies in gpio-vf610
- use raw spinlocks where needed to make PREEMPT_RT happy
- fix return values in board files using gpio-pcf857x
- convert more drivers to using fwnode instead of of_node
- minor fixes and improvements in gpiolib core"
* tag 'gpio-updates-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (55 commits)
gpio: sifive: Make the irqchip immutable
gpio: rcar: Make the irqchip immutable
gpio: pcf857x: Make the irqchip immutable
gpio: pca953x: Make the irqchip immutable
gpio: dwapb: Make the irqchip immutable
gpio: sim: Use correct order for the parameters of devm_kcalloc()
gpio: ml-ioh: Convert to use managed functions pcim* and devm_*
gpio: ftgpio: Remove unneeded ERROR check before clk_disable_unprepare
gpio: ws16c48: Utilize iomap interface
gpio: gpio-mm: Utilize iomap interface
gpio: 104-idio-16: Utilize iomap interface
gpio: 104-idi-48: Utilize iomap interface
gpio: 104-dio-48e: Utilize iomap interface
gpio: zevio: drop of_gpio.h header
gpio: max77620: Make the irqchip immutable
dt-bindings: gpio: pca95xx: add entry for pca6408
gpio: pca953xx: Add support for pca6408
gpio: max732x: Drop unused support for irq and setup code via platform data
gpio: vf610: drop the SOC_VF610 dependency for GPIO_VF610
gpio: syscon: Remove usage of syscon_regmap_lookup_by_compatible
...