IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Commit 29064bfdabd5 ("vdpa/mlx5: Add support library for mlx5 VDPA implementation")
declared but never implemented these.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Message-Id: <20230803143041.23388-1-yuehaibing@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
New support:
- Qualcomm SM6115 and QCM2290 dmaengine support
- at_xdma support for microchip,sam9x7 controller
Updates:
- idxd updates for wq simplification and ats knob updates
- fsl edma updates for v3 support
- Xilinx AXI4-Stream control support
- Yaml conversion for bcm dma binding
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAmT0xHEACgkQfBQHDyUj
g0cUNw/+IN6UpnQyQ61D6Ljv1IjjmmhJS7hVVnkQvAzlFdsRdAw9Ez5jJB2qz7kU
dAcFKDPTVGZaLYSeQFkmbHeRF8Sk8rCOBHTWMMM085LMttATVhRjth3m2gj4Jp0Y
XOaJ/TWXGM9DAtojhOoEV+xtY8KdASkwrS7DiPYrwx8u/BTA3p8fa6v3ggyi/ttf
PoPXfmXR7PupZM5CLDaPMDBW5aIveLbTSsXGlixRrWbccI2dH9RG/l5KKUPFq2Se
LEIYtsa4ZK8OOb+WpUaZjJ5C7JgXMm0ZCXK9n8tqcdMjxjPAdOoy0lasaVLS0yw9
BiL4D5sFkUYAOf/9QTY3CSlzv7sjuSCBCGzppV5dYdliKCMx/qv34zNyBnWyhhgA
oW6h0cWQU/kcX9AUptYn9bHWDb4/Cykd+g9fUlCa1En8DL4lMRkKojHGKieD0mWw
A9p8W5p8K//g0FOjfPvqNhFP3iCkVCXMys1/mLZvKzofX6PEJc7lPPetY/YLYdBQ
g02lb6X4FP6MIyPlAzrmudGnCkYuUWGzXE2K4Rs/uTJtfTEkVwICMEQrnAszSl3o
4hYVjcDfk+5/4guFr4gHRqMq58E+cVv+zV70r2ttnF79lX6sTJ/Uvr7cJa1qVhB9
0mPLf7GXoKVanrX0rH0w/67Lo7wrQVdd/94BmHyd0kd8lUh63FM=
=LPWk
-----END PGP SIGNATURE-----
Merge tag 'dmaengine-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine updates from Vinod Koul:
"New controller support and updates to drivers.
New support:
- Qualcomm SM6115 and QCM2290 dmaengine support
- at_xdma support for microchip,sam9x7 controller
Updates:
- idxd updates for wq simplification and ats knob updates
- fsl edma updates for v3 support
- Xilinx AXI4-Stream control support
- Yaml conversion for bcm dma binding"
* tag 'dmaengine-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (53 commits)
dmaengine: fsl-edma: integrate v3 support
dt-bindings: fsl-dma: fsl-edma: add edma3 compatible string
dmaengine: fsl-edma: move tcd into struct fsl_dma_chan
dmaengine: fsl-edma: refactor chan_name setup and safety
dmaengine: fsl-edma: move clearing of register interrupt into setup_irq function
dmaengine: fsl-edma: refactor using devm_clk_get_enabled
dmaengine: fsl-edma: simply ATTR_DSIZE and ATTR_SSIZE by using ffs()
dmaengine: fsl-edma: move common IRQ handler to common.c
dmaengine: fsl-edma: Remove enum edma_version
dmaengine: fsl-edma: transition from bool fields to bitmask flags in drvdata
dmaengine: fsl-edma: clean up EXPORT_SYMBOL_GPL in fsl-edma-common.c
dmaengine: fsl-edma: fix build error when arch is s390
dmaengine: idxd: Fix issues with PRS disable sysfs knob
dmaengine: idxd: Allow ATS disable update only for configurable devices
dmaengine: xilinx_dma: Program interrupt delay timeout
dmaengine: xilinx_dma: Use tasklet_hi_schedule for timing critical usecase
dmaengine: xilinx_dma: Freeup active list based on descriptor completion bit
dmaengine: xilinx_dma: Increase AXI DMA transaction segment count
dmaengine: xilinx_dma: Pass AXI4-Stream control words to dma client
dt-bindings: dmaengine: xilinx_dma: Add xlnx,irq-delay property
...
- Core support for soundwire device number allocation
- intel driver updates for adding hw_params for DAI ops, hybrid number
allocation and power managemnt callback updates
- DT header include changes for subsystem
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAmT0vxoACgkQfBQHDyUj
g0f/Lg/8CHWdjQzNJVOoJIqY+mFjIKT+QtyRZSpdaQIFYcQF4xcd6v6/rEJzLjgg
bMlJ+Icwe19sIq2Yhq69kcecCD5/Arx3tqo2WAwcUF5NykFVQnO/FO+qZYdNM8A9
ohI8+1ZM1qyrC0YocpiUUINOek0MVeLf4wBFFiJMZRVNNMpCH7LrOcs+vH8bLKYn
i03dXc/sbM0QyBnfsjnu02wbaq84pFOlbkLJYn21cFoiqryMAXjNXRIpuJ35T5BX
iXgHHhVX4vWgOszLM83BhKQnSCemNb4y24ablm9gtVBMpNg2yAyHL0qpAk9nFhGn
vVZIup8OVpfziW6O48BBA7HWMBA0fRL8Iz80oH8ZoAAfRl32QGKal37jUaOL7ycO
+kMNbJq85v0NDRnUiHeUAY/X5gnnRZpbCLzWxpp3ohx8W0mQctQRFoArhITV7O0Q
38RNfe89Fq6f9poz56tj+d8d/zLx0VzLqPjiuP89HeeOboIjkGtNRfU1z/V9Wh2G
8Fr77rwSAo+wghlFpHE8S9gkyLvMiVeEdNEHskQod7ZvAD2YrfHPLENOW7L0rpGw
Eurl+bby1hqgsvA6gQqUQt2xeumeLYelttZYcmMGhPLo7RTYbTckwh6CO2iomNL/
noCtG/mNbR78pOKIab4VWoLcOzFfJ3qCpdOLzYBQYphxJo2oj50=
=4w8N
-----END PGP SIGNATURE-----
Merge tag 'soundwire-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire updates from Vinod Koul:
"Device numbering and intel driver changes are main features:
- Core support for soundwire device number allocation
- intel driver updates for adding hw_params for DAI ops, hybrid
number allocation and power managemnt callback updates
- DT header include changes for subsystem"
* tag 'soundwire-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: intel_ace2x: add DAI hw_params/prepare/hw_free callbacks
soundwire: intel_auxdevice: add hybrid IDA-based device_number allocation
soundwire: bus: add callbacks for device_number allocation
soundwire: extend parameters of new_peripheral_assigned() callback
soundWire: intel_auxdevice: resume 'sdw-master' on startup and system resume
soundwire: intel_auxdevice: enable pm_runtime earlier on startup
soundwire: Explicitly include correct DT includes
Currently the Kconfig fragments in kernel/configs and arch/*/configs
that aren't used internally aren't discoverable through "make help",
which consists of hard-coded lists of config fragments. Instead, list
all the fragment targets that have a "# Help: " comment prefix so the
targets can be generated dynamically.
Add logic to the Makefile to search for and display the fragment and
comment. Add comments to fragments that are intended to be direct targets.
Signed-off-by: Kees Cook <keescook@chromium.org>
Co-developed-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* Use refcount to prevent corruption
* Call external _get and _put in right order
* Fix use-after-free in mtd release
* Explicitly include correct DT includes
* Clean refcounting with MTD_PARTITIONED_MASTER
* mtdblock: make warning messages ratelimited
* dt-bindings: Add SEAMA partition bindings
MTD device driver changes:
* spear_smi: Use helper function devm_clk_get_enabled()
* maps: fix -Wvoid-pointer-to-enum-cast warning
* docg3: Remove unnecessary (void*) conversions
* physmap-core, spear_smi, st_spi_fsm, lpddr2_nvm, lantiq-flash, plat-ram:
- Use devm_platform_get_and_ioremap_resource()
Raw NAND core changes:
* Fix -Wvoid-pointer-to-enum-cast warning
* Export 'nand_exit_status_op()'
* dt-bindings: Fix nand-controller.yaml license
Raw NAND controller driver changes:
* Omap, Omap2, Samsung, Atmel, fsl_upm, lpc32xx_slc, lpc32xx_mlc, STM32_FMC2,
sh_ftlctl, MXC, Sunxi:
- Use devm_platform_get_and_ioremap_resource()
* Orion, vf610_nfc, Sunxi, STM32_FMC2, MTK, mpc5121, lpc32xx_slc, Intel,
FSMC, Arasan:
- Use helper function devm_clk_get_optional_enabled()
* Brcmnand:
- Use devm_platform_ioremap_resource_byname()
- Propagate init error -EPROBE_DEFER up
- Propagate error and simplify ternary operators
- Fix mtd oobsize
- Fix potential out-of-bounds access in oob write
- Fix crash during the panic_write
- Fix potential false time out warning
- Fix ECC level field setting for v7.2 controller
* fsmc: Handle clk prepare error in fsmc_nand_resume()
* Marvell: Add support for AC5 SoC
* Meson:
- Support for 512B ECC step size
- Fix build error
- Use NAND core API to check status
- dt-bindings:
* Make ECC properties dependent
* Support for 512B ECC step size
* Drop unneeded quotes
* Oxnas: Remove driver and bindings
* Qcom:
- Conversion to ->exec_op()
- Removal of the legacy interface
- Two full series of improvements/misc fixes
* Use the BIT() macro
* Use u8 instead of uint8_t
* Fix alignment with open parenthesis
* Fix the spacing
* Fix wrong indentation
* Fix a typo
* Early structure initialization
* Fix address parsing within ->exec_op()
* Remove superfluous initialization of "ret"
* Rename variables in qcom_op_cmd_mapping()
* Handle unsupported opcode in qcom_op_cmd_mapping()
* Fix the opcode check in qcom_check_op()
* Use EOPNOTSUPP instead of ENOTSUPP
* Wrap qcom_nand_exec_op() to 80 columns
* Unmap sg_list and free desc within submic_descs()
* Simplify the call to nand_prog_page_end_op()
* Do not override the error no of submit_descs()
* Sort includes alphabetically
* Clear buf_count and buf_start in raw read
* Add read/read_start ops in exec_op path
* vf610_nfc: Do not check 0 for platform_get_irq()
SPI NAND manufacturer driver changes:
* gigadevice: Add support for GD5F1GQ{4,5}RExxH
* esmt: Add support for F50D2G41KA
* toshiba: Add support for T{C,H}58NYG{0,2}S3HBAI4 and TH58NYG3S0HBAI6
SPI NOR core changes:
* fix assumption on enabling quad mode in
spi_nor_write_16bit_sr_and_check()
* avoid setting SRWD bit in SR if WP# signal not connected as it will
configure the SR permanently as read only. Add "no-wp" dt property.
* clarify the need for spi-nor compatibles in dt-bindings
SPI NOR manufacturer driver changes:
* Spansion:
- Add support for S28HS02GT
- Switch methods to use vreg_offset from SFDP instead of hardcoding
the register value
* Microchip/SST:
- Add support for sst26vf032b flash
* Winbond:
- Correct flags for Winbond w25q128
* NXP spifi:
- Use helper function devm_clk_get_enabled()
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEE9HuaYnbmDhq/XIDIJWrqGEe9VoQFAmTstY0ACgkQJWrqGEe9
VoRpeggAmiUPLVEJRosvtOAaT+en2YTDiVZrRmQ8hjekjRc4FfY6C7DPNWNua3zx
SaVqLEF7ScjnKH1YYwXN3XG3j4+1NPRV/VmR89yD6NVOcLs8BEJk/Ooc6LQrHAAf
E87jVafbPLWq8MkcVcnHbdijgHVh2onMbUQtkqjFSn6WAolSmZFJotocfKT12uuY
K9Hn5TLjRiH5e7O1rQnBcATMXjHIA1o0G1RCklm+T1MojNXIO1KN8yMYRjUoGbEJ
afFdwczNiTFgL4MJ3qL6NhqhSGC6V6QsUcsYvEjmComepAuZBP2wGnuQMHOxKqYV
Tl93LW8FOdyWHdCSgJdYkctoRPU6KQ==
=uMXQ
-----END PGP SIGNATURE-----
Merge tag 'mtd/for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull MTD updates from Miquel Raynal:
"Core MTD changes:
- Use refcount to prevent corruption
- Call external _get and _put in right order
- Fix use-after-free in mtd release
- Explicitly include correct DT includes
- Clean refcounting with MTD_PARTITIONED_MASTER
- mtdblock: make warning messages ratelimited
- dt-bindings: Add SEAMA partition bindings
Device driver changes:
- Use devm helper functions
- Fix questionable cast, remove pointless ones.
- error handling fixes
- add support for new chip versions
- update DT bindings
- misc cleanups - fix typos, whitespace, indentation"
* tag 'mtd/for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (105 commits)
dt-bindings: mtd: amlogic,meson-nand: drop unneeded quotes
mtd: spear_smi: Use helper function devm_clk_get_enabled()
mtd: rawnand: orion: Use helper function devm_clk_get_optional_enabled()
mtd: rawnand: vf610_nfc: Use helper function devm_clk_get_enabled()
mtd: rawnand: sunxi: Use helper function devm_clk_get_enabled()
mtd: rawnand: stm32_fmc2: Use helper function devm_clk_get_enabled()
mtd: rawnand: mtk: Use helper function devm_clk_get_enabled()
mtd: rawnand: mpc5121: Use helper function devm_clk_get_enabled()
mtd: rawnand: lpc32xx_slc: Use helper function devm_clk_get_enabled()
mtd: rawnand: intel: Use helper function devm_clk_get_enabled()
mtd: rawnand: fsmc: Use helper function devm_clk_get_enabled()
mtd: rawnand: arasan: Use helper function devm_clk_get_enabled()
mtd: rawnand: qcom: Add read/read_start ops in exec_op path
mtd: rawnand: qcom: Clear buf_count and buf_start in raw read
mtd: maps: fix -Wvoid-pointer-to-enum-cast warning
mtd: rawnand: fix -Wvoid-pointer-to-enum-cast warning
mtd: rawnand: fsmc: handle clk prepare error in fsmc_nand_resume()
mtd: rawnand: Propagate error and simplify ternary operators for brcmstb_nand_wait_for_completion()
mtd: rawnand: qcom: Sort includes alphabetically
mtd: rawnand: qcom: Do not override the error no of submit_descs()
...
Disable virtualization features on 82580 just as on i210/i211.
This avoids that virt functions are acidentally called on 82850.
Fixes: 55cac248caa4 ("igb: Add full support for 82580 devices")
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In this cycle, we don't have a highlighted feature enhancement, but mostly
have fixed issues mainly in two parts: 1) zoned block device, 2) compression
support. For zoned block device, we've tried to improve the power-off recovery
flow as much as possible. For compression, we found some corner cases caused by
wrong compression policy and logics. Other than them, there were some reverts
and stat corrections.
Bug fix:
- use finish zone command when closing a zone
- check zone type before sending async reset zone command
- fix to assign compress_level for lz4 correctly
- fix error path of f2fs_submit_page_read()
- don't {,de}compress non-full cluster
- send small discard commands during checkpoint back
- flush inode if atomic file is aborted
- correct to account gc/cp stats
And, there are minor bug fixes, avoiding false lockdep warning, and clean-ups.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmTyemoACgkQQBSofoJI
UNLLKQ//bupYGPOqAgKbd/s7FhtULMiiRmFVy7W2eoMIc/oeeXOGrzDAF/1NifLC
WLV4uBNVTS4PS8D1vRzxZNEZt9aqPS0vQ8hxW/3nTI9Z425NX3nz7gLSxxmwIkIe
xj++V6tvKPcCH0BfKvfFCtcxj09PsflgdEuT8w/sIkH6p4o+VEMFs1Lc9PQsjUmh
epznK7JGBwpAxmHqI74n1eAw2CI6W+oKx23YDTNMBD6hmXTU0fkTeBURrOlSsUHZ
nhafPecsrCEI+OpAj03G/7e/zt+iTUKdmHx9O5ir/P00vF/c+SU2vSwB97FiHqBi
B4UmocTM0MAsU80PQcmE6aU3zgQFI0Yun5yZ24VeWjKTu76ssZSmT2HA/4RL+LLf
AeAW4FSyfh76pls8X5IWfilxGLWq6kTzSZA0MF7dH2q7qlj5apL5wKpm/XH6POqn
qELY/Y9+P1QuCcNL8BiYrgA5xBqVJ7Uw/6/6U3Y77PElc+Pwl3vI8UZ7uCOBrsXL
e0TLXy23AJA6AS2DyLLziy669nXAZRb95B8TWMfEeVZIMFvCeeqYc74N8jOFa0T8
q6uQFZs+0cETLZA8MSZdlNhzvhJmbW6wgSIz++CEdikWSLBZMKWxBVjCPkkCY9uc
DMh8zruSVbYPZWBTcxkMFEBJKKrU43++e7pb8ZoqTj4Pq1317b0=
=Qa8+
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-6-6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this cycle, we don't have a highlighted feature enhancement, but
mostly have fixed issues mainly in two parts: 1) zoned block device,
and 2) compression support.
For zoned block device, we've tried to improve the power-off recovery
flow as much as possible. For compression, we found some corner cases
caused by wrong compression policy and logics. Other than them, there
were some reverts and stat corrections.
Bug fixes:
- use finish zone command when closing a zone
- check zone type before sending async reset zone command
- fix to assign compress_level for lz4 correctly
- fix error path of f2fs_submit_page_read()
- don't {,de}compress non-full cluster
- send small discard commands during checkpoint back
- flush inode if atomic file is aborted
- correct to account gc/cp stats
And, there are minor bug fixes, avoiding false lockdep warning, and
clean-ups"
* tag 'f2fs-for-6-6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (25 commits)
f2fs: use finish zone command when closing a zone
f2fs: compress: fix to assign compress_level for lz4 correctly
f2fs: fix error path of f2fs_submit_page_read()
f2fs: clean up error handling in sanity_check_{compress_,}inode()
f2fs: avoid false alarm of circular locking
Revert "f2fs: do not issue small discard commands during checkpoint"
f2fs: doc: fix description of max_small_discards
f2fs: should update REQ_TIME for direct write
f2fs: fix to account cp stats correctly
f2fs: fix to account gc stats correctly
f2fs: remove unneeded check condition in __f2fs_setxattr()
f2fs: fix to update i_ctime in __f2fs_setxattr()
Revert "f2fs: fix to do sanity check on extent cache correctly"
f2fs: increase usage of folio_next_index() helper
f2fs: Only lfs mode is allowed with zoned block device feature
f2fs: check zone type before sending async reset zone command
f2fs: compress: don't {,de}compress non-full cluster
f2fs: allow f2fs_ioc_{,de}compress_file to be interrupted
f2fs: don't reopen the main block device in f2fs_scan_devices
f2fs: fix to avoid mmap vs set_compress_option case
...
Commit bde5f6bc68db ("kmemleak: add scheduling point to kmemleak_scan()")
added a cond_resched() call to the struct page scanning loop to prevent
soft lockup from happening. However, soft lockup can still happen in that
loop in some corner cases when the pages that satisfy the "!(pfn & 63)"
check are skipped for some reasons.
Fix this corner case by moving up the cond_resched() check so that it will
be called every 64 pages unconditionally.
Link: https://lkml.kernel.org/r/20230825164947.1317981-1-longman@redhat.com
Fixes: bde5f6bc68db ("kmemleak: add scheduling point to kmemleak_scan()")
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In the past, movable allocations could be disallowed from CMA through
PF_MEMALLOC_PIN. As CMA pages are funneled through the MOVABLE pcplist,
this required filtering that cornercase during allocations, such that
pinnable allocations wouldn't accidentally get a CMA page.
However, since 8e3560d963d2 ("mm: honor PF_MEMALLOC_PIN for all movable
pages"), PF_MEMALLOC_PIN automatically excludes __GFP_MOVABLE. Once
again, MOVABLE implies CMA is allowed.
Remove the stale filtering code. Also remove a stale comment that was
introduced as part of the filtering code, because the filtering let
order-0 pages fall through to the buddy allocator. See 1d91df85f399
("mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore}
APIs") for context. The comment's been obsolete since the introduction of
the explicit ALLOC_HIGHATOMIC flag in eb2e2b425c69 ("mm/page_alloc:
explicitly record high-order atomic allocations in alloc_flags").
Link: https://lkml.kernel.org/r/20230824153821.243148-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
With madvise and prctl KSM can be enabled for different VMA's. Once it is
enabled we can query how effective KSM is overall. However we cannot
easily query if an individual VMA benefits from KSM.
This commit adds a KSM section to the /prod/<pid>/smaps file. It reports
how many of the pages are KSM pages. Note that KSM-placed zeropages are
not included, only actual KSM pages.
Here is a typical output:
7f420a000000-7f421a000000 rw-p 00000000 00:00 0
Size: 262144 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 51212 kB
Pss: 8276 kB
Shared_Clean: 172 kB
Shared_Dirty: 42996 kB
Private_Clean: 196 kB
Private_Dirty: 7848 kB
Referenced: 15388 kB
Anonymous: 51212 kB
KSM: 41376 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
FilePmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 202016 kB
SwapPss: 3882 kB
Locked: 0 kB
THPeligible: 0
ProtectionKey: 0
ksm_state: 0
ksm_skip_base: 0
ksm_skip_count: 0
VmFlags: rd wr mr mw me nr mg anon
This information also helps with the following workflow:
- First enable KSM for all the VMA's of a process with prctl.
- Then analyze with the above smaps report which VMA's benefit the most
- Change the application (if possible) to add the corresponding madvise
calls for the VMA's that benefit the most
[shr@devkernel.io: v5]
Link: https://lkml.kernel.org/r/20230823170107.1457915-1-shr@devkernel.io
Link: https://lkml.kernel.org/r/20230822180539.1424843-1-shr@devkernel.io
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In the discussion of "Improve hugetlbfs read on HWPOISON hugepages" [1],
Matthew Wilcox suggests hwp is a bad abbreviation of hwpoison, as hwp is
already used as "an acronym by acpi, intel_pstate, some clock drivers, an
ethernet driver, and a scsi driver"[1].
So rename hwp_walk and hwp_walk_ops to hwpoison_walk and
hwpoison_walk_ops respectively.
raw_hwp_(page|list), *_raw_hwp, and raw_hwp_unreliable flag are other
major appearances of "hwp". However, given the "raw" hint in the name, it
is easy to differentiate them from other "hwp" acronyms. Since renaming
them is not as straightforward as renaming hwp_walk*, they are not covered
by this commit.
[1] https://lore.kernel.org/lkml/20230707201904.953262-5-jiaqiyan@google.com/T/#me6fecb8ce1ad4d5769199c9e162a44bc88f7bdec
Link: https://lkml.kernel.org/r/20230713235553.4121855-1-jiaqiyan@google.com
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Memory failure is not interested in logically offlined pages. Skip this
type of page.
Link: https://lkml.kernel.org/r/20230727115643.639741-5-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- kprobes: use struct_size() for variable size kretprobe_instance
data structure.
- eprobe: Simplify trace_eprobe list iteration.
- probe events: Data structure field access support on BTF argument.
. Update BTF argument support on the functions in the kernel loadable
modules (only loaded modules are supported).
. Move generic BTF access function (search function prototype and get
function parameters) to a separated file.
. Add a function to search a member of data structure in BTF.
. Support accessing BTF data structure member from probe args by
C-like arrow('->') and dot('.') operators. e.g.
't sched_switch next=next->pid vruntime=next->se.vruntime'
. Support accessing BTF data structure member from $retval. e.g.
'f getname_flags%return +0($retval->name):string'
. Add string type checking if BTF type info is available.
This will reject if user specify ":string" type for non "char
pointer" type.
. Automatically assume the fprobe event as a function return event
if $retval is used.
- selftests/ftrace: Add BTF data field access test cases.
- Documentation: Update fprobe event example with BTF data field.
-----BEGIN PGP SIGNATURE-----
iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmTycQkbHG1hc2FtaS5o
aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bqS8H/jeR1JhOzIXOvTw7XCFm
MrSY/SKi8tQfV6lau2UmoYdbYvYjpqL34XLOQPNf2/lrcL2M9aNYXk9fbhlW8enx
vkMyKQ0E5anixkF4vsTbEl9DaprxbpsPVACmZ/7VjQk2JuXIdyaNk8hno9LgIcEq
udztb0o2HmDFqAXfRi0LvlSTAIwvXZ+usmEvYpaq1g2WwrCe7NHEYl42vMpj+h4H
9l4t5rA9JyPPX4yQUjtKGW5eRVTwDTm/Gn6DRzYfYzkkiBZv27qfovzBOt672LgG
hyot+u7XeKvZx3jjnF7+mRWoH/m0dqyhyi/nPhpIE09VhgwclrbGAcDuR1x6sp01
PHY=
=hBDN
-----END PGP SIGNATURE-----
Merge tag 'probes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes updates from Masami Hiramatsu:
- kprobes: use struct_size() for variable size kretprobe_instance data
structure.
- eprobe: Simplify trace_eprobe list iteration.
- probe events: Data structure field access support on BTF argument.
- Update BTF argument support on the functions in the kernel
loadable modules (only loaded modules are supported).
- Move generic BTF access function (search function prototype and
get function parameters) to a separated file.
- Add a function to search a member of data structure in BTF.
- Support accessing BTF data structure member from probe args by
C-like arrow('->') and dot('.') operators. e.g.
't sched_switch next=next->pid vruntime=next->se.vruntime'
- Support accessing BTF data structure member from $retval. e.g.
'f getname_flags%return +0($retval->name):string'
- Add string type checking if BTF type info is available. This will
reject if user specify ":string" type for non "char pointer"
type.
- Automatically assume the fprobe event as a function return event
if $retval is used.
- selftests/ftrace: Add BTF data field access test cases.
- Documentation: Update fprobe event example with BTF data field.
* tag 'probes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
Documentation: tracing: Update fprobe event example with BTF field
selftests/ftrace: Add BTF fields access testcases
tracing/fprobe-event: Assume fprobe is a return event by $retval
tracing/probes: Add string type check with BTF
tracing/probes: Support BTF field access from $retval
tracing/probes: Support BTF based data structure field access
tracing/probes: Add a function to search a member of a struct/union
tracing/probes: Move finding func-proto API and getting func-param API to trace_btf
tracing/probes: Support BTF argument on module functions
tracing/eprobe: Iterate trace_eprobe directly
kernel: kprobes: Use struct_size()
- Replace strlcpy() with strscpy()
- Initialize the pipe cpumask to zero on allocation
- Use within_module() instead of open coding it
- Remove extra space in hwlat_detectory/mode output
- Use LIST_HEAD() instead of open coding it
- A bunch of clean ups and fixes for the cpumask filter
- Set local da_mon_##name to static
- Fix race in snapshot buffer between cpu write and swap
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZPMsBhQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qiToAP49yXVK6seGUwU18QSp4mCNa0QNSH0v
bl2UYVSCPv8aNQEAquDOvGInbMcT2z69lK359TVlGPrtVjhqFDloSLMYgAo=
=DTGo
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull more tracing updates from Steven Rostedt:
"Tracing fixes and clean ups:
- Replace strlcpy() with strscpy()
- Initialize the pipe cpumask to zero on allocation
- Use within_module() instead of open coding it
- Remove extra space in hwlat_detectory/mode output
- Use LIST_HEAD() instead of open coding it
- A bunch of clean ups and fixes for the cpumask filter
- Set local da_mon_##name to static
- Fix race in snapshot buffer between cpu write and swap"
* tag 'trace-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/filters: Fix coding style issues
tracing/filters: Change parse_pred() cpulist ternary into an if block
tracing/filters: Fix double-free of struct filter_pred.mask
tracing/filters: Fix error-handling of cpulist parsing buffer
tracing: Zero the pipe cpumask on alloc to avoid spurious -EBUSY
ftrace: Use LIST_HEAD to initialize clear_hash
ftrace: Use within_module to check rec->ip within specified module.
tracing: Replace strlcpy with strscpy in trace/events/task.h
tracing: Fix race issue between cpu buffer write and swap
tracing: Remove extra space at the end of hwlat_detector/mode
rv: Set variable 'da_mon_##name' to static
x86 self-test code.
[ Arguably the existing code was unnecessarily fragile, and
tooling should have picked up the new syscall number,
and a wider fix is being worked on - but meanwhile, let's
not have the old syscall number in the kernel tree. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmTzDKMRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1iCVg//fHMSq7ZhK7gqoKOW+uqMNED9FWq8OIFv
WJZhqZO7FnU2gJNYX+KGANfnzRi3kgBcaBJ/JlNhjfdWxm4e+YyyXkeTG2zP76Wz
nCc9dwxD8xBD6my9WkW9sD4Q4ogsE6HYEQTi3ook3XsLytzbkYpyXXaLXXl84Rnz
9xYLQfNbTwA60NjYSWwN0Eec/rmiJqavayNe1b7hBfkqX8C4iXAIBDnadvfhN7Lv
KGWlkbrOg3S5o7Dy/chBjHIVbgYgHeD5lI0drzVkkbMdgBJvXy/1UB+NgD7hoa9L
V0OFnHSkeb2r5rv6IxabTu9x8+QhL3punrlBDEpeAWpF3glVnbbZ2S/7FpS5I+wl
fgU+jg3h2WX3N1adn4ibm9RdqmnqcL6yeOLPMUo5odzu/fZ3s07zW/5KsaaXZMdp
1xXxKBZLLUO07apRRtiFfcMCXPThuVafWFF+9rTtULB7Lfzd1mOwVrKxJOZgTwXj
GK/v8hEhx9ipDZsfcHG4Hbe1YAZjogF2uPvOrivFlpLxkdkZUlp9hlmMYyi9uIs4
i3oZX2Viwdq+y6z7frIUYHmW5emO6g81x+a41hhE7xdsWgEPDl/A2lMNF4d0Qunm
oDyFrxELU75ueLiqaAnYk3n65v2PeDTtq5aANYZtHoprqZqwYHLniH0TK+bMRbL9
CoYed1yx5DQ=
=W86M
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 selftest fix from Ingo Molnar:
"Fix the __NR_map_shadow_stack syscall-renumbering fallout in the x86
self-test code.
[ Arguably the existing code was unnecessarily fragile, and tooling
should have picked up the new syscall number, and a wider fix is
being worked on - but meanwhile, let's not have the old syscall
number in the kernel tree. ]"
* tag 'x86-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests/x86: Update map_shadow_stack syscall nr
kernels, caused by a buggy factoring-out of existing code.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmTzC5oRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hG1Q/+ICGbpxdQOrVg7QTLzgsxttxIyi4Un6lb
vPX8NO9/4HIxObR6bd+ji2499TIO6nIhRqGOzEYUe9jzEN27eM/bMo6kCcRkbWra
4V/GZd3j+XdJwIQR442cBdUcByk4X7FlE7KqizJIbvYYyLBXzboBcpOdH012e2O9
UzFjtU+pk5Lhit18jL6/AvjsMhneKb6YUH20Wbb6zjZ1FL28YGKpeOHrh6GSXlKE
GVS07pWSAB8TMXdO+8YaKoE7VIOdMaYS/mJJ6u/M8Wo+Kl0wWwmJtjmSYzvD2Uod
PGcCiGXr1QpWK66wZNnXjs3rb6bX5umCo8rc5L6rqvWTYvB8Owpl5V94+87yGEov
29lYvWdVJ7dPqP8fSQfYxBKbgfINwOO1STYnIX1Q5mDD9fK2SgOpD9+JFagYnJoI
5n6KoVArVHQXSB4odTn+Qyt0yu0iDubUFRxBTrWijq5ooHOExaxByl0ViyCfp1aS
csTcGQSJsvHKhZPejDggjp74IU/ge5lUN4uSFlPVo3jYFwUIIgBG+43QtFiVrplg
3ifpI2qNISQl65PRerZjB5jBmItUGnUl71tnEg/Cli7zvvw/nMeKh98vChtE9S3A
2eQ66rrV9eJAeYaNCV4Uz1UmocD4i2Vec9tZOUUoIga/bDIOVr+bxUr7nvcOneak
98h2ylU4W8o=
=zpfn
-----END PGP SIGNATURE-----
Merge tag 'timers-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
"Fix false positive 'softirq work is pending' messages on -rt kernels,
caused by a buggy factoring-out of existing code"
* tag 'timers-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/rcu: Fix false positive "softirq work is pending" messages
and controls a CPU hot-unplug operation vs. the CFS bandwidth timer.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmTzCnoRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1h9gQ//fOQT0OrAUwhAdW7IZcQGETdSykxYqRXT
OciNPVUirJOXJM7tG08OEUUAiRrbIALHBRkfNk/ycOWTfa2qsur3jyGgyi8cnKo9
NmdNltRMZ2UbKlJxzoeu1wIqWkmoLaYloVp3YWXgPClclNbBROCvXvEHnEr1iRtA
trfEjNxEYgKeDkJROg0Av3RQTzLgZ3TqZ67mzJVZbCbz9i/IxicJa4PNuzrkw3c3
q42Btx+Ru1ikl/Jww0asX4iESFxuUk3Aw7DBX7slaLMrLcPMKsbO2D3npSxLFTCP
TUdMKoIanVjl5+a2//kT8TkV+M1OKvczy6AYH0pV/yZLkAQqJmLphVsEI6rMIdp2
ep26hrjaLlhp3dTr8jNQ86BlxT6zqP1/+OpC4BbKFK2HLJj7sGKcb5W5WMdhB/Qh
tA+CgVZXJDHkH2m2zD6o+SDm5JvbbHOLywfBBUSggHDDq3oOrxdjS2g8tgFwtnJ2
ZxjvJ4Ot3M26b44qkQbJeG42Q7ciLDrfaOZhlZ6bt30agU4EP3bg4dZAL24EoPLY
zdom++puL+nUBr6EvzbboVxisuf0cvDbujmuFRQdntRRy8oHgiQVhb+b4EWh0oOc
CKN06nyA9z5MzhAek3/GuxMYKEWM9/Dy6rDyqvaxfcbc9PIaxGfRxjgpKxrdRPOu
rjGsQHZbTlo=
=wM0O
-----END PGP SIGNATURE-----
Merge tag 'smp-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull CPU hotplug fix from Ingo Molnar:
"Fix a CPU hotplug related deadlock between the task which initiates
and controls a CPU hot-unplug operation vs. the CFS bandwidth timer"
* tag 'smp-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Prevent self deadlock on CPU hot-unplug
and a kernel-doc fix.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmTzCWcRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1ifZQ//SvKEhKT1lolh4bmMZAaRHWJBq8omH1V+
36k5Jd3AOJcIEVJD0h+6yfJH2mlS6ZGW3te33VhW5z4c2dMBms90qMLv6xdr/E7j
Pseud3bc6o9SHPA8v9oNKy9GTcnD/kKXxr7f8tabJxxewzUY7EkHa4lJ1AgOIzDP
njWIVqqVFqoO1QjjKCN1ERuMU6ifX+6bcSik89f9F3Gg8KhUMbmv2+O6Jd22wwWC
mI/atl2EdkJg0VlFNIZtVk6n+hwbBaPfkd76ihQ/82MaLo1M7PilO5mtpgUNUCMh
XLlekYwFewUJP+xGkTg1FG8A2B937EXpPdO/8F4vFU/PhDeev8fIG99MIOo3h6A4
nlaKU/Lh9NFT/64wfP5/b8ud/UEf/7YhD1SH2SdtWwT2yXTrYUl2kdKYpgE8TX3C
c7Ap0vKQIcRrycoOaoxsKw915jeA5zCyykd75RLfzmK2phW22QtZgdIOuiflDeds
LAuelYaY6C7ZRPnGn2iWceoWS3IBhXTo4nsfh6sPX3A057iHo7CFjX7u1DeMqcuh
XIoKOgjZR/vnJQaFdWTSKKbzwTweAc1BBDUYy4CxWbUMD13GIE2trCS+GBWTZcoF
KaASIdXL4nUHP35rX9hlww5GUhF6NNOTZ9mkN7NHYfoVy0WXt/rLCywqo3D6Bne+
jeTHwFKjJYI=
=jDS4
-----END PGP SIGNATURE-----
Merge tag 'sched-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Miscellaneous scheduler fixes: a reporting fix, a static symbol fix,
and a kernel-doc fix"
* tag 'sched-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Report correct state for TASK_IDLE | TASK_FREEZABLE
sched/fair: Make update_entity_lag() static
sched/core: Add kernel-doc for set_cpus_allowed_ptr()
Mikhail reports early-6.6-based Fedora Rawhide not booting: "rcu_preempt
detected expedited stalls", minutes wait, and then hung_task splat while
kworker trying to synchronize_rcu_expedited(). Nothing logged to disk.
He bisected to my 6.6 a349d72fd9ef ("mm/pgtable: add rcu_read_lock() and
rcu_read_unlock()s"): but the one to blame is my 6.5 commit to fix the
espfix "bad pmd" warnings when booting x86_64 with CONFIG_EFI_PGT_DUMP=y.
Gaah, that added an "addr >= TASK_SIZE" check to avoid pte_offset_map(),
but failed to add the equivalent check when choosing to pte_unmap().
It's not a problem on 6.5 (for different reasons, it's harmless on both
64-bit and 32-bit), but becomes a bootstopper on 6.6 with the unbalanced
rcu_read_unlock() - RCU has a WARN_ON_ONCE for that, but it would have
scrolled off Mikhail's console too quickly.
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Closes: https://lore.kernel.org/linux-mm/CABXGCsNi8Tiv5zUPNXr6UJw6qV1VdaBEfGqEAMkkXE3QPvZuAQ@mail.gmail.com/
Fixes: 8b1cb4a2e819 ("mm/pagewalk: fix EFI_PGT_DUMP of espfix area")
Fixes: a349d72fd9ef ("mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s")
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sudip Mukherjee reports that the mips sb1250_swarm_defconfig build fails
with the current kernel. It isn't actually MIPS-specific, it's just
that that defconfig does not have CGROUP_SCHED enabled like most configs
do, and as such shows this error:
kernel/cgroup/cgroup.c: In function 'cgroup_local_stat_show':
kernel/cgroup/cgroup.c:3699:15: error: implicit declaration of function 'cgroup_tryget_css'; did you mean 'cgroup_tryget'? [-Werror=implicit-function-declaration]
3699 | css = cgroup_tryget_css(cgrp, ss);
| ^~~~~~~~~~~~~~~~~
| cgroup_tryget
kernel/cgroup/cgroup.c:3699:13: warning: assignment to 'struct cgroup_subsys_state *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
3699 | css = cgroup_tryget_css(cgrp, ss);
| ^
because cgroup_tryget_css() only exists when CGROUP_SCHED is enabled,
and the cgroup_local_stat_show() function should similarly be guarded by
that config option.
Move things around a bit to fix this all.
Fixes: d1d4ff5d11a5 ("cgroup: put cgroup_tryget_css() inside CONFIG_CGROUP_SCHED")
Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the typo which resulted in the driver using FB_DEFAULT_IOMEM_HELPERS
instead of FB_DEFAULT_IOMEM_OPS as the fbdev I/O helpers.
Fixes: 501126083855 ("fbdev/g364fb: Use fbdev I/O helpers")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A term may have no value in which case it is assumed to have a value
of 1. It doesn't just apply to alias/event terms so change the
parse_events_term__to_strbuf assert.
Commit 99e7138eb7897aa0 ("perf tools: Fail on using multiple bits long
terms without value") made it so that no_value terms could only be for a
single bit. Prior to commit 64199ae4b8a3 ("perf parse-events: Fix
propagation of term's no_value when cloning") this missed a test case
where config1 had no_value.
Fixes: 64199ae4b8a36038 ("perf parse-events: Fix propagation of term's no_value when cloning")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230901233949.2930562-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
While rewriting the code from sockptr_t to iov_iter during the
development, I forgot to replace one place in emu8000-pcm code. As
it's in the disabled area (with ifdef), it's never built and
overlooked. Replace with the proper argument NULL.
Fixes: 9d0fdc602de9 ("ALSA: emu8000: Convert to generic PCM copy ops")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Closes: https://lore.kernel.org/r/20230902053646.GK3390869@ZenIV
Link: https://lore.kernel.org/r/20230902061044.19366-2-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Micron 1100 drives lock up when encountering queued TRIM command. It is
a quite old hardware series, for past years we have been running our
machines with these drives using libata.force=noncqtrim.
[Damien] Move the "Crucial_CT*M500*" entry to keep Micron and Crucial
entries together.
Signed-off-by: Pawel Zmarzly <pzmarzly@meta.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Elkhart Lake is the successor of Apollo Lake and Gemini Lake. These
CPUs and their PCHs are used in mobile and embedded environments.
With this patch I suggest that Elkhart Lake SATA controllers [1] should
use the default LPM policy for mobile chipsets.
The disadvantage of missing hot-plug support with this setting should
not be an issue, as those CPUs are used in embedded environments and
not in servers with hot-plug backplanes.
We discovered that the Elkhart Lake SATA controllers have been missing
in ahci.c after a customer reported the throttling of his SATA SSD
after a short period of higher I/O. We determined the high temperature
of the SSD controller in idle mode as the root cause for that.
Depending on the used SSD, we have seen up to 1.8 Watt lower system
idle power usage and up to 30°C lower SSD controller temperatures in
our tests, when we set med_power_with_dipm manually. I have provided a
table showing seven different SATA SSDs from ATP, Intel/Solidigm and
Samsung [2].
Intel lists a total of 3 SATA controller IDs (4B60, 4B62, 4B63) in [1]
for those mobile PCHs.
This commit just adds 0x4b63 as I do not have test systems with 0x4b60
and 0x4b62 SATA controllers.
I have tested this patch with a system which uses 0x4b63 as SATA
controller.
[1] https://sata-io.org/product/8803
[2] https://www.thomas-krenn.com/en/wiki/SATA_Link_Power_Management#Example_LES_v4
Signed-off-by: Werner Fischer <devlists@wefi.net>
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Review comments noted that an if block would be clearer than a ternary, so
swap it out.
No change in behaviour intended
Link: https://lkml.kernel.org/r/20230901151039.125186-4-vschneid@redhat.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
When a cpulist filter is found to contain a single CPU, that CPU is saved
as a scalar and the backing cpumask storage is freed.
Also NULL the mask to avoid a double-free once we get down to
free_predicate().
Link: https://lkml.kernel.org/r/20230901151039.125186-3-vschneid@redhat.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
parse_pred() allocates a string buffer to parse the user-provided cpulist,
but doesn't check the allocation result nor does it free the buffer once it
is no longer needed.
Add an allocation check, and free the buffer as soon as it is no longer
needed.
Link: https://lkml.kernel.org/r/20230901151039.125186-2-vschneid@redhat.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The pipe cpumask used to serialize opens between the main and percpu
trace pipes is not zeroed or initialized. This can result in
spurious -EBUSY returns if underlying memory is not fully zeroed.
This has been observed by immediate failure to read the main
trace_pipe file on an otherwise newly booted and idle system:
# cat /sys/kernel/debug/tracing/trace_pipe
cat: /sys/kernel/debug/tracing/trace_pipe: Device or resource busy
Zero the allocation of pipe_cpumask to avoid the problem.
Link: https://lore.kernel.org/linux-trace-kernel/20230831125500.986862-1-bfoster@redhat.com
Cc: stable@vger.kernel.org
Fixes: c2489bb7e6be ("tracing: Introduce pipe_cpumask to avoid race on trace_pipes")
Reviewed-by: Zheng Yejian <zhengyejian1@huawei.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Use LIST_HEAD() to initialize clear_hash instead of open-coding it.
Link: https://lore.kernel.org/linux-trace-kernel/20230809071551.913041-1-ruanjinjie@huawei.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
within_module_core && within_module_init condition is same to
within module but it's more readable.
Use within_module instead of former condition to check rec->ip
within specified module area or not.
Link: https://lore.kernel.org/linux-trace-kernel/20230803205236.32201-1-ppbuk5246@gmail.com
Signed-off-by: Levi Yun <ppbuk5246@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Warning happened in rb_end_commit() at code:
if (RB_WARN_ON(cpu_buffer, !local_read(&cpu_buffer->committing)))
WARNING: CPU: 0 PID: 139 at kernel/trace/ring_buffer.c:3142
rb_commit+0x402/0x4a0
Call Trace:
ring_buffer_unlock_commit+0x42/0x250
trace_buffer_unlock_commit_regs+0x3b/0x250
trace_event_buffer_commit+0xe5/0x440
trace_event_buffer_reserve+0x11c/0x150
trace_event_raw_event_sched_switch+0x23c/0x2c0
__traceiter_sched_switch+0x59/0x80
__schedule+0x72b/0x1580
schedule+0x92/0x120
worker_thread+0xa0/0x6f0
It is because the race between writing event into cpu buffer and swapping
cpu buffer through file per_cpu/cpu0/snapshot:
Write on CPU 0 Swap buffer by per_cpu/cpu0/snapshot on CPU 1
-------- --------
tracing_snapshot_write()
[...]
ring_buffer_lock_reserve()
cpu_buffer = buffer->buffers[cpu]; // 1. Suppose find 'cpu_buffer_a';
[...]
rb_reserve_next_event()
[...]
ring_buffer_swap_cpu()
if (local_read(&cpu_buffer_a->committing))
goto out_dec;
if (local_read(&cpu_buffer_b->committing))
goto out_dec;
buffer_a->buffers[cpu] = cpu_buffer_b;
buffer_b->buffers[cpu] = cpu_buffer_a;
// 2. cpu_buffer has swapped here.
rb_start_commit(cpu_buffer);
if (unlikely(READ_ONCE(cpu_buffer->buffer)
!= buffer)) { // 3. This check passed due to 'cpu_buffer->buffer'
[...] // has not changed here.
return NULL;
}
cpu_buffer_b->buffer = buffer_a;
cpu_buffer_a->buffer = buffer_b;
[...]
// 4. Reserve event from 'cpu_buffer_a'.
ring_buffer_unlock_commit()
[...]
cpu_buffer = buffer->buffers[cpu]; // 5. Now find 'cpu_buffer_b' !!!
rb_commit(cpu_buffer)
rb_end_commit() // 6. WARN for the wrong 'committing' state !!!
Based on above analysis, we can easily reproduce by following testcase:
``` bash
#!/bin/bash
dmesg -n 7
sysctl -w kernel.panic_on_warn=1
TR=/sys/kernel/tracing
echo 7 > ${TR}/buffer_size_kb
echo "sched:sched_switch" > ${TR}/set_event
while [ true ]; do
echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
while [ true ]; do
echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
while [ true ]; do
echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
```
To fix it, IIUC, we can use smp_call_function_single() to do the swap on
the target cpu where the buffer is located, so that above race would be
avoided.
Link: https://lore.kernel.org/linux-trace-kernel/20230831132739.4070878-1-zhengyejian1@huawei.com
Cc: <mhiramat@kernel.org>
Fixes: f1affcaaa861 ("tracing: Add snapshot in the per_cpu trace directories")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Space is printed after each mode value including the last one:
$ echo \"$(sudo cat /sys/kernel/tracing/hwlat_detector/mode)\"
"none [round-robin] per-cpu "
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Link: https://lore.kernel.org/linux-trace-kernel/20230825103432.7750-1-m.kobuk@ispras.ru
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 8fa826b7344d ("trace/hwlat: Implement the mode config option")
Signed-off-by: Mikhail Kobuk <m.kobuk@ispras.ru>
Reviewed-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Many small changes across the subystem, some highlights:
- Usual driver cleanups in qedr, siw, erdma, hfi1, mlx4/5, irdma, mthca,
hns, and bnxt_re
- siw now works over tunnel and other netdevs with a MAC address by
removing assumptions about a MAC/GID from the connection manager
- "Doorbell Pacing" for bnxt_re - this is a best effort scheme to allow
userspace to slow down the doorbell rings if the HW gets full
- irdma egress VLAN priority, better QP/WQ sizing
- rxe bug fixes in queue draining and srq resizing
- Support more ethernet speed options in the core layer
- DMABUF support for bnxt_re
- Multi-stage MTT support for erdma to allow much bigger MR registrations
- A irdma fix with a CVE that came in too late to go to -rc, missing
bounds checking for 0 length MRs
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZPEqkAAKCRCFwuHvBreF
YZrNAPoCBfU+VjCKNr2yqF7s52os5ZdBV7Uuh4txHcXWW9H7GAD/f19i2u62fzNu
C27jj4cztemMBb8mgwyxPw/wLg7NLwY=
=pC6k
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"Many small changes across the subystem, some highlights:
- Usual driver cleanups in qedr, siw, erdma, hfi1, mlx4/5, irdma,
mthca, hns, and bnxt_re
- siw now works over tunnel and other netdevs with a MAC address by
removing assumptions about a MAC/GID from the connection manager
- "Doorbell Pacing" for bnxt_re - this is a best effort scheme to
allow userspace to slow down the doorbell rings if the HW gets full
- irdma egress VLAN priority, better QP/WQ sizing
- rxe bug fixes in queue draining and srq resizing
- Support more ethernet speed options in the core layer
- DMABUF support for bnxt_re
- Multi-stage MTT support for erdma to allow much bigger MR
registrations
- A irdma fix with a CVE that came in too late to go to -rc, missing
bounds checking for 0 length MRs"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (87 commits)
IB/hfi1: Reduce printing of errors during driver shut down
RDMA/hfi1: Move user SDMA system memory pinning code to its own file
RDMA/hfi1: Use list_for_each_entry() helper
RDMA/mlx5: Fix trailing */ formatting in block comment
RDMA/rxe: Fix redundant break statement in switch-case.
RDMA/efa: Fix wrong resources deallocation order
RDMA/siw: Call llist_reverse_order in siw_run_sq
RDMA/siw: Correct wrong debug message
RDMA/siw: Balance the reference of cep->kref in the error path
Revert "IB/isert: Fix incorrect release of isert connection"
RDMA/bnxt_re: Fix kernel doc errors
RDMA/irdma: Prevent zero-length STAG registration
RDMA/erdma: Implement hierarchical MTT
RDMA/erdma: Refactor the storage structure of MTT entries
RDMA/erdma: Renaming variable names and field names of struct erdma_mem
RDMA/hns: Support hns HW stats
RDMA/hns: Dump whole QP/CQ/MR resource in raw
RDMA/irdma: Add missing kernel-doc in irdma_setup_umode_qp()
RDMA/mlx4: Copy union directly
RDMA/irdma: Drop unused kernel push code
...
* Fix PKRU covert channel
* Fix -Wmissing-variable-declarations warning for ia32_xyz_class
* Fix kernel-doc annotation warning
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmTyK2sACgkQaDWVMHDJ
krCPyA//WbT65hhGZt5UHayoW3Cv8NmS4kRf6FTVMmt1yIvbOtVmWVZeOGMypprC
jYBPOdTf1SdoUxeD1cTTd/rtbYknczaBLC5VGilGb2/663yQl9ThT3ePc6OgUpPW
28ItLslfHGchHbrOgCgUK8Aiv8xRdfNUJoByMTOoce6eGUQsgYaNOq74z1YaLsZ4
afbitcd/vWO9eJfW5DNl26rIP+ksOgA/Iue8uRiEczoPltnbUxGGLCtuuR9B2Xxi
FqfttvcnUy9yFJBnELx4721VtTcmK4Dq5O/ZoJnmVQqRPY9aXpkbwZnH9co8r/uh
GueQj30l6VhEO6XnwPvjsOghXI2vOMrxlL3zGMw5y6pjfUWa4W6f33EV6mByYfml
dfPTXIz9K1QxznHtk6xVhn1X8CFw7L1L7CC/2CPQPETiuqyjWrHqvFBXcsWpKq4N
1GjnpoveNZmWcE7UY9qEHAXGh7ndI+EIux/0WfJJHOE/fzCr06GphQdxNwPGdi5O
T3JkZf6R4GBvfi6e0F6Qn2B02sDl18remmAMi9bAn/M31g+mvAoHqKLlI9EzAGGz
91PzJdHecdvalU6zsCKqi0ETzq11WG7zCfSXs+jeHBe1jsUITpcZCkpVi2pGA2aq
X7qLjJyW+Sx0hdE2IWFhCOfiwGFpXxfegCl1uax3HrZg6rrORbw=
=prt2
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2023-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Dave Hansen:
"The most important fix here adds a missing CPU model to the recent
Gather Data Sampling (GDS) mitigation list to ensure that mitigations
are available on that CPU.
There are also a pair of warning fixes, and closure of a covert
channel that pops up when protection keys are disabled.
Summary:
- Mark all Skylake CPUs as vulnerable to GDS
- Fix PKRU covert channel
- Fix -Wmissing-variable-declarations warning for ia32_xyz_class
- Fix kernel-doc annotation warning"
* tag 'x86-urgent-2023-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu/xstate: Fix PKRU covert channel
x86/irq/i8259: Fix kernel-doc annotation warning
x86/speculation: Mark all Skylake CPUs as vulnerable to GDS
x86/audit: Fix -Wmissing-variable-declarations warning for ia32_xyz_class
User visible changes:
- Added a way to easier filter with cpumasks:
# echo 'cpumask & CPUS{17-42}' > /sys/kernel/tracing/events/ipi_send_cpumask/filter
- Show actual size of ring buffer after modifying the ring buffer size via
buffer_size_kb. Currently it just returns what was written, but the actual
size rounds up to the sub buffer size. Show that real size instead.
Major changes:
- Added "eventfs". This is the code that handles the inodes and dentries of
tracefs/events directory. As there are thousands of events, and each event
has several inodes and dentries that currently exist even when tracing is
never used, they take up precious memory. Instead, eventfs will allocate
the inodes and dentries in a JIT way (similar to what procfs does). There
is now metadata that handles the events and subdirectories, and will create
the inodes and dentries when they are used.
Note, I also have patches that remove the subdirectory meta data, but will
wait till the next merge window before applying them. It's a little more
complex, and I want to make sure the dynamic code works properly before
adding more complexity, making it easier to revert if need be.
Minor changes:
- Optimization to user event list traversal.
- Remove intermediate permission of tracefs files (note the intermediate
permission removes all access to the files so it is not a security concern,
but just a clean up.)
- Add the complex fix to FORTIFY_SOURCE to the kernel stack event logic.
- Other minor clean ups.
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEXtmkj8VMCiLR0IBM68Js21pW3nMFAmTwtAsUHHJvc3RlZHRA
Z29vZG1pcy5vcmcACgkQ68Js21pW3nNOXRAAsslQT6alY4OeplC4x47+V6+6NiIA
oDtOmWAqf7TsH9bukzRFD36rUly42O20RJDx9z0Q3iRc3vGxEawId8z6P0HmBwRb
VSl5BryWvL5Wc5w94xS8EeCuC1MRfhVDyfbtVFmWigzfvd/f+hp71ViMPHUvrRJX
KhzzNSBc4ir5E1lzfwa7meYTXzDwrQlZbYfdf5aH94IWAkqDj85PUZDJ7UmLZhXG
CIglSpNFXZ0j19Wo/U6KZlHR1XfunBKungCzJ5Dbznc9YLWZTQXOIZF4YPKfPIJL
ulRG9chwXY0nQWhG3xM1UHZLsAMSWw5i13a4ZN4d8FCNOgv8ttcJnfDk7ZYUS0Oz
RmY1dGcSRKAZTUTjm8ZBtmyiUCc9kZAIk0fyEfIHtoDYXmhnvni3wuTnbRSdXaSi
q4YkxPaLfX8Fn3QloCqqddt8iONu7BnbpZOhUCl2AtBib52gnTTF7+rQ6/0D3rjo
SSuvEHhnjJhzk+3jM2odxjmTAztNT+yu6FbKXZUKPt1Kj9YHv1J9cEQw9/Etw+GV
8jQBe979D8hFJmDOJOT/O/TdPqE9mQoMNBt6Y8QnE4nbJWM+i/MBrThFpUSQhRCr
0Ya/HgR2QyRH7RmZW5o2H9mNtN+V9c7RxZW8erYzRbUs0YofK2OpGi9SrPzxWCke
w6j0VVZHaxdPguM=
=/s+e
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
"User visible changes:
- Added a way to easier filter with cpumasks:
# echo 'cpumask & CPUS{17-42}' > /sys/kernel/tracing/events/ipi_send_cpumask/filter
- Show actual size of ring buffer after modifying the ring buffer
size via buffer_size_kb.
Currently it just returns what was written, but the actual size
rounds up to the sub buffer size. Show that real size instead.
Major changes:
- Added "eventfs". This is the code that handles the inodes and
dentries of tracefs/events directory. As there are thousands of
events, and each event has several inodes and dentries that
currently exist even when tracing is never used, they take up
precious memory. Instead, eventfs will allocate the inodes and
dentries in a JIT way (similar to what procfs does). There is now
metadata that handles the events and subdirectories, and will
create the inodes and dentries when they are used.
Note, I also have patches that remove the subdirectory meta data,
but will wait till the next merge window before applying them. It's
a little more complex, and I want to make sure the dynamic code
works properly before adding more complexity, making it easier to
revert if need be.
Minor changes:
- Optimization to user event list traversal
- Remove intermediate permission of tracefs files (note the
intermediate permission removes all access to the files so it is
not a security concern, but just a clean up)
- Add the complex fix to FORTIFY_SOURCE to the kernel stack event
logic
- Other minor cleanups"
* tag 'trace-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (29 commits)
tracefs: Remove kerneldoc from struct eventfs_file
tracefs: Avoid changing i_mode to a temp value
tracing/user_events: Optimize safe list traversals
ftrace: Remove empty declaration ftrace_enable_daemon() and ftrace_disable_daemon()
tracing: Remove unused function declarations
tracing/filters: Document cpumask filtering
tracing/filters: Further optimise scalar vs cpumask comparison
tracing/filters: Optimise CPU vs cpumask filtering when the user mask is a single CPU
tracing/filters: Optimise scalar vs cpumask filtering when the user mask is a single CPU
tracing/filters: Optimise cpumask vs cpumask filtering when user mask is a single CPU
tracing/filters: Enable filtering the CPU common field by a cpumask
tracing/filters: Enable filtering a scalar field by a cpumask
tracing/filters: Enable filtering a cpumask field by another cpumask
tracing/filters: Dynamically allocate filter_pred.regex
test: ftrace: Fix kprobe test for eventfs
eventfs: Move tracing/events to eventfs
eventfs: Implement removal of meta data from eventfs
eventfs: Implement functions to create files and dirs when accessed
eventfs: Implement eventfs lookup, read, open functions
eventfs: Implement eventfs file add functions
...
* Unbound workqueues now support more flexible affinity scopes. The default
behavior is to soft-affine according to last level cache boundaries. A
work item queued from a given LLC is executed by a worker running on the
same LLC but the worker may be moved across cache boundaries as the
scheduler sees fit. On machines which multiple L3 caches, which are
becoming more popular along with chiplet designs, this improves cache
locality while not harming work conservation too much.
Unbound workqueues are now also a lot more flexible in terms of execution
affinity. Differeing levels of affinity scopes are supported and both the
default and per-workqueue affinity settings can be modified dynamically.
This should help working around amny of sub-optimal behaviors observed
recently with asymmetric ARM CPUs.
This involved signficant restructuring of workqueue code. Nothing was
reported yet but there's some risk of subtle regressions. Should keep an
eye out.
* Rescuer workers now has more identifiable comms.
* workqueue.unbound_cpus added so that CPUs which can be used by workqueue
can be constrained early during boot.
* Now that all the in-tree users have been flushed out, trigger warning if
system-wide workqueues are flushed.
* One pull commit from for-6.5-fixes to avoid cascading conflicts in the
affinity scope patchset.
-----BEGIN PGP SIGNATURE-----
iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZPERlQ4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGVqQAPwIOy9tWY5jFAmMuIyH6wV50hbmfxCc2n5xhQNr
5HoyGgEA8lw1W7afDCIPiQVA7AYsu8dhwuNSOcRCJxhrrn4XsA0=
=g/Uu
-----END PGP SIGNATURE-----
Merge tag 'wq-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
- Unbound workqueues now support more flexible affinity scopes.
The default behavior is to soft-affine according to last level cache
boundaries. A work item queued from a given LLC is executed by a
worker running on the same LLC but the worker may be moved across
cache boundaries as the scheduler sees fit. On machines which
multiple L3 caches, which are becoming more popular along with
chiplet designs, this improves cache locality while not harming work
conservation too much.
Unbound workqueues are now also a lot more flexible in terms of
execution affinity. Differeing levels of affinity scopes are
supported and both the default and per-workqueue affinity settings
can be modified dynamically. This should help working around amny of
sub-optimal behaviors observed recently with asymmetric ARM CPUs.
This involved signficant restructuring of workqueue code. Nothing was
reported yet but there's some risk of subtle regressions. Should keep
an eye out.
- Rescuer workers now has more identifiable comms.
- workqueue.unbound_cpus added so that CPUs which can be used by
workqueue can be constrained early during boot.
- Now that all the in-tree users have been flushed out, trigger warning
if system-wide workqueues are flushed.
* tag 'wq-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (31 commits)
workqueue: fix data race with the pwq->stats[] increment
workqueue: Rename rescuer kworker
workqueue: Make default affinity_scope dynamically updatable
workqueue: Add "Affinity Scopes and Performance" section to documentation
workqueue: Implement non-strict affinity scope for unbound workqueues
workqueue: Add workqueue_attrs->__pod_cpumask
workqueue: Factor out need_more_worker() check and worker wake-up
workqueue: Factor out work to worker assignment and collision handling
workqueue: Add multiple affinity scopes and interface to select them
workqueue: Modularize wq_pod_type initialization
workqueue: Add tools/workqueue/wq_dump.py which prints out workqueue configuration
workqueue: Generalize unbound CPU pods
workqueue: Factor out clearing of workqueue-only attrs fields
workqueue: Factor out actual cpumask calculation to reduce subtlety in wq_update_pod()
workqueue: Initialize unbound CPU pods later in the boot
workqueue: Move wq_pod_init() below workqueue_init()
workqueue: Rename NUMA related names to use pod instead
workqueue: Rename workqueue_attrs->no_numa to ->ordered
workqueue: Make unbound workqueues to use per-cpu pool_workqueues
workqueue: Call wq_update_unbound_numa() on all CPUs in NUMA node on CPU hotplug
...
* Per-cpu cpu usage stats are now tracked. This currently isn't printed out
in the cgroupfs interface and can only be accessed through e.g. BPF.
Should decide on a not-too-ugly way to show per-cpu stats in cgroupfs.
* cpuset received some cleanups and prepatory patches for the pending
cpus.exclusive patchset which will allow cpuset partitions to be created
below non-partition parents, which should ease the management of partition
cpusets.
* A lot of code and documentation cleanup patches.
* tools/testing/selftests/cgroup/test_cpuset.c is added. This causes trivial
conflicts in .gitignore and Makefile under the directory against
fe3b1bf19bdf ("selftests: cgroup: add test_zswap program"). They can be
resolved by keeping lines from both branches.
-----BEGIN PGP SIGNATURE-----
iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZPENTg4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGcyBAP44cHwpSFxXe3cehxAzb1l/2BZXtzU5l48OqUQd
MwHyrwEAm7+MTVAR2xOF4f+oVM9KWmKj7oV7Clpixl1S7hHyjwE=
=FCc9
-----END PGP SIGNATURE-----
Merge tag 'cgroup-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- Per-cpu cpu usage stats are now tracked
This currently isn't printed out in the cgroupfs interface and can
only be accessed through e.g. BPF. Should decide on a not-too-ugly
way to show per-cpu stats in cgroupfs
- cpuset received some cleanups and prepatory patches for the pending
cpus.exclusive patchset which will allow cpuset partitions to be
created below non-partition parents, which should ease the management
of partition cpusets
- A lot of code and documentation cleanup patches
- tools/testing/selftests/cgroup/test_cpuset.c added
* tag 'cgroup-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (32 commits)
cgroup: Avoid -Wstringop-overflow warnings
cgroup:namespace: Remove unused cgroup_namespaces_init()
cgroup/rstat: Record the cumulative per-cpu time of cgroup and its descendants
cgroup: clean up if condition in cgroup_pidlist_start()
cgroup: fix obsolete function name in cgroup_destroy_locked()
Documentation: cgroup-v2.rst: Correct number of stats entries
cgroup: fix obsolete function name above css_free_rwork_fn()
cgroup/cpuset: fix kernel-doc
cgroup: clean up printk()
cgroup: fix obsolete comment above cgroup_create()
docs: cgroup-v1: fix typo
docs: cgroup-v1: correct the term of Page Cache organization in inode
cgroup/misc: Store atomic64_t reads to u64
cgroup/misc: Change counters to be explicit 64bit types
cgroup/misc: update struct members descriptions
cgroup: remove cgrp->kn check in css_populate_dir()
cgroup: fix obsolete function name
cgroup: use cached local variable parent in for loop
cgroup: remove obsolete comment above struct cgroupstats
cgroup: put cgroup_tryget_css() inside CONFIG_CGROUP_SCHED
...
percpu
* A couple cleanups by Baoquan He and Bibo Mao. The only behavior change
is to start printing messages if we're under the warn limit for failed
atomic allocations.
percpu_counter
* Shakeel introduced percpu counters into mm_struct which caused percpu
allocations be on the hot path [1]. Originally I spent some time
trying to improve the percpu allocator, but instead preferred what
Mateusz Guzik proposed grouping at the allocation site,
percpu_counter_init_many(). This allows a single percpu allocation to
be shared by the counters. I like this approach because it creates a
shared lifetime by the allocations. Additionally, I believe many inits
have higher level synchronization requirements, like percpu_counter
does against HOTPLUG_CPU. Therefore we can group these optimizations
together.
[1] https://lore.kernel.org/linux-mm/20221024052841.3291983-1-shakeelb@google.com/
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE3hZPHJdcVwe+yTTtiDc0yuoFPR0FAmTv2IUACgkQiDc0yuoF
PR0+gg//U430Y9jRSKQtbh3dEPaAeWGcTfSTnVHbQGfBj3A4ePJyWl/Tgzri31AC
rzr8SRs0yX8b82TbECWsV67i/GrntLJyz4yQ52S/RRqVwnQqSn/wicEdCY00lJBt
Tye8zApOnYBouaYqIOxm/M7ofvKzJ3gWOVeF/zBwM6hwvNaXXtY5r86fSDxoEbhY
HOFnCDmg5Spf0U50j1G7nV5KfAb7BNA3/HFyzfzH+w+OWi4IGbThsfrg1qvjyFot
KlEK/kF8Af2xj2A2se4XFsLc2D/Tj+29juYVQqIPBJzVPrZ2uerKSszK5Zcr+Use
kMiG7tRWKE+2vkOM1RQ5Y5NCVEBhlXlienz1gf/C7247SEGs6OIyqvyDAgPTRx6p
oR2/vx9hMtaSMf4aHWd+fYS5gNZ05iMvOIbRZnI1wZkQglQVkJvXhzuLaJ+dIGSP
ypv6XOepik7vDjZ3p3xJXd0TAn4NSkn3jWRetrymdtMFanF99qw1VqjmkLecSil0
Gr0UhRL1oiMde6niVJrOpdOGLwt/M4N99Y5rksw6NCnktRJ99coFGj7LglZGMsu+
YkOyjD8MVJXTkBtBNGeqHTKe6nyVkHFq9ad5EmWjPkefP5JziH8i18k7JlF1dLA5
c8peq3ES659D5f0mU2jilD9PsCsBfSn6Of4ruMZa2Zr1XDD8snI=
=vcA1
-----END PGP SIGNATURE-----
Merge tag 'percpu-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu
Pull percpu updates from Dennis Zhou:
"One bigger change to percpu_counter's api allowing for init and
destroy of multiple counters via percpu_counter_init_many() and
percpu_counter_destroy_many(). This is used to help begin remediating
a performance regression with percpu rss stats.
Additionally, it seems larger core count machines are feeling the
burden of the single threaded allocation of percpu. Mateusz is
thinking about it and I will spend some time on it too.
percpu:
- A couple cleanups by Baoquan He and Bibo Mao. The only behavior
change is to start printing messages if we're under the warn limit
for failed atomic allocations.
percpu_counter:
- Shakeel introduced percpu counters into mm_struct which caused
percpu allocations be on the hot path [1]. Originally I spent some
time trying to improve the percpu allocator, but instead preferred
what Mateusz Guzik proposed grouping at the allocation site,
percpu_counter_init_many(). This allows a single percpu allocation
to be shared by the counters. I like this approach because it
creates a shared lifetime by the allocations. Additionally, I
believe many inits have higher level synchronization requirements,
like percpu_counter does against HOTPLUG_CPU. Therefore we can
group these optimizations together"
Link: https://lore.kernel.org/linux-mm/20221024052841.3291983-1-shakeelb@google.com/ [1]
* tag 'percpu-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
kernel/fork: group allocation/free of per-cpu counters for mm struct
pcpcntr: add group allocation/free
mm/percpu.c: print error message too if atomic alloc failed
mm/percpu.c: optimize the code in pcpu_setup_first_chunk() a little bit
mm/percpu.c: remove redundant check
mm/percpu: Remove some local variables in pcpu_populate_pte