1102957 Commits

Author SHA1 Message Date
Linus Torvalds
3cc30140db pci-v5.19-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmKP/tQUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vzH0xAAojQowrSWzZ5FKTqI+L/L9ZXoAb+e
 9IvQljKc9taJldmXp+EB9wkS/5B+VtQcC2qUQuWEQXUoECF8qHlcB4l+XQyd1tWO
 O0vZxETH22xjLLrjG2F3l5rrfkJZAf2nEugwbDk97YEgiimeOiRcv3bx6AUCtj6I
 rPJ13Fop3Jke7sQMcXYJe3gQLT1o1AKiQGghiCFNi/gzx2lXI6mmHBgLxFoiqcby
 WpfXbvbJti95HRaahUR3HaDFfHj4HVkQNLlTtIykJ3Tl2/rOhWEJjI8JOIQpAA+M
 WBrWw9rfgbScTiGV+dZ3h7hKiPnHKl9YETIX7L0oA2sj0jZcIs0d6mSBZx0kYuI9
 eAlx+qSK9xpbQQr/fdYaUdF1q4QdtU0BYOvOWOzWsqYCECMRJ1PUHFSMbmR/+PNB
 P5lHnAbggRSoxdAtwFYv1HTr+VpGH9S+5oxHCz3ohpMjYy6mkCZwHpZn3doaU3ci
 KG6yIoVKftm3fZdtFvL03qHl/I8+X24ZhT/T/278PRGjkhSyr56hZo8hg0gqqTct
 ngip8qNABmSbqpr73/W6Vl42zAbYtNk1BykYahbKupgW8FbT7hqaZTB05V87pVu+
 Ko1aJM6VoOP9rMlKHI9ba8eYCzDrZbLZUFn7ljNPDpzutf0tAwtgwzvZBXN3za6+
 Z9+D5dxmvrZEIbA=
 =hEti
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci updates from Bjorn Helgaas:
 "Resource management:

   - Restrict E820 clipping to PCI host bridge windows (Bjorn Helgaas)

   - Log E820 clipping better (Bjorn Helgaas)

   - Add kernel cmdline options to enable/disable E820 clipping (Hans de
     Goede)

   - Disable E820 reserved region clipping for IdeaPads, Yoga, Yoga
     Slip, Acer Spin 5, Clevo Barebone systems where clipping leaves no
     usable address space for touchpads, Thunderbolt devices, etc (Hans
     de Goede)

   - Disable E820 clipping by default starting in 2023 (Hans de Goede)

  PCI device hotplug:

   - Include files to remove implicit dependencies (Christophe Leroy)

   - Only put Root Ports in D3 if they can signal and wake from D3 so
     AMD Yellow Carp doesn't miss hotplug events (Mario Limonciello)

  Power management:

   - Define pci_restore_standard_config() only for CONFIG_PM_SLEEP since
     it's unused otherwise (Krzysztof Kozlowski)

   - Power up devices completely, including anything platform firmware
     needs to do, during runtime resume (Rafael J. Wysocki)

   - Move pci_resume_bus() to PM callbacks so we observe the required
     bridge power-up delays (Rafael J. Wysocki)

   - Drop unneeded runtime_d3cold device flag (Rafael J. Wysocki)

   - Split pci_raw_set_power_state() between pci_power_up() and a new
     pci_set_low_power_state() (Rafael J. Wysocki)

   - Set current_state to D3cold if config read returns ~0, indicating
     the device is not accessible (Rafael J. Wysocki)

   - Do not call pci_update_current_state() from pci_power_up() so BARs
     and ASPM config are restored correctly (Rafael J. Wysocki)

   - Write 0 to PMCSR in pci_power_up() in all cases (Rafael J. Wysocki)

   - Split pci_power_up() to pci_set_full_power_state() to avoid some
     redundant operations (Rafael J. Wysocki)

   - Skip restoring BARs if device is not in D0 (Rafael J. Wysocki)

   - Rearrange and clarify pci_set_power_state() (Rafael J. Wysocki)

   - Remove redundant BAR restores from pci_pm_thaw_noirq() (Rafael J.
     Wysocki)

  Virtualization:

   - Acquire device lock before config space access lock to avoid AB/BA
     deadlock with sriov_numvfs_store() (Yicong Yang)

  Error handling:

   - Clear MULTI_ERR_COR/UNCOR_RCV bits, which a race could previously
     leave permanently set (Kuppuswamy Sathyanarayanan)

  Peer-to-peer DMA:

   - Whitelist Intel Skylake-E Root Ports regardless of which devfn they
     are (Shlomo Pongratz)

  ASPM:

   - Override L1 acceptable latency advertised by Intel DG2 so ASPM L1
     can be enabled (Mika Westerberg)

  Cadence PCIe controller driver:

   - Set up device-specific register to allow PTM Responder to be
     enabled by the normal architected bit (Christian Gmeiner)

   - Override advertised FLR support since the controller doesn't
     implement FLR correctly (Parshuram Thombare)

  Cadence PCIe endpoint driver:

   - Correct bitmap size for the ob_region_map of outbound window usage
     (Dan Carpenter)

  Freescale i.MX6 PCIe controller driver:

   - Fix PERST# assertion/deassertion so we observe the required delays
     before accessing device (Francesco Dolcini)

  Freescale Layerscape PCIe controller driver:

   - Add "big-endian" DT property (Hou Zhiqiang)

   - Update SCFG DT property (Hou Zhiqiang)

   - Add "aer", "pme", "intr" DT properties (Li Yang)

   - Add DT compatible strings for ls1028a (Xiaowei Bao)

  Intel VMD host bridge driver:

   - Assign VMD IRQ domain before enumeration to avoid IOMMU interrupt
     remapping errors when MSI-X remapping is disabled (Nirmal Patel)

   - Revert VMD workaround that kept MSI-X remapping enabled when IOMMU
     remapping was enabled (Nirmal Patel)

  Marvell MVEBU PCIe controller driver:

   - Add of_pci_get_slot_power_limit() to parse the
     'slot-power-limit-milliwatt' DT property (Pali Rohár)

   - Add mvebu support for sending Set_Slot_Power_Limit message (Pali
     Rohár)

  MediaTek PCIe controller driver:

   - Fix refcount leak in mtk_pcie_subsys_powerup() (Miaoqian Lin)

  MediaTek PCIe Gen3 controller driver:

   - Reset PHY and MAC at probe time (AngeloGioacchino Del Regno)

  Microchip PolarFlare PCIe controller driver:

   - Add chained_irq_enter()/chained_irq_exit() calls to mc_handle_msi()
     and mc_handle_intx() to avoid lost interrupts (Conor Dooley)

   - Fix interrupt handling race (Daire McNamara)

  NVIDIA Tegra194 PCIe controller driver:

   - Drop tegra194 MSI register save/restore, which is unnecessary since
     the DWC core does it (Jisheng Zhang)

  Qualcomm PCIe controller driver:

   - Add SM8150 SoC DT binding and support (Bhupesh Sharma)

   - Fix pipe clock imbalance (Johan Hovold)

   - Fix runtime PM imbalance on probe errors (Johan Hovold)

   - Fix PHY init imbalance on probe errors (Johan Hovold)

   - Convert DT binding to YAML (Dmitry Baryshkov)

   - Update DT binding to show that resets aren't required for
     MSM8996/APQ8096 platforms (Dmitry Baryshkov)

   - Add explicit register names per chipset in DT binding (Dmitry
     Baryshkov)

   - Add sc7280-specific clock and reset definitions to DT binding
     (Dmitry Baryshkov)

  Rockchip PCIe controller driver:

   - Fix bitmap size when searching for free outbound region (Dan
     Carpenter)

  Rockchip DesignWare PCIe controller driver:

   - Remove "snps,dw-pcie" from rockchip-dwc DT "compatible" property
     because it's not fully compatible with rockchip (Peter Geis)

   - Reset rockchip-dwc controller at probe (Peter Geis)

   - Add rockchip-dwc INTx support (Peter Geis)

  Synopsys DesignWare PCIe controller driver:

   - Return error instead of success if DMA mapping of MSI area fails
     (Jiantao Zhang)

  Miscellaneous:

   - Change pci_set_dma_mask() documentation references to
     dma_set_mask() (Alex Williamson)"

* tag 'pci-v5.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (64 commits)
  dt-bindings: PCI: qcom: Add schema for sc7280 chipset
  dt-bindings: PCI: qcom: Specify reg-names explicitly
  dt-bindings: PCI: qcom: Do not require resets on msm8996 platforms
  dt-bindings: PCI: qcom: Convert to YAML
  PCI: qcom: Fix unbalanced PHY init on probe errors
  PCI: qcom: Fix runtime PM imbalance on probe errors
  PCI: qcom: Fix pipe clock imbalance
  PCI: qcom: Add SM8150 SoC support
  dt-bindings: pci: qcom: Document PCIe bindings for SM8150 SoC
  x86/PCI: Disable E820 reserved region clipping starting in 2023
  x86/PCI: Disable E820 reserved region clipping via quirks
  x86/PCI: Add kernel cmdline options to use/ignore E820 reserved regions
  PCI: microchip: Fix potential race in interrupt handling
  PCI/AER: Clear MULTI_ERR_COR/UNCOR_RCV bits
  PCI: cadence: Clear FLR in device capabilities register
  PCI: cadence: Allow PTM Responder to be enabled
  PCI: vmd: Revert 2565e5b69c44 ("PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is enabled by IOMMU.")
  PCI: vmd: Assign VMD IRQ domain before enumeration
  PCI: Avoid pci_dev_lock() AB/BA deadlock with sriov_numvfs_store()
  PCI: rockchip-dwc: Add legacy interrupt support
  ...
2022-05-27 15:25:10 -07:00
Chao Yu
2d1fe8a86b f2fs: fix to tag gcing flag on page during file defragment
In order to garantee migrated data be persisted during checkpoint,
otherwise out-of-order persistency between data and node may cause
data corruption after SPOR.

Signed-off-by: Chao Yu <chao.yu@oppo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-27 13:58:42 -07:00
Yufen Yu
054cb2891b f2fs: replace F2FS_I(inode) and sbi by the local variable
We have define 'fi' at the begin of the functions, just use it,
rather than use F2FS_I(inode) again.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
[Jaegeuk Kim: replace sbi]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-27 13:58:23 -07:00
Arnd Bergmann
8a1e75c53c Clock properties for cru nodes to match the yaml-converted bindings
and renaming of Quartz-A bluetooth pin nodename to not conflict with
 Yaml constraints.
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCAAuFiEE7v+35S2Q1vLNA3Lx86Z5yZzRHYEFAmKRDwMQHGhlaWtvQHNu
 dGVjaC5kZQAKCRDzpnnJnNEdgaxyB/9tzsQldon0EfO1UpZsbNaqT7b18E4tNUkY
 4oNkm9GG5wgl9AcOzWnm40zb+2bpSgZUWqBiVgrtDJzvbDuS3fFIwzoyEV5QQysM
 znQNyJzeBQ1rpNkd5qSOUaFwIblk7PGiuSD0JV/J9dkjsy4VSd4t6LkpR+1O1f4O
 hcLhAtII6rzg/iNlLS0JCo2sNmF0kz7U4gLzNcq5AOZCHPkKQGBa0lR7DtAna41K
 c8nKLlKBPY/Z60JAEtlcSEZVJS0Wk187FIwylfrUaeuVZKnCSrg3UyImFjEVT705
 6ulPRjVdCWUWY+O71hyCiBEERQavDJwoCUD18hwLFtGwxtcQEgBz
 =U6N1
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmKRMrUACgkQmmx57+YA
 GNmBWA/9GvtI9XWiaaF+rZgxFurpsvLmQ/AwWKa61n99n4RGs7+9xihpX5nIF2k0
 FXAIEUHL2rMYdgBWGUToSXa+9iia80cdw4Y/LPb2cQjOva3fBTxTzu904aCrZDjB
 mx5ylqj9DRy77oya9yAFgaAO2Ze05/nUWuOI32sXgRoxzgUR/9zuDX16FVGVkvyE
 OHUOEm24BZexd5qILIFQ/FlHF8m2DEgCG83APqtp0aNomfI+ahvkJkRxJqiIxkkq
 rRVmKB3MsiOkBAAXEGetNUx0XeqN32oqsk9sbVrLnYhzJ+DvhBRg0txz2wDdmuZN
 MZUERVbK19mvjjsXkJuKfapQ5mirbj1v0+sKTNWKhXl47avK/hPqoF8Q/OxsuBGf
 B7W9AXjUENd4X4zwYbMvp8fDPOPDW6O4lj0QUJ2gV+lQp+tWiKOw95ltK5h0hWkQ
 1PuOEYbroqB6t8PEWWayaTyWwRcAnN3h9outRwoXz32RIDxBG30xLgSFsuBlTBJ/
 YDGD/oyyIJOmeSvlxgow/RrHRC91GbdxXitxCVq2+p1k/g6VTWRyBbgPlr4RTc12
 BdAWNAp7dRf5AXWNXQt1YBx4jn1tkK4epNqlTOC+w0M29ocfO8Ntd8PPdDJLd6lq
 Pg99jgoYw7kJ5/Iwh1rMWmOi6DQFcQNXcMtJPWk47FsB21RMq88=
 =ZMYI
 -----END PGP SIGNATURE-----

Merge tag 'v5.19-rockchip-dts64-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/late

Clock properties for cru nodes to match the yaml-converted bindings
and renaming of Quartz-A bluetooth pin nodename to not conflict with
Yaml constraints.

* tag 'v5.19-rockchip-dts64-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  arm64: dts: rockchip: rename Quartz64-A bluetooth gpios
  arm64: dts: rockchip: add clocks property to cru node rk3368
  arm64: dts: rockchip: add clocks property to cru node rk3308
  arm64: dts: rockchip: add clocks to rk356x cru

Link: https://lore.kernel.org/r/7695907.Sb9uPGUboI@phil
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-05-27 22:21:09 +02:00
Arnd Bergmann
440517772b Amba and clock fixes to conform better to actual dt-bindings.
-----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCAAuFiEE7v+35S2Q1vLNA3Lx86Z5yZzRHYEFAmKRDfYQHGhlaWtvQHNu
 dGVjaC5kZQAKCRDzpnnJnNEdgQk8CACp6MnJPwdUL+pA0bjcP7FRCkja1NSdEiab
 Ks/bnfcWFx1D2uc6d5q20rXWPOfYuroG+BvHlh9VqfF/1nmO7Uxuwx4cznYpGmSN
 Lee8Ux0IF9WNvTTBjHbIhCtXTyz/NXtMeocbysw8Dsgl3yNb+rlnBzP12Acs108T
 SA3dIy8icBVYDnyzqcuQjtOuQNN9uMjz+58e9pv/YavNnQwu9Ex67nLQo+UQMVIK
 A0t6kezZQZVtYdgIAp8y2u/3ZcaTUebhK8nLRfuWayuzcPpCF26cDhqrPL3KoQVW
 M5o1hRf1NVSpdH56N56qDAwnBeQtyqSoAVK5QET0YoP1UZd0QIEQ
 =ywJC
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmKRMpAACgkQmmx57+YA
 GNmymBAAqjqz7YZUyb6/tGABamO5g2AxDshIARYHrmiUo8e8f8p7tMOt21RzTaua
 UB+snhRkpfdyWNPk+TlToU+fraEtvlvqbtVIcevjG4SsarDZ0FOZgRq1cGu0B3m4
 GTsmsV+McLgNepewVgrCbpT7kzvoAovGPuWVgbKj2h+CWP3kxCw23lVwR+trk4bz
 5oCcMcFX7RqKNti/mw3dfVJgdKC3z17cyT6djHiaxEMdbENJJABLsi6YoVBmbvat
 lBRvPJwXNmCJdko5u8gEf1X6W2kV9JvKR5Ax0PydsG8p9BU/dwAQuQIUjQkYPpFT
 Xv5sSw5mRY8co0FL0VDGjw7AdRdsgPO4cRnRKuiRyJhqG0W3pyUcKU+SvnfLpgfW
 EUrB/MflPgrxuoBlEjfTXC/w9uD5qByhq3NlODdABLUKR6jgzE9jU3kAuwZ2QZ7h
 cWt3KeIdgokWJbdnq13eQibB+lkt2uz/apk3aM1MPB8CbFd7AmSbcV5YrfEjDTu2
 /6yWqwCI74QMyw8KvXeiTt43gpyie7YQIowiTNR6aw9S6/dZbkof7aqAP2sMTOh0
 vDqw8YwfZ8tNxntnrvT36n77g8gmcEPAG8/q2FY8WKGhuJUkX10lQzinouATYANs
 hXtWDCBMgEWkhynWOAmw8hrxT0H+n3yDQpSQsqhUwMDJp5UdFWY=
 =RiRc
 -----END PGP SIGNATURE-----

Merge tag 'v5.19-rockchip-dts32-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/late

Amba and clock fixes to conform better to actual dt-bindings.

* tag 'v5.19-rockchip-dts32-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  ARM: dts: rockchip: add clocks property to cru node rk3228
  ARM: dts: rockchip: add clocks property to cru node rk3036
  ARM: dts: rockchip: add clocks property to cru node rk3066a/rk3188
  ARM: dts: rockchip: add clocks property to cru node rk3288
  ARM: dts: rockchip: Remove "amba" bus nodes from rv1108
  ARM: dts: rockchip: add clocks property to cru node rv1108

Link: https://lore.kernel.org/r/4798587.jE0xQCEvom@phil
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-05-27 22:20:32 +02:00
Arnd Bergmann
4a4e81ddb8 Refcount leak for a used of-node in the grf-init.
-----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCAAuFiEE7v+35S2Q1vLNA3Lx86Z5yZzRHYEFAmKD9OcQHGhlaWtvQHNu
 dGVjaC5kZQAKCRDzpnnJnNEdgZDNB/9QBkDadcTEzn1GpRDZb8b4tpBrBiyYzNEl
 458EpbcqHi40HejXq4ZlzdniyZtl1eeRGd76aXpmc5azHPK+DhjvChKfhV+YWi/y
 aTJtLIreZEZuu3TH74PVrHr81sj+5TuZBVat5M3/dGNzIEkTm6CsiM07U9h2f5sD
 uktLk1aUioC5DRMxHCrEtmZmsiQT/uukwwiNVckuSjiRRlVlTd1NFzKqggs1yqNh
 TSQ3d/vnl1C+nw7ErwludYVtkUiVpoShKQp8GCvAMXo4aPuxEnto87UvOsdbibJW
 Xp8R/NuoE7W+iXLvwg4AuHS27/8oyJfxIt2PDA1Cw57A1oumtVeG
 =fSY+
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmKRMmQACgkQmmx57+YA
 GNmu6w/+JD6MYF81WZEUosKh7qOjfuLMAsJvjAV/x3ibYHIM7NBF6PwarDqAN5Z7
 QNNGDK4kVF5IJBWHohWVk0YhR8JKkwWSFlaGJxwRTRGoKEfK3aAH5en5yCrxDRRO
 Vsft5DT2fxELhE2bD8VweqPrKibH6nKUwadfAzWLdKBFFITU/lv2hDU4HL+ett0t
 Rlppu2lu674iWD18W8q8ozpZArchgQ76vW2ZfYoq8vNVY1B6WxjmdGKw9brMBqaM
 dd03tEJ1qaL8j6eO91GfEHGWm8+rL5QLTILNSZ9ma/vg/RaXyn18VCuuU+hG06DK
 aLsplf7GXNxcR/QVWEeT9vUCzGCQAr9EGyZ33qgmZv/bBhfTQOcGuE76SKxbn/6I
 eBFFPoM9dHtk2GM3qhTARHa4UqSWD6lfFGD0NhDBpVVHUN3N4RFXVuoOradz2Oj3
 DvBRyysYmkVzGOzHaqCjRz0A08Gms6U1KwHt0+4Fwf0kcs+zdvLS7mkVHOF5LuVY
 5XVgp+Z8eZtOH1utoC8/bXcGymi2ij8GHZ5PCG3piYalriTuvZExZThnKzpcMQ4A
 HVmfMJ9oroUArmgzbEHGBWSR8QQa6OCEoRHPj9gJGaKaTqNJTybNxhlk58qI7twP
 EfJ7MGQlpE6dRcAuL1tVD0CqFyGEO7pMwUNDLY0ltD7gMOYDA5k=
 =qzen
 -----END PGP SIGNATURE-----

Merge tag 'v5.19-rockchip-drivers2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/late

Refcount leak for a used of-node in the grf-init.

* tag 'v5.19-rockchip-drivers2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  soc: rockchip: Fix refcount leak in rockchip_grf_init

Link: https://lore.kernel.org/r/4541398.Icojqenx9y@phil
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-05-27 22:19:48 +02:00
Linus Torvalds
8291eaafed Two followon fixes for the post-5.19 series "Use pageblock_order for cma
and alloc_contig_range alignment", from Zi Yan.
 
 A series of z3fold cleanups and fixes from Miaohe Lin.
 
 Some memcg selftests work from Michal Koutný <mkoutny@suse.com>
 
 Some swap fixes and cleanups from Miaohe Lin.
 
 Several individual minor fixups.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYpEE7QAKCRDdBJ7gKXxA
 jlamAP9WmjNdx+5Pz5OkkaSjBO7y7vBrBTcQ9e5pz8bUWRoQhwEA+WtsssLmq9aI
 7DBDmBKYCMTbzOQTqaMRHkB+JWZo+Ao=
 =L3f1
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-05-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more MM updates from Andrew Morton:

 - Two follow-on fixes for the post-5.19 series "Use pageblock_order for
   cma and alloc_contig_range alignment", from Zi Yan.

 - A series of z3fold cleanups and fixes from Miaohe Lin.

 - Some memcg selftests work from Michal Koutný <mkoutny@suse.com>

 - Some swap fixes and cleanups from Miaohe Lin

 - Several individual minor fixups

* tag 'mm-stable-2022-05-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (25 commits)
  mm/shmem.c: suppress shift warning
  mm: Kconfig: reorganize misplaced mm options
  mm: kasan: fix input of vmalloc_to_page()
  mm: fix is_pinnable_page against a cma page
  mm: filter out swapin error entry in shmem mapping
  mm/shmem: fix infinite loop when swap in shmem error at swapoff time
  mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range
  mm/swapfile: fix lost swap bits in unuse_pte()
  mm/swapfile: unuse_pte can map random data if swap read fails
  selftests: memcg: factor out common parts of memory.{low,min} tests
  selftests: memcg: remove protection from top level memcg
  selftests: memcg: adjust expected reclaim values of protected cgroups
  selftests: memcg: expect no low events in unprotected sibling
  selftests: memcg: fix compilation
  mm/z3fold: fix z3fold_page_migrate races with z3fold_map
  mm/z3fold: fix z3fold_reclaim_page races with z3fold_free
  mm/z3fold: always clear PAGE_CLAIMED under z3fold page lock
  mm/z3fold: put z3fold page back into unbuddied list when reclaim or migration fails
  revert "mm/z3fold.c: allow __GFP_HIGHMEM in z3fold_alloc"
  mm/z3fold: throw warning on failure of trylock_page in z3fold_alloc
  ...
2022-05-27 11:40:49 -07:00
Linus Torvalds
77fb622de1 Six hotfixes. One from Miaohe Lin is considered a minor thing so it isn't
for -stable.  The remainder address pre-5.19 issues and are cc:stable.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYpEC8gAKCRDdBJ7gKXxA
 jlukAQDCaXF7YTBjpoaAl0zhSu+5h7CawiB6cnRlq87/uJ2S4QD/eLVX3zfxI2DX
 YcOhc5H8BOgZ8ppD80Nv9qjmyvEWzAA=
 =ZFFG
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2022-05-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull hotfixes from Andrew Morton:
 "Six hotfixes.

  The page_table_check one from Miaohe Lin is considered a minor thing
  so it isn't marked for -stable. The remainder address pre-5.19 issues
  and are cc:stable"

* tag 'mm-hotfixes-stable-2022-05-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/page_table_check: fix accessing unmapped ptep
  kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add]
  mm/page_alloc: always attempt to allocate at least one page during bulk allocation
  hugetlb: fix huge_pmd_unshare address update
  zsmalloc: fix races between asynchronous zspage free and page migration
  Revert "mm/cma.c: remove redundant cma_mutex lock"
2022-05-27 11:29:35 -07:00
Linus Torvalds
6f664045c8 Not a lot of material this cycle. Many singleton patches against various
subsystems.   Most notably some maintenance work in ocfs2 and initramfs.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYo/6xQAKCRDdBJ7gKXxA
 jkD9AQCPczLBbRWpe1edL+5VHvel9ePoHQmvbHQnufdTh9rB5QEAu0Uilxz4q9cx
 xSZypNhj2n9f8FCYca/ZrZneBsTnAA8=
 =AJEO
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2022-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc updates from Andrew Morton:
 "The non-MM patch queue for this merge window.

  Not a lot of material this cycle. Many singleton patches against
  various subsystems. Most notably some maintenance work in ocfs2
  and initramfs"

* tag 'mm-nonmm-stable-2022-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (65 commits)
  kcov: update pos before writing pc in trace function
  ocfs2: dlmfs: fix error handling of user_dlm_destroy_lock
  ocfs2: dlmfs: don't clear USER_LOCK_ATTACHED when destroying lock
  fs/ntfs: remove redundant variable idx
  fat: remove time truncations in vfat_create/vfat_mkdir
  fat: report creation time in statx
  fat: ignore ctime updates, and keep ctime identical to mtime in memory
  fat: split fat_truncate_time() into separate functions
  MAINTAINERS: add Muchun as a memcg reviewer
  proc/sysctl: make protected_* world readable
  ia64: mca: drop redundant spinlock initialization
  tty: fix deadlock caused by calling printk() under tty_port->lock
  relay: remove redundant assignment to pointer buf
  fs/ntfs3: validate BOOT sectors_per_clusters
  lib/string_helpers: fix not adding strarray to device's resource list
  kernel/crash_core.c: remove redundant check of ck_cmdline
  ELF, uapi: fixup ELF_ST_TYPE definition
  ipc/mqueue: use get_tree_nodev() in mqueue_get_tree()
  ipc: update semtimedop() to use hrtimer
  ipc/sem: remove redundant assignments
  ...
2022-05-27 11:22:03 -07:00
Jason A. Donenfeld
8bdc2a1901 crypto: poly1305 - cleanup stray CRYPTO_LIB_POLY1305_RSIZE
When CRYPTO_LIB_POLY1305 is unset, CRYPTO_LIB_POLY1305_RSIZE
is still set in the Kconfig, cluttering things.

Fix this by making CRYPTO_LIB_POLY1305_RSIZE depend on
CRYPTO_LIB_POLY1305.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-05-27 11:10:21 -07:00
Baolin Wang
e68b823ab0 arm64/hugetlb: Fix building errors in huge_ptep_clear_flush()
Fix the arm64 build error which was caused by commit ae07562909f3 ("mm:
change huge_ptep_clear_flush() to return the original pte") interacting
with commit fb396bb459c1 ("arm64/hugetlb: Drop TLB flush from
get_clear_flush()"):

  arch/arm64/mm/hugetlbpage.c: In function ‘huge_ptep_clear_flush’:
  arch/arm64/mm/hugetlbpage.c:515:9: error: implicit declaration of function ‘get_clear_flush’; did you mean ‘ptep_clear_flush’? [-Werror=implicit-function-declaration]
    515 |  return get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
        |         ^~~~~~~~~~~~~~~
        |         ptep_clear_flush

Due to the new get_clear_contig() has dropped TLB flush, we should add
an explicit TLB flush in huge_ptep_clear_flush() to keep original
semantics when changing to use new get_clear_contig().

Fixes: fb396bb459c1 ("arm64/hugetlb: Drop TLB flush from get_clear_flush()").
Fixes: ae07562909f3 ("mm: change huge_ptep_clear_flush() to return the original pte")
Reported-and-tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-05-27 10:56:35 -07:00
David Howells
189b0ddc24 pipe: Fix missing lock in pipe_resize_ring()
pipe_resize_ring() needs to take the pipe->rd_wait.lock spinlock to
prevent post_one_notification() from trying to insert into the ring
whilst the ring is being replaced.

The occupancy check must be done after the lock is taken, and the lock
must be taken after the new ring is allocated.

The bug can lead to an oops looking something like:

 BUG: KASAN: use-after-free in post_one_notification.isra.0+0x62e/0x840
 Read of size 4 at addr ffff88801cc72a70 by task poc/27196
 ...
 Call Trace:
  post_one_notification.isra.0+0x62e/0x840
  __post_watch_notification+0x3b7/0x650
  key_create_or_update+0xb8b/0xd20
  __do_sys_add_key+0x175/0x340
  __x64_sys_add_key+0xbe/0x140
  do_syscall_64+0x5c/0xc0
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Reported by Selim Enes Karaduman @Enesdex working with Trend Micro Zero
Day Initiative.

Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17291
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-05-27 10:45:59 -07:00
Peter Geis
cd414d5ac1 arm64: dts: rockchip: rename Quartz64-A bluetooth gpios
The bluetooth binding for the Quartz64 Model A has incorrectly named
host-wakeup and device-wakeup gpios. Rename them to clear some dtbs_check
warnings.

Signed-off-by: Peter Geis <pgwipeout@gmail.com>
Link: https://lore.kernel.org/r/20220511150117.113070-4-pgwipeout@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2022-05-27 19:31:24 +02:00
Johan Jonker
3d65818cd6 arm64: dts: rockchip: add clocks property to cru node rk3368
Add clocks and clock-names because the device has to have
at least one input clock.
Also in case someone wants to add properties that start with
assign-xxx to fix warnings like:
'clocks' is a dependency of 'assigned-clocks'

Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://lore.kernel.org/r/20220329180550.31043-2-jbx6244@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2022-05-27 19:31:23 +02:00
Johan Jonker
2d36391216 arm64: dts: rockchip: add clocks property to cru node rk3308
Add clocks and clock-names to the rk3308 cru node, because
the device has to have at least one input clock.
Also in case someone wants to add properties that start with
assign-xxx to fix warnings like:
'clocks' is a dependency of 'assigned-clocks'
With the addition of new properties also sort the node properties
a little bit.

Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://lore.kernel.org/r/20220329184339.1134-2-jbx6244@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2022-05-27 19:31:23 +02:00
Peter Geis
cd2d081d18 arm64: dts: rockchip: add clocks to rk356x cru
The rk356x cru requires a 24m clock input to function. Add the clocks
properties to the cru to clear some dtbs_check warnings.

Signed-off-by: Peter Geis <pgwipeout@gmail.com>
Link: https://lore.kernel.org/r/20220511150117.113070-3-pgwipeout@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2022-05-27 19:31:23 +02:00
Johan Jonker
840fc447d7 ARM: dts: rockchip: add clocks property to cru node rk3228
Add clocks and clock-names to the rk3228 cru node, because
the device has to have at least one input clock.

Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://lore.kernel.org/r/20220330121923.24240-2-jbx6244@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2022-05-27 19:27:00 +02:00
Johan Jonker
8dd85bffc5 ARM: dts: rockchip: add clocks property to cru node rk3036
Add clocks and clock-names to the rk3036 cru node, because
the device has to have at least one input clock.

Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://lore.kernel.org/r/20220330114847.18633-2-jbx6244@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2022-05-27 19:27:00 +02:00
Johan Jonker
25f417b563 ARM: dts: rockchip: add clocks property to cru node rk3066a/rk3188
Add clocks property to rk3066a/rk3188 cru node to fix warnings like:
'clocks' is a dependency of 'assigned-clocks'

Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://lore.kernel.org/r/20220329111323.3569-2-jbx6244@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2022-05-27 19:27:00 +02:00
Johan Jonker
9d66847be3 ARM: dts: rockchip: add clocks property to cru node rk3288
Add clocks property to rk3288 cru node to fix warnings like:
'clocks' is a dependency of 'assigned-clocks'.

Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://lore.kernel.org/r/20220329113657.4567-2-jbx6244@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2022-05-27 19:27:00 +02:00
Johan Jonker
e8cead54a6 ARM: dts: rockchip: Remove "amba" bus nodes from rv1108
The "amba" bus nodes wrapping all the DMA-330 nodes serve no useful
purpose, and certainly bear no relation at all to the actual underlying
interconnect topology. They appear to be cargo-cult copying from a
design misstep in the very early days of FDT adoption on ARM, which was
righted with the "arm,primecell" compatible, and the last trace of the
idea finally purged by commit 2ef7d5f342c1 ("ARM, ARM64: dts: drop
"arm,amba-bus" in favor of "simple-bus"").

As such, they can simply be removed and the DMA-330 nodes fitted into
the normal sort order.
The node names should be generic, so rename it to "dma-controller".

Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://lore.kernel.org/r/20220330131608.30040-3-jbx6244@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2022-05-27 19:27:00 +02:00
Johan Jonker
f7230dcfb4 ARM: dts: rockchip: add clocks property to cru node rv1108
Add clocks and clock-names to the rv1108 cru node, because
the device has to have at least one input clock.

Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://lore.kernel.org/r/20220330131608.30040-2-jbx6244@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2022-05-27 19:27:00 +02:00
Steve French
44a48081fc smb3: remove unneeded null check in cifs_readdir
Coverity pointed out an unneeded check.

Addresses-Coverity: 1518030 ("Null pointer dereferences")
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-27 12:05:47 -05:00
Andrew Morton
fa020a2b87 mm/shmem.c: suppress shift warning
mm/shmem.c:1948 shmem_getpage_gfp() warn: should '(((1) << 12) / 512) << folio_order(folio)' be a 64 bit type?

On i386, so an unsigned long is 32-bit, but i_blocks is a 64-bit blkcnt_t.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Jessica Clarke <jrtc27@jrtc27.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:47 -07:00
Vlastimil Babka
0710d0122a mm: Kconfig: reorganize misplaced mm options
After commits 7b42f1041c98 ("mm: Kconfig: move swap and slab config
options to the MM section") and 519bcb797907 ("mm: Kconfig: group swap,
slab, hotplug and thp options into submenus") we now have nicely organized
mm related config options.  I have noticed some that were still misplaced,
so this moves them from various places into the new structure:

VM_EVENT_COUNTERS, COMPAT_BRK, MMAP_ALLOW_UNINITIALIZED to mm/Kconfig and
general MM section.

SLUB_STATS to mm/Kconfig and the slab submenu.

DEBUG_SLAB, SLUB_DEBUG, SLUB_DEBUG_ON to mm/Kconfig.debug and the Kernel
hacking / Memory Debugging submenu.

Link: https://lkml.kernel.org/r/20220525112559.1139-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:47 -07:00
Kefeng Wang
fbf4df0699 mm: kasan: fix input of vmalloc_to_page()
When print virtual mapping info for vmalloc address, it should pass
the addr not page, fix it.

Link: https://lkml.kernel.org/r/20220525120804.38155-1-wangkefeng.wang@huawei.com
Fixes: c056a364e954 ("kasan: print virtual mapping info in reports")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:46 -07:00
Minchan Kim
1c56343258 mm: fix is_pinnable_page against a cma page
Pages in the CMA area could have MIGRATE_ISOLATE as well as MIGRATE_CMA so
the current is_pinnable_page() could miss CMA pages which have
MIGRATE_ISOLATE.  It ends up pinning CMA pages as longterm for the
pin_user_pages() API so CMA allocations keep failing until the pin is
released.

     CPU 0                                   CPU 1 - Task B

cma_alloc
alloc_contig_range
                                        pin_user_pages_fast(FOLL_LONGTERM)
change pageblock as MIGRATE_ISOLATE
                                        internal_get_user_pages_fast
                                        lockless_pages_from_mm
                                        gup_pte_range
                                        try_grab_folio
                                        is_pinnable_page
                                          return true;
                                        So, pinned the page successfully.
page migration failure with pinned page
                                        ..
                                        .. After 30 sec
                                        unpin_user_page(page)

CMA allocation succeeded after 30 sec.

The CMA allocation path protects the migration type change race using
zone->lock but what GUP path need to know is just whether the page is on
CMA area or not rather than exact migration type.  Thus, we don't need
zone->lock but just checks migration type in either of (MIGRATE_ISOLATE
and MIGRATE_CMA).

Adding the MIGRATE_ISOLATE check in is_pinnable_page could cause rejecting
of pinning pages on MIGRATE_ISOLATE pageblocks even though it's neither
CMA nor movable zone if the page is temporarily unmovable.  However, such
a migration failure by unexpected temporal refcount holding is general
issue, not only come from MIGRATE_ISOLATE and the MIGRATE_ISOLATE is also
transient state like other temporal elevated refcount problem.

Link: https://lkml.kernel.org/r/20220524171525.976723-1-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:46 -07:00
Miaohe Lin
ba6851b45d mm: filter out swapin error entry in shmem mapping
There might be swapin error entries in shmem mapping.  Filter them out to
avoid "Bad swap file entry" complaint.

Link: https://lkml.kernel.org/r/20220519125030.21486-6-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: NeilBrown <neilb@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:46 -07:00
Miaohe Lin
6cec2b95da mm/shmem: fix infinite loop when swap in shmem error at swapoff time
When swap in shmem error at swapoff time, there would be a infinite loop
in the while loop in shmem_unuse_inode().  It's because swapin error is
deliberately ignored now and thus info->swapped will never reach 0.  So we
can't escape the loop in shmem_unuse().

In order to fix the issue, swapin_error entry is stored in the mapping
when swapin error occurs.  So the swapcache page can be freed and the user
won't end up with a permanently mounted swap because a sector is bad.  If
the page is accessed later, the user process will be killed so that
corrupted data is never consumed.  On the other hand, if the page is never
accessed, the user won't even notice it.

Link: https://lkml.kernel.org/r/20220519125030.21486-5-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reported-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: NeilBrown <neilb@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:46 -07:00
Miaohe Lin
7b49514fa1 mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range
Once the MADV_FREE operation has succeeded, callers can expect they might
get zero-fill pages if accessing the memory again.  Therefore it should be
safe to delete the hwpoison entry and swapin error entry.  There is no
reason to kill the process if it has called MADV_FREE on the range.

Link: https://lkml.kernel.org/r/20220519125030.21486-4-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Suggested-by: Alistair Popple <apopple@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: NeilBrown <neilb@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:46 -07:00
Miaohe Lin
14a762dd19 mm/swapfile: fix lost swap bits in unuse_pte()
This is observed by code review only but not any real report.

When we turn off swapping we could have lost the bits stored in the swap
ptes.  The new rmap-exclusive bit is fine since that turned into a page
flag, but not for soft-dirty and uffd-wp.  Add them.

Link: https://lkml.kernel.org/r/20220519125030.21486-3-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Suggested-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: NeilBrown <neilb@suse.de>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:46 -07:00
Miaohe Lin
9f186f9e5f mm/swapfile: unuse_pte can map random data if swap read fails
Patch series "A few fixup patches for mm", v4.

This series contains a few patches to avoid mapping random data if swap
read fails and fix lost swap bits in unuse_pte.  Also we free hwpoison and
swapin error entry in madvise_free_pte_range and so on.  More details can
be found in the respective changelogs.  


This patch (of 5):

There is a bug in unuse_pte(): when swap page happens to be unreadable,
page filled with random data is mapped into user address space.  In case
of error, a special swap entry indicating swap read fails is set to the
page table.  So the swapcache page can be freed and the user won't end up
with a permanently mounted swap because a sector is bad.  And if the page
is accessed later, the user process will be killed so that corrupted data
is never consumed.  On the other hand, if the page is never accessed, the
user won't even notice it.

Link: https://lkml.kernel.org/r/20220519125030.21486-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20220519125030.21486-2-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Howells <dhowells@redhat.com>
Cc: NeilBrown <neilb@suse.de>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:45 -07:00
Michal Koutný
f079a020ba selftests: memcg: factor out common parts of memory.{low,min} tests
The memory protection test setup and runtime is almost equal for
memory.low and memory.min cases.

It makes modification of the common parts prone to mistakes, since the
protections are similar not only in setup but also in principle, factor
the common part out.

Past exceptions between the tests:
- missing memory.min is fine (kept),
- test_memcg_low protected orphaned pagecache (adapted like
  test_memcg_min and we keep the processes of protected memory running).

The evaluation in two tests is different (OOM of allocator vs low events
of protégés), this is kept different.

Link: https://lkml.kernel.org/r/20220518161859.21565-6-mkoutny@suse.com
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
CC: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Richard Palethorpe <rpalethorpe@suse.de>
Cc: David Vernet <void@manifault.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:45 -07:00
Michal Koutný
6a35919005 selftests: memcg: remove protection from top level memcg
The reclaim is triggered by memory limit in a subtree, therefore the
testcase does not need configured protection against external reclaim.

Also, correct respective comments.

Link: https://lkml.kernel.org/r/20220518161859.21565-5-mkoutny@suse.com
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: David Vernet <void@manifault.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Richard Palethorpe <rpalethorpe@suse.de>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:45 -07:00
Michal Koutný
f10b6e9a8e selftests: memcg: adjust expected reclaim values of protected cgroups
The numbers are not easy to derive in a closed form (certainly mere
protections ratios do not apply), therefore use a simulation to obtain
expected numbers.

Link: https://lkml.kernel.org/r/20220518161859.21565-4-mkoutny@suse.com
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: David Vernet <void@manifault.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Richard Palethorpe <rpalethorpe@suse.de>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:45 -07:00
Michal Koutný
1d09069f53 selftests: memcg: expect no low events in unprotected sibling
This is effectively a revert of commit cdc69458a5f3 ("cgroup: account for
memory_recursiveprot in test_memcg_low()").  The case test_memcg_low will
fail with memory_recursiveprot until resolved in reclaim code.

However, this patch preserves the existing helpers and variables for later
uses.

Link: https://lkml.kernel.org/r/20220518161859.21565-3-mkoutny@suse.com
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: David Vernet <void@manifault.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Richard Palethorpe <rpalethorpe@suse.de>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:45 -07:00
Michal Koutný
ff3b72a5d6 selftests: memcg: fix compilation
Patch series "memcontrol selftests fixups", v2.

Flushing the patches to make memcontrol selftests check the events
behavior we had consensus about (test_memcg_low fails).

(test_memcg_reclaim, test_memcg_swap_max fail for me now but it's present
even before the refactoring.)

The two bigger changes are:
- adjustment of the protected values to make tests succeed with the given
  tolerance,
- both test_memcg_low and test_memcg_min check protection of memory in
  populated cgroups (actually as per Documentation/admin-guide/cgroup-v2.rst
  memory.min should not apply to empty cgroups, which is not the case
  currently. Therefore I unified tests with the populated case in order to to
  bring more broken tests).


This patch (of 5):

This fixes mis-applied changes from commit 72b1e03aa725 ("cgroup: account
for memory_localevents in test_memcg_oom_group_leaf_events()").

Link: https://lkml.kernel.org/r/20220518161859.21565-1-mkoutny@suse.com
Link: https://lkml.kernel.org/r/20220518161859.21565-2-mkoutny@suse.com
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: David Vernet <void@manifault.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Richard Palethorpe <rpalethorpe@suse.de>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:44 -07:00
Miaohe Lin
943fb61dd6 mm/z3fold: fix z3fold_page_migrate races with z3fold_map
Think about the below scenario:

CPU1				CPU2
 z3fold_page_migrate		z3fold_map
  z3fold_page_trylock
  ...
  z3fold_page_unlock
  /* slots still points to old zhdr*/
				 get_z3fold_header
				  get slots from handle
				  get old zhdr from slots
				  z3fold_page_trylock
				  return *old* zhdr
  encode_handle(new_zhdr, FIRST|LAST|MIDDLE)
  put_page(page) /* zhdr is freed! */
				 but zhdr is still used by caller!

z3fold_map can map freed z3fold page and lead to use-after-free bug.  To
fix it, we add PAGE_MIGRATED to indicate z3fold page is migrated and soon
to be released.  So get_z3fold_header won't return such page.

Link: https://lkml.kernel.org/r/20220429064051.61552-10-linmiaohe@huawei.com
Fixes: 1f862989b04a ("mm/z3fold.c: support page migration")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:44 -07:00
Miaohe Lin
04094226d6 mm/z3fold: fix z3fold_reclaim_page races with z3fold_free
Think about the below scenario:

CPU1				CPU2
z3fold_reclaim_page		z3fold_free
 spin_lock(&pool->lock)		 get_z3fold_header -- hold page_lock
 kref_get_unless_zero
				 kref_put--zhdr->refcount can be 1 now
 !z3fold_page_trylock
  kref_put -- zhdr->refcount is 0 now
   release_z3fold_page
    WARN_ON(!list_empty(&zhdr->buddy)); -- we're on buddy now!
    spin_lock(&pool->lock); -- deadlock here!

z3fold_reclaim_page might race with z3fold_free and will lead to pool lock
deadlock and zhdr buddy non-empty warning.  To fix this, defer getting the
refcount until page_lock is held just like what __z3fold_alloc does.  Note
this has the side effect that we won't break the reclaim if we meet a soon
to be released z3fold page now.

Link: https://lkml.kernel.org/r/20220429064051.61552-9-linmiaohe@huawei.com
Fixes: dcf5aedb24f8 ("z3fold: stricter locking and more careful reclaim")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:44 -07:00
Miaohe Lin
4a1c383910 mm/z3fold: always clear PAGE_CLAIMED under z3fold page lock
Think about the below race window:

CPU1				CPU2
z3fold_reclaim_page		z3fold_free
 test_and_set_bit PAGE_CLAIMED
 failed to reclaim page
 z3fold_page_lock(zhdr);
 add back to the lru list;
 z3fold_page_unlock(zhdr);
				 get_z3fold_header
				 page_claimed=test_and_set_bit PAGE_CLAIMED

 clear_bit(PAGE_CLAIMED, &page->private);

				 if (!page_claimed) /* it's false true */
				  free_handle is not called

free_handle won't be called in this case. So z3fold_buddy_slots will leak.
Fix it by always clear PAGE_CLAIMED under z3fold page lock.

Link: https://lkml.kernel.org/r/20220429064051.61552-8-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:44 -07:00
Miaohe Lin
6cf9a34967 mm/z3fold: put z3fold page back into unbuddied list when reclaim or migration fails
When doing z3fold page reclaim or migration, the page is removed from
unbuddied list.  If reclaim or migration succeeds, it's fine as page is
released.  But in case it fails, the page is not put back into unbuddied
list now.  The page will be leaked until next compaction work, reclaim or
migration is done.

Link: https://lkml.kernel.org/r/20220429064051.61552-7-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:44 -07:00
Miaohe Lin
f4bad643c1 revert "mm/z3fold.c: allow __GFP_HIGHMEM in z3fold_alloc"
Revert commit f1549cb5ab2b ("mm/z3fold.c: allow __GFP_HIGHMEM in
z3fold_alloc").

z3fold can't support GFP_HIGHMEM page now.  page_address is used directly
at all places.  Moreover, z3fold_header is on per cpu unbuddied list which
could be accessed anytime.  So we should remove the support of GFP_HIGHMEM
allocation for z3fold.

Link: https://lkml.kernel.org/r/20220429064051.61552-6-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:43 -07:00
Miaohe Lin
2c0f351434 mm/z3fold: throw warning on failure of trylock_page in z3fold_alloc
If trylock_page fails, the page won't be non-lru movable page.  When this
page is freed via free_z3fold_page, it will trigger bug on PageMovable
check in __ClearPageMovable.  Throw warning on failure of trylock_page to
guard against such rare case just as what zsmalloc does.

Link: https://lkml.kernel.org/r/20220429064051.61552-5-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:43 -07:00
Miaohe Lin
df6f0f1d0c mm/z3fold: remove buggy use of stale list for allocation
Currently if z3fold couldn't find an unbuddied page it would first try to
pull a page off the stale list.  But this approach is problematic.  If
init z3fold page fails later, the page should be freed via
free_z3fold_page to clean up the relevant resource instead of using
__free_page directly.  And if page is successfully reused, it will BUG_ON
later in __SetPageMovable because it's already non-lru movable page, i.e. 
PAGE_MAPPING_MOVABLE is already set in page->mapping.  In order to fix all
of these issues, we can simply remove the buggy use of stale list for
allocation because can_sleep should always be false and we never really
hit the reusing code path now.

Link: https://lkml.kernel.org/r/20220429064051.61552-4-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:43 -07:00
Miaohe Lin
7c61c35bbd mm/z3fold: fix possible null pointer dereferencing
alloc_slots could fail to allocate memory under heavy memory pressure.  So
we should check zhdr->slots against NULL to avoid future null pointer
dereferencing.

Link: https://lkml.kernel.org/r/20220429064051.61552-3-linmiaohe@huawei.com
Fixes: fc5488651c7d ("z3fold: simplify freeing slots")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:43 -07:00
Miaohe Lin
4c6bdb3640 mm/z3fold: fix sheduling while atomic
Patch series "A few fixup patches for z3fold".

This series contains a few fixup patches to fix sheduling while atomic,
fix possible null pointer dereferencing, fix various race conditions and
so on. More details can be found in the respective changelogs.


This patch (of 9):

z3fold's page_lock is always held when calling alloc_slots.  So gfp should
be GFP_ATOMIC to avoid "scheduling while atomic" bug.

Link: https://lkml.kernel.org/r/20220429064051.61552-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20220429064051.61552-2-linmiaohe@huawei.com
Fixes: fc5488651c7d ("z3fold: simplify freeing slots")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:43 -07:00
Zi Yan
86d28b0709 mm: split free page with properly free memory accounting and without race
In isolate_single_pageblock(), free pages are checked without holding zone
lock, but they can go away in split_free_page() when zone lock is held.
Check the free page and its order again in split_free_page() when zone lock
is held. Recheck the page if the free page is gone under zone lock.

In addition, in split_free_page(), the free page was deleted from the page
list without changing free page accounting. Add the missing free page
accounting code.

Fix the type of order parameter in split_free_page().

Link: https://lore.kernel.org/lkml/20220525103621.987185e2ca0079f7b97b856d@linux-foundation.org/
Link: https://lkml.kernel.org/r/20220526231531.2404977-2-zi.yan@sent.com
Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reported-by: Doug Berger <opendmb@gmail.com>
  Link: https://lore.kernel.org/linux-mm/c3932a6f-77fe-29f7-0c29-fe6b1c67ab7b@gmail.com/
Cc: David Hildenbrand <david@redhat.com>
Cc: Qian Cai <quic_qiancai@quicinc.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Eric Ren <renzhengeek@gmail.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Michael Walle <michael@walle.cc>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:43 -07:00
Zi Yan
9b209e557d mm: page-isolation: skip isolated pageblock in start_isolate_page_range()
start_isolate_page_range() first isolates the first and the last
pageblocks in the range and ensure pages across range boundaries are split
during isolation.  But it missed the case when the range is <= a pageblock
and the first and the last pageblocks are the same one, so the second
isolate_single_pageblock() will always fail.  To fix it, skip the
pageblock isolation in second isolate_single_pageblock().

Link: https://lkml.kernel.org/r/20220526231531.2404977-1-zi.yan@sent.com
Fixes: 88ee134320b8 ("mm: fix a potential infinite loop in start_isolate_page_range()")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
  Link: https://lore.kernel.org/linux-mm/ac65adc0-a7e4-cdfe-a0d8-757195b86293@samsung.com/
Reported-by: Michael Walle <michael@walle.cc>
Tested-by: Michael Walle <michael@walle.cc>
  Link: https://lore.kernel.org/linux-mm/8ca048ca8b547e0dd1c95387ee05c23d@walle.cc/
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Hildenbrand <david@redhat.com>
Cc: Doug Berger <opendmb@gmail.com>
Cc: Eric Ren <renzhengeek@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Qian Cai <quic_qiancai@quicinc.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27 09:33:42 -07:00
Arnaldo Carvalho de Melo
9dde6cadb9 tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes in:

  db1af12929c99d15 ("x86/msr-index: Define INTEGRITY_CAPABILITIES MSR")
  089be16d5992dd0b ("x86/msr: Add PerfCntrGlobal* registers")
  f52ba93190457aa2 ("tools/power turbostat: Add Power Limit4 support")

Addressing these tools/perf build warnings:

    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
    Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'

That makes the beautification scripts to pick some new entries:

  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
  $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
  $ diff -u before after
  --- before	2022-05-26 12:50:01.228612839 -0300
  +++ after	2022-05-26 12:50:07.699776166 -0300
  @@ -116,6 +116,7 @@
   	[0x0000026f] = "MTRRfix4K_F8000",
   	[0x00000277] = "IA32_CR_PAT",
   	[0x00000280] = "IA32_MC0_CTL2",
  +	[0x000002d9] = "INTEGRITY_CAPS",
   	[0x000002ff] = "MTRRdefType",
   	[0x00000309] = "CORE_PERF_FIXED_CTR0",
   	[0x0000030a] = "CORE_PERF_FIXED_CTR1",
  @@ -176,6 +177,7 @@
   	[0x00000586] = "IA32_RTIT_ADDR3_A",
   	[0x00000587] = "IA32_RTIT_ADDR3_B",
   	[0x00000600] = "IA32_DS_AREA",
  +	[0x00000601] = "VR_CURRENT_CONFIG",
   	[0x00000606] = "RAPL_POWER_UNIT",
   	[0x0000060a] = "PKGC3_IRTL",
   	[0x0000060b] = "PKGC6_IRTL",
  @@ -260,6 +262,10 @@
   	[0xc0000102 - x86_64_specific_MSRs_offset] = "KERNEL_GS_BASE",
   	[0xc0000103 - x86_64_specific_MSRs_offset] = "TSC_AUX",
   	[0xc0000104 - x86_64_specific_MSRs_offset] = "AMD64_TSC_RATIO",
  +	[0xc000010f - x86_64_specific_MSRs_offset] = "AMD_DBG_EXTN_CFG",
  +	[0xc0000300 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS",
  +	[0xc0000301 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_CTL",
  +	[0xc0000302 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS_CLR",
   };

   #define x86_AMD_V_KVM_MSRs_offset 0xc0010000
  @@ -318,4 +324,5 @@
   	[0xc00102b4 - x86_AMD_V_KVM_MSRs_offset] = "AMD_CPPC_STATUS",
   	[0xc00102f0 - x86_AMD_V_KVM_MSRs_offset] = "AMD_PPIN_CTL",
   	[0xc00102f1 - x86_AMD_V_KVM_MSRs_offset] = "AMD_PPIN",
  +	[0xc0010300 - x86_AMD_V_KVM_MSRs_offset] = "AMD_SAMP_BR_FROM",
   };
  $

Now one can trace systemwide asking to see backtraces to where those
MSRs are being read/written, see this example with a previous update:

  # perf trace -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
  ^C#

If we use -v (verbose mode) we can see what it does behind the scenes:

  # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
  Using CPUID AuthenticAMD-25-21-0
  0x6a0
  0x6a8
  New filter for msr:read_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
  0x6a0
  0x6a8
  New filter for msr:write_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
  mmap size 528384B
  ^C#

Example with a frequent msr:

  # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2
  Using CPUID AuthenticAMD-25-21-0
  0x48
  New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
  0x48
  New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
  mmap size 528384B
  Looking at the vmlinux_path (8 entries long)
  symsrc__init: build id mismatch for vmlinux.
  Using /proc/kcore for kernel data
  Using /proc/kallsyms for symbols
     0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
                                       do_trace_write_msr ([kernel.kallsyms])
                                       do_trace_write_msr ([kernel.kallsyms])
                                       __switch_to_xtra ([kernel.kallsyms])
                                       __switch_to ([kernel.kallsyms])
                                       __schedule ([kernel.kallsyms])
                                       schedule ([kernel.kallsyms])
                                       futex_wait_queue_me ([kernel.kallsyms])
                                       futex_wait ([kernel.kallsyms])
                                       do_futex ([kernel.kallsyms])
                                       __x64_sys_futex ([kernel.kallsyms])
                                       do_syscall_64 ([kernel.kallsyms])
                                       entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
                                       __futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so)
     0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2)
                                       do_trace_write_msr ([kernel.kallsyms])
                                       do_trace_write_msr ([kernel.kallsyms])
                                       __switch_to_xtra ([kernel.kallsyms])
                                       __switch_to ([kernel.kallsyms])
                                       __schedule ([kernel.kallsyms])
                                       schedule_idle ([kernel.kallsyms])
                                       do_idle ([kernel.kallsyms])
                                       cpu_startup_entry ([kernel.kallsyms])
                                       secondary_startup_64_no_verify ([kernel.kallsyms])
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/lkml/Yo+i%252Fj5+UtE9dcix@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-27 13:22:14 -03:00
Leo Yan
12fdd6c009 perf scripts python: Support Arm CoreSight trace data disassembly
This commit adds python script to parse CoreSight tracing event and
print out source line and disassembly, it generates readable program
execution flow for easier humans inspecting.

The script receives CoreSight tracing packet with below format:

                +------------+------------+------------+
  packet(n):    |    addr    |    ip      |    cpu     |
                +------------+------------+------------+
  packet(n+1):  |    addr    |    ip      |    cpu     |
                +------------+------------+------------+

packet::addr presents the start address of the coming branch sample, and
packet::ip is the last address of the branch smple.  Therefore, a code
section between branches starts from packet(n)::addr and it stops at
packet(n+1)::ip.  As results we combines the two continuous packets to
generate the address range for instructions:

  [ sample(n)::addr .. sample(n+1)::ip ]

The script supports both objdump or llvm-objdump for disassembly with
specifying option '-d'.  If doesn't specify option '-d', the script
simply outputs source lines and symbols.

Below shows usages with llvm-objdump or objdump to output disassembly.

  # perf script -s scripts/python/arm-cs-trace-disasm.py -- -d llvm-objdump-11 -k ./vmlinux
  ARM CoreSight Trace Data Assembler Dump
  	ffff800008eb3198 <etm4_enable_hw>:
  	ffff800008eb3310: c0 38 00 35  	cbnz	w0, 0xffff800008eb3a28 <etm4_enable_hw+0x890>
  	ffff800008eb3314: 9f 3f 03 d5  	dsb	sy
  	ffff800008eb3318: df 3f 03 d5  	isb
  	ffff800008eb331c: f5 5b 42 a9  	ldp	x21, x22, [sp, #32]
  	ffff800008eb3320: fb 73 45 a9  	ldp	x27, x28, [sp, #80]
  	ffff800008eb3324: e0 82 40 39  	ldrb	w0, [x23, #32]
  	ffff800008eb3328: 60 00 00 34  	cbz	w0, 0xffff800008eb3334 <etm4_enable_hw+0x19c>
  	ffff800008eb332c: e0 03 19 aa  	mov	x0, x25
  	ffff800008eb3330: 8c fe ff 97  	bl	0xffff800008eb2d60 <etm4_cs_lock.isra.0.part.0>
              main  6728/6728  [0004]         0.000000000  etm4_enable_hw+0x198                    [kernel.kallsyms]
  	ffff800008eb2d60 <etm4_cs_lock.isra.0.part.0>:
  	ffff800008eb2d60: 1f 20 03 d5  	nop
  	ffff800008eb2d64: 1f 20 03 d5  	nop
  	ffff800008eb2d68: 3f 23 03 d5  	hint	#25
  	ffff800008eb2d6c: 00 00 40 f9  	ldr	x0, [x0]
  	ffff800008eb2d70: 9f 3f 03 d5  	dsb	sy
  	ffff800008eb2d74: 00 c0 3e 91  	add	x0, x0, #4016
  	ffff800008eb2d78: 1f 00 00 b9  	str	wzr, [x0]
  	ffff800008eb2d7c: bf 23 03 d5  	hint	#29
  	ffff800008eb2d80: c0 03 5f d6  	ret
              main  6728/6728  [0004]         0.000000000  etm4_cs_lock.isra.0.part.0+0x20

  # perf script -s scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ./vmlinux
  ARM CoreSight Trace Data Assembler Dump
  	ffff800008eb3310 <etm4_enable_hw+0x178>:
  	ffff800008eb3310:	350038c0 	cbnz	w0, ffff800008eb3a28 <etm4_enable_hw+0x890>
  	ffff800008eb3314:	d5033f9f 	dsb	sy
  	ffff800008eb3318:	d5033fdf 	isb
  	ffff800008eb331c:	a9425bf5 	ldp	x21, x22, [sp, #32]
  	ffff800008eb3320:	a94573fb 	ldp	x27, x28, [sp, #80]
  	ffff800008eb3324:	394082e0 	ldrb	w0, [x23, #32]
  	ffff800008eb3328:	34000060 	cbz	w0, ffff800008eb3334 <etm4_enable_hw+0x19c>
  	ffff800008eb332c:	aa1903e0 	mov	x0, x25
  	ffff800008eb3330:	97fffe8c 	bl	ffff800008eb2d60 <etm4_cs_lock.isra.0.part.0>
              main  6728/6728  [0004]         0.000000000  etm4_enable_hw+0x198                    [kernel.kallsyms]
  	ffff800008eb2d60 <etm4_cs_lock.isra.0.part.0>:
  	ffff800008eb2d60:	d503201f 	nop
  	ffff800008eb2d64:	d503201f 	nop
  	ffff800008eb2d68:	d503233f 	paciasp
  	ffff800008eb2d6c:	f9400000 	ldr	x0, [x0]
  	ffff800008eb2d70:	d5033f9f 	dsb	sy
  	ffff800008eb2d74:	913ec000 	add	x0, x0, #0xfb0
  	ffff800008eb2d78:	b900001f 	str	wzr, [x0]
  	ffff800008eb2d7c:	d50323bf 	autiasp
  	ffff800008eb2d80:	d65f03c0 	ret
              main  6728/6728  [0004]         0.000000000  etm4_cs_lock.isra.0.part.0+0x20

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Co-authored-by: Al Grant <al.grant@arm.com>
Co-authored-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Co-authored-by: Tor Jeremiassen <tor@ti.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Eelco Chaudron <echaudro@redhat.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Tanmay Jagdale <tanmay@marvell.com>
Cc: coresight@lists.linaro.org
Cc: zengshun . wu <zengshun.wu@outlook.com>
Link: https://lore.kernel.org/r/20220521130446.4163597-3-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-27 13:22:14 -03:00