Commit Graph

1153114 Commits

Author SHA1 Message Date
Minghao Chi
e88f319a25 rtc: ds1742: use devm_platform_get_and_ioremap_resource()
Convert platform_get_resource(), devm_ioremap_resource() to a single
call to devm_platform_get_and_ioremap_resource(), as this is exactly
what this function does.

Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Link: https://lore.kernel.org/r/202211220947194856561@zte.com.cn
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-12-15 23:34:31 +01:00
GUO Zihua
55d5a86618 rtc: mxc_v2: Add missing clk_disable_unprepare()
The call to clk_disable_unprepare() is left out in the error handling of
devm_rtc_allocate_device. Add it back.

Fixes: 5490a1e018 ("rtc: mxc_v2: fix possible race condition")
Signed-off-by: GUO Zihua <guozihua@huawei.com>
Link: https://lore.kernel.org/r/20221122085046.21689-1-guozihua@huawei.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-12-15 23:34:31 +01:00
Zhang Jiaming
3bb23f1f9d rtc: rs5c313: correct some spelling mistakes
Change 'modifed' to 'modified'.
Change 'Updata' to 'Update'.
Change 'Initiatlize' to 'Initialize'.

Signed-off-by: Zhang Jiaming <jiaming@nfschina.com>
Link: https://lore.kernel.org/r/20220622085344.23519-1-jiaming@nfschina.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-12-15 23:34:31 +01:00
Xiang wangx
2dc5e3fb6e rtc: at91rm9200: Fix syntax errors in comments
Delete the redundant word 'is'.

Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com>
Link: https://lore.kernel.org/r/20220605083515.9514-1-wangxiang@cdjrlc.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-12-15 23:34:31 +01:00
shaomin Deng
b9354487fb rtc: remove duplicated words in comments
Signed-off-by: shaomin Deng <dengshaomin@cdjrlc.com>
Link: https://lore.kernel.org/r/20220808152354.3641-1-dengshaomin@cdjrlc.com
Link: https://lore.kernel.org/r/20220808153454.6844-1-dengshaomin@cdjrlc.com
Link: https://lore.kernel.org/r/20220808152822.5012-1-dengshaomin@cdjrlc.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-12-15 23:34:31 +01:00
Wadim Egorov
16b26f6027 rtc: rv3028: Use IRQ flags obtained from device tree if available
Make the interrupt pin of the RV3028 usable with GPIO controllers
without level type IRQs support, such as the TI Davinci GPIO controller.

Therefore, allow the IRQ type to be passed from the device tree if
available.

Based on commit d4785b4634 ("rtc: pcf2127: use IRQ flags obtained from device tree if available")

Signed-off-by: Wadim Egorov <w.egorov@phytec.de>
Link: https://lore.kernel.org/r/20221208133605.4193907-1-w.egorov@phytec.de
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-12-15 23:34:31 +01:00
ye xingchen
0feebdeb3a rtc: ds1307: use sysfs_emit() to instead of scnprintf()
Follow the advice of the Documentation/filesystems/sysfs.rst and show()
should only use sysfs_emit() or sysfs_emit_at() when formatting the
value to be returned to user space.

Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Link: https://lore.kernel.org/r/202212051134455911470@zte.com.cn
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-12-15 23:34:31 +01:00
Jean Delvare
bb0a1799bb rtc: isl12026: drop obsolete dependency on COMPILE_TEST
Since commit 0166dc11be ("of: make CONFIG_OF user selectable"), it
is possible to test-build any driver which depends on OF on any
architecture by explicitly selecting OF. Therefore depending on
COMPILE_TEST as an alternative is no longer needed.

It is actually better to always build such drivers with OF enabled,
so that the test builds are closer to how each driver will actually be
built on its intended target. Building them without OF may not test
much as the compiler will optimize out potentially large parts of the
code. In the worst case, this could even pop false positive warnings.
Dropping COMPILE_TEST here improves the quality of our testing and
avoids wasting time on non-existent issues.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20221124154359.039be06c@endymion.delvare
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-12-15 23:34:31 +01:00
Tan Tee Min
72abeedd83 igc: Set Qbv start_time and end_time to end_time if not being configured in GCL
The default setting of end_time minus start_time is whole 1 second.
Thus, if it's not being configured in any GCL entry then it will be
staying at original 1 second.

This patch is changing the start_time and end_time to be end_time as
if setting zero will be having weird HW behavior where the gate will
not be fully closed.

Fixes: ec50a9d437 ("igc: Add support for taprio offloading")
Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-15 13:20:05 -08:00
Tan Tee Min
6d05251d53 igc: recalculate Qbv end_time by considering cycle time
Qbv users can specify a cycle time that is not equal to the total GCL
intervals. Hence, recalculation is necessary here to exclude the time
interval that exceeds the cycle time. As those GCL which exceeds the
cycle time will be truncated.

According to IEEE Std. 802.1Q-2018 section 8.6.9.2, once the end of
the list is reached, it will switch to the END_OF_CYCLE state and
leave the gates in the same state until the next cycle is started.

Fixes: ec50a9d437 ("igc: Add support for taprio offloading")
Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-15 13:19:56 -08:00
Tan Tee Min
e17090eb24 igc: allow BaseTime 0 enrollment for Qbv
Introduce qbv_enable flag in igc_adapter struct to store the Qbv on/off.
So this allow the BaseTime to enroll with zero value.

Fixes: 61572d5f8f ("igc: Simplify TSN flags handling")
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-15 13:19:49 -08:00
Muhammad Husaini Zulkifli
3b61764fb4 igc: Add checking for basetime less than zero
Using the tc qdisc command, the user can set basetime to any value.
Checking should be done on the driver's side to prevent registering
basetime values that are less than zero.

Fixes: ec50a9d437 ("igc: Add support for taprio offloading")
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-15 13:19:39 -08:00
Vinicius Costa Gomes
d8f45be01d igc: Use strict cycles for Qbv scheduling
Configuring strict cycle mode in the controller forces more well
behaved transmissions when taprio is offloaded.

When set this strict_cycle and strict_end, transmission is not
enabled if the whole packet cannot be completed before end of
the Qbv cycle.

Fixes: 82faa9b799 ("igc: Add support for ETF offloading")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com>
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-15 13:19:30 -08:00
Vinicius Costa Gomes
db0b124f02 igc: Enhance Qbv scheduling by using first flag bit
The I225 hardware has a limitation that packets can only be scheduled
in the [0, cycle-time] interval. So, scheduling a packet to the start
of the next cycle doesn't usually work.

To overcome this, we use the Transmit Descriptor first flag to indicates
that a packet should be the first packet (from a queue) in a cycle
according to the section 7.5.2.9.3.4 The First Packet on Each QBV Cycle
in Intel Discrete I225/6 User Manual.

But this only works if there was any packet from that queue during the
current cycle, to avoid this issue, we issue an empty packet if that's
not the case. Also require one more descriptor to be available, to take
into account the empty packet that might be issued.

Test Setup:

Talker: Use l2_tai to generate the launchtime into packet load.

Listener: Use timedump.c to compute the delta between packet arrival
and LaunchTime packet payload.

Test Result:

Before:

1666000610127300000,1666000610127300096,96,621273
1666000610127400000,1666000610127400192,192,621274
1666000610127500000,1666000610127500032,32,621275
1666000610127600000,1666000610127600128,128,621276
1666000610127700000,1666000610127700224,224,621277
1666000610127800000,1666000610127800064,64,621278
1666000610127900000,1666000610127900160,160,621279
1666000610128000000,1666000610128000000,0,621280
1666000610128100000,1666000610128100096,96,621281
1666000610128200000,1666000610128200192,192,621282
1666000610128300000,1666000610128300032,32,621283
1666000610128400000,1666000610128301056,-98944,621284
1666000610128500000,1666000610128302080,-197920,621285
1666000610128600000,1666000610128302848,-297152,621286
1666000610128700000,1666000610128303872,-396128,621287
1666000610128800000,1666000610128304896,-495104,621288
1666000610128900000,1666000610128305664,-594336,621289
1666000610129000000,1666000610128306688,-693312,621290
1666000610129100000,1666000610128307712,-792288,621291
1666000610129200000,1666000610128308480,-891520,621292
1666000610129300000,1666000610128309504,-990496,621293
1666000610129400000,1666000610128310528,-1089472,621294
1666000610129500000,1666000610128311296,-1188704,621295
1666000610129600000,1666000610128312320,-1287680,621296
1666000610129700000,1666000610128313344,-1386656,621297
1666000610129800000,1666000610128314112,-1485888,621298
1666000610129900000,1666000610128315136,-1584864,621299
1666000610130000000,1666000610128316160,-1683840,621300
1666000610130100000,1666000610128316928,-1783072,621301
1666000610130200000,1666000610128317952,-1882048,621302
1666000610130300000,1666000610128318976,-1981024,621303
1666000610130400000,1666000610128319744,-2080256,621304
1666000610130500000,1666000610128320768,-2179232,621305
1666000610130600000,1666000610128321792,-2278208,621306
1666000610130700000,1666000610128322816,-2377184,621307
1666000610130800000,1666000610128323584,-2476416,621308
1666000610130900000,1666000610128324608,-2575392,621309
1666000610131000000,1666000610128325632,-2674368,621310
1666000610131100000,1666000610128326400,-2773600,621311
1666000610131200000,1666000610128327424,-2872576,621312
1666000610131300000,1666000610128328448,-2971552,621313
1666000610131400000,1666000610128329216,-3070784,621314
1666000610131500000,1666000610131500032,32,621315
1666000610131600000,1666000610131600128,128,621316
1666000610131700000,1666000610131700224,224,621317

After:

1666073510646200000,1666073510646200064,64,2676462
1666073510646300000,1666073510646300160,160,2676463
1666073510646400000,1666073510646400256,256,2676464
1666073510646500000,1666073510646500096,96,2676465
1666073510646600000,1666073510646600192,192,2676466
1666073510646700000,1666073510646700032,32,2676467
1666073510646800000,1666073510646800128,128,2676468
1666073510646900000,1666073510646900224,224,2676469
1666073510647000000,1666073510647000064,64,2676470
1666073510647100000,1666073510647100160,160,2676471
1666073510647200000,1666073510647200256,256,2676472
1666073510647300000,1666073510647300096,96,2676473
1666073510647400000,1666073510647400192,192,2676474
1666073510647500000,1666073510647500032,32,2676475
1666073510647600000,1666073510647600128,128,2676476
1666073510647700000,1666073510647700224,224,2676477
1666073510647800000,1666073510647800064,64,2676478
1666073510647900000,1666073510647900160,160,2676479
1666073510648000000,1666073510648000000,0,2676480
1666073510648100000,1666073510648100096,96,2676481
1666073510648200000,1666073510648200192,192,2676482
1666073510648300000,1666073510648300032,32,2676483
1666073510648400000,1666073510648400128,128,2676484
1666073510648500000,1666073510648500224,224,2676485
1666073510648600000,1666073510648600064,64,2676486
1666073510648700000,1666073510648700160,160,2676487
1666073510648800000,1666073510648800000,0,2676488
1666073510648900000,1666073510648900096,96,2676489
1666073510649000000,1666073510649000192,192,2676490
1666073510649100000,1666073510649100032,32,2676491
1666073510649200000,1666073510649200128,128,2676492
1666073510649300000,1666073510649300224,224,2676493
1666073510649400000,1666073510649400064,64,2676494
1666073510649500000,1666073510649500160,160,2676495
1666073510649600000,1666073510649600000,0,2676496
1666073510649700000,1666073510649700096,96,2676497
1666073510649800000,1666073510649800192,192,2676498
1666073510649900000,1666073510649900032,32,2676499
1666073510650000000,1666073510650000128,128,2676500

Fixes: 82faa9b799 ("igc: Add support for ETF offloading")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Co-developed-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com>
Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com>
Co-developed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Signed-off-by: Malli C <mallikarjuna.chilakala@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-15 13:18:34 -08:00
Linus Torvalds
785d21ba2f VFIO updates for v6.2-rc1
- Replace deprecated git://github.com link in MAINTAINERS. (Palmer Dabbelt)
 
  - Simplify vfio/mlx5 with module_pci_driver() helper. (Shang XiaoJing)
 
  - Drop unnecessary buffer from ACPI call. (Rafael Mendonca)
 
  - Correct latent missing include issue in iova-bitmap and fix support
    for unaligned bitmaps.  Follow-up with better fix through refactor.
    (Joao Martins)
 
  - Rework ccw mdev driver to split private data from parent structure,
    better aligning with the mdev lifecycle and allowing us to remove
    a temporary workaround. (Eric Farman)
 
  - Add an interface to get an estimated migration data size for a device,
    allowing userspace to make informed decisions, ex. more accurately
    predicting VM downtime. (Yishai Hadas)
 
  - Fix minor typo in vfio/mlx5 array declaration. (Yishai Hadas)
 
  - Simplify module and Kconfig through consolidating SPAPR/EEH code and
    config options and folding virqfd module into main vfio module.
    (Jason Gunthorpe)
 
  - Fix error path from device_register() across all vfio mdev and sample
    drivers. (Alex Williamson)
 
  - Define migration pre-copy interface and implement for vfio/mlx5
    devices, allowing portions of the device state to be saved while the
    device continues operation, towards reducing the stop-copy state
    size. (Jason Gunthorpe, Yishai Hadas, Shay Drory)
 
  - Implement pre-copy for hisi_acc devices. (Shameer Kolothum)
 
  - Fixes to mdpy mdev driver remove path and error path on probe.
    (Shang XiaoJing)
 
  - vfio/mlx5 fixes for incorrect return after copy_to_user() fault and
    incorrect buffer freeing. (Dan Carpenter)
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmObfPgbHGFsZXgud2ls
 bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsiDogP/i9GuBKposvZpnfxXWwo
 oNpKBZSOVMW8wgavNEuryMb+9WoouIghce8XU49MmONoP26kIh5TA14Zpi3XWkLK
 K+NlpwicESvLeZVHU7f3R8meVqmPtlxIi59jE+CfEHB8BW2HIAsEdwdhkxMwus9C
 nuiiK/2YYyQWOXYc4LAIkspMzjtGPy6Im5P6AED+dI+TFCEqJAM5qgOLJZFlk4a/
 WwZY2xjVKOl6xf5VZXGw+v7fDgz2Ju+j4Bm3X5lx1HgiDrEH83MjXY5h67neAIVb
 bXrfNLN++MiuO5niGTFMbUjGVUIFxsfmJzBnL9QrLsuj0JrGEKsu/1JEO78g0Km0
 ZCChoJ6UyUOgxt6evEymUAZAAkbcKaaht2gdbAXW71tv9p1TripAbBKwVeah1bQp
 SiHPqy9InKJlhaf+GbXL9eux1WVMfQ6FZccU16bNt7VaV2I8js85z/2gqVD0a5Mw
 +gnwp5XMUFWNKlJrnc7uVCD0bDExwQhr75OP4rWjMNvvLi9hPXJ2cI2Sg+9OLzQw
 vm/I+Df+FfXCuGAgX4Lxq76pqWlYGJH0Qxc14Ds6YoXqygBPz9yvTtuBv8mTHJzE
 KdAl/6DmZZxZ/JFD9lPF80KRiAsJ6iNf6tPTWES7hfDBfIdgQ/DZbXridLWJPNoi
 xLfaW19yrLTXWKSmR7G2Lsz4
 =q9xs
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v6.2-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Replace deprecated git://github.com link in MAINTAINERS (Palmer
   Dabbelt)

 - Simplify vfio/mlx5 with module_pci_driver() helper (Shang XiaoJing)

 - Drop unnecessary buffer from ACPI call (Rafael Mendonca)

 - Correct latent missing include issue in iova-bitmap and fix support
   for unaligned bitmaps. Follow-up with better fix through refactor
   (Joao Martins)

 - Rework ccw mdev driver to split private data from parent structure,
   better aligning with the mdev lifecycle and allowing us to remove a
   temporary workaround (Eric Farman)

 - Add an interface to get an estimated migration data size for a
   device, allowing userspace to make informed decisions, ex. more
   accurately predicting VM downtime (Yishai Hadas)

 - Fix minor typo in vfio/mlx5 array declaration (Yishai Hadas)

 - Simplify module and Kconfig through consolidating SPAPR/EEH code and
   config options and folding virqfd module into main vfio module (Jason
   Gunthorpe)

 - Fix error path from device_register() across all vfio mdev and sample
   drivers (Alex Williamson)

 - Define migration pre-copy interface and implement for vfio/mlx5
   devices, allowing portions of the device state to be saved while the
   device continues operation, towards reducing the stop-copy state size
   (Jason Gunthorpe, Yishai Hadas, Shay Drory)

 - Implement pre-copy for hisi_acc devices (Shameer Kolothum)

 - Fixes to mdpy mdev driver remove path and error path on probe (Shang
   XiaoJing)

 - vfio/mlx5 fixes for incorrect return after copy_to_user() fault and
   incorrect buffer freeing (Dan Carpenter)

* tag 'vfio-v6.2-rc1' of https://github.com/awilliam/linux-vfio: (42 commits)
  vfio/mlx5: error pointer dereference in error handling
  vfio/mlx5: fix error code in mlx5vf_precopy_ioctl()
  samples: vfio-mdev: Fix missing pci_disable_device() in mdpy_fb_probe()
  hisi_acc_vfio_pci: Enable PRE_COPY flag
  hisi_acc_vfio_pci: Move the dev compatibility tests for early check
  hisi_acc_vfio_pci: Introduce support for PRE_COPY state transitions
  hisi_acc_vfio_pci: Add support for precopy IOCTL
  vfio/mlx5: Enable MIGRATION_PRE_COPY flag
  vfio/mlx5: Fallback to STOP_COPY upon specific PRE_COPY error
  vfio/mlx5: Introduce multiple loads
  vfio/mlx5: Consider temporary end of stream as part of PRE_COPY
  vfio/mlx5: Introduce vfio precopy ioctl implementation
  vfio/mlx5: Introduce SW headers for migration states
  vfio/mlx5: Introduce device transitions of PRE_COPY
  vfio/mlx5: Refactor to use queue based data chunks
  vfio/mlx5: Refactor migration file state
  vfio/mlx5: Refactor MKEY usage
  vfio/mlx5: Refactor PD usage
  vfio/mlx5: Enforce a single SAVE command at a time
  vfio: Extend the device migration protocol with PRE_COPY
  ...
2022-12-15 13:12:15 -08:00
Linus Torvalds
8fa590bf34 ARM64:
* Enable the per-vcpu dirty-ring tracking mechanism, together with an
   option to keep the good old dirty log around for pages that are
   dirtied by something other than a vcpu.
 
 * Switch to the relaxed parallel fault handling, using RCU to delay
   page table reclaim and giving better performance under load.
 
 * Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option,
   which multi-process VMMs such as crosvm rely on (see merge commit 382b5b87a9:
   "Fix a number of issues with MTE, such as races on the tags being
   initialised vs the PG_mte_tagged flag as well as the lack of support
   for VM_SHARED when KVM is involved.  Patches from Catalin Marinas and
   Peter Collingbourne").
 
 * Merge the pKVM shadow vcpu state tracking that allows the hypervisor
   to have its own view of a vcpu, keeping that state private.
 
 * Add support for the PMUv3p5 architecture revision, bringing support
   for 64bit counters on systems that support it, and fix the
   no-quite-compliant CHAIN-ed counter support for the machines that
   actually exist out there.
 
 * Fix a handful of minor issues around 52bit VA/PA support (64kB pages
   only) as a prefix of the oncoming support for 4kB and 16kB pages.
 
 * Pick a small set of documentation and spelling fixes, because no
   good merge window would be complete without those.
 
 s390:
 
 * Second batch of the lazy destroy patches
 
 * First batch of KVM changes for kernel virtual != physical address support
 
 * Removal of a unused function
 
 x86:
 
 * Allow compiling out SMM support
 
 * Cleanup and documentation of SMM state save area format
 
 * Preserve interrupt shadow in SMM state save area
 
 * Respond to generic signals during slow page faults
 
 * Fixes and optimizations for the non-executable huge page errata fix.
 
 * Reprogram all performance counters on PMU filter change
 
 * Cleanups to Hyper-V emulation and tests
 
 * Process Hyper-V TLB flushes from a nested guest (i.e. from a L2 guest
   running on top of a L1 Hyper-V hypervisor)
 
 * Advertise several new Intel features
 
 * x86 Xen-for-KVM:
 
 ** Allow the Xen runstate information to cross a page boundary
 
 ** Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured
 
 ** Add support for 32-bit guests in SCHEDOP_poll
 
 * Notable x86 fixes and cleanups:
 
 ** One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
 
 ** Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
    years back when eliminating unnecessary barriers when switching between
    vmcs01 and vmcs02.
 
 ** Clean up vmread_error_trampoline() to make it more obvious that params
    must be passed on the stack, even for x86-64.
 
 ** Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
    of the current guest CPUID.
 
 ** Fudge around a race with TSC refinement that results in KVM incorrectly
    thinking a guest needs TSC scaling when running on a CPU with a
    constant TSC, but no hardware-enumerated TSC frequency.
 
 ** Advertise (on AMD) that the SMM_CTL MSR is not supported
 
 ** Remove unnecessary exports
 
 Generic:
 
 * Support for responding to signals during page faults; introduces
   new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks
 
 Selftests:
 
 * Fix an inverted check in the access tracking perf test, and restore
   support for asserting that there aren't too many idle pages when
   running on bare metal.
 
 * Fix build errors that occur in certain setups (unsure exactly what is
   unique about the problematic setup) due to glibc overriding
   static_assert() to a variant that requires a custom message.
 
 * Introduce actual atomics for clear/set_bit() in selftests
 
 * Add support for pinning vCPUs in dirty_log_perf_test.
 
 * Rename the so called "perf_util" framework to "memstress".
 
 * Add a lightweight psuedo RNG for guest use, and use it to randomize
   the access pattern and write vs. read percentage in the memstress tests.
 
 * Add a common ucall implementation; code dedup and pre-work for running
   SEV (and beyond) guests in selftests.
 
 * Provide a common constructor and arch hook, which will eventually be
   used by x86 to automatically select the right hypercall (AMD vs. Intel).
 
 * A bunch of added/enabled/fixed selftests for ARM64, covering memslots,
   breakpoints, stage-2 faults and access tracking.
 
 * x86-specific selftest changes:
 
 ** Clean up x86's page table management.
 
 ** Clean up and enhance the "smaller maxphyaddr" test, and add a related
    test to cover generic emulation failure.
 
 ** Clean up the nEPT support checks.
 
 ** Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values.
 
 ** Fix an ordering issue in the AMX test introduced by recent conversions
    to use kvm_cpu_has(), and harden the code to guard against similar bugs
    in the future.  Anything that tiggers caching of KVM's supported CPUID,
    kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if
    the caching occurs before the test opts in via prctl().
 
 Documentation:
 
 * Remove deleted ioctls from documentation
 
 * Clean up the docs for the x86 MSR filter.
 
 * Various fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmOaFrcUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPemQgAq49excg2Cc+EsHnZw3vu/QWdA0Rt
 KhL3OgKxuHNjCbD2O9n2t5di7eJOTQ7F7T0eDm3xPTr4FS8LQ2327/mQePU/H2CF
 mWOpq9RBWLzFsSTeVA2Mz9TUTkYSnDHYuRsBvHyw/n9cL76BWVzjImldFtjYjjex
 yAwl8c5itKH6bc7KO+5ydswbvBzODkeYKUSBNdbn6m0JGQST7XppNwIAJvpiHsii
 Qgpk0e4Xx9q4PXG/r5DedI6BlufBsLhv0aE9SHPzyKH3JbbUFhJYI8ZD5OhBQuYW
 MwxK2KlM5Jm5ud2NZDDlsMmmvd1lnYCFDyqNozaKEWC1Y5rq1AbMa51fXA==
 =QAYX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM64:

   - Enable the per-vcpu dirty-ring tracking mechanism, together with an
     option to keep the good old dirty log around for pages that are
     dirtied by something other than a vcpu.

   - Switch to the relaxed parallel fault handling, using RCU to delay
     page table reclaim and giving better performance under load.

   - Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
     option, which multi-process VMMs such as crosvm rely on (see merge
     commit 382b5b87a9: "Fix a number of issues with MTE, such as
     races on the tags being initialised vs the PG_mte_tagged flag as
     well as the lack of support for VM_SHARED when KVM is involved.
     Patches from Catalin Marinas and Peter Collingbourne").

   - Merge the pKVM shadow vcpu state tracking that allows the
     hypervisor to have its own view of a vcpu, keeping that state
     private.

   - Add support for the PMUv3p5 architecture revision, bringing support
     for 64bit counters on systems that support it, and fix the
     no-quite-compliant CHAIN-ed counter support for the machines that
     actually exist out there.

   - Fix a handful of minor issues around 52bit VA/PA support (64kB
     pages only) as a prefix of the oncoming support for 4kB and 16kB
     pages.

   - Pick a small set of documentation and spelling fixes, because no
     good merge window would be complete without those.

  s390:

   - Second batch of the lazy destroy patches

   - First batch of KVM changes for kernel virtual != physical address
     support

   - Removal of a unused function

  x86:

   - Allow compiling out SMM support

   - Cleanup and documentation of SMM state save area format

   - Preserve interrupt shadow in SMM state save area

   - Respond to generic signals during slow page faults

   - Fixes and optimizations for the non-executable huge page errata
     fix.

   - Reprogram all performance counters on PMU filter change

   - Cleanups to Hyper-V emulation and tests

   - Process Hyper-V TLB flushes from a nested guest (i.e. from a L2
     guest running on top of a L1 Hyper-V hypervisor)

   - Advertise several new Intel features

   - x86 Xen-for-KVM:

      - Allow the Xen runstate information to cross a page boundary

      - Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured

      - Add support for 32-bit guests in SCHEDOP_poll

   - Notable x86 fixes and cleanups:

      - One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).

      - Reinstate IBPB on emulated VM-Exit that was incorrectly dropped
        a few years back when eliminating unnecessary barriers when
        switching between vmcs01 and vmcs02.

      - Clean up vmread_error_trampoline() to make it more obvious that
        params must be passed on the stack, even for x86-64.

      - Let userspace set all supported bits in MSR_IA32_FEAT_CTL
        irrespective of the current guest CPUID.

      - Fudge around a race with TSC refinement that results in KVM
        incorrectly thinking a guest needs TSC scaling when running on a
        CPU with a constant TSC, but no hardware-enumerated TSC
        frequency.

      - Advertise (on AMD) that the SMM_CTL MSR is not supported

      - Remove unnecessary exports

  Generic:

   - Support for responding to signals during page faults; introduces
     new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks

  Selftests:

   - Fix an inverted check in the access tracking perf test, and restore
     support for asserting that there aren't too many idle pages when
     running on bare metal.

   - Fix build errors that occur in certain setups (unsure exactly what
     is unique about the problematic setup) due to glibc overriding
     static_assert() to a variant that requires a custom message.

   - Introduce actual atomics for clear/set_bit() in selftests

   - Add support for pinning vCPUs in dirty_log_perf_test.

   - Rename the so called "perf_util" framework to "memstress".

   - Add a lightweight psuedo RNG for guest use, and use it to randomize
     the access pattern and write vs. read percentage in the memstress
     tests.

   - Add a common ucall implementation; code dedup and pre-work for
     running SEV (and beyond) guests in selftests.

   - Provide a common constructor and arch hook, which will eventually
     be used by x86 to automatically select the right hypercall (AMD vs.
     Intel).

   - A bunch of added/enabled/fixed selftests for ARM64, covering
     memslots, breakpoints, stage-2 faults and access tracking.

   - x86-specific selftest changes:

      - Clean up x86's page table management.

      - Clean up and enhance the "smaller maxphyaddr" test, and add a
        related test to cover generic emulation failure.

      - Clean up the nEPT support checks.

      - Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values.

      - Fix an ordering issue in the AMX test introduced by recent
        conversions to use kvm_cpu_has(), and harden the code to guard
        against similar bugs in the future. Anything that tiggers
        caching of KVM's supported CPUID, kvm_cpu_has() in this case,
        effectively hides opt-in XSAVE features if the caching occurs
        before the test opts in via prctl().

  Documentation:

   - Remove deleted ioctls from documentation

   - Clean up the docs for the x86 MSR filter.

   - Various fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (361 commits)
  KVM: x86: Add proper ReST tables for userspace MSR exits/flags
  KVM: selftests: Allocate ucall pool from MEM_REGION_DATA
  KVM: arm64: selftests: Align VA space allocator with TTBR0
  KVM: arm64: Fix benign bug with incorrect use of VA_BITS
  KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit overflow
  KVM: x86: Advertise that the SMM_CTL MSR is not supported
  KVM: x86: remove unnecessary exports
  KVM: selftests: Fix spelling mistake "probabalistic" -> "probabilistic"
  tools: KVM: selftests: Convert clear/set_bit() to actual atomics
  tools: Drop "atomic_" prefix from atomic test_and_set_bit()
  tools: Drop conflicting non-atomic test_and_{clear,set}_bit() helpers
  KVM: selftests: Use non-atomic clear/set bit helpers in KVM tests
  perf tools: Use dedicated non-atomic clear/set bit helpers
  tools: Take @bit as an "unsigned long" in {clear,set}_bit() helpers
  KVM: arm64: selftests: Enable single-step without a "full" ucall()
  KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself
  KVM: Remove stale comment about KVM_REQ_UNHALT
  KVM: Add missing arch for KVM_CREATE_DEVICE and KVM_{SET,GET}_DEVICE_ATTR
  KVM: Reference to kvm_userspace_memory_region in doc and comments
  KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl
  ...
2022-12-15 11:12:21 -08:00
Dave Hansen
3e844d842d x86/mm: Ensure forced page table splitting
There are a few kernel users like kfence that require 4k pages to work
correctly and do not support large mappings.  They use set_memory_4k()
to break down those large mappings.

That, in turn relies on cpa_data->force_split option to indicate to
set_memory code that it should split page tables regardless of whether
the need to be.

But, a recent change added an optimization which would return early
if a set_memory request came in that did not change permissions.  It
did not consult ->force_split and would mistakenly optimize away the
splitting that set_memory_4k() needs.  This broke kfence.

Skip the same-permission optimization when ->force_split is set.

Fixes: 127960a05548 ("x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Marco Elver <elver@google.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/CA+G9fYuFxZTxkeS35VTZMXwQvohu73W3xbZ5NtjebsVvH6hCuA@mail.gmail.com/
2022-12-15 10:37:28 -08:00
Sean Christopherson
1cfaac2400 x86/kasan: Populate shadow for shared chunk of the CPU entry area
Popuplate the shadow for the shared portion of the CPU entry area, i.e.
the read-only IDT mapping, during KASAN initialization.  A recent change
modified KASAN to map the per-CPU areas on-demand, but forgot to keep a
shadow for the common area that is shared amongst all CPUs.

Map the common area in KASAN init instead of letting idt_map_in_cea() do
the dirty work so that it Just Works in the unlikely event more shared
data is shoved into the CPU entry area.

The bug manifests as a not-present #PF when software attempts to lookup
an IDT entry, e.g. when KVM is handling IRQs on Intel CPUs (KVM performs
direct CALL to the IRQ handler to avoid the overhead of INTn):

 BUG: unable to handle page fault for address: fffffbc0000001d8
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 16c03a067 P4D 16c03a067 PUD 0
 Oops: 0000 [#1] PREEMPT SMP KASAN
 CPU: 5 PID: 901 Comm: repro Tainted: G        W          6.1.0-rc3+ #410
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 RIP: 0010:kasan_check_range+0xdf/0x190
  vmx_handle_exit_irqoff+0x152/0x290 [kvm_intel]
  vcpu_run+0x1d89/0x2bd0 [kvm]
  kvm_arch_vcpu_ioctl_run+0x3ce/0xa70 [kvm]
  kvm_vcpu_ioctl+0x349/0x900 [kvm]
  __x64_sys_ioctl+0xb8/0xf0
  do_syscall_64+0x2b/0x50
  entry_SYSCALL_64_after_hwframe+0x46/0xb0

Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand")
Reported-by: syzbot+8cdd16fd5a6c0565e227@syzkaller.appspotmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221110203504.1985010-6-seanjc@google.com
2022-12-15 10:37:28 -08:00
Sean Christopherson
bde258d974 x86/kasan: Add helpers to align shadow addresses up and down
Add helpers to dedup code for aligning shadow address up/down to page
boundaries when translating an address to its shadow.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Link: https://lkml.kernel.org/r/20221110203504.1985010-5-seanjc@google.com
2022-12-15 10:37:28 -08:00
Sean Christopherson
7077d2ccb9 x86/kasan: Rename local CPU_ENTRY_AREA variables to shorten names
Rename the CPU entry area variables in kasan_init() to shorten their
names, a future fix will reference the beginning of the per-CPU portion
of the CPU entry area, and shadow_cpu_entry_per_cpu_begin is a bit much.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Link: https://lkml.kernel.org/r/20221110203504.1985010-4-seanjc@google.com
2022-12-15 10:37:28 -08:00
Sean Christopherson
97650148a1 x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area
Populate a KASAN shadow for the entire possible per-CPU range of the CPU
entry area instead of requiring that each individual chunk map a shadow.
Mapping shadows individually is error prone, e.g. the per-CPU GDT mapping
was left behind, which can lead to not-present page faults during KASAN
validation if the kernel performs a software lookup into the GDT.  The DS
buffer is also likely affected.

The motivation for mapping the per-CPU areas on-demand was to avoid
mapping the entire 512GiB range that's reserved for the CPU entry area,
shaving a few bytes by not creating shadows for potentially unused memory
was not a goal.

The bug is most easily reproduced by doing a sigreturn with a garbage
CS in the sigcontext, e.g.

  int main(void)
  {
    struct sigcontext regs;

    syscall(__NR_mmap, 0x1ffff000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
    syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul);
    syscall(__NR_mmap, 0x21000000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);

    memset(&regs, 0, sizeof(regs));
    regs.cs = 0x1d0;
    syscall(__NR_rt_sigreturn);
    return 0;
  }

to coerce the kernel into doing a GDT lookup to compute CS.base when
reading the instruction bytes on the subsequent #GP to determine whether
or not the #GP is something the kernel should handle, e.g. to fixup UMIP
violations or to emulate CLI/STI for IOPL=3 applications.

  BUG: unable to handle page fault for address: fffffbc8379ace00
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 16c03a067 P4D 16c03a067 PUD 15b990067 PMD 15b98f067 PTE 0
  Oops: 0000 [#1] PREEMPT SMP KASAN
  CPU: 3 PID: 851 Comm: r2 Not tainted 6.1.0-rc3-next-20221103+ #432
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:kasan_check_range+0xdf/0x190
  Call Trace:
   <TASK>
   get_desc+0xb0/0x1d0
   insn_get_seg_base+0x104/0x270
   insn_fetch_from_user+0x66/0x80
   fixup_umip_exception+0xb1/0x530
   exc_general_protection+0x181/0x210
   asm_exc_general_protection+0x22/0x30
  RIP: 0003:0x0
  Code: Unable to access opcode bytes at 0xffffffffffffffd6.
  RSP: 0003:0000000000000000 EFLAGS: 00000202
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000001d0
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
   </TASK>

Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand")
Reported-by: syzbot+ffb4f000dc2872c93f62@syzkaller.appspotmail.com
Suggested-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Link: https://lkml.kernel.org/r/20221110203504.1985010-3-seanjc@google.com
2022-12-15 10:37:28 -08:00
Sean Christopherson
80d72a8f76 x86/mm: Recompute physical address for every page of per-CPU CEA mapping
Recompute the physical address for each per-CPU page in the CPU entry
area, a recent commit inadvertantly modified cea_map_percpu_pages() such
that every PTE is mapped to the physical address of the first page.

Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Link: https://lkml.kernel.org/r/20221110203504.1985010-2-seanjc@google.com
2022-12-15 10:37:28 -08:00
Peter Zijlstra
e996365ee7 x86/mm: Rename __change_page_attr_set_clr(.checkalias)
Now that the checkalias functionality is taken by CPA_NO_CHECK_ALIAS
rename the argument to better match is remaining purpose: primary,
matching __change_page_attr().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221110125544.661001508%40infradead.org
2022-12-15 10:37:28 -08:00
Peter Zijlstra
d597416683 x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()
There is a cludge in change_page_attr_set_clr() that inhibits
propagating NX changes to the aliases (directmap and highmap) -- this
is a cludge twofold:

 - it also inhibits the primary checks in __change_page_attr();
 - it hard depends on single bit changes.

The introduction of set_memory_rox() triggered this last issue for
clearing both _PAGE_RW and _PAGE_NX.

Explicitly ignore _PAGE_NX in cpa_process_alias() instead.

Fixes: b38994948567 ("x86/mm: Implement native set_memory_rox()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Debugged-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221110125544.594991716%40infradead.org
2022-12-15 10:37:28 -08:00
Peter Zijlstra
ef9ab81af6 x86/mm: Untangle __change_page_attr_set_clr(.checkalias)
The .checkalias argument to __change_page_attr_set_clr() is overloaded
and serves two different purposes:

 - it inhibits the call to cpa_process_alias() -- as suggested by the
   name; however,

 - it also serves as 'primary' indicator for __change_page_attr()
   ( which in turn also serves as a recursion terminator for
     cpa_process_alias() ).

Untangle these by extending the use of CPA_NO_CHECK_ALIAS to all
callsites that currently use .checkalias=0 for this purpose.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221110125544.527267183%40infradead.org
2022-12-15 10:37:28 -08:00
Peter Zijlstra
5ceeee7571 x86/mm: Add a few comments
It's a shame to hide useful comments in Changelogs, add some to the
code.

Shamelessly stolen from commit:

  c40a56a781 ("x86/mm/init: Remove freed kernel image areas from alias mapping")

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221110125544.460677011%40infradead.org
2022-12-15 10:37:28 -08:00
Kirill A. Shutemov
80d0969aa7 x86/mm: Fix CR3_ADDR_MASK
The mask must not include bits above physical address mask. These bits
are reserved and can be used for other things. Bits 61 and 62 are used
for Linear Address Masking.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Alexander Potapenko <glider@google.com>
Link: https://lore.kernel.org/all/20221109165140.9137-2-kirill.shutemov%40linux.intel.com
2022-12-15 10:37:28 -08:00
Pasha Tatashin
82328227db x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros
Other architectures and the common mm/ use P*D_MASK, and P*D_SIZE.
Remove the duplicated P*D_PAGE_MASK and P*D_PAGE_SIZE which are only
used in x86/*.

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/20220516185202.604654-1-tatashin@google.com
2022-12-15 10:37:27 -08:00
Peter Zijlstra
2dff2c359e mm: Convert __HAVE_ARCH_P..P_GET to the new style
Since __HAVE_ARCH_* style guards have been depricated in favour of
defining the function name onto itself, convert pxxp_get().

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Y2EUEBlQXNgaJgoI@hirez.programming.kicks-ass.net
2022-12-15 10:37:27 -08:00
Peter Zijlstra
eb780dcae0 mm: Remove pointless barrier() after pmdp_get_lockless()
pmdp_get_lockless() should itself imply any ordering required.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221022114425.298833095%40infradead.org
2022-12-15 10:37:27 -08:00
Peter Zijlstra
d4a72e7fe6 x86/mm/pae: Get rid of set_64bit()
Recognise that set_64bit() is a special case of our previously
introduced pxx_xchg64(), so use that and get rid of set_64bit().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221022114425.233481884%40infradead.org
2022-12-15 10:37:27 -08:00
Peter Zijlstra
9ee850acd2 x86_64: Remove pointless set_64bit() usage
The use of set_64bit() in X86_64 only code is pretty pointless, seeing
how it's a direct assignment. Remove all this nonsense.

[nathanchance: unbreak irte]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221022114425.168036718%40infradead.org
2022-12-15 10:37:27 -08:00
Peter Zijlstra
b7301f2010 x86/mm/pae: Be consistent with pXXp_get_and_clear()
Given that ptep_get_and_clear() uses cmpxchg8b, and that should be by
far the most common case, there's no point in having an optimized
variant for pmd/pud.

Introduce the pxx_xchg64() helper to implement the common logic once.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221022114425.103392961%40infradead.org
2022-12-15 10:37:27 -08:00
Peter Zijlstra
f7bcd4617d x86/mm/pae: Use WRITE_ONCE()
Disallow write-tearing, that would be really unfortunate.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221022114425.038102604%40infradead.org
2022-12-15 10:37:27 -08:00
Peter Zijlstra
7a9b8bdb6a x86/mm/pae: Don't (ab)use atomic64
PAE implies CX8, write readable code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221022114424.971450128%40infradead.org
2022-12-15 10:37:27 -08:00
Peter Zijlstra
1180e732c9 mm/gup: Fix the lockless PMD access
On architectures where the PTE/PMD is larger than the native word size
(i386-PAE for example), READ_ONCE() can do the wrong thing. Use
pmdp_get_lockless() just like we use ptep_get_lockless().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221022114424.906110403%40infradead.org
2022-12-15 10:37:27 -08:00
Peter Zijlstra
dab6e71742 mm: Rename pmd_read_atomic()
There's no point in having the identical routines for PTE/PMD have
different names.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221022114424.841277397%40infradead.org
2022-12-15 10:37:27 -08:00
Peter Zijlstra
6ca297d478 mm: Rename GUP_GET_PTE_LOW_HIGH
Since it no longer applies to only PTEs, rename it to PXX.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221022114424.776404066%40infradead.org
2022-12-15 10:37:27 -08:00
Peter Zijlstra
024d232ae4 mm: Fix pmd_read_atomic()
AFAICT there's no reason to do anything different than what we do for
PTEs. Make it so (also affects SH).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221022114424.711181252%40infradead.org
2022-12-15 10:37:27 -08:00
Peter Zijlstra
0862ff059c sh/mm: Make pmd_t similar to pte_t
Just like 64bit pte_t, have a low/high split in pmd_t.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221022114424.645657294%40infradead.org
2022-12-15 10:37:27 -08:00
Peter Zijlstra
fbfdec9989 x86/mm/pae: Make pmd_t similar to pte_t
Instead of mucking about with at least 2 different ways of fudging
it, do the same thing we do for pte_t.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221022114424.580310787%40infradead.org
2022-12-15 10:37:27 -08:00
Peter Zijlstra
93b3037a14 mm: Update ptep_get_lockless()'s comment
Improve the comment.

Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221022114424.515572025%40infradead.org
2022-12-15 10:37:27 -08:00
Peter Zijlstra
60463628c9 x86/mm: Implement native set_memory_rox()
Provide a native implementation of set_memory_rox(), avoiding the
double set_memory_ro();set_memory_x(); calls.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2022-12-15 10:37:27 -08:00
Peter Zijlstra
d48567c9a0 mm: Introduce set_memory_rox()
Because endlessly repeating:

	set_memory_ro()
	set_memory_x()

is getting tedious.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Y1jek64pXOsougmz@hirez.programming.kicks-ass.net
2022-12-15 10:37:26 -08:00
Peter Zijlstra
414ebf148c x86/mm: Do verify W^X at boot up
Straight up revert of commit:

  a970174d7a ("x86/mm: Do not verify W^X at boot up")

now that the root cause has been fixed.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221025201058.011279208@infradead.org
2022-12-15 10:37:26 -08:00
Peter Zijlstra
eb7d389d5b x86/ftrace: Remove SYSTEM_BOOTING exceptions
Now that text_poke is available before ftrace, remove the
SYSTEM_BOOTING exceptions.

Specifically, this cures a W+X case during boot.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221025201057.945960823@infradead.org
2022-12-15 10:37:26 -08:00
Peter Zijlstra
5b93a83649 x86/mm: Initialize text poking earlier
Move poking_init() up a bunch; specifically move it right after
mm_init() which is right before ftrace_init().

This will allow simplifying ftrace text poking which currently has
a bunch of exceptions for early boot.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221025201057.881703081@infradead.org
2022-12-15 10:37:26 -08:00
Peter Zijlstra
3f4c8211d9 x86/mm: Use mm_alloc() in poking_init()
Instead of duplicating init_mm, allocate a fresh mm. The advantage is
that mm_alloc() has much simpler dependencies. Additionally it makes
more conceptual sense, init_mm has no (and must not have) user state
to duplicate.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221025201057.816175235@infradead.org
2022-12-15 10:37:26 -08:00
Peter Zijlstra
af80602799 mm: Move mm_cachep initialization to mm_init()
In order to allow using mm_alloc() much earlier, move initializing
mm_cachep into mm_init().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221025201057.751153381@infradead.org
2022-12-15 10:37:26 -08:00
Peter Zijlstra
97e3d26b5e x86/mm: Randomize per-cpu entry area
Seth found that the CPU-entry-area; the piece of per-cpu data that is
mapped into the userspace page-tables for kPTI is not subject to any
randomization -- irrespective of kASLR settings.

On x86_64 a whole P4D (512 GB) of virtual address space is reserved for
this structure, which is plenty large enough to randomize things a
little.

As such, use a straight forward randomization scheme that avoids
duplicates to spread the existing CPUs over the available space.

  [ bp: Fix le build. ]

Reported-by: Seth Jenkins <sethjenkins@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-12-15 10:37:26 -08:00