Commit Graph

62 Commits

Jason Gunthorpe
17bad52708 iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
Logically the HWPT should have the coherency set properly for the device
that it is being created for when it is created.

This was happening implicitly if the immediate_attach was set because
iommufd_hw_pagetable_attach() does it as the first thing.

Do it unconditionally so !immediate_attach works properly.

Link: https://lore.kernel.org/r/9-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-07-26 10:19:52 -03:00
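The enforce_cache_coherency op referenced here is the existing
iommu_domain op; a minimal sketch of invoking it at allocation time,
with an illustrative helper name rather than the literal iommufd code:

  #include <linux/errno.h>
  #include <linux/iommu.h>

  /* Sketch: enforce coherency on a freshly allocated domain instead of
   * relying on the first attach to do it (helper name illustrative). */
  static int sketch_enforce_cc(struct iommu_domain *domain)
  {
          if (!domain->ops->enforce_cache_coherency)
                  return -EINVAL;
          if (!domain->ops->enforce_cache_coherency(domain))
                  return -EINVAL;
          return 0;
  }
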
Jason Gunthorpe
d03f1336fd iommufd: Move putting a hwpt to a helper function
Next patch will need to call this from two places.

Link: https://lore.kernel.org/r/8-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-07-26 10:19:47 -03:00
Jason Gunthorpe
1d149ab2e0 iommufd: Make sw_msi_start a group global
The sw_msi_start is only set by the ARM drivers and it is always
constant. Due to the way vfio/iommufd allow domains to be re-used
between devices, we have a built-in assumption that there is only one
value for sw_msi_start and that it is global to the system.

To make replace simpler in cases where we may not re-parse
iommu_get_resv_regions(), move sw_msi_start to the iommufd_group so it
is always available once any HWPT has been attached.

Link: https://lore.kernel.org/r/7-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-07-26 10:19:42 -03:00
Jason Gunthorpe
269c5238c5 iommufd: Use the iommufd_group to avoid duplicate MSI setup
This only needs to be done once per group, not once per device. The
once-per-device approach was a way to make the device list work. Since
we are abandoning this, we can optimize things a bit.

Link: https://lore.kernel.org/r/6-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-07-26 10:19:37 -03:00
Jason Gunthorpe
34f327a985 iommufd: Keep track of each device's reserved regions instead of groups
The driver facing API in the iommu core makes the reserved regions
per-device. An algorithm in the core code consolidates the regions of all
the devices in a group to return the group view.

To allow for devices to be hotplugged into the group iommufd would re-load
the entire group's reserved regions for each device, just in case they
changed.

Further, iommufd already has to deal with duplicated/overlapping
reserved regions, as it must union all the groups together.

Thus simplify all of this to just use the device reserved regions
interface directly from the iommu driver.

Link: https://lore.kernel.org/r/5-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-07-26 10:19:32 -03:00
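The per-device interface mentioned is the existing iommu core API; a
hedged usage sketch (the body of the loop is illustrative, not the
iommufd code):

  #include <linux/iommu.h>
  #include <linux/list.h>

  /* Sketch: read the reserved regions directly for one device. */
  static void sketch_reserve_device_regions(struct device *dev)
  {
          struct iommu_resv_region *resv;
          LIST_HEAD(resv_regions);

          iommu_get_resv_regions(dev, &resv_regions);
          list_for_each_entry(resv, &resv_regions, list) {
                  /* reserve [resv->start, resv->start + resv->length - 1]
                   * in the io_pagetable for this device (illustrative) */
          }
          iommu_put_resv_regions(dev, &resv_regions);
  }
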
Jason Gunthorpe
91a2e17e24 iommufd: Replace the hwpt->devices list with iommufd_group
The devices list was used as a simple way to avoid having per-group
information. Now that this seems to be unavoidable, just commit to
per-group information fully and remove the devices list from the HWPT.

The iommufd_group stores the currently assigned HWPT for the entire group
and we can manage the per-device attach/detach with a list in the
iommufd_group.

For destruction, the flow is organized to make the following patches
easier: the actual call to iommufd_object_destroy_user() is done at the
top of the call chain without holding any locks. The HWPT to be
destroyed is returned out of the locked region to make this possible.
Later patches create locking that requires this.

Link: https://lore.kernel.org/r/3-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-07-26 10:19:22 -03:00
Jason Gunthorpe
3a3329a7f1 iommufd: Add iommufd_group
When the hwpt-to-device attachment is fairly static, we could get away
with the simple approach of keeping track of the groups via a device
list. But with replace this is infeasible.

Add an automatically managed struct that is 1:1 with the iommu_group
per-ictx so we can store the necessary tracking information there.

Link: https://lore.kernel.org/r/2-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-07-26 10:19:17 -03:00
Jason Gunthorpe
d525a5b8cf iommufd: Move isolated msi enforcement to iommufd_device_bind()
With the recent rework this no longer needs to be done at domain
attachment time; we know whether the device is usable by iommufd when
we bind it.

The value of msi_device_has_isolated_msi() is not allowed to change while
a driver is bound.

Link: https://lore.kernel.org/r/1-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-07-26 10:16:43 -03:00
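msi_device_has_isolated_msi() is the existing core helper; a hedged
sketch of the kind of check now done once at bind time (the opt-out
flag handling is simplified):

  #include <linux/errno.h>
  #include <linux/msi.h>

  /* Sketch: refuse to bind a device whose MSIs are not isolated unless
   * the administrator explicitly allowed unsafe interrupts. */
  static int sketch_check_msi_isolation(struct device *dev, bool allow_unsafe)
  {
          if (!msi_device_has_isolated_msi(dev) && !allow_unsafe)
                  return -EPERM;
          return 0;
  }
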
Yi Liu
c1cce6d079 vfio: Compile vfio_group infrastructure optionally
The vfio_group is not needed for the vfio device cdev, so with the
device cdev introduced, the vfio_group infrastructure can be compiled
out if only the cdev is needed.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-26-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-07-25 10:20:50 -06:00
Yi Liu
1c9dc07487 iommufd: Add iommufd_ctx_from_fd()
It's common to need a reference to the iommufd context for a given
file descriptor, so add an API for it. Existing users of this API are
compiled only when IOMMUFD is enabled, so there is no need for a stub
in the IOMMUFD-disabled case.

Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-21-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-07-25 10:19:53 -06:00
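A hedged usage sketch of the new helper (error handling abbreviated);
iommufd_ctx_put() is the existing counterpart for dropping the
reference:

  #include <linux/err.h>
  #include <linux/iommufd.h>

  /* Sketch: resolve an iommufd file descriptor handed in from userspace. */
  static int sketch_use_iommufd_fd(int fd)
  {
          struct iommufd_ctx *ictx;

          ictx = iommufd_ctx_from_fd(fd);  /* takes a reference on success */
          if (IS_ERR(ictx))
                  return PTR_ERR(ictx);

          /* ... use ictx ... */

          iommufd_ctx_put(ictx);           /* drop the reference when done */
          return 0;
  }
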
Nicolin Chen
e23a6217f3 iommufd/device: Add iommufd_access_detach() API
Previously, the detach routine was only done by destroy(). It was
called by vfio_iommufd_emulated_unbind() when the device ran close(),
so all the mappings in the iopt had already been cleaned up by the time
the call trace reached this detach() routine.

Now there is a need for a detach uAPI, which means we not only need a
new iommufd_access_detach() API but also an access->ops->unmap() call
as a cleanup. So add one.

However, leaving that unprotected can introduce a potential race
condition with the pin_/unpin_pages() calls, where access->ioas->iopt
is being referenced. So add an ioas_lock to protect the iopt
references.

Also, to allow the iommufd_access_unpin_pages() callback to happen via
this unmap() call, add an ioas_unpin pointer so the unpin routine won't
be affected by the "access->ioas = NULL" trick.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-15-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-07-25 10:19:14 -06:00
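A hedged sketch of how a caller such as vfio's emulated path might pair
the attach and the new detach (signatures as introduced by this series;
the surrounding logic is simplified):

  #include <linux/iommufd.h>

  /* Sketch: bind an emulated access to an IOAS, then detach it again. */
  static int sketch_attach_then_detach(struct iommufd_access *access,
                                       u32 ioas_id)
  {
          int rc;

          rc = iommufd_access_attach(access, ioas_id);  /* sets access->ioas */
          if (rc)
                  return rc;

          /* ... pages may be pinned / read through the access here ... */

          iommufd_access_detach(access);  /* unmaps and drops the IOAS link */
          return 0;
  }
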
Yi Liu
78d3df457a iommufd: Add helper to retrieve iommufd_ctx and devid
This is needed by the vfio-pci driver to report affected devices in the
hot-reset for a given device.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718105542.4138-6-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-07-25 10:17:55 -06:00
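The helpers this refers to are iommufd_device_to_ictx() and
iommufd_device_to_id(); a usage sketch under that assumption (the
surrounding hot-reset code is illustrative):

  #include <linux/iommufd.h>

  /* Sketch: identify which iommufd context and device ID an affected
   * device belongs to, as vfio-pci hot-reset reporting wants to do. */
  static u32 sketch_affected_devid(struct iommufd_device *idev,
                                   struct iommufd_ctx *expected_ictx)
  {
          if (iommufd_device_to_ictx(idev) != expected_ictx)
                  return 0;   /* illustrative "not in this context" marker */
          return iommufd_device_to_id(idev);
  }
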
Yi Liu
86b0a96c29 iommufd: Add iommufd_ctx_has_group()
This adds a helper to check whether any device within the given
iommu_group has been bound to the iommufd_ctx. This is helpful for
checking device ownership for devices that have not themselves been
bound but cannot be bound to any other iommufd_ctx because their
iommu_group has been.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718105542.4138-5-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-07-25 10:17:52 -06:00
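A hedged usage sketch of the helper (signature as added here; the
ownership policy around it is simplified):

  #include <linux/iommu.h>
  #include <linux/iommufd.h>

  /* Sketch: a not-yet-bound device counts as owned if any device in its
   * iommu_group is already bound to this iommufd_ctx. */
  static bool sketch_device_is_owned(struct iommufd_ctx *ictx,
                                     struct device *dev)
  {
          struct iommu_group *group = iommu_group_get(dev);
          bool owned;

          if (!group)
                  return false;
          owned = iommufd_ctx_has_group(ictx, group);
          iommu_group_put(group);
          return owned;
  }
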
Yi Liu
eda175dfe2 iommufd: Reserve all negative IDs in the iommufd xarray
With this reservation, IOMMUFD users can encode negative IDs for
specific purposes, e.g. VFIO needs two reserved values to tell
userspace that the returned ID is not valid but has another meaning.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718105542.4138-4-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-07-25 10:17:48 -06:00
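One way to express such a reservation with the xarray API is to
restrict allocation to the non-negative s32 range; a sketch under that
assumption rather than the literal iommufd code:

  #include <linux/xarray.h>

  /* Sketch: only hand out IDs in [0, INT_MAX], so values that would look
   * negative as an s32 stay free for special userspace encodings. */
  static int sketch_alloc_object_id(struct xarray *xa, void *obj, u32 *out_id)
  {
          return xa_alloc(xa, out_id, obj, xa_limit_31b, GFP_KERNEL_ACCOUNT);
  }
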
Linus Torvalds
31929ae008 iommufd for 6.5
Just two RC syzkaller fixes, both for the same basic issue, using the area
 pointer during an access forced unmap while the locks protecting it were
 let go.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZJmGygAKCRCFwuHvBreF
 YVNSAQC7SgejTvwD6EYXr8AUDko1v0G0M/o60OrWIuC7xWiFPQD/RDwtItRLzf4h
 i+YCfMtn/7IB/uV/sRTF4m0HzudcDAM=
 =0fm4
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
 "Just two syzkaller fixes, both for the same basic issue: using the
  area pointer during an access forced unmap while the locks protecting
  it were let go"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommufd: Call iopt_area_contig_done() under the lock
  iommufd: Do not access the area pointer after unlocking
2023-06-29 20:57:27 -07:00
Jason Gunthorpe
dbe245cdf5 iommufd: Call iopt_area_contig_done() under the lock
The iter internally holds a pointer to the area and
iopt_area_contig_done() will dereference it. The pointer is not valid
outside the iova_rwsem.

syzkaller reports:

  BUG: KASAN: slab-use-after-free in iommufd_access_unpin_pages+0x363/0x370
  Read of size 8 at addr ffff888022286e20 by task syz-executor669/5771

  CPU: 0 PID: 5771 Comm: syz-executor669 Not tainted 6.4.0-rc5-syzkaller-00313-g4c605260bc60 #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023
  Call Trace:
   <TASK>
   dump_stack_lvl+0xd9/0x150
   print_address_description.constprop.0+0x2c/0x3c0
   kasan_report+0x11c/0x130
   iommufd_access_unpin_pages+0x363/0x370
   iommufd_test_access_unmap+0x24b/0x390
   iommufd_access_notify_unmap+0x24c/0x3a0
   iopt_unmap_iova_range+0x4c4/0x5f0
   iopt_unmap_all+0x27/0x50
   iommufd_ioas_unmap+0x3d0/0x490
   iommufd_fops_ioctl+0x317/0x4b0
   __x64_sys_ioctl+0x197/0x210
   do_syscall_64+0x39/0xb0
   entry_SYSCALL_64_after_hwframe+0x63/0xcd
  RIP: 0033:0x7fec1dae3b19
  Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 11 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007fec1da74308 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007fec1db6b438 RCX: 00007fec1dae3b19
  RDX: 0000000020000100 RSI: 0000000000003b86 RDI: 0000000000000003
  RBP: 00007fec1db6b430 R08: 00007fec1da74700 R09: 0000000000000000
  R10: 00007fec1da74700 R11: 0000000000000246 R12: 00007fec1db6b43c
  R13: 00007fec1db39074 R14: 6d6f692f7665642f R15: 0000000000022000
   </TASK>

  Allocated by task 5770:
   kasan_save_stack+0x22/0x40
   kasan_set_track+0x25/0x30
   __kasan_kmalloc+0xa2/0xb0
   iopt_alloc_area_pages+0x94/0x560
   iopt_map_user_pages+0x205/0x4e0
   iommufd_ioas_map+0x329/0x5f0
   iommufd_fops_ioctl+0x317/0x4b0
   __x64_sys_ioctl+0x197/0x210
   do_syscall_64+0x39/0xb0
   entry_SYSCALL_64_after_hwframe+0x63/0xcd

  Freed by task 5770:
   kasan_save_stack+0x22/0x40
   kasan_set_track+0x25/0x30
   kasan_save_free_info+0x2e/0x40
   ____kasan_slab_free+0x160/0x1c0
   slab_free_freelist_hook+0x8b/0x1c0
   __kmem_cache_free+0xaf/0x2d0
   iopt_unmap_iova_range+0x288/0x5f0
   iopt_unmap_all+0x27/0x50
   iommufd_ioas_unmap+0x3d0/0x490
   iommufd_fops_ioctl+0x317/0x4b0
   __x64_sys_ioctl+0x197/0x210
   do_syscall_64+0x39/0xb0
   entry_SYSCALL_64_after_hwframe+0x63/0xcd

The parallel unmap freed iter->area the instant the lock was released.

Fixes: 51fe6141f0 ("iommufd: Data structure to provide IOVA to PFN mapping")
Link: https://lore.kernel.org/r/2-v2-9a03761d445d+54-iommufd_syz2_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: syzbot+6c8d756f238a75fc3eb8@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/000000000000905eba05fe38e9f2@google.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-26 09:00:23 -03:00
Jason Gunthorpe
804ca14d04 iommufd: Do not access the area pointer after unlocking
A concurrent unmap can trigger freeing of the area pointers while we are
generating an unmapping notification for accesses.

syzkaller reports:

  BUG: KASAN: slab-use-after-free in iopt_unmap_iova_range+0x5ba/0x5f0
  Read of size 4 at addr ffff888075996184 by task syz-executor.2/31160

  CPU: 1 PID: 31160 Comm: syz-executor.2 Not tainted 6.4.0-rc5-syzkaller-00313-g4c605260bc60 #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023
  Call Trace:
   <TASK>
   dump_stack_lvl+0xd9/0x150
   print_address_description.constprop.0+0x2c/0x3c0
   kasan_report+0x11c/0x130
   iopt_unmap_iova_range+0x5ba/0x5f0
   iopt_unmap_all+0x27/0x50
   iommufd_ioas_unmap+0x3d0/0x490
   iommufd_fops_ioctl+0x317/0x4b0
   __x64_sys_ioctl+0x197/0x210
   do_syscall_64+0x39/0xb0
   entry_SYSCALL_64_after_hwframe+0x63/0xcd
  RIP: 0033:0x7f0812c8c169
  Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007f0813914168 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007f0812dabf80 RCX: 00007f0812c8c169
  RDX: 0000000020000100 RSI: 0000000000003b86 RDI: 0000000000000005
  RBP: 00007f0812ce7ca1 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
  R13: 00007f0812ecfb1f R14: 00007f0813914300 R15: 0000000000022000
   </TASK>

  Allocated by task 31160:
   kasan_save_stack+0x22/0x40
   kasan_set_track+0x25/0x30
   __kasan_kmalloc+0xa2/0xb0
   iopt_alloc_area_pages+0x94/0x560
   iopt_map_user_pages+0x205/0x4e0
   iommufd_ioas_map+0x329/0x5f0
   iommufd_fops_ioctl+0x317/0x4b0
   __x64_sys_ioctl+0x197/0x210
   do_syscall_64+0x39/0xb0
   entry_SYSCALL_64_after_hwframe+0x63/0xcd

  Freed by task 31161:
   kasan_save_stack+0x22/0x40
   kasan_set_track+0x25/0x30
   kasan_save_free_info+0x2e/0x40
   ____kasan_slab_free+0x160/0x1c0
   slab_free_freelist_hook+0x8b/0x1c0
   __kmem_cache_free+0xaf/0x2d0
   iopt_unmap_iova_range+0x288/0x5f0
   iopt_unmap_all+0x27/0x50
   iommufd_ioas_unmap+0x3d0/0x490
   iommufd_fops_ioctl+0x317/0x4b0
   __x64_sys_ioctl+0x197/0x210
   do_syscall_64+0x39/0xb0
   entry_SYSCALL_64_after_hwframe+0x63/0xcd

  The buggy address belongs to the object at ffff888075996100
   which belongs to the cache kmalloc-cg-192 of size 192
  The buggy address is located 132 bytes inside of
   freed 192-byte region [ffff888075996100, ffff8880759961c0)

  The buggy address belongs to the physical page:
  page:ffffea0001d66580 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x75996
  memcg:ffff88801f1c2701
  flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
  page_type: 0xffffffff()
  raw: 00fff00000000200 ffff88801244ddc0 dead000000000122 0000000000000000
  raw: 0000000000000000 0000000080100010 00000001ffffffff ffff88801f1c2701
  page dumped because: kasan: bad access detected
  page_owner tracks the page as allocated
  page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112cc0(GFP_USER|__GFP_NOWARN|__GFP_NORETRY), pid 31157, tgid 31154 (syz-executor.0), ts 1984547323469, free_ts 1983933451331
   post_alloc_hook+0x2db/0x350
   get_page_from_freelist+0xf41/0x2c00
   __alloc_pages+0x1cb/0x4a0
   alloc_pages+0x1aa/0x270
   allocate_slab+0x25f/0x390
   ___slab_alloc+0xa91/0x1400
   __slab_alloc.constprop.0+0x56/0xa0
   __kmem_cache_alloc_node+0x136/0x320
   kmalloc_trace+0x26/0xe0
   iommufd_test+0x1328/0x2c20
   iommufd_fops_ioctl+0x317/0x4b0
   __x64_sys_ioctl+0x197/0x210
   do_syscall_64+0x39/0xb0
   entry_SYSCALL_64_after_hwframe+0x63/0xcd
  page last free stack trace:
   free_unref_page_prepare+0x62e/0xcb0
   free_unref_page_list+0xe3/0xa70
   release_pages+0xcd8/0x1380
   tlb_batch_pages_flush+0xa8/0x1a0
   tlb_finish_mmu+0x14b/0x7e0
   exit_mmap+0x2b2/0x930
   __mmput+0x128/0x4c0
   mmput+0x60/0x70
   do_exit+0x9b0/0x29b0
   do_group_exit+0xd4/0x2a0
   get_signal+0x2318/0x25b0
   arch_do_signal_or_restart+0x79/0x5c0
   exit_to_user_mode_prepare+0x11f/0x240
   syscall_exit_to_user_mode+0x1d/0x50
   do_syscall_64+0x46/0xb0
   entry_SYSCALL_64_after_hwframe+0x63/0xcd

Precompute what is needed to call the access function and do not check the
area's num_accesses again as the pointer may not be valid anymore. Use a
counter instead.

Fixes: 51fe6141f0 ("iommufd: Data structure to provide IOVA to PFN mapping")
Link: https://lore.kernel.org/r/1-v2-9a03761d445d+54-iommufd_syz2_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: syzbot+1ad12d16afca0e7d2dde@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/0000000000001d40fc05fe385332@google.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-26 09:00:23 -03:00
Lorenzo Stoakes
0b295316b3 mm/gup: remove unused vmas parameter from pin_user_pages_remote()
No invocation of pin_user_pages_remote() uses the vmas parameter, so
remove it.  This forms part of a larger patch set eliminating the use of
the vmas parameters altogether.

Link: https://lkml.kernel.org/r/28f000beb81e45bf538a2aaa77c90f5482b67a32.1684350871.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09 16:25:25 -07:00
Jason Gunthorpe
692d42d411 Merge branch 'iommufd/for-rc' into for-next
The following selftest patch requires both the bug fixes and the
improvements of the selftest framework.

* iommufd/for-rc:
  iommufd: Do not corrupt the pfn list when doing batch carry
  iommufd: Fix unpinning of pages when an access is present
  iommufd: Check for uptr overflow
  Linux 6.3-rc5

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-04 11:04:30 -03:00
Tom Rix
c52159b5be iommufd/selftest: Set variable mock_iommu_device storage-class-specifier to static
smatch reports:

drivers/iommu/iommufd/selftest.c:295:21: warning: symbol
  'mock_iommu_device' was not declared. Should it be static?

This variable is only used in one file so it should be static.

Fixes: 65c619ae06 ("iommufd/selftest: Make selftest create a more complete mock device")
Link: https://lore.kernel.org/r/20230404002317.1912530-1-trix@redhat.com
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-04 11:02:39 -03:00
Jason Gunthorpe
13a0d1ae7e iommufd: Do not corrupt the pfn list when doing batch carry
If batch->end is 0 then setting npfns[0] before computing the new value of
pfns will fail to adjust the pfn and result in various page accounting
corruptions. It should be ordered after.

This seems to result in various kinds of page meta-data corruption related
failures:

  WARNING: CPU: 1 PID: 527 at mm/gup.c:75 try_grab_folio+0x503/0x740
  Modules linked in:
  CPU: 1 PID: 527 Comm: repro Not tainted 6.3.0-rc2-eeac8ede1755+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
  RIP: 0010:try_grab_folio+0x503/0x740
  Code: e3 01 48 89 de e8 6d c1 dd ff 48 85 db 0f 84 7c fe ff ff e8 4f bf dd ff 49 8d 47 ff 48 89 45 d0 e9 73 fe ff ff e8 3d bf dd ff <0f> 0b 31 db e9 d0 fc ff ff e8 2f bf dd ff 48 8b 5d c8 31 ff 48 89
  RSP: 0018:ffffc90000f37908 EFLAGS: 00010046
  RAX: 0000000000000000 RBX: 00000000fffffc02 RCX: ffffffff81504c26
  RDX: 0000000000000000 RSI: ffff88800d030000 RDI: 0000000000000002
  RBP: ffffc90000f37948 R08: 000000000003ca24 R09: 0000000000000008
  R10: 000000000003ca00 R11: 0000000000000023 R12: ffffea000035d540
  R13: 0000000000000001 R14: 0000000000000000 R15: ffffea000035d540
  FS:  00007fecbf659740(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000200011c3 CR3: 000000000ef66006 CR4: 0000000000770ee0
  PKRU: 55555554
  Call Trace:
   <TASK>
   internal_get_user_pages_fast+0xd32/0x2200
   pin_user_pages_fast+0x65/0x90
   pfn_reader_user_pin+0x376/0x390
   pfn_reader_next+0x14a/0x7b0
   pfn_reader_first+0x140/0x1b0
   iopt_area_fill_domain+0x74/0x210
   iopt_table_add_domain+0x30e/0x6e0
   iommufd_device_selftest_attach+0x7f/0x140
   iommufd_test+0x10ff/0x16f0
   iommufd_fops_ioctl+0x206/0x330
   __x64_sys_ioctl+0x10e/0x160
   do_syscall_64+0x3b/0x90
   entry_SYSCALL_64_after_hwframe+0x72/0xdc

Cc: <stable@vger.kernel.org>
Fixes: f394576eb1 ("iommufd: PFN handling for iopt_pages")
Link: https://lore.kernel.org/r/3-v1-ceab6a4d7d7a+94-iommufd_syz_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-04 09:10:55 -03:00
Jason Gunthorpe
727c28c1ce iommufd: Fix unpinning of pages when an access is present
syzkaller found that the calculation of batch_last_index should use
'start_index': at input to this function the batch is either empty or
has already been adjusted to cross any accesses, so it will start at
the point we are unmapping from.

Getting this wrong causes the unmap to run over the end of the pages,
which corrupts pages that were never mapped. In most cases this
triggers the num-pinned debugging:

  WARNING: CPU: 0 PID: 557 at drivers/iommu/iommufd/pages.c:294 __iopt_area_unfill_domain+0x152/0x560
  Modules linked in:
  CPU: 0 PID: 557 Comm: repro Not tainted 6.3.0-rc2-eeac8ede1755 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
  RIP: 0010:__iopt_area_unfill_domain+0x152/0x560
  Code: d2 0f ff 44 8b 64 24 54 48 8b 44 24 48 31 ff 44 89 e6 48 89 44 24 38 e8 fc d3 0f ff 45 85 e4 0f 85 eb 01 00 00 e8 0e d2 0f ff <0f> 0b e8 07 d2 0f ff 48 8b 44 24 38 89 5c 24 58 89 18 8b 44 24 54
  RSP: 0018:ffffc9000108baf0 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 00000000ffffffff RCX: ffffffff821e3f85
  RDX: 0000000000000000 RSI: ffff88800faf0000 RDI: 0000000000000002
  RBP: ffffc9000108bd18 R08: 000000000003ca25 R09: 0000000000000014
  R10: 000000000003ca00 R11: 0000000000000024 R12: 0000000000000004
  R13: 0000000000000801 R14: 00000000000007ff R15: 0000000000000800
  FS:  00007f3499ce1740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000020000243 CR3: 00000000179c2001 CR4: 0000000000770ef0
  PKRU: 55555554
  Call Trace:
   <TASK>
   iopt_area_unfill_domain+0x32/0x40
   iopt_table_remove_domain+0x23f/0x4c0
   iommufd_device_selftest_detach+0x3a/0x90
   iommufd_selftest_destroy+0x55/0x70
   iommufd_object_destroy_user+0xce/0x130
   iommufd_destroy+0xa2/0xc0
   iommufd_fops_ioctl+0x206/0x330
   __x64_sys_ioctl+0x10e/0x160
   do_syscall_64+0x3b/0x90
   entry_SYSCALL_64_after_hwframe+0x72/0xdc

Also add some useful WARN_ON sanity checks.

Cc: <stable@vger.kernel.org>
Fixes: 8d160cd4d5 ("iommufd: Algorithms for PFN storage")
Link: https://lore.kernel.org/r/2-v1-ceab6a4d7d7a+94-iommufd_syz_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-04 09:10:55 -03:00
Jason Gunthorpe
e439570133 iommufd: Check for uptr overflow
syzkaller found that setting up a map with a user VA that wraps past zero
can trigger WARN_ONs, particularly from pin_user_pages weirdly returning 0
due to invalid arguments.

Prevent creating a pages object with a uptr and size that would
mathematically overflow.

  WARNING: CPU: 0 PID: 518 at drivers/iommu/iommufd/pages.c:793 pfn_reader_user_pin+0x2e6/0x390
  Modules linked in:
  CPU: 0 PID: 518 Comm: repro Not tainted 6.3.0-rc2-eeac8ede1755+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
  RIP: 0010:pfn_reader_user_pin+0x2e6/0x390
  Code: b1 11 e9 25 fe ff ff e8 28 e4 0f ff 31 ff 48 89 de e8 2e e6 0f ff 48 85 db 74 0a e8 14 e4 0f ff e9 4d ff ff ff e8 0a e4 0f ff <0f> 0b bb f2 ff ff ff e9 3c ff ff ff e8 f9 e3 0f ff ba 01 00 00 00
  RSP: 0018:ffffc90000f9fa30 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff821e2b72
  RDX: 0000000000000000 RSI: ffff888014184680 RDI: 0000000000000002
  RBP: ffffc90000f9fa78 R08: 00000000000000ff R09: 0000000079de6f4e
  R10: ffffc90000f9f790 R11: ffff888014185418 R12: ffffc90000f9fc60
  R13: 0000000000000002 R14: ffff888007879800 R15: 0000000000000000
  FS:  00007f4227555740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000020000043 CR3: 000000000e748005 CR4: 0000000000770ef0
  PKRU: 55555554
  Call Trace:
   <TASK>
   pfn_reader_next+0x14a/0x7b0
   ? interval_tree_double_span_iter_update+0x11a/0x140
   pfn_reader_first+0x140/0x1b0
   iopt_pages_rw_slow+0x71/0x280
   ? __this_cpu_preempt_check+0x20/0x30
   iopt_pages_rw_access+0x2b2/0x5b0
   iommufd_access_rw+0x19f/0x2f0
   iommufd_test+0xd11/0x16f0
   ? write_comp_data+0x2f/0x90
   iommufd_fops_ioctl+0x206/0x330
   __x64_sys_ioctl+0x10e/0x160
   ? __pfx_iommufd_fops_ioctl+0x10/0x10
   do_syscall_64+0x3b/0x90
   entry_SYSCALL_64_after_hwframe+0x72/0xdc

Cc: <stable@vger.kernel.org>
Fixes: 8d160cd4d5 ("iommufd: Algorithms for PFN storage")
Link: https://lore.kernel.org/r/1-v1-ceab6a4d7d7a+94-iommufd_syz_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-04 09:10:55 -03:00
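A sketch of the kind of guard this adds, using the kernel's
check_add_overflow() helper (function name and placement illustrative):

  #include <linux/errno.h>
  #include <linux/overflow.h>

  /* Sketch: reject a user VA range whose last byte would wrap past zero. */
  static int sketch_check_uptr_range(unsigned long uptr, unsigned long length)
  {
          unsigned long last;

          if (!length)
                  return -EINVAL;
          if (check_add_overflow(uptr, length - 1, &last))
                  return -EOVERFLOW;
          return 0;
  }
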
Jason Gunthorpe
9fdf791612 Merge branch 'vfio_mdev_ops' into iommufd.git for-next
Yi Liu says

===================
The .bind_iommufd op of vfio emulated devices is either empty or does
nothing. This is different from the vfio physical devices; to add the
vfio device cdev, we need to make them act the same.

This series first makes the .bind_iommufd op of vfio emulated devices
create an iommufd_access, which introduces a new iommufd API. It then
lets drivers that do not provide a .bind_iommufd op use the vfio
emulated iommufd op set. This makes all vfio device drivers have
consistent iommufd operations, which is good for adding new device
uAPIs in the device cdev.
===================

* branch 'vfio_mdev_ops':
  vfio: Check the presence for iommufd callbacks in __vfio_register_dev()
  vfio/mdev: Uses the vfio emulated iommufd ops set in the mdev sample drivers
  vfio-iommufd: Make vfio_iommufd_emulated_bind() return iommufd_access ID
  vfio-iommufd: No need to record iommufd_ctx in vfio_device
  iommufd: Create access in vfio_iommufd_emulated_bind()
  iommu/iommufd: Pass iommufd_ctx pointer in iommufd_get_ioas()

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-31 13:43:57 -03:00
Yi Liu
632fda7f91 vfio-iommufd: Make vfio_iommufd_emulated_bind() return iommufd_access ID
The vfio device cdev needs to return the iommufd_access ID to
userspace if bind_iommufd succeeds.

Link: https://lore.kernel.org/r/20230327093351.44505-5-yi.l.liu@intel.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-31 13:43:32 -03:00
Nicolin Chen
54b47585db iommufd: Create access in vfio_iommufd_emulated_bind()
There is a need to create the iommufd_access prior to having an IOAS
and to set the IOAS later. For example, the vfio device cdev needs an
iommufd object to represent the bond (iommufd_access) and IOAS
replacement.

Move the iommufd_access_create() call into vfio_iommufd_emulated_bind(),
making it symmetric with the __vfio_iommufd_access_destroy() call in
vfio_iommufd_emulated_unbind(). This means an access is created/destroyed
by bind()/unbind(), and vfio_iommufd_emulated_attach_ioas() only
updates the access->ioas pointer.

Since vfio_iommufd_emulated_bind() does not provide ioas_id, drop it from
the argument list of iommufd_access_create(). Instead, add a new access
API iommufd_access_attach() to set the access->ioas pointer. Also, set
vdev->iommufd_attached accordingly, similar to the physical pathway.

Link: https://lore.kernel.org/r/20230327093351.44505-3-yi.l.liu@intel.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-31 13:43:31 -03:00
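A hedged sketch of the resulting bind-time flow; the ops layout and the
vfio_device field follow this series as described above, but treat the
names as illustrative rather than the exact merged code:

  #include <linux/err.h>
  #include <linux/iommufd.h>
  #include <linux/vfio.h>

  static void sketch_unmap(void *data, unsigned long iova, unsigned long length)
  {
          /* illustrative: invalidate any pins covering [iova, iova + length) */
  }

  static const struct iommufd_access_ops sketch_access_ops = {
          .needs_pin_pages = 1,
          .unmap = sketch_unmap,
  };

  /* Sketch: create the access at bind time; the IOAS is attached later. */
  static int sketch_emulated_bind(struct vfio_device *vdev,
                                  struct iommufd_ctx *ictx, u32 *out_id)
  {
          struct iommufd_access *access;

          access = iommufd_access_create(ictx, &sketch_access_ops, vdev, out_id);
          if (IS_ERR(access))
                  return PTR_ERR(access);
          vdev->iommufd_access = access;   /* field name per this series */
          return 0;
  }
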
Yi Liu
325de95029 iommu/iommufd: Pass iommufd_ctx pointer in iommufd_get_ioas()
No need to pass the iommufd_ucmd pointer.

Link: https://lore.kernel.org/r/20230327093351.44505-2-yi.l.liu@intel.com
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-29 16:52:41 -03:00
Jason Gunthorpe
fd8c1a4aee iommufd/selftest: Catch overflow of uptr and length
syzkaller hits a WARN_ON when trying to have a uptr close to UINTPTR_MAX:

  WARNING: CPU: 1 PID: 393 at drivers/iommu/iommufd/selftest.c:403 iommufd_test+0xb19/0x16f0
  Modules linked in:
  CPU: 1 PID: 393 Comm: repro Not tainted 6.2.0-c9c3395d5e3d #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
  RIP: 0010:iommufd_test+0xb19/0x16f0
  Code: 94 c4 31 ff 44 89 e6 e8 a5 54 17 ff 45 84 e4 0f 85 bb 0b 00 00 41 be fb ff ff ff e8 31 53 17 ff e9 a0 f7 ff ff e8 27 53 17 ff <0f> 0b 41 be 8
  RSP: 0018:ffffc90000eabdc0 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8214c487
  RDX: 0000000000000000 RSI: ffff88800f5c8000 RDI: 0000000000000002
  RBP: ffffc90000eabe48 R08: 0000000000000000 R09: 0000000000000001
  R10: 0000000000000001 R11: 0000000000000000 R12: 00000000cd2b0000
  R13: 00000000cd2af000 R14: 0000000000000000 R15: ffffc90000eabe68
  FS:  00007f94d76d5740(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000020000043 CR3: 0000000006880006 CR4: 0000000000770ee0
  PKRU: 55555554
  Call Trace:
   <TASK>
   ? write_comp_data+0x2f/0x90
   iommufd_fops_ioctl+0x1ef/0x310
   __x64_sys_ioctl+0x10e/0x160
   ? __pfx_iommufd_fops_ioctl+0x10/0x10
   do_syscall_64+0x3b/0x90
   entry_SYSCALL_64_after_hwframe+0x72/0xdc

Check that the user memory range doesn't overflow.

Fixes: f4b20bb34c ("iommufd: Add kernel support for testing iommufd")
Link: https://lore.kernel.org/r/0-v1-95390ed1df8d+8f-iommufd_mock_overflow_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Link: https://lore.kernel.org/r/Y/hOiilV1wJvu/Hv@xpf.sh.intel.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-10 15:29:59 -04:00
Jason Gunthorpe
65c619ae06 iommufd/selftest: Make selftest create a more complete mock device
iommufd wants to use more infrastructure, like the iommu_group, that the
mock device does not support. Create a more complete mock device that can
go through the whole cycle of ownership, blocking domain, and has an
iommu_group.

This requires creating a real struct device on a real bus to be able
to connect it to an iommu_group. Unfortunately we cannot formally
attach the mock iommu driver as an actual driver, as the iommu core
does not allow more than one driver or provide a general way for busses
to link to iommus. This can be solved with a little hack to open-code
the dev_iommus struct.

With this infrastructure, things work exactly the same as the normal
domain path, including the auto-domains mechanism and direct attach of
hwpts. As the created hwpt is now an autodomain, it is no longer
required to destroy it, and trying to do so will trigger a failure.

Link: https://lore.kernel.org/r/11-v3-ae9c2975a131+2e1e8-iommufd_hwpt_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-06 13:06:11 -04:00
Jason Gunthorpe
2cfdeaa07b iommufd/selftest: Rename the selftest 'device_id' to 'stdev_id'
It is too confusing now that we have the 'dev_id' as part of the main
interface. Make it clear this is the special selftest device object. This
object is analogous to the VFIO device FD.

Link: https://lore.kernel.org/r/7-v3-ae9c2975a131+2e1e8-iommufd_hwpt_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-06 10:51:58 -04:00
Jason Gunthorpe
339fbf3ae1 iommufd: Make iommufd_hw_pagetable_alloc() do iopt_table_add_domain()
The HWPT is always linked to an IOAS and once a HWPT exists its domain
should be fully mapped. This ended up being split up into device.c during
a two phase creation that was a bit confusing.

Move the iopt_table_add_domain() into iommufd_hw_pagetable_alloc() by
having it call back to device.c to complete the domain attach in the
required order.

Calling iommufd_hw_pagetable_alloc() with immediate_attach = false will
work on most drivers, but notably the SMMU drivers will fail because they
can't decide what kind of domain to create until they are attached. This
will be fixed when the domain_alloc function can take in a struct device.

Link: https://lore.kernel.org/r/6-v3-ae9c2975a131+2e1e8-iommufd_hwpt_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-06 10:51:57 -04:00
Jason Gunthorpe
7e7ec8a569 iommufd: Move iommufd_device to iommufd_private.h
hw_pagetable.c will need this in the next patches.

Link: https://lore.kernel.org/r/5-v3-ae9c2975a131+2e1e8-iommufd_hwpt_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-06 10:51:57 -04:00
Jason Gunthorpe
25cde97d95 iommufd: Move ioas related HWPT destruction into iommufd_hw_pagetable_destroy()
A HWPT is permanently associated with an IOAS when it is created;
remove the strange situation where a refcount != 0 HWPT can have been
disconnected from the IOAS, by putting all the IOAS-related destruction
in the object destroy function.

Initializing a HWPT has two stages: we have to allocate it, attach it
to a device, and then populate the domain. Once the domain is populated
it is fully linked to the IOAS.

Arrange things so that all the error unwinds flow through the
iommufd_hw_pagetable_destroy() and allow it to handle all cases.

Link: https://lore.kernel.org/r/4-v3-ae9c2975a131+2e1e8-iommufd_hwpt_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-06 10:51:57 -04:00
Jason Gunthorpe
342b9cab8e iommufd: Consistently manage hwpt_item
This should be added immediately after every iopt_table_add_domain(), and
deleted after every iopt_table_remove_domain() under the ioas->mutex.

Tidy things to be consistent.

Link: https://lore.kernel.org/r/3-v3-ae9c2975a131+2e1e8-iommufd_hwpt_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-06 10:51:57 -04:00
Jason Gunthorpe
7214c1c85f iommufd: Add iommufd_lock_obj() around the auto-domains hwpts
A later patch will require this locking - currently under the ioas mutex
the hwpt can not have a 0 reference and be on the list.

Link: https://lore.kernel.org/r/2-v3-ae9c2975a131+2e1e8-iommufd_hwpt_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-06 10:51:56 -04:00
Jason Gunthorpe
085fcc7eb7 iommufd: Assert devices_lock for iommufd_hw_pagetable_has_group()
The hwpt->devices list is locked by this; make that clearer.

Link: https://lore.kernel.org/r/1-v3-ae9c2975a131+2e1e8-iommufd_hwpt_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-06 10:51:56 -04:00
Jason Gunthorpe
b4ff830eca iommufd: Do not add the same hwpt to the ioas->hwpt_list twice
The hwpt is added to the hwpt_list only during its creation; it is
never added again. This hunk is a missed leftover from rework. Adding
it twice will corrupt the linked list in some cases.

It affects HWPT-specific attachment, which is something the test suite
cannot cover until we can create a legitimate struct device with a
non-system iommu "driver" (i.e. we need the bus removed from the iommu
code).

Cc: stable@vger.kernel.org
Fixes: e8d5721003 ("iommufd: Add kAPI toward external drivers for physical devices")
Link: https://lore.kernel.org/r/1-v1-4336b5cb2fe4+1d7-iommufd_hwpt_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-15 21:37:48 -04:00
Jason Gunthorpe
b3551ead61 iommufd: Make sure to zero vfio_iommu_type1_info before copying to user
A zero initialization was missed here. Most of the struct is filled
with a copy_from_user(); however, minsz for that copy is smaller than
the actual struct by 8 bytes, so we don't fill the padding.

Cc: stable@vger.kernel.org # 6.1+
Fixes: d624d6652a ("iommufd: vfio container FD ioctl compatibility")
Link: https://lore.kernel.org/r/0-v1-a74499ece799+1a-iommufd_get_info_leak_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: syzbot+cb1e0978f6bf46b83a58@syzkaller.appspotmail.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-14 16:49:55 -04:00
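The pattern being fixed, in sketch form (the struct comes from the VFIO
uAPI; the minsz handling and surrounding ioctl plumbing are
abbreviated):

  #include <linux/errno.h>
  #include <linux/uaccess.h>
  #include <linux/vfio.h>

  /* Sketch: zero the whole struct so the tail bytes that the partial
   * copy_from_user() never writes cannot leak kernel stack contents. */
  static int sketch_get_info(struct vfio_iommu_type1_info __user *arg,
                             unsigned long minsz)
  {
          struct vfio_iommu_type1_info info = {};

          if (minsz > sizeof(info))
                  return -EINVAL;
          if (copy_from_user(&info, arg, minsz))
                  return -EFAULT;
          /* ... fill in the reported fields ... */
          if (copy_to_user(arg, &info, minsz))
                  return -EFAULT;
          return 0;
  }
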
Jason Gunthorpe
bed9e516f1 Merge branch 'vfio-no-iommu' into iommufd.git for-next
Shared branch with VFIO for the no-iommu support.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-03 15:55:49 -04:00
Jason Gunthorpe
c9a397cee9 vfio: Support VFIO_NOIOMMU with iommufd
Add a small amount of emulation to vfio_compat to accept the SET_IOMMU to
VFIO_NOIOMMU_IOMMU and have vfio just ignore iommufd if it is working on a
no-iommu enabled device.

Move the enable_unsafe_noiommu_mode module out of container.c into
vfio_main.c so that it is always available even if VFIO_CONTAINER=n.

This passes Alex's mini-test:

https://github.com/awilliam/tests/blob/master/vfio-noiommu-pci-device-open.c

Link: https://lore.kernel.org/r/0-v3-480cd64a16f7+1ad0-iommufd_noiommu_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-03 15:45:23 -04:00
Jason Gunthorpe
fd9f2a9122 Merge branch 'iommu-memory-accounting' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/joro/iommu into iommufd/for-next
Jason Gunthorpe says:

====================
iommufd follows the same design as KVM and uses memory cgroups to limit
the amount of kernel memory a iommufd file descriptor can pin down. The
various internal data structures already use GFP_KERNEL_ACCOUNT to charge
its own memory.

However, one of the biggest consumers of kernel memory is the IOPTEs
stored under the iommu_domain and these allocations are not tracked.

This series is the first step in fixing it.

The iommu driver contract already includes a 'gfp' argument to the
map_pages op, allowing iommufd to specify GFP_KERNEL_ACCOUNT and then
having the driver allocate the IOPTE tables with that flag will capture a
significant amount of the allocations.

Update the iommu_map() API to pass in the GFP argument, and fix all call
sites. Replace iommu_map_atomic().

Audit the "enterprise" iommu drivers to make sure they do the right thing.
Intel and S390 ignore the GFP argument and always use GFP_ATOMIC. This is
problematic for iommufd anyhow, so fix it. AMD and ARM SMMUv2/3 are
already correct.

A follow up series will be needed to capture the allocations made when the
iommu_domain itself is allocated, which will complete the job.
====================

* 'iommu-memory-accounting' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/s390: Use GFP_KERNEL in sleepable contexts
  iommu/s390: Push the gfp parameter to the kmem_cache_alloc()'s
  iommu/intel: Use GFP_KERNEL in sleepable contexts
  iommu/intel: Support the gfp argument to the map_pages op
  iommu/intel: Add a gfp parameter to alloc_pgtable_page()
  iommufd: Use GFP_KERNEL_ACCOUNT for iommu_map()
  iommu/dma: Use the gfp parameter in __iommu_dma_alloc_noncontiguous()
  iommu: Add a gfp parameter to iommu_map_sg()
  iommu: Remove iommu_map_atomic()
  iommu: Add a gfp parameter to iommu_map()

Link: https://lore.kernel.org/linux-iommu/0-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-30 13:54:35 -04:00
Jason Gunthorpe
e787a38e31 iommufd: Use GFP_KERNEL_ACCOUNT for iommu_map()
iommufd follows the same design as KVM and uses memory cgroups to limit
the amount of kernel memory an iommufd file descriptor can pin down. The
various internal data structures already use GFP_KERNEL_ACCOUNT.

However, one of the biggest consumers of kernel memory is the IOPTEs
stored under the iommu_domain. Many drivers will allocate these at
iommu_map() time and will trivially do the right thing if we pass in
GFP_KERNEL_ACCOUNT.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/5-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-01-25 11:52:04 +01:00
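After iommu_map() gained a gfp argument (the iommu commit further down
this log), the call site can simply pass GFP_KERNEL_ACCOUNT; a sketch
with illustrative arguments:

  #include <linux/gfp.h>
  #include <linux/iommu.h>

  /* Sketch: charge the IOPTE allocations behind this mapping to the
   * caller's memory cgroup. */
  static int sketch_map_one(struct iommu_domain *domain, unsigned long iova,
                            phys_addr_t paddr, size_t size)
  {
          return iommu_map(domain, iova, paddr, size,
                           IOMMU_READ | IOMMU_WRITE, GFP_KERNEL_ACCOUNT);
  }
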
Jason Gunthorpe
1369459b2e iommu: Add a gfp parameter to iommu_map()
The internal mechanisms support this, but instead of exposing the gfp
to the caller it is wrapped inside iommu_map() and iommu_map_atomic().

Fix this instead of adding more variants for GFP_KERNEL_ACCOUNT.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: https://lore.kernel.org/r/1-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-01-25 11:52:00 +01:00
Yi Liu
84798f2849 iommufd: Add three missing structures in ucmd_buffer
struct iommu_ioas_copy, struct iommu_option and struct iommu_vfio_ioas are
missing from ucmd_buffer. Although they are smaller than the size of
ucmd_buffer, it is safer to list them in ucmd_buffer explicitly.

Fixes: aad37e71d5 ("iommufd: IOCTLs for the io_pagetable")
Fixes: d624d6652a ("iommufd: vfio container FD ioctl compatibility")
Link: https://lore.kernel.org/r/20230120122040.280219-1-yi.l.liu@intel.com
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-23 14:29:04 -04:00
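The shape of the fix, sketched; the real union in
drivers/iommu/iommufd/main.c lists many more entries, and the
union/member names here are illustrative while the struct types come
from the iommufd uAPI:

  #include <uapi/linux/iommufd.h>

  /* Sketch: every ioctl struct must appear in the union so the kernel-side
   * copy buffer is sized to hold the largest of them. */
  union sketch_ucmd_buffer {
          struct iommu_destroy destroy;
          struct iommu_ioas_alloc alloc;
          struct iommu_ioas_copy ioas_copy;   /* was missing */
          struct iommu_option option;         /* was missing */
          struct iommu_vfio_ioas vfio_ioas;   /* was missing */
          /* ... */
  };
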
Jason Gunthorpe
25fc417f79 iommufd: Convert to msi_device_has_isolated_msi()
Trivially use the new API.

Link: https://lore.kernel.org/r/4-v3-3313bb5dd3a3+10f11-secure_msi_jgg@nvidia.com
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-11 16:27:23 -04:00
Linus Torvalds
08cdc21579 iommufd for 6.2
iommufd is the user API to control the IOMMU subsystem as it relates to
 managing IO page tables that point at user space memory.
 
 It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
 container) which is the VFIO specific interface for a similar idea.
 
 We see a broad need for extended features, some being highly IOMMU device
 specific:
  - Binding iommu_domain's to PASID/SSID
  - Userspace IO page tables, for ARM, x86 and S390
  - Kernel bypassed invalidation of user page tables
  - Re-use of the KVM page table in the IOMMU
  - Dirty page tracking in the IOMMU
  - Runtime Increase/Decrease of IOPTE size
  - PRI support with faults resolved in userspace
 
 Many of these HW features exist to support VM use cases - for instance the
 combination of PASID, PRI and Userspace IO Page Tables allows an
 implementation of DMA Shared Virtual Addressing (vSVA) within a
 guest. Dirty tracking enables VM live migration with SRIOV devices and
 PASID support allows creating "scalable IOV" devices, among other things.
 
 As these features are fundamental to a VM platform they need to be
 uniformly exposed to all the driver families that do DMA into VMs, which
 is currently VFIO and VDPA.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY5ct7wAKCRCFwuHvBreF
 YZZ5AQDciXfcgXLt0UBEmWupNb0f/asT6tk717pdsKm8kAZMNAEAsIyLiKT5HqGl
 s7fAu+CQ1pr9+9NKGevD+frw8Solsw4=
 =jJkd
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd implementation from Jason Gunthorpe:
 "iommufd is the user API to control the IOMMU subsystem as it relates
  to managing IO page tables that point at user space memory.

  It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
  container) which is the VFIO specific interface for a similar idea.

  We see a broad need for extended features, some being highly IOMMU
  device specific:
   - Binding iommu_domain's to PASID/SSID
   - Userspace IO page tables, for ARM, x86 and S390
   - Kernel bypassed invalidation of user page tables
   - Re-use of the KVM page table in the IOMMU
   - Dirty page tracking in the IOMMU
   - Runtime Increase/Decrease of IOPTE size
   - PRI support with faults resolved in userspace

  Many of these HW features exist to support VM use cases - for instance
  the combination of PASID, PRI and Userspace IO Page Tables allows an
  implementation of DMA Shared Virtual Addressing (vSVA) within a guest.
  Dirty tracking enables VM live migration with SRIOV devices and PASID
   support allows creating "scalable IOV" devices, among other things.

  As these features are fundamental to a VM platform they need to be
  uniformly exposed to all the driver families that do DMA into VMs,
  which is currently VFIO and VDPA"

For more background, see the extended explanations in Jason's pull request:

  https://lore.kernel.org/lkml/Y5dzTU8dlmXTbzoJ@nvidia.com/

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (62 commits)
  iommufd: Change the order of MSI setup
  iommufd: Improve a few unclear bits of code
  iommufd: Fix comment typos
  vfio: Move vfio group specific code into group.c
  vfio: Refactor dma APIs for emulated devices
  vfio: Wrap vfio group module init/clean code into helpers
  vfio: Refactor vfio_device open and close
  vfio: Make vfio_device_open() truly device specific
  vfio: Swap order of vfio_device_container_register() and open_device()
  vfio: Set device->group in helper function
  vfio: Create wrappers for group register/unregister
  vfio: Move the sanity check of the group to vfio_create_group()
  vfio: Simplify vfio_create_group()
  iommufd: Allow iommufd to supply /dev/vfio/vfio
  vfio: Make vfio_container optionally compiled
  vfio: Move container related MODULE_ALIAS statements into container.c
  vfio-iommufd: Support iommufd for emulated VFIO devices
  vfio-iommufd: Support iommufd for physical VFIO devices
  vfio-iommufd: Allow iommufd to be used in place of a container fd
  vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent()
  ...
2022-12-14 09:15:43 -08:00
Jason Gunthorpe
d6c55c0a20 iommufd: Change the order of MSI setup
Eric points out this is wrong for the rare case of someone using
allow_unsafe_interrupts on ARM. We always have to setup the MSI window in
the domain if the iommu driver asks for it.

Move the iommu_get_msi_cookie() setup to the top of the function and
always do it, regardless of the security mode. Add checks to
iommufd_device_setup_msi() to ensure the driver is not doing something
incomprehensible. No current driver will set both a HW and SW MSI window,
or have more than one SW MSI window.

Fixes: e8d5721003 ("iommufd: Add kAPI toward external drivers for physical devices")
Link: https://lore.kernel.org/r/3-v1-0362a1a1c034+98-iommufd_fixes1_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 15:24:30 -04:00
Jason Gunthorpe
a26fa39206 iommufd: Improve a few unclear bits of code
Correct a few items noticed late in review:

 - We should assert that the math in batch_clear_carry() doesn't underflow

 - user->locked should be -1 not 0 since we just did mmput

 - npages should not have been recalculated, it already has that value

No functional change.

Fixes: 8d160cd4d5 ("iommufd: Algorithms for PFN storage")
Link: https://lore.kernel.org/r/2-v1-0362a1a1c034+98-iommufd_fixes1_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reported-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 15:20:37 -04:00
Jason Gunthorpe
c9b8a83a8f iommufd: Fix comment typos
Repair some typos in comments that were noticed late in the review
cycle.

Fixes: f394576eb1 ("iommufd: PFN handling for iopt_pages")
Link: https://lore.kernel.org/r/1-v1-0362a1a1c034+98-iommufd_fixes1_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reported-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 15:20:37 -04:00
Jason Gunthorpe
01f70cbb26 iommufd: Allow iommufd to supply /dev/vfio/vfio
If the VFIO container is compiled out, give a kconfig option for iommufd
to provide the miscdev node with the same name and permissions as vfio
uses.

The compatibility node supports the same ioctls as VFIO and automatically
enables the VFIO compatible pinned page accounting mode.

Link: https://lore.kernel.org/r/10-v4-42cd2eb0e3eb+335a-vfio_iommufd_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yu He <yu.he@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-02 11:52:04 -04:00