drm/amdkfd: fix restore worker race condition

[ Upstream commit f7646585a30ed8ef5ab300d4dc3b0c1d6afbe71d ]

In the free-memory-of-GPU path, first remove the BO from validate_list
so that the restore worker no longer accesses the BO, then unregister
the BO's MMU interval notifier. Otherwise, the restore worker can crash
in the middle of validating BO user pages if the MMU interval notifier
is already gone.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Philip Yang 2020-05-21 09:56:58 -04:00 committed by Greg Kroah-Hartman
parent 180e60f154
commit 62962e08b9

@@ -1247,15 +1247,15 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
 	 * be freed anyway
 	 */
 
-	/* No more MMU notifiers */
-	amdgpu_mn_unregister(mem->bo);
-
 	/* Make sure restore workers don't access the BO any more */
 	bo_list_entry = &mem->validate_list;
 	mutex_lock(&process_info->lock);
 	list_del(&bo_list_entry->head);
 	mutex_unlock(&process_info->lock);
 
+	/* No more MMU notifiers */
+	amdgpu_mn_unregister(mem->bo);
+
 	ret = reserve_bo_and_cond_vms(mem, NULL, BO_VM_ALL, &ctx);
 	if (unlikely(ret))
 		return ret;
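
The ordering this hunk enforces can be illustrated with a small userspace analogy. This is only a sketch under stated assumptions, not the amdgpu code: `struct buffer`, `struct process_info`, `track_buffer`, `restore_pass` and `free_buffer` are hypothetical names, a pthread mutex stands in for `process_info->lock`, and a boolean flag stands in for the MMU interval notifier registration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a BO with a userptr notifier. */
struct buffer {
	struct buffer *next;
	bool notifier_registered;	/* models the MMU interval notifier */
};

/* Hypothetical stand-in for the per-process validate list and its lock. */
struct process_info {
	pthread_mutex_t lock;
	struct buffer *validate_list;
};

/* Register the notifier, then publish the buffer on the list. */
static void track_buffer(struct process_info *pi, struct buffer *b)
{
	b->notifier_registered = true;
	pthread_mutex_lock(&pi->lock);
	b->next = pi->validate_list;
	pi->validate_list = b;
	pthread_mutex_unlock(&pi->lock);
}

/*
 * Models one restore-worker pass: walk the list under the lock and
 * count listed buffers whose notifier is gone. A non-zero result is
 * the situation that this analogy treats as the crash.
 */
static int restore_pass(struct process_info *pi)
{
	int stale = 0;

	pthread_mutex_lock(&pi->lock);
	for (struct buffer *b = pi->validate_list; b; b = b->next)
		if (!b->notifier_registered)
			stale++;
	pthread_mutex_unlock(&pi->lock);
	return stale;
}

/* Teardown in the fixed order: unlink first, drop the notifier second. */
static void free_buffer(struct process_info *pi, struct buffer *victim)
{
	assert(victim->notifier_registered);

	/* Step 1: make sure the worker can no longer reach the buffer. */
	pthread_mutex_lock(&pi->lock);
	for (struct buffer **pp = &pi->validate_list; *pp; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			break;
		}
	}
	pthread_mutex_unlock(&pi->lock);

	/* Step 2: only now is it safe to drop the notifier. */
	victim->notifier_registered = false;
}
```

Because `restore_pass` holds the lock for its whole walk, a buffer is either still listed with its notifier intact or no longer visible at all. Swapping the two steps in `free_buffer` would reopen a window in which a listed buffer has no notifier, which is the window the original ordering left open.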