linux

Jacob Keller 96fdd1f6b4 ice: fix LAG and VF lock dependency in ice_reset_vf()

Since commit 9f74a3dfcf83 ("ice: Fix VF Reset paths when interface in a failed over
aggregate"), the ice driver has acquired the LAG mutex in ice_reset_vf().
The commit placed this lock acquisition just prior to the acquisition of
the VF configuration lock.

If ice_reset_vf() acquires the configuration lock via the ICE_VF_RESET_LOCK
flag, this could deadlock with ice_vc_cfg_qs_msg() because it always
acquires the locks in the order of the VF configuration lock and then the
LAG mutex.
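
The inversion described above can be illustrated with a toy lock-order recorder. This is only a sketch of the idea behind such a check, not lockdep's implementation: the enum names, `record_order()`, and `replay_report()` are hypothetical, and the table only catches a direct two-lock inversion rather than longer dependency chains.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes, named after the two locks involved. */
enum lock_class { CFG_LOCK, LAG_MUTEX, NUM_CLASSES };

/* order_seen[a][b] is true once some path took b while holding a. */
static bool order_seen[NUM_CLASSES][NUM_CLASSES];

/*
 * Record "held, then next". Returns false if the opposite order was
 * already recorded, meaning the two paths can deadlock (ABBA).
 */
static bool record_order(enum lock_class held, enum lock_class next)
{
	assert(held < NUM_CLASSES && next < NUM_CLASSES);

	if (order_seen[next][held])
		return false;
	order_seen[held][next] = true;
	return true;
}

/* Replays both paths; returns true if the second one is flagged. */
static bool replay_report(void)
{
	/* ice_vc_cfg_qs_msg(): configuration lock, then LAG mutex */
	bool vc_ok = record_order(CFG_LOCK, LAG_MUTEX);
	/* ice_reset_vf() before the fix: LAG mutex, then configuration lock */
	bool reset_ok = record_order(LAG_MUTEX, CFG_LOCK);

	return vc_ok && !reset_ok;
}
```

Calling `replay_report()` on a fresh table returns true: the first ordering is accepted and the reversed one is rejected.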

Lockdep reports this violation almost immediately on creating and then
removing 2 VFs:

======================================================
WARNING: possible circular locking dependency detected
6.8.0-rc6 #54 Tainted: G        W  O
------------------------------------------------------
kworker/60:3/6771 is trying to acquire lock:
ff40d43e099380a0 (&vf->cfg_lock){+.+.}-{3:3}, at: ice_reset_vf+0x22f/0x4d0 [ice]

but task is already holding lock:
ff40d43ea1961210 (&pf->lag_mutex){+.+.}-{3:3}, at: ice_reset_vf+0xb7/0x4d0 [ice]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&pf->lag_mutex){+.+.}-{3:3}:
       __lock_acquire+0x4f8/0xb40
       lock_acquire+0xd4/0x2d0
       __mutex_lock+0x9b/0xbf0
       ice_vc_cfg_qs_msg+0x45/0x690 [ice]
       ice_vc_process_vf_msg+0x4f5/0x870 [ice]
       __ice_clean_ctrlq+0x2b5/0x600 [ice]
       ice_service_task+0x2c9/0x480 [ice]
       process_one_work+0x1e9/0x4d0
       worker_thread+0x1e1/0x3d0
       kthread+0x104/0x140
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x1b/0x30

-> #0 (&vf->cfg_lock){+.+.}-{3:3}:
       check_prev_add+0xe2/0xc50
       validate_chain+0x558/0x800
       __lock_acquire+0x4f8/0xb40
       lock_acquire+0xd4/0x2d0
       __mutex_lock+0x9b/0xbf0
       ice_reset_vf+0x22f/0x4d0 [ice]
       ice_process_vflr_event+0x98/0xd0 [ice]
       ice_service_task+0x1cc/0x480 [ice]
       process_one_work+0x1e9/0x4d0
       worker_thread+0x1e1/0x3d0
       kthread+0x104/0x140
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x1b/0x30

other info that might help us debug this:
 Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&pf->lag_mutex);
                               lock(&vf->cfg_lock);
                               lock(&pf->lag_mutex);
  lock(&vf->cfg_lock);

 *** DEADLOCK ***
4 locks held by kworker/60:3/6771:
 #0: ff40d43e05428b38 ((wq_completion)ice){+.+.}-{0:0}, at: process_one_work+0x176/0x4d0
 #1: ff50d06e05197e58 ((work_completion)(&pf->serv_task)){+.+.}-{0:0}, at: process_one_work+0x176/0x4d0
 #2: ff40d43ea1960e50 (&pf->vfs.table_lock){+.+.}-{3:3}, at: ice_process_vflr_event+0x48/0xd0 [ice]
 #3: ff40d43ea1961210 (&pf->lag_mutex){+.+.}-{3:3}, at: ice_reset_vf+0xb7/0x4d0 [ice]

stack backtrace:
CPU: 60 PID: 6771 Comm: kworker/60:3 Tainted: G        W  O       6.8.0-rc6 #54
Hardware name:
Workqueue: ice ice_service_task [ice]
Call Trace:
 <TASK>
 dump_stack_lvl+0x4a/0x80
 check_noncircular+0x12d/0x150
 check_prev_add+0xe2/0xc50
 ? save_trace+0x59/0x230
 ? add_chain_cache+0x109/0x450
 validate_chain+0x558/0x800
 __lock_acquire+0x4f8/0xb40
 ? lockdep_hardirqs_on+0x7d/0x100
 lock_acquire+0xd4/0x2d0
 ? ice_reset_vf+0x22f/0x4d0 [ice]
 ? lock_is_held_type+0xc7/0x120
 __mutex_lock+0x9b/0xbf0
 ? ice_reset_vf+0x22f/0x4d0 [ice]
 ? ice_reset_vf+0x22f/0x4d0 [ice]
 ? rcu_is_watching+0x11/0x50
 ? ice_reset_vf+0x22f/0x4d0 [ice]
 ice_reset_vf+0x22f/0x4d0 [ice]
 ? process_one_work+0x176/0x4d0
 ice_process_vflr_event+0x98/0xd0 [ice]
 ice_service_task+0x1cc/0x480 [ice]
 process_one_work+0x1e9/0x4d0
 worker_thread+0x1e1/0x3d0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x104/0x140
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x31/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>

To avoid deadlock, we must acquire the LAG mutex only after acquiring the
VF configuration lock. Fix ice_reset_vf() to acquire the LAG mutex only
after we either acquire or check that the VF configuration lock is held.
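
The corrected ordering can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the driver code: `RESET_LOCK` stands in for ICE_VF_RESET_LOCK, the pthread mutexes stand in for the VF configuration lock and the LAG mutex, and the `cfg_held` flag is a crude, single-threaded substitute for a held-lock check. All function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define RESET_LOCK 0x1 /* hypothetical stand-in for ICE_VF_RESET_LOCK */

static pthread_mutex_t cfg_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lag_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool cfg_held; /* crude held-lock tracking, for illustration only */

static void cfg_acquire(void)
{
	pthread_mutex_lock(&cfg_lock);
	cfg_held = true;
}

static void cfg_release(void)
{
	cfg_held = false;
	pthread_mutex_unlock(&cfg_lock);
}

/* Configuration lock first (taken or verified), LAG mutex second. */
static int reset_vf_sketch(unsigned int flags)
{
	if (flags & RESET_LOCK)
		cfg_acquire();
	else
		assert(cfg_held); /* caller must already hold cfg_lock */

	pthread_mutex_lock(&lag_mutex);
	/* ... reset work would happen here ... */
	pthread_mutex_unlock(&lag_mutex);

	if (flags & RESET_LOCK)
		cfg_release();
	return 0;
}
```

Both entry modes now agree with the order used by the message handler: `reset_vf_sketch(RESET_LOCK)` takes both locks itself, while a caller that already holds the configuration lock passes 0 and only the LAG mutex is taken inside.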

Fixes: 9f74a3dfcf83 ("ice: Fix VF Reset paths when interface in a failed over aggregate")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Dave Ertman <david.m.ertman@intel.com>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240423182723.740401-5-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2024-04-25 08:20:55 -07:00

arch

Misc x86 fixes:

2024-04-14 10:48:51 -07:00

block

block-6.9-20240412

2024-04-12 10:22:33 -07:00

certs

This update includes the following changes:

2023-11-02 16:15:30 -10:00

crypto

This push fixes a regression that broke iwd as well as a divide by

2024-03-25 10:48:23 -07:00

Documentation

pwm: Another batch of fixes targeting v6.9-rc5

2024-04-17 10:04:40 -07:00

drivers

ice: fix LAG and VF lock dependency in ice_reset_vf()

2024-04-25 08:20:55 -07:00

for-6.9-rc4-tag

2024-04-17 18:25:40 -07:00

include

ethernet: Add helper for assigning packet type when dest address does not match device address

2024-04-25 08:20:54 -07:00

init

fs/proc: Skip bootloader comment if no embedded kernel parameters

2024-04-09 23:36:18 +09:00

io_uring

io_uring/net: restore msg_control on sendzc retry

2024-04-08 21:48:41 -06:00

ipc

sysctl changes for v6.9-rc1

2024-03-18 14:59:13 -07:00

kernel

Misc x86 fixes:

2024-04-14 10:48:51 -07:00

lib

Including fixes from bluetooth.

2024-04-11 11:46:31 -07:00

LICENSES

LICENSES: Add the copyleft-next-0.3.1 license

2022-11-08 15:44:01 +01:00

x86/mm/pat: fix VM_PAT handling in COW mappings

2024-04-05 11:21:31 -07:00

net

ethernet: Add helper for assigning packet type when dest address does not match device address

2024-04-25 08:20:54 -07:00

rust

Kbuild updates for v6.9

2024-03-21 14:41:00 -07:00

samples

Tracing updates for 6.9:

2024-03-18 15:11:44 -07:00

scripts

hardening fixes for v6.9-rc4

2024-04-10 13:31:34 -07:00

security

security: Place security_path_post_mknod() where the original IMA call was

2024-04-03 10:21:32 -07:00

sound

ASoC: Fixes for v6.9

2024-04-05 08:48:12 +02:00

tools

tools: ynl: don't ignore errors in NLMSG_DONE messages

2024-04-23 15:37:33 +02:00

usr

Kbuild updates for v6.8

2024-01-18 17:57:07 -08:00

virt

KVM Xen and pfncache changes for 6.9:

2024-03-11 10:42:55 -04:00

.clang-format

clang-format: Update with v6.7-rc4's for_each macro list

2023-12-08 23:54:38 +01:00

.cocciconfig

…

.editorconfig

Add .editorconfig file for basic formatting

2023-12-28 16:22:47 +09:00

.get_maintainer.ignore

Add Jeff Kirsher to .get_maintainer.ignore

2024-03-08 11:36:54 +00:00

.gitattributes

.gitattributes: set diff driver for Rust source code files

2023-05-31 17:48:25 +02:00

.gitignore

kbuild: create a list of all built DTB files

2024-02-19 18:20:39 +09:00

.mailmap

mailmap: add entries for Alex Elder

2024-04-22 11:13:45 +01:00

.rustfmt.toml

rust: add .rustfmt.toml

2022-09-28 09:02:20 +02:00

COPYING

…

CREDITS

MAINTAINERS: Drop Gustavo Pimentel as PCI DWC Maintainer

2024-03-27 13:41:02 -05:00

Kbuild

Kbuild updates for v6.1

2022-10-10 12:00:45 -07:00

Kconfig

kbuild: ensure full rebuild when the compiler is updated

2020-05-12 13:28:33 +09:00

MAINTAINERS

MAINTAINERS: eth: mark IBM eHEA as an Orphan

2024-04-22 14:15:34 -07:00

Makefile

Linux 6.9-rc4

2024-04-14 13:38:39 -07:00

README

README: Fix spelling

2024-03-18 03:36:32 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result from upgrading your kernel.

Languages

C 97.6%

Assembly 1%

Shell 0.5%

Python 0.3%

Makefile 0.3%