Thinh Tran d27e2da94a net/bnx2x: Prevent access to a freed page in page_pool
Fix race condition leading to system crash during EEH error handling

During EEH error recovery, the bnx2x driver's transmit timeout logic
can race with the EEH reset path. bnx2x_tx_timeout() schedules reset
tasks via bnx2x_sp_rtnl_task(), which ultimately leads to
bnx2x_nic_unload(). In bnx2x_nic_unload(), SGEs are freed using
bnx2x_free_rx_sge_range(). However, this can overlap with the EEH
driver's attempt to reset the device using bnx2x_io_slot_reset(),
which also tries to free SGEs. The race can crash the system by
accessing freed memory locations in bnx2x_free_rx_sge():

static inline void bnx2x_free_rx_sge(struct bnx2x *bp,
				struct bnx2x_fastpath *fp, u16 index)
{
	struct sw_rx_page *sw_buf = &fp->rx_page_ring[index];
	struct page *page = sw_buf->page;
....
where sw_buf was set to NULL after the call to dma_unmap_page()
by the preceding thread.

    EEH: Beginning: 'slot_reset'
    PCI 0011:01:00.0#10000: EEH: Invoking bnx2x->slot_reset()
    bnx2x: [bnx2x_io_slot_reset:14228(eth1)]IO slot reset initializing...
    bnx2x 0011:01:00.0: enabling device (0140 -> 0142)
    bnx2x: [bnx2x_io_slot_reset:14244(eth1)]IO slot reset --> driver unload
    Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
    BUG: Kernel NULL pointer dereference on read at 0x00000000
    Faulting instruction address: 0xc0080000025065fc
    Oops: Kernel access of bad area, sig: 11 [#1]
    .....
    Call Trace:
    [c000000003c67a20] [c00800000250658c] bnx2x_io_slot_reset+0x204/0x610 [bnx2x] (unreliable)
    [c000000003c67af0] [c0000000000518a8] eeh_report_reset+0xb8/0xf0
    [c000000003c67b60] [c000000000052130] eeh_pe_report+0x180/0x550
    [c000000003c67c70] [c00000000005318c] eeh_handle_normal_event+0x84c/0xa60
    [c000000003c67d50] [c000000000053a84] eeh_event_handler+0xf4/0x170
    [c000000003c67da0] [c000000000194c58] kthread+0x1c8/0x1d0
    [c000000003c67e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64

To solve this issue, we need to verify page pool allocations before
freeing.
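
As a rough illustration of that idea, here is a user-space sketch of a
teardown routine that checks for a live allocation before freeing, so a
second caller finds nothing left to do. It is not the driver patch: every
model_* name and structure is hypothetical, malloc()/free() stand in for
page allocation and dma_unmap_page()/put_page(), and the model only covers
two teardown calls running one after the other. It does not add locking
and so does not address truly concurrent callers.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver structures (not the bnx2x types). */
struct model_rx_slot {
	void *page;			/* NULL once this slot has been freed */
};

struct model_fastpath {
	void *pool_page;		/* NULL once the pool has been released */
	struct model_rx_slot *ring;
	size_t ring_len;
	unsigned int frees;		/* number of slot frees actually performed */
};

/* Allocate a ring of n slots plus one pool page; returns 0 on success. */
static int model_fastpath_init(struct model_fastpath *fp, size_t n)
{
	size_t i;

	fp->frees = 0;
	fp->ring_len = n;
	fp->pool_page = malloc(16);
	fp->ring = calloc(n, sizeof(*fp->ring));
	if (!fp->pool_page || !fp->ring)
		return -1;
	for (i = 0; i < n; i++) {
		fp->ring[i].page = malloc(16);
		if (!fp->ring[i].page)
			return -1;
	}
	return 0;
}

/* Free one slot, skipping slots that hold no page. */
static void model_free_slot(struct model_fastpath *fp, size_t index)
{
	struct model_rx_slot *slot = &fp->ring[index];

	if (!slot->page)
		return;
	free(slot->page);
	slot->page = NULL;
	fp->frees++;
}

/* Free the first `last` slots, but only if the pool is still allocated. */
static void model_free_range(struct model_fastpath *fp, size_t last)
{
	size_t i;

	if (!fp->pool_page)		/* already torn down by an earlier caller */
		return;
	for (i = 0; i < last && i < fp->ring_len; i++)
		model_free_slot(fp, i);
	free(fp->pool_page);
	fp->pool_page = NULL;
}
```

Calling model_free_range() twice on the same model_fastpath frees each
slot once; the second call returns at the pool check without touching
the ring.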

Fixes: 4cace675d687 ("bnx2x: Alloc 4k fragment for each rx ring buffer element")
Signed-off-by: Thinh Tran <thinhtr@linux.ibm.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240315205535.1321-1-thinhtr@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-19 19:34:27 -07:00