linux/drivers
Mike Marciniszyn 9a293d1e21 IB/hfi1: Ensure pq is not left on waitlist
The following warning can occur when a pq is left on the dmawait list and
the pq is then freed:

  WARNING: CPU: 47 PID: 3546 at lib/list_debug.c:29 __list_add+0x65/0xc0
  list_add corruption. next->prev should be prev (ffff939228da1880), but was ffff939cabb52230. (next=ffff939cabb52230).
  Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ast ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev drm_panel_orientation_quirks i2c_i801 mei_me lpc_ich mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables
  nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci libahci i2c_algo_bit dca libata ptp pps_core crc32c_intel [last unloaded: i2c_algo_bit]
  CPU: 47 PID: 3546 Comm: wrf.exe Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.41.1.el7.x86_64 #1
  Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019
  Call Trace:
  [<ffffffff91f65ac0>] dump_stack+0x19/0x1b
  [<ffffffff91898b78>] __warn+0xd8/0x100
  [<ffffffff91898bff>] warn_slowpath_fmt+0x5f/0x80
  [<ffffffff91a1dabe>] ? ___slab_alloc+0x24e/0x4f0
  [<ffffffff91b97025>] __list_add+0x65/0xc0
  [<ffffffffc03926a5>] defer_packet_queue+0x145/0x1a0 [hfi1]
  [<ffffffffc0372987>] sdma_check_progress+0x67/0xa0 [hfi1]
  [<ffffffffc03779d2>] sdma_send_txlist+0x432/0x550 [hfi1]
  [<ffffffff91a20009>] ? kmem_cache_alloc+0x179/0x1f0
  [<ffffffffc0392973>] ? user_sdma_send_pkts+0xc3/0x1990 [hfi1]
  [<ffffffffc0393e3a>] user_sdma_send_pkts+0x158a/0x1990 [hfi1]
  [<ffffffff918ab65e>] ? try_to_del_timer_sync+0x5e/0x90
  [<ffffffff91a3fe1a>] ? __check_object_size+0x1ca/0x250
  [<ffffffffc0395546>] hfi1_user_sdma_process_request+0xd66/0x1280 [hfi1]
  [<ffffffffc034e0da>] hfi1_aio_write+0xca/0x120 [hfi1]
  [<ffffffff91a4245b>] do_sync_readv_writev+0x7b/0xd0
  [<ffffffff91a4409e>] do_readv_writev+0xce/0x260
  [<ffffffff918df69f>] ? pick_next_task_fair+0x5f/0x1b0
  [<ffffffff918db535>] ? sched_clock_cpu+0x85/0xc0
  [<ffffffff91f6b16a>] ? __schedule+0x13a/0x860
  [<ffffffff91a442c5>] vfs_writev+0x35/0x60
  [<ffffffff91a4447f>] SyS_writev+0x7f/0x110
  [<ffffffff91f78ddb>] system_call_fastpath+0x22/0x27

The issue happens when wait_event_interruptible_timeout() returns a value
<= 0.

In that case, the pq is left on the list. The code continues sending
packets and potentially can complete the current request with the pq still
on the dmawait list provided no descriptor shortage is seen.

If the pq is torn down in that state, the sdma interrupt handler could
find the now freed pq on the list with list corruption or memory
corruption resulting.

Fix by adding a flush routine to ensure that the pq is never on a list
after processing a request.

A follow-up patch series will address issues with seqlock surfaced in:
https://lore.kernel.org/r/20200320003129.GP20941@ziepe.ca

The seqlock use for sdma will then be converted to a spin lock since the
list_empty() doesn't need the protection afforded by the sequence lock
currently in use.

Fixes: a0d406934a ("staging/rdma/hfi1: Add page lock limit check for SDMA requests")
Link: https://lore.kernel.org/r/20200320200200.23203.37777.stgit@awfm-01.aw.intel.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-23 21:57:57 -03:00
..
accessibility
acpi ACPI fixes for 5.6-rc4 2020-02-28 09:02:18 -08:00
amba
android binder: prevent UAF for binderfs devices II 2020-03-03 19:58:37 +01:00
ata libata-5.6-2020-02-05 2020-02-06 06:11:50 +00:00
atm Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-01-28 16:02:33 -08:00
auxdisplay
base Driver core / debugfs fixes for 5.6-rc5 2020-03-08 10:39:40 -05:00
bcma Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-01-28 16:02:33 -08:00
block xen: branch for v5.6-rc5 2020-03-07 08:04:54 -06:00
bluetooth Bluetooth: btrtl: Use kvmalloc for FW allocations 2020-01-24 19:57:53 +01:00
bus Few fixes for omaps for v5.6-rc cycle 2020-02-29 11:47:44 -08:00
cdrom scsi: compat_ioctl: cdrom: Replace .ioctl with .compat_ioctl in four appropriate places 2020-02-24 15:06:07 -05:00
char tpm: Initialize crypto_id of allocated_banks to HASH_ALGO__LAST 2020-02-17 20:47:06 +02:00
clk ARM: SoC: late updates 2020-02-08 14:17:27 -08:00
clocksource ARM: SoC: late updates 2020-02-08 14:17:27 -08:00
connector
counter
cpufreq cpufreq: Fix policy initialization for internal governor drivers 2020-02-27 08:57:48 +01:00
cpuidle ARM: SoC-related driver updates 2020-02-08 14:04:19 -08:00
crypto Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-01-28 16:02:33 -08:00
dax dax: Get rid of fs_dax_get_by_host() helper 2020-01-16 09:52:27 -08:00
dca
devfreq Revert "PM / devfreq: Modify the device name as devfreq(X) for sysfs" 2020-02-24 11:14:29 +09:00
dio
dma dmaengine: imx-sdma: Fix the event id check to include RX event for UART6 2020-02-25 14:15:26 +05:30
dma-buf dma-buf: free dmabuf->name in dma_buf_release() 2020-02-27 18:01:58 +05:30
edac EDAC/synopsys: Do not print an error with back-to-back snprintf() calls 2020-02-27 16:44:25 +01:00
eisa
extcon
firewire
firmware ARM: SoC fixes 2020-03-08 17:36:22 -07:00
fpga
fsi fsi: aspeed: add unspecified HAS_IOMEM dependency 2020-02-10 13:45:49 -08:00
gnss
gpio gpio: sifive: fix static checker warning 2020-02-10 13:54:17 +01:00
gpu Merge tag 'amd-drm-fixes-5.6-2020-03-05' of git://people.freedesktop.org/~agd5f/linux into drm-fixes 2020-03-06 11:06:33 +10:00
greybus
hid HID: hyperv: NULL check before some freeing functions is not needed. 2020-03-05 14:17:11 +00:00
hsi
hv - Most of the commits here are work to enable host-initiated hibernation 2020-02-03 14:42:03 +00:00
hwmon hwmon: (adt7462) Fix an error return in ADT7462_REG_VOLT() 2020-03-03 12:42:55 -08:00
hwspinlock hwspinlock: sirf: Use devm_hwspin_lock_register() to register hwlock controller 2020-01-21 16:16:36 -08:00
hwtracing coresight: etm4x: Fix unused function warning 2020-01-14 15:38:28 +01:00
i2c i2c: altera: Fix potential integer overflow 2020-02-13 09:29:30 +01:00
i3c i3c: master: dw: reattach device on first available location of address table 2020-01-13 10:00:05 +01:00
ide scsi: compat_ioctl: cdrom: Replace .ioctl with .compat_ioctl in four appropriate places 2020-02-24 15:06:07 -05:00
idle intel_idle: Introduce 'states_off' module parameter 2020-02-03 11:57:18 +01:00
iio chrome platform changes for 5.6 2020-02-04 07:17:41 +00:00
infiniband IB/hfi1: Ensure pq is not left on waitlist 2020-03-23 21:57:57 -03:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2020-02-15 16:49:25 -08:00
interconnect interconnect: Handle memory allocation errors 2020-03-03 08:02:57 +01:00
iommu iommu/arm-smmu: Restore naming of driver parameter prefix 2020-02-19 12:03:21 +01:00
ipack
irqchip irqchip/gic-v4.1: Avoid 64bit division for the sake of 32bit ARM 2020-02-09 15:47:37 -08:00
isdn proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
leds leds: lm3532: add pointer to documentation and fix typo 2020-01-22 21:08:24 +01:00
lightnvm
macintosh macintosh: therm_windtunnel: fix regression when instantiating devices 2020-02-29 21:13:22 +01:00
mailbox
mcb
md block-5.6-2020-03-07 2020-03-07 14:14:38 -06:00
media media: mc-entity.c: use & to check pad flags, not == 2020-02-24 15:10:04 +01:00
memory mvebu drivers for 5.6 (part 1) 2020-01-16 10:45:44 -08:00
memstick
message Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net 2020-01-19 22:10:04 +01:00
mfd chrome platform changes for 5.6 2020-02-04 07:17:41 +00:00
misc altera-stapl: altera_get_note: prevent write beyond end of 'key' 2020-03-03 08:02:57 +01:00
mmc ioremap changes for 5.6 2020-01-27 13:03:00 -08:00
mtd treewide: remove redundant IS_ERR() before error code check 2020-02-04 03:05:27 +00:00
mux
net net: dsa: mv88e6xxx: Fix masking of egress port 2020-02-27 12:29:09 -08:00
nfc nfc: pn544: Fix occasional HW initialization failure 2020-02-19 11:09:27 -08:00
ntb
nubus
nvdimm mm: Cleanup __put_devmap_managed_page() vs ->page_free() 2020-01-31 10:30:37 -08:00
nvme nvme-pci: Hold cq_poll_lock while completing CQEs 2020-02-28 01:32:14 +09:00
nvmem Merge branch 'i2c/for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2020-02-07 12:54:13 -08:00
of ARM: SoC-related driver updates 2020-02-08 14:04:19 -08:00
opp ioremap changes for 5.6 2020-01-27 13:03:00 -08:00
oprofile tracing: Make struct ring_buffer less ambiguous 2020-01-13 13:19:38 -05:00
parisc proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
parport
pci PCI: brcmstb: Fix build on 32bit ARM platforms with older compilers 2020-02-27 08:06:20 -06:00
pcmcia
perf drivers/perf: arm_pmu_acpi: Fix incorrect checking of gicc pointer 2020-03-02 12:07:35 +00:00
phy phy: for 5.6-rc 2020-03-04 13:28:52 +01:00
pinctrl pinctrl: fix pxa2xx.c build warnings 2020-02-04 03:05:24 +00:00
platform platform/chrome: wilco_ec: Include asm/unaligned instead of linux/ path 2020-02-11 09:10:36 +01:00
pnp proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
power ARM: SoC platform updates 2020-02-08 13:55:25 -08:00
powercap Merge back power capping changes for v5.6. 2020-01-13 10:32:19 +01:00
pps
ps3
ptp Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net 2020-01-19 22:10:04 +01:00
pwm pwm: Remove set but not set variable 'pwm' 2020-01-20 15:40:49 +01:00
rapidio
ras
regulator regulator: Fixes for v5.6 2020-03-06 14:48:30 -06:00
remoteproc remoteproc: qcom: q6v5-mss: Improve readability of reset_assert 2020-01-24 09:34:07 -08:00
reset reset: intel: add unspecified HAS_IOMEM dependency 2020-02-10 11:11:55 +01:00
rpmsg rpmsg: add rpmsg support for mt8183 SCP. 2020-01-20 10:29:56 -08:00
rtc chrome platform changes for 5.6 2020-02-04 07:17:41 +00:00
s390 SCSI fixes on 20200229 2020-02-29 09:58:47 -06:00
sbus
scsi scsi: compat_ioctl: cdrom: Replace .ioctl with .compat_ioctl in four appropriate places 2020-02-24 15:06:07 -05:00
sfi
sh
siox siox: Use the correct style for SPDX License Identifier 2020-01-14 21:46:53 +01:00
slimbus slimbus: qcom: add missed clk_disable_unprepare in remove 2020-01-14 21:46:48 +01:00
soc i.MX fixes for 5.6: 2020-02-24 09:57:05 -08:00
soundwire soundwire: cadence: fix kernel-doc parameter descriptions 2020-01-16 17:34:38 +05:30
spi spi: Fixes for v5.6 2020-03-06 14:50:16 -06:00
spmi spmi: pmic-arb: Set lockdep class for hierarchical irq domains 2020-02-10 13:16:04 +01:00
ssb
staging TTY/Serial fixes for 5.6-rc5 2020-03-08 10:35:04 -05:00
target scsi: Revert "target: iscsi: Wait for all commands to finish before freeing a session" 2020-02-14 17:13:54 -05:00
tc The main MIPS changes for 5.6: 2020-01-31 11:28:31 -08:00
tee arm64: dts: agilex: fix gmac compatible 2020-03-03 16:40:56 -08:00
thermal - Fix a SEVERE docs build failure for cpu idle cooling device (Randy Dunlap) 2020-01-31 14:39:21 -08:00
thunderbolt thunderbolt: Prevent crash if non-active NVMem file is read 2020-02-13 04:59:30 -08:00
tty tty: serial: fsl_lpuart: free IDs allocated by IDA 2020-03-06 14:10:44 +01:00
uio uio: uio_pdrv_genirq: Do not log an error when deferring probe routine. 2020-01-14 15:27:51 +01:00
usb usb: dwc3: gadget: Update chain bit correctly when using sg list 2020-03-04 10:58:16 +01:00
vfio VFIO updates for v5.6-rc1 2020-02-03 22:22:05 +00:00
vhost vhost: Check docket sk_family instead of call getname 2020-02-22 21:41:42 -08:00
video ARM: SoC fixes 2020-03-08 17:36:22 -07:00
virt
virtio virtio_balloon: Fix memory leaks on errors in virtballoon_probe() 2020-02-06 03:40:27 -05:00
visorbus visorbus: fix uninitialized variable access 2020-01-14 15:30:35 +01:00
vlynq
vme Char/Misc driver changes for 5.6-rc1 2020-01-29 10:35:54 -08:00
w1 Char/Misc driver changes for 5.6-rc1 2020-01-29 10:35:54 -08:00
watchdog ACPI fixes for 5.6-rc4 2020-02-28 09:02:18 -08:00
xen xen/xenbus: fix locking 2020-03-05 09:42:23 -06:00
zorro Kbuild updates for v5.6 (2nd) 2020-02-09 16:05:50 -08:00
Kconfig
Makefile