linux/drivers
Jeffrey Hugo bce3f77068 bus: mhi: host: Add MHI_PM_SYS_ERR_FAIL state
When processing a SYSERR, if the device does not respond to the MHI_RESET
from the host, the host will be stuck in a difficult to recover state.
The host will remain in MHI_PM_SYS_ERR_PROCESS and not clean up the host
channels.  Clients will not be notified of the SYSERR via the destruction
of their channel devices, which means clients may think that the device is
still up.  Subsequent SYSERR events such as a device fatal error will not
be processed as the state machine cannot transition from PROCESS back to
DETECT.  The only way to recover from this is to unload the mhi module
(wipe the state machine state) or for the mhi controller to initiate
SHUTDOWN.

This issue was discovered by stress testing soc_reset events on AIC100
via the sysfs node.

soc_reset is processed entirely in hardware.  When the register write
hits the endpoint hardware, it causes the soc to reset without firmware
involvement.  In stress testing, there is a rare race where soc_reset N
will cause the soc to reset and PBL to signal SYSERR (fatal error).  If
soc_reset N+1 is triggered before PBL can process the MHI_RESET from the
host, then the soc will reset again, and re-run PBL from the beginning.
This will cause PBL to lose all state.  PBL will be waiting for the host
to respond to the new syserr, but host will be stuck expecting the
previous MHI_RESET to be processed.

Additionally, the AMSS EE firmware (QSM) was hacked to synthetically
reproduce the issue by simulating a FW hang after the QSM issued a
SYSERR.  In this case, soc_reset would not recover the device.

For this failure case, to recover the device, we need a state similar to
PROCESS, but can transition to DETECT.  There is not a viable existing
state to use.  POR has the needed transitions, but assumes the device is
in a good state and could allow the host to attempt to use the device.
Allowing PROCESS to transition to DETECT invites the possibility of
parallel SYSERR processing which could get the host and device out of
sync.

Thus, invent a new state - MHI_PM_SYS_ERR_FAIL

This essentially a holding state.  It allows us to clean up the host
elements that are based on the old state of the device (channels), but
does not allow us to directly advance back to an operational state.  It
does allow the detection and processing of another SYSERR which may
recover the device, or allows the controller to do a clean shutdown.

Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20240112180800.536733-1-quic_jhugo@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2024-01-30 23:52:40 +05:30
..
accel drm-next for 6.8: 2024-01-12 11:32:19 -08:00
accessibility
acpi cxl for v6.8 2024-01-18 16:22:43 -08:00
amba
android Char/Misc and other Driver changes for 6.8-rc1 2024-01-17 16:47:17 -08:00
ata ata changes for 6.8-rc1 2024-01-11 13:49:00 -08:00
atm net: fill in MODULE_DESCRIPTION()s for ATM 2024-01-05 08:04:23 -08:00
auxdisplay drm-next for 6.8: 2024-01-12 11:32:19 -08:00
base RTC for 6.8 2024-01-18 17:25:39 -08:00
bcma
block for-6.8/block-2024-01-18 2024-01-18 18:22:40 -08:00
bluetooth USB / Thunderbolt changes for 6.8-rc1 2024-01-18 11:43:55 -08:00
bus bus: mhi: host: Add MHI_PM_SYS_ERR_FAIL state 2024-01-30 23:52:40 +05:30
cache
cdrom
cdx cdx: Unlock on error path in rescan_store() 2024-01-04 17:01:14 +01:00
char TTY/Serial changes for 6.8-rc1 2024-01-18 11:37:24 -08:00
clk clk: qcom: gcc-x1e80100: Replace of_device.h with explicit includes 2024-01-19 08:17:28 -06:00
clocksource
comedi
connector Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-01-04 18:06:46 -08:00
counter
cpufreq Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-qos' into pm 2024-01-16 12:23:24 +01:00
cpuidle
crypto This update includes the following changes: 2024-01-10 12:23:43 -08:00
cxl cxl/core: use sysfs_emit() for attr's _show() 2024-01-12 14:47:04 -08:00
dax New code for 6.8: 2024-01-10 08:45:22 -08:00
dca
devfreq
dio
dma dmaengine fixes for v6.8-rc1 2024-01-20 15:03:25 -08:00
dma-buf
dpll dpll: expose fractional frequency offset value to user 2024-01-05 07:58:19 -08:00
edac Driver core changes for 6.8-rc1 2024-01-18 09:48:40 -08:00
eisa
extcon
firewire firewire: core: fill model field in modalias of unit device for legacy layout of configuration ROM 2024-01-10 18:37:13 +09:00
firmware Revert "firmware/sysfb: Clear screen_info state after consuming it" 2024-01-19 22:22:26 +01:00
fpga Char/Misc and other Driver changes for 6.8-rc1 2024-01-17 16:47:17 -08:00
fsi
gnss TTY/Serial changes for 6.8-rc1 2024-01-18 11:37:24 -08:00
gpio gpio fixes for v6.8-rc1 2024-01-18 17:03:51 -08:00
gpu drm fixes for 6.8-rc1 2024-01-19 11:50:00 -08:00
greybus TTY/Serial changes for 6.8-rc1 2024-01-18 11:37:24 -08:00
hid hid-for-linus-2024010801 2024-01-12 14:45:13 -08:00
hsi
hte
hv
hwmon hwmon: (npcm750-pwm-fan) Fix crash observed when instantiating nuvoton,npcm750-pwm-fan 2024-01-14 07:44:38 -08:00
hwspinlock
hwtracing
i2c This cycle, I2C removes the currently unused CLASS_DDC support 2024-01-18 17:29:01 -08:00
i3c i3c: master: cdns: Update maximum prescaler value for i2c clock 2024-01-08 00:51:36 +01:00
idle Power management updates for 6.8-rc1 2024-01-09 16:32:11 -08:00
iio TTY/Serial changes for 6.8-rc1 2024-01-18 11:37:24 -08:00
infiniband RDMA v6.8 merge window 2024-01-12 13:52:21 -08:00
input Input updates for 6.8 merge window: 2024-01-18 17:21:35 -08:00
interconnect Char/Misc and other Driver changes for 6.8-rc1 2024-01-17 16:47:17 -08:00
iommu iommufd for 6.8 2024-01-18 15:28:15 -08:00
ipack TTY/Serial changes for 6.8-rc1 2024-01-18 11:37:24 -08:00
irqchip header cleanups for 6.8 2024-01-10 16:43:55 -08:00
isdn
leds - New Drivers 2024-01-17 15:25:27 -08:00
macintosh
mailbox mediatek: add CMDQ support for mt8188 2024-01-17 15:39:32 -08:00
mcb
md for-6.8/block-2024-01-18 2024-01-18 18:22:40 -08:00
media media: solo6x10: replace max(a, min(b, c)) by clamp(b, a, c) 2024-01-20 13:30:00 -08:00
memory IOMMU Updates for Linux v6.8 2024-01-18 15:16:57 -08:00
memstick
message
mfd TTY/Serial changes for 6.8-rc1 2024-01-18 11:37:24 -08:00
misc This cycle, I2C removes the currently unused CLASS_DDC support 2024-01-18 17:29:01 -08:00
mmc TTY/Serial changes for 6.8-rc1 2024-01-18 11:37:24 -08:00
most
mtd This pull request contains updates for UBI and UBIFS: 2024-01-17 10:27:13 -08:00
mux mux: mmio: use reg property when parent device is not a syscon 2024-01-04 17:01:14 +01:00
net net: can: Use device_get_match_data() 2024-01-19 08:08:53 -06:00
nfc
ntb
nubus nubus: Make nubus_bus_type static and constant 2024-01-03 13:33:59 +01:00
nvdimm virtio: features, fixes 2024-01-18 16:44:03 -08:00
nvme for-6.8/block-2024-01-18 2024-01-18 18:22:40 -08:00
nvmem Char/Misc and other Driver changes for 6.8-rc1 2024-01-17 16:47:17 -08:00
of IOMMU Updates for Linux v6.8 2024-01-18 15:16:57 -08:00
opp OPP: Rename 'rate_clk_single' 2024-01-05 15:55:41 +05:30
parisc parisc/power: Fix power soft-off button emulation on qemu 2024-01-07 22:59:16 +01:00
parport
pci cxl for v6.8 2024-01-18 16:22:43 -08:00
pcmcia
peci
perf ACPI updates for 6.8-rc1 2024-01-09 16:12:44 -08:00
phy phy-for-6.8 2024-01-18 17:11:43 -08:00
pinctrl This is the main pin control pull request for the v6.8 kernel series. 2024-01-17 15:55:33 -08:00
platform USB / Thunderbolt changes for 6.8-rc1 2024-01-18 11:43:55 -08:00
pmdomain Driver core changes for 6.8-rc1 2024-01-18 09:48:40 -08:00
pnp More ACPI updates for 6.8-rc1 2024-01-17 14:37:40 -08:00
power power: supply: bq24190_charger: Fix "initializer element is not constant" error 2024-01-19 01:03:17 +01:00
powercap
pps
ps3
ptp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-01-04 18:06:46 -08:00
pwm pwm: jz4740: Don't use dev_err_probe() in .request() 2024-01-12 18:25:05 +01:00
rapidio
ras
regulator pwm: Changes for v6.8-rc1 2024-01-12 14:59:50 -08:00
remoteproc
reset SoC: driver updates for 6.8 2024-01-11 11:31:46 -08:00
rpmsg
rtc rtc: nuvoton: Compatible with NCT3015Y-R and NCT3018Y-R 2024-01-18 01:05:33 +01:00
s390 s390 updates for 6.8 merge window part 2 2024-01-18 14:11:25 -08:00
sbus
scsi SCSI misc on 20240120 2024-01-20 09:42:32 -08:00
sh maple: make maple_bus_type static and const 2024-01-04 14:37:17 +01:00
siox
slimbus
soc Char/Misc and other Driver changes for 6.8-rc1 2024-01-17 16:47:17 -08:00
soundwire soundwire updates for 6.7 2024-01-18 17:08:31 -08:00
spi spi: Fix for v6.8 2024-01-19 12:50:09 -08:00
spmi
ssb
staging This cycle, I2C removes the currently unused CLASS_DDC support 2024-01-18 17:29:01 -08:00
target SCSI misc on 20240120 2024-01-20 09:42:32 -08:00
tc
tee Another moderately busy cycle for documentation, including: 2024-01-11 19:46:52 -08:00
thermal thermal: loongson2: Replace of_device.h with explicit includes 2024-01-19 08:08:53 -06:00
thunderbolt USB / Thunderbolt changes for 6.8-rc1 2024-01-18 11:43:55 -08:00
tty RISC-V Patches for the 6.8 Merge Window, Part 4 2024-01-20 11:06:04 -08:00
ufs SCSI misc on 20240120 2024-01-20 09:42:32 -08:00
uio uio: Fix use-after-free in uio_open 2024-01-04 17:03:47 +01:00
usb USB / Thunderbolt changes for 6.8-rc1 2024-01-18 11:43:55 -08:00
vdpa virtio: features, fixes 2024-01-18 16:44:03 -08:00
vfio VFIO updates for v6.8-rc1 2024-01-18 15:57:25 -08:00
vhost virtio: features, fixes 2024-01-18 16:44:03 -08:00
video This cycle, I2C removes the currently unused CLASS_DDC support 2024-01-18 17:29:01 -08:00
virt Char/Misc and other Driver changes for 6.8-rc1 2024-01-17 16:47:17 -08:00
virtio virtio: features, fixes 2024-01-18 16:44:03 -08:00
w1
watchdog linux-watchdog 6.8-rc1 tag 2024-01-12 13:32:30 -08:00
xen xen: branch for v6.8-rc1 2024-01-17 13:41:38 -08:00
zorro
Kconfig
Makefile fbdev/intelfb: Remove driver 2024-01-12 12:38:37 +01:00