linux/sound/soc
Mark Brown 9e376b14ef
ASoC: soc-pcm: fix trigger race conditions with shared BE
Merge series from Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>:

We've been adding a 'deep buffer' PCM device to several SOF topologies
in order to reduce power consumption. The typical use-case would be
music playback over a headset: this additional PCM device provides
more buffering and longer latencies, letting the rest of the system
sleep for longer periods. Notifications and 'regular' low-latency
audio playback would still use the 'normal' PCM device and be mixed
with the 'deep buffer' before rendering on the headphone endpoint. The
tentative direction would be to expose this alternate device to
PulseAudio/PipeWire/CRAS via the UCM SectionModifier definitions.

That seemed a straightforward topology change until our automated
validation stress tests started reporting issues on SoundWire
platforms, where e.g. two START triggers might be sent and, conversely,
the STOP trigger is never sent. The SoundWire stream state management
flagged inconsistent states when the two 'normal' and 'deep buffer'
devices are used concurrently with rapid play/stop/pause monkey
testing.

Looking at the soc-pcm.c code, it seems that the BE state
management needs a lot of love.

a) there is no consistent protection for the BE state. In some parts
of the code, the state updates are protected by a spinlock but in the
trigger they are not. When we open/play/close the two PCM devices in
stress tests, we end up testing a state that is being modified. That
can't be good.

b) there is a conceptual deadlock: on stop we check the FE states to
see if a shared BE can be stopped, but since we trigger the BE first
the FE states have not been modified yet, so the TRIGGER_STOP is never
sent.
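
The ordering problem in b) can be modelled in a few lines of userspace C.
This is only a sketch under stated assumptions: the struct and function
names below are hypothetical and do not come from soc-pcm.c. The first
helper mimics a stop path that inspects the other FE's state before that
state has been updated. The counted helpers show one possible alternative,
a per-BE user count, in the spirit of the "test refcount before
triggering" patch title listed in this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; not the kernel's data structures. */
enum fe_state { FE_STOPPED, FE_RUNNING };

struct shared_be {
	int users;     /* FEs that currently have this BE started */
	int hw_starts; /* START triggers that reached the hardware */
	int hw_stops;  /* STOP triggers that reached the hardware */
};

/*
 * State-check approach: stop the BE only if the peer FE is not running.
 * If the caller triggers the BE before updating FE states, peer_state is
 * stale and the STOP can be skipped by every FE.
 */
static bool be_stop_if_peer_idle(struct shared_be *be, enum fe_state peer_state)
{
	if (peer_state == FE_RUNNING)
		return false;
	be->hw_stops++;
	return true;
}

/* Counted approach: only the first user starts the hardware. */
static void be_start_counted(struct shared_be *be)
{
	if (be->users++ == 0)
		be->hw_starts++;
}

/* Counted approach: only the last user stops the hardware. */
static void be_stop_counted(struct shared_be *be)
{
	assert(be->users > 0);
	if (--be->users == 0)
		be->hw_stops++;
}
```

With two FEs stopping back to back while each still observes the other as
FE_RUNNING, be_stop_if_peer_idle() never increments hw_stops. With the
counted helpers, two starts followed by two stops produce exactly one
hardware START and one hardware STOP, regardless of FE state ordering.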

This patchset suggests the removal of the dedicated 'dpcm_lock' and
follows the design suggested by Takashi Iwai.  By default the
protection relies on the 'pcm_mutex', except for the FE and BE
triggers where the mutex cannot be used.  In this case, the FE PCM
lock is used instead. In the cases where a BE is added/removed, the
pcm_mutex and FE PCM lock are both taken.  In addition, the BE PCM
lock is used to serialize access to a shared BE.
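
The locking scheme described above can be illustrated with a userspace
model. This is a sketch, not the kernel implementation: pthread mutexes
stand in for the pcm_mutex and the FE/BE PCM stream locks, all type and
function names are hypothetical, and the nesting order (pcm_mutex, then
FE lock, then BE lock) is an assumption made for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_BE 4

/* Hypothetical stand-ins for a BE and an FE; not kernel structures. */
struct be {
	pthread_mutex_t lock; /* models the BE PCM lock */
	int users;            /* FEs that currently have this BE started */
};

struct fe {
	pthread_mutex_t lock; /* models the FE PCM lock */
	struct be *be_list[MAX_BE];
	int num_be;
};

/* Models the card-wide pcm_mutex used outside of the trigger path. */
static pthread_mutex_t pcm_mutex = PTHREAD_MUTEX_INITIALIZER;

static void be_init(struct be *be)
{
	pthread_mutex_init(&be->lock, NULL);
	be->users = 0;
}

static void fe_init(struct fe *fe)
{
	pthread_mutex_init(&fe->lock, NULL);
	fe->num_be = 0;
}

/* Adding a BE changes the list walked by triggers: take both locks. */
static int fe_connect_be(struct fe *fe, struct be *be)
{
	int ret = -1;

	pthread_mutex_lock(&pcm_mutex);
	pthread_mutex_lock(&fe->lock);
	if (fe->num_be < MAX_BE) {
		fe->be_list[fe->num_be++] = be;
		ret = 0;
	}
	pthread_mutex_unlock(&fe->lock);
	pthread_mutex_unlock(&pcm_mutex);
	return ret;
}

/* Trigger path: no pcm_mutex; FE lock guards the list, BE lock the BE. */
static void fe_trigger(struct fe *fe, int start)
{
	pthread_mutex_lock(&fe->lock);
	for (int i = 0; i < fe->num_be; i++) {
		struct be *be = fe->be_list[i];

		pthread_mutex_lock(&be->lock);
		be->users += start ? 1 : -1;
		assert(be->users >= 0);
		pthread_mutex_unlock(&be->lock);
	}
	pthread_mutex_unlock(&fe->lock);
}
```

In this model, two FEs ('normal' and 'deep buffer') connected to the same
BE can each call fe_trigger() from their own thread: the per-FE lock keeps
the BE list stable during the walk, and the per-BE lock keeps the shared
user count consistent.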

With these patches I am able to run our entire validation suite
without any issues with this new 'deep buffer' topology, and no
regressions on existing solutions [1]. The tests were reproduced by
Bard Liao for SoundWire devices.

One might ask 'how come we didn't see this earlier'? The answer is
probably that the .trigger callbacks in most implementations seem to
perform DAPM operations, and sending the triggers multiple times is
not an issue. In the case of SoundWire, we do use the .trigger
callback to reconfigure the bus using the 'bank switch' mechanism. It
could be acceptable to tolerate a trigger multiple times, but the
deadlock on stop cannot be fixed at the SoundWire level alone.

Opens:

1) The issues reported by Nvidia on the RFCv3 may or may not be
present. We'd need test results to make sure the locking update does
not introduce a regression on Tegra.

2) There are other reports of kernel oopses [2] that seem related to
the lack of protection. It'd be good to confirm whether this patchset
solves these problems as well.

[1] https://github.com/thesofproject/linux/pull/3146
[2] https://lore.kernel.org/alsa-devel/002f01d7b4f5$c030f4a0$4092dde0$@samsung.com/

changes since RFCv3:
Used two patches from Takashi. We now use the pcm_mutex, the FE stream
lock when adding and deleting a BE, and the BE stream lock to handle
concurrency between streams using the same BE.
Added a patch to use GFP_ATOMIC for the DPCM structure.
Fixed PAUSE_RELEASE transition (GitHub comment from Kai Vehmanen)

changes since RFCv2:
Removal of dpcm_lock to use FE PCM locks (credits to Takashi Iwai for
the suggestion). The FE PCM lock is now used before each use of
for_each_dpcm_be() - with the exception of the trigger where the lock
is already taken. This change is also applied in drivers which make
use of this loop (compress, SH, FSL).
Addition of BE PCM lock to deal with mutual exclusion between triggers
for the same BE.
Alignment of the BE atomicity with that of the FE on connections; this
is required to avoid sleeping in atomic context.
Additional cleanups (indentation, static functions)

changes since RFCv1:
Removed unused function
Removed exported symbols only used in soc-pcm.c, used static instead
Use a mutex instead of a spinlock
Protect all for_each_dpcm_be() loops
Fix bugs introduced in the refcount

Pierre-Louis Bossart (4):
  ASoC: soc-pcm: use GFP_ATOMIC for dpcm structure
  ASoC: soc-pcm: align BE 'atomicity' with that of the FE
  ASoC: soc-pcm: test refcount before triggering
  ASoC: soc-pcm: fix BE handling of PAUSE_RELEASE

Takashi Iwai (2):
  ASoC: soc-pcm: Fix and cleanup DPCM locking
  ASoC: soc-pcm: serialize BE triggers

 include/sound/soc-dpcm.h |   2 +
 include/sound/soc.h      |   2 -
 sound/soc/soc-core.c     |   1 -
 sound/soc/soc-pcm.c      | 351 +++++++++++++++++++++++++++------------
 4 files changed, 246 insertions(+), 110 deletions(-)

2021-12-15 02:02:41 +00:00
adi
amd ASoC: amd: Convert to new style DAI format definitions 2021-12-08 16:47:31 +00:00
atmel ASoC: atmel: Convert to new style DAI format definitions 2021-09-16 14:11:30 +01:00
au1x ASoC: au1x: Convert to modern terminology for DAI clocking 2021-09-16 14:11:37 +01:00
bcm ASoC: bcm: Convert to modern clocking terminology 2021-09-27 13:01:09 +01:00
cirrus ARM: SoC drivers for 5.16 2021-11-03 17:00:52 -07:00
codecs ASoC: rt5682s: add delay time to fix pop sound issue 2021-12-08 13:07:58 +00:00
dwc ASoC: dwc-i2s: Update to modern clocking terminology 2021-09-27 13:01:12 +01:00
fsl ASoC: fsl-asoc-card: Add missing Kconfig option for tlv320aic31xx 2021-12-06 13:49:19 +00:00
generic ASoC: test-component: fix null pointer dereference. 2021-12-09 12:31:50 +00:00
hisilicon
img
intel ASoC: Intel: sof_rt5682: Move rt1015 speaker amp to common file 2021-12-08 13:07:59 +00:00
jz4740
kirkwood
mediatek ASoC: mediatek: assign correct type to argument 2021-12-14 13:22:18 +00:00
meson ASoC: meson: axg-tdm-interface: manage formatters in trigger 2021-10-22 13:25:48 +01:00
mxs
pxa
qcom ASoC: qdsp6: Fix an IS_ERR() vs NULL bug 2021-12-14 17:15:52 +00:00
rockchip ASoC: rockchip: i2s_tdm: Dup static DAI template 2021-11-30 13:08:01 +00:00
samsung ASoC: samsung: add missing "fallthrough;" 2021-09-27 13:00:53 +01:00
sh ASoC: rsnd: fixup DMAEngine API 2021-11-12 21:25:19 +00:00
sof ASoC: SOF: sof-probes: Constify sof_probe_compr_ops 2021-12-14 13:22:16 +00:00
spear
sprd
sti
stm Merge branch 'for-5.16' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into asoc-5.17 so we can apply new Tegra work 2021-12-01 14:15:12 +00:00
sunxi ASoC: sunxi: sun4i-spdif: Implement IEC958 control 2021-11-29 12:19:49 +00:00
tegra ASoC: tegra: Add master volume/mute control support 2021-12-01 14:15:39 +00:00
ti ASoC: ti: davinci-mcasp: Remove unnecessary conditional 2021-12-06 13:49:29 +00:00
uniphier ASoC: uniphier: drop selecting non-existing SND_SOC_UNIPHIER_AIO_DMA 2021-11-25 11:54:30 +00:00
ux500 ASoC: ux500: mop500: Constify static snd_soc_ops 2021-09-29 13:06:38 +01:00
xilinx
xtensa
Kconfig
Makefile
soc-ac97.c ASoC: soc-ac97: cleanup cppcheck warning 2021-08-16 13:29:36 +01:00
soc-acpi.c ASoC: soc-acpi: Set mach->id field on comp_ids matches 2021-11-22 15:40:01 +00:00
soc-card.c
soc-component.c ASoC: soc-component: add snd_soc_pcm_component_delay() 2021-11-29 12:19:41 +00:00
soc-compress.c ASoC: compress/component: Use module_get_when_open/put_when_close for cstream 2021-09-20 13:30:18 +01:00
soc-core.c ASoC: soc-pcm: Fix and cleanup DPCM locking 2021-12-14 17:15:45 +00:00
soc-dai.c ASoC: soc-dai: update snd_soc_dai_delay() to snd_soc_pcm_dai_delay() 2021-11-29 12:19:40 +00:00
soc-dapm.c ASoC: DAPM: Cover regression by kctl change notification fix 2021-11-05 12:58:12 +00:00
soc-devres.c
soc-generic-dmaengine-pcm.c ASoC: dmaengine: Introduce module option prealloc_buffer_size_kbytes 2021-09-27 13:01:13 +01:00
soc-jack.c ASoC: soc-jack: cleanup cppcheck warning for CONFIG_GPIOLIB 2021-08-16 13:29:34 +01:00
soc-link.c
soc-ops.c
soc-pcm.c ASoC: soc-pcm: fix BE handling of PAUSE_RELEASE 2021-12-14 17:15:48 +00:00
soc-topology-test.c
soc-topology.c ASoC: topology: Add missing rwsem around snd_ctl_remove() calls 2021-11-16 14:29:50 +00:00
soc-utils.c ASoC: Stop dummy from overriding hwparams 2021-10-29 16:49:45 +01:00