Johannes Berg
2cf3a3c4b8
um: time-travel: fix signal blocking race/hang
When signals are hard-blocked in order to do time-travel socket processing, we set signals_blocked and then handle SIGIO signals by setting the SIGIO bit in signals_pending. When unblocking, we first set signals_blocked to 0, and then handle all pending signals. We have to set it first, so that we can again properly block/unblock inside the unblock, if the time-travel handlers need to be processed. Unfortunately, this is racy. We can get into this situation: // signals_pending = SIGIO_MASK unblock_signals_hard() signals_blocked = 0; if (signals_pending && signals_enabled) { block_signals(); unblock_signals() ... sig_handler_common(SIGIO, NULL, NULL); sigio_handler() ... sigio_reg_handler() irq_do_timetravel_handler() reg->timetravel_handler() == vu_req_interrupt_comm_handler() vu_req_read_message() vhost_user_recv_req() vhost_user_recv() vhost_user_recv_header() // reads 12 bytes header of // 20 bytes message <-- receive SIGIO here <-- sig_handler() int enabled = signals_enabled; // 1 if ((signals_blocked || !enabled) && (sig == SIGIO)) { if (!signals_blocked && time_travel_mode == TT_MODE_EXTERNAL) sigio_run_timetravel_handlers() _sigio_handler() sigio_reg_handler() ... as above ... vhost_user_recv_header() // reads 8 bytes that were message payload // as if it were header - but aborts since // it then gets -EAGAIN ... --> end signal handler --> // continue in vhost_user_recv() // full_read() for 8 bytes payload busy loops // entire process hangs here Conceptually, to fix this, we need to ensure that the signal handler cannot run while we hard-unblock signals. The thing that makes this more complex is that we can be doing hard-block/unblock while unblocking. Introduce a new signals_blocked_pending variable that we can keep at non-zero as long as pending signals are being processed, then we only need to ensure it's decremented safely and the signal handler will only increment it if it's already non-zero (or signals_blocked is set, of course.) Note also that only the outermost call to hard-unblock is allowed to decrement signals_blocked_pending, since it could otherwise reach zero in an inner call, and leave the same race happening if the timetravel_handler loops, but that's basically required of it. Fixes: d6b399a0e02a ("um: time-travel/signals: fix ndelay() in interrupt") Link: https://patch.msgid.link/20240703110144.28034-2-johannes@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Linux kernel ============ There are several guides for kernel developers and users. These guides can be rendered in a number of formats, like HTML and PDF. Please read Documentation/admin-guide/README.rst first. In order to build the documentation, use ``make htmldocs`` or ``make pdfdocs``. The formatted documentation can also be read online at: https://www.kernel.org/doc/html/latest/ There are various text files in the Documentation/ subdirectory, several of them using the reStructuredText markup notation. Please read the Documentation/process/changes.rst file, as it contains the requirements for building and running the kernel, and information about the problems which may result by upgrading your kernel.
Description
Languages
C
97.6%
Assembly
1%
Shell
0.5%
Python
0.3%
Makefile
0.3%