BUG/MEDIUM: muxes: enforce buf_wait check in takeover()

The ->takeover() is quite tricky. It didn't take care of the possibility that the original thread's connection handler had been woken up to handle an event (e.g. read0), failed to get a buffer, registered against its own thread's buffer_wait queue and left the connection in an idle state. A new thread could then come by, perform a takeover(), and when a buffer was available, the new thread's tasklet would be woken up by the old one via *_buf_available(), causing all sorts of problems.

These problems are easy to reproduce by running with shared backend connections and few buffers (tune.buffers.limit=20, 8 threads, 500 connections, transfer 64kB objects and wait 2-5s for a crash to appear).

The first solution considered consisted in removing the connection from the idle list, but it turns out that it would be worse for server deletion (the connection no longer appearing as idle, making it impossible to find it in order to close it). Also, idle counts would no longer match the list's state, and the special case of private connections could be difficult to handle as the connection could be forcefully re-added to the idle list after allocation despite being private.

After multiple attempts to address the problem in various ways, it appears that the only reliable solution for now (without starting to turn many lists to mt_lists) is to have the takeover() function handle the buf_wait detection or unregistration itself:

  - when doing a regular takeover aiming at finding an idle connection for a new request, connections that are blocked in a buffer_wait queue are quite rare and not interesting at all (since not immediately usable), so skipping them is sufficient. For this we detect that the desired connection belongs to a buffer_wait list by checking its buf_wait.list element. Note that this check is *not* thread-safe! The LIST_DEL_INIT() is performed by __offer_buffers() after the callback was called. But this is sufficient as it is now because the only way for the element to be seen as not in a list is after the element was last touched by __offer_buffers(), so the situation for this connection will not change in a different way later.

  - when doing a server delete, we're running under thread isolation. The connection might get taken over to be killed. The only trick is that private connections not belonging to any idle list may also experience this, and in this case even the idle_conns lock will not offer any protection against anything. But since we're running under thread isolation, we're certain not to compete with the other thread, so it's safe to directly unregister the connection from its owner thread. Normally this is already handled by conn_release() in cli_parse_delete_server(), which calls mux->destroy(), but this would actually update the current thread's queue instead of the origin thread's, thus we do need to perform an explicit dequeue before completing the takeover.

With this, the problem now looks solved for HTTP/1, HTTP/2 and FCGI, though extensive tests were essentially run on HTTP/1 and HTTP/2.

While the problem has been there for a very long time, there should be no reason to backport it since buffer_wait didn't practically work before 3.0-dev and the process used to freeze hard very quickly before we'd even have a chance to meet that race.
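The two cases described above can be illustrated with a minimal single-threaded sketch. It is not the code from this patch: the list helpers are simplified stand-ins written for the example, and every demo_* name, the per-thread waiter counters and the "isolated" argument are hypothetical. It only shows the idea that an element removed with a delete-and-reinit operation points to itself, so membership can be tested on the element alone, and that the dequeue on release must be accounted against the origin thread rather than the calling one.

```c
#include <assert.h>

/* Simplified intrusive doubly-linked list (stand-in, not HAProxy's headers). */
struct list { struct list *n, *p; };

static void list_init(struct list *l) { l->n = l->p = l; }

static void list_append(struct list *head, struct list *el)
{
	el->p = head->p;
	el->n = head;
	head->p->n = el;
	head->p = el;
}

/* Unlink the element and make it point to itself again. */
static void list_del_init(struct list *el)
{
	el->n->p = el->p;
	el->p->n = el->n;
	el->n = el->p = el;
}

/* An element that points to itself is not attached to any list. */
static int list_inlist(const struct list *el) { return el->n != el; }

#define DEMO_THREADS 2

/* Hypothetical per-thread wait queues and waiter counters. */
struct list demo_wq[DEMO_THREADS];
int demo_waiters[DEMO_THREADS];

/* Hypothetical connection: only what the sketch needs. */
struct demo_conn {
	struct list buf_wait; /* attached to demo_wq[owner_tid] while waiting */
	int owner_tid;
};

void demo_queues_init(void)
{
	for (int i = 0; i < DEMO_THREADS; i++) {
		list_init(&demo_wq[i]);
		demo_waiters[i] = 0;
	}
}

void demo_init(struct demo_conn *c, int tid)
{
	list_init(&c->buf_wait);
	c->owner_tid = tid;
}

/* The owner thread failed to get a buffer and registers the connection. */
void demo_queue_for_buffer(struct demo_conn *c)
{
	list_append(&demo_wq[c->owner_tid], &c->buf_wait);
	demo_waiters[c->owner_tid]++;
}

/* Returns 0 on success, -1 when the connection must be skipped. */
int demo_takeover(struct demo_conn *c, int new_tid, int release, int isolated)
{
	int orig_tid = c->owner_tid;

	if (!release) {
		/* regular takeover: a waiting connection is not usable, skip it */
		if (list_inlist(&c->buf_wait))
			return -1;
	} else {
		/* release: only safe when no other thread can run */
		assert(isolated);
		if (list_inlist(&c->buf_wait)) {
			list_del_init(&c->buf_wait);
			demo_waiters[orig_tid]--; /* origin thread, not new_tid */
		}
	}
	c->owner_tid = new_tid;
	return 0;
}
```

With a connection owned by thread 0 and queued for a buffer, a regular takeover by thread 1 is refused and leaves everything in place, while a takeover for release detaches the element and decrements thread 0's counter before the ownership changes.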
2024-05-14 19:26:44 +02:00 · 821a04377d
commit 821a04377d
parent b0349cf2de
3 changed files with 66 additions and 0 deletions
--- a/src/mux_fcgi.c
+++ b/src/mux_fcgi.c
@@ -4173,6 +4173,14 @@ static int fcgi_takeover(struct connection *conn, int orig_tid, int release)
 	 * has been migrated.
 	 */
 	if (!release) {
+		/* If the connection is attached to a buffer_wait (extremely
+		 * rare), it will be woken up at any instant by its own thread
+		 * and we can't undo it anyway, so let's give up on this one.
+		 * It's not interesting anyway since it's not usable right now.
+		 */
+		if (LIST_INLIST(&fcgi->buf_wait.list))
+			goto fail;
+
 		new_task = task_new_here();
 		new_tasklet = tasklet_new();
 		if (!new_task || !new_tasklet)
@@ -4229,6 +4237,20 @@ static int fcgi_takeover(struct connection *conn, int orig_tid, int release)
 		                            SUB_RETRY_RECV, &fcgi->wait_event);
 	}

+	if (release) {
+		/* we're being called for a server deletion and are running
+		 * under thread isolation. That's the only way we can
+		 * unregister a possible subscription of the original
+		 * connection from its owner thread's queue, as this involves
+		 * manipulating thread-unsafe areas. Note that it is not
+		 * possible to just call b_dequeue() here as it would update
+		 * the current thread's bufq_map and not the original one.
+		 */
+		BUG_ON(!thread_isolated());
+		if (LIST_INLIST(&fcgi->buf_wait.list))
+			_b_dequeue(&fcgi->buf_wait, orig_tid);
+	}
+
 	if (new_task)
 		__task_free(new_task);
 	return 0;
--- a/src/mux_h1.c
+++ b/src/mux_h1.c
@@ -5264,6 +5264,14 @@ static int h1_takeover(struct connection *conn, int orig_tid, int release)
 	 * has been migrated.
 	 */
 	if (!release) {
+		/* If the connection is attached to a buffer_wait (extremely
+		 * rare), it will be woken up at any instant by its own thread
+		 * and we can't undo it anyway, so let's give up on this one.
+		 * It's not interesting anyway since it's not usable right now.
+		 */
+		if (LIST_INLIST(&h1c->buf_wait.list))
+			goto fail;
+
 		new_task = task_new_here();
 		new_tasklet = tasklet_new();
 		if (!new_task || !new_tasklet)
@@ -5320,6 +5328,20 @@ static int h1_takeover(struct connection *conn, int orig_tid, int release)
 		                           SUB_RETRY_RECV, &h1c->wait_event);
 	}

+	if (release) {
+		/* we're being called for a server deletion and are running
+		 * under thread isolation. That's the only way we can
+		 * unregister a possible subscription of the original
+		 * connection from its owner thread's queue, as this involves
+		 * manipulating thread-unsafe areas. Note that it is not
+		 * possible to just call b_dequeue() here as it would update
+		 * the current thread's bufq_map and not the original one.
+		 */
+		BUG_ON(!thread_isolated());
+		if (LIST_INLIST(&h1c->buf_wait.list))
+			_b_dequeue(&h1c->buf_wait, orig_tid);
+	}
+
 	if (new_task)
 		__task_free(new_task);
 	return 0;
--- a/src/mux_h2.c
+++ b/src/mux_h2.c
@@ -7539,6 +7539,14 @@ static int h2_takeover(struct connection *conn, int orig_tid, int release)
 	 * has been migrated.
 	 */
 	if (!release) {
+		/* If the connection is attached to a buffer_wait (extremely
+		 * rare), it will be woken up at any instant by its own thread
+		 * and we can't undo it anyway, so let's give up on this one.
+		 * It's not interesting anyway since it's not usable right now.
+		 */
+		if (LIST_INLIST(&h2c->buf_wait.list))
+			goto fail;
+
 		new_task = task_new_here();
 		new_tasklet = tasklet_new();
 		if (!new_task || !new_tasklet)
@@ -7595,6 +7603,20 @@ static int h2_takeover(struct connection *conn, int orig_tid, int release)
 		                           SUB_RETRY_RECV, &h2c->wait_event);
 	}

+	if (release) {
+		/* we're being called for a server deletion and are running
+		 * under thread isolation. That's the only way we can
+		 * unregister a possible subscription of the original
+		 * connection from its owner thread's queue, as this involves
+		 * manipulating thread-unsafe areas. Note that it is not
+		 * possible to just call b_dequeue() here as it would update
+		 * the current thread's bufq_map and not the original one.
+		 */
+		BUG_ON(!thread_isolated());
+		if (LIST_INLIST(&h2c->buf_wait.list))
+			_b_dequeue(&h2c->buf_wait, orig_tid);
+	}
+
 	if (new_task)
 		__task_free(new_task);
 	return 0;