mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00
samba-mirror/lib
Stefan Metzmacher e232ba946f lib/tsocket: avoid endless cpu-spinning in tstream_bsd_fde_handler()
There were some reports of strace output showing an LDAP server socket in
CLOSE_WAIT state, with writev returning EAGAIN over and over (after a call to
epoll() each time).

In the tstream_bsd code the problem happens when we have a pending
writev_send, while there's no readv_send pending. In that case
we still ask for TEVENT_FD_READ in order to notice connection errors
early, so we try to call writev even if the socket doesn't report TEVENT_FD_WRITE.
And there are situations where we do that over and over again.

It happens like this with a Linux kernel:

    tcp_fin() has this:
        struct tcp_sock *tp = tcp_sk(sk);

        inet_csk_schedule_ack(sk);

        sk->sk_shutdown |= RCV_SHUTDOWN;
        sock_set_flag(sk, SOCK_DONE);

        switch (sk->sk_state) {
        case TCP_SYN_RECV:
        case TCP_ESTABLISHED:
                /* Move to CLOSE_WAIT */
                tcp_set_state(sk, TCP_CLOSE_WAIT);
                inet_csk_enter_pingpong_mode(sk);
                break;

It means RCV_SHUTDOWN gets set as well as TCP_CLOSE_WAIT, but
sk->sk_err is not changed to indicate an error.

    tcp_sendmsg_locked has this:
    ...
        err = -EPIPE;
        if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
                goto do_error;

        while (msg_data_left(msg)) {
                int copy = 0;

                skb = tcp_write_queue_tail(sk);
                if (skb)
                        copy = size_goal - skb->len;

                if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
                        bool first_skb;

    new_segment:
                        if (!sk_stream_memory_free(sk))
                                goto wait_for_space;

    ...

    wait_for_space:
                set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
                if (copied)
                        tcp_push(sk, flags & ~MSG_MORE, mss_now,
                                 TCP_NAGLE_PUSH, size_goal);

                err = sk_stream_wait_memory(sk, &timeo);
                if (err != 0)
                        goto do_error;

It means the check if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
doesn't trigger, as we only have RCV_SHUTDOWN, and once the send buffer
is full sk_stream_wait_memory returns -EAGAIN.

    tcp_poll has this:

        if (sk->sk_shutdown & RCV_SHUTDOWN)
                mask |= EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;

So we'll get EPOLLIN | EPOLLRDNORM | EPOLLRDHUP triggering
TEVENT_FD_READ and writev/sendmsg keeps getting EAGAIN.
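
The combination described above can be observed from userspace. The following
is a minimal sketch, not Samba code: half_close_demo() is a hypothetical
helper, and it uses an AF_UNIX socketpair as a stand-in for the TCP connection
(an assumption made to keep the example self-contained). The peer half-closes
its side, we fill our send buffer, and then look at what write() and poll()
report for our end.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of Samba.
 * Stores the errno of the first failing write() and the revents that
 * poll() reports for POLLIN. Returns 0 on success, -1 if setup failed.
 */
static int half_close_demo(short *revents_out, int *write_errno_out)
{
	int sv[2];
	char buf[4096] = {0};
	struct pollfd pfd;
	ssize_t n;
	int fl;
	int ret = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
		return -1;
	}

	/* The peer stops sending (like a FIN), but keeps its socket open. */
	if (shutdown(sv[1], SHUT_WR) != 0) {
		goto done;
	}

	fl = fcntl(sv[0], F_GETFL, 0);
	if (fl == -1 || fcntl(sv[0], F_SETFL, fl | O_NONBLOCK) != 0) {
		goto done;
	}

	/* Nobody reads from sv[1], so our send buffer eventually fills up. */
	do {
		n = write(sv[0], buf, sizeof(buf));
	} while (n > 0);
	*write_errno_out = (n == -1) ? errno : 0;

	/* Ask only for readability, like a stream with no pending read. */
	pfd = (struct pollfd) { .fd = sv[0], .events = POLLIN };
	if (poll(&pfd, 1, 0) < 0) {
		goto done;
	}
	*revents_out = pfd.revents;
	ret = 0;
done:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

A caller that reacts to the reported readability by retrying the write would
see the same two results on every iteration, which is the spin described in
this commit message.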

So we need to always clear TEVENT_FD_READ if we don't
have a readable handler in order to avoid burning cpu.
But we turn it on again after a timeout of 1 second
in order to monitor the error state of the connection.
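
As a rough illustration of that throttling, here is a simplified sketch. It is
not the tstream_bsd implementation: struct sk_stream, the SK_FD_* flags and
both functions are hypothetical stand-ins for the tevent fd flags and timer,
and the handlers are reduced to counters.

```c
#include <stdbool.h>

#define SK_FD_READ  0x1 /* hypothetical stand-in for TEVENT_FD_READ */
#define SK_FD_WRITE 0x2 /* hypothetical stand-in for TEVENT_FD_WRITE */

struct sk_stream {
	unsigned wanted;          /* flags requested from the event loop */
	bool have_readable;       /* a read request is pending */
	bool have_writable;       /* a write request is pending */
	bool recheck_timer_armed; /* models the 1 second timer */
	int readable_calls;       /* how often the readable handler ran */
	int writable_calls;       /* how often the writable handler ran */
};

/* Called when the event loop reports activity on the fd. */
static void sk_fd_event(struct sk_stream *s, unsigned fired)
{
	if (fired & SK_FD_WRITE) {
		if (s->have_writable) {
			s->writable_calls++;
		}
		return;
	}
	if (!(fired & SK_FD_READ)) {
		return;
	}
	if (s->have_readable) {
		s->readable_calls++;
		return;
	}
	/*
	 * Readable, but nobody wants to read: stop asking for read
	 * events so the loop cannot spin, and re-check later.
	 */
	s->wanted &= ~SK_FD_READ;
	s->recheck_timer_armed = true;
}

/* Called when the (modelled) 1 second timer expires. */
static void sk_recheck_timer_fired(struct sk_stream *s)
{
	s->recheck_timer_armed = false;
	s->wanted |= SK_FD_READ;
}
```

With this shape the worst case is one wakeup per second for a half-closed
connection, instead of one wakeup per event loop iteration.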

And now that our tsocket_bsd_error() helper checks for POLLRDHUP,
we can check if the socket is in an error state before calling the
writable handler when TEVENT_FD_READ was reported.
Only on error will we call the writable handler, which will pick up
the error without calling writev().
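
A check of this kind could look roughly like the sketch below. It is an
illustration under assumptions, not the real tsocket_bsd_error():
sketch_socket_error() is a hypothetical name, mapping a hangup to ECONNRESET
is an illustrative choice, and the fallback value for POLLRDHUP is the one
used on most Linux architectures.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc exposes POLLRDHUP only with this */
#endif
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef POLLRDHUP
#define POLLRDHUP 0x2000 /* assumption: common Linux value */
#endif

/*
 * Hypothetical helper: returns 0 if the socket looks healthy,
 * otherwise an errno-style value describing the problem.
 */
static int sketch_socket_error(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = POLLRDHUP };
	int error = 0;
	socklen_t len = sizeof(error);

	/* A pending asynchronous error (sk_err in the kernel). */
	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &error, &len) != 0) {
		return errno;
	}
	if (error != 0) {
		return error;
	}

	/* No sk_err, but the peer may have closed its sending side. */
	if (poll(&pfd, 1, 0) < 0) {
		return errno;
	}
	if (pfd.revents & (POLLERR | POLLHUP | POLLRDHUP)) {
		return ECONNRESET;
	}
	return 0;
}
```

A handler that sees a read event with no reader pending could call such a
helper first and only invoke the writable handler when it returns non-zero.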

BUG: https://bugzilla.samba.org/show_bug.cgi?id=15202

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
2022-10-19 16:14:36 +00:00
Name            Last updated                Last commit
addns           2022-09-16 05:46:35 +00:00  libaddns: remove duplicate declaration
afs             2019-11-27 10:25:37 +00:00  s3:param: make "servicename" a substituted option
async_req       2021-03-16 17:09:31 +00:00  lib: Use FIONREAD in wait_for_read_send/recv
audit_logging   2022-08-08 12:56:28 +00:00  audit_logging: add method to replace the object for a given key with a new object
cmdline         2022-06-22 11:49:23 +00:00  lib:cmdline: Fix error handling of --client-protection=sign|encrypt|off
compression     2022-07-23 23:29:38 +00:00  lib: Fix the 32-bit build
crypto          2022-10-05 04:23:32 +00:00  lib:crypto: Change error return to SMB_ASSERT()
dbwrap          2022-09-20 00:34:35 +00:00  lib/dbwrap: allow dbwrap_merge_dbufs() to update an existing buffer
fuzzing         2022-08-17 10:08:35 +00:00  lib:fuzzing: Fix shellcheck errors in build_samba.sh
krb5_wrap       2022-10-05 04:23:33 +00:00  lib:krb5_wrap: Add helper functions to make krb5_data structure
ldb             2022-10-05 04:23:32 +00:00  pyldb: Fix tests going unused
ldb-samba       2022-10-05 05:23:50 +00:00  pyldb: Fix typos in function names
messaging       2022-07-25 17:34:33 +00:00  lib/messaging: s/getpid/tevent_cached_getpid
mscat           2021-08-06 17:22:30 +00:00  lib;smbd: Fix the -Os build by initializing variables
param           2022-06-26 22:10:29 +00:00  dsdb: Allow password history and password changes without an NT hash
printer_driver  2021-04-01 19:32:36 +00:00  printing: Align integer types
pthreadpool     2019-11-22 11:48:59 +00:00  build: Do not build selftest binaries for builds without --enable-selftest
replace         2022-08-26 07:59:32 +00:00  lib:replace: Add macro BURN_STR() to zero memory of a string
smbconf         2022-06-08 13:13:10 +00:00  lib/smbconf: expose smbconf error codes to python wrapper
socket          2020-05-07 14:44:40 +00:00  lib/socket: autodetect RSS using ETHTOOL_GRXRINGS
talloc          2022-06-08 17:02:29 +00:00  talloc: version 2.3.4
tdb             2022-06-08 17:57:53 +00:00  tdb: version 1.4.7
tdb_wrap        2021-06-04 16:47:34 +00:00  lib: Open tdb files with O_CLOEXEC
tdr             2016-03-01 21:49:44 +01:00  lib: Fix 1354521 Unchecked return value
tevent          2022-10-03 21:05:31 +00:00  tevent: Fix flag clearing
texpect         2021-09-10 15:10:30 +00:00  texpect: don't ignore unknown options
torture         2022-06-17 01:28:30 +00:00  torture: add torture_assertf()
tsocket         2022-10-19 16:14:36 +00:00  lib/tsocket: avoid endless cpu-spinning in tstream_bsd_fde_handler()
util            2022-09-12 23:07:38 +00:00  lib:util: Check memset_s() error code in talloc_keep_secret_destructor()
README          2018-12-11 20:07:18 +01:00  various: Remove references to about to be deleted thirdparty/dnspython
wscript_build   2015-03-06 04:41:48 +01:00  Remove 'external' python module support code - use the third_party directory instead.

compression - Various compression algorithms (MSZIP, lzxpress)
popt - Command-line option parsing library
replace - Provides replacements for standard (POSIX, C99) functions 
          not provided by the host platform.
subunit - Utilities and bindings for working with the Subunit test result 
          reporting protocol.
talloc - Hierarchical pool based memory allocator 
tdb - Simple but fast key/value database library, supporting multiple writers
torture - Simple unit testing helper library