Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Wed Dec 13 00:44:57 CET 2017 on sn-devel-144
After the race is before the race:
1) Create an idle thread
2) Add a job: This won't create a thread anymore
3) Immediately fork
The idle thread will be signalled twice before it actually wakes up: both
pthreadpool_add_job and pthreadpool_prepare_pool call cond_signal, for
different reasons. We must look at pool->prefork_cond first, because otherwise
we would end up in a blocking job deep within a fork call; the helper thread
must take its fingers off the condvar as quickly as possible. This means that
after the fork there's no idle thread around anymore that would pick up the job
submitted in 2). So we must keep the idle threads around across the fork.
The quick solution to re-create one helper thread in pthreadpool_parent has a
fatal flaw: What do we do if that pthread_create call fails? We're deep in an
application calling fork(), and doing fancy signalling from there is really
something we must avoid.
This has one potential performance issue: If we have hundreds of idle threads
(do we ever have that?) during the fork, the call to pthread_mutex_lock on the
fork_mutex from pthreadpool_server (the helper thread) will probably cause a
thundering herd when the _parent call unlocks the fork_mutex. The solution for
this is to keep just one idle thread around. But this adds code that is not
strictly required functionally for now.
More detailed explanation from Jeremy:
First, understanding the problem the test reproduces:
add a job (num_jobs = 1) -> creates thread to run it.
job finishes, thread sticks around (num_idle = 1).
num_jobs is now zero (initial job finished).
a) Idle thread is now waiting on pool->condvar inside
pthreadpool_server() in pthread_cond_timedwait().
Now, add another job ->
    pthreadpool_add_job()
        -> pthreadpool_put_job()
            This adds the job to the queue.
        Oh, there is an idle thread so don't
        create one, do:
            pthread_cond_signal(&pool->condvar);
        and return.
Now call fork *before* idle thread in (a) wakes from
the signaling of pool->condvar.
In the parent (child is irrelevant):
Go into: pthreadpool_prepare() ->
    pthreadpool_prepare_pool()
        Set the variable to tell idle threads to exit:
            pool->prefork_cond = &prefork_cond;
        then wake them up with:
            pthread_cond_signal(&pool->condvar);
        This does nothing as the idle thread
        is already awoken.
b) Idle thread wakes up and does:
    Reduce idle thread count (num_idle = 0)
        pool->num_idle -= 1;
    Check if we're in the middle of a fork.
        if (pool->prefork_cond != NULL) {
            Yes we are, tell pthreadpool_prepare()
            we are exiting.
                pthread_cond_signal(pool->prefork_cond);
            And exit.
                pthreadpool_server_exit(pool);
                return NULL;
        }
So we come back from the fork in the parent with num_jobs = 1,
a job on the queue but no idle threads - and the code that
creates a new thread on job submission was skipped because
an idle thread existed at point (a).
OK, assuming that the previous explanation is correct, the
fix is to create a new pthreadpool context mutex:
pool->fork_mutex
and in pthreadpool_server(), when an idle thread wakes up and
notices we're in the prepare fork state, it puts itself to
sleep by waiting on the new pool->fork_mutex.
And in pthreadpool_prepare_pool(), instead of waiting for
the idle threads to exit, hold the pool->fork_mutex and
signal each idle thread in turn, and wait for the pool->num_idle
to go to zero - which means they're all blocked waiting on
pool->fork_mutex.
When the parent continues, pthreadpool_parent()
unlocks the pool->fork_mutex and all the previously
'idle' threads wake up (and you mention the thundering
herd problem, which is as you say vanishingly small :-)
and pick up any remaining job.
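
The fix described above can be sketched in a few lines of C. This is a
minimal illustration under stated assumptions, not the actual pthreadpool
code: struct pool, pool_worker, pool_add_job, pool_fork and race_demo are
hypothetical names, jobs are modelled as plain counters, and the
prepare/parent steps are called directly around fork() instead of being
registered as fork handlers.

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical, much reduced pool; not the real struct pthreadpool. */
struct pool {
	pthread_mutex_t mutex;        /* protects the fields below */
	pthread_cond_t condvar;       /* idle workers wait here */
	pthread_mutex_t fork_mutex;   /* held by the forking thread */
	pthread_cond_t *prefork_cond; /* non-NULL while a fork is prepared */
	int num_idle, num_jobs, jobs_done;
	bool stop;
};

static void *pool_worker(void *arg)
{
	struct pool *p = arg;

	pthread_mutex_lock(&p->mutex);
	for (;;) {
		while (p->num_jobs == 0 && !p->stop) {
			p->num_idle += 1;
			pthread_cond_wait(&p->condvar, &p->mutex);
			p->num_idle -= 1;
			if (p->prefork_cond == NULL) {
				continue;
			}
			/* Fork in progress: leave the condvar, park on fork_mutex. */
			pthread_cond_signal(p->prefork_cond);
			pthread_mutex_unlock(&p->mutex);
			pthread_mutex_lock(&p->fork_mutex);
			pthread_mutex_unlock(&p->fork_mutex);
			pthread_mutex_lock(&p->mutex);
		}
		if (p->num_jobs == 0) {
			break; /* stop requested and queue drained */
		}
		p->num_jobs -= 1;
		p->jobs_done += 1; /* stands in for running the job */
	}
	pthread_mutex_unlock(&p->mutex);
	return NULL;
}

static void pool_add_job(struct pool *p)
{
	pthread_mutex_lock(&p->mutex);
	p->num_jobs += 1;
	pthread_cond_signal(&p->condvar); /* no thread creation in this sketch */
	pthread_mutex_unlock(&p->mutex);
}

/* prepare + fork + parent in one place; the child exits right away. */
static int pool_fork(struct pool *p)
{
	pid_t pid;

	pthread_mutex_lock(&p->fork_mutex);
	pthread_mutex_lock(&p->mutex);
	while (p->num_idle != 0) {
		int num_idle = p->num_idle;
		pthread_cond_t prefork_cond;

		pthread_cond_init(&prefork_cond, NULL);
		p->prefork_cond = &prefork_cond;
		pthread_cond_signal(&p->condvar);
		while (p->num_idle == num_idle) {
			pthread_cond_wait(&prefork_cond, &p->mutex);
		}
		p->prefork_cond = NULL;
		pthread_cond_destroy(&prefork_cond);
	}
	pid = fork();
	if (pid == 0) {
		_exit(0);
	}
	pthread_mutex_unlock(&p->mutex);
	pthread_mutex_unlock(&p->fork_mutex); /* parked workers resume */
	return (pid > 0 && waitpid(pid, NULL, 0) == pid) ? 0 : -1;
}

/* Steps 1) to 3) from the description; returns how many jobs were run. */
static int race_demo(void)
{
	struct pool p = { .num_idle = 0 };
	pthread_t tid;
	int idle = 0;

	pthread_mutex_init(&p.mutex, NULL);
	pthread_cond_init(&p.condvar, NULL);
	pthread_mutex_init(&p.fork_mutex, NULL);
	pthread_create(&tid, NULL, pool_worker, &p);
	while (idle == 0) {        /* 1) wait for an idle thread */
		sched_yield();
		pthread_mutex_lock(&p.mutex);
		idle = p.num_idle;
		pthread_mutex_unlock(&p.mutex);
	}
	pool_add_job(&p);          /* 2) add a job */
	if (pool_fork(&p) != 0) {  /* 3) immediately fork */
		return -1;
	}
	pthread_mutex_lock(&p.mutex);
	p.stop = true;
	pthread_cond_broadcast(&p.condvar);
	pthread_mutex_unlock(&p.mutex);
	pthread_join(tid, NULL);
	return p.jobs_done;
}
```

Whichever way the wakeups interleave, the worker is still alive after
pool_fork() returns, so the job queued in step 2) gets picked up instead of
being stranded.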
Bug: https://bugzilla.samba.org/show_bug.cgi?id=13179
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
When an error is returned to the caller of pthreadpool_add_job, the job
should not be kept in the internal job array. Otherwise the caller might
free the data structure and a later worker thread would still reference
it.
When it is not possible to create a single worker thread, the system
might be out of resources or hitting a configured limit. In this case
fall back to calling the job function synchronously instead of raising
the error to the caller and possibly back to the SMB client.
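
A minimal sketch of that control flow, assuming a simplified single-threaded
model of the pool: struct pool, pool_put_job, pool_add_job and the
create_thread hook are hypothetical stand-ins, not the real pthreadpool
internals, and the hook exists only so that a thread creation failure can be
simulated.

```c
#include <errno.h>
#include <stddef.h>

typedef void (*job_fn)(void *private_data);

/* Hypothetical, single-threaded model of the pool's bookkeeping. */
struct pool {
	struct { job_fn fn; void *private_data; } jobs[8];
	size_t num_jobs;
	size_t num_threads;
	/* Hypothetical hook standing in for pthread_create(). */
	int (*create_thread)(struct pool *pool);
};

static int pool_put_job(struct pool *pool, job_fn fn, void *private_data)
{
	if (pool->num_jobs == sizeof(pool->jobs) / sizeof(pool->jobs[0])) {
		return ENOMEM;
	}
	pool->jobs[pool->num_jobs].fn = fn;
	pool->jobs[pool->num_jobs].private_data = private_data;
	pool->num_jobs += 1;
	return 0;
}

static int pool_add_job(struct pool *pool, job_fn fn, void *private_data)
{
	int ret = pool_put_job(pool, fn, private_data);
	if (ret != 0) {
		return ret; /* nothing was queued */
	}
	ret = pool->create_thread(pool);
	if (ret == 0) {
		pool->num_threads += 1;
		return 0;
	}
	if (pool->num_threads != 0) {
		/* Assumption of this sketch: an existing worker will run it. */
		return 0;
	}
	/* No worker at all: drop the queued entry, then run synchronously. */
	pool->num_jobs -= 1;
	fn(private_data);
	return 0;
}

/* Helpers for trying the sketch out. */
static int create_thread_fails(struct pool *pool)
{
	(void)pool;
	return EAGAIN;
}

static void set_flag(void *private_data)
{
	*(int *)private_data = 1;
}
```

With create_thread_fails installed as the hook on an empty pool,
pool_add_job() still returns 0, the job function has run in the calling
thread, and no stale entry remains in the queue for a later worker to
reference.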
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13170
Signed-off-by: Christof Schmitt <cs@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>
No functional change, but this simplifies error handling.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13170
Signed-off-by: Christof Schmitt <cs@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>
glibc's pthread_cond_wait(&c, &m) increments m.__data.__nusers, making
pthread_mutex_destroy return EBUSY. Thus we can't allow any thread waiting for
a job across a fork. Also, the state of the condvar itself is unclear across a
fork. Right now to me it looks like an initialized but unused condvar can be
used in the child. Busy worker threads don't cause any trouble here: they don't
hold mutexes or condvars. Also, they can't reach the condvar because _prepare
holds all mutexes.
Bug: https://bugzilla.samba.org/show_bug.cgi?id=13006
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
When copying large files from the server to the client with aio enabled
we noticed that smbd kept growing RSS and VSZ.
valgrind was reporting:
==2503== 4,093,440 bytes in 6,560 blocks are possibly lost in loss record 460 of 460
==2503== at 0x4C299CE: calloc (vg_replace_malloc.c:711)
==2503== by 0x4011C24: _dl_allocate_tls (in /usr/lib64/ld-2.17.so)
==2503== by 0x4E3C960: pthread_create@@GLIBC_2.2.5 (in /usr/lib64/libpthread-2.17.so)
==2503== by 0x9B298AE: pthreadpool_add_job (in /usr/lib64/samba/libmessages-dgm-samba4.so)
==2503== by 0x9B29FDC: pthreadpool_tevent_job_send (in /usr/lib64/samba/libmessages-dgm-samba4.so)
==2503== by 0x56A78EF: ??? (in /usr/lib64/samba/libsmbd-base-samba4.so)
==2503== by 0x55D86B7: smb_vfs_call_pread_send (in /usr/lib64/samba/libsmbd-base-samba4.so)
==2503== by 0x55F7543: schedule_smb2_aio_read (in /usr/lib64/samba/libsmbd-base-samba4.so)
==2503== by 0x5608F57: smbd_smb2_request_process_read (in /usr/lib64/samba/libsmbd-base-samba4.so)
==2503== by 0x55FCB6C: smbd_smb2_request_dispatch (in /usr/lib64/samba/libsmbd-base-samba4.so)
==2503== by 0x55FD7DC: ??? (in /usr/lib64/samba/libsmbd-base-samba4.so)
==2503== by 0x641B977: ??? (in /usr/lib64/samba/libtevent.so.0.9.31)
The problem seems to be caused by worker threads that are not properly
started in detached state and thus their TLS is not reclaimed upon
thread termination.
In pthreadpool.c we prepare a pthread attribute with
PTHREAD_CREATE_DETACHED, but we don't pass it to pthread_create().
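
For illustration, a minimal sketch of starting a detached thread with the
attribute actually handed to pthread_create(). The names worker and
start_detached_worker are hypothetical and only stand in for the
corresponding code in pthreadpool.c.

```c
#include <pthread.h>

/* Hypothetical worker body; stands in for the pool's server function. */
static void *worker(void *arg)
{
	(void)arg;
	return NULL;
}

/*
 * Start one detached worker. Returns 0 on success or the error number
 * of the failing pthread call.
 */
static int start_detached_worker(void)
{
	pthread_attr_t attr;
	pthread_t tid;
	int ret;

	ret = pthread_attr_init(&attr);
	if (ret != 0) {
		return ret;
	}
	ret = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
	if (ret != 0) {
		pthread_attr_destroy(&attr);
		return ret;
	}
	/* Passing NULL instead of &attr here would create a joinable thread. */
	ret = pthread_create(&tid, &attr, worker, NULL);
	pthread_attr_destroy(&attr);
	return ret;
}
```

A joinable thread that is never joined keeps its resources, including the
TLS block, until someone calls pthread_join(); a detached thread releases
them when it terminates.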
Bug: https://bugzilla.samba.org/show_bug.cgi?id=12624
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Autobuild-User(master): Ralph Böhme <slow@samba.org>
Autobuild-Date(master): Fri Mar 10 22:06:02 CET 2017 on sn-devel-144