cc2be63997
In this PR we are adding the functionality to collect all the process's threads' backtraces. ## Changes made in this PR ### **introduce threads mngr API** The **threads mngr API** which has 2 abilities: * `ThreadsManager_init() `- register to SIGUSR2. called on the server start-up. * ` ThreadsManager_runOnThreads()` - receives a list of a pid_t and a callback, tells every thread in the list to invoke the callback, and returns the output collected by each invocation. **Elaborating atomicvar API** * `atomicIncrGet(var,newvalue_var,count) `-- Increment and get the atomic counter new value * `atomicFlagGetSet` -- Get and set the atomic counter value to 1 ### **Always set SIGALRM handler** SIGALRM handler prints the process's stacktrace to the log file. Up until now, it was set only if the `server.watchdog_period` > 0. This can be also useful if debugging is needed. However, in situations where the server can't get requests, (a deadlock, for example) we weren't able to change the signal handler. To make it available at run time we set SIGALRM handler on server startup. The signal handler name was changed to a more general `sigalrmSignalHandler`. ### **Print all the process' threads' stacktraces** `logStackTrace()` now calls `writeStacktraces()`, instead of logging the current thread stacktrace. `writeStacktraces()`: * On Linux systems we use the threads manager API to collect the backtraces of all the process' threads. To get the `tids` list (threads ids) we read the `/proc/<redis-server-pid>/tasks` file which includes a list of directories. Each directory name corresponds to one tid (including the main thread). For each thread, we also need to check if it can get the signal from the threads manager (meaning it is not blocking/ignoring that signal). We send the threads manager this tids list and `collect_stacktrace_data()` callback, which collects the thread's backtrace addresses, its name, and tid. * On other systems, the behavior remained as it was (writing only the current thread stacktrace to the log file). ## compatibility notes 1. **The threads mngr API is only supported in linux.** 2. glibc earlier than 2.3 We use `syscall(SYS_gettid)` and `syscall(SYS_tgkill...)` because their dedicated alternatives (`gettid()` and `tgkill`) were added in glibc 2.3. ## Output example Each thread backtrace will have the following format: `<tid> <thread_name> [additional_info]` * **tid**: as read from the `/proc/<redis-server-pid>/tasks` file * **thread_name**: the tread name as it is registered in the os/ * **additional_info**: Sometimes we want to add specific information about one of the threads. currently. it is only used to mark the thread that handles the backtraces collection by adding "*". In case of crash - this also indicates which thread caused the crash. The handling thread in won't necessarily appear first. ``` ------ STACK TRACE ------ EIP: /lib/aarch64-linux-gnu/libc.so.6(epoll_pwait+0x9c)[0xffffb9295ebc] 67089 redis-server * linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffffb9437790] /lib/aarch64-linux-gnu/libc.so.6(epoll_pwait+0x9c)[0xffffb9295ebc] redis-server *:6379(+0x75e0c)[0xaaaac2fe5e0c] redis-server *:6379(aeProcessEvents+0x18c)[0xaaaac2fe6c00] redis-server *:6379(aeMain+0x24)[0xaaaac2fe7038] redis-server *:6379(main+0xe0c)[0xaaaac3001afc] /lib/aarch64-linux-gnu/libc.so.6(+0x273fc)[0xffffb91d73fc] /lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98)[0xffffb91d74cc] redis-server *:6379(_start+0x30)[0xaaaac2fe0370] 67093 bio_lazy_free /lib/aarch64-linux-gnu/libc.so.6(+0x79dfc)[0xffffb9229dfc] /lib/aarch64-linux-gnu/libc.so.6(pthread_cond_wait+0x208)[0xffffb922c8fc] redis-server *:6379(bioProcessBackgroundJobs+0x174)[0xaaaac30976e8] /lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8)[0xffffb922d5c8] /lib/aarch64-linux-gnu/libc.so.6(+0xe5d1c)[0xffffb9295d1c] 67091 bio_close_file /lib/aarch64-linux-gnu/libc.so.6(+0x79dfc)[0xffffb9229dfc] /lib/aarch64-linux-gnu/libc.so.6(pthread_cond_wait+0x208)[0xffffb922c8fc] redis-server *:6379(bioProcessBackgroundJobs+0x174)[0xaaaac30976e8] /lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8)[0xffffb922d5c8] /lib/aarch64-linux-gnu/libc.so.6(+0xe5d1c)[0xffffb9295d1c] 67092 bio_aof /lib/aarch64-linux-gnu/libc.so.6(+0x79dfc)[0xffffb9229dfc] /lib/aarch64-linux-gnu/libc.so.6(pthread_cond_wait+0x208)[0xffffb922c8fc] redis-server *:6379(bioProcessBackgroundJobs+0x174)[0xaaaac30976e8] /lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8)[0xffffb922d5c8] /lib/aarch64-linux-gnu/libc.so.6(+0xe5d1c)[0xffffb9295d1c] 67089:signal-handler (1693824528) -------- ``` |
||
---|---|---|
.. | ||
assets | ||
cluster | ||
helpers | ||
integration | ||
modules | ||
sentinel | ||
support | ||
tmp | ||
unit | ||
instances.tcl | ||
README.md | ||
test_helper.tcl |
Redis Test Suite
The normal execution mode of the test suite involves starting and manipulating
local redis-server
instances, inspecting process state, log files, etc.
The test suite also supports execution against an external server, which is
enabled using the --host
and --port
parameters. When executing against an
external server, tests tagged external:skip
are skipped.
There are additional runtime options that can further adjust the test suite to match different external server configurations:
Option | Impact |
---|---|
--singledb |
Only use database 0, don't assume others are supported. |
--ignore-encoding |
Skip all checks for specific encoding. |
--ignore-digest |
Skip key value digest validations. |
--cluster-mode |
Run in strict Redis Cluster compatibility mode. |
--large-memory |
Enables tests that consume more than 100mb |
Tags
Tags are applied to tests to classify them according to the subsystem they test, but also to indicate compatibility with different run modes and required capabilities.
Tags can be applied in different context levels:
start_server
contexttags
context that bundles several tests together- A single test context.
The following compatibility and capability tags are currently used:
Tag | Indicates |
---|---|
external:skip |
Not compatible with external servers. |
cluster:skip |
Not compatible with --cluster-mode . |
large-memory |
Test that requires more than 100mb |
tls:skip |
Not compatible with --tls . |
needs:repl |
Uses replication and needs to be able to SYNC from server. |
needs:debug |
Uses the DEBUG command or other debugging focused commands (like OBJECT REFCOUNT ). |
needs:pfdebug |
Uses the PFDEBUG command. |
needs:config-maxmemory |
Uses CONFIG SET to manipulate memory limit, eviction policies, etc. |
needs:config-resetstat |
Uses CONFIG RESETSTAT to reset statistics. |
needs:reset |
Uses RESET to reset client connections. |
needs:save |
Uses SAVE or BGSAVE to create an RDB file. |
When using an external server (--host
and --port
), filtering using the
external:skip
tags is done automatically.
When using --cluster-mode
, filtering using the cluster:skip
tag is done
automatically.
When not using --large-memory
, filtering using the largemem:skip
tag is done
automatically.
In addition, it is possible to specify additional configuration. For example, to
run tests on a server that does not permit SYNC
use:
./runtest --host <host> --port <port> --tags -needs:repl