Add a way to HSCAN a hash key, and get only the field names.
Command syntax is now:
```
HSCAN key cursor [MATCH pattern] [COUNT count] [NOVALUES]
```
When `NOVALUES` is given, the command returns only the field names in the hash, without the values.
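For illustration, a hypothetical session (key and field names made up):
```
127.0.0.1:6379> HSET user:1 name alice age 30
(integer) 2
127.0.0.1:6379> HSCAN user:1 0 NOVALUES
1) "0"
2) 1) "name"
   2) "age"
```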
---------
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Adds the ability to kill clients older than a specified age.
Also fixed the age calculation in `catClientInfoString` to use `commandTimeSnapshot` instead of the old `server.unixtime`, and added missing documentation for `CLIENT KILL ID` to the output of `CLIENT HELP`.
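For illustration, assuming the new age filter is spelled `MAXAGE` (in seconds), a session might look like:
```
127.0.0.1:6379> CLIENT KILL MAXAGE 3600
(integer) 2
```
As with other CLIENT KILL filters, the integer reply is the number of clients killed.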
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
Tests consistently fail on timeout (a sleep that's too short).
They now take more time because in #12819 we iterate over all dicts, not just non-empty ones.
The tests passed the PR's CI because it skips the `slow` tag, which might have been misplaced, but now it is probably required.
With the fix, the tests take quite a lot of time:
```
[ok]: Redis can trigger resizing (1860 ms)
[ok]: Redis can rewind and trigger smaller slot resizing (744 ms)
```
Before #12819:
```
[ok]: Redis can trigger resizing (309 ms)
[ok]: Redis can rewind and trigger smaller slot resizing (295 ms)
```
Failure:
https://github.com/redis/redis/actions/runs/7704158180/job/20995931735
```
*** [err]: expire scan should skip dictionaries with lot's of empty buckets in tests/unit/expire.tcl
scan didn't handle slot skipping logic.
*** [err]: Redis can trigger resizing in tests/unit/other.tcl
Expected '[Dictionary HT]
Hash table 0 stats (main hash table):
table size: 128
number of elements: 5
[Expires HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
' to match '*table size: 8*' (context: type eval line 29 cmd {assert_match "*table size: 8*" [r debug HTSTATS 0]} proc ::test)
*** [err]: Redis can rewind and trigger smaller slot resizing in tests/unit/other.tcl
Expected '[Dictionary HT]
Hash table 0 stats (main hash table):
table size: 256
number of elements: 10
[Expires HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
' to match '*table size: 16*' (context: type eval line 27 cmd {assert_match "*table size: 16*" [r debug HTSTATS 0]} proc ::test)
```
In #11012, we reprocess the command when a client is unblocked on keys. In some blocking commands, for example in the XREADGROUP BLOCK scenario, reprocessing the command recalculates the block timeout, causing the blocking time to be reset.
This commit adds a new CLIENT_REPROCESSING_COMMAND client flag to explicitly let the command know that it is being re-processed; later, in blockForKeys, we will not reset the timeout.
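A minimal self-contained sketch of the idea (the flag bit value and field names here are illustrative, not the actual server internals):
```c
#include <stdint.h>

#define CLIENT_REPROCESSING_COMMAND (1ULL << 0) /* illustrative bit value */

typedef long long mstime_t;
typedef struct client {
    uint64_t flags;
    mstime_t block_timeout; /* absolute deadline while blocked */
} client;

/* Skip the timeout reset when the command is being re-processed after
 * unblocking, so the original deadline is preserved. */
static void blockForKeysSketch(client *c, mstime_t timeout) {
    if (!(c->flags & CLIENT_REPROCESSING_COMMAND))
        c->block_timeout = timeout; /* first-time blocking: set deadline */
    /* ...register the client on the keys as before... */
}
```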
Affected BLOCK cases:
- list / zset / stream, added test cases for each.
Unaffected cases:
- module (never re-process the commands).
- WAIT / WAITAOF (never re-process the commands).
Fixes #12998.
The function `tryResizeHashTables` only attempts to shrink the dicts that have keys (a change from #11695); this was a serious problem until the change in #12850, since it meant that if all keys are deleted, we won't shrink the dict.
But still, both dictShrink and dictExpand may be blocked by a fork child process, therefore the cron job needs to perform both dictShrink and dictExpand, for not just non-empty dicts, but all dicts in DBs.
What this PR does:
1. Try to resize all dicts in DBs (not just non-empty ones, as it was since #12850).
2. Handle both shrink and expand (not just shrink, as it was since forever).
3. Refactor some APIs around dict resizing (get rid of `htNeedsShrink` and `dictShrinkToFit`, and expose `dictShrinkIfNeeded` and `dictExpandIfNeeded`, which already contain all the code of the removed functions, to make the APIs neater).
4. In the `Don't rehash if redis has child process` test, now that cron does the resizing, we no longer need to write to the DB after the child process is killed, and can wait for the cron to expand the hash table.
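A cron-side sketch of the behavior described above (`dbGetDict()` is a hypothetical accessor; `dictShrinkIfNeeded()`/`dictExpandIfNeeded()` are the APIs this PR exposes):
```c
/* Walk ALL dicts, empty or not, and try both directions. A resize that
 * was refused while a fork child was alive is simply retried on a later
 * cron cycle. */
void tryResizeHashTablesSketch(redisDb *db, int num_dicts) {
    for (int i = 0; i < num_dicts; i++) {
        dict *d = dbGetDict(db, i);
        dictShrinkIfNeeded(d);  /* no-op unless utilization is too low */
        dictExpandIfNeeded(d);  /* no-op unless the table is too full  */
    }
}
```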
Modules may want to handle allocation failures gracefully; this adds RM_TryCalloc() and RM_TryRealloc() for that.
RM_TryAlloc() was added before:
https://github.com/redis/redis/pull/10541
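A minimal sketch of the intended use (hypothetical command handler; the error message is made up):
```c
#include "redismodule.h"

/* Reply with an error instead of aborting when allocation fails. */
int MyCmd_RedisCommand(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    REDISMODULE_NOT_USED(argv);
    REDISMODULE_NOT_USED(argc);
    void *buf = RedisModule_TryCalloc(1024, 1024);
    if (buf == NULL)
        return RedisModule_ReplyWithError(ctx, "ERR insufficient memory");
    /* ...use buf... */
    RedisModule_Free(buf);
    return RedisModule_ReplyWithSimpleString(ctx, "OK");
}
```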
#### Problem Statement:
For any read/update operation during rehashing, we do ~10+ random DRAM lookups to perform the rehashing, as we use `rehashidx` to rehash 10 buckets whose dict entries most likely aren't cached in the CPU or near the bucket we are operating on. If these random buckets are empty, the rehashing process during that command execution is skipped.
#### Implementation:
To reduce the performance regression while the dict is rehashing, we determine the index at which the key would be stored in the 0th HT and check whether that index has already been rehashed; if not, we rehash the bucket containing the key, and the bucket is moved from the 0th HT to the 1st HT.
If the key's bucket has already been rehashed, we perform the random-access bucket rehash (using `rehashidx`), then verify whether rehashing is still ongoing and look up the key in the respective HT.
This ensures rehashing is not skipped in any command call, and that we rehash either the key's bucket or a random bucket in each call.
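A hypothetical sketch of that lookup-path step (field and helper names are illustrative, not the exact dict.c internals):
```c
/* Buckets below rehashidx have already been moved to ht[1]. */
static void rehashStepForKey(dict *d, const void *key) {
    uint64_t idx = dictHashKey(d, key) & maskOfOldTable(d);
    if ((long)idx >= d->rehashidx) {
        /* The key's bucket was not rehashed yet: move exactly that
         * bucket from the 0th HT to the 1st HT. */
        moveKeysInBucketOldtoNew(d, idx);
    } else {
        /* The key's bucket was already moved: fall back to one step of
         * the regular rehashidx-driven incremental rehash. */
        dictRehash(d, 1);
    }
}
```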
#### Changes in this PR:
- Added a new method `dictBucketRehash` to perform rehash on a single bucket.
- Helper function `moveKeysInBucketOldtoNew` for `dictRehash` and `dictBucketRehash` to move all the keys in a bucket from the old HT to the new HT.
- Helper function `verifyMoreRehashRequired` for `dictRehash` and `dictBucketRehash` to check whether we have already rehashed the whole table and whether more rehashing is required.
#### Benchmark:
- This PR still shows a **~13%** improvement in latency during rehashing.
- Rehashing is now **~2%** faster in this PR compared to unstable.
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
We have test cases for incr-related commands with a non-existent key, spaces in the key, and the wrong type of key. However, we don't have test cases covering INCRBY INCRBYFLOAT DECRBY INCR DECR HINCRBY HINCRBYFLOAT ZINCRBY with a valid key and an invalid value as the argument, nor float values passed to INCRBY and DECRBY. So added test cases for these scenarios in incr.tcl.
Thank you!
In #11568 we removed the NOSCRIPT flag from commands, e.g. removing the NOSCRIPT flag from WAIT, aiming to allow them in scripts and let them implicitly behave in the non-blocking way.
This PR removes the NOSCRIPT flag from WAITAOF just like WAIT (to be symmetrical).
And this PR also adds the BLOCKING flag for WAIT and WAITAOF.
This PR did some cleanups around functions:
- drop the comment about Libraries Ctx, since we do have a comment in functionsLibCtx; no need to maintain multiple copies.
- remove the outdated comment about the dropped Library description.
- remove the unused desc and code vars in functionExtractLibMetaData.
- fix the engines_nemory typo, changing it to engines_memory.
- remove the outdated comment about FUNCTION CREATE and FUNCTION INFO; FUNCTION CREATE was renamed to FUNCTION LOAD.
- check in initServer whether the return of functionsInit is OK.
Currently the slowlog gets disabled if slowlog-log-slower-than is set to less than zero. I think we should also disable it if slowlog-max-len is set to zero. We apply the same logic to acllog-max-len.
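A sketch of the two ways the slow log can now be disabled, using the standard redis.conf directives:
```
# Existing behavior: a negative threshold disables the slow log.
slowlog-log-slower-than -1
# With this change: a zero length also disables it.
slowlog-max-len 0
```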
Before this change (most recently modified in
https://github.com/redis/redis/pull/12850#discussion_r1421406393), The
trigger for normal expand threshold was 100% utilization and the trigger
for normal shrink threshold was 10% (HASHTABLE_MIN_FILL).
While during fork (DICT_RESIZE_AVOID), when we want to avoid rehash, the
trigger thresholds were multiplied by 5 (`dict_force_resize_ratio`),
meaning 500% for expand and 2% (100/10/5) for shrink.
However, in `dictRehash` (the incremental rehashing), the rehashing
threshold for shrinking during fork (DICT_RESIZE_AVOID) was 20% by
mistake.
This meant that if shrinking was triggered while `dict_can_resize` was `DICT_RESIZE_ENABLE` (where the threshold is 10%), the rehashing could continue even when `dict_can_resize` became `DICT_RESIZE_AVOID`.
This would cause unwanted CopyOnWrite damage.
It would make sense to make the thresholds of the rehash trigger and the thresholds of the incremental rehashing the same; however, in one we compare the size of the hash table to the number of records, and in the other we compare the size of ht[0] to the size of ht[1], so the formula is not exactly the same.
To make things easier we change all the thresholds to powers of 2, so the normal shrinking threshold is changed from 100/10 (i.e. 10%) to 100/8 (i.e. 12.5%), and we change the ratio during forks from 5 to 4, i.e. from 500% to 400% for expand, and from 2% (100/10/5) to 3.125% (100/8/4).
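Summarizing the thresholds before and after this change:

| | Expand trigger | Shrink trigger |
|---|---|---|
| Normal, before | 100% | 10% (100/10) |
| During fork, before | 500% | 2% (100/10/5) |
| Normal, after | 100% | 12.5% (100/8) |
| During fork, after | 400% | 3.125% (100/8/4) |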
Before #12850, we would only try to shrink the dict in serverCron, which we can control by using a child process, but now every time we delete a key, the shrink check will be called.
In these tests (added in #12802), we meant to disable resizing, but during the delete, the dict will meet the force-shrink condition, like 2 / 128 = 0.015 < 0.2, so the delete will trigger a force resize and cause the test to fail.
In this commit, we try to keep the load factor at 3 / 128 = 0.023, that is, not meeting the force-shrink condition.
The test has a race:
```
*** [err]: Redis can rewind and trigger smaller slot resizing in tests/unit/other.tcl
Expected '[Dictionary HT]
Hash table 0 stats (main hash table):
table size: 12
number of elements: 2
[Expires HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
' to match '*table size: 8*' (context: type eval line 12 cmd {assert_match "*table size: 8*" [r debug HTSTATS 0]} proc ::test)
```
When `r del "{alice}$j"` is executed in the loop, once the keys are deleted down to [9, 12], the load factor has met HASHTABLE_MIN_FILL; if serverCron happens to trigger a slot dict resize at that point, the test will fail, because there is no way to meet HASHTABLE_MIN_FILL in the subsequent dels.
The solution is to avoid triggering the resize in advance. We can use MULTI to delete them at once, or we can disable the resize. Since we disabled resize in the previous test, the fix also uses the method of disabling resize.
The test is introduced in #12802.
When we insert entries into a dict, it may autonomously expand if needed. However, when we delete entries from a dict, it doesn't shrink to the proper size. If there are few entries in a very large dict, it may cause a huge waste of memory and inefficiency when iterating.
The main keyspace dicts (keys and expires) are shrunk by cron (`tryResizeHashTables` calls `htNeedsResize` and `dictResize`), and some data structures such as zset and hash also do that (call `htNeedsResize`) right after a loop of calls to `dictDelete`, but many other dicts are completely missing that call (they can only expand).
In this PR, we provide the ability to automatically shrink the dict when deleting. The conditions triggering the shrinking are the same as `htNeedsResize` used to have, i.e. we expand when we're over 100% utilization, and shrink when we're below 10% utilization.
Additionally:
* Add `dictPauseAutoResize` so that flows that do mass deletions will only trigger shrinkage at the end (see the sketch after this list).
* Rename `dictResize` to `dictShrinkToFit` (same logic as it used to have, but a name that better describes it).
* Rename `_dictExpand` to `_dictResize` (same logic as it used to have, but a name that better describes it).
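A sketch of a bulk-deletion flow under the new behavior (`dictResumeAutoResize()` as the counterpart of `dictPauseAutoResize()`, and the explicit final shrink call, are assumptions about intended usage):
```c
void bulkDeleteSketch(dict *d, sds *keys, long num_keys) {
    dictPauseAutoResize(d);      /* skip the shrink check on each delete */
    for (long i = 0; i < num_keys; i++)
        dictDelete(d, keys[i]);
    dictResumeAutoResize(d);     /* re-enable automatic resizing */
    dictShrinkToFit(d);          /* shrink once, at the end */
}
```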
related to discussion
https://github.com/redis/redis/pull/12819#discussion_r1409293878
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
Regarding how to obtain the hash slot of a key, there is an optimization
in `getKeySlot()`, it is used to avoid redundant hash calculations for
keys: when the current client is in the process of executing a command,
it can directly use the slot of the current client because the slot to
access has already been calculated in advance in `processCommand()`.
However, scripts are a special case where, in default mode or with
`allow-cross-slot-keys` enabled, they are allowed to access keys beyond
the pre-declared range. This means that the keys they operate on may not
belong to the slot of the pre-declared keys. Currently, when the
commands in a script are executed, the slot of the original client
(i.e., the current client) is not correctly updated, leading to
subsequent access to the wrong slot.
This PR fixes the above issue. When checking the cluster constraints in
a script, the slot to be accessed by the current command is set for the
original client (i.e., the current client). This ensures that
`getKeySlot()` gets the correct slot cache.
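A minimal sketch of that cache (field names assumed from the description):
```c
/* Reuse the slot computed in processCommand() while a command is
 * executing; otherwise fall back to hashing the key. */
int getKeySlotSketch(sds key) {
    if (server.current_client && server.current_client->slot >= 0)
        return server.current_client->slot; /* cached slot, now kept
                                             * correct for scripts too */
    return keyHashSlot(key, (int)sdslen(key));
}
```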
Additionally, the following modifications are made:
1. The 'sort' and 'sort_ro' commands use `getKeySlot()` instead of `c->slot`, because the client could be an engine client in a script, which can lead to potential bugs.
2. `getKeySlot()` is also used in pubsub to obtain the slot for the channel, standardizing the way slots are retrieved.
The test was introduced in #10745, but we forgot to add it to the
test_helper.tcl, so our CI did not actually run it. This PR adds it
and ensures it passes CI tests.
Fix a daily test failure because Alpine doesn't support stack traces, and add an extra assertion to make sure the stack trace was printed twice.
This change is trying to make two failure modes a bit easier to deep dive:
1. If a serverPanic or serverAssert occurs during the info (or module)
printing, it will recursively panic, which is a lot of fun as it will
just keep recursively printing. It will eventually stack overflow, but
will generate a lot of text in the process.
2. When a segfault happens during the segfault handler, no information
is communicated other than it happened. This can be problematic because
`info` may help diagnose the real issue, but without fixing the
recursive crash it might be hard to get at that info.
Fix #12792
On Ubuntu 23 (Lunar), non-root users are not allowed to change the oom_score_adj of a process to a value that is too low.
Since the terminal's default oom_score_adj is 200, if we run the test in a terminal, we won't be able to set the oom_score_adj of the redis process to 9 or 22, which is too low.
Reproduction in an Ubuntu 23 (Lunar) terminal:
```sh
$ cat /proc/`pgrep redis-server`/oom_score_adj
200
$ echo 100 > /proc/`pgrep redis-server`/oom_score_adj
# success without error
$ echo 99 > /proc/`pgrep redis-server`/oom_score_adj
echo: write error: Permission denied
```
As the output above shows, we can only set the minimum oom score of redis processes to 100.
The test is modified so that oom_score_adj only increases and never decreases.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Previous implementation would disconnect _all_ clients when running `ACL
LOAD`, which wasn't very useful.
This change brings the behavior in line with that of `ACL SETUSER`, `ACL
DELUSER`, in that only clients whose user is deleted or clients
subscribed to channels which they no longer have access to will be
disconnected.
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
In this PR, we added -4 and -6 options to redis-cli to determine IPv4 / IPv6 priority in DNS lookup.
This was mentioned in
https://github.com/redis/redis/pull/11151#issuecomment-1231570651
For now it's only used in CLUSTER MEET.
The options also make it possible to reliably test DNS lookup in CI; using them, we can add some localhost tests for #11151.
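For illustration, a hypothetical invocation (ports made up) that prefers IPv6 when resolving the hostname passed to CLUSTER MEET:
```
redis-cli -6 -p 7000 CLUSTER MEET localhost 7001
```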
The commit was cherry-picked from #11151, back then we decided to split
the PR.
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
We don't have a test for HGETALL against a non-existent key, so we added that test to the test suite, and along with it, added wrong-type cases for other missing commands.
The by/get options of the sort/sort_ro commands used to be forbidden in cluster mode, since we cannot be sure which slot the pattern may be in.
With the optimization done in #12536, patterns can now be mapped to slots, so we should allow by/get options in cluster mode when the pattern maps to the same slot as the key.
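A hypothetical example: every pattern shares the `{user}` hash tag, so they all map to the same slot as the key and by/get is now permitted in cluster mode:
```
SORT {user}:ids BY {user}:weight_* GET {user}:name_*
```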
In #11489, we considered the ACL username to be sensitive information and ACL GETUSER a sensitive command, and removed it from the redis-cli history file.
This PR redacts username information in ACL GETUSER and ACL DELUSER from SLOWLOG, and also removes ACL DELUSER from the redis-cli history file.
This PR also marks tls-key-file-pass and tls-client-key-file-pass as sensitive configs; they will be redacted from SLOWLOG and also removed from the redis-cli history file.
In the INFO CLIENTS section, we already have blocked_clients and tracking_clients. We should add a new metric showing the number of pubsub connections, which helps with performance monitoring and troubleshooting.
This is a follow-up fix to #12733. We need to apply the same changes to
delKeysInSlot. Refer to #12733 for more details.
This PR contains some other minor cleanups / improvements to the test
suite and docs.
It uses the postnotifications test module in a cluster mode test which
revealed a leak in the test module (fixed).
In #12536 we made a similar optimization for SCAN with hashtags in patterns. When we can make sure all keys matching the pattern will be in the same slot, we can limit the iteration to that single slot.
The following four configurations are renamed to align with Redis style:
1. server_cpulist renamed to server-cpulist
2. bio_cpulist renamed to bio-cpulist
3. aof_rewrite_cpulist renamed to aof-rewrite-cpulist
4. bgsave_cpulist renamed to bgsave-cpulist
The original names are retained as aliases to ensure compatibility with old configuration files. We recommend that users gradually transition to the new configuration names to maintain consistency in style.
Introduced in #11695.
The tryResizeHashTables function gets stuck on the last non-empty slot while iterating through dictionaries, never restarting from the beginning. The reason for this issue is a problem with the usage of dbIteratorNextDict:
```c
/* Returns next dictionary from the iterator, or NULL if iteration is complete. */
dict *dbIteratorNextDict(dbIterator *dbit) {
    if (dbit->next_slot == -1) return NULL;
    dbit->slot = dbit->next_slot;
    dbit->next_slot = dbGetNextNonEmptySlot(dbit->db, dbit->slot, dbit->keyType);
    return dbGetDictFromIterator(dbit);
}
```
When iterating to the last non-empty slot, next_slot is set to -1,
causing it to loop indefinitely on that slot. We need to modify the code
to ensure that after iterating to the last non-empty slot, it returns to
the first non-empty slot.
BTW, the function tryResizeHashTables is actually iterating over slots that have keys. However, in its implementation, it leverages the dbIterator (which is a key iterator) to obtain slot and dictionary information. While this approach works fine, it is not very intuitive. This PR also improves readability by changing the iteration to directly iterate over slots, thereby enhancing clarity.
When we register a notification or server event in RedisModule_OnLoad, but RedisModule_OnLoad eventually fails, triggering the notification or server event will cause the server to crash.
If the loading fails at a later stage of moduleLoad, we do call moduleUnload, which handles all un-registration, but when it fails on the RedisModule_OnLoad call, we only un-register several specific things, and these were missing:
- moduleUnsubscribeNotifications
- moduleUnregisterFilters
- moduleUnsubscribeAllServerEvents
Refactored the code to reuse the code from moduleUnload.
Fixes #12808.
See the discussion after https://github.com/redis/redis/pull/12453 was merged.
----
This PR replaces signals that are not considered async-signal-safe
(AS-safe) with safe calls.
#### **1. serverLog() and serverLogFromHandler()**
`serverLog` uses unsafe calls. It was decided that we will **avoid** `serverLog` calls by the signal handlers when:
* The signal is not fatal, such as SIGALRM. In these cases, we prefer using `serverLogFromHandler`, which is the safe version of `serverLog`. Note they have different prompts:
`serverLog`: `62220:M 26 Oct 2023 14:39:04.526 # <msg>`
`serverLogFromHandler`: `62220:signal-handler (1698331136) <msg>`
* The code was added recently. Calls to `serverLog` by the signal handler have been there ever since Redis has existed and haven't caused problems so far. To avoid regression, from now on we should use `serverLogFromHandler`.
#### **2. `snprintf` `fgets` and `strtoul`(base = 16) -------->
`_safe_snprintf`, `fgets_async_signal_safe`, `string_to_hex`**
The safe version of `snprintf` was taken from
[here](8cfc4ca5e7/src/mc_util.c (L754))
#### **3. fopen(), fgets(), fclose() --------> open(), read(), close()**
#### **4. opendir(), readdir(), closedir() --------> open(),
syscall(SYS_getdents64), close()**
#### **5. Threads_mngr sync mechanisms**
* waiting for the thread to generate stack trace: semaphore -------->
busy-wait
* `globals_rw_lock` was removed: as we are not using malloc and the
semaphore anymore we don't need to protect `ThreadsManager_cleanups`.
#### **6. Stacktraces buffer**
The initial problem was that we were not able to safely call malloc
within the signal handler.
To solve that we created a buffer on the stack of `writeStacktraces` and
saved it in a global pointer, assuming that under normal circumstances,
the function `writeStacktraces` would complete before any thread
attempted to write to it. However, **if threads lag behind, they might
access this global pointer after it no longer belongs to the
`writeStacktraces` stack, potentially corrupting memory.**
To address this, various solutions were discussed
[here](https://github.com/redis/redis/pull/12658#discussion_r1390442896)
Eventually, we decided to **create a pipe** at server startup that will
remain valid as long as the process is alive.
We chose this solution due to its minimal memory usage, and since
`write()` and `read()` are atomic operations. It ensures that stack
traces from different threads won't mix.
**The stacktraces collection process is now as follows:**
* Cleaning the pipe to eliminate writes of late threads from previous runs.
* Each thread writes its stacktrace to the pipe.
* Waiting for all the threads to mark completion or until a timeout (2 sec) is reached.
* Reading from the pipe to print the stacktraces.
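A minimal sketch of the write side, assuming a pipe created at startup (the descriptor name is made up):
```c
#include <limits.h>
#include <unistd.h>

extern int stacktrace_pipe_write_fd;  /* opened once at server startup */

/* A single write() is async-signal-safe, and payloads up to PIPE_BUF
 * bytes are written atomically, so traces from different threads are
 * never interleaved. */
static void writeThreadStacktrace(const char *buf, size_t len) {
    if (len > PIPE_BUF) len = PIPE_BUF;  /* keep the write atomic */
    ssize_t n = write(stacktrace_pipe_write_fd, buf, len);
    (void)n;
}
```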
#### **7. Changes that were considered and eventually were dropped**
* replace watchdog timer with a POSIX timer:
according to the [setitimer man page](https://linux.die.net/man/2/setitimer)
> POSIX.1-2008 marks getitimer() and setitimer() obsolete, recommending
the use of the POSIX timers API
([timer_gettime](https://linux.die.net/man/2/timer_gettime)(2),
[timer_settime](https://linux.die.net/man/2/timer_settime)(2), etc.)
instead.
However, although it is supposed to conform to the POSIX standard, the POSIX timers API is not supported on Mac.
You can take a look at the Linux implementation
[here](c7562ee135)
To avoid messing up the code, and uncertainty regarding compatibility,
it was decided to drop it for now.
* avoid using sds (uses malloc) in logConfigDebugInfo:
It was considered to print the config info without using sds; however, apparently `logConfigDebugInfo` does more than just print the sds, so it was decided this fix is out of this issue's scope.
#### **8. fix Signal mask check**
The check `signum & sig_mask`, intended to indicate whether the signal is blocked by the thread, was incorrect. Actually, the bit position in the signal mask corresponds to the signal number. We fixed this by changing the condition to: `sig_mask & (1L << (sig_num - 1))`
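In code form, the corrected test reads:
```c
/* Bit (sig_num - 1) of the mask corresponds to signal number sig_num,
 * e.g. SIGHUP (1) maps to bit 0. */
int signal_is_blocked = (sig_mask & (1L << (sig_num - 1))) != 0;
```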
#### **9. Unrelated changes**
Both `fork.tcl` and `util.tcl` implemented a function called `count_log_message`, expecting different parameters. This caused confusion when trying to run daily tests with additional test parameters to run a specific test.
The `count_log_message` in `fork.tcl` was removed and the calls were replaced with calls to the `count_log_message` located in `util.tcl`.
---------
Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
We have some test cases of swapdb with watched keys, but are missing separate basic swapdb test cases, the unhappy path, and flushdb after swapdb. So added those test cases in keyspace.tcl.
When using the DB iterator, it will use dictInitSafeIterator to init an old safe dict iterator. When dbIteratorNext is used, it will jump to the next slot dict when we are done with a dict. During this process, we do not have any calls to dictResumeRehashing, which causes the dict's pauserehash to always be > 0.
And in the end, dictRehashMilliseconds will return directly, leaving slot dicts in a state where rehash cannot be completed.
In the "expire scan should skip dictionaries with lot's of empty buckets" test, adding a `keys *` can reproduce the problem stably. `keys *` will call dbIteratorNext to trigger a traversal of all slot dicts.
Added dbReleaseIterator and dbIteratorInitNextSafeIterator methods to call dictResetIterator.
The issue was introduced in #11695.
A recently added test fails on timeout under valgrind in GH Actions.
Locally with valgrind the test finishes within 1.5 sec(s). Couldn't find
any issue due to lack of reproducibility. Increasing the timeout and
adding an additional log to the test to understand how many keys
were left at the end.
Redis 7.2 (#9406) introduced a new module event, `RedisModuleEvent_Key`.
This new event allows the module to read the key data just before it is removed from the database (either deleted, expired, evicted, or overwritten).
When the key was removed from the database by active expire or eviction, the new event was not called as part of an execution unit. This can cause an issue if the module registers a post notification job inside the event: the job will not be executed atomically with the expiration/eviction operation and will not be replicated inside a Multi/Exec. Moreover, the post notification job will be executed right after the event, where it is still not safe to perform any write operation; this violates the promise that a post notification job will be called atomically with the operation that triggered it and **only when it is safe to write**.
This PR fixes the issue by wrapping each expiration/eviction of a key in an execution unit. This makes sure the entire operation runs atomically and all the post notification jobs are executed at the end, where it is safe to write.
Tests were modified to verify the fix.
This change overcomes many stability issues experienced with the
vmactions action.
We need to limit VMs to 8GB for better stability, as the 13GB default
seems to hang them occasionally.
Shell code has been simplified since this action seems to use `bash -e`, which will abort on non-zero exit codes anyway.
Reverts skipping the defrag tests in cluster mode (done in #12672); instead, it skips only some defrag tests that are relevant for cluster mode.
The tests now run well after investigating and making the changes in #12674 and #12694.
Co-authored-by: Oran Agra <oran@redislabs.com>
Optimize the performance of SCAN commands when a match pattern can only contain keys from a
single slot in cluster mode. This can happen when the pattern contains a hash tag before any
wildcard matchers or when the key contains no matchers.
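For illustration, hypothetical patterns that qualify for the single-slot path:
```
SCAN 0 MATCH {user:1000}:*   # hash tag appears before any wildcard
SCAN 0 MATCH exactkey        # no wildcard matchers at all
```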
Test failure on FreeBSD CI:
```
*** [err]: expire scan should skip dictionaries with lot's of empty buckets in tests/unit/expire.tcl
scan didn't handle slot skipping logic.
```
Observation:
expiry of keys might happen before the empty buckets are formed and won't help with the expiry skip logic validation.
Solution:
Disable expiration until the empty buckets are formed.
Fixing issues that started after #11695, when the defrag tests began being executed in cluster mode too.
For some reason, it looks like the defragmentation is over too quickly, before the test is able to detect that it's running.
So now, instead of waiting to see that it's active, we wait to see that it did some work:
```
[err]: Active defrag big list: cluster in tests/unit/memefficiency.tcl
defrag not started.
[err]: Active defrag big keys: cluster in tests/unit/memefficiency.tcl
defrag didn't stop.
```
Temporarily disabling a few of the defrag tests in cluster mode to make the daily run stable:
- Active defrag eval scripts
- Active defrag big keys
- Active defrag big list
- Active defrag edge case
Dictionary iterator logic in the `tryResizeHashTables` method is picking the next (incorrect) dictionary while the cursor is at a given slot. This could lead to some dictionary/slot being skipped during resizing.
Also stabilize the test.
The problem was introduced recently in #11695.
This is an implementation of https://github.com/redis/redis/issues/10589 that eliminates 16 bytes per entry in cluster mode, which are currently used to create a linked list between entries in the same slot. The main idea is splitting the main dictionary into 16k smaller dictionaries (one per slot), so we can perform all slot-specific operations, such as iteration, without any additional info in the `dictEntry`. For Redis cluster, the expectation is that there will be a larger number of keys, so the fixed overhead of 16k dictionaries will be negligible.
The expire dictionary is also split up so that each slot is logically decoupled, so that in subsequent revisions we will be able to atomically flush a slot of data.
## Important changes
* Incremental rehashing - one big change here is that it's not one, but rather up to 16k dictionaries that can be rehashing at the same time, in order to keep track of them, we introduce a separate queue for dictionaries that are rehashing. Also instead of rehashing a single dictionary, cron job will now try to rehash as many as it can in 1ms.
* getRandomKey - now needs to not only select a random key from a random bucket, but also needs to select a random dictionary. Fairness is a major concern here, as it's possible that keys can be unevenly distributed across the slots. In order to address this, we introduced a binary index tree. With that data structure we are able to efficiently find a random slot using binary search in O(log^2(slot count)) time.
* Iteration efficiency - when iterating dictionary with a lot of empty slots, we want to skip them efficiently. We can do this using same binary index that is used for random key selection, this index allows us to find a slot for a specific key index. For example if there are 10 keys in the slot 0, then we can quickly find a slot that contains 11th key using binary search on top of the binary index tree.
* scan API - in order to perform a scan across the entire DB, the cursor now needs to save not only the position within the dictionary but also the slot id. In this change we append the slot id into the LSBs of the cursor so it can be passed around between client and server (a sketch of this encoding appears after this list). This has an interesting side effect: you can now start scanning a specific slot by simply providing the slot id as a cursor value. The plan is to not document this as defined behavior, however. It's also worth noting that the SCAN API is now technically incompatible with previous versions, although practically we don't believe it's an issue.
* Checksum calculation optimizations - during command execution, we know that all of the keys are from the same slot (outside of a few notable exceptions such as cross-slot scripts and modules). We don't want to compute the checksum multiple times, hence we rely on the cached slot id in the client during command execution. All operations that access random keys should either pass in the known slot or recompute the slot.
* Slot info in RDB - in order to resize individual dictionaries correctly, while loading RDB, it's not enough to know total number of keys (of course we could approximate number of keys per slot, but it won't be precise). To address this issue, we've added additional metadata into RDB that contains number of keys in each slot, which can be used as a hint during loading.
* DB size - besides the `DBSIZE` API, we need to know the size of the DB in many places; in order to avoid scanning all dictionaries and summing up their sizes in a loop, we've introduced a new field into `redisDb` that keeps track of `key_count`. This way we keep the DBSIZE operation O(1). The same is kept for O(1) expires computation as well.
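A hypothetical sketch of the cursor encoding mentioned in the scan API bullet (the bit layout is assumed: 14 low bits cover the 16384 slots):
```c
#include <stdint.h>

#define CURSOR_SLOT_BITS 14  /* 2^14 = 16384 cluster slots */

/* Pack the per-dict scan position into the high bits and the slot id
 * into the low bits of the 64-bit cursor. */
static inline uint64_t buildCursor(uint64_t dict_cursor, int slot) {
    return (dict_cursor << CURSOR_SLOT_BITS) | (uint64_t)slot;
}

static inline int cursorSlot(uint64_t cursor) {
    return (int)(cursor & ((1ULL << CURSOR_SLOT_BITS) - 1));
}
```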
## Performance
This change improves SET performance in cluster mode by ~5%; most of the gains come from not having to maintain linked lists for keys in the slot. Non-cluster mode has the same performance. For workloads that rely on evictions, the performance is similar because of the extra overhead of finding keys to evict.
RDB loading performance is slightly reduced, as the slot of each key needs to be computed during the load.
## Interface changes
* Removed `overhead.hashtable.slot-to-keys` from `MEMORY STATS`
* Scan API will now require 64 bits to store the cursor, even on 32 bit systems, as the slot information will be stored.
* New RDB version to support the new op code for SLOT information.
---------
Co-authored-by: Vitaly Arbuzov <arvit@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Roshan Khatri <rvkhatri@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
In #11568 we removed the NOSCRIPT flag from commands and keep the BLOCKING flag.
Aiming to allow them in scripts and let them implicitly behave in the non-blocking way.
In that sense, the old behavior was to allow LPOP and reject BLPOP, and the new behavior,
is to allow BLPOP too, and fail it only in case it ends up blocking.
So likewise, so far we allowed XREAD and rejected XREAD BLOCK, and we will now allow
that too, and only reject it if it ends up blocking.
The problem is that WAITAOF could hang in case commands were propagated only to replicas.
This can happen if a module uses RM_Call with the REDISMODULE_ARGV_NO_AOF flag.
In that case, master_repl_offset would increase, but there would be nothing to fsync, so in the absence of other traffic, fsynced_reploff_pending would stay static, and WAITAOF could hang.
This commit updates fsynced_reploff_pending to the latest offset in flushAppendOnlyFile in case there's nothing to fsync, i.e. in case it's behind because of the above-mentioned case, it'll be refreshed and release the WAITAOF.
Other changes:
Fix a race in wait.tcl (client getting blocked vs. the fsync thread)
If we set `fsynced_reploff_pending` in `startAppendOnly`, and the fork doesn't start
immediately (e.g. there's another fork active at the time), any subsequent commands
will increment `server.master_repl_offset`, but will not cause a fsync (given they were
executed before the fork started, they just ended up in the RDB part of it)
Therefore, any WAITAOF will wait on the new master_repl_offset, but it will time out
because no fsync will be executed.
Release notes:
```
WAITAOF could timeout in the absence of write traffic in case a new AOF is created and
an AOFRW can't immediately start.
This can happen when the appendonly config is changed at runtime, but also after FLUSHALL,
and on a replica full sync.
```
The new test added in #12476 causes reply-schemas-validator to fail.
When doing `catch {r get key}`, the req-res output is:
```
3
get
3
key
12
__argv_end__
$100000
aaaaaaaaaaaaaaaaaaaa...4
info
5
stats
12
__argv_end__
=1670
txt:# Stats
...
```
And we can see that after the `$100000` line, there is a `4` appended at the end, which breaks the req-res-log-validator script since the format is wrong.
The reason, I guess, is that after the client reconnection (after hitting the output buffer limit), we do not add newlines, but append args directly.
Since obuf-limits.tcl does the same thing, and it has the logreqres:skip flag, this PR follows it.