valkey

Author	SHA1	Message	Date
Wen Hui	070453eef3	Cluster human readable nodename feature (#9564 ) This PR adds a human readable name to a node in clusters that are visible as part of error logs. This is useful so that admins and operators of Redis cluster have better visibility into failures without having to cross-reference the generated ID with some logical identifier (such as pod-ID or EC2 instance ID). This is mentioned in #8948. Specific nodenames can be set by using the variable cluster-announce-human-nodename. The nodename is gossiped using the clusterbus extension in #9530. Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2023-06-17 21:16:51 -07:00
zhaozhao.zz	07ea220419	add a new loglevel 'nothing' to disable logging (#12133 ) Users can record logs of different levels by setting the `loglevel`. However, sometimes there are many logs even at the warning level, which can affect the performance of Redis. For example, when a user accesses the tls-port using a non-encrypted link, Redis will log lots of "# Error accepting a client connection: ...". We can provide the ability to disable logging so that users can temporarily turn off logging and turn it back on after the problem is resolved.	2023-05-23 18:30:44 +03:00
Binbin	0b159b34ea	Bump codespell to 2.2.4, fix typos and outupdated comments (#11911 ) Fix some seen typos and wrong comments.	2023-03-16 08:50:32 +02:00
Viktor Söderqvist	4e472a1a7f	Listpack encoding for sets (#11290 ) Small sets with not only integer elements are listpack encoded, by default up to 128 elements, max 64 bytes per element, new config `set-max-listpack-entries` and `set-max-listpack-value`. This saves memory for small sets compared to using a hashtable. Sets with only integers, even very small sets, are still intset encoded (up to 1G limit, etc.). Larger sets are hashtable encoded. This PR increments the RDB version, and has an effect on OBJECT ENCODING Possible conversions when elements are added: intset -> listpack listpack -> hashtable intset -> hashtable Note: No conversion happens when elements are deleted. If all elements are deleted and then added again, the set is deleted and recreated, thus implicitly converted to a smaller encoding.	2022-11-09 19:50:07 +02:00
Taishi Kasuga	f609a4eda7	Fix non-existent directive name at comments in redis.conf, s/cluster-tls/tls-cluster/g (#11364 )	2022-10-08 10:16:47 +03:00
Binbin	bb6513cbba	ACL default newly created user set USER_FLAG_SANITIZE_PAYLOAD flag (#11279 ) Starting from 6.2, after ACL SETUSER user reset, the user will carry the sanitize-payload flag. It was added in #7807, and then ACL SETUSER reset is inconsistent with default newly created user which missing sanitize-payload flag. Same as `off` and `on` these two bits are mutually exclusive, the default created user needs to have sanitize-payload flag. Adds USER_FLAG_SANITIZE_PAYLOAD flag to ACLCreateUser. Note that the bug don't have any real implications, since the code in rdb.c (rdbLoadObject) checks for `USER_FLAG_SANITIZE_PAYLOAD_SKIP`, so the fact that `USER_FLAG_SANITIZE_PAYLOAD` is missing doesn't really matters. Added tests to make sure it won't be broken in the future, and updated the comment in ACLSetUser and redis.conf	2022-09-22 09:13:39 +03:00
Eduardo Semprebon	36abc0fa8f	Improve redis.conf documentation on repl-diskless-load (#11213 ) Just noticed that there are some inaccurate, or at least confusing information about `repl-diskless-load` in `redis.conf` It shouldn't scare away users willing to spend the extra memory. `may mean that we have to flush the contents of the current database before the full rdb was received.`: this is likely related to the time when there was an option `always`, where content on replica was flushed before loading from master.	2022-09-11 11:22:59 +03:00
yourtree	ca6aeadfbe	Support setlocale via CONFIG operation. (#11059 ) Till now Redis officially supported tuning it via environment variable see #1074. But we had other requests to allow changing it at runtime, see #799, and #11041. Note that `strcoll()` is used as Lua comparison function and also for comparison of certain string objects in Redis, which leads to a problem that, in different regions, for some characters, the result may be different. Below is an example. ``` 127.0.0.1:6333> SORT test alpha 1) "<" 2) ">" 3) "," 4) "" 127.0.0.1:6333> CONFIG GET locale-collate 1) "locale-collate" 2) "" 127.0.0.1:6333> CONFIG SET locale-collate 1 (error) ERR CONFIG SET failed (possibly related to argument 'locale') 127.0.0.1:6333> CONFIG SET locale-collate C OK 127.0.0.1:6333> SORT test alpha 1) "" 2) "," 3) "<" 4) ">" ``` That will cause accidental code compatibility issues for Lua scripts and some Redis commands. This commit creates a new config parameter to control the local environment which only affects `Collate` category. Above shows how it affects `SORT` command, and below shows the influence on Lua scripts. ``` 127.0.0.1:6333> CONFIG GET locale-collate 1) " locale-collate" 2) "C" 127.0.0.1:6333> EVAL "return ',' < ''" 0 (nil) 127.0.0.1:6333> CONFIG SET locale-collate "" OK 127.0.0.1:6333> EVAL "return ',' < ''" 0 (integer) 1 ``` Co-authored-by: calvincjli <calvincjli@tencent.com> Co-authored-by: Oran Agra <oran@redislabs.com>	2022-08-21 17:55:45 +03:00
Binbin	1f600efd01	Fix outdated lfu-decay-time doc in redis.conf (#11108 ) The divided by two and less <= 10 logics were changed in `06ca9d6839`. Now we just decrement the counter by num_periods. The lfu-decay-time special value of 0 's meaning was actually changed in `06ca9d6839`. Now we won't do anything on counter if lfu-decay-time is 0.	2022-08-14 10:50:31 +03:00
Binbin	8203461120	Fix some outdated comments and some typo (#10946 ) * Fix some outdated comments and some typo	2022-07-06 20:31:59 -07:00
Eduardo Semprebon	3a1d14259d	Allow configuring signaled shutdown flags (#10594 ) The SHUTDOWN command has various flags to change it's default behavior, but in some cases establishing a connection to redis is complicated and it's easier for the management software to use signals. however, so far the signals could only trigger the default shutdown behavior. Here we introduce the option to control shutdown arguments for SIGTERM and SIGINT. New config options: `shutdown-on-sigint [nosave \| save] [now] [force]` `shutdown-on-sigterm [nosave \| save] [now] [force]` Implementation: Support MULTI_ARG_CONFIG on createEnumConfig to support multiple enums to be applied as bit flags. Co-authored-by: Oran Agra <oran@redislabs.com>	2022-04-26 14:34:04 +03:00
Madelyn Olson	6fa8e4f7af	Set replicas to panic on disk errors, and optionally panic on replication errors (#10504 ) * Till now, replicas that were unable to persist, would still execute the commands they got from the master, now they'll panic by default, and we add a new `replica-ignore-disk-errors` config to change that. * Till now, when a command failed on a replica or AOF-loading, it only logged a warning and a stat, we add a new `propagation-error-behavior` config to allow panicking in that state (may become the default one day) Note that commands that fail on the replica can either indicate a bug that could cause data inconsistency between the replica and the master, or they could be in some cases (specifically in previous versions), a result of a command (e.g. EVAL) that failed on the master, but still had to be propagated to fail on the replica as well.	2022-04-26 13:25:33 +03:00
David CARLIER	aba2865c86	Add socket-mark-id support for marking sockets. (#10349 ) Add a configuration option to attach an operating system-specific identifier to Redis sockets, supporting advanced network configurations using iptables (Linux) or ipfw (FreeBSD).	2022-04-20 09:29:37 +03:00
Luke Palmer	bb7891f080	Keyspace event for new keys (#10512 ) Add an optional keyspace event when new keys are added to the db. This is useful for applications where clients need to be aware of the redis keyspace. Such an application can SCAN once at startup and then listen for "new" events (plus others associated with DEL, RENAME, etc).	2022-04-13 11:36:38 +03:00
Binbin	f25e688e2a	Cleanups in redis.conf (#10452 ) Did some cleanups: 1. local local typo 2. replace the only slave word in the file 3. add FUNCTION FLUSH to `lazyfree-lazy-user-flush` description 4. thought it would be better to use these, there are actually "four" options 5. the the typo 6. remove a extra space 7. change comment next to `activedefrag no` to match the default value	2022-03-27 12:03:38 +03:00
Ronald Petty	b104f3cabc	Update redis.conf (#10396 ) Typo in conf file comment.	2022-03-08 12:52:54 -08:00
Binbin	c0ea77f0e1	Show publishshard_sent stat in cluster info (#10314 ) publishshard was added in #8621 (7.0 RC1), but the publishshard_sent stat is not shown in CLUSTER INFO command. Other changes: 1. Remove useless `needhelp` statements, it was removed in `3dad819`. 2. Use `LL_WARNING` log level for some error logs (I/O error, Connection failed). 3. Fix typos that saw by the way.	2022-02-19 21:11:20 -08:00
Yossi Gottlieb	3881f7850f	Fix OpenSSL 3.0.x related issues. (#10291 ) * Drop obsolete initialization calls. * Use decoder API for DH parameters. * Enable auto DH parameters if not explicitly used, which should be the preferred configuration going forward.	2022-02-15 08:37:06 +02:00
Pengcheng Huang	88795f5cba	Fix the misleading description of the option repl-ping-slave-period (#4687 ) Co-authored-by: yoav-steinberg <yoav@monfort.co.il>	2022-02-10 09:18:59 +02:00
Harkrishn Patro	a43b6922d1	Set default channel permission to resetchannels for 7.0 (#10181 ) For backwards compatibility in 6.x, channels default permission was set to `allchannels` however with 7.0, we should modify it and the default value should be `resetchannels` for better security posture. Also, with selectors in ACL, a client doesn't have to set channel rules everytime and by default the value will be `resetchannels`. Before this change ``` 127.0.0.1:6379> acl list 1) "user default on nopass ~* &* +@all" 127.0.0.1:6379> acl setuser hp on nopass +@all ~* OK 127.0.0.1:6379> acl list 1) "user default on nopass ~* &* +@all" 2) "user hp on nopass ~* &* +@all" 127.0.0.1:6379> acl setuser hp1 on nopass -@all (%R~sales) OK 127.0.0.1:6379> acl list 1) "user default on nopass ~ &* +@all" 2) "user hp on nopass ~* &* +@all" 3) "user hp1 on nopass &* -@all (%R~sales* &* -@all)" ``` After this change ``` 127.0.0.1:6379> acl list 1) "user default on nopass ~* &* +@all" 127.0.0.1:6379> acl setuser hp on nopass +@all ~* OK 127.0.0.1:6379> acl list 1) "user default on nopass ~* &* +@all" 2) "user hp on nopass ~* resetchannels +@all" 127.0.0.1:6379> acl setuser hp1 on nopass -@all (%R~sales) OK 127.0.0.1:6379> acl list 1) "user default on nopass ~ &* +@all" 2) "user hp on nopass ~* resetchannels +@all" 3) "user hp1 on nopass resetchannels -@all (%R~sales* resetchannels -@all)" ```	2022-01-30 12:02:55 +02:00
Oran Agra	75a950cb93	doc improvement about acl first-arg (#10199 ) We recently removed capabilities from the first-arg feature of ACL and added a warning. but we didn't document it. ref: #10147 and redis/redis-doc#1761	2022-01-29 18:34:59 +02:00
Binbin	495ac8b79a	Fix outdated save key word in redis.conf (#10166 ) For some complex data types, server.dirty actually counts the number of elements that have been changed. And in FLUSHDB or FLUSHALL, we count the number of keys. So the word "key" is not strictly correct and is outdated. Some discussion can be seen at #8140.	2022-01-24 22:02:42 +02:00
Madelyn Olson	55c81f2cd3	ACL V2 - Selectors and key based permissions (#9974 ) * Implemented selectors which provide multiple different sets of permissions to users * Implemented key based permissions * Added a new ACL dry-run command to test permissions before execution * Updated module APIs to support checking key based permissions Co-authored-by: Oran Agra <oran@redislabs.com>	2022-01-20 13:05:27 -08:00
perryitay	c4b788230c	Adding module api for processing commands during busy jobs and allow flagging the commands that should be handled at this status (#9963 ) Some modules might perform a long-running logic in different stages of Redis lifetime, for example: * command execution * RDB loading * thread safe context During this long-running logic Redis is not responsive. This PR offers 1. An API to process events while a busy command is running (`RM_Yield`) 2. A new flag (`ALLOW_BUSY`) to mark the commands that should be handled during busy jobs which can also be used by modules (`allow-busy`) 3. In slow commands and thread safe contexts, this flag will start rejecting commands with -BUSY only after `busy-reply-threshold` 4. During loading (`rdb_load` callback), it'll process events right away (not wait for `busy-reply-threshold`), but either way, the processing is throttled to the server hz rate. 5. Allow modules to Yield to redis background tasks, but not to client commands * rename `script-time-limit` to `busy-reply-threshold` (an alias to the pre-7.0 `lua-time-limit`) Co-authored-by: Oran Agra <oran@redislabs.com>	2022-01-20 09:05:53 +02:00
Oran Agra	ae89958972	Set repl-diskless-sync to yes by default, add repl-diskless-sync-max-replicas (#10092 ) 1. enable diskless replication by default 2. add a new config named repl-diskless-sync-max-replicas that enables replication to start before the full repl-diskless-sync-delay was reached. 3. put replica online sooner on the master (see below) 4. test suite uses repl-diskless-sync-delay of 0 to be faster 5. a few tests that use multiple replica on a pre-populated master, are now using the new repl-diskless-sync-max-replicas 6. fix possible timing issues in a few cluster tests (see below) put replica online sooner on the master ---------------------------------------------------- there were two tests that failed because they needed for the master to realize that the replica is online, but the test code was actually only waiting for the replica to realize it's online, and in diskless it could have been before the master realized it. changes include two things: 1. the tests wait on the right thing 2. issues in the master, putting the replica online in two steps. the master used to put the replica as online in 2 steps. the first step was to mark it as online, and the second step was to enable the write event (only after getting ACK), but in fact the first step didn't contains some of the tasks to put it online (like updating good slave count, and sending the module event). this meant that if a test was waiting to see that the replica is online form the point of view of the master, and then confirm that the module got an event, or that the master has enough good replicas, it could fail due to timing issues. so now the full effect of putting the replica online, happens at once, and only the part about enabling the writes is delayed till the ACK. fix cluster tests -------------------- I added some code to wait for the replica to sync and avoid race conditions. later realized the sentinel and cluster tests where using the original 5 seconds delay, so changed it to 0. this means the other changes are probably not needed, but i suppose they're still better (avoid race conditions)	2022-01-17 14:11:11 +02:00
Madelyn Olson	e8e02f900c	Changed latency histogram output to omit trailing 0s and periods (#10075 ) Changed latency percentile output to omit trailing 0s and periods	2022-01-09 17:04:18 -08:00
YEONCHEOL JANG	b9669829c8	Fixes typo for redis.conf (#10072 ) Fixes typo for redis.conf	2022-01-09 08:08:55 +02:00
filipe oliveira	5dd15443ac	Added INFO LATENCYSTATS section: latency by percentile distribution/latency by cumulative distribution of latencies (#9462 ) # Short description The Redis extended latency stats track per command latencies and enables: - exporting the per-command percentile distribution via the `INFO LATENCYSTATS` command. ( percentile distribution is not mergeable between cluster nodes ). - exporting the per-command cumulative latency distributions via the `LATENCY HISTOGRAM` command. Using the cumulative distribution of latencies we can merge several stats from different cluster nodes to calculate aggregate metrics . By default, the extended latency monitoring is enabled since the overhead of keeping track of the command latency is very small. If you don't want to track extended latency metrics, you can easily disable it at runtime using the command: - `CONFIG SET latency-tracking no` By default, the exported latency percentiles are the p50, p99, and p999. You can alter them at runtime using the command: - `CONFIG SET latency-tracking-info-percentiles "0.0 50.0 100.0"` ## Some details: - The total size per histogram should sit around 40 KiB. We only allocate those 40KiB when a command was called for the first time. - With regards to the WRITE overhead As seen below, there is no measurable overhead on the achievable ops/sec or full latency spectrum on the client. Including also the measured redis-benchmark for unstable vs this branch. - We track from 1 nanosecond to 1 second ( everything above 1 second is considered +Inf ) ## `INFO LATENCYSTATS` exposition format - Format: `latency_percentiles_usec_<CMDNAME>:p0=XX,p50....` ## `LATENCY HISTOGRAM [command ...]` exposition format Return a cumulative distribution of latencies in the format of a histogram for the specified command names. The histogram is composed of a map of time buckets: - Each representing a latency range, between 1 nanosecond and roughly 1 second. - Each bucket covers twice the previous bucket's range. - Empty buckets are not printed. - Everything above 1 sec is considered +Inf. - At max there will be log2(1000000000)=30 buckets We reply a map for each command in the format: `<command name> : { `calls`: <total command calls> , `histogram` : { <bucket 1> : latency , < bucket 2> : latency, ... } }` Co-authored-by: Oran Agra <oran@redislabs.com>	2022-01-05 14:01:05 +02:00
Binbin	9538088751	Fix typos in aof.c / redis.conf (#10057 )	2022-01-05 12:06:02 +02:00
chenyang8094	87789fae0b	Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788 ) Implement Multi-Part AOF mechanism to avoid overheads during AOFRW. Introducing a folder with multiple AOF files tracked by a manifest file. The main issues with the the original AOFRW mechanism are: * buffering of commands that are processed during rewrite (consuming a lot of RAM) * freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it. * double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files) The main modifications of this PR: 1. Remove the AOF rewrite buffer and related code. 2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type, it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the incremental commands since the last AOFRW. 3. Use a AOF manifest file to record and manage these AOF files mentioned above. 4. The original configuration of `appendfilename` will be the base part of the new file name, for example: `appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof` 5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename` 6. Remove the `aof_rewrite_buffer_length` field in info. 7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs. It also gives users the opportunity to preserve the history AOFs. just for testing use now. 8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now), we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately. 9. Support upgrade (load) data from old version redis. 10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and manifest file will be placed in this directory. 11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if `aof-load-truncated` is enabled. Co-authored-by: Oran Agra <oran@redislabs.com>	2022-01-03 19:14:13 +02:00
Madelyn Olson	5460c10047	Implement clusterbus message extensions and cluster hostname support (#9530 ) Implement the ability for cluster nodes to advertise their location with extension messages.	2022-01-02 19:48:29 -08:00
Harkrishn Patro	9f8885760b	Sharded pubsub implementation (#8621 ) This commit implements a sharded pubsub implementation based off of shard channels. Co-authored-by: Harkrishn Patro <harkrisp@amazon.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2022-01-02 16:54:47 -08:00
Viktor Söderqvist	45a155bd0f	Wait for replicas when shutting down (#9872 ) To avoid data loss, this commit adds a grace period for lagging replicas to catch up the replication offset. Done: * Wait for replicas when shutdown is triggered by SIGTERM and SIGINT. * Wait for replicas when shutdown is triggered by the SHUTDOWN command. A new blocked client type BLOCKED_SHUTDOWN is introduced, allowing multiple clients to call SHUTDOWN in parallel. Note that they don't expect a response unless an error happens and shutdown is aborted. * Log warning for each replica lagging behind when finishing shutdown. * CLIENT_PAUSE_WRITE while waiting for replicas. * Configurable grace period 'shutdown-timeout' in seconds (default 10). * New flags for the SHUTDOWN command: - NOW disables the grace period for lagging replicas. - FORCE ignores errors writing the RDB or AOF files which would normally prevent a shutdown. - ABORT cancels ongoing shutdown. Can't be combined with other flags. * New field in the output of the INFO command: 'shutdown_in_milliseconds'. The value is the remaining maximum time to wait for lagging replicas before finishing the shutdown. This field is present in the Server section only during shutdown. Not directly related: * When shutting down, if there is an AOF saving child, it is killed even if AOF is disabled. This can happen if BGREWRITEAOF is used when AOF is off. * Client pause now has end time and type (WRITE or ALL) per purpose. The different pause purposes are CLIENT PAUSE command, failover and shutdown. If clients are unpaused for one purpose, it doesn't affect client pause for other purposes. For example, the CLIENT UNPAUSE command doesn't affect client pause initiated by the failover or shutdown procedures. A completed failover or a failed shutdown doesn't unpause clients paused by the CLIENT PAUSE command. Notes: * DEBUG RESTART doesn't wait for replicas. * We already have a warning logged when a replica disconnects. This means that if any replica connection is lost during the shutdown, it is either logged as disconnected or as lagging at the time of exit. Co-authored-by: Oran Agra <oran@redislabs.com>	2022-01-02 09:50:15 +02:00
YaacovHazan	ae2f5b7b2e	Protected configs and sensitive commands (#9920 ) Block sensitive configs and commands by default. * `enable-protected-configs` - block modification of configs with the new `PROTECTED_CONFIG` flag. Currently we add this flag to `dbfilename`, and `dir` configs, all of which are non-mutable configs that can set a file redis will write to. * `enable-debug-command` - block the `DEBUG` command * `enable-module-command` - block the `MODULE` command These have a default value set to `no`, so that these features are not exposed by default to client connections, and can only be set by modifying the config file. Users can change each of these to either `yes` (allow all access), or `local` (allow access from local TCP connections and unix domain connections) Note that this is a breaking change (specifically the part about MODULE command being disabled by default). I.e. we don't consider DEBUG command being blocked as an issue (people shouldn't have been using it), and the few configs we protected are unlikely to have been set at runtime anyway. On the other hand, it's likely to assume some users who use modules, load them from the config file anyway. Note that's the whole point of this PR, for redis to be more secure by default and reduce the attack surface on innocent users, so secure defaults will necessarily mean a breaking change.	2021-12-19 10:46:16 +02:00
ny0312	792afb4432	Introduce memory management on cluster link buffers (#9774 ) Introduce memory management on cluster link buffers: * Introduce a new `cluster-link-sendbuf-limit` config that caps memory usage of cluster bus link send buffers. * Introduce a new `CLUSTER LINKS` command that displays current TCP links to/from peers. * Introduce a new `mem_cluster_links` field under `INFO` command output, which displays the overall memory usage by all current cluster links. * Introduce a new `total_cluster_links_buffer_limit_exceeded` field under `CLUSTER INFO` command output, which displays the accumulated count of cluster links freed due to `cluster-link-sendbuf-limit`.	2021-12-16 21:56:59 -08:00
Madelyn Olson	36ca545286	Fix spelling of sanitization (#9901 )	2021-12-06 10:14:13 +02:00
yoav-steinberg	0e5b813ef9	Multiparam config set (#9748 ) We can now do: `config set maxmemory 10m repl-backlog-size 5m` ## Basic algorithm to support "transaction like" config sets: 1. Backup all relevant current values (via get). 2. Run "verify" and "set" on everything, if we fail run "restore". 3. Run "apply" on everything (optional optimization: skip functions already run). If we fail run "restore". 4. Return success. ### restore 1. Run set on everything in backup. If we fail log it and continue (this puts us in an undefined state but we decided it's better than the alternative of panicking). This indicates either a bug or some unsupported external state. 2. Run apply on everything in backup (optimization: skip functions already run). If we fail log it (see comment above). 3. Return error. ## Implementation/design changes: * Apply function are idempotent (have no effect if they are run more than once for the same config). * No indication in set functions if we're reading the config or running from the `CONFIG SET` command (removed `update` argument). * Set function should set some config variable and assume an (optional) apply function will use that later to apply. If we know this setting can be safely applied immediately and can always be reverted and doesn't depend on any other configuration we can apply immediately from within the set function (and not store the setting anywhere). This is the case of this `dir` config, for example, which has no apply function. No apply function is need also in the case that setting the variable in the `server` struct is all that needs to be done to make the configuration take effect. Note that the original concept of `update_fn`, which received the old and new values was removed and replaced by the optional apply function. * Apply functions use settings written to the `server` struct and don't receive any inputs. * I take care that for the generic (non-special) configs if there's no change I avoid calling the setter (possible optimization: avoid calling the apply function as well). * Passing the same config parameter more than once to `config set` will fail. You can't do `config set my-setting value1 my-setting value2`. Note that getting `save` in the context of the conf file parsing to work here as before was a pain. The conf file supports an aggregate `save` definition, where each `save` line is added to the server's save params. This is unlike any other line in the config file where each line overwrites any previous configuration. Since we now support passing multiple save params in a single line (see top comments about `save` in https://github.com/redis/redis/pull/9644) we should deprecate the aggregate nature of this config line and perhaps reduce this ugly code in the future.	2021-12-01 10:15:11 +02:00
sundb	4512905961	Replace ziplist with listpack in quicklist (#9740 ) Part three of implementing #8702, following #8887 and #9366 . ## Description of the feature 1. Replace the ziplist container of quicklist with listpack. 2. Convert existing quicklist ziplists on RDB loading time. an O(n) operation. ## Interface changes 1. New `list-max-listpack-size` config is an alias for `list-max-ziplist-size`. 2. Replace `debug ziplist` command with `debug listpack`. ## Internal changes 1. Add `lpMerge` to merge two listpacks . (same as `ziplistMerge`) 2. Add `lpRepr` to print info of listpack which is used in debugCommand and `quicklistRepr`. (same as `ziplistRepr`) 3. Replace `QUICKLIST_NODE_CONTAINER_ZIPLIST` with `QUICKLIST_NODE_CONTAINER_PACKED`(following #9357 ). It represent that a quicklistNode is a packed node, as opposed to a plain node. 4. Remove `createZiplistObject` method, which is never used. 5. Calculate listpack entry size using overhead overestimation in `quicklistAllowInsert`. We prefer an overestimation, which would at worse lead to a few bytes below the lowest limit of 4k. ## Improvements 1. Calling `lpShrinkToFit` after converting Ziplist to listpack, which was missed at #9366. 2. Optimize `quicklistAppendPlainNode` to avoid memcpy data. ## Bugfix 1. Fix crash in `quicklistRepr` when ziplist is compressed, introduced from #9366. ## Test 1. Add unittest for `lpMerge`. 2. Modify the old quicklist ziplist corrupt dump test. Co-authored-by: Oran Agra <oran@redislabs.com>	2021-11-24 13:34:13 +02:00
Eduardo Semprebon	1a255e3150	Reject PING with MASTERDOWN when replica-serve-stale-data=no (#9757 ) Currently PING returns different status when server is not serving data, for example when `LOADING` or `BUSY`. But same was not true for `MASTERDOWN` This commit makes PING reply with `MASTERDOWN` when replica-serve-stale-data=no and link is MASTER is down.	2021-11-18 10:53:17 +02:00
Eduardo Semprebon	91d0c758e5	Replica keep serving data during repl-diskless-load=swapdb for better availability (#9323 ) For diskless replication in swapdb mode, considering we already spend replica memory having a backup of current db to restore in case of failure, we can have the following benefits by instead swapping database only in case we succeeded in transferring db from master: - Avoid `LOADING` response during failed and successful synchronization for cases where the replica is already up and running with data. - Faster total time of diskless replication, because now we're moving from Transfer + Flush + Load time to Transfer + Load only. Flushing the tempDb is done asynchronously after swapping. - This could be implemented also for disk replication with similar benefits if consumers are willing to spend the extra memory usage. General notes: - The concept of `backupDb` becomes `tempDb` for clarity. - Async loading mode will only kick in if the replica is syncing from a master that has the same repl-id the one it had before. i.e. the data it's getting belongs to a different time of the same timeline. - New property in INFO: `async_loading` to differentiate from the blocking loading - Slot to Key mapping is now a field of `redisDb` as it's more natural to access it from both server.db and the tempDb that is passed around. - Because this is affecting replicas only, we assume that if they are not readonly and write commands during replication, they are lost after SYNC same way as before, but we're still denying CONFIG SET here anyways to avoid complications. Considerations for review: - We have many cases where server.loading flag is used and even though I tried my best, there may be cases where async_loading should be checked as well and cases where it shouldn't (would require very good understanding of whole code) - Several places that had different behavior depending on the loading flag where actually meant to just handle commands coming from the AOF client differently than ones coming from real clients, changed to check CLIENT_ID_AOF instead. Additional for Release Notes - Bugfix - server.dirty was not incremented for any kind of diskless replication, as effect it wouldn't contribute on triggering next database SAVE - New flag for RM_GetContextFlags module API: REDISMODULE_CTX_FLAGS_ASYNC_LOADING - Deprecated RedisModuleEvent_ReplBackup. Starting from Redis 7.0, we don't fire this event. Instead, we have the new RedisModuleEvent_ReplAsyncLoad holding 3 sub-events: STARTED, ABORTED and COMPLETED. - New module flag REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD for RedisModule_SetModuleOptions to allow modules to declare they support the diskless replication with async loading (when absent, we fall back to disk-based loading). Co-authored-by: Eduardo Semprebon <edus@saxobank.com> Co-authored-by: Oran Agra <oran@redislabs.com>	2021-11-04 10:46:50 +02:00
Wang Yuan	9ec3294b97	Add timestamp annotations in AOF (#9326 ) Add timestamp annotation in AOF, one part of #9325. Enabled with the new `aof-timestamp-enabled` config option. Timestamp annotation format is "#TS:${timestamp}\r\n"." TS" is short of timestamp and this method could save extra bytes in AOF. We can use timestamp annotation for some special functions. - know the executing time of commands - restore data to a specific point-in-time (by using redis-check-rdb to truncate the file)	2021-10-25 13:08:34 +03:00
Wang Yuan	c1718f9d86	Replication backlog and replicas use one global shared replication buffer (#9166 ) ## Background For redis master, one replica uses one copy of replication buffer, that is a big waste of memory, more replicas more waste, and allocate/free memory for every reply list also cost much. If we set client-output-buffer-limit small and write traffic is heavy, master may disconnect with replicas and can't finish synchronization with replica. If we set client-output-buffer-limit big, master may be OOM when there are many replicas that separately keep much memory. Because replication buffers of different replica client are the same, one simple idea is that all replicas only use one replication buffer, that will effectively save memory. Since replication backlog content is the same as replicas' output buffer, now we can discard replication backlog memory and use global shared replication buffer to implement replication backlog mechanism. ## Implementation I create one global "replication buffer" which contains content of replication stream. The structure of "replication buffer" is similar to the reply list that exists in every client. But the node of list is `replBufBlock`, which has `id, repl_offset, refcount` fields. ```c /* Replication buffer blocks is the list of replBufBlock. * * +--------------+ +--------------+ +--------------+ * \| refcount = 1 \| ... \| refcount = 0 \| ... \| refcount = 2 \| * +--------------+ +--------------+ +--------------+ * \| / \ * \| / \ * \| / \ * Repl Backlog Replia_A Replia_B * * Each replica or replication backlog increments only the refcount of the * 'ref_repl_buf_node' which it points to. So when replica walks to the next * node, it should first increase the next node's refcount, and when we trim * the replication buffer nodes, we remove node always from the head node which * refcount is 0. If the refcount of the head node is not 0, we must stop * trimming and never iterate the next node. / / Similar with 'clientReplyBlock', it is used for shared buffers between * all replica clients and replication backlog. / typedef struct replBufBlock { int refcount; / Number of replicas or repl backlog using. / long long id; / The unique incremental number. / long long repl_offset; / Start replication offset of the block. */ size_t size, used; char buf[]; } replBufBlock; ``` So now when we feed replication stream into replication backlog and all replicas, we only need to feed stream into replication buffer `feedReplicationBuffer`. In this function, we set some fields of replication backlog and replicas to references of the global replication buffer blocks. And we also need to check replicas' output buffer limit to free if exceeding `client-output-buffer-limit`, and trim replication backlog if exceeding `repl-backlog-size`. When sending reply to replicas, we also need to iterate replication buffer blocks and send its content, when totally sending one block for replica, we decrease current node count and increase the next current node count, and then free the block which reference is 0 from the head of replication buffer blocks. Since now we use linked list to manage replication backlog, it may cost much time for iterating all linked list nodes to find corresponding replication buffer node. So we create a rax tree to store some nodes for index, but to avoid rax tree occupying too much memory, i record one per 64 nodes for index. Currently, to make partial resynchronization as possible as much, we always let replication backlog as the last reference of replication buffer blocks, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks, and this method doesn't increase memory usage since they share replication buffer. To avoid freezing server for freeing unreferenced replication buffer blocks when we need to trim backlog for exceeding backlog size setting, we trim backlog incrementally (free 64 blocks per call now), and make it faster in `beforeSleep` (free 640 blocks). ### Other changes - `mem_total_replication_buffers`: we add this field in INFO command, it means the total memory of replication buffers used. - `mem_clients_slaves`: now even replica is slow to replicate, and its output buffer memory is not 0, but it still may be 0, since replication backlog and replicas share one global replication buffer, only if replication buffer memory is more than the repl backlog setting size, we consider the excess as replicas' memory. Otherwise, we think replication buffer memory is the consumption of repl backlog. - Key eviction Since all replicas and replication backlog share global replication buffer, we think only the part of exceeding backlog size the extra separate consumption of replicas. Because we trim backlog incrementally in the background, backlog size may exceeds our setting if slow replicas that reference vast replication buffer blocks disconnect. To avoid massive eviction loop, we don't count the delayed freed replication backlog into used memory even if there are no replicas, i.e. we also regard this memory as replicas's memory. - `client-output-buffer-limit` check for replica clients It doesn't make sense to set the replica clients output buffer limit lower than the repl-backlog-size config (partial sync will succeed and then replica will get disconnected). Such a configuration is ignored (the size of repl-backlog-size will be used). This doesn't have memory consumption implications since the replica client will share the backlog buffers memory. - Drop replication backlog after loading data if needed We always create replication backlog if server is a master, we need it because we put DELs in it when loading expired keys in RDB, but if RDB doesn't have replication info or there is no rdb, it is not possible to support partial resynchronization, to avoid extra memory of replication backlog, we drop it. - Multi IO threads Since all replicas and replication backlog use global replication buffer, if I/O threads are enabled, to guarantee data accessing thread safe, we must let main thread handle sending the output buffer to all replicas. But before, other IO threads could handle sending output buffer of all replicas. ## Other optimizations This solution resolve some other problem: - When replicas disconnect with master since of out of output buffer limit, releasing the output buffer of replicas may freeze server if we set big `client-output-buffer-limit` for replicas, but now, it doesn't cause freezing. - This implementation may mitigate reply list copy cost time(also freezes server) when one replication has huge reply buffer and another replica can copy buffer for full synchronization. now, we just copy reference info, it is very light. - If we set replication backlog size big, it also may cost much time to copy replication backlog into replica's output buffer. But this commit eliminates this problem. - Resizing replication backlog size doesn't empty current replication backlog content.	2021-10-25 09:24:31 +03:00
guybe7	43e736f79b	Treat subcommands as commands (#9504 ) ## Intro The purpose is to allow having different flags/ACL categories for subcommands (Example: CONFIG GET is ok-loading but CONFIG SET isn't) We create a small command table for every command that has subcommands and each subcommand has its own flags, etc. (same as a "regular" command) This commit also unites the Redis and the Sentinel command tables ## Affected commands CONFIG Used to have "admin ok-loading ok-stale no-script" Changes: 1. Dropped "ok-loading" in all except GET (this doesn't change behavior since there were checks in the code doing that) XINFO Used to have "read-only random" Changes: 1. Dropped "random" in all except CONSUMERS XGROUP Used to have "write use-memory" Changes: 1. Dropped "use-memory" in all except CREATE and CREATECONSUMER COMMAND No changes. MEMORY Used to have "random read-only" Changes: 1. Dropped "random" in PURGE and USAGE ACL Used to have "admin no-script ok-loading ok-stale" Changes: 1. Dropped "admin" in WHOAMI, GENPASS, and CAT LATENCY No changes. MODULE No changes. SLOWLOG Used to have "admin random ok-loading ok-stale" Changes: 1. Dropped "random" in RESET OBJECT Used to have "read-only random" Changes: 1. Dropped "random" in ENCODING and REFCOUNT SCRIPT Used to have "may-replicate no-script" Changes: 1. Dropped "may-replicate" in all except FLUSH and LOAD CLIENT Used to have "admin no-script random ok-loading ok-stale" Changes: 1. Dropped "random" in all except INFO and LIST 2. Dropped "admin" in ID, TRACKING, CACHING, GETREDIR, INFO, SETNAME, GETNAME, and REPLY STRALGO No changes. PUBSUB No changes. CLUSTER Changes: 1. Dropped "admin in countkeysinslots, getkeysinslot, info, nodes, keyslot, myid, and slots SENTINEL No changes. (note that DEBUG also fits, but we decided not to convert it since it's for debugging and anyway undocumented) ## New sub-command This commit adds another element to the per-command output of COMMAND, describing the list of subcommands, if any (in the same structure as "regular" commands) Also, it adds a new subcommand: ``` COMMAND LIST [FILTERBY (MODULE <module-name>\|ACLCAT <cat>\|PATTERN <pattern>)] ``` which returns a set of all commands (unless filters), but excluding subcommands. ## Module API A new module API, RM_CreateSubcommand, was added, in order to allow module writer to define subcommands ## ACL changes: 1. Now, that each subcommand is actually a command, each has its own ACL id. 2. The old mechanism of allowed_subcommands is redundant (blocking/allowing a subcommand is the same as blocking/allowing a regular command), but we had to keep it, to support the widespread usage of allowed_subcommands to block commands with certain args, that aren't subcommands (e.g. "-select +select\|0"). 3. I have renamed allowed_subcommands to allowed_firstargs to emphasize the difference. 4. Because subcommands are commands in ACL too, you can now use "-" to block subcommands (e.g. "+client -client\|kill"), which wasn't possible in the past. 5. It is also possible to use the allowed_firstargs mechanism with subcommand. For example: `+config -config\|set +config\|set\|loglevel` will block all CONFIG SET except for setting the log level. 6. All of the ACL changes above required some amount of refactoring. ## Misc 1. There are two approaches: Either each subcommand has its own function or all subcommands use the same function, determining what to do according to argv[0]. For now, I took the former approaches only with CONFIG and COMMAND, while other commands use the latter approach (for smaller blamelog diff). 2. Deleted memoryGetKeys: It is no longer needed because MEMORY USAGE now uses the "range" key spec. 4. Bugfix: GETNAME was missing from CLIENT's help message. 5. Sentinel and Redis now use the same table, with the same function pointer. Some commands have a different implementation in Sentinel, so we redirect them (these are ROLE, PUBLISH, and INFO). 6. Command stats now show the stats per subcommand (e.g. instead of stats just for "config" you will have stats for "config\|set", "config\|get", etc.) 7. It is now possible to use COMMAND directly on subcommands: COMMAND INFO CONFIG\|GET (The pipeline syntax was inspired from ACL, and can be used in functions lookupCommandBySds and lookupCommandByCString) 8. STRALGO is now a container command (has "help") ## Breaking changes: 1. Command stats now show the stats per subcommand (see (5) above)	2021-10-20 11:52:57 +03:00
Wen Hui	1c2b5f5318	Make Cluster-bus port configurable with new cluster-port config (#9389 ) Make Cluster-bus port configurable with new cluster-port config	2021-10-18 22:28:27 -07:00
Binbin	dd3ac97ffe	Cleanup typos, incorrect comments, and fixed small memory leak in redis-cli (#9153 ) 1. Remove forward declarations from header files to functions that do not exist: hmsetCommand and rdbSaveTime. 2. Minor phrasing fixes in #9519 3. Add missing sdsfree(title) and fix typo in redis-benchmark. 4. Modify some error comments in some zset commands. 5. Fix copy-paste bug comment in syncWithMaster about `ip-address`.	2021-10-02 22:19:33 -07:00
Eduardo Semprebon	d509675592	Fix error message for replica-serve-stale-data in redis.conf doc (#9519 ) seems that his piece of doc was always wrong (no such error in the code)	2021-09-30 16:28:25 +03:00
yoav-steinberg	2753429c99	Client eviction (#8687 ) ### Description A mechanism for disconnecting clients when the sum of all connected clients is above a configured limit. This prevents eviction or OOM caused by accumulated used memory between all clients. It's a complimentary mechanism to the `client-output-buffer-limit` mechanism which takes into account not only a single client and not only output buffers but rather all memory used by all clients. #### Design The general design is as following: * We track memory usage of each client, taking into account all memory used by the client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date after reading from the socket, after processing commands and after writing to the socket. * Based on the used memory we sort all clients into buckets. Each bucket contains all clients using up up to x2 memory of the clients in the bucket below it. For example up to 1m clients, up to 2m clients, up to 4m clients, ... * Before processing a command and before sleep we check if we're over the configured limit. If we are we start disconnecting clients from larger buckets downwards until we're under the limit. #### Config `maxmemory-clients` max memory all clients are allowed to consume, above this threshold we disconnect clients. This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%` would mean 10% of `maxmemory`). #### Important code changes * During the development I encountered yet more situations where our io-threads access global vars. And needed to fix them. I also had to handle keeps the clients sorted into the memory buckets (which are global) while their memory usage changes in the io-thread. To achieve this I decided to simplify how we check if we're in an io-thread and make it much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking if the client is in an io-thread (it wasn't used for anything else) and just used the global `io_threads_op` variable the same way to check during writes. * I optimized the cleanup of the client from the `clients_pending_read` list on client freeing. We now store a pointer in the `client` struct to this list so we don't need to search in it (`pending_read_list_node`). * Added `evicted_clients` stat to `INFO` command. * Added `CLIENT NO-EVICT ON\|OFF` sub command to exclude a specific client from the client eviction mechanism. Added corrosponding 'e' flag in the client info string. * Added `multi-mem` field in the client info string to show how much memory is used up by buffered multi commands. * Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and channels (partially), tracking prefixes (partially). * CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so clients will be disconnected between processing different clients and not only before sleep. This new function can be used in the future for work we want to do outside the command processing loop but don't want to wait for all clients to be processed before we get to it. Specifically I wanted to handle output-buffer-limit related closing before we process client eviction in case the two race with each other. * Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction buckets. * Each client now holds a pointer to the client eviction memory usage bucket it belongs to and listNode to itself in that bucket for quick removal. * Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value indicating no io-threading is currently being executed. * In order to track memory used by each clients in real-time we can't rely on updating these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()` (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after writing data to pubsub clients, after writing the output buffer and after reading from the socket (and maybe other places too). The function is written to be fast. * Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before processing a command (before performing oom-checks and key-eviction). * All clients memory usage buckets are grouped as follows: * All clients using less than 64k. * 64K..128K * 128K..256K * ... * 2G..4G * All clients using 4g and up. * Added client-eviction.tcl with a bunch of tests for the new mechanism. * Extended maxmemory.tcl to test the interaction between maxmemory and maxmemory-clients settings. * Added an option to flag a numeric configuration variable as a "percent", this means that if we encounter a '%' after the number in the config file (or config set command) we consider it as valid. Such a number is store internally as a negative value. This way an integer value can be interpreted as either a percent (negative) or absolute value (positive). This is useful for example if some numeric configuration can optionally be set to a percentage of something else. Co-authored-by: Oran Agra <oran@redislabs.com>	2021-09-23 14:02:16 +03:00
Altaf hussain	f041990f2a	fix typo in redis.conf. Interfece changed to interface (#9498 )	2021-09-13 20:07:50 +03:00
sundb	3ca6972ecd	Replace all usage of ziplist with listpack for t_zset (#9366 ) Part two of implementing #8702 (zset), after #8887. ## Description of the feature Replaced all uses of ziplist with listpack in t_zset, and optimized some of the code to optimize performance. ## Rdb format changes New `RDB_TYPE_ZSET_LISTPACK` rdb type. ## Rdb loading improvements: 1) Pre-expansion of dict for validation of duplicate data for listpack and ziplist. 2) Simplifying the release of empty key objects when RDB loading. 3) Unify ziplist and listpack data verify methods for zset and hash, and move code to rdb.c. ## Interface changes 1) New `zset-max-listpack-entries` config is an alias for `zset-max-ziplist-entries` (same with `zset-max-listpack-value`). 2) OBJECT ENCODING will return listpack instead of ziplist. ## Listpack improvements: 1) Add `lpDeleteRange` and `lpDeleteRangeWithEntry` functions to delete a range of entries from listpack. 2) Improve the performance of `lpCompare`, converting from string to integer is faster than converting from integer to string. 3) Replace `snprintf` with `ll2string` to improve performance in converting numbers to strings in `lpGet()`. ## Zset improvements: 1) Improve the performance of `zzlFind` method, use `lpFind` instead of `lpCompare` in a loop. 2) Use `lpDeleteRangeWithEntry` instead of `lpDelete` twice to delete a element of zset. ## Tests 1) Add some unittests for `lpDeleteRange` and `lpDeleteRangeWithEntry` function. 2) Add zset RDB loading test. 3) Add benchmark test for `lpCompare` and `ziplsitCompare`. 4) Add empty listpack zset corrupt dump test.	2021-09-09 18:18:53 +03:00
Wang Yuan	9a0c0617f1	Use sync_file_range to optimize fsync if possible (#9409 ) We implement incremental data sync in rio.c by call fsync, on slow disk, that may cost a lot of time, sync_file_range could provide async fsync, so we could serialize key/value and sync file data at the same time. > one tip for sync_file_range usage: http://lkml.iu.edu/hypermail/linux/kernel/1005.2/01845.html Additionally, this change avoids a single large write to be used, which can result in a mass of dirty pages in the kernel (increasing the risk of someone else's write to block). On HDD, current solution could reduce approximate half of dumping RDB time, this PR costs 50s for dump 7.7G rdb but unstable branch costs 93s. On NVME SSD, this PR can't reduce much time, this PR costs 40s, unstable branch costs 48s. Moreover, I find calling data sync every 4MB is better than 32MB.	2021-08-30 10:24:53 +03:00

1 2 3 4 5 ...

344 Commits