Introduce Valkey Over RDMA transport (experimental) (#477)

Adds an option to build RDMA support as a module:

    make BUILD_RDMA=module

To start valkey-server with RDMA, use a command line like the following:

    ./src/valkey-server --loadmodule src/valkey-rdma.so \
        port=6379 bind=xx.xx.xx.xx
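
For scripted launches, the same command line can be assembled as an argument list. This is a minimal sketch and not part of this patch: the helper name `rdma_server_argv` and its default paths are hypothetical, and only the `--loadmodule` argument layout (`port=`, `bind=`) is taken from the command shown above.

```python
def rdma_server_argv(bind, port=6379, server="./src/valkey-server",
                     module="src/valkey-rdma.so"):
    """Build an argv list that loads the RDMA connection module.

    Hypothetical helper: only the --loadmodule argument layout
    (port=..., bind=...) comes from the command line above.
    """
    return [server, "--loadmodule", module, "port=%d" % port, "bind=" + bind]


if __name__ == "__main__":
    # Print the command line for an example address (assumed value).
    print(" ".join(rdma_server_argv("192.168.122.100")))
```

The resulting list can be passed to `subprocess.Popen` without `shell=True`, which avoids quoting problems in the bind address.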

* Implement only the server side, as a connection module. This means we
  can *NOT* compile RDMA support as built-in.

* Add necessary information in README.md

* Support 'CONFIG SET/GET', for example, 'CONFIG SET rdma.port 6380'.
  Then check the result with 'rdma res show cm_id' and valkey-cli (with
  RDMA support, which is not implemented in this patch).

* The full list of listeners looks like:

      listener0:name=tcp,bind=*,bind=-::*,port=6379
      listener1:name=unix,bind=/var/run/valkey.sock
      listener2:name=rdma,bind=xx.xx.xx.xx,bind=yy.yy.yy.yy,port=6379
      listener3:name=tls,bind=*,bind=-::*,port=16379
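
To check such lines from a script, they can be split into fields. The sketch below is not part of this patch; `parse_listener` is a hypothetical name, and the line format is assumed to be exactly the one listed above, with `bind` allowed to repeat.

```python
def parse_listener(line):
    """Split 'listenerN:name=...,bind=...,port=...' into a dict.

    'bind' may appear several times, so it is collected into a list.
    'port' is optional (the unix listener has none).
    """
    label, _, fields = line.strip().partition(":")
    info = {"id": label, "bind": []}
    for field in fields.split(","):
        key, _, value = field.partition("=")
        if key == "bind":
            info["bind"].append(value)
        elif key == "port":
            info["port"] = int(value)
        else:
            info[key] = value
    return info


if __name__ == "__main__":
    sample = "listener2:name=rdma,bind=xx.xx.xx.xx,bind=yy.yy.yy.yy,port=6379"
    print(parse_listener(sample))
```

Only the first `:` separates the listener label from its fields, so IPv6 bind values such as `-::*` are kept intact.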

Because TCL lacks RDMA support, a simple C program (under tests/rdma/) is
used to test Valkey Over RDMA. This is quite a raw version with basic
library dependencies: libpthread, libibverbs, librdmacm. Run it using the
script:

    ./runtest-rdma [ OPTIONS ]

To run the RDMA tests in GitHub Actions, RXE, a kernel module for emulated
soft RDMA, needs to be installed. The kernel module source code is fetched
from a repo containing only the RXE kernel driver from the Linux kernel. It
is stored in a separate repo to avoid cloning the whole Linux kernel repo.
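
The soft RDMA setup can be summarized as a list of shell commands. This is a sketch only: `rxe_setup_commands` is a hypothetical helper that builds the command strings without running them, and the commands themselves mirror what tests/rdma/rdma_env.py in this commit executes as root.

```python
def rxe_setup_commands(interface):
    """Return the shell commands that set up an RXE device on `interface`.

    Hypothetical helper: it only builds strings and runs nothing.
    The commands mirror tests/rdma/rdma_env.py and need root to run.
    """
    return [
        # generic RDMA/IB modules
        "modprobe rdma_cm",
        "modprobe udp_tunnel",
        "modprobe ip6_udp_tunnel",
        "modprobe ib_uverbs",
        # soft RDMA driver, then bind it to the network interface
        "modprobe rdma_rxe",
        "rdma link add rxe_%s type rxe netdev %s" % (interface, interface),
    ]


if __name__ == "__main__":
    # "eth0" is an assumed interface name for illustration.
    for cmd in rxe_setup_commands("eth0"):
        print("~# " + cmd)
```

If `modprobe rdma_rxe` fails because the kernel ships no RXE module, the test script falls back to `insmod` of the out-of-tree `rdma_rxe.ko` built from the separate repo.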

----

In 2021/06, I created a
[PR](https://github.com/redis/redis/pull/9161) for the *Redis Over RDMA*
proposal. Then I did some work to [fully abstract connection and make
TLS dynamically loadable](https://github.com/redis/redis/pull/9320), so
that since Redis 7.2.0 a new connection type can be built into Redis
statically or as a separate shared library (loaded by Redis on startup).

Based on the new connection framework, I created a new
[PR](https://github.com/redis/redis/pull/11182). Several people
(@xiezhq-hermann @zhangyiming1201 @JSpewock @uvletter @FujiZ)
noticed, played with and tested this PR. However, because of the lack of
time and knowledge from the maintainers, this PR has been pending for
about 2 years.

Related doc: [Introduce *Valkey Over RDMA*
specification](https://github.com/valkey-io/valkey-doc/pull/123). (The
specification is the same as the Redis one, and it should stay the same.)

Changes in this PR:
- implement *Valkey Over RDMA*. (adapted to the Valkey style)

Finally, if this feature is considered for merging, I volunteer to
maintain it.

---------

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
zhenwei pi 2024-07-15 20:04:22 +08:00 committed by GitHub
parent c1bbdc796d
commit dd4bd5065b
10 changed files with 3342 additions and 1 deletions

@@ -44,6 +44,26 @@ jobs:
      - name: module api test
        run: CFLAGS='-Werror' ./runtest-moduleapi --verbose --dump-logs

  test-rdma:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
      - name: make
        run: |
          sudo apt-get install librdmacm-dev libibverbs-dev
          make BUILD_RDMA=module
      - name: clone-rxe-kmod
        run: |
          mkdir -p tests/rdma/rxe
          git clone https://github.com/pizhenwei/rxe.git tests/rdma/rxe
          make -C tests/rdma/rxe
      - name: clear-kernel-log
        run: sudo dmesg -c > /dev/null
      - name: test
        run: sudo ./runtest-rdma --install-rxe
      - name: show-kernel-log
        run: sudo dmesg -c

  build-debian-old:
    runs-on: ubuntu-latest
    container: debian:buster

1
.gitignore vendored

@@ -47,3 +47,4 @@ redis.code-workspace
.swp
nodes*.conf
tests/cluster/tmp/*
tests/rdma/rdma-test


@@ -31,6 +31,12 @@ libssl-dev on Debian/Ubuntu) and run:

    % make BUILD_TLS=yes

To build with experimental RDMA support you'll need RDMA development libraries
(e.g. librdmacm-dev and libibverbs-dev on Debian/Ubuntu). For now, Valkey only
supports RDMA as connection module mode. Run:

    % make BUILD_RDMA=module

To build with systemd support, you'll need systemd development libraries (such
as libsystemd-dev on Debian/Ubuntu or systemd-devel on CentOS) and run:

@@ -155,6 +161,38 @@ Running Valkey with TLS:
Please consult the [TLS.md](TLS.md) file for more information on
how to use Valkey with TLS.

Running Valkey with RDMA:
------------------

Note that Valkey Over RDMA is an experimental feature.
It may be changed or removed in any minor or major version.
Currently, it is only supported on Linux.

To manually run a Valkey server with RDMA mode:

    % ./src/valkey-server --protected-mode no \
         --loadmodule src/valkey-rdma.so bind=192.168.122.100 port=6379

It's possible to change bind address/port of RDMA by runtime command:

    192.168.122.100:6379> CONFIG SET rdma.port 6380

It's also possible to have both RDMA and TCP available, and there is no
conflict of TCP(6379) and RDMA(6379), Ex:

    % ./src/valkey-server --protected-mode no \
         --loadmodule src/valkey-rdma.so bind=192.168.122.100 port=6379 \
         --port 6379

Note that the network card (192.168.122.100 of this example) should support
RDMA. To test a server supports RDMA or not:

    % rdma res show (a new version iproute2 package)

Or:

    % ibv_devices

Playing with Valkey
------------------

1
runtest-rdma Executable file

@@ -0,0 +1 @@
./tests/rdma/run.py "$@"


@@ -337,6 +337,28 @@ ifeq ($(BUILD_TLS),module)
	TLS_MODULE_CFLAGS+=-DUSE_OPENSSL=$(BUILD_MODULE) $(OPENSSL_CFLAGS) -DBUILD_TLS_MODULE=$(BUILD_MODULE)
endif

BUILD_RDMA:=no
RDMA_MODULE=
RDMA_MODULE_NAME:=valkey-rdma$(PROG_SUFFIX).so
RDMA_MODULE_CFLAGS:=$(FINAL_CFLAGS)
ifeq ($(BUILD_RDMA),module)
	FINAL_CFLAGS+=-DUSE_RDMA=$(BUILD_MODULE)
	RDMA_PKGCONFIG := $(shell $(PKG_CONFIG) --exists librdmacm libibverbs && echo $$?)
ifeq ($(RDMA_PKGCONFIG),0)
	RDMA_LIBS=$(shell $(PKG_CONFIG) --libs librdmacm libibverbs)
else
	RDMA_LIBS=-lrdmacm -libverbs
endif
	RDMA_MODULE=$(RDMA_MODULE_NAME)
	RDMA_MODULE_CFLAGS+=-DUSE_RDMA=$(BUILD_YES) -DBUILD_RDMA_MODULE $(RDMA_LIBS)
else
ifeq ($(BUILD_RDMA),no)
	# disable RDMA, do nothing
else
	$(error "RDMA is only supported as module (BUILD_RDMA=module), or disabled (BUILD_RDMA=no)")
endif
endif

ifndef V
define MAKE_INSTALL
	@printf ' %b %b\n' $(LINKCOLOR)INSTALL$(ENDCOLOR) $(BINCOLOR)$(1)$(ENDCOLOR) 1>&2
@@ -414,7 +436,7 @@ ENGINE_TEST_OBJ:=$(sort $(patsubst unit/%.c,unit/%.o,$(ENGINE_TEST_FILES)))
ENGINE_UNIT_TESTS:=$(ENGINE_NAME)-unit-tests$(PROG_SUFFIX)
ALL_SOURCES=$(sort $(patsubst %.o,%.c,$(ENGINE_SERVER_OBJ) $(ENGINE_CLI_OBJ) $(ENGINE_BENCHMARK_OBJ)))

-all: $(SERVER_NAME) $(ENGINE_SENTINEL_NAME) $(ENGINE_CLI_NAME) $(ENGINE_BENCHMARK_NAME) $(ENGINE_CHECK_RDB_NAME) $(ENGINE_CHECK_AOF_NAME) $(TLS_MODULE)
+all: $(SERVER_NAME) $(ENGINE_SENTINEL_NAME) $(ENGINE_CLI_NAME) $(ENGINE_BENCHMARK_NAME) $(ENGINE_CHECK_RDB_NAME) $(ENGINE_CHECK_AOF_NAME) $(TLS_MODULE) $(RDMA_MODULE)
	@echo ""
	@echo "Hint: It's a good idea to run 'make test' ;)"
	@echo ""
@@ -437,6 +459,7 @@ persist-settings: distclean
	echo OPT=$(OPT) >> .make-settings
	echo MALLOC=$(MALLOC) >> .make-settings
	echo BUILD_TLS=$(BUILD_TLS) >> .make-settings
	echo BUILD_RDMA=$(BUILD_RDMA) >> .make-settings
	echo USE_SYSTEMD=$(USE_SYSTEMD) >> .make-settings
	echo CFLAGS=$(CFLAGS) >> .make-settings
	echo LDFLAGS=$(LDFLAGS) >> .make-settings
@@ -489,6 +512,10 @@ $(ENGINE_CHECK_AOF_NAME): $(SERVER_NAME)
$(TLS_MODULE_NAME): $(SERVER_NAME)
	$(QUIET_CC)$(CC) -o $@ tls.c -shared -fPIC $(TLS_MODULE_CFLAGS) $(TLS_CLIENT_LIBS)

# valkey-rdma.so
$(RDMA_MODULE_NAME): $(SERVER_NAME)
	$(QUIET_CC)$(CC) -o $@ rdma.c -shared -fPIC $(RDMA_MODULE_CFLAGS)

# valkey-cli
$(ENGINE_CLI_NAME): $(ENGINE_CLI_OBJ)
	$(SERVER_LD) -o $@ $^ ../deps/hiredis/libhiredis.a ../deps/linenoise/linenoise.o $(FINAL_LIBS) $(TLS_CLIENT_LIBS)

1888
src/rdma.c Normal file

File diff suppressed because it is too large

16
tests/rdma/Makefile Normal file

@@ -0,0 +1,16 @@
BIN = rdma-test

ifeq ($(RDMA_PKGCONFIG),0)
	RDMA_LIBS=$(shell $(PKG_CONFIG) --libs librdmacm libibverbs)
else
	RDMA_LIBS=-lrdmacm -libverbs
endif

$(BIN): rdma-test.c
	@$(CC) $^ -o $@ $(RDMA_LIBS) -lpthread -g
	@echo "\nHint: please check the RDMA environment:"
	@echo "\t~# rdma res show"
	@echo "\n Then launch valkey-server with RDMA support, Run ./"$(BIN) "to test ..."

clean:
	rm -rf $(BIN)

1059
tests/rdma/rdma-test.c Normal file

File diff suppressed because it is too large

145
tests/rdma/rdma_env.py Executable file

@@ -0,0 +1,145 @@
#!/usr/bin/python3
"""
==========================================================================
rdma_env.py - script to setup/cleanup soft RDMA devices.
              note that this script needs root privilege.
--------------------------------------------------------------------------
Copyright (C) 2024 zhenwei pi <pizhenwei@bytedance.com>

This work is licensed under BSD 3-Clause, License 1 of the COPYING file in
the top-level directory.
==========================================================================
"""
import os
import subprocess
import netifaces
import time
import argparse


def prepare_ib():
    # load the generic RDMA/IB kernel modules needed by any RDMA device
    cmd = "modprobe rdma_cm && modprobe udp_tunnel && modprobe ip6_udp_tunnel && modprobe ib_uverbs"
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
    if p.wait():
        outs, _ = p.communicate()
        print("Valkey Over RDMA probe modules of IB [FAILED]")
        print("---------------\n" + outs.decode() + "---------------\n")
        os._exit(1)

    print("Valkey Over RDMA probe modules of IB [OK]")


def prepare_rxe(interface):
    # is there any builtin rdma_rxe.ko?
    p = subprocess.Popen("modprobe rdma_rxe 2> /dev/null", shell=True, stdout=subprocess.PIPE)
    if p.wait():
        valkeydir = os.path.dirname(os.path.abspath(__file__)) + "/../.."
        rxedir = valkeydir + "/tests/rdma/rxe"
        rxekmod = rxedir + "/rdma_rxe.ko"
        print(rxedir)
        print(rxekmod)
        if not os.path.exists(rxekmod):
            print("Neither kernel builtin nor out-of-tree rdma_rxe.ko found. Abort")
            print("Please run the following commands to build out-of-tree RXE on Linux-6.5, then retry:")
            print("\t~# mkdir -p " + rxedir)
            print("\t~# git clone https://github.com/pizhenwei/rxe.git " + rxedir)
            print("\t~# cd " + rxedir)
            print("\t~# make")
            os._exit(1)

        cmd = "insmod " + rxekmod
        p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
        if p.wait():
            os._exit(1)

    print("Valkey Over RDMA install RXE [OK]")

    softrdma = "rxe_" + interface
    cmd = "rdma link add " + softrdma + " type rxe netdev " + interface
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
    if p.wait():
        outs, _ = p.communicate()
        print("Valkey Over RDMA install RXE [FAILED]")
        print("---------------\n" + outs.decode() + "---------------\n")
        os._exit(1)

    print("Valkey Over RDMA add RXE device <%s> [OK]" % softrdma)


# find any IPv4 available networking interface
def find_iface():
    interfaces = netifaces.interfaces()
    for interface in interfaces:
        if interface == "lo":
            continue

        addrs = netifaces.ifaddresses(interface)
        if netifaces.AF_INET not in addrs:
            continue

        return interface


def setup_rdma(driver, interface):
    if interface is None:
        interface = find_iface()

    prepare_ib()
    if driver == "rxe":
        prepare_rxe(interface)
    else:
        print("only rxe is currently supported")
        os._exit(1)


# iterate /sys/class/infiniband, find all virtual RDMA devices, and remove them
def cleanup_rdma():
    # Ex, /sys/class/infiniband/mlx5_0
    # Ex, /sys/class/infiniband/rxe_eth0
    # Ex, /sys/class/infiniband/siw_eth0
    ibclass = "/sys/class/infiniband/"
    try:
        for dev in os.listdir(ibclass):
            # Ex, /sys/class/infiniband/rxe_eth0/ports/1/gid_attrs/ndevs/0
            origpath = os.readlink(ibclass + dev)
            if "virtual" in origpath:
                subprocess.Popen("rdma link del " + dev, shell=True).wait()
                print("Remove virtual RDMA device : " + dev + " [OK]")
    except os.error:
        return None

    # try to remove RXE driver from kernel, ignore error
    subprocess.Popen("rmmod rdma_rxe 2> /dev/null", shell=True).wait()

    # try to remove SIW driver from kernel, ignore error
    subprocess.Popen("rmmod rdma_siw 2> /dev/null", shell=True).wait()

    return None


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description = "Script to setup/cleanup soft RDMA devices, note that root privilege is required",
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument("-o", "--operation", type=str,
                        help="[setup|cleanup] setup or cleanup soft RDMA environment")
    parser.add_argument("-d", "--driver", type=str, default="rxe",
                        help="[rxe|siw] specify soft RDMA driver, rxe by default")
    parser.add_argument("-i", "--interface", type=str,
                        help="[IFACE] network interface, auto-select any available interface by default")
    args = parser.parse_args()

    # test UID. a non-root user must stop on a non-RDMA platform, show some hints and exit.
    if os.geteuid():
        print("You are not root privileged. Abort.")
        print("Or you may setup RXE manually in root privileged by commands:")
        print("\t~# modprobe rdma_rxe")
        print("\t~# rdma link add rxe0 type rxe netdev [IFACE]")
        os._exit(1)

    if args.operation == "cleanup":
        cleanup_rdma()
    elif args.operation == "setup":
        setup_rdma(args.driver, args.interface)

    os._exit(0)

146
tests/rdma/run.py Executable file

@@ -0,0 +1,146 @@
#!/usr/bin/python3
"""
==========================================================================
run.py - script for test client for Valkey Over RDMA (Linux only)
--------------------------------------------------------------------------
Copyright (C) 2024 zhenwei pi <pizhenwei@bytedance.com>

This work is licensed under BSD 3-Clause, License 1 of the COPYING file in
the top-level directory.
==========================================================================
"""
import os
import subprocess
import netifaces
import time
import argparse


def build_program():
    valkeydir = os.path.dirname(os.path.abspath(__file__)) + "/../.."
    cmd = "make -C " + valkeydir + "/tests/rdma"
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
    if p.wait():
        print("Valkey Over RDMA build rdma-test [FAILED]")
        return 1

    print("Valkey Over RDMA build rdma-test program [OK]")
    return 0


# iterate /sys/class/infiniband, find any usable RDMA device, and return IPv4 address
def find_rdma_dev():
    # Ex, /sys/class/infiniband/mlx5_0
    # Ex, /sys/class/infiniband/rxe_eth0
    # Ex, /sys/class/infiniband/siw_eth0
    ibclass = "/sys/class/infiniband/"
    try:
        for dev in os.listdir(ibclass):
            # Ex, /sys/class/infiniband/rxe_eth0/ports/1/gid_attrs/ndevs/0
            netdev = ibclass + dev + "/ports/1/gid_attrs/ndevs/0"
            with open(netdev) as fp:
                addrs = netifaces.ifaddresses(fp.readline().strip("\n"))
                if netifaces.AF_INET not in addrs:
                    continue
                ipaddr = addrs[netifaces.AF_INET][0]["addr"]
                print("Valkey Over RDMA test prepare " + dev + " <" + ipaddr + "> [OK]")
                return ipaddr
    except os.error:
        return None

    return None


def test_rdma(ipaddr):
    valkeydir = os.path.dirname(os.path.abspath(__file__)) + "/../.."
    retval = 0

    # step 1, prepare test directory
    tmpdir = valkeydir + "/tests/rdma/tmp"
    subprocess.Popen("mkdir -p " + tmpdir, shell=True).wait()

    # step 2, start server
    svrpath = valkeydir + "/src/valkey-server"
    rdmapath = valkeydir + "/src/valkey-rdma.so"
    svrcmd = [svrpath, "--port", "0", "--loglevel", "verbose", "--protected-mode", "no",
              "--appendonly", "no", "--daemonize", "no", "--dir", valkeydir + "/tests/rdma/tmp",
              "--loadmodule", rdmapath, "port=6379", "bind=" + ipaddr]

    svr = subprocess.Popen(svrcmd, shell=False, stdout=subprocess.PIPE)
    try:
        if svr.wait(1):
            print("Valkey Over RDMA valkey-server runs less than 1s [FAILED]")
            return 1
    except subprocess.TimeoutExpired as e:
        print("Valkey Over RDMA valkey-server start [OK]")
        pass

    # step 3, run test client
    start = time.time()
    clipath = valkeydir + "/tests/rdma/rdma-test"
    clicmd = [clipath, "--thread", "4", "-h", ipaddr]
    cli = subprocess.Popen(clicmd, shell=False, stdout=subprocess.PIPE)
    if cli.wait(60):
        outs, _ = cli.communicate()
        print("Valkey Over RDMA test [FAILED]")
        print("---------------\n" + outs.decode() + "---------------\n")
        retval = 1
    else:
        elapsed = time.time() - start
        outs, _ = cli.communicate()
        print("Valkey Over RDMA test in " + str(round(elapsed, 2)) + "s [OK]")
        print(outs.decode())
        retval = 0

    # step 4, cleanup
    svr.kill()
    svr.wait()
    subprocess.Popen("rm -rf " + tmpdir, shell=True).wait()

    # step 5, report result
    return retval


def test_exit(retval, install_rxe):
    # remove the soft RDMA device only if this run installed it (root only)
    if install_rxe and not os.geteuid():
        rdma_env_py = os.path.dirname(os.path.abspath(__file__)) + "/rdma_env.py"
        cmd = rdma_env_py + " -o cleanup"
        subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE).wait()

    os._exit(retval)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description = "Script to test Valkey Over RDMA",
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument("-r", "--install-rxe", action='store_true',
                        help="install RXE driver and setup RXE device")
    args = parser.parse_args()

    if args.install_rxe:
        if os.geteuid():
            print("--install-rxe/-r must be root privileged")
            test_exit(1, False)

        rdma_env_py = os.path.dirname(os.path.abspath(__file__)) + "/rdma_env.py"
        cmd = rdma_env_py + " -o setup -d rxe"
        p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
        if p.wait():
            print("Valkey Over RDMA setup RXE [FAILED]")
            test_exit(1, False)

    # build C client into binary
    retval = build_program()
    if retval:
        test_exit(1, args.install_rxe)

    ipaddr = find_rdma_dev()
    if ipaddr is None:
        # not fatal error, continue to create software version: RXE and SIW
        print("Valkey Over RDMA test detect existing RDMA device [FAILED]")
    else:
        retval = test_rdma(ipaddr)
        if not retval:
            print("Valkey Over RDMA test over " + ipaddr + " [OK]")

    test_exit(0, args.install_rxe)