Revert "iobuf: Get rid of pre allocated iobuf_pool and use per thread mem pool"

This reverts commit b87c397091.

There seems to be a performance regression with the patch, hence it is recommended to revert it.

Updates: #325
Change-Id: Id85d6203173a44fad6cf51d39b3e96f37afcec09
Amar Tumballi 2019-01-04 07:04:50 +00:00
parent 054c7ea916
commit 37653efdc7
8 changed files with 1313 additions and 97 deletions


@ -95,10 +95,52 @@ max-stdalloc=0 #Maximum number of allocations from heap that are in active use a
```
###Iobufs
The iobuf stats are printed in this section. It includes:
- active_cnt : number of iobufs that are currently allocated and in use. This number should not be too high; it is generally only as large as the number of in-flight IO fops. A large number indicates a leak in iobufs. There is no easy way to debug this; since the iobufs also come from mem pools, looking at the mem pool section in the statedump will help.
- misses : number of iobuf allocations that were not served from mem_pool. (includes stdalloc and mem_pool alloc misses)
- hits : number of iobuf allocations that were served from the mem_pool memory.
```
[iobuf.global]
iobuf_pool=0x1f0d970 #The memory pool for iobufs
iobuf_pool.default_page_size=131072 #The default size of iobuf (if no iobuf size is specified the default size is allocated)
#iobuf_arena: One arena represents a group of iobufs of a particular size
iobuf_pool.arena_size=12976128 # The initial size of the iobuf pool (doesn't include the stdalloc'd memory or the newly added arenas)
iobuf_pool.arena_cnt=8 #Total number of arenas in the pool
iobuf_pool.request_misses=0 #The number of iobufs that were stdalloc'd (as they exceeded the default max page size provided by iobuf_pool).
```
There are 3 lists of arenas:
1. Arena list: arenas allocated during iobuf pool creation, and arenas that are in use (active_cnt != 0), are part of this list.
2. Purge list: arenas that can be purged (no active iobufs, active_cnt == 0).
3. Filled list: arenas without free iobufs.
```
[purge.1] #purge.<S.No.>
purge.1.mem_base=0x7fc47b35f000 #The address of the arena structure
purge.1.active_cnt=0 #The number of iobufs active in that arena
purge.1.passive_cnt=1024 #The number of unused iobufs in the arena
purge.1.alloc_cnt=22853 #Total allocs in this pool(number of times the iobuf was allocated from this arena)
purge.1.max_active=7 #Max active iobufs from this arena, at any point in the life of this process.
purge.1.page_size=128 #Size of all the iobufs in this arena.
[arena.5] #arena.<S.No.>
arena.5.mem_base=0x7fc47af1f000
arena.5.active_cnt=0
arena.5.passive_cnt=64
arena.5.alloc_cnt=0
arena.5.max_active=0
arena.5.page_size=32768
```
If the active_cnt of any arena is non-zero, then the statedump will also have the iobuf list.
```
[arena.6.active_iobuf.1] #arena.<S.No>.active_iobuf.<iobuf.S.No.>
arena.6.active_iobuf.1.ref=1 #refcount of the iobuf
arena.6.active_iobuf.1.ptr=0x7fdb921a9000 #address of the iobuf
[arena.6.active_iobuf.2]
arena.6.active_iobuf.2.ref=1
arena.6.active_iobuf.2.ptr=0x7fdb92189000
```
At any given point in time, a large number of filled arenas could be a sign of iobuf leaks.
###Call stack
All the fops received by gluster are handled using call-stacks. A call-stack contains information such as the uid/gid/pid of the process executing the fop. Each call-stack contains different call-frames, one per xlator that handles the fop.


@ -2,13 +2,31 @@
##Datastructures
###iobuf
Short for IO Buffer. It is one allocatable unit for the consumers of the IOBUF
API; each unit hosts @page_size (defined in the arena structure) bytes of memory.
As an initial step of processing a fop, the IO buffer passed to GlusterFS by
other applications (FUSE VFS / applications using gfapi) is copied into GlusterFS
space, i.e. iobufs. Hence iobufs are mostly allocated/deallocated in the FUSE, gfapi,
and protocol xlators, and also in performance xlators to cache IO buffers etc.
```
struct iobuf {
union {
struct list_head list;
struct {
struct iobuf *next;
struct iobuf *prev;
};
};
struct iobuf_arena *iobuf_arena;
gf_lock_t lock; /* for ->ptr and ->ref */
int ref; /* 0 == passive, >0 == active */
void *ptr; /* usable memory region by the consumer */
void *free_ptr; /* in case of stdalloc, this is the
one to be freed not the *ptr */
};
```
###iobref
Multiple iobufs may be needed for a single fop, e.g. in vectored read/write.
@ -21,10 +39,105 @@ struct iobref {
int alloced; /* 16 by default, grows as required */
int used; /* number of iobufs added to this iobref */
};
```
###iobuf_arenas
One region of memory MMAPed from the operating system. Each region MMAPs
@arena_size bytes of memory, and hosts @arena_size / @page_size IOBUFs.
Iobufs of the same size are grouped into one arena, for sanity of access.
```
struct iobuf_arena {
union {
struct list_head list;
struct {
struct iobuf_arena *next;
struct iobuf_arena *prev;
};
};
size_t page_size; /* size of all iobufs in this arena */
size_t arena_size; /* this is equal to
(iobuf_pool->arena_size / page_size)
* page_size */
size_t page_count;
struct iobuf_pool *iobuf_pool;
void *mem_base;
struct iobuf *iobufs; /* allocated iobufs list */
int active_cnt;
struct iobuf active; /* head node iobuf
(unused by itself) */
int passive_cnt;
struct iobuf passive; /* head node iobuf
(unused by itself) */
uint64_t alloc_cnt; /* total allocs in this pool */
int max_active; /* max active buffers at a given time */
};
```
###iobuf_pool
Pool of iobufs. As many IO buffers may be required by the filesystem,
a pool of iobufs is preallocated and kept; only when these preallocated ones are
exhausted is the standard malloc/free path used, thus improving
performance. The iobuf pool is generally one per process, allocated during
glusterfs_ctx_t init (glusterfs_ctx_defaults_init); currently the preallocated
iobuf pool memory is freed on process exit. The iobuf pool is globally accessible
across GlusterFS, hence iobufs allocated by any xlator can be accessed by any
other xlator (provided the iobuf is passed along).
```
struct iobuf_pool {
pthread_mutex_t mutex;
size_t arena_size; /* size of memory region in
arena */
size_t default_page_size; /* default size of iobuf */
int arena_cnt;
struct list_head arenas[GF_VARIABLE_IOBUF_COUNT];
/* array of arenas. Each element of the array is a list of arenas
holding iobufs of particular page_size */
struct list_head filled[GF_VARIABLE_IOBUF_COUNT];
/* array of arenas without free iobufs */
struct list_head purge[GF_VARIABLE_IOBUF_COUNT];
/* array of arenas which can be purged */
uint64_t request_misses; /* mostly requests for iobufs larger
than the maximum configured size */
};
```
~~~
The default size of the iobuf_pool (as of now):
1024 iobufs of 128Bytes = 128KB
512 iobufs of 512Bytes = 256KB
512 iobufs of 2KB = 1MB
128 iobufs of 8KB = 1MB
64 iobufs of 32KB = 2MB
32 iobufs of 128KB = 4MB
8 iobufs of 256KB = 2MB
2 iobufs of 1MB = 2MB
Total ~13MB
~~~
As seen in the datastructure, iobuf_pool has 3 arena lists.
- arenas:
The arenas allocated during iobuf_pool creation are part of this list. This list
also contains arenas that are partially filled, i.e. contain a few active and a few
passive iobufs (passive_cnt != 0, active_cnt != 0, except for initially allocated
arenas). By default there will be 8 arenas of the sizes mentioned above.
- filled:
If all the iobufs in the arena are in use (passive_cnt = 0), the arena is moved
to the filled list. If any of the iobufs from a filled arena is released via iobuf_put(),
the arena moves back to the 'arenas' list.
- purge:
If there are no active iobufs in the arena (active_cnt = 0), the arena is moved
to the purge list. iobuf_put() triggers destruction of the arenas in this list. The
arenas in the purge list are destroyed only if there is at least one arena in the
'arenas' list; that way there won't be spurious mmap/munmap of buffers.
(e.g.: if there is an arena (page_size=128KB, count=32) in the purge list, this arena
is destroyed (munmap'd) only if there is an arena in the 'arenas' list with page_size=128KB).
##APIs
###iobuf_get
@ -44,15 +157,23 @@ struct iobuf * iobuf_get2 (struct iobuf_pool *iobuf_pool, size_t page_size);
Creates a new iobuf of the specified page size; if page_size=0, the default page
size is used.
```
if (requested iobuf size > Max iobuf size in the pool (1MB as of now))
{
  Perform standard allocation (CALLOC) of the requested size and
  add it to the list iobuf_pool->arenas[IOBUF_ARENA_MAX_INDEX].
}
else
{
  -Round the page size up to match the standard sizes in the iobuf pool.
   (eg: if 3KB is requested, it is rounded to 8KB).
  -Select the arena list corresponding to the rounded size
   (eg: select the 8KB arena).
   If the selected arena has passive count > 0, then return an
   iobuf from this arena, setting the counters (passive/active/etc.)
   appropriately.
   else the arena is full; allocate a new arena of the rounded size
   with the standard page count and add it to the arena list
   (eg: 128 iobufs of 8KB are allocated).
}
```
Also takes a reference (increments the ref count), hence no need of doing it
@ -76,6 +197,8 @@ Unreference the iobuf, if the ref count is zero iobuf is considered free.
```
-Delete the iobuf, if allocated from standard alloc and return.
-set the active/passive count appropriately.
-if passive count > 0 then add the arena to the 'arenas' list.
-if active count = 0 then add the arena to 'purge' list.
```
Every iobuf_ref should have a corresponding iobuf_unref, and also every
iobuf_get/2 should have a corresponding iobuf_unref.
@ -126,7 +249,8 @@ Unreference all the iobufs in the iobref, and also unref the iobref.
If all iobuf_refs/iobuf_new calls do not have corresponding iobuf_unrefs, then the
iobufs are not freed and recurring execution of such code paths may lead to huge
memory leaks. The easiest way to identify whether a memory leak is caused by iobufs
is to take a statedump. If the statedump shows a lot of filled arenas then it is
a sure sign of a leak. Refer to doc/debugging/statedump.md for more details.
If iobufs are leaking, the next step is to find where the iobuf_unref went
missing. There is no standard/easy way of debugging this, code reading and logs


@ -17,38 +17,118 @@
#include <sys/mman.h>
#include <sys/uio.h>
#define GF_VARIABLE_IOBUF_COUNT 32
#define GF_RDMA_DEVICE_COUNT 8
/* Let's try to define the new anonymous mapping
* flag, in case the system is still using the
* now deprecated MAP_ANON flag.
*
* Also, this should ideally be in a centralized/common
* header which can be used by other source files also.
*/
#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif
#define GF_ALIGN_BUF(ptr, bound) \
((void *)((unsigned long)(ptr + bound - 1) & (unsigned long)(~(bound - 1))))
#define GF_IOBUF_ALIGN_SIZE 512
#define GF_IOBUF_DEFAULT_PAGE_SIZE (128 * GF_UNIT_KB)
/* one allocatable unit for the consumers of the IOBUF API */
/* each unit hosts @page_size bytes of memory */
struct iobuf;
/* one region of memory mapped from the operating system */
/* each region MMAPs @arena_size bytes of memory */
/* each arena hosts @arena_size / @page_size IOBUFs */
struct iobuf_arena;
/* expandable and contractable pool of memory, internally broken into arenas */
struct iobuf_pool;
struct iobuf_init_config {
size_t pagesize;
int32_t num_pages;
};
struct iobuf {
union {
struct list_head list;
struct {
struct iobuf *next;
struct iobuf *prev;
};
};
struct iobuf_arena *iobuf_arena;
gf_lock_t lock; /* for ->ptr and ->ref */
gf_atomic_t ref; /* 0 == passive, >0 == active */
void *ptr; /* usable memory region by the consumer */
void *free_ptr; /* in case of stdalloc, this is the
one to be freed */
};
struct iobuf_arena {
union {
struct list_head list;
struct {
struct iobuf_arena *next;
struct iobuf_arena *prev;
};
};
struct list_head all_list;
size_t page_size; /* size of all iobufs in this arena */
size_t arena_size;
/* this is equal to rounded_size * num_iobufs.
(rounded_size comes with gf_iobuf_get_pagesize().) */
size_t page_count;
struct iobuf_pool *iobuf_pool;
void *mem_base;
struct iobuf *iobufs; /* allocated iobufs list */
int active_cnt;
struct iobuf active; /* head node iobuf
(unused by itself) */
int passive_cnt;
struct iobuf passive; /* head node iobuf
(unused by itself) */
uint64_t alloc_cnt; /* total allocs in this pool */
int max_active; /* max active buffers at a given time */
};
struct iobuf_pool {
pthread_mutex_t mutex;
size_t arena_size; /* size of memory region in
arena */
size_t default_page_size; /* default size of iobuf */
int arena_cnt;
struct list_head all_arenas;
struct list_head arenas[GF_VARIABLE_IOBUF_COUNT];
/* array of arenas. Each element of the array is a list of arenas
holding iobufs of particular page_size */
struct list_head filled[GF_VARIABLE_IOBUF_COUNT];
/* array of arenas without free iobufs */
struct list_head purge[GF_VARIABLE_IOBUF_COUNT];
/* array of arenas which can be purged */
uint64_t request_misses; /* mostly the requests for higher
value of iobufs */
int rdma_device_count;
struct list_head *mr_list[GF_RDMA_DEVICE_COUNT];
void *device[GF_RDMA_DEVICE_COUNT];
int (*rdma_registration)(void **, void *);
int (*rdma_deregistration)(struct list_head **, struct iobuf_arena *);
};
struct iobuf_pool *
@ -62,10 +142,13 @@ iobuf_unref(struct iobuf *iobuf);
struct iobuf *
iobuf_ref(struct iobuf *iobuf);
void
iobuf_pool_destroy(struct iobuf_pool *iobuf_pool);
void
iobuf_to_iovec(struct iobuf *iob, struct iovec *iov);
#define iobuf_ptr(iob) ((iob)->ptr)
#define iobpool_default_pagesize(iobpool) ((iobpool)->default_page_size)
#define iobuf_pagesize(iob) (iob->iobuf_arena->page_size)
struct iobref {
gf_lock_t lock;


@ -12,80 +12,588 @@
#include "glusterfs/statedump.h"
#include <stdio.h>
#include "glusterfs/libglusterfs-messages.h"
#include "glusterfs/atomic.h"
/*
TODO: implement destroy margins and prefetching of arenas
*/
#define IOBUF_ARENA_MAX_INDEX \
(sizeof(gf_iobuf_init_config) / (sizeof(struct iobuf_init_config)))
/* Make sure this array is sorted based on pagesize */
struct iobuf_init_config gf_iobuf_init_config[] = {
/* { pagesize, num_pages }, */
{128, 1024}, {512, 512}, {2 * 1024, 512}, {8 * 1024, 128},
{32 * 1024, 64}, {128 * 1024, 32}, {256 * 1024, 8}, {1 * 1024 * 1024, 2},
};
static int
gf_iobuf_get_arena_index(const size_t page_size)
{
int i;
for (i = 0; i < IOBUF_ARENA_MAX_INDEX; i++) {
if (page_size <= gf_iobuf_init_config[i].pagesize)
return i;
}
return -1;
}
static size_t
gf_iobuf_get_pagesize(const size_t page_size)
{
int i;
size_t size = 0;
for (i = 0; i < IOBUF_ARENA_MAX_INDEX; i++) {
size = gf_iobuf_init_config[i].pagesize;
if (page_size <= size)
return size;
}
return -1;
}
void
__iobuf_arena_init_iobufs(struct iobuf_arena *iobuf_arena)
{
int iobuf_cnt = 0;
struct iobuf *iobuf = NULL;
int offset = 0;
int i = 0;
GF_VALIDATE_OR_GOTO("iobuf", iobuf_arena, out);
iobuf_cnt = iobuf_arena->page_count;
iobuf_arena->iobufs = GF_CALLOC(sizeof(*iobuf), iobuf_cnt,
gf_common_mt_iobuf);
if (!iobuf_arena->iobufs)
return;
iobuf = iobuf_arena->iobufs;
for (i = 0; i < iobuf_cnt; i++) {
INIT_LIST_HEAD(&iobuf->list);
LOCK_INIT(&iobuf->lock);
iobuf->iobuf_arena = iobuf_arena;
iobuf->ptr = iobuf_arena->mem_base + offset;
list_add(&iobuf->list, &iobuf_arena->passive.list);
iobuf_arena->passive_cnt++;
offset += iobuf_arena->page_size;
iobuf++;
}
out:
return;
}
void
__iobuf_arena_destroy_iobufs(struct iobuf_arena *iobuf_arena)
{
int iobuf_cnt = 0;
struct iobuf *iobuf = NULL;
int i = 0;
GF_VALIDATE_OR_GOTO("iobuf", iobuf_arena, out);
iobuf_cnt = iobuf_arena->page_count;
if (!iobuf_arena->iobufs) {
gf_msg_callingfn(THIS->name, GF_LOG_ERROR, 0, LG_MSG_IOBUFS_NOT_FOUND,
"iobufs not found");
return;
}
iobuf = iobuf_arena->iobufs;
for (i = 0; i < iobuf_cnt; i++) {
GF_ASSERT(GF_ATOMIC_GET(iobuf->ref) == 0);
LOCK_DESTROY(&iobuf->lock);
list_del_init(&iobuf->list);
iobuf++;
}
GF_FREE(iobuf_arena->iobufs);
out:
return;
}
void
__iobuf_arena_destroy(struct iobuf_pool *iobuf_pool,
struct iobuf_arena *iobuf_arena)
{
GF_VALIDATE_OR_GOTO("iobuf", iobuf_arena, out);
if (iobuf_pool->rdma_deregistration)
iobuf_pool->rdma_deregistration(iobuf_pool->mr_list, iobuf_arena);
__iobuf_arena_destroy_iobufs(iobuf_arena);
if (iobuf_arena->mem_base && iobuf_arena->mem_base != MAP_FAILED)
munmap(iobuf_arena->mem_base, iobuf_arena->arena_size);
GF_FREE(iobuf_arena);
out:
return;
}
struct iobuf_arena *
__iobuf_arena_alloc(struct iobuf_pool *iobuf_pool, size_t page_size,
int32_t num_iobufs)
{
struct iobuf_arena *iobuf_arena = NULL;
size_t rounded_size = 0;
GF_VALIDATE_OR_GOTO("iobuf", iobuf_pool, out);
iobuf_arena = GF_CALLOC(sizeof(*iobuf_arena), 1, gf_common_mt_iobuf_arena);
if (!iobuf_arena)
goto err;
INIT_LIST_HEAD(&iobuf_arena->list);
INIT_LIST_HEAD(&iobuf_arena->all_list);
INIT_LIST_HEAD(&iobuf_arena->active.list);
INIT_LIST_HEAD(&iobuf_arena->passive.list);
iobuf_arena->iobuf_pool = iobuf_pool;
rounded_size = gf_iobuf_get_pagesize(page_size);
iobuf_arena->page_size = rounded_size;
iobuf_arena->page_count = num_iobufs;
iobuf_arena->arena_size = rounded_size * num_iobufs;
iobuf_arena->mem_base = mmap(NULL, iobuf_arena->arena_size,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (iobuf_arena->mem_base == MAP_FAILED) {
gf_msg(THIS->name, GF_LOG_WARNING, 0, LG_MSG_MAPPING_FAILED,
"mapping failed");
goto err;
}
if (iobuf_pool->rdma_registration) {
iobuf_pool->rdma_registration(iobuf_pool->device, iobuf_arena);
}
list_add_tail(&iobuf_arena->all_list, &iobuf_pool->all_arenas);
__iobuf_arena_init_iobufs(iobuf_arena);
if (!iobuf_arena->iobufs) {
gf_msg(THIS->name, GF_LOG_ERROR, 0, LG_MSG_INIT_IOBUF_FAILED,
"init failed");
goto err;
}
iobuf_pool->arena_cnt++;
return iobuf_arena;
err:
__iobuf_arena_destroy(iobuf_pool, iobuf_arena);
out:
return NULL;
}
static struct iobuf_arena *
__iobuf_arena_unprune(struct iobuf_pool *iobuf_pool, const size_t page_size,
const int index)
{
struct iobuf_arena *iobuf_arena = NULL;
struct iobuf_arena *tmp = NULL;
GF_VALIDATE_OR_GOTO("iobuf", iobuf_pool, out);
list_for_each_entry(tmp, &iobuf_pool->purge[index], list)
{
list_del_init(&tmp->list);
iobuf_arena = tmp;
break;
}
out:
return iobuf_arena;
}
static struct iobuf_arena *
__iobuf_pool_add_arena(struct iobuf_pool *iobuf_pool, const size_t page_size,
const int32_t num_pages, const int index)
{
struct iobuf_arena *iobuf_arena = NULL;
iobuf_arena = __iobuf_arena_unprune(iobuf_pool, page_size, index);
if (!iobuf_arena) {
iobuf_arena = __iobuf_arena_alloc(iobuf_pool, page_size, num_pages);
if (!iobuf_arena) {
gf_msg(THIS->name, GF_LOG_WARNING, 0, LG_MSG_ARENA_NOT_FOUND,
"arena not found");
return NULL;
}
}
list_add(&iobuf_arena->list, &iobuf_pool->arenas[index]);
return iobuf_arena;
}
/* This function destroys all the iobufs and the iobuf_pool */
void
iobuf_pool_destroy(struct iobuf_pool *iobuf_pool)
{
struct iobuf_arena *iobuf_arena = NULL;
struct iobuf_arena *tmp = NULL;
int i = 0;
GF_VALIDATE_OR_GOTO("iobuf", iobuf_pool, out);
pthread_mutex_lock(&iobuf_pool->mutex);
{
for (i = 0; i < IOBUF_ARENA_MAX_INDEX; i++) {
list_for_each_entry_safe(iobuf_arena, tmp, &iobuf_pool->arenas[i],
list)
{
list_del_init(&iobuf_arena->list);
iobuf_pool->arena_cnt--;
__iobuf_arena_destroy(iobuf_pool, iobuf_arena);
}
list_for_each_entry_safe(iobuf_arena, tmp, &iobuf_pool->purge[i],
list)
{
list_del_init(&iobuf_arena->list);
iobuf_pool->arena_cnt--;
__iobuf_arena_destroy(iobuf_pool, iobuf_arena);
}
/* If there are no iobuf leaks, there should be no
* arenas in the filled list. If at all there are any
* arenas in the filled list, the below function will
* assert.
*/
list_for_each_entry_safe(iobuf_arena, tmp, &iobuf_pool->filled[i],
list)
{
list_del_init(&iobuf_arena->list);
iobuf_pool->arena_cnt--;
__iobuf_arena_destroy(iobuf_pool, iobuf_arena);
}
/* If there are no iobuf leaks, there should be
* no standard allocated arenas, iobuf_put will free
* such arenas.
* TODO: Free the stdalloc arenas forcefully if present?
*/
}
}
pthread_mutex_unlock(&iobuf_pool->mutex);
pthread_mutex_destroy(&iobuf_pool->mutex);
GF_FREE(iobuf_pool);
out:
return;
}
static void
iobuf_create_stdalloc_arena(struct iobuf_pool *iobuf_pool)
{
struct iobuf_arena *iobuf_arena = NULL;
/* No locking required here as its called only once during init */
iobuf_arena = GF_CALLOC(sizeof(*iobuf_arena), 1, gf_common_mt_iobuf_arena);
if (!iobuf_arena)
goto err;
INIT_LIST_HEAD(&iobuf_arena->list);
INIT_LIST_HEAD(&iobuf_arena->active.list);
INIT_LIST_HEAD(&iobuf_arena->passive.list);
iobuf_arena->iobuf_pool = iobuf_pool;
iobuf_arena->page_size = 0x7fffffff;
list_add_tail(&iobuf_arena->list,
&iobuf_pool->arenas[IOBUF_ARENA_MAX_INDEX]);
err:
return;
}
struct iobuf_pool *
iobuf_pool_new(void)
{
struct iobuf_pool *iobuf_pool = NULL;
int i = 0;
size_t page_size = 0;
size_t arena_size = 0;
int32_t num_pages = 0;
int index;
iobuf_pool = GF_CALLOC(sizeof(*iobuf_pool), 1, gf_common_mt_iobuf_pool);
if (!iobuf_pool)
goto out;
INIT_LIST_HEAD(&iobuf_pool->all_arenas);
pthread_mutex_init(&iobuf_pool->mutex, NULL);
for (i = 0; i <= IOBUF_ARENA_MAX_INDEX; i++) {
INIT_LIST_HEAD(&iobuf_pool->arenas[i]);
INIT_LIST_HEAD(&iobuf_pool->filled[i]);
INIT_LIST_HEAD(&iobuf_pool->purge[i]);
}
GF_ATOMIC_INIT(iobuf_pool->mem_pool_hit, 0);
GF_ATOMIC_INIT(iobuf_pool->mem_pool_miss, 0);
GF_ATOMIC_INIT(iobuf_pool->active_cnt, 0);
iobuf_pool->default_page_size = 128 * GF_UNIT_KB;
iobuf_pool->rdma_registration = NULL;
iobuf_pool->rdma_deregistration = NULL;
for (i = 0; i < GF_RDMA_DEVICE_COUNT; i++) {
iobuf_pool->device[i] = NULL;
iobuf_pool->mr_list[i] = NULL;
}
pthread_mutex_lock(&iobuf_pool->mutex);
{
for (i = 0; i < IOBUF_ARENA_MAX_INDEX; i++) {
page_size = gf_iobuf_init_config[i].pagesize;
num_pages = gf_iobuf_init_config[i].num_pages;
index = gf_iobuf_get_arena_index(page_size);
if (index == -1) {
pthread_mutex_unlock(&iobuf_pool->mutex);
gf_msg("iobuf", GF_LOG_ERROR, 0, LG_MSG_PAGE_SIZE_EXCEEDED,
"page_size (%zu) of iobufs in arena being added is "
"greater than max available",
page_size);
return NULL;
}
__iobuf_pool_add_arena(iobuf_pool, page_size, num_pages, index);
arena_size += page_size * num_pages;
}
}
pthread_mutex_unlock(&iobuf_pool->mutex);
/* Need an arena to handle all the bigger iobuf requests */
iobuf_create_stdalloc_arena(iobuf_pool);
iobuf_pool->arena_size = arena_size;
out:
return iobuf_pool;
}
static void
__iobuf_arena_prune(struct iobuf_pool *iobuf_pool,
struct iobuf_arena *iobuf_arena, const int index)
{
/* code flow comes here only if the arena is in purge list and we can
* free the arena only if we have at least one arena in 'arenas' list
* (ie, at least few iobufs free in arena), that way, there won't
* be spurious mmap/unmap of buffers
*/
if (list_empty(&iobuf_pool->arenas[index]))
goto out;
/* All cases matched, destroy */
list_del_init(&iobuf_arena->list);
list_del_init(&iobuf_arena->all_list);
iobuf_pool->arena_cnt--;
__iobuf_arena_destroy(iobuf_pool, iobuf_arena);
out:
return;
}
void
iobuf_pool_prune(struct iobuf_pool *iobuf_pool)
{
struct iobuf_arena *iobuf_arena = NULL;
struct iobuf_arena *tmp = NULL;
int i = 0;
GF_VALIDATE_OR_GOTO("iobuf", iobuf_pool, out);
pthread_mutex_lock(&iobuf_pool->mutex);
{
for (i = 0; i < IOBUF_ARENA_MAX_INDEX; i++) {
if (list_empty(&iobuf_pool->arenas[i])) {
continue;
}
list_for_each_entry_safe(iobuf_arena, tmp, &iobuf_pool->purge[i],
list)
{
__iobuf_arena_prune(iobuf_pool, iobuf_arena, i);
}
}
}
pthread_mutex_unlock(&iobuf_pool->mutex);
out:
return;
}
/* Always called under the iobuf_pool mutex lock */
static struct iobuf_arena *
__iobuf_select_arena(struct iobuf_pool *iobuf_pool, const size_t page_size,
const int index)
{
struct iobuf_arena *iobuf_arena = NULL;
struct iobuf_arena *trav = NULL;
/* look for unused iobuf from the head-most arena */
list_for_each_entry(trav, &iobuf_pool->arenas[index], list)
{
if (trav->passive_cnt) {
iobuf_arena = trav;
break;
}
}
if (!iobuf_arena) {
/* all arenas were full, find the right count to add */
iobuf_arena = __iobuf_pool_add_arena(
iobuf_pool, page_size, gf_iobuf_init_config[index].num_pages,
index);
}
return iobuf_arena;
}
/* Always called under the iobuf_pool mutex lock */
static struct iobuf *
__iobuf_get(struct iobuf_pool *iobuf_pool, const size_t page_size,
const int index)
{
struct iobuf *iobuf = NULL;
struct iobuf_arena *iobuf_arena = NULL;
/* most eligible arena for picking an iobuf */
iobuf_arena = __iobuf_select_arena(iobuf_pool, page_size, index);
if (!iobuf_arena)
return NULL;
list_for_each_entry(iobuf, &iobuf_arena->passive.list, list) break;
list_del(&iobuf->list);
iobuf_arena->passive_cnt--;
list_add(&iobuf->list, &iobuf_arena->active.list);
iobuf_arena->active_cnt++;
/* no resetting required for this element */
iobuf_arena->alloc_cnt++;
if (iobuf_arena->max_active < iobuf_arena->active_cnt)
iobuf_arena->max_active = iobuf_arena->active_cnt;
if (iobuf_arena->passive_cnt == 0) {
list_del(&iobuf_arena->list);
list_add(&iobuf_arena->list, &iobuf_pool->filled[index]);
}
return iobuf;
}
struct iobuf *
iobuf_get_from_stdalloc(struct iobuf_pool *iobuf_pool, size_t page_size)
{
struct iobuf *iobuf = NULL;
struct iobuf_arena *iobuf_arena = NULL;
struct iobuf_arena *trav = NULL;
int ret = -1;
/* The first arena in the 'MAX-INDEX' will always be used for misc */
list_for_each_entry(trav, &iobuf_pool->arenas[IOBUF_ARENA_MAX_INDEX], list)
{
iobuf_arena = trav;
break;
}
iobuf = GF_CALLOC(1, sizeof(*iobuf), gf_common_mt_iobuf);
if (!iobuf)
goto out;
/* GF_IOBUF_ALIGN_SIZE (512) is the alignment */
iobuf->free_ptr = GF_CALLOC(1, ((page_size + GF_IOBUF_ALIGN_SIZE) - 1),
gf_common_mt_char);
if (!iobuf->free_ptr)
goto out;
iobuf->ptr = GF_ALIGN_BUF(iobuf->free_ptr, GF_IOBUF_ALIGN_SIZE);
iobuf->iobuf_arena = iobuf_arena;
LOCK_INIT(&iobuf->lock);
/* Hold a ref because you are allocating and using it */
GF_ATOMIC_INIT(iobuf->ref, 1);
ret = 0;
out:
if (ret && iobuf) {
GF_FREE(iobuf->free_ptr);
GF_FREE(iobuf);
iobuf = NULL;
}
return iobuf;
}
struct iobuf *
iobuf_get2(struct iobuf_pool *iobuf_pool, size_t page_size)
{
struct iobuf *iobuf = NULL;
size_t rounded_size = 0;
int index = 0;
if (page_size == 0) {
page_size = iobuf_pool->default_page_size;
}
rounded_size = gf_iobuf_get_pagesize(page_size);
if (rounded_size == -1) {
/* make sure to provide the requested buffer with standard
memory allocations */
iobuf = iobuf_get_from_stdalloc(iobuf_pool, page_size);
gf_msg_debug("iobuf", 0,
"request for iobuf of size %zu "
"is serviced using standard calloc() (%p) as it "
"exceeds the maximum available buffer size",
page_size, iobuf);
iobuf_pool->request_misses++;
return iobuf;
}
index = gf_iobuf_get_arena_index(page_size);
if (index == -1) {
gf_msg("iobuf", GF_LOG_ERROR, 0, LG_MSG_PAGE_SIZE_EXCEEDED,
"page_size (%zu) of iobufs in arena being added is "
"greater than max available",
page_size);
return NULL;
}
pthread_mutex_lock(&iobuf_pool->mutex);
{
iobuf = __iobuf_get(iobuf_pool, rounded_size, index);
if (!iobuf) {
gf_msg(THIS->name, GF_LOG_WARNING, 0, LG_MSG_IOBUF_NOT_FOUND,
"iobuf not found");
goto unlock;
}
iobuf_ref(iobuf);
}
unlock:
pthread_mutex_unlock(&iobuf_pool->mutex);
return iobuf;
}
@ -99,13 +607,23 @@ iobuf_get_page_aligned(struct iobuf_pool *iobuf_pool, size_t page_size,
req_size = page_size;
if (req_size == 0) {
req_size = iobuf_pool->default_page_size;
}
iobuf = iobuf_get2(iobuf_pool, req_size + align_size);
if (!iobuf)
return NULL;
/* If std allocation was used, then free_ptr will be non-NULL. In this
* case, we do not want to modify the original free_ptr.
* On the other hand, if the buf was gotten through the available
* arenas, then we use iobuf->free_ptr to store the original
* pointer to the offset into the mmap'd block of memory and in turn
* reuse iobuf->ptr to hold the page-aligned address. And finally, in
* iobuf_put(), we copy iobuf->free_ptr into iobuf->ptr - back to where
* it was originally when __iobuf_get() returned this iobuf.
*/
if (!iobuf->free_ptr)
iobuf->free_ptr = iobuf->ptr;
iobuf->ptr = GF_ALIGN_BUF(iobuf->ptr, align_size);
return iobuf;
@ -114,22 +632,118 @@ iobuf_get_page_aligned(struct iobuf_pool *iobuf_pool, size_t page_size,
struct iobuf *
iobuf_get(struct iobuf_pool *iobuf_pool)
{
struct iobuf *iobuf = NULL;
int index = 0;
GF_VALIDATE_OR_GOTO("iobuf", iobuf_pool, out);
index = gf_iobuf_get_arena_index(iobuf_pool->default_page_size);
if (index == -1) {
gf_msg("iobuf", GF_LOG_ERROR, 0, LG_MSG_PAGE_SIZE_EXCEEDED,
"page_size (%zu) of iobufs in arena being added is "
"greater than max available",
iobuf_pool->default_page_size);
return NULL;
}
pthread_mutex_lock(&iobuf_pool->mutex);
{
iobuf = __iobuf_get(iobuf_pool, iobuf_pool->default_page_size, index);
if (!iobuf) {
gf_msg(THIS->name, GF_LOG_WARNING, 0, LG_MSG_IOBUF_NOT_FOUND,
"iobuf not found");
goto unlock;
}
iobuf_ref(iobuf);
}
unlock:
pthread_mutex_unlock(&iobuf_pool->mutex);
out:
return iobuf;
}
void
__iobuf_put(struct iobuf *iobuf, struct iobuf_arena *iobuf_arena)
{
struct iobuf_pool *iobuf_pool = NULL;
int index = 0;
GF_VALIDATE_OR_GOTO("iobuf", iobuf_arena, out);
GF_VALIDATE_OR_GOTO("iobuf", iobuf, out);
iobuf_pool = iobuf_arena->iobuf_pool;
index = gf_iobuf_get_arena_index(iobuf_arena->page_size);
if (index == -1) {
gf_msg_debug("iobuf", 0,
"freeing the iobuf (%p) "
"allocated with standard calloc()",
iobuf);
/* free up properly without bothering about lists and all */
LOCK_DESTROY(&iobuf->lock);
GF_FREE(iobuf->free_ptr);
GF_FREE(iobuf);
return;
}
if (iobuf_arena->passive_cnt == 0) {
list_del(&iobuf_arena->list);
list_add_tail(&iobuf_arena->list, &iobuf_pool->arenas[index]);
}
list_del_init(&iobuf->list);
iobuf_arena->active_cnt--;
if (iobuf->free_ptr) {
iobuf->ptr = iobuf->free_ptr;
iobuf->free_ptr = NULL;
}
list_add(&iobuf->list, &iobuf_arena->passive.list);
iobuf_arena->passive_cnt++;
if (iobuf_arena->active_cnt == 0) {
list_del(&iobuf_arena->list);
list_add_tail(&iobuf_arena->list, &iobuf_pool->purge[index]);
GF_VALIDATE_OR_GOTO("iobuf", iobuf_pool, out);
__iobuf_arena_prune(iobuf_pool, iobuf_arena, index);
}
out:
return;
}
void
iobuf_put(struct iobuf *iobuf)
{
struct iobuf_arena *iobuf_arena = NULL;
struct iobuf_pool *iobuf_pool = NULL;
GF_VALIDATE_OR_GOTO("iobuf", iobuf, out);
iobuf_arena = iobuf->iobuf_arena;
if (!iobuf_arena) {
gf_msg(THIS->name, GF_LOG_WARNING, 0, LG_MSG_ARENA_NOT_FOUND,
"arena not found");
return;
}
iobuf_pool = iobuf_arena->iobuf_pool;
if (!iobuf_pool) {
gf_msg(THIS->name, GF_LOG_WARNING, 0, LG_MSG_POOL_NOT_FOUND,
"iobuf pool not found");
return;
}
pthread_mutex_lock(&iobuf_pool->mutex);
{
__iobuf_put(iobuf, iobuf_arena);
}
pthread_mutex_unlock(&iobuf_pool->mutex);
out:
return;
}
@@ -353,10 +967,25 @@ out:
size_t
iobuf_size(struct iobuf *iobuf)
{
size_t size = 0;
GF_VALIDATE_OR_GOTO("iobuf", iobuf, out);
if (!iobuf->iobuf_arena) {
gf_msg(THIS->name, GF_LOG_WARNING, 0, LG_MSG_ARENA_NOT_FOUND,
"arena not found");
goto out;
}
if (!iobuf->iobuf_arena->iobuf_pool) {
gf_msg(THIS->name, GF_LOG_WARNING, 0, LG_MSG_POOL_NOT_FOUND,
"pool not found");
goto out;
}
size = iobuf->iobuf_arena->page_size;
out:
return size;
}
size_t
@@ -380,21 +1009,114 @@ out:
return size;
}
void
iobuf_info_dump(struct iobuf *iobuf, const char *key_prefix)
{
char key[GF_DUMP_MAX_BUF_LEN];
struct iobuf my_iobuf;
int ret = 0;
GF_VALIDATE_OR_GOTO("iobuf", iobuf, out);
ret = TRY_LOCK(&iobuf->lock);
if (ret) {
return;
}
memcpy(&my_iobuf, iobuf, sizeof(my_iobuf));
UNLOCK(&iobuf->lock);
gf_proc_dump_build_key(key, key_prefix, "ref");
gf_proc_dump_write(key, "%" GF_PRI_ATOMIC, GF_ATOMIC_GET(my_iobuf.ref));
gf_proc_dump_build_key(key, key_prefix, "ptr");
gf_proc_dump_write(key, "%p", my_iobuf.ptr);
out:
return;
}
void
iobuf_arena_info_dump(struct iobuf_arena *iobuf_arena, const char *key_prefix)
{
char key[GF_DUMP_MAX_BUF_LEN];
int i = 1;
struct iobuf *trav;
GF_VALIDATE_OR_GOTO("iobuf", iobuf_arena, out);
gf_proc_dump_build_key(key, key_prefix, "mem_base");
gf_proc_dump_write(key, "%p", iobuf_arena->mem_base);
gf_proc_dump_build_key(key, key_prefix, "active_cnt");
gf_proc_dump_write(key, "%d", iobuf_arena->active_cnt);
gf_proc_dump_build_key(key, key_prefix, "passive_cnt");
gf_proc_dump_write(key, "%d", iobuf_arena->passive_cnt);
gf_proc_dump_build_key(key, key_prefix, "alloc_cnt");
gf_proc_dump_write(key, "%" PRIu64, iobuf_arena->alloc_cnt);
gf_proc_dump_build_key(key, key_prefix, "max_active");
gf_proc_dump_write(key, "%d", iobuf_arena->max_active);
gf_proc_dump_build_key(key, key_prefix, "page_size");
gf_proc_dump_write(key, "%" GF_PRI_SIZET, iobuf_arena->page_size);
list_for_each_entry(trav, &iobuf_arena->active.list, list)
{
gf_proc_dump_build_key(key, key_prefix, "active_iobuf.%d", i++);
gf_proc_dump_add_section("%s", key);
iobuf_info_dump(trav, key);
}
out:
return;
}
void
iobuf_stats_dump(struct iobuf_pool *iobuf_pool)
{
char msg[1024];
struct iobuf_arena *trav = NULL;
int i = 1;
int j = 0;
int ret = -1;
GF_VALIDATE_OR_GOTO("iobuf", iobuf_pool, out);
ret = pthread_mutex_trylock(&iobuf_pool->mutex);
if (ret) {
return;
}
gf_proc_dump_add_section("iobuf.global");
gf_proc_dump_write("iobuf_pool", "%p", iobuf_pool);
gf_proc_dump_write("iobuf_pool.default_page_size", "%" GF_PRI_SIZET,
iobuf_pool->default_page_size);
gf_proc_dump_write("iobuf_pool.arena_size", "%" GF_PRI_SIZET,
iobuf_pool->arena_size);
gf_proc_dump_write("iobuf_pool.arena_cnt", "%d", iobuf_pool->arena_cnt);
gf_proc_dump_write("iobuf_pool.request_misses", "%" PRId64,
iobuf_pool->request_misses);
for (j = 0; j < IOBUF_ARENA_MAX_INDEX; j++) {
list_for_each_entry(trav, &iobuf_pool->arenas[j], list)
{
snprintf(msg, sizeof(msg), "arena.%d", i);
gf_proc_dump_add_section("%s", msg);
iobuf_arena_info_dump(trav, msg);
i++;
}
list_for_each_entry(trav, &iobuf_pool->purge[j], list)
{
snprintf(msg, sizeof(msg), "purge.%d", i);
gf_proc_dump_add_section("%s", msg);
iobuf_arena_info_dump(trav, msg);
i++;
}
list_for_each_entry(trav, &iobuf_pool->filled[j], list)
{
snprintf(msg, sizeof(msg), "filled.%d", i);
gf_proc_dump_add_section("%s", msg);
iobuf_arena_info_dump(trav, msg);
i++;
}
}
pthread_mutex_unlock(&iobuf_pool->mutex);
out:
return;

View File

@@ -920,7 +920,7 @@ mem_pool_get(unsigned long sizeof_type, gf_boolean_t *hit)
sizeof_type |= (1 << POOL_SMALLEST) - 1;
power = sizeof(sizeof_type) * 8 - __builtin_clzl(sizeof_type - 1) + 1;
if (power > POOL_LARGEST) {
gf_msg_callingfn("mem-pool", GF_LOG_ERROR, EINVAL, LG_MSG_INVALID_ARG,
"invalid argument");
return NULL;
}

View File

@@ -344,6 +344,207 @@ gf_rdma_post_recv(struct ibv_srq *srq, gf_rdma_post_t *post)
return ibv_post_srq_recv(srq, &wr, &bad_wr);
}
static void
gf_rdma_deregister_iobuf_pool(gf_rdma_device_t *device)
{
gf_rdma_arena_mr *arena_mr = NULL;
gf_rdma_arena_mr *tmp = NULL;
while (device) {
pthread_mutex_lock(&device->all_mr_lock);
{
if (!list_empty(&device->all_mr)) {
list_for_each_entry_safe(arena_mr, tmp, &device->all_mr, list)
{
if (ibv_dereg_mr(arena_mr->mr)) {
gf_msg("rdma", GF_LOG_WARNING, 0,
RDMA_MSG_DEREGISTER_ARENA_FAILED,
"deallocation of memory region "
"failed");
pthread_mutex_unlock(&device->all_mr_lock);
return;
}
list_del(&arena_mr->list);
GF_FREE(arena_mr);
}
}
}
pthread_mutex_unlock(&device->all_mr_lock);
device = device->next;
}
}
int
gf_rdma_deregister_arena(struct list_head **mr_list,
struct iobuf_arena *iobuf_arena)
{
gf_rdma_arena_mr *tmp = NULL;
gf_rdma_arena_mr *dummy = NULL;
gf_rdma_device_t *device = NULL;
int count = 0, i = 0;
count = iobuf_arena->iobuf_pool->rdma_device_count;
for (i = 0; i < count; i++) {
device = iobuf_arena->iobuf_pool->device[i];
pthread_mutex_lock(&device->all_mr_lock);
{
list_for_each_entry_safe(tmp, dummy, mr_list[i], list)
{
if (tmp->iobuf_arena == iobuf_arena) {
if (ibv_dereg_mr(tmp->mr)) {
gf_msg("rdma", GF_LOG_WARNING, 0,
RDMA_MSG_DEREGISTER_ARENA_FAILED,
"deallocation of memory region "
"failed");
pthread_mutex_unlock(&device->all_mr_lock);
return -1;
}
list_del(&tmp->list);
GF_FREE(tmp);
break;
}
}
}
pthread_mutex_unlock(&device->all_mr_lock);
}
return 0;
}
int
gf_rdma_register_arena(void **arg1, void *arg2)
{
struct ibv_mr *mr = NULL;
gf_rdma_arena_mr *new = NULL;
struct iobuf_pool *iobuf_pool = NULL;
gf_rdma_device_t **device = (gf_rdma_device_t **)arg1;
struct iobuf_arena *iobuf_arena = arg2;
int count = 0, i = 0;
iobuf_pool = iobuf_arena->iobuf_pool;
count = iobuf_pool->rdma_device_count;
for (i = 0; i < count; i++) {
new = GF_CALLOC(1, sizeof(gf_rdma_arena_mr),
gf_common_mt_rdma_arena_mr);
if (new == NULL) {
gf_msg("rdma", GF_LOG_INFO, ENOMEM, RDMA_MSG_MR_ALOC_FAILED,
"Out of "
"memory: registering pre allocated buffer "
"with rdma device failed.");
return -1;
}
INIT_LIST_HEAD(&new->list);
new->iobuf_arena = iobuf_arena;
mr = ibv_reg_mr(device[i]->pd, iobuf_arena->mem_base,
iobuf_arena->arena_size,
IBV_ACCESS_REMOTE_READ | IBV_ACCESS_LOCAL_WRITE |
IBV_ACCESS_REMOTE_WRITE);
if (!mr)
gf_msg("rdma", GF_LOG_WARNING, 0, RDMA_MSG_MR_ALOC_FAILED,
"allocation of mr "
"failed");
new->mr = mr;
pthread_mutex_lock(&device[i]->all_mr_lock);
{
list_add(&new->list, &device[i]->all_mr);
}
pthread_mutex_unlock(&device[i]->all_mr_lock);
new = NULL;
}
return 0;
}
static void
gf_rdma_register_iobuf_pool(gf_rdma_device_t *device,
struct iobuf_pool *iobuf_pool)
{
struct iobuf_arena *tmp = NULL;
struct iobuf_arena *dummy = NULL;
struct ibv_mr *mr = NULL;
gf_rdma_arena_mr *new = NULL;
if (!list_empty(&iobuf_pool->all_arenas)) {
list_for_each_entry_safe(tmp, dummy, &iobuf_pool->all_arenas, all_list)
{
new = GF_CALLOC(1, sizeof(gf_rdma_arena_mr),
gf_common_mt_rdma_arena_mr);
if (new == NULL) {
gf_msg("rdma", GF_LOG_INFO, ENOMEM, RDMA_MSG_MR_ALOC_FAILED,
"Out of "
"memory: registering pre allocated "
"buffer with rdma device failed.");
return;
}
INIT_LIST_HEAD(&new->list);
new->iobuf_arena = tmp;
mr = ibv_reg_mr(device->pd, tmp->mem_base, tmp->arena_size,
IBV_ACCESS_REMOTE_READ | IBV_ACCESS_LOCAL_WRITE |
IBV_ACCESS_REMOTE_WRITE);
if (!mr) {
gf_msg("rdma", GF_LOG_WARNING, 0, RDMA_MSG_MR_ALOC_FAILED,
"failed"
" to pre register buffers with rdma "
"devices.");
}
new->mr = mr;
pthread_mutex_lock(&device->all_mr_lock);
{
list_add(&new->list, &device->all_mr);
}
pthread_mutex_unlock(&device->all_mr_lock);
new = NULL;
}
}
return;
}
static void
gf_rdma_register_iobuf_pool_with_device(gf_rdma_device_t *device,
struct iobuf_pool *iobuf_pool)
{
while (device) {
gf_rdma_register_iobuf_pool(device, iobuf_pool);
device = device->next;
}
}
static struct ibv_mr *
gf_rdma_get_pre_registred_mr(rpc_transport_t *this, void *ptr, int size)
{
gf_rdma_arena_mr *tmp = NULL;
gf_rdma_arena_mr *dummy = NULL;
gf_rdma_private_t *priv = NULL;
gf_rdma_device_t *device = NULL;
priv = this->private;
device = priv->device;
pthread_mutex_lock(&device->all_mr_lock);
{
if (!list_empty(&device->all_mr)) {
list_for_each_entry_safe(tmp, dummy, &device->all_mr, list)
{
if (tmp->iobuf_arena->mem_base <= ptr &&
ptr < tmp->iobuf_arena->mem_base +
tmp->iobuf_arena->arena_size) {
pthread_mutex_unlock(&device->all_mr_lock);
return tmp->mr;
}
}
}
}
pthread_mutex_unlock(&device->all_mr_lock);
return NULL;
}
static int32_t
gf_rdma_create_posts(rpc_transport_t *this)
{
@@ -492,11 +693,13 @@ gf_rdma_get_device(rpc_transport_t *this, struct ibv_context *ibctx,
int32_t i = 0;
gf_rdma_device_t *trav = NULL, *device = NULL;
gf_rdma_ctx_t *rdma_ctx = NULL;
struct iobuf_pool *iobuf_pool = NULL;
priv = this->private;
options = &priv->options;
ctx = this->ctx;
rdma_ctx = ctx->ib;
iobuf_pool = ctx->iobuf_pool;
trav = rdma_ctx->device;
@@ -517,6 +720,8 @@ gf_rdma_get_device(rpc_transport_t *this, struct ibv_context *ibctx,
trav->next = rdma_ctx->device;
rdma_ctx->device = trav;
iobuf_pool->device[iobuf_pool->rdma_device_count] = trav;
iobuf_pool->mr_list[iobuf_pool->rdma_device_count++] = &trav->all_mr;
trav->request_ctx_pool = mem_pool_new(gf_rdma_request_context_t,
GF_RDMA_POOL_SIZE);
if (trav->request_ctx_pool == NULL) {
@@ -594,6 +799,7 @@ gf_rdma_get_device(rpc_transport_t *this, struct ibv_context *ibctx,
INIT_LIST_HEAD(&trav->all_mr);
pthread_mutex_init(&trav->all_mr_lock, NULL);
gf_rdma_register_iobuf_pool(trav, iobuf_pool);
if (gf_rdma_create_posts(this) < 0) {
gf_msg(this->name, GF_LOG_ERROR, 0, RDMA_MSG_ALOC_POST_FAILED,
@@ -1229,8 +1435,12 @@ __gf_rdma_create_read_chunks_from_vector(gf_rdma_peer_t *peer,
readch->rc_discrim = hton32(1);
readch->rc_position = hton32(*pos);
mr = gf_rdma_get_pre_registred_mr(
peer->trans, (void *)vector[i].iov_base, vector[i].iov_len);
if (!mr) {
mr = ibv_reg_mr(device->pd, vector[i].iov_base, vector[i].iov_len,
IBV_ACCESS_REMOTE_READ);
}
if (!mr) {
gf_msg(GF_RDMA_LOG_NAME, GF_LOG_WARNING, errno,
RDMA_MSG_MR_ALOC_FAILED,
@@ -1351,8 +1561,13 @@ __gf_rdma_create_write_chunks_from_vector(
device = priv->device;
for (i = 0; i < count; i++) {
mr = gf_rdma_get_pre_registred_mr(
peer->trans, (void *)vector[i].iov_base, vector[i].iov_len);
if (!mr) {
mr = ibv_reg_mr(device->pd, vector[i].iov_base, vector[i].iov_len,
IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_LOCAL_WRITE);
}
if (!mr) {
gf_msg(GF_RDMA_LOG_NAME, GF_LOG_WARNING, errno,
RDMA_MSG_MR_ALOC_FAILED,
@@ -2033,6 +2248,9 @@ __gf_rdma_register_local_mr_for_rdma(gf_rdma_peer_t *peer, struct iovec *vector,
* Infiniband Architecture Specification Volume 1
* (Release 1.2.1)
*/
ctx->mr[ctx->mr_count] = gf_rdma_get_pre_registred_mr(
peer->trans, (void *)vector[i].iov_base, vector[i].iov_len);
if (!ctx->mr[ctx->mr_count]) {
ctx->mr[ctx->mr_count] = ibv_reg_mr(device->pd, vector[i].iov_base,
vector[i].iov_len,
@@ -4551,6 +4769,7 @@ init(rpc_transport_t *this)
{
gf_rdma_private_t *priv = NULL;
gf_rdma_ctx_t *rdma_ctx = NULL;
struct iobuf_pool *iobuf_pool = NULL;
priv = GF_CALLOC(1, sizeof(*priv), gf_common_mt_rdma_private_t);
if (!priv)
@@ -4569,6 +4788,18 @@ init(rpc_transport_t *this)
if (!rdma_ctx)
return -1;
pthread_mutex_lock(&rdma_ctx->lock);
{
if (this->dl_handle && (++(rdma_ctx->dlcount)) == 1) {
iobuf_pool = this->ctx->iobuf_pool;
iobuf_pool->rdma_registration = gf_rdma_register_arena;
iobuf_pool->rdma_deregistration = gf_rdma_deregister_arena;
gf_rdma_register_iobuf_pool_with_device(rdma_ctx->device,
iobuf_pool);
}
}
pthread_mutex_unlock(&rdma_ctx->lock);
return 0;
}
@@ -4600,6 +4831,7 @@ fini(struct rpc_transport *this)
{
/* TODO: verify this function does graceful finish */
gf_rdma_private_t *priv = NULL;
struct iobuf_pool *iobuf_pool = NULL;
gf_rdma_ctx_t *rdma_ctx = NULL;
priv = this->private;
@@ -4618,6 +4850,17 @@ fini(struct rpc_transport *this)
if (!rdma_ctx)
return;
pthread_mutex_lock(&rdma_ctx->lock);
{
if (this->dl_handle && (--(rdma_ctx->dlcount)) == 0) {
iobuf_pool = this->ctx->iobuf_pool;
gf_rdma_deregister_iobuf_pool(rdma_ctx->device);
iobuf_pool->rdma_registration = NULL;
iobuf_pool->rdma_deregistration = NULL;
}
}
pthread_mutex_unlock(&rdma_ctx->lock);
return;
}

View File

@@ -325,6 +325,7 @@ typedef struct __gf_rdma_device gf_rdma_device_t;
struct __gf_rdma_arena_mr {
struct list_head list;
struct iobuf_arena *iobuf_arena;
struct ibv_mr *mr;
};

View File

@@ -5820,7 +5820,8 @@ fuse_thread_proc(void *data)
THIS = this;
iov_in[0].iov_len = sizeof(*finh) + sizeof(struct fuse_write_in);
iov_in[1].iov_len = ((struct iobuf_pool *)this->ctx->iobuf_pool)
->default_page_size;
priv->msg0_len_p = &iov_in[0].iov_len;
for (;;) {