bd: posix/multi-brick support to BD xlator

The current BD xlator (block backend) has a few limitations, such as:
* Creation of directories not supported
* Supports only a single brick
* Does not use extended attributes (and client gfid) like the posix xlator
* Creation of special files (symbolic links, device nodes etc) not
  supported

The basic limitation of not allowing directory creation blocks
oVirt/VDSM from consuming the BD xlator as part of a Gluster domain,
since VDSM creates multi-level directories when GlusterFS is used as the
storage backend for VM images.

To overcome these limitations, a new BD xlator with the following
improvements is suggested.

* New hybrid BD xlator that handles both regular files and block device
  files
* The volume will have both POSIX and BD bricks. Regular files are
  created on POSIX bricks, block devices are created on the BD brick (VG)
* BD xlator leverages the existing POSIX xlator for most POSIX calls and
  hence sits above the POSIX xlator
* Block device file is differentiated from regular file by an extended
  attribute
* The xattr 'user.glusterfs.bd' (BD_XATTR) plays a role in mapping a
  posix file to Logical Volume (LV).
* When a client sends a request to set BD_XATTR on a posix file, a new
  LV is created and mapped to the posix file. So every block device will
  have a representative file in the POSIX brick with 'user.glusterfs.bd'
  (BD_XATTR) set.
* Hereafter, all operations on this file result in LV-related
  operations.

For example, opening a file that has BD_XATTR set results in opening
the LV block device, and reading from it results in reading the
corresponding LV block device.
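In the patch, the LV is named after the file's GFID, and the xlator opens it as `/dev/<vg>/<gfid>` (see `__bd_fd_ctx_get` and `bd_validate_bd_xattr` further below). The following is a minimal sketch of that path mapping. It assumes the GFID is already in canonical string form. The helper name `bd_lv_devpath` and the check that rejects `/` in names are illustrative additions, not taken from the xlator.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "/dev/<vg>/<gfid>" for the LV backing a BD file.
 * Returns 0 on success, -1 if a name is empty or contains '/', or if 'out'
 * is too small to hold the whole path.
 */
int bd_lv_devpath(const char *vg, const char *gfid_str, char *out, size_t out_size)
{
    /* Reject names that are empty or would escape the /dev/<vg>/ directory. */
    if (vg[0] == '\0' || gfid_str[0] == '\0' ||
        strchr(vg, '/') != NULL || strchr(gfid_str, '/') != NULL)
        return -1;

    int n = snprintf(out, out_size, "/dev/%s/%s", vg, gfid_str);
    if (n < 0 || (size_t)n >= out_size)
        return -1;  /* encoding error or truncated path */
    return 0;
}
```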

When the BD xlator gets a request to set BD_XATTR via a setxattr call, it
creates an LV, and information about this LV is placed in the xattr of
the posix file. The xattr "user.glusterfs.bd" is used to identify that
the posix file is mapped to a BD.
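The stored value has the form `<type>` or `<type>:<size>`, where the type is `lv` or `thin`. In `bd_validate_bd_xattr` below, the xlator splits the value at the last `:`. The following is a minimal sketch of that parsing. As a simplification, it treats `<size>` as a plain decimal byte count, whereas the xlator uses `gf_string2bytesize()`, which also accepts unit suffixes. The enum and the function name are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes for the two supported BD types. */
enum bd_type { BD_TYPE_INVALID = -1, BD_TYPE_LV = 0, BD_TYPE_THIN = 1 };

/*
 * Parse "<type>" or "<type>:<size>". '*size' is set to 0 when no size is
 * present. Simplification: <size> must be a plain decimal byte count.
 */
enum bd_type bd_parse_xattr(const char *value, uint64_t *size)
{
    char type[16];
    const char *colon = strrchr(value, ':');   /* split at the last ':' */
    size_t tlen = colon ? (size_t)(colon - value) : strlen(value);

    *size = 0;
    if (tlen == 0 || tlen >= sizeof(type))
        return BD_TYPE_INVALID;
    memcpy(type, value, tlen);
    type[tlen] = '\0';

    if (colon) {
        char *end = NULL;
        /* Require a leading digit so strtoull cannot accept "-5" or "". */
        if (colon[1] < '0' || colon[1] > '9')
            return BD_TYPE_INVALID;
        errno = 0;
        unsigned long long v = strtoull(colon + 1, &end, 10);
        if (errno != 0 || *end != '\0')
            return BD_TYPE_INVALID;
        *size = (uint64_t)v;
    }

    if (strcmp(type, "lv") == 0)
        return BD_TYPE_LV;
    if (strcmp(type, "thin") == 0)
        return BD_TYPE_THIN;
    return BD_TYPE_INVALID;
}
```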

Usage:
Server side:
[root@host1 ~]# gluster volume create bdvol host1:/storage/vg1_info?vg1 host2:/storage/vg2_info?vg2
It creates a distributed gluster volume 'bdvol' with Volume Group vg1
using posix brick /storage/vg1_info in host1 and Volume Group vg2 using
/storage/vg2_info in host2.
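The `?vg` suffix is split off the brick path in `glusterd_brickinfo_new_from_brick` (below). That function looks for the first `?`, copies what follows into the brick's `vg` field and terminates the path there. The following is a minimal, self-contained sketch of the same in-place split. The helper name `split_brick_vg` is hypothetical.

```c
#include <string.h>

/*
 * Hypothetical helper: split "host1:/storage/vg1_info?vg1" in place.
 * The '?' is overwritten with '\0', so 'spec' keeps the "host:/path" part
 * and '*vg' points at the VG name. '*vg' is NULL for a plain POSIX brick.
 */
void split_brick_vg(char *spec, char **vg)
{
    char *q = strchr(spec, '?');   /* '?' is the VG delimiter */

    if (q == NULL) {
        *vg = NULL;
        return;
    }
    *q = '\0';
    *vg = q + 1;
}
```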

[root@host1 ~]# gluster volume start bdvol

Client side:
[root@node ~]# mount -t glusterfs host1:/bdvol /media
[root@node ~]# touch /media/posix
This creates a regular posix file 'posix' on either the host1 (vg1) or the host2 (vg2) brick.
[root@node ~]# mkdir /media/image
[root@node ~]# touch /media/image/lv1
This also creates a regular posix file 'lv1', on either the host1 (vg1)
or the host2 (vg2) brick.
[root@node ~]# setfattr -n "user.glusterfs.bd" -v "lv" /media/image/lv1
[root@node ~]#
The setxattr above creates a new LV in the corresponding brick's VG
and sets 'user.glusterfs.bd' to the value 'lv:<default-extent-size>'.
[root@node ~]# truncate -s5G /media/image/lv1
This resizes LV 'lv1' to 5G.
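The same two steps can be done from a program on the client mount with `setxattr(2)` and `truncate(2)`, which are the system calls behind `setfattr` and `truncate`. The following is a minimal sketch, assuming the file already exists on a mounted BD volume. Per the text above, setting the xattr asks the BD xlator to create the LV, and truncating the file resizes it. The helper name `bd_make_lv_file` is hypothetical.

```c
#define _XOPEN_SOURCE 700   /* for truncate() under strict C modes */
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

/*
 * Hypothetical helper: mark an existing file as LV-backed and size it.
 *   type: "lv" for a regular LV, "thin" for a thin LV
 *   size: new size in bytes
 * Returns 0 on success, -1 with errno set on failure.
 */
int bd_make_lv_file(const char *path, const char *type, off_t size)
{
    /* Setting BD_XATTR is what triggers LV creation in the BD xlator. */
    if (setxattr(path, "user.glusterfs.bd", type, strlen(type), 0) != 0)
        return -1;
    /* On a BD file, truncate is translated into an LV resize. */
    if (truncate(path, size) != 0)
        return -1;
    return 0;
}
```

For the transcript above, the equivalent call would be `bd_make_lv_file("/media/image/lv1", "lv", (off_t)5 << 30)`. The 5G size matches `truncate -s5G`, which uses 1024-based units.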

New BD xlator code is placed in xlators/storage/bd directory.

Also add the volume UUID as a tag on the VG, so that the same VG can't
be used for other bricks/volumes. After deleting a gluster volume, one
has to manually remove the associated tag using vgchange <vg-name>
--deltag trusted.glusterfs.volume-id:<volume-id>
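The tag string is `trusted.glusterfs.volume-id:<uuid>`. In the diff below, `glusterd_bd_set_vg_tag` formats it with `"%s:%s"`. `glusterd_is_valid_vg` and `bd_scan_vg` recognise it by comparing the key prefix and then read the UUID after the `:`. The following is a minimal sketch of building and recognising such a tag, independent of liblvm2app. Both function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Tag prefix; the same key name used for the brick's volume-id xattr. */
#define VOL_ID_KEY "trusted.glusterfs.volume-id"

/* Hypothetical: format "<key>:<uuid>". Returns 0, or -1 if 'out' is too small. */
int vg_tag_format(const char *volume_uuid, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s:%s", VOL_ID_KEY, volume_uuid);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/*
 * Hypothetical: if 'tag' is a volume-id tag, return its UUID part,
 * otherwise NULL. A VG carrying such a tag is already claimed by a volume.
 */
const char *vg_tag_volume_uuid(const char *tag)
{
    size_t klen = strlen(VOL_ID_KEY);

    if (strncmp(tag, VOL_ID_KEY, klen) != 0 || tag[klen] != ':')
        return NULL;
    return tag + klen + 1;   /* skip "<key>:" */
}
```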

Changes from previous version V5:
* Removed support for delayed deleting of LVs

Changes from previous version V4:
* Consolidated the patches
* Removed usage of BD_XATTR_SIZE and consolidated it in BD_XATTR.

Changes from previous version V3:
* Added support in FUSE to support full/linked clone
* Added support to merge snapshots and provide information about origin
* bd_map xlator removed
* iatt structure used in inode_ctx. iatt is cached and updated during
  fsync/flush
* aio support
* Type and capabilities of volume are exported through getxattr

Changes from version 2:
* Used inode_context for caching BD size and to check if loc/fd is BD or
  not.
* Added GlusterFS server offloaded copy and snapshot through setfattr
  FOP. As part of this libgfapi is modified.
* BD xlator supports stripe
* During unlinking, if an LV file is already open, it's added to a delete
  list, and bd_del_thread tries to delete it from this list when the last
  reference to that file is closed.

Changes from previous version:
* gfid is used as name of LV
* '?' is used to specify the VG name for creating a BD volume in volume
  create and add-brick: gluster volume create volname host:/path?vg
* open-behind issue is fixed
* A replicate brick can be added dynamically and LVs from source brick
  are replicated to destination brick
* A distribute brick can be added dynamically and rebalance operation
  distributes existing LVs/files to the new brick
* Thin provisioning support added.
* bd_map xlator support retained
* setfattr -n user.glusterfs.bd -v "lv" creates a regular LV and
  setfattr -n user.glusterfs.bd -v "thin" creates thin LV
* Capability and backend information added to gluster volume info (and
  --xml) so that management tools can exploit BD xlator.
* tracing support for bd xlator added
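The volume-level capability reported by volume info is computed in glusterd (create, start and add-brick in the diff below). The computation starts from `CAPS_BD | CAPS_THIN` and intersects it with each BD brick's capabilities. Any non-BD brick clears it, so thin support is advertised only if every brick's VG has a thin pool. The following is a minimal sketch of that rule using the `CAPS_*` values from `glusterd.h`. The struct and function names are hypothetical.

```c
/* Capability bits as defined in glusterd.h in this patch. */
#define CAPS_BD   0x00000001
#define CAPS_THIN 0x00000010

/* Hypothetical per-brick summary. */
struct brick_caps {
    int is_bd;  /* brick was given as host:/path?vg */
    int caps;   /* what VG validation found, e.g. CAPS_BD | CAPS_THIN */
};

/* Hypothetical: intersect brick capabilities into a volume capability. */
int volume_caps(const struct brick_caps *bricks, int n)
{
    int caps = CAPS_BD | CAPS_THIN;

    if (n <= 0)
        return 0;
    for (int i = 0; i < n; i++) {
        if (bricks[i].is_bd)
            caps &= bricks[i].caps;  /* one brick without thin disables it */
        else
            caps = 0;                /* a plain POSIX brick: no BD caps */
    }
    return caps;
}
```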

TODO:
* Add support to display snapshots for a given LV
* Display posix filename for list-origin instead of gfid

Change-Id: I00d32dfbab3b7c806e0841515c86c3aa519332f2
BUG: 1028672
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-on: http://review.gluster.org/4809
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
M. Mohan Kumar 2013-11-13 22:44:42 +05:30 committed by Anand Avati
parent 15a8ecd9b3
commit 48c40e1a42
21 changed files with 3289 additions and 3 deletions

@ -1829,7 +1829,12 @@ struct cli_cmd volume_cmds[] = {
"list information of all volumes"},
{ "volume create <NEW-VOLNAME> [stripe <COUNT>] [replica <COUNT>] "
"[transport <tcp|rdma|tcp,rdma>] <NEW-BRICK> ... [force]",
"[transport <tcp|rdma|tcp,rdma>] <NEW-BRICK>"
#ifdef HAVE_BD_XLATOR
"?<vg_name>"
#endif
"... [force]",
cli_cmd_volume_create_cbk,
"create a new volume of specified type with mentioned bricks"},

@ -496,6 +496,8 @@ gf_cli_get_volume_cbk (struct rpc_req *req, struct iovec *iov,
char key[1024] = {0};
char err_str[2048] = {0};
gf_cli_rsp rsp = {0};
char *caps = NULL;
int k __attribute__((unused)) = 0;
if (-1 == req->rpc_status)
goto out;
@ -658,6 +660,40 @@ xml_output:
cli_out ("Volume ID: %s", volume_id_str);
cli_out ("Status: %s", cli_vol_status_str[status]);
#ifdef HAVE_BD_XLATOR
k = 0;
memset (key, 0, sizeof (key));
snprintf (key, sizeof (key), "volume%d.xlator%d", i, k);
ret = dict_get_str (dict, key, &caps);
if (ret)
goto next;
do {
j = 0;
cli_out ("Xlator %d: %s", k + 1, caps);
do {
memset (key, 0, sizeof (key));
snprintf (key, sizeof (key),
"volume%d.xlator%d.caps%d",
i, k, j++);
ret = dict_get_str (dict, key, &caps);
if (ret)
break;
cli_out ("Capability %d: %s", j, caps);
} while (1);
memset (key, 0, sizeof (key));
snprintf (key, sizeof (key),
"volume%d.xlator%d", i, ++k);
ret = dict_get_str (dict, key, &caps);
if (ret)
break;
} while (1);
next:
#else
caps = 0; /* Avoid compiler warnings when BD not enabled */
#endif
if (type == GF_CLUSTER_TYPE_STRIPE_REPLICATE) {
cli_out ("Number of Bricks: %d x %d x %d = %d",
(brick_count / dist_count),
@ -693,6 +729,12 @@ xml_output:
goto out;
cli_out ("Brick%d: %s", j, brick);
#ifdef HAVE_BD_XLATOR
snprintf (key, 256, "volume%d.vg%d", i, j);
ret = dict_get_str (dict, key, &caps);
if (!ret)
cli_out ("Brick%d VG: %s", j, caps);
#endif
j++;
}

@ -2497,7 +2497,8 @@ cli_xml_output_vol_info (cli_local_t *local, dict_t *dict)
char key[1024] = {0,};
int i = 0;
int j = 1;
char *caps = NULL;
int k __attribute__((unused)) = 0;
ret = dict_get_int32 (dict, "count", &count);
if (ret)
@ -2613,6 +2614,62 @@ cli_xml_output_vol_info (cli_local_t *local, dict_t *dict)
"%d", transport);
XML_RET_CHECK_AND_GOTO (ret, out);
#ifdef HAVE_BD_XLATOR
/* <xlators> */
ret = xmlTextWriterStartElement (local->writer,
(xmlChar *)"xlators");
XML_RET_CHECK_AND_GOTO (ret, out);
for (k = 0; ; k++) {
memset (key, 0, sizeof (key));
snprintf (key, sizeof (key),"volume%d.xlator%d", i, k);
ret = dict_get_str (dict, key, &caps);
if (ret)
break;
/* <xlator> */
ret = xmlTextWriterStartElement (local->writer,
(xmlChar *)"xlator");
XML_RET_CHECK_AND_GOTO (ret, out);
ret = xmlTextWriterWriteFormatElement
(local->writer, (xmlChar *)"name", "%s", caps);
XML_RET_CHECK_AND_GOTO (ret, out);
/* <capabilities> */
ret = xmlTextWriterStartElement (local->writer,
(xmlChar *)
"capabilities");
XML_RET_CHECK_AND_GOTO (ret, out);
j = 0;
for (j = 0; ;j++) {
memset (key, 0, sizeof (key));
snprintf (key, sizeof (key),
"volume%d.xlator%d.caps%d", i, k, j);
ret = dict_get_str (dict, key, &caps);
if (ret)
break;
ret = xmlTextWriterWriteFormatElement
(local->writer, (xmlChar *)"capability",
"%s", caps);
XML_RET_CHECK_AND_GOTO (ret, out);
}
/* </capabilities> */
ret = xmlTextWriterEndElement (local->writer);
XML_RET_CHECK_AND_GOTO (ret, out);
/* </xlator> */
ret = xmlTextWriterEndElement (local->writer);
XML_RET_CHECK_AND_GOTO (ret, out);
}
ret = xmlTextWriterFullEndElement (local->writer);
XML_RET_CHECK_AND_GOTO (ret, out);
/* </xlators> */
#else
caps = 0; /* Avoid compiler warnings when BD not enabled */
#endif
j = 1;
/* <bricks> */
ret = xmlTextWriterStartElement (local->writer,
(xmlChar *)"bricks");

@ -53,6 +53,8 @@ AC_CONFIG_FILES([Makefile
xlators/storage/Makefile
xlators/storage/posix/Makefile
xlators/storage/posix/src/Makefile
xlators/storage/bd/Makefile
xlators/storage/bd/src/Makefile
xlators/cluster/Makefile
xlators/cluster/afr/Makefile
xlators/cluster/afr/src/Makefile
@ -301,6 +303,43 @@ if test "x$enable_fuse_client" != "xno"; then
BUILD_FUSE_CLIENT="yes"
fi
AC_ARG_ENABLE([bd-xlator],
AC_HELP_STRING([--enable-bd-xlator], [Build BD xlator]))
if test "x$enable_bd_xlator" != "xno"; then
AC_CHECK_LIB([lvm2app],
[lvm_init,lvm_lv_from_name],
[HAVE_BD_LIB="yes"],
[HAVE_BD_LIB="no"])
if test "x$HAVE_BD_LIB" = "xyes"; then
# lvm_lv_from_name() has been made public with lvm2-2.02.79
AC_CHECK_DECLS(
[lvm_lv_from_name],
[NEED_LVM_LV_FROM_NAME_DECL="no"],
[NEED_LVM_LV_FROM_NAME_DECL="yes"],
[[#include <lvm2app.h>]])
fi
fi
if test "x$enable_bd_xlator" = "xyes" -a "x$HAVE_BD_LIB" = "xno"; then
echo "BD xlator requested but required lvm2 development library not found."
exit 1
fi
BUILD_BD_XLATOR=no
if test "x$enable_bd_xlator" != "xno" -a "x${HAVE_BD_LIB}" = "xyes"; then
BUILD_BD_XLATOR=yes
AC_DEFINE(HAVE_BD_XLATOR, 1, [define if lvm2app library found and bd xlator
enabled])
if test "x$NEED_LVM_LV_FROM_NAME_DECL" = "xyes"; then
AC_DEFINE(NEED_LVM_LV_FROM_NAME_DECL, 1, [defined if lvm_lv_from_name()
was not found in the lvm2app.h header, but can be linked])
fi
fi
AM_CONDITIONAL([ENABLE_BD_XLATOR], [test x$BUILD_BD_XLATOR = xyes])
AC_SUBST(FUSE_CLIENT_SUBDIR)
# end FUSE section
@ -821,6 +860,7 @@ echo "georeplication : $BUILD_SYNCDAEMON"
echo "Linux-AIO : $BUILD_LIBAIO"
echo "Enable Debug : $BUILD_DEBUG"
echo "systemtap : $BUILD_SYSTEMTAP"
echo "Block Device xlator : $BUILD_BD_XLATOR"
echo "glupy : $BUILD_GLUPY"
echo "Use syslog : $USE_SYSLOG"
echo "XML output : $BUILD_XML_OUTPUT"

@ -2,6 +2,9 @@ xlator_LTLIBRARIES = glusterd.la
xlatordir = $(libdir)/glusterfs/$(PACKAGE_VERSION)/xlator/mgmt
glusterd_la_CPPFLAGS = $(AM_CPPFLAGS) "-DFILTERDIR=\"$(libdir)/glusterfs/$(PACKAGE_VERSION)/filter\""
glusterd_la_LDFLAGS = -module -avoid-version
if ENABLE_BD_XLATOR
glusterd_la_LDFLAGS += -llvm2app
endif
glusterd_la_SOURCES = glusterd.c glusterd-handler.c glusterd-sm.c \
glusterd-op-sm.c glusterd-utils.c glusterd-rpc-ops.c \
glusterd-store.c glusterd-handshake.c glusterd-pmap.c \

@ -1025,6 +1025,8 @@ glusterd_op_perform_add_bricks (glusterd_volinfo_t *volinfo, int32_t count,
glusterd_brickinfo_t *brickinfo = NULL;
glusterd_gsync_status_temp_t param = {0, };
gf_boolean_t restart_needed = 0;
char msg[1024] __attribute__((unused)) = {0, };
int caps = 0;
GF_ASSERT (volinfo);
@ -1105,12 +1107,30 @@ glusterd_op_perform_add_bricks (glusterd_volinfo_t *volinfo, int32_t count,
if (count)
brick = strtok_r (brick_list+1, " \n", &saveptr);
#ifdef HAVE_BD_XLATOR
if (brickinfo->vg[0])
caps = CAPS_BD | CAPS_THIN;
#endif
while (i <= count) {
ret = glusterd_volume_brickinfo_get_by_brick (brick, volinfo,
&brickinfo);
if (ret)
goto out;
#ifdef HAVE_BD_XLATOR
/* Check for VG/thin pool if it's a BD volume */
if (brickinfo->vg[0]) {
ret = glusterd_is_valid_vg (brickinfo, 0, msg);
if (ret) {
gf_log (THIS->name, GF_LOG_CRITICAL, "%s", msg);
goto out;
}
/* if any one of the bricks does not have thin support,
disable it for the entire volume */
caps &= brickinfo->caps;
} else
caps = 0;
#endif
if (uuid_is_null (brickinfo->uuid)) {
ret = glusterd_resolve_brick (brickinfo);
@ -1147,7 +1167,7 @@ glusterd_op_perform_add_bricks (glusterd_volinfo_t *volinfo, int32_t count,
dict_foreach (volinfo->gsync_slaves,
_glusterd_restart_gsync_session, &param);
}
volinfo->caps = caps;
out:
GF_FREE (free_ptr1);
GF_FREE (free_ptr2);
@ -1321,6 +1341,18 @@ glusterd_op_stage_add_brick (dict_t *dict, char **op_errstr)
}
if (!uuid_compare (brickinfo->uuid, MY_UUID)) {
#ifdef HAVE_BD_XLATOR
if (brickinfo->vg[0]) {
ret = glusterd_is_valid_vg (brickinfo, 1, msg);
if (ret) {
gf_log (THIS->name, GF_LOG_ERROR, "%s",
msg);
*op_errstr = gf_strdup (msg);
goto out;
}
}
#endif
ret = glusterd_validate_and_create_brickpath (brickinfo,
volinfo->volume_id,
op_errstr, is_force);

@ -50,6 +50,10 @@
#include "globals.h"
#include "glusterd-syncop.h"
#ifdef HAVE_BD_XLATOR
#include <lvm2app.h>
#endif
int glusterd_big_locked_notify (struct rpc_clnt *rpc, void *mydata,
rpc_clnt_event_t event,
void *data, rpc_clnt_notify_t notify_fn)
@ -395,6 +399,39 @@ glusterd_add_volume_detail_to_dict (glusterd_volinfo_t *volinfo,
if (ret)
goto out;
#ifdef HAVE_BD_XLATOR
if (volinfo->caps) {
snprintf (key, 256, "volume%d.xlator0", count);
buf = GF_MALLOC (256, gf_common_mt_char);
if (!buf) {
ret = ENOMEM;
goto out;
}
if (volinfo->caps & CAPS_BD)
snprintf (buf, 256, "BD");
ret = dict_set_dynstr (volumes, key, buf);
if (ret) {
GF_FREE (buf);
goto out;
}
if (volinfo->caps & CAPS_THIN) {
snprintf (key, 256, "volume%d.xlator0.caps0", count);
buf = GF_MALLOC (256, gf_common_mt_char);
if (!buf) {
ret = ENOMEM;
goto out;
}
snprintf (buf, 256, "thin");
ret = dict_set_dynstr (volumes, key, buf);
if (ret) {
GF_FREE (buf);
goto out;
}
}
}
#endif
list_for_each_entry (brickinfo, &volinfo->bricks, brick_list) {
char brick[1024] = {0,};
char brick_uuid[64] = {0,};
@ -414,6 +451,16 @@ glusterd_add_volume_detail_to_dict (glusterd_volinfo_t *volinfo,
if (ret)
goto out;
#ifdef HAVE_BD_XLATOR
if (volinfo->caps & CAPS_BD) {
snprintf (key, 256, "volume%d.vg%d", count, i);
snprintf (brick, 1024, "%s", brickinfo->vg);
buf = gf_strdup (brick);
ret = dict_set_dynstr (volumes, key, buf);
if (ret)
goto out;
}
#endif
i++;
}

@ -286,4 +286,7 @@ glusterd_check_gsync_running (glusterd_volinfo_t *volinfo, gf_boolean_t *flag);
int
glusterd_defrag_volume_node_rsp (dict_t *req_dict, dict_t *rsp_dict,
dict_t *op_ctx);
int
glusterd_is_valid_vg (glusterd_brickinfo_t *brick, int check_tag, char *msg);
#endif

@ -241,6 +241,11 @@ glusterd_store_brickinfo_write (int fd, glusterd_brickinfo_t *brickinfo)
if (ret)
goto out;
if (!brickinfo->vg[0])
goto out;
ret = gf_store_save_value (fd, GLUSTERD_STORE_KEY_BRICK_VGNAME,
brickinfo->vg);
out:
gf_log (THIS->name, GF_LOG_DEBUG, "Returning %d", ret);
return ret;
@ -581,6 +586,13 @@ glusterd_volume_exclude_options_write (int fd, glusterd_volinfo_t *volinfo)
buf);
if (ret)
goto out;
if (volinfo->caps) {
snprintf (buf, sizeof (buf), "%d", volinfo->caps);
ret = gf_store_save_value (fd, GLUSTERD_STORE_KEY_VOL_CAPS,
buf);
if (ret)
goto out;
}
out:
if (ret)
@ -1538,6 +1550,11 @@ glusterd_store_retrieve_bricks (glusterd_volinfo_t *volinfo)
} else if (!strncmp (key, GLUSTERD_STORE_KEY_BRICK_DECOMMISSIONED,
strlen (GLUSTERD_STORE_KEY_BRICK_DECOMMISSIONED))) {
gf_string2int (value, &brickinfo->decommissioned);
} else if (!strncmp (key,
GLUSTERD_STORE_KEY_BRICK_VGNAME,
strlen (GLUSTERD_STORE_KEY_BRICK_VGNAME))) {
strncpy (brickinfo->vg, value,
sizeof (brickinfo->vg));
} else {
gf_log ("", GF_LOG_ERROR, "Unknown key: %s",
key);
@ -1856,6 +1873,9 @@ glusterd_store_retrieve_volume (char *volname)
} else if (!strncmp (key, GLUSTERD_STORE_KEY_VOL_CLIENT_OP_VERSION,
strlen (GLUSTERD_STORE_KEY_VOL_CLIENT_OP_VERSION))) {
volinfo->client_op_version = atoi (value);
} else if (!strncmp (key, GLUSTERD_STORE_KEY_VOL_CAPS,
strlen (GLUSTERD_STORE_KEY_VOL_CAPS))) {
volinfo->caps = atoi (value);
} else {
if (is_key_glusterd_hooks_friendly (key)) {

@ -64,11 +64,13 @@ typedef enum glusterd_store_ver_ac_{
#define GLUSTERD_STORE_KEY_BRICK_PORT "listen-port"
#define GLUSTERD_STORE_KEY_BRICK_RDMA_PORT "rdma.listen-port"
#define GLUSTERD_STORE_KEY_BRICK_DECOMMISSIONED "decommissioned"
#define GLUSTERD_STORE_KEY_BRICK_VGNAME "vg"
#define GLUSTERD_STORE_KEY_PEER_UUID "uuid"
#define GLUSTERD_STORE_KEY_PEER_HOSTNAME "hostname"
#define GLUSTERD_STORE_KEY_PEER_STATE "state"
#define GLUSTERD_STORE_KEY_VOL_CAPS "caps"
#define glusterd_for_each_entry(entry, dir) \
do {\

@ -49,6 +49,11 @@
#include <unistd.h>
#include <fnmatch.h>
#include <sys/statvfs.h>
#include <ifaddrs.h>
#ifdef HAVE_BD_XLATOR
#include <lvm2app.h>
#endif
#ifdef GF_LINUX_HOST_OS
#include <mntent.h>
@ -622,6 +627,7 @@ glusterd_brickinfo_new_from_brick (char *brick,
char *path = NULL;
char *tmp_host = NULL;
char *tmp_path = NULL;
char *vg = NULL;
GF_ASSERT (brick);
GF_ASSERT (brickinfo);
@ -640,6 +646,17 @@ glusterd_brickinfo_new_from_brick (char *brick,
if (ret)
goto out;
#ifdef HAVE_BD_XLATOR
vg = strchr (path, '?');
/* ? is used as a delimiter for vg */
if (vg) {
strncpy (new_brickinfo->vg, vg + 1, PATH_MAX - 1);
*vg = '\0';
}
new_brickinfo->caps = CAPS_BD;
#else
vg = NULL; /* Avoid compiler warnings when BD not enabled */
#endif
ret = gf_canonicalize_path (path);
if (ret)
goto out;
@ -743,6 +760,62 @@ out:
return available;
}
#ifdef HAVE_BD_XLATOR
/*
* Sets the tag of the format "trusted.glusterfs.volume-id:<uuid>" in
* the brick VG. It is used to avoid using same VG for another brick.
* @volume-id - gfid, @brick - brick info, @msg - Error message returned
* to the caller
*/
int
glusterd_bd_set_vg_tag (unsigned char *volume_id, glusterd_brickinfo_t *brick,
char *msg, int msg_size)
{
lvm_t handle = NULL;
vg_t vg = NULL;
char *uuid = NULL;
int ret = -1;
gf_asprintf (&uuid, "%s:%s", GF_XATTR_VOL_ID_KEY,
uuid_utoa (volume_id));
if (!uuid) {
snprintf (msg, msg_size, "Could not allocate memory "
"for tag");
return -1;
}
handle = lvm_init (NULL);
if (!handle) {
snprintf (msg, msg_size, "lvm_init failed");
goto out;
}
vg = lvm_vg_open (handle, brick->vg, "w", 0);
if (!vg) {
snprintf (msg, msg_size, "Could not open VG %s",
brick->vg);
goto out;
}
if (lvm_vg_add_tag (vg, uuid) < 0) {
snprintf (msg, msg_size, "Could not set tag %s for "
"VG %s", uuid, brick->vg);
goto out;
}
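if (lvm_vg_write (vg) < 0) {
snprintf (msg, msg_size, "Could not write tag %s to "
"VG %s", uuid, brick->vg);
goto out;
}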
ret = 0;
out:
GF_FREE (uuid);
if (vg)
lvm_vg_close (vg);
if (handle)
lvm_quit (handle);
return ret;
}
#endif
int
glusterd_validate_and_create_brickpath (glusterd_brickinfo_t *brickinfo,
uuid_t volume_id, char **op_errstr,
@ -825,6 +898,14 @@ glusterd_validate_and_create_brickpath (glusterd_brickinfo_t *brickinfo,
}
}
#ifdef HAVE_BD_XLATOR
if (brickinfo->vg[0]) {
ret = glusterd_bd_set_vg_tag (volume_id, brickinfo, msg,
sizeof(msg));
if (ret)
goto out;
}
#endif
ret = glusterd_check_and_set_brick_xattr (brickinfo->hostname,
brickinfo->path, volume_id,
op_errstr, is_force);

@ -594,6 +594,8 @@ get_server_xlator (char *xlator)
subvol = GF_XLATOR_MARKER;
if (strcmp (xlator, "io-stats") == 0)
subvol = GF_XLATOR_IO_STATS;
if (strcmp (xlator, "bd") == 0)
subvol = GF_XLATOR_BD;
return subvol;
}
@ -1456,7 +1458,26 @@ server_graph_builder (volgen_graph_t *graph, glusterd_volinfo_t *volinfo,
"posix");
if (ret)
return -1;
#ifdef HAVE_BD_XLATOR
if (*brickinfo->vg != '\0') {
/* Now add BD v2 xlator if volume is BD type */
xl = volgen_graph_add (graph, "storage/bd", volname);
if (!xl)
return -1;
ret = xlator_set_option (xl, "device", "vg");
if (ret)
return -1;
ret = xlator_set_option (xl, "export", brickinfo->vg);
if (ret)
return -1;
ret = check_and_add_debug_xl (graph, set_dict, volname, "bd");
if (ret)
return -1;
}
#endif
xl = volgen_graph_add (graph, "features/changelog", volname);
if (!xl)

@ -75,6 +75,7 @@ typedef enum {
GF_XLATOR_INDEX,
GF_XLATOR_MARKER,
GF_XLATOR_IO_STATS,
GF_XLATOR_BD,
GF_XLATOR_NONE,
} glusterd_server_xlator_t;

@ -12,6 +12,10 @@
#include "config.h"
#endif
#ifdef HAVE_BD_XLATOR
#include <lvm2app.h>
#endif
#include "common-utils.h"
#include "syscall.h"
#include "cli1-xdr.h"
@ -26,6 +30,7 @@
#define glusterd_op_start_volume_args_get(dict, volname, flags) \
glusterd_op_stop_volume_args_get (dict, volname, flags)
int
__glusterd_handle_create_volume (rpcsvc_request_t *req)
{
@ -599,6 +604,101 @@ glusterd_handle_cli_statedump_volume (rpcsvc_request_t *req)
__glusterd_handle_cli_statedump_volume);
}
#ifdef HAVE_BD_XLATOR
/*
* Validates if given VG in the brick exists or not. Also checks if VG has
* GF_XATTR_VOL_ID_KEY tag set to avoid using same VG for multiple bricks.
* Tag is checked only during glusterd_op_stage_create_volume. Tag is set during
* glusterd_validate_and_create_brickpath().
* @brick - brick info, @check_tag - check for VG tag or not
* @msg - Error message to return to caller
*/
int
glusterd_is_valid_vg (glusterd_brickinfo_t *brick, int check_tag, char *msg)
{
lvm_t handle = NULL;
vg_t vg = NULL;
char *vg_name = NULL;
int retval = 0;
char *p = NULL;
char *ptr = NULL;
struct dm_list *dm_lvlist = NULL;
struct dm_list *dm_seglist = NULL;
struct lvm_lv_list *lv_list = NULL;
struct lvm_property_value prop = {0, };
struct lvm_lvseg_list *seglist = NULL;
struct dm_list *taglist = NULL;
struct lvm_str_list *strl = NULL;
handle = lvm_init (NULL);
if (!handle) {
sprintf (msg, "lvm_init failed, could not validate vg");
return -1;
}
if (*brick->vg == '\0') { /* BD xlator has vg in brick->path */
p = gf_strdup (brick->path);
vg_name = strtok_r (p, "/", &ptr);
} else
vg_name = brick->vg;
vg = lvm_vg_open (handle, vg_name, "r", 0);
if (!vg) {
sprintf (msg, "no such vg: %s", vg_name);
retval = -1;
goto out;
}
if (!check_tag)
goto next;
taglist = lvm_vg_get_tags (vg);
if (!taglist)
goto next;
dm_list_iterate_items (strl, taglist) {
if (!strncmp(strl->str, GF_XATTR_VOL_ID_KEY,
strlen (GF_XATTR_VOL_ID_KEY))) {
sprintf (msg, "VG %s is already part of"
" a brick", vg_name);
retval = -1;
goto out;
}
}
next:
brick->caps = CAPS_BD;
dm_lvlist = lvm_vg_list_lvs (vg);
if (!dm_lvlist)
goto out;
dm_list_iterate_items (lv_list, dm_lvlist) {
dm_seglist = lvm_lv_list_lvsegs (lv_list->lv);
dm_list_iterate_items (seglist, dm_seglist) {
prop = lvm_lvseg_get_property (seglist->lvseg,
"segtype");
if (!prop.is_valid || !prop.value.string)
continue;
if (!strcmp (prop.value.string, "thin-pool")) {
brick->caps |= CAPS_THIN;
gf_log (THIS->name, GF_LOG_INFO, "Thin Pool "
"\"%s\" will be used for thin LVs",
lvm_lv_get_name (lv_list->lv));
break;
}
}
}
retval = 0;
out:
if (vg)
lvm_vg_close (vg);
lvm_quit (handle);
if (p)
GF_FREE (p);
return retval;
}
#endif
/* op-sm */
int
glusterd_op_stage_create_volume (dict_t *dict, char **op_errstr)
@ -712,6 +812,11 @@ glusterd_op_stage_create_volume (dict_t *dict, char **op_errstr)
}
if (!uuid_compare (brick_info->uuid, MY_UUID)) {
if (brick_info->vg[0]) {
ret = glusterd_is_valid_vg (brick_info, 1, msg);
if (ret)
goto out;
}
ret = glusterd_validate_and_create_brickpath (brick_info,
volume_uuid, op_errstr,
is_force);
@ -809,6 +914,7 @@ glusterd_op_stage_start_volume (dict_t *dict, char **op_errstr)
uuid_t volume_id = {0,};
char volid[50] = {0,};
char xattr_volid[50] = {0,};
int caps = 0;
this = THIS;
GF_ASSERT (this);
@ -847,6 +953,7 @@ glusterd_op_stage_start_volume (dict_t *dict, char **op_errstr)
}
}
list_for_each_entry (brickinfo, &volinfo->bricks, brick_list) {
ret = glusterd_resolve_brick (brickinfo);
if (ret) {
@ -902,8 +1009,24 @@ glusterd_op_stage_start_volume (dict_t *dict, char **op_errstr)
ret = -1;
goto out;
}
#ifdef HAVE_BD_XLATOR
if (brickinfo->vg[0])
caps = CAPS_BD | CAPS_THIN;
/* Check for VG/thin pool if it's a BD volume */
if (brickinfo->vg[0]) {
ret = glusterd_is_valid_vg (brickinfo, 0, msg);
if (ret)
goto out;
/* if any one of the bricks does not have thin support,
disable it for the entire volume */
caps &= brickinfo->caps;
} else
caps = 0;
#endif
}
volinfo->caps = caps;
ret = 0;
out:
if (ret && (msg[0] != '\0')) {
@ -1315,6 +1438,8 @@ glusterd_op_create_volume (dict_t *dict, char **op_errstr)
char *str = NULL;
char *username = NULL;
char *password = NULL;
int caps = 0;
char msg[1024] __attribute__((unused)) = {0, };
this = THIS;
GF_ASSERT (this);
@ -1477,6 +1602,7 @@ glusterd_op_create_volume (dict_t *dict, char **op_errstr)
if (count)
brick = strtok_r (brick_list+1, " \n", &saveptr);
caps = CAPS_BD | CAPS_THIN;
while ( i <= count) {
ret = glusterd_brickinfo_new_from_brick (brick, &brickinfo);
@ -1489,6 +1615,27 @@ glusterd_op_create_volume (dict_t *dict, char **op_errstr)
brickinfo->hostname, brickinfo->path);
goto out;
}
#ifdef HAVE_BD_XLATOR
if (!uuid_compare (brickinfo->uuid, MY_UUID)) {
if (brickinfo->vg[0]) {
ret = glusterd_is_valid_vg (brickinfo, 0, msg);
if (ret) {
gf_log (this->name, GF_LOG_ERROR, "%s",
msg);
goto out;
}
/* if any one of the bricks does not have thin
support, disable it for the entire volume */
caps &= brickinfo->caps;
} else
caps = 0;
}
#endif
list_add_tail (&brickinfo->brick_list, &volinfo->bricks);
brick = strtok_r (NULL, " \n", &saveptr);
i++;
@ -1496,6 +1643,8 @@ glusterd_op_create_volume (dict_t *dict, char **op_errstr)
gd_update_volume_op_versions (volinfo);
volinfo->caps = caps;
ret = glusterd_store_volinfo (volinfo, GLUSTERD_VOLINFO_VER_AC_INCREMENT);
if (ret) {
glusterd_store_delete_volume (volinfo);

View File

@ -176,6 +176,8 @@ struct glusterd_brickinfo {
gf_brick_status_t status;
struct rpc_clnt *rpc;
int decommissioned;
char vg[PATH_MAX]; /* FIXME: Use max size for length of vg */
int caps; /* Capability */
};
typedef struct glusterd_brickinfo glusterd_brickinfo_t;
@ -231,6 +233,10 @@ struct _auth {
typedef struct _auth auth_t;
/* Capabilities of xlator */
#define CAPS_BD 0x00000001
#define CAPS_THIN 0x00000010
struct glusterd_rebalance_ {
gf_defrag_status_t defrag_status;
uint64_t rebalance_files;
@ -300,6 +306,7 @@ struct glusterd_volinfo_ {
xlator_t *xl;
gf_boolean_t memory_accounting;
int caps; /* Capability */
int op_version;
int client_op_version;

@ -1,3 +1,7 @@
SUBDIRS = posix
if ENABLE_BD_XLATOR
SUBDIRS += bd
endif
CLEANFILES =

@ -0,0 +1,3 @@
SUBDIRS = src
CLEANFILES =

@ -0,0 +1,20 @@
if ENABLE_BD_XLATOR
xlator_LTLIBRARIES = bd.la
xlatordir = $(libdir)/glusterfs/$(PACKAGE_VERSION)/xlator/storage
bd_la_LDFLAGS = -module -avoid-version
LIBBD = -llvm2app -lrt
bd_la_SOURCES = bd.c bd-helper.c
bd_la_LIBADD = $(top_builddir)/libglusterfs/src/libglusterfs.la $(LIBBD)
noinst_HEADERS = bd.h
AM_CPPFLAGS = $(GF_CPPFLAGS) -I$(top_srcdir)/libglusterfs/src \
-I$(top_srcdir)/rpc/xdr/src \
-I$(top_srcdir)/rpc/rpc-lib/src
AM_CFLAGS = -fno-strict-aliasing -Wall $(GF_CFLAGS)
CLEANFILES =
endif

@ -0,0 +1,562 @@
#ifndef _CONFIG_H
#define _CONFIG_H
#include "config.h"
#endif
#include <lvm2app.h>
#include "bd.h"
#include "run.h"
int
bd_inode_ctx_set (inode_t *inode, xlator_t *this, bd_attr_t *ctx)
{
int ret = -1;
uint64_t ctx_int = 0;
GF_VALIDATE_OR_GOTO (this->name, inode, out);
GF_VALIDATE_OR_GOTO (this->name, ctx, out);
ctx_int = (long)ctx;
ret = inode_ctx_set (inode, this, &ctx_int);
out:
return ret;
}
int
bd_inode_ctx_get (inode_t *inode, xlator_t *this, bd_attr_t **ctx)
{
int ret = -1;
uint64_t ctx_int = 0;
GF_VALIDATE_OR_GOTO (this->name, inode, out);
ret = inode_ctx_get (inode, this, &ctx_int);
if (ret)
return ret;
if (ctx)
*ctx = (bd_attr_t *) ctx_int;
out:
return ret;
}
void
bd_local_free (xlator_t *this, bd_local_t *local)
{
if (!local)
return;
if (local->fd)
fd_unref (local->fd);
else if (local->loc.path)
loc_wipe (&local->loc);
if (local->dict)
dict_unref (local->dict);
if (local->inode)
inode_unref (local->inode);
if (local->bdatt) {
GF_FREE (local->bdatt->type);
GF_FREE (local->bdatt);
}
mem_put (local);
local = NULL;
}
bd_local_t *
bd_local_init (call_frame_t *frame, xlator_t *this)
{
frame->local = mem_get0 (this->local_pool);
if (!frame->local)
return NULL;
return frame->local;
}
/*
* VGs are set with the tag in GF_XATTR_VOL_ID_KEY:<uuid> format.
* This function validates this tag against volume-uuid. It also goes
* through LV list to find out if a thin-pool is configured or not.
*/
int bd_scan_vg (xlator_t *this, bd_priv_t *priv)
{
vg_t brick = NULL;
data_t *tmp_data = NULL;
struct dm_list *tags = NULL;
int op_ret = -1;
uuid_t dict_uuid = {0, };
uuid_t vg_uuid = {0, };
gf_boolean_t uuid = _gf_false;
lvm_str_list_t *strl = NULL;
struct dm_list *lv_dm_list = NULL;
lv_list_t *lv_list = NULL;
struct dm_list *dm_seglist = NULL;
lvseg_list_t *seglist = NULL;
lvm_property_value_t prop = {0, };
gf_boolean_t thin = _gf_false;
const char *lv_name = NULL;
brick = lvm_vg_open (priv->handle, priv->vg, "w", 0);
if (!brick) {
gf_log (this->name, GF_LOG_CRITICAL, "VG %s is not found",
priv->vg);
return ENOENT;
}
lv_dm_list = lvm_vg_list_lvs (brick);
if (!lv_dm_list)
goto check;
dm_list_iterate_items (lv_list, lv_dm_list) {
dm_seglist = lvm_lv_list_lvsegs (lv_list->lv);
if (!dm_seglist)
continue;
dm_list_iterate_items (seglist, dm_seglist) {
prop = lvm_lvseg_get_property (seglist->lvseg,
"segtype");
if (!prop.is_valid || !prop.value.string)
continue;
if (!strcmp (prop.value.string, "thin-pool")) {
thin = _gf_true;
lv_name = lvm_lv_get_name (lv_list->lv);
priv->pool = gf_strdup (lv_name);
gf_log (THIS->name, GF_LOG_INFO, "Thin Pool "
"\"%s\" will be used for thin LVs",
lv_name);
break;
}
}
}
check:
/* If there is no volume-id set in dict, we can't validate */
tmp_data = dict_get (this->options, "volume-id");
if (!tmp_data) {
op_ret = 0;
goto out;
}
op_ret = uuid_parse (tmp_data->data, dict_uuid);
if (op_ret < 0) {
gf_log (this->name, GF_LOG_ERROR,
"wrong volume-id (%s) set in volume file",
tmp_data->data);
op_ret = -1;
goto out;
}
tags = lvm_vg_get_tags (brick);
if (!tags) { /* no tags in the VG */
gf_log (this->name, GF_LOG_ERROR,
"Extended attribute trusted.glusterfs."
"volume-id is absent");
op_ret = -1;
goto out;
}
dm_list_iterate_items (strl, tags) {
if (!strncmp (strl->str, GF_XATTR_VOL_ID_KEY,
strlen (GF_XATTR_VOL_ID_KEY))) {
uuid = _gf_true;
break;
}
}
/* UUID tag is not set in VG */
if (!uuid) {
gf_log (this->name, GF_LOG_ERROR,
"Extended attribute trusted.glusterfs."
"volume-id is absent");
op_ret = -1;
goto out;
}
op_ret = uuid_parse (strl->str + strlen (GF_XATTR_VOL_ID_KEY) + 1,
vg_uuid);
if (op_ret < 0) {
gf_log (this->name, GF_LOG_ERROR,
"wrong volume-id (%s) set in VG", strl->str);
op_ret = -1;
goto out;
}
if (uuid_compare (dict_uuid, vg_uuid)) {
gf_log (this->name, GF_LOG_ERROR,
"mismatching volume-id (%s) received. "
"already is a part of volume %s ",
tmp_data->data, uuid_utoa (vg_uuid));
op_ret = -1;
goto out;
}
op_ret = 0;
out:
lvm_vg_close (brick);
if (!thin)
gf_log (THIS->name, GF_LOG_WARNING, "No thin pool found in "
"VG %s", priv->vg);
else
priv->caps |= BD_CAPS_THIN;
return op_ret;
}
/* FIXME: Move this code to common place, so posix and bd xlator can use */
char *
page_aligned_alloc (size_t size, char **aligned_buf)
{
char *alloc_buf = NULL;
char *buf = NULL;
alloc_buf = GF_CALLOC (1, (size + ALIGN_SIZE), gf_common_mt_char);
if (!alloc_buf)
return NULL;
/* page aligned buffer */
buf = GF_ALIGN_BUF (alloc_buf, ALIGN_SIZE);
*aligned_buf = buf;
return alloc_buf;
}
static int
__bd_fd_ctx_get (xlator_t *this, fd_t *fd, bd_fd_t **bdfd_p)
{
int ret = -1;
int _fd = -1;
char *devpath = NULL;
bd_fd_t *bdfd = NULL;
uint64_t tmp_bdfd = 0;
bd_priv_t *priv = this->private;
bd_gfid_t gfid = {0, };
bd_attr_t *bdatt = NULL;
/* not bd file */
if (fd->inode->ia_type != IA_IFREG ||
bd_inode_ctx_get (fd->inode, this, &bdatt))
return 0;
ret = __fd_ctx_get (fd, this, &tmp_bdfd);
if (ret == 0) {
bdfd = (void *)(long) tmp_bdfd;
*bdfd_p = bdfd;
return 0;
}
uuid_utoa_r (fd->inode->gfid, gfid);
if (asprintf (&devpath, "/dev/%s/%s", priv->vg, gfid) < 0) {
devpath = NULL;
ret = ENOMEM;
goto out;
}
_fd = open (devpath, O_RDWR | O_LARGEFILE, 0);
if (_fd < 0) {
ret = errno;
gf_log (this->name, GF_LOG_ERROR, "open on %s: %s", devpath,
strerror (ret));
goto out;
}
bdfd = GF_CALLOC (1, sizeof(bd_fd_t), gf_bd_fd);
BD_VALIDATE_MEM_ALLOC (bdfd, ret, out);
bdfd->fd = _fd;
bdfd->flag = O_RDWR | O_LARGEFILE;
if (__fd_ctx_set (fd, this, (uint64_t)(long)bdfd) < 0) {
gf_log (this->name, GF_LOG_WARNING,
"failed to set the fd context fd=%p", fd);
ret = -1;
goto out;
}
*bdfd_p = bdfd;
ret = 0;
out:
FREE (devpath);
if (ret) {
if (_fd >= 0)
close (_fd);
GF_FREE (bdfd);
}
return ret;
}
int
bd_fd_ctx_get (xlator_t *this, fd_t *fd, bd_fd_t **bdfd)
{
int ret;
/* FIXME: Is it ok to take fd->lock here? */
LOCK (&fd->lock);
{
ret = __bd_fd_ctx_get (this, fd, bdfd);
}
UNLOCK (&fd->lock);
return ret;
}
/*
 * Validates whether an LV exists for the given inode.
 * Returns 0 if the LV exists and its size matches.
 * Returns -1 if the LV does not exist.
 * Returns 1 if the LV size does not match; lv_size is then updated with
 * the actual size.
 */
int
bd_validate_bd_xattr (xlator_t *this, char *bd, char **type,
uint64_t *lv_size, uuid_t uuid)
{
char *path = NULL;
int ret = -1;
bd_gfid_t gfid = {0, };
bd_priv_t *priv = this->private;
struct stat stbuf = {0, };
uint64_t size = 0;
vg_t vg = NULL;
lv_t lv = NULL;
char *bytes = NULL;
bytes = strrchr (bd, ':');
if (bytes) {
*bytes = '\0';
bytes++;
gf_string2bytesize (bytes, &size);
}
if (strcmp (bd, BD_LV) && strcmp (bd, BD_THIN)) {
gf_log (this->name, GF_LOG_WARNING,
"invalid xattr %s", bd);
return -1;
}
*type = gf_strdup (bd);
/*
* Check if LV really exist, there could be a failure
* after setxattr and successful LV creation
*/
uuid_utoa_r (uuid, gfid);
gf_asprintf (&path, "/dev/%s/%s", priv->vg, gfid);
if (!path) {
gf_log (this->name, GF_LOG_WARNING,
"insufficient memory");
return -1;
}
/* Destination file does not exist */
if (stat (path, &stbuf)) {
gf_log (this->name, GF_LOG_WARNING,
"stat failed for path %s", path);
ret = -1;
goto out;
}
vg = lvm_vg_open (priv->handle, priv->vg, "r", 0);
if (!vg) {
gf_log (this->name, GF_LOG_WARNING,
"VG %s does not exist?", priv->vg);
ret = -1;
goto out;
}
lv = lvm_lv_from_name (vg, gfid);
if (!lv) {
gf_log (this->name, GF_LOG_WARNING,
"LV %s does not exist", gfid);
ret = -1;
goto out;
}
*lv_size = lvm_lv_get_size (lv);
if (size == *lv_size) {
ret = 0;
goto out;
}
ret = 1;
out:
if (vg)
lvm_vg_close (vg);
GF_FREE (path);
return ret;
}
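/*
 * Illustrative sketch (hypothetical helper): bd_validate_bd_xattr()
 * expects a BD_XATTR value of the form "<type>[:<size>]", where <type> is
 * BD_LV ("lv") or BD_THIN ("thin"). It splits at the last ':' and parses
 * the size with gf_string2bytesize(), which also accepts unit suffixes.
 * This sketch accepts only a plain byte count via strtoull(), and copies
 * the type out instead of modifying the caller's string in place.
 */
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static int
example_parse_bd_xattr (const char *value, char *type, size_t typelen,
                        uint64_t *size)
{
        const char *colon = strrchr (value, ':');
        size_t len = colon ? (size_t) (colon - value) : strlen (value);
        char *end = NULL;

        *size = 0;
        if (len + 1 > typelen)
                return -1;
        memcpy (type, value, len);
        type[len] = '\0';
        /* same type check as the xlator */
        if (strcmp (type, "lv") && strcmp (type, "thin"))
                return -1;
        if (colon) {
                errno = 0;
                *size = strtoull (colon + 1, &end, 10);
                if (errno || end == colon + 1 || *end != '\0')
                        return -1;
        }
        return 0;
}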
static int
create_thin_lv (char *vg, char *pool, char *lv, uint64_t extent)
{
int ret = -1;
runner_t runner = {0, };
char *path = NULL;
struct stat stbuf = {0, };
runinit (&runner);
runner_add_args (&runner, LVM_CREATE, NULL);
runner_add_args (&runner, "--thin", NULL);
runner_argprintf (&runner, "%s/%s", vg, pool);
runner_add_args (&runner, "--name", NULL);
runner_argprintf (&runner, "%s", lv);
runner_add_args (&runner, "--virtualsize", NULL);
runner_argprintf (&runner, "%"PRIu64"B", extent);
runner_start (&runner);
runner_end (&runner);
gf_asprintf (&path, "/dev/%s/%s", vg, lv);
if (!path) {
ret = ENOMEM;
goto out;
}
if (lstat (path, &stbuf) < 0)
ret = EAGAIN;
else
ret = 0;
out:
GF_FREE (path);
return ret;
}
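/*
 * Illustrative sketch (hypothetical helper): create_thin_lv() runs
 * LVM_CREATE with "--thin <vg>/<pool> --name <lv> --virtualsize <bytes>B"
 * and then checks that /dev/<vg>/<lv> appeared. The helper below only
 * renders that command line into a buffer (for example, for logging); it
 * does not execute anything.
 */
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

static int
example_thin_lv_cmdline (char *buf, size_t len, const char *vg,
                         const char *pool, const char *lv, uint64_t bytes)
{
        int n = snprintf (buf, len,
                          "/sbin/lvcreate --thin %s/%s --name %s "
                          "--virtualsize %" PRIu64 "B",
                          vg, pool, lv, bytes);

        /* treat an encoding error or truncation as failure */
        return (n < 0 || (size_t) n >= len) ? -1 : 0;
}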
int
bd_create (uuid_t uuid, uint64_t size, char *type, bd_priv_t *priv)
{
int ret = 0;
vg_t vg = NULL;
bd_gfid_t gfid = {0, };
uuid_utoa_r (uuid, gfid);
if (!strcmp (type, BD_THIN))
return create_thin_lv (priv->vg, priv->pool, gfid,
size);
vg = lvm_vg_open (priv->handle, priv->vg, "w", 0);
if (!vg) {
gf_log (THIS->name, GF_LOG_WARNING, "opening VG %s failed",
priv->vg);
return ENOENT;
}
if (!lvm_vg_create_lv_linear (vg, gfid, size)) {
gf_log (THIS->name, GF_LOG_WARNING, "lvm_vg_create_lv_linear "
"failed");
ret = errno ? errno : EIO;
}
lvm_vg_close (vg);
return ret;
}
int
bd_resize (bd_priv_t *priv, uuid_t uuid, off_t size)
{
uint64_t new_size = 0;
runner_t runner = {0, };
bd_gfid_t gfid = {0, };
int ret = 0;
vg_t vg = NULL;
lv_t lv = NULL;
uuid_utoa_r (uuid, gfid);
runinit (&runner);
runner_add_args (&runner, LVM_RESIZE, NULL);
runner_argprintf (&runner, "%s/%s", priv->vg, gfid);
runner_argprintf (&runner, "-L%"PRId64"b", (int64_t) size);
runner_add_args (&runner, "-f", NULL);
runner_start (&runner);
runner_end (&runner);
vg = lvm_vg_open (priv->handle, priv->vg, "w", 0);
if (!vg) {
gf_log (THIS->name, GF_LOG_WARNING, "opening VG %s failed",
priv->vg);
return EAGAIN;
}
lv = lvm_lv_from_name (vg, gfid);
if (!lv) {
gf_log (THIS->name, GF_LOG_WARNING, "LV %s not found", gfid);
ret = EIO;
goto out;
}
new_size = lvm_lv_get_size (lv);
if (new_size != (uint64_t) size) {
gf_log (THIS->name, GF_LOG_WARNING, "resized LV size %"PRIu64" does "
"not match requested size %"PRId64, new_size, (int64_t) size);
ret = EIO;
}
out:
lvm_vg_close (vg);
return ret;
}
uint64_t
bd_get_default_extent (bd_priv_t *priv)
{
vg_t vg = NULL;
uint64_t size = 0;
vg = lvm_vg_open (priv->handle, priv->vg, "r", 0);
if (!vg) {
gf_log (THIS->name, GF_LOG_WARNING, "opening VG %s failed",
priv->vg);
return 0;
}
size = lvm_vg_get_extent_size (vg);
lvm_vg_close (vg);
return size;
}
/*
 * Rounds the user-specified size up to a multiple of the VG extent size
 */
uint64_t
bd_adjust_size (bd_priv_t *priv, uint64_t size)
{
uint64_t extent = 0;
uint64_t nr_ex = 0;
extent = bd_get_default_extent (priv);
if (!extent)
return 0;
nr_ex = size / extent;
if (size % extent)
nr_ex++;
size = extent * nr_ex;
return size;
}
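/*
 * Illustrative sketch (hypothetical helper): bd_adjust_size() rounds the
 * requested size up to a whole number of VG extents, so with a 4 MiB
 * extent a 5 MiB request becomes 8 MiB. The same ceiling division is shown
 * here without the VG lookup; like bd_adjust_size(), it returns 0 when the
 * extent size is 0 (unknown).
 */
#include <stdint.h>

static uint64_t
example_round_to_extent (uint64_t size, uint64_t extent)
{
        uint64_t nr_ex;

        if (!extent)
                return 0;
        nr_ex = size / extent;
        if (size % extent)
                nr_ex++;  /* a partial extent still needs a whole extent */
        return extent * nr_ex;
}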
int
bd_delete_lv (bd_priv_t *priv, const char *lv_name, int *op_errno)
{
vg_t vg = NULL;
lv_t lv = NULL;
int ret = -1;
*op_errno = 0;
vg = lvm_vg_open (priv->handle, priv->vg, "w", 0);
if (!vg) {
gf_log (THIS->name, GF_LOG_WARNING, "opening VG %s failed",
priv->vg);
*op_errno = ENOENT;
return -1;
}
lv = lvm_lv_from_name (vg, lv_name);
if (!lv) {
gf_log (THIS->name, GF_LOG_WARNING, "No such LV %s", lv_name);
*op_errno = ENOENT;
goto out;
}
ret = lvm_vg_remove_lv (lv);
if (ret < 0) {
gf_log (THIS->name, GF_LOG_WARNING, "removing LV %s failed",
lv_name);
*op_errno = errno;
goto out;
}
out:
lvm_vg_close (vg);
return ret;
}

xlators/storage/bd/src/bd.c (new file, 2047 lines): diff suppressed because it is too large

xlators/storage/bd/src/bd.h (new file, 140 lines)
@@ -0,0 +1,140 @@
/*
BD translator - Exports block devices on the server side as regular
files to the client
Copyright IBM, Corp. 2012
This file is part of GlusterFS.
Author:
M. Mohan Kumar <mohan@in.ibm.com>
This file is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3 or
later), or the GNU General Public License, version 2 (GPLv2), in all
cases as published by the Free Software Foundation.
*/
#ifndef _BD_H
#define _BD_H
#ifndef _CONFIG_H
#define _CONFIG_H
#include "config.h"
#endif
#include "xlator.h"
#include "mem-types.h"
#include <lvm2app.h>
#define BD_XLATOR "block device mapper xlator"
#define BACKEND_VG "vg"
#define GF_XATTR "user.glusterfs"
#define BD_XATTR GF_XATTR ".bd"
#define BD_LV "lv"
#define BD_THIN "thin"
#define LVM_RESIZE "/sbin/lvresize"
#define LVM_CREATE "/sbin/lvcreate"
#define VOL_TYPE "volume.type"
#define VOL_CAPS "volume.caps"
#define ALIGN_SIZE 4096
#define BD_CAPS_BD 0x01
#define BD_CAPS_THIN 0x02
#define BD_VALIDATE_MEM_ALLOC(buff, op_errno, label) do { \
if (!buff) { \
op_errno = ENOMEM; \
gf_log (this->name, GF_LOG_ERROR, "out of memory"); \
goto label; \
} \
} while (0)
#define BD_VALIDATE_LOCAL_OR_GOTO(local, op_errno, label) do { \
if (!local) { \
op_errno = EINVAL; \
goto label; \
} \
} while (0)
#define BD_STACK_UNWIND(typ, frame, args ...) do { \
bd_local_t *__local = frame->local; \
xlator_t *__this = frame->this; \
\
frame->local = NULL; \
STACK_UNWIND_STRICT (typ, frame, args); \
if (__local) \
bd_local_free (__this, __local); \
} while (0)
typedef char bd_gfid_t[GF_UUID_BUF_SIZE];
enum gf_bd_mem_types_ {
gf_bd_private = gf_common_mt_end + 1,
gf_bd_attr,
gf_bd_fd,
gf_bd_mt_end
};
/**
* bd_fd - internal structure
*/
typedef struct bd_fd {
int fd;
int32_t flag;
} bd_fd_t;
typedef struct bd_priv {
lvm_t handle;
char *vg;
char *pool;
int caps;
} bd_priv_t;
typedef enum bd_type {
BD_TYPE_NONE,
BD_TYPE_LV,
} bd_type_t;
typedef struct {
struct iatt iatt;
char *type;
} bd_attr_t;
typedef struct {
dict_t *dict;
bd_attr_t *bdatt;
inode_t *inode;
loc_t loc;
fd_t *fd;
data_t *data; /* for setxattr */
} bd_local_t;
typedef struct {
char *lv;
struct list_head list;
} bd_del_entry;
/* Prototypes */
int bd_inode_ctx_set (inode_t *inode, xlator_t *this, bd_attr_t *ctx);
int bd_inode_ctx_get (inode_t *inode, xlator_t *this, bd_attr_t **ctx);
int bd_scan_vg (xlator_t *this, bd_priv_t *priv);
bd_local_t *bd_local_init (call_frame_t *frame, xlator_t *this);
void bd_local_free (xlator_t *this, bd_local_t *local);
int bd_fd_ctx_get (xlator_t *this, fd_t *fd, bd_fd_t **bdfd);
char *page_aligned_alloc (size_t size, char **aligned_buf);
int bd_validate_bd_xattr (xlator_t *this, char *bd, char **type,
uint64_t *lv_size, uuid_t uuid);
uint64_t bd_get_default_extent (bd_priv_t *priv);
uint64_t bd_adjust_size (bd_priv_t *priv, uint64_t size);
int bd_create (uuid_t uuid, uint64_t size, char *type, bd_priv_t *priv);
int bd_resize (bd_priv_t *priv, uuid_t uuid, off_t size);
int bd_delete_lv (bd_priv_t *priv, const char *lv_name, int *op_errno);
int bd_snapshot_create (bd_local_t *local, bd_priv_t *priv);
int bd_clone (bd_local_t *local, bd_priv_t *priv);
int bd_merge (bd_priv_t *priv, uuid_t gfid);
int bd_get_origin (bd_priv_t *priv, loc_t *loc, fd_t *fd, dict_t *dict);
#endif