doc: added documentation for dispersed volumes

Change-Id: I8a8368bdbe31af30a239aaf8cc478429e10c3f57
BUG: 1147563
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8885
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>


@@ -52,11 +52,24 @@ start it before attempting to mount it.
and performance is critical. In this release, configuration of
this volume type is supported only for Map Reduce workloads.
- **Dispersed** - Dispersed volumes are based on erasure codes,
providing space-efficient protection against disk or server failures.
They store an encoded fragment of the original file on each brick in
such a way that only a subset of the fragments is needed to recover the
original file. The number of bricks that can be missing without
losing access to data is configured by the administrator at volume
creation time.
- **Distributed Dispersed** - Distributed dispersed volumes distribute
files across dispersed subvolumes. This has the same advantages as
distributed replicated volumes, but uses disperse to store the data
on the bricks.
**To create a new volume**
- Create a new volume :
`# gluster volume create <NEW-VOLNAME> [stripe <COUNT> | replica <COUNT>] [transport tcp | rdma | tcp,rdma] <NEW-BRICK>...`
`# gluster volume create <NEW-VOLNAME> [stripe <COUNT> | replica <COUNT> | disperse] [transport tcp | rdma | tcp,rdma] <NEW-BRICK>...`
For example, to create a volume called test-volume consisting of
server3:/exp3 and server4:/exp4:
@@ -389,6 +402,161 @@ of this volume type is supported only for Map Reduce workloads.
> Use the `force` option at the end of command if you want to create the volume in this case.
##Creating Dispersed Volumes
Dispersed volumes are based on erasure codes. They stripe the encoded data of
files, with some redundancy added, across multiple bricks in the volume. You
can use dispersed volumes to get a configurable level of reliability with
minimum space waste.
**Redundancy**
Each dispersed volume has a redundancy value defined when the volume is
created. This value determines how many bricks can be lost without
interrupting the operation of the volume. It also determines the amount of
usable space of the volume using this formula:
<Usable size> = <Brick size> * (#Bricks - Redundancy)
All bricks of a disperse set should have the same capacity, otherwise, when
the smallest brick becomes full, no additional data will be allowed in the
disperse set.
It's important to note that a configuration with 3 bricks and redundancy 1
will have less usable space (66.7% of the total physical space) than a
configuration with 10 bricks and redundancy 1 (90%). However, the first one
will be safer than the second one (roughly, the probability of failure of
the second configuration is more than 4.5 times bigger than that of the
first one).
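A quick shell sketch deriving these percentages from the usable-size formula
above (the brick counts are the ones from this paragraph):
```
# Usable fraction of raw space = (#Bricks - Redundancy) / #Bricks
for bricks in 3 10; do
    awk -v b="$bricks" -v r=1 \
        'BEGIN { printf "%d bricks, redundancy %d: %.1f%% usable\n", b, r, (b - r) / b * 100 }'
done
# prints:
#   3 bricks, redundancy 1: 66.7% usable
#   10 bricks, redundancy 1: 90.0% usable
```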
For example, a dispersed volume composed of 6 bricks of 4TB and a redundancy
of 2 will remain fully operational even with two bricks inaccessible. However,
a third inaccessible brick will bring the volume down because it won't be
possible to read from or write to it. The usable space of the volume will be
equal to 16TB.
The implementation of erasure codes in GlusterFS limits the redundancy to a
value smaller than #Bricks / 2 (or equivalently, redundancy * 2 < #Bricks).
Having a redundancy equal to half of the number of bricks would be almost
equivalent to a replica-2 volume, and a replicated volume will probably
perform better in that case.
**Optimal volumes**
One of the worst performance penalties of erasure codes is the RMW
(Read-Modify-Write) cycle. Erasure codes operate on blocks of a certain
size and cannot work with smaller ones. This means that if a user issues
a write of a portion of a file that doesn't fill a full block, it needs to
read the remaining portion from the current contents of the file, merge them,
compute the updated encoded block and, finally, write the resulting data.
This adds latency, reducing performance whenever it happens. Some GlusterFS
performance xlators can help to reduce or even eliminate this problem for
some workloads, but it should be taken into account when using dispersed
volumes for a specific use case.
The current implementation of dispersed volumes uses blocks of a size that
depends on the number of bricks and redundancy: 512 * (#Bricks - Redundancy)
bytes. This value is also known as the stripe size.
Using combinations of #Bricks and redundancy that give a power of two for the
stripe size will make the dispersed volume perform better in most workloads,
because applications typically write information in blocks whose sizes are
powers of two (for example databases, virtual machines and many applications).
These combinations are considered *optimal*.
For example, a configuration with 6 bricks and redundancy 2 will have a stripe
size of 512 * (6 - 2) = 2048 bytes, so it's considered optimal. A configuration
with 7 bricks and redundancy 2 would have a stripe size of 2560 bytes, requiring
an RMW cycle for many writes (of course this always depends on the use case).
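A small shell sketch (not part of the official tooling, just an illustration)
that enumerates the combinations these rules consider *optimal*, applying the
constraints redundancy > 0, 2 * redundancy < #Bricks, and a power-of-two
stripe size:
```
#!/bin/bash
# List (bricks, redundancy) pairs whose stripe size 512*(bricks-redundancy)
# is a power of two; these are the *optimal* dispersed configurations.
for bricks in $(seq 3 12); do
    for redundancy in $(seq 1 4); do
        # Redundancy constraint: 2 * redundancy < #Bricks
        if (( 2 * redundancy < bricks )); then
            stripe=$(( 512 * (bricks - redundancy) ))
            # A power of two has a single set bit: stripe & (stripe - 1) == 0
            if (( (stripe & (stripe - 1)) == 0 )); then
                echo "bricks=$bricks redundancy=$redundancy stripe=${stripe} bytes (optimal)"
            fi
        fi
    done
done
```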
**To create a dispersed volume**
1. Create a trusted storage pool.
2. Create the dispersed volume:
`# gluster volume create <volname> [disperse [<count>]] [redundancy <count>] [transport tcp | rdma | tcp,rdma] <brick> ...`
A dispersed volume can be created by specifying the number of bricks in a
disperse set, by specifying the number of redundancy bricks, or both.
If *disperse* is not specified, or the _<count>_ is missing, the
entire volume will be treated as a single disperse set composed of all
bricks enumerated in the command line.
If *redundancy* is not specified, it is computed automatically to be the
optimal value. If no optimal value exists, it's assumed to be '1' and a
warning message is shown:
# gluster volume create test-volume disperse 4 server{1..4}:/bricks/test-volume
There isn't an optimal redundancy value for this configuration. Do you want to create the volume with redundancy 1 ? (y/n)
In all cases where *redundancy* is automatically computed and it's not
equal to '1', a warning message is displayed:
# gluster volume create test-volume disperse 6 server{1..6}:/bricks/test-volume
The optimal redundancy for this configuration is 2. Do you want to create the volume with this value ? (y/n)
_redundancy_ must be greater than 0, and the total number of bricks must
be greater than 2 * _redundancy_. This means that a dispersed volume must
have a minimum of 3 bricks.
If the transport type is not specified, *tcp* is used as the default. You
can also set additional options if required, like in the other volume
types.
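Putting the steps together, a minimal end-to-end sketch; the hostnames
server1 to server6 and the brick path /bricks/disp-vol are hypothetical:
```
# Create an optimal 4+2 dispersed volume (6 bricks, redundancy 2) and start it.
# Assumes the six peers already form a trusted storage pool (step 1).
gluster volume create disp-vol disperse 6 redundancy 2 \
    server{1..6}:/bricks/disp-vol
gluster volume start disp-vol
gluster volume info disp-vol
```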
> **Note**:
> - Make sure you start your volumes before you try to mount them or
> else client operations after the mount will hang.
> - GlusterFS will fail to create a dispersed volume if more than one brick of a disperse set is present on the same peer.
> ```
> # gluster volume create <volname> disperse 3 server1:/brick{1..3}
> volume create: <volname>: failed: Multiple bricks of a replicate volume are present on the same server. This setup is not optimal.
> Do you still want to continue creating the volume? (y/n)
> ```
> Use the `force` option at the end of the command if you want to create the volume in this case.
##Creating Distributed Dispersed Volumes
Distributed dispersed volumes are the equivalent of distributed replicated
volumes, but use dispersed subvolumes instead of replicated ones.
**To create a distributed dispersed volume**
1. Create a trusted storage pool.
2. Create the distributed dispersed volume:
`# gluster volume create <volname> disperse <count> [redundancy <count>] [transport tcp | rdma | tcp,rdma] <brick> ...`
To create a distributed dispersed volume, the *disperse* keyword and
_<count>_ are mandatory, and the number of bricks specified in the
command line must be a multiple of the disperse count.
*redundancy* is exactly the same as in the dispersed volume.
If the transport type is not specified, *tcp* is used as the default. You
can also set additional options if required, like in the other volume
types.
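As an illustration, a minimal sketch (hostnames and brick paths are
hypothetical) that creates two 2+1 disperse subvolumes distributed over six
peers:
```
# 6 bricks with 'disperse 3' form two disperse subvolumes of 3 bricks each;
# files are distributed between the two subvolumes.
gluster volume create dist-disp-vol disperse 3 redundancy 1 \
    server{1..6}:/bricks/dist-disp-vol
gluster volume start dist-disp-vol
```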
> **Note**:
> - Make sure you start your volumes before you try to mount them or
> else client operations after the mount will hang.
> - GlusterFS will fail to create a distributed dispersed volume if more than one brick of a disperse set is present on the same peer.
> ```
> # gluster volume create <volname> disperse 3 server1:/brick{1..6}
> volume create: <volname>: failed: Multiple bricks of a replicate volume are present on the same server. This setup is not optimal.
> Do you still want to continue creating the volume? (y/n)
> ```
> Use the `force` option at the end of the command if you want to create the volume in this case.
##Starting Volumes
You must start your volumes before you try to mount them.
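For example, to start the test-volume used in the examples above:
`# gluster volume start test-volume`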


@@ -36,7 +36,7 @@ The Gluster Console Manager is a command line utility for elastic volume management
\fB\ volume info [all|<VOLNAME>] \fR
Display information about all volumes, or the specified volume.
.TP
\fB\ volume create <NEW-VOLNAME> [stripe <COUNT>] [replica <COUNT>] [transport <tcp|rdma|tcp,rdma>] <NEW-BRICK> ... \fR
\fB\ volume create <NEW-VOLNAME> [stripe <COUNT>] [replica <COUNT>] [disperse [<COUNT>]] [redundancy <COUNT>] [transport <tcp|rdma|tcp,rdma>] <NEW-BRICK> ... \fR
Create a new volume of the specified type using the specified bricks and transport type (the default transport type is tcp).
To create a volume with both transports (tcp and rdma), give 'transport tcp,rdma' as an option.
.TP