doc: added documentation for dispersed volumes
Change-Id: I8a8368bdbe31af30a239aaf8cc478429e10c3f57
BUG: 1147563
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/8885
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
parent 535c425911
commit 36d2975714
@@ -52,11 +52,24 @@ start it before attempting to mount it.
and performance is critical. In this release, configuration of
this volume type is supported only for Map Reduce workloads.
|
||||
|
||||
- **Dispersed** - Dispersed volumes are based on erasure codes,
  providing space-efficient protection against disk or server failures.
  A dispersed volume stores an encoded fragment of the original file on
  each brick, in such a way that only a subset of the fragments is needed
  to recover the original file. The number of bricks that can be missing
  without losing access to data is configured by the administrator at
  volume creation time.

- **Distributed Dispersed** - Distributed dispersed volumes distribute
  files across dispersed subvolumes. This has the same advantages as
  distributed replicated volumes, but uses disperse to store the data
  on the bricks.
|
||||
|
||||
**To create a new volume**

- Create a new volume:

-`# gluster volume create [stripe | replica ] [transport tcp | rdma | tcp, rdma] `
+`# gluster volume create [stripe | replica | disperse] [transport tcp | rdma | tcp, rdma] `

For example, to create a volume called test-volume consisting of
server3:/exp3 and server4:/exp4:
|
||||
@@ -389,6 +402,161 @@ of this volume type is supported only for Map Reduce workloads.

> Use the `force` option at the end of the command if you want to create the volume in this case.
|
||||
|
||||
##Creating Dispersed Volumes

Dispersed volumes are based on erasure codes. They stripe the encoded data of
files, with some redundancy added, across multiple bricks in the volume. You
can use dispersed volumes to get a configurable level of reliability with
minimal wasted space.

**Redundancy**

Each dispersed volume has a redundancy value defined when the volume is
created. This value determines how many bricks can be lost without
interrupting the operation of the volume. It also determines the amount of
usable space of the volume using this formula:

    <Usable size> = <Brick size> * (#Bricks - Redundancy)

All bricks of a disperse set should have the same capacity; otherwise, when
the smallest brick becomes full, no additional data will be allowed in the
disperse set.
|
||||
|
||||
It's important to note that a configuration with 3 bricks and redundancy 1
will have less usable space (66.7% of the total physical space) than a
configuration with 10 bricks and redundancy 1 (90%). However, the first one
will be safer than the second one (roughly, the probability of failure of
the second configuration is more than 4.5 times that of the first one).

For example, a dispersed volume composed of 6 bricks of 4TB and a redundancy
of 2 will be completely operational even with two bricks inaccessible. However,
a third inaccessible brick will bring the volume down because it won't be
possible to read from or write to it. The usable space of the volume will be
16TB.
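As a rough illustration of the formula and the figures above, the following Python sketch computes the usable size and the space efficiency of a disperse set. The function names are hypothetical and are not part of GlusterFS; the result is expressed in the same unit as the brick size.

```python
def usable_size(brick_size, bricks, redundancy):
    """<Usable size> = <Brick size> * (#Bricks - Redundancy)."""
    return brick_size * (bricks - redundancy)


def space_efficiency(bricks, redundancy):
    """Fraction of the total physical space that remains usable."""
    return (bricks - redundancy) / bricks


# 6 bricks of 4TB with redundancy 2, as in the example above.
print(usable_size(4, 6, 2))              # 16
# The two redundancy-1 configurations compared above.
print(f"{space_efficiency(3, 1):.1%}")   # 66.7%
print(f"{space_efficiency(10, 1):.1%}")  # 90.0%
```

The sketch assumes that all bricks have the same capacity, as recommended above.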
|
||||
|
||||
The implementation of erasure codes in GlusterFS limits the redundancy to a
value smaller than #Bricks / 2 (or equivalently, redundancy * 2 < #Bricks).
Having a redundancy equal to half of the number of bricks would be almost
equivalent to a replica-2 volume, and a replicated volume will probably
perform better in this case.

**Optimal volumes**

One of the main performance costs of erasure codes is the
RMW (Read-Modify-Write) cycle. Erasure codes operate on blocks of a certain
size and cannot work with smaller ones. This means that if a user issues
a write of a portion of a file that doesn't fill a full block, the volume
needs to read the remaining portion from the current contents of the file,
merge them, compute the updated encoded block and, finally, write the
resulting data.

This adds latency and reduces performance whenever it happens. Some GlusterFS
performance xlators can help to reduce or even eliminate this problem for
some workloads, but it should be taken into account when using dispersed
volumes for a specific use case.
|
||||
|
||||
The current implementation of dispersed volumes uses blocks of a size that
depends on the number of bricks and redundancy: 512 * (#Bricks - redundancy)
bytes. This value is also known as the stripe size.

Using combinations of #Bricks/redundancy that give a power of two for the
stripe size will make the disperse volume perform better in most workloads
because it's more typical to write information in blocks whose size is a
power of two (for example databases, virtual machines and many applications).

These combinations are considered *optimal*.

For example, a configuration with 6 bricks and redundancy 2 will have a stripe
size of 512 * (6 - 2) = 2048 bytes, so it's considered optimal. A configuration
with 7 bricks and redundancy 2 would have a stripe size of 2560 bytes, needing
an RMW cycle for many writes (of course this always depends on the use case).
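The stripe size rule can be checked with a few lines of Python. This is only a sketch: `stripe_size`, `is_optimal` and `needs_rmw` are hypothetical helper names, the 512-byte factor is the one quoted above, and the RMW check assumes a simplified model in which a write avoids the cycle only when both its start and its end fall on stripe boundaries.

```python
FRAGMENT_BYTES = 512  # per-brick factor quoted in the text


def stripe_size(bricks, redundancy):
    """Stripe size in bytes: 512 * (#Bricks - redundancy)."""
    return FRAGMENT_BYTES * (bricks - redundancy)


def is_optimal(bricks, redundancy):
    """True when the stripe size is a power of two."""
    size = stripe_size(bricks, redundancy)
    return size > 0 and size & (size - 1) == 0


def needs_rmw(offset, length, stripe):
    """Simplified model (an assumption): a write needs a
    Read-Modify-Write cycle unless it covers whole stripes."""
    return offset % stripe != 0 or (offset + length) % stripe != 0


print(stripe_size(6, 2), is_optimal(6, 2))  # 2048 True
print(stripe_size(7, 2), is_optimal(7, 2))  # 2560 False

# A 4096-byte write at offset 0, a typical power-of-two block.
print(needs_rmw(0, 4096, stripe_size(6, 2)))  # False
print(needs_rmw(0, 4096, stripe_size(7, 2)))  # True
```

With the 7-brick configuration the 4096-byte write ends in the middle of the second stripe, which is why the text describes that layout as needing an RMW cycle for many writes.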
|
||||
|
||||
**To create a dispersed volume**

1. Create a trusted storage pool.

2. Create the dispersed volume:

    `# gluster volume create [disperse [<count>]] [redundancy <count>] [transport tcp | rdma | tcp,rdma]`

    A dispersed volume can be created by specifying the number of bricks in a
    disperse set, by specifying the number of redundancy bricks, or both.

    If *disperse* is not specified, or the _<count>_ is missing, the
    entire volume will be treated as a single disperse set composed of all
    bricks enumerated in the command line.

    If *redundancy* is not specified, it is computed automatically to be the
    optimal value. If no optimal value exists, it's assumed to be '1' and a
    warning message is shown:

        # gluster volume create test-volume disperse 4 server{1..4}:/bricks/test-volume
        There isn't an optimal redundancy value for this configuration. Do you want to create the volume with redundancy 1 ? (y/n)

    In all cases where *redundancy* is automatically computed and it's not
    equal to '1', a warning message is displayed:

        # gluster volume create test-volume disperse 6 server{1..6}:/bricks/test-volume
        The optimal redundancy for this configuration is 2. Do you want to create the volume with this value ? (y/n)

    _redundancy_ must be greater than 0, and the total number of bricks must
    be greater than 2 * _redundancy_. This means that a dispersed volume must
    have a minimum of 3 bricks.

    If the transport type is not specified, *tcp* is used as the default. You
    can also set additional options if required, like in the other volume
    types.
|
||||
|
> **Note**:
>
> - Make sure you start your volumes before you try to mount them, or
>   else client operations after the mount will hang.
>
> - GlusterFS will fail to create a dispersed volume if more than one brick of a disperse set is present on the same peer.
>
> ```
> # gluster volume create <volname> disperse 3 server1:/brick{1..3}
> volume create: <volname>: failed: Multiple bricks of a replicate volume are present on the same server. This setup is not optimal.
> Do you still want to continue creating the volume? (y/n)
> ```
>
> Use the `force` option at the end of the command if you want to create the volume in this case.
|
||||
|
||||
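The redundancy rules for dispersed volumes described above can be sketched in Python as follows. This is not the code GlusterFS uses: the function names are hypothetical, and the sketch assumes that "optimal" means the number of data bricks (#Bricks - redundancy) is a power of two, as described under Optimal volumes. Under that assumption it reproduces the two prompts shown earlier (no optimal value for 4 bricks, redundancy 2 for 6 bricks).

```python
from typing import Optional


def is_valid(bricks: int, redundancy: int) -> bool:
    """redundancy must be > 0 and #Bricks must be > 2 * redundancy."""
    return redundancy > 0 and bricks > 2 * redundancy


def optimal_redundancy(bricks: int) -> Optional[int]:
    """Return a valid redundancy that leaves a power-of-two number of
    data bricks, or None when no such value exists (assumed rule)."""
    redundancy = 1
    while is_valid(bricks, redundancy):
        data_bricks = bricks - redundancy
        if data_bricks & (data_bricks - 1) == 0:
            return redundancy
        redundancy += 1
    return None


print(is_valid(3, 1))  # True: 3 bricks is the minimum
print(is_valid(2, 1))  # False: 2 is not greater than 2 * 1
for count in (4, 6):
    print(count, optimal_redundancy(count))
# 4 None
# 6 2
```

When `optimal_redundancy` returns `None`, the text above says the CLI falls back to redundancy 1 after asking for confirmation.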
##Creating Distributed Dispersed Volumes

Distributed dispersed volumes are the equivalent of distributed replicated
volumes, but use dispersed subvolumes instead of replicated ones.

**To create a distributed dispersed volume**

1. Create a trusted storage pool.

2. Create the distributed dispersed volume:

    `# gluster volume create disperse <count> [redundancy <count>] [transport tcp | rdma | tcp,rdma]`

    To create a distributed dispersed volume, the *disperse* keyword and
    <count> are mandatory, and the number of bricks specified in the
    command line must be a multiple of the disperse count.

    *redundancy* is exactly the same as in the dispersed volume.

    If the transport type is not specified, *tcp* is used as the default. You
    can also set additional options if required, like in the other volume
    types.
|
||||
|
> **Note**:
>
> - Make sure you start your volumes before you try to mount them, or
>   else client operations after the mount will hang.
>
> - GlusterFS will fail to create a distributed dispersed volume if more than one brick of a disperse set is present on the same peer.
>
> ```
> # gluster volume create <volname> disperse 3 server1:/brick{1..6}
> volume create: <volname>: failed: Multiple bricks of a replicate volume are present on the same server. This setup is not optimal.
> Do you still want to continue creating the volume? (y/n)
> ```
>
> Use the `force` option at the end of the command if you want to create the volume in this case.
|
||||
|
||||
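To illustrate the brick-count rule for distributed dispersed volumes, the Python sketch below splits a brick list into disperse sets. It assumes that bricks are grouped in the order in which they appear on the command line; that ordering is an assumption made for illustration and is not stated in this document. The helper name `disperse_sets` is hypothetical.

```python
from typing import List


def disperse_sets(bricks: List[str], disperse_count: int) -> List[List[str]]:
    """Group bricks into disperse sets of disperse_count bricks each."""
    if disperse_count < 3:
        raise ValueError("a disperse set needs at least 3 bricks")
    if not bricks or len(bricks) % disperse_count != 0:
        raise ValueError(
            "the number of bricks must be a multiple of the disperse count"
        )
    return [
        bricks[start:start + disperse_count]
        for start in range(0, len(bricks), disperse_count)
    ]


# Six bricks on six different servers with a disperse count of 3
# give two disperse sets of three bricks each.
bricks = [f"server{n}:/bricks/test-volume" for n in range(1, 7)]
for index, group in enumerate(disperse_sets(bricks, 3)):
    print(index, group)
```

Files are then distributed across the resulting sets, while each set applies the redundancy rules of a plain dispersed volume.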
##Starting Volumes

You must start your volumes before you try to mount them.
|
||||
|
@@ -36,7 +36,7 @@ The Gluster Console Manager is a command line utility for elastic volume managem
\fB\ volume info [all|<VOLNAME>] \fR
Display information about all volumes, or the specified volume.
.TP
-\fB\ volume create <NEW-VOLNAME> [stripe <COUNT>] [replica <COUNT>] [transport <tcp|rdma|tcp,rdma>] <NEW-BRICK> ... \fR
+\fB\ volume create <NEW-VOLNAME> [stripe <COUNT>] [replica <COUNT>] [disperse [<COUNT>]] [redundancy <COUNT>] [transport <tcp|rdma|tcp,rdma>] <NEW-BRICK> ... \fR
Create a new volume of the specified type using the specified bricks and transport type (the default transport type is tcp).
To create a volume with both transports (tcp and rdma), give 'transport tcp,rdma' as an option.
.TP