mirror of
https://github.com/samba-team/samba.git
synced 2025-01-11 05:18:09 +03:00
8f8a9f0190
(This used to be commit 9f672c26d6
)
501 lines
19 KiB
XML
501 lines
19 KiB
XML
<?xml version="1.0" encoding="iso-8859-1"?>
|
|
<!DOCTYPE chapter PUBLIC "-//Samba-Team//DTD DocBook V4.2-Based Variant V1.0//EN" "http://www.samba.org/samba/DTD/samba-doc">
|
|
<chapter id="SambaHA">
|
|
<chapterinfo>
|
|
&author.jht;
|
|
&author.jeremy;
|
|
</chapterinfo>
|
|
|
|
<title>High Availability</title>
|
|
|
|
<sect1>
|
|
<title>Features and Benefits</title>
|
|
|
|
<para>
|
|
<indexterm><primary>availability</primary></indexterm>
|
|
<indexterm><primary>intolerance</primary></indexterm>
|
|
<indexterm><primary>vital task</primary></indexterm>
|
|
Network administrators are often concerned about the availability of file and print
|
|
services. Network users are inclined toward intolerance of the services they depend
|
|
on to perform vital task responsibilities.
|
|
</para>
|
|
|
|
<para>
|
|
A sign in a computer room served to remind staff of their responsibilities. It read:
|
|
</para>
|
|
|
|
<blockquote>
|
|
<para>
|
|
<indexterm><primary>fail</primary></indexterm>
|
|
<indexterm><primary>managed by humans</primary></indexterm>
|
|
<indexterm><primary>economically wise</primary></indexterm>
|
|
<indexterm><primary>anticipate failure</primary></indexterm>
|
|
All humans fail, in both great and small ways we fail continually. Machines fail too.
|
|
Computers are machines that are managed by humans, the fallout from failure
|
|
can be spectacular. Your responsibility is to deal with failure, to anticipate it
|
|
and to eliminate it as far as is humanly and economically wise to achieve.
|
|
Are your actions part of the problem or part of the solution?
|
|
</para>
|
|
</blockquote>
|
|
|
|
<para>
|
|
If we are to deal with failure in a planned and productive manner, then first we must
|
|
understand the problem. That is the purpose of this chapter.
|
|
</para>
|
|
|
|
<para>
|
|
<indexterm><primary>high availability</primary></indexterm>
|
|
<indexterm><primary>CIFS/SMB</primary></indexterm>
|
|
<indexterm><primary>state of knowledge</primary></indexterm>
|
|
Parenthetically, in the following discussion there are seeds of information on how to
|
|
provision a network infrastructure against failure. Our purpose here is not to provide
|
|
a lengthy dissertation on the subject of high availability. Additionally, we have made
|
|
a conscious decision to not provide detailed working examples of high availability
|
|
solutions; instead we present an overview of the issues in the hope that someone will
|
|
rise to the challenge of providing a detailed document that is focused purely on
|
|
presentation of the current state of knowledge and practice in high availability as it
|
|
applies to the deployment of Samba and other CIFS/SMB technologies.
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1>
|
|
<title>Technical Discussion</title>
|
|
|
|
<para>
|
|
<indexterm><primary>SambaXP conference</primary></indexterm>
|
|
<indexterm><primary>Germany</primary></indexterm>
|
|
<indexterm><primary>inspired structure</primary></indexterm>
|
|
The following summary was part of a presentation by Jeremy Allison at the SambaXP 2003
|
|
conference that was held at Goettingen, Germany, in April 2003. Material has been added
|
|
from other sources, but it was Jeremy who inspired the structure that follows.
|
|
</para>
|
|
|
|
<sect2>
|
|
<title>The Ultimate Goal</title>
|
|
|
|
<para>
|
|
<indexterm><primary>clustering technologies</primary></indexterm>
|
|
<indexterm><primary>affordable power</primary></indexterm>
|
|
<indexterm><primary>unstoppable services</primary></indexterm>
|
|
All clustering technologies aim to achieve one or more of the following:
|
|
</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><para>Obtain the maximum affordable computational power.</para></listitem>
|
|
<listitem><para>Obtain faster program execution.</para></listitem>
|
|
<listitem><para>Deliver unstoppable services.</para></listitem>
|
|
<listitem><para>Avert points of failure.</para></listitem>
|
|
<listitem><para>Exact most effective utilization of resources.</para></listitem>
|
|
</itemizedlist>
|
|
|
|
<para>
|
|
A clustered file server ideally has the following properties:
|
|
<indexterm><primary>clustered file server</primary></indexterm>
|
|
<indexterm><primary>connect transparently</primary></indexterm>
|
|
<indexterm><primary>transparently reconnected</primary></indexterm>
|
|
<indexterm><primary>distributed file system</primary></indexterm>
|
|
</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><para>All clients can connect transparently to any server.</para></listitem>
|
|
<listitem><para>A server can fail and clients are transparently reconnected to another server.</para></listitem>
|
|
<listitem><para>All servers serve out the same set of files.</para></listitem>
|
|
<listitem><para>All file changes are immediately seen on all servers.</para>
|
|
<itemizedlist><listitem><para>Requires a distributed file system.</para></listitem></itemizedlist></listitem>
|
|
<listitem><para>Infinite ability to scale by adding more servers or disks.</para></listitem>
|
|
</itemizedlist>
|
|
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Why Is This So Hard?</title>
|
|
|
|
<para>
|
|
In short, the problem is one of <emphasis>state</emphasis>.
|
|
</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<indexterm><primary>state information</primary></indexterm>
|
|
All TCP/IP connections are dependent on state information.
|
|
</para>
|
|
<para>
|
|
<indexterm><primary>TCP failover</primary></indexterm>
|
|
The TCP connection involves a packet sequence number. This
|
|
sequence number would need to be dynamically updated on all
|
|
machines in the cluster to effect seamless TCP failover.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<indexterm><primary>CIFS/SMB</primary></indexterm>
|
|
<indexterm><primary>TCP</primary></indexterm>
|
|
CIFS/SMB (the Windows networking protocols) uses TCP connections.
|
|
</para>
|
|
<para>
|
|
This means that from a basic design perspective, failover is not
|
|
seriously considered.
|
|
<itemizedlist>
|
|
<listitem><para>
|
|
All current SMB clusters are failover solutions
|
|
&smbmdash; they rely on the clients to reconnect. They provide server
|
|
failover, but clients can lose information due to a server failure.
|
|
<indexterm><primary>server failure</primary></indexterm>
|
|
</para></listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Servers keep state information about client connections.
|
|
<itemizedlist>
|
|
<indexterm><primary>state</primary></indexterm>
|
|
<listitem><para>CIFS/SMB involves a lot of state.</para></listitem>
|
|
<listitem><para>Every file open must be compared with other open files
|
|
to check share modes.</para></listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<sect3>
|
|
<title>The Front-End Challenge</title>
|
|
|
|
<para>
|
|
<indexterm><primary>cluster servers</primary></indexterm>
|
|
<indexterm><primary>single server</primary></indexterm>
|
|
<indexterm><primary>TCP data streams</primary></indexterm>
|
|
<indexterm><primary>front-end virtual server</primary></indexterm>
|
|
<indexterm><primary>virtual server</primary></indexterm>
|
|
<indexterm><primary>de-multiplex</primary></indexterm>
|
|
<indexterm><primary>SMB</primary></indexterm>
|
|
To make it possible for a cluster of file servers to appear as a single server that has one
|
|
name and one IP address, the incoming TCP data streams from clients must be processed by the
|
|
front-end virtual server. This server must de-multiplex the incoming packets at the SMB protocol
|
|
layer level and then feed the SMB packet to different servers in the cluster.
|
|
</para>
|
|
|
|
<para>
|
|
<indexterm><primary>IPC$ connections</primary></indexterm>
|
|
<indexterm><primary>RPC calls</primary></indexterm>
|
|
One could split all IPC$ connections and RPC calls to one server to handle printing and user
|
|
lookup requirements. RPC printing handles are shared between different IPC4 sessions &smbmdash; it is
|
|
hard to split this across clustered servers!
|
|
</para>
|
|
|
|
<para>
|
|
Conceptually speaking, all other servers would then provide only file services. This is a simpler
|
|
problem to concentrate on.
|
|
</para>
|
|
|
|
</sect3>
|
|
|
|
<sect3>
|
|
<title>Demultiplexing SMB Requests</title>
|
|
|
|
<para>
|
|
<indexterm><primary>SMB requests</primary></indexterm>
|
|
<indexterm><primary>SMB state information</primary></indexterm>
|
|
<indexterm><primary>front-end virtual server</primary></indexterm>
|
|
<indexterm><primary>complicated problem</primary></indexterm>
|
|
De-multiplexing of SMB requests requires knowledge of SMB state information,
|
|
all of which must be held by the front-end <emphasis>virtual</emphasis> server.
|
|
This is a perplexing and complicated problem to solve.
|
|
</para>
|
|
|
|
<para>
|
|
<indexterm><primary>vuid</primary></indexterm>
|
|
<indexterm><primary>tid</primary></indexterm>
|
|
<indexterm><primary>fid</primary></indexterm>
|
|
Windows XP and later have changed semantics so state information (vuid, tid, fid)
|
|
must match for a successful operation. This makes things simpler than before and is a
|
|
positive step forward.
|
|
</para>
|
|
|
|
<para>
|
|
<indexterm><primary>SMB requests</primary></indexterm>
|
|
<indexterm><primary>Terminal Server</primary></indexterm>
|
|
SMB requests are sent by vuid to their associated server. No code exists today to
|
|
effect this solution. This problem is conceptually similar to the problem of
|
|
correctly handling requests from multiple requests from Windows 2000
|
|
Terminal Server in Samba.
|
|
</para>
|
|
|
|
<para>
|
|
<indexterm><primary>de-multiplexing</primary></indexterm>
|
|
One possibility is to start by exposing the server pool to clients directly.
|
|
This could eliminate the de-multiplexing step.
|
|
</para>
|
|
|
|
</sect3>
|
|
|
|
<sect3>
|
|
<title>The Distributed File System Challenge</title>
|
|
|
|
<para>
|
|
<indexterm><primary>Distributed File Systems</primary></indexterm>
|
|
There exists many distributed file systems for UNIX and Linux.
|
|
</para>
|
|
|
|
<para>
|
|
<indexterm><primary>backend</primary></indexterm>
|
|
<indexterm><primary>SMB semantics</primary></indexterm>
|
|
<indexterm><primary>share modes</primary></indexterm>
|
|
<indexterm><primary>locking</primary></indexterm>
|
|
<indexterm><primary>oplock</primary></indexterm>
|
|
<indexterm><primary>distributed file systems</primary></indexterm>
|
|
Many could be adopted to backend our cluster, so long as awareness of SMB
|
|
semantics is kept in mind (share modes, locking, and oplock issues in particular).
|
|
Common free distributed file systems include:
|
|
<indexterm><primary>NFS</primary></indexterm>
|
|
<indexterm><primary>AFS</primary></indexterm>
|
|
<indexterm><primary>OpenGFS</primary></indexterm>
|
|
<indexterm><primary>Lustre</primary></indexterm>
|
|
</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><para>NFS</para></listitem>
|
|
<listitem><para>AFS</para></listitem>
|
|
<listitem><para>OpenGFS</para></listitem>
|
|
<listitem><para>Lustre</para></listitem>
|
|
</itemizedlist>
|
|
|
|
<para>
|
|
<indexterm><primary>server pool</primary></indexterm>
|
|
The server pool (cluster) can use any distributed file system backend if all SMB
|
|
semantics are performed within this pool.
|
|
</para>
|
|
|
|
</sect3>
|
|
|
|
<sect3>
|
|
<title>Restrictive Constraints on Distributed File Systems</title>
|
|
|
|
<para>
|
|
<indexterm><primary>SMB services</primary></indexterm>
|
|
<indexterm><primary>oplock handling</primary></indexterm>
|
|
<indexterm><primary>server pool</primary></indexterm>
|
|
<indexterm><primary>backend file system pool</primary></indexterm>
|
|
Where a clustered server provides purely SMB services, oplock handling
|
|
may be done within the server pool without imposing a need for this to
|
|
be passed to the backend file system pool.
|
|
</para>
|
|
|
|
<para>
|
|
<indexterm><primary>NFS</primary></indexterm>
|
|
<indexterm><primary>interoperability</primary></indexterm>
|
|
On the other hand, where the server pool also provides NFS or other file services,
|
|
it will be essential that the implementation be oplock-aware so it can
|
|
interoperate with SMB services. This is a significant challenge today. A failure
|
|
to provide this interoperability will result in a significant loss of performance that will be
|
|
sorely noted by users of Microsoft Windows clients.
|
|
</para>
|
|
|
|
<para>
|
|
Last, all state information must be shared across the server pool.
|
|
</para>
|
|
|
|
</sect3>
|
|
|
|
<sect3>
|
|
<title>Server Pool Communications</title>
|
|
|
|
<para>
|
|
<indexterm><primary>POSIX semantics</primary></indexterm>
|
|
<indexterm><primary>SMB</primary></indexterm>
|
|
<indexterm><primary>POSIX locks</primary></indexterm>
|
|
<indexterm><primary>SMB locks</primary></indexterm>
|
|
Most backend file systems support POSIX file semantics. This makes it difficult
|
|
to push SMB semantics back into the file system. POSIX locks have different properties
|
|
and semantics from SMB locks.
|
|
</para>
|
|
|
|
<para>
|
|
<indexterm><primary>smbd</primary></indexterm>
|
|
<indexterm><primary>tdb</primary></indexterm>
|
|
<indexterm><primary>Clustered smbds</primary></indexterm>
|
|
All <command>smbd</command> processes in the server pool must of necessity communicate
|
|
very quickly. For this, the current <parameter>tdb</parameter> file structure that Samba
|
|
uses is not suitable for use across a network. Clustered <command>smbd</command>s must use something else.
|
|
</para>
|
|
|
|
</sect3>
|
|
|
|
<sect3>
|
|
<title>Server Pool Communications Demands</title>
|
|
|
|
<para>
|
|
High-speed interserver communications in the server pool is a design prerequisite
|
|
for a fully functional system. Possibilities for this include:
|
|
</para>
|
|
|
|
<itemizedlist>
|
|
<indexterm><primary>Myrinet</primary></indexterm>
|
|
<indexterm><primary>scalable coherent interface</primary><see>SCI</see></indexterm>
|
|
<listitem><para>
|
|
Proprietary shared memory bus (example: Myrinet or SCI [scalable coherent interface]).
|
|
These are high-cost items.
|
|
</para></listitem>
|
|
|
|
<listitem><para>
|
|
Gigabit Ethernet (now quite affordable).
|
|
</para></listitem>
|
|
|
|
<listitem><para>
|
|
Raw Ethernet framing (to bypass TCP and UDP overheads).
|
|
</para></listitem>
|
|
</itemizedlist>
|
|
|
|
<para>
|
|
We have yet to identify metrics for performance demands to enable this to happen
|
|
effectively.
|
|
</para>
|
|
|
|
</sect3>
|
|
|
|
<sect3>
|
|
<title>Required Modifications to Samba</title>
|
|
|
|
<para>
|
|
Samba needs to be significantly modified to work with a high-speed server interconnect
|
|
system to permit transparent failover clustering.
|
|
</para>
|
|
|
|
<para>
|
|
Particular functions inside Samba that will be affected include:
|
|
</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><para>
|
|
The locking database, oplock notifications,
|
|
and the share mode database.
|
|
</para></listitem>
|
|
|
|
<listitem><para>
|
|
<indexterm><primary>failure semantics</primary></indexterm>
|
|
<indexterm><primary>oplock messages</primary></indexterm>
|
|
Failure semantics need to be defined. Samba behaves the same way as Windows.
|
|
When oplock messages fail, a file open request is allowed, but this is
|
|
potentially dangerous in a clustered environment. So how should interserver
|
|
pool failure semantics function, and how should such functionality be implemented?
|
|
</para></listitem>
|
|
|
|
<listitem><para>
|
|
Should this be implemented using a point-to-point lock manager, or can this
|
|
be done using multicast techniques?
|
|
</para></listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
</sect3>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>A Simple Solution</title>
|
|
|
|
<para>
|
|
<indexterm><primary>failover servers</primary></indexterm>
|
|
<indexterm><primary>exported file system</primary></indexterm>
|
|
<indexterm><primary>distributed locking protocol</primary></indexterm>
|
|
Allowing failover servers to handle different functions within the exported file system
|
|
removes the problem of requiring a distributed locking protocol.
|
|
</para>
|
|
|
|
<para>
|
|
<indexterm><primary>high-speed server interconnect</primary></indexterm>
|
|
<indexterm><primary>complex file name space</primary></indexterm>
|
|
If only one server is active in a pair, the need for high-speed server interconnect is avoided.
|
|
This allows the use of existing high-availability solutions, instead of inventing a new one.
|
|
This simpler solution comes at a price &smbmdash; the cost of which is the need to manage a more
|
|
complex file name space. Since there is now not a single file system, administrators
|
|
must remember where all services are located &smbmdash; a complexity not easily dealt with.
|
|
</para>
|
|
|
|
<para>
|
|
<indexterm><primary>virtual server</primary></indexterm>
|
|
The <emphasis>virtual server</emphasis> is still needed to redirect requests to backend
|
|
servers. Backend file space integrity is the responsibility of the administrator.
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>High-Availability Server Products</title>
|
|
|
|
<para>
|
|
<indexterm><primary>resource failover</primary></indexterm>
|
|
<indexterm><primary>high-availability services</primary></indexterm>
|
|
<indexterm><primary>dedicated heartbeat</primary></indexterm>
|
|
<indexterm><primary>LAN</primary></indexterm>
|
|
<indexterm><primary>failover process</primary></indexterm>
|
|
Failover servers must communicate in order to handle resource failover. This is essential
|
|
for high-availability services. The use of a dedicated heartbeat is a common technique to
|
|
introduce some intelligence into the failover process. This is often done over a dedicated
|
|
link (LAN or serial).
|
|
</para>
|
|
|
|
<para>
|
|
<indexterm><primary>SCSI</primary></indexterm>
|
|
<indexterm><primary>Red Hat Cluster Manager</primary></indexterm>
|
|
<indexterm><primary>Microsoft Wolfpack</primary></indexterm>
|
|
<indexterm><primary>Fiber Channel</primary></indexterm>
|
|
<indexterm><primary>failover communication</primary></indexterm>
|
|
Many failover solutions (like Red Hat Cluster Manager and Microsoft Wolfpack)
|
|
can use a shared SCSI of Fiber Channel disk storage array for failover communication.
|
|
Information regarding Red Hat high availability solutions for Samba may be obtained from
|
|
<ulink url="http://www.redhat.com/docs/manuals/enterprise/RHEL-AS-2.1-Manual/cluster-manager/s1-service-samba.html">www.redhat.com</ulink>.
|
|
</para>
|
|
|
|
<para>
|
|
<indexterm><primary>Linux High Availability project</primary></indexterm>
|
|
The Linux High Availability project is a resource worthy of consultation if your desire is
|
|
to build a highly available Samba file server solution. Please consult the home page at
|
|
<ulink url="http://www.linux-ha.org/">www.linux-ha.org/</ulink>.
|
|
</para>
|
|
|
|
<para>
|
|
<indexterm><primary>backend failures</primary></indexterm>
|
|
<indexterm><primary>continuity of service</primary></indexterm>
|
|
Front-end server complexity remains a challenge for high availability because it must deal
|
|
gracefully with backend failures, while at the same time providing continuity of service
|
|
to all network clients.
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>MS-DFS: The Poor Man's Cluster</title>
|
|
|
|
<para>
|
|
<indexterm><primary>MS-DFS</primary></indexterm>
|
|
<indexterm><primary>DFS</primary><see>MS-DFS, Distributed File Systems</see></indexterm>
|
|
MS-DFS links can be used to redirect clients to disparate backend servers. This pushes
|
|
complexity back to the network client, something already included by Microsoft.
|
|
MS-DFS creates the illusion of a simple, continuous file system name space that works even
|
|
at the file level.
|
|
</para>
|
|
|
|
<para>
|
|
Above all, at the cost of complexity of management, a distributed system (pseudo-cluster) can
|
|
be created using existing Samba functionality.
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Conclusions</title>
|
|
|
|
<itemizedlist>
|
|
<listitem><para>Transparent SMB clustering is hard to do!</para></listitem>
|
|
<listitem><para>Client failover is the best we can do today.</para></listitem>
|
|
<listitem><para>Much more work is needed before a practical and manageable high-availability transparent cluster solution will be possible.</para></listitem>
|
|
<listitem><para>MS-DFS can be used to create the illusion of a single transparent cluster.</para></listitem>
|
|
</itemizedlist>
|
|
|
|
</sect2>
|
|
|
|
</sect1>
|
|
</chapter>
|