Microsoft Clustering Services (MSCS) and Exchange Co-Existence - Part 1
Mission-critical applications demand high availability. They need to stay online all the time, or at least 99.9% of the time, throughout their lifecycle.
With a group of machines serving your requests, it is almost impossible to keep every one of them alive and online forever. However, we can still provide uninterrupted services with minimal downtime. This is what we call 99.9% uptime of services.
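To put the 99.9% figure in perspective, here is a minimal sketch of the arithmetic behind an availability target. The function name and the 365-day year are assumptions made for illustration only.

```python
# Downtime allowed by an availability target (illustrative arithmetic only).
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the hours per year a service may be down at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print(f"{target}% uptime allows about {hours:.2f} hours of downtime per year")
```

At 99.9%, the yearly budget works out to about 8.76 hours, which is why a single long outage on a standalone server can use up the whole allowance.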
So are you wondering how companies that offer 99.9% availability manage to keep such a promise to their customers? Most of you might have guessed it correctly; the answer is not that tough.
They achieve this minimal downtime with the help of server clustering. Clusters are generally regarded as very complicated systems made up of many software and hardware components. Admittedly, they are somewhat more complicated than a standalone server: they use their own architecture to provide high availability, and their ability to communicate and share resources with their partner nodes sets them apart from a standalone server system.
Clustering can be configured on UNIX, Linux, Windows and other custom operating systems. Here, we discuss Microsoft Clustering Services, better known as MSCS. To be precise, when we talk about MSCS we are talking about server clustering. Though several documents are available on the web, they do not seem to provide consolidated information about all the basics. This document also covers a bit of Exchange clustering.
Officially, Microsoft defines a cluster as follows: “A server cluster is a group of independent computer systems, known as nodes, working together as a single system to ensure that critical applications and resources remain available to clients.” You might notice that the definition says “a server cluster”. So what exactly is a server cluster?
Windows Server also provides another clustering solution, Network Load Balancing (NLB), which is completely different from server clustering. Though NLB doesn’t work the way server clustering works, it is known as a cluster too. The main difference between an NLB cluster and a server cluster is the way they work.
NLB is a high availability solution architected to keep systems available at the TCP/IP level by distributing incoming client traffic across its hosts, with few resources common to all nodes. Server clusters, in contrast, are built on an architecture where every node participating in the cluster shares resources with the others and can hand ownership of its resources to other nodes (subject to configuration).
Let’s take a look at the server cluster now: how it looks and how it is designed to share its resources with other nodes. To keep this document both handy and helpful for understanding the concepts of clustering, we will primarily talk about Windows Server 2003 clustering.
In Windows Server 2003 and later operating systems, a server cluster is made up of the following components:
Cluster Service:
The Cluster service is the component responsible for handling:
Cluster objects and their configurations – Cluster objects are server cluster networks, cluster network interfaces, nodes, resources of each type, cluster groups, etc. (explained later in Cluster Objects)
Local restart policies – A local restart policy is the restart behavior defined by an administrator for a particular resource. Usually it is defined on the Advanced tab in the properties of an individual cluster resource.
Coordinating with other instances of the Cluster service in a cluster
Event notification – Event notifications are used to communicate events that occurred on a specific clustered server.
Facilitating Communications among other software components
Performing Failover Operations
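The relationship between a local restart policy and a failover operation can be sketched with a simplified model. This is only an illustration, not the actual Cluster service logic: the class name, the `threshold` and `period` parameters, and the "restart"/"failover" return values are all hypothetical, and the model assumes a policy of the form "restart locally a limited number of times within a time window, then fail over".

```python
from collections import deque


class RestartPolicy:
    """Hypothetical model: restart locally up to `threshold` times per `period` seconds."""

    def __init__(self, threshold: int, period: float) -> None:
        self.threshold = threshold
        self.period = period
        self._restart_times = deque()  # timestamps of recent local restarts

    def on_failure(self, now: float) -> str:
        """Decide what to do when the resource fails at time `now` (in seconds)."""
        # Forget restarts that have fallen outside the sliding window.
        while self._restart_times and now - self._restart_times[0] >= self.period:
            self._restart_times.popleft()
        if len(self._restart_times) < self.threshold:
            self._restart_times.append(now)
            return "restart"
        # Too many restarts in the window: give up locally and move elsewhere.
        return "failover"
```

With `RestartPolicy(threshold=3, period=900)`, three failures in quick succession are each answered with a local restart, and a fourth failure inside the same 900-second window results in a failover.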
Resource DLLs:
As the name suggests, these are DLL files containing the instructions that manage cluster resources of one or more types. Their most important role is detecting application failures.
Resource Monitors:
The resource monitor is a component that works as an agent between the Cluster service and the resource DLLs. When the Cluster service makes a request of a resource, the resource monitor routes that request to the corresponding resource DLL. It is also responsible for reporting failure and success events to the Cluster service, though it does not initiate any operation itself.
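The routing role described above can be sketched as a small dispatcher. This is a conceptual model under stated assumptions, not the real resource monitor interface: a "resource DLL" is modeled as a plain Python function, and the class, method and parameter names are hypothetical.

```python
from typing import Callable, Dict

# A resource DLL is modeled as a function:
# (resource_name, operation) -> True on success, False on failure.
ResourceDll = Callable[[str, str], bool]


class ResourceMonitor:
    """Hypothetical agent sitting between the Cluster service and resource DLLs."""

    def __init__(self, notify: Callable[[str, str, str], None]) -> None:
        self._dlls: Dict[str, ResourceDll] = {}
        self._notify = notify  # callback standing in for the Cluster service

    def register(self, resource_type: str, dll: ResourceDll) -> None:
        """Associate a resource type with the DLL that manages it."""
        self._dlls[resource_type] = dll

    def request(self, resource_type: str, resource_name: str, operation: str) -> None:
        """Route a Cluster service request to the matching DLL and report the result."""
        dll = self._dlls[resource_type]
        succeeded = dll(resource_name, operation)
        # The monitor only reports; it never starts an operation on its own.
        self._notify(resource_name, operation, "success" if succeeded else "failure")
```

In this model the monitor never decides anything itself: every call originates from the Cluster service side, and every outcome is passed back through the `notify` callback.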
Cluster Administration Application:
An application program used to administer, configure and manage server clusters.
Cluster Automation Server:
Cluster Automation Server is a component which provides scripting capabilities via COM objects.
Cluster Database:
One of the most important components, also known as the cluster registry. It is located at %systemroot%\Cluster\CLUSDB and contains information about the physical and logical elements of a cluster. A copy of the cluster registry is also stored on the quorum disk in the file CHKxxx.TMP.
The Cluster service replicates changes in the cluster database to all of the nodes in a cluster. These changes are replicated across all cluster nodes every 4 hours, and also whenever a cluster resource is taken offline or brought online, or the resource monitor detects changes in a registry key on the clustered server. The Cluster service on the node that owns the quorum resource at any given point keeps the database copy up to date. It also makes sure that the quorum resource recovery log contains the most recent cluster database update. If, during this update and replication process, the Cluster service fails to replicate the changes to all nodes, the changes are logged to the quorum disk. The quorum resource is basically used for maintaining cluster consistency.
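The idea of a quorum log that lets a lagging node catch up can be sketched as follows. This is a toy model rather than the real replication protocol: the `Node` and `QuorumOwner` classes, their attributes, and the use of a simple in-memory list as the "quorum log" are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple


class Node:
    """Hypothetical cluster node holding its own copy of the cluster database."""

    def __init__(self, name: str, online: bool = True) -> None:
        self.name = name
        self.online = online
        self.database: Dict[str, str] = {}
        self.applied = 0  # number of quorum log entries already applied


class QuorumOwner:
    """Hypothetical owner of the quorum resource, which keeps the recovery log."""

    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = nodes
        self.quorum_log: List[Tuple[str, str]] = []

    def update(self, key: str, value: str) -> None:
        # Record the change in the quorum log first, then push it to every node.
        self.quorum_log.append((key, value))
        for node in self.nodes:
            self.sync(node)

    def sync(self, node: Node) -> None:
        """Replay the log entries the node has not seen; skip unreachable nodes."""
        if not node.online:
            return
        for key, value in self.quorum_log[node.applied:]:
            node.database[key] = value
        node.applied = len(self.quorum_log)
```

A node that was offline during an update keeps its stale copy until `sync` is called for it again, at which point the missed entries are replayed from the log and its database matches the others.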
Network and Disk Drivers:
Clusters operate on two types of network: internal and public. The internal networks are used for intra-cluster communication and ping tests to other nodes. All of this is handled by the Cluster Network Driver, implemented in the driver file clusnet.sys.
Clusters also use shared storage media to host the quorum and application-generated data or databases. For this sharing to work smoothly, only one node should own a disk resource at any given point in time. Though the physical disk resources are shared and can be accessed by other nodes, trouble arises when more than one node tries to access the same disk resource. To prevent this, disk resources are reserved for the node that owns them. This sharing and reservation is controlled by the cluster disk driver, implemented in the file clusdisk.sys.
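The single-owner rule for disk resources can be illustrated with a small model. It only shows the ownership logic, not how clusdisk.sys actually enforces reservations; the `SharedDisk` class and its methods are hypothetical names introduced for this sketch.

```python
from typing import List, Optional


class SharedDisk:
    """Hypothetical shared disk that only its current owner may write to."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.owner: Optional[str] = None
        self.blocks: List[str] = []

    def reserve(self, node: str) -> bool:
        """Grant the reservation if the disk is free or already owned by `node`."""
        if self.owner is None or self.owner == node:
            self.owner = node
            return True
        return False

    def release(self, node: str) -> None:
        """Give up the reservation; requests from non-owners are ignored."""
        if self.owner == node:
            self.owner = None

    def write(self, node: str, data: str) -> None:
        """Accept a write only from the node that holds the reservation."""
        if self.owner != node:
            raise PermissionError(f"{node} does not own {self.name}")
        self.blocks.append(data)
```

In this model a second node's `reserve` call simply fails while the first node holds the disk, and ownership can move only after the current owner releases it, which mirrors the failover of a disk resource from one node to another.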
In the next part of this series we will take a look at other concepts used in Microsoft server clustering. You will be able to download all parts of this series once the last topic, explaining how Exchange uses clustering, is published. Meanwhile, we appreciate your comments, feedback and suggestions. If you have further personal feedback or questions I can answer, please feel free to write to me at milindn @ msexchangegeek . com

