Introducing Microsoft Cluster Service (MSCS) in the Windows Server 2003 Family

 

Mohan Rao Cavale
Microsoft Corporation

November 2002

Applies to:
    Windows® Server 2003
    Windows® Server 2003, Enterprise Edition
    Windows® Server 2003, Datacenter Edition
    Microsoft Cluster Service

Summary: Learn how to easily perform a sanity check of your application within a cluster environment without having to make any changes to your application's code. This paper focuses on Cluster Service, one of three Microsoft server technologies that support clustering. (15 printed pages)

Contents

Introduction
Three Technologies for Clustering
Failover Capability Through Microsoft Cluster Service
Cluster Service Architecture
A Cluster-Unaware Application
High Availability Notepad
Conclusion

Introduction

Delivering a high-quality application with a rich feature set isn't enough in all cases; increasingly, the application must also meet high availability criteria. Have you avoided taking your application to the next level because cluster technology seems too daunting to understand and use? With Microsoft® Cluster Service, introduced in Windows NT® 4.0 and available in the Windows Server 2003 family, developers have at their disposal straightforward tools to deploy applications in a clustered environment. These include the ability to enlist any application in a cluster as a generic application, and the ability to control application configuration by means of Windows scripting.

A cluster connects two or more servers together so that they appear as a single computer to clients. Connecting servers in a cluster allows for workload sharing, enables a single point of operation/management, and provides a path for scaling to meet increased demand. Thus, clustering gives you the ability to produce high availability applications.

This paper focuses on Cluster Service, one of three Microsoft server technologies that support clustering. We demonstrate how to easily perform a sanity check of your application within a cluster environment without having to make any changes to your application's code.

Three Technologies for Clustering

Microsoft servers provide three technologies to support clustering: Network Load Balancing (NLB), Component Load Balancing (CLB), and Microsoft Cluster Service (MSCS).

Network Load Balancing

Network Load Balancing acts as a front-end cluster, distributing incoming IP traffic across a cluster of servers, and is ideal for enabling incremental scalability and outstanding availability for e-commerce Web sites. Up to 32 computers running a member of the Windows Server 2003 family can be connected to share a single virtual IP address. NLB enhances scalability by distributing client requests across the servers in the cluster; as traffic increases, additional servers can be added, up to that 32-server limit. NLB also provides high availability by automatically detecting the failure of a server and repartitioning client traffic among the remaining servers within 10 seconds, so users continue to receive service.
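The repartitioning idea can be illustrated with a small sketch. This is not NLB's actual distribution algorithm, which this paper does not describe; the hashing scheme, the `assign` function, and the host and client names are all assumptions made for illustration.

```python
import zlib


def assign(clients, hosts):
    """Map each client address to one live host by hashing the address.

    Hypothetical scheme for illustration only; not NLB's real algorithm.
    """
    live = sorted(hosts)
    return {c: live[zlib.crc32(c.encode()) % len(live)] for c in clients}


clients = [f"10.0.0.{i}" for i in range(1, 9)]
hosts = {"web1", "web2", "web3"}

before = assign(clients, hosts)   # traffic spread over three hosts

hosts.discard("web2")             # one server fails
after = assign(clients, hosts)    # traffic repartitioned over the survivors
```

After the failed host is removed, every client is still mapped to a host, and none is mapped to the failed one.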

Component Load Balancing

Component Load Balancing distributes workload across multiple servers running a site's business logic. It provides for dynamic balancing of COM+ components across a set of up to eight identical servers. In CLB, the COM+ components live on servers in a separate COM+ cluster. Calls to activate COM+ components are load-balanced to different servers within the COM+ cluster. CLB complements both NLB and Cluster Service by acting on the middle tier of a multi-tiered clustered network. CLB is provided as a feature of Application Center 2000. Both CLB and Microsoft Cluster Service can run on the same group of machines.

Cluster Service

Cluster Service acts as a back-end cluster; it provides high availability for applications such as databases, messaging, and file and print services. MSCS attempts to minimize the effect of failure on the system when any node (a server in the cluster) fails or is taken offline.

Figure 1. Three Microsoft server technologies support clustering

Failover Capability Through Microsoft Cluster Service

MSCS failover capability is achieved through redundancy across the multiple connected machines in the cluster, each with independent failure states. Redundancy requires that applications be installed on multiple servers within the cluster. However, an application is online on only one node at any point in time. If that application fails, or its server is taken down, the application is restarted on another node. Windows Server 2003, Datacenter Edition supports up to 8 nodes in a cluster.

Each node has its own memory, system disk, operating system and subset of the cluster's resources. If a node fails, another node takes ownership of the failed node's resources (this process is known as "failover"). Microsoft Cluster Service then registers the network address for the resource on the new node so that client traffic is routed to the system that is available and now owns the resource. When the failed node is later brought back online, MSCS can be configured to redistribute resources and client requests appropriately (this process is known as "failback"). To allow the application to resume at the point at which failover occurred, the nodes must have access to shared storage where application state is maintained.
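The following sketch models failover and failback as changes of resource ownership. It is a toy model under simplifying assumptions, not the MSCS implementation: the `Cluster` class, the notion of a single "preferred" node per resource, and the node and resource names are hypothetical.

```python
class Cluster:
    """Toy model of failover and failback; not the MSCS implementation."""

    def __init__(self, nodes):
        self.up = {name: True for name in nodes}   # node name -> is it running?
        self.owner = {}                            # resource -> current owner node
        self.preferred = {}                        # resource -> preferred owner node

    def add_resource(self, resource, preferred_node):
        self.preferred[resource] = preferred_node
        self.owner[resource] = preferred_node

    def fail_node(self, node):
        """Mark a node as down and move its resources to a surviving node."""
        self.up[node] = False
        survivors = [n for n, alive in self.up.items() if alive]
        if not survivors:
            raise RuntimeError("no surviving node to fail over to")
        for resource, owner in self.owner.items():
            if owner == node:
                self.owner[resource] = survivors[0]   # failover

    def restore_node(self, node, failback=True):
        """Bring a node back; optionally return the resources that prefer it."""
        self.up[node] = True
        if failback:
            for resource, preferred in self.preferred.items():
                if preferred == node:
                    self.owner[resource] = node       # failback


cluster = Cluster(["NodeA", "NodeB"])
cluster.add_resource("Database", "NodeA")
cluster.add_resource("FileShare", "NodeB")

cluster.fail_node("NodeA")
after_failover = dict(cluster.owner)    # both resources now owned by NodeB

cluster.restore_node("NodeA")
after_failback = dict(cluster.owner)    # Database returns to NodeA
```

In this model, at every point each resource has exactly one owner, which is also the essence of the shared-nothing model.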

Note that Microsoft Cluster Service is designed to provide high availability, rather than true fault tolerance. The phrase "fault tolerant" is generally used to describe technology that offers a higher level of resilience and recovery. Fault-tolerant servers typically use a high degree of hardware or data redundancy, combined with specialized software, to provide near-instantaneous recovery from any single hardware or software fault. These solutions cost significantly more than a clustering solution because you must pay for redundant hardware that waits idly for a fault from which to recover. Microsoft Cluster Service provides a very good high-availability solution using standard, inexpensive hardware, while it maximizes computing resources.

Microsoft Cluster Service is based on the shared-nothing clustering model. The shared-nothing model dictates that while several nodes in the cluster may have access to a device or resource, the resource is owned and managed by only one system at a time. (In an MSCS cluster, a resource is defined as any physical or logical component that can be brought online and taken offline, managed in a cluster, hosted by only one node at a time and moved between nodes.)

Figure 2. Microsoft Cluster Service

Cluster Service Architecture

Microsoft Cluster Service consists of three key components: the Cluster Service, the Resource Monitor and resource DLLs. Additionally, Cluster Administrator can be extended with Extension DLLs that add management capability.

The Cluster Service

The Cluster Service is the core component and runs as a high-priority system service. The Cluster Service controls cluster activities and performs such tasks as coordinating event notification, facilitating communication between cluster components, handling failover operations and managing the configuration. Each cluster node runs its own Cluster Service.

The Resource Monitor

The Resource Monitor is an interface between the Cluster Service and the cluster resources, and runs as an independent process. The Cluster Service uses the Resource Monitor to communicate with the resource DLLs. The DLL handles all communication with the resource, so hosting the DLL in a Resource Monitor shields the Cluster Service from resources that misbehave or stop functioning. Multiple copies of the Resource Monitor can be running on a single node, thereby providing a means by which unpredictable resources can be isolated from other resources.

When the Cluster Service needs to perform an operation on a resource, it sends the request to the Resource Monitor assigned to that resource. If the Resource Monitor does not have a DLL in its process that can handle that type of resource, it uses the registration information to load the DLL associated with that resource type. It then passes the Cluster Service's request to one of the DLL's entry point functions. The resource DLL handles the details of the operation so as to meet the specific needs of the resource.
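The request flow just described (look up the handler for a resource type, load it on first use, then call one of its entry points) can be sketched as follows. This is a conceptual model only: real resource DLLs are native DLLs that implement the Resource API, and the `ResourceMonitor` and `GenericAppHandler` classes and their method names are hypothetical.

```python
class GenericAppHandler:
    """Stand-in for a resource DLL; the entry point names are illustrative."""

    def online(self, name):
        return f"{name}: online"

    def offline(self, name):
        return f"{name}: offline"


class ResourceMonitor:
    def __init__(self, registry):
        self.registry = registry   # resource type -> handler class (registration info)
        self.loaded = {}           # resource type -> handler instance already loaded

    def handle(self, resource_type, resource_name, operation):
        # Load the handler for this resource type on first use only.
        if resource_type not in self.loaded:
            self.loaded[resource_type] = self.registry[resource_type]()
        # Pass the request to the matching entry point.
        entry_point = getattr(self.loaded[resource_type], operation)
        return entry_point(resource_name)


monitor = ResourceMonitor({"Generic Application": GenericAppHandler})
first = monitor.handle("Generic Application", "Notepad", "online")
handler_after_first = monitor.loaded["Generic Application"]
second = monitor.handle("Generic Application", "Notepad", "offline")
```

The second request reuses the handler loaded by the first one, mirroring how a Resource Monitor loads a resource DLL only when it does not already have one for that resource type.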

The Resource DLL

The third key Microsoft Cluster Service component is the resource DLL. The Resource Monitor and resource DLL communicate using the Resource API, which is a collection of entry points, callback functions and related structures and macros used to manage resources.

To the Cluster Service, a resource is any physical or logical component that can be managed. Examples of resources are disks, network names, IP addresses, databases, Web sites, application programs and any other entity that can be brought online and taken offline. Resources are organized by type. Resource types include physical hardware (such as disk drives) and logical items (such as IP addresses, file shares and generic applications).

Every resource uses a resource DLL, a largely passive translation layer between the Resource Monitor and the resource. The Resource Monitor calls the entry point functions of the resource DLL to check the status of the resource and to bring the resource online and offline. The resource DLL is responsible for communicating with its resource through any convenient IPC mechanism to implement these methods.

Applications that implement their own resource DLLs to communicate with the Cluster Service and that use the Cluster API to request and update cluster information are defined as cluster-aware applications. Applications and services that do not use the Cluster or Resource APIs and cluster control code functions are unaware of clustering and have no knowledge that Cluster Service is running. These cluster-unaware applications are generally managed as generic applications or services.

Both cluster-aware and cluster-unaware applications run on a cluster node and can be managed as cluster resources. However, only cluster-aware applications can take advantage of features offered by Cluster Service through the Cluster API. Developing a cluster-aware application requires building a custom resource type. The custom resource type allows developers to enable their application to respond and react as necessary as various events occur within the cluster (for example, the node is going offline, therefore close the database connection).

For most applications that are required to run in a cluster, it will be desirable to invest the time and resources to develop a custom resource type. However, applications can be initially tested within a cluster environment without making modifications to the application's code or creating a new resource type. Within the Windows Server 2003 family, unmodified applications can participate at a basic level as "cluster-unaware" applications. The Cluster Service provides a Generic Application resource type specifically for this purpose.

Cluster Administrator Extension DLL

Cluster Administrator Extension DLLs provide application-specific management features from within the Cluster Administrator, allowing users to manage their applications in the same manner whether the application is running inside or outside a cluster. Developers can provide application management features within the Cluster Administrator framework, or simply link to their existing management tool.

Developers extend the functionality of Cluster Administrator by writing an extension DLL. The Cluster Administrator application communicates with extension DLLs via a defined set of COM interfaces. Extension DLLs must implement a specific set of interfaces and be registered on each node of the cluster.

Figure 3. Key components: Cluster Service, Resource Monitor and Resource DLLs

A Cluster-Unaware Application

Applications or services that do not provide their own resource DLLs can still be configured into the cluster environment. Cluster Service in the Windows Server 2003 family includes generic resource DLLs for just this purpose: the Generic Application resource DLL and the Generic Service resource DLL. The Cluster Service treats these applications or services as generic, cluster-unaware applications or services.

The Generic resource DLL provides only very basic control. For example, the Generic Application resource DLL checks for application failure by determining whether the application's process still exists and takes the application offline by terminating the process. But it has no dependencies on other resources, and it provides a straightforward way to test an application in a clustered environment.
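To make this level of control concrete, here is a minimal sketch of process-based monitoring: the application counts as alive while its process exists, and it is taken offline by terminating that process. The `GenericApplication` class is hypothetical and only mimics the behavior described above; it is not the Generic Application resource DLL.

```python
import subprocess
import sys


class GenericApplication:
    """Toy model of process-based health checking."""

    def __init__(self, command_line):
        self.command_line = command_line
        self.process = None

    def online(self):
        # Bringing the resource online means launching the process.
        self.process = subprocess.Popen(self.command_line)

    def is_alive(self):
        # poll() returns None while the process is still running.
        return self.process is not None and self.process.poll() is None

    def offline(self):
        # Taking the resource offline means terminating the process.
        if self.process is not None:
            self.process.terminate()
            self.process.wait()


# A child Python process that sleeps stands in for the application.
app = GenericApplication([sys.executable, "-c", "import time; time.sleep(60)"])
app.online()
alive_while_running = app.is_alive()
app.offline()
alive_after_offline = app.is_alive()
```

Note what this check cannot see: a process that is hung or has lost its network connection still exists, so it would still be reported as alive.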

High Availability Notepad

Not all applications will function effectively in a cluster. The most effective means of evaluation is to actually deploy the application in a cluster. The simplest mechanism for performing an initial litmus test is to enlist your application into a cluster using the built-in Generic Application resource type. The Generic Application resource type is provided as part of the Cluster Service in the Windows Server 2003 family and can be seen with the other built-in resource types by viewing the Resource Types node under Cluster Configuration in the Cluster Administrator tool (see Figure 4).

Figure 4. Resource Types node in Cluster Administrator tool

Cluster Administrator has interactive wizards to allow you to create resources for any of the listed resource types. But the Cluster Service also offers COM interfaces to allow for programmatically creating and administering resources.

Note   The latest Cluster Administrator tool and associated development resources may be obtained from the Platform SDK.

Cluster Automation Server

The Cluster Automation Server provides a set of Automation objects that expose a complete cluster management interface to scripting languages, allowing you to develop Web-based remote administration tools. The Cluster Automation Server simplifies and enhances the process of creating a cluster management application.

The object-oriented nature of Cluster Automation Server means that almost all cluster programming tasks will involve the following steps:

  1. Determine the cluster operation you need to perform.
  2. Find the Cluster Automation Server object that has a property or method to accomplish the operation.
  3. Determine how to obtain the object in Step 2. The Object Hierarchy available on MSDN® is useful for this.
  4. Obtain the object and invoke the property or method.

To illustrate, we will use Windows Script Host and Microsoft VBScript to programmatically create our Generic Application resource.

The Cluster Object

The Cluster Object is a top-level object, allowing new instances to be created. The ProgID for the Cluster Object is "MSCluster.Cluster":

    Set oCluster = CreateObject( "MSCluster.Cluster" )

Opening the Cluster

Before using any method on a cluster, a connection to that cluster must be opened. The Open method opens a connection to a cluster. Passing an empty string ("") as a parameter to the Open method opens a connection to the cluster on the localhost. Our script will operate on the localhost server:

    oCluster.Open( "" )

Creating a Group

A cluster group is a container for cluster resources. When one resource in a group fails and it is necessary to move the resource to another node, all resources in the group are moved. Groups also define dependency boundaries. A resource can only establish a dependency on another resource if that resource is in the same group. For our test, we will create a unique group called "High Availability Notepad":

    Set oGroup = oCluster.ResourceGroups.CreateItem( "High Availability Notepad" )
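The two grouping rules described above (a group moves as a unit, and dependencies cannot cross group boundaries) can be modeled in a few lines. This is a conceptual sketch, not the Cluster Service's data structures; the `Group` class, the node names, and the "Scratch Disk" resource are hypothetical.

```python
class Group:
    """Toy model of a cluster group."""

    def __init__(self, name, owner_node):
        self.name = name
        self.owner_node = owner_node
        self.dependencies = {}   # resource name -> set of resources it depends on

    def add_resource(self, resource):
        self.dependencies[resource] = set()

    def add_dependency(self, resource, depends_on):
        # Both ends of a dependency must live in this group.
        if resource not in self.dependencies or depends_on not in self.dependencies:
            raise ValueError("dependencies cannot cross group boundaries")
        self.dependencies[resource].add(depends_on)

    def move_to(self, node):
        # Moving the group moves every resource it contains.
        self.owner_node = node

    def owner_of(self, resource):
        if resource not in self.dependencies:
            raise KeyError(resource)
        return self.owner_node


notepad_group = Group("High Availability Notepad", owner_node="NodeA")
notepad_group.add_resource("Notepad")
notepad_group.add_resource("Scratch Disk")           # hypothetical second resource
notepad_group.add_dependency("Notepad", "Scratch Disk")

try:
    # "Other Group Disk" is not in this group, so this must be rejected.
    notepad_group.add_dependency("Notepad", "Other Group Disk")
    cross_group_allowed = True
except ValueError:
    cross_group_allowed = False

notepad_group.move_to("NodeB")
owners = {r: notepad_group.owner_of(r) for r in notepad_group.dependencies}
```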

Creating the Resource

Each group has a collection of resources. The CreateItem method creates a new resource and adds it to a group's collection. In our example, we will create a resource named "Notepad" of resource type "Generic Application":

    Set oGroupResources = oGroup.Resources
    Set oResource = oGroupResources.CreateItem( "Notepad", "Generic Application", 0 )

Setting the Resource Properties

Each Generic Application resource has two properties that are essential for bringing the resource online: CommandLine and CurrentDirectory. CommandLine contains the command that will be executed when the resource is brought online; CurrentDirectory specifies the file system directory from which the command will be executed. When this script executes the statement to bring the resource online, Notepad will be launched. In order to see Notepad, we have also set the InteractWithDesktop property to 1.

    Set oProperties = oResource.PrivateProperties

    ' Set the properties of the Generic Application
    oProperties.Item("CommandLine") = "notepad" 
    oProperties.Item("CurrentDirectory") = "c:\"
    oProperties.Item("InteractWithDesktop") = 1

    oProperties.SaveChanges

Bringing the Resource Online

The Online method brings a resource online. Online is a state that describes the resource as being available to the cluster. For our Generic Application, bringing the resource online means launching Notepad.

    oResource.Online 10

Full Script Listing

Option Explicit

Main
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' Main subroutine.
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Sub Main

    Dim oGroup
    Dim oCluster
    Dim oResource

    ' Create the Cluster object.
    Set oCluster = CreateObject( "MSCluster.Cluster" )

    ' Open the cluster. Empty string means open the local cluster.
    oCluster.Open( "" )

    ' Create the group.
    AddGroup oCluster, oGroup

    ' Create the resource and set its properties.
    AddResource oGroup, oResource

    ' Bring the resource online and wait for up to 10 seconds
    ' for it to come online.
    oResource.Online 10

End Sub

'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' This subroutine will create the group.
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Sub AddGroup( oCluster, oGroup )
    Set oGroup = oCluster.ResourceGroups.CreateItem( "High Availability Notepad" )
End Sub

'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' This subroutine will add the resource to the group.
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Sub AddResource( oGroup, oResource )
    Dim oGroupResources
    Dim oProperties

    Set oGroupResources = oGroup.Resources

    ' The third argument, 0, is CLUSTER_RESOURCE_DEFAULT_MONITOR.
    Set oResource = oGroupResources.CreateItem( "Notepad", "Generic Application", 0 )
    Set oProperties = oResource.PrivateProperties

    ' Set the properties of the Generic Application
    oProperties.Item("CommandLine") = "notepad" 
    oProperties.Item("CurrentDirectory") = "c:\"
    oProperties.Item("InteractWithDesktop") = 1

    oProperties.SaveChanges
End Sub

'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' Wait for a specified number of seconds (busy-wait helper; not
' called by Main).
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Sub Sleep( PauseTime )
    Dim Start

    Start = Timer

    Do While Timer < Start + PauseTime
    Loop

End Sub

Running this simple script creates our new "Notepad" resource and places it into its own group ("High Availability Notepad").

Verifying the Results

We can verify our results by using Cluster Administrator. We can see that the parameters have been set correctly by looking at the properties of our Notepad resource in Cluster Administrator (see Figure 5).

Figure 5. Notepad resource parameters

Looking at the Advanced tab, you see default properties indicating that the application should be restarted up to 3 times upon failure. The LooksAlive and IsAlive polling intervals default to the values from the resource type, but may be overridden by specifying a different value. Because this application has no special code to make it cluster-aware, it is determined to be alive only by the presence of its process running in the system.

Figure 6. Notepad resource Advanced tab
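The restart setting on the Advanced tab amounts to a simple counting policy, sketched below. This is a simplified model under stated assumptions: the `RestartPolicy` class is hypothetical, it ignores any time window over which restarts are counted, and "give up" stands in for whatever further action the cluster's policy prescribes.

```python
class RestartPolicy:
    """Toy model: restart on failure until a threshold is reached."""

    def __init__(self, threshold=3):
        self.threshold = threshold   # maximum number of restarts allowed
        self.restarts = 0            # restarts performed so far

    def on_failure(self):
        if self.restarts < self.threshold:
            self.restarts += 1
            return "restart"
        return "give up"


policy = RestartPolicy(threshold=3)
actions = [policy.on_failure() for _ in range(4)]
# The first three failures trigger a restart; the fourth does not.
```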

Testing the Application

When you bring the Notepad resource online, Notepad will be launched on the server. If the Notepad process is terminated, it will immediately be launched again. That's the Cluster Service in action, attempting to keep your application up and running. Because Notepad is managed as a Generic Application resource, the Cluster Service notices whenever the application's process is no longer running and takes corrective action based upon policy.

What if our application failed in a way that didn't result in the process terminating (for example, network failure, hanging or background thread termination)? Unfortunately, with the Generic Application resource type you only get generic failure detection. Most developers writing applications that will run in a clustered environment will prefer to produce custom resource DLLs to handle application-specific issues. But for a quick way to evaluate your application in a cluster, the Generic Application resource type can't be beat.

Conclusion

Microsoft Cluster Service provides high availability using standard, inexpensive hardware, while it maximizes computing resources. The Cluster Service in the Windows Server 2003 family provides powerful tools for making your applications highly available. Jumping into writing cluster-aware applications may appear too costly or intimidating for some developers. To expose developers to the benefits of clustering with little upfront investment, Cluster Service provides a Generic Application resource type, which allows applications to be easily configured to run within a cluster. Though the Generic Application resource type may not provide the sophistication required for a production application, it can provide one litmus test to see how your application performs within a cluster.