Section 6: Going Live: Instrumentation, Testing, and Deployment

 

Brian Travis
Architag International Corporation

November 2003

Applies to:
    Microsoft® .NET Framework
    Microsoft Visual Studio® .NET 2003

Summary: Learn how to make performance information known to the deployment and operations personnel of the FoodMovers system and learn how to test and deploy the system. (38 printed pages)

To see an overview of the entire project, read FoodMovers: Building Distributed Applications using Microsoft Visual Studio .NET.

Contents

The Monitoring Process
Exception Management
Managing Exceptions in Web Services
Event Management
Instrumenting Distributed Applications
WMI Architecture
Performance Monitoring
Exception Mining
EIF (Enterprise Instrumentation Framework)
Testing, Testing, Testing
Physical Architecture, Deployment, and Operational Requirements
Third Party Installation Tools
Conclusion
Unfinished Business

If you're reading this section, I'm guessing that you are the kind of person who likes information. I bet that one of the things you look for in a new car is what kind of information can you get from the operation of the vehicle. Your grandmother is satisfied with that light that indicates low oil pressure, but you want to know exactly what the pressure is right now, and how it varies based on what kinds of stress you put on the engine. You want to know what the RPM is so you can shift effectively. You want to know if the trunk is unlocked, the compass direction you are heading, and the name of the song playing on the radio.

Imagine driving a car that had few or no instruments. Sure, it would get you down the road, but you would not be able to get the best performance out of it, and you might not know that something is going wrong until it breaks.

Now, imagine driving a car that gives you too much information. BMW has been working on a system that provides the driver with just about every informational aspect of the operation of the car. All information is funneled to a display and accessed using a little joystick that is mounted near the driver. BMW has been simultaneously cheered and jeered for this concept. While gadget freaks love the idea, others say that it provides more information than a driver needs in order to go down the road, requiring more attention than might be safe.

And so it is with deploying distributed applications. In deploying the FoodMovers system, we need to get information about the operations of the system on a real-time basis. We need to know if there are bottlenecks or security breaches. We need to know if there are data anomalies or if a server is down. We need a lot of information from our system to make sure it is running in peak condition.

Wouldn't it be nice to have some kind of dashboard, filled with instruments that gave us the information we need? Then we could make adjustments before they become problems. We could tune the system based on loads and network slowdowns before our users notice that there is trouble.

Getting to a dashboard requires a whole-system approach. If you buy a car that does not have a tachometer and you want to install one, you can't just buy it and stick it on the dashboard. Well, you can, but it wouldn't give you the information you want. Rather, you need to hook it into some part of the engine that provides the raw data. In the case of a tachometer, you need to attach it to the distributor coil for older cars, or the electronic ignition or computer for newer cars.

In order to provide a dashboard for computer systems, you need to think about it from the very beginning of your system design. The architects designing the system must build system monitoring into the architecture, just like automobile designers specify how to collect information and report it to car dashboard instruments.

Monitoring performance requires that every process be written with monitoring in mind. It requires the use of several different technologies, which all come together to provide a picture of the status of the system. This philosophy is called "instrumentation."

In this section, I will discuss the issues involved with making performance information known to the deployment and operations personnel. This includes code-level considerations, event monitoring issues, and overall tracking of the health of the system.

I will also discuss testing of the system. Once we have the bits in place to monitor our system's general and specific health, we can subject it to real-life and worst-case scenarios and see how it performs.

The Monitoring Process

When deploying a system, it is important to provide enough information to the operations personnel to allow them to assure that the system is running properly. Developing a monitoring process flow is similar to the Architect...Design...Implement...Deploy model we followed in developing the system originally.

A suggested process for monitoring our application is shown in Figure 1.


Figure 1. The monitoring process flow

The monitoring process starts with the analysis phase. In analysis, developers and operators set up some rules for the performance and the expected behaviors of the application. A strategy to monitor the application is determined.

Then the instrumentation phase starts. In instrumentation, variables such as performance counters, exceptions, and event logs are collected and coded to be exposed.

The next phase is deciding which common services technologies to use. In this phase, WMI is the main technology for monitoring applications, objects, services, and other software components. In addition to WMI, application management servers such as Microsoft® Operations Manager (MOM), Application Center Server (AC), and Systems Management Server (SMS) can be used. I will discuss these tools later in this section.

In the development phase, these strategies are developed and implemented for application instrumentation and monitoring. The WMI SDK, managed code, and Microsoft Visual Studio® .NET provide tools and APIs to instrument and monitor applications. I will also be discussing these in this section.

The last step is the management phase. Instrumentation and monitoring tools and custom code are deployed. As errors and exceptions occur, the WMI providers publish events and WMI subscribers receive notifications.

Exception Management

When we designed our system in the beginning, we developed a philosophy on exception management. This philosophy is important in guiding our programmers as they develop programs.

One of the facets of the philosophy is that there is no such thing as a user error. This means that if a user creates an error, it is the programmer's fault. If, for example, a field in a database is numeric and the corresponding field on a form is alphanumeric, the user is able to enter non-numeric data. If this causes a problem on a database insert, don't blame the user. In this case, the programmer needs to detect that the user entered a non-numeric value or, better, prevent the user from entering non-numeric data in the first place.

Another tenet of the exception management philosophy is the concept of a hierarchy of exceptions. I see three different classes of exceptions, each of which is handled differently:

  • Exceptions that are dealt with by the program
  • Exceptions that discover or will cause an inconsistency in data
  • Exceptions that will render the system unusable

Exception management is handled in many different ways. The easiest way is the trapping of exceptions at run-time with .NET exception handlers. This is done using try and catch.

Any time there is a possibility that an exception will be generated by the system, the programmer should place a try around it.

try
{
   // Some code that can throw an exception.
}
catch(SomeException exception)
{
   // Code to react to some exception from 
   // the code in the try block
}
catch(SomeOtherException exception)
{
   // Code to react to some other exception 
   // from the code in the try block
}
catch
{
   // Code that catches any other exception
   // generated by the code in the try block
}
finally
{
   // Code that gets run always, whether or not an
   // exception was thrown. This is usually clean up
   // code that should be executed regardless of
   // whether an exception has been thrown.
}

We catch exceptions only when we need to do something, such as gathering information for logging, adding relevant information to the exception, executing cleanup code, or attempting to recover from the exception. If we don't need to do any of these things, we shouldn't catch the exception at all. Rather, the code should allow it to propagate back up the call stack. This keeps the code clean, because we catch only the exceptions that need to be handled within the scope of a particular method and allow all others to continue to propagate.

The program should handle exceptions it was designed to handle. For example, if a TCP connection times out, it will be caught. Code in the catch can increase the timeout and retry.
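For illustration, here is a minimal sketch of that catch-and-retry idea, assuming an HTTP call to a hypothetical service URL; the timeout value and retry count are arbitrary and would normally come from configuration.

// A sketch only: the URL, timeout, and retry count are illustrative.
using System;
using System.IO;
using System.Net;

class RetryExample
{
   static string FetchWithRetry(string url)
   {
      int timeout = 5000;                              // milliseconds
      for (int attempt = 1; ; attempt++)
      {
         try
         {
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
            req.Timeout = timeout;
            using (WebResponse resp = req.GetResponse())
            using (StreamReader reader = new StreamReader(resp.GetResponseStream()))
               return reader.ReadToEnd();
         }
         catch (WebException e)
         {
            // Retry only on timeouts, and give up after three attempts.
            if (e.Status != WebExceptionStatus.Timeout || attempt >= 3)
               throw;
            timeout *= 2;                              // back off and try again
         }
      }
   }
}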

Some exceptions should be reported to the operations personnel based on some criteria. For example, if someone tries to log on and enters the wrong password, it is not really a problem. However, if a login attempt is repeated ten times in two minutes with the same user id and a different password, we can assume that there is a break-in attempt and an administrator should be informed.

It is also possible to throw our own exceptions. Throwing exceptions is like generating our own problems. For example, a buyer can enter an invalid SKU. This is not considered an exception, but the application should handle it gracefully and raise a message for the user. However, a critical exception occurs when a database is not available, or if a required Web service is down. Remember that exceptions shouldn't be used to control the usual flow of the application, such as enforcing a business rule. Too many exceptions also result in unmanageable code.

There are three main ways to propagate exceptions:

  • We can let the exception propagate automatically, leaving it to code further up the call stack to deal with. If this happens, the current method doesn't need to know that an exception even happened.

    In this case, the application has control to move immediately from the current code block up the call stack until a catch block with a filter that matches the exception type is found.

  • We can catch and rethrow the exception. With this approach, we catch and react to the exception, and clean up or perform any other required processing within the scope of the current method. If we cannot recover from the exception, we rethrow the same exception to the caller.

  • Third, we can catch, wrap, and throw the wrapped exception. As an exception propagates up the call stack, the exception type becomes less relevant. When an exception is wrapped, a more relevant exception can be returned to the caller. If you cannot recover, wrap the exception in a new exception, and throw the new exception back to the caller. The InnerException property of the Exception class explicitly allows you to preserve a previously caught exception. This allows the original exception to be wrapped as an inner exception inside a new and more relevant outer exception. The InnerException property is set in the constructor of an exception class.

    try
    {
       // Some code that could throw an exception.
    }
    catch(TypeAException e)
    {
       // Code to do any processing needed.
       // Rethrow the exception
       throw;
    }
    catch(TypeBException e)
    {
       // Code to do any processing needed.
    
       // Wrap the current exception in a more relevant
       // outer exception and rethrow the new exception.
        throw new TypeCException(strMessage, e);
    }
    finally
    {
       // Code that gets executed regardless of whether
       // an exception was thrown.
    }
    

    Another alternative is to create custom exception classes. We can create custom exceptions based on the ApplicationException class.
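
    For example, a minimal sketch of a FoodMovers-style custom exception might look like this; the class name and constructors are illustrative, not part of the actual FoodMovers code:

    using System;

    public class OrderProcessingException : ApplicationException
    {
       public OrderProcessingException() {}

       public OrderProcessingException(string message)
          : base(message) {}

       // Passing the original exception to the base constructor preserves it
       // in InnerException, so the full chain is available when logged.
       public OrderProcessingException(string message, Exception inner)
          : base(message, inner) {}
    }

    A caller that cannot recover would then wrap and rethrow, for example: throw new OrderProcessingException("Order update failed", e);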

Managing Unhandled Exceptions in ASP.NET

Another facet of our exception management philosophy is that all exceptions will be handled. It is completely unacceptable for an application to bomb because an exception was not caught. As I mentioned earlier, exceptions should be caught at the lowest level and propagated up the call stack until some process fixes it.

Even though we strive to catch all errors, it is difficult to anticipate everything that could happen. For this reason, .NET provides several different methods for gracefully reporting that an otherwise unhandled exception has occurred.

In Microsoft ASP.NET, it is possible to specify a page that should be loaded if an unhandled exception is found in a particular page. Code is placed in the application's Web.config file to redirect in case of certain errors.

<customErrors defaultRedirect="https://FoodMovers.com/error.aspx" 
  mode="On">   
  <error statusCode="500" redirect="/errorpages/servererror.aspx" />
  <error statusCode="404" redirect="/errorpages/filenotfound.htm" />
</customErrors>

It is also possible to indicate on an ASPX page that another page is to be loaded in case of an unhandled exception by adding a @ Page directive.

<%@ Page ErrorPage="customerror.aspx" %>

Each page also has a Page.Error event that is raised whenever there is an unhandled exception on a page. Here's how this exception can be handled:

Page.Error += new System.EventHandler(Page_Error);
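
A handler in the page's code-behind class might then look like the following sketch. The redirect target is illustrative; Server.ClearError stops the exception from propagating any further (to Application_Error or the customErrors redirect).

private void Page_Error(object sender, System.EventArgs e)
{
   Exception exc = Server.GetLastError();
   // Log the exception details here, then clear the error so that
   // ASP.NET does not also apply the default error handling.
   Server.ClearError();
   Response.Redirect("customerror.aspx");
}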

Finally, it is possible to trap everything globally in your application. This should be considered a last-ditch effort, as trapping an error at this high level is not nearly as helpful as trapping at a lower level. Add code to the Global.asax file that handles otherwise unhandled exceptions.

protected void Application_Error(Object sender, EventArgs e)
{
      Exception exc = Server.GetLastError();
      // Perform logging, send any notifications, and so on.
}

Tracing

Exception management is useful for trapping and correcting or reporting errors. The problem is that an exception that writes an event to a log might appear out of context. It would be nice to know what the system was doing immediately before the error occurred in order to get a better idea of what exactly went wrong.

VSEA and the Microsoft .NET Framework provide Trace and Debug classes to help trace the execution of the code. With the Trace and Debug classes, we can record information about errors and how the application executes to logs, text files, or other devices so that these logs can be analyzed later.

When using Trace, you must have a mechanism for collecting and recording the messages that are sent. Trace messages are received by listeners. The purpose of a listener is to collect, store, and route tracing messages. Listeners direct the tracing output to an appropriate target, such as a log, window, or text file.

System.IO.FileStream myTraceLog = new 
   System.IO.FileStream("C:\\myTraceLog.txt", 
   System.IO.FileMode.OpenOrCreate);
System.Diagnostics.TextWriterTraceListener myListener = 
   new System.Diagnostics.TextWriterTraceListener(myTraceLog);
System.Diagnostics.Trace.Listeners.Add(myListener);

Trace.AutoFlush = true;
Trace.WriteLine("Entering Store Order Interface");
Trace.Indent();
...
Trace.Unindent();
Trace.WriteLine("Exiting Store Order Interface");

This just opens a file on the root drive. In a real system, you would probably create a listener that sends data to whatever site management package you are using.

ASP.NET also provides sophisticated tracing that is viewable as HTML output. The TraceContext class is available through the Trace property of the Page class.

Like exception handling, trace information can be created inside your application, in this case by using Trace.Write and Trace.Warn. Page-level tracing is controlled using the Trace attribute of the @ Page directive.
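
For example, a page might enable tracing in its directive and write categorized messages from its code-behind; the category and message text here are illustrative:

<%@ Page Trace="true" %>
...
// In the page's code-behind:
Trace.Write("OrderInterface", "Looking up supplier prices");
Trace.Warn("OrderInterface", "Price list is more than 24 hours old");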

Application-level tracing can be controlled using settings within the application's Web.config file.

<configuration>
   <system.web>
      <trace enabled="true" requestLimit="20" pageOutput="true"/>
   </system.web>
</configuration>

The TRACE compilation constant is defined by default in both debug and release build configurations in Visual Studio .NET, so Trace statements are compiled into both builds.

Managing Exceptions in Web Services

Exception handling for Web services in Visual Studio.NET is done according to the W3C standard SOAP specification. The .NET Framework provides an error class called SoapException. Unhandled exceptions from a Web service result in a SoapException.

A SoapException can occur for two different reasons:

  • The SOAP message is malformed.
  • The Web service is down.

The SoapException class uses the ClientFaultCode, MustUnderstandFaultCode, and VersionMismatchFaultCode fault codes when there is an exception or an error due to the request message format or message content. For example, when FoodMovers receives a malformed SOAP message from Hearty Soup Company, the .NET Framework raises a SoapException.

When a Web service method throws any unhandled exception, the exception is repackaged as a SoapException and is returned to the Web service client via the SOAP response. This ensures interoperability.

Another built-in error mechanism is the ServerFaultCode field in the SoapException class. ServerFaultCode is used for errors that don't involve message format or content, but rather errors in the processing of the request. For example, when Hearty Soup Company sends a message to the FoodMovers Update Manager Web service and the Web service is down due to network problems, a SoapException with ServerFaultCode is returned.

The important point is that no matter why the exception has happened in the Web service, the SoapException class sends an error message back to the client in the standard format that any application that supports the W3C SOAP specification can understand.

The SOAP response that the server runtime creates is in a standard SOAP format which allows interoperability among different, non-.NET clients to detect that an error has occurred. On a .NET-based client, the Framework captures the SOAP message and deserializes the SoapException so that the client can detect exceptions through standard exception handling techniques, just as it would for a non-Web service call. However, the exception that is detected on the client is a SoapException, not the exception type that was originally thrown by the Web service.

For Web services that are exposed to external companies and systems, some of the exceptions need to be kept internal rather than exposing all of the detail to the external partner. We throw an exception to FoodMovers' external partners so that they can react. However, the details of the exception are kept internal and written to a log file.
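
Here is a minimal sketch of that approach (not the actual FoodMovers code): the Web method name and the data access call are hypothetical, the full detail is written to the event log, and only a generic SoapException with ServerFaultCode is returned to the partner.

using System;
using System.Diagnostics;
using System.Web.Services;
using System.Web.Services.Protocols;

public class UpdateManager : WebService
{
   [WebMethod]
   public void UpdateItem(string itemXml)
   {
      try
      {
         SaveToDatabase(itemXml);               // hypothetical internal call
      }
      catch (Exception e)
      {
         // Full detail stays in the internal log (event source registered elsewhere)...
         EventLog.WriteEntry("FoodMovers", e.ToString(), EventLogEntryType.Error);
         // ...while the partner sees only a generic server fault.
         throw new SoapException("The update could not be processed.",
            SoapException.ServerFaultCode);
      }
   }

   private void SaveToDatabase(string itemXml)
   {
      // Placeholder for the real data access code.
   }
}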

When the exception is not handled by the application domain, the UnhandledExceptionEventHandler delegate can be used. The UnhandledExceptionEventHandler delegate can be used as a means to handle exceptions that are unhandled by your application code. The method associated with the UnhandledExceptionEventHandler delegate will be executed when an unhandled exception occurs.
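
As a hedged sketch, here is how the delegate could be wired in a non-ASP.NET host process, such as a console application or Windows service; the event source name and logging choice are illustrative.

using System;
using System.Diagnostics;

class Host
{
   static void Main()
   {
      AppDomain.CurrentDomain.UnhandledException +=
         new UnhandledExceptionEventHandler(OnUnhandledException);
      // ... run the application ...
   }

   static void OnUnhandledException(object sender, UnhandledExceptionEventArgs e)
   {
      // Last-chance logging before the process goes down.
      Exception exc = e.ExceptionObject as Exception;
      EventLog.WriteEntry("FoodMovers",
         exc != null ? exc.ToString() : e.ExceptionObject.ToString(),
         EventLogEntryType.Error);
   }
}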

FoodMovers ASP.NET Web applications use Application_Error, instead of the UnhandledExceptionEventHandler, as the application's last chance to handle the exception.

Presenting Information About Exceptions

In addition to catching exceptions, another important point is what information to gather so that it can be logged, researched, and corrected. We need to gather the appropriate information about the state of things when the exception happened so we can take the appropriate action.

What to do with this information depends on who will view the information.

  • End users require friendly, well-presented information that does not confuse them. When the end user is external, we need to be careful to keep internal details internal and not present them to the external user.
  • For system operators, information required to fix the problem and recover from it must be presented.
  • When the exception information is gathered for application developers, the information must be more detailed in order to help them with problem diagnosis and correction.

The details that must be captured include the date and time of the exception, the machine name, exception source, message description, thread id, and thread user. Most of this information is available through the .NET Framework: DateTime.Now, Environment.MachineName, Exception.Source, Exception.Message, Exception.StackTrace, AppDomain.GetCurrentThreadId, and Thread.CurrentPrincipal.
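
A minimal sketch of pulling these details together before logging might look like this; the output format is illustrative:

using System;
using System.Threading;

class ExceptionDetail
{
   public static string Format(Exception e)
   {
      // Collect the who/what/where/when of the exception in one string.
      return String.Format(
         "Time: {0}\r\nMachine: {1}\r\nSource: {2}\r\nMessage: {3}\r\n" +
         "Thread id: {4}\r\nThread user: {5}\r\nStack trace:\r\n{6}",
         DateTime.Now,
         Environment.MachineName,
         e.Source,
         e.Message,
         AppDomain.GetCurrentThreadId(),
         Thread.CurrentPrincipal.Identity.Name,
         e.StackTrace);
   }
}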

In order to log the exceptions we can use Microsoft Windows® Event Log, a central database, or a custom log file. Each of these media has advantages and disadvantages. However, .NET Framework classes make the event log easy to write to and maintain programmatically, as we will see below. Also, I will discuss monitoring and the tools related to monitoring in the following sections in further detail.

Event Management

Now that I have talked about the specifics of exception trapping and exception management, let's talk about the larger issue of managing events of importance.

Way back when Windows NT was being developed, the Event Viewer was born. This is familiar to many developers and power users. It is shown in Figure 2.


Figure 2. The Event Viewer

The Event Log was maintained internally by the operating system, and worked fine for quite a while. In fact, it was the preferred method of debugging systems. When something went wrong, we could trust that a well-written application would write details about any problems it encountered to the Event Log, where we could see it using the Event Viewer.

For tracking events on a single machine, FoodMovers uses this approach. First, we create a special area at the highest level of the hierarchy. By creating our own level, we can have sources that reflect subsystems in the FoodMovers application. This is shown in Figure 3.


Figure 3. The Event Viewer showing FoodMovers event logging area

Here is code for creating an event area and sources.

public static void InitApplication()
{
   if(!EventLog.SourceExists("FoodMovers"))
      EventLog.CreateEventSource("FoodMovers", "FoodMovers");
   if(!EventLog.SourceExists("OrderInterface"))
      EventLog.CreateEventSource("OrderInterface", "FoodMovers");
   if(!EventLog.SourceExists("GetUser"))
      EventLog.CreateEventSource("GetUser", "FoodMovers");
}

For brevity, this example only shows three sources. Once the sources are created, an Event Log writer can be invoked to write an event to a particular source.

public static void WriteEvent(SourceType iSource, 
   string strEvent, WriteEventType eventType, string data)
{
   EventLog myLog = new EventLog();
   myLog.Log = "FoodMovers";
   switch (iSource)
   {
      case SourceType.OrderInterface:
         myLog.Source = "OrderInterface";
         break;
      case SourceType.GetUser:
         myLog.Source = "GetUser";
         break;
      default:
         myLog.Source = "FoodMovers";
         break;
   }

   byte[] myRawData = new byte[250];
   for (int i = 0; i < Math.Min(data.Length, 250); i++)
      myRawData[i] = Convert.ToByte(data[i]);

   try
   {
      // Write the message, event type, and the raw data to the log.
      myLog.WriteEntry(strEvent, (EventLogEntryType)eventType, 0, 0, myRawData);
   }
   catch (Exception err)
   {
      Console.WriteLine (err.Message + "\n" + err.StackTrace);
   }
}

Then, in our application, we can access this method when something happens that needs to be reported.

public static UserData GetUser(string userID, string password)
{
   Users accUser = new Users();
   
   if (userID == null || password == null)
      return null;
   else
   {
      UserData datUser = accUser.GetUser(userID, password);
      if (datUser == null)
      {
         WriteEvent(SourceType.GetUser, "Incorrect user login", 
            WriteEventType.Warning, "userid: " + userID + 
            ", password: " + password);
      }
      return datUser;
   }
}

This results in the entry shown in Figure 4.


Figure 4. The Event Viewer showing an event logged


Figure 5. Event Properties

Writing events to the Event Log on a machine is a critical part of understanding the operation of any system. Writing events of importance will help manage the system by allowing operators to see problems as they happen, rather than after they explode into system malfunctions.

Instrumenting Distributed Applications

In a single-computer system, such as the one you are probably looking at right now, the event viewer is a handy way to report and discover problems with the environment. The event logging infrastructure was designed for single-machine systems and works fine in that environment.

But think about the World Wide Web. The Web is made possible by the magic of stateless protocols. When you move from page to page on a Web site, each page is delivered with no understanding of the pages that came before. If you surf ten pages on a popular site, you might get those pages from three, or ten, different physical boxes.

A Web farm is a load-balanced environment. An application farm is a clustered environment where several dozen or several hundred physical machines are given requests for pages. They all share a common database that stores page data and session information, but each box has its own memory, disk storage, processor, and network connection. And each is running its own instance of the operating system, which includes our friend the Event Log.

During the day, things are sure to go wrong with individual machines. One machine could have a bad network card that drops packets resulting in many retries. Another machine might have a bad spot in memory that requires constant parity correction. Another might have a bad hard disk.

A well-written application will detect such problems and write a message to the Event Log, hoping that someone will see it and correct the problem. However, since it is written to the Event Log on the local machine, it is difficult for a human to review it. Someone would need to log on to each of the hundreds of machines in the Web farm or the application farm in order to find those individual problems.

Clearly, we need a method of aggregating errors to a central source that can be monitored more easily than accessing each machine's logs. Enter WMI.

The Windows Management Instrumentation (WMI) is Microsoft's implementation of the Distributed Management Task Force's (DMTF) Web-Based Enterprise Management (WBEM) initiative and the DMTF Common Information Model (CIM).

WMI allows you to hide the complexities associated with the monitored environment. The CIM schema, which is also a DMTF standard, presents a consistent and unified view of the various types of logical and physical objects contained within the environment such as software components, services, and printers. Management objects are represented as classes. These classes include properties that describe data and methods that describe behavior.

Because WMI is an implementation of a standard (WBEM), there are many third-party tools available to interpret and report on the messages created using WMI. This is important for a couple of reasons. First, being a standard means that there are more tools available for certain functions. Software publishers can create WBEM-compliant applications that are targeted for a certain vertical market, such as grocery distribution, that provides more specific functionality than a horizontal, mass-market interface. Second, because it uses a standard schema, information collected in a Windows WMI interface can be aggregated along with information collected from other operating system platforms. All aggregated data can be analyzed with a single instance of a monitoring application.

Adding WMI functionality to our FoodMovers application is relatively straightforward. WMI event logging functionality is contained in the System.Management.Instrumentation namespace. Creating a WMI monitored application requires adding this namespace and establishing a base event.

using System.Management.Instrumentation;
...
public class FMEvent : System.Management.Instrumentation.BaseEvent 
{
   public string Event_Name;
}
...
FMEvent e = new FMEvent();
e.Event_Name = "Incorrect user login";
e.Fire();

These events are stored in the WMI repository, ready for a WMI monitoring application to pick them up and analyze them. The process is shown in Figure 6.


Figure 6. WMI instrumentation and monitoring flow

By adding WMI instrumentation, our application is now an "Instrumented .NET Application." The WMI Object Manager owns the BaseEvent-derived class that we created above. That class writes to the WMI repository. For unmanaged code, WMI providers are available that will interface with the WMI objects. This allows all applications to write to the same WMI repository.

WMI Architecture

WMI architecture is made up of 3 parts:

  • Management Infrastructure
  • Managed Objects
  • WMI Providers

Management Infrastructure

The CIM Object Manager (CIMOM) handles communications between management applications and providers. The CIMOM facilitates these communications by providing a common programming interface, the WMI API. This API supplies event notification and query processing services, and can be used from several programming languages.

CIMOM is the primary component of the WMI management infrastructure. It runs as a standard service that starts when the first client management application calls the WMI ConnectServer interface and connects successfully, and it continues to run as long as management applications are actively requesting its services. After the last management application shuts down, so does CIMOM.

The CIMOM Object Repository is a central storage area that holds CIM schema information. The repository can also hold static data, though it is not designed to be a large volume database for management data. WMI supports dynamic data, as well. Dynamic data is data that is generated on demand. The CIM Object Manager, network administrators, and third-party developers can place schema information in the CIMOM object repository by using either the MOF language and its compiler or the WMI API. The CIMOM Repository and CIMOM make up the WMI Management Infrastructure.

Managed Objects

Management applications access managed objects using the CIM Object Manager.

Managed objects consist of pieces of the enterprise network; they range from small devices, such as a disk drive, to large software applications, such as a database manager. These objects, most of which existed before WMI, are modeled by using the CIM.

By using WMI technology we can create management applications that implement numerous features, such as displaying system information, generating an inventory of network resources, and processing and responding to events. Windows Management Instrumentation supports various strategies to create management applications. For example, applications can use the WMI API to access the CIM Object Manager directly or they can access the CIM Object Manager indirectly.

For example, Web browsers can use HTML. An Internet Server API (ISAPI) layer provides support for HTML and communicates with the CIM Object Manager.

Another example is the Microsoft Open Database Connectivity (ODBC) Driver. Database applications can use an ODBC driver to combine ODBC-compliant database capabilities with the management capabilities of WMI. This driver enables an application to use various ODBC-based reporting packages and tools, including Microsoft Excel and Microsoft Access.
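
For managed code, the System.Management namespace in the .NET Framework wraps the WMI API, so a management application can query the CIM Object Manager with just a few lines. A minimal sketch, querying a standard CIM class rather than anything FoodMovers-specific:

using System;
using System.Management;

class WmiQueryExample
{
   static void Main()
   {
      ManagementObjectSearcher searcher = new ManagementObjectSearcher(
         "SELECT Name, FreeSpace FROM Win32_LogicalDisk");

      foreach (ManagementObject disk in searcher.Get())
      {
         // Each ManagementObject exposes the CIM properties by name.
         Console.WriteLine("{0}: {1} bytes free",
            disk["Name"], disk["FreeSpace"]);
      }
   }
}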

WMI Providers

WMI providers are components that supply dynamic management data about managed objects, handle object-specific requests, or generate WMI events.

WMI architecture is shown in Figure 7.


Figure 7. Windows Management Instrumentation (WMI) architecture

Applications can use COM interfaces directly to interact with CIM Object Manager to make management requests. For example a C/C++ application can access CIMOM directly, but other applications, such as a database manager, must use ODBC to access CIMOM. Other applications can use Active Directory Service Interfaces (ADSI), or Windows Management Instrumentation Scripting API, which used to be called WBEM Scripting, to make their requests.

Managed objects and associated providers, such as Win32 APIs, Win32 Providers, Windows registry, and Registry Providers can access the CIMOM through WMI providers.

Performance Monitoring

Remember our automobile analogy? Just as the car designers put sensors in the engine and instruments on their dashboard, we have seen how to get information from the running of the application that we developed. But there's more information that is available to the driver; information about the environment that is not directly related to the systems inside the car. For example, you might want to know the temperature outside, or where the next gas station is. These are valuable pieces of information, but are found using sensors that are external to the actual operation of the car itself.

Just as is the case with cars, information about the environment is available to the operations people in the form of performance monitors.

Ever since Microsoft Windows NT®, performance monitoring capabilities have been built into the operating system. The Performance Monitor (perfmon) is the tool that displays real-time performance information, such as processor load or network throughput. Combining real-time performance monitoring with real-time event management will make it easier to anticipate problems and correct them before they explode.

The Performance Monitor tool is familiar to most power users. It is shown in Figure 8.


Figure 8. The Windows Performance Monitor application shows real-time statistics about hundreds of different aspects of a system

In addition to the traditional monitoring counters such as CPU utilization and disk efficiency, the .NET Framework provides a rich set of counters to monitor any measurable aspect of the running system.

ASP.NET applications provide the following performance counters:

  • Process(aspnet_wp)\% Processor Time
  • Process(aspnet_wp)\Private Bytes
  • Process(aspnet_wp)\Virtual Bytes
  • Process(aspnet_wp)\Handle Count
  • .NET CLR Exceptions\# Exceps thrown / sec
  • ASP.NET\Application Restarts
  • ASP.NET\Requests Rejected
  • ASP.NET\Worker Process Restarts
  • Web Service\Current Connections
  • Web Service\ISAPI Extension Requests/sec

Each of these can have different values and thresholds for critical error. For further information refer to ASP.NET Performance Monitoring, and When to Alert Administrators.
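
Any of these counters can also be read from code with the System.Diagnostics.PerformanceCounter class. A minimal sketch follows; the category and counter names must match what Performance Monitor shows on the target machine.

using System;
using System.Diagnostics;

class CounterReader
{
   static void Main()
   {
      // The ASP.NET category counters are machine-wide (no instance name).
      PerformanceCounter restarts =
         new PerformanceCounter("ASP.NET", "Application Restarts");

      Console.WriteLine("Application restarts so far: {0}",
         restarts.NextValue());
   }
}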

Application Monitoring

We monitor our applications to make sure that they function and perform as expected. The operations team and system administrators are responsible for monitoring the applications. As I mentioned earlier, the operations team has established guidelines and practices for application monitoring and shared them with the development team as part of our original architecture. These guidelines standardize error logging and the analysis of the data from problem discovery to diagnosis.

Instrumentation metrics are equally valuable to both teams. Operators can use instrumentation data to perform capacity planning and health monitoring. Developers can use instrumentation data to design, build, and optimize high-performance applications.

Operators can also improve the quality of an application by judiciously recording problem history and the details of their resolutions. By communicating this information back to the development team, the developers can improve their future system designs and the diagnostic capabilities of their applications.

It is important that the development team communicate with the operations team to inform them of the types of error logs generated by every application. In turn, the operations team must relay to the development team the various mechanisms that are available for monitoring errors. Together, both teams must decide on the appropriate logging mechanisms, and then develop and monitor applications accordingly.

In some situations, development may be decoupled from operations, and the developers cannot interact with the operators who monitor the application in the production environment. The use of a common object model technology allows discoverability of the manageable objects without the need for direct communication between the authors and consumers. WMI is the preferred monitoring technology on the Microsoft Windows platform. Because WMI is native to the Windows platform (and therefore to the .NET Framework), many advanced monitoring techniques can be implemented by interacting with the WMI object model via classes contained within the .NET Framework Class Library.

Health Monitoring

Health monitoring is the process of identifying the conditions that contribute to system failure and taking preventive action.

You can already monitor events in the event log or other logging resources to spot fatal errors or warning conditions that can signal the start of a problem. You can use third-party tools to monitor events and performance data thresholds; for example, when the CPU on a particular computer exceeds 95 percent of its capacity. These tools raise events of particular significance to administrators for resolution. You can also elect to create your own diagnostic procedures and monitoring tools.

Performance Monitoring

Microsoft Windows performance counters allow your applications and components to capture and analyze the performance data that applications, services, and drivers provide. You can use this information to determine system bottlenecks and fine-tune system and application performance.

For example, you might use a performance counter to track the amount of time required to process an order or to query a database, or you might monitor the size of a message queue and write code that performs a specific action whenever the queue reaches some preset limit. WMI also supports writing data back to the application. In this case, the action could be paging an operator or sending e-mail to an administrator. The WMI SDK can also provide an automatic corrective action, such as increasing the queue size to accommodate the load. This can also be done through WMI if the queue size is an instrumented piece of configuration data.
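
As a sketch of the first example, an application can publish its own counter and increment it as orders are processed. The category and counter names below are hypothetical, and creating a category requires administrative rights on the machine.

using System.Diagnostics;

class OrderCounters
{
   static PerformanceCounter ordersProcessed;

   public static void Setup()
   {
      if (!PerformanceCounterCategory.Exists("FoodMovers"))
      {
         // Registers the category so Performance Monitor can chart it.
         PerformanceCounterCategory.Create("FoodMovers",
            "FoodMovers application counters",
            "Orders Processed", "Total store orders processed");
      }

      // false = a writable instance of the counter
      ordersProcessed = new PerformanceCounter("FoodMovers",
         "Orders Processed", false);
   }

   public static void OrderProcessed()
   {
      ordersProcessed.Increment();     // call once per processed order
   }
}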

Exception Mining

We have information coming in from many sources. If our programmers wrote their programs correctly, the programs will be generating information about their internal operations. At the same time, performance monitors will be looking at the overall and specific performance of the entire system.

Somehow, we need to collect all information from all of our boxes into a single location where we can determine what needs attention.

Remember the example of the new BMW above? Critics say that having too much information is as bad as or worse than having no information at all. Providing too much non-critical information to the driver is distracting and can be dangerous. Providing too much information to system operators runs the risk of important information being lost. Instead, we need some way of looking through that flood of information for the most important things.

I call this "exception mining."

Exception mining is like data mining, except that the data is all of the exceptions that we have been collecting.

How can we monitor and manage a distributed application among several servers? How will we collect performance data, event logs, and WMI repositories into a central location where we can see each server's problems and performance in detail?

Microsoft offers application servers that can do the job for us, collecting the data and taking meaningful action with it when necessary.

  • Microsoft Management Console (MMC)
  • Application Center (AC)
  • Systems Management Server (SMS)
  • Microsoft Operations Manager (MOM)
  • Enterprise Instrumentation Framework (EIF)

These application servers provide the following capabilities:

  • Rolling up individual server information into system-wide aggregations
  • Displaying event logs and performance information across multiple servers on a single display
  • Reading instrumentation created with WMI or providing their own management agents

Let's take a look at each one of these and see how they might be useful to us.

Microsoft Management Console (MMC)

Microsoft Management Console is not really an application server, but it does provide a common console framework for system management applications. Its primary goal is to support simplified administration and lower cost of ownership through tool integration, task orientation, support for task delegation, and overall interface simplification. The MMC console hosts administrative tools called MMC snap-ins; the console itself provides no management functionality.

Microsoft Application Center (AC)

Managing distributed servers is a difficult job. Application Center provides tools and technologies for managing groups of servers.

Application Center is used for system deployment and operations management. The tool manages Web clusters and Web farms, thereby ensuring availability and scalability of the system.

Application Center supports WMI. This means that it consumes and publishes WMI events, allowing easier integration with other applications and system management tools that support WMI.

Application Center provides the Health Monitor tool to monitor system performance. Its capabilities include:

  • Integrated monitoring tools that monitor performance, events, and log data from across multiple servers into a single integrated console, allowing detailed drill-down into specific machines and resources
  • Self-heals and self-tunes systems by monitoring the health of servers and applications and by setting thresholds for acceptable performance
  • Can collect system data, apply rules to that data, and perform actions (such as e-mail, programmatic action, event creation) if rules are violated

Application Center architecture has three layers:

  • User Interface
  • Feature System
  • Operating System


Figure 9. Microsoft Application Center provides a set of tools for monitoring the health of a deployed system

Application Center and WMI work together. As we have seen earlier, applications can be written to provide WMI events. These events are integrated using Application Center to provide a central dashboard of the running system.

Microsoft Systems Management Server (SMS)

Microsoft Systems Management Server (SMS) is a change and configuration management solution. SMS can be used to distribute and configure software on multiple machines. SMS also has status monitoring capabilities that are accessed with the SMS SDK.

As a status monitoring system, SMS provides management of networked computers, including:

  • Hardware inventory based on WBEM.
  • Software distribution and installation.
  • Remote performance analysis and troubleshooting.
  • Network Tracing Topology Tool (NetTrace).
  • Network Monitor Tool (NetMon).

SMS uses WMI as the infrastructure on which to build its services. The SMS provider extends the WMI model to include SMS-specific operations.

Microsoft Operations Manager (MOM)

Microsoft Operations Manager and Systems Management Server provide complementary functions. Managing the IT infrastructure requires both operations management and change and configuration management.

Microsoft Operations Manager (MOM) provides event collection from multiple sources, performance monitoring, event management, proactive monitoring and alerting, reporting tools, and trend analysis in large, distributed environments.

MOM also provides a centralized management console, extracting and presenting data from other management applications. You can use additional management pack modules to extend the capabilities of MOM and create a fine-grained view of all three tiers.

MOM collects data from several providers, such as events, counters, and WMI. MOM captures all this information using rules and either responds to a specific fault scenario with a predefined action or consolidates the data into a more meaningful event. Automated responses to such events can range from sending e-mails or paging calls to triggering preprogrammed remedial actions. MOM maintains a repository of system events and can provide a knowledge base of operational procedures.

MOM provides two benefits that complement the other two management applications:

  • Enterprise-level support, monitoring hundreds of computers and providing a centralized view of this information.
  • Increased resolution of a .NET-based application monitoring picture using management packs.

MOM enables you to configure monitoring in many ways. As a rule of thumb, keep your monitoring environment as simple as you can while still getting the job done.

MOM consists of components that can work in a distributed environment. The Windows 2000 core management services provide the main management data for Operations Manager. Additional providers include IIS, .NET Framework, COM+, Application Center, and AppMetrics.

The main MOM architectural components are as follows:

  • Agents
  • Consolidator Agent Manager
  • Data Access Server
  • Microsoft SQL Server™ Database

MOM's architecture is shown in Figure 10.


Figure 10. Microsoft Operations Manager (MOM) Architecture

MOM Agents are intelligent monitoring components installed on each monitored computer. Agents collect and analyze information and execute commands that MOM sends. They also store rules locally and can act without referring back to their managing computers.

The Consolidator Agent Manager (CAM) delivers rules and configuration data to the agents on the managed nodes. The CAM handles all communications with the managed computers on the network and sends information received from managed nodes to the Data Access Server (DAS). This information appears in the Microsoft SQL Server database. The CAM also automatically deploys and updates the remote agents on each managed computer and ensures that new rules propagate to the local agents.

MOM services use the DAS to access the database for reading or writing information. The DAS acts as a broker, transposing simple requests into database-specific tasks.

The SQL Server database stores all event information and rules logic and is where the MOM management packs reside. Additionally, this database contains the prescriptive advice and Knowledge Base links, and the reporting engine queries it when generating reports.

Although you can install all of these services on a single computer, it is also possible to spread them across multiple computers for better performance. The distributed nature of the MOM architecture makes it easy for you to avoid bottlenecks and support thousands of managed computers.

EIF (Enterprise Instrumentation Framework)

EIF is another technology for monitoring and troubleshooting high-volume, distributed environments.

EIF uses event logging, tracing, and performance counters to provide a real-time operational view of an application in a distributed environment. EIF can also work on a single server.

To unify existing event logging and tracing mechanisms built into Windows, the EIF provides a consistent, low-profile API and configuration layer. Developers can use this feature to publish audits, errors, warnings, business events, and diagnostic events for support teams to monitor and analyze.

EIF integrates with Microsoft management tools and products (Application Center, Operations Manager, and so on). EIF addresses the following challenges:

For the Support Organization:

  • Application faults or issues are often detected first by end users and customers.
  • Support staff needs structured application information to be able to help the users.
  • Most tracing solutions are not intended for production deployment scenarios.
  • Distributed applications magnify these challenges.
  • It is hard to correlate events fired from an application across multiple servers.

For the Development Organization:

  • There is no unified instrumentation API in the development environment.
  • Developers are forced to use different eventing, tracing or logging solutions, or more often, avoid instrumentation entirely.
  • Firing an event must be as simple and low-profile as possible.
  • Developers are often forced to determine event routing within code.

Which Application Server to Choose?

The information here seems overwhelming. If you think there are many different ways to get the same task done, you are not alone. The application monitoring tools I have described have been developed over a number of years to solve problems that were evident when they were created. Each one of these tools has a specific audience, and there is certainly some overlap.

Plus, there are many third-party system management tools available.

Here is my attempt to provide some input on which solution to choose.

As I have mentioned in this section, cross-machine deployment provides challenges for the operations team. Microsoft offers many servers as I have discussed before.

  • Application Center is specifically designed for e-site management, including Web server and Web farm applications, as well as Web services deployment.
  • MOM is more of a general-purpose management and monitoring tool. It handles environments such as BizTalk Server or SQL Server.
  • EIF is a technology for Visual Studio .NET applications. It works hand-in-hand with AC and MOM, providing uniform data for event management, tracing, and logs.

These are just observations I have made while working with these systems. You will want to evaluate your application and how each one of these products fits into your environment.

Testing, Testing, Testing

Now that instrumentation has been added to the applications and a holistic monitoring system has been implemented, it is time to see if the system works.

Testing is a critical part of deploying a system. In fact, testing is an integral part of the development process, as well. Approaches to testing can be grouped into two categories:

  • Waterfall. Work proceeds in phases: development is done, then testing, then release. This is the traditional method of testing.
  • Evolutionary approach. Work on a modular piece, or unit. Build functionality, test it, build more functionality, and then test again. This way a working application is consistently released; more functionality is added as more is understood about the system.

Regardless of the testing approach, any observed problems are fixed and bottlenecks are removed. Therefore after collection of data, analysis, configuration and testing, some level of optimization is achieved.

There are many types of tests:

  • Unit Testing. Testing modular sections of an application. Take the small, modular application pieces and test them to assure that each achieves its goals. This test is performed independently of other pieces of the application so that errors caused by the tested module are captured and fixed. (A minimal sketch follows this list.)
  • Integration Testing. Testing the interface of multiple units. Eventually all units that are individually tested will be combined to form an application. Integration testing will identify the problems that occur when units are combined.
  • Regression Testing. Re-testing an application that is modified after the implementation. This can be done by running existing tests against the modified code to determine if the changes have broken the unit, component, or application. Developing a library of used tests is a logical solution for regression testing.
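
Here is a minimal sketch of a unit test for a single piece; the pricing method is hypothetical, and a test framework such as NUnit could replace the hand-rolled check.

using System;

class PricingTests
{
   // Stand-in for the real, isolated unit under test.
   static decimal ComputeExtendedPrice(decimal unitPrice, int quantity)
   {
      return unitPrice * quantity;
   }

   static void Main()
   {
      decimal result = ComputeExtendedPrice(1.25m, 4);
      Console.WriteLine(result == 5.00m
         ? "PASS: extended price computed correctly"
         : "FAIL: expected 5.00, got " + result);
   }
}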

All tests should include:

  • Availability. Running the application for a period of time, collecting data for availability and repair times and fixing these problems. As a result of these tests, we obtain a percentage of availability. This will be compared to the service level agreement (SLA).

    Availability testing is concerned with measuring and minimizing repair time. It means testing catastrophic failures, failover technologies, and resource conflicts. For example, finding a disk controller and loosening the connection, turning off the power, or even calling the help desk to determine how long it will take for them to fix a problem.

  • Manageability. Making sure that deployment, maintenance and monitoring tools and technologies are working as expected. This means testing cluster configuration, network load balancing, WMI, and so on. For example, making sure that the failover is done gracefully, or workload is evenly spread between the servers, or making sure every source of information can be monitored.

  • Performance. Involves measuring performance, defining performance tests, determining baseline performance, stress testing, and solving performance problems. Performance tests need to be applied to systems that have passed all the functional tests, because a network problem or a bug can result in low performance.

  • Reliability. The application is run and tested enough times that failures are discovered and removed before deployment. After component or integration stress testing, requirements and real-world testing (such as making sure that applications that might interfere with each other can coexist without failure), and other functionality testing, we need to make sure that the problems are fixed and the application is reliable. Another reliability test is to feed the application random, even illogical, input to try to crash or hang it.

  • Scalability. An extension of the performance tests. Scalability tests identify the workloads and bottlenecks and fix any problems that show up. The bottlenecks can be identified from the values obtained by the performance tests, which can be monitored with Windows Task Manager, Windows Performance Monitor, and the Component Services administrative tool.

  • Securability. Identifying possible security flaws and ensuring the application's secure services are working as expected. Test for buffer overflows and underflows, attack the application using well-known security tests, and audit the application source code if security is a serious matter. For FoodMovers, we also need to test that the WSE document security used by our Web services works as expected.

  • Globalization and Localization. First, testing to make sure the code can be localized. For example, are the strings stored in code or as external resource objects? Then, once localization is complete, testing to assure that the localization is done accurately and reflects the functionality of the system.

  • Requirements testing. Making sure that the software components meet the design and specifications. This can be applied to unit, integration and regression tests.

  • Stress testing. Trying to crash the system by increasing traffic. Testing for scalability. This can be applied to unit, integration and regression tests.

  • Business Logic Testing. So far, we have discussed system testing; making sure that an application does not bomb, and that problems are caught and dealt with. There's also the matter of assuring that a system adheres to business rules. For example, suppose that we need to charge sales tax to all customers in our state, but not to customers in other states or countries. This needs to be tested to assure compliance with state laws. Failure to do so could result in fines.

    Business logic testing involves checking that a system adheres to a specification. If our initial spec contained requirements and expectations, they should be tested.

The testing tools available with Visual Studio .NET are:

  • Visual Studio Analyzer is a performance analysis tool used to examine and debug distributed applications.

  • Application Center Test is used to stress-test Web servers for performance and scalability.

    Performance counters give us the criteria to understand how the application behaves under these tests.

For Web applications, Microsoft ACT (Application Center Test) is used. The Microsoft Web Application Stress Tool (WAST) is another tool that can be used to test the traffic coming to a Web application. (WAST can be downloaded from the MS Web Application Stress Tool site.) This tool does not support the ViewState object, because it cannot dynamically build the response.

Testing Methodology

Testers should be chosen from people who know the working system and the business rules well. As testers use the system, they will probably come up with new specifications and rules. Acting on these during testing is dangerous and should be avoided when they amount to additional features; the pieces already developed and the project timeline should be honored before further development occurs. You should, however, provide a way for testers to record these ideas so they can be considered as future enhancements.

Other things to consider when testing:

  • What about other user interfaces and applications?
  • Which performance counters should be watched for them?
  • What other tools are needed for these tests, for example for Windows applications and other processes that exercise the user interfaces?
  • What else is needed to test the business logic?

Physical Architecture, Deployment, and Operational Requirements

We have finally finished the first phase of development for our FoodMovers project. Instrumentation has been included and tested, and performance counters have been set up. The test environment needs to be as close to production as possible; this helps us find the problems the live system will face before it faces them. The application then needs to be staged and deployed in the production environment.

That is why it is advised to have separate data centers for developing, testing, staging, and stress testing enterprise-level applications such as our FoodMovers system.

I will wrap up this project with an overall view of the deployment of our system. This section is intentionally non-specific, as each environment will require a different set of implementation details.

One recommended approach is diagrammed in Figure 11.

Figure 11. The development, testing, and deployment process

There are four different environments where the application lives during its development and deployment lifecycle:

  • Development Environment. Coding and development happen in this environment. The application is built, compiled, and packaged here. The development team uses VSS (Microsoft Visual SourceSafe®) to check files in and out and to get the latest versions. The test team packages the application for testing; it is then packaged for release, and the deployment process starts. The admin team is responsible for the deployment process.
  • Test Environment. The actual deployment process is tested in this environment, so it is important that the test environment is the same as, or at least very similar to, the production environment. Installation and configuration of the application are done here, as are packaging the application and rehearsing its deployment and configuration.
  • Staging Environment. The staging environment is a scaled-down version of the production system. For example, a production environment might have 100 servers in the Web farm, while a staging environment might have only five. For the fewest deployment problems, the staging environment should otherwise match the production environment exactly: the network topology and even the brands of the physical machines should be the same, or at least very similar. All the tests, from traffic tests to network tests, are run in this environment.
  • Production Environment. The application goes live in this environment, and ongoing administrative work is performed here.

For the FoodMovers application, I have discussed the physical deployment of the production environment briefly in Section 2, Templates, Policies, Database, and Service Interface Design. The physical architecture and design of the production environment is shown in Figure 12.

Figure 12. FoodMovers Project physical architecture

Physical tiers are separated by firewalls or other security boundaries to create different units of trust, or security contexts. For example, the database servers must be harder to reach from untrusted sources, and the same goes for the application farm, because the business rules and logic reside there. The Web servers that handle user interaction, in contrast, are more open to outside traffic. The firewalls between the Web servers, the application farm, and the database servers must be configured according to each security context.

There are two main families of physical tiers, farms and clusters. Farms consist of identically configured and extendable sets of servers sharing the workload. Clusters are specialized sets of computers controlling a shared resource such as a data store, designed to handle failures of individual nodes gracefully.

Following this policy, the servers that keep and manage state, which includes SQL Server and BizTalk Server, are clustered; in addition, they can be load-balanced to share the workload. Stateless servers, on the other hand, do not need to be clustered; they are only load-balanced.

Problems

As I mentioned, the staging and test environments must be similar to the production environment for the steps to go smoothly. The more similar they are, the fewer hassles the deployment process will have in going live.

Because it is costly to maintain a copy of the production environment as the test and staging environments, there will be pressure to limit the test or staging procedures. This is a bad idea. Any cost in equipment will usually be much less than the labor spent trying to track down deployment problems.

Due to differences in the physical architecture of the different environments, some problems may be encountered. These problems can come from:

  • Firewalls. The production environment has firewalls to secure business components and data, but the test and staging environments might not have firewalls deployed. The port restrictions and the direction of communication therefore need to be taken into consideration; a lack of firewalls in the test and staging environments can hide problems that will appear in the production system. An alternative to actual firewall machines is to at least install firewall emulation software in the test environment.
  • Network topology. The staging environment may be smaller than the production environment, but the network topology should be consistent for the tests to be meaningful. In a distributed application such as FoodMovers, components are deployed on different physical machines that call and communicate with each other; for example, the Web farm needs to communicate with the application-farm components. This communication across computers must work as expected.
  • Processor count. When the production machines are multi-processor and the test and staging machines are not, the load-balancing tests must be adjusted to account for the additional processing power. More importantly, multi-threaded code can behave unexpectedly on multi-processor machines, so the code must also be tested on multi-processor hardware to make sure it works as expected; a small smoke test of the kind sketched after this list is a useful starting point.
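
As an illustration of the multi-processor concern, here is a minimal sketch of a multi-threaded smoke test. The shared counter, thread count, and iteration count are arbitrary; the idea is simply to exercise shared state from several threads and verify that the final result is what the logic expects, because races that never show up on a single processor frequently appear on multi-processor machines.

using System;
using System.Threading;

// Runs several worker threads against a shared counter and checks the result.
// Without the lock, the final total is often wrong on multi-processor machines.
public class ConcurrencySmokeTest
{
    private const int ThreadCount = 8;                // arbitrary
    private const int IterationsPerThread = 100000;   // arbitrary

    private static int total = 0;
    private static object totalLock = new object();

    public static void Main()
    {
        Thread[] workers = new Thread[ThreadCount];

        for (int i = 0; i < ThreadCount; i++)
        {
            workers[i] = new Thread(new ThreadStart(Worker));
            workers[i].Start();
        }

        // Wait for every worker to finish before checking the result.
        for (int i = 0; i < ThreadCount; i++)
            workers[i].Join();

        int expected = ThreadCount * IterationsPerThread;
        if (total != expected)
            throw new ApplicationException(
                String.Format("Race detected: expected {0}, got {1}", expected, total));

        Console.WriteLine("Smoke test passed: total = {0}", total);
    }

    private static void Worker()
    {
        for (int i = 0; i < IterationsPerThread; i++)
        {
            lock (totalLock)   // remove this lock to see the failure mode
            {
                total = total + 1;
            }
        }
    }
}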

Attention to these details will give you the smoothest deployment possible.

Deployment

Deployment occurs after the solution is built. The solution must be packaged and then deployed, first in the staging environment and then in the live environment. Visual Studio .NET provides the tools necessary to package and deploy the FoodMovers application.

Visual Studio .NET provides several options for packaging applications, including Microsoft Windows Installer packages and CAB files. Some .NET-based applications, such as Web pages and Web services, can be deployed without packaging at all, simply by copying the right files to the destination, sending them through e-mail, or making them available for FTP download.

Other tools and Microsoft services that are used to distribute applications are:

  • Microsoft Application Center
  • Microsoft Systems Management Server
  • Microsoft Active Directory

The physical architecture is only one part of an application's deployment environment. There are also operational requirements that must be considered for an application to run and live in its environment.

Third Party Installation Tools

In addition to the deployment tools included in Visual Studio, installation tools that support Windows Installer are available from third-party vendors. These tools may support additional Windows Installer authoring features that are not available in Visual Studio deployment projects.

Here are a couple of third-party tools I have worked with:

  • InstallShield Developer is a Windows Installer setup-authoring solution that provides control of the Windows Installer service and full support for .NET application installations. The Developer edition provides the option to create .NET installations directly from within the Microsoft Visual Studio .NET IDE or from the InstallShield's own IDE. For more information, see the InstallShield Web site.
  • Wise for Visual Studio .NET operates directly within Microsoft Visual Studio .NET. It merges the installation and application development lifecycles so that applications and installations are designed, coded, and tested together. For more information, see the Wise Web site.

Conclusion

In this project, we have moved from an idea to a completed, deployed system. We have created the system using the process of Architect...Design...Implement...Deploy that provides the best scalability, both in development and deployment. We have developed the system with Visual Studio .NET using state-of-the-art Web service and user interface components included with the .NET Framework Class Library. We have tested and deployed the system, putting in place plenty of performance monitoring tools and methods.

As a review, there are some important points that should be considered at every stage of the system's design. For the application to run without problems, we need to specify operational requirements, and these requirements need to be considered throughout the design of the application. They include:

  • Scalability. Asynchronous operations are more scalable than synchronous operations. Caching data where it is needed avoids holding state unnecessarily. To achieve scalability, we partition data, resources, and operations; for example, we apply load balancing across the Web servers, which distributes the incoming traffic over the available processing power.
  • Availability. We need to avoid single points of failure. We cluster the application and database servers for fault tolerance. Planning an effective backup strategy is part of availability, as is testing and debugging the application and its code extensively.
  • Maintainability. Structure your code in a predictable manner. Isolate frequently changing data and behavior. Use metadata for configuration and program parameters. Use pluggable types. Expose public types in your interfaces.
  • Security. Evaluate the risks. Apply the principle of "least privilege." Perform authentication checks at the boundary of each security zone. Carefully consider the role of user context in asynchronous business processes. For document exchange through the Internet, use encryption and signatures.
  • Manageability. Use instrumentation for health monitoring, service level agreement verification, and capacity planning. Monitor applications as they run to see how they behave. Report and fix infrastructure problems as well as business problems. The capacity plan can be updated as more customers use the system.
  • Performance. Application and service performance is critical to a good user experience and to efficient hardware utilization. The application and the demand for it will grow over time, and the infrastructure needs to adapt to those changes in order to keep performing. The basic cycle is as follows (a minimal measurement sketch appears after this list):
    1. Define the measurable performance requirements for specific operations (for example, throughput and/or latency under certain utilization, such as "50 requests per second with 70% average CPU usage on a specific hardware configuration").

    2. Do performance testing: Stress test the system and collect profiling information.

    3. Analyze the test results: Does the application meet the performance goals?

    4. If the application does not meet the performance goals, identify bottlenecks in the application. (For tools that can help you isolate performance bottlenecks, see the link at the end of this list.)

    5. Repeat Step 2 until the performance results meet the goals.

      Some of these requirements can work against each other. For example, when security is added to a system, performance problems can appear, and it is common to reduce the manageability of an application in favor of security. That is why it is important to decide on the operational requirements during application design, as I discussed in Section 2, Templates, Policies, Database, and Service Interface Design.
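
To make a performance requirement like the one in step 1 testable, it helps to turn it into a small measurement harness. The sketch below times repeated calls to a hypothetical order-submission operation and reports throughput and average latency against assumed targets; the operation, the request count, and the target values are placeholders, and a real test would run against the staged system under a realistic load mix.

using System;

// Times repeated calls to an operation and compares the results
// against throughput and latency targets taken from the requirements.
public class PerformanceCheck
{
    private const int RequestCount = 500;                     // arbitrary sample size
    private const double TargetRequestsPerSecond = 50.0;      // from the example requirement
    private const double TargetAverageMilliseconds = 200.0;   // assumed latency goal

    public static void Main()
    {
        DateTime start = DateTime.Now;

        for (int i = 0; i < RequestCount; i++)
        {
            SubmitOrder();   // hypothetical operation under test
        }

        TimeSpan elapsed = DateTime.Now - start;

        double requestsPerSecond = RequestCount / elapsed.TotalSeconds;
        double averageMilliseconds = elapsed.TotalMilliseconds / RequestCount;

        Console.WriteLine("Throughput: {0:F1} requests/sec (target {1})",
            requestsPerSecond, TargetRequestsPerSecond);
        Console.WriteLine("Average latency: {0:F1} ms (target {1})",
            averageMilliseconds, TargetAverageMilliseconds);

        if (requestsPerSecond < TargetRequestsPerSecond ||
            averageMilliseconds > TargetAverageMilliseconds)
        {
            Console.WriteLine("Performance goals NOT met; look for bottlenecks and repeat.");
        }
        else
        {
            Console.WriteLine("Performance goals met.");
        }
    }

    // Stand-in for the real operation, for example a call to the Order Manager
    // Web service. Replaced here with a short pause so the sketch compiles and runs.
    private static void SubmitOrder()
    {
        System.Threading.Thread.Sleep(5);
    }
}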

Unfinished Business

We have covered a lot of ground in this project. However, there are still some things not yet implemented.

  • The Order Manager is designed to accept orders entered from a Pocket PC device. The .NET Compact Framework brings the .NET programming model to compact devices, and Visual Studio provides a visual debugging tool to assist in developing and testing for them.
  • The Store Order Interface also has the ability to accept orders as Excel spreadsheets. The Office SDK is included in Visual Studio. This makes it easy to open Excel spreadsheets as objects and navigate through them.
  • The Warehouse Manager is also designed to accept input from a Pocket PC device.
  • The Warehouse Manager needs a Shipping Manifest report that provides a list of goods to be placed on each truck. This is developed using Crystal Reports.
  • As part of the ordering process, I mentioned briefly the ability to connect to external Web services, such as credit checks, before approving the order. The Order Manager can do this work.
  • As the system is deployed, more external suppliers will have requirements for interfacing. These will need to be developed on a case-by-case basis.

Author's Note

I really enjoyed building this system and writing this project. I certainly learned a lot on the way, and I really respect all of the hard work that Microsoft has done for the past few years in the area of Web services and enterprise interconnectivity.

I hope that this has been a helpful project for you, and that we can meet again on a future project! For now, you may want to check out my book, Web Services Implementation Guide. It covers many of the architectural reasons for deciding to move to Web services. To see an overview of the entire FoodMovers project, read FoodMovers: Building Distributed Applications using Microsoft Visual Studio .NET.

About the Author

Brian Travis is Chief Technical Officer and Founder of Architag International Corporation, a consulting and training company based in Englewood, Colorado. Brian is an expert in real-world XML implementations. Since founding Architag in 1993, he has created intelligent content management systems and e-business solutions for Architag clients around the world. Brian is also a noted instructor and popular lecturer in XML and related standards. In his role as principal instructor for Architag University, he has been teaching clients about XML in the U.S., Europe, Africa, and Asia.

Brian has lectured at seminars around the world and has written about XML, Web services, and related technologies. His most recent book, Web Services Implementation Guide, is a guide for IT architects and developers who need to integrate internal systems and external business partners. The book provides the basis for understanding the goals of Web services for the Enterprise Architect, Project Architect, Deployment Architect, developer, and manager. It outlines the steps an organization must take in order to align itself with the new world of Web services.