Delivering End-to-End High-Productivity Computing
by Marc Holmes and Simon Cox
Summary: Performing a complex computational science and engineering calculation today is about more than just buying a big supercomputer. Although HPC traditionally stands for "high-performance computing," we believe that the real end-to-end solution should be about "high-productivity computing." What we mean by "high-productivity computing" is the whole computational and data-handling infrastructure, as well as the tools, technologies, and platforms required to coordinate, execute, and monitor such a calculation end-to-end.
Many challenges are associated with delivering a general high-productivity computing (HPC) solution for engineering and scientific domain problems. In this article, we discuss these challenges based on the typical requirements of such problems, propose various solutions, and demonstrate how they have been deployed to users in a specific end-to-end environmental-science exemplar. Our general technical solution will potentially translate to any solution requiring controlling and interface layers for a distributed service-oriented HPC service.
Introduction
Requirements of High-Productivity Computing Solutions
Enhancing Value of High-Productivity Computing Solutions
Defining the Problem and Solution
High-Performance Computing with Microsoft Windows Compute Cluster Server Edition
Application Logic with Windows Workflow Foundation
Activity Libraries
Services
Security and Policy
User Experience with Microsoft Office SharePoint Server
Monitoring with Microsoft Operations Manager and PowerShell
Providing an Authoring Experience with Microsoft Visual Studio 2005
Developing
Publishing
Workflow Compilation
Technical Architecture
Sidebars
Conclusion
Further Resources
The solution considers the overall value stream for HPC solutions and uses Microsoft technology to improve that value stream beyond simple performance enhancements to hardware, software, and algorithms, which, as described later, are not always a viable option. Key to our solution is the ability to interoperate with other systems via open standards.
In the domains of engineering and science, HPC solutions can be used to crunch complex mathematical problems in a variety of areas, such as statistical calculations for genetic epidemiology, fluid dynamics calculations for the aerospace industry, and global environmental modeling. Increasingly, the challenge is in integrating all of the components required to compose, execute, and analyze the results from large-scale computational and data-handling problems.
Even though these problems differ widely, the requirements for their solutions share similar features, because of the domain context and the complexity of the problems at hand.
Because the calculations and industry involvement are diverse, there are no particular solution providers for any given problem, resulting in highly individualized solutions emerging in any given research department or corporation requiring these calculations. This individuality is compounded by the small number of teams actually seeking to solve such problems and perhaps the need to maintain the intellectual property of algorithms or other aspects of specific processes. Individuality is not in itself an issue; it might be a very good thing. But, given that the technical solutions are a means to an end, it is likely that these individual solutions are not "productized" and, thus, are probably difficult to interact with or obscure in other ways.
The commoditization of the infrastructure required to perform large scale computations and handle massive amounts of data has provided opportunities to perform calculations that were previously computationally impractical. Development of new algorithms and parallelization of code to run across computing clusters can have a dramatic effect, reducing times for computation by several orders of magnitude. So, a calculation that will complete "sometime shortly after the heat death of the universe" could perhaps run in several weeks or months. This success has enabled industries to achieve significant results and build upon this enablement to provide further direction for development.
In areas of research, there is a critical need to ensure a useful trail of information for a variety of reasons. There might be just a need to rerun algorithms and ensure that the same result sets are produced as a "peace-of-mind" exercise, but more likely this will be required as part of proof and scientific backup for the publication of research. There might also be statutory reasons for such provenance information: In the aerospace industry, in the event of an air-accident investigation, engineers making specific engineering choices might be liable for the cause of the accident and face criminal proceedings. Therefore, the need to recreate specific states and result sets, as required, and to follow the decision paths to such choices is of extreme importance. The complexity of such tasks is compounded when one considers that the life cycle of an aircraft design could be 50 years.
Deep Thought performed one of the largest HPC tasks in recent (fictional) memory. Given a simple input ("What is the answer to life, the universe, and everything?"), he provided a simple output ("42")—albeit after several million years of processing and a small, embarrassed pause at the end of the process.
The reality, however, is that significant calculations requiring significant amounts of processing are very likely to involve significant amounts of data throughout the life cycle of the calculation. Even with Deep Thought–style input and output simplicity, the operational data sets within the problem space during calculation might be significant. Strategies must be developed to handle this data, and its metadata. Given the need for provenance information, these strategies must be both flexible and robust and integrated into the workflow processes used to coordinate the calculations.
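One way to picture the provenance requirement is as a small metadata record stored alongside every run. The following is a minimal sketch, not the authors' implementation: the `ProvenanceRecord` type, its field names, and the choice of SHA-256 digests are all assumptions made for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


def sha256_of(data: bytes) -> str:
    """Return a hex digest that identifies the exact bytes of a data set."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical metadata kept for every run of a computation."""
    job_id: str
    algorithm_version: str
    parameters: dict
    input_digests: dict  # data set name -> SHA-256 digest of its contents

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys) so that equal records hash identically.
        canonical = json.dumps(asdict(self), sort_keys=True)
        return sha256_of(canonical.encode("utf-8"))


def same_experiment(a: ProvenanceRecord, b: ProvenanceRecord) -> bool:
    """True when two runs used the same code version, parameters, and inputs."""
    def without_job_id(record: ProvenanceRecord) -> dict:
        return {k: v for k, v in asdict(record).items() if k != "job_id"}

    return without_job_id(a) == without_job_id(b)
```

With records like these, a rerun can be checked against the original run before its result sets are compared, which is the kind of audit the research and aerospace examples call for.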
Because of the complexity of the domain problems, and the requirements previously described, the net result is that solutions are typically designed to get the job done. This in itself is something of an achievement, and that is leaving aside the intellectual effort involved in the development of the computational algorithms and conceptual solutions in the first place.
There is, of course, a simple way to improve the value of an HPC solution: Make it quicker. However, for problems of sufficient computational complexity, at a certain point it becomes fruitless to continue to attempt to improve the solution because of technology constraints. The efforts to improve the speed of computation can be represented by the cycle shown in Figure 1.
Figure 1. Efforts to improve the speed of computation fall in three interrelated zones forming a cycle.
To improve the speed of calculation, a development team may:
- Use better or more hardware. The benefits of faster processors, faster disks, more memory, and so on could be limited by the capability of software or algorithms to utilize the available hardware. It is also limited by the release cycles of next-generation hardware and is probably severely limited by budget.
- Use better or more software. The algorithms themselves are more likely to be a bottleneck than the underlying software, but from time to time, there will be improvements in data handling or other such features to provide a useful enhancement. This might also suffer from budgetary constraints.
- Use better algorithms. Better algorithms require invention that simply might not be possible and is probably less predictable than software or hardware improvements, although when it does occur might provide the most significant improvement of all.
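The point that extra hardware is capped by what the software and algorithms can exploit is commonly quantified with Amdahl's law. The article does not cite it; the sketch below is a standard textbook formula added for illustration, and the function name is an assumption.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Upper bound on speedup when only part of a program can run in parallel.

    parallel_fraction: share of the run time that parallelizes (0.0 to 1.0).
    processors: number of processors applied to the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)
```

For a program that is 90 percent parallel, the bound never reaches 10 no matter how many processors are bought, which is why algorithm improvements can matter more than hardware.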
So, the challenge for continued platform enhancement on a fundamental level is straightforward to understand, but not easy to achieve. As a result, the teams using HPC solutions tend to be pragmatic about the length of time it takes to complete calculations as they understand that they are pushing available technology and, as previously mentioned, can be quite happy to be completing the calculation at all.
After computational times have been reduced as far as practical, enhancing the value of such solutions means considering the overall process and the implications of performing the calculations, in order to further reduce the overall time to new scientific or industrial insight. Given the research/engineering nature of the problems, the overall process is typically a human workflow on either side of the computational task. It also means developing a solution set that provides interfaces and controls to enable an effective translation of the overall process, allowing researchers and engineers to carry out their tasks more quickly and efficiently.
Utilizing the available portfolio of Microsoft technologies allows us to create a fuller architecture for an HPC solution and to deliver features that drive additional value to teams using the solution. Furthermore, these technologies can build on, and interoperate seamlessly with, existing infrastructures. This section of the paper describes a possible architecture, some of its features, and the value they provide.
It is difficult to fully generalize an HPC solution, because of the specific needs of an individual domain and the business processes that emerge as part of the engineering challenges that are typically present. However, we could consider three scenarios for the solution. The first is the core user scenario: completing a computational job. Two other scenarios, administration and authoring, support this process.
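Computation Scenario. The computation process is divided into four core steps for an average user, which will ensure task completion: access, request, compute, and analyze (see Figure 2). The sub-steps listed below fall under these steps in order: access covers logon and initialize; request covers load, input, and approve; compute covers preprocess, compute, and post-process; and analyze covers automatic, online, and download.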
Figure 2. General computation scenario
- Logon—A user must be able to log on to the solution and be identified as a valid user. The user then is likely to be able to see only the information that is relevant to the user's role and/or group.
- Initialize—Prior to making a job request, there might be a need to set up a dashboard for the execution of the job. Given the nature of a long-running computation, one run of the process might be considered a "project."
- Load—A user must be able to upload or access data sets that are intended to be used as part of the job. Data sets can be large, nonhomogeneous, or sourced externally, so a specific step might be necessary in the workflow to obtain that data and partition it across the cluster nodes in a way that balances the expected workload.
- Input—A user must then be able to add parameters and associated metadata to ensure that the job will run successfully. These parameters are captured for reuse and auditing.
- Approve—Following the input of parameters and any other required information, it is likely that approval will be required before a job is submitted. Presumably then, some kind of typical approval workflow will ensue, following a user job submission.
- Preprocess—The preprocessing step for a job might do several things. It will likely move the input data sets to the cluster nodes and might perform an initial set of computations, such as the generation of analytical measures for use in the main computation. It might also initialize data structures and assess all data for validity prior to execution.
- Compute—This phase represents the actual parallel computation. Each node will execute a portion of it and might be required to pass intermediate results to other nodes for the whole process to complete (message passing jobs). Alternatively, the data might lend itself to be computed into totally isolated portions, such as in parametric sweeps, or what-if scenarios. This phase might take place over a potentially significant duration. Ideally, some feedback is provided to the end user during this time.
- Post-process—The final step in the actual computation is similar to preprocessing in that it is now likely that some work is completed with the results data. This might involve data movements to warehouses, aggregation of the separate node data, clean-up operations, visualization tasks, and so on.
- Automatic—Although the results from a job are likely to need specialist intervention to truly understand, some automatic analysis might be possible, particularly in the case of statistical analysis where patterns are important and are easy to automate.
- Online—A user should be able to perform some core analysis without requesting the entire result set. This may be presented as tools with typical business-intelligence paradigms—slicing and dicing, and so on.
- Download—Finally, a user should be able to retrieve result sets for advanced offline manipulation as required.
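The sub-steps above can be read as an ordered pipeline with an approval gate in the middle. The following sketch models that reading; the grouping of sub-steps under the four core steps, the `ComputationJob` type, and the error types are assumptions for illustration rather than part of the described solution.

```python
from dataclasses import dataclass, field

# (sub-step, core step) pairs in execution order. The grouping is an assumption.
PIPELINE = [
    ("logon", "access"), ("initialize", "access"),
    ("load", "request"), ("input", "request"), ("approve", "request"),
    ("preprocess", "compute"), ("compute", "compute"), ("post-process", "compute"),
    ("automatic", "analyze"), ("online", "analyze"), ("download", "analyze"),
]


@dataclass
class ComputationJob:
    """Hypothetical record of one run (a "project") moving through the pipeline."""
    name: str
    approved: bool = False
    position: int = 0  # index of the next sub-step to run
    history: list = field(default_factory=list)

    def advance(self) -> str:
        """Run the next sub-step, refusing to pass 'approve' without approval."""
        if self.position >= len(PIPELINE):
            raise RuntimeError("job already complete")
        sub_step, core_step = PIPELINE[self.position]
        if sub_step == "approve" and not self.approved:
            raise PermissionError("job has not been approved")
        self.history.append(f"{core_step}:{sub_step}")
        self.position += 1
        return sub_step
```

Keeping the history on the job itself is a simple way to retain the audit trail that the input and approval sub-steps are meant to provide.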
Authoring Scenario. The authoring scenario covers the aspects of the overall process to provide the capabilities for an end user to complete a computation job. Broadly, a process author should be able to develop new capabilities and processes and publish those for end-user consumption. Figure 3 shows the authoring scenario for two actors: authors and power users. The difference between these roles is that an author will likely develop code, whereas a power user is likely to be configuring existing code. The authoring scenario can be described as follows:
Figure 3. General authoring scenario
- Activities—A developer might want to construct discrete process elements or activities that can be used as part of an overall computation job process. An example might be a generic activity for performing FTP of result sets.
- Algorithms—A developer may be developing the actual algorithms for use in the compute step of the end-user process.
- Workflows—A developer or power user may define specific computation workflows by joining together activities and algorithms to create an overall compute step consisting of pre- and post-processing, as well as the actual compute task.
- Catalog—Following development of activities, algorithms, and workflows, the author will want to catalog the development for use by another author or the end user.
- Configure—Some configuration of activities or workflows might be required, such as access restrictions, or other rules governing the use of the development work.
- Release—Finally, the development work should be exposed by the author when it is ready for use. Ideally, this will not require IT administration effort.
Administration Scenario. Finally, administration of the HPC platform is required. Here we consider only the main tasks that support the end user, namely monitoring and accounting, and exclude other core administration tasks such as cluster configuration and general IT infrastructure activities (see Figure 4). The administration scenario can be described as follows:
Figure 4. General administration scenario
- Monitor—Clusters are typically shared resources, managed by specialized personnel. They will monitor the installations for functionality and performance—for example, analyzing the job queues and the resource-consumption trends.
- Account—It is desirable to account for resource usage by individual accounts. This helps when the clusters are funded and used by several groups.
- Billing—It is possible to build charge-back models to explicitly bill for usage.
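A charge-back model of the kind mentioned under accounting and billing can be very small. The sketch below assumes a flat rate per processor-hour; the `JobUsage` type, the rate, and the account names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobUsage:
    """Hypothetical accounting entry for one finished job."""
    account: str
    processors: int
    hours: float


def charge_back(usages, rate_per_processor_hour: float) -> dict:
    """Total the charge for each account at a flat processor-hour rate."""
    totals = defaultdict(float)
    for usage in usages:
        totals[usage.account] += usage.processors * usage.hours * rate_per_processor_hour
    return dict(totals)


usage_log = [
    JobUsage("genetics", processors=16, hours=2.0),
    JobUsage("aerospace", processors=64, hours=0.5),
    JobUsage("genetics", processors=8, hours=1.0),
]
bill = charge_back(usage_log, rate_per_processor_hour=0.25)
```

A real model might weight queue priority or storage as well; the flat rate is only the simplest case.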
Having described a general series of scenarios that might represent the requirements for an HPC solution, and considering the values described at the outset in the construction of an HPC solution, we can now consider some solution scenarios to satisfy these requirements.
The core service required for the solution is the actual High Performance Computing cluster capability. Microsoft Windows Compute Cluster Server Edition (CCS) provides clustering capabilities to satisfy the Compute step of the problem scenarios (see Figure 5).
Figure 5. Microsoft Windows Compute Cluster Server Edition satisfies the Compute step.
CCS adds clustering capabilities to a cut-down version of Microsoft Windows Server 2003. This allows, for example, the use of parallelized computations, which take advantage of many processors to complete particularly complex or intensive calculations, such as those found in statistical genetics.
Control of a CCS cluster is handled by a "head node" that communicates with "cluster nodes" to issue instructions and coordinate a series of tasks on a particular job. The cluster nodes might exist on a private network; for optimal performance, this will likely be a low-latency network to ensure that network latency does not affect computation times. The Further Resources section at the end of this document contains links to technical articles on deploying and configuring CCS.
The head node is equipped with a "job scheduler" to allow the insertion of a computation job into a queue for execution when the desired or required number of processors becomes available to complete the job. A small database is provided for the head node to keep track of the current job queue. The job scheduler can be accessed via an interface on the head node; via the command line, which can be parameterized or executed with an XML definition of the parameters; or via an API (CCS API) for programmatic interaction.
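The queueing behavior described here can be sketched as a toy model. This is not the CCS API: the `HeadNodeScheduler` class, its method names, and the strict first-in, first-out policy are assumptions chosen to show a job waiting until enough processors are free.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    processors: int  # processors the job needs before it can start


class HeadNodeScheduler:
    """Toy model of a head node that queues jobs until processors are free."""

    def __init__(self, total_processors: int):
        self.total = total_processors
        self.free = total_processors
        self.queue = deque()
        self.running = {}

    def submit(self, job: Job) -> list:
        """Queue a job and return the names of any jobs that start as a result."""
        if job.processors > self.total:
            raise ValueError("job can never fit on this cluster")
        self.queue.append(job)
        return self._dispatch()

    def finish(self, name: str) -> list:
        """Release a finished job's processors and start whatever now fits."""
        job = self.running.pop(name)
        self.free += job.processors
        return self._dispatch()

    def _dispatch(self) -> list:
        started = []
        # Strict FIFO: the job at the head of the queue blocks those behind it.
        while self.queue and self.queue[0].processors <= self.free:
            job = self.queue.popleft()
            self.free -= job.processors
            self.running[job.name] = job
            started.append(job.name)
        return started
```

In this model a small job can wait behind a large one even when it would fit; a production scheduler may fill such gaps, which is a policy choice the sketch deliberately leaves out.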
A CCS cluster can handle several types of job. These are:
- Serial job. This is a typical compute task of simply executing a required program.
- Task flow. This is a series of serial jobs consisting of potentially multiple programs/tasks.
- Parametric sweep. This type of job would execute multiple instances of a single program but with a series of differing parameters, allowing for many result sets in the same time frame as a single task.
- Message Passing Interface job. The most sophisticated type of job uses the Message Passing Interface (MPI) to parallelize an algorithm and split the computation among many processors, which coordinate to provide speed increases for long and complex computations.
Any or all of these types of jobs might be found inside a single computation process. CCS should be able to provide a powerful platform for any of the pre- or post-processing tasks, as well as the main computational task.
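Of the job types listed, the parametric sweep is the easiest to show in a few lines: one program, many parameter combinations, one task per combination. The sketch below only builds the task list; the program name `model.exe` and the `--name=value` argument style are hypothetical, and submitting the tasks to a scheduler is left out.

```python
from itertools import product


def parametric_sweep(program: str, parameter_grid: dict) -> list:
    """Build one command line per combination of parameter values.

    parameter_grid maps a parameter name to the list of values to try.
    """
    names = sorted(parameter_grid)  # fixed order keeps the output reproducible
    commands = []
    for values in product(*(parameter_grid[name] for name in names)):
        arguments = [f"--{name}={value}" for name, value in zip(names, values)]
        commands.append([program, *arguments])
    return commands


tasks = parametric_sweep("model.exe", {"viscosity": [0.1, 0.2], "steps": [100]})
```

Because the tasks share no data, each one can run on its own node, which is what lets many result sets arrive in roughly the time of a single run.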
Because we can consider the overall computation process to be an instance of a long-running human and machine workflow, we can consider Windows Workflow Foundation (WF) as a suitable starting point for the construction of an application layer and framework for the HPC solution. In fact, WF has several components that can be used to provide a complete application layer for the HPC solution. The particular (but not exhaustive) areas that WF can cover are shown in Figure 6:
Figure 6. WF can provide a complete application layer covering most areas in the Request, Compute, and Analyze steps.
WF forms a core part of the .NET Framework 3.0 and provides a full workflow engine that can be hosted in a variety of environments. The Further Resources section of this document contains several links to more information on Windows Workflow Foundation.
Some of the core features of WF for consideration are as follows:
- Sequential and state workflows—The WF runtime can handle sequential and state-driven workflows, so a huge variety of processes can be described. These workflows can also have retry and exception handling.
- Activity libraries—Workflows consist of "activities" such as decision making, looping, and parallel execution, as well as arbitrary "code" activities, and several of these are ready-to-use in .NET 3.0. Additionally, because WF is used to power server products (for instance, Microsoft Office SharePoint Server 2007), these products also have base activities for consumption inside WF. Finally, activities can be constructed as required to create a bespoke library to suit a specific requirement.
- Rules engine—WF has a rich forward-chaining rules engine which can be used for decision making inside a workflow, but can also be executed outside of workflow instances. Activities are designed to work with this rules engine.
- Designer and rehosting—WF also has a full drag-and-drop design surface used inside Microsoft Visual Studio 2005, but can also be rehosted inside (for example) a Windows Forms application.
- Runtime services—The WF runtime can have services added prior to workflow execution, to intercept the execution of a workflow and perform actions such as persistence or tracking. Services can also be constructed as required.
- Extensible Application Markup Language (XAML)—Finally, WF makes extensive use of XAML to describe workflows and rulesets, meaning that serialization is trivial, and the generation of truly bespoke design surfaces and rules managers is possible.
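To make the idea of a sequential workflow with retry and exception handling concrete, the following sketch runs a list of activities in order and retries a failing one a fixed number of times. It is a plain Python illustration, not the WF runtime or its API; the `SequentialWorkflow` class and the activity functions are assumptions.

```python
class SequentialWorkflow:
    """Run activities in order, retrying each one a limited number of times."""

    def __init__(self, activities, max_attempts: int = 3):
        self.activities = list(activities)
        self.max_attempts = max_attempts
        self.log = []  # (activity name, attempt number, outcome)

    def run(self, context: dict) -> dict:
        for activity in self.activities:
            for attempt in range(1, self.max_attempts + 1):
                try:
                    context = activity(context)
                    self.log.append((activity.__name__, attempt, "ok"))
                    break
                except Exception as error:
                    self.log.append((activity.__name__, attempt, f"failed: {error}"))
                    if attempt == self.max_attempts:
                        raise  # retries exhausted: surface the failure
        return context


def stage_data(context: dict) -> dict:
    """Hypothetical activity that marks the input data as staged."""
    return {**context, "staged": True}
```

The log doubles as a crude tracking record; in WF that role belongs to a runtime service rather than to the workflow itself.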
Given these features, we can see how WF can provide capability to the architecture.
Activity libraries are the building blocks needed for the HPC solution. Examples are as follows:
- Cluster communication—It is possible to build a custom activity that uses the CCS application programming interface to communicate with the job scheduler and create or cancel jobs on the cluster. This custom activity can then be used by any workflow that needs to communicate with the job scheduler, and can provide a higher level of abstraction to "power users" who might need to author new workflows or amend existing workflows without detailed programming knowledge.
- Higher-level, "strongly typed" cluster job activities targeting specific (instead of arbitrary) executables.
- Activities for data movement around the network—perhaps using WF to control SQL Server Integration Services.
- Data upload and FTP activities, for results movement.
- Retry behavior in the event of some failure of the process.
- Notification activities for events during the process.
- Rules-based activities to govern automated optimization of the process, or control access privileges.
Workflows for specific computation scenarios can then be assembled using a combination of these generic activities and some specific activities to enable the desired scenario.
Workflow activities conform to all the typical rules of inheritance; so, it is worth noting that in a heterogeneous environment with multiple clusters of differing types, activities could be written to a general interface and then selected (at design time or run time) to allow execution on a given cluster. A useful example of this is to create an activity that allows the execution of an MPI computation on a local machine—perhaps making use of two processors—for testing purposes.
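The idea of writing activities to a general interface and choosing the implementation at run time can be sketched as follows. The class names, the `"local"` target keyword, and the returned strings are assumptions; a real remote activity would call the scheduler's API where the comment indicates.

```python
from abc import ABC, abstractmethod


class ClusterJobActivity(ABC):
    """General interface that every cluster-specific activity implements."""

    @abstractmethod
    def execute(self, command: list, processors: int) -> str:
        """Run or queue the command and return a human-readable status."""


class LocalTestActivity(ClusterJobActivity):
    """Runs on the developer's machine, capped at two processors, for testing."""

    MAX_LOCAL_PROCESSORS = 2

    def execute(self, command: list, processors: int) -> str:
        used = min(processors, self.MAX_LOCAL_PROCESSORS)
        return f"local run of {' '.join(command)} on {used} processors"


class RemoteClusterActivity(ClusterJobActivity):
    """Stands in for an activity bound to one cluster's head node."""

    def __init__(self, head_node: str):
        self.head_node = head_node

    def execute(self, command: list, processors: int) -> str:
        # A real implementation would submit the job through the scheduler here.
        return f"queued {' '.join(command)} on {self.head_node} with {processors} processors"


def select_activity(target: str) -> ClusterJobActivity:
    """Pick an implementation at run time from a target name."""
    if target == "local":
        return LocalTestActivity()
    return RemoteClusterActivity(head_node=target)
```

A workflow written against `ClusterJobActivity` does not change when the target moves from a test machine to a production cluster.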
WF provides several services which can be used as required by attaching the service implementation to the workflow runtime. Three of the most useful for HPC are:
- Tracking service. This uses the notion of tracking profiles and a standard database schema to audit occurrences in a workflow. This can be extended as required, or replaced in a wholesale fashion with a custom implementation. The out-of-the-box service is generally sufficient to provide quite a detailed audit trail of workflow execution.
- Persistence service. This uses a standard database schema to dehydrate and rehydrate workflow instances as required by the runtime, or as specified by the host. Rehydration of a running workflow happens when an event on that workflow occurs, making WF an ideal controller for long-running workflows, such as week- or month-long computations. Persistence also provides scalability across multiple controlling applications, and enables the use of "change sets" to roll back a workflow to a previously persisted position and re-execute from there. This has potentially significant advantages for multistage complex computations and for rerunning scientific research computations in a given state.
- Cluster monitor service. This service can register for events and callbacks from the cluster and then perform new activities within a workflow. For instance, the monitoring service may listen for an exception during the computation, or be used to cancel a running computation on the cluster.
Additional services could be constructed alongside these. An example might be a billing service which monitors the durations of processor usage and "bills" a user accordingly.
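The dehydrate and rehydrate cycle of the persistence service, including rolling back to an earlier persisted position, can be illustrated with a small in-memory store. This is a conceptual sketch only: the `PersistenceStore` class, the JSON serialization, and the checkpoint numbering are assumptions and do not reflect the WF database schema.

```python
import json


class PersistenceStore:
    """In-memory stand-in for a persistence database of workflow states."""

    def __init__(self):
        self._checkpoints = {}  # instance id -> list of serialized states

    def dehydrate(self, instance_id: str, state: dict) -> int:
        """Serialize the state and return the checkpoint number it was given."""
        history = self._checkpoints.setdefault(instance_id, [])
        history.append(json.dumps(state))
        return len(history) - 1

    def rehydrate(self, instance_id: str, checkpoint: int = -1) -> dict:
        """Load the latest state, or an earlier checkpoint to roll back to."""
        return json.loads(self._checkpoints[instance_id][checkpoint])


store = PersistenceStore()
first = store.dehydrate("wf-1", {"step": "preprocess", "results": []})
second = store.dehydrate("wf-1", {"step": "compute", "results": [1.5]})
latest = store.rehydrate("wf-1")
rolled_back = store.rehydrate("wf-1", checkpoint=first)
```

Because every checkpoint is kept, a month-long computation can be resumed after a host restart or re-executed from a known earlier state.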
WF has a powerful forward-chaining rules engine that can be used inside a workflow instance. Rules can be created in code, or declaratively, and are then compiled alongside the workflow for execution.
Rules and policy could have several uses:
- User identity—Access to system resources or types of resources could be granted on the basis of identity, or additional services and workflow steps could be added and removed on the basis of identity.
- Optimization—Rules combined with business intelligence or other known parameters could automate some optimization of the process, again by completing or removing workflow steps as required.
- Metascheduling—CCS does not currently have a metascheduler, but WF could be used to designate particular clusters for execution based on parameters, such as requested number of processors or current status of the available clusters, or user identity.
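The metascheduling use of rules can be sketched as a simple selection function over the parameters the text names: requested processors, cluster status, and user identity. The `Cluster` type, the group names, and the best-fit tie-break are assumptions; this is not the WF rules engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical snapshot of one cluster's status."""
    name: str
    free_processors: int
    online: bool
    allowed_groups: frozenset


def choose_cluster(clusters, requested_processors: int, user_group: str):
    """Return the name of a suitable cluster, or None if no cluster qualifies."""
    candidates = [
        cluster for cluster in clusters
        if cluster.online
        and user_group in cluster.allowed_groups
        and cluster.free_processors >= requested_processors
    ]
    if not candidates:
        return None
    # Best fit: prefer the smallest adequate cluster so larger ones stay free.
    return min(candidates, key=lambda cluster: cluster.free_processors).name


clusters = [
    Cluster("small", 16, True, frozenset({"genetics", "aerospace"})),
    Cluster("large", 256, True, frozenset({"aerospace"})),
    Cluster("spare", 512, False, frozenset({"genetics", "aerospace"})),
]
```

Expressing the same conditions as declarative rules would let a power user change the policy without touching the workflow code.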
So, there is significant power and flexibility in WF. We will see later how the features of WF can also be used to provide an authoring experience.
Using WF to provide the application logic and control of the process means that we can use any technology to host the application logic—from ASP.NET to Windows Forms to Windows Presentation Foundation (WPF). The screen shots in Figure 7 show the use of WPF and Microsoft Office SharePoint Server (MOSS), respectively, using the same WF-based application layer—although offering very different user experiences.
Figure 7. Using WF-based application layer, WPF (left) and MOSS (right) offer diverse user experiences.
Use of Microsoft Office SharePoint Server (MOSS) provides a solution footprint covering areas in the Access, Request, and Analyze steps (see Figure 8). The reasons are as follows:
Figure 8. Computation scenario with Microsoft Office SharePoint Server
- Collaboration—Research and other typical uses of HPC revolve around collaboration activities, and MOSS is extremely feature-rich in terms of enabling collaboration around the concept of "sites."
- Access—Research groups might be geographically separated, so a Web-based interface helps ensure that the platform is available to all users. Significant business value could be generated from providing access to an HPC platform to remote users.
- Extensibility—MOSS can be extended in a variety of ways, from search and cataloging of information to the construction of Web parts to be "plugged in" to the main application. This extensibility also provides a strong user paradigm and development platform and framework for the overall solution.
- Workflow—MOSS is also powered by WF, so the synergy between the two technologies is quite clear. An advantage here is that the skill sets of teams handling HPC workflow and those handling other business workflow are fully transferable.
To cover the required footprint, we can take advantage of several features of MOSS:
- Access and initialize—The single sign-on and user capabilities of MOSS will effectively provide off-the-shelf access functionality. Initialization can be made extremely easy through the creation of bespoke "sites" representing computation job types. A user can then generate a site representing a project or a job and receive the required configuration of Web parts in order to request, execute, and analyze on a computation job. Additionally, collaboration features such as presence controls and wiki can also be exploited.
- Request and execution—Specific Web forms could be designed to enable parameter input and requests to be made, but a better solution, in terms of capability and flexibility, might be to use Microsoft Office InfoPath. Office InfoPath provides sophisticated Web form capabilities and a well-developed authoring capability. Because it uses XML to maintain the form definition and the data for the form, it is also well-positioned to take advantage of Web services for submission, and the use of Web service enabled workflows to trigger job request approval and job execution. WF itself can be used to define the required approval workflows and is embedded as part of MOSS.
- Analysis—MOSS is also capable of hosting reporting services and Business Scorecard Manager (soon to be PerformancePoint) for dashboard-style reporting. Additionally, Microsoft Office Excel Services can be accessed via MOSS, which makes it an ideal platform for performing the initial analysis.
The main product for monitoring system health is Microsoft Operations Manager (MOM). We can use MOM to monitor the health of the CCS cluster in the same way as for other server environments. In addition, we can provide simple user-monitoring for end users with PowerShell (see Figure 9).
Figure 9. Microsoft Operations Manager can monitor the system health of the CCS cluster.
Finally, we can provide a first-class authoring experience through Visual Studio 2005, partly with off-the-shelf capability and partly through extension (see Figure 10).
Figure 10. Authoring scenario with Visual Studio 2005
Coding is required for authoring activities and algorithms, so Visual Studio 2005 is an ideal environment for using the .NET Framework to extend the activities or amend the user interface, as well as for using the Microsoft MPI and C++ stacks for the coding of parallel algorithms.
For the development of workflows, Visual Studio 2005 is also the preferred option, although it might not be as suitable for power users as for developers, because of the complexity of the interface. For power users, we have other options:
- Designer rehosting. The entire workflow designer can be rehosted within a Windows Forms application. This application could have a series of specific, cut-down functions to ensure that a power user can easily construct and configure workflow templates. It can also be used to provide some additional quality assurance to the workflow constructs by limiting the activities available to a power user.
- Designer development. Alternatively, because workflow and rule definitions can be described in XAML, an entirely new design surface could be constructed to represent workflow construction. A new design surface is a great option for a domain that has a particular way of diagramming processes or where other hosting options are required—for example, making use of Windows Presentation Foundation or AJAX capabilities of the Web. The Further Resources section contains a link to an ASP.NET AJAX implementation of the design surface.
From a development perspective, we can effectively consolidate all platform development into one environment, and specialize certain aspects through the use of XAML for the definition of workflows. The use of XAML also presents opportunities for publishing.
In the case of activities and algorithms, the publishing paradigm is effectively a software release, because binaries must be distributed. However, because the software being generated is so specific, a distribution mechanism could be built specifically to retrieve the libraries as required.
The Microsoft ClickOnce technology might also be worth investigating as a way to distribute a "thick-client" power-user authoring application and to automate its library updates.
In the case of workflow authoring, publishing could be as simple as transmitting the resulting XAML definitions to a central database and cataloging these definitions. The workflow can then be retrieved from the same database by an end user (or simply invoked as the result of a selection on a user interface), and then compiled and executed as required. As described in the next section, implementing this concept means that new workflows could potentially be published very quickly: Because no code release is required, there are no release-cycle or administrative requirements.
This capability could be provided as part of a bespoke power-user application, or as an add-in to Visual Studio 2005.
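As a rough illustration of the publishing idea, the following Python sketch stores workflow markup in a small catalog and retrieves the latest version on request. It is not taken from the solution described here: the table name `workflow_catalog`, the function names, and the use of SQLite in place of a central SQL Server database are all assumptions made for the example.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Open (or create) a hypothetical workflow catalog database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS workflow_catalog ("
        " name TEXT NOT NULL,"
        " version INTEGER NOT NULL,"
        " xaml TEXT NOT NULL,"
        " PRIMARY KEY (name, version))"
    )
    return conn


def publish(conn, name, xaml):
    """Store a new version of a workflow definition; return its version number."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM workflow_catalog WHERE name = ?",
        (name,),
    ).fetchone()
    version = row[0] + 1
    conn.execute(
        "INSERT INTO workflow_catalog (name, version, xaml) VALUES (?, ?, ?)",
        (name, version, xaml),
    )
    conn.commit()
    return version


def retrieve(conn, name):
    """Return the most recently published markup for a workflow, or None."""
    row = conn.execute(
        "SELECT xaml FROM workflow_catalog"
        " WHERE name = ? ORDER BY version DESC LIMIT 1",
        (name,),
    ).fetchone()
    return row[0] if row else None
```

Because publishing is only an insert into the catalog, no binaries are redistributed; the application layer would call something like `retrieve` when a user selects a workflow, and compile the markup at that point.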
WF workflows are typically compiled into assemblies, and are then executed by the workflow runtime by passing a type that conforms to a workflow into the runtime to create an instance of the workflow. However, a workflow can be executed from a declarative markup file—XAML file—if required, or compiled on the fly from XAML, with rules alongside.
This is useful because it allows a workflow author—someone with knowledge of the coding of workflows—to create a workflow that is not entirely valid (perhaps it is missing some parameters) and save the workflow into a markup file. The markup file can then be amended by a power user to add the required parameters (but perhaps not edit the actual workflow). Now that the workflow is valid, it can be submitted, compiled, and executed by the application layer.
These steps can reduce the cost of process execution by enabling roles to overlap (see Figure 11); expensive authoring skills do not have to be involved at the execution step of the process where less expensive configuration skills can be applied.
Figure 11. Overlapping roles reduce costs.
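The division of labor between workflow author and power user can be sketched in a few lines of Python. The markup below is a deliberately simplified, hypothetical XML format, not real WF XAML, and the element names (`Workflow`, `Parameter`, `Activity`) and function names are assumptions for illustration only. The author saves a template with a missing parameter, the power user supplies values without touching the activities, and submission is refused until the workflow is complete.

```python
import xml.etree.ElementTree as ET

# Hypothetical template saved by a workflow author; "resolution" is still unset.
TEMPLATE = """
<Workflow name="OceanRun">
  <Parameter name="resolution" />
  <Parameter name="years" value="1000" />
  <Activity type="SubmitJob" />
</Workflow>
"""


def missing_parameters(root):
    """Return the names of parameters that have no value yet."""
    return [p.get("name") for p in root.iter("Parameter") if not p.get("value")]


def configure(markup, values):
    """Fill in parameter values only; activities are left untouched."""
    root = ET.fromstring(markup)
    for p in root.iter("Parameter"):
        name = p.get("name")
        if name in values:
            p.set("value", str(values[name]))
    return root


def submit(root):
    """Reject incomplete workflows; otherwise return the markup to compile."""
    missing = missing_parameters(root)
    if missing:
        raise ValueError("workflow is not valid; missing: " + ", ".join(missing))
    return ET.tostring(root, encoding="unicode")
```

In this sketch, `configure(TEMPLATE, {"resolution": "36x36"})` plays the role of the power user's edit, and `submit` stands in for the check that the application layer would make before compiling and executing the workflow.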
The solution scenarios can be aggregated into a technical architecture (see Figure 12). Conceptually, CCS should be considered a service gateway in the same way that SQL Server provides services to an application: The solution is not centered on CCS, but it utilizes its capabilities, as required.
Figure 12. HPC technical architecture
Similarly, these services are typically abstracted away from a user interface through some sort of application control or business logic layer. Given the nature of interactions with the HPC solution—human workflows—WF is a suitable choice as a controller for these services.
WF also has other advantages. It is designed to be hosted as required, so that the user-interface layer could be a Web or Windows application written specifically for the task. Earlier in this article, we presented Microsoft Office SharePoint Server (MOSS) as a suitable interface, because it has strong hooks for WF, and because its integration with Office (including Office InfoPath) and its collaboration features could be useful in the scientific and engineering domains. The choice, of course, should be based on the requirements of a given domain.
The Grid-ENabled Integrated Earth (GENIE) system model project provides a grid-enabled framework that facilitates the integration, execution, and management of constituent models for the study of the Earth system over millennial timescales. Component codes (ocean, atmosphere, land surface, sea-ice, ice-sheets, biogeochemistry, and so on) of varying resolution and complexity can be flexibly coupled together to form a suite of efficient climate models capable of simulation over these timescales. The project brings together a distributed group of environmental scientists with a common interest in developing and using GENIE models to understand the Earth system.
Earth-system simulations are both computationally intensive and data intensive, and simulations based on the GENIE framework must follow complicated procedures of operations across different models and heterogeneous computational resources. The framework has therefore been designed to support the running of such simulations across multiple distributed data and computing resources over a lengthy period of time. We exploit a range of heterogeneous resources, including the U.K. National Grid of parallel computing resources (running both Linux and Microsoft Windows Compute Cluster Server) and desktop cycle stealing at distributed sites. Our back-end data store uses Oracle 10g to store metadata about our simulations and SQL Server to coordinate the persistence tracking of our running workflows (see Figure 12a).
At Supercomputing 2006 in Tampa, FL, we showed how applying the workflow methodology described in this article can provide the GENIE simulations with an environment for rapid composition of simulations and a solid hosting environment to coordinate their execution. The scientists in the collaboration are investigating how the Atlantic thermohaline circulation responds to changes in carbon dioxide levels in the atmosphere, and seek to understand, in particular, the stability of key ocean currents under different scenarios of climate change. In thermohaline circulation, sea-water density is controlled by temperature (thermo) and salinity (haline), and density differences drive ocean circulation.
Microsoft Institute of High Performance Computing: Matthew J. Fairman, Andrew R. Price, Gang Xue, Marc Molinari, Denis A. Nicole, Kenji Takeda, and Simon J. Cox. External Collaborators: Tim Lenton (School of Environmental Sciences, University of East Anglia) and Robert Marsh (National Oceanography Centre, University of Southampton).
Figure 12a. High-productivity environmental-science end-to-end support: (a) customer Web interface using WPF, (b) WWF to author and coordinate simulations, and (c) heterogeneous infrastructure consisting of Windows and Linux Compute resources, and Oracle and SQL Server data storage
Typically, the users of high-productivity computing solutions are highly skilled and highly prized assets to the research and development teams. Their skills and expertise are needed to provide the inputs to the processes, the algorithms for the calculations, and the analysis of the resulting output. However, these assets should be allowed to work as effectively and efficiently as possible. The creation of obscure (in one sense or another) solutions will affect this in a number of possible ways:
The user might need to have some understanding of the system that goes beyond what would typically be expected. For instance, I have to understand the basic principles of a remote control in order to operate my TV, but I don't need to understand the voltage required to power its LCD screen to watch it.
Experts who have been involved in an individual solution might find it hard to escape. For example, if Alice has coded an algorithm for the solution, she might find that she becomes heavily involved in the day-to-day operation of the solution because only she understands the best way to interact with the system to use the algorithm. She should really be put to use creating even better algorithms. The result after one or two generations of improvement is both a high cost of solution operation and a risk problem since Alice has become a key component of the solution.
The solution might also be a problem to administer for core IT teams. An HPC solution crafted by domain experts could easily be analogous to the sales team "linked spreadsheet solutions" that can be the stuff of nightmares for any in-house development and support teams.
The diagram in Figure 13 represents a high-level process for the provision and use of an HPC solution.
Figure 13. Original value stream
The compression of the value stream in terms of time and cost should be the primary concern for the provision of a solution. Given the issues described with speed improvements to the computational work, improvements should be identified outside of this scope and in other areas of the process.
Suitable candidates for the cost reduction in the process are, broadly, all aspects of the value stream with the probable exception of the algorithm creation, which is the invention and intellectual property of the solution. Reduction of cost is in two parts: the cost to use the solution and the cost of administration. Improving the process for the publication and use of algorithms and providing analytical support could reduce the overall process cost.
Based on the consideration of the value stream, a solution set should consist of tools providing two core features: simplification of task completion and opportunities for improved interaction.
Providing an HPC-based solution using architecture similar to that described might have several benefits for the overall value stream (see Figure 14).
Figure 14. Improved value stream
There are several areas that could be initially affected:
- Planning for a long-running job—Reductions of both time and cost are possible through the use of intuitive interfaces, validation, and other logic to ensure correct capture of information. In addition, specialist coders or administrators might no longer be needed at this point in the process.
- Publishing a long-running job—Both cost and time might be reduced because the information is captured by the system, and, consequently, the publication of the job can be fully automated.
- Cost of computation—Cost is reduced through role specialization (monitoring can occur from an end-user perspective), and the overall cost across many jobs might be reduced through improved recognition of failing or erroneous jobs that can be cancelled before they have completed execution. This improved recognition is again provided by the use of feedback mechanisms and monitoring presented to the user interface.
- Analysis—Analysis can begin more quickly if the user interface presents information during the job execution; some automatic analysis might be available, reducing initial costs before specialist intervention is required.
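The saving from cancelling failing jobs early depends on some rule that turns monitoring feedback into a decision. The Python sketch below shows one possible rule, purely as an assumption: the job is imagined to report a numeric health metric at each step (called a residual here), and the function name, the `limit` threshold, and the `patience` window are all hypothetical.

```python
import math


def should_cancel(residuals, limit=1e6, patience=3):
    """Return True if the reported residuals suggest that the job is failing.

    A job is flagged when any value is non-finite or exceeds `limit`,
    or when each of the last `patience` steps was worse than the one before.
    """
    for r in residuals:
        if not math.isfinite(r) or abs(r) > limit:
            return True
    # Look at the final patience + 1 values, giving `patience` consecutive changes.
    recent = residuals[-(patience + 1):]
    if len(recent) == patience + 1:
        return all(later > earlier for earlier, later in zip(recent, recent[1:]))
    return False
```

A rule like this could be evaluated by the application layer each time new tracking data arrives, with the result surfaced in the user interface so that an end user, rather than a specialist, decides whether to cancel.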
The core architecture looks like a familiar n-tier application architecture and, broadly speaking, this is indeed the case. The organization of functionality is quite logical. Figure 15 shows the required components for the architecture.
Figure 15. Components
The user-interface experience can be broken into three parts. Each performs a different function of the overall solution:
- The main user experience for planning and executing a computation can be provided through various technologies. Examples could be Windows Forms, Windows Presentation Foundation, ASP.NET or Microsoft Office SharePoint Server. The technology choice should be appropriate to a particular environmental characteristic (for example, the use of Web technologies where wide availability is required, or WPF where rich interaction is required).
- The workflow authoring experience is provided by Visual Studio 2005 using the Windows Workflow Foundation (WF) authoring design surface. Using the "code beside" model of developing workflows enables the use of a new Source Code Control API to submit WF XAML files to a central database repository (for example, for execution from the user interface).
- For IT administrators, Microsoft Operations Manager can be used to monitor the health and status of the HPC cluster, though for end users of an HPC system, more user-friendly feedback can be made available through the use of Windows PowerShell.
The application layer consists of the WF runtime hosted as an NT Service. This layer is used primarily to communicate with the cluster job scheduler but offers a variety of functions:
- An activity and workflow library for controlling the cluster and performing data movements and other pre- and post-processing steps.
- A rules and policy engine for managing access to a cluster and potentially providing priorities and "metascheduling" capabilities along with security.
- Tracking information for job execution providing provenance information as required.
- Persistence information allowing scaling of the solution and providing robustness to long-running workflows and potentially the ability to re-execute a workflow from a particular state.
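To make the tracking and persistence functions listed above more concrete, here is a minimal Python sketch of a step runner that records each completed step and can resume after a failure. It does not use or imitate the WF persistence and tracking services; the JSON state file, the function name `run_workflow`, and the step names in the usage note are assumptions for illustration.

```python
import json
import os


def run_workflow(steps, state_path):
    """Run named steps in order, persisting progress after each one.

    If `state_path` already exists, completed steps are skipped, so a
    restarted host resumes from the last persisted state.
    """
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    else:
        state = {"completed": [], "tracking": []}

    for name, action in steps:
        if name in state["completed"]:
            continue  # already done in an earlier run
        result = action()
        state["completed"].append(name)
        # Tracking record: which step ran and what it produced (provenance).
        state["tracking"].append({"step": name, "result": result})
        # Write to a temporary file and rename it, so that a crash during
        # the write does not leave a half-written state file behind.
        tmp_path = state_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, state_path)
    return state
```

If a hypothetical "compute" step raises an error after "stage-files" has completed, calling `run_workflow` again with the same `state_path` skips "stage-files" and retries only "compute". In the architecture described here, the persistence store would be a database rather than a local file.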
The data service contains supporting databases such as tracking and persistence stores for the workflow runtime. In this architecture, it is also likely that there is a store containing results or aggregates of results. Input and output files are likely to be held as files for easier movement around the cluster to the respective compute nodes. Finally, there is a database for holding a catalog of available workflows for execution.
The compute service is the "simplest" part of the architecture, consisting of a cluster of x nodes in a desired configuration.
Communication throughout the architecture is handled using Windows Communication Foundation through a TCP (Remoting) or HTTP channel. This keeps the architecture decoupled and scalable, and offers advantages for differing user interfaces based on user requirements (authoring, execution, reporting).
In a distributed service-oriented framework potentially crossing multiple security domains, federation of identities to permit single sign-on and user-level access control to all aspects of the component architecture is essential. Technologies offering such capabilities include those based around certificates, Active Directory federation (along with extensions to interoperate with Unix systems), and Microsoft Windows CardSpace.
Although aspects of the proposed architecture are intended for reuse, it is unlikely that the architecture can be sufficiently generic to be reused simply from scenario to scenario. It is more likely that the architecture represents a "shell" or "ball-and-socket" approach: Development will be required in each scenario, but with various reusable components and approaches. A speculative ratio (based on development experience) for generic-to-specific components is shown in Figure 16.
These ratios could be explained as follows:
- User interface 50:50—Core aspects of the user interface—file-movement controls, job-execution controls, and monitoring—would likely be immediately reusable and applicable to many scenarios. Specialization would be required—as with any UI—for any given process, particularly with such diverse scenarios as HPC computations.
- Application layer 80:20—The application layer is made up of a series of workflows and activities which can be rebound and reorganized for a particular process. With some work, it is likely that a lot of these activities and logical elements can be reused given new data. Examples may include activities for file movements, CCS integration, policy, and metascheduling. Also, various authoring tools would be applicable for power users in any scenario.
- HPC layer 10:90—Very little of the HPC layer has any reuse because every algorithm will be unique. However, monitoring and installation processes are reusable and match existing Microsoft paradigms.
- Data layer 40:60—Again, unique data structures will be required for any given computation, but tracking, persistence, and billing databases (for example) will all be applicable from scenario to scenario.
Figure 16. Speculative ratio for generic-to-specific components
Architecting for high-productivity computing is not just a case of ensuring the "best" performance in order to compute results as quickly as possible; that is more of an expectation than a design feature. In the context of the overall value stream, the architecture must drive value from other areas, such as ease of access and decreasing cost of specialist skills to operate the system.
A successful architecture for high-productivity computing solutions involves consideration of the overall process alongside the computationally intensive activities, and, therefore, might use several integrated components to perform the individual aspects of the process. Microsoft Windows Compute Cluster Server is easy to include as a service gateway inside a general n-tier application structure, and is simple to integrate via command-line or API hooks.
Other available technologies can provide the basis for an HPC solution. In particular, Windows Workflow Foundation is well-suited to provide an application interface to CCS, because the features and extensibility of WF, such as persistence and tracking, lend themselves to the requirements of HPC-based solutions. The use of WF also opens up the available choices of user-experience technologies to be applied in a given domain.
The following resources might be of help when considering the technologies that make up the solution proposed in this document:
- Microsoft Windows Compute Cluster Server 2003
- Microsoft SQL Server 2005 (including SQL Server Integration Services)
- Microsoft Windows Workflow Foundation
- Microsoft Windows Communication Foundation
- Microsoft Windows Presentation Foundation
- Microsoft Office SharePoint Server 2007
- Microsoft Operations Manager
- Microsoft Windows PowerShell
Other resources referred to in this document:
- Microsoft Corporation. "Deploying and Managing Microsoft Windows Compute Cluster Server 2003." Microsoft TechNet. June 2006.
- Shukla, Dharma, and Bob Schmidt. Essential Windows Workflow Foundation. Upper Saddle River, NJ: Addison-Wesley, 2007.
- Implementing the WF Design Surface in ASP.NET (with AJAX)
- Mezquita, Marc. "Performance Characteristics of Windows Workflow Foundation." Microsoft Developer Network. November 2006.
Peter Williams (Microsoft Consulting Services, Reading)
Marc Holmes is an Architect Evangelist for the Microsoft Technology Centre at Thames Valley Park in the U.K., where he specializes in architecture, design, and proof-of-concept work with a variety of customers, partners, and ISVs. Prior to Microsoft, Marc most recently led a significant development team as Head of Applications and Web at BBC Worldwide. Marc is the author of Expert .NET Delivery Using NAnt and CruiseControl.NET (Berkeley, CA: Apress, 2005) and maintains a blog at https://www.marcmywords.org. He can be contacted at marc.holmes@microsoft.com.
Simon Cox is Professor of Computational Methods in the Computational Engineering Design Research Group within the School of Engineering Sciences of the University of Southampton. An MVP Award holder for Microsoft Windows Server System: Infrastructure Architect, he directs the Microsoft Institute for High Performance Computing at the University of Southampton and has published over 100 papers. He currently heads a team that applies and develops computing in collaborative, interdisciplinary computational science and engineering projects, such as computational electromagnetics, liquid crystals, Earth-system modeling, biomolecular databasing, applied computational algorithms, and distributed service-oriented computing.
This article was published in the Architecture Journal, a print and online publication produced by Microsoft. For more articles from this publication, please visit the Architecture Journal Web site.