Chapter 3 — Design Guidelines for Application Performance

 


Improving .NET Application Performance and Scalability

J.D. Meier, Srinath Vasireddy, Ashish Babbar, Rico Mariani, and Alex Mackman
Microsoft Corporation

May 2004

Related Links

Home Page for Improving .NET Application Performance and Scalability

Chapter 4 — Architecture and Design Review of a .NET Application for Performance and Scalability

Checklist: Architecture and Design Review for Performance and Scalability

Send feedback to Scale@microsoft.com

patterns & practices Library

Summary: This chapter presents a set of performance guidelines and principles for application architects and designers. These help you to proactively make key design choices up front in the application life cycle, while balancing them against other, nonperformance-related tradeoffs.

Contents

Objectives
Overview
How to Use This Chapter
Principles
Deployment Considerations
Scale Up vs. Scale Out
Architecture and Design Issues
Coupling and Cohesion
Communication
Concurrency
Resource Management
Caching
State Management
Data Structures and Algorithms
Design Guidelines Summary
Desktop Applications Considerations
Browser Client Considerations
Web Layer Considerations
Business Layer Considerations
Data Access Layer Considerations
Summary
Additional Resources

Objectives

  • Learn design tradeoffs for performance and scalability.
  • Apply a principle-based approach to your design.
  • Identify and use a performance and scalability framework.
  • Learn design considerations for scaling up and scaling out.
  • Minimize communication and data transformation overhead.
  • Improve application concurrency.
  • Manage resources efficiently.
  • Cache application data effectively.
  • Manage application state efficiently.
  • Design an efficient presentation layer.
  • Design an efficient business layer.
  • Design an efficient data access layer.

Overview

Performance and scalability are two quality-of-service (QoS) considerations. Other QoS attributes include availability, manageability, integrity, and security. These should be balanced with performance and scalability, and this often involves architecture and design tradeoffs.

During your design phase, identify performance objectives. How fast is fast enough? What are your application response time and throughput constraints? How much CPU, memory, disk I/O, and network I/O is it acceptable for your application to consume? These are key factors that your design must be able to accommodate.

The guidelines in this chapter will help you design applications that meet your performance and scalability objectives. The chapter begins with a set of proven design principles and design process principles. It then covers deployment issues that you must consider at design time. Subsequent sections present design guidelines organized by the performance and scalability frame introduced in Chapter 1, "Fundamentals of Engineering for Performance". Finally, a set of recommendations is presented that focuses on the client, presentation layer, business layer, and data access layer.

How to Use This Chapter

Use this chapter to help you design your applications and evaluate your design decisions. You can apply the design guidelines in this chapter to new and existing applications. To get the most out of this chapter:

  • Jump to topics or read from beginning to end. The main headings in this chapter help you locate the topics that interest you. Alternatively, you can read the chapter from beginning to end to gain a thorough appreciation of performance and scalability design issues.
  • Use the "Architecture" section in each technical chapter in Part III of this guide. Refer to the technical chapter architecture sections to make better design and implementation choices.
  • Use the "Design Considerations" section in each technical chapter in Part III of this guide. Refer to the technical chapter design considerations sections for specific technology-related design guidelines.
  • Use the "Checklists" section of this guide. Use "Checklist: Architecture and Design Review for Performance and Scalability" to quickly view and evaluate the guidelines presented in this chapter.

Principles

The guidance throughout this chapter is based on principles. Performance, like security and many other aspects of software engineering, lends itself to a principle-based approach, where proven principles are applied regardless of the implementation technology or application scenario.

Design Process Principles

Consider the following principles to enhance your design process:

  • Set objective goals. Avoid ambiguous or incomplete goals that cannot be measured such as "the application must run fast" or "the application must load quickly." You need to know the performance and scalability goals of your application so that you can (a) design to meet them, and (b) plan your tests around them. Make sure that your goals are measurable and verifiable.

    Requirements to consider for your performance objectives include response times, throughput, resource utilization, and workload. For example, how long should a particular request take? How many users does your application need to support? What is the peak load the application must handle? How many transactions per second must it support?

    You must also consider resource utilization thresholds. How much CPU, memory, network I/O, and disk I/O is it acceptable for your application to consume?

  • Validate your architecture and design early. Identify, prototype, and validate your key design choices up front. Beginning with the end in mind, your goal is to evaluate whether your application architecture can support your performance goals. Some of the important decisions to validate up front include deployment topology, load balancing, network bandwidth, authentication and authorization strategies, exception management, instrumentation, database design, data access strategies, state management, and caching. Be prepared to cut features and functionality or rework areas that do not meet your performance goals. Know the cost of specific design choices and features.

  • Cut the deadwood. Often the greatest gains come from finding whole sections of work that can be removed because they are unnecessary. This often occurs when (well-tuned) functions are composed to perform some greater operation. It is often the case that many interim results from the first function in your system do not end up getting used if they are destined for the second and subsequent functions. Elimination of these "waste" paths can yield tremendous end-to-end improvements.

  • Tune end-to-end performance. Optimizing a single feature could take away resources from another feature and hinder overall performance. Likewise, a single bottleneck in a subsystem within your application can affect overall application performance regardless of how well the other subsystems are tuned. You obtain the most benefit from performance testing when you tune end-to-end, rather than spending considerable time and money on tuning one particular subsystem. Identify bottlenecks, and then tune specific parts of your application. Often performance work moves from one bottleneck to the next bottleneck.

  • Measure throughout the life cycle. You need to know whether your application's performance is moving toward or away from your performance objectives. Performance tuning is an iterative process of continuous improvement with hopefully steady gains, punctuated by unplanned losses, until you meet your objectives. Measure your application's performance against your performance objectives throughout the development life cycle and make sure that performance is a core component of that life cycle. Unit test the performance of specific pieces of code and verify that the code meets the defined performance objectives before moving on to integrated performance testing.

    When your application is in production, continue to measure its performance. Factors such as the number of users, usage patterns, and data volumes change over time. New applications may start to compete for shared resources.

Design Principles

The following design principles are abstracted from architectures that have scaled and performed well over time:

  • Design coarse-grained services. Coarse-grained services minimize the number of client-service interactions and help you design cohesive units of work. Coarse-grained services also help abstract service internals from the client and provide a looser coupling between the client and service. Loose coupling increases your ability to encapsulate change. If you already have fine-grained services, consider wrapping them with a facade layer to help achieve the benefits of a coarse-grained service.

  • Minimize round trips by batching work. Minimize round trips to reduce call latency. For example, batch calls together and design coarse-grained services that allow you to perform a single logical operation by using a single round trip. Apply this principle to reduce communication across boundaries such as threads, processes, processors, or servers. This principle is particularly important when making remote server calls across a network.

  • Acquire late and release early. Minimize the duration that you hold shared and limited resources such as network and database connections. Releasing and re-acquiring such resources from the operating system can be expensive, so consider a recycling plan to support "acquire late and release early." This enables you to optimize the use of shared resources across requests.

  • Evaluate affinity with processing resources. When certain resources are only available from certain servers or processors, there is an affinity between the resource and the server or processor. While affinity can improve performance, it can also impact scalability. Carefully evaluate your scalability needs. Will you need to add more processors or servers? If application requests are bound by affinity to a particular processor or server, you could inhibit your application's ability to scale. As load on your application increases, the ability to distribute processing across processors or servers influences the potential capacity of your application.

  • Put the processing closer to the resources it needs. If your processing involves a lot of client-service interaction, you may need to push the processing closer to the client. If the processing interacts intensively with the data store, you may want to push the processing closer to the data.

  • Pool shared resources. Pool shared resources that are scarce or expensive to create such as database or network connections. Use pooling to help eliminate performance overhead associated with establishing access to resources and to improve scalability by sharing a limited number of resources among a much larger number of clients.

  • Avoid unnecessary work. Use techniques such as caching, avoiding round trips, and validating input early to reduce unnecessary processing. For more information, see "Cut the Deadwood," above.

  • Reduce contention. Blocking and hotspots are common sources of contention. Blocking is caused by long-running tasks such as expensive I/O operations. Hotspots result from concentrated access to certain data that everyone needs. Avoid blocking while accessing resources because resource contention leads to requests being queued. Contention can be subtle. Consider a database scenario. On the one hand, large tables must be indexed very carefully to avoid blocking due to intensive I/O. However, many clients will be able to access different parts of the table with no difficulty. On the other hand, small tables are unlikely to have I/O problems but might be used so frequently by so many clients that they are hotly contested.

    Techniques for reducing contention include the efficient use of shared threads and minimizing the amount of time your code retains locks.

  • Use progressive processing. Use efficient practices for handling data changes. For example, perform incremental updates. When a portion of data changes, process the changed portion and not all of the data. Also consider rendering output progressively. Do not block on the entire result set when you can give the user an initial portion and some interactivity earlier.

  • Process independent tasks concurrently. When you need to process multiple independent tasks, you can execute those tasks asynchronously to perform them concurrently, as shown in the sketch after this list. Asynchronous processing offers the most benefit to I/O-bound tasks but has limited benefit when the tasks are CPU-bound and restricted to a single processor. If you plan to deploy on single-CPU servers, additional threads guarantee context switching, and because there is no real multithreading, there are likely to be only limited gains. CPU-bound multithreaded tasks on a single processor perform relatively slowly due to the overhead of thread switching.
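
As a minimal sketch of this principle, the following example queues two independent, I/O-bound work items to the process thread pool instead of running them sequentially. The class and method names are illustrative, not part of the original guidance:

    using System;
    using System.Threading;

    public class ReportBuilder {
      private static ManualResetEvent _done = new ManualResetEvent(false);
      private static int _pending = 2;

      public static void BuildReport() {
        // Queue two independent, I/O-bound tasks so they run concurrently
        // on pool threads rather than sequentially on the caller's thread.
        ThreadPool.QueueUserWorkItem(new WaitCallback(FetchSalesData));
        ThreadPool.QueueUserWorkItem(new WaitCallback(FetchInventoryData));
        _done.WaitOne();  // Block only until both tasks signal completion.
      }

      private static void FetchSalesData(object state) {
        // ... long-running I/O work ...
        if (Interlocked.Decrement(ref _pending) == 0) _done.Set();
      }

      private static void FetchInventoryData(object state) {
        // ... long-running I/O work ...
        if (Interlocked.Decrement(ref _pending) == 0) _done.Set();
      }
    }

Because both work items draw on the shared process thread pool, this approach is consistent with the "Treat Threads As a Shared Resource" guideline later in this chapter.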

Deployment Considerations

Runtime considerations bring together application functionality, choices of deployment architecture, operational requirements, and QoS attributes. These aspects are shown in Figure 3.1.


Figure 3.1: Deployment considerations

During the application design phase, review your corporate policies and procedures together with the infrastructure your application is to be deployed on. Frequently, the target environment is rigid, and your application design must reflect the imposed restrictions. It must also take into account other QoS attributes, such as security and maintainability. Sometimes design tradeoffs are required, for example because of protocol restrictions or network topologies.

The main deployment issues to recognize at design time are the following:

  • Consider your deployment architecture.
  • Identify constraints and assumptions early.
  • Evaluate server affinity.
  • Use a layered design.
  • Stay in the same process.
  • Do not remote application logic unless you need to.

Consider Your Deployment Architecture

Nondistributed and distributed architectures are both suitable for .NET applications. Both approaches have different pros and cons in terms of performance, scalability, ease of development, administration, and operations.

Nondistributed Architecture

With the nondistributed architecture, presentation, business, and data access code are logically separated but are physically located in a single Web server process on the Web server. This is shown in Figure 3.2.


Figure 3.2: Nondistributed application architecture: logical layers on a single physical tier

Pros

  • Nondistributed architecture is less complex than distributed architecture.
  • Nondistributed architecture has performance advantages gained through local calls.

Cons

  • With nondistributed architecture, it is difficult to share business logic with other applications.
  • With nondistributed architecture, server resources are shared across layers. This can be good or bad — layers may work well together and result in optimized usage because one of them is always busy. However, if one layer requires disproportionately more resources, you starve resources from another layer.

Distributed Architecture

With the distributed architecture, presentation logic communicates remotely to business logic located on a middle-tier application server as shown in Figure 3.3.


Figure 3.3: Distributed architecture: logical layers on multiple physical tiers

Pros

  • Distributed architecture has the ability to scale out and load balance business logic independently.
  • Distributed architecture has separate server resources that are available for separate layers.
  • Distributed architecture is flexible.

Cons

  • Distributed architecture has additional serialization and network latency overheads due to remote calls.
  • Distributed architecture is potentially more complex and more expensive in terms of total cost of ownership.

Identify Constraints and Assumptions Early

Identify any constraints and assumptions early in the design phase to avoid surprises later. Involve members of the network and infrastructure teams to help with this process. Study any available network diagrams, in addition to your security policy and operation requirements.

Target environments are often rigid, and your application design needs to accommodate the imposed restrictions. Sometimes design tradeoffs are required because of considerations such as protocol restrictions, firewalls, and specific deployment topologies. Likewise, your design may rely on assumptions about available memory or CPU capacity, or it may fail to consider them at all. Take maintainability into consideration, because ease of maintenance after deployment often affects the design of the application.

Evaluate Server Affinity

Affinity can have a positive or negative impact on performance and scalability. Server affinity occurs when all requests from a particular client must be handled by the same server. It is most often introduced by using locally updatable caches or in-process or local session state stores. If your design causes server affinity, scaling out at a later point forces you to re-engineer or develop complex synchronization solutions to synchronize data across multiple servers. If you need to scale out, consider affinity to resources that may limit your ability. If you do not need to support scaling out, consider the performance benefits that affinity to a resource may bring.

Use a Layered Design

A layered design is one that factors in presentation, business, and data access logic. A good layered design exhibits high degrees of cohesion by keeping frequently interacting components within a single layer, close to each other. A multilayered approach with separate presentation, business, and data access logic helps you build a more scalable and more maintainable application. For more information, see "Coupling and Cohesion," later in this chapter.

Stay in the Same Process

Avoid remote method calls and round trips where possible. Remote calls across physical boundaries (process and machine) are costly due to serialization and network latency.

You can host your application's business logic on your Web server along with the presentation layer or on a physically separate application server. You achieve optimum performance by locating your business logic on the Web server in your Web application process. If you avoid or exploit server affinity in your application design, this approach supports scaling up and scaling out. You can add more hardware to the existing servers or add more servers to the Web layer as shown in Figure 3.4.


Figure 3.4: Scaling out Web servers in a Web farm

Note: This deployment architecture still assumes logical layers: presentation, business, and data access. You should make logical layers a design goal, regardless of your physical deployment architecture.

Do Not Remote Application Logic Unless You Need To

Do not physically separate your business logic layer unless you need to and you have evaluated the tradeoffs. Remote logic can increase performance overhead. Performance overhead results from an increased number of round trips over the network with associated network latency and serialization costs.

However, you might need to physically separate your business layer, as in the following scenarios:

  • You might want to collocate business gateway servers with key partners.
  • You might need to add a Web front end to an existing set of business logic.
  • You might want to share your business logic among multiple client applications.
  • The security policy of your organization might prohibit you from installing business logic on your front-end Web servers.
  • You might want to offload the processing to a separate server because your business logic might be computationally intensive.

If you do need a remote application layer, use design patterns that help minimize the performance overhead. For more information, see "Communication," later in this chapter.

Scale Up vs. Scale Out

Your approach to scaling is a critical design consideration because whether you plan to scale out your solution through a Web farm, a load-balanced middle tier, or a partitioned database, you need to ensure that your design supports this.

When you scale your application, you can choose from and combine two basic choices:

  • Scale up: Get a bigger box.
  • Scale out: Get more boxes.

Scale Up: Get a Bigger Box

With this approach, you add hardware such as processors, RAM, and network interface cards to your existing servers to support increased capacity. This is a simple option and one that can be cost effective. It does not introduce additional maintenance and support costs. However, any single points of failure remain, which is a risk. Beyond a certain threshold, adding more hardware to the existing servers may not produce the desired results. For an application to scale up effectively, the underlying framework, runtime, and computer architecture must scale up as well. When scaling up, consider which resources the application is bound by. If it is memory-bound or network-bound, adding CPU resources will not help.

Scale Out: Get More Boxes

To scale out, you add more servers and use load balancing and clustering solutions. In addition to handling additional load, the scale-out scenario also protects against hardware failures. If one server fails, there are additional servers in the cluster that can take over the load. For example, you might host multiple Web servers in a Web farm that hosts presentation and business layers, or you might physically partition your application's business logic and use a separately load-balanced middle tier along with a load-balanced front tier hosting the presentation layer. If your application is I/O-constrained and you must support an extremely large database, you might partition your database across multiple database servers. In general, the ability of an application to scale out depends more on its architecture than on underlying infrastructure.

Guidelines

Consider the following approaches to scaling:

  • Consider whether you need to support scale out.
  • Consider design implications and tradeoffs up front.
  • Consider database partitioning at design time.

Consider Whether You Need to Support Scale Out

Scaling up with additional processor power and increased memory can be a cost-effective solution. It also avoids introducing the additional management cost associated with scaling out and using Web farms and clustering technology. You should look at scale-up options first and conduct performance tests to see whether scaling up your solution meets your defined scalability criteria and supports the necessary number of concurrent users at an acceptable level of performance. You should have a scaling plan for your system that tracks its observed growth.

If scaling up your solution does not provide adequate scalability because you reach CPU, I/O, or memory thresholds, you must scale out and introduce additional servers. To ensure that your application can be scaled out successfully, consider the following practices in your design:

  • You need to be able to scale out your bottlenecks, wherever they are. If the bottlenecks are on a shared resource that cannot be scaled, you have a problem. However, having a class of servers that have affinity with one resource type could be beneficial, but they must then be independently scaled. For example, if you have a single SQL Server™ that provides a directory, everyone uses it. In this case, when the server becomes a bottleneck, you can scale out and use multiple copies. Creating an affinity between the data in the directory and the SQL Servers that serve the data allows you to specialize those servers and does not cause scaling problems later, so in this case affinity is a good idea.
  • Define a loosely coupled and layered design. A loosely coupled, layered design with clean, remotable interfaces is more easily scaled out than tightly-coupled layers with "chatty" interactions. A layered design will have natural clutch points, making it ideal for scaling out at the layer boundaries. The trick is to find the right boundaries. For example, business logic may be more easily relocated to a load-balanced, middle-tier application server farm.

Consider Design Implications and Tradeoffs Up Front

You need to consider aspects of scalability that may vary by application layer, tier, or type of data. Know your tradeoffs up front and know where you have flexibility and where you do not. Scaling up and then out with Web or application servers may not be the best approach. For example, although you can have an 8-processor server in this role, economics would probably drive you to a set of smaller servers instead of a few big ones. On the other hand, scaling up and then out may be the right approach for your database servers, depending on the role of the data and how the data is used. Apart from technical and performance considerations, you also need to take into account operational and management implications and related total cost of ownership costs.

Use the following points to help evaluate your scaling strategy:

  • Stateless components. If you have stateless components (for example, a Web front end with no in-process state and no stateful business components), this aspect of your design supports scaling up and out. Typically, you optimize the price and performance within the boundaries of the other constraints you may have. For example, 2-processor Web or application servers may be optimal when you evaluate price and performance compared with 4-processor servers; that is, four 2-processor servers may be better than two 4-processor servers. You also need to consider other constraints, such as the maximum number of servers you can have behind a particular load-balancing infrastructure. In general, there are no design tradeoffs if you adhere to a stateless design. You optimize price, performance, and manageability.
  • Data. For data, decisions largely depend on the type of data:
    • Static, reference, and read-only data. For this type of data, you can easily have many replicas in the right places if this helps your performance and scalability. This has minimal impact on design and can be largely driven by optimization considerations. Consolidating several logically separate and independent databases on one database server may or may not be appropriate even if you can do it in terms of capacity. Spreading replicas closer to the consumers of that data may be an equally valid approach. However, be aware that whenever you replicate, you will have a loosely synchronized system.
    • Dynamic (often transient) data that is easily partitioned. This is data that is relevant to a particular user or session (and if subsequent requests can come to different Web or application servers, they all need to access it), but the data for user A is not related in any way to the data for user B. For example, shopping carts and session state both fall into this category. This data is slightly more complicated to handle than static, read-only data, but you can still optimize and distribute quite easily. This is because this type of data can be partitioned. There are no dependencies between the groups, down to the individual user level. The important aspect of this data is that you do not query it across partitions. For example, you ask for the contents of user A's shopping cart but do not ask to show all carts that contain a particular item.
    • Core data. This type of data is well maintained and protected. This is the main case where the "scale up, then out" approach usually applies. Generally, you do not want to hold this type of data in many places due to the complexity of keeping it synchronized. This is the classic case in which you would typically want to scale up as far as you can (ideally, remaining a single logical instance, with proper clustering), and only when this is not enough, consider partitioning and distribution scale-out. Advances in database technology (such as distributed partitioned views) have made partitioning much easier, although you should do so only if you need to. This is rarely because the database is too big, but more often it is driven by other considerations such as who owns the data, geographic distribution, proximity to the consumers, and availability.

Consider Database Partitioning at Design Time

If your application uses a very large database and you anticipate an I/O bottleneck, ensure that you design for database partitioning up front. Moving to a partitioned database later usually results in a significant amount of costly rework and often a complete database redesign.

Partitioning provides several benefits:

  • The ability to restrict queries to a single partition, thereby limiting the resource usage to only a fraction of the data.
  • The ability to engage multiple partitions, thereby getting more parallelism and superior performance because you can have more disks working to retrieve your data.

Be aware that in some situations, multiple partitions may not be appropriate and could have a negative impact. For example, some operations that use multiple disks could be performed more efficiently with concentrated data. So, when you partition, consider the benefits together with alternate approaches.


Architecture and Design Issues

In addition to affecting performance, bad design can limit your application's scalability. Performance is concerned with achieving response time, throughput, and resource utilization levels that meet your performance objectives. Scalability refers to the ability to handle additional workload, by adding resources such as CPU, memory, or storage capacity, without adversely affecting performance.

Sometimes a design decision involves a tradeoff between performance and scalability. Figure 3.5 highlights some of the main problems that can occur across the layers of distributed applications.


Figure 3.5: Common performance issues across application layers

The highlighted issues can apply across application layers. For example, a nonresponsive application might be the result of concurrency issues in your Web page's code, in your application's middle tier, or in the database. Alternatively, it could be the direct result of communication issues caused by a chatty interface design or the failure to pool shared resources. In this case, poor performance might become apparent only when several users concurrently access your application.

Table 3.1 lists the key issues that can result from poor design. These issues have been organized by the categories defined by the performance and scalability frame introduced in Chapter 1, "Fundamentals of Engineering for Performance".

Table 3.1: Potential Performance Problems with Bad Design

  • Coupling and Cohesion. Limited scalability due to server and resource affinity; mixed presentation and business logic, which limits your options for scaling out your application; lifetime issues due to tight coupling.
  • Communication. Increased network traffic and latency due to chatty calls between layers; inappropriate transport protocols and wire formats; large data volumes over limited-bandwidth networks.
  • Concurrency. Blocking calls and nongranular locks that stall the application's user interface; additional processor and memory overhead due to inefficient threading; contention at the database due to inappropriate transaction isolation levels; reduced concurrency due to inefficient locking.
  • Resource Management. Large working sets due to inefficient memory management; limited scalability and reduced throughput due to failing to release and pool shared resources; reduced performance due to excessive late binding and inefficient object creation and destruction.
  • Caching. Caching shared resources, cache misses, failure to expire items, poor cache design, and lack of a cache synchronization mechanism for scaling out.
  • State Management. State affinity, reduced scalability, inappropriate state design, inappropriate state store.
  • Data Structures and Algorithms. Excessive type conversion; inefficient lookups; incorrect choice of data structure for operations such as searching, sorting, and enumerating, or for the size of the data.

The subsequent sections in this chapter present design recommendations, organized by performance category.

Coupling and Cohesion

Reducing coupling and increasing cohesion are two key principles for increasing application scalability. Coupling is a degree of dependency (at design or run time) that exists between parts of a system. Cohesion measures how strongly related the processing and data within a single component are. An application that is designed in a modular fashion contains a set of highly cohesive components that are themselves loosely coupled.

To help ensure appropriate degrees of coupling and cohesion in your design, consider the following recommendations:

  • Design for loose coupling.
  • Design for high cohesion.
  • Partition application functionality into logical layers.
  • Use early binding where possible.
  • Evaluate resource affinity.

Design for Loose Coupling

Aim to minimize coupling within and across your application components. If you have tight coupling and need to make changes, the changes are likely to ripple across the tightly coupled components. With loosely coupled components, changes are limited because the complexities of individual components are encapsulated from consumers. In addition, loose coupling provides greater flexibility to choose optimized strategies for performance and scalability for different components of your system independently.

There may be certain performance-critical scenarios where you need to tightly couple your presentation, business, and data access logic because you cannot afford the slight overhead of loose coupling. For example, code inlining removes the overhead of instantiating and calling multiple objects, setting up a call stack for calling different methods, performing virtual table lookups, and so on. However, in the majority of cases, the benefits of loose coupling outweigh these minor performance gains.

Some of the patterns and principles that enable loose coupling are the following:

  • Separate interface from implementation. Providing facades at critical boundaries in your application leads to better maintainability and helps define units of work that encapsulate internal complexity.

    For a good example of this approach, see the implementation of the "Exception Handling Application Block" included with the Enterprise Library on MSDN.

  • Message-based communication. Message queues support asynchronous request invocation, and you can use a client-side queue if you need responses. This provides additional flexibility for determining when requests should be processed.

Design for High Cohesion

Logically related entities, such as classes and methods, should be grouped together. For example, a class should contain a logically related set of methods. Similarly, a component should contain logically related classes.

Weak cohesion among components tends to result in more round trips because the classes or components are not logically grouped and may end up residing in different tiers. This can force you to require a mix of local and remote calls to complete a logical operation. You can avoid this with appropriate grouping. This also helps reduce complexity by eliminating complex sequences of interactions between various components.

Partition Application Functionality into Logical Layers

Using logical layers to partition your application ensures that your presentation logic, business logic, and data access logic are not interspersed. This logical organization leads to a cohesive design in which related classes and data are located close to each other, generally within a single boundary. This helps optimize the use of expensive resources. For example, co-locating all data access logic classes ensures they can share a database connection pool.

Use Early Binding Where Possible

Prefer early binding where possible because this minimizes run-time overhead and is the most efficient way to call a method.

Late binding provides a looser coupling, but it affects performance because components must be dynamically located and loaded. Use late binding only where it is absolutely necessary, such as for extensibility.

Evaluate Resource Affinity

Compare and contrast the pros and cons. Affinity to a particular resource can improve performance in some situations. However, while affinity may satisfy your performance goals for today, resource affinity can make it difficult to scale your application. For example, affinity to a particular resource can limit or prevent the effective use of additional hardware on servers, such as more processors and memory. Server affinity can also prevent scaling out.

Some examples of affinity that can cause scalability problems include the following:

  • Using an in-process state store. As a result of this, all requests from a specific client must be routed to the same server.
  • Using application logic that introduces thread affinity. This forces the thread to be run on a specific set of processors. This hinders the ability of the scheduler to schedule threads across the processors, causing a decrease in performance gains produced by parallel processing.

More Information

For more information about coupling and cohesion, see "Coupling and Cohesion" in Chapter 4, "Architecture and Design Review of a .NET Application for Performance and Scalability."

Communication

The benefits of distributed architectures such as improved scalability, fault tolerance, and maintenance are well documented. However, the increased levels of communication and coordination inevitably affect performance.

To avoid common pitfalls and minimize performance overhead, consider the following guidelines:

  • Choose the appropriate remote communication mechanism.
  • Design chunky interfaces.
  • Consider how to pass data between layers.
  • Minimize the amount of data sent across the wire.
  • Batch work to reduce calls over the network.
  • Reduce transitions across boundaries.
  • Consider asynchronous communication.
  • Consider message queuing.
  • Consider a "fire and forget" invocation model.

Choose the Appropriate Remote Communication Mechanism

Your choice of transport mechanism is governed by various factors, including available network bandwidth, amount of data to be passed, average number of simultaneous users, and security restrictions such as firewalls.

Services are the preferred communication mechanism across application boundaries, including platform, deployment, and trust boundaries. Object technology, such as Enterprise Services or .NET remoting, should generally be used only within a service's implementation. Use Enterprise Services only if you need the additional feature set (such as object pooling, declarative distributed transactions, role-based security, and queued components) or where your application communicates between components on a local server and you have performance issues with Web services.

You should choose secure transport protocols such as HTTPS only where necessary and only for those parts of a site that require it.

Design Chunky Interfaces

Design chunky interfaces and avoid chatty interfaces. Chatty interfaces require multiple request/response round trips to perform a single logical operation, which consumes system and potentially network resources. Chunky interfaces enable you to pass all of the necessary input parameters and complete a logical operation in a minimum number of calls. For example, you can wrap multiple get and set calls with a single method call. The wrapper would then coordinate property access internally.

You can have a facade with chunky interfaces that wrap existing components to reduce the number of round trips. Your facade would encapsulate the functionality of the set of wrapped components and would provide a simpler interface to the client. The interface internally coordinates the interaction among various components in the layer. In this way, the client is less prone to any changes that affect the business layer, and the facade also helps you to reduce round trips between the client and the server.
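
To illustrate, the following sketch replaces a chatty sequence of remote property assignments with a single chunky call that carries all the inputs for one logical operation. The type and member names are hypothetical:

    using System;

    // Chatty: each remote property assignment is a separate round trip.
    //   customer.Name = name;
    //   customer.Address = address;
    //   customer.Save();
    //
    // Chunky: one round trip performs the whole logical operation.
    public class CustomerFacade : MarshalByRefObject {
      public void UpdateCustomer(int customerId, string name, string address) {
        // Coordinate the individual updates locally, inside the server
        // process, and persist the result as a single unit of work.
      }
    }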

Consider How to Pass Data Between Layers

Passing data between layers involves processing overhead for serialization as well as network utilization. Your options include using ADO.NET DataSet objects, strongly typed DataSet objects, collections, XML, or custom objects and value types.

To make an informed design decision, consider the following questions:

  • In what format is the data retrieved?

    If the client retrieves data in a certain format, it may be expensive to transform it. Transformation is a common requirement, but you should avoid multiple transformations as the data flows through your application.

  • In what format is the data consumed?

    If the client requires data in the form of a collection of objects of a particular type, a strongly typed collection is a logical and correct choice.

  • What features does the client require?

    A client might expect certain features to be available from the objects it receives as output from the business layer. For example, if your client needs to be able to view the data in multiple ways, needs to update data on the server by using optimistic concurrency, and needs to handle complex relationships between various sets of data, a DataSet is well suited to this type of requirement.

    However, the DataSet is expensive to create due to its internal object hierarchy, and it has a large memory footprint. Also, default DataSet serialization incurs a significant processing cost even when you use the BinaryFormatter.

    Other client-side requirements can include the need for validation, data binding, sorting, and sharing assemblies between client and server.

    For more information about how to improve DataSet serialization performance, see "How To: Improve Serialization Performance" in the "How To" section of this guide.

  • Can the data be logically grouped?

    If the data required by the client represents a logical grouping, such as the attributes that describe an employee, consider using a custom type. For example, you could return employee details as a struct type that has employee name, address, and employee number as members.

    The main performance benefit of custom classes is that they allow you to create your own optimized serialization mechanisms to reduce the communication footprint between computers.

  • Do you need to consider cross-platform interoperability?

    XML is an open standard and is the ideal data representation for cross-platform interoperability and communicating with external (and heterogeneous) systems.

    Performance issues to consider include the considerable parsing effort required to process large XML strings. Large and verbose strings also consume large amounts of memory. For more information about XML processing, see Chapter 9, "Improving XML Performance".

More Information

For more information about passing data across layers, see "Data Access" in Chapter 4, "Architecture and Design Review of a .NET Application for Performance and Scalability."

Minimize the Amount of Data Sent Across the Wire

Avoid sending redundant data over the wire. You can optimize data communication by using a number of design patterns:

  • Use coarse-grained wrappers. You can develop a wrapper object with a coarse-grained interface to encapsulate and coordinate the functionality of one or more objects that have not been designed for efficient remote access. The wrapper object abstracts complexity and the relationships between various business objects, provides a chunky interface optimized for remote access, and helps provide a loosely coupled system. It provides clients with single interface functionality for multiple business objects. It also helps define coarser units of work and encapsulate change. This approach is described by facade design patterns.

  • Wrap and return the data that you need. Instead of making a remote call to fetch individual data items, you fetch a data object by value in a single remote call. You then operate locally against the locally cached data. This might be sufficient for many scenarios.

    In other scenarios, where you need to ultimately update the data on the server, the wrapper object exposes a single method that you call to send the data back to the server. This approach is demonstrated in the following code fragment.

    struct Employee {
      private int _employeeID;
      private string _projectCode;
    
      public Employee(int employeeID, string projectCode) {
        _employeeID = employeeID;
        _projectCode = projectCode;
      }
    
      public int EmployeeID {
        get {return _employeeID;}
      }
      public string ProjectCode {
        get {return _projectCode;}
      }
      public void SetData() {
        // Send the changes back and update the data on the remote server.
      }
    }
    
    Besides encapsulating the relevant data, the value object can expose a SetData or similar method for updating the data back on the server. The public properties act locally on the cached data without making a remote method call. These individual methods can also perform data validation. This approach is sometimes referred to as the data transfer object design pattern.

  • Serialize only what you need to. Analyze the way your objects implement serialization to ensure that only the necessary data is serialized. This reduces data size and memory overhead, as shown in the sketch after this list. For more information, see "How To: Improve Serialization Performance" in the "How To" section of this guide.

  • Use data paging. Use a paging solution when you need to present large volumes of data to the client. This helps reduce processing load on the server, client, and network, and it provides a superior user experience. For more information about various implementation techniques, see "How To: Page Records in .NET Applications" in the "How To" section of this guide.

  • Consider compression techniques. In situations where you absolutely must send large amounts of data, and where network bandwidth is limited, consider compression techniques such as HTTP 1.1 compression.
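
To illustrate the "serialize only what you need to" guideline above, the following sketch excludes a field that is cheap to recompute from the serialized payload. The type and fields are hypothetical:

    using System;

    [Serializable]
    public class CustomerSummary {
      private string _name;          // Serialized and sent across the wire.
      private decimal _creditLimit;  // Serialized and sent across the wire.

      // Derived data that is cheap to recompute, so keep it out of the
      // serialized payload; it is rebuilt on demand after deserialization.
      [NonSerialized]
      private string _displayLabel;

      public CustomerSummary(string name, decimal creditLimit) {
        _name = name;
        _creditLimit = creditLimit;
      }

      public string DisplayLabel {
        get {
          if (_displayLabel == null)
            _displayLabel = _name + " (" + _creditLimit + ")";
          return _displayLabel;
        }
      }
    }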

Batch Work to Reduce Calls Over the Network

Batch your work to reduce the amount of remote calls over the network. Some examples of batching include the following:

  • Batch updates. The client sends multiple updates as a single batch to a remote application server instead of making multiple remote calls for updates for a transaction.
  • Batch queries. Multiple SQL queries can be batched by separating them with a semicolon or by using stored procedures.
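
As a rough sketch of a batched query through ADO.NET (the table and column names are hypothetical), the following sends two queries in one round trip; each result set arrives as a separate table in the DataSet:

    using System.Data;
    using System.Data.SqlClient;

    public class CustomerData {
      public static DataSet GetCustomerDetails(SqlConnection connection,
                                               int customerId) {
        // Two queries separated by a semicolon travel in a single round trip.
        SqlCommand cmd = new SqlCommand(
          "SELECT OrderID, OrderDate FROM Orders WHERE CustomerID = @id; " +
          "SELECT AddressLine1, City FROM Addresses WHERE CustomerID = @id",
          connection);
        cmd.Parameters.Add("@id", SqlDbType.Int).Value = customerId;

        SqlDataAdapter adapter = new SqlDataAdapter(cmd);
        DataSet results = new DataSet();
        adapter.Fill(results);  // One DataTable per result set: Tables[0], Tables[1].
        return results;
      }
    }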

Reduce Transitions Across Boundaries

Keep frequently interacting entities within the same boundary, such as the same application domain, process, or machine, to reduce communication overhead. When doing so, consider the performance versus scalability tradeoff: a single-process, single-application domain solution provides optimum performance, while a multiple-server solution provides significant scalability benefits and enables you to scale out your solution.

The main boundaries you need to consider are the following:

  • Managed to unmanaged code
  • Process to process
  • Server to server

Consider Asynchronous Communication

To avoid blocking threads, consider using asynchronous calls for any sort of I/O operation. Synchronous calls block the calling thread while they wait for a response. Asynchronous calls free the processing thread to do useful work in the meantime, such as handling new requests in server applications. As a result, asynchronous calls are helpful for potentially long-running calls that are not CPU-bound. The .NET Framework provides an asynchronous design pattern for implementing asynchronous communication.

Note that each asynchronous call actually uses a worker thread from the process thread pool. If they are used excessively on a single-CPU system, this can lead to thread starvation and excessive thread switching, thus degrading performance. If your clients do not need results to be returned immediately, consider using client and server-side queues as an alternative approach.
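
The following sketch shows the asynchronous design pattern applied to an illustrative long-running, I/O-bound method by using an asynchronous delegate:

    using System;

    public delegate string FetchDelegate(string url);

    public class AsyncCaller {
      public static string Run() {
        FetchDelegate fetch = new FetchDelegate(FetchPage);

        // BeginInvoke queues the call to a thread-pool worker and returns
        // immediately, freeing this thread for other useful work.
        IAsyncResult ar = fetch.BeginInvoke("http://www.example.com", null, null);

        // ... do other useful work here ...

        // EndInvoke blocks only if the call has not yet completed.
        return fetch.EndInvoke(ar);
      }

      private static string FetchPage(string url) {
        // ... potentially long-running, I/O-bound call ...
        return "";
      }
    }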

Consider Message Queuing

A loosely coupled, message-driven approach enables you to do the following:

  • Decouple the lifetime of the client state and server state, which helps to reduce complexity and increase the resilience of distributed applications.
  • Improve responsiveness and throughput because the current request is not dependent on the completion of a potentially slow downstream process.
  • Offload processor-intensive work to other servers.
  • Add additional consumer processes that read from a common message queue to help improve scalability.
  • Defer processing to nonpeak periods.
  • Reduce the need for synchronized access to resources.

The basic message queuing approach is shown in Figure 3.6. The client submits requests for processing in the form of messages on the request queue. The processing logic (which can be implemented as multiple parallel processes for scalability) reads requests from the request queue, performs the necessary work, and places the response messages on the response queue, which are then read by the client.


Figure 3.6: Message queuing with response
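
As a minimal System.Messaging sketch of the request side of Figure 3.6, the client below waits only until its message reaches the queue, not until processing completes. The queue path and message label are hypothetical, and the queue is assumed to already exist:

    using System.Messaging;

    public class OrderClient {
      public static void SubmitOrder(object order) {
        // Sending returns as soon as the message is queued; the processing
        // logic picks it up from the request queue later.
        using (MessageQueue requestQueue =
                 new MessageQueue(@".\private$\OrderRequests")) {
          requestQueue.Send(order, "NewOrder");
        }
      }
    }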

Message queuing presents additional design challenges:

  • How will your application behave if messages are not delivered or received?
  • How will your application behave if duplicate messages arrive or messages arrive out of sequence? Your design should not have order and time dependencies.

Consider a "Fire and Forget" Invocation Model

If the client does not need to wait for the results of a call, you can use a "fire and forget" approach to improve response time and avoid blocking while the long-running server call completes. The "fire and forget" approach can be implemented in a number of ways:

  • The client and the server can have message queues.
  • If you are using ASP.NET Web services or .NET remoting, you can use the OneWay attribute.
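
For example, with .NET remoting, the OneWayAttribute marks a method whose remote callers return as soon as the call is dispatched. This is a sketch; the service type is illustrative:

    using System;
    using System.Runtime.Remoting.Messaging;

    public class AuditLog : MarshalByRefObject {
      [OneWay]
      public void Write(string message) {
        // Remote callers do not wait for this method to complete; no return
        // value or exception flows back to the client.
      }
    }

Note that one-way methods must return void and take no output parameters, because there is no response message to carry results back to the caller.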

More Information

For more information, see the following resources:

  • For more information about the .NET Framework asynchronous invocation model, see Chapter 5, "Improving Managed Code Performance," and "Asynchronous Programming Design Pattern," in the .NET Framework Software Development Kit (SDK) on MSDN.
  • If you require other Enterprise Services in addition to asynchronous communication, consider using COM+ Queued Components. For more information about using Queued Components, see Chapter 8, "Improving Enterprise Services Performance."
  • For more information about using the "fire and forget" approach with Web services, see Chapter 10, "Improving Web Services Performance."
  • For more information about fire and forget with .NET remoting, see Chapter 11, "Improving Remoting Performance."

More Information

For more information about communication, see "Communication" in Chapter 4, "Architecture and Design Review of a .NET Application for Performance and Scalability."

Concurrency

One of the key benefits of distributed architectures is that they can support high degrees of concurrency and parallel processing. However, there are many factors such as contention for shared resources, blocking synchronous calls, and locking, which reduce concurrency. Your design needs to take these factors into account. A fundamental design goal is to minimize contention and maximize concurrency.

Use the following guidelines to help achieve that goal:

  • Reduce contention by minimizing lock times.
  • Balance between coarse- and fine-grained locks.
  • Choose an appropriate transaction isolation level.
  • Avoid long-running atomic transactions.

Reduce Contention by Minimizing Lock Times

If you use synchronization primitives to synchronize access to shared resources or code, make sure you minimize the amount of time the lock is held. High contention for shared resources results in queued requests and increased caller wait time. For example, hold locks only over those lines of code that need the atomicity. When you perform database operations, consider the use of partitioning and file groups to distribute the I/O operations across multiple hard disks.
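
The following sketch shows the pattern: the expensive, thread-local work happens outside the lock, and the lock covers only the shared-state update. The class is illustrative:

    using System.Collections;

    public class HitCounter {
      private static object _syncRoot = new object();
      private static Hashtable _counts = new Hashtable();

      public static void RecordHit(string rawUrl) {
        // Expensive, thread-local work stays outside the lock.
        string key = rawUrl.Trim().ToLower();

        // Hold the lock only over the lines that need atomicity.
        lock (_syncRoot) {
          object current = _counts[key];
          _counts[key] = (current == null) ? 1 : (int)current + 1;
        }
      }
    }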

Balance Between Coarse- and Fine-Grained Locks

Review and test your code against your policy for how many locks, what kind of locks, and where the lock is taken and released. Determine the right balance of coarse-grained and fine-grained locks. Coarse-grained locks can result in increased contention for resources. Fine-grained locks that only lock the relevant lines of code for the minimum amount of time are preferable because they lead to less lock contention and improved concurrency. However, having too many fine-grained locks can introduce processing overhead as well as increase code complexity and the chances of errors and deadlocks.

Choose an Appropriate Transaction Isolation Level

You need to select the appropriate isolation level to ensure that the data integrity is preserved without affecting the performance of your application. Different levels of isolation bring with them different guarantees for data integrity as well as different levels of performance. Four ANSI isolation levels are supported by SQL Server:

  • Read Uncommitted
  • Read Committed
  • Repeatable Read
  • Serializable

Note: The support for isolation levels may vary from database to database. For example, Oracle 8i does not support the Read Uncommitted isolation level.

Read Uncommitted offers the best performance but provides the fewest data integrity guarantees, while Serializable offers the slowest performance but guarantees maximum data integrity.

You should carefully evaluate the impact of changing the SQL Server default isolation level (Read Committed). Changing it to a value higher than required might increase contention on database objects, and decreasing it might increase performance but at the expense of data integrity issues.

Choosing the appropriate isolation levels requires you to understand the way the database handles locking and the kind of task your application performs. For example, if your transaction involves a couple of rows in a table, it is unlikely to interfere with other transactions as much as one that involves many tables and may need to lock many rows or entire tables. Transactions that hold many locks are likely to take considerable time to complete, and they require a higher isolation level than ones that lock only a couple of rows.

The nature and criticality of a transaction also plays a very significant part in deciding isolation levels. Isolation has to do with what interim states are observable to readers. It has less to do with the correctness of the data update.

In some scenarios (for example, if you need a rough estimate of inactive customer accounts), you may be willing to sacrifice accuracy by using a lower isolation level to avoid interfering with other users of the database.
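
As a sketch of that tradeoff in ADO.NET (the table, column, and cutoff are hypothetical, and the connection is assumed to be open), such an estimate can run at a lower isolation level:

    using System;
    using System.Data;
    using System.Data.SqlClient;

    public class AccountReports {
      public static int EstimateInactiveAccounts(SqlConnection connection) {
        // Read Uncommitted trades accuracy for minimal blocking, which is
        // acceptable because only a rough estimate is needed here.
        SqlTransaction tx =
          connection.BeginTransaction(IsolationLevel.ReadUncommitted);
        SqlCommand cmd = new SqlCommand(
          "SELECT COUNT(*) FROM Accounts WHERE LastLogin < @cutoff",
          connection, tx);
        cmd.Parameters.Add("@cutoff", SqlDbType.DateTime).Value =
          DateTime.Now.AddYears(-1);

        int estimate = (int)cmd.ExecuteScalar();
        tx.Commit();
        return estimate;
      }
    }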


Avoid Long-Running Atomic Transactions

Keep atomic transactions as short as possible to minimize the time that locks are retained and to reduce contention. Atomic transactions that run for a long time retain database locks, which can significantly reduce the overall throughput for your application. The following suggestions help reduce transaction time:

  • Avoid wrapping read-only operations in a transaction. To query reference data (for example, to display in a user interface), the implicit isolation provided by SQL Server for concurrent operations is enough to guarantee data consistency.
  • Use optimistic concurrency strategies. Gather data for coarse-grained operations outside the scope of a transaction, and when the transaction is submitted, provide enough data to detect whether the underlying reference data has changed enough to make it invalid. Typical approaches include comparing timestamps for data changes and comparing specific fields of the reference data in the database with the data retrieved. A sketch of this approach follows this list.
  • Do not flow your transactions across more boundaries than necessary. Gather user and external data before the transaction and define the transaction scope around one coarse-grained object or service call.
  • Wrap only operations that work against transactional resource managers, such as SQL Server or Microsoft Windows Message Queuing, in transactions.
  • Consider using compensating transactions where you need transactional qualities and where the cost of a synchronous long-running transaction would be too expensive.
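
As a sketch of the optimistic approach (the Products table, column names, and PriceUpdater class are hypothetical), the read happens outside any transaction, and the UPDATE is conditional on a rowversion column, so the transaction itself stays short:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

public class PriceUpdater
{
    // The UPDATE affects zero rows if another transaction changed the
    // row's version since it was read, so the caller can retry.
    public static bool UpdatePrice(string connectionString, int productId,
                                   decimal newPrice, byte[] originalVersion)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        {
            conn.Open();
            SqlTransaction tx = conn.BeginTransaction();
            SqlCommand cmd = new SqlCommand(
                "UPDATE Products SET Price = @price " +
                "WHERE ProductId = @id AND RowVersion = @version", conn, tx);
            cmd.Parameters.Add("@price", SqlDbType.Money).Value = newPrice;
            cmd.Parameters.Add("@id", SqlDbType.Int).Value = productId;
            cmd.Parameters.Add("@version", SqlDbType.Timestamp).Value =
                originalVersion;

            int rows = cmd.ExecuteNonQuery();
            if (rows == 1)
            {
                tx.Commit();     // short transaction: one conditional write
                return true;
            }
            tx.Rollback();       // concurrent change detected
            return false;
        }
    }
}
```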

More Information

For more information about concurrency, see "Concurrency" in Chapter 4, "Architecture and Design Review of a .NET Application for Performance and Scalability."

Resource Management

Resources are generally finite and often need to be shared among multiple clients. Inefficient resource management is often the cause of performance and scalability bottlenecks. Sometimes the platform provides efficient ways to manage resources, but you also need to adopt the right design patterns.

When you design for resource management, consider the following recommendations:

  • Treat threads as a shared resource.
  • Pool shared or scarce resources.
  • Acquire late, release early.
  • Consider efficient object creation and destruction.
  • Consider resource throttling.

Treat Threads As a Shared Resource

Avoid creating threads on a per-request basis. Indiscriminate thread creation, especially in high-volume server applications, hurts performance because each thread consumes resources and adds thread-switching overhead for the processor (particularly on single-CPU servers). A better approach is to use a shared pool of threads, such as the process thread pool. When using a shared pool, make sure you optimize the way that you use the threads:

  • Optimize the number of threads in the shared pool. For example, specific thread pool tuning is required for a high-volume Web application making outbound calls to one or more Web services. For more information about tuning the thread pool in this situation, see Chapter 10, "Improving Web Services Performance".
  • Minimize the length of jobs that are running on shared threads.

An efficient thread pool implementation offers a number of benefits and allows the optimization of system resources. For example, the .NET thread pool implementation dynamically tunes the number of threads in the pool based on current CPU utilization levels. This helps to ensure that the CPU is not overloaded. The thread pool also enforces a limit on the number of threads it allows to be active in a process simultaneously, based on the number of CPUs and other factors.
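
For example, a server component (the RequestDispatcher class here is hypothetical) can queue short-lived work items to the shared .NET thread pool instead of creating a thread per request:

```csharp
using System;
using System.Threading;

public class RequestDispatcher
{
    public static void Dispatch(object request)
    {
        // Reuse a thread from the shared process pool rather than
        // paying the cost of creating a new thread per request.
        ThreadPool.QueueUserWorkItem(new WaitCallback(ProcessRequest), request);
    }

    private static void ProcessRequest(object state)
    {
        // Keep the work short so the pool thread is returned quickly.
        Console.WriteLine("Processing {0} on a pool thread.", state);
    }
}
```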

Pool Shared or Scarce Resources

Pool shared resources that are scarce or expensive to create, such as database or network connections. Use pooling to help reduce performance overhead and improve scalability by sharing a limited number of resources among a much higher number of clients. Common pools include the following:

  • Thread pool. Use process-wide thread pools instead of creating threads on a per-request basis.
  • Connection pool. To ensure that you use connection pooling most efficiently, use the trusted subsystem model to access downstream systems and databases. With this model, you use a single fixed identity to connect to downstream systems. This allows the connection to be efficiently pooled.
  • Object pool. Objects that are expensive to initialize are ideal candidates for pooling. For example, you could use an object pool to retain a limited set of mainframe connections that take a long time to establish. The pooled objects can be shared by multiple clients as long as no client-specific state is maintained. Avoid any affinity to a particular resource; affinity to a particular object defeats the purpose of pooling in the first place. Any object in the pool should be able to service any request and should not be blocked for one particular request. A minimal pool sketch follows this list.
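
The sketch below is a minimal, assumption-laden illustration (the ConnectionPool name and the expensive-connection placeholder are hypothetical). Any pooled object can serve any caller because no client-specific state or affinity is kept:

```csharp
using System.Collections;

public class ConnectionPool
{
    private Queue available = new Queue();
    private object syncRoot = new object();

    public object Acquire()
    {
        // Hand out an idle pooled object if one exists...
        lock (syncRoot)
        {
            if (available.Count > 0)
                return available.Dequeue();
        }
        // ...otherwise pay the creation cost once.
        return CreateExpensiveConnection();
    }

    public void Release(object conn)
    {
        // Return the object so another client can reuse it.
        lock (syncRoot)
        {
            available.Enqueue(conn);
        }
    }

    private object CreateExpensiveConnection()
    {
        return new object(); // placeholder for a costly connection setup
    }
}
```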

Acquire Late, Release Early

Acquire resources as late as possible, immediately before you need to use them, and release them immediately after you are finished with them. Use language constructs, such as finally blocks, to ensure that resources are released even in the event of an exception.
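
A minimal sketch of this pattern with a database connection (the OrderData class and command text are hypothetical):

```csharp
using System.Data.SqlClient;

public class OrderData
{
    // The connection is opened immediately before use and closed in a
    // finally block, so it is released even if the command fails.
    public static void Execute(string connectionString, string commandText)
    {
        SqlConnection conn = new SqlConnection(connectionString);
        try
        {
            conn.Open();
            SqlCommand cmd = new SqlCommand(commandText, conn);
            cmd.ExecuteNonQuery();
        }
        finally
        {
            conn.Close();
        }
    }
}
```

In C#, the using statement provides the same guaranteed cleanup for types that implement IDisposable.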

Consider Efficient Object Creation and Destruction

Defer object creation to the actual point of use wherever possible, so that objects do not consume system resources while they wait to be used. Release objects immediately after you are finished with them.

If objects require explicit cleanup code to release handles to system resources, such as files or network connections, make sure that you perform the cleanup explicitly (for example, by calling Dispose in a finally block) to avoid leaks and wasted resources.

More Information

For more information about garbage collection, see Chapter 5, "Improving Managed Code Performance."

Consider Resource Throttling

You can use resource throttling to prevent any single task from consuming a disproportionate percentage of resources from the total allocated for the application. Resource throttling prevents an application from overshooting its allocated budget of computer resources, including CPU, memory, disk I/O, and network I/O.

A server application that attempts to consume large amounts of resources causes increased contention, which in turn increases response times and decreases throughput. Common examples of inefficient designs that cause this degradation include the following:

  • A user query that returns a large result set from a database. This can increase resource consumption at the database, on the network, and on the Web server.
  • An update that locks a large number of rows across frequently accessed tables. This causes significant increases in contention.

To help address these and similar issues, consider the following options for resource throttling:

  • Paging through large result sets.
  • Setting timeouts on long-running operations such that no single request continues to block on a shared resource beyond a permissible time limit.
  • Setting the process and thread priorities appropriately. Avoid assigning priorities higher than normal unless the process or the thread is very critical and demands real-time attention from the processor.

If a single request or the application as a whole genuinely needs to consume large amounts of resources, consider splitting the work across multiple servers or offloading it to off-peak hours when resource utilization is generally low.
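
As a small illustration of the first two options in the list above (the SalesSummary table and ReportQueries class are hypothetical), a query can be bounded both in result size and in how long it may run:

```csharp
using System.Data.SqlClient;

public class ReportQueries
{
    public static SqlCommand CreateSummaryCommand(SqlConnection conn)
    {
        // TOP bounds the result set; CommandTimeout makes the request
        // fail fast instead of blocking a shared resource indefinitely.
        SqlCommand cmd = new SqlCommand(
            "SELECT TOP 100 * FROM SalesSummary ORDER BY SaleDate DESC", conn);
        cmd.CommandTimeout = 10; // seconds
        return cmd;
    }
}
```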

More Information

For more information about resource management, see "Resource Management" in Chapter 4, "Architecture and Design Review of a .NET Application for Performance and Scalability."

Caching

Caching is one of the best techniques you can use to improve performance. Use caching to optimize reference data lookups, avoid network round trips, and avoid unnecessary and duplicate processing. To implement caching you need to decide when to load the cache data. Try to load the cache asynchronously or by using a batch process to avoid client delays.

When you design for caching, consider the following recommendations:

  • Decide where to cache data.
  • Decide what data to cache.
  • Decide the expiration policy and scavenging mechanism.
  • Decide how to load the cache data.
  • Avoid distributed coherent caches.

Decide Where to Cache Data

Cache state where it can save the most processing and network round trips. This might be at the client, a proxy server, your application's presentation logic, business logic, or in a database. Choose the cache location that supports the lifetime you want for your cached items. If you need to cache data for lengthy periods of time, you should use a SQL Server database. For shorter cache durations, use in-memory caches.

Consider the following scenarios:

  • Data caching in the presentation layer. Consider caching data in the presentation layer when it needs to be displayed to the user and is not cached on a per-user basis. For example, if you need to display a list of states, you can look these up once from the database and then store them in the cache (see the sketch after this list).

    For more information about ASP.NET caching techniques, see Chapter 6, "Improving ASP.NET Performance".

  • Data caching in the business layer. You can implement caching mechanisms by using hash tables or other data structures in your application's business logic. For example, you could cache taxation rules that enable tax calculation. Consider caching in the business layer when the data cannot be efficiently retrieved from the database. Data that changes frequently should not be cached.

  • Data caching in the database. Cache data in the database when you have large amounts of data or when you need to cache it for lengthy periods of time. The data can be served in smaller chunks or as a whole, depending on your requirements. The data is cached in temporary tables, which consume RAM and may cause memory bottlenecks. Always measure whether caching in the database is hurting or improving your application performance.
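
The presentation-layer scenario above might look like the following sketch, which uses the ASP.NET application cache; the StateList key and the loader method are hypothetical:

```csharp
using System;
using System.Web;
using System.Web.Caching;

public class StateListProvider
{
    // Look the list up once and serve subsequent requests from cache.
    public static object GetStates(HttpContext context)
    {
        object states = context.Cache["StateList"];
        if (states == null)
        {
            states = LoadStatesFromDatabase();
            context.Cache.Insert("StateList", states, null,
                DateTime.Now.AddHours(12), Cache.NoSlidingExpiration);
        }
        return states;
    }

    private static object LoadStatesFromDatabase()
    {
        return new string[] { "WA", "OR", "CA" }; // placeholder lookup
    }
}
```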

Decide What Data to Cache

Caching the right data is the most critical aspect of caching. If you fail to get this right, you can end up reducing performance instead of improving it. You might end up consuming more memory and, at the same time, suffering cache misses, where the data is not actually served from the cache but is refetched from the original source.

The following are some important recommendations that help you decide what to cache:

  • Avoid caching per-user data. Caching data on a per-user basis can cause a memory bottleneck. Imagine a search engine that caches each user's query results so that it can page through them efficiently. Do not cache per-user data unless retrieving the data is expensive and the concurrent client load does not build up memory pressure. Even then, measure both approaches and consider caching the data on a dedicated server. For Web applications, you can also consider using session state as a cache mechanism, but only for small amounts of data. In all cases, cache only the most relevant data.

  • Avoid caching volatile data. Cache frequently used, not frequently changing, data. Cache static data that is expensive to retrieve or create.

Avoid caching volatile data that users require to be accurate and updated in real time. If you frequently expire the cache to keep it synchronized with rapidly changing data, you can end up using more system resources, such as CPU, memory, and network bandwidth, than you save.

Cache data that does not change very frequently or is completely static. If the data does change, evaluate the acceptable time limit during which stale data can be served to the user. For example, consider a ticker that shows stock quotes. Although the underlying rates are continuously updated, it may be acceptable to refresh the ticker at a fixed interval, such as every five minutes.

    You can then devise a suitable expiration mechanism to clear the cache and retrieve fresh data from the original medium.

  • Do not cache shared expensive resources. Do not cache shared expensive resources such as network connections. Instead, pool those resources.

  • Cache transformed data, keeping in mind the data use. If you need to transform data before it can be used, transform the data before caching it.

  • Try to avoid caching data that needs to be synchronized across servers. This approach requires manual and complex synchronization logic and should be avoided where possible.

Decide the Expiration Policy and Scavenging Mechanism

You need to determine the appropriate time interval to refresh data, and design a notification process to indicate that the cache needs refreshing.

If you hold data too long, you run the risk of using stale data; if you expire the data too frequently, you can affect performance. Decide on the expiration algorithm that is right for your scenario. Common options include the following:

  • Least recently used.
  • Least frequently used.
  • Absolute expiration after a fixed interval.
  • Caching expiration based on a change in an external dependency, such as a file.
  • Cleaning up the cache if a resource threshold (such as a memory limit) is reached.

**Note**   The best choice of scavenging mechanism also depends on the storage choice for the cache.
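
For example, an external-dependency policy can be expressed with the ASP.NET CacheDependency class. In this sketch, the cache key, the rules object, and the file path are hypothetical:

```csharp
using System.Web;
using System.Web.Caching;

public class TaxRuleCacheHelper
{
    // The entry is evicted automatically when the underlying rules file
    // changes: expiration based on an external dependency.
    public static void CacheRules(HttpContext context, object rules,
                                  string rulesFilePath)
    {
        context.Cache.Insert("TaxRules", rules,
            new CacheDependency(rulesFilePath),
            Cache.NoAbsoluteExpiration, Cache.NoSlidingExpiration);
    }
}
```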

Decide How to Load the Cache Data

For large caches, consider loading the cache asynchronously with a separate thread or by using a batch process.

When a client accesses an expired cache entry, the cache needs to be repopulated. Doing so synchronously affects client response time and blocks the request-processing thread.
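
One minimal sketch of an asynchronous refresh, assuming a hypothetical ProductCatalogCache: the request thread keeps serving the existing (possibly stale) copy while a pool thread loads fresh data and swaps it in.

```csharp
using System.Threading;

public class ProductCatalogCache
{
    private static object cachedCatalog;

    // Kick off a refresh on a pool thread so request threads never
    // block on the slow reload.
    public static void BeginRefresh()
    {
        ThreadPool.QueueUserWorkItem(new WaitCallback(Refresh));
    }

    private static void Refresh(object state)
    {
        object fresh = LoadFromDatabase();           // slow, off the request path
        Interlocked.Exchange(ref cachedCatalog, fresh); // atomic swap-in
    }

    public static object Current
    {
        get { return cachedCatalog; }
    }

    private static object LoadFromDatabase()
    {
        return new object(); // placeholder for the expensive load
    }
}
```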

Avoid Distributed Coherent Caches

In a Web farm, if you need multiple caches on different servers to be kept synchronized because of localized cache updates, you are probably dealing with transactional state. You should store this type of state in a transactional resource manager such as SQL Server. Otherwise, you need to rethink the degree of integrity and potential staleness you can trade off for increased performance and scalability.

A localized cache is acceptable, even in a server farm, as long as you require it only to serve pages faster. If a request goes to another server that does not have the same updated cache, that server should still be able to serve the same pages, albeit by querying the persistent store for the same data.

State Management

Improper state management causes many performance and scalability issues.

The following guidelines help you to design efficient state management:

  • Evaluate stateful versus stateless design.
  • Consider your state store options.
  • Minimize session data.
  • Free session resources as soon as possible.
  • Avoid accessing session variables from business logic.

Evaluate Stateful vs. Stateless Design

Stateful components hold onto state in member variables for completing a logical activity that spans multiple calls from the client. The state may or may not be persisted after the completion of an operation. Stateful components can produce optimized performance for certain load conditions. The client makes all the requests to a particular instance of the component to complete the operation. Hence, stateful components result in clients having affinity with the component.

The caveat for stateful components is that they may hold onto server resources across calls until the logical activity is complete, which can result in increased contention for resources. The server continues to hold resources even if the client never makes the subsequent calls needed to complete the operation, and releases them only when a timeout set for the activity expires.

Affinity is an important issue for stateful design. The client is tied to a particular instance because of localized state. This is a disadvantage if you have a remote application tier and scalability goals that require you to scale out that tier: the affinity ties clients to a particular server, making it impossible to truly load balance the application.

As discussed earlier, having state for components is a design tradeoff, and the decision requires inputs that relate to your deployment requirements (for example, whether or not you have a remote application tier) and your performance and scalability goals.

If you decide on a stateless component design, you then need to decide where to persist state outside of the components so that it can be retrieved on a per-request basis.

Consider Your State Store Options

If you use a stateless component design, store state where it can be retrieved most efficiently. Factors to consider that influence your choice include the amount of state, the network bandwidth between client and server, and whether state needs to be shared across multiple servers. You have the following options to store state:

  • On the client. If you have only small amounts of state and sufficient network bandwidth, you can consider storing it at the client and passing it back to the server with each request.
  • In memory on the server. If you have too much state to be passed from the client with each request, you can store it in memory on the server, either in the Web application process or in a separate local process. Localized state on the server is faster and avoids network round trips to fetch the state. However, this adds to memory utilization on the server.
  • In a dedicated resource manager. If you have large amounts of state and it needs to be shared across multiple servers, consider using SQL Server. The increase in scalability offered by this approach is achieved at the cost of performance because of additional round trips and serialization.
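
In ASP.NET, the third option maps to the SQLServer session-state mode, configured in web.config. This is a minimal sketch, assuming the session-state database has already been installed with the InstallSqlState.sql script that ships with ASP.NET; the server name is a placeholder:

```xml
<configuration>
  <system.web>
    <!-- Shared, out-of-process session store: scales across a Web farm
         at the cost of a network round trip and serialization per request. -->
    <sessionState mode="SQLServer"
                  sqlConnectionString="data source=SessionStateServer;Integrated Security=SSPI"
                  cookieless="false"
                  timeout="20" />
  </system.web>
</configuration>
```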

Minimize Session Data

Keep the amount of session data stored for a specific user to a minimum to reduce the storage and retrieval performance overheads. The total size of session data for the targeted load of concurrent users may result in increased memory pressure when the session state is stored on the server, or increased network congestion if the data is held in a remote store.

If you use session state, there are two situations you should avoid:

  • Avoid storing any shared resources. These are required by multiple requests and may result in contention because the resource is not released until the session times out.
  • Avoid storing large collections and objects in session stores. Consider caching them if they are required by multiple clients.

Free Session Resources As Soon As Possible

Sessions continue to hold server resources until the data is explicitly cleaned up or until the session times out.

You can follow a two-pronged strategy to minimize this overhead. At design time, ensure that session state is released as soon as possible. For example, in a Web application you might temporarily store a dataset in a session variable so that the data is available across pages; remove it as soon as it is no longer needed, such as when the user clicks a menu item to leave that page flow.

Also, you should tune your session timeout to ensure that the session data does not continue to consume resources for long periods.
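
A small sketch of the design-time half of that strategy (the CheckoutFlow class and session key are hypothetical):

```csharp
using System.Web.SessionState;

public class CheckoutFlow
{
    // Release the temporary dataset as soon as the multi-page flow
    // completes, rather than waiting for the session to time out.
    public static void Complete(HttpSessionState session)
    {
        session.Remove("CheckoutData");
    }
}
```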

Avoid Accessing Session Variables from Business Logic

Accessing session variables from business logic makes sense only when the business logic is interspersed with presentation code as a result of tight coupling. You may require this in some scenarios, as discussed in "Coupling and Cohesion" earlier in this chapter.

However, in the majority of cases, you benefit from loosely coupled presentation and business logic, partitioned in separate logical layers. This provides better maintainability and improved scalability options. It is most frequently user interface–related state that needs to be persisted across calls. Therefore, session-related state should be part of the presentation layer. In this way, if the workflows of the user interface change, it is only the presentation layer code that is affected.

More Information

For more information about state management, see "State Management" in Chapter 4, "Architecture and Design Review of a .NET Application for Performance and Scalability."

Data Structures and Algorithms

The correct use of data structures and algorithms plays a significant role in building high-performance applications. Your choices in these areas can significantly affect memory consumption and CPU loading.

The following guidelines help you to use efficient data structures and algorithms:

  • Choose an appropriate data structure.
  • Pre-assign size for large dynamic growth data types.
  • Use value and reference types appropriately.

Choose an Appropriate Data Structure

Before choosing the collection type for your scenarios, you should spend time analyzing your specific requirements by using the following common criteria:

  • Data storage. Consider how much data will be stored. Will you store a few records or a few thousand records? Do you know the amount of data to be stored ahead of time instead of at run time? How do you need to store the data? Does it need to be stored in order or randomly?
  • Type. What type of data do you need to store? Is it strongly typed data? Do you store variant objects or value types?
  • Growth. How will your data grow? By how much, and how often?
  • Access. Do you need indexed access? Do you need to access data by using a key-value pair? Do you need sorting in addition to searching?
  • Concurrency. Does access to the data need to be synchronized? If the data is regularly updated, you need synchronized access. You may not need synchronization if the data is read-only.
  • Marshaling. Do you need to marshal your data structure across boundaries? For example, do you need to store your data in a cache or a session store? If so, you need to make sure that the data structure supports serialization in an efficient way.

Pre-Assign Size for Large Dynamic Growth Data Types

If you know that you need to add a lot of data to a dynamic data type, assign an approximate size up front wherever you can. This helps avoid unnecessary memory re-allocations.
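
For example, both collections and string buffers accept an initial capacity; pre-sizing them avoids repeated internal re-allocation and copying as they grow. The ReportBuilder class and the 64-characters-per-line estimate here are hypothetical:

```csharp
using System.Collections;
using System.Text;

public class ReportBuilder
{
    public static string Build(string[] lines)
    {
        // Both structures are sized up front from what is known
        // about the input, so they grow without re-allocating.
        ArrayList items = new ArrayList(lines.Length);
        StringBuilder report = new StringBuilder(lines.Length * 64);

        foreach (string line in lines)
        {
            items.Add(line);
            report.Append(line).Append('\n');
        }
        return report.ToString();
    }
}
```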

Use Value and Reference Types Appropriately

Value types are copied when they are passed by value, while reference types pass a reference to data allocated on the heap. Use the following guidelines when choosing between pass-by-value and pass-by-reference semantics:

  • Avoid passing large value types by value to local methods. If the target method is in the same process or application domain, the data is copied onto the stack. You can improve performance by passing a reference to a large structure through a method parameter, rather than passing the structure by value.
  • Consider passing reference types by value across process boundaries. If you pass an object reference across a process boundary, a callback to the originating process is required each time the object's fields or methods are accessed. By passing the object by value, you avoid this overhead. If you pass a set of connected objects, make sure all of them can be passed by value.
  • Consider passing by reference when the object is very large or its state is relevant only within the current process boundaries; for example, objects that maintain handles to local server resources, such as files.
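
A small sketch of the first guideline (the Bounds struct and LayoutEngine class are hypothetical): passing a large struct by reference avoids copying it on every call.

```csharp
public struct Bounds   // a hypothetical large value type
{
    public double X, Y, Width, Height, MinX, MinY, MaxX, MaxY;
}

public class LayoutEngine
{
    // Passing by reference avoids copying the whole struct onto the
    // stack on each call within the same process.
    public static void Scale(ref Bounds b, double factor)
    {
        b.Width *= factor;
        b.Height *= factor;
    }
}
```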

More Information

For more information about data structures and algorithms, see "Data Structures and Algorithms" in Chapter 4, "Architecture and Design Review of a .NET Application for Performance and Scalability."

Design Guidelines Summary

Table 3.2 summarizes the design guidelines discussed in this chapter and organizes them by performance profile category.

Table 3.2: Design Guidelines by Performance Profile Category

| Performance profile category | Guidelines |
| --- | --- |
| Coupling and Cohesion | Design for loose coupling. |
| | Design for high cohesion. |
| | Partition application functionality into logical layers. |
| | Use early binding where possible. |
| | Evaluate resource affinity. |
| Communication | Choose the appropriate remote communication mechanism. |
| | Design chunky interfaces. |
| | Consider how to pass data between layers. |
| | Minimize the amount of data sent across the wire. |
| | Batch work to reduce calls over the network. |
| | Reduce transitions across boundaries. |
| | Consider asynchronous communication. |
| | Consider message queuing. |
| | Consider a "fire and forget" invocation model. |
| Concurrency | Reduce contention by minimizing lock times. |
| | Balance between coarse-grained and fine-grained locks. |
| | Choose an appropriate transaction isolation level. |
| | Avoid long-running atomic transactions. |
| Resource Management | Treat threads as a shared resource. |
| | Pool shared or scarce resources. |
| | Acquire late, release early. |
| | Consider efficient object creation and destruction. |
| | Consider resource throttling. |
| Caching | Decide where to cache data. |
| | Decide what data to cache. |
| | Decide the expiration policy and scavenging mechanism. |
| | Decide how to load the cache data. |
| | Avoid distributed coherent caches. |
| State Management | Evaluate stateful versus stateless design. |
| | Consider your state store options. |
| | Minimize session data. |
| | Free session resources as soon as possible. |
| | Avoid accessing session variables from business logic. |
| Data Structures/Algorithms | Choose an appropriate data structure. |
| | Pre-assign size for large dynamic growth data types. |
| | Use value and reference types appropriately. |

Desktop Applications Considerations

Desktop applications must share resources, including CPU, memory, network I/O, and disk I/O, with other processes that run on the desktop computer. A single application that consumes a disproportionate amount of resources affects other applications and the overall performance and responsiveness of the desktop. Some of the more significant aspects to consider if you build desktop applications include the following:

  • Consider UI responsiveness. User interface responsiveness is an important consideration for desktop applications. You should consider performing long-running tasks asynchronously by delegating the task to a separate thread rather than having the main user interface thread do the work. This keeps the user interface responsive. You can perform asynchronous work in a number of ways, such as by using the process thread pool, by spawning threads, or by using message queues (see the sketch after this list). However, asynchronous processing adds complexity and requires careful design and implementation.
  • Consider work priorities. When building complex desktop applications, you must consider the relative priorities of work items within the application and relative to other applications the user might be running. Background tasks with lower priorities and unobtrusive UI designs provide better performance for the user (both actual and perceived) when performing different tasks. Background network transfers and progressive data loading are two techniques that you can use to prioritize different work items.
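
A minimal Windows Forms sketch of the responsiveness guideline (the form, label, and three-second workload are hypothetical): the long task runs on a pool thread, and the result is marshaled back to the UI thread with Control.Invoke.

```csharp
using System.Threading;
using System.Windows.Forms;

public class MainForm : Form
{
    private Label statusLabel = new Label();

    public MainForm()
    {
        Controls.Add(statusLabel);
    }

    // Run the long task on a pool thread so the UI stays responsive.
    public void StartLongTask()
    {
        ThreadPool.QueueUserWorkItem(new WaitCallback(DoWork));
    }

    private void DoWork(object state)
    {
        Thread.Sleep(3000); // placeholder for the long-running work
        // Marshal the result back to the UI thread before touching controls.
        Invoke(new MethodInvoker(OnWorkCompleted));
    }

    private void OnWorkCompleted()
    {
        statusLabel.Text = "Done";
    }
}
```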

Browser Client Considerations

The following design guidelines help you to improve both actual and perceived performance for browser clients:

  • Force the user to enter detailed search criteria. By validating that the user has entered detailed search criteria, you can execute more specific queries that result in less data being retrieved. This helps to reduce round trips to the server and reduces the volume of data that needs to be manipulated on the server and client.
  • Implement client-side validation. Perform client-side validation to prevent invalid data from being submitted to the server and to minimize unnecessary round trips. For security reasons, always validate data on the server as well, because client-side validation can easily be bypassed.
  • Display a progress bar for long-running operations. When you have long-running operations that you cannot avoid, implement a progress bar on the client to improve the perceived performance. For more information about how to make long-running calls from an ASP.NET application, see "How To: Submit and Poll for Long-Running Tasks" in the "How To" section of this guide.
  • Avoid complex pages. Complex pages can result in multiple server calls to your business logic and they can result in large amounts of data being transferred across the network. Consider bandwidth restrictions when you design complex pages and those that contain graphics.
  • Render output in stages. You can render output a piece at a time. For many Web pages, the upper or left part of the output is the same for all requests and can be displayed immediately, while the request-specific part is streamed after you have finished processing the request. Even then, the text display can precede the streaming of images.
  • Minimize image size and number of images. Use small compressed images and keep the number of images to a minimum to reduce the amount of data that needs to be sent to the browser. GIF and JPEG formats both use compression, but the GIF format generally produces smaller files when compressing images that have relatively few colors. JPEG generally produces smaller files when the images contain many colors.

Web Layer Considerations

The following recommendations apply to the design of your application's Web layer:

  • Consider state management implications. Web layer state management involves storing state (such as client preferences) across multiple calls for the duration of a user session, or sometimes across multiple user sessions. Evaluate your state management approach by asking two questions: how much data, and how many round trips?

    Consider the following options:

    • Storing data on the client. You can store data on the client and submit it with each request. If you store data on the client, you need to consider bandwidth restrictions because the additional state that needs to be persisted across calls adds to the overall page size. You should store only small amounts of data on the client so that the effect on the response time for the target bandwidth is minimal, given a representative load of concurrent users.
    • Storing data in the server process. You can store per-user data in the host process on the server. If you choose to store user data on the server, remember that the data consumes resources on the server until the session times out. If the user abandons the session without any notification to the Web application, the data continues to consume server resources unnecessarily until the session times out. Storing user data in the server process also introduces server affinity. This limits your application's scalability and generally prevents network load balancing.
    • Storing data in a remote server process. You can store per-user data in a remote state store. Storing data on a remote server introduces additional performance overhead, including network latency and serialization time. Any data that you store must be serializable, and you need to design up front to minimize the number of round trips required to fetch it. The remote store option does enable you to scale out your solution, for example by using multiple Web servers in a Web farm. A scalable, fault-tolerant remote store such as a SQL Server database also improves application resilience.
  • Consider how you build output. Web output can be HTML, XML, text, images, or some other file type. The activities required to render output include retrieving the data, formatting the data, and streaming it to the client. Any inefficiency in this process affects all users of your system.

    Some key principles that help you build output efficiently include the following:

    • Avoid interspersing user interface and business logic.
    • Retain server resources such as memory, CPU, threads, and database connections for as little time as possible.
    • Minimize output concatenation and streaming overhead by recycling the buffers used for this purpose.

  • Implement paging. Implement a data paging mechanism for pages that display large amounts of data such as search results screens. For more information about how to implement data paging, see "How To: Page Records in .NET Applications" in the "How To" section of this guide.

  • Minimize or avoid blocking calls. Calls that block your Web application result in queued and possibly rejected requests and generally cause performance and scalability issues. Minimize or avoid blocking calls by using asynchronous method calls, a "fire and forget" invocation model, or message queuing.
  • Keep objects close together where possible. Objects that communicate across process and machine boundaries incur significantly greater communication overhead than local objects. Choose the proper locality for your objects based on your reliability, performance, and scalability needs.

Business Layer Considerations

Consider the following design guidelines to help improve the performance and scalability of your business layer:

  • Instrument your code up front. Instrument your application to gather custom health and performance data that helps you track whether your performance objectives are being met. Instrumentation can also provide additional information about the resource utilization associated with your application's critical and frequently performed operations.

    Design your instrumentation so that it can be enabled and disabled through configuration file settings. By doing so, you can minimize overhead by enabling only the most relevant counters when you need to monitor them. A sketch of configuration-driven instrumentation follows this list.

  • Prefer a stateless design. By following a stateless design approach for your business logic, you help minimize resource utilization in your business layer and you ensure that your business objects do not hold onto shared resources across calls. This helps reduce resource contention and increase performance. Stateless objects also make it easier for you to ensure that you do not introduce server affinity, which restricts your scale-out options.

    Ideally, with a stateless design, the lifetime of your business objects is tied to the lifetime of a single request. If you use singleton objects, you should store state outside of the object in a resource manager, such as a SQL Server database, and rehydrate the object with state before servicing each request.

    Note that a stateless design may not be a requirement if you need to operate only on a single server. In this case, stateful components can actually help improve performance by removing the overhead of storing state outside of components or having the clients send the necessary state for servicing the request.

  • Partition your logic. Avoid interspersing your business logic with your presentation logic or data access logic. Doing so significantly reduces the maintainability of your application and introduces versioning issues. Interspersed logic often results in a chatty, tightly coupled system that is difficult to optimize and tune in parts.

  • Free shared resources as soon as possible. It is essential for scalability that you free limited and shared resources, such as database connections, as soon as you are finished with them. You must also ensure that this occurs even if exceptions are generated.
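
Returning to the first guideline in this list, the following is a minimal sketch of configuration-driven instrumentation using the .NET BooleanSwitch class; the switch name, service class, and trace message are hypothetical:

```csharp
using System;
using System.Diagnostics;

public class OrderService
{
    // The switch value is read from the application configuration file
    // (<add name="PerfTrace" value="1" /> under <system.diagnostics>,
    // <switches>), so tracing can be enabled without recompiling.
    private static BooleanSwitch perfTrace =
        new BooleanSwitch("PerfTrace", "Performance instrumentation");

    public void PlaceOrder()
    {
        if (perfTrace.Enabled)
        {
            Trace.WriteLine("PlaceOrder entered at " + DateTime.Now);
        }
        // ... business logic ...
    }
}
```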

Data Access Layer Considerations

Consider the following design guidelines to help improve performance and scalability of your data access layer:

  • Consider abstraction versus performance. If your application uses a single database, use the database-specific data provider.

    If you need to support multiple databases, you generally need an abstraction layer that helps you transparently connect to the currently configured store. The database and provider information is generally specified in a configuration file. While this approach is very flexible, it can introduce performance overhead if it is not designed appropriately.

  • Consider resource throttling. In certain situations, it is possible for a single request to consume a disproportionate level of server-side resources. For example, a select query that spans a large number of tables might place too much stress on the database server. A request that locks a large number of rows on a frequently used table causes contention issues. This type of situation affects other requests, overall system throughput, and response times.

    Consider introducing safeguards and design patterns to prevent this type of issue. For example, implement paging techniques to page through a large amount of data in small chunks rather than reading the whole set of data in one go. Apply appropriate database normalization and ensure that you only lock a small range of relevant rows.

  • Consider the identities you flow to the database. If you flow the identity of the original user to the database, the connection cannot be reused by other users, because connection requests to the database are authorized based on the caller's identity. Unless you have a specific requirement to serve a wide audience of both trusted and nontrusted users, make all requests to the database by using a single identity. Single-identity calls improve scalability by enabling efficient connection pooling.

  • Separate read-only and transactional requests. Avoid interspersing read-only requests within a transaction. This tends to increase the time duration for the transaction, which increases lock times and increases contention. Separate out and complete any read-only requests before starting a transaction that requires the data as input.

  • Avoid unnecessary data returns. Avoid returning data unnecessarily from database operations. The database server returns control to the caller faster when operations do not return data. Analyze your stored procedures and database "write" operations to minimize returning data the application does not need, such as row counts, identifiers, and return codes (see the sketch after this list).
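
As one common technique (the AuditLog table and class are hypothetical), prefix write commands with SET NOCOUNT ON so SQL Server does not stream "rows affected" counts that the application never reads:

```csharp
using System.Data;
using System.Data.SqlClient;

public class AuditLog
{
    // SET NOCOUNT ON suppresses the "rows affected" messages the
    // application ignores, trimming what comes back over the wire.
    public static void Write(SqlConnection conn, string message)
    {
        SqlCommand cmd = new SqlCommand(
            "SET NOCOUNT ON; INSERT INTO AuditLog (Message) VALUES (@msg)",
            conn);
        cmd.Parameters.Add("@msg", SqlDbType.NVarChar, 256).Value = message;
        cmd.ExecuteNonQuery();
    }
}
```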

Summary

This chapter has shown you a set of design principles and patterns to help you design applications capable of meeting your performance and scalability objectives.

Designing for performance and scalability involves tradeoffs. Other quality-of-service attributes, including availability, manageability, integrity, and security, must also be considered and balanced against your performance objectives. Make sure you have a clear idea of what your performance objectives are (including resource constraints) during the design phase.

For more information about technology-specific design guidelines, see the "Design Considerations" section of each of the chapters in Part III, "Application Performance and Scalability."


© Microsoft Corporation. All rights reserved.