Manage Detail Pages Across Multiple Platforms with Centralized Data Caching

 

Jason Salas

April 2004

Applies to:
   Microsoft® ASP.NET

Summary: Use the ASP.NET Cache API in a tiered application design to get the most out of your applications. (10 printed pages)

Contents

Getting a Handle on Detail Pages
Appropriate Use of Caching in ASP.NET
Avoiding Caching Conflicts
Plan Your Code, Code Your Plan
Caching on the Horizon
Conclusion

Building great Web applications requires an understanding of how your site's visitors behave and the ability to anticipate how they will use the features you provide. You need to envision accurately how people will use your services and intelligently design your back end for maximum performance and scalability. The result is a great experience for your customers and less of an operational headache for you.

This article discusses how to manage one vital aspect of information-intensive Web-based systems: the proper caching of detail pages in multiple platform environments. Taking the time to make a few wise design decisions enriches the overall quality of your applications, to the satisfaction of your users.

Getting a Handle on Detail Pages

Most Microsoft® ASP.NET developers will need to make use of detail pages at some point in their projects. Detail pages are database-driven documents that give customers the full description of a particular record of data: news articles, customer orders, movie reviews, bug report summaries, blog posts, financial profiles, or sports stat sheets, for example. Detail pages come in an assortment of types, presenting data in archived, time-delayed, or real-time formats.

Highly trafficked sites often use multiple types of detail pages, accessed through a variety of clients and devices, in order to cater to the growing number of ways information is made available on the Internet. Effectively controlling the underlying data access makes for an interesting challenge as you continually expand the ways in which your site delivers information.

For instance, the following are some common features you've undoubtedly seen on article-oriented sites, all of which reference the same core item:

  • Article pages on the World Wide Web
  • Mobile articles suitable for the Wireless Web
  • "Print this page" pages
  • "E-mail this story to a friend" pages
  • Content generated for mass mailing lists
  • RSS feeds for syndication (not common, but some people do this)
  • Internal Web services
  • Internal administrative tools viewing/manipulating data

Additionally, information-rich sites need to consider traffic generated by external entities that access their data, including:

  • Links from Google and search engine queries
  • Links from personal homepages and various online properties
  • TrackBack links from weblogs
  • RSS aggregators
  • Web service consumers

In situations like these, taking advantage of ASP.NET's caching capabilities through a single-source caching repository in your data tier can really help your applications, and it's remarkably easy to do.

Appropriate Use of Caching in ASP.NET

It's the nature of caching to provide fast access to frequently requested data by serving an in-memory copy of the desired information rather than making a direct database query. The ASP.NET caching capability is a fantastic feature, but if it isn't managed appropriately it can let your data fall out of sync when more than one resource references the same underlying record.

Just having the ability to cache information through the Microsoft® .NET Framework isn't enough—you need to couple helpful functionality with proper planning and sound design. It's surprising how often developers overlook this critical facet when designing a system.

It's logical to assume that most people who request multiple variations of the same resource do so within a certain timeframe, probably within a few minutes. For example, consider the following hypothetical situation for a news site using multiple detail pages: Jamil visits a Web page containing a news story with an ID of "123456," calls up the story's printable version, and then e-mails it using an on-page form to Marie, who a few minutes later reads the story on her mobile phone with a WAP browser.

Simultaneously, Sabrina accesses the story by way of a syndicated newsfeed to which she subscribes using RSS; Brant reads the same story on a Windows Forms application calling an XML Web service; and Andre, a site administrator, uses an internal tool to send the story to a mailing list he oversees.

Assuming each subsidiary service was contained within its own page, the backend of the application serving the content might use the following URLs to serve each separate detail page:

Jamil - /web/article.aspx?id=123456
Jamil - /web/print_article.aspx?id=123456
Jamil - /web/email_article.aspx?id=123456
Marie - /mobile/article.aspx?id=123456
Sabrina - /rss/feed.aspx?id=123456
Brant - /webservices/article.asmx?id=123456
Andre - /admin/mass_mailer.aspx?id=123456

In an uncached environment, this circumstance generates a separate trip to the database for each page visited, which adds up quickly in high-volume situations.

Even environments that cache data may do so inefficiently. As an example, an unwise implementation of ASP.NET caching for the detail pages above might place the high-level OutputCache directive on each client page, storing a copy of each specific record by setting the VaryByParam attribute.

<%@ OutputCache Duration="600" VaryByParam="id" %>

While this certainly will work, it results in an unnecessary abundance of cached items on the server, with each page having (1) its own unique entry, and (2) its own subset of record-specific entries beneath it. This multiplies the number of entries on the server (one per page, per record), which raises the possibility that items may be prematurely scavenged to free up memory. If not planned out intelligently, what appears to be a performance enhancement can actually hurt your app in the aggregate.

Avoiding Caching Conflicts

Besides the additional memory occupied by repetitive items, another danger to watch for with detail pages is inconsistency caused by changes to the data source itself. Output caching across multiple platforms can easily work against you and return dissimilar information, because each page relies on its own time-sensitive expiration settings.

For instance, suppose you modify an existing record in your database by executing a SQL UPDATE statement. An architecture based on per-page output caching would render pages hit prior to the modification with old content—and would continue to do so until the cache item's duration expired. As a result, the user viewing the page would receive stale or otherwise inaccurate information.

The obvious solution to this issue is to centralize your application's cached data in a single tier, behind a single collection. Fortunately, the Cache API, being a globally available dictionary within an ASP.NET application, is naturally centralized.

Plan Your Code, Code Your Plan

Building on our previous example, we'll construct a simple data-tier object called "NewsCache" to demonstrate a centralized caching tier. It contains methods to (1) return memory-resident information from the Cache and (2) force the eviction of items; the two methods essentially act as wrappers for Cache.Insert and Cache.Remove, respectively.

Because we're working with detail pages, the GetCachedNewsStory method returns a DataTable consisting of a single DataRow, presumably with fields for things like title, body, attribution, graphics, publication date, and so on.

using System;
using System.Web;
using System.Web.Caching;
using System.Data;
using System.Data.SqlClient;
using System.Configuration;

namespace NewsyNews
{
  public class NewsCache
  {
    // Synchronization object used when explicitly evicting cached items.
    private static readonly object cacheLock = new object();

    // Returns the requested story from the Cache, loading it from the
    // database (and caching it for 20 minutes) on a cache miss.
    public DataTable GetCachedNewsStory(string storyID)
    {
      // Cached Data Reference Pattern: read the Cache entry once into a
      // local variable and work with that copy from then on.
      DataTable cacheItem = (DataTable)HttpRuntime.Cache[storyID];

      if(cacheItem == null)
      {
        cacheItem = GetFreshDataFromDatabase(storyID);

        if(cacheItem != null)
        {
          HttpRuntime.Cache.Insert(storyID, cacheItem,
            null,
            DateTime.Now.AddMinutes(20),
            Cache.NoSlidingExpiration,
            CacheItemPriority.High,
            new CacheItemRemovedCallback(this.ItemRemovedFromCache));
        }
      }
      return cacheItem;
    }

    // Evicts a story from the Cache, typically after its underlying
    // record has been updated in the database.
    public void ResetCachedNewsStory(string storyID)
    {
      if(HttpRuntime.Cache[storyID] != null)
      {
        lock(cacheLock)
        {
          HttpRuntime.Cache.Remove(storyID);
        }
      }
    }

    // Runs the GetDetailPageContents stored procedure and returns the
    // single matching record, or null if the query fails.
    private DataTable GetFreshDataFromDatabase(string storyID)
    {
      DataTable table = new DataTable();

      using ( SqlCommand comm = new SqlCommand() )
      {
        comm.CommandText = "GetDetailPageContents";
        comm.CommandType = CommandType.StoredProcedure;
        comm.Parameters.Add(new SqlParameter("@StoryID",
          SqlDbType.Int, 4));
        // The story ID arrives as a string, so convert it to the integer
        // the stored procedure expects.
        comm.Parameters["@StoryID"].Value = int.Parse(storyID);
        comm.Connection = new SqlConnection(
          ConfigurationSettings.AppSettings["SQL2000"]);

        using ( SqlDataAdapter da = new SqlDataAdapter() )
        {
          da.SelectCommand = comm;

          try
          {
            comm.Connection.Open();
            da.Fill(table);
          }
          catch
          {
            return null;
          }
          finally
          {
            if(comm.Connection.State == ConnectionState.Open)
              comm.Connection.Close();
          }
        }
      }
      return table;
    }

    // Fires whenever an item leaves the Cache. If the item was removed
    // explicitly (as ResetCachedNewsStory does after an update), fetch the
    // fresh record and put it straight back into the Cache so the next
    // reader doesn't pay for the round trip.
    private void ItemRemovedFromCache(string key,
      object value, CacheItemRemovedReason reason)
    {
      if(reason == CacheItemRemovedReason.Removed)
      {
        DataTable freshData = GetFreshDataFromDatabase(key);

        if(freshData != null)
        {
          HttpRuntime.Cache.Insert(key, freshData,
            null,
            DateTime.Now.AddMinutes(20),
            Cache.NoSlidingExpiration,
            CacheItemPriority.High,
            new CacheItemRemovedCallback(this.ItemRemovedFromCache));
        }
      }
    }
  }
}

All the detail pages and clients need only instantiate a NewsCache object and assign the result of GetCachedNewsStory() to a DataTable, and they're set!

NewsyNews.NewsCache data = new NewsyNews.NewsCache();
DataTable dt = data.GetCachedNewsStory("123456");
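
From there, the detail page can render the record however it sees fit. As a rough illustration, the column names and server controls below (Headline, Body, headlineLabel, bodyLiteral) are hypothetical stand-ins for whatever your schema and markup actually use:

if(dt != null && dt.Rows.Count > 0)
{
  // Detail pages hold exactly one record, so grab the first (and only) row.
  DataRow story = dt.Rows[0];
  headlineLabel.Text = (string)story["Headline"];
  bodyLiteral.Text = (string)story["Body"];
}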

A couple of notes on the code: one important element of the GetCachedNewsStory method is its use of the Cached Data Reference Pattern, saving the value of the cached item to a local variable so that the Cache itself is read only once per request. Also observe that before calling Cache.Remove, a Visual C# lock is taken on a private synchronization object, preventing concurrent eviction requests from colliding.

We've also included a callback method to handle manually evicted items, fetching fresh data and automatically reinserting it into the Cache. So when we do administrative work and call our custom eviction method, the code ensures that new data is displayed. Using callbacks exhibits good housekeeping and proactive thinking on the part of the developer.

Note that the class would work without the callback, and could rely solely on the simpler model of having the next immediate page request reinsert data back into the Cache. It's considered better practice, however, to ensure that all work related to performing an update is done by the update itself, with whatever delay that entails, rather than left for the next unwitting user. In this case, repopulating the cache after an update is something that takes time, and it provides a better user experience to make users who are performing the update have to wait (they know updates take time) than to cause random users of the application to wait (they'll just think the site is slow).
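
To make that update path concrete, here is a rough sketch of an administrative save routine. The UpdateDetailPageContents stored procedure and its parameters are hypothetical and will differ from your own schema; the important part is the ResetCachedNewsStory call that follows the database write, which in turn triggers the removal callback to repopulate the cache:

public void SaveStoryEdits(string storyID, string newHeadline, string newBody)
{
  using ( SqlConnection conn = new SqlConnection(
    ConfigurationSettings.AppSettings["SQL2000"]) )
  using ( SqlCommand comm = new SqlCommand("UpdateDetailPageContents", conn) )
  {
    comm.CommandType = CommandType.StoredProcedure;
    comm.Parameters.Add(new SqlParameter("@StoryID", SqlDbType.Int, 4));
    comm.Parameters["@StoryID"].Value = int.Parse(storyID);
    comm.Parameters.Add(new SqlParameter("@Headline", SqlDbType.NVarChar, 255));
    comm.Parameters["@Headline"].Value = newHeadline;
    comm.Parameters.Add(new SqlParameter("@Body", SqlDbType.NText));
    comm.Parameters["@Body"].Value = newBody;

    // Commit the change to the database first...
    conn.Open();
    comm.ExecuteNonQuery();
  }

  // ...then evict the stale copy; ItemRemovedFromCache fetches the fresh
  // record and reinserts it before the next reader asks for it.
  NewsyNews.NewsCache data = new NewsyNews.NewsCache();
  data.ResetCachedNewsStory(storyID);
}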

In general, user tolerance for system delays is higher if they know they are doing something interactive with the site than if they are simply browsing content.

It makes sense to consolidate your program logic in this manner—if your methods will be used by more than a single page, you roll a component to encapsulate the functionality; likewise, if you have data that will be accessed frequently, with multiple pages calling that same data, you use caching to optimize performance. In this example, we're doing both. This is not just good caching practice, but also good object-oriented design.

Caching on the Horizon

In most environments, having your detail pages rely on items stored in the Cache object is sufficient. It provides quick access to frequently requested data, reducing strain on the database, and gives you single-source control over exactly what's being stored and for how long.

Still, more complex environments may call for more elaborate solutions. Thankfully, the flexibility of the Cache API in the System.Web.Caching namespace gives developers a variety of scalable options.

On that note, Microsoft® ASP.NET 2.0 "Whidbey" will sport several great additions to its caching feature set, many of which directly address the issues we've discussed here. Foremost among these is SqlCacheDependency, which delivers dependency-based caching for SQL Server databases, automatically invalidating entries as database values change and relieving you of the need to call Cache.Remove yourself. SqlCacheDependency also integrates with output caching, making the out-of-sync concerns mentioned above much easier to avoid.
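
Based on the pre-release bits, programmatic usage looks something like the sketch below. It assumes the database and table have already been enabled for notifications and registered in web.config under the names "News" and "NewsStories" (both hypothetical); storyID and freshData stand in for the cache key and the freshly loaded DataTable, and the details may shift before Whidbey ships:

SqlCacheDependency storyDependency =
  new SqlCacheDependency("News", "NewsStories");

// No fixed expiration is needed; ASP.NET evicts the entry automatically
// whenever the underlying table changes.
HttpRuntime.Cache.Insert(storyID, freshData,
  storyDependency,
  Cache.NoAbsoluteExpiration,
  Cache.NoSlidingExpiration,
  CacheItemPriority.High,
  null);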

Programmers will also enjoy more low-level API control over fragment caching, in which specific portions of pages (typically user controls) are stored for quick access.

Also shipping as part of ASP.NET 2.0 will be the ability to create custom dependencies, now that Microsoft has unsealed some of the classes that previously prevented rolling such a solution. ASP.NET 1.x provides programmatic control over time-, key-, and file-based dependencies (the examples in this article dealt mainly with expiration after a fixed amount of time); the forthcoming version will let a developer create, for example, role-based dependencies that vary the caching policy applied to general users, managers, or site administrators.
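
Assuming the shipping CacheDependency class keeps a protected constructor and a change-notification method, a bare-bones custom dependency might look something like the sketch below; the class name and the decision about when to trip it are entirely hypothetical:

// A dependency an administrative or role-management tool could trip by
// hand; any cache entries inserted with this instance are then evicted.
public class EditorialDependency : CacheDependency
{
  public void Invalidate()
  {
    // Signals ASP.NET that whatever this dependency watches has changed.
    NotifyDependencyChanged(this, EventArgs.Empty);
  }
}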

Developers will also get the ability to create phenomenal functionality with AggregateDependency, which invalidates cache items based on two or more file, key, time, database, or developer-defined dependencies.

Post-Cache Substitution is another slick ASP.NET 2.0 feature: it caches an entire page while dynamically replacing specified bits of content. Suppose a page contains a Repeater and several AdRotator Web server controls. Using Post-Cache Substitution, the page can populate the Repeater with cached information upon its first request and reuse that data across page refreshes, while the AdRotator continually displays rotating advertisements, all without giving up caching of the page as a whole.
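
In the pre-release bits, the programmatic hook is a substitution callback registered with the response. A minimal sketch (the RenderAdMarkup method is made up, and the syntax may change before release) looks like this on an output-cached page:

protected void Page_Load(object sender, EventArgs e)
{
  // The callback is re-evaluated on every request, even when the rest of
  // the page is served straight from the output cache.
  Response.WriteSubstitution(
    new HttpResponseSubstitutionCallback(RenderAdMarkup));
}

// Whatever string this returns is stitched into the cached response.
public static string RenderAdMarkup(HttpContext context)
{
  return "<!-- ad selected at " + DateTime.Now.ToString("T") + " -->";
}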

Lastly, although not one of the caching features of Whidbey per se, the powerful new DetailsView Web server control was created specifically to display and manage individual records. A DetailsView lets you easily drill down to specific information in a master-detail relationship, whether spread across multiple pages or contained within the same Web Form. The control includes built-in capabilities to add, remove, and modify rows, making it a perfect candidate for the situations described above.

Conclusion

Hopefully I've given you something to think about when it comes to maximizing the accessibility of your information and creating great system architectures. Reusability, centralization, and an appreciation for the tiered approach to application development are key to building winning Web applications.

As your projects grow in scope and complexity, you'll undoubtedly add more and more subsidiary services to expand the ways you serve your customers, and the timely management of cached data will inevitably come into play. Even if you already use caching extensively in your application, I hope I've given you something to chew on as you plan that expansion.

Successful sites expand outward, so it behooves you to start planning a scalable architecture early. There are fantastic caching facilities baked right into the .NET Framework, but maximizing their effectiveness in your applications means using them properly. A tiered model integrates easily with your business logic and allows new modules and features to be plugged directly into existing functionality without a major revamp.

Don't get me wrong—there are several scenarios where output caching is just a better fit. You can also get great results by using the HttpCachePolicy class, another low-level API, found in the System.Web namespace.
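
As a quick sketch, the low-level equivalent of the OutputCache directive shown earlier would typically run from Page_Load and looks roughly like this:

// Roughly equivalent to <%@ OutputCache Duration="600" VaryByParam="id" %>.
Response.Cache.SetExpires(DateTime.Now.AddMinutes(10));
Response.Cache.SetCacheability(HttpCacheability.Public);
Response.Cache.SetValidUntilExpires(true);
Response.Cache.VaryByParams["id"] = true;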

Needless to say, the future is bright and the opportunities are many for getting the most out of your applications with caching. Just be aware that you need to do it right.

 

About the Author

Jason Salas, a Microsoft MVP for ASP.NET, is Web development manager for KUAM in Guam (www.kuam.com), where he also serves as a television news anchor, co-hosts a weekly sportstalk radio program, and has several authoring credits to his name. He was the first person on Guam to use .NET technologies, and has written numerous articles on development with ASP.NET.

Jason is the founder and president of the ASP.NET User Group of Guam (www.guam-asp.net). He holds degrees in marketing and in music theory, and has an MBA in Technology Management. You can reach Jason at jason@kuam.com.

© Microsoft Corporation. All rights reserved.