Data Points

Techniques for Managing Rowset Paging

John Papa

Code download available at:DataPoints0305.exe(110 KB)

Contents

Client-side Options
ASP State
State Management in ASP.NET
Paging in SQL
Conclusion

There are a number of ways to handle paging in Web applications. This month I'll examine several paging techniques and weigh their pros and cons. I'll discuss and demonstrate how to manage paging and caching issues through the lower tiers of an n-tiered architecture, including how to make SQL Server manage the paging on your app's behalf.

Most Web commerce sites have at least one page that lists rows from a rowset. Usually the rowset lists such things as the results of a search or the contents of a shopping cart. One common dilemma is how to handle the paging of a rowset on a Web page when the rowset could contain 1 or 100,000 rows.

In addition, one of the key decisions you need to make in deciding such things as how to handle paging in a Web application is where in the architecture the paging process belongs. You could choose to manage the paging of a rowset in the database, in the business component, in ASP, or even in the client. I'll walk through the advantages and disadvantages of each of these choices. I'll take the top-down approach in this column and start out at the presentation tier of the n-tiered architecture, where I'll discuss the issues involved in the management of paging at the client level using various caching features from XML data islands through DHTML. Then I'll discuss paging found in the middle tiers, such as in ASP using caching techniques like the ASP Session object. I'll also discuss how to manage paging of rowsets using the database's tools; in particular, stored procedures (SP) and table datatypes. Finally, I'll review the options and present some food for thought on when to choose which option for your application.

Client-side Options

Displaying a rowset and managing the paging through client-side code in the browser can be accomplished in a few different ways. But first you must consider the browser that your application will support. If you want your application to run in both Microsoft® Internet Explorer and Netscape, you'll have fewer options due to the differences in the browsers' Document Object Models (DOMs). One technique for managing paging in the browser is to pass the data to the browser and store it there, perhaps in an XML data island. However, if you are using Netscape browsers as well, you may consider storing the data in the browser using JavaScript objects instead.

If you are running in a LAN environment and you want to avoid round-trips to the server, you could pass the data to the client and store it there. That way, when you go to the next page of the rowset, you have the data with you and you can use some DHTML to load the next page. For example, let's say the rowset contains a list of 100 customers and your application passes it to the client. The browser could then store it in an XML data island and you could use some DHTML code to display the first 10 records. Then when the "next page" hyperlink is clicked, you could call a function that clears the records that are being displayed and loads the next 10 records in the XML data island into an HTML table.

One drawback to this approach is that you are sending a lot of data to the browser and relying on the browser's capabilities. Another drawback is that all of your data is being cached (in any caching scenario), which means that it is insensitive to changes being made in the database. The upside is that you can page through the data without having to make round-trips to the server. While this scenario is viable, it is not the optimal solution when compared to managing the paging in one of the lower tiers. It could destroy an app's performance if the rowset is large enough and the bandwidth is low.

ASP State

Managing the paging of a rowset through ASP state can be accomplished by using some sort of state management technique. In order to cache a rowset using ASP, the rowset would need to be stored in a database, the ASP Session object, a file, or by some other state persistence mechanism. Storing a rowset in a file, perhaps using XML, would allow the application to persist the rowset, but would mean that the application might have a significant file system bottleneck, as I/O access is not as efficient as database access or Session access. Plus, persisting the data to a file requires a naming convention that is unique to the active user's session, and that data would have to be cleaned up by some process.

Storing the rowset in the ASP Session object is often a better option than persisting to a file—the trick is choosing the container for that data. Storing an ADO Recordset object in an ASP Session object is not scalable because of the way the ASP Session stores objects. However, storing the data in an XML format in the Session object yields more promise. This way, you do not have to be concerned with storing an object in Session, since the XML can be stored as a string. Plus, you can easily go from XML to an ADO Recordset and back if need be. However, the cost of storing data of any kind in the ASP Session object can add up. When you have a tremendous amount of users hitting your Web application, all of the data stored in Session can hog your server's memory. Also, the ASP Session object is not scalable across load-balanced servers in a Web farm. For these reasons, using the Session object to cache rowsets should be considered carefully before committing to this solution. In situations where you need to get to the data and must cache at the ASP level, this is an acceptable solution but there are problems when scaling to more simultaneous users. Also, keep in mind that the ASP Session object is stored in the Web server's process, so if the Web service stops, the state is lost.

Another option when caching rowsets using ASP is to cache the data in a state database. Unlike ASP.NET, ASP 3.0 does not have this built-in feature, but it's easy to implement in any case (see "Taming the Stateless Beast: Managing Session State Across Servers on a Web Farm" in the October 2000 issue of MSDN® Magazine for one such solution). Storing the data in a database is a more scalable solution than using the ASP Session object because it does not have the memory limitations that the ASP Session does. Also, state databases can be used in a Web farm. However, accessing data stored in a database is less efficient than accessing data stored in memory.

As you can see, when caching rowsets in ASP there are considerations to make when choosing the option that best fits your needs. State databases are generally more scalable, while ASP Session objects are simpler to implement and faster. In building enterprise solutions, I often suggest using a state database because the performance you give up is usually negligible when managed properly.

State Management in ASP.NET

As mentioned previously, ASP.NET has more built-in state management options than ASP 3.0. ASP.NET offers three solutions: in-process state (similar to the traditional ASP Session), out-of-process state in a state service, and an option that uses a state database. The fastest option is still using in-process state. However, the state service and state database options in ASP.NET offer more resiliency and scalability. The ASP.NET state service uses memory, just like the in-process state; however, it maintains the state outside of the Web server's process space. So when using the ASP.NET state service your state will survive any Web service restarts. Of course, state stored in a state database will also survive any Web service restarts, but it will also survive Web server reboots. One of the greatest benefits of ASP.NET state is that you can switch between the different management techniques by changing the sessionState mode setting in the config.web file, as shown here:

<sessionState mode="Off|Inproc|StateServer|SqlServer" 
  cookieless="true|false" timeout="number of minutes" 
  connectionString="server name:port number" 
  sqlConnectionString="sql connection string" />

Regardless of which mode you use in ASP.NET, all of the code to access and store state throughout your application is unaffected.

Paging in SQL

One common problem with all of the caching techniques discussed thus far is that they all require that the entire rowset be cached somewhere. This means that if you had a rowset with 1,000 rows of data, all 1,000 rows would be cached in an XML data island, the ASP Session object, a database, or whatever caching method you choose. It also means that you are transporting all 1,000 rows across the network from the database to the Web server on the initial call. If you want to limit what you transport across the network and do not want to cache the entire rowset, consider using the database to do the dirty work or think about finding some other custom paging technique.

One key feature of relational databases is that they are good at managing data. So why not ask the database to retrieve only the rows that you want? For example, if a rowset has 1,000 rows and you want to get the first 10 through a stored procedure, you could query the database for just those 10 records. When the user moves to the next page of the rowset, you would then ask the database to retrieve rows 11 through 20. While this technique requires more interaction with the database, you are not moving all that data across the network, nor are you caching it anywhere. One upside (or downside depending on how you look at it) is that your application would always show the latest database contents. However, this could also result in skipped rows when you page through.

One way to do this is to create a stored procedure that accepts the beginning and ending row numbers that you want. Figure 1 shows a sample stored procedure that, when run against the Northwind database, will return only the rows requested from the orders table. Of course, this stored procedure could be modified to narrow the search to orders for a particular customer or in a certain date range.

Figure 1 Paging Using a Stored Procedure

CREATE PROCEDURE prPagingExample
   @nRecordNumberStart INT, @nRecordNumberEnd INT AS 
SET NOCOUNT ON 
-------------------------------------------------------------------- 
-- Define the table to do the filtering and paging
-------------------------------------------------------------------- 
DECLARE @tblTempData TABLE ( nID INT IDENTITY, OrderID INT, OrderDate DATETIME, 
  CustomerID NCHAR(5) ) 
INSERT INTO @tblTempData ( OrderID, OrderDate, CustomerID ) 
  SELECT OrderID, OrderDate, CustomerID FROM Orders ORDER BY CustomerID, OrderDate 
  SELECT OrderID, OrderDate, CustomerID FROM @tblTempData 
    WHERE nID BETWEEN @nRecordNumberStart AND @nRecordNumberEnd 
    ORDER BY nID ASC

The stored procedure in Figure 1 first creates a variable called @tblTempData of the TABLE datatype. This table contains all of the fields that you want to retrieve from the query plus an additional column called nID. The nID column is an integer and is used to number each of the rows. As the rows are inserted into the table @tblTempData, they are numbered from 1 to n (where n is the number of rows). Since the nID column is typed as an IDENTITY value, the row numbering is created and stored in this column. Once the rowset is numbered, retrieving a specific range of rows becomes easy. The last part of the SP retrieves all of the columns but nID from the table @tblTempData, limiting the rows where the nID is within the range that was passed in.

This option does not require caching of the rowset in ASP or on the client, nor does it place a heavy burden on the network since the rowset is limited to a reasonable size before it leaves the database. However, this option does retrieve all of the data in the stored procedure each time and then discards most of it before returning the rowset. Also, since this stored procedure may be called repeatedly, if the query takes a long time to run, it could severely impact the performance of your application. If the query is slow, it would be better to use a caching option such as storing the rowset in a state database through ASP 3.0 or ASP.NET.

Conclusion

Some other simple but effective ways of enhancing paging performance are to limit the number of rows that a user can see on a page (don't allow the user to show more than 500 rows at a time) and optimize your searching query as much as possible. If your application allows the user to search on 10 different fields, it would be better to run a SQL query that only operates against the fields that the user is interested in. For example, if the user searches for all orders within a certain date range, your WHERE clause should include this criteria. You could create a single stored procedure that searches all of the fields at one time, which would be easy to maintain, but you will get better performance by writing a SQL™ statement that dynamically constructs the WHERE clause using only the criteria needed. It is a little harder to maintain, but that is easily made up by gains in performance.

What becomes pretty clear after some research into paging and caching techniques is that there is no perfect solution free of side effects. Caching on the client is the fastest of all of the solutions once the page is loaded. However, the initial loading of the page takes more time and requires more client-side code to manage the paging. While there are several techniques for caching data in ASP, the best option from my experience when using ASP to store a rowset in the XML format is to use a state database. Whether using the one that comes with ASP.NET or building your own, the performance is good and the data is resilient.

When caching at the ASP level is not an option for you, consider using a stored procedure to page the data, as shown in Figure 1. Using the stored procedure yields good performance (without caching) as long as the query does not take excessive amounts of time to execute. There are other options for paging and caching rowsets as well, including the use of the shared property manager of Microsoft Transaction Services to cache the data in a business object and other custom techniques.

Whatever option you end up using, it is important to make sure that the solution performs well for your application, is easily maintainable, and can scale with your application. When choosing a paging technique, take care to weigh your options against your application's context.

Send your questions and comments for John to mmdata@microsoft.com.

John Papais a baseball fanatic who spends most of his summer nights rooting for the Yankees with his two little girls, wife, and faithful dog, Kadi. He has authored several books on ADO, XML, and SQL Server and can often be found speaking at industry conferences such as VSLive. You can reach him at mmdata@microsoft.com.