SQL Server CE and Scalability

 

Jim Wilson

JW Hedgehog, Inc.

June 2004

Applies to:
   Microsoft® Visual Studio® 2005
   Microsoft SQL Server™ 2000 Windows® CE Edition
   Microsoft .NET Compact Framework
   Microsoft eMbedded Visual C++®

Summary: This is the second article in the "You Can Take It with You" column. This article offers tips that developers can use to improve the scalability of Compact Framework-based applications accessing SQL Server CE. (10 printed pages)

Contents

Mobility Has Gone Mainstream
Enterprise Data Management and Mobility
Effect of Storage Media
Keep Your House (or Database) in Order
Work with the Query Processor
Get Down to the Metal
Conclusion

Mobility Has Gone Mainstream

Well, here it is, another Sunday morning and I'm looking back on the month that's passed since the first edition of "You Can Take It with You." Wow! What a month it's been for mobility.

In the past month, Microsoft has launched the Windows Mobile Solutions Partner Program, released the first technology preview of Microsoft® Visual Studio® 2005 containing an updated version of the Microsoft .NET Compact Framework, and had a tremendously successful Microsoft Mobile DevCon (MDC) in the United States. By the time most of you read this, the MDC will have run in Australia, Malaysia, and Korea as well. On the horizon are MDC Canada and MDC Europe.

Here in the United States, I was extremely impressed by both the number and the quality of the attendees at the MDC. I had the opportunity to present an Introduction to Mobility preconference workshop and couldn't believe how many people showed up. Some attendees even needed to stand along the walls because there were no more seats available.

What I've learned from talking to the people I met is that many developers no longer view mobility as a subspecialty. It is now simply part of the bag-of-tricks that enterprise developers use to solve information management problems. Just one more confirmation of this fact is the Tech Ed 2004 conference schedule. Tech Ed is Microsoft's biggest enterprise developer conference. This year both Tech Ed United States and Tech Ed Europe are dedicating a full conference track to mobility.

There are no two ways about it: mobility has gone mainstream.

Enterprise Data Management and Mobility

As mobility goes more mainstream, we are deploying increasingly complex mobile applications. Applications now commonly require large data volumes that just a few years ago would have been incredibly difficult to manage on a mobile device. Device applications managing tens of thousands of database records are becoming almost commonplace.

As the data volumes grow, the efficiency with which mobile applications access the data becomes notably more important. As I mentioned last month, the Microsoft SQL Server™ 2000 Windows® CE Edition (SQL Server CE) team is doing a great deal of work to make the next release more scalable, especially when accessed from the .NET Compact Framework. Of course, the problem is that many developers can't wait for Visual Studio 2005. So what do we do in the meantime?

It turns out that there are a number of simple things that developers can do today that can notably improve the scalability of Compact Framework-based applications accessing SQL Server CE.

Note The scalability points mentioned in this article are directed specifically at Microsoft SQL Server 2000 Windows CE Edition 2.0. When Microsoft SQL Server 2000 Windows CE Edition 3.0, code-named Laguna, ships, the underlying architecture will be substantially different. Be sure to check the documentation for details.

Effect of Storage Media

As data volumes grow, it may become necessary to store your application's SQL Server CE databases on a storage card rather than RAM. Using a storage card can have a dramatic impact on the performance of the database and therefore your application.

For many developers new to mobility, the choice of which storage card to purchase is often price driven. What these developers may not realize is that the choice of storage media can affect the performance of database access more than almost any other single decision.

In short, not all storage cards are created equal. The differences in read and write performance among storage cards can be huge, varying by as much as 100 percent and, in extreme cases, by nearly 300 percent. An inappropriate choice of storage media is virtually impossible to overcome by means of software.

There are many resources on the Internet that show comparisons among the various storage card vendors. Two that I have found helpful are from Digital Photography Review and Digit-Life.com. The Digital Photography Review article focuses on CompactFlash cards whereas the Digit-Life.com article compares a number of different media, including both CompactFlash and Secure Digital (SD) cards.

Keep Your House (or Database) in Order

Another big factor in the performance of large databases in SQL Server CE 2.0 is the organization of the database structure itself. As your application modifies the contents of the database, the records become more randomly distributed within the database file structure. This is especially true after a large number of inserts and deletes. To ensure optimal access to the database, compact the database after any substantial change to the contents.

In addition to recovering unused space, performing a compact on the database has two notable impacts on performance: first, it stores all table records in order by their primary key; second, it updates the statistics used by the query processor.

Ordering the records by primary key can notably improve primary key access. This is due to the page-oriented nature of SQL Server CE (and most other databases). Rather than loading individual records from the database into memory, SQL Server CE loads blocks of records called pages. When the database records are grouped in order by primary key, loading the page containing one record automatically loads those records with similar primary key values. For most applications, this results in what's referred to as a good "hit rate," which means that when your application goes to access successive database records, there is a strong likelihood that the page containing those records is already in memory and can be directly accessed. When records are more randomly distributed, as often happens after a large number of inserts and deletes, there is a poor hit rate requiring SQL Server CE to retrieve more pages from the database file to access the same number of records.

The query processor statistics influence how the query processor determines the best method for locating records. Decisions like whether to use a key or do a sequential scan to locate a particular record are all influenced by the query processor statistics. As the statistics become stale, there is an increased likelihood that the query processor may make a less than optimal decision. Performing a compact refreshes these statistics.

Performing a compact is quite simple. Create an instance of the SqlCeEngine with a connection string pointing to the original database file; then call the Compact method with a connection string identifying the new database file.

SqlCeEngine eng = new SqlCeEngine("Data Source=StudentDB.sdf");
eng.Compact("Data Source=NewStudentDB.sdf");

It's important to note that the Compact method doesn't actually compact the original database but rather creates a new, compacted version of the database. In the above example, NewStudentDB.sdf is a compacted version of StudentDB.sdf.

Applications will frequently use the File methods to put the new version of the database file in place of the original.

SqlCeEngine eng = new SqlCeEngine("Data Source=StudentDB.sdf");
eng.Compact("Data Source=NewStudentDB.sdf");
// Backup the original database file
File.Move("StudentDB.sdf", "StudentDB_BAK.sdf");
// Change the new database file name to that of the original
File.Move("NewStudentDB.sdf", "StudentDB.sdf");

Work with the Query Processor

As with any other relational database, the primary way applications interact with SQL Server CE is by means of SQL commands. As we know, to actually retrieve data, the database must translate each SQL command into a set of operations. These individual operations are what actually return or modify the data and are commonly known as the query plan. Depending on the complexity of an individual SQL command, the time required to produce the query plan can be a substantial portion of the overall data access time. Minimizing the time your application spends in the query processor can notably improve application performance.

Use Parameterized Queries

One very easy way to reduce the amount of time your application spends in the query processor is to use parameterized queries. Parameterized queries are useful anytime your application needs to perform a query multiple times. This is true whether you are using the same or different data values for each query execution. Using parameterized queries is much more important on SQL Server CE than on the server version of Microsoft SQL Server 2000 due to differences in their respective query processors. Most notably, the resource limitations of smart devices prevent SQL Server CE from caching queries as the server version of SQL Server 2000 does. By default, SQL Server CE fully calculates the query plan for each SQL command even if it has previously processed that same command. You can avoid this repeated cost by using parameterized queries.

The first step is to construct your SQL statement so that it can be used with more than one set of data values. To do this, construct the SQL statement by using placeholders for each data value. In SQL Server CE 2.0, these placeholders must be question marks (?).

Note Microsoft SQL Server 2000 Windows CE Edition 3.0, code-named Laguna, is adding support for named parameters. This will allow .NET Compact Framework-based applications to use the same notation (@parameter_name) as the server version of SQL Server 2000.

In the SQL Server CE 2.0 parameterized syntax, a statement to insert three values into a database table named StudentTable would look like the following:

string sqlText = "Insert into StudentTable (Name, YearsExp, Company) values (?, ?, ?)";

After we have the parameterized statement, we associate the data type of each of the parameters by using the Parameters collection of the SqlCeCommand class. For variable length data types such as strings, we should also associate the data size.

SqlCeConnection conn = new SqlCeConnection(@"Data Source=\My Documents\StudentDB.sdf");
conn.Open();
SqlCeCommand cmd = new SqlCeCommand(sqlText, conn);

cmd.Parameters.Add(new SqlCeParameter()) ;
cmd.Parameters[0].DbType = DbType.String ;
cmd.Parameters[0].Size = 40 ;

cmd.Parameters.Add(new SqlCeParameter()) ;
cmd.Parameters[1].DbType = DbType.Int32 ;

cmd.Parameters.Add(new SqlCeParameter()) ;
cmd.Parameters[2].DbType = DbType.String ;
cmd.Parameters[2].Size = 40 ;

The order in which the parameters are added must match the order in which the question marks appear in the SQL statement, because SQL Server CE 2.0 binds parameters by position. When describing the parameter type information, you can use either the DbType or the SqlDbType enumeration, but SQL Server CE will perform slightly better with DbType.

With the parameters ready, we can now ask SQL Server CE to calculate the query plan for this statement. To do this we use the Prepare method.

cmd.Prepare() ;

The Prepare method stores the query plan with this instance of the SqlCeCommand class. We can now repeatedly execute this command substituting different data values without needing to recalculate the query plan.

while (moreData) 
{
  cmd.Parameters[0].Value = // get Name value
  cmd.Parameters[1].Value = // get YearsExp value
  cmd.Parameters[2].Value = // get Company value

  cmd.ExecuteNonQuery() ;
}
conn.Close();

Go Directly to the Table

For some applications, it may make sense to completely bypass the query processor. For these applications, SQL Server CE provides a mechanism called table-direct mode. By using table-direct mode, your applications are able to work directly against the underlying table structure. To use table-direct mode, simply set the command type to be CommandType.TableDirect and specify the table name for the command text.

SqlCeCommand cmd = new SqlCeCommand("StudentTable", conn);
cmd.CommandType = CommandType.TableDirect;
SqlCeDataReader rdr = cmd.ExecuteReader();
while (rdr.Read())
{
  ...
}

By using table-direct mode, you completely bypass the query processor and directly open the underlying table. This is the fastest possible way to access the contents of a table from the Compact Framework. It provides essentially the same result as a Select * from StudentTable, but without paying the cost of the query processor.

Ordering and Filtering

Now that we have table-direct mode, we know how to get to the data quickly, but simply getting all of the data in a table without any order or filtering has limited value. Not to worry, though, because table-direct mode is just the beginning.

After we're in table-direct mode, we can utilize the SqlCeCommand IndexName property to control the order in which the content is accessed. IndexName can be set to the name of any valid index on the table and will result in the data being returned in order by that index.

SqlCeCommand cmd = new SqlCeCommand("StudentTable", conn);
cmd.CommandType = CommandType.TableDirect;
cmd.IndexName = "idxExperience";  // set the index name
SqlCeDataReader rdr = cmd.ExecuteReader();
while (rdr.Read())
{
  ...
}

Assuming our StudentTable has an index named idxExperience created on the field named YearsExp, this code will result in all of the records in StudentTable being returned in order by the value of the YearsExp field. This is the same as issuing the SQL command Select * from StudentTable Order By YearsExp, but, again, without the overhead of the query processor.

Once we have the index set, we can add filtering with the SqlCeCommand SetRange method. SetRange enables us to limit the returned data to those records with index values within the specified range. SetRange takes three parameters: one or more flags indicating how the comparison should be performed; an array containing the values for the starting range of the comparison; an array containing the values for the ending range of the comparison.

SqlCeCommand cmd = new SqlCeCommand("StudentTable", conn);
cmd.CommandType = CommandType.TableDirect;
cmd.IndexName = "idxExperience";
// create arrays specifying the range
object[] startRange = new object[] {5};
object[] endRange = new object[] {10};
// Set the range
cmd.SetRange(DbRangeOptions.InclusiveStart | DbRangeOptions.InclusiveEnd,
   startRange, endRange);
SqlCeDataReader rdr = cmd.ExecuteReader();
while (rdr.Read())
{
  ...
}

In this example, I have specified that we would like the index comparison values to be 5 and 10. The way these values are interpreted is controlled by the DbRangeOptions flags. By specifying both the InclusiveStart and InclusiveEnd flags, we will receive all of the records with between 5 and 10 years' experience, inclusive, just as if we had issued the query "Select * from StudentTable where YearsExp >= 5 and YearsExp <= 10 order by YearsExp".

The DbRangeOptions flag is at the heart of the SetRange method as it specifies the type of comparison to perform. You can specify inclusive comparisons as I did in this example; you can also specify exclusive comparisons or indicate that only those records that exactly match a specific value be returned. You can even filter on records whose index begins with the specified value, allowing you to do a partial match similar to the SQL "Like" operator; an example of this might be searching an index of last names for all records with a last name starting with "W". You can see the complete list of flag values by checking out DbRangeOptions Enumeration on the Microsoft MSDN® Web site.
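
As a quick illustration of that prefix case, a partial match might look something like the following sketch. The CustomerTable table and idxLastName index here are hypothetical, not part of the earlier examples; the pattern is the same SetRange call, with the ending range passed as null because it is not used with PrefixRange (or Match).

// Hypothetical table (CustomerTable) with an index (idxLastName) on LastName
SqlCeCommand cmd = new SqlCeCommand("CustomerTable", conn);
cmd.CommandType = CommandType.TableDirect;
cmd.IndexName = "idxLastName";
// Return only records whose indexed value starts with "W";
// the ending range is not used with PrefixRange, so pass null
cmd.SetRange(DbRangeOptions.PrefixRange, new object[] {"W"}, null);
SqlCeDataReader rdr = cmd.ExecuteReader();
while (rdr.Read())
{
  ...
}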

The starting and ending ranges identify the key values that the DbRangeOptions enumeration is applied to. Both the starting and ending range are specified as arrays to support multiple-column indices. Each element in the array identifies the comparison value for the corresponding column in the index. In my example, idxExperience is a single-column index, so the startRange and endRange arrays each have one element. If IndexName referenced a two-column index, each array would have two elements; a three-column index would have a three-element array, and so forth.
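
For example, if idxExperience were instead a hypothetical two-column index over YearsExp and Name, the range arrays would each carry one value per indexed column, something like this:

// Hypothetical two-column index over (YearsExp, Name):
// one element per indexed column in each range array
object[] startRange = new object[] {5, "A"};
object[] endRange = new object[] {10, "M"};
cmd.SetRange(DbRangeOptions.InclusiveStart | DbRangeOptions.InclusiveEnd,
   startRange, endRange);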

Seek and You Shall Find

Imagine an inventory management application that scans product bar codes and returns data about each selected item; or an application where the user might type in the first part of someone's name and would like to see the list of customers with matching names. For these types of applications, rapidly locating the matching records is essential to success. This scenario is exactly what the SqlCeDataReader Seek method is designed to address.

The Seek method provides the fastest mechanism available for locating SQL Server CE records from the Compact Framework because it allows you to immediately position the reader to any record identified by the open index. After it is positioned, the reader can read the remaining records in order by the index.

SqlCeCommand cmd = new SqlCeCommand("StudentTable", conn) ;
cmd.CommandType = CommandType.TableDirect ;
cmd.IndexName = "idxExperience" ;

// Get the Reader
SqlCeDataReader rdr = cmd.ExecuteReader() ;

// Position to first record with YearsExp of 7
// Now process all records with YearsExp of 7
rdr.Seek(DbSeekOptions.FirstEqual, 7) ;
while (rdr.Read())
{
  if ((int)rdr["YearsExp"] > 7)
    break;
  // process normally
}

// Position to first record with YearsExp of 1
// Now process all records with YearsExp of 1 
rdr.Seek(DbSeekOptions.FirstEqual, 1) ;
while (rdr.Read())
{
  if ((int)rdr["YearsExp"] > 1)
    break;
  // process normally
}

To use the Seek method, the SqlCeDataReader must be returned from a SqlCeCommand opened in table-direct mode with the IndexName property set to a valid index name. The power of Seek comes from the fact that you can repeatedly call Seek to position to different records in the table without needing to reissue the command.

In the above example, I open StudentTable in table-direct mode specifying the idxExperience index. I then use the returned reader to position directly to the first record with a YearsExp value of 7. The remaining records are read in order by YearsExp, so the example loops through processing all of the records until one with a YearsExp value greater than 7 is read; this ensures that I have processed all records with a YearsExp of 7. The example then positions directly to the first record with a YearsExp of 1 and processes all records with a YearsExp value of 1.

The exact type of search performed by the Seek method is controlled by the DbSeekOptions enumeration. Check out DbSeekOptions Enumeration on the MSDN Web site for the list of values. Rather than using arrays to support multiple-column indices, Seek relies on a variable-length parameter list. In my example, idxExperience is a single-column index, so I specify a single data value after the DbSeekOptions parameter. If I were searching a two-column index, I would provide two data values after the DbSeekOptions parameter, three values for a three-column index, and so forth.
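
To illustrate, if the command were opened against a hypothetical two-column index over YearsExp and Name, the Seek call would simply take one value for each indexed column:

// Hypothetical two-column index over (YearsExp, Name):
// one value follows the DbSeekOptions parameter for each indexed column
rdr.Seek(DbSeekOptions.FirstEqual, 7, "Smith") ;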

Table Direct Wrap-Up

We've discussed a lot about SQL Server CE table-direct mode. Before we leave the topic, I do want to point out that most of your SQL Server CE data access code will likely not use table-direct mode. For most applications, standard SQL commands are the best route. SQL provides the most flexibility to your application and in general is easier to maintain because it's much less tied to the physical structure of the database than is table-direct access. In fact, in most applications it is difficult to even notice the difference in performance.

Table-direct mode is for those situations when you find that your application can't quite reach the necessary performance goals. Applications that benefit most from table-direct mode are those that issue a large number of queries against a small set of tables.

Get Down to the Metal

Although my focus has been on accessing SQL Server CE from the .NET Compact Framework, no discussion of SQL Server CE and scalability is complete without mentioning C++. The absolute fastest and most scalable way to work with SQL Server CE is to use Microsoft eMbedded Visual C++® and the OLE DB API.

All interaction with SQL Server CE 2.0 ultimately occurs through OLE DB. The Compact Framework ADO.NET Provider for SQL Server CE 2.0 is simply an abstracted wrapper over the SQL Server CE 2.0 OLE DB interfaces. Any call to the ADO.NET classes must be mapped to one or more OLE DB method calls. Using C++, your application can call into OLE DB directly and avoid the overhead of the ADO.NET abstraction.

Programming with C++ and OLE DB is notably more complex than using the Compact Framework and ADO.NET. For most applications, the increased productivity of the Compact Framework and ADO.NET as compared to C++ and OLE DB more than justifies the performance difference. If you do find that the scalability and performance requirements of your application cannot be met with the Compact Framework and ADO.NET, check out OLE DB and SQL Server CE for information about accessing SQL Server CE from C++.

Conclusion

Remember that, just like on the desktop, successfully building large-scale mobile applications requires a solid database strategy. I've shown you several techniques for improving the scalability of Microsoft SQL Server 2000 Windows CE Edition 2.0, but this is just part of the story. All of the standard best practices of database design apply as well. Things like writing well-structured queries, determining your indexing strategy, and determining the appropriate level of normalization are important as well. In the end, Microsoft SQL Server 2000 Windows CE Edition 2.0 is a relational database that happens to be running in a somewhat resource-constrained environment.

Well, that's it for this month. For those of you who are going to be at MDC Canada or at Tech Ed United States, I look forward to seeing you there; otherwise, I'll see you here next month.