Printer Friendly Version      Send     
Click to Rate and Give Feedback
Related Articles
This article presents an overview of the motivation behind new techniques that decompose problems into independent pieces for optimal use of parallel programming.

By David Callahan (October 2008)
We take a look at planned support for parallel programming for both managed and native code in the next version of Visual Studio.

By Stephen Toub and Hazim Shafi (October 2008)
Here we describe some of the more common challenges to concurrent programming and present advice for coping with them in your software.

By Joe Duffy (October 2008)
Here is an ASP.NET AJAX data-driven Web application that takes the best features from server- and client-side programming to deliver an efficient, user-friendly experience.

By Bertrand Le Roy (October 2008)
More ...
Articles by this Author
Here John Papa demonstrates how to build a Silverlight 2 user interface that communicates through WCF to interact with business entities and a database.

By John Papa (September 2008)
Here we build a solution that fits the Entity Framework into an n-tier architecture that uses WCF and WPF and the MVP pattern.

By John Papa (July 2008)
The Entity Framework is a hot topic, but how do you know when to use EntityClient, Object Services, Entity SQL, or LINQ? Find out here.

By John Papa (May 2008)
LINQ to Objects and LINQ to Entities have an arsenal of Standard Query Operators that operate on sequences to perform a wide variety of operations.

By John Papa (March 2008)
With the Entity Framework, developers are given more flexibility by being able to design around a conceptual model rather than a relational data model. To get started, you need to know how to design an Entity Data Model. John Papa walks you through that process.

By John Papa (February 2008)
This month John Papa takes a look at developing a mobile application that can access data on your application server.

By John Papa (January 2008)
WPF is one of the most important new technologies in the .NET Framework 3.0. This month John Papa introduces its data binding capabilities.

By John Papa (December 2007)
In T-SQL, views and derived tables serve similar purposes and have their advantages. But when working with SQL Server 2005,there's a third option--using Common Table Expressions (CTEs). John Papa explains.

By John Papa (October 2007)
More ...
Popular Articles
Here we present techniques for programmatic and declarative data binding and display with Windows Presentation Foundation.

By Josh Smith (July 2008)
Kenny Kerr sings the praises of the new Visual C++ 2008 Feature Pack, which brings modern conveniences to Visual C++.

By Kenny Kerr (May 2008)
Systems that handle failure without losing data are elusive. Learn how to achieve systems that are both scalable and robust.

By Udi Dahan (July 2008)
In this article, author John Torjo presents a guide to his C++ GUI library called eGUI++ and explains how it makes user interface programming easier.

By John Torjo (June 2008)
More ...
Read the Blog
Well designed code keeps things that have to change together as close together in the code as possible and allows unrelated things in the code to change independently, while minimizing duplication in the code. In the October 2008 issue of MSDN Magazine, Jeremy Miller shows you some design ...
Read more!
The process for ink capture and analysis on the Tablet PC is straightforward in managed code. To the uninitiated developer, however, creating unmanaged Tablet PC applications can be rather daunting. In the October 2008 issue of MSDN Magazine, Gus Class a quick introduction to the Tablet PC ...
Read more!
Multicore systems are becoming increasingly prevalent, but the majority of software today will not automatically take advantage of this additional processing ability. And multithreaded programming, for anything but the most trivial of systems, is incredibly difficult and error prone today. In the October 2008 issue of MSDN ...
Read more!
Concurrent programming is notoriously difficult, even for experts. You have all of the correctness and security challenges of sequential programs plus all of the difficulties of parallelism and concurrent access to shared resources. In the October 2008 issue of MSDN Magazine, David Callahan describes ...
Read more!
A major advantage of AJAX and Silverlight applications is that they can transparently and continuously interact with a back-end service. The problem is that they run over HTTP, which wasn't designed with security in mind. In the September 2008 issue of MSDN Magazine, Dino Esposito shows you ...
Read more!
Unhandled exception processing shouldn't be a mystery. It's actually quite useful since it gives a crashing application an opportunity to perform last-minute diagnostic logging about what went wrong. In the September 2008 issue of MSDN Magazine, Gaurav Khanna discusses how ...
Read more!
More ...
Data Points
DataSet and DataTable in ADO.NET 2.0
John Papa

ADO.NET 2.0 sports some exciting enhancements to the core classes found in ADO.NET 1.x and introduces a variety of new classes, all of which promise to improve performance, flexibility, and efficiency. There have even been some major changes throughout the lifecycle of the pre-beta and beta versions of ADO.NET 2.0, such as the improvements to the new batch updating process. With the final release of ADO.NET 2.0 fast approaching and the feature set becoming more stable, it's time to take a closer look at what's in store.
This month I will begin by exploring the improvements to the DataSet and DataTable classes, what they mean to you, and when you might want to use them. Sometimes in ADO.NET 1.x, especially when working with large rowsets, you can experience performance problems. I'll discuss how some of these performance issues have been addressed in ADO.NET 2.0 by changes to the indexing engine. I'll review the features that have been added to the DataTable class, as well as the loading options through the new Load method and the new methods that change a row's state. In the next installment of this column, I will discuss other improvements, such as the ability to perform batch updates, compressing DataSets for transport using binary serialization, and more.

DataTable Enhancements
In ADO.NET 1.x the DataSet got all the glory leaving the DataTable in the shadows. This is not to say that the DataTable was not a useful class on its own. The DataTable is the container for rows and columns and could be considered the focal point of all disconnected data. However, the DataSet gets most of the press because it can contain DataRelation and the DataTable objects.
However useful, the DataTable does have some limitations in ADO.NET 1.x that the DataSet does not. For example, the DataSet exposes a Merge method that can merge two DataTable objects within a DataSet, but the DataTable itself does not expose a Merge method. So if you had a DataTable on its own (not contained within a DataSet) and you wanted to merge it with another DataTable object, you would have to first create a DataSet object, put the first DataTable into it, and invoke the DataSet.Merge method, as I did in Figure 1.
It's not difficult by any means, but it is annoying. In ADO.NET 2.0, the DataTable object now has a Merge method so you can merge two DataTable objects, like so:
dtCust1.Merge(dtCust2);
Another inconvenience in ADO.NET 1.x is that you cannot perform the basic XML operations on a DataTable without first associating it with a DataSet. For example, if you want to write a DataTable to XML, you need to load the DataTable into a DataSet and use the DataSet's WriteXml method. However, in ADO.NET 2.0, the DataTable has a WriteXml method so the problem is solved. In addition to the WriteXml method, the DataTable in ADO.NET 2.0 also exposes the ReadXml, ReadXmlSchema, and WriteXmlSchema methods.
The DataSet also has some new methods and properties. In fact, both the DataSet and the DataTable now expose the RemotingFormat property as well as the Load and CreateDataReader methods. The RemotingFormat property is used to indicate whether to serialize the DataTable or DataSet in binary or XML format. The Load method can be used to load data into a DataTable or DataSet in a variety of ways, which I'll discuss shortly.

Lightweight and Fast Looping
The DataTable's CreateDataReader method (named GetDataReader in previous Beta versions) creates an instance of the ADO.NET 2.0 DataTableReader. A DataTableReader created using DataTable.CreateDataReader will expose the same rows and columns as the DataTable. When a DataTableReader is created from a DataSet or a DataTable's CreateDataReader method, the DataTableReader will contain all of the rows from the container object, with the exception of deleted rows.
The DataTableReader is a lighter weight object than the DataTable and, unlike the DataReader (SqlDataReader), the DataTableReader is disconnected. This is a great feature because you get the lightweight object that you can iterate through quickly (like the DataReader) and it is disconnected from any data source (unlike the DataReader). A DataTableReader can be thought of as an iterator over the underlying table's rows, similar to a foreach enumeration of the contents of the table. However, unlike enumerating a table's rows (which will cause an exception when a row is added or deleted from the collection during the enumeration), a DataTableReader is resilient to changes being made to the underlying table and will correctly position itself in light of any modifications that occur.
The following example shows how you can create a DataTableReader and bind it to a DataGridView:
using (SqlConnection cn = new SqlConnection(cnStr))
{
    SqlCommand cmd = new SqlCommand(sqlAllCustomers, cn);
    SqlDataAdapter adpt = new SqlDataAdapter(cmd);
    DataTable dtCustomers = new DataTable("Customers");
    adpt.Fill(dtCustomers);
    DataTableReader dtRdr = ds.CreateDataReader();
    dgvCustomers.DataSource = dtRdr;
}
The DataTableReader can only be traversed in a forward manner, just like the DataReader. Again, like the DataReader, you move to the first row by using the Read method of the DataTableReader. If the DataTableReader was created from a DataSet that contained multiple DataTables, the DataTableReader will contain multiple resultsets (one per DataTable). Each subsequent resultset can be accessed using the DataTableReader by invoking the NextResult method (again, similar to the DataReader).
Figure 2 shows how to create a DataTableReader from a DataSet that contains two DataTable objects. Since the DataSet has two DataTables, the DataTableReader will contain two resultsets. You can loop through both resultsets in the DataTableReader using the Read and NextResult methods, as shown in Figure 2. Keep in mind that the DataTableReader only moves forward. So if you want to access the DataTableReader twice, because you want to loop through it again, you'll have to reload it after reading through all of its records the first time.
In Figure 2 the DataSet creates a DataTableReader using the CreateDataReader method. The order in which the DataTable's resultsets are added to the DataTableReader is the same as the order in which they appear in the DataSet. If you want to specify the order of the DataTable's resultsets, there is an alternate overloaded CreateDataReader method you can use.

Loading Data
A DataTableReader can also be used to populate a DataTable or a DataSet. In fact, using the new Load method of a DataSet or a DataTable, you can pass in a DataTableReader or any reader class that implements the IDataReader interface. The following example assumes there is a DataTable called dt1 that contains a schema and some rows. It creates a DataTableReader from the DataTable called dt1 and then turns right around and loads a second DataTable (called dt2) with the same data:
DataTableReader dtRdr = dt1.CreateDataReader();
DataTable dt2 = new DataTable();
dt2.Load(dtRdr);
In ADO.NET 1.x you could populate a DataSet or DataTable using the DataAdapter's Fill method. Alternatively, you could populate a DataSet from XML using the DataSet's ReadXml method. The introduction of the Load method in ADO.NET 2.0 makes it possible to load a DataSet or a DataTable from a class that implements IDataReader (like a DataTableReader or the SqlDataReader). When using the Load method to load several rows you can turn off notifications, index maintenance, and constraint checking by first invoking the BeginLoadData method, and then turn them back on by invoking the EndLoadData method. These methods, which are available in ADO.NET 1.x as well, can make the loading of data faster since ADO.NET does not have to stop after each row and recompute its indices, invoke notifications, or check constraints. Just make sure you remember to turn these things back on.

LoadOption Enumerator
The Load and the Fill methods have an overload that accepts one of the values from the LoadOptions enumeration. These settings introduce powerful new abilities which can be used to indicate how rows should replace existing rows during a Fill or Load operation into a DataSet or DataTable. This process assumes that a primary key has been set since it uses the primary key to determine how to replace or append the rows. These enumerator values help determine whether or not the current value and/or the original value get overwritten with the incoming rows' values. Figure 3 shows the three options and brief descriptions of each.
Each of these options has its place in an application given an appropriate circumstance. The OverwriteChanges option works well if you have a DataTable that contains data but you want to get any changed values that may exist in the database. Using this option will overwrite both the original and the current values in the DataTable with the values that come from the database. The key here is that when using OverwriteChanges, any data that has been modified (original version or current version) in the first DataTable will be overwritten while new rows will be added.
DataSet columns store an original and a current value. PreserveChanges will keep the current value intact while overwriting the original value. Upsert does the opposite of this as it keeps the original value intact while overwriting the current value.
Here is an example of when PreserveChanges might come in handy. Suppose a user named Peggy has opened a screen and loaded a DataGrid with customers from a DataSet. Peggy modifies the city of the customer with CustomerID ALFKI from Berlin to New York, but doesn't click the Save button. She then goes off for a cup of coffee. Meanwhile, Katherine modifies the same customer's city from Berlin to Miami. You'll now have a data concurrency issue if Peggy comes back from her break and saves the record. So in this situation, the original value for Peggy's customer record was Berlin and since she changed it to New York the current value is New York. Meanwhile, in the database the city is now Miami. If you want to reset the original values of Peggy's DataSet to what is in the database, you could get the data from the database into a DataTableReader and then load it into the DataSet using LoadOptions.PreserveChanges. Figure 4 illustrates how this works.
To get a first-hand feel for how these settings can be used, try stepping through this code in the debugger. You can try the different LoadOption enumerators by changing the opt variable that appears immediately before the switch-case block. The code shown in Figure 4 invokes the ShowVersions method which simply displays to the output window the original version and current version of a given column.

Changing RowState
A row's state is the major factor that helps determine which rows to update, insert, or delete when the DataAdapter's Update method is invoked. The RowState is also examined by the GetChanges method, as well, to determine which rows to grab. When you make changes to values in a DataSet, ADO.NET handles setting the RowState for you, indirectly setting it to one of the RowState values, such as Modified, Added, or Unchanged.
Sometimes it would be really helpful to be able to set a row's state directly. This is where the DataRow's new SetAdded and SetModified methods make things a little easier. For example, let's assume that you are faced with the situation in which you need to copy a handful of rows from one database to another using ADO.NET. Using ADO.NET 2.0 you could fill a DataTable from a database using the DataAdapter's Fill method, change the rows' RowState settings to Added, and send them down to the second database (assuming it uses the same schema) to be added using a second DataAdapter. In this type of situation you could change the RowState of the rows from Unchanged to Added by invoking each row's SetAdded method.
To demonstrate how these methods work, I have included another example, shown in the code in Figure 5. This code retrieves a rowset of customers and sets two of the row's RowState settings to Modified and another row's setting to Added. Then, I use the DataTable's GetChanges method to create a DataTable containing the rows with a RowState of Modified and store the number of rows. I then get the number of rows that were added and display them using a MessageBox.
The SetAdded and SetModified methods of the DataRow only work on rows that are unchanged. Another situation in which these come in handy is when you receive a DataSet or a DataTable from a Web service and the rows are all marked as Unchanged. If you intend to make additions or updates to a database based on the DataTable, you could set the RowState using these new methods. Otherwise, if you leave the RowState as Unchanged, the DataAdpater's Udpate method would not send the rows to either the UpdateCommand or the InsertCommand.

0 to 60
One of the best new features of ADO.NET 2.0 is neither a new method nor a new class but rather a focused performance improvement. One of the big knocks on the DataSet and the DataTable has been how slow they can be to load, especially when the number of rows gets large (100, 1,000, 10,000, or more). It can be just as bad, if not worse, when trying to traverse a large DataTable. To address this practical limitation, a lot of effort went into the creation of a faster indexing engine in ADO.NET 2.0. The rewrite of the indexing engine in ADO.NET has resulted in a huge performance boost in all areas including loading and merging DataTables. Figure 6 shows some sample code that can be run in either Visual Studio® .NET 2003 (using ADO.NET 1.1) or Visual Studio 2005 (using ADO.NET 2.0). I ran a benchmark comparison of this code in both environments with varying numbers of rows.
The code in Figure 6 creates a DataTable, adds two columns to it, and then loops 1,000,000 times adding a row to the DataTable in each iteration. Once the loop is completed, the elapsed number of seconds is displayed to the user using the MessageBox method. I ran this test for a few different iterations in both ADO.NET 1.1 and ADO.NET 2.0. My results are shown in Figure 7.
The speed is even better if there are no constraints on the DataTable. For example, when I removed the Unique constraint, I was able to load a million rows in just over a second in both versions of the environment. It is also important to note that I only loaded two columns and the value for the SomeNumber column was a decrementing but still sequential integer. While your results will vary from mine, the key point to take away is that the indexing engine for ADO.NET 2.0 performs significantly faster— so much faster that it is now completely realistic to at least consider using a DataTable to contain a million rows.

Wrapping It Up
In ADO.NET 2.0, there's better performance in areas that were lagging in the previous version, such as loading large numbers of rows. Several new features have been added to make development easier. The DataTable class has gained several methods that already existed on the DataSet class and there is even a new DataTableReader class. In the next installment of the Data Points column, I will continue the discussion of ADO.NET 2.0 by examining how binary serialization improves performance, how to take advantage of batch updates, the new DataView features, the new SqlConnectionBuilder class, and much more.

Send your questions and comments for John to  mmdata@microsoft.com.


John Papa is a Senior .NET Consultant with ASPSOFT and a baseball fanatic who spends most of his summer nights rooting for the Yankees with his family and his faithful dog, Kadi. He has authored several books on ADO, XML, and SQL Server, and can often be found speaking at industry conferences such as VSLive or blogging at codebetter.com/blogs/john.papa.

© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.
Page view tracker