Printer Friendly Version      Send     
Click to Rate and Give Feedback
Related Articles
Here we explain how the new hierarchyID data type in SQL Server 2008 helps solve some of the problems in modeling and querying hierarchical information.

By Kent Tegels (September 2008)
We introduce you to the benefits of building composite applications with the Composite Application Guidance for WPF from Microsoft patterns & practices.

By Glenn Block (September 2008)
ADO.NET Data Services provide Web-accessible endpoints that allow you to filter, sort, shape, and page data without having to build that functionality yourself.

By Shawn Wildermuth (September 2008)
See how routed events and routed commands in Windows Presentation Foundation form the basis for communication between the parts of your UI.

By Brian Noyes (September 2008)
More ...
Articles by this Author
Here John Papa demonstrates how to build a Silverlight 2 user interface that communicates through WCF to interact with business entities and a database.

By John Papa (September 2008)
Here we build a solution that fits the Entity Framework into an n-tier architecture that uses WCF and WPF and the MVP pattern.

By John Papa (July 2008)
The Entity Framework is a hot topic, but how do you know when to use EntityClient, Object Services, Entity SQL, or LINQ? Find out here.

By John Papa (May 2008)
LINQ to Objects and LINQ to Entities have an arsenal of Standard Query Operators that operate on sequences to perform a wide variety of operations.

By John Papa (March 2008)
With the Entity Framework, developers are given more flexibility by being able to design around a conceptual model rather than a relational data model. To get started, you need to know how to design an Entity Data Model. John Papa walks you through that process.

By John Papa (February 2008)
This month John Papa takes a look at developing a mobile application that can access data on your application server.

By John Papa (January 2008)
WPF is one of the most important new technologies in the .NET Framework 3.0. This month John Papa introduces its data binding capabilities.

By John Papa (December 2007)
In T-SQL, views and derived tables serve similar purposes and have their advantages. But when working with SQL Server 2005,there's a third option--using Common Table Expressions (CTEs). John Papa explains.

By John Papa (October 2007)
More ...
Popular Articles
Here the author uses Document Information Panels in the Microsoft 2007 Office system to manipulate metadata from Office docs for better discovery and management.

By Ashish Ghoda (April 2008)
Here we present a rundown of the various language paradigms of CLR-based languages via short language introductions and code samples.

By Joel Pobar (May 2008)
Efficient parallel applications aren’t born by merely running an old app on a parallel processor machine. Tuning needs to be done if you’re to gain maximum benefit.

By Rahul V. Patil and Boby George (June 2008)
We introduce you to the EDI functionality within BizTalk Server 2006 R2, illustrating schema creation, document mapping, EDI delivery and transmission, and exception handling.

By Mark Beckner (August 2008)
More ...
Read the Blog
SQL Server 2008 supports a new data type, HierarchyID, that helps solve some of the problems in modeling and querying hier­archical information. In the September 2008 issue of MSDN Magazine, Kent Tegels introduces you to the ...
Read more!
Many people using SharePoint technologies don't realize that there is auditing support built directly into the Windows SharePoint Services (WSS) 3.0 platform. In the September 2008 issue of MSDN Magazine, Ted Pattison walks you through a ...
Read more!
The September 2008 issue of MSDN Magazine is now available online. Here's what's in the issue: Hierarchy ID: Model ...
Read more!
Silverlight 2 features a rich and robust control model that is the basis for the controls included in the platform and for third-party control packages. You can also use this control model to build controls of your own. In the August 2008 issue of MSDN Magazine, Jeff Prosise describes how to ...
Read more!
In the August 2008 issue of MSDN Magazine, Matt Milner covers several topics regarding development with Windows Workflow Foundation, some that are intended to address specific reader questions, such as how to safely share a persistence database ...
Read more!
LINQ is a powerful tool enabling quick filtering data based on a standard query language. It can tear through a structured set of data using a simple and straightforward syntax. In the August 2008 issue of MSDN Magazine, Jared Parsons demonstrates a ...
Read more!
More ...
Data Points
SqlConnectionStringBuilder, DataView, and More
John Papa

Code download available at: DataPoints0601.exe (130 KB)
Browse the Code Online
Several significant enhancements have been made to ADO.NET 2.0 in the areas of improved performance, increased flexibility, and added features. In my last column (Data Points: DataSet and DataTable in ADO.NET 2.0), I discussed the improvements made to the DataTable class, the effect of the powerful new LoadOption enumerator, how to change a row's state, and the new DataTableReader class. In this installment of Data Points, I will continue to explore significant enhancements that are introduced with ADO.NET 2.0, such as the new SqlConnectionStringBuilder, improvements to the DataView, batch updates, and improved DataSet serialization. I will also demonstrate how some of its new features perform when they are put to the test.

Binary Serialization
ADO.NET 1.x provides the means to serialize a DataSet. The bad news is that the XML format used for DataSets can cause performance problems when a large DataSet is serialized and passed over the network. In ADO.NET 2.0, you can serialize a DataSet or a DataTable in true binary format, resulting in smaller consumption of memory and network bandwidth.
Let's take a look at how to implement binary serialization of a DataTable. The RemotingFormat property of the DataTable is a new property introduced in ADO.NET 2.0 that can be used to set the serialization format. The RemotingFormat property expects one of the SerializationFormat enumeration values: SerializationFormat.Xml or SerializationFormat.Binary. Once RemotingFormat has been set, you can create an instance of the BinaryFormatter class to serialize the DataTable. The example in Figure 1 loads a list of customers into a DataTable and serializes it into the binary format.
A DataTable that is serialized in binary format is generally smaller in size than the same DataTable serialized in XML format. But how much smaller?
Being the skeptic that I am, I tried a few tests to see how different the size of a DataSet would be when serialized in XML versus binary format. I tried several different sample sets using several different powers of 10. For my tests, I created a table called MyOrders and I added over 100,000 rows to it. Then I ran the code in Figure 2, which produced the results shown in Figure 3. (Your results may vary depending on your data.)
One thing that stands out is that the tests I ran for 1 and for 10 rows shows that the binary files are actually larger than the XML files. To figure out the number of rows where the crossover occurred, I ran my test an additional 100 times for rows 1 through 100. For my data sampling, the binary file was smaller than the XML file whenever there were 25 or more rows, so I added the results for 25 rows to the grid in Figure 3. The bottom line here is that the binary format can significantly decrease the amount of network bandwidth and time that is needed to pass a DataTable or a DataSet across a network.
Taking this one step further, I added two additional columns to the table in Figure 3 to represent the amount of memory consumed per row. Looking at the XML serialization data and starting at 100 rows and greater, notice how the amount of space required per row doesn't vary all that much (0.375 versus 0.389). However, the memory required for each row by the binary serialization method decreases significantly as the number of rows increase (0.149KB at 100 rows all the way down to 0.059KB at 100,000 rows).
It is important to keep in mind that it is still good practice to pass the fewest rows necessary in your application. If you are passing data to a data layer, you should only pass the changed data (inserts, updates, and deletes) using the GetChanges method of the DataTable, since there is rarely a need to pass down data that did not change. However, if your application requires that you pass an entire stateful set of data between tiers or possibly to another business system, binary serialization can come in handy.

DataView Enhancements
A DataView is a great tool for filtering, sorting, and in general just presenting data to the user in a more user-friendly format. In ADO.NET 2.0, the DataView class provides several new features that truly enhance its value. Let's assume, for example, that in ADO.NET 1.x you have a DataTable filled with a list of customers that came from a database. The DataTable is being used to store the list of customers and then a DataView is used on top of the DataTable to present the user with the list of customers in a sorted grid. In this example, the DataView is being used as a filter to present data. Going a step further, you might want to grab a distinct list of cities from the customer data to load another grid or a dropdown list. But if you also wanted to persist the DataView of customers or the DataView of the cities, you could not do this directly from the DataView. That is no longer the case in ADO.NET 2.0.
Through the DataView.ToTable method, the DataView now allows you to create a DataTable directly from the DataView. The new DataTable is a separate and distinct object that is not linked to the original DataTable. For example, let's assume I have a DataTable called dtCust and I create a DataView called view that displays a filtered and sorted list of the customers in the dtCust. If I then invoke the ToTable method on view to create a second DataTable called dtCities, I now have two DataTable objects that each store their own separate copies of their data. So if I change a customer in dtCust, it does not affect the data in dtCities whatsoever.
The code in Figure 4 creates a DataTable called dtCust and loads it with a list of customers from a database. It then creates a DataView called vwCust that only contains a distinct list of the cities of the USA customers. The DataView's ToTable method is then used to create a new DataTable called dtCities. This example uses the ToTable method's overloaded signature which accepts the name of the new DataTable to create, a Boolean value to indicate whether or not distinct values should be grabbed, and a string array containing the list of columns to grab from the DataView and add to the new DataTable.

Building Connections
ADO.NET 2.0 introduces a brand new class called SqlConnectionStringBuilder that can be used to help build a connection string. All you have to do is instantiate it and set the properties that are appropriate, and it builds the connection string for you (revealed through its ConnectionString property). The following code sample shows how the SqlConnectionStringBuilder works:
SqlConnectionStringBuilder cnBldr = new SqlConnectionStringBuilder();
cnBldr.DataSource = "MyServer";
cnBldr.InitialCatalog = "Northwind";
cnBldr.IntegratedSecurity = true;
using (SqlConnection cn = new SqlConnection(cnBldr.ConnectionString))
{
    ...
}
In this example, I set the server name via the DataSource property, the database name via the InitialCatalog property, and I tell it to use integrated security. IntelliSense® makes this approach much easier than attempting to hand code a connection string, and the strongly typed nature of the class helps eliminate harmful typos. There is no longer a need to try to remember the name of the attribute for the connection string. The Microsoft® .NET Framework also comes with the System.Data.OleDb.OleDbConnectionStringBuilder, System.Data.Odbc.OdbcConnectionStringBuilder, and System.Data.OracleClient.OracleConnectionStringBuilder classes. All of these are strongly typed classes that inherit from the DBConnectionStringBuilder class. Of course, if your database provider does not offer a strongly typed connection builder class, you could create your own class to build connections by inheriting from the DBConnectionStringBuilder class.

What are Batch Updates?
One of the best new features of ADO.NET is batch updating. Batch updating is easy to implement and reduces the number of network round-trips between your data layer and a database. To demonstrate the value of batch updates, I'll walk through a common update scenario that uses ADO.NET 1.x without batch updates and then discuss one that uses ADO.NET 2.0 with the new batch updating functionality.
First, I'll create a DataTable and load it with 100 rows from a database and present it to a user via a grid. I'll then add 12 rows, modify 9 rows, and delete 11 rows from the DataTable. At this point I have a DataTable with 70 unchanged rows, 9 modified rows, 11 deleted rows, and 12 added rows. I then use the GetChanges method to grab a new DataTable containing only the deltas (the modified, added, and deleted rows). I only want the deltas because I see no reason to pass all of the rows to my data layer throughout the layers of my application. In the data layer, I then create a DataAdapter and assign to the InsertCommand, UpdateCommand, and DeleteCommand properties the appropriate SQL stored procedures. When the DataAdapter's Update method is executed against the delta DataTable, the UpdateCommand is executed for each of the modified rows found using that row's data. Likewise, when it sees a deleted row or an added row, the DeleteCommand and InsertCommand are used, respectively. The DataAdapter.Update method executes 30 stored procedure calls (1 for each dirty row) and makes 30 separate round-trips between the data layer and the database.
In this example, using ADO.NET 1.1 techniques, each execution of a stored procedure packages the stored procedure call and sends it to the database, waits for any returning information, and then executes the next stored procedure in succession. If my data layer and my database are on separate physical servers, I have just introduced 30 round-trips across the network. If I had 100 changed rows, there would be 100 round-trips. You get the idea.
This same example with ADO.NET 2.0 reduces the network round-trips by using batch updates. Just prior to executing the DataAdapter.Update method, I set the UpdateBatchSize property of the DataAdapter to 30. This tells the DataAdapter to send 30 command executions at a time to the database. This time when the DataAdapter.Update method executes and iterates through the delta rows, it prepares the stored procedures to be executed until it has 30 of them prepared. Then it sends the 30 commands to the database in a single RPC request. In this example, all 30 stored procedure calls are sent across the network and then executed individually on the database server. Thus, only one network round-trip is required which streamlines this scenario.
For demonstration purposes, I built a consolidated example of a single batch of code (shown in Figure 5) that emulates the scenario I just described. In a real-world scenario, I would not recommend putting all of this code into a single method, of course. In this example, I retrieve some customers, modify one of them, and add a new customer to the DataTable. Then I set up the DataAdapter and its commands using SQL statements. At this point I set the UpdateBatchSize to 10, which in effect tells the DataAdapter to pool its commands and send them in groups of 10 (or less in case there are less than 10 operations to be performed).
So the idea is to group multiple commands and package them for transmission across the network to the database server. By default the UpdateBatchSize is set to 1, which means it acts just like it did in ADO.NET 1.1, where each delta row causes a single command to be sent to the database on its own, one at a time. When using an explicit value, such as 10, the DataAdapter packages 10 commands and sends them in a single RPC call to the database server. If there are 15 modified, added, and deleted rows in total and if the UpdateBatchSize is set to 10, there will be 2 separate packages transmitted: the first with 10 rows and the second with the remaining 5 rows. If the UpdateBatchSize is set to 0, the DataAdapter will attempt to send all of the command executions in a single package. If batch updates are not supported, then the UpdateBatchSize is automatically reverted back to 1.
Batch updates are implemented by packaging everything into a single RPC request. So if there are six commands to execute against the database contained in the batch, they will all be sent in a single request. The RPC request will contain each SQL command separated by an RPC marker. This is important because if you watch SQL Profiler after running the example in Figure 5 you will see two commands executing against the database individually. This technique is also better than some other techniques that limit the number of parameters or the SQL command length. The first time I tried batch updates, I thought everyone was nuts because I watched SQL Profiler and saw all of the separate SQL commands being executed. It wasn't until I used a network sniffer that I could clearly see that only one round-trip was taking place.

Choose Carefully
There is a time and place for everything, and batch updates are no exception. Batch updates do not support the use of return or output parameters, so you will want to avoid using batch updates if you need return or output parameters. Batch updates also affect the RowUpdated event because they will cause the RowUpdated event to fire only once for each batch even though so many updates were made. Figure 6 shows some differences in the number of times the RowUpdated event handler would execute based upon the UpdateBatchSize when 25 rows are updated.
When using batch updates, keep in mind that the Row property that is exposed in the RowUpdated event handler will represent the last row in the batch. If the count of actual rows in the batch is smaller than the UpdateBatchSize, the Rows property will return null.

Wrapping It Up
The release of ADO.NET 2.0 brings several new enhancements to its core classes along with new classes, some of which bring better performance and others which yield greater functionality and flexibility. With all the changes you'll find in ADO.NET 2.0, it's a good idea to try the features out a little at a time so you can refine your database toolbox and be ready to get the best performance at the best price for your data-driven apps. Now go out and reap the benefits of ADO.NET 2.0.

Send your questions and comments for John to  mmdata@microsoft.com.


John Papa is a Senior .NET Consultant with ASPSOFT (aspsoft.com) and a baseball fanatic who spends most of his summer nights rooting for the Yankees with his family and his faithful dog, Kadi. He has authored several books on ADO, XML, and SQL Server, and can often be found speaking at industry conferences, such as VSLive, or blogging at codebetter.com/blogs/john.papa.

© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.
Page view tracker