Data Points

Techniques in Filling ADO.NET DataTables: Performing Your Own Analysis

John Papa

Code download available at:DataPoints0306.exe(128 KB)

Contents

Retrieval Methods
The Test Scenario
Test Results
Wrap-up

One of the most fundamental differences between ADO.NET and classic ADO is how the data is represented. In ADO.NET it's represented in related DataTable objects, whereas classic ADO represents data in Recordset objects. Using classic ADO, the typical app retrieves a single rowset and stores it in a single Recordset object. When using DataTable objects in ADO.NET, you can store several rowsets in several DataTable objects, all related to one another and contained within a single DataSet. For example, you can retrieve a customer rowset, order rowset, and an order detail rowset and store them in three DataTable objects within a single DataSet. This gives you great flexibility to manage your data.

I've recently received several inquires asking which technique is best for retrieving data and populating a DataSet. Since the Microsoft® .NET Framework offers so many choices on how to write the code, many developers are now taking a close look at the different options in an effort to make the best choice for their apps. One of these choices is how to retrieve data, relate it, display it, and represent it in an app. The concept of running scenarios before deciding how to architect an application isn't new by any means. Its importance is quite critical, especially when presented with a relatively new development model such as the .NET Framework. Because developers are faced with so many new ways to build apps, it could be argued that testing is more critical than ever.

So in this month's Data Points column I will take a look at one way to evaluate different architectural decisions using the retrieval of data as a focal point. I'll examine different ways to retrieve data into DataTables, and I'll discuss some performance tests I ran on them and when and where each of the techniques might be useful in applications. I'll also walk through scenarios that demonstrate how to retrieve relational data and manage it effectively. You can consider some of the points that follow to be best practices, but you should also run your own analysis.

Retrieval Methods

Here's a list of some techniques that can be used to load data into DataTable objects that are related to one another within a DataSet:

  • Retrieve all of the data from a single joined query into a single DataTable (see Figure 1).
  • Retrieve each rowset (three in all) from a query into its own DataTable using subqueries (see Figure 2).
  • Retrieve each rowset (three in all) from a query into its own DataTable using joins (see Figure 3).
  • Retrieve each rowset (three in all) from a query into its own DataTable using parameters.

Figure 3 Using Joins

--- Retrieve Customer information SELECT CustomerID, CompanyName, City FROM Customers WHERE Country = 'USA' ORDER BY CustomerID --- Retrieve Order information SELECT o.CustomerID, o.OrderID, o.OrderDate FROM Customers c INNER JOIN Orders o ON c.CustomerID = o.CustomerID WHERE c.Country = 'USA' ORDER BY o.CustomerID, o.OrderID --- Retrieve Order Detail information SELECT od.OrderID, od.ProductID, od.UnitPrice, od.Quantity FROM Customers c INNER JOIN Orders o ON c.CustomerID = o.CustomerID INNER JOIN [Order Details] od ON o.OrderID = od.OrderID WHERE c.Country = 'USA' ORDER BY o.CustomerID, o.OrderID, od.ProductID

Figure 2 Using Subqueries

--- Retrieve Customer information SELECT CustomerID, CompanyName, City FROM Customers WHERE Country = 'USA' ORDER BY CustomerID --- Retrieve Order information SELECT CustomerID, OrderID, OrderDate FROM Orders WHERE CustomerID IN ( SELECT CustomerID FROM Customers WHERE Country = 'USA' ) ORDER BY CustomerID, OrderID --- Retrieve Order Detail information SELECT OrderID, ProductID, UnitPrice, Quantity FROM [Order Details] WHERE OrderID IN ( SELECT OrderID FROM Orders WHERE CustomerID IN ( SELECT CustomerID FROM Customers WHERE Country = 'USA' ) ) ORDER BY OrderID, ProductID

Figure 1 Retrieve All Data in One Joined Query

--- Retrieve Customer, Order and Order Detail information SELECT c.CustomerID, c.CompanyName, c.City, o.OrderID, o.OrderDate, od.ProductID, od.UnitPrice, od.Quantity FROM Customers c INNER JOIN Orders o ON c.CustomerID = o.CustomerID INNER JOIN [Order Details] od ON o.OrderID = od.OrderID WHERE (c.Country = 'USA') ORDER BY o.CustomerID, o.OrderID, od.ProductID

Each of these techniques will retrieve all the customer, order, and order detail information for all customers in the U.S. The data will then be bound to an ASP DataGrid control in a Web Form to be displayed. First, I'll describe each of the retrieval methods, then I'll explain the tests I put them through, and finally I will share the results of my tests.

The first method will retrieve all of the data using a single joined query and populate a single DataTable. Thus there will be only one query and it will join the customer, order, and order detail tables together using a SQL query and return the fields from each of the required tables. The results are then stored in a DataTable within a DataSet and bound to the ASP DataGrid control.

The second method retrieves all the U.S. customer information from a SQL query and stores it in a DataTable. The order information is retrieved using a SQL query that narrows the orders down to only U.S. customers using a subquery in the WHERE clause. Then, the order detail information is retrieved using a SQL query that restricts those results to only U.S. customers using a nested subquery in the WHERE clause.

The third method retrieves all the U.S. customer information from a SQL query and stores it in a DataTable. Here the order information is retrieved using a SQL query and the orders are restricted to U.S. customers only by joining to the customer table and using the WHERE clause. Then, the order detail information is retrieved using a SQL query that gets U.S. customers only by joining to the customer and order tables and by using the WHERE clause to get order details for U.S. customers only.

The fourth method retrieves all the U.S. customer information from a SQL query and stores it in a DataTable. Then, a query is executed for each customer, one at a time, to retrieve all of the customer's orders. This SQL query has a parameter for the Orders.CustomerID column which is reset each time while looping through the customers. Finally, a query is executed for each order detail to retrieve all of the order's order detail records.

This SQL query has a parameter for the [Order Details].OrderID column which is reset each time while looping through the orders. The problem with this method of retrieval is that the number of queries that will be executed could be enormous compared to the other methods. The query to retrieve the customers is executed only once, but the query to retrieve the orders is executed once for every customer. Furthermore, the query to retrieve the order details is executed once for every order. So if there were 10 customers that have 10 orders each (thus 100 total orders), the query to retrieve the orders would be executed 10 times and the query to retrieve the order details would be executed 100 times. This technique would require 111 queries to be executed (1 to get the customers, 10 to get the orders, and 100 to get the order details).

Each query is specifically geared to getting a particular customer's orders or a particular order's details, thus the queries are all likely to be optimized. Because of the number of queries to be executed and the amount of data going back and forth between the database and the application in those 111 trips, the results from my test were not a surprise when this method took the longest by far to load the DataTables within a DataSet. However, the performance results I gathered from my tests on the other three retrieval methods were much better and not far apart from each other.

The Test Scenario

My goal was to find the most efficient way to load a DataTable from a database query and display the results in a Web Form. The test runs each of the first three methods of retrieval I just discussed and takes the results and loads them into a corresponding set of ASP DataGrids. The test executes the queries, loads the data into a series of DataTables within a DataSet, binds the data to a series of DataGrids, and then repeats itself for a total of 100 repetitions. The process is repeated 100 times to evoke any anomalies or rule them out with a good degree of certainty. The total time (in milliseconds) to perform the data retrieval and bind the data to the DataGrids is summed and then averaged over the set of 100 repetitions. Finally, the average time (in milliseconds) to perform the tasks is displayed on the Web Form, along with the data displayed in the DataGrids. The test isn't perfect, of course, but all three methods of retrieval undergo the exact same test in the same environment, so a comparison of the results from each method of retrieval is certainly valid.

Figure 4 shows the HTML that the Web Form uses to display the results of the test as well as the rowsets from each retrieval method in several ASP DataGrids. The ASP Label control is used to display the average time (in milliseconds) each test took to execute. The DataGrid controls are used to display the rowsets from the last iteration of each test of each retrieval method.

Figure 4 Displaying the Test Results

<%@ Page language="c#" Codebehind="WebForm1.aspx.cs" Inherits="FillingDataTables.WebForm1" %> <html><body> <form id="Form1" method="post" runat="server"> <asp:Label id="lblMessage" runat="server" Height="332px" Width="516px"></asp:Label> <br> <!-- Separate queries using subqueries--> <asp:DataGrid id="grdCustomer1" runat="server"></asp:DataGrid> <br> <asp:DataGrid id="grdOrder1" runat="server"></asp:DataGrid> <br> <asp:DataGrid id="grdOrderDetail1" runat="server"></asp:DataGrid> <br> <!-- 1 joined query --> <asp:DataGrid id="grdCustomer2" runat="server"></asp:DataGrid> <br> <!-- Separate queries using joins--> <asp:DataGrid id="grdCustomer3" runat="server"></asp:DataGrid> <br> <asp:DataGrid id="grdOrder3" runat="server"></asp:DataGrid> <br> <asp:DataGrid id="grdOrderDetail3" runat="server"></asp:DataGrid> </form> </body></html>

The structure of the code for the test is a traditional tiered approach slimmed down a bit for our purposes, but still very much in line with tiered applications. The WebForm1.aspx page handles the display of the data and the test results. It has a codebehind file called WebForm1.aspx.cs, which contains all of the code (for all of the events) that interacts with WebForm1.aspx. It is in the code-behind that the test is initiated, timed, and reported. The code-behind calls a custom class named Customer.cs which contains three public methods worth noting: GetData_Using1Query, GetData_UsingSeparateQueriesUsingJoins, and GetData_UsingSeparateQueriesWithSubqueries.

Each method represents one of the data retrieval techniques. They all accept the user name and password for the Northwind database and create an empty DataSet to store the results. The methods then instantiate a connection to the database using the SqlConnection object. From this point, the different methods diverge as each method sets its SQL statements accordingly. As shown in Figure 5, the method GetData_UsingSeparateQueriesUsingJoins sets the SQL statement to retrieve the CustomerID and CompanyName for the customers in the U.S. A new instance of a SqlCommand object is created and initialized with the SQL statement using the SqlCommand object's one-argument constructor. Next, the CommandType is set and the SqlConnection object is linked to the SqlCommand object.

Figure 5 Using Separate Join Queries

//----------------------------------------------------------------------- // public Method // Overloaded: No // Parameters: string sUserName // string sPassword // Return Value: DataSet //----------------------------------------------------------------------- public DataSet GetData_UsingSeparateQueriesUsingJoins(string sUserName, string sPassword) { this.m_sMethodName = "[public DataSet GetData_UsingSeparateQueriesUsingJoins" + " (string sUserName, string sPassword)]"; DataSet oDS = new DataSet(); string sSQL; try { //--------------------------------------------------------------- //-- Set up the Connection //--------------------------------------------------------------- string sConn = "Data Source=(local);Initial Catalog=northwind;User ID=" + "sUserName + ";Password=" + sPassword + ";"; SqlConnection oCn = new SqlConnection(sConn); //--------------------------------------------------------------- //-- Set up the SELECT Command for the Customer //--------------------------------------------------------------- sSQL = "SELECT CustomerID, CompanyName, City FROM Customers " + " WHERE Country = 'USA' " + " ORDER BY CustomerID"; SqlCommand oSelCmd_Customer = new SqlCommand(sSQL); oSelCmd_Customer.CommandType = CommandType.Text; oSelCmd_Customer.Connection = oCn; //--------------------------------------------------------------- //-- Set up the SELECT Command for the Orders //--------------------------------------------------------------- sSQL = "SELECT o.CustomerID, o.OrderID, o.OrderDate " + " FROM Customers c INNER JOIN Orders o ON c.CustomerID = o.CustomerID " + " WHERE c.Country = 'USA' " + " ORDER BY o.CustomerID, o.OrderID"; SqlCommand oSelCmd_Order = new SqlCommand(sSQL); oSelCmd_Order.CommandType = CommandType.Text; oSelCmd_Order.Connection = oCn; //--------------------------------------------------------------- //-- Set up the SELECT Command for the Order Details //--------------------------------------------------------------- sSQL = "SELECT od.OrderID, od.ProductID, od.UnitPrice, od.Quantity " + " FROM Customers c INNER JOIN Orders o ON c.CustomerID = o.CustomerID " + " INNER JOIN [Order Details] od ON o.OrderID = od.OrderID " + " WHERE c.Country = 'USA' " + " ORDER BY o.CustomerID, o.OrderID, od.ProductID"; SqlCommand oSelCmd_OrderDetail = new SqlCommand(sSQL); oSelCmd_OrderDetail.CommandType = CommandType.Text; oSelCmd_OrderDetail.Connection = oCn; //--------------------------------------------------------------- //-- Create and set up the DataAdapters //--------------------------------------------------------------- SqlDataAdapter oDA_Customer = new SqlDataAdapter(); SqlDataAdapter oDA_Order = new SqlDataAdapter(); SqlDataAdapter oDA_OrderDetail = new SqlDataAdapter(); oDA_Customer.SelectCommand = oSelCmd_Customer; oDA_Order.SelectCommand = oSelCmd_Order; oDA_OrderDetail.SelectCommand = oSelCmd_OrderDetail; //--------------------------------------------------------------- //-- Get the data //--------------------------------------------------------------- oDA_Customer.Fill(oDS, "Customer"); oDA_Order.Fill(oDS, "Order"); oDA_OrderDetail.Fill(oDS, "OrderDetail"); //--------------------------------------------------------------- //-- Dispose of objects //--------------------------------------------------------------- oCn.Dispose(); oSelCmd_Customer.Dispose(); oSelCmd_Order.Dispose(); oSelCmd_OrderDetail.Dispose(); oDA_Customer.Dispose(); oDA_Order.Dispose(); oDA_OrderDetail.Dispose(); } catch (Exception ex) { //--------------------------------------------------------------- //-- Throw the exception //--------------------------------------------------------------- LogAndThrow(ex, sUserName, this.m_sClassName, this.m_sMethodName, "my err description"); } finally { } //-------------------------------------------------------------------- //-- Return the DataSet //-------------------------------------------------------------------- return oDS; }

This process is repeated for both of the other SQL statements that retrieve the orders and order details. The method uses three SqlCommand objects and three SqlDataAdapter objects: one for the customers, one for the orders, and one for the order details. Once all of the SqlCommand objects have been instantiated, have had their SQL statements set, and have been linked to the SqlConnection, the three SqlDataAdapter objects are created. The SqlCommand objects are then set to the SelectCommand property of their corresponding SqlDataAdapter object. For example, the SqlCommand object that retrieves the customers (oSelCmd_Customer) is set to the SqlDataAdapter object's (oDA_Customer) SelectCommand property, and so on. The SqlDataAdapter has four command object properties to correspond to each of the four main data retrieval and manipulation SQL statements: InsertCommand, UpdateCommand, DeleteCommand, and SelectCommand. Since this code will only retrieve data, only the SelectCommand is required to be set.

Next, the Fill method for each of the three SqlDataAdapter objects is called, resulting in a trip to the database and the execution of each of the SQL statements separately. The data is then returned to the DataSet that was created at the beginning of the public method GetData_UsingSeparateQueriesUsingJoins and stored in the Customer, Order, and OrderDetail DataTables within the DataSet. The code then relates the DataTable objects to each other with DataRelation objects. This is only done in the two methods that retrieve three separate rowsets since the other method only retrieves a single rowset.

The code snippet shown in Figure 6 loops 100 times to retrieve the data from the Customer class. Each DataTable is bound to an ASP DataGrid and the elapsed time (in milleseconds) is tallied. Finally, the average elapsed time is calculated and displayed in an ASP Label on the Web Form.

Figure 6 Repeating the Data Retrieval 100 Times

lblMessage.Text += "GetData_UsingSeparateQueriesUsingJoins<br>"; fAvg = 0; for(i = 0; i < 100; i++) { x = System.DateTime.Now; oDS = oCustomer.GetData_UsingSeparateQueriesUsingJoins("sa", ""); grdCustomer3.DataSource = oDS.Tables["Customer"].DefaultView; grdCustomer3.DataBind(); grdOrder3.DataSource = oDS.Tables["Order"].DefaultView; grdOrder3.DataBind(); grdOrderDetail3.DataSource = oDS.Tables["OrderDetail"].DefaultView; grdOrderDetail3.DataBind(); y = System.DateTime.Now; z = y - x; fAvg += z.Milliseconds; } lblMessage.Text += "Average Ms = " + (fAvg / i).ToString() ; lblMessage.Text += "<hr>";

Test Results

After running this test several times over, the results showed that returning the data in a single joined query was consistently the poorest performing technique. The left side of Figure 7 shows the test results for the U.S. customers, their orders, and order details. Gathering the data using one joined query took three milleseconds longer than the other techniques, resulting in approximately a 16 percent slowdown over the next closest technique.

Figure 7 Test Results in Milliseconds

Query U.S. Customers All Customers
Subquery 18.52 123.47
Join 18.81 138.56
One Query 21.87 156.02

Consider that returning the data in a single joined query means gathering the data and retrieving what could amount to a significant redundancy. For example, the company name "Save-a-lot Markets" is repeated over 50 times in the rowset, once for every order and its order detail. That's a lot of data to be stored, transported, and rendered to a browser. It may not seem like a lot, but it is certainly more data than if the customer's company name was returned just once. While three milliseconds is not a ton of time by any means, keep in mind that when these tests are extrapolated into larger, real-world scenarios, the performance can really make a difference.

For example, while the Northwind database doesn't store as much information as you might have in a real-world scenario, let's take a look at what happens when all customers are retrieved—not just those from the U.S. (see the right side of Figure 7). Notice that the gaps still exist in the techniques, with the single query taking approximately 12 percent longer than the next fastest technique.

Keep in mind that the single query is only executing one query while the other techniques are executing three queries each. Despite running more queries, the other techniques consistently complete faster than the single joined query. This behavior demonstrates that running several optimized queries is better than running a single unoptimized query in most scenarios. This could also be interpreted to mean that the queries themselves are inconsequential and rendering and transporting the data is the real slowdown. This is because the single query returns more data than the three separate queries.

On the flip side, using a single joined query can be very useful at times. One such time is when the application requires the data to be shown in a single nonhierarchical grid. For the most part, I run a single query when I want to display data in a single venue like an ASP DataGrid. However, when an application requires reading and writing to several tables in an underlying database, I prefer using separate queries that load separate related DataTable objects due to the built-in flexibility ADO.NET provides.

The test could be altered in many ways, one of which could be to omit the DataRelation objects altogether. I ran a few tests without the DataRelation objects and found that using the DataRelations tacked on about one millisecond to the U.S. customers column results shown in Figure 7 for each of the methods that executed three queries. (Of course, the method that executes the single joined query does not require a DataRelation, so it was unaffected.) Thus, removing the DataRelation improved the performance of running the three separate queries that used either subqueries or joins because of how long ADO.NET takes to apply the relations.

You might conclude that if running a single joined query was slower than running separate queries, that the slowdown was mostly in joining the data. If you take a look at the code from the method that runs three queries using joins, you will notice that the same joins used in the single query are used. The biggest difference is actually in the fields being returned. The single query returns all fields in a single SQL statement, while the separate queries return only the fields required for each query. The customer query only goes to the customer table and gets the customer fields. The order query goes to the customer and order table and only gets the order fields. The order details query goes to all three tables but only gets the order detail fields. With all this in mind, this series of tests show that the joins did have an effect on performance but that the fields being returned (and the redundancy of them) likely had a greater impact on the performance.

The best performing method was the three queries that used subqueries. The code for this technique can get rather cumbersome since the third-level query has to run a subquery that runs a subquery to get its results. Since the results from running the three queries using subqueries was very close to the three queries using joins, I prefer the latter technique since the code is much easier to read and maintain. After all, performance is only one, albeit major, factor in proper application implementation. Maintainability, scalability, flexibility, reliability, and consistency are all very important as well.

Wrap-up

The code I've provided demonstrates how easy it is to retrieve data into ADO.NET DataSet objects using various techniques. Feel free to download the test code from this column and modify it when exploring more testing scenarios (see the link at the top of this article). The key point to remember is to consider carefully how you will architect your applications. Running these tests not only revealed the performance of the various methods, but it also revealed how easy the code was to write, modify, understand, and take to the next level. I highly recommend that when developing any application, you take time to run test scenarios for the various moving parts of your application including data retrieval, reporting, modifications, screen generation, event processing, and other major application aspects. If the application is tested well, you'll have fewer surprises and a more robust application.

Send your questions and comments for John to  mmdata@microsoft.com.

John Papa is a baseball fanatic who spends most of his summer nights rooting for the Yankees with his two little girls, wife, and faithful dog, Kadi. He has authored several books on ADO, XML, and SQL Server and can often be found speaking at industry conferences such as VSLive. You can reach him at mmdata@microsoft.com.