Data Points

Expressions in ADO.NET

John Papa

Contents

Expressions in SQL
DataColumn Expressions
Functions
Aggregates and Relations
Roll-downs and Computations

Managing relations in a database became a whole lot easier with ADO.NET. Instead of returning a single rowset to a data store, you can return a series of rowsets and relate them within a DataSet. There are numerous advantages to using related DataTables in ADO.NET, including the ability to store your data in a hierarchical structure, easier updating of data, and the ability to use expressions in columns.

This month, I'll provide a primer on column-based expressions and computations in ADO.NET. I'll cover the use of aggregate functions on DataColumns, calculating totals, performing other computations on entire DataSets, and linking DataColumns between DataTables. In addition, I'll provide some practical examples.

Summing and averaging a series of related rows in a SQL query is probably old hat to you thanks to aggregate functions like SUM and AVG that are part of ANSI SQL. SQL also allows calculations on columns such as multiplying the unit price by the quantity of products ordered to produce the extended price. Now ADO.NET offers a way to extend these features beyond the data source and into the middle and upper tiers of n-tiered applications. Using column expressions in ADO.NET, you can create your own columns by aggregating values within a DataSet, calculating a value from others in the same row, or even by grabbing a value from a parent or child DataTable through a DataRelation. With column-based expressions and computations in ADO.NET, new techniques for managing data have emerged.

Of course there are pros and cons to using expression-based columns, aggregations, and computations in ADO.NET. Expressions can be used within a single DataTable of a DataSet or across two DataTable objects of a DataSet through a DataRelation. I'll explain the difference between expression-based columns in ADO.NET and in SQL, and what you gain from each. Some of the operations I'll discuss in this column, including rolling up and down with expressions, rely on relations using the DataRelation object. (For more information on DataRelations, you can refer to the November 2002 installment of the DataPoints column.) In this column I will demonstrate how to create expressions in DataColumn objects, how to create aggregate functions with DataSets and SQL, how to roll up and down fields in DataSets, and how to perform calculations on columns in a DataSet.

Expressions in SQL

SQL expressions come in many forms, for many purposes, including string formatting, user-defined functions, and mathematical calculations. If a SQL statement concatenates a first name and last name together, calculates the extended price for an order's line item, or contains a user-defined function in SQL Server™ 2000 to retrieve an order date, then the SQL statement contains an expression.

Expressions provide great flexibility to developers to return values in a rowset that can be derived from other fields in the database. A relational and even somewhat normalized database will not store the extended price of an order's line item; rather, it will store the unit price and the quantity. The extended price can be calculated from these values, thus eliminating the possibility of getting your data out of synch. For example, if you store the quantity, unit price, and extended price in a row, imagine if somehow the data ends up being quantity = 10, unit price = $7.00 and extended price = $100.00. This should never happen, but the point is that it could happen if you store redundant data. The general rule for a transactional database is not to store any information that can be derived, such as the extended price.

This is where SQL expressions come into play. The extended price can be derived by creating a calculated column in the SQL statement that multiplies the unit price by the quantity (see Figure 1). The extended price can also be calculated by taking into account the discount amount, if any.

Figure 1 Math Expressions in SQL

USE northwind
GO

SELECT OrderID,
       ProductID,
       UnitPrice,
       Quantity,
       (UnitPrice * Quantity) AS ExtendedPrice,
       Discount,
       ((UnitPrice * Quantity) * (1 - Discount)) AS 
        ExtendedPriceWithDiscount
FROM    [Order Details]
ORDER BY
       OrderID,
       ProductID

The following SQL code shows how to concatenate strings in a SQL expression. This example takes the first name and last name and puts them together to return the uppercased full name with last name first:

USE pubs
GO

SELECT au_fname AS FirstName,
       au_lname AS LastName,
       au_lname + ', ' + au_fname AS FullName1,
       (UPPER(au_fname) + ' ' + UPPER(au_lname)) AS FullName2
FROM    authors
ORDER BY
       au_lname,
       au_fname

SQL expressions can format strings and perform calculations in a rowset, but there are some things to watch out for when you use them for this purpose. If a DataSet was filled from the SQL statement in Figure 1 and the first row's value for the quantity column was modified, the expression columns would remain unchanged. For example, if the quantity is 10 and the unit price is $7.00 and then the quantity is changed to 5, the ExtendedPrice column would still read $70.00 (10 * $7.00) and the data would be out of synch. The main point is that the expressions are not carried from the SQL statement to the ADO.NET DataSet.

DataColumn Expressions

Expressions can also be defined through the ADO.NET DataColumn object. Instead of calculating the extended price through the SQL statement, a DataColumn can be defined to represent the extended price. One difference between using an expression in a SQL statement and a DataColumn is that if any of the fields in the expression are modified, the DataColumn will automatically update a DataColumn defined as an expression field, but it will not update a DataColumn that resulted from a SQL expression (such as the one in Figure 1).

Figure 2 demonstrates how to populate a DataTable in a DataSet using a SQL statement and create a new DataColumn to represent a calculated expression using the other fields from the DataSet's only DataTable. The DataTable's default view is then bound to the ASP.NET DataGrid called grdOrderDetail.

Figure 2 Filling a DataSet and Adding an Expression

private void LoadOrderDetailData()
{
    //
    //— Set up the connection
    //
    SqlConnection oCn = new SqlConnection("Data Source=papa;" + 
        "Initial Catalog=northwind;User ID=sa;Password=secretpassword;");

    //
    //— Set up the SELECT command
    //
    string sSQL = "SELECT od.OrderID, od.ProductID, p.ProductName, 
        od.UnitPrice, od.Quantity, od.Discount " +
        " FROM [order details] od INNER JOIN Products p ON od.ProductID = 
        p.ProductID " + " ORDER BY od.OrderID, p.ProductName ";
    SqlCommand oSelCmd = new SqlCommand(sSQL, oCn);
    oSelCmd.CommandType = CommandType.Text;

    //
    //— Set the SqlCommand to the DataAdapter and fill 
    //— the DataSet
    //
    SqlDataAdapter oDA = new SqlDataAdapter(oSelCmd);
    DataSet oDs = new DataSet();
    oDA.Fill(oDs, "OrderDetail");

    //
    //— Add the computed columns (Expressions)
    //
    oDs.Tables["OrderDetail"].Columns.Add("ExtendedPrice", 
        typeof(decimal), "UnitPrice * Quantity");
            
    //
    //— Bind the data to the ASP.NET DataGrid
    //
    grdOrderDetail.DataSource = oDs.Tables["OrderDetail"].DefaultView;
    grdOrderDetail.DataBind();
    oDs.Dispose();
}

This code also creates and fills a DataSet with order detail information. Then a column is added to the DataSet's DataTable to represent the expression. This column is called ExtendedPrice, its datatype is set to type decimal, and its expression is the product of the unit price and the quantity columns. This expression can refer to any DataColumn of its DataTable. It gets the values from the DataRow that it is currently on, as well. For example, if the first row has a quantity of 10 and a unit price of $7.00, then the extended price column would use these values to arrive at $70.00.

Expressions can also include literal values as well as other columns from the DataTable in a calculation. For example, the expression that defines the extended price could be changed so that a discount is taken into consideration. With that in mind, the expression could be changed to reflect the discount, too, by changing the code that adds the ExtendedPrice column to look like this:

oDs.Tables["OrderDetail"].Columns.Add("ExtendedPrice", 
    typeof(decimal), "(UnitPrice * Quantity) * (1 - Discount)");

Try modifying any of the UnitPrice, Discount, or Quantity DataColumns' values and, unlike a column derived from a SQL expression, the value of ExtendedPrice DataColumn is also updated. This feature is great for applications where the user can modify their shopping cart, apply the changes, and see the updated total price.

Expressions can also be used to represent other data types such as Boolean and string values. For example, the following code sample demonstrates how to add a field that indicates whether an author gets a discount.

oDs.Tables["OrderDetail"].Columns.Add("GetsDiscount", typeof(bool),
    "Discount > 0");

You can enhance the previous expression by creating a compound expression to consider multiple conditions by using AND, OR, or NOT. Other operators, including those such as LIKE and IN, can also be used in the expression.

Expressions can represent string values, too, such as taking the first name and last name columns from a DataTable and putting them together. The following sample code snippet concatenates ProductName and ProductID.

oDs.Tables["OrderDetail"].Columns.Add("stringfield", typeof(string),
    "ProductID + '-' + ProductName");

Functions

If you want your column to have an expression containing more complicated logic, try embedding some functions in it. Expressions can contain functions such as Len, Iif, IsNull, Convert, Trim, and Substring. Each of these allows greater flexibility in creating expressions. The Len function evaluates the length of a string:

oDs.Tables["OrderDetail"].Columns.Add("LengthOfProductName",
    typeof(int), "Len(ProductName)");

The Iif function is an iterative If statement like the IIf in Visual Basic® .NET. It takes three arguments and evaluates the first to determine if it's true or false. If the first argument evaluates to true, the second argument is returned from the Iif function; otherwise the third argument is returned. The following is basically a condensed If...Then...Else statement that can easily be used for an embedded expression:

oDs.Tables["OrderDetail"].Columns.Add("Inventory", typeof(string),
    "Iif(Quantity < 10,'A few left', 'Plenty in stock')");

The IsNull function evaluates its first argument to see if it is equal to System.DbNull. If it evaluates to false, then the function returns the value of the first argument. If the first argument evaluates to true, then the IsNull function returns the value of the second argument. This can be used in places where you don't want to allow null values and you need to replace them with an empty string or placeholder value, as shown here:

oDs.Tables["OrderDetail"].Columns.Add("DiscountString",
    typeof(string), "IsNull(Discount, '[null value]')");

The Trim function removes trailing whitespace characters from a string value. The Convert function changes the datatype for a value in the expression to the datatype of the second argument passed to the Convert function. The Substring function returns a part of the string value passed to it, like its T-SQL counterpart. Substring can be used to crop long string values to display only a limited number of characters. It can be used to return any part of a string and can be combined with the other functions as necessary:

oDs.Tables["OrderDetail"].Columns.Add("ShortProduct", 
    typeof(string), "Substring(ProductName, 1, 10)");

Aggregates and Relations

Embedding functions within expressions can help you to create a column that represents more complex logic. But what if you want to create a column that calculates values across multiple rows? Enter aggregate functions.

Assuming you have a structure in a DataSet such as the Orders-to-Order Details relation in the SQL Server Northwind database, creating columns with expressions containing aggregate functions is pretty straightforward. The code in Figure 3 demonstrates how to create a DataSet containing a structure where the orders are in the parent DataTable and the order details are in the child DataTable. These DataTable objects are related to one another through the DataRelation called Orders2OrderDetails. For more information on DataRelation objects and how they help relate hierarchical and relational data structures in ADO.NET, see my November 2002 MSDN® Magazine column.

Figure 3 A Relation and Expressions

private void LoadData()
{
    //
    //— Set up the connection
    //
    SqlConnection oCn = new SqlConnection("Data Source=papa; " + 
        "Initial Catalog=northwind;User ID=sa;Password=secretpassword;");

    //
    //— Set up the SELECT command for the orders
    //
    string sSQL = "SELECT OrderID, CustomerID, OrderDate, ShipCity, 
        ShipCountry " + " FROM    Orders " +
        " ORDER BY CustomerID, OrderDate DESC ";
    SqlCommand oSelCmd = new SqlCommand(sSQL, oCn);
    oSelCmd.CommandType = CommandType.Text;

    //
    //— Set the SqlCommand to the DataAdapter and fill 
    //— the DataSet
    //
    SqlDataAdapter oDA = new SqlDataAdapter(oSelCmd);
    DataSet oDs = new DataSet();
    oDA.Fill(oDs, "Order");

    //
    //— Set up the SELECT command for the order details
    //
    string sSQL = "SELECT od.OrderID, od.ProductID, p.ProductName, 
        od.UnitPrice, od.Quantity, od.Discount " +
        " FROM [order details] od INNER JOIN Products p ON 
        od.ProductID = p.ProductID " +
        " ORDER BY od.OrderID, p.ProductName ";
    SqlCommand oSelCmd = new SqlCommand(sSQL, oCn);
    oSelCmd.CommandType = CommandType.Text;

    //
    //— Set the SqlCommand to the DataAdapter and fill 
    //— the DataSet
    //
    SqlDataAdapter oDA = new SqlDataAdapter(oSelCmd);
    oDA.Fill(oDs, "OrderDetail");

    //
    //— Relate the DataTables
    //
    oDs.Relations.Add("Order2OrderDetail", oDs.Tables["Order"].Columns
    ["OrderID"], oDs.Tables["OrderDetail"].Columns["OrderID"]);

    //
    //— Add the Expressions to the OrderDetail Table
    //
    oDs.Tables["OrderDetail"].Columns.Add("OrderDate", typeof(string), 
        "Parent.OrderDate");
    oDs.Tables["OrderDetail"].Columns.Add("ExtendedPrice", typeof(decimal), 
        "(UnitPrice * Quantity) * (1 - Discount)");
    oDs.Tables["OrderDetail"].Columns.Add("GetsDiscount", typeof(bool), 
        "Discount > 0");
    oDs.Tables["OrderDetail"].Columns.Add("stringtest", typeof(string), 
        "ProductID + '-' + ProductName");
    oDs.Tables["OrderDetail"].Columns.Add("LengthOfProductName", 
        typeof(int), 
        "Len(ProductName)");
    oDs.Tables["OrderDetail"].Columns.Add("Inventory", typeof(string), 
        "Iif(Quantity < 10, 'Only a few left', 'Plenty in stock')");
    oDs.Tables["OrderDetail"].Columns.Add("DiscountString", typeof(string), 
        "IsNull(Discount, '[null value]')");
    oDs.Tables["OrderDetail"].Columns.Add("ShortProduct", typeof(string), 
        "Substring(ProductName, 1, 10) + '...'");

    //
    //— Add the Expressions to the order table
    //
    oDs.Tables["Order"].Columns.Add("OrderTotal", typeof(decimal), 
        "Sum(Child(Order2OrderDetail).ExtendedPrice)");
    oDs.Tables["Order"].Columns.Add("AvgQuantity", typeof(decimal), 
        "Avg(Child(Order2OrderDetail).Quantity)");
}

In Figure 3 notice how the expression columns are created and added to the Order DataTable. The first expression creates a column that represents the sum total of each order's order detail items. Specifically, the OrderTotal DataColumn has an expression that sums the OrderDetail DataTable's expression-based DataColumn called ExtendedPrice. So, as you can see, you can use an aggregate function across a DataRelation and even use it on another expression-based column.

There are other aggregate functions available to you in ADO.NET, including Sum, Avg, Max, Min, StDev, Var, and Count. The example in Figure 3 also demonstrates how to use the Avg function to get the average order detail's quantity. One of the real keys here is the use of the parent and child keywords to link the data across the DataRelation, like this:

oDs.Tables["Order"].Columns.Add("AvgQuantity", typeof(decimal),  
    Avg(Child(Order2OrderDetail).Quantity)");

The Child function accepts the name of the DataRelation to get to the child rowset. This argument is optional and only necessary when there is more than one DataRelation indicating a child relation from the source DataTable. So, if DataTable only has a single child DataTable, then the syntax can be simplified to exclude the name of the DataRelation since it is the only DataRelation:

oDs.Tables["Order"].Columns.Add("AvgQuantity", typeof(decimal), 
    Avg(Child.Quantity)");

Roll-downs and Computations

The Parent function works just like the Child function except, of course, that it goes up the relation chain to the parent DataTable. Both of these functions help create a pseudo GROUP BY functionality in your ADO.NET code.

One other operation that these keywords make possible is to roll up and down values that are unchanged from one DataTable to another. I'm often asked how to rejoin a parent and child DataTable into a single DataTable so it can be displayed in a single DataGrid (non-hierarchically). Using the Parent keyword, it is possible to roll down fields to a child DataTable and then bind only the child DataTable to a DataGrid. For example, if I wanted to show the Order Date in each row of the OrderDetail DataTable from the example in Figure 3, I could add the DataColumn using the Parent keyword, like this:

oDs.Tables["OrderDetail"].Columns.Add("OrderDate", typeof(string), 
    "Parent.OrderDate");

This feature allows you to roll up and down fields without performing any calculations on them. Using this rolling feature with the Child keyword, you can roll down fields from the parent table and bind to a DataGrid. By doing this, you end up with a single two-dimensional rowset, just like you get from a SQL statement. Keep in mind, however, that if you intend to use the data in a single rowset then you are better off returning all of your data in a single rowset to a single DataTable. However, if you want to use the relational structure in the DataSet, the Parent keyword gives you flexibility in displaying the data.

Another feature that is worth a look is the DataTable's Compute function, which calculates using an aggregate function on the current DataTable given a specific filter. For example, you could add a column to the DataTable that calculates the total number of orders with an order total over $999.00.

In the first argument of the Compute function, shown in the following code, an aggregate function is performed by counting all OrderTotal values that meet the filter requirement:

//————————————————————————————
//— Display the total number of orders over $999
//————————————————————————————
int iCnt = (int)oDs.Tables["Order"].Compute("Count(OrderTotal)", 
   "OrderTotal >= 1000");
lblTest.Text = iCnt.ToString() + " orders are at least $1000";

The filter is specified in the second argument and limits the aggregate function to include only rows that meet its condition. The code is counting all rows where the OrderTotal is $1000.00 or greater. This is a great tool to quickly perform calculations on a DataTable, especially since you can filter. For example, you could easily find the number of customers who ordered product x and the number of customers who ordered product y, without writing a loop or querying the database again.

It is important to remember that DataColumn objects bound to expressions cannot be updated manually. These columns are bound to an expression so they cannot be overridden unless the expression is removed. Also, this data does not have a corresponding column in the database (or XML file) that serves as the data source. So if you want to save the data back to a database, you need to be aware that the expression won't be saved to the database.

This month I've discussed expressions both in SQL and ADO.NET. Now that you see the wide range of functionality provided by expressions, you'll want to use them to manipulate and add data in your own applications.

Send questions and comments for Johnny to mmdata@microsoft.com.

John Papais VP of Software Development at MJM Investigations in Raleigh, NC. He has authored several books on ADO, XML, and SQL Server and can often be found speaking at industry conferences such as VSLive. You can reach him at mmdata@microsoft.com.