Express Yourself with Expression-based Columns

 

Dino Esposito
Wintellect

June 15, 2002

See the ComputedColumns.vb sample page.

An often-underestimated feature of ADO.NET is the DataColumn.Expression property that allows you to create virtual in-memory columns where content is obtained as the result of an expression applied to other existing columns.

Computed columns are nothing new for database developers. All commercial database management systems (DBMS), in fact, have provided this feature for as long as I can remember. Among other things, this means that you can always alter the form of your query to accommodate your code needs. For example, if your application needs some data expressed and/or combined in a way that the physical columns do not natively provide, a calculated column can perform this function. Computed column can return preprocessed and pre-aggregated data in a more user-friendly format ready to use or process further.

By design, an expression combines chunks of existing data in order to produce a nonstandard structure that you don't strictly need to store, but love to find available in its finalized form. Suppose, for instance, that your database table has two columns—price and tax percentage. More often than not, though, when it comes to displaying that data, you need to represent the information given by the sum of the price and the applicable taxes. Storing the final price in the database would result in a poor design because you still need price and tax as separate data. On the other hand, the sum of price and taxes is the information your users need. In situations like this one, you might want to resort to a virtual column whose values are not physically stored in the database, but computed when needed.

Aside from DBMS-resident computed columns, you can also ask for dynamically created computed columns directly in the query. For example, the query below returns a computed column that does not exist in the table, but gets created on the fly.

SELECT lastname + ', ' + firstname AS Name FROM employees

In ADO.NET, expression-based columns (expression columns) apply this logic to in-memory and disconnected data. In-memory computed columns can have many uses. In particular, they can be employed to overcome some of the limitations of the .NET data binding for both Web Forms and Windows Forms applications. In addition, expression columns also incorporate the features of the GROUP BY SQL statement. Used in conjunction with a DataRelation object, ADO.NET computed columns can calculate aggregate values on groups of records stored in a DataTable object.

In this article, I'll discuss a couple of scenarios in which you can make profitable use of ADO.NET computed columns. Before going any further, though, a bit of background information is in order.

The Expression Property

In ADO.NET, defining a computed column is as easy as setting the Expression property on the DataColumn object with a string value. The string must represent a valid expression according to a given syntax. Internally, the DataColumn class builds a chain of the columns involved in the expression and automatically refreshes the computed values when one or more of the involved columns is updated.

Notice that the parent DataTable object fires a pair of ad hoc events (called ColumnChanging and ColumnChanged) when the value of any of its columns changes. However, no event at all is fired for columns driven by an expression.

Unlike what happens with computed columns in most DBMS, a computed ADO.NET DataColumn object represents a true column in the sense that all the values are stored in memory all the time. A computed column, though, is permanently marked as read-only. In other words, you cannot remove the readonly attribute for the column until the bound expression is removed. Of course, the readonly attribute also prevents any application to assign direct values to the cells of the column.

To disassociate a computed column from the current expression, you set the expression to null (Nothing if you use Visual Basic®) or to the empty string. In this case, all the values in the column are promptly reset to the default value for the column. If no default value for the column has been specified through the DefaultValue property, then the System.DBNull value is assigned. The column's values are equally reset and recalculated if you simply modify, but not cancel, the expression.

When defining an expression-based column, you are also required to indicate the type of the column. The data type, in fact, is the only argument of the DataColumn constructor that allows you to specify an expression string.

Dim col As DataColumn
col = dt.Columns.Add("ExpCol", GetType(Boolean), expr)

You can also use a simpler constructor and set the Expression property explicitly:

Dim col As DataColumn = New DataColumn("ExpCol")
col.Expression = expr 

When you don't explicitly set the column's data type, it defaults to String. When you create a computed column and don't specify the type, you might perhaps expect the .NET Framework to infer the type for you. This never happens and with good reason. Inferring the type would be a rather complicated and computationally expensive. Manually setting the type of the column would result in better and faster code.

Normally, the result of the expression is cast to the type of the column. If this cannot be done, an EvaluateException exception is thrown. In light of this, bear in mind that the representation of the resulting data can change quite a bit. For example, a Boolean expression can originate True/False strings if the column type is String and if 1/0 values exist for numeric columns.

While you can easily change the expression of a computed column on the fly, you cannot do the same with the column type. However, this is a general limitation and does not apply to computed columns exclusively. The error message you get informs you that the system cannot change the type of a column once it has data. To work around this, for any kind of column, remove the column from the table, make all the changes, and then add it again.

Dim col As DataColumn = dt.Columns(index)

' Remove (and automatically empty) the column object from the table
dt.Columns.Remove(col)

' Change the type and add the object back in the same position
col.DataType = GetType(Integer)
dt.Columns.AddAt(col, index)

Removing a column from a DataTable does not destroy the DataColumn object as long as it remains in scope. The only effect is that the column is emptied—the ideal condition to change the data type.

The Syntax of Expressions

A computed column is identified by an expression. The expression calculates a value by combining multiple columns of the same table using operators. You can use simple arithmetic operators like + and * as well as logical operators like = and <. Boolean operators (AND, OR, NOT) are allowed, as well as more complex operators like IN, LIKE, and % (modulus). Operators take the usual precedence that you can alter using parentheses at will.

The following strings represent some feasible expressions for computed columns.

Country = 'USA'
Year >= 1999 And LastName LIKE 'A%'
Price * 0.2 + Freight

Column names can be used as keywords and must be unquoted. Names, instead, must always be single-quoted. Dates should be bracketed by the pound (#) character.

HireDate > #2-1-2002#

Wildcards like % and * are perfectly supported, but only at the beginning or the end of the string. The expression does not recognize enum values unless you cast them to the actual value and type. Decimal values can be used as long as you represent them with strings according to the current locale.

By default, strings are compared in a case-insensitive way. However, how this happens is controlled by the CaseSensitive property available on both the DataSet and the DataTable object. Of course, for a column the property on the DataTable takes precedence.

The expression can include several ad hoc functions for calculating aggregates, including Count, Max, Min, Sum, and Avg. A special keyword—the child keyword—is also admitted for group-by functionalities. (More on this later.)

Finally, the syntax is enriched by six functions: Len, IsNull, Iif, Convert, Trim, and Substring. These functions add a bit of flexibility to the expressions you can define. In particular, Len returns the length of a string and throws an exception if you call it to work on a non-string column. Like the T-SQL counterpart, IsNull wraps a given expression and returns the specified value if the expression evaluates to a System.DbNull value.

The function Iif makes the logic of the expression a bit more sophisticated as it lets you return values based on a Boolean expression. For example,

' Looks like a If..Then..Else statement
Iif(Len(description)>100, "[More]", description)

In this case, if the description is longer than 100, a standard text is returned. Otherwise, you get the content of the field. Iif used in conjunction with Substring can produce a very nice effect. Let's consider the following piece of code.

DataTable dt = ds.Tables["MyTable"];
dt.Columns.Add("FullName", typeof(string), 
   "Iif(Len(LastName + ', ' + FirstName) >10, " + 
      "'<span title=\"' + LastName + ', ' + FirstName + '\">' + 
      "Substring(LastName + ', ' + FirstName, 1, 10) + 
      "' ...<span>', " +
      "LastName + ', ' + FirstName)");

A new computed column called FullName is added to the target DataTable object. The expression concatenates LastName and FirstName with a comma in the middle. If the resulting expression is longer than 10 characters, it gets trimmed and an ellipsis (…) is added; otherwise, the regular text is returned. To add spice, consider that you can insert as many HTML tags and attributes as you need in the expression. In particular, if the text is too long for the application, a tooltip is added using the <span> tag and the title attribute. The result is shown in the figure below. It goes without saying that this trick works only from within an ASP.NET page.

Figure 1. Result of added <span> tag and title attribute

As you can guess from the code above, Substring extracts the specified number of characters from the given position. The function Trim, instead, removes leading and trailing white space characters, such as new lines, tabs, and soft carriage returns. Finally, the Convert function casts the column expression to the specified .NET type. If the cast is not possible, an exception is thrown.

For more information about the syntax supported by computed columns, please refer to the MSDN documentation.

Multi-field Data Binding

Both Web Forms and Windows Forms data-bound controls can be associated with data columns and use such contents to populate their own user interfaces. Classes like the ASP.NET BoundColumn, DropDownList, RadioButtonList, and the Windows Forms ComboBox, TextBox, and ListBox can be bound to data using the primary DataSource property. Sometimes, a few ancillary properties such as DisplayMember, DataTextField, or DataValueField contribute to narrow down the set of rows and columns bound according to the nature and the behavior of the control.

In the .NET Framework, you can only associate a data-bound property with a single, individual column. You can't bind properties to expressions or to simpler combinations of columns. Consider the Windows Forms combo box shown in the figure below.

Figure 2. Multi-field Windows Forms data bound combo box

It displays a full name produced by the following.

cboEmployees.DataSource = data.Tables("Employees")
cboEmployees.DisplayMember = "EmployeeName"

To work around the limitation of the single field that can be bound to a control, you have two possible options. You can request SQL Server® to return a dynamically computed column, or you can create an in-memory column using the ADO.NET DataColumn object.

Using SQL Server, you typically consume more bandwidth because more data is returned—the regular columns plus the computed one. On the client, all the returned data is stored in memory in a DataTable object. If you run a minimal query (without computed columns) and then build the computed columns in memory, you transfer less data between the client and the database server while maintaining constant the memory occupation on the client.

adapter.Fill(dataset, "Employees")
Dim dt As DataTable = data.Tables("Employees")
dt.Columns.Add("EmployeeName", GetType(String), _
   "lastname + ', ' + firstname")

The code snippet above shows how to add a computed column tailor-made to combine together more database fields for display.

In Memory Group-By

Using the child keyword in a column expression lets you implement a kind of client-side group-by functionality. The keyword works by calculating an aggregate function on the children of a parent row. For the child keyword to work, it is necessary for a relationship to exist between the involved columns.

Sum(child(RelationName).Price)

The expression above, when assigned to a column, sums the values in the column Price for all and only the records child of the current row according to the specified relation. If the DataTable has just one relation, the same expression can be rewritten as follows:

Sum(child.Price)

Let's analyze an example and assume to have a DataSet object with Customers and Orders tables. You set a relation on the two with the following code:

Dim rel As DataRelation
rel = New DataRelation("CustToOrders", _
        data.Tables("Customers").Columns("customerid"), _
        data.Tables("Orders").Columns("customerid"))
data.Relations.Add(rel)

The DataRelation object binds each customer row to a subset of orders. From a code standpoint, you access this subset of child rows using the DataRow's GetChildRows method. If you just want to evaluate an aggregate function on this subset of rows, define an expression column using the child keyword.

data.Tables("Customers").Columns.Add("Total", _
   GetType(Decimal), "Sum(child.price)")

The net effect of the column definition above is that the Total column added to the Customers table contains the total of all the orders for a given customer.

Figure 3. Sample table of what results when the GetChildRows method is used

What's the difference between leveraging the ADO.NET and the database group-by function? The database returns data ready to display and performs all the needed operations internally, regardless of the complexity. With ADO.NET objects, you can only fetch the data that you need and store in distinct result sets. Once this data has been downloaded on the client, you arrange it as appropriate by using relations and expressions. The workload is balanced between the database server and the client, and the bandwidth occupation is optimized.

However, if the client happens to be an ASP.NET application, you should carefully consider the impact on overall scalability that extra operations can have on performance. If your client is a Windows Forms, by all means consider using in-memory ADO.NET objects and simplify the database queries necessary to get hierarchical data.

DataColumn Tidbits

You cannot use an expression column to define the primary key of a DataTable object. This is not surprising as it is exactly what happens with some SQL Server indexes. When speaking about SQL Server and the DataColumn object, it is worth noticing that there is no way to load SQL Server computed columns as ADO.NET expression-based columns. In other words, the computability attribute is not inherited when the column is fetched and loaded into a DataTable object. However, if you want to check whether a SQL Server column is expression-based, you can do that in the following way. As a first step, prepare a query that includes all the columns, and then execute the query to get a SqlDataReader object. Next, call the get GetSchemaTable method. The method returns a DataTable with columns that contain valuable information (IsExpression, IsIdentity, IsKey, IsUnique, and NumericPrecision) about the schema of the underlying table.

When a DataTable object is serialized to XML using the WriteXml method, the content of all columns is persisted to the output stream. If you don't save schema information, though, you won't be able to restore any computed column.

dataTable.WriteXml(xmlFile, WriteMode.WriteSchema)

If you don't want to persist schema information, then make sure that you restore the serialized schema-less table into a DataTable object with the correct schema information (that is, a DataTable object in which the columns have been created). The schema of the Total column discussed above is serialized as follows:

<xs:element 
   name="Total" 
   msdata:ReadOnly="true" 
   msdata:Expression="sum(child(CustToOrders).price)" 
   type="xs:decimal" 
   minOccurs="0" />

Finally, although you can create computed columns to calculate aggregate functions on the whole table, if that is your goal, consider using the DataTable.Compute method instead. The Compute method calculates and returns a value and does not create a column.

Dialog Box

I love the ASP.NET DataGrid control, but hate its inability to maintain a fixed height when, on the last page, the number of records is smaller than the page size. Is there a way to work around this issue?

A possible solution is demonstrated on https://www.gotdotnet.com in the User Samples section. The idea is preprocessing the DataTable and normalizing the number of rows so that it matches perfectly the page size. Given the page size, you calculate how many rows the data source will be missing on the last page and add as many blank rows.

An even better and less code-intensive approach is wrapping the DataGrid control in a fixed-height <table> tag, and then adjusting borders and background color. Giving a fixed height to the DataGrid won't work because the height of each row will be proportionally expanded to cover the control's height.

Dino Esposito is Wintellect's ADO.NET expert and a trainer and consultant based in Rome, Italy. Dino is a contributing editor to MSDN Magazine and writes the Cutting Edge column. He also regularly contributes to the asp.netPROl magazine . In addition, he is the author of Building Web Solutions with ASP.NET and ADO.NET from Microsoft Press, and the cofounder of https://www.vb2themax.com/. You can reach Dino at dinoe@wintellect.com.