On the Way to Mastering ASP.NET: Introducing Custom Entity Classes

 

Karl Seguin

March 2005

Summary: There are situations for which untyped DataSets may not be the best solution for data manipulation. The goal of this guide is to explore an alternative to DataSets: custom entities and collections. (32 printed pages)

Contents

Introduction
Problems with DataSets
Custom Entity Classes
Object-Relational Mapping
Custom Collections
Managing Relationships
Beyond the Basics
Conclusion

Introduction

The days of ADODB.RecordSet and the oft-forgotten MoveNext are gone, replaced by the powerful and flexible capabilities of Microsoft ADO.NET. Our new arsenal is the System.Data namespace, featuring lightning-fast DataReaders and feature-rich DataSets, all packaged in a capable object-oriented model. It's no surprise that we have such tools at our disposal. Any 3-tier architecture relies on a solid Data Access Layer (DAL) to elegantly connect the data layer to the business layer. A quality DAL helps promote code reuse, is key to good performance, and is totally transparent.

As our tools have evolved, so too have our development patterns. Saying goodbye to MoveNext was more than ridding ourselves of cumbersome syntax; it opened our minds to disconnected data, which in turn had a profound impact on how we built applications.

While DataReaders might have been familiar (they behave a lot like RecordSets), it didn't take long for us to venture forward and explore DataAdapters, DataSets, DataTables, and DataViews. It was our growing skills at exploiting these new objects that changed how we developed. Disconnected data allowed us to utilize new caching techniques, and thus greatly improve the performance of our applications. The capability of these classes allowed us to write smarter and more powerful functions, while at the same time reducing, sometimes considerably, the amount of code required for common activities.

There are a number of situations for which DataSets are particularly well suited, such as prototyping, small systems, and support utilities. However, using them in enterprise systems, where ease of maintenance is more important than time to market, may not be the best solution. The goal of this guide is to explore an alternative to DataSets that is geared towards this type of work: custom entities and collections. Other alternatives exist, but none provide the same capability or have more backing. Our first task will be to look at the shortcomings of DataSets in order to understand the problem we are trying to solve.

Keep in mind that every solution has its own advantages and disadvantages, so it's possible that the drawbacks of DataSets are going to be more palatable to you than those of custom entities (which we'll also discuss). You and your team must decide which solution is better suited to your project. Remember to consider the total cost of a solution, including the likelihood that requirements will change and the chance that you'll spend more time in post-production than actually developing code. Finally, note that when I refer to DataSets, I don't mean typed DataSets, which indeed solve some of the shortcomings associated with untyped DataSets.

Problems with DataSets

Lack of Abstraction

The first and most obvious reason to consider alternatives is the DataSet's inability to decouple your code from the database structure. DataAdapters do a great job of making your code blind to the underlying database vendor (Microsoft, Oracle, IBM, …) but fail to abstract away the core components of a database: tables, columns, and relationships. These core database components are also the core components of the DataSet. DataSets and databases share more than just common components; unfortunately, they also share their schema. Given the following select statement:

SELECT UserId, FirstName, LastName
   FROM Users

We know that values will be available from the UserId, FirstName, and LastName DataColumns within our DataSet.

Why is this so bad? Let's look at a basic everyday example. First we have a simple DAL function:

'Visual Basic .NET
Public Function GetAllUsers() As DataSet
 Dim connection As New SqlConnection(CONNECTION_STRING)
 Dim command As SqlCommand = New SqlCommand("GetUsers", connection)
 command.CommandType = CommandType.StoredProcedure
 Dim da As SqlDataAdapter = New SqlDataAdapter(command)
 Try
  Dim ds As DataSet = New DataSet
  da.Fill(ds)
  Return ds
 Finally
  connection.Dispose()
  command.Dispose()
  da.Dispose()
 End Try
End Function

//C#
public DataSet GetAllUsers() {
 SqlConnection connection = new SqlConnection(CONNECTION_STRING);
 SqlCommand command = new SqlCommand("GetUsers", connection);
 command.CommandType = CommandType.StoredProcedure;
 SqlDataAdapter da = new SqlDataAdapter(command);
 try {
  DataSet ds = new DataSet();
  da.Fill(ds);
  return ds;
 }finally {
  connection.Dispose();
  command.Dispose();
  da.Dispose();
 }            
}

Next we have a page with a repeater displaying all the users:

<HTML>
 <body>
   <form id="Form1" method="post" runat="server">
     <asp:Repeater ID="users" Runat="server">
        <ItemTemplate>
           <%# DataBinder.Eval(Container.DataItem, "FirstName") %>
           <br />
        </ItemTemplate>
     </asp:Repeater>
   </form>
 </body>
</HTML>
<script runat="server">
  Public Sub Page_Load
     users.DataSource = GetAllUsers()
     users.DataBind()
  end sub
</script>

As we can see, our ASPX page makes use of the DAL function GetAllUsers for the DataSource of the repeater. If the database schema changes, for whatever reason (denormalization for performance, normalization for clarity, change in requirements), the change trickles all the way to the ASPX, namely the DataBinder.Eval line, which makes use of the "FirstName" column name. This should immediately raise a red flag in your mind: a database schema change trickling all the way to ASPX code? Doesn't sound very N-Tier, does it?

If all we are doing is a simple column renaming, making changes in this example isn't complicated. But what if GetAllUsers is used in numerous places or, worse yet, exposed as a Web service, feeding countless consumers? How easily or safely can the change be propagated? For this basic example, the stored procedure itself serves as a layer of abstraction, which would probably suffice; but relying on stored procedures for anything but the most basic protection against this will likely cause greater problems down the road. Think of this as a form of hard-coding; in essence, when using DataSets, you are likely going to create a rigid connection between your database schema (regardless of whether you use column names or ordinal positions) and your application/business layers. Hopefully past experience (or logic) has taught you the impact hard-coding has on maintenance and future development.

Another way DataSets fail to provide proper abstraction is by requiring developers to know the underlying schema. We aren't talking about basic knowledge, either, but full knowledge of column names, types, and relationships. Not only does removing this requirement make your code less likely to break as we just saw, but it also makes it easier to write and maintain. Simply put:

Convert.ToInt32(ds.Tables[0].Rows[i]["userId"]);

is both hard to read and requires intimate knowledge of column names and their types. Ideally, your business layer knows nothing about the underlying database, the database schema, or SQL. If you are using DataSets as shown in the previous snippet (using CodeBehind doesn't make anything better), you likely have a very thin business layer.

Weakly-Typed

DataSets are weakly-typed, which makes them error prone and likely to impact your development effort. What this means is that whenever you retrieve a value from a DataSet, it comes in the form of a System.Object, which you have to convert. The risk you run is that a conversion will fail. Unfortunately, this failure won't happen at compile time, but rather at runtime. Additionally, tools such as Microsoft Visual Studio.NET (VS.NET) aren't very good at assisting your developers when it comes to weakly-typed objects. When we previously talked about requiring an in-depth knowledge of the schema, this is what was meant. Again, we'll look at a very common example:

'Visual Basic.NET
Dim userId As Integer = Convert.ToInt32(ds.Tables(0).Rows(0)("UserId"))
Dim userId As Integer = CInt(ds.Tables(0).Rows(0)("UserId"))
Dim userId As Integer = CInt(ds.Tables(0).Rows(0)(0))

//C#
int userId = Convert.ToInt32(ds.Tables[0].Rows[0]["UserId"]);

This code represents possible ways of retrieving a value from a DataSet—chances are your code has this all over the place (if it doesn't do a conversion and you are using Visual Basic .NET, you might have Option Strict off, in which case you're in even deeper trouble).

Unfortunately, each of those lines has the potential to generate numerous runtime errors:

  1. The conversion can fail because:
    • The value could be null.
    • The developer might be wrong about the underlying data type (again, intimate knowledge of the database schema is required).
    • If you are using ordinal values, who knows what column is actually at position X.
  2. ds.Tables(0) might return a null reference (if any part of the DAL method or the stored procedure failed).
  3. "UserId" might not be a valid column name because:
    • It might have changed names.
    • It might not be returned by the stored procedure.
    • It might have a typo.

We could modify the code and write it more defensively, namely by adding checks for null/nothing and adding try/catch around the conversion, but this does nothing to help the developer.

Worst of all is that, as we've discussed, this isn't abstracted. The significance of this is that, every time you want to get the userId out of the DataSet, you'll run the risks listed previously, or you'll need to reprogram the same defensive steps (granted, a utility function would help alleviate this). Weakly-typed objects move errors from design time or compile time, where they are automatically detected and easily fixed, to run time, where they risk being exposed in production and are harder to pinpoint.
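
As a rough sketch of such a utility function (hypothetical, and note that it still can't catch schema mistakes at compile time):

//C#
//hypothetical helper: centralizes the DBNull check and conversion that
//would otherwise be repeated at every call site
public static int GetInt32(DataRow row, string columnName, int defaultValue) {
 object value = row[columnName];
 if (value == DBNull.Value) {
  return defaultValue;
 }
 return Convert.ToInt32(value);
}

//usage: int userId = GetInt32(ds.Tables[0].Rows[0], "UserId", -1);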

Not Object-Oriented

Just because DataSets are objects and C# and Visual Basic .NET are object-oriented (OO) languages doesn't automatically make your usage of them object-oriented. The "hello world" of OO programming is typically a Person class that is sub-classed by an Employee class. DataSets, however, don't make this type of inheritance, or most other OO techniques, possible (or at least natural/intuitive). Scott Hanselman, an outspoken proponent of class entities, explains it best:

"A DataSet is an object, right? But it's not a Domain Object, it's not an 'Apple' or 'Orange'—it's an object of type 'DataSet.' A DataSet is a bowl (one that knows about the backing Data Store). A DataSet is an object that knows how to HOLD Rows and Columns. It's an object that knows a LOT about the Database. But I don't want to return bowls. I want to return Domain Objects, like 'Apples.'"1

DataSets keep your data in a relational format, making them powerful and easy to use with relational databases. Unfortunately, this means you lose out on all the benefits of OO.

Since DataSets can't act as domain objects, it's impossible to add functionality to them. Typically, objects have fields, properties, and methods, which operate against an instance of the class. For example, you might have Promote or CalculateOvertimePay functions associated with a User object, which can be cleanly called through someUser.Promote() or someUser.CalculateOvertimePay(). Since methods can't be added to a DataSet you'll need to use utility functions, deal with weakly-typed objects, and have more instances of hard-coded values spread throughout your code. You basically end up with procedural code where you either continuously keep pulling data out of the DataSet, or cumbersomely store values in local variables and pass them around. Both methods have their disadvantages and neither has advantages.

The Case Against DataSets

If your idea of a Data Access Layer is to return a DataSet, you are likely missing out on some significant benefits. One reason for this is that you may be using a thin or non-existent business layer that, amongst other things, limits your ability to abstract. Additionally, it's difficult to take advantage of OO techniques, since you are using a generic pre-built solution. Finally, tools such as Visual Studio.NET can't easily empower developers with weakly-typed objects, such as DataSets, which reduces productivity while increasing the likelihood of bugs.

All of these factors have a direct impact on the maintainability of your code in one way or another. The lack of abstraction makes feature changes and bug fixes more involved and risky. You aren't able to take full advantage of the code reuse or the boost in readability offered by OO. And, of course, your developers, whether they work on the business logic or the presentation logic, must have intimate knowledge of your underlying data structure.

Custom Entity Classes

Most of the problems associated with DataSets can be solved by taking advantage of the rich capabilities of OO programming within a well-defined business layer. In essence we want to take relationally organized data (database) and have it available as objects (code). The idea is that instead of having a DataTable that holds information about cars, you actually have car objects (called custom entities or domain objects).

Before we look at custom entities we'll first look at the challenges that we'll face. The most obvious is the amount of code required. Instead of simply getting the data and automatically filling a DataSet, we get the data and manually map it to the custom entities that must first be created. Seeing as this is a repetitive task, we can mitigate it using code generation tools or O/R mappers (more on this later). The bigger problem is the actual process of mapping data from the relational to the object world. For simple systems the mapping is mostly straightforward, but as the complexity grows the differences between the two worlds can become problematic. For example, a key technique to help code-reuse, as well as maintainability, in the object world is inheritance. Unfortunately, inheritance is a foreign concept to relational databases. Another example is the difference in dealing with relations, with the object world maintaining a reference to a separate object and the relational world making use of foreign keys.

It might sound as if this approach isn't well suited for more complex systems, as the amount of code along with the disparity between the relational data and objects grows, but the opposite is true. Complex systems gain from this approach by having their difficulties isolated in a single layer—the mapping process (which, again, can be automated). Additionally, this approach is already quite popular, which means a number of design patterns exist to cleanly deal with added complexity. Magnify the shortcomings of DataSets previously discussed with that of a complex system and you'll end up with a system whose difficulty to build is surpassed only by its inability to change.

What Are Custom Entities?

Custom entities are objects that represent your business domain; as such, they are the foundation of a business layer. If you have a user authentication component (the example we'll be using throughout this guide), you'll likely have User and Role objects. An e-commerce system would likely have Supplier and Merchandise objects and a real estate company might have Houses, Rooms, and Addresses. Within your code, custom entities are simply classes (there's a fairly tight correlation between an entity and a class, as it's used in OO programming). A typical User class might look like:

'Visual Basic .NET
Public Class User
#Region "Fields and Properties"
 Private _userId As Integer
 Private _userName As String
 Private _password As String
 Public Property UserId() As Integer
  Get
   Return _userId
  End Get
  Set(ByVal Value As Integer)
    _userId = Value
  End Set
 End Property
 Public Property UserName() As String
  Get
   Return _userName
  End Get
  Set(ByVal Value As String)
   _userName = Value
  End Set
 End Property
 Public Property Password() As String
  Get
   Return _password
  End Get
  Set(ByVal Value As String)
   _password = Value
  End Set
 End Property
#End Region
#Region "Constructors"
 Public Sub New()
 End Sub
 Public Sub New(id As Integer, name As String, password As String)
  Me.UserId = id
  Me.UserName = name
  Me.Password = password
 End Sub
#End Region
End Class

//C#
public class User {
#region "Fields and Properties"
 private int userId;
 private string userName;
 private string password;
 public int UserId {
  get { return userId; }
  set { userId = value; }
  }
 public string UserName {
  get { return userName; }
  set { userName = value; }
 }
 public string Password {
  get { return password; }
  set { password = value; }
 }
#endregion
#region "Constructors"
 public User() {}
 public User(int id, string name, string password) {
  this.UserId = id;
  this.UserName = name;
  this.Password = password;
 }
#endregion
}

Why Are They Beneficial?

The primary benefit gained from using custom entities comes from the simple fact that they are objects fully in your control. Namely, they allow you to:

  • Take advantage of OO techniques such as inheritance and encapsulation.
  • Add custom behavior.

For example, our User class could benefit from having an UpdatePassword function added to it (this is something we could do with DataSets using external/utility functions, but at a readability/maintenance cost). Additionally, they are strongly-typed, meaning we get IntelliSense support:


Figure 1. IntelliSense with the User class
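
For illustration, a minimal sketch of such an UpdatePassword method (the old-password check and the omitted hashing/policy rules are placeholders for whatever your domain actually requires):

//C#
public class User {
 private string password;

 //behavior lives with the entity rather than in utility functions
 public bool UpdatePassword(string oldPassword, string newPassword) {
  if (password != oldPassword) {
   return false;   //old password didn't match
  }
  password = newPassword;  //real code would enforce policy/hashing here
  return true;
 }
}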

Finally, since custom entities are strongly-typed, they require fewer error-prone casts:

Dim userId As Integer = user.UserId
'versus
Dim userId As Integer = Convert.ToInt32(ds.Tables("users").Rows(0)("UserId"))

Object-Relational Mapping

As discussed earlier, one of the main challenges of this approach is dealing with the differences between relational data and objects. Since our data is persistently stored in a relational database we have no choice but to bridge the two worlds. For the previous User example we could expect to have a user table in our database that looks a lot like:


Figure 2. Data view of the User

Mapping from this relational schema to our custom entity is a simple enough matter:

'Visual Basic .NET
Public Function GetUser(ByVal userId As Integer) As User
 Dim connection As New SqlConnection(CONNECTION_STRING)
 Dim command As New SqlCommand("GetUserById", connection)
 command.Parameters.Add("@UserId", SqlDbType.Int).Value = userId
 Dim dr As SqlDataReader = Nothing
 Try
  connection.Open()
  dr = command.ExecuteReader(CommandBehavior.SingleRow)
  If dr.Read Then
   Dim user As New User
   user.UserId = Convert.ToInt32(dr("UserId"))
   user.UserName = Convert.ToString(dr("UserName"))
   user.Password = Convert.ToString(dr("Password"))
   Return user
  End If
  Return Nothing
 Finally
  If Not dr is Nothing AndAlso Not dr.IsClosed Then
   dr.Close()
  End If
  connection.Dispose()
  command.Dispose()
  End Try
End Function

//C#
public User GetUser(int userId) {
 SqlConnection connection = new SqlConnection(CONNECTION_STRING);
 SqlCommand command = new SqlCommand("GetUserById", connection);
 command.Parameters.Add("@UserId", SqlDbType.Int).Value = userId;
 SqlDataReader dr = null;
 try{
  connection.Open();
  dr = command.ExecuteReader(CommandBehavior.SingleRow);
  if (dr.Read()){
   User user = new User();
   user.UserId = Convert.ToInt32(dr["UserId"]);
   user.UserName = Convert.ToString(dr["UserName"]);
   user.Password = Convert.ToString(dr["Password"]);
   return user;            
  }
  return null;
 }finally{
  if (dr != null && !dr.IsClosed){
   dr.Close();
  }
  connection.Dispose();
  command.Dispose();
 }
}

We still set up our connection and command objects like we normally would, but then we create a new instance of our User class and populate it from our DataReader. You could still use a DataSet within this function and map it to your custom entity, but the primary benefit of DataSets over DataReaders is that they provide a disconnected view of the data. In this case the User instance provides that disconnected view, letting us take advantage of the DataReader's speed.

Wait a Minute! You Didn't Solve Anything!

Observant readers might notice that one of the problems I pointed out with DataSets is that they aren't strongly-typed, which leads to a loss of productivity and an increase in the potential for runtime errors. They also require developers to have an in-depth knowledge of the underlying data structure. Looking at the previous code you might notice the exact same pitfalls lurking. Consider, however, that we have encapsulated these problems within a very isolated area of the code, meaning consumers of your custom entities (web interface, web service consumer, Windows Forms client) remain totally unaware of them. Conversely, using DataSets spreads these problems throughout the code.

Enhancement

The previous code was useful to show the basic idea behind mapping, but two key enhancements can be made to improve it. First we want to extract the populate code into its own function, as it'll likely be reused:

'Visual Basic .NET
Public Function PopulateUser(ByVal dr As IDataRecord) As User
 Dim user As New User
 user.UserId = Convert.ToInt32(dr("UserId"))
 'example of checking for NULL
 If Not dr("UserName") Is DBNull.Value Then
  user.UserName = Convert.ToString(dr("UserName"))
 End If
 user.Password = Convert.ToString(dr("Password"))
 Return user
End Function

//C#
public User PopulateUser(IDataRecord dr) {
 User user = new User();
 user.UserId = Convert.ToInt32(dr["UserId"]);
 //example of checking for NULL
 if (dr["UserName"] != DBNull.Value){
  user.UserName = Convert.ToString(dr["UserName"]);   
 }
 user.Password = Convert.ToString(dr["Password"]);
 return user;
}

The second thing to notice is that instead of using a SqlDataReader for our mapping function, we use an IDataRecord. This is an interface that all DataReaders implement. Using IDataRecord makes our mapping process vendor-independent. In other words, we can use the previous function to map a User from an Access database, even if it uses an OleDbDataReader. If you combine this specific approach with the Provider Model Design Pattern (link 1, link 2), you'll have code that can be easily used for different database vendors.
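
As a sketch of that vendor independence (the OLEDB_CONNECTION_STRING constant and inline query are placeholders, and cleanup is omitted for brevity), the very same PopulateUser could be fed from an Access database:

//C#
OleDbConnection connection = new OleDbConnection(OLEDB_CONNECTION_STRING);
OleDbCommand command = new OleDbCommand("SELECT UserId, UserName, [Password] FROM Users", connection);
connection.Open();
//OleDbDataReader implements IDataRecord, so PopulateUser works unchanged
OleDbDataReader dr = command.ExecuteReader();
while (dr.Read()) {
 User user = PopulateUser(dr);
}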

Finally, the PopulateUser function demonstrates how powerful encapsulation is. Dealing with NULLs in DataSets isn't the easiest thing—every time you pull a value you need to check whether it's NULL. With the above population method we've conveniently taken care of this in a single place, and spared our consumers from having to deal with it.

Where to Map?

There is some debate about where such data access and mapping functions belong—as part of a separate class or as part of the appropriate custom entity. There's certainly a nice elegance to having all user-related tasks (fetching data, updating, and mapping) as part of the User custom entity. This tends to work well when the database schema closely resembles the custom entity (as in this example). As your system grows in complexity and the differences between the two worlds start to appear, having a clear separation between your data layer and business layer can greatly help simplify maintenance (I like to call this the Data Access Layer). A side effect of having the access and mapping code inside its own layer, the DAL, is that it provides us with a nice rule for ensuring a clear separation of our layers:

"Never return a class from the System.Data or child namespace from the DAL"

Custom Collections

So far we've only looked at dealing with individual entities; however, you'll often need to deal with more than a single object. A simple solution would be to store multiple values inside a generic collection, such as an ArrayList. This is a less than ideal solution, as it reintroduces some of the problems we had with DataSets, namely:

  • They aren't strongly-typed, and
  • Custom behavior can't be added.

The solution that best fits our needs is to create our own custom collection. Thankfully the Microsoft .NET Framework provides a class specifically meant to be inherited for this purpose: CollectionBase. CollectionBase works by storing objects in a private ArrayList, while exposing access to that private collection through methods that accept only a specific type, such as a User object. In other words, weakly-typed code is encapsulated within a strongly-typed API.

While custom collections might seem like a lot of code, most of it is code generation or cut and paste friendly, oftentimes requiring only one search and replace. Let's take a look at the different parts that make up a custom collection for our User class:

'Visual Basic .NET
Public Class UserCollection
   Inherits CollectionBase
 Default Public Property Item(ByVal index As Integer) As User
  Get
   Return CType(List(index), User)
  End Get
  Set(ByVal Value As User)
   List(index) = Value
  End Set
 End Property
 Public Function Add(ByVal value As User) As Integer
  Return (List.Add(value))
 End Function
 Public Function IndexOf(ByVal value As User) As Integer
  Return (List.IndexOf(value))
 End Function
 Public Sub Insert(ByVal index As Integer, ByVal value As User)
  List.Insert(index, value)
 End Sub
 Public Sub Remove(ByVal value As User)
  List.Remove(value)
 End Sub
 Public Function Contains(ByVal value As User) As Boolean
  Return (List.Contains(value))
 End Function
End Class

//C#
public class UserCollection : CollectionBase {
 public User this[int index] {
  get {return (User)List[index];}
  set {List[index] = value;}
 }
 public int Add(User value) {
  return (List.Add(value));
 }
 public int IndexOf(User value) {
  return (List.IndexOf(value));
 }
 public void Insert(int index, User value) {
  List.Insert(index, value);
 }
 public void Remove(User value) {
  List.Remove(value);
 }
 public bool Contains(User value) {
  return (List.Contains(value));
 }
}

More can be done when inheriting from CollectionBase, but the previous code represents the core functionality that is necessary for a custom collection. Looking at the Add function, we can see how we are simply wrapping the call to List.Add (List is backed by an ArrayList) in a function that only allows a User object.
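
For example, CollectionBase exposes protected hooks such as OnValidate; a sketch of an override we could add to UserCollection so that wrong types are rejected even when the collection is accessed through its weakly-typed IList interface:

//C#
//inside UserCollection: called by CollectionBase before any add,
//insert, or set, including those made through the IList interface
protected override void OnValidate(object value) {
 base.OnValidate(value);   //rejects null
 if (!(value is User)) {
  throw new ArgumentException("UserCollection only accepts User objects");
 }
}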

Mapping Custom Collections

The process of mapping our relational data to custom collections is very similar to the one we examined for custom entities. Instead of creating a single entity and returning it, we add the entity to the collection and loop to the next one:

'Visual Basic .NET
Public Function GetAllUsers() As UserCollection
 Dim connection As New SqlConnection(CONNECTION_STRING)
 Dim command As New SqlCommand("GetAllUsers", connection)
 Dim dr As SqlDataReader = Nothing
 Try
  connection.Open()
  dr = command.ExecuteReader(CommandBehavior.SingleResult)
  Dim users As New UserCollection
  While dr.Read()
   users.Add(PopulateUser(dr))
  End While
  Return users
 Finally
  If Not dr Is Nothing AndAlso Not dr.IsClosed Then
   dr.Close()
  End If
  connection.Dispose()
  command.Dispose()
 End Try
End Function

//C#
public UserCollection GetAllUsers() {
 SqlConnection connection = new SqlConnection(CONNECTION_STRING);
 SqlCommand command = new SqlCommand("GetAllUsers", connection);
 command.CommandType = CommandType.StoredProcedure;
 SqlDataReader dr = null;
 try{
  connection.Open();
  dr = command.ExecuteReader(CommandBehavior.SingleResult);
  UserCollection users = new UserCollection();
  while (dr.Read()){
   users.Add(PopulateUser(dr));
  }
  return users;
 }finally{
  if (dr != null && !dr.IsClosed){
   dr.Close();
  }
  connection.Dispose();
  command.Dispose();
 }
}

We get the data from the database, create our custom collection, and loop through the results to create each User object and add it into the collection. Notice also how the PopulateUser mapping function is reused.

Adding Custom Behavior

When talking about custom entities we only peripherally mentioned the ability to add custom behavior to our classes. The type of functionality that you'll be adding to your entities will mostly depend on the type of business logic you are implementing, but there is probably some common functionality you'll want to implement in your custom collections. One such example would be to return a single entity based on some key, for example a user based on a userId:

'Visual Basic .NET
Public Function FindUserById(ByVal userId As Integer) As User
 For Each user As User In List
  If user.UserId = userId Then
   Return user
  End If
 Next
 Return Nothing
End Function

//C#
public User FindUserById(int userId) {
 foreach (User user in List) {
  if (user.UserId == userId){
   return user;
  }
 }
 return null;
}

Another one might be to return a subset of users based on certain criteria, such as a partial user name:

'Visual Basic .NET
Public Function FindMatchingUsers(ByVal search As String) As UserCollection
 If search Is Nothing Then
  Throw New ArgumentNullException("search")
 End If
 Dim matchingUsers As New UserCollection
 For Each user As User In List
  Dim userName As String = user.UserName
  If Not userName Is Nothing AndAlso userName.StartsWith(search) Then
   matchingUsers.Add(user)
  End If
 Next
 Return matchingUsers
End Function

//C#
public UserCollection FindMatchingUsers(string search) {
 if (search == null){
  throw new ArgumentNullException("search");
 }
 UserCollection matchingUsers = new UserCollection();
 foreach (User user in List) {
  string userName = user.UserName;
  if (userName != null && userName.StartsWith(search)){
   matchingUsers.Add(user);
  }
 }
 return matchingUsers;
}

With DataSets, the same thing can be achieved with DataTable.Select. It is important to note that while creating your own functionality puts you in absolute control of your code, the Select method provides a very convenient and code-free means of doing the same thing. On the flip side, Select requires developers to know the underlying database and isn't strongly-typed.
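
For comparison, a sketch of the DataSet approach (assuming the DataSet contains a table named Users); notice how the column name and the LIKE syntax leak into the calling code:

//C#
DataRow[] matches = ds.Tables["Users"].Select("UserName LIKE 'go%'");
foreach (DataRow row in matches) {
 //still weakly-typed: every value needs a conversion
 string userName = Convert.ToString(row["UserName"]);
}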

Binding Custom Collections

The first example we looked at was that of binding a DataSet to an ASP.NET control. Considering how common this is, you'll be glad to know that custom collections bind just as easily (this is because CollectionBase implements IList, which is used for binding). Custom collections can serve as the DataSource for any control that exposes one, and DataBinder.Eval can be used just as you would with a DataSet:

'Visual Basic .NET
Dim users As UserCollection = DAL.GetAllUsers()
repeater.DataSource = users
repeater.DataBind()

//C#
UserCollection users = DAL.GetAllUsers();
repeater.DataSource = users;
repeater.DataBind();

<!-- HTML -->
<asp:Repeater OnItemDataBound="r_ItemDataBound" ID="repeater" Runat="server">
 <ItemTemplate>
  <asp:Label ID="userName" Runat="server">
   <%# DataBinder.Eval(Container.DataItem, "UserName") %><br />
  </asp:Label>
 </ItemTemplate>
</asp:Repeater>

Instead of using the column name as the second parameter for DataBinder.Eval, you specify the property name you wish to display, in this case UserName.

For those doing processing in the OnItemDataBound or OnItemCreated events exposed by many data-bound controls, you are probably casting e.Item.DataItem to DataRowView. When binding to a custom collection, e.Item.DataItem is instead cast to the custom entity; in our example, the User class:

'Visual Basic .NET
Protected Sub r_ItemDataBound(ByVal s As Object, ByVal e As RepeaterItemEventArgs)
 Dim type As ListItemType = e.Item.ItemType
 If type = ListItemType.AlternatingItem OrElse _
    type = ListItemType.Item Then
  Dim ul As Label = CType(e.Item.FindControl("userName"), Label)
  Dim currentUser As User = CType(e.Item.DataItem, User)
  If Not PasswordUtility.PasswordIsSecure(currentUser.Password) Then
   ul.ForeColor = Drawing.Color.Red
  End If
 End If
End Sub

//C#
protected void r_ItemDataBound(object sender, RepeaterItemEventArgs e) {
 ListItemType type = e.Item.ItemType;
 if (type == ListItemType.AlternatingItem ||
     type == ListItemType.Item){
  Label ul = (Label)e.Item.FindControl("userName");
  User currentUser = (User)e.Item.DataItem;
  if (!PasswordUtility.PasswordIsSecure(currentUser.Password)){
   ul.ForeColor = Color.Red;
  }
 }
}

Managing Relationships

Within even the simplest system, relationships between entities will exist. With relational databases, relationships are maintained by means of foreign keys; using objects, a relationship is simply a reference to another object. For example, building on our previous examples, it's reasonable to expect a User object to have a Role:

'Visual Basic .NET
Public Class User
 Private _role As Role
 Public Property Role() As Role
  Get
   Return _role
  End Get
  Set(ByVal Value As Role)
   _role = Value
  End Set
 End Property
End Class

//C#
public class User {
 private Role role;
 public Role Role {
  get {return role;}
  set {role = value;}
 }
}

Or a collection of Roles:

'Visual Basic .NET
Public Class User
 Private _roles As RoleCollection
 Public ReadOnly Property Roles() As RoleCollection
  Get
   If _roles Is Nothing Then
    _roles = New RoleCollection
   End If
   Return _roles
  End Get
 End Property
End Class

//C#
public class User {
 private RoleCollection roles;
 public RoleCollection Roles {
  get {
   if (roles == null){
    roles = new RoleCollection();
   }
   return roles;
  }
 }
}

In these two examples, we have fictitious Role and RoleCollection classes, which are simply further custom entity and collection classes, just like User and UserCollection.

Mapping Relationships

The real issue is how to map relationships. Let's look at a simple example; we want to retrieve a user based on userId along with his or her roles. First, we'll look at the relational model:


Figure 3. Relationships between Users and Roles

Here we see a Users table and a Roles table, both of which we can map in a straightforward manner to custom entities. We also have a UserRoleJoin table, which represents the many-to-many relationship between Users and Roles.

Next we use a stored procedure to pull two separate results: the first for the User, and the second for his or her Role(s):

CREATE PROCEDURE GetUserById(
  @UserId INT
)AS
SELECT UserId, UserName, [Password]
  FROM Users
  WHERE UserId = @UserId
SELECT R.RoleId, R.[Name], R.Code
  FROM Roles R INNER JOIN
     UserRoleJoin URJ ON R.RoleId = URJ.RoleId
  WHERE  URJ.UserId = @UserId

Finally, we map from the relational model to the object model:

'Visual Basic .NET
Public Function GetUserById(ByVal userId As Integer) As User
 Dim connection As New SqlConnection(CONNECTION_STRING)
 Dim command As New SqlCommand("GetUserById", connection)
 command.Parameters.Add("@UserId", SqlDbType.Int).Value = userId
 Dim dr As SqlDataReader = Nothing
 Try
  connection.Open()
  dr = command.ExecuteReader()
  Dim user As User = Nothing
  If dr.Read() Then
   user = PopulateUser(dr)
   dr.NextResult()
   While dr.Read()
    user.Roles.Add(PopulateRole(dr))
   End While
  End If
  Return user
 Finally
  If Not dr Is Nothing AndAlso Not dr.IsClosed Then
   dr.Close()
  End If
  connection.Dispose()
  command.Dispose()
 End Try
End Function

//C#
public User GetUserById(int userId) {
 SqlConnection connection = new SqlConnection(CONNECTION_STRING);
 SqlCommand command = new SqlCommand("GetUserById", connection);
 command.Parameters.Add("@UserId", SqlDbType.Int).Value = userId;
 SqlDataReader dr = null;
 try{
  connection.Open();
  dr = command.ExecuteReader();
  User user = null;
  if (dr.Read()){
   user = PopulateUser(dr);
   dr.NextResult();
   while(dr.Read()){
    user.Roles.Add(PopulateRole(dr));
   }            
  }
  return user;
 }finally{
  if (dr != null && !dr.IsClosed){
   dr.Close();
  }
  connection.Dispose();
  command.Dispose();
 }
}

The User instance is created and populated; we move to the next result/select and loop through it, populating Roles and adding them to the Roles collection property of the User class.
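
The PopulateRole function used above isn't shown in this guide; it follows the same pattern as PopulateUser. A sketch, assuming the Role entity exposes RoleId, Name, and Code properties matching the columns of the stored procedure's second result set:

//C#
public Role PopulateRole(IDataRecord dr) {
 Role role = new Role();
 role.RoleId = Convert.ToInt32(dr["RoleId"]);
 role.Name = Convert.ToString(dr["Name"]);
 role.Code = Convert.ToString(dr["Code"]);
 return role;
}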

Beyond the Basics

The purpose of this guide was to introduce the concept and usage of custom entities and collections. Using custom entities is a widely used practice in the industry, and as such numerous patterns have been documented to deal with a wide range of scenarios. Design patterns are great for a number of reasons. First, when it comes to addressing specific situations, chances are you aren't the first to face a given problem. Design patterns let you reuse a tried and tested solution to a given problem (design patterns aren't meant to be 100% cut and paste, but they almost always provide a sound foundation to a solution). This in turn provides you with confidence that your system will scale well with complexity, not only because it's a widely used approach but also because it's a well documented one. Design patterns also provide you with a common vocabulary, which can make knowledge transfer and training much easier.

There's nothing to say that design patterns only apply to custom entities, and in fact many don't. However, if you give them a chance you'll likely be pleasantly surprised at how many well documented patterns do apply to custom entities and the mapping process.

This last section is dedicated to pointing out some more advanced scenarios that larger or more complex systems will likely run into. While most of the topics are probably worthy of individual guides, I'll, at the very least, try to provide you with some starting resources.

A great place to start is Martin Fowler's Patterns of Enterprise Application Architecture, which will not only serve as a great reference (with detailed explanations and plenty of sample code) for common design patterns, but whose first 100 pages will really get your mind wrapped around the whole concept. Alternatively, Fowler has an online catalog of patterns, which is great for those who are already familiar with the concepts but need a handy reference.

Concurrency

The previous examples all dealt with pulling data from the database and creating objects from that data. For the most part, updating, deleting, and inserting data is just as straightforward. Our business layer creates an object, passes it to our Data Access Layer, and lets it handle the mapping to the relational world. For example:

'Visual Basic .NET
Public Sub UpdateUser(ByVal user As User)
 Dim connection As New SqlConnection(CONNECTION_STRING)
 Dim command As New SqlCommand("UpdateUser", connection)
 command.CommandType = CommandType.StoredProcedure
 'could have a reusable function to inversely map this
 command.Parameters.Add("@UserId", SqlDbType.Int)
 command.Parameters(0).Value = user.UserId
 command.Parameters.Add("@Password", SqlDbType.VarChar, 64)
 command.Parameters(1).Value = user.Password
 command.Parameters.Add("@UserName", SqlDbType.VarChar, 128)
 command.Parameters(2).Value = user.UserName
 Try
  connection.Open()
  command.ExecuteNonQuery()
 Finally
  connection.Dispose()
  command.Dispose()
 End Try
End Sub

//C#
public void UpdateUser(User user) {
 SqlConnection connection = new SqlConnection(CONNECTION_STRING);
 SqlCommand command = new SqlCommand("UpdateUser", connection);
 //could have a reusable function to inversly map this
 command.Parameters.Add("@UserId", SqlDbType.Int);
 command.Parameters[0].Value = user.UserId;
 command.Parameters.Add("@Password", SqlDbType.VarChar, 64);
 command.Parameters[1].Value = user.Password; 
 command.Parameters.Add("@UserName", SqlDbType.VarChar, 128);
 command.Parameters[2].Value = user.UserName;
 try{
  connection.Open();
  command.ExecuteNonQuery();
 }finally{
  connection.Dispose();
  command.Dispose();
 }
}

However, one area which isn't as straightforward is when dealing with concurrency—that is, what happens when two users try to update the same data at the same time? The default behavior (if you don't do anything) is that the last person to commit the data will overwrite all previous work. This probably isn't ideal, as one user's work will be silently overwritten. One way to totally avoid any conflicts is to use pessimistic concurrency; however, this method requires some type of locking mechanism, which can be difficult to implement in a scalable manner. The alternative is to use optimistic concurrency techniques. Letting the first commit dominate and notifying subsequent users is typically a gentler and more user-friendly approach to take. This is achieved by some type of row versioning, such as timestamps.
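
As a sketch of the timestamp approach (assuming the User entity carries a Version field read from a SQL Server timestamp column, and an UpdateUser procedure whose WHERE clause includes AND Version = @Version):

//C#
command.Parameters.Add("@Version", SqlDbType.Timestamp).Value = user.Version;
connection.Open();
//zero rows updated means another user committed first
if (command.ExecuteNonQuery() == 0) {
 throw new ApplicationException("User was modified by someone else; reload and retry.");
}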


Performance

Too often we worry about minute performance differences as opposed to legitimate flexibility and capability concerns. While performance is indeed important, providing generalized guidelines on anything but the simplest situations is often difficult. Take, for example, custom collections versus DataSets: which is faster? With custom collections you can make heavy use of DataReaders, which is a faster way of pulling data from a database. The point, though, is that the answer really depends on how, and with what type of data, you use them, so a blanket statement is pretty useless. What's even more important to realize is that whatever processing time you are able to save probably doesn't amount to much compared to the difference in maintainability.

Of course, no one said you couldn't have a high-performance solution that is also maintainable. While I reiterate that it really depends on how you use them, there are some patterns that can help maximize performance. First, though, it's important to know that custom entities and collections cache as well as DataSets and can make use of the same mechanism—likely HttpCache. One nice thing about DataSets is the ability to write a Select statement to just grab the necessary information. With custom entities you often feel obliged to populate the entire entity as well as child entities. For example, if you want to display a list of Organizations, with a DataSet you might just pull the OrganizationId, Name, and Address and bind it to a repeater. With custom entities I always feel the need to also get all the other Organization information, which might be a bit flag to say if it's ISO certified, a collection of all employees, additional contact information, and so on. Maybe others don't share this hang-up, but luckily we have, if we want, fine control over our custom entities. The most common approach is to use a type of lazy-load pattern, which only fetches the information when it's first required (and which can be nicely encapsulated in a property, as sketched below). This type of control over individual properties provides tremendous flexibility otherwise not easily achieved (imagine trying to do something similar at the DataColumn level).
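
A sketch of such a lazy-loaded property (Organization, EmployeeCollection, and the DAL method are all hypothetical):

//C#
public class Organization {
 private int organizationId;
 private EmployeeCollection employees;

 public EmployeeCollection Employees {
  get {
   //only hit the database the first time the property is accessed
   if (employees == null) {
    employees = DAL.GetEmployeesByOrganization(organizationId);
   }
   return employees;
  }
 }
}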


Sorting and Filtering

The DataView's built-in support for sorting and filtering, although requiring knowledge of both SQL and the underlying data structure, is a convenience that is somewhat lost with custom collections. We can still sort and filter, but to do so requires us to write the functionality. The techniques aren't necessarily advanced, and most are fairly similar, such as using a filter class to filter a collection and a comparer class for sorting (a sketch of the latter follows); no formal patterns, that I'm aware of, really exist.
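
As a minimal sketch of the comparer approach (InnerList is the protected ArrayList that CollectionBase exposes to derived classes):

//C#
public class UserNameComparer : IComparer {
 public int Compare(object x, object y) {
  return string.Compare(((User)x).UserName, ((User)y).UserName);
 }
}

//inside UserCollection:
public void SortByUserName() {
 InnerList.Sort(new UserNameComparer());
}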

Code Generation

Once you get past any conceptual roadblocks, the main drawback to custom entities and collections is the amount of additional code that all this flexibility, abstraction, and low maintenance costs you. In fact, you might think that all my talk about reduced maintenance cost and fewer bugs doesn't square with the extra code. While this is certainly a valid point (again, no solution is perfect), design patterns and frameworks such as CSLA.NET go a long way towards alleviating the problem. While totally different from patterns and frameworks, code generation tools can reduce the amount of code you actually need to write by significant amounts. Initially this guide was going to have an entire section devoted to code-generation tools, specifically the popular and free CodeSmith; however, numerous resources exist that likely exceed my own knowledge of the product.

Before I continue, I realize that code generation sounds like something of a dream. But when properly used and understood, it truly is a powerful addition to your toolbox—even if you aren't doing custom entities. While it's true that code generation doesn't only apply to custom entities, many templates are specifically tailored for this purpose. The reason is simple: custom entities require a lot of repetitive code.

Briefly, how does code generation work? The idea might sound far-fetched or even counterproductive, but you basically write code (templates) to generate code. CodeSmith, for example, comes with powerful classes that let you hook into a database and get all its properties: tables, columns (types, sizes, and so on), and relations. Armed with this information, much of what we've talked about so far can be automated. For example, a developer could pick a table and automatically, with the right template, have a custom entity created (with the correct fields, properties, and constructors), a mapping function, a custom collection, and basic select, insert, update, and delete functionality. It could even go a step further and implement sorting, filtering, and the other advanced features we've touched on.

CodeSmith also comes with many ready-to-use templates, which serve as a great learning resource. Finally, CodeSmith has a number of templates to implement the CSLA.NET framework. The couple of hours I initially took to learn the basics and get comfortable with CodeSmith have saved me untold time. Additionally, when all developers are using the same templates, the high level of consistency throughout your code makes it easy to work on somebody else's functions.


O/R Mappers

Even though my lack of experience with O/R mappers makes me cautious of talking about them, their potential value makes them impossible to ignore. Where code generators create code based on templates for you to copy and paste into your own source code, O/R mappers dynamically generate the code at runtime from some type of configuration mechanism. For example, within an XML file you could specify that column X of some table maps to property Y of an entity. You still create the custom entity, but collections, mappings, and other data access functions (including stored procedures) are all created dynamically. In theory, O/R mappers almost entirely mitigate the problems with custom entities. As the relational and object worlds diverge and the mapping process grows in complexity, O/R mappers become even more invaluable. Two of the downsides of O/R mappers are that they are perceived, in the .NET community at least, as being less secure and having poor performance. From what I've read, I'm convinced that they aren't any less secure, and while they might have poorer performance in some situations, they are probably superior in others. O/R mappers aren't suited for all situations, but if you are dealing with complex systems, you owe it to yourself to investigate them.


.NET Framework 2.0 Features

The upcoming 2.0 release of the .NET Framework will change some of the implementation details we've looked at throughout this guide. These changes will reduce the amount of code necessary to support custom entities as well as help deal with mapping issues.

Generics

One of the main reasons for the existence of the much talked about generics is to provide developers with strongly-typed collections out of the box. We shied away from the existing collections such as ArrayLists because of their weakly-typed nature. Generics provide the same kind of conveniences as current collections, but in a strongly-typed manner. This is achieved by specifying the type at declaration. For example, we could replace our UserCollection with no additional code, and simply create a new instance of the generic List class, specifying our User class:

'Visual Basic .NET
Dim users As New List(Of User)

//C#
List<User> users = new List<User>();

Once declared, our users collection can only deal with objects of type User, which provides us with all the niceties of compile-time checks and optimizations.
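
A brief sketch of what that compile-time checking buys us:

//C#
List<User> users = new List<User>();
users.Add(new User(1, "goku", "secret"));
User first = users[0];       //no cast needed
//users.Add("not a user");   //would be a compile-time error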


Nullable Types

Nullable types are actually generics that are used for different reasons than those listed previously. One of the challenges faced when dealing with databases is the proper and consistent handling of columns that support NULL. When dealing with string and other classes (known as reference types), you can simply assign nothing/null to a variable in your code:

'Visual Basic .NET
if dr("UserName") Is DBNull.Value Then
   user.UserName = nothing
End If

//C#
if (dr["UserName"] == DBNull.Value){
   user.UserName = null;
}

Or you could simply do nothing (by default, reference types are nothing/null). This doesn't work nearly as well for value types such as integers, booleans, decimals, and so on. You can certainly assign nothing/null to such values, but this will assign a default value. If you just declare an integer, or assign nothing/null to it, the variable will actually hold the value 0. This makes it difficult to map back to the database: is the value 0 or null? Nullable types solve this problem by allowing value types to either hold an actual value or null. For example, if we wanted to support a null value in the userId column (not exactly realistic), we'd first declare our userId field and corresponding property as a nullable type:

'Visual Basic .NET
Private _userId As Nullable(Of Integer)
Public Property UserId() As Nullable(Of Integer)
   Get
      Return _userId
   End Get
   Set(ByVal value As Nullable(Of Integer))
      _userId = value
   End Set
End Property


//C#
private Nullable<int> userId;
public Nullable<int> UserId {
   get { return userId; }
   set { userId = value; }
}

And then make use of the HasValue property to determine whether nothing/null was assigned:

'Visual Basic .NET
If UserId.HasValue Then
   Return UserId.Value
Else
   Return DBNull.Value
End If

//C#
if (UserId.HasValue) {
   return UserId.Value;
} else {
   return DBNull.Value;
}


Iterators

The UserCollection example that we looked at represents only the base functionality that you'll likely need in your custom collection. One thing the provided implementation doesn't give you is a strongly-typed way to loop through the collection: CollectionBase supplies a weakly-typed enumerator, so each item is handed back as a System.Object. To expose a strongly-typed enumerator, you must write a support class that implements the IEnumerator interface. This is a fairly straightforward and repetitive process, but it nonetheless introduces even more code. C# 2.0 introduces the new yield keyword to handle the implementation details of this interface for you (a sketch follows). There is currently no Visual Basic .NET equivalent to the new yield keyword.
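
A sketch of what that might look like in C# 2.0 (the new modifier hides the weakly-typed GetEnumerator inherited from CollectionBase):

//C#
public class UserCollection : CollectionBase {
 //...members shown earlier...

 //yield generates the IEnumerator implementation for us
 public new IEnumerator<User> GetEnumerator() {
  foreach (User user in List) {
   yield return user;
  }
 }
}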


Conclusion

Making the switch to custom entities and collections shouldn't be a decision you make lightly. Numerous factors need to be taken into consideration: your familiarity with OO concepts, the time you have to experiment with this new approach, and the environment you are thinking of deploying it in. While the advantages are significant in general, they may not be in your particular situation. Even if they are significant in your case, the drawbacks may negate them. Also keep in mind that numerous alternative solutions exist. Jimmy Nilsson has an overview of some of these alternatives in his 5 part series Choosing Data Containers for .NET (part 1, 2, 3, 4, and 5).

Custom entities empower you with the rich capabilities of object-oriented programming, and help you lay the foundation for a solid, maintainable N-Tier architecture. One of the goals of this guide is to make you think of your system in terms of the business entities that make it up, instead of generic DataSets and DataTables. We've also touched on some key issues you should be aware of regardless of the route you choose, namely design patterns, the differences between the object and relational worlds, and N-Tier architecture. Remember that time spent upfront has a way of paying for itself a number of times over throughout the life of a system.

1 http://www.hanselman.com/blog/PermaLink.aspx?guid=d88f7539-10d8-4697-8c6e-1badb08bb3f5

© Microsoft Corporation. All rights reserved.