Data Points

Data Access Strategies Using ADO.NET and SQL

John Papa

Contents

ADO.NET Connections
Open Late, Close Early
Returning a Single Column or Row
Inefficient Queries
Commands and Parameters
ADO.NET Transactions
Wrapping Up

When your goal is a scalable and efficient enterprise solution, you need to develop an efficient data-access strategy. You can't just do some testing on your production machines and rely on the results. While an application can exhibit excellent response times when serving a few users, performance problems can surface when the load increases. So when I am asked to perform architectural and code reviews, there are several coding patterns I look for. In this column, I will share some of them as they relate to data access using ADO.NET. I'll discuss efficient ways to implement ADO.NET database connections to take advantage of connection pools and will present tips on how to use various ADO.NET objects to open and close connections and to manage connection state for you. I'll discuss connections and the DataAdapter and DataReader and move on to stored procedures, parameterized queries, and SQL injection attacks.

ADO.NET Connections

Database connections are usually an afterthought, which is surprising considering how essential they are and how easily they can become a source of inefficiencies and bottlenecks. One way to reduce the number of connections and their concomitant resource usage is to reuse connections through connection pooling. To reuse a connection from a pool, the connection string must match that of an available pooled connection. Each connection pool is associated with one distinct connection string, using an exact matching algorithm. If no exact match is found, a new connection is created and added to the pool when it is closed.

Figure 1 shows three code samples that open a connection against the same database, using two different users. The first connection is accessed by the SQL Server™ user called TestUser1. Assuming there is no connection available in the pool, this connection is created from scratch. Later, this connection is closed and sent to the connection pool. The second connection is accessed by a different SQL Server user. The deactivated connection in the pool doesn't have an exact connection string match since the users are different. Thus, the second connection is created from scratch, closed, and sent to the connection pool. Then a third connection is opened by the SQL Server user called TestUser1. There is a connection in the pool that matches the connection string exactly, so the connection is retrieved from the pool.

Figure 1 Connection Pooling

using(SqlConnection oCn = new SqlConnection(
    "Server=(local);Database=Pubs;User ID=TestUser1;Password=foo1;")) {
    oCn.Open();
    ...
}

using(SqlConnection oCn = new SqlConnection(
    "Server=(local);Database=Pubs;User ID=TestUser2;Password=foo2;")) {
    oCn.Open(); 
    ...
}

using(SqlConnection oCn = new SqlConnection(
    "Server=(local);Database=Pubs;User ID=TestUser1;Password=foo1;")) {
    oCn.Open();
    ...
}

You can take advantage of connection pooling by using a service account for your application. For example, you can create a single SQL Server user account that your application will use to access the database. This will allow you to take full advantage of connection pooling since the connection strings will always match. This is a good option from a scalability perspective. However, if you want greater control over security, you should consider using Windows® authentication.
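To illustrate, here is a minimal sketch of the service-account approach; the account name, password, and database shown are placeholders. Because every part of the application builds its connection from this one string, each request can be satisfied from a single shared pool:

```csharp
// One shared connection string for the whole application; the exact
// string match means every request draws from the same pool.
const string sAppConnString =
    "Server=(local);Database=Northwind;" +
    "User ID=AppServiceUser;Password=somePassword;";

using (SqlConnection oCn = new SqlConnection(sAppConnString)) {
    oCn.Open();
    // ... execute commands ...
}   // Dispose closes the connection and returns it to the pool
```

With Windows authentication you would use Integrated Security=True instead, trading some of this pooling uniformity for per-user credentials.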

Open Late, Close Early

Another connection-related item I look for when performing a code review is whether a connection is held open longer than it is actually needed. The basic rule is to keep the connection open only as long as you need it: open it immediately before you access the database and close it as soon as you are done. Be wary of any nondatabase-related code that executes while the connection is open. Connections hold valuable database resources, consume memory, and can hold locks on data that slow down other queries. So it's best to open connections late and close them as early as possible.

One of the great features of ADO.NET is that the DataAdapter object's Fill and Update methods can open and close a connection automatically (see Figure 2). This means it is not necessary to open the connection explicitly; the DataAdapter opens it right before it executes its SQL command against the database and closes it right afterwards.

Figure 2 The Fill Method Opens, Executes, and Closes

using (SqlConnection oCn = new SqlConnection(sConnString)) 
{
    string sProcName = "prGet_Products";
    using (SqlCommand oSelCmd = new SqlCommand(sProcName, oCn)) 
    {
        oSelCmd.CommandType = CommandType.StoredProcedure;

        SqlDataAdapter oDa = new SqlDataAdapter();
        DataSet oDs = new DataSet();
        oDa.SelectCommand = oSelCmd;
        oDa.Fill(oDs);   // opens the connection, executes, then closes it
    }
}

Also, it is a good idea to avoid passing an open connection between methods where possible. Instead, you may want to take advantage of connection pooling and close the first connection, call the method without passing the connection, and then open a new connection within the called method.
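As a sketch of that pattern (the sConnString field and the GetOrderCount helper here are hypothetical), the caller disposes of its connection before the call, and the called method opens its own, which connection pooling makes inexpensive:

```csharp
private void CallerMethod() {
    using (SqlConnection oCn = new SqlConnection(sConnString)) {
        oCn.Open();
        // ... first batch of database work ...
    }   // connection goes back to the pool here

    // No open connection is passed to the called method
    int nOrders = GetOrderCount("ALFKI");
}

private int GetOrderCount(string sCustomerID) {
    // Opens its own pooled connection and releases it promptly
    using (SqlConnection oCn = new SqlConnection(sConnString))
    using (SqlCommand oCmd = new SqlCommand(
        "SELECT COUNT(*) FROM Orders WHERE CustomerID = @sCustomerID", oCn)) {
        oCmd.Parameters.Add("@sCustomerID", SqlDbType.NChar, 5);
        oCmd.Parameters["@sCustomerID"].Value = sCustomerID;
        oCn.Open();
        return (int)oCmd.ExecuteScalar();
    }
}
```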

Like the Connection object, a DataReader object should always be closed when you have finished using it. When using a DataReader, I look to see if the Command object's ExecuteReader method has its CommandBehavior set to CloseConnection, which makes sure the connection is automatically closed when the DataReader is closed. If you want to return multiple rowsets, be careful not to pass CommandBehavior.SingleResult, which limits the command to a single rowset; the default behavior allows multiple rowsets to be returned and accessed. Figure 3 demonstrates that the Connection will automatically be closed when the DataReader is closed.

Figure 3 Automatically Closing the Connection

oCn.Open();
SqlDataReader oDr = oCmd.ExecuteReader(CommandBehavior.CloseConnection);
while (oDr.Read())
{
    Debug.WriteLine(oDr[1].ToString());
}
Debug.WriteLine("Connection is " + oCn.State);
oDr.Close();
Debug.WriteLine("Connection is " + oCn.State);

One sign that connections are being held open longer than needed is that you find code that checks if the connection's ConnectionState is open or closed:

if (oCn.State == ConnectionState.Closed) {
    oCn.Open();
}

The presence of such code indicates that the connection is expected to be open in some cases when it gets to this point. One exception to the guideline that could cause you to keep connections open longer is when you need to implement a transaction that spans multiple commands.

If you do not close a connection explicitly and instead wait for the garbage collector to release the resource, the connection will not be released back to the connection pool until the garbage collector comes for it, and you can't be sure when that will be.

Returning a Single Column or Row

The ADO.NET Command object exposes multiple Execute methods. The ExecuteReader and ExecuteXmlReader methods are capable of returning multiple rows and columns to a DataReader and XmlReader, respectively. The Fill method of the DataAdapter can also return multiple rows and columns to a DataSet.

But when you want to execute an action query and do not need any data back, use the Command object's ExecuteNonQuery method. It executes its command without any of the overhead involved in returning data, which makes it ideal for updates and deletes, where you simply want the SQL statement executed.
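A quick sketch of ExecuteNonQuery in action (sConnString is assumed to be defined elsewhere); the method returns only the number of rows affected:

```csharp
using (SqlConnection oCn = new SqlConnection(sConnString)) {
    string sSQL = "UPDATE Customers SET City = @sCity " +
                  "WHERE CustomerID = @sCustomerID";
    using (SqlCommand oCmd = new SqlCommand(sSQL, oCn)) {
        oCmd.Parameters.Add("@sCity", SqlDbType.NVarChar, 15).Value = "Berlin";
        oCmd.Parameters.Add("@sCustomerID", SqlDbType.NChar, 5).Value = "ALFKI";
        oCn.Open();
        // Executes the UPDATE; no rowset or reader is created
        int nRowsAffected = oCmd.ExecuteNonQuery();
    }
}
```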

If you want to retrieve a single value, the ExecuteReader, ExecuteXmlReader, and Fill methods are usually overkill. If you use the DataAdapter's Fill method to fill a DataSet with a single column and a single row, you are also getting all of the metadata and overhead that comes with the more powerful DataSet object. It would be more efficient to use the Command object's ExecuteScalar method. You could execute the same SELECT statement and retrieve the single value into a variable. To demonstrate, notice the two examples in Figure 4 that use both techniques to execute a SQL statement that returns a single value.

Figure 4 Returning a Single Value

Using SqlCommand.ExecuteScalar

private void UseExecuteScalar() {
    string sConnString = 
        "Server=(local);Database=Northwind;Integrated Security=True;";
    string sSQL = 
        "SELECT CompanyName FROM Customers WHERE CustomerID = 'ALFKI'";
    using (SqlConnection oCn = new SqlConnection(sConnString)) {    
        using (SqlCommand oSelCmd = new SqlCommand(sSQL, oCn)) {
            oCn.Open();
            string sCompany = oSelCmd.ExecuteScalar().ToString();
            oCn.Close();
        }
    }
}

Using SqlDataAdapter.Fill

private void UseFill() {
    string sConnString = 
        "Server=(local);Database=Northwind;Integrated Security=True;";
    string sSQL = 
        "SELECT CompanyName FROM Customers WHERE CustomerID = 'ALFKI'";
    using (SqlConnection oCn = new SqlConnection(sConnString)) {
        using (SqlCommand oSelCmd = new SqlCommand(sSQL, oCn)) {
            SqlDataAdapter oDa = new SqlDataAdapter();
            DataSet oDs = new DataSet();
            oDa.SelectCommand = oSelCmd;
            oDa.Fill(oDs);
        }
    }
}

If you want to return more than one column from a single row, you could use the Fill method of the DataAdapter to fill a DataSet. Alternatively, instead of returning a rowset from a stored procedure you could return the values in output parameters. For example, instead of using this stored procedure

CREATE PROCEDURE prGet_CustomerRowSet
    @sCustomerID NCHAR(5)
AS
    SELECT CompanyName, ContactName, City 
    FROM Customers 
    WHERE CustomerID = @sCustomerID
GO

you might want to consider using the stored procedure in Figure 5.

Figure 5 Taking Advantage of Output Parameters

CREATE PROCEDURE prGet_Customer
    @sCustomerID NCHAR(5),
    @sCompanyName NVARCHAR(40) OUTPUT, 
    @sContactName NVARCHAR(30) OUTPUT, 
    @sCity NVARCHAR(15) OUTPUT

AS
    SELECT @sCompanyName = CompanyName, 
           @sContactName = ContactName, 
           @sCity = City
    FROM Customers 
    WHERE CustomerID = @sCustomerID
GO

When called from ADO.NET, the first stored procedure could be used to fill a DataSet from a DataAdapter. The DataSet would contain the data plus all of the supporting metadata as overhead. The second stored procedure can be called from ADO.NET using a Command object, and its values can be retrieved from the Command object's parameter collection. The prGet_Customer stored procedure is more efficient since it does not need to pass a rowset, nor does the client code have to bear the weight of the extra overhead that comes with the DataSet. The code in Figure 6 could be used to retrieve the output parameters from the second stored procedure (prGet_Customer). The three values can be stored in variables requiring less than 100 bytes. In contrast, if you return the same three values into a DataSet from a DataAdapter using the first stored procedure, the DataSet and its schema will be almost 1,000 bytes in size. If all you need is a handful of values from a single row, using ExecuteScalar (for a single value) or output parameters is faster than retrieving a rowset into a DataSet and then passing that data to another application tier.

Figure 6 Retrieving Values via Output Parameters

private void GetOutputValues()
{
    string sConnString = 
        "Server=(local);Database=Northwind;Integrated Security=True;";
    string sProc = "prGet_Customer";
    using (SqlConnection oCn = new SqlConnection(sConnString))
    {
        using (SqlCommand oCmd = new SqlCommand(sProc, oCn))
        {
            oCn.Open();
            oCmd.CommandType = CommandType.StoredProcedure;

            oCmd.Parameters.Add("@sCustomerID", SqlDbType.NChar, 5);
            oCmd.Parameters["@sCustomerID"].Value = "ALFKI";

            oCmd.Parameters.Add("@sCompanyName", SqlDbType.NVarChar, 40);
            oCmd.Parameters["@sCompanyName"].Direction = 
                ParameterDirection.Output;

            oCmd.Parameters.Add("@sContactName", SqlDbType.NVarChar, 30);
            oCmd.Parameters["@sContactName"].Direction = 
                ParameterDirection.Output;

            oCmd.Parameters.Add("@sCity", SqlDbType.NVarChar, 15);
            oCmd.Parameters["@sCity"].Direction = 
                ParameterDirection.Output;

            oCmd.ExecuteNonQuery();
            oCn.Close();

            string sCompanyName = 
                oCmd.Parameters["@sCompanyName"].Value.ToString();
            string sContactName = 
                oCmd.Parameters["@sContactName"].Value.ToString();
            string sCity = oCmd.Parameters["@sCity"].Value.ToString();
        }
    }
}

Inefficient Queries

So often when I examine SQL and stored procedures I notice that more data is being retrieved than is being used. A SELECT statement should only return the columns that are going to be used, whether for display purposes or for some business logic. For example, suppose a query returns 100 rows of data and includes two extra integer columns that are not used anywhere. Each integer column is 4 bytes, so that is 8 extra bytes per row, or 800 unused bytes of data. Then, when the rows are inserted into a DataSet, which may be passed to another application layer, the XML and supporting metadata needed for the extra columns adds to that total.

So now I am returning superfluous data from the database server to my business logic server and then passing extra data around as XML to services that may very well exist on other machines across the network. Often I hear that the data size is so small that it makes no noticeable difference in performance. Of course that argument may be valid when there is a single user hitting the application. But when you try to scale the application to thousands of users, passing unnecessary data across a network can slam performance.

Another issue is when a SELECT statement retrieves more rows than it needs. For example, returning 100 rows is overkill when you will only display the top 10. It's better to trim the SELECT statement to retrieve only the top 10 records, using the TOP keyword.
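For example, against the Northwind Products table, a query meant to feed a 10-row display should do its trimming in the SELECT itself:

```sql
SELECT TOP 10 ProductID, ProductName, UnitPrice
FROM Products
ORDER BY UnitPrice DESC
```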

The CommandBuilder object is often used in demonstration code to show how it can automatically figure out how to execute INSERT, UPDATE, and DELETE statements. It takes the SELECT statement from a DataAdapter and tries to generate the other commands for you. This is going to be slower than if you coded the commands yourself because the CommandBuilder has to figure it out (which takes time) and generally creates less efficient queries than you would create. I recommend avoiding the CommandBuilder object entirely at run time.
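For reference, the run-time pattern I suggest avoiding looks something like this sketch:

```csharp
SqlDataAdapter oDa = new SqlDataAdapter("SELECT * FROM Customers", oCn);

// The CommandBuilder inspects the SELECT at run time and derives
// INSERT, UPDATE, and DELETE commands -- convenient, but slower and
// generally less efficient than commands you write yourself.
SqlCommandBuilder oBld = new SqlCommandBuilder(oDa);

DataSet oDs = new DataSet();
oDa.Fill(oDs);
// ... modify rows in oDs ...
oDa.Update(oDs);   // runs the generated commands
```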

Another key point I look for in reviews of SQL code and stored procedures is excessive joins. They're often found in SELECT statements when a table was needed at one time to retrieve a column, but since that time the column has been removed but the join to the table remains. I also look out for extra joins, particularly when joining a table whose column can already be accessed without having to use a join.

Commands and Parameters

When a SQL statement is executed, the database has to generate an execution plan for it. If that SQL statement is run repeatedly, each time it is executed the database might have to regenerate the query's execution plan. In these cases where SQL is run frequently, moving the SQL to a stored procedure can offer greater performance (not to mention greater security). The first time a stored procedure is executed, the database generates a query execution plan, stores that plan in the procedure cache, and then executes the stored procedure. On subsequent calls of the stored procedure, the database engine only has to grab the query plan from the procedure cache and rerun the stored procedure. Thus, it skips the step of devising the query plan on subsequent calls. For the increased performance, I always suggest using stored procedures when it is an available option in a database's architecture.

Another secondary option is to execute a SQL statement using the SqlCommand object and calling its Prepare method. If the data provider supports it, calling the Prepare method tells the data provider to get ready for this SQL statement to be executed multiple times. The SQL statement's query execution plan will then be stored by the database so that subsequent executions will run faster. However, executing the Prepare method should only be used in situations where the SQL will be executed multiple times. The process of preparing the SQL statement requires a bit of overhead and thus if the statements are only executed once, performance will degrade. Figure 7 demonstrates how an UPDATE statement can be prepared and executed twice.

Figure 7 Executing a Prepared Command

private void CallPreparedCmd() {
    string sConnString = 
        "Server=(local);Database=Northwind;Integrated Security=True;";
    string sSQL = 
        "UPDATE Customers SET City=@sCity WHERE CustomerID=@sCustomerID";
    using (SqlConnection oCn = new SqlConnection(sConnString)) {
        using (SqlCommand oCmd = new SqlCommand(sSQL, oCn)) {
            oCmd.CommandType = CommandType.Text;

            oCmd.Parameters.Add("@sCustomerID", SqlDbType.NChar, 5);
            oCmd.Parameters.Add("@sCity", SqlDbType.NVarChar, 15);

            oCn.Open();
            oCmd.Prepare();

            oCmd.Parameters["@sCustomerID"].Value = "ALFKI";
            oCmd.Parameters["@sCity"].Value = "Berlin2";
            oCmd.ExecuteNonQuery();

            oCmd.Parameters["@sCustomerID"].Value = "CHOPS";
            oCmd.Parameters["@sCity"].Value = "Bern2";
            oCmd.ExecuteNonQuery();

            oCn.Close();
        }
    }
}

One setting that is often omitted is the Command object's CommandType. This property helps the Command object process the command by telling it ahead of time whether the command it is going to execute is a stored procedure, the name of a table, or a free-form text command. If you do not set this property, the Command object defaults this property to CommandType.Text; while it will still work with stored procedures, it will execute more efficiently if you tell it ahead of time:

oCmd.CommandType = CommandType.StoredProcedure;

I also look for whether the Parameter object could be used in a query instead of hardcoding parameter values in a SQL statement. In Figure 7 the UPDATE statement runs twice, so it makes sense to use the Parameter object for the values of City and CustomerID. If this were not a parameterized query, preparing the SQL would not make much sense, since the SQL statements would then differ, and preparing a statement that runs only once is inefficient. Also, by using a parameterized query you remove the need for nuisances like embedded quotes. Notice that in Figure 7 the values do not have embedded quotes around them, nor do the placeholders for the parameters in the SQL string. If I were to replace the parameter placeholders with the values themselves, I would have to include the embedded quotes, as follows:

string sSQL = "UPDATE Customers SET City = 'Berlin2' " +
    "WHERE CustomerID = 'ALFKI'";

Using parameterized SQL statements and stored procedures with Command objects is also good for security, as they can deter SQL injection attacks, which occur when a malicious user enters input that is used to perform actions other than what was intended. Parameter objects treat the values of the parameters as literal values and not as executable code. So even if SQL code is injected, it would not be executed but rather used as a parameter's value. For a more in-depth look at SQL injection attacks, see Paul Litwin's article "Data Security: Stop SQL Injection Attacks Before They Stop You" in the September 2004 issue of MSDN® Magazine.
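To see the difference, consider this hypothetical lookup, where sInput is a value typed by the user:

```csharp
// Vulnerable: the input is concatenated into the SQL text, so input
// such as  ' OR 1=1 --  becomes part of the executable statement.
string sBad =
    "SELECT * FROM Customers WHERE CustomerID = '" + sInput + "'";

// Safer: the same query parameterized. The provider treats sInput
// strictly as a literal value, never as executable SQL.
string sGood = "SELECT * FROM Customers WHERE CustomerID = @sCustomerID";
using (SqlCommand oCmd = new SqlCommand(sGood, oCn)) {
    oCmd.Parameters.Add("@sCustomerID", SqlDbType.NChar, 5);
    oCmd.Parameters["@sCustomerID"].Value = sInput;
    // ... open the connection and call ExecuteReader ...
}
```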

Using embedded values for the parameters can easily get out of hand as the statement grows in size and the number of parameters increases. Any developer who has run SQL statements by concatenating strings together has likely been frustrated at one time or another by trying to get the embedded quotes in the right places. Imagine the following code:

System.Text.StringBuilder oSb = new System.Text.StringBuilder();
oSb.Append("UPDATE Customers SET City = 'Berlin2', ");
oSb.Append("Address = 'Some address', ContactName = 'Some name', ");
oSb.Append("ContactTitle = 'Some title', CompanyName = 'Some company' ");
oSb.Append("WHERE CustomerID = 'ALFKI'");

The StringBuilder can make the operation more efficient in certain scenarios, but you still have to be careful to embed the single quotes in the right places. With parameterized queries, this headache is gone.

This leads me to the topic of concatenating fragments of SQL code together in a string. This is much more efficiently handled by using the StringBuilder object than by actually concatenating string fragments together since the string object is immutable. I recommend using the StringBuilder any time you are concatenating more than two strings. That said, I have to reiterate that you should only build up SQL strings like this when absolutely necessary. If done incorrectly, you can open yourself up to a wide variety of attacks, and Microsoft discourages this practice whenever possible.

ADO.NET Transactions

Data transactions are integral to most enterprise applications. However, transactions are an area that can easily become inefficient when unmanaged. Like connections, transactions should be kept as short as possible: begin them as late as possible and commit or roll them back as soon as possible. Between the time the transaction begins and the time it ends, only the crucial INSERT, UPDATE, and DELETE statements should execute. Code that does not modify data should be kept out of the transaction, since the transaction holds locks and reduces concurrency.

Avoid creating an explicit transaction when, for example, only a single UPDATE statement is being executed. In that case an implicit transaction already surrounds the UPDATE statement, so an explicit transaction adds unneeded overhead. However, if a second command must be included in the same atomic unit of work as the first UPDATE command, then a transaction should be used. Take a look at the code in Figure 8 to see how multiple commands can be assigned to a single transaction.

Figure 8 Multiple Commands One Transaction

SqlTransaction oTran = null;
try {
    oCn.Open();
    oTran = oCn.BeginTransaction();

    oCmd1.Transaction = oTran;
    oCmd1.ExecuteNonQuery();

    oCmd2.Transaction = oTran;
    oCmd2.ExecuteNonQuery();

    oTran.Commit();
}
catch {
    // oTran is still null if the failure occurred before
    // BeginTransaction succeeded
    if (oTran != null) {
        oTran.Rollback();
    }
    throw;
}
finally {
    oCn.Close();
}

Transaction objects are only available if the data provider exposes them. These objects must be explicitly created and then assigned to the appropriate Command objects. The Transaction object can then be committed or rolled back, as shown in Figure 8. Also, make sure you clean up your resources in the finally blocks. The figure shows how a Connection object is closed regardless of whether the transaction succeeds or fails.

Wrapping Up

It has often been said that it's the little things that matter. Of course, that sentiment was never intended to refer to the creation of enterprise software applications, but nevertheless it certainly applies. Taken in isolation, these little examples of inefficiencies barely make a noticeable difference in an application on a developer's computer. However, remember that once you throw dozens, hundreds, or thousands of concurrent users at it, these little inefficiencies can really slow down your application and inconvenience your users. I often create a checklist of these items that apply to an application's architecture and use it as the basis for a code review. In addition to what I have discussed here, the DataSet, XML features, indexes, and batch SQL are all areas that can benefit from a little performance tweaking to improve data access efficiency. I will cover these topics in an upcoming installment of this column.

Stay tuned!

Send your questions and comments for John to mmdata@microsoft.com.

John Papa is a baseball fanatic who spends most of his summer nights rooting for the Yankees with his two little girls, wife, and faithful dog, Kadi. He has authored several books on ADO, XML, and SQL Server and can often be found speaking at industry conferences such as VSLive.