Improving String Handling Performance in .NET Framework Applications

 

James Musson
Developer Services, Microsoft UK

April 2003

Applies to:
   Microsoft® .NET Framework®
   Microsoft Visual Basic .NET®
   Microsoft Visual C#®

Summary: Many .NET Framework applications use string concatenation to build representations of data, be it in XML, HTML, or some proprietary format. This article compares standard string concatenation with System.Text.StringBuilder, a class the .NET Framework provides specifically for this task, as ways of creating such a data stream. A reasonable knowledge of .NET Framework programming is assumed. (11 printed pages)

Contents

Introduction
String Concatenation
What is a StringBuilder?
Creating the Test Harness
Testing
Results
Conclusion

Introduction

When writing .NET Framework applications, there is invariably some point at which the developer needs to create a string representation of data by concatenating other pieces of string data. This is traditionally achieved by using one of the concatenation operators (either '&' or '+') repeatedly. Past examinations of the performance and scalability characteristics of a wide range of applications have found that this is often an area where substantial gains in both performance and scalability can be made for very little extra development effort.

String Concatenation

Consider the following code fragment taken from a Visual Basic .NET class. The BuildXml1 function simply takes a number of iterations (Reps) and uses standard string concatenation to create an XML string with the required number of Order elements.

' build an Xml string using standard concatenation
Public Shared Function BuildXml1(ByVal Reps As Int32) As String
   Dim nRep As Int32
   Dim sXml As String = ""

   For nRep = 1 To Reps
      sXml &= "<Order orderId=""" _
         & nRep _
         & """ orderDate=""" _
         & DateTime.Now.ToString() _
         & """ customerId=""" _
         & nRep _
         & """ productId=""" _
         & nRep _
         & """ productDescription=""" _
         & "This is the product with the Id: " _
         & nRep _
         & """ quantity=""" _
         & nRep _
         & """/>"
   Next nRep

   sXml = "<Orders method=""1"">" & sXml & "</Orders>"

   Return sXml

End Function

The equivalent Visual C# code is shown below.

// build an Xml string using standard concatenation
public static String BuildXml1(Int32 Reps)
{
   String sXml = "";
   
   for( Int32 nRep = 1; nRep<=Reps; nRep++ )
   {
      sXml += "<Order orderId=\"" 
         + nRep 
         + "\" orderDate=\"" 
         + DateTime.Now.ToString() 
         + "\" customerId=\"" 
         + nRep 
         + "\" productId=\"" 
         + nRep 
         + "\" productDescription=\"" 
         + "This is the product with the Id: " 
         + nRep 
         + "\" quantity=\"" 
         + nRep 
         + "\"/>";
   }
   sXml = "<Orders method=\"1\">" + sXml + "</Orders>";

   return sXml;
}

It is quite common to see this method used to build large pieces of string data, both in .NET Framework applications and in applications written in other environments. XML data is used here simply as an example; the .NET Framework provides other, better ways of building XML strings, such as System.Xml.XmlTextWriter.

The problem with the BuildXml1 code lies in the fact that the System.String data type exposed by the .NET Framework represents an immutable string. Every time the string data is changed, a new string containing the new data is allocated, and the original becomes garbage that must later be reclaimed. Each change therefore costs a memory allocation and, eventually, a de-allocation. All of this is taken care of behind the scenes, so the true cost is not immediately apparent. Allocating and de-allocating memory increases memory management and garbage collection activity within the Common Language Runtime (CLR) and thus can be expensive. This is especially apparent when strings get big and large blocks of memory are allocated and de-allocated in quick succession, as happens during heavy string concatenation. While this may present no major problems in a single-user environment, it can cause serious performance and scalability issues in a server environment, such as an ASP.NET® application running on a Web server.
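The immutability described above can be observed directly. The following is a minimal sketch rather than part of the test code; the class and method names are hypothetical. It keeps a second reference to a string, appends to the first, and checks that the two variables no longer refer to the same object.

```csharp
using System;

public static class ImmutabilityDemo
{
    // Appends to a string variable and reports whether the append
    // produced a new object while leaving the original text untouched.
    public static bool AppendCreatesNewObject(string suffix)
    {
        string sXml = "<Orders>";
        string before = sXml;   // second reference to the same string object
        sXml += suffix;         // allocates a new string; 'before' is unchanged

        return !Object.ReferenceEquals(before, sXml) && before == "<Orders>";
    }

    public static void Main()
    {
        Console.WriteLine(AppendCreatesNewObject("<Order/>"));   // True
    }
}
```

The variable sXml ends up pointing at a different object, while the original string still exists, unchanged, until the garbage collector reclaims it.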

So, back to the code fragment above: how many string concatenations are performed on each pass through the loop? The answer is 14, one for every application of the '&' (or '+') operator, and because strings are immutable each of them conceptually produces a new string. The compiler may combine part of a chain of operators into fewer calls, but the final assignment always copies the whole of the string pointed to by sXml into a new, larger string, and the old one is discarded. As I have already mentioned, string allocation is expensive, becoming increasingly more so as the string grows, and this is the motivation for providing the StringBuilder class in the .NET Framework.
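To get a feel for how the cost grows, it helps to count the characters that must be copied when a string is extended repeatedly. The sketch below is a simplified model, not a measurement: it assumes each Order element adds about 160 characters (roughly the average implied by Table 2), counts only the copy made when the element is appended to the accumulated string, and ignores temporary strings. The CharsCopied helper is hypothetical.

```csharp
using System;

public static class CopyCost
{
    // Total characters copied when a string grows by 'chunk' characters
    // 'reps' times, assuming every append copies the whole new string.
    public static long CharsCopied(int reps, int chunk)
    {
        long total = 0;
        long length = 0;
        for (int i = 0; i < reps; i++)
        {
            length += chunk;   // size of the string after this append
            total += length;   // the entire new string is written once
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(CharsCopied(25, 160));    // 52000
        Console.WriteLine(CharsCopied(425, 160));   // 14484000
    }
}
```

Under these assumptions, 17 times as many iterations cause roughly 280 times as much copying: the work grows with the square of the number of iterations, whereas appending into a buffer that is already large enough writes each character only once.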

What is a StringBuilder?

The concept behind the StringBuilder class has been around for some time and my previous article, Improving String Handling Performance in ASP Applications, demonstrates how to write a StringBuilder using Visual Basic 6. The basic principle is that the StringBuilder maintains its own string buffer. Whenever an operation is performed on the StringBuilder that might change the length of the string data, the StringBuilder first checks that the buffer is large enough to hold the new string data, and if not, the buffer size is increased by a predetermined amount. The StringBuilder class provided by the .NET Framework also offers an efficient Replace method that can be used instead of String.Replace.
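As a small illustration of the buffer principle, the sketch below appends to a StringBuilder and then edits the text through the same object. The class and method names and the sample element are assumptions made for this example only.

```csharp
using System;
using System.Text;

public static class BufferDemo
{
    // Builds a small element, then changes it in place through the same StringBuilder.
    public static string BuildAndReplace()
    {
        StringBuilder sb = new StringBuilder(64);   // room for 64 characters up front
        sb.Append("<Order status=\"");
        sb.Append("new");
        sb.Append("\"/>");

        // Length is the text held so far; Capacity is the size of the buffer.
        Console.WriteLine("length=" + sb.Length + " capacity=" + sb.Capacity);

        // Replace modifies the builder's own contents rather than
        // returning a new string, as String.Replace would.
        sb.Replace("new", "shipped");
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(BuildAndReplace());   // <Order status="shipped"/>
    }
}
```

A string object is only produced at the end, when ToString is called.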

Figure 1 shows a comparison of what the memory usage pattern looks like for the standard concatenation method and the StringBuilder concatenation method. Notice that the standard concatenation method causes a new string to be created for every concatenation operation, whereas the StringBuilder uses the same string buffer each time.

Figure 1   Comparison of memory usage pattern between standard and StringBuilder concatenation

The code to build XML string data using the StringBuilder class is shown below in BuildXml2.

' build an Xml string using the StringBuilder
' (requires Imports System.Text at the top of the file)
Public Shared Function BuildXml2(ByVal Reps As Int32) As String

   Dim nRep As Int32
   Dim oSB As StringBuilder

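   ' make sure that the StringBuilder capacity is
   ' large enough for the resulting text; 165 characters
   ' per Order element slightly overestimates the roughly
   ' 160 bytes per element seen in Table 2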
   oSB = New StringBuilder(Reps * 165)
   oSB.Append("<Orders method=""2"">")

   For nRep = 1 To Reps
      oSB.Append("<Order orderId=""")
      oSB.Append(nRep)
      oSB.Append(""" orderDate=""")
      oSB.Append(DateTime.Now.ToString())
      oSB.Append(""" customerId=""")
      oSB.Append(nRep)
      oSB.Append(""" productId=""")
      oSB.Append(nRep)
      oSB.Append(""" productDescription=""")
      oSB.Append("This is the product with the Id: ")
      oSB.Append(nRep)
      oSB.Append(""" quantity=""")
      oSB.Append(nRep)
      oSB.Append("""/>")
   Next nRep

   oSB.Append("</Orders>")

   Return oSB.ToString()

End Function

The equivalent Visual C# code is shown below.

// build an Xml string using the StringBuilder
// (requires using System.Text; at the top of the file)
public static String BuildXml2(Int32 Reps)
{
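   // make sure that the StringBuilder capacity is
   // large enough for the resulting text; 165 characters
   // per Order element slightly overestimates the roughly
   // 160 bytes per element seen in Table 2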
   StringBuilder oSB = new StringBuilder(Reps * 165);
   oSB.Append("<Orders method=\"2\">");

   for( Int32 nRep = 1; nRep<=Reps; nRep++ )
   {
      oSB.Append("<Order orderId=\"");
      oSB.Append(nRep);
      oSB.Append("\" orderDate=\"");
      oSB.Append(DateTime.Now.ToString());
      oSB.Append("\" customerId=\"");
      oSB.Append(nRep);
      oSB.Append("\" productId=\"");
      oSB.Append(nRep);
      oSB.Append("\" productDescription=\"");
      oSB.Append("This is the product with the Id: ");
      oSB.Append(nRep);
      oSB.Append("\" quantity=\"");
      oSB.Append(nRep);
      oSB.Append("\"/>");
   }
   oSB.Append("</Orders>");

   return oSB.ToString();
}

How the StringBuilder method performs against the standard concatenation method depends on a number of factors, including the number of concatenations, the size of the string being built, and how well the initialization parameters for the StringBuilder buffer are chosen. Note that in most cases it is going to be far better to overestimate the amount of space needed in the buffer than to have it grow often.
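The effect of the initial capacity can be seen by counting how often the buffer has to grow. The following is a rough sketch using a fixed-size element so that the required capacity is known exactly; the CountGrowths helper and the sample element are hypothetical, and the number of growth steps for an undersized buffer depends on the runtime version, so it is not shown.

```csharp
using System;
using System.Text;

public static class PresizeDemo
{
    // A fixed-size element, so the space needed is exactly reps * Element.Length.
    public const string Element = "<Order orderId=\"0000\"/>";

    // Appends 'reps' elements and counts how many times Capacity changes.
    public static int CountGrowths(int initialCapacity, int reps)
    {
        StringBuilder sb = new StringBuilder(initialCapacity);
        int growths = 0;
        int last = sb.Capacity;
        for (int i = 0; i < reps; i++)
        {
            sb.Append(Element);
            if (sb.Capacity != last)
            {
                growths++;            // the buffer had to be enlarged
                last = sb.Capacity;
            }
        }
        return growths;
    }

    public static void Main()
    {
        // Undersized buffer: grows repeatedly as the text accumulates.
        Console.WriteLine(CountGrowths(16, 425));

        // Buffer sized to fit the final text: never grows.
        Console.WriteLine(CountGrowths(425 * Element.Length, 425));   // 0
    }
}
```

Each growth step means allocating more buffer space, which is exactly the work a generous initial estimate avoids.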

Creating the Test Harness

I decided to test the two string concatenation methods using Application Center Test® (ACT), which meant that the methods had to be exposed by an ASP.NET Web application. Because I didn't want the processing involved in creating an ASP.NET page for each request to show up in my results, I created and registered an HttpHandler that accepted requests for my logical URL, StringBuilderTest.jemx, and called the relevant BuildXml function. Although a detailed discussion of HttpHandlers is outside the scope of this article, I have included the code for my test below.

Public Class StringBuilderTestHandler
   Implements IHttpHandler

   Public Sub ProcessRequest(ByVal context As HttpContext) _
         Implements IHttpHandler.ProcessRequest

      Dim nMethod As Int32
      Dim nReps As Int32

      ' retrieve test params from the querystring
      If Not context.Request.QueryString("method") Is Nothing Then
         nMethod = Int32.Parse( _
            context.Request.QueryString("method").ToString())
      Else
         nMethod = 0
      End If
      If Not context.Request.QueryString("reps") Is Nothing Then
         nReps = Int32.Parse( _
            context.Request.QueryString("reps").ToString())
      Else
         nReps = 0
      End If

      context.Response.ContentType = "text/xml"
      context.Response.Write( _
         "<?xml version=""1.0"" encoding=""utf-8"" ?>")

      ' write the Xml to the response stream
      Select Case nMethod
         Case 1
            context.Response.Write( _
               StringBuilderTest.BuildXml1(nReps))
         Case 2
            context.Response.Write( _
               StringBuilderTest.BuildXml2(nReps))
      End Select

   End Sub

   Public ReadOnly Property IsReusable() As Boolean _
         Implements IHttpHandler.IsReusable
      Get
         Return True
      End Get
   End Property

End Class

The equivalent Visual C# code is shown below.

public class StringBuilderTestHandler : IHttpHandler
{
   public void ProcessRequest(HttpContext context) 
   {
      Int32 nMethod = 0;
      Int32 nReps = 0;

      // retrieve test params from the querystring
      if( context.Request.QueryString["method"]!=null )
         nMethod = Int32.Parse(
         context.Request.QueryString["method"].ToString());
      
      if( context.Request.QueryString["reps"]!=null )
         nReps = Int32.Parse(
         context.Request.QueryString["reps"].ToString());
      
      // write the Xml to the response stream
      context.Response.ContentType = "text/xml";
      context.Response.Write(
         "<?xml version=\"1.0\" encoding=\"utf-8\" ?>");
      switch( nMethod )
      {
         case 1 :
            context.Response.Write(
               StringBuilderTest.BuildXml1(nReps));
            break;
         case 2 :
            context.Response.Write(
               StringBuilderTest.BuildXml2(nReps));
            break;
      }
   }

   public Boolean IsReusable { get{ return true; } }
}

The ASP.NET HttpPipeline invokes the ProcessRequest method of a StringBuilderTestHandler instance for each HTTP request to StringBuilderTest.jemx; because IsReusable returns true, an instance can be reused across requests rather than created afresh each time. ProcessRequest simply extracts a couple of parameters from the query string and chooses the correct BuildXml function to invoke. The return value from the BuildXml function is written to the Response stream after the content type has been set and the XML declaration has been written.
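Registering the handler is a matter of configuration rather than code. A minimal sketch of the Web.config entry might look like the following. The type and assembly names are assumptions for illustration, since the article does not give them, and IIS must also be configured to pass requests for the .jemx extension to ASP.NET.

```xml
<configuration>
  <system.web>
    <httpHandlers>
      <!-- type="class name, assembly name": both names here are hypothetical -->
      <add verb="*"
           path="StringBuilderTest.jemx"
           type="StringBuilderTestHandler, StringBuilderTest" />
    </httpHandlers>
  </system.web>
</configuration>
```

A request such as StringBuilderTest.jemx?method=2&reps=125 would then be routed to ProcessRequest with nMethod set to 2 and nReps set to 125.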

For more information about HttpHandlers please see the IHttpHandler documentation.

Testing

The tests were performed using ACT from a single client (Windows® XP Professional, PIII-850MHz, 512MB RAM) against a single server (Windows Server 2003 Enterprise Edition, dual PIII-1000MHz, 512MB RAM) over a 100 Mbit/sec network. ACT was configured to use 5 threads to simulate a load of 5 users connecting to the Web site. Each test consisted of a 10-second warm-up period followed by a 50-second load period in which as many requests as possible were made.

The test runs were repeated for various numbers of concatenation operations by varying the number of iterations in the main loop as shown in the code fragments for the BuildXml functions.
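The ACT runs need a Web server and a load-generating client. For a rough, single-process comparison, something like the sketch below can be used instead. It is not the harness that produced the results in this article: it uses simplified Order elements and hypothetical BuildWithConcat and BuildWithBuilder helpers, it measures only elapsed time in one thread, and it relies on System.Diagnostics.Stopwatch, which requires .NET Framework 2.0 or later. Timings vary by machine and runtime, so none are shown.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ConcatTiming
{
    // Builds 'reps' simplified elements with += (the CAT approach).
    public static string BuildWithConcat(int reps)
    {
        string s = "";
        for (int i = 1; i <= reps; i++)
        {
            s += "<Order orderId=\"" + i + "\"/>";
        }
        return s;
    }

    // Builds the same text with a presized StringBuilder (the BLDR approach).
    public static string BuildWithBuilder(int reps)
    {
        // 32 characters per element is an overestimate for these short elements.
        StringBuilder sb = new StringBuilder(reps * 32);
        for (int i = 1; i <= reps; i++)
        {
            sb.Append("<Order orderId=\"");
            sb.Append(i);
            sb.Append("\"/>");
        }
        return sb.ToString();
    }

    public static void Main()
    {
        int[] sizes = { 25, 425, 5000 };
        foreach (int reps in sizes)
        {
            Stopwatch cat = Stopwatch.StartNew();
            string a = BuildWithConcat(reps);
            cat.Stop();

            Stopwatch bldr = Stopwatch.StartNew();
            string b = BuildWithBuilder(reps);
            bldr.Stop();

            Console.WriteLine("reps=" + reps
                + " CAT=" + cat.Elapsed.TotalMilliseconds + "ms"
                + " BLDR=" + bldr.Elapsed.TotalMilliseconds + "ms"
                + " same=" + (a == b));
        }
    }
}
```

A single-threaded timing like this says nothing about throughput under concurrent load or about garbage collection pressure on a server, which is what the ACT tests were designed to expose.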

Results

Below is a series of charts showing the effect of each method on the throughput of the application and also the response time for the XML data stream to be served back to the client. This gives some idea of how many requests the application could process and also how long the users, or client applications, would be waiting to receive the data.

Table 1   Key to concatenation method abbreviations used

Method Abbreviation   Description
CAT                   Standard string concatenation method (BuildXml1)
BLDR                  StringBuilder method (BuildXml2)

While this test is far from realistic in terms of simulating the workload of a typical application, it is evident from Table 2 that even at 425 repetitions the XML data string is not particularly large; there are many applications where the average size of data transmissions falls in the higher ranges of these figures and above.

Table 2   XML string sizes and number of concatenations for test samples

No. of iterations   No. of concatenations   XML string size (bytes)
25                  350                     3,897
75                  1,050                   11,647
125                 1,750                   19,527
175                 2,450                   27,527
225                 3,150                   35,527
275                 3,850                   43,527
325                 4,550                   51,527
375                 5,250                   59,527
425                 5,950                   67,527

Figure 2   Chart showing throughput results

Figure 3   Chart showing response time results

As we can clearly see from Figures 2 and 3, the StringBuilder method (BLDR) outperforms the standard concatenation (CAT) method both in terms of how many requests can be processed and the elapsed time required to start generating a response back to the client (represented by Time To First Byte, or TTFB, on the graph). At 425 iterations the StringBuilder method is processing 17 times more requests and taking just 3% of the elapsed time for each request as compared with the standard concatenation method.

Figure 4   Chart giving an indication of system health during the tests

Figure 4 gives some indication of the load the server was under during the testing. It is interesting to note that, as well as outperforming the standard concatenation method (CAT) at every stage, the StringBuilder method (BLDR) also used considerably less CPU and spent considerably less time in garbage collection. While this does not prove that server resources were used more effectively during StringBuilder operations, it strongly suggests it.

Conclusion

The conclusion to be drawn from these test results is really very straightforward. You should be using the StringBuilder class for all but the most trivial string concatenation (or replace) operations. The extra effort required to use the StringBuilder class is negligible and is far outweighed by the potential performance and scalability benefits to be gained.