System.Messaging Performance

 

Ingo Rammer
thinktecture

Richard Turner
Program Manager
Microsoft Distributed Systems Group

Summary: Investigate the performance characteristics of .NET System.Messaging compared to the native MSMQ COM API for different messaging patterns and requirements. Get guidelines on how to create a high-performance messaging application. (28 printed pages)

Download the associated code sample, SystemMessagingPerformanceTests.msi.

Contents

Introduction
Approach
Software and Hardware Setup
   Hardware/Software Specifications
   Test Application
Managed vs. Unmanaged Performance
   Payloads
   Queue and Message Types
   Timing ... or What Exactly We Measured
   Payload Test Results
Optimizations for Managed-Code Messaging Applications
   Choosing the Right Message Type
   Choosing the Best Message Formatter in .NET
   One-to-Many Messaging
   Always Send Remote/Read Local
   More Performance Tips
About the Test Application
   Parameters
Appendix A – Raw Test Results
   Send Empty Message
   Receive Empty Message
   Complete Processing of Empty Messages
   Express Send (Local)
   Express Receive (Local)
   Complete Processing of Express Messages (Remote)
   IPersistStream vs. .NET Serialization (Send)
   IPersistStream vs. .NET Serialization (Receive)
   .NET Formatter Comparison (Enqueuing)
   .NET Formatter Comparison (Dequeuing)

Introduction

With Microsoft Message Queuing (MSMQ) and its .NET API, System.Messaging, you have the infrastructure necessary to create reliable, scalable, high-performance messaging applications. Even though most of these applications are built on inherently asynchronous invocation and communication patterns, performance nevertheless plays a critical role in most production environments.

If you have used message queuing in the unmanaged world, you might especially wonder how the .NET environment affects your application's performance. You might also be interested in knowing how to most efficiently pass your custom data between different queues in the .NET environment.

In this white paper, we compare the performance of the unmanaged MSMQ COM interfaces with the .NET implementation in System.Messaging and provide guidance on how to improve your messaging application's performance. The tests will highlight the differences in raw transport speed between the managed and unmanaged world, as well as the difference in serialization techniques offered by the two environments. The latter is especially important, because in most asynchronous messaging applications, the critical factor is the time necessary to enqueue messages. The complete processing time, by contrast, is only of secondary concern.

Approach

As in Performance of ASP.NET Web Services, Enterprise Services, and .NET Remoting, we carefully considered whether this paper should take the form of a detailed benchmark illustrating how to achieve the fastest performance possible, or whether we should provide a paper that illustrates good, sound, reproducible "relative performance" comparisons for the majority of business application scenarios. We decided upon the latter approach because of its applicability to as broad an audience as possible, and in order to accurately represent the real-life performance characteristics of .NET System.Messaging.

Our goals for this paper are:

  1. To investigate what the relative performance differences are between System.Messaging and MSMQ COM APIs

  2. To clarify the real-life performance characteristics of .NET vs. MSMQ COM

  3. To help guide your decisions about where, when, and how to utilize these technologies most appropriately

  4. To provide a test application allowing you to run these tests on your own machines and in your own environments. We strongly encourage you to build and run this test environment and to investigate and analyze the performance characteristics of these technologies. Only then will you be able to fully comprehend the many factors that affect performance of message-queuing systems.

    Note This paper does not discuss the performance characteristics of native MSMQ Win32/"C" APIs. Understand that this is the lowest-level MSMQ API, and is what both the .NET and MSMQ COM implementations build upon. If performance is your primary concern, you may want to investigate the significant performance improvements you may receive by carefully coding to MSMQ Win32/"C" APIs. However, understand that this performance benefit comes at a price—you will have to spend much more time and effort writing, testing, securing, and deploying such code.

Software and Hardware Setup

The following tests have been performed with two machines: a sender/client and a receiver/server.

Hardware/Software Specifications

Both the sender and the receiver machines were configured as follows:

  • CPU: 2.8 GHz Intel P4 Prescott, w/HT, 800 MHz FSB
  • Hard drive: 40 GB UltraDMA 100, ExcelStor Technology J340
  • Chipset: Intel 865G/ICH 5
  • Memory: 1024 MB RAM (PC3200)
  • OS: Windows Server 2003 Standard Edition with MSMQ 3.0

Test Application

The test applications are written as single-threaded C++ and .NET applications, created and compiled in Release mode with Visual Studio 2003. We opted for single-threaded test drivers to provide an even comparison of the relative levels of performance of the managed versus the native implementation, rather than saturating the system to try and achieve the absolute maximum level of performance.

Note that all tests have been performed with the maximum amount of physical memory available—memory faults and paging have been rare in our tests. Also note that MSMQ does not page messaging data for express messages in these tests.

Please note that the test code was written in a way that you are likely to encounter in typical enterprise applications. It is possible to enhance the COM API's performance, for example, by manually marshalling char* strings into a SafeArray (VT_I1) to avoid the performance hit of the VARIANT-intrinsic conversion of strings to a double-byte Unicode format. However, we have rarely seen such code implemented in typical enterprise applications. Therefore, the test applications do not use such "rare tweaks", but we encourage you to explore the benefits of doing so when you work further with the test applications.

Managed vs. Unmanaged Performance

Before performance testing messaging applications, it is important to realize that different applications have different infrastructure requirements. Using MSMQ, you can decide for each message whether it is transferred in memory only (express messages), or whether it is stored persistently on disk before traversing to the next node (recoverable messages), which helps applications survive machine failure. You must also determine whether a queue supports transactional messages. All of these decisions will directly affect the number of messages that can be transferred in a given period of time.

To cover most scenarios, we have separated the tests into a combination of message payloads and messaging options.

Payloads

We will create four payloads of various sizes:

  • Raw infrastructure: Messages with no body
  • Transport: Preformatted strings of various sizes. This lets us test the infrastructure and transport with minimal marshalling/serialization overhead. The strings will have the following lengths: 500, 1,000, 2,000, 10,000, 100,000, 1,000,000, and 2,000,000 characters.
  • XML messages: Here we'll compare the sending and receiving of data encoded by the .NET XML Serializer and by the COM-based MSXML engine.
  • Serialized Objects: Here we'll compare .NET Object Serialization and COM IPersistStream mechanisms.

Queue and Message Types

We will test each of the previously mentioned payload types with the following message and queue options, each for local and remote enqueuing and dequeuing:

  • Express
  • Recoverable
  • Transacted (only local dequeuing operations are supported here)

Timing ... or What Exactly We Measured

In conventional distributed applications based on synchronous calls being passed via RPC, DCOM, .NET Remoting, or ASP.NET Web services, the most important performance-critical measurement is the complete response time for a single request-reply call. In asynchronous messaging environments, however, this request-reply model is often of lesser or little value. Most asynchronous messaging applications would be interested in these three timings instead:

  • Fire and forget: Sending a number of messages to a local or remote queue
  • Receive only: Retrieving a number of existing messages from a queue
  • End-to-end: Sending messages, retrieving them at the receiving end, and fully deserializing each message into an equivalent representation (i.e., a string into a string, or a COM-object that implements IPersistStream into a deserialized copy of the object).
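
The first two timings come down to the same measurement: run a batch of operations, record the elapsed wall-clock time, and divide the message count by it. The following C++ sketch illustrates that calculation. It assumes a C++11 compiler, the names messages_per_second and time_batch are hypothetical and not taken from the test application, and the callable passed in stands for whatever single send or receive call is being measured.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Throughput for a batch: message count divided by elapsed seconds.
inline double messages_per_second(std::size_t count, double elapsed_seconds)
{
    assert(elapsed_seconds > 0.0);   // a zero interval cannot be rated
    return static_cast<double>(count) / elapsed_seconds;
}

// Runs `op` `count` times and returns the elapsed wall-clock time in seconds.
// `op` is a placeholder for one send or one receive call.
template <typename Op>
double time_batch(std::size_t count, Op op)
{
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i) {
        op();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

A batch should be large enough that the elapsed time is well above the clock resolution; otherwise the computed rate is dominated by timer noise.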

Payload Test Results

In the following section, you'll see the results of our performance tests.

Test 1 – Empty Messages

In the first test, we measured the number of empty messages per second accepted for delivery to a local or remote queue. The results of this test are in Figure 1.

Figure 1. Enqueuing empty messages

As you can see in this graph, the .NET Framework System.Messaging API performs very slightly better than the COM API for the sending of express messages, but there is virtually no difference for recoverable and transactional messages. This indicates that the .NET infrastructure is on par with MSMQ COM infrastructure when sending express messages.

Figure 2. Dequeuing empty messages

In Figure 2, you'll see that the timing results are comparable for the process of dequeuing messages, except in the case of local express messages, where .NET dequeued about 37,000 messages per second compared with about 26,000 for COM (see Appendix A).

In the final part of this test, we used a server application to process incoming messages as soon as they were sent. The server would send a confirmation message back to the client as soon as the complete batch for one subtest had been successfully processed. There were, therefore, no direct request/response semantics for each message, as this would be adverse to the primary design goals of asynchronous messaging applications. You can see the results in Figure 3 below.

Figure 3. Complete processing of empty messages

It is clear from these results that the .NET Framework's quicker dequeuing capabilities also allow for a quicker complete processing time compared to the COM API.
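
To put a number on "quicker", the relative difference can be computed from the raw results in Appendix A. The helper below is an illustrative sketch only (the function and constant names are ours); the sample values are the local express rates from the "Complete Processing of Empty Messages" table.

```cpp
#include <cassert>

// Returns by how many percent `rate` exceeds `baseline`
// (both given in messages per second).
inline double percent_more(double rate, double baseline)
{
    assert(baseline > 0.0);
    return (rate - baseline) / baseline * 100.0;
}

// Local express rates for complete processing, taken from Appendix A.
const double kNetExpressLocal = 24209.0;   // .NET
const double kComExpressLocal = 18890.0;   // COM
```

Calling percent_more(kNetExpressLocal, kComExpressLocal) yields roughly 28, that is, .NET completed about 28 percent more empty express messages per second than COM in the local case.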

Test 2 – Sending Strings of Various Sizes

The following larger collection of tests highlights differences in message-transmission performance depending on the size of the payload. To perform this test, simply allocate strings of the required sizes (500, 1,000, 2,000, 10,000, 100,000, 1,000,000, and 2,000,000 characters) and set the message's body to each of these strings. We rely on the internal formatting capabilities of the .NET Framework (using the BinaryMessageFormatter) and the COM API.

The C# source code used to send a similar, non-transactional message containing a 1,000-character string looks like this:

MessageQueue que = new MessageQueue( ... );

// open queue, etc.

String bodyString = new String('x', 1000);

Message m = new Message();
m.Formatter = new BinaryMessageFormatter();
m.Label = "Test";
m.Body = bodyString;
m.Recoverable = false; // Depending on configuration
que.Send(m);

For C++, you can use code similar to the following. Please note that we've been using CoGetClassObject() instead of CoCreateInstance() here to allow for quicker execution of multiple messages:

MSMQ::IMSMQQueue3* pQueue;

// ... open the queue ...

HRESULT hr;
MSMQ::IMSMQMessage3* pMsg;
IClassFactory* pFact;

CString bodyString ('x',1000);

hr = CoGetClassObject(CLSID_MSMQMessage, CLSCTX_ALL, NULL, 
         IID_IClassFactory, reinterpret_cast<void**>(&pFact));

if (FAILED(hr)) exit(1);

hr = pFact->CreateInstance(NULL, IID_IMSMQMessage3, 
         reinterpret_cast<void**>(&pMsg));

if(FAILED(hr)) exit(1);

pMsg->Label = L"Test";
_variant_t var(bodyString);
pMsg->Body = var;
pMsg->Delivery = MSMQ::MQMSG_DELIVERY_EXPRESS;

hr = pMsg->Send(pQueue);
if (FAILED(hr)) exit(1);

pMsg->Release();
pQueue->Release();

When performing tests with different message sizes, you will encounter results like those shown in Figure 4.

Figure 4. Enqueuing predefined strings in Express mode

As you can see, the COM API performs somewhat better when sending string messages of any size in express mode.

In contrast to its performance when sending, .NET performs slightly better than COM when retrieving messages of this type from queues, as illustrated in Figure 5.

Figure 5. Dequeuing predefined strings in Express mode

You can also see that there is not a significant difference between the two platforms, especially when you factor in testing variances and tolerances.

For remote queues, the relative differences between COM and .NET are similar to those for local queues.

All the tests so far have only highlighted parts of a messaging solution: send or receive. For a better understanding of the relative performance difference for real applications, we need to record the complete processing time for a batch of messages on a remote system. For these tests, the client sends a number of messages to the server, where they will be dequeued as quickly as possible. To do our performance measurements, the server sends an acknowledgment back to the client as soon as it has processed all messages. The results of this test are in Figure 6.

Figure 6. Complete processing times for strings of various sizes

These results clearly illustrate that .NET offers a substantial overall throughput improvement over native MSMQ COM when passing string text as the message body. This is caused largely by the fact that, by default, MSMQ COM handles all strings as double-byte Unicode, whereas .NET encodes strings as more efficient UTF-8, considerably reducing the size of the strings.
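
The size effect of the two encodings can be illustrated with a small calculation. The following C++ sketch (assuming a C++11 compiler; the function names are hypothetical) counts the bytes needed to encode a sequence of Unicode code points as UTF-8 and as UTF-16. It covers the string body only and ignores message headers and any formatter overhead, so it should be read as an illustration of the trend, not as a model of the actual message size on the wire.

```cpp
#include <cstddef>
#include <string>

// Bytes needed to encode the code points of `text` as UTF-8.
inline std::size_t utf8_size(const std::u32string& text)
{
    std::size_t bytes = 0;
    for (char32_t cp : text) {
        if (cp < 0x80)         bytes += 1;   // ASCII range
        else if (cp < 0x800)   bytes += 2;
        else if (cp < 0x10000) bytes += 3;
        else                   bytes += 4;
    }
    return bytes;
}

// Bytes needed to encode the same code points as UTF-16.
inline std::size_t utf16_size(const std::u32string& text)
{
    std::size_t bytes = 0;
    for (char32_t cp : text) {
        bytes += (cp < 0x10000) ? 2 : 4;     // surrogate pair above the BMP
    }
    return bytes;
}
```

For the 1,000-character test string of 'x' characters, this gives 1,000 bytes in UTF-8 and 2,000 bytes in UTF-16. The advantage shrinks for text outside the ASCII range, where UTF-8 needs two or more bytes per character.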

Test 3 – Serialized Objects

One of the predominant ways of exchanging data via message queuing is to use the built-in serialization capabilities offered by the .NET Framework. The .NET Framework allows you to pick a standard formatter that works without any custom serialization code. For COM, this usually means a custom implementation of IPersistStream.

To perform the following tests, we'll transfer messages based on the following Order object, including 50 LineItem objects. We've also created comparable COM versions of these classes that implement IPersistStream.

[Serializable]
public class Order
{
   public DateTime Date;
   public int CustomerID;
   public Address ShippingAddress;
   public Address BillingAddress;
   public double Total;

   [XmlArrayItem(typeof(LineItem))]
   public ArrayList LineItems;
}

[Serializable]
public class Address
{
   public String Firstname;
   public String Lastname;
   public string Company;
   public string City;
   public string Street;
   public string ZIPCode;
   public string Country;
   public string State;
}

[Serializable]
public class LineItem
{
   public int ArticleID;
   public String Name;
   public double Price;
   public double Quantity;
   public double LineTotal;
}

Using the automatic serialization for objects like these shows you one of the bigger benefits of using the .NET Framework, as the built-in serialization capabilities take care of a lot of things for you.

In the .NET version, we used code similar to the following to send messages that contained a serialized representation of an Order object:

Order ord = new Order( ... );

// populate the order

MessageQueue que = new MessageQueue( ... );

// open queue, etc.

Message m = new Message();
m.Formatter = new BinaryMessageFormatter();
m.Label = "Test";
m.Body = ord;
m.Recoverable = false; // Depending on configuration
que.Send(m);

When using the COM API, you can take a COM object that implements IPersistStream and wrap its IUnknown interface pointer in a VARIANT. The MSMQ COM API will then take care of creating a stream to serialize and deserialize the object for you:

IOrder* pOrd;

// create and populate the order

MSMQ::IMSMQQueue3* pQueue;

// ... open the queue ...

HRESULT hr;
MSMQ::IMSMQMessage3* pMsg;
IClassFactory* pFact;
IUnknown* pUnk;

hr = CoGetClassObject(CLSID_MSMQMessage, CLSCTX_ALL, NULL, 
         IID_IClassFactory, reinterpret_cast<void**>(&pFact));

if (FAILED(hr)) exit(1);

hr = pFact->CreateInstance(NULL, IID_IMSMQMessage3, 
         reinterpret_cast<void**>(&pMsg));

if(FAILED(hr)) exit(1);

pMsg->Label = L"Test";

hr=pOrd->QueryInterface(IID_IUnknown, reinterpret_cast<void**>(&pUnk));

if (FAILED(hr)) exit(1);

CComVariant var;
var.punkVal = pUnk;
var.vt = VT_UNKNOWN;
pMsg->Body = var;
pMsg->Delivery = MSMQ::MQMSG_DELIVERY_EXPRESS;

hr = pMsg->Send(pQueue);
if (FAILED(hr)) exit(1);

pMsg->Release();
pQueue->Release();

In Figures 7 and 8, you'll see that the huge development time benefit of the .NET Framework's built-in serialization capabilities takes its toll at runtime. When comparing it with an optimized IPersistStream implementation, you can see that the COM version is faster than the .NET version for enqueuing and dequeuing of messages.

Figure 7. Comparing serialization techniques for message enqueuing

Figure 8. Comparing serialization techniques for message dequeuing

It is important to understand where these considerable differences come from. The .NET Framework provides runtime serialization capabilities that can take any compatible type (for example, classes marked with [Serializable]) and use the built-in Reflection API for dynamic serialization and deserialization. These serializers, for example, will automatically store public and private fields, and also complete object graphs. The use of IPersistStream in the COM world, on the other hand, creates a very strict serialization implementation: You have to specify exactly how you'd like to store your data. The latter is, of course, usually faster.

The good news for your high-performance, critical application is that nothing prevents you from implementing a serialization technique in .NET that behaves like COM's IPersistStream.

We did this in the following tests by adding methods to save and load an object's state using a BinaryWriter/BinaryReader combination. You can see one of these implementations in the following code snippet:

[Serializable]
public class LineItem
{
   public int ArticleID;
   public String Name;
   public double Price;
   public double Quantity;
   public double LineTotal;

   public void Load(BinaryReader rdr)
   {
      ArticleID = rdr.ReadInt32();
      Name = rdr.ReadString();
      Price = rdr.ReadDouble();
      Quantity = rdr.ReadDouble();
      LineTotal = rdr.ReadDouble();
   }

   public void Save(BinaryWriter wrt)
   {
      wrt.Write(ArticleID);
      wrt.Write(Name);
      wrt.Write(Price);
      wrt.Write(Quantity);
      wrt.Write(LineTotal);
   }
}

Comparing this to relevant parts of our IPersistStream implementation in COM, you'll see that they now look remarkably similar:

bool CLineItem::InternalLoad( IStream *pStm )
{
   m_bstrName.ReadFromStream(pStm);
   pStm->Read(&m_nArticleID, sizeof(m_nArticleID),0);
   pStm->Read(&m_nPrice, sizeof(m_nPrice),0);
   pStm->Read(&m_nQuantity, sizeof(m_nQuantity),0);
   pStm->Read(&m_nLineTotal, sizeof(m_nLineTotal),0);
   return true;
}

bool CLineItem::InternalSave( IStream *pStm )
{
   m_bstrName.WriteToStream(pStm);
   pStm->Write(&m_nArticleID, sizeof(m_nArticleID),0);
   pStm->Write(&m_nPrice, sizeof(m_nPrice),0);
   pStm->Write(&m_nQuantity, sizeof(m_nQuantity),0);
   pStm->Write(&m_nLineTotal, sizeof(m_nLineTotal),0);
   pStm->Commit(STGC_DEFAULT);
   return true;
}

You can then use code like the following to send a message using this manually optimized serialization technique in .NET:

MessageQueue que = new MessageQueue ( ... );
using (Message m = new Message())
{
   MemoryStream ms = new MemoryStream();
   BinaryWriter wrt = new BinaryWriter(ms);
   ord.Save(wrt);
   wrt.Flush();
   ms.Flush();
   m.BodyStream = ms;
   m.Label = "Test";
   m.Recoverable = false;
   que.Send(m);
}

When receiving a message, you have to write code similar to the following:

MessageQueue que = new MessageQueue ( ... );
using (Message m = que.Receive())
{
   Order o = new Order();
   BinaryReader rdr = new BinaryReader(m.BodyStream);
   o.Load(rdr);
}

We added the methods Load() and Save() to all our business entity classes (Order, Address, and LineItem) before performing the next comparison, which details the performance differences between the original COM version and our newly optimized BinaryWriter/BinaryReader combination.

In Figures 9 and 10, you can see the large performance improvements that resulted from this change.

Figure 9. Using the manually optimized .NET serialization to enqueue messages

Figure 10. Using the manually optimized .NET serialization to dequeue messages

As you can see above, with just a little more effort than using the default .NET serializer (and practically the same amount of work as you'd use to implement IPersistStream in COM), you can dramatically improve the performance of your .NET-based queuing applications.

Optimizations for Managed-Code Messaging Applications

As you have seen in the test results on the previous pages, the .NET Framework allows you to create messaging applications that perform as well as, or even better than, COM applications in a number of use cases. In addition to the performance improvements shown above, we'd like to present some additional tips and guidelines for improving your application's response time.

Choosing the Right Message Type

As you can see in the tests performed for this white paper, the biggest differences in performance result from the type of message chosen. MSMQ gives you access to the following types:

Express: These messages are stored only in memory. If the machine that queues a message is rebooted or struck by power failure, these messages will be lost. Express messages will, however, survive network outages as long as the computers themselves remain running. For example, you can unplug a network and queue a number of messages that will be transferred as soon as your machine is online again. Even though these messages are stored in memory, performance issues might turn up as soon as the real memory threshold is reached. In this case, the standard paging/swapping mechanism of the virtual memory manager could negatively affect your performance.

Recoverable: Recoverable messages are always stored on disk. If a machine is rebooted before it is able to deliver its messages to the final recipient, the state will be restored as soon as the MSMQ service is restarted. Recoverable messages are generally slower than express messages.

Transactional: Transactional queues guarantee exactly-once and in-order delivery of messages. In addition, they allow you to use database-style transactions to send and receive messages. For example, you can send several messages within one transaction with the guarantee that either all messages will be queued, or none at all, and that they will arrive in the same sequence. You can also include transactional interactions in MSDTC or COM+ distributed transactions to coordinate database access with message-queue operations. Transactional queues use the slowest mode of operation.

The performance of a message-queuing application depends on several factors. One of the most important is the selection of the correct messaging type. You should, however, take a close look at your application's infrastructure requirements before selecting express-mode messages. In this case, you have to take recovery logic into account, which could greatly increase your application's complexity, especially since you also need to test for all message-failure scenarios if you decide to do without recoverable messages.

Choosing the Best Message Formatter in .NET

One of the biggest advantages of .NET compared to COM-based messaging applications is the easy-to-use, flexible object-serialization framework. Unless you need the absolute best performance, you don't have to manually implement any persistence code like IPersistStream. Instead, you can rely on the built-in serialization and formatters.

The .NET Framework supports two message formatters for transparent serialization and deserialization of .NET objects: the XmlMessageFormatter and the BinaryMessageFormatter. A third formatter, the ActiveXMessageFormatter, is intended primarily for interaction with COM objects that implement IPersistStream, which we'll cover later. The choice of formatter is especially relevant if you communicate from .NET to .NET, as this gives you the flexibility to opt for a faster binary-serialization mechanism.

To test the difference between XML and binary serialization, we decided to use the same Order class as above and just change the formatter. As Figure 11 shows, the choice of formatter affects the enqueuing performance for messages when you transfer custom business objects. This is true for express, recoverable, and transactional messages.

Figure 11. Message enqueuing performance for different formatters

As you can see in Figure 12, there are even more dramatic performance differences for the complete processing of a message. Using the binary-message formatter for express messages allows you to process 77% more messages per second than the XML Formatter would allow.

Figure 12. Performance differences for complete processing

Optimization for COM-Compatible Types

If your message body is of a COM-compatible type (for example an int, double, string, or an object that implements IPersistStream), then you can take advantage of the .NET ActiveXMessageFormatter, which is the quickest. This formatter also allows you to exchange serialized COM objects between a classic COM client written in Visual Basic or C++ and a .NET-based application.

In Figures 13 and 14, you can see the performance differences between the BinaryMessageFormatter, the XmlMessageFormatter and the ActiveXMessageFormatter when transferring a message with a body of type System.Int32.

Figure 13. Enqueuing of COM-compatible messages

Figure 14. Dequeuing of COM-compatible messages

One-to-Many Messaging

MSMQ 3.0, which is available starting with Windows Server 2003 and Windows XP, introduces two new means of sending a certain message to multiple recipients. You can rely on IP multicasting if your network infrastructure supports this protocol and if you don't need specific delivery guarantees or transactional guarantees. When using IP multicasting to communicate with multiple recipients, each packet is sent via the network only once (no matter how many receivers) and placed in multiple queues on multiple machines on the receiver side. This implies, however, that there is no delivery or transactional guarantee—MSMQ IP multicast will not give you the possibility of determining whether a message has reached any of its intended recipients.

When using System.Messaging or the COM API to MSMQ 3.0, you can also use a new syntax for specifying multiple recipients. This, however, will be handled internally using conventional point-to-point connections, giving you the necessary delivery and transactional guarantees. Using this technique reduces the serialization overhead that would otherwise be necessary if you'd send the same message to multiple recipients.

To send a message to multiple queues, you can use the following syntax and separate the format names of the destination queues with "," (comma) in the format name:

String queues = "DIRECT=OS:localhost\\private$\\Q1," + 
                "DIRECT=OS:localhost\\private$\\Q2," +
                "DIRECT=OS:localhost\\private$\\Q3";

MessageQueue que = new MessageQueue("FORMATNAME:" + queues);

Message msg = new Message();
msg.Formatter = fmt;
msg.Label = "MULTI";
que.Send(msg);
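
When the list of destinations is only known at run time, the multiple-element format name has to be assembled from the individual format names. Because the syntax is also available through the COM API, here is a small C++ sketch of that step. The function name join_format_names is hypothetical, and the sketch performs no validation of the individual names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Joins individual format names into one comma-separated,
// multiple-element format name. The names are copied as-is.
inline std::string join_format_names(const std::vector<std::string>& names)
{
    std::string joined;
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (i > 0) {
            joined += ',';       // separator between destination queues
        }
        joined += names[i];
    }
    return joined;
}
```

The resulting string is what would follow the "FORMATNAME:" prefix in the System.Messaging example above.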

Always Send Remote/Read Local

When MSMQ transfers outgoing messages from your machine to a remote destination machine, the following happens:

  • As soon as you send a message to a remote queue, for example, by specifying a format name such as "DIRECT=OS:remotehostname\private$\queuename", MSMQ will create a so-called "outgoing queue" on the sender's machine.
  • All messages that your client application sends to the remote queue will first be stored in this outgoing queue. Your client application never directly communicates with the remote server; it only talks to the local MSMQ instance running on your client machine. This allows for completely asynchronous decoupling, which is one of the great benefits of using MSMQ: your client can send messages whether the server is running or not.
  • MSMQ will then contact the remote server and will transfer the messages using an optimized internal protocol.

When receiving messages, however, things are a bit different:

  • As soon as you open a queue for receive access, you are always communicating with the "real" queue.
  • MSMQ in this case uses standard RPC instead of the optimized MSMQ-internal protocol used for send-access to communicate with the remote machine. This means that the remote machine has to be available and, even more important, that you have to live with a slower and chattier protocol.
  • If the remote server hangs, your client might even block until the connection can be restored. In addition, remote reads are only allowed for non-transactional queues.

These considerations lead to one of the most important guidelines of reliable messaging-based systems: Always send remote, read local. If some other machine sends information back to your client, it should do so by forwarding the messages to "DIRECT=OS:yourclient\private$\somequeue" instead of your client creating a connection to the originating machine. This is one of the fundamental design principles of MSMQ-based messaging applications. It should be disregarded only if you have a very specific need to do so and only if you can live with the effects on the reliability of your application.

More Performance Tips

In this white paper, we compared the .NET API with the COM API, as empirical data tells us that the COM API is the most commonly used unmanaged interface to MSMQ. In addition to this interface, MSMQ also offers a low-level Win32 API.

The Win32 MSMQ API performs significantly better than COM and .NET, as these two higher-level APIs essentially wrap the underlying native Win32 API. If your native Windows application (written in an unmanaged language) depends on the best possible performance, it might be reasonable for you to look into the low-level Win32 API instead of the COM API.

To get the most out of the MSMQ infrastructure—independently from your chosen API—we recommend that you read the following additional white papers, which can be found at MSDN:

MSMQ Frequently Asked Questions (March 02, 2004)

MSMQ Best Practices (May 27, 2003)

Optimizing Message Queuing Performance (March 28, 2003)

These and other documents are also referenced from the MSMQ product web site.

About the Test Application

Both the .NET and the C++ test applications can be started in different modes to test different scenarios. You can configure their behavior by passing several command-line arguments:

MsmqPerfTest.exe <TransactionMode> <Operation> <Queue> [<ResponseQueue>] 

Example:
MsmqPerfTest NOTX SENDANDPURGE DIRECT=OS:localhost\private$\myQueue

Parameters

TransactionMode

NOTX: No transactions are used.

TX: Built-in MSMQ transactions for every send and receive operation. Note: In this case, the specified queues must be transactional!

Operation

SENDANDPURGE: Send messages to a destination queue, measure the time to send all messages and purge the queue after each test.

SENDANDRECEIVE: Send messages to a destination queue, retrieve all messages afterward, and measure the receive-time.

SERVER: Start as a server that immediately processes messages as they arrive. The server will send only one message after each test batch (which can consist of several thousand messages) back to the client. The server has to be started first in a CLIENT/SERVER combination, as it will purge its queues before waiting for test messages.

CLIENT: Send messages to a defined queue and wait for a response after all messages in a batch have been processed.

Queue/ResponseQueue

A valid format name for an existing queue (for example, DIRECT=OS:localhost\private$\myQueue). If you specify TX as the TransactionMode parameter, this has to be a transactional queue. If you specify NOTX, this has to be a non-transactional queue.

Note: For CLIENT and SERVER operation modes, the first queue name on the command line refers to the server's queue. The second is the client's response queue. This parameter sequence is the same for client and server.

The following two command lines will start a matching pair of receiver and sender.

MsmqPerfTest NOTX SERVER DIRECT=OS:localhost\private$\server DIRECT=OS:localhost\private$\client

MsmqPerfTest NOTX CLIENT DIRECT=OS:localhost\private$\server DIRECT=OS:localhost\private$\client
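The command-line syntax described in this section can be summarized as a small parser. The following sketch is illustrative only and is not taken from the test application; the type and member names are hypothetical. It also assumes that keywords are matched case-insensitively and that the response queue is mandatory for SERVER and CLIENT, which the syntax line implies but does not state outright.

```csharp
using System;

enum TransactionMode { NoTx, Tx }
enum OperationKind { SendAndPurge, SendAndReceive, Server, Client }

// Hypothetical holder for the parsed command line.
sealed class TestArguments
{
    public TransactionMode Mode;
    public OperationKind Operation;
    public string Queue;
    public string ResponseQueue; // null when no fourth argument was passed

    public static TestArguments Parse(string[] args)
    {
        if (args.Length < 3 || args.Length > 4)
            throw new ArgumentException(
                "Expected: <TransactionMode> <Operation> <Queue> [<ResponseQueue>]");

        var result = new TestArguments();

        switch (args[0].ToUpperInvariant())
        {
            case "NOTX": result.Mode = TransactionMode.NoTx; break;
            case "TX": result.Mode = TransactionMode.Tx; break;
            default: throw new ArgumentException("Unknown transaction mode: " + args[0]);
        }

        switch (args[1].ToUpperInvariant())
        {
            case "SENDANDPURGE": result.Operation = OperationKind.SendAndPurge; break;
            case "SENDANDRECEIVE": result.Operation = OperationKind.SendAndReceive; break;
            case "SERVER": result.Operation = OperationKind.Server; break;
            case "CLIENT": result.Operation = OperationKind.Client; break;
            default: throw new ArgumentException("Unknown operation: " + args[1]);
        }

        // Format names are passed through unchanged, for example
        // DIRECT=OS:localhost\private$\server
        result.Queue = args[2];
        result.ResponseQueue = args.Length == 4 ? args[3] : null;

        // Assumption: SERVER and CLIENT cannot run without a response queue.
        bool needsResponse = result.Operation == OperationKind.Server
                          || result.Operation == OperationKind.Client;
        if (needsResponse && result.ResponseQueue == null)
            throw new ArgumentException("SERVER and CLIENT require a response queue.");

        return result;
    }
}
```

Because the parameter sequence is the same for client and server, the same parser serves both command lines shown above; only the Operation value differs.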

Appendix A – Raw Test Results

Send Empty Message

Mode           Local/Remote  .NET Msg/Sec  COM Msg/Sec  .NET StdDev  COM StdDev
Express        Local         49,311        48,674       761.23       430.25
Express        Remote        25,257        24,864       199.42       93.04
Recoverable    Local         2,455         2,457        11.21        11.76
Recoverable    Remote        2,506         2,500        10.89        7.38
Transactional  Local         1,683         1,724        37.65        36.59
Transactional  Remote        1,662         1,739        19.84        23.91
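This appendix does not restate how the Msg/Sec and StdDev columns were derived. Assuming that each row summarizes several repeated test runs, and that StdDev is the sample standard deviation of the per-run throughput (both are assumptions, as are all names below), the two figures could be computed along the following lines.

```csharp
using System;

static class ThroughputStats
{
    // Throughput of one test run: messages processed divided by elapsed seconds.
    public static double MessagesPerSecond(int messageCount, TimeSpan elapsed)
    {
        return messageCount / elapsed.TotalSeconds;
    }

    // Arithmetic mean of the per-run throughput values.
    public static double Mean(double[] values)
    {
        double sum = 0.0;
        foreach (double v in values) sum += v;
        return sum / values.Length;
    }

    // Sample standard deviation (divides by n - 1).
    // Assumption: the paper may have used the population form instead.
    public static double SampleStdDev(double[] values)
    {
        if (values.Length < 2) return 0.0;

        double mean = Mean(values);
        double sumOfSquares = 0.0;
        foreach (double v in values)
        {
            sumOfSquares += (v - mean) * (v - mean);
        }
        return Math.Sqrt(sumOfSquares / (values.Length - 1));
    }
}
```

A small standard deviation relative to the mean indicates that the repeated runs agreed closely. A large one suggests that the row should be read with more caution.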

Receive Empty Message

Mode           Local/Remote  .NET Msg/Sec  COM Msg/Sec  .NET StdDev  COM StdDev
Express        Local         37,351        25,708       75.14        478.76
Express        Remote        2,095         2,074        18.09        13.54
Recoverable    Local         2,955         2,796        36.59        38.49
Recoverable    Remote        2,104         2,092        14.42        28.18
Transactional  Local         2,006         1,942        11.16        26.04

Complete Processing of Empty Messages

Mode           Local/Remote  .NET Msg/Sec  COM Msg/Sec  .NET StdDev  COM StdDev
Express        Local         24,209        18,890       28.57        15.88
Express        Remote        11,184        11,026       94.85        33.82
Recoverable    Local         1,953         2,039        272.5        354.8
Recoverable    Remote        1,645         1,629        12.62        6.83
Transactional  Local         1,414         1,522        18.37        34.87
Transactional  Remote        1,228         1,242        9.91         15.32

Express Send (Local)

Size       .NET Msg/Sec  COM Msg/Sec  .NET StdDev  COM StdDev
500        19,814        28,572       196.96       1311.04
1,000      15,117        23,476       294.62       1098.6
2,000      10,722        15,345       229.27       1167.3
10,000     3,083         4,500        79.62        236.12
100,000    276           341          4.36         30.18
1,000,000  28.83         30.33        0.75         2.94
2,000,000  14.17         14.50        0.41         1.64

Express Receive (Local)

Size       .NET Msg/Sec  COM Msg/Sec  .NET StdDev  COM StdDev
500        24,137        20,237       65.85        72.03
1,000      14,595        18,318       41.36        81.24
2,000      12,930        11,221       50.31        57.77
10,000     6,611         6,750        34.21        28.17
100,000    696           643          6.63         2.35
1,000,000  86.33         49.17        0.52         0.41
2,000,000  42.50         24.00        0.55         0

Complete Processing of Express Messages (Remote)

Size       .NET Msg/Sec  COM Msg/Sec  .NET StdDev  COM StdDev
500        8,190         7,769        189.1        130.69
1,000      6,853         4,702        61.34        53.41
2,000      4,549         2,569        71.86        10.01
10,000     1,045         529          13.75        40.49
100,000    105           53           1.83         0.75
1,000,000  9.83          5.00         0.41         0
2,000,000  4.67          2.00         0.52         0

IPersistStream vs. .NET Serialization (Send)

Mode           Local/Remote  .NET Msg/Sec  COM Msg/Sec  .NET Optimized Msg/Sec  .NET StdDev  COM StdDev  .NET Optimized StdDev
Express        Local         1,718         6,188        13,305                  5.34         331.93      685.93
Express        Remote        1,441         4,523        9,129                   3.29         44.34       247.91
Recoverable    Local         910           1,490        2,236                   26.65        57.45       43.44
Recoverable    Remote        948           1,333        1,973                   3.89         46.42       16.59
Transactional  Local         791           1,261        1,660                   18.1         13.26       81.88
Transactional  Remote        603           1,114        954                     15.81        295.88      29.2

IPersistStream vs. .NET Serialization (Receive)

Mode           Local/Remote  .NET Msg/Sec  COM Msg/Sec  .NET Optimized Msg/Sec  .NET StdDev  COM StdDev  .NET Optimized StdDev
Express        Local         1,870         3,049        10,441                  1.9          10.59       16.65
Express        Remote        509           511          672                     2.34         2.34        1.64
Recoverable    Local         1,096         1,373        2,042                   38.58        27.65       118.32
Recoverable    Remote        505           528          690                     4.18         2.88        1.83
Transactional  Local         922           1,203        1,604                   19.77        5.68        71.65

.NET Formatter Comparison (Enqueuing)

Mode           Local/Remote  .NET Binary Msg/Sec  .NET XML Msg/Sec  .NET Binary StdDev  .NET XML StdDev
Express        Local         1,718                1,344             5.34                15.38
Express        Remote        1,441                987               3.29                4.22
Recoverable    Local         910                  573               26.65               7.33
Recoverable    Remote        948                  682               3.89                5.59
Transactional  Local         791                  493               18.1                5.68
Transactional  Remote        603                  495               15.81               13.99

.NET Formatter Comparison (Dequeuing)

Mode           Local/Remote  .NET Binary Msg/Sec  .NET XML Msg/Sec  .NET Binary StdDev  .NET XML StdDev
Express        Local         1,870                791               1.9                 1.48
Express        Remote        509                  282               2.34                1.14
Recoverable    Local         1,096                594               38.58               7.19
Recoverable    Remote        505                  284               4.18                0.71
Transactional  Local         922                  533               19.77               6.44
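To compare the two formatters at a glance, the throughput figures from the enqueuing and dequeuing tables can be condensed into one ratio per row (binary messages per second divided by XML messages per second). The following sketch does that arithmetic on the values copied from the two tables above. The class and method names are hypothetical, and the ratio is only a convenience measure that is not part of the original test results.

```csharp
using System;

static class FormatterSpeedup
{
    // How many times higher the binary formatter's throughput is than the XML formatter's.
    public static double Ratio(double binaryMsgPerSec, double xmlMsgPerSec)
    {
        return binaryMsgPerSec / xmlMsgPerSec;
    }

    public static void PrintAll()
    {
        // Rows copied from the two formatter comparison tables.
        string[] labels =
        {
            "Enqueue Express Local", "Enqueue Express Remote",
            "Enqueue Recoverable Local", "Enqueue Recoverable Remote",
            "Enqueue Transactional Local", "Enqueue Transactional Remote",
            "Dequeue Express Local", "Dequeue Express Remote",
            "Dequeue Recoverable Local", "Dequeue Recoverable Remote",
            "Dequeue Transactional Local"
        };
        double[] binary = { 1718, 1441, 910, 948, 791, 603, 1870, 509, 1096, 505, 922 };
        double[] xml    = { 1344,  987, 573, 682, 493, 495,  791, 282,  594, 284, 533 };

        for (int i = 0; i < labels.Length; i++)
        {
            Console.WriteLine("{0,-30} {1:F2}x", labels[i], Ratio(binary[i], xml[i]));
        }
    }
}
```

In every row of both tables the binary formatter's throughput is higher than the XML formatter's, so each ratio is greater than 1.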