Resource Management in MSMQ Applications

 

Chris Manderson, Travis Stanfield
Microsoft Corporation

February 2003

Applies to:
     Message Queuing 1.0
     Message Queuing 2.0
     Message Queuing 3.0

Summary:   Learn how to manage resources in your Microsoft Message Queuing (MSMQ) applications by avoiding resource bottlenecks in memory, disk and network as well as abiding by MSMQ and system limitations. (6 printed pages)

Contents

Introduction
Symptoms and Errors
System and MSMQ Limitations
Troubleshooting
Summary

Introduction

How an application will manage resources in its environment is a critical consideration when developing software. MSMQ applications require resources such as memory (RAM), disk, and network bandwidth in most cases. These resources, when consumed and recycled properly, will keep your applications and services running in top condition, keep them highly available, and allow your system to better handle higher load spikes during busy peak times. Those times are when all of your preparation in design will pay off.

Symptoms and Errors

When system resources that are in use by the MSMQ service or MSMQ applications are exhausted or are reaching their limits, warning signs may be raised, or message processing may even be halted. Some of these errors or indications will be received by your application, some will be sent to the event viewer, and others can only be identified through further investigation with diagnostic tools.

An error that may be returned to your application or to the MSMQ GUI tools such as the MQExplorer or Computer Management could be:

There are insufficient resources to perform this operation. Error 0xc00e0027

Note In Windows 2000, a known reason for the C00E0027 error, not directly related to resource usage, is a mismatch of the MSMQ DLLs and the MSMQ Access Control Driver mqac.sys. This is caused by uninstalling and reinstalling MSMQ after applying Security Rollup Package 1 (SRP1). In this case, the Access Control Driver does not get rolled back to the previous Service Pack level.
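Applications that reach MSMQ through COM or a scripting layer often see this error as a signed 32-bit integer rather than in the hexadecimal form shown above. The following is a minimal sketch, not part of MSMQ, that normalizes either form so it can be compared against 0xC00E0027. The helper names are hypothetical.

```python
# Error code quoted in this article for "insufficient resources".
INSUFFICIENT_RESOURCES = 0xC00E0027


def normalize_error_code(code: int) -> int:
    """Return the unsigned 32-bit form of an error code.

    Signed values (as some wrappers report them) and unsigned values
    both map to the same result.
    """
    return code & 0xFFFFFFFF


def is_insufficient_resources(code: int) -> bool:
    """True when the code matches the error discussed above (hypothetical helper)."""
    return normalize_error_code(code) == INSUFFICIENT_RESOURCES


# The same error expressed as a signed 32-bit integer.
print(hex(normalize_error_code(-1072824281)))
```

Comparing the normalized value avoids missing the error simply because the calling layer reports it with a different sign convention.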

Warnings that may appear in your Application event log that indicate a resource issue may be:

Event Id: 2070

This computer is low on memory. The Message Queuing Service is switching to "low memory" mode of operation.

This warning may be followed by the informational event:

Event Id: 2071

This computer has sufficient free memory. The Message Queuing Service is returning to normal mode of operation.

Note These event log messages will only be seen on an MSMQ 1.0 system running on the NT 4.0 platform. In Windows 2000, these event log warnings are not raised, and diagnostic tools are required to identify the condition. See Q264936 on the Microsoft support website for more information.

System and MSMQ Limitations

There are several reasons, directly related to your system's capabilities or the capabilities of MSMQ, that can cause an insufficient resources problem to occur. These are: Message Size, Message Storage Capacity, Threading, and Paged and Non-paged Memory.

Message Size

MSMQ supports only messages under 4 MB in size. This includes Queued Components messages. Any attempt to send a message through the system that is larger than this will raise the insufficient resources error. Be aware that Unicode data takes up twice as much space as non-Unicode data, because two bytes are needed for each character.
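The effect of Unicode on message size can be checked before sending. The sketch below is an illustration only: it assumes the limit is 4 * 1024 * 1024 bytes and applies the whole budget to the message body, whereas a real message also carries properties. The function names are hypothetical.

```python
# Assumption: "4 MB" is taken as 4 * 1024 * 1024 bytes and applied to the
# body alone. Leave headroom in practice for message properties.
MAX_MESSAGE_BYTES = 4 * 1024 * 1024


def body_size_bytes(text: str, as_unicode: bool = True) -> int:
    """Size of a text body in bytes.

    As Unicode (UTF-16) each character in the basic plane takes two bytes;
    in a single-byte encoding each character takes one byte.
    """
    encoding = "utf-16-le" if as_unicode else "latin-1"
    return len(text.encode(encoding, errors="replace"))


def fits_in_one_message(text: str, as_unicode: bool = True) -> bool:
    """Messages must be under the limit, so the comparison is strict."""
    return body_size_bytes(text, as_unicode) < MAX_MESSAGE_BYTES


body = "x" * (3 * 1024 * 1024)                 # three mebi-characters
fits_single_byte = fits_in_one_message(body, as_unicode=False)
fits_unicode = fits_in_one_message(body, as_unicode=True)
print(fits_single_byte, fits_unicode)
```

In this example the same text fits when stored one byte per character but not when stored as Unicode, which is the doubling described above.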

Message Storage Capacity

For MSMQ 1.0 and MSMQ 2.0, the combined size of messages capable of being stored on one machine is not limited to the amount of RAM in the machine or the size of the hard disk, but to the amount of virtual address space provided to the MSMQ service by the operating system (this limitation has been lifted in MSMQ 3.0). Each process in an x86 machine is allotted a virtual 4 GB of addressable memory. 2GB is reserved for use in kernel mode and 2GB for user mode. The MSMQ Queue Manager operates in user mode and therefore has an addressable 2GB of virtual address space to work with. Each message's data is stored in RAM, which is backed up by the system's paging file or memory mapped files. MSMQ uses memory mapped files to store both express and recoverable messages. Since we are limited to 2GB of addressable memory, we are limited to 2GB worth of messages on a disk. When you take into account the memory utilized by MSMQ code and its internal data structures, as well as file allocation to store message files on disk, we end up with between 1.4GB and 1.6GB worth of messages that can be stored on disk.

Note This limitation of 1.6GB can be raised to approximately 2.6GB by enabling 3GB tuning on the MSMQ Service. See Q171793 for more information on how to enable 3GB tuning.
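The figures above follow from simple arithmetic, sketched here. The sketch assumes that the overhead for MSMQ code, internal data structures, and file allocation stays at roughly 0.4GB to 0.6GB when the user-mode address space grows; that range is implied by the numbers in the text rather than stated by it, and the names are hypothetical.

```python
from typing import Tuple

# Implied by the text: 2GB of user-mode address space yields 1.4GB to 1.6GB
# of storable messages, so the overhead is roughly 0.4GB to 0.6GB.
OVERHEAD_RANGE_GB = (0.4, 0.6)


def storable_range_gb(user_address_space_gb: float) -> Tuple[float, float]:
    """Approximate (low, high) message capacity for a given address space."""
    low = user_address_space_gb - OVERHEAD_RANGE_GB[1]
    high = user_address_space_gb - OVERHEAD_RANGE_GB[0]
    return (round(low, 1), round(high, 1))


default_capacity = storable_range_gb(2.0)   # standard 2GB user mode
tuned_capacity = storable_range_gb(3.0)     # with 3GB tuning enabled
print(default_capacity, tuned_capacity)
```

The upper bound for the tuned case matches the approximately 2.6GB quoted in the note.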

Threading

A common technique for having an application receive notification of events that happen locally or remotely is to use asynchronous callbacks. This technique works well for MSMQ application developers because they can subscribe to an event, go on with other work, and be notified asynchronously when the event occurs (for example, when a message arrives). Calling MQReceiveMessage() with callbacks has a notable limitation, however: only 63 callbacks can be outstanding in any one process. This limitation stems entirely from how MSMQ has been designed to implement callbacks. Under the covers, only one thread in an application process calls the WaitForMultipleObjects API, and this lone thread is responsible for waking up when any one of the 63 events is fired. Only one event is used internally by MSMQ at any one time. This also means that callbacks in a process are serialized. If an application makes a 64th call to MQReceiveMessage() with a callback while the other 63 callbacks are still waiting to be signaled, the 64th call returns an MQ_ERROR_INSUFFICIENT_RESOURCES error.
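One way to stay under this limit is for the application to count its own outstanding callbacks and refuse to issue another receive when the budget is spent. The following is a minimal sketch of that idea. The class is hypothetical application-side bookkeeping and does not call any MSMQ API.

```python
import threading

# From the text: at most 63 callbacks can be outstanding in one process.
MAX_PENDING_CALLBACKS = 63


class CallbackBudget:
    """Hypothetical application-side counter; not an MSMQ API."""

    def __init__(self, limit: int = MAX_PENDING_CALLBACKS) -> None:
        self._limit = limit
        self._pending = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Reserve a slot before issuing a receive with a callback."""
        with self._lock:
            if self._pending >= self._limit:
                return False
            self._pending += 1
            return True

    def release(self) -> None:
        """Call from the callback once the receive has completed."""
        with self._lock:
            if self._pending > 0:
                self._pending -= 1


budget = CallbackBudget()
granted = [budget.try_acquire() for _ in range(64)]
print(granted.count(True), granted[-1])
```

In this run the first 63 reservations succeed and the 64th is refused, so the application can queue or delay the request instead of receiving the error from MSMQ.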

Another common threading-based scenario is to get an MQ_ERROR_INSUFFICIENT_RESOURCES error when calling MQReceiveMessage() to read from a remote queue. When your application reads from a remote queue, a thread is created by the local MSMQ service that waits for completion of the remote read on the remote computer. The default threshold of threads created to handle these requests is based mainly on the version of the OS you are running. The limit for Windows NT4 Workstation is 16, NT4 Server is 64, Windows 2000 Professional is 24, Windows 2000 Server is 96 and there is no limit on Windows XP Professional and Windows 2003 Server family editions. You can change these limits by adding the registry DWORD values "MaxRRThreads" and "MinRRThreads" to HKLM\software\microsoft\msmq\parameters and setting them to the decimal values of your choice. Note that the MinRRThreads registry entry is not available on MSMQ 1.0 systems. For more information on these registry keys, check the Registry Section of the Win2K Resource Kit. Please note that in MSMQ 1.0 these threads are created on demand and are never cleaned up. So if you set this number to 1000 and the service indeed creates 1000 threads, all these threads will live as long as the MSMQ service runs. This problem was fixed in MSMQ 2.0.
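The default limits listed above can be captured in a small lookup table for capacity planning. This is a sketch under the assumption that the number of concurrent remote reads maps one-to-one to service threads, as the text describes. The table and function names are hypothetical, and the values are only those quoted in this article.

```python
from typing import Dict, Optional

# Default remote-read thread limits quoted in the text.
# None means the text states that there is no limit.
DEFAULT_REMOTE_READ_THREADS: Dict[str, Optional[int]] = {
    "Windows NT4 Workstation": 16,
    "Windows NT4 Server": 64,
    "Windows 2000 Professional": 24,
    "Windows 2000 Server": 96,
    "Windows XP Professional": None,
    "Windows 2003 Server": None,
}


def exceeds_default_limit(os_name: str, concurrent_remote_reads: int) -> bool:
    """True when the planned load is above the default for that OS."""
    limit = DEFAULT_REMOTE_READ_THREADS[os_name]
    return limit is not None and concurrent_remote_reads > limit


print(exceeds_default_limit("Windows 2000 Professional", 30))
print(exceeds_default_limit("Windows 2000 Server", 30))
```

A result of True suggests either reducing the number of simultaneous remote reads or raising MaxRRThreads as described above.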

Paged and Non-paged Memory

The Windows Memory Manager creates two types of dynamically sized memory pools that kernel mode software can use to allocate kernel/system memory. Conceptually, these pools can be thought of as kernel/system heaps. Non-paged memory is roughly similar to physical memory because it is guaranteed to actually be in memory before it is accessed. Access to non-paged memory will never page fault.

Paged pool memory, on the other hand, can be paged to disk. Because we are talking about system memory, these pools are mapped to the 2GB kernel mode virtual address space allocated to every process. There are routines such as ExAllocatePool which allocate and de-allocate from these pools and are documented in the Microsoft Driver Development Kit (DDK).

The maximum size of both pools is determined by the operating system at boot-up time. See Q126402 and Q312362 for more details. Drivers that allocate memory can specify a four-letter tag in their requests. When pool tracking is enabled, this tag is associated with each memory allocation and can be analyzed with diagnostic tools such as poolmon.exe to determine whether there is a leak (note that the tags for MSMQ memory are MQAC, MQQM, and MQXA starting in Windows 2000). As is the case in Q264936, exhausting the paged pool has a critical effect on MSMQ. It is therefore necessary to understand how MSMQ uses kernel memory, because you can have millions of resident messages without reaching the 1.4GB to 1.6GB storage limit and still exhaust the paged pool. Each message consumes approximately 70-80 bytes of paged pool memory on average. To determine whether you have reached this limit, or are approaching it, run Performance Monitor and look at the MSMQ Service object's Total Messages in All Queues counter.
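The per-message figure makes it possible to estimate paged pool consumption from the Total Messages in All Queues counter. The sketch below is a rough estimate only, using the approximate 70-80 bytes per message given above. The function name is hypothetical.

```python
from typing import Tuple

# Approximate paged pool cost per resident message, from the text.
BYTES_PER_MESSAGE_RANGE = (70, 80)


def paged_pool_estimate_kb(message_count: int) -> Tuple[float, float]:
    """Estimated (low, high) paged pool usage in KB for resident messages."""
    low, high = BYTES_PER_MESSAGE_RANGE
    return (message_count * low / 1024, message_count * high / 1024)


# Three million small messages occupy little disk space but a large
# amount of paged pool.
low_kb, high_kb = paged_pool_estimate_kb(3_000_000)
print(round(low_kb), round(high_kb))
```

Comparing this estimate with the paged pool limit calculated for your system shows how close a given message backlog brings you to exhausting the pool.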

Troubleshooting

Troubleshooting MSMQ resource issues can either be a quick win, where an obvious detail was overlooked in application design and a system or MSMQ limitation was reached, or a tedious process of watch and wait. Diagnosing whether you are sending messages over 4MB or reaching the 1.6GB ceiling of MSMQ storage capacity is fairly straightforward. Determining whether you have a threading issue may be a bit more complicated, and deciphering paged pool output can be tedious. We will take this opportunity to explain some of the tools and techniques that have helped support professionals diagnose and resolve these issues in the past.

Message Capacities

If you believe that you have reached the message size limit, consider breaking your message into multiple parts. MSMQ was designed to be a lightweight transport for data. Anything more than 4MB in size would be considered an atypically large amount of data for a single transaction.
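Breaking a payload into parts can be sketched as follows. This is an illustration under stated assumptions: the 3MB part size is an arbitrary value chosen to leave room for message properties, and the function names are hypothetical. How the parts are labeled and ordered on the wire is left to the application.

```python
from typing import List

# Assumption: keep each part comfortably below the 4MB ceiling.
# The 3MB figure is illustrative only.
PART_SIZE_BYTES = 3 * 1024 * 1024


def split_payload(payload: bytes, part_size: int = PART_SIZE_BYTES) -> List[bytes]:
    """Split a payload into consecutive parts of at most part_size bytes."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    parts = [payload[i:i + part_size] for i in range(0, len(payload), part_size)]
    return parts or [b""]


def join_payload(parts: List[bytes]) -> bytes:
    """Reassemble parts that were received in their original order."""
    return b"".join(parts)


payload = bytes(10 * 1024 * 1024)          # a 10MB payload
parts = split_payload(payload)
print(len(parts), [len(p) for p in parts])
```

The receiving application must collect all parts and reassemble them in order before processing, so each part needs some application-defined sequence information.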

If you navigate to the MSMQ storage directory on your machine (typically winnt\system32\msmq\storage), and the combined size of this folder is between 1.4GB and 1.6GB, and the service fails to start, then there is a good chance that you have reached the storage capacity of MSMQ. At this point, two things need to be done. First, recover, so that the MSMQ service can process these messages. Then, take preemptive measures to ensure that this ceiling is not breached a second time. The MSMQ product development team, in conjunction with Microsoft Product Support Services, is working to build tools to make recovery easier for people who arrive at the MSMQ message capacity limit. These tools will make it possible for users to remove selected messages, de-fragment message files, and replay messages in offline files.
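Checking the combined size of the storage folder can be automated. The following is a minimal sketch that totals the files under a directory and compares the result with the 1.4GB lower bound quoted above. The function names are hypothetical, and the demonstration uses a temporary directory rather than a real MSMQ storage folder.

```python
import os
import tempfile

GB = 1024 ** 3
# Lower bound of the capacity range quoted in the text.
WARNING_THRESHOLD_BYTES = int(1.4 * GB)


def directory_size_bytes(path: str) -> int:
    """Total size of all files under path, skipping files that vanish."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file removed or inaccessible while scanning
    return total


def near_storage_ceiling(path: str) -> bool:
    """True when the folder has reached the lower capacity bound."""
    return directory_size_bytes(path) >= WARNING_THRESHOLD_BYTES


# Demonstration against a temporary folder holding one small file.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "sample.mq"), "wb") as handle:
        handle.write(b"\0" * 2048)
    demo_size = directory_size_bytes(demo_dir)
    demo_near = near_storage_ceiling(demo_dir)
print(demo_size, demo_near)
```

Run periodically against the storage directory, a check like this gives early warning before the service reaches the point where it fails to start.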

Recovery

To recover at this point, you will require the assistance of Microsoft Product Support Services and they will work with you to recover this data. There is a quick way to recover to get your system back up and running, by removing all of the files within MSMQ's storage directory that end in .mq. These files contain all of your messages. If you move these files out while the service is down and then restart the service, your service will come back up and be fully functional but all of your queues will be empty. This approach is not recommended when you are sending transactional messages because transactional messages must be sent in sequence. Once a newer message has been sent, older messages are not allowed to be sent. Only move out all message files if absolutely necessary to get the service back up ASAP.

For more information on the files located in the MSMQ storage directory, see the Message Storage section of the Message Queuing documentation contained in the Windows 2000 Server Help files.

Avoid message capacity thresholds

The best way to avoid getting into this situation is to implement quotas in your message queuing architecture. This is a two step process.

Step 1. Set a quota. This process is similar for all versions of MSMQ. The quota options can typically be found in the properties of a queue, where you can set the desired quota size for that queue. Once the quota for either the queue or the computer has been reached, MSMQ will no longer accept messages into that queue or computer. For more information on setting quotas, see the Message Storage section of the Message Queuing documentation contained in the Windows 2000 Server Help files. For MSMQ 2.0 in workgroup mode, you can set the quota by editing the MachineQuota value under HKLM\Software\Microsoft\MSMQ\Parameters\MachineCache. Set this value to the combined size of all messages, in kilobytes.

Step 2. Request and acknowledgement. Quotas will keep your applications from flooding the MSMQ service but they do not help your application to be more flexible when these quotas are reached. To do this, you can request an exceeded quota acknowledgement from the machine you are sending your messages to. If this acknowledgement, when returned to your application, indicates that the quota for this queue or machine has been reached, then your application can either cease sending messages or offload the messages to another destination. This is an excellent way to scale out MSMQ. For more information on these acknowledgements, see Message Queuing in the Platform SDK under the Message Queuing and Queue Components (MSMQ) node in the MSDN Library.

Note the difference between machine quotas and queue quotas. When a machine quota is reached, the destination machine will not accept any further incoming messages; these messages will begin to accumulate in the sending machine's outgoing queue or on intermediate routing servers. To troubleshoot this issue, you should acquire a network monitor capture of the MSMQ traffic and look at the MSMQ session establishment packets or MSMQ session acknowledgement packets. If the window size is 1, then the machine quota has been reached.

On the other hand, when a queue quota is reached, the destination machine simply discards the message. It is therefore very important to always request the proper quota negative acknowledgement when using queue quotas on the destination machine. This "nack" will only be sent from the destination machine when the quota has been reached.
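The request-and-acknowledgement approach can be sketched as a small routing decision in the sending application. In this sketch, MQMSG_CLASS_NACK_Q_EXCEED_QUOTA is the name of the MSMQ acknowledgment class for an exceeded queue quota. Everything else, including the class, the queue names, and the idea of receiving the acknowledgment class as a string, is a hypothetical simplification and not an MSMQ API.

```python
# Name of the acknowledgment class for an exceeded queue quota.
NACK_QUOTA_EXCEEDED = "MQMSG_CLASS_NACK_Q_EXCEED_QUOTA"


class QuotaAwareSender:
    """Hypothetical sender that offloads to a fallback destination."""

    def __init__(self, primary: str, fallback: str) -> None:
        self.primary = primary
        self.fallback = fallback
        self.active = primary

    def on_acknowledgment(self, ack_class: str) -> str:
        """Switch destinations when the primary reports a full quota.

        Returns the destination that subsequent messages should use.
        """
        if ack_class == NACK_QUOTA_EXCEEDED and self.active == self.primary:
            self.active = self.fallback
        return self.active


# Queue path names below are placeholders for illustration.
sender = QuotaAwareSender(r"server1\orders", r"server2\orders")
first = sender.on_acknowledgment("some other acknowledgment")
second = sender.on_acknowledgment(NACK_QUOTA_EXCEEDED)
print(first, second)
```

A real application would read acknowledgment messages from its administration queue and would also need a policy for returning to the primary destination once it has drained.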

Paged and Non-paged Memory

Option 1. Run Talking Message Queue (TMQ) – State.

TMQ State is currently available only through Microsoft Product Support Services (PSS). At a command prompt, run tmq state. This will create a log file named tmqstate.log. Within this log file you will see memory usage information about the local machine, which will look similar to this:

Memory usage summary:
   Physical Memory (K)
      Total         3669532
      Available     2237384
      %             60
   Pools limitations (calculated approximately, in KB)
      Paged    : limit 307200,    used for 83 %
      Nonpaged : limit 262144,    used for 25 %

Note that this example shows the OS calculated the Paged Pool limit to be around 307200 kilobytes. Also note that we are over 80% used and thus we have slipped into MSMQ low memory mode. This information verifies that the insufficient resources problem is most likely related to exhausted paged pool memory.
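Reading these figures out of the log can be scripted. The sketch below assumes that the pool lines in tmqstate.log always have the layout shown in the sample above; that is an assumption based solely on this one example, and the function name is hypothetical. The 80% threshold is the low memory figure mentioned in the text.

```python
import re
from typing import Dict, Tuple

# Matches lines such as "Paged    : limit 307200,    used for 83 %".
POOL_LINE = re.compile(
    r"^\s*(Paged|Nonpaged)\s*:\s*limit\s+(\d+),\s*used for\s+(\d+)\s*%"
)
LOW_MEMORY_THRESHOLD_PERCENT = 80

SAMPLE_LOG = """\
   Pools limitations (calculated approximately, in KB)
      Paged    : limit 307200,    used for 83 %
      Nonpaged : limit 262144,    used for 25 %
"""


def parse_pools(text: str) -> Dict[str, Tuple[int, int]]:
    """Return {pool name: (limit in KB, percent used)} for matching lines."""
    pools = {}
    for line in text.splitlines():
        match = POOL_LINE.match(line)
        if match:
            pools[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return pools


pools = parse_pools(SAMPLE_LOG)
over_threshold = [
    name for name, (_limit_kb, used) in pools.items()
    if used > LOW_MEMORY_THRESHOLD_PERCENT
]
print(pools, over_threshold)
```

For the sample output, only the paged pool is reported as over the threshold, which agrees with the reading given above.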

Option 2. Enable pool tagging

Pool tagging can be enabled by running the Gflags.exe utility. This utility can be obtained by downloading the debuggers from https://www.microsoft.com/ddk/debugging. It is also available from the Windows 2000 Support Tools, Platform SDK, or DDK. Run Gflags.exe. Click the System Registry radio button and check the Enable pool tagging checkbox. Click Apply, and then reboot the system.

After the system reboots, run poolmon.exe -b. This will order the pool data by allocations. Since the system just rebooted, the paged pool should be clear and the insufficient resources problem should not be evident. Going forward, periodically capture poolmon.exe data as snapshots of the system memory. These snapshots can be used to see whether one or more tags are leaking memory. By running poolmon.exe -b, you will be able to determine which tags are the top memory consumers. When the problem recurs, run poolmon.exe -b again. As is the case with Q177415, this information can be used to determine whether there is truly a leak. If a leak is not apparent, the memory manager can be tuned to clean up unused pages before it reaches 80% usage. For more information on this, see Q312362.
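Comparing snapshots amounts to ranking tags by how much their usage grew between two captures. The following sketch shows that comparison on hand-entered numbers. The byte counts and the non-MSMQ tag are invented for illustration, the function name is hypothetical, and the sketch does not parse actual poolmon.exe output.

```python
from typing import Dict, List, Tuple

# Pool tags used by MSMQ, as named in this article.
MSMQ_TAGS = ("MQAC", "MQQM", "MQXA")


def growth_by_tag(before: Dict[str, int], after: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (tag, byte growth) pairs, largest growth first."""
    tags = set(before) | set(after)
    deltas = [(tag, after.get(tag, 0) - before.get(tag, 0)) for tag in tags]
    return sorted(deltas, key=lambda item: (-item[1], item[0]))


# Hypothetical byte counts copied by hand from two snapshots.
baseline = {"MQAC": 1_200_000, "MQQM": 300_000, "Abcd": 50_000}
later = {"MQAC": 9_800_000, "MQQM": 310_000, "Abcd": 50_000}

ranked = growth_by_tag(baseline, later)
msmq_growth = [(tag, delta) for tag, delta in ranked if tag in MSMQ_TAGS]
print(ranked[0], msmq_growth)
```

A tag that keeps rising across several snapshots without ever falling back is the pattern to look for. A single large increase may only reflect a temporary message backlog.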

Option 3. Obtain Performance Monitor data

Performance Monitor should be set up to log data throughout the course of the problem. All instance data for the Process and Memory objects and their counters should be obtained. Pay close attention to each process's handle count for signs of handle leaks. The Memory object will have the Pool Paged Bytes and Pool Paged Allocs counters available. Note that this information can help you observe a leak as it occurs; however, there are no counters for the paged pool limit. Refer to Option 1 for information on determining the limit. Before making the changes outlined in Q312362, you must understand what percentage of paged pool your system uses on average. Reducing PoolUsageMaximum to a percentage lower than your average usage can cause performance problems.
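The average usage percentage can be derived by combining the Pool Paged Bytes counter with the limit obtained in Option 1. The sketch below shows the arithmetic and a simple sanity check on a proposed PoolUsageMaximum value. The function names and the sample counter value are hypothetical.

```python
def paged_pool_usage_percent(pool_paged_bytes: int, paged_pool_limit_kb: int) -> float:
    """Percentage of the paged pool in use.

    pool_paged_bytes comes from the Memory object's Pool Paged Bytes
    counter; paged_pool_limit_kb comes from the limit found in Option 1.
    """
    return 100.0 * pool_paged_bytes / (paged_pool_limit_kb * 1024)


def is_reasonable_pool_usage_maximum(proposed_percent: int,
                                     average_usage_percent: float) -> bool:
    """A proposed PoolUsageMaximum should sit above average usage."""
    return proposed_percent > average_usage_percent


# Hypothetical average counter value against the 307200 KB limit
# from the sample output in Option 1.
average_percent = paged_pool_usage_percent(157_286_400, 307_200)
print(average_percent)
print(is_reasonable_pool_usage_maximum(60, average_percent))
print(is_reasonable_pool_usage_maximum(40, average_percent))
```

With these sample figures the system averages half of its paged pool, so a setting of 60 passes the check and a setting of 40 does not.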

Summary

In this paper, we have covered most of the common reasons why you would run into insufficient resources problems with MSMQ, and introduced you to several tools and techniques used by Microsoft support staff to aid in troubleshooting and avoiding these issues. In Windows XP and Windows 2003 Server, low resources tracing has been enhanced greatly to provide users with ample means of diagnosing their resource issues. An ounce of prevention and planning can make all the difference when dealing with system resources and education is key to resolving and avoiding these issues quickly and effectively.