Tune Your Microsoft Windows Media Server

Eduardo Oliveira with Gail McClellan
Microsoft Digital Media Division

December 2001

Summary: This document provides the results of performance evaluation tests conducted with servers running Microsoft Windows Media Services version 4.1. The data can help you better understand the way Windows Media servers use resources so you can create the best capacity plan for your environment. (20 printed pages)

Contents

Introduction
Bottlenecks
Performance Evaluation
Appendix A: Profiles
Appendix B: Keys
Appendix C: Complete Set of Data Collected from the Experiments
For More Information

Introduction

This document contains the results of performance tests conducted with servers running Microsoft® Windows Media™ Services version 4.1. All tests were conducted in a controlled lab environment. Although the data presented is valuable in evaluating server behavior, it was collected using a simplification of real-world scenarios. You should not treat the data as absolute numbers or apply them directly in a production environment. Instead, use the conclusions presented to fine-tune and optimize servers to achieve the best possible results for your individual situation.

Bottlenecks

The main bottlenecks that affect the installation of Windows Media Services are disk throughput, available bandwidth, CPU, and memory. CPU and memory issues are more easily identifiable than disk and bandwidth issues. In most cases, insufficient disk throughput is the main cause of errors on systems allocated primarily for on-demand streaming (especially in LAN environments). The lack of bandwidth comprises the main issue associated with an Internet installation.

Disk Throughput

This section discusses the issues associated with disk throughput and how to achieve the best performance. Two issues relate to disk performance:

  • Disks have better sustained throughput serving a few high bit rate streams than if they serve many low bit rate streams because less time is spent seeking the data on the physical disk.
  • The Windows Media server does not use the NTFS file caching mechanism.

In relation to the first issue, tests indicate that disks can source approximately 650 22-kilobits per second (Kbps) streams (approximately 14 megabits per second (Mbps)) or 45 700-Kbps streams (approximately 31 Mbps). Adding more clients to source from the same disk results in late reads, which you can detect by looking at the Late Reads counter of the Windows Media Unicast service.
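The Mbps totals quoted above follow from multiplying the number of streams by the per-stream bit rate. The following minimal Python sketch reproduces them. The function name is illustrative, the per-stream rates are taken as kilobits per second (which is what the quoted totals imply), and 1 Mbps is assumed to equal 1,000 Kbps.

```python
def aggregate_mbps(streams: int, bit_rate_kbps: float) -> float:
    """Total throughput in Mbps for a number of identical streams.

    Assumes 1 Mbps = 1,000 Kbps.
    """
    return streams * bit_rate_kbps / 1000.0


# Figures from the disk tests: many low bit rate streams versus
# a few high bit rate streams served from the same disk.
low = aggregate_mbps(650, 22)
high = aggregate_mbps(45, 700)
print(f"650 x 22 Kbps  = {low:.1f} Mbps")
print(f"45 x 700 Kbps  = {high:.1f} Mbps")
```

The same disk sustains roughly twice the aggregate throughput when it serves the smaller number of high bit rate streams.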

The second issue related to disk performance is important because it implies that each file is read once each time a client streams it. Windows Media Services does not use the NTFS file-caching mechanism because the caching of large content files can cause disk paging in some situations. Disk paging is detrimental to server performance and should be avoided. If a particular system installation requires caching or can benefit from caching, you should use disk controller caching mechanisms. Both SCSI and Fiber Channel vendors offer controller caching options that are beneficial for certain scenarios.

In the performance tests, the best server performance was achieved when the disks were used as separate volumes and a publishing point was created on each one.

Note   Using RAID 0 or RAID 5 arrays with the test machine would not have been a problem, because the disk controller cache (53 megabytes (MB)) compensates well for the overhead imposed by the RAID setups.

When using multiple bit rate (MBR) content, it is important to remember that the amount of data that is read from the disk is the sum of all encoded bandwidths, not just the stream that is currently being delivered to the user. The result is that MBR is much more disk-intensive.
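As a rough illustration of why MBR content is more disk-intensive, the sketch below compares the disk read rate for an MBR file with that of a single bit rate file. The encoded bit rates and the client count are hypothetical values chosen for the example, not figures from the tests, and the helper name is illustrative.

```python
def disk_read_kbps(encoded_rates_kbps, clients):
    """Disk read rate in Kbps when each client causes every
    encoded bit rate in the file to be read from disk."""
    return sum(encoded_rates_kbps) * clients


# Hypothetical MBR file encoded at three bit rates (not from the tests).
mbr_rates = [22, 100, 300]
clients = 100

mbr_load = disk_read_kbps(mbr_rates, clients)
# The same clients streaming a single 300 Kbps file.
sbr_load = disk_read_kbps([300], clients)

print(f"MBR disk read rate: {mbr_load} Kbps")
print(f"SBR disk read rate: {sbr_load} Kbps")
```

In this example every client receives at most 300 Kbps, yet the MBR file makes the disk read 422 Kbps per client.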

Additional suggestions for optimizing disk throughput are to place the operating system on a separate disk and to conduct performance tests with your disks. Use the Late Reads counter to observe any problems with the disk performance.

Bandwidth

Failures caused by insufficient bandwidth can generally be diagnosed by looking at the Stream Errors counter in the Windows Media Unicast service. You can find out the amount of aggregate bandwidth being used fairly easily by either multiplying the number of connected clients by the bit rate or by looking at the Allocated Bandwidth counter (this counter is displayed in bytes). Determining the amount of throughput that a particular network interface card (NIC) can provide is a more difficult task.

Previous tests found that most 100 Mbps Ethernet cards support approximately 60 to 70 Mbps of sustained outgoing data in a Windows Media environment. Results with gigabit NICs tend to vary much more widely. Limited tests with gigabit Ethernet cards produced results varying from 300 Mbps to 400 Mbps. In some cases, the PCI local bus speed prevented the card from sending more data.
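Given a sustained throughput figure for a NIC, the number of clients it can carry at a given bit rate can be estimated by simple division. The sketch below applies this to the ranges reported above. The function name is illustrative, 1 Mbps is assumed to equal 1,000 Kbps, and protocol overhead is ignored.

```python
import math


def max_clients_for_nic(sustained_mbps: float, stream_kbps: float) -> int:
    """Estimate how many streams fit in the sustained NIC throughput.

    Assumes 1 Mbps = 1,000 Kbps and ignores protocol overhead.
    """
    return math.floor(sustained_mbps * 1000 / stream_kbps)


# 100 Mbps Ethernet card sustaining 60 to 70 Mbps, 22 Kbps streams.
print(max_clients_for_nic(60, 22), "to", max_clients_for_nic(70, 22))

# Gigabit card sustaining 300 to 400 Mbps, 700 Kbps streams.
print(max_clients_for_nic(300, 700), "to", max_clients_for_nic(400, 700))
```

Estimates like these are only a ceiling imposed by the network; disk, CPU, and memory limits may be reached first.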

For the best performance results, make sure the NIC is set to full duplex mode.

CPU

Monitoring CPU consumption is an easy and common task. Knowing the recommended CPU limit for any particular installation is much more complicated. If the server CPU load does not reach 100 percent, it can appear that performance is fine. However, some operations are more CPU-intensive, and CPU usage can vary considerably even when no new clients are connecting. CPU usage depends primarily on the operations that users are performing, such as streaming, fast forwarding, seeking, and pausing.

The most CPU-intensive operation is a new client connection, which is why the number of allowed connections per second is limited. Increasing this value can cause problems for the clients already connected to the system, especially if a large number of new clients try to connect at the same time.

Note   See Appendix B: Keys for details on the MaxConnectionsPerSecond key.

It is recommended that the server use no more than 30 percent to 50 percent CPU and never go beyond the 60 percent mark while in a steady streaming state. This will ensure that enough CPU processing power is available in case new clients connect.
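The guideline above can be expressed as a simple check on a steady-state CPU reading. This is only a sketch: the function name and the status labels are illustrative, and how the reading is obtained (for example, from Performance Monitor) is left out.

```python
def steady_state_cpu_status(cpu_percent: float) -> str:
    """Classify a steady-state CPU reading against the guideline:
    stay at or below 50 percent, and never go beyond 60 percent."""
    if cpu_percent <= 50:
        return "within recommended range"
    if cpu_percent <= 60:
        return "above recommended range"
    return "over limit"


for reading in (27, 55, 75):
    print(f"{reading}% CPU: {steady_state_cpu_status(reading)}")
```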

The following sections discuss how a Windows Media server scales in relation to the number of CPUs on the system, the number of connected clients, the content bit rate, and whether the content is on-demand or live.

Memory

As with CPU usage, tracking memory availability is relatively easy. The Windows Media server does not use memory for file caching, so the memory requirements are quite low. The necessary memory is directly related to the number of connected clients, the file bit rate, and whether the transmission is live or on-demand.

It is always important to have some RAM available. Estimates of memory consumption are provided in the following sections. Because allocated memory does not vary with the computer configuration, it is one of the few parameter values that carry over satisfactorily from one system to another.

Performance Evaluation

This section provides a performance evaluation based on the following factors: protocols, bit rates, broadcast publishing points versus on-demand streams, number of processors, single bit rate versus multiple bit rate, and the FastSendDatagramThreshold key.

Protocols

Performance tests indicate that CPU consumption does not increase linearly when streaming on-demand content. The CPU usage for 4,000 MMSU clients is 27 percent; connecting an additional 500 clients increases the CPU load to 60 percent. The following diagram displays the CPU usage of the MMSU, MMST, and HTTP protocols.

Using the MMSU protocol enables 53 percent more clients to stream as compared to MMST or HTTP protocols at the same CPU load levels (4000 versus 2700 clients).

Processor time percentage for on-demand 22 kbps file

Figure 1.

The following diagram displays the memory (working set) used by each protocol in the same test.

Working set for on-demand 22 kbps file

Figure 2.

For the 22-kilobits per second (Kbps) test file, each MMSU client used approximately 103 KB of memory, HTTP used 63.5 KB, and MMST used 62.9 KB. Although the MMSU protocol consumes less CPU, it demands more memory (60 percent more than MMST).
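The per-client figures above can be turned into a working-set estimate for a given number of clients. The sketch below does this for each protocol and also computes how much more memory MMSU needs per client than MMST. The names are illustrative and 1 MB is assumed to equal 1,024 KB.

```python
# Per-client memory for the 22 Kbps on-demand test file, in KB.
PER_CLIENT_KB = {"MMSU": 103.0, "HTTP": 63.5, "MMST": 62.9}


def working_set_mb(protocol: str, clients: int) -> float:
    """Estimated working set in MB (assuming 1 MB = 1,024 KB)."""
    return PER_CLIENT_KB[protocol] * clients / 1024


for protocol in PER_CLIENT_KB:
    print(f"{protocol}: {working_set_mb(protocol, 4000):.1f} MB for 4,000 clients")

# Relative extra memory per MMSU client compared with MMST.
extra = PER_CLIENT_KB["MMSU"] / PER_CLIENT_KB["MMST"] - 1
print(f"MMSU uses about {extra:.0%} more memory per client than MMST")
```

The computed ratio comes out slightly above the rounded 60 percent quoted in the text.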

Bit rates

When making calculations to estimate the number of clients a Windows Media server can support, you should have an idea of how the server scales in relation to different bit rates. The server was tested streaming 22 Kbps and 220 Kbps files. Five hundred clients were connected at a time using the 22 Kbps file, and 50 clients were connected at a time using the 220 Kbps file. The total throughputs are equivalent, but the number of clients and the file bit rate are different.

The following diagram shows that 500 22-Kbps streams consume much more CPU than 50 220-Kbps streams. So, CPU usage does not increase linearly with bit rate; it grows at a fraction of it.

Processor time percentage using MMSU from broadcast publishing point

Figure 3.

As a direct result, saturating a gigabit NIC streaming 22 Kbps files is not viable because many systems can run out of processing power. When streaming 700 Kbps or 1 Mbps files, it is more likely that the CPU will not become the limiting factor. In the tests with high bit rate content, CPU proved to be much less of an issue than NIC and disk speed.

Working set using MMSU from broadcast publishing point

Figure 4.

The same applies to memory usage. In the case of 22 Kbps streams, RAM is allocated at a rate of 730 bytes per Kbps of output, but for 220 Kbps streams, the usage falls to only 90 bytes.
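Multiplying the per-Kbps figures by the stream bit rate gives the memory used per client, which makes the effect easier to see. The sketch below uses only the numbers quoted above; the function name is illustrative.

```python
def bytes_per_client(bytes_per_kbps: int, stream_kbps: int) -> int:
    """Memory per client implied by a bytes-per-Kbps-of-output figure."""
    return bytes_per_kbps * stream_kbps


low_rate = bytes_per_client(730, 22)    # 22 Kbps streams
high_rate = bytes_per_client(90, 220)   # 220 Kbps streams

print(f"22 Kbps stream:  {low_rate} bytes per client")
print(f"220 Kbps stream: {high_rate} bytes per client")
print(f"Bit rate grows 10x, memory per client grows {high_rate / low_rate:.2f}x")
```

A tenfold increase in bit rate raises per-client memory from about 16,000 bytes to about 20,000 bytes, which is why memory per Kbps of output drops so sharply.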

For both CPU and memory, server performance is proportionally better for higher bit rate streams.

Broadcast Publishing Point vs. On-demand Streams

Streaming data from a broadcast publishing point (BPP), which can source from a live feed or from a station, consumes far fewer resources than streaming on-demand. The biggest difference between the two types of transmission is that BPP transmissions scale almost linearly, whereas on-demand transmissions scale in an almost exponential manner. The following diagram displays the processor time for both live and on-demand streams.

Processor time percentage for 22 kbps file using MMSU

Figure 5.

Below 35 percent CPU, on-demand streams use less processor time than BPPs. Nevertheless, BPPs enable system administrators not only to predict how many clients the system can support more easily, but also to connect a greater number of clients. The highest number of 22 Kbps clients connected to a single Windows Media server is 9,500 at the time this document was published. The clients were streaming from multiple BPPs.

Working set for 22 kbps file using MMSU

Figure 6.

The memory used by BPP streams is also relatively low. For each 22-Kbps MMSU client, a server uses 16.3 KB of memory. For on-demand streams, the server uses approximately 103 KB of memory. A server requires approximately 512 MB of installed RAM to stream a 22 Kbps file to 4,000 on-demand unicast clients. With a live stream, the memory requirement for streaming to the same number of clients is 128 MB of installed RAM.
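The installed RAM figures above are larger than the memory attributable to the streams themselves. The sketch below estimates the difference for the 4,000-client case. The function names are illustrative, 1 MB is assumed to equal 1,024 KB, and the interpretation of the remainder as headroom for the operating system and other processes is an assumption rather than a statement from the tests.

```python
def stream_memory_mb(per_client_kb: float, clients: int) -> float:
    """Memory attributable to the streams, in MB (1 MB = 1,024 KB assumed)."""
    return per_client_kb * clients / 1024


def headroom_mb(installed_mb: float, per_client_kb: float, clients: int) -> float:
    """Installed RAM left over after the stream memory is accounted for."""
    return installed_mb - stream_memory_mb(per_client_kb, clients)


clients = 4000
print(f"On-demand: {stream_memory_mb(103.0, clients):.0f} MB for streams, "
      f"{headroom_mb(512, 103.0, clients):.0f} MB left of 512 MB")
print(f"Live:      {stream_memory_mb(16.3, clients):.0f} MB for streams, "
      f"{headroom_mb(128, 16.3, clients):.0f} MB left of 128 MB")
```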

Number of Processors

Windows Media server scales very differently when processors are added, depending on whether the streams are primarily live or on-demand.

Processor time percentage using MMSU with 22 kbps on-demand file

Figure 7.

For on-demand streams, the number of processors and the number of clients the system can serve do not increase at the same rate. With a single processor, you can connect up to 2,000 clients and maintain a maximum of 30 percent CPU usage. With two processors, you can connect 2,600 clients (an average of 1,300 per processor). With four processors, you can connect 4,200 clients (an average of 1,050 per processor).

On the other hand, the server scales very well when the content is live, as displayed in the following diagram.

Processor time percentage using MMSU with 22 kbps live file

Figure 8.

Following the same principle of maintaining CPU around 30 percent, with a single processor you can connect approximately 1,000 clients. With two processors, you can connect approximately 2,000 clients, and with four processors approximately 4,000 clients.
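One way to compare the two cases is to express the client counts at roughly 30 percent CPU as a scaling efficiency relative to a single processor. The sketch below uses the on-demand and live figures reported in this section; the names are illustrative.

```python
# Clients supported at roughly 30 percent CPU, by number of processors.
ON_DEMAND = {1: 2000, 2: 2600, 4: 4200}
LIVE = {1: 1000, 2: 2000, 4: 4000}


def scaling_efficiency(clients_by_cpus):
    """Clients per processor relative to the single-processor result.

    A value of 1.0 means perfectly linear scaling.
    """
    base = clients_by_cpus[1]
    return {cpus: clients / (cpus * base)
            for cpus, clients in clients_by_cpus.items()}


for label, data in (("On-demand", ON_DEMAND), ("Live", LIVE)):
    for cpus, efficiency in scaling_efficiency(data).items():
        print(f"{label}, {cpus} CPU(s): {data[cpus] // cpus} clients per CPU, "
              f"efficiency {efficiency:.3f}")
```

Live content keeps an efficiency of 1.0 up to four processors, while on-demand content drops to 0.65 with two processors and 0.525 with four.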

Single Bit Rate vs. Multiple Bit Rate

An issue with multiple bit rate (MBR) files is that all different bit rates must be read from disk, regardless of the bit rate the client is receiving. Late reads will start occurring earlier because the amount of data read from the disks is larger.

Except for this issue, MBR files behave similarly to single bit rate (SBR) files. For on-demand streams, memory use increases linearly, whereas CPU use does not.

FastSendDatagramThreshold Key

The performance tests showed that using a more appropriate value for the FastSendDatagramThreshold key has a very good effect on server performance. This threshold key is used for Microsoft Windows® 2000 to decide if the datagram should go through the fast I/O path or be buffered on send. Fast I/O means copying data and bypassing the I/O subsystem, instead of mapping memory and going through the I/O subsystem.

Note   This paragraph was extracted from the document "Microsoft Windows 2000 TCP/IP Implementation Details: TCP/IP Protocol Stack and Services," written by Dave MacDonald.

The default value of the FastSendDatagramThreshold key is 1,024 bytes. If the stream is sent in packets exceeding this value, additional operations are necessary. As a result, the number of context switches and the CPU use increase, and the maximum number of clients the server can serve is reduced. The following diagrams display the results for a system with and without the threshold key changed to a more suitable value (in this case, 1,500 bytes).

Processor time percentage using MMSU for 100 kbps on-demand file

Figure 9.

Only high bit rate files benefit from changing the threshold key because only these files are sent in packets that exceed the default limit. Usually 100 Kbps is the lowest bit rate at which Windows Media files (.asf extension) are divided into packets larger than 1,024 bytes. To decide the appropriate threshold limit, identify the packet size of the Windows Media file, and then set the key to a higher number.

Note   Some typical sample values are as follows: for 100 Kbps streams, set the key to 4,096; for 500 Kbps streams, set the key to 9,600; and for 1 Mbps files, set the key to 16,000.

Without changing the key, it is possible to have 750 clients connected to the server, maintaining CPU below 30 percent. With a change to the key, 1,050 clients could connect—an increase of 40 percent.
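The reasoning in this section can be sketched in a few lines: check whether a stream's packet size exceeds the threshold, and compute the gain in supported clients. The 1,400-byte packet size below is a hypothetical value for illustration, and the function names are not part of any Windows Media interface.

```python
DEFAULT_THRESHOLD = 1024  # default FastSendDatagramThreshold, in bytes


def exceeds_threshold(packet_bytes: int, threshold: int = DEFAULT_THRESHOLD) -> bool:
    """True if packets of this size exceed the threshold and therefore
    incur the additional operations described above."""
    return packet_bytes > threshold


def percent_gain(before: int, after: int) -> float:
    """Percentage increase in supported clients."""
    return (after - before) * 100 / before


packet_bytes = 1400  # hypothetical packet size for a 100 Kbps file
print("Default threshold exceeded:", exceeds_threshold(packet_bytes))
print("1,500-byte threshold exceeded:", exceeds_threshold(packet_bytes, 1500))

# Clients supported below 30 percent CPU, without and with the key change.
print(f"Gain in clients: {percent_gain(750, 1050):.0f}%")
```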

Pool nonpaged bytes using MMSU for 100 kbps on-demand file

Figure 10.

The side effect of changing the threshold key is an increase in the number of non-paged pool bytes allocated for the server. However, as the previous diagram shows, the increment is not significant enough to cause any problems. The following diagram illustrates an increase in context switches when the packet size exceeds the threshold.

Context switches per second

Figure 11.

Appendix A: Profiles

Tests were conducted on profiles ranging from 22 Kbps to 1 Mbps.

22 Kbps

Processor time percentage for 22 kbps file using MMSU

Figure 12.

Working set for 22 kbps file using MMSU

Figure 13.

                                 On-demand     Live
Memory used per client           103 KB        16.2 KB
Memory used per Kbps of output   4700 bytes    730 bytes
CPU usage per client             Nonlinear     13% per 1000 clients
Suggested number of clients      4000          4200
Max number of clients            5000          6500

100 Kbps

Processor time percentage for 100 kbps file using MMSU

Figure 14.

Working set for 100 kbps file using MMSU

Figure 15.

                                 On-demand     Live
Memory used per client           103 KB        16.7 KB
Memory used per Kbps of output   1035 bytes    168 bytes
CPU usage per client             Nonlinear     Nonlinear
Suggested number of clients      1050          1400
Max number of clients            1750          2500

200 Kbps

Processor time percentage for 200 kbps file using MMSU

Figure 16.

Working set for 200 kbps file using MMSU

Figure 17.

                                 On-demand     Live
Memory used per client           138 KB        20 KB
Memory used per Kbps of output   695 bytes     100 bytes
CPU usage per client             Nonlinear     5% per 100 clients
Suggested number of clients      800           1000
Max number of clients            1200          1450

500 Kbps

Processor time percentage for 500 kbps file using MMSU

Figure 18.

Working set for 500 kbps file using MMSU

Figure 19.

                                 On-demand     Live
Memory used per client           324 KB        25 KB
Memory used per Kbps of output   648 bytes     51 bytes
CPU usage per client             Nonlinear     4% per 50 clients
Suggested number of clients      400           450
Max number of clients            470           600

700 Kbps

Processor time percentage for 700 kbps file using MMSU

Figure 20.

Working set for 700 kbps file using MMSU

Figure 21.

                                 On-demand     Live
Memory used per client           441 KB        31 KB
Memory used per Kbps of output   630 bytes     44 bytes
CPU usage per client             Nonlinear     1.2% per 10 clients
Suggested number of clients      350           360
Max number of clients            400           440

1 Mbps

Processor time percentage for 1 Mbps file using MMSU

Figure 22.

Working set for 1 Mbps file using MMSU

Figure 23.

                                 On-demand     Live
Memory used per client           622 KB        31 KB
Memory used per Kbps of output   608 bytes     31 bytes
CPU usage per client             Nonlinear     3.5% per 25 clients
Suggested number of clients      300           280
Max number of clients            350           350

Appendix B: Keys

This section provides information on unicast and TCP keys.

Unicast Keys
MaxConnectionsPerSecond

Location: HKLM\System\CurrentControlSet\Services\Nsunicast\Parameters\MaxConnectionsPerSecond

Type: DWORD

Default: 25

When a connection request is sent from a client to a Windows Media server, the request is immediately (within one second) processed until the number of clients attempting to connect simultaneously reaches 25 or the value you set. All subsequent connection requests are received by the server and placed in a queue.

No value for client connection rate is appropriate for all servers, even if the computers have identical resources. The impact the client connection rate has on stream quality for clients already connected depends on the available resources at the time the requests are processed. A server with multiple CPUs and large amounts of memory can support more clients, but if use of system resources is at a high percentage, increasing the connection rate can degrade stream quality for connected clients. Conversely, a lower-end server may be able to handle 50 connections per second when its load is low. When determining the value for connection rate, you must consider resource use at the time the requests are being processed. The number of client requests a server can process without affecting stream quality varies depending on the number of clients currently receiving content from that server.

In the tests presented in this document, the MaxConnectionsPerSecond key was set to 100, but the clients were always connected at a lower rate. When the CPU use was high, the client connection rate was adjusted accordingly.
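To get a feel for what the connection rate limit means when many clients arrive at once, the sketch below estimates how long it takes to admit a burst of clients. It is a deliberately simplified model that assumes the server admits exactly MaxConnectionsPerSecond clients each second and queues the rest; the actual queue behavior may differ, and the function name is illustrative.

```python
import math


def seconds_to_admit(pending_clients: int, max_connections_per_second: int = 25) -> int:
    """Seconds needed to admit a burst of clients under a simplified
    model where the limit is reached every second until the queue drains."""
    return math.ceil(pending_clients / max_connections_per_second)


burst = 1000  # hypothetical number of clients connecting at the same moment
print("Default (25 per second):", seconds_to_admit(burst), "seconds")
print("Test setting (100 per second):", seconds_to_admit(burst, 100), "seconds")
```

A higher limit drains the queue faster, but as noted above it also concentrates the CPU cost of new connections into a shorter period.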

TCP Keys
MaxUserPort

Location: HKLM\System\CurrentControlSet\Services\TCPIP\Parameters\MaxUserPort

Type: DWORD

Default: 5000

This parameter controls the maximum port number used when an application requests an available user port from the system. By default, the ports in the range 1,024 through 5,000 are available. The server can run out of ports if too many clients try to connect and this value is not changed. The limit is typically reached slightly above 3,700 MMSU clients. In the case of HTTP or MMST, the limit to the number of clients is higher. If you expect to have a large number of clients connected simultaneously, set this value to 0xFFFE.
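The size of the user port range can be computed directly from the MaxUserPort value. The sketch below assumes the range runs from port 1,024 through MaxUserPort inclusive and treats the result as an upper bound, because other applications on the server may also hold ports in this range. The names are illustrative.

```python
FIRST_USER_PORT = 1024  # lowest port in the user range described above


def user_ports_available(max_user_port: int = 5000) -> int:
    """Upper bound on user ports, counting FIRST_USER_PORT through
    max_user_port inclusive."""
    return max_user_port - FIRST_USER_PORT + 1


print("Default (5000):", user_ports_available())
print("Recommended (0xFFFE):", user_ports_available(0xFFFE))
```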

FastSendDatagramThreshold

Location: HKLM\System\CurrentControlSet\Services\AFD\Parameters\FastSendDatagramThreshold

Type: DWORD

Default: 1024

Datagrams smaller than the value of this key go through the fast I/O path or are buffered on send. Larger ones are held until the datagram is actually sent. Fast I/O means copying data and bypassing the I/O subsystem, instead of mapping memory and going through the I/O subsystem.

This key should be set to a value higher than the packet size of the highest bit rate stream the server will deliver.
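This document does not describe how the keys were changed on the test servers. One possible approach is to import a registration (.reg) file; the sketch below generates the text of such a file for the three DWORD keys in this appendix. The helper names are illustrative, and the values shown (1,500, 0xFFFE, and 100) are the ones mentioned in this document, not general recommendations. Review the generated text and back up the registry before importing anything.

```python
def reg_entry(key_path: str, name: str, value: int) -> str:
    """Render one DWORD value under HKEY_LOCAL_MACHINE in .reg syntax."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("DWORD values must fit in 32 bits")
    return f'[HKEY_LOCAL_MACHINE\\{key_path}]\n"{name}"=dword:{value:08x}\n'


def reg_file(entries) -> str:
    """Render a complete .reg file from (key_path, name, value) tuples."""
    header = "Windows Registry Editor Version 5.00\n\n"
    return header + "\n".join(reg_entry(*entry) for entry in entries)


entries = [
    (r"System\CurrentControlSet\Services\AFD\Parameters",
     "FastSendDatagramThreshold", 1500),
    (r"System\CurrentControlSet\Services\TCPIP\Parameters",
     "MaxUserPort", 0xFFFE),
    (r"System\CurrentControlSet\Services\Nsunicast\Parameters",
     "MaxConnectionsPerSecond", 100),
]
print(reg_file(entries))
```

Generating the file as text keeps the example runnable on any platform; nothing is written to the registry by this code.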

Appendix C: Complete Set of Data Collected from the Experiments

The following Performance Monitor values were collected for the performance tests conducted for Windows Media server.

  • CPU
  • Allocated bandwidth
  • NIC bytes sent/sec
  • Working set
  • Pool non-paged bytes
  • Pool paged bytes
  • Thread count
  • Handle count
  • Page faults/sec
  • Interrupts/sec
  • Context switches/sec

For More Information

To learn more about Windows Media Services 4.1, see Windows Media Services 4.1 Help. The help file can be downloaded from the Windows Media Technologies page at the Microsoft Web site.

 

Legal notice

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user.  Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document.  Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2000 Microsoft Corporation. All rights reserved.

Microsoft, MS-DOS, Windows, Windows 2000 are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
