Sound Cards, Voice Management, and Driver Models

06/29/2006

Brian Schmidt
Microsoft Corporation

January 17, 2000

Summary: This article explains what you need to know to program and test your Microsoft DirectX application so that it works optimally with the widest variety of sound cards and across the full range of Windows versions. (10 printed pages)

Introduction

Sound cards and Microsoft® Windows® system infrastructure have both undergone major changes over the past couple of years. The greatest of these changes is that sound cards have moved from the ISA bus to the PCI bus. This design change fundamentally altered the way that audio cards, particularly advanced audio cards, process and deal with audio data. Additionally, within the Windows operating system, a complete redesign of the device driver model has occurred, moving from the “VxD” driver model to the “WDM” driver model. Accompanying this change to WDM is a reworking of the way that Microsoft DirectSound® handles and mixes audio data. Finally, new digital audio devices such as USB speakers are being connected to PCs in greater numbers.

This article describes what you need to know to program and test your application so that it works optimally with the widest variety of sound cards and across the full range of Windows versions. It covers the changing architecture of audio cards and addresses the ramifications of “voice management” and the related programming issues. It also describes how the new driver model may affect how DirectSound works and how it behaves in your application.

ISA, PCI, Sound Card Design and DirectSound Acceleration

When DirectSound was first introduced, all audio devices lived on the ISA bus of the PC. This relatively low-speed bus provided basic access to the audio device, but could not transfer large amounts of data without becoming clogged. As it turns out, the ISA bus was sufficient to send (stream) 16-bit/22-kilohertz (kHz) stereo data from the CPU to the sound card without unduly affecting overall system performance. Attempting to send more data, however, often placed an undue burden on the bus. This limitation had a strong influence on audio hardware design.
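To put that figure in perspective, the data rate of an uncompressed PCM stream is simply sample rate × bytes per sample × channels. The following is a minimal sketch of that arithmetic; the function name is illustrative, and 22,050 Hz is assumed to be the exact rate behind the "22 kHz" quoted above.

```python
def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Return the raw data rate of an uncompressed PCM stream in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


# Assumption: "22 kHz" in the text refers to the common 22,050 Hz rate.
stereo_22k = pcm_bytes_per_second(22_050, 16, 2)   # 88,200 bytes per second
stereo_44k = pcm_bytes_per_second(44_100, 16, 2)   # 176,400 bytes per second
```

A single mixed output stream costs the bus only one such rate. A card that pulled every unmixed buffer across the bus for hardware mixing would pay this rate once per playing sound, which is why bus bandwidth shaped the hardware designs described next.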

Older Cards: DirectSound Acceleration on ISA Sound Cards

In order to accelerate (that is, offload to hardware) the mixing of DirectSound buffers, a hardware-mixing chip on the sound card needs fast and direct access to the audio data to be mixed. A DirectSound accelerating sound card has two choices: It can “reach” across the ISA bus into system RAM to access and mix the audio data, or it can include dedicated sound RAM on the card itself, copy the audio data to that sound RAM, and mix from there. Due to the ISA bus’s limited bandwidth, reading audio data directly from system RAM is not practical. Therefore, ISA-based sound cards that do DirectSound acceleration almost always have dedicated on-card RAM. An example of such a card is the SoundBlaster AWE series. When a sound card like the AWE is used to accelerate DirectSound, the sound data must first be copied from system memory, across the ISA bus, into the dedicated sound RAM on the card. (Note: This copying is done during the Unlock call on the DirectSound buffer.) Because the data must be copied across the slow bus, the Unlock call can take some time. However, once the data is in the on-card sound RAM, there is virtually no “hit” on the system for starting, stopping, and mixing sounds; it’s all handled by the audio processor chip accessing its own dedicated RAM.

This basic limitation led to the creation of two classes of ISA sound cards:

Simple “DAC/ADC only” sound cards—Often called non-accelerating cards, these audio cards provide only the simplest audio functionality, converting a single stereo stream of audio data into analog voltage suitable for driving external speakers or amplifiers.
DirectSound accelerating cards—These cards contain on-card dedicated sound RAM and perform hardware mixing of audio data in this RAM. The sound RAM is analogous to the dedicated video RAM used to store textures in most three-dimensional (3-D) graphics accelerator systems.

Streaming and Static Buffers

As noted above, a sound card has two choices: it can use data resident in main system RAM, or it can use its own RAM local to the card. DirectSound uses the terms streaming buffer and static buffer to refer to sound buffers that are stored in system memory and in on-card memory, respectively. The terms are somewhat misleading and perhaps poorly chosen, because it is easy to confuse “streaming” as a buffer type with streaming in the sense of “the act of using a small buffer to play a large audio file by repeatedly copying the audio data, piece by piece, into the small buffer while it is playing.” The former has nothing to do with the latter. It is quite possible (and extremely common) to have a “streaming buffer” contain an entire sound, such as a gunshot. More precise terms for streaming and static buffers would be “system memory buffers” and “local memory buffers.” We will continue to use the DirectSound terms streaming and static to refer to these, but the distinction must be kept very clear. Note that the ISA-based DirectSound accelerators described above all support static buffers but not streaming buffers, because they use dedicated, on-card sound RAM and because it is impractical to send much data across the slow ISA bus.

New Cards: Enter PCI Sound Cards

In order to improve performance, the sound card has moved from the ISA bus to the PCI bus. Today, virtually all audio chips shipped are designed to connect to the PCI bus. With the movement to the PCI bus comes a huge increase in the available bandwidth between the system and the audio card. So much, in fact, that by using the PCI bus it is now practical to have the audio chip reach directly into main PC system memory to access audio data for hardware mixing. This, combined with the economic advantages of using system memory instead of dedicated audio memory, has had the following result: Virtually all PCI-based DirectSound accelerating sound cards use system memory and do not have dedicated on-board RAM. To use DirectSound terminology, all these cards support streaming buffers, not static buffers.

Side Note After some thought, you’ll realize that a streaming buffer can do anything a static buffer can. The reverse is not true, because static buffers need the extra step of copying data into local RAM. For this reason, most sound cards that actually support streaming buffers also report that they have static buffers, even though they have no on-card RAM. On these cards, if you open a streaming buffer, the reported numbers of both streaming and static buffers decrease, reflecting that they are, in fact, the same buffer.

This ability led to the creation of two classes of PCI sound cards:

Simple “DAC/ADC only” sound cards (non-accelerating cards)—These function exactly like their ISA counterparts except for their faster connection via the PCI bus.
DirectSound accelerating cards—These cards do not contain on-card dedicated sound RAM. Rather, they simply reach into main system memory and access the DirectSound buffer data that DirectSound has allocated. Virtually all current DirectSound accelerators are so designed.

The Implications of PCI/ISA Acceleration when Using DirectSound 7.0

One of the features of DirectSound 7.0 is the addition of voice management. By using voice management, a DirectSound buffer can be created but not assigned to hardware or software until the sound is played. Simply adding the DSBCAPS_LOCDEFER flag when creating the sound buffer does this. How this affects your game performance is dependent on the type of DirectSound accelerator in the end user’s system.

Consider the case of a PCI accelerator. When a voice-managed DirectSound buffer is played, DirectSound first determines if there is hardware available to play the sound. Assuming there is, DirectSound must then ensure that the hardware has access to the sound data. Finally, DirectSound tells the hardware to start playing. Because the PCI accelerator utilizes system memory, the audio hardware already has direct access to the audio; it is merely the existing DirectSound buffer. So all DirectSound needs to do is let the hardware know where the data is and start it playing. This process is both efficient and fast.
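The play-time decision described above can be modeled in a few lines. This is a toy sketch, not the DirectSound API: the class and method names are hypothetical, and the fallback to software mixing when no hardware voice is free is an assumption drawn from the statement that a deferred buffer is assigned "to hardware or software" only when it is played.

```python
class VoicePool:
    """Toy model of deferred voice assignment (hypothetical, not DirectSound)."""

    def __init__(self, hardware_voices: int):
        self.free_hardware_voices = hardware_voices
        self.playing = {}  # buffer name -> "hardware" or "software"

    def play(self, name: str) -> str:
        """Pick a mixing location only now, at play time."""
        if self.free_hardware_voices > 0:
            self.free_hardware_voices -= 1
            location = "hardware"
        else:
            # Assumption: with no free hardware voice, the sound is mixed in software.
            location = "software"
        self.playing[name] = location
        return location

    def stop(self, name: str) -> None:
        """Release the voice so a later play call can reuse it."""
        if self.playing.pop(name, None) == "hardware":
            self.free_hardware_voices += 1


pool = VoicePool(hardware_voices=2)
locations = [pool.play(n) for n in ("gunshot", "engine", "footstep")]
pool.stop("gunshot")
after_stop = pool.play("explosion")
```

Note that the model tracks voices only. No audio data is copied at play time, which matches the PCI case where the hardware reads the existing buffer in system memory.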

Note The voice manager manages just that, voices. This is in contrast to the DirectDraw texture manager that manages memory. Since modern PCI audio accelerators use system memory, the voice manager does not manage memory.

However, if the user has an ISA-based accelerator, the situation is quite different. In this case, getting audio data to the sound card is very slow, since the data needs to cross the ISA bus and be loaded into the sound RAM on the card. This will introduce an unacceptable latency between when the Play call is made and when the sound actually starts. For this reason, using the voice manager is not recommended on ISA-based DirectSound accelerator cards.

Fortunately, it is easy to program your game to use the voice manager only on PCI sound cards. Recall that PCI sound cards all support streaming buffers and ISA cards support static buffers. When creating a DirectSound buffer, the buffer is presumed to be streaming unless the DSBCAPS_STATIC flag is used. In other words, a buffer won’t be assigned to a static-type hardware buffer unless the DSBCAPS_STATIC flag is used. So, to use the voice manager, but only for cards that use system RAM (and therefore give your game good performance), there is one simple rule:

When creating a DirectSound buffer, do not combine DSBCAPS_STATIC with DSBCAPS_LOCDEFER.

This rule will assure that you don’t try to use voice management on cards that don’t perform well with it, namely older cards with on-board RAM.

If you want your game to utilize older style, RAM-on-card accelerators, you can do the following:

Check the caps of the sound card. If the card reports more than zero static buffers and no streaming buffers, this card is probably an older ISA card with on-card RAM. You can then use the DSBCAPS_STATIC flag, but do not use DSBCAPS_LOCDEFER.
If the card reports more than zero streaming buffers, you are probably on a newer PCI-based accelerator card. In this case you’ll want to use DSBCAPS_LOCDEFER (and not DSBCAPS_STATIC) to make maximum use of the hardware voices.
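The two checks above reduce to a small decision function. The sketch below models only the decision: `CardCaps` and `choose_buffer_flags` are hypothetical names, the flag constants are string stand-ins for the dsound.h flags named in the text, and the final branch (a card reporting no hardware buffers at all gets neither flag) is an assumption, since the text does not cover that case. In a real application the counts would come from the device capabilities reported by DirectSound.

```python
from dataclasses import dataclass

# Stand-ins for the flags named in the text; only the names matter in this model.
DSBCAPS_STATIC = "DSBCAPS_STATIC"
DSBCAPS_LOCDEFER = "DSBCAPS_LOCDEFER"


@dataclass
class CardCaps:
    """Hypothetical summary of the hardware buffer counts a card reports."""
    static_buffers: int
    streaming_buffers: int


def choose_buffer_flags(caps: CardCaps) -> set:
    """Return the creation flags suggested by the card's reported buffers."""
    if caps.streaming_buffers > 0:
        # Probably a PCI accelerator using system memory: use voice management.
        return {DSBCAPS_LOCDEFER}
    if caps.static_buffers > 0:
        # Probably an ISA card with on-card RAM: static buffer, no voice manager.
        return {DSBCAPS_STATIC}
    # Assumption: no hardware mixing reported, so request neither flag.
    return set()
```

Checking the streaming count first matters, because many PCI cards report static buffers as well even though they have no on-card RAM. The function never returns both flags together, which is the rule stated earlier.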

USB Speakers

With the advent of USB, several companies have created speakers that connect directly to the computer through USB. A true USB speaker system has no analog inputs at all; the only connection is the all-digital USB connection from the PC. This lets the USB speaker keep its analog components well away from the electrically noisy environment of the PC. USB speakers can also report information about themselves to the system, in the same way that a video monitor can. Systems that use USB speakers as the primary DirectSound device do not have the ability to perform hardware acceleration of DirectSound. All their mixing is done through WDM’s kmixer, discussed below.

Windows Audio Driver Models and DirectSound

Windows has changed the underlying driver architecture for audio devices. The change is complete in Windows 2000 and is expected to carry through all Windows versions for the foreseeable future.

In “old” DirectSound, the VxD driver model was used and all DirectSound mixing was done in Dsound.vxd, a virtual device driver. Dsound.vxd also provided fairly close access to the actual DMA buffer that the sound card used to receive data from the host CPU. This buffer is the well-known “primary buffer.” A DirectSound application could set specific properties of the primary buffer (like sampling rate and bit depth); this directly impacted the actual properties of the hardware itself.

Under WDM, DirectSound no longer has “direct” access to the sound hardware (except in the case of hardware accelerated buffers). Instead, DirectSound talks to kmixer (short for kernel mixer). kmixer’s job is to convert the format of multiple audio streams to a common format, mix them together, and send the result to the hardware. In a sense, it does what dsound.vxd did. One major difference is that, while dsound.vxd only mixed DirectSound buffer data, kmixer mixes all Windows audio data. This includes audio data from applications that use the WaveOut APIs. (kmixer is more sophisticated in other aspects as well, such as support for multichannel and high-resolution sounds, but we won’t discuss those here.) Since kmixer can mix WaveOut and DirectSound data simultaneously, the old rule that DirectSound and WaveOut can’t both be open at the same time is no longer true on systems with WDM drivers.

Of particular importance is kmixer’s relationship with the audio hardware. kmixer is the only piece of software on the system that can determine the actual format of the hardware’s DMA buffer. It selects the format on the basis of the sounds it is asked to mix. The simple rule is:

kmixer sets the output format to the lower of:

What the hardware will support.
The highest quality format of the sounds that it is asked to mix.

This has one very important implication: DirectSound no longer sets the actual format of the hardware’s DMA buffer; kmixer sets it instead. For your application, this means that the hardware format (and associated performance) will be based on the data you actually try to play. If you play a 44-kHz file, kmixer will mix all data up to 44 kHz and ensure the hardware is running at 44 kHz. So, the overall system performance is dependent on the content you give to kmixer. (kmixer does some optimizations. For example, if you have 12 sounds at 22 kHz and one at 44 kHz, it will mix all the 22-kHz sounds together at 22 kHz, and then sample-rate convert only the aggregate to 44 kHz to minimize CPU consumption.)
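The "lower of" rule and the grouping optimization can be illustrated with a small model. This is a sketch under stated assumptions, not kmixer's actual implementation: it considers sample rate only (a real format also has bit depth and channel count), the function names are hypothetical, and the rates 22,050 Hz and 44,100 Hz stand in for the article's "22 kHz" and "44 kHz".

```python
from collections import Counter


def kmixer_output_rate(hardware_max_hz: int, sound_rates_hz: list) -> int:
    """Lower of what the hardware supports and the highest rate being played.

    Requires at least one sound in sound_rates_hz.
    """
    return min(hardware_max_hz, max(sound_rates_hz))


def mix_plan(sound_rates_hz: list, output_rate_hz: int) -> list:
    """Group sounds by rate: each group is mixed at its own rate, then
    converted once, as a whole, if its rate differs from the output rate.

    Returns (rate, sound_count, needs_conversion) tuples, lowest rate first.
    """
    groups = Counter(sound_rates_hz)
    return [(rate, count, rate != output_rate_hz)
            for rate, count in sorted(groups.items())]


rates = [22_050] * 12 + [44_100]
output_rate = kmixer_output_rate(48_000, rates)
plan = mix_plan(rates, output_rate)
conversions = sum(1 for _, _, needs in plan if needs)
```

In this model the thirteen sounds need one sample-rate conversion instead of twelve, and dropping the single 44.1 kHz sound from the content would bring the output rate down to 22,050 Hz with no conversions at all.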

Note As an application developer, you don’t choose the driver model used. That is completely determined by the type of sound card, Windows version and particular driver the user has installed. For that reason, it is very important that you cover all the bases when testing your application. DirectSound might be using dsound.vxd or it might be using kmixer, and you should ensure your game behavior and performance are acceptable on both. What you need to test to ensure complete audio coverage is described below.

DirectMusic and Hardware Acceleration

With Windows 98, Second Edition, and Windows 2000, DirectMusic acceleration is available on systems that have a WDM driver that supports it. Because of potential differences in features between hardware and software synthesizers, you should ensure that you test your DirectMusic application on Windows 2000 and/or Windows 98, Second Edition with an appropriate DirectMusic accelerating WDM driver. At the time of this writing, cards based on the Yamaha DS1 audio chip have WDM drivers that support hardware acceleration of DirectMusic, and drivers are forthcoming from other manufacturers.

Testing your Games (Audio-wise)

Because of the changes in driver model between operating systems and the various styles of sound cards available, it is important that you test your application across a wide range of systems. In particular, you should test on both WDM and VxD systems, using both accelerating and non-accelerating sound cards.

The following table shows which operating systems support which sound card driver models:

Table 1. Operating system support for sound card driver models

OS	VxD	WDM	Notes
Windows 95	Yes	No
Windows 98	Yes	Yes*	*WDM drivers do not support DirectSound acceleration.
Windows 98, Second Edition	Yes	Yes	Full DirectSound Support. DirectMusic support (WDM only).
Windows 2000	No	Yes	Full DirectSound support. DirectMusic support.

As you can see from the table above, Windows 98 and Windows 98, Second Edition support both WDM and VxD drivers. Which is in use depends entirely on the particular sound card used and the driver the end user happens to have installed.

Following is a list of the types of sound cards and driver models on which you need to test your game, along with some popular cards of each type. By testing in each of these six categories, you can be confident that your application works properly and with appropriate performance on whatever system the user has.

Table 2. Sound cards and driver models for testing games

Card type and driver model	Examples	Comments
ISA non-accelerating cards with VxD drivers	SB16, OPL3, ESS 186x, Crystal 423x
ISA DirectSound accelerating cards with VxD drivers	AWE 32/64 Guillemot	AWE 32/64 is an extremely popular ISA-based card.
PCI DirectSound accelerating cards with VxD drivers	Diamond MX300 (Aureal Vortex-based cards), Diamond MX400, Creative Labs SBLive, Crystal Sound Fusion, Yamaha 192XG, Guillemot MaxiSound Fortissimo	Most common modern DirectSound accelerated scenario. All listed cards do 3-D.
Non-accelerating cards with WDM drivers	SB16, AWE, ESS 186x, Crystal 423x, Yamaha OPL3, USB speakers	1. USB speakers use WDM exclusively. 2. The SoundBlaster AWE card, though capable of acceleration, does not support it through WDM drivers.
DirectSound accelerating cards with WDM drivers	Aureal Vortex, Crystal Sound Fusion, Yamaha DS1; others coming soon	Shipping in Windows 2000. You can also get WDM drivers from the sound card or chip maker.
DirectMusic accelerating cards with WDM drivers	Yamaha DS1; others coming soon	Shipping in Windows 2000. You can also get WDM drivers from the sound card or chip maker.

Emulating Performance and the DirectSound “Hardware Acceleration” Slider

Within the Windows 98 and Windows 2000 Multimedia control panel, you can alter the performance of DirectSound on a system-wide basis. The slider can be reached through:

Settings->Control Panel->Multimedia->Audio->Advanced->Performance.

There are four settings: Emulation, Basic, Standard, and Full.

Full—This provides complete DirectSound acceleration, including the enabling of IKsPropertySet extensions. IKsPropertySet extensions are sound-card-specific enhancements such as EAX. This is the default setting on Windows 98 and Windows 98, Second Edition.
Standard—This provides acceleration of DirectSound secondary buffers but disables any hardware-specific IKsPropertySet extensions such as EAX. This is the default on Windows 2000.
Basic—This disables hardware acceleration of DirectSound secondary buffers. When set to Basic, all sound cards, regardless of capability, perform as if there were no DirectSound hardware acceleration present. This option is useful if you want to emulate a non-DirectSound-accelerated sound card for testing purposes.
Emulation—This forces DirectSound into emulation mode, whereby DirectSound acts as if there is no DirectSound-compatible driver on the system. All mixing is done by DirectSound in user mode, and the resulting data is written out to the WaveOut APIs. This typically results in a large increase in latency. Note that after you select this setting, you may need to reboot if you want to return to any other setting (Basic, Standard, or Full).
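When planning test passes, it can help to write the four settings down as data. The sketch below restates the descriptions above as a small lookup; the enum, its numeric ordering, and the helper names are all hypothetical and exist only for this illustration.

```python
from enum import Enum


class Acceleration(Enum):
    """The four slider positions; the numeric order is an assumption."""
    EMULATION = 0
    BASIC = 1
    STANDARD = 2
    FULL = 3


def hardware_buffers_enabled(level: Acceleration) -> bool:
    """Secondary buffers are hardware accelerated at Standard and Full."""
    return level in (Acceleration.STANDARD, Acceleration.FULL)


def property_set_extensions_enabled(level: Acceleration) -> bool:
    """Card-specific extensions such as EAX are available only at Full."""
    return level is Acceleration.FULL


def mixes_through_waveout(level: Acceleration) -> bool:
    """Only Emulation mixes in user mode and writes to the WaveOut APIs."""
    return level is Acceleration.EMULATION


# Settings under which an accelerating card behaves like a non-accelerating one.
software_only = [lvl.name for lvl in Acceleration if not hardware_buffers_enabled(lvl)]
```

A test plan built on this would, for example, run once at Full and once at Basic on the same accelerating card to cover both the hardware and the software mixing paths.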

Conclusion

The basic architecture of both sound cards and the Windows audio infrastructure has changed. Applications need to take best advantage of whatever configuration the user ultimately has. This means both ensuring maximum performance (for example, by not combining DSBCAPS_STATIC and DSBCAPS_LOCDEFER) and testing adequately on all common combinations of operating system, sound card, and driver model. Windows 98 and later provide a control panel to alter the behavior of DirectSound in ways that can facilitate such testing.