Programming Audio in Microsoft Windows and on Web Pages: An Overview

 

Peter Donnelly
Microsoft Corporation

April 2004

 

Applies to:
   Microsoft® DirectSound®
   Microsoft DirectMusic®
   Microsoft DirectShow®
   Microsoft DirectPlay® Voice
   Microsoft Windows Media® Format SDK
   Microsoft Windows Media Player ActiveX® control

 

Summary: This article provides a brief introduction to the various technologies that can be used to play and record audio under Microsoft Windows and on Web pages. Solutions for sound playback and capture include the Windows Multimedia functions, Microsoft DirectSound, Microsoft DirectMusic, Microsoft DirectShow, Microsoft DirectPlay Voice, the Windows Media Format SDK, the Windows Media Player control, and HTML+TIME. Each of these solutions has its strengths and is most suitable for particular applications. (21 printed pages)

Contents

Introduction
Waveform and MIDI
Encoding and File Formats
Audio Compression Manager
Windows Driver Model
Windows Multimedia: Simple Playback
DirectSound: Low-Latency Wave Playback
DirectMusic: Dynamic Soundtracks
DirectShow: Integrating Sound and Video
Effects: DirectX Media Objects
Using the Windows Media Player Control in Applications
Windows Media Format SDK
Playing Sounds on Web Pages
Sound Capture
Full Duplex
Conclusion
For More Information

Introduction

The available means for playing and recording audio in Microsoft® Windows® cover a wide range, from functions such as PlaySound that have been around for many years to more recent and elaborate application programming interfaces (APIs) such as Microsoft DirectMusic®. Which is the best tool for the job? This article seeks to help you answer that question by providing an overview of audio in Windows and the solutions provided by Microsoft. In addition, the article provides a brief guide to some techniques for playing audio on a Web page.

This article is aimed at client-side application developers and Web page designers, and does not touch on issues such as format conversion or the streaming of audio over a network.

Some sample code is provided, mainly to give an idea of the relative complexity of the various APIs in simple playback scenarios. Details of implementation are not covered.

Waveform and MIDI

Before discussing the various available technologies for capturing and playing sounds, let's take a quick look at the forms in which those sounds are stored.

Audio data generally falls into one of the following two categories:

  • Musical Instrument Digital Interface (MIDI)
    In MIDI audio, the values in the file do not describe the sound itself, but are simply instructions to a MIDI synthesizer. For example, a few bytes of data are sufficient to produce a middle C note on a flute for any given duration.

    Until recently, MIDI output on computer sound cards was restricted to an agreed-upon set of musical instruments and a few standard sound effects, and the actual sound produced could vary greatly from one sound card to another. However, with the advent of the Downloadable Sounds (DLS) standard, applications can now download waveform samples to a hardware or software synthesizer and then use MIDI to trigger and modify those sounds. DLS makes it possible to combine the economy of MIDI files with the greater fidelity of waveforms. To take advantage of DLS in MIDI playback, use DirectMusic.

  • Waveform
    In waveform audio, digital values describe the waveform of the sound. In the most common format, pulse code modulation (PCM), the amplitude of the wave has been sampled at regular intervals, and this amplitude is stored as an integer value. The precision of the measurement is determined by the number of bits allocated to each sample. For example, at 1 byte per sample, the range of amplitude can be divided into 256 steps, whereas at 2 bytes per sample, it can be divided into 65,536 steps. The quality of the sound is also affected by the sampling frequency, measured in hertz (samples per second). The sampling frequency typically ranges between 8.0 kilohertz (kHz) (low quality) and 44.1 kHz (high quality). The total length of the data is the product of the time, the sample size, the sampling frequency, and the number of channels. However, various compression schemes can be used to reduce file sizes. (A small sketch of this arithmetic follows this list.)
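
To make the size arithmetic concrete, here is a small illustrative C++ helper (a sketch for this article, not part of any Windows API) that computes the size of uncompressed PCM data:

#include <windows.h>

// Sketch: size in bytes of uncompressed PCM audio.
// seconds * (samples/second) * (bytes/sample) * channels
DWORD PcmDataBytes(DWORD dwSamplesPerSec, WORD wBitsPerSample,
                   WORD wChannels, double seconds)
{
    return (DWORD)(seconds * dwSamplesPerSec * (wBitsPerSample / 8) * wChannels);
}

For example, one minute of CD-quality audio (44.1 kHz, 16 bits per sample, two channels) works out to 60 × 44,100 × 2 × 2 bytes, or roughly 10 megabytes, which is why compression matters for long recordings.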

On operating systems earlier than Windows 98 Second Edition, waveform formats are restricted to one or two channels of PCM data. Later operating systems, under the Windows Driver Model, also support multichannel and other extended waveform formats; see the DirectX SDK documentation for details.

Encoding and File Formats

In addition to being either waveform or MIDI, audio data is encoded and stored in a variety of formats. The following are the formats with which you are most likely to work:

  • RIFF
    The Resource Interchange File Format (RIFF) is a flexible format that can be used to store any kind of data in self-describing chunks.

    Waveform data is often stored in RIFF files with the file name extension .wav. These are commonly called WAV files. They usually contain uncompressed PCM data, but can contain compressed formats as well.

    Standard MIDI files, which most often have the extension .mid, use their own chunk-based format rather than RIFF; MIDI data wrapped in a RIFF chunk is stored in files with the extension .rmi.

    RIFF files can be read or written by using the Windows Multimedia File I/O functions mmioOpen, mmioRead, and so on (a brief sketch follows this list). However, most of the techniques for playing and capturing sounds discussed in this article do not require you to parse the data yourself.

  • MP3
    This format contains waveform data encoded according to a set of standards developed by the Moving Picture Experts Group (MPEG).

  • Windows Media Format
    Files created in Windows Media® Format can contain video, waveform audio, or both. The audio files are known as Windows Media Audio (WMA) files and have a .wma file name extension. Windows Media files are usually highly compressed and are suitable for streaming over a network.

  • Redbook
    This is the informal term for CD audio.
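
Returning to the RIFF entry above, the following C++ sketch shows how the multimedia file I/O functions can descend into a WAV file and read its format chunk. It is illustrative only; error checking is omitted:

#include <windows.h>
#include <mmsystem.h>   // link with winmm.lib

// Sketch: read the format chunk of a WAV (RIFF/WAVE) file.
WAVEFORMATEX ReadWavFormat(TCHAR* szFile)
{
    WAVEFORMATEX wfx = {0};

    HMMIO hmmio = mmioOpen(szFile, NULL, MMIO_READ | MMIO_ALLOCBUF);

    // Descend into the outer 'WAVE' form.
    MMCKINFO ckRiff = {0};
    ckRiff.fccType = mmioFOURCC('W', 'A', 'V', 'E');
    mmioDescend(hmmio, &ckRiff, NULL, MMIO_FINDRIFF);

    // Descend into the 'fmt ' chunk and read the format header.
    MMCKINFO ckFmt = {0};
    ckFmt.ckid = mmioFOURCC('f', 'm', 't', ' ');
    mmioDescend(hmmio, &ckFmt, &ckRiff, MMIO_FINDCHUNK);
    mmioRead(hmmio, (HPSTR)&wfx, min(ckFmt.cksize, (DWORD)sizeof(wfx)));

    mmioClose(hmmio, 0);
    return wfx;
}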

Audio Compression Manager

Windows contains a component called the Audio Compression Manager (ACM) that handles the streaming of data through compression/decompression filters (codecs).

Most of the technologies discussed in this article automatically make use of the ACM when playing compressed waveform data. DirectSound® is an exception; you must handle the decompression yourself before sending data to DirectSound buffers. In DirectShow®, you can add the ACM Wrapper Filter to your filter graph.
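
If you do need to drive the ACM directly, the outline is to open a conversion stream between the source and destination formats and pass buffers through it. The following C++ sketch (error checking omitted; wfxSrc and wfxDst are assumed to be filled in by the caller) converts a single buffer:

#include <windows.h>
#include <mmreg.h>
#include <msacm.h>      // link with msacm32.lib

// Sketch: convert one buffer of audio through an ACM codec.
// pbSrc/cbSrc is the input; pbDst/cbDst is the output buffer.
void ConvertOnce(WAVEFORMATEX* wfxSrc, WAVEFORMATEX* wfxDst,
                 BYTE* pbSrc, DWORD cbSrc, BYTE* pbDst, DWORD cbDst)
{
    HACMSTREAM has = NULL;
    acmStreamOpen(&has, NULL, wfxSrc, wfxDst, NULL, 0, 0,
                  ACM_STREAMOPENF_NONREALTIME);

    ACMSTREAMHEADER ash = {0};
    ash.cbStruct = sizeof(ash);
    ash.pbSrc = pbSrc;   ash.cbSrcLength = cbSrc;
    ash.pbDst = pbDst;   ash.cbDstLength = cbDst;

    acmStreamPrepareHeader(has, &ash, 0);
    acmStreamConvert(has, &ash, 0);   // ash.cbDstLengthUsed holds the output size
    acmStreamUnprepareHeader(has, &ash, 0);
    acmStreamClose(has, 0);
}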

For more information on using the ACM directly, see Audio Compression Manager in the Platform SDK documentation.

Windows Driver Model

Under the Windows Driver Model (WDM), available with Windows 98 Second Edition and later operating systems, final mixing of multiple audio streams is done by the kernel mixer, or KMixer. For most purposes, what happens at the driver level is transparent to applications. However, you should be aware of the following implications of playing audio under WDM:

  • Multiple streams can be played by applications using the Waveform-Audio Interface (commonly called waveOut).
  • The final waveform format is determined by the system, which mixes all streams to the same format.
  • Multichannel sound is automatically mixed to two channels on systems that have only two speakers.
  • Applications cannot control the primary DirectSound buffer.

Windows Multimedia: Simple Playback

The Windows Multimedia APIs provide several ways to play waveform and MIDI audio. For applications that just need to play a few simple sounds with no requirement for low latency, the following APIs can be used.

  • PlaySound function
    Purpose: Play a waveform.
    Benefits: File or resource is loaded and played with a single call.
    Disadvantages: Only one sound can be played at a time. Entire sound must be loaded into memory. No controls such as volume or pause.

  • Waveform-Audio Interface (waveOut)
    Purpose: Play a waveform.
    Benefits: Enables control over volume, pitch, position, and so on.
    Disadvantages: Sound data must be passed to the device by the application. Under non-WDM drivers, only one sound can be played at a time.

  • Media Control Interface (MCI)
    Purpose: Play a waveform, MIDI, or other media.
    Benefits: Common interface for playing all types of content. Can control and play from devices such as CD and DAT. In Visual Basic®, functionality is exposed by the Multimedia Control.
    Disadvantages: Does not allow close synchronization between MIDI events and other real-time events such as video. No support for DLS.

  • MIDI Services and Stream Buffers
    Purpose: Play MIDI.
    Benefits: Low-level device control.
    Disadvantages: Complex to use. No support for DLS.

Owing to its simplicity, PlaySound may still be the best solution for playing simple alerts or non-layered sound effects where timing is not critical. The waveform-audio interface and MCI, though they are much more powerful and flexible than PlaySound, have essentially been superseded by the other technologies discussed in this article. Compared with DirectSound, DirectMusic, and DirectShow, all the APIs listed above have the following disadvantages:

  • Higher latency and inability to take advantage of hardware acceleration.
  • Inability to play multiple sounds simultaneously, except on WDM drivers.
  • No easy way to implement features such as 3-D positioning or effects processing.
  • No implementation of DLS for MIDI output.
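
For comparison with the newer APIs discussed below, the following C++ fragment (an illustrative sketch; the file names are placeholders) plays a sound with PlaySound and a MIDI file through MCI:

#include <windows.h>
#include <mmsystem.h>   // link with winmm.lib

void PlaySimpleSounds()
{
    // PlaySound: fire and forget. SND_ASYNC returns immediately;
    // starting another sound cancels the first.
    PlaySound(TEXT("alert.wav"), NULL, SND_FILENAME | SND_ASYNC);

    // MCI: the command-string interface can play many media types.
    mciSendString(TEXT("open music.mid type sequencer alias tune"), NULL, 0, NULL);
    mciSendString(TEXT("play tune wait"), NULL, 0, NULL);
    mciSendString(TEXT("close tune"), NULL, 0, NULL);
}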

DirectSound: Low-Latency Wave Playback

Microsoft DirectSound, part of the Microsoft DirectX® family of technologies, is designed to give games and other applications high-performance and full-featured interfaces to the multimedia capabilities of computers running Windows. DirectSound takes full advantage of hardware acceleration where it is available, but also provides fallback solutions in software.

DirectSound is used for playing waveform audio, specifically uncompressed PCM data. Data is put into secondary sound buffers, which are individually controlled by the application. Sounds playing in secondary buffers are mixed automatically in the primary buffer.

Using DirectSound Buffers

Secondary buffers can be treated as either static or streaming buffers.

A static buffer is loaded with a short sound that is typically played from beginning to end, perhaps many times. Static buffers are ideal for short sound effects. The extremely low latency of DirectSound ensures that the sounds will be well synchronized with visual or input events. For example, when the player in a shooting game pulls the trigger, the sound of the weapon is heard immediately.

A streaming buffer plays from beginning to end, and then loops to the beginning and continues playing. At intervals during playback, new data overwrites data that has already been played. In this way, a fairly small block of memory can be used to play a sound of any length. Streaming buffers can be used for musical backgrounds, lengthy pieces of dialog, and any other sounds that are more than a few seconds long.
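
In C++, the streaming pattern looks roughly like the following sketch. It assumes an existing looping secondary buffer of BUFFER_BYTES bytes and a hypothetical application function GetMoreAudio that supplies the next block of PCM data; a real implementation would also handle end-of-stream and, typically, position notifications:

#include <dsound.h>

const DWORD BUFFER_BYTES = 64 * 1024;    // example size of the circular buffer

// Hypothetical application function that fills a region with PCM data.
void GetMoreAudio(BYTE* pDest, DWORD cb);

// Sketch: top up a looping secondary buffer. Called periodically,
// for example from a timer, often enough to stay ahead of playback.
void StreamChunk(IDirectSoundBuffer8* pBuffer, DWORD* pdwWritePos)
{
    DWORD dwPlay = 0;
    pBuffer->GetCurrentPosition(&dwPlay, NULL);

    // Free space between our last write position and the play cursor,
    // allowing for wraparound in the circular buffer.
    DWORD dwFree = (dwPlay + BUFFER_BYTES - *pdwWritePos) % BUFFER_BYTES;
    if (dwFree == 0)
        return;

    VOID *p1, *p2;
    DWORD cb1, cb2;
    if (SUCCEEDED(pBuffer->Lock(*pdwWritePos, dwFree, &p1, &cb1, &p2, &cb2, 0)))
    {
        GetMoreAudio((BYTE*)p1, cb1);
        if (p2) GetMoreAudio((BYTE*)p2, cb2);   // wrapped portion, if any
        pBuffer->Unlock(p1, cb1, p2, cb2);
        *pdwWritePos = (*pdwWritePos + dwFree) % BUFFER_BYTES;
    }
}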

The DirectSound C++ API does not have any built-in methods to load PCM data from a file or resource. However, the DirectX SDK includes sample code that you can easily adapt to your needs. The DirectSound API for Microsoft Visual Basic also includes two methods, DirectSound8.CreateSoundBufferFromFile and DirectSound8.CreateSoundBufferFromResource, which create and load static buffers in a single step.

The following Visual Basic sample code shows the steps involved in initializing the DirectSound system, loading a static buffer, and playing it:

Dim dx As New DirectX8
Dim ds As DirectSound8
Dim buff As DirectSoundSecondaryBuffer8

Private Sub Form_Load()
 
  ' Create DirectSound, using the default wave output device.
 
  Set ds = dx.DirectSoundCreate(vbNullString)
 
  ' Set the cooperative level. This defines how the application interacts
  ' with other applications using the same device.
 
  ds.SetCooperativeLevel Me.hWnd, DSSCL_PRIORITY
 
  ' Create a buffer from a .wav file. The DSBUFFERDESC type 
  ' describes the size and capabilities of the buffer, and the WAV
  ' format. In this case, the size and WAV format are determined by
  ' the file, so we don't need to fill in those members. However, we
  ' can set other capabilities. In this example, a flag is set to ensure
  ' that the buffer has panning capabilities.
 
  Dim desc As DSBUFFERDESC
  desc.lFlags = DSBCAPS_CTRLPAN
  Set buff = ds.CreateSoundBufferFromFile("test.wav", desc)
 
End Sub
 
Private Sub btnPlay_Click()
 
  ' The buffer can be set to play just once, or to loop until explicitly
  ' stopped.
 
  buff.Play DSBPLAY_LOOPING
 
End Sub

When creating a buffer, the application can set various capability flags that enable it to control the buffer in many ways, including the following:

  • Pan the sound from left to right.
  • Move the sound source in 3-D space.
  • Set the volume.
  • Set the frequency.
  • Add sound effects such as echo or distortion.
  • Locate the buffer in hardware-controlled memory or in software-controlled memory.
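
In C++, these controls correspond to methods on the buffer interface, as the following sketch shows. It assumes the buffer was created with the matching DSBCAPS_CTRLPAN, DSBCAPS_CTRLVOLUME, and DSBCAPS_CTRLFREQUENCY flags:

#include <dsound.h>

// Sketch: adjust a secondary buffer at run time.
void TweakBuffer(IDirectSoundBuffer8* pBuffer)
{
    pBuffer->SetPan(-2500);          // shift toward the left speaker
    pBuffer->SetVolume(-1000);       // attenuate by 10 dB (units are 1/100 dB)
    pBuffer->SetFrequency(22050);    // play back at 22.05 kHz
}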

Advantages of Using DirectSound

DirectSound offers the following advantages for the playback of waveform audio:

  • Very low latency.
  • Full use of hardware acceleration, including dynamic voice management, which ensures the efficient allocation of available hardware buffers.
  • Mixing of multiple sounds, with automatic conversion of WAV formats to a single output format. The number of secondary buffers that can play simultaneously is limited only by the resources available.
  • Full control over 3-D effects, including spatialization of sound sources, directional sound, listener orientation, and Doppler effect. Vectors can easily be integrated with those used in a Microsoft Direct3D® application.
  • Support for multichannel WAV formats, such as 5.1 speaker configurations, on WDM drivers.
  • Support for extended properties on sound cards through the IKsPropertySet interface.

Many of the advantages of DirectSound are also available in DirectMusic, which sends its output to DirectSound buffers and provides full access to the interfaces of those buffers. The main advantage of using the DirectSound API is lower latency than can be provided by DirectMusic. It is quite possible, however, to mix DirectSound and DirectMusic in a single application, using application-created DirectSound static buffers for quick-response sound effects while DirectMusic manages the music, dialog, and ambient sounds.

DirectShow also uses DirectSound as its default audio renderer. The IDirectSound3DBuffer interface can be obtained from the DirectSound renderer filter to apply 3-D spatialization to individual sounds. You can also obtain the IDirectSound3DListener interface to make global changes to the 3-D sound environment.

DirectMusic: Dynamic Soundtracks

Microsoft DirectMusic was originally intended to complement DirectSound, being used to play MIDI-based content (albeit with a host of features that went well beyond the simple playback of musical notes) while DirectSound handled waveform audio. However, with the release of DirectX 8.0, DirectMusic became the primary API for playing all kinds of sounds.

This point is worth emphasizing, because the terminology can be confusing. DirectMusic is not just for playing music. It is a full-featured audio-playback technology that can do everything from playing simple WAV or MIDI files to performing a complex, dynamically changing soundtrack based on specially authored content.

DirectMusic ultimately plays all sounds through DirectSound buffers. An application can retrieve an interface to any of these buffers in order to apply 3-D parameters or manipulate other effects. As a programmer, you have the best of both worlds: you can leave all the loading and streaming of data up to DirectMusic, but you can still have low-level control over the final processing of the sounds.

Advantages of Using DirectMusic for Simple Playback

Leaving aside, for the moment, the many features that make it much more than a playback engine, DirectMusic offers the following advantages for playing simple sounds:

  • Handles the reading of MIDI and waveform data, including waves in compressed formats, from files or resources.
  • Handles all streaming of data.
  • Synthesizes MIDI notes in a software synthesizer that supports Downloadable Sounds Level 2 (DLS2).
  • Gives the application full control over many aspects of playback including volume, tempo, and MIDI patch changes.

A playable unit of sound in DirectMusic is called a segment. The following Visual Basic sample code shows the steps involved in playing a WAV file as a segment:

Dim dx As New DirectX8
Dim dmLoader As DirectMusicLoader8
Dim dmPerformance As DirectMusicPerformance8
Dim waveSeg As DirectMusicSegment8
 
Private Sub Form_Load()
 
  ' The DMUS_AUDIOPARAMS type can be used to change the default 
  ' configuration of resources. We'll simply pass 0 in all members.

  Dim dmParams As DMUS_AUDIOPARAMS
  
  ' Create the loader. This object is used to load all types of content.
 
  Set dmLoader = dx.DirectMusicLoaderCreate
 
  ' Create the performance. This is the overall manager of playback.
 
  Set dmPerformance = dx.DirectMusicPerformanceCreate
 
  ' Initialize the performance. This allocates some resources and sets up
  ' a default audiopath.
 
  dmPerformance.InitAudio Me.hWnd, DMUS_AUDIOF_ALL, dmParams, Nothing, _
      DMUS_APATH_SHARED_STEREOPLUSREVERB, 128
  
  ' Load the wave as a segment object.
 
  Set waveSeg = dmLoader.LoadSegment("test.wav")
 
  ' Make any waveform samples associated with the segment available
  ' to the synthesizer. All sounds, not just MIDI notes, pass
  ' through the synthesizer.
 
  waveSeg.Download dmPerformance
 
End Sub
 
 
Private Sub btnPlay_Click()
 
 ' Play the segment immediately. Other flags would enable it to be 
 ' synchronized with events such as a musical beat.
 
  dmPerformance.PlaySegmentEx waveSeg, DMUS_SEGF_REFTIME, 0
 
End Sub

Note that most of the code in the example is for the preliminary setup of the DirectMusic system, which is done just once. After setup is complete, only four steps are necessary to play a segment:

  1. Load the segment.
  2. Download it to the synthesizer.
  3. Play it.
  4. Unload it from the synthesizer, if it won't be played again.

No special steps have to be taken to play a MIDI file instead of a WAV file. The loader and performance objects recognize the data format and deal with it appropriately.
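
For C++ programmers, the same setup and four playback steps look roughly like the following sketch, which is based on the DirectX 8 interfaces. COM initialization and error checking are omitted:

#include <dmusici.h>

IDirectMusicLoader8*      pLoader      = NULL;
IDirectMusicPerformance8* pPerformance = NULL;
IDirectMusicSegment8*     pSegment     = NULL;

// One-time setup: create the loader and performance, initialize audio.
CoCreateInstance(CLSID_DirectMusicLoader, NULL, CLSCTX_INPROC,
                 IID_IDirectMusicLoader8, (void**)&pLoader);
CoCreateInstance(CLSID_DirectMusicPerformance, NULL, CLSCTX_INPROC,
                 IID_IDirectMusicPerformance8, (void**)&pPerformance);
pPerformance->InitAudio(NULL, NULL, NULL,
                        DMUS_APATH_SHARED_STEREOPLUSREVERB, 128,
                        DMUS_AUDIOF_ALL, NULL);

// 1. Load the segment (a WAV or MIDI file).
pLoader->LoadObjectFromFile(CLSID_DirectMusicSegment, IID_IDirectMusicSegment8,
                            L"test.wav", (void**)&pSegment);

// 2. Download its waveform data to the synthesizer.
pSegment->Download(pPerformance);

// 3. Play it.
pPerformance->PlaySegmentEx(pSegment, NULL, NULL, 0, 0, NULL, NULL, NULL);

// 4. When it will not be played again, unload it.
pSegment->Unload(pPerformance);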

Using DirectMusic to Implement a Dynamic Soundtrack

In addition to providing a simple mechanism for loading and playing MIDI and waveform files, DirectMusic has many other features, only a few of which can be touched on here.

The full power of the DirectMusic API is unleashed when it is used to play content authored in Microsoft DirectMusic Producer, a utility distributed with the DirectX SDK. DirectMusic Producer enables the composer or sound designer to create content from multiple elements, including the following:

  • MIDI sequences.
  • Styles. A style is a collection of note patterns, with variations, that can be arranged and transposed at run time.
  • Chord progressions. As the segment plays, the notes in style patterns are transposed according to the current chord.
  • Waveforms. A segment can contain many different waveforms and play them back in different combinations each time the segment loops, making it possible to create ambient sounds that are not repetitious.
  • Commands to trigger other segments or to change some parameter of the performance.
  • Text for debugging, or for passing lyrics or custom commands to the application.
  • Scripts. The content creator can use scripts to control many of the details of playback in the application.

Authored content is largely self-controlling, which means that the application developer has less responsibility for manipulating the soundtrack. However, when an application plays content, it can exert further control and cause the soundtrack to respond to events. For example, the application can do the following:

  • Cause segments to play and stop on appropriate musical boundaries, such as on a beat or bar.
  • Cause different patterns to play by setting a global value called the groove level.
  • Compose transitions on the fly.
  • Change the tempo.
  • Change instrumentation.
  • Force recomposition of the chord progression.
  • Modify 3-D parameters and other effects; for example, add reverberation when a sound source enters a room.

DirectMusic Audiopaths

An important feature of DirectMusic is its ability to create different configurations for playback, called audiopaths, and to play different segments on different audiopaths. A set of standard audiopaths is available to applications through the DirectMusic API, and custom audiopath configurations can be created in DirectMusic Producer.

Among other things, an audiopath defines the set of DirectSound buffers and effects through which the sound will play after it has left the synthesizer. For example, a sound played on the standard DMUS_APATH_SHARED_STEREOPLUSREVERB audiopath is played through a stereo DirectSound buffer and has a reverberation effect applied. A sound played on a DMUS_APATH_DYNAMIC_3D audiopath is played through a DirectSound buffer that has 3-D controls.

An example might help to clarify the usefulness of audiopaths. Suppose you are developing a car-racing game. Each car in the race produces a set of sounds: the roar of the engine, squealing tires, and a horn. For each car, you create a DMUS_APATH_DYNAMIC_3D audiopath and obtain an interface to the associated DirectSound 3-D buffer object. For that car, you play all sounds to the appropriate audiopath. Then, as the car moves in relation to the listener, you have to change the position of only a single 3-D buffer to move all the sounds for that car. Similarly, if you have added a reverb effect to the buffer, you can adjust the reverb for all sounds with a single call in response to an event such as the car entering a tunnel.
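
A hedged C++ sketch of the car example might look like the following. Here pPerformance is assumed to be an initialized performance, pEngineSeg a loaded and downloaded segment, and x, y, and z the car's current coordinates; error checking is omitted:

#include <dmusici.h>
#include <dsound.h>

// Sketch: give one car its own 3-D audiopath so that a single call
// repositions all of the car's sounds.
void SetUpCarAudio(IDirectMusicPerformance8* pPerformance,
                   IDirectMusicSegment8* pEngineSeg,
                   float x, float y, float z)
{
    IDirectMusicAudioPath* pCarPath  = NULL;
    IDirectSound3DBuffer*  p3DBuffer = NULL;

    pPerformance->CreateStandardAudioPath(DMUS_APATH_DYNAMIC_3D, 64,
                                          TRUE, &pCarPath);

    // Retrieve the 3-D interface of the buffer at the end of the path.
    pCarPath->GetObjectInPath(DMUS_PCHANNEL_ALL, DMUS_PATH_BUFFER, 0, GUID_NULL,
                              0, IID_IDirectSound3DBuffer, (void**)&p3DBuffer);

    // Play this car's sounds on its own path (last parameter).
    pPerformance->PlaySegmentEx(pEngineSeg, NULL, NULL, DMUS_SEGF_SECONDARY,
                                0, NULL, NULL, pCarPath);

    // As the car moves, one call moves all of its sounds together.
    p3DBuffer->SetPosition(x, y, z, DS3D_IMMEDIATE);
}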

DirectShow: Integrating Sound and Video

Microsoft DirectShow is an architecture for streaming media, both audio and video, that is geared toward rapid development of full-featured multimedia applications. DirectShow simplifies media playback, format conversion, and capture, but also provides access to the underlying stream control architecture for applications that require custom solutions.

At the heart of the DirectShow API is the concept of the filter graph, a collection of components that handle the stream at various stages. A graph can be constructed dynamically in code, or it can be authored in a utility called GraphEdit (supplied with the DirectX SDK), saved to a file, and then loaded by the DirectShow application.

DirectShow has limited support for Visual Basic. To use its full power, you must code in C++.

Advantages of DirectShow for Audio Playback

DirectShow is not geared toward dynamic layered soundtracks or the low-latency playback of short sounds. However, it offers the following advantages for streaming audio files:

  • Support for many formats including WAV, MIDI, MP3, and Windows Media Format. (To write applications that play Windows Media files, you need the Windows Media Format SDK as well as the DirectX SDK.)
  • Built-in file parsing and data streaming.
  • High-level API and GraphEdit enable the easy construction of complex filter graphs to add effects.
  • Integration with video playback.

The following example shows how simple it is to set up playback of a file with an effect implemented as a DirectX Media Object (DMO). The filter graph is first constructed in GraphEdit, as follows.

Figure 1. Filter graph in GraphEdit

The first filter reads the file, the second parses the data, the third adds the Waves Reverb effect, and the last renders the final mix through a DirectSound buffer.

You can edit the parameters of the effect in GraphEdit; for example, to increase the reverb delay. Effect parameters are saved with the graph.

In the application, the graph file can be passed to the IGraphBuilder::RenderFile method just as if it were a simple audio file. The following is a complete console application that plays the graph shown in the illustration:

#include <dshow.h>
 
int main(void)
{
    IGraphBuilder *pGraph;
    IMediaControl *pMediaControl;
    IMediaEvent   *pEvent;
    CoInitialize(NULL);
    
    // Create the filter graph manager and query for interfaces.
    // (Error checking is omitted for brevity.)
    CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER, 
                        IID_IGraphBuilder, (void **)&pGraph);
    pGraph->QueryInterface(IID_IMediaControl, (void **)&pMediaControl);
    pGraph->QueryInterface(IID_IMediaEvent, (void **)&pEvent);

    // Build the graph from the saved GraphEdit file.
    pGraph->RenderFile(L"C:\\music\\dixie.grf", NULL);

    // Run the graph. This plays the audio.
    pMediaControl->Run();

    // Wait for completion. 
    long evCode;
    pEvent->WaitForCompletion(INFINITE, &evCode);

    // Clean up.
    pMediaControl->Release();
    pEvent->Release();
    pGraph->Release();
    CoUninitialize();
    return 0;
}

Effects: DirectX Media Objects

DirectX Media Objects (DMOs) are components that can be inserted in a media stream to manipulate the data in some way. In audio, DMOs are generally used to create effects. The use of DMOs is supported in DirectShow, DirectSound, and DirectMusic.

In DirectShow, you implement an effect in code by using the DMO Wrapper Filter. If you're using GraphEdit to construct your graphs, DMOs appear as filters under the DMO Audio Effects category. For more information, see Using DMOs in a DirectShow Application in the DirectShow documentation.

In DirectMusic Producer, effects are inserted into audiopaths. In application code, you can add and remove effects by using the IDirectSoundBuffer8::SetFX method of a secondary sound buffer object. This buffer can be one obtained from an audiopath in a DirectMusic application, or one created by using the DirectSound API. You can obtain an interface to the effect from this buffer in order to set effect parameters.
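
As an illustrative C++ sketch (the buffer must have been created with the DSBCAPS_CTRLFX flag and must not be playing when SetFX is called), the standard echo effect can be attached and adjusted as follows:

#include <dsound.h>

// Sketch: attach the standard echo effect to a secondary buffer
// and then adjust one of its parameters.
void AddEcho(IDirectSoundBuffer8* pBuffer)
{
    DSEFFECTDESC fx = {0};
    fx.dwSize = sizeof(DSEFFECTDESC);
    fx.guidDSFXClass = GUID_DSFX_STANDARD_ECHO;

    DWORD dwResult = 0;
    pBuffer->SetFX(1, &fx, &dwResult);

    // Retrieve the effect's own interface to set parameters.
    IDirectSoundFXEcho8* pEcho = NULL;
    pBuffer->GetObjectInPath(GUID_DSFX_STANDARD_ECHO, 0,
                             IID_IDirectSoundFXEcho8, (void**)&pEcho);
    if (pEcho)
    {
        DSFXEcho params;
        pEcho->GetAllParameters(&params);
        params.fWetDryMix = 50.0f;      // equal mix of dry and echoed signal
        pEcho->SetAllParameters(&params);
        pEcho->Release();
    }
}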

The following standard effects are installed with the operating system:

  • Chorus
  • Compression
  • Distortion
  • Echo
  • Environmental reverberation
  • Flange
  • Gargle
  • Parametric equalizer
  • Waves reverberation (music reverb based on the Waves MaxxVerb technology, licensed to Microsoft)

Interfaces to these standard effects are exposed in the DirectSound API. In DirectShow, they can be obtained by using the DMOEnum function or the System Device Enumerator.

For information on creating your own DMOs, see DirectX Media Objects in the DirectX documentation.

Using the Windows Media Player Control in Applications

An easy way to implement simple audio playback in a Visual Basic or Microsoft .NET-based application is to use the Windows Media Player control. You can make the control visible or invisible, load any file by setting a property on the control, and start, stop, or seek by using the control's methods.

For simple audio playback scenarios, the Windows Media Player control offers the following advantages:

  • Supports many audio formats including MIDI, WAV, MP3, and Windows Media Audio.
  • Enables seeking to markers in Windows Media files.
  • Handles all loading and streaming of data.

Windows Media Format SDK

The Windows Media Format SDK can be used to create applications that play Windows Media Audio (WMA) files. Using the Windows Media Format SDK, you are responsible for streaming the data to an output device. You can use the Windows Multimedia waveOut functions to do this. Always use the Windows Media Format SDK for scenarios involving Digital Rights Management or custom implementations of the file format.

For simple playback scenarios that do not involve Digital Rights Management, you can use DirectShow to play Windows Media files. DirectShow handles the loading and streaming of data. You need to install the Windows Media Format SDK in order to implement support for WMA files in DirectShow.

Playing Sounds on Web Pages

To play sounds on Web pages, you have several choices:

  • Create a link to a sound file. When the user clicks the link, the sound file is played by the application that is registered to play that type of file on the user's machine.

  • Play a background sound automatically, using the BGSOUND or EMBED tags. The following code plays a .wav file, repeating it as long as the page is open:

    <EMBED SRC="sample.wav" AUTOSTART="True" HIDDEN="True" LOOP="true">
    

    To ensure that the sound plays in all versions of Internet Explorer, add the following code:

    <NOEMBED>
    <BGSOUND SRC="sample.wav" LOOP=INFINITE>
    </NOEMBED>
    
  • Embed the Windows Media Player control.

  • Use HTML+TIME to create timed or interactive presentations.

The following sections discuss the last two options in more detail.

Using the Windows Media Player Control in a Web Page

By using the Windows Media Player control you gain the following advantages:

  • You can play any common audio format, including Windows Media Audio (WMA), which is an ideal format for streaming over the Internet.
  • You can optionally provide the user with player controls to start, stop, and pause playback, and to adjust volume.

The following HTML code shows just the controls of the Windows Media Player and automatically plays a WMA file, without looping:

<HTML>

<head>
<title>Sample HTML Script</title>
</head>

<body>

<object classid="clsid:6BF52A52-394A-11d3-B153-00C04F79FAA6" id="Player" height="40">
  <param name="AutoStart" value="True">
  <param name="uiMode" value="mini">
  <param name="URL" value="sample.wma"> 
</object>

</body>

</HTML>

HTML+TIME

HTML+TIME (Timed Interactive Multimedia Extensions), first released in Microsoft Internet Explorer 5, adds timing and media synchronization support to HTML pages. Using Extensible Markup Language (XML) elements and attributes, you can add images, video, and sounds to an HTML page and synchronize them with HTML text elements over a timeline or play them in response to user input. For example, you can create slide-show-style presentations with synchronized text, images, audio, video, and streaming media.

HTML+TIME uses a default player to play audio files including MIDI, WAV, and Windows Media files. Alternatively, you can specify another player.

Sound Capture

This section provides a brief overview of the options for capturing sound from a microphone or other audio input, for recording to a file or for immediate playback in a full-duplex system.

To capture audio, you have the following choices.

  • Waveform-Audio Interface (the waveIn functions)
    Purpose: Record a waveform.
    Benefits: Can achieve lower latency than DirectSoundCapture.
    Disadvantages: Data must be retrieved and written to a file by the application.

  • Media Control Interface (MCI)
    Purpose: Record a waveform, MIDI, or other media.
    Benefits: Saving to a file is simple. In Visual Basic, functionality is exposed by the Multimedia Control.

  • MIDI Services
    Purpose: Record MIDI.
    Benefits: Low-level device control.
    Disadvantages: Data must be retrieved and written to a file by the application.

  • DirectSoundCapture
    Purpose: Record a waveform.
    Benefits: Convenient if you are already using the DirectSound headers and DLL. Can control acoustic echo cancellation.
    Disadvantages: Data must be retrieved and written to a file by the application.

  • DirectMusic
    Purpose: Record MIDI.
    Benefits: Easy integration with DirectMusic applications.
    Disadvantages: Data must be retrieved and written to a file by the application.

  • DirectShow
    Purpose: Record a waveform.
    Benefits: Data compression and output to file is easy. Can capture from more than one device.
    Disadvantages: Cannot control acoustic echo cancellation.
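
To illustrate the DirectSoundCapture entry above, the following C++ sketch creates a capture buffer and starts recording. Retrieving the captured data with Lock and Unlock, and writing it to a file, are left to the application; error checking is omitted:

#include <dsound.h>

// Sketch: begin capturing 16-bit mono PCM at 22.05 kHz.
void StartCapture(IDirectSoundCaptureBuffer** ppCapBuf)
{
    IDirectSoundCapture8* pCapture = NULL;
    DirectSoundCaptureCreate8(NULL, &pCapture, NULL);  // default capture device

    WAVEFORMATEX wfx = {0};
    wfx.wFormatTag      = WAVE_FORMAT_PCM;
    wfx.nChannels       = 1;
    wfx.nSamplesPerSec  = 22050;
    wfx.wBitsPerSample  = 16;
    wfx.nBlockAlign     = wfx.nChannels * wfx.wBitsPerSample / 8;
    wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign;

    DSCBUFFERDESC desc = {0};
    desc.dwSize        = sizeof(DSCBUFFERDESC);
    desc.dwBufferBytes = wfx.nAvgBytesPerSec;   // room for one second
    desc.lpwfxFormat   = &wfx;

    pCapture->CreateCaptureBuffer(&desc, ppCapBuf, NULL);

    // Capture runs until Stop is called; the application periodically
    // locks the buffer and copies out the newly captured data.
    (*ppCapBuf)->Start(DSCBSTART_LOOPING);
}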

Full Duplex

An application or device is said to have full-duplex capabilities if it can simultaneously capture and play back sounds. A telephone is a full-duplex device. A walkie-talkie is a half-duplex device; you can either listen or talk, but as long as you are pressing the talk button, the device does not play the incoming signal.

In Windows, full duplex can be implemented by using DirectSound. If capture and playback are on the same device, the device must have full-duplex capability.

In Windows XP Home Edition and Windows XP Professional, you can use the acoustic echo cancellation (AEC) capture effect to improve the quality of a full-duplex implementation. AEC is primarily of interest for applications that use voice communication from one site to another. Without AEC, the signal from a microphone at site A is output from the speakers at site B, picked up by the microphone at site B, and rebroadcast at site A, possibly resulting in a feedback loop. AEC overcomes this problem by monitoring the incoming signal, adjusting it to take the room environment into account, and then removing it from the outgoing signal.
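
The following C++ sketch shows how an application might request AEC while creating a matched capture and playback pair with DirectSoundFullDuplexCreate8. This is an outline under the DirectX 8 headers, not a complete implementation: hwnd is the application window, pwfx is a PCM format filled in by the caller, and error checking is omitted.

#include <dsound.h>   // link with dsound.lib and dxguid.lib

// Sketch: create matched capture and playback buffers with the
// acoustic echo cancellation (AEC) capture effect.
void CreateFullDuplexWithAec(HWND hwnd, WAVEFORMATEX* pwfx)
{
    DSCEFFECTDESC aec = {0};
    aec.dwSize            = sizeof(DSCEFFECTDESC);
    aec.dwFlags           = DSCFX_LOCSOFTWARE;
    aec.guidDSCFXClass    = GUID_DSCFX_CLASS_AEC;
    aec.guidDSCFXInstance = GUID_DSCFX_MS_AEC;

    DSCBUFFERDESC dscbd = {0};
    dscbd.dwSize        = sizeof(DSCBUFFERDESC);
    dscbd.dwBufferBytes = pwfx->nAvgBytesPerSec;   // one second of capture
    dscbd.lpwfxFormat   = pwfx;
    dscbd.dwFXCount     = 1;
    dscbd.lpDSCFXDesc   = &aec;

    DSBUFFERDESC dsbd = {0};
    dsbd.dwSize        = sizeof(DSBUFFERDESC);
    dsbd.dwFlags       = DSBCAPS_GLOBALFOCUS;
    dsbd.dwBufferBytes = pwfx->nAvgBytesPerSec;
    dsbd.lpwfxFormat   = pwfx;

    IDirectSoundFullDuplex*     pDuplex  = NULL;
    IDirectSoundCaptureBuffer8* pCapBuf  = NULL;
    IDirectSoundBuffer8*        pPlayBuf = NULL;
    DirectSoundFullDuplexCreate8(NULL, NULL, &dscbd, &dsbd, hwnd, DSSCL_PRIORITY,
                                 &pDuplex, &pCapBuf, &pPlayBuf, NULL);
}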

To implement full-duplex audio in a network game, use Microsoft DirectPlay® Voice. For more information, see the DirectX SDK documentation.

To implement full-duplex audio in a collaborative online session under Windows XP Home Edition and XP Professional, use the Real-Time Communications (RTC) API. For more information, see the Platform SDK documentation.

Conclusion

Clearly, none of the technologies reviewed in this article suits every playback or recording scenario. Some provide the simplest possible API; others offer a greater degree of control at the expense of development time.

The following list presents some common scenarios, with recommendations for the API or APIs you should consider for each.

  • Redbook (CD) audio: MCI.
  • MIDI files: DirectMusic, the Windows Media Player control, or Windows Multimedia MIDI services. DirectMusic ensures identical playback on all systems by using the DLS synthesizer.
  • Short, non-overlapping WAV sound file: the PlaySound function.
  • Music or ambient sounds in a game: DirectMusic.
  • Low-latency sound effects: DirectSound static buffers. Can be combined with DirectMusic within an application.
  • Background music or voiceover in a video presentation: DirectShow.
  • Music and sounds with effect DMOs: DirectMusic, DirectSound, or DirectShow.
  • Simple playback on a Web page: the Windows Media Player control.
  • Synchronized music on a Web page: HTML+TIME.
  • Playback of unlicensed Windows Media Format audio files: DirectShow, the Windows Media Format SDK, or the Windows Media Player control. Consider Windows Media Format when storage space is at a premium.
  • Playback of licensed Windows Media Format audio files: the Windows Media Format SDK. For more information, see Digital Rights Management in the Windows Media Format SDK Help.
  • Playback of compressed or custom audio formats: DirectShow.
  • Multiformat player applications: the Windows Media Player control or DirectShow.
  • Capture from microphone: the Waveform-Audio Interface, DirectSound, or DirectShow.
  • Full duplex: DirectSound.
  • MIDI capture: DirectMusic or Windows Multimedia MIDI services.
  • Voice communication in a network game: DirectPlay Voice.
  • Voice communication in a network collaboration client: the RTC API.

For More Information

The following resources provide more information about the topics covered in this article, as well as the necessary programming tools. All the SDKs listed can be obtained from MSDN Downloads.

DirectSound, DirectMusic, and DirectShow

HTML+TIME

Windows Media Format

Windows Media Player control

Windows Multimedia (MCI and the waveform-audio interface)