Streaming Wave Files with DirectSound


class WaveFile
{
public:
  WaveFile (void);
  ~WaveFile (void);
  BOOL Open (LPSTR pszFilename);
  BOOL Cue (void);
  UINT Read (BYTE * pbDest, UINT cbSize);
  UINT GetNumBytesRemaining (void) { return (m_nDataSize - m_nBytesPlayed); }
  UINT GetAvgDataRate (void) { return (m_nAvgDataRate); }
  UINT GetDataSize (void) { return (m_nDataSize); }
  UINT GetNumBytesPlayed (void) { return (m_nBytesPlayed); }
  UINT GetDuration (void) { return (m_nDuration); }
  BYTE GetSilenceData (void);
  WAVEFORMATEX * m_pwfmt;
protected:
  HMMIO m_hmmio;
  MMRESULT m_mmr;
  MMCKINFO m_mmckiRiff;
  MMCKINFO m_mmckiFmt;
  MMCKINFO m_mmckiData;
  UINT m_nDuration;      // duration of sound in msec
  UINT m_nBlockAlign;    // wave data block alignment spec
  UINT m_nAvgDataRate;   // average wave data rate
  UINT m_nDataSize;      // size of data chunk
  UINT m_nBytesPlayed;   // offset into data chunk
};

This class was designed expressly to stream wave file data, hence there are none of the traditional file I/O functions for operations such as seeking, writing, and creating new files. The following table describes the purpose of each of the member functions in the WaveFile class:

Function	Description
Open	Opens a wave file.
Cue	Cues a wave file for playback.
Read	Reads a given number of data bytes.
GetNumBytesRemaining	Returns the number of data bytes remaining to be read.
GetAvgDataRate	Returns the average data rate in bytes per second.
GetDataSize	Returns the total number of wave data bytes.
GetNumBytesPlayed	Returns the number of data bytes that have been read.
GetDuration	Gets the duration of the wave file in milliseconds.
GetSilenceData	Returns a byte of data representing silence.

I chose to use the Win32 Multimedia File I/O services (MMIO) for implementation of WaveFile objects because these services take care of the basics of parsing the chunks in Resource Interchange File Format (RIFF) files. Since the point of this article is to explain streaming with DirectSound, I'm not going to explain the WaveFile code in detail. Take my word for it: the biggest challenge in writing this code was properly handling the myriad of errors that can occur when accessing files.

Silence, Please!

There is one detail I do want to explain. Implementing the AudioStream class required that blocks of data representing silence be written to the sound buffer (if you read the remainder of this article, you'll learn why). Since the data representing silence depends on the format of the wave file, I added a GetSilenceData member function to the WaveFile class. Word size for pulse-code modulation (PCM) formats can range from one byte for 8-bit mono to four bytes for 16-bit stereo, as shown in the following table.

PCM Format	Word Size	Silence Data
8-bit mono	1 byte	0x80
8-bit stereo	2 bytes	0x8080
16-bit mono	2 bytes	0x0000
16-bit stereo	4 bytes	0x00000000

Rather than make the AudioStream code deal with the different word sizes for different wave file formats, I took advantage of the fact that regardless of word size, silence data for PCM formats can be represented by a single byte. Thus, the GetSilenceData functions returns a BYTE. This shortcut saved me from having to write a lot of extra code.

The AudioStreamServices Object

The DirectSound interface consists of two objects, IDirectSound and IDirectSoundBuffer. The IDirectSound object represents the DirectSound services for a single window. Services are apportioned on a per-windows basis to facilitate muting a sound stream when a window loses the input focus. I created the AudioStreamServices class to wrap the IDirectSound object:


class AudioStreamServices
{
public:
  AudioStreamServices (void);
  ~AudioStreamServices (void);
  BOOL Initialize (HWND hwnd);
  LPDIRECTSOUND GetPDS (void) { return m_pds; }
protected:
  HWND m_hwnd;
  LPDIRECTSOUND m_pds;
};

As you can see, this is a pretty light class. In addition to a constructor and destructor, there are two member functions, Initialize and GetPDS. The GetPDS function returns the pointer to the IDirectSound object created by the Initialize function. The Initialize function takes a window handle and creates and initializes an IDirectSound object. Here's the code for the Initialize function:


// Initialize
BOOL AudioStreamServices::Initialize (HWND hwnd)
{
  BOOL fRtn = SUCCESS;  // assume success

  if (m_pds == NULL)
  {
    if (hwnd)
    {
      m_hwnd = hwnd;

      // Create IDirectSound object
      if (DirectSoundCreate (NULL, &m_pds, NULL) == DS_OK)
      {
        // Set cooperative level for DirectSound. Normal means our
        // sounds will be silenced when our window loses input focus.
        if (m_pds->SetCooperativeLevel (m_hwnd, DSSCL_NORMAL) == DS_OK)
        {
          // Any additional initialization goes here
        }
        else
        {
          // Error
          DOUT ("ERROR: Unable to set cooperative level\n\r");
          fRtn = FAILURE;
        }
      }
      else
      {
        // Error
        DOUT ("ERROR: Unable to create IDirectSound object\n\r");
        fRtn = FAILURE;
      }
    }
    else
    {
      // Error, invalid hwnd
      DOUT ("ERROR: Invalid hwnd, unable to initialize services\n\r");
      fRtn = FAILURE;
    }
  }

  return (fRtn);
}

The Initialize function creates an IDirectSound object by calling the DirectSoundCreate function. The first parameter to the DirectSoundCreate call is NULL to request the default DirectSound device. The second parameter is a pointer to a location that DirectSoundCreate fills with a pointer to an IDirectSound object. The pointer returned by DirectSoundCreate provides an interface for accessing IDirectSound member functions.

After successfully creating an IDirectSound object, the Initialize code calls the SetCooperativeLevel member function specifying the DSSCL_NORMAL flag to set the normal cooperative level. This is the lowest cooperative level--other levels are available if you require more control of DirectSound's buffers. For example, in normal cooperative level, the format of audio output is always 8-bit 22kHz mono. To change to another output format, you have to set the priority cooperative level (DSSCL_PRIORITY) and call the SetFormat function.

The AudioStream Object

Now we're down to the good stuff. I've explained how to use AudioStreamServices and AudioStream objects in an application. I've described the Timer and WaveFile objects that are used to provide periodic timer services and read wave files. Now I'm going to explain the implementation of the AudioStream object, the object that actually streams wave files using DirectSound. Here's the AudioStream class declaration:


class AudioStream
{
public:
  AudioStream (void);
  ~AudioStream (void);
  BOOL Create (LPSTR pszFilename, AudioStreamServices * pass);
  BOOL Destroy (void);
  void Play (void);
  void Stop (void);
protected:
  void Cue (void);
  BOOL WriteWaveData (UINT cbSize);
  BOOL WriteSilence (UINT cbSize);
  DWORD GetMaxWriteSize (void);
  BOOL ServiceBuffer (void);
  static BOOL TimerCallback (DWORD dwUser);
  AudioStreamServices * m_pass;  // ptr to AudioStreamServices object
  LPDIRECTSOUNDBUFFER m_pdsb;    // sound buffer
  WaveFile * m_pwavefile;        // ptr to WaveFile object
  Timer * m_ptimer;              // ptr to Timer object
  BOOL m_fCued;                  // semaphore (stream cued)
  BOOL m_fPlaying;               // semaphore (stream playing)
  DSBUFFERDESC m_dsbd;           // sound buffer description
  LONG m_lInService;             // reentrancy semaphore
  UINT m_cbBufOffset;            // last write position
  UINT m_nBufLength;             // length of sound buffer in msec
  UINT m_cbBufSize;              // size of sound buffer in bytes
  UINT m_nBufService;            // service interval in msec
  UINT m_nDuration;              // duration of wave file
  UINT m_nTimeStarted;           // time (in system time) playback started
  UINT m_nTimeElapsed;           // elapsed time in msec since playback started
};

In addition to a standard constructor and destructor, there are four public interface methods: Create, Destroy, Play, and Stop. The purpose of these methods should be obvious from the names I've given them.

The main players here are the Create and Play methods, and a third method, ServiceBuffer, that is not an interface. Here is an explanation of the role each of these methods plays in streaming wave files:

Create opens a wave file, creates a sound buffer, and cues the stream for playback.
Play begins DirectSound playback and launches a timer to service the sound buffer.
ServiceBuffer determines how much of sound buffer is free and fills free space with wave data (or with silence data if all wave data has been sent to buffer). ServiceBuffer also maintains an elapsed time count and stops playback when all of wave file has been played.

Creating the Sound Buffer

Before creating a sound buffer, you must open the wave file to determine its format, average data rate, and duration. Here's the corresponding code from the Create method:


// Create a new WaveFile object
if (m_pwavefile = new WaveFile)
{
  // Open given file
  if (m_pwavefile->Open (pszFilename))
  {
    // Calculate sound buffer size in bytes
    m_cbBufSize = (m_pwavefile->GetAvgDataRate () * m_nBufLength) / 1000;
    m_cbBufSize =   (m_cbBufSize > m_pwavefile->GetDataSize ())
            ? m_pwavefile->GetDataSize ()
            : m_cbBufSize;

    // Get duration of sound (in milliseconds)
    m_nDuration = m_pwavefile->GetDuration ();
    
    . . .
  }
}

After opening the file, Create determines the required size of the sound buffer and the duration of the sound. The size of the sound buffer is calculated from the average data rate and the default buffer length in milliseconds (the m_nBufLength data member). The default buffer length is set to a constant in the AudioStream constructor. I chose to use a two-second sound buffer, but it's a good idea to experiment with your particular application. The timer interval for servicing the sound buffer should be no more than half of the buffer length. I used a 500-millisecond service interval, one-fourth the length of the sound buffer. You can adjust the buffer length and buffer service intervals in the STREAMS sample application by changing the DefBufferLength and DefBufferServiceInterval constants in the AUDIOSTREAM.CPP file:


const UINT DefBufferLength      = 2000;
const UINT DefBufferServiceInterval  = 250;

After successfully opening the wave file and calculating the required buffer size, Create creates a DirectSound sound buffer by initializing a DSBUFFERDESC structure and calling IDirectSound::CreateSoundBuffer:


// Create sound buffer
HRESULT hr;
memset (&m_dsbd, 0, sizeof (DSBUFFERDESC));
m_dsbd.dwSize = sizeof (DSBUFFERDESC);
m_dsbd.dwBufferBytes = m_cbBufSize;
m_dsbd.lpwfxFormat = m_pwavefile->m_pwfmt;
hr = (m_pass->GetPDS ())->CreateSoundBuffer (&m_dsbd, &m_pdsb, NULL);

The lpwfxFormat element of the DSBUFFERDESC structure points to a WAVEFORMATEX structure specifying the format of the wave file. Currently, DirectSound will not play compressed wave formats. The CreateSoundBuffer method will fail for any formats that are not PCM. Note that no flags are specified for DSBUFFERDESC.dwFlags. This causes CreateSoundBuffer to create a looping secondary buffer which is the proper type of buffer for streaming.

Filling the Sound Buffer with Wave Data

After successfully creating the sound buffer, Create calls the AudioStream::Cue method to prepare the stream for playback. Cue resets the buffer pointers and the file pointer and then calls AudioStream:: WriteWaveData to fill the buffer with data from the wave file. The following is the source for WriteWaveData:


BOOL AudioStream::WriteWaveData (UINT size)
{
  HRESULT hr;
  LPBYTE lpbuf1 = NULL;
  LPBYTE lpbuf2 = NULL;
  DWORD dwsize1 = 0;
  DWORD dwsize2 = 0;
  DWORD dwbyteswritten1 = 0;
  DWORD dwbyteswritten2 = 0;
  BOOL fRtn = SUCCESS;

  // Lock the sound buffer
  hr = m_pdsb->Lock (m_cbBufOffset, size, &lpbuf1, &dwsize1, &lpbuf2, &dwsize2, 0);
  if (hr == DS_OK)
  {
    // Write data to sound buffer. Because the sound buffer is circular,
    // we may have to do two write operations if locked portion of buffer
    // wraps around to start of buffer.
    ASSERT (lpbuf1);
    if ((dwbyteswritten1 = m_pwavefile->Read (lpbuf1, dwsize1)) == dwsize1)
    {
      // Second write required?
      if (lpbuf2)
      {
        if ((dwbyteswritten2 = m_pwavefile->Read (lpbuf2, dwsize2)) == dwsize2)
        {
          // Both write operations successful!
        }
        else
        {
          // Error, didn't read wave data completely
          fRtn = FAILURE;
        }
      }
    }
    else
    {
      // Error, didn't read wave data completely
      fRtn = FAILURE;
    }

    // Update our buffer offset and unlock sound buffer
    m_cbBufOffset = (m_cbBufOffset + dwbyteswritten1 + dwbyteswritten2)
                     % m_cbBufSize;
    m_pdsb->Unlock (lpbuf1, dwbyteswritten1, lpbuf2, dwbyteswritten2);
  }
  else
  {
    // Error locking sound buffer
    fRtn = FAILURE;
  }

  return (fRtn);
}

WriteWaveData reads a given number of data bytes from the wave file and writes the data to the sound buffer. To write data to a DirectSound sound buffer you must first call the IDirectSoundBuffer::Lock method to get write pointers. No that's not a typo, Lock return two pointers. Usually, the second pointer will be returned as NULL, but if the write operation spans the end of the buffer the second pointer will be a valid address (the beginning of the buffer). That's the nature of circular buffers. No problem though, the resulting code is still pretty simple and straightforward.

Beginning Playback

The AudioStream::Play method begins playback by calling the IDirectSoundBuffer::Play method and creating a timer to service the sound buffer:


// Begin DirectSound playback
HRESULT hr = m_pdsb->Play (0, 0, DSBPLAY_LOOPING);
if (hr == DS_OK)
{
  // Save current time (for elapsed time calculation)
  m_nTimeStarted = timeGetTime ();
  
  // Kick off timer to service buffer
  m_ptimer = new Timer ();
  if (m_ptimer)
  {
    m_ptimer->Create (m_nBufService, m_nBufService, DWORD (this),
                      TimerCallback);
  }

  . . . 
}

Note that the call to IDirectSoundBuffer::Play includes the DSBPLAY_LOOPING flag to specify that playback continue until explicitly stopped. Play also sets the m_nTimeStarted data member to the current system time (in milliseconds) to allow calculation of the time that has elapsed since playback was started.

Servicing the Sound Buffer

The Timer object created by AudioStream::Play periodically calls the ServiceBuffer routine to perform the following tasks:

Maintain an elapsed time count.
Determine if playback is complete and stop if necessary.
Fill sound buffer with more wave data or with silence data if all wave data has been sent to buffer.

The following is the complete source for ServiceBuffer:


LONG lInService = FALSE;  // reentrancy semaphore

BOOL AudioStream::ServiceBuffer (void)
{
  BOOL fRtn = TRUE;

  // Check for reentrance
  if (InterlockedExchange (&lInService, TRUE) == FALSE)
  { // Not reentered, proceed normally
    // Maintain elapsed time count
    m_nTimeElapsed = timeGetTime () - m_nTimeStarted;

    // Stop if all of sound has played
    if (m_nTimeElapsed < m_nDuration)
    {
      // All of sound not played yet, send more data to buffer
      DWORD dwFreeSpace = GetMaxWriteSize ();

      // Determine free space in sound buffer
      if (dwFreeSpace)
      {
        // See how much wave data remains to be sent to buffer
        DWORD dwDataRemaining = m_pwavefile->GetNumBytesRemaining ();
        if (dwDataRemaining == 0)
        { // All wave data has been sent to buffer
          // Fill free space with silence
          if (WriteSilence (dwFreeSpace) == FAILURE)
          { // Error writing silence data
            fRtn = FALSE;
          }
        }
        else if (dwDataRemaining >= dwFreeSpace)
        { // Enough wave data remains to fill free space in buffer
          // Fill free space in buffer with wave data
          if (WriteWaveData (dwFreeSpace) == FAILURE)
          { // Error writing wave data
            fRtn = FALSE;
          }
        }
        else
        { // Some wave data remains, but not enough to fill free space
          // Write wave data, fill remainder of free space with silence
          if (WriteWaveData (dwDataRemaining) == SUCCESS)
          {
            if (WriteSilence (dwFreeSpace - dwDataRemaining) == FAILURE)
            { // Error writing silence data
              fRtn = FALSE;
            }
          }
          else
          { // Error writing wave data
            fRtn = FALSE;
          }
        }
      }
      else
      { // No free space in buffer for some reason
        fRtn = FALSE;
      }
    }
    else
    { // All of sound has played, stop playback
      Stop ();
    }
    // Reset reentrancy semaphore
    InterlockedExchange (&lInService, FALSE);
  }
  else
  { // Service routine reentered. Do nothing, just return
    fRtn = FALSE;
  }
  return (fRtn);
}

I feel like the code pretty much speaks for itself here (that's why I included all of this rather lengthy routine). There are several things I want to explain, however. The first is the call to InterlockedExchange. This is a nifty Win32 synchronization mechanism that I'm using to detect if the ServiceBuffer routine is reentered. It's possible that you could still be servicing the buffer when another timer interrupt comes along. If ServiceBuffer is reentered, it simply returns immediately without doing anything.

I also want to explain why you need to write silence data to the sound buffer. DirectSound has no concept of when playback of a wave file is complete--it just happily cycles through the sound buffer playing whatever data is there until it's told to stop. The ServiceBuffer routine keeps track of how much time has elapsed since playback was started and stops playback as soon as enough time has elapsed to play the entire wave file. Since you can't stop playback at the exact millisecond that the last wave data byte is played, you have to follow the wave data with data representing silence. If you don't do this, you will get some random blip of sound at the end of a wave file.

Managing the Read-and-Write Cursors

Two offsets are required to manage data in a circular buffer. Traditionally these offsets are called the head and the tail of the buffer. I can never remember which is the head and which is the tail, so I like to call these two offsets the "read cursor" and the "write cursor." In this case, the read cursor identifies the location in the buffer where DirectSound is reading wave data and the write cursor identifies the location where we need to write the next block of wave data.

If you take a look at the IDirectSoundBuffer::GetCurrentPosition method, you'll see that it returns a read cursor and a write cursor. Looks easy enough. At least that's what I thought, but that's not exactly correct. It took me several days of hair-pulling to fi gure out that the write cursor returned by GetCurrentPosition was not the write cursor I needed to manage a sound buffer. Don't you hate it when things don't work like you want them to?

To manage a sound buffer with DirectSound, you must maintain your own write cursor. In the AudioStream class I represent the write cursor with the m_cbBufOffset data member. Each time you write wave data to the sound buffer, you must increment m_cbBufOffset and check to see if it has wrapped around to the beginning of the buffer. It's not difficult code to write, but it certainly took me a while to discover that I couldn't use the write cursor provided by DirectSound! The following code is a helper method called by ServiceBuffer to determine how much of the sound buffer has already been played (in other words, how much data can be written to the sound buffer):


DWORD AudioStream::GetMaxWriteSize (void)
{
  DWORD dwWriteCursor, dwPlayCursor, dwMaxSize;

  // Get current play position
  if (m_pdsb->GetCurrentPosition (&dwPlayCursor, &dwWriteCursor) == DS_OK)
  {
    if (m_cbBufOffset <= dwPlayCursor)
    {
      // Our write position trails play cursor
      dwMaxSize = dwPlayCursor - m_cbBufOffset;
    }

    else // (m_cbBufOffset > dwPlayCursor)
    {
      // Play cursor has wrapped
      dwMaxSize = m_cbBufSize - m_cbBufOffset + dwPlayCursor;
    }
  }
  else
  {
    // GetCurrentPosition call failed
    ASSERT (0);
    dwMaxSize = 0;
  }
  return (dwMaxSize);
}

GetMaxWriteSize provides a good illustration of how to manage the read and write cursors. You may also want to look at the WriteWaveData method presented earlier and see how m_cbBufOffset is used with the IDirectSoundBuffer::Lock method to get an actual write pointer in the sound buffer.

Now I'll bet you're wondering what the deal is with the write cursor maintained by DirectSound. No, it's not broken, that's the way it was designed to operate! DirectSound's write cursor specifies the position in the buffer where it is safe to write data. During playback, DirectSound won't allow you to write to the section of the sound buffer that begins with its play cursor and ends with its write cursor. Typically, this is about 15 milliseconds worth of data. DirectSound does not change its write cursor when you write data to a sound buffer--the write cursor always tracks the play cursor and leads it by about 15 milliseconds during playback.

Quick Fix: A Summary of Streaming with DirectSound

This following list summarizes what you need to know about streaming wave files with DirectSound:

DirectSound uses a single sound buffer. For streaming, you need to create a looping secondary buffer by calling the IDirectSound::CreateSoundBuffer method without specifying either the DSBCAPS_STATIC or DSBCAPS_PRIMARYBUFFER flags in the DSBUFFERDESC structure.
The required size of the sound buffer depends on the format of the wave file you are streaming. For example, a 44.1 kHz 16-bit stereo file will require a much larger sound buffer than an 11.025 kHz 8-bit mono file. I recommend using a one- or two-second sound buffer.
Use the Win32 multimedia timer services to provide a periodic timer interrupt to service the sound buffer. The timer interval depends on the size of the sound buffer and the data rate of the wave file you are streaming. I recommend using a timer interval that is one-fourth the size of your sound buffer. For example, with a two-second sound buffer, use a timer interval of 500 milliseconds.
There are two pointers used to manage the contents of the sound buffer, a play cursor and a write cursor. DirectSound maintains the play cursor, which you can obtain with the IDirectSoundBuffer::GetCurrentPosition method. You must maintain your own write cursor to determine how much wave data to write into the buffer and where to write the data. Don't use the write cursor maintained by DirectSound for this purpose.
DirectSound will continue to play the contents of the sound buffer until you tell it to stop. After you've written all of the wave file data into the sound buffer, you must write data representing silence to the buffer until you determine that all of the wave file data has been played. To determine when all of the data has been played, calculate the duration of the wave file and keep track of how much time has elapsed since you began playback.
DirectSound only plays PCM data formats. Compressed wave formats are not supported. To play compressed wave data, you must first expand the data into PCM format before writing the data to a DirectSound sound buffer.

Discuss this article in the forums

Date this article was posted to GameDev.net: 9/14/1999
(Note that this date does not necessarily correspond to the date the article was written)

See Also:
DirectX Audio