Working with AVI Files
by Jonathan Nix

Abstract

You have several options when working with AVI files.

* Parse the file yourself.
* Parse the file with MMIO routines.
* Piece together a DirectShow filter.
* Use the Win32 AVIFile API.
* Use MCI to draw it to a window.

There are benefits and drawbacks to each of them, but Microsoft's AVIFile API is easy enough for the novice yet advanced enough for most purposes. This document deals with loading and interpreting AVI files using the AVIFile interface. If you are already comfortable with DirectShow, then I recommend that method instead. Otherwise, AVIFile will let you process the many different kinds of AVI files, handle decompression, and read the video frames easily. The API also reads any Resource Interchange File Format (RIFF) file, so learning it will let you process other RIFF types like WAV, and even help you create your own custom format that extends AVI's capabilities while remaining compatible with other software.

This document will show you how to extract video and sound information from an AVI file. You'll see how to synchronize game elements on top of the AVI video, and the sample code shows how the AVI can be blitted with transparency so it's superimposed over a game's action. You'll also receive royalty-free wrapper classes for AVI files, bitmap files, and DirectDraw to get you started.

Introduction

When learning to process a file, I usually learn its file format and create a wrapper class to make things easier for the rest of my code. One thing I've noticed is that, as file formats get more advanced, they tend to use a chunk-based layout. WAV is probably the easiest of the chunk-based formats, and PCM waves can be parsed with little difficulty. The reason I recommend using an API for AVI files, though, is that they come in so many forms that you must either learn them all or limit your capability in some way. They're also usually compressed with one of several different codecs.

This document is designed to lead you through the entire process from beginning to end. I only cover what's pertinent to opening the file and getting frames and audio out of it. I don't go into how the sound can be played back or the images rendered, because there are many ways to do that depending on what you want. The sample code, written using MSVC++ 6.0, displays the frame sequence using DirectX 7.0 in full-screen exclusive mode. If there's popular demand, I'll submit other articles describing in more detail how the video is rendered and the sound played back.

Reference

Project Settings

You'll need to link with vfw32.lib and winmm.lib to use the AVIFile functions. They're included with MSVC++ 6.0, should be available in versions as far back as 4.0, and Borland ships equivalents as well. Newer versions of the libraries are also part of the latest Platform SDK release from Microsoft.

You can use these functions with any kind of project: C, C++, MFC, Win32, console application, DirectDraw, Direct3D, etc.
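
For reference, a typical source file that uses the API looks something like the sketch below. The pragma lines are just an MSVC++ convenience; you can also add the libraries in the project's link settings.


#include <windows.h>
#include <vfw.h>    // AVIFile, AVIStream, and GetFrame declarations

#pragma comment(lib, "vfw32.lib")  // or add these to Project->Settings->Link
#pragma comment(lib, "winmm.lib")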

Initialization

Before calling any other AVIFile function, initialize the library. Pair this with a call to AVIFileExit when you're completely finished, as shown in the cleanup section.


AVIFileInit();

Opening the AVI File

The AVIFileOpen function only takes a string for the filename, not a file handle. This means you won't be able to embed an AVI file into a proprietary WAD-style format and load it directly from there with this API, unless you can figure out a trick. You can still encode the file and/or change its extension to protect your copyright for a game's release.


PAVIFILE pAviFile;
if(AVIFileOpen(&pAviFile, "filename.avi", OF_READ, NULL))
   // error

Getting the File’s Info


AVIFILEINFO info;
if(AVIFileInfo(pAviFile, &info, sizeof(info)))
   // error

The info structure contains details you might use later on (frame dimensions, overall rate, stream count, and so on), but nothing essential for our purpose, so I won't go into it here.
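
If you do want a quick sanity check, though, a few of the members are handy. This is an illustrative sketch of my own, not part of the sample code:


// Rough file-level numbers pulled from the structure we just filled in
DWORD dwStreams = info.dwStreams;              // how many streams to expect
DWORD dwWidth   = info.dwWidth;                // frame dimensions in pixels
DWORD dwHeight  = info.dwHeight;
DWORD dwFps     = info.dwRate / info.dwScale;  // overall frames per second
                                               // (integer division; use floats
                                               //  if you need more precision)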

Finding Audio and Video Streams

An AVI file may contain any number of streams of any type, though usually there are just audio and video streams. It's possible to open all of the streams and query their types later. A program usually ignores streams it doesn't recognize or need, which lets you extend the AVI file format while retaining compatibility with other programs.

I'll use preallocated arrays to hold only the audio and video streams in the file. Ordinarily I would recommend a linked list, but such implementation details are outside this document's scope, and most AVI files have only one audio and one video stream anyway.


#define MAX_AUDIO_STREAMS 8   // arbitrary limits chosen for this example
#define MAX_VIDEO_STREAMS 8

PAVISTREAM
   pAudio[MAX_AUDIO_STREAMS],
   pVideo[MAX_VIDEO_STREAMS];

int nNumAudioStreams = 0, nNumVideoStreams = 0;

The loops to open each stream are pretty straightforward. I explicitly specify what type of stream to load, either streamtypeAUDIO or streamtypeVIDEO, and ignore any other stream that might be in the file, like streamtypeTEXT or streamtypeMIDI. To load any available stream type, specify zero for the stream type.


do {
   if(AVIFileGetStream(pAviFile, &pAudio[nNumAudioStreams],
      streamtypeAUDIO, nNumAudioStreams))
      break;
} while(++nNumAudioStreams < MAX_AUDIO_STREAMS);

do {
   if(AVIFileGetStream(pAviFile, &pVideo[nNumVideoStreams],
      streamtypeVIDEO, nNumVideoStreams))
      break;
} while(++nNumVideoStreams < MAX_VIDEO_STREAMS);

Now we have neat arrays of audio and video streams, and we know how many are in each. Processing them consists of looping through these streams, so from here forward I simply refer to the current stream as pStream. Note that we haven't actually loaded anything yet; we've merely obtained handles to the data in the file. This lets us play potentially massive AVI files without a significant memory impact.
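
In other words, the per-stream processing described in the rest of this document happens inside loops shaped roughly like this sketch of mine:


for(int i = 0; i < nNumVideoStreams; i++)
{
   PAVISTREAM pStream = pVideo[i];   // "pStream" in the sections that follow
   // ... read the stream's info and format, then process its frames ...
}
// the audio array is walked the same way with pAudio and nNumAudioStreams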

Getting a Stream’s Info

A stream's information is obtained with a single function call.


AVISTREAMINFO info;
if(AVIStreamInfo(pStream, &info, sizeof(info)))
   // error

Like the file's info, this information isn't essential for processing an AVI file. There are some values you can calculate from the structure members, but I've found an easier way to determine them, described later.
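
For the curious, here's a quick sketch of my own showing the kind of values you can derive from the structure; the AVIStream timing functions used later make this unnecessary for playback.


// For a video stream, dwRate/dwScale is the frame rate and dwLength is the
// number of frames, so together they give the stream's duration.
double fFramesPerSec = (double)info.dwRate / (double)info.dwScale;
double fLengthInSec  = (double)info.dwLength * info.dwScale / info.dwRate;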

Determining a Stream’s Format

Since a stream can come in numerous formats, we need to know which one it uses. First, we must find out how long the format data is. The following code accomplishes that.


LONG lSize; // in bytes
if(AVIStreamReadFormat(pStream, AVIStreamStart(pStream), NULL, &lSize))
   // error

* Audio Stream Specifics

The format data for an audio stream is based on the WAVEFORMAT structure, but it may have a few extra members at the end and a different structure name. You can tell which structure it is by comparing the lSize value with sizeof(WAVEFORMATEX) or sizeof(PCMWAVEFORMAT). These structures, and most others, simply extend WAVEFORMAT with a few extra bytes.

PCMWAVEFORMAT includes the important wBitsPerSample member, and WAVEFORMATEX includes both wBitsPerSample and cbSize. The cbSize member tells how many extra bytes are stored after the WAVEFORMATEX structure. The extra bytes are for non-Pulse Code Modulation (PCM) formats, if you want to support those. Usually you’ll only find PCM formats, but the code you’re about to see supports all of them.

We’ll accomplish that by reading in a chunk and casting it to a WAVEFORMAT pointer.


LPBYTE pChunk = new BYTE[lSize];
if(!pChunk)
   // allocation error

if(AVIStreamReadFormat(pStream, AVIStreamStart(pStream), pChunk, &lSize))
   // error

LPWAVEFORMAT pWaveFormat = (LPWAVEFORMAT)pChunk;

Now that we know the audio format, we are better equipped to interpret the actual sound information in order to play it back. It’s not necessary to be familiar with the structure members just yet.
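
Just to give you an idea of what you'll eventually hand to your playback code, here's a sketch of my own that peeks at the common members after the cast:


// Valid once pWaveFormat points at the chunk; wBitsPerSample requires at least
// a PCMWAVEFORMAT-sized structure.
WORD  wChannels       = pWaveFormat->nChannels;       // 1 = mono, 2 = stereo
DWORD dwSamplesPerSec = pWaveFormat->nSamplesPerSec;  // e.g. 22050 or 44100
WORD  wBitsPerSample  = (lSize >= (LONG)sizeof(PCMWAVEFORMAT))
   ? ((PCMWAVEFORMAT*)pChunk)->wBitsPerSample
   : 0;                                               // unknown for a bare WAVEFORMAT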

* Video Stream Specifics

The BITMAPINFO structure defines the format of a video stream. It contains one of the BITMAPINFOHEADER, BITMAPV4HEADER, or BITMAPV5HEADER structures, followed by palette information if the image format is 8 bits per pixel. Software written for Windows 98 or Windows 2000 should write the BITMAPV5HEADER, but for reading an AVI file we need to determine which version was stored in the file so we stay backward compatible. The easiest way is to allocate and read it all in one chunk, as we did with the sound format.


LPBYTE pChunk = new BYTE[lSize];
if(!pChunk)
    // allocation error

if(AVIStreamReadFormat(pStream, AVIStreamStart(pStream), pChunk, &lSize))
    // error

That works because each header version begins with the same members as BITMAPINFOHEADER and only appends new ones, so the biSize member always tells you which version you have. If you determine that you need the extra members for a project you're working on, it's easy to cast the data chunk to the appropriate structure pointer:


// Only if you need to:
LPBITMAPINFO pInfo = (LPBITMAPINFO)pChunk;
DWORD biSize = pInfo->bmiHeader.biSize;
switch(biSize)
{
    case sizeof(BITMAPV5HEADER):
        // ...
        break;
    case sizeof(BITMAPV4HEADER):
        // ...
        break;
    case sizeof(BITMAPINFOHEADER):
        // ...
        break;
}

The BITMAPINFO structure tells us a lot about the image format: the type of compression used, the frame size, the bit depth, and so on. That's all the information needed to convert the image data to a GDI HBITMAP, an MFC CBitmap, an LPDIRECTDRAWSURFACE, or a custom format.
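
To make that concrete, here's a short sketch of my own that pulls out the members most renderers care about, using the pInfo cast shown above:


LONG  lWidth        = pInfo->bmiHeader.biWidth;
LONG  lHeight       = pInfo->bmiHeader.biHeight;       // negative means a top-down DIB
WORD  wBitCount     = pInfo->bmiHeader.biBitCount;     // 8, 16, 24, or 32
DWORD dwCompression = pInfo->bmiHeader.biCompression;  // BI_RGB means uncompressed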

Processing an Audio Stream

Audio streams are typically stored uncompressed, so that's the case I'll describe. We'll start by determining the size of the audio data contained in the stream.


LONG lSize;
if(AVIStreamRead(pStream, 0, AVISTREAMREAD_CONVENIENT, NULL, 0, &lSize, NULL))
    // error

Since we already know the stream’s format, we’ll load the sound data into a byte buffer.


LPBYTE pBuffer = new BYTE[lSize];
if(!pBuffer)
    // error

if(AVIStreamRead(pStream, 0, AVISTREAMREAD_CONVENIENT, pBuffer, lSize, NULL, NULL))
    // error

Now, with both the sound format and the sound data, it's a simple task to create a DirectSound buffer, or play the sound back through a Win32 multimedia function or a custom library. There's nothing special about the calls above, but you can read about their parameters in the online docs if you want.
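
As one example of the Win32 multimedia route, here's a minimal, blocking playback sketch of my own using the waveOut functions; it assumes the stream holds PCM data, reuses the pWaveFormat and pBuffer variables from above, and trims the error handling:


HWAVEOUT hWaveOut;
if(waveOutOpen(&hWaveOut, WAVE_MAPPER, (LPWAVEFORMATEX)pWaveFormat,
   0, 0, CALLBACK_NULL) != MMSYSERR_NOERROR)
{
    // error; for PCM data the cast is fine, since cbSize is ignored
}

WAVEHDR hdr;
ZeroMemory(&hdr, sizeof(hdr));
hdr.lpData         = (LPSTR)pBuffer;
hdr.dwBufferLength = (DWORD)lSize;

waveOutPrepareHeader(hWaveOut, &hdr, sizeof(hdr));
waveOutWrite(hWaveOut, &hdr, sizeof(hdr));

while(!(hdr.dwFlags & WHDR_DONE))   // crude wait until playback finishes
    Sleep(10);

waveOutUnprepareHeader(hWaveOut, &hdr, sizeof(hdr));
waveOutClose(hWaveOut);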

Processing a Video Stream

The great thing about AVIFile is that it handles decompression of video for us. We’ll initialize that feature in the following code.

Since we determined earlier what format the frames are stored in, you can modify it slightly, or build a new format from scratch, and have AVIFile convert the frames into whatever format best suits your rendering system.

The PGETFRAME pointer is used by the system to manage decompression of the video frames. Passing NULL for the format parameter tells AVIFile to pick a default DIB format rather than convert to anything in particular. You can instead pass a pointer to a BITMAPINFOHEADER describing the format you want, perhaps a modified copy of the one loaded before.
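
For instance, if your renderer prefers a particular bit depth, you could fill in a header like the hypothetical sketch below and pass its address instead of NULL in the call that follows. Note that this only works if an installed decompressor can actually produce the format you ask for.


// Ask for 24-bit RGB frames at the stream's own dimensions (lWidth and lHeight
// are assumed to come from the video format we read earlier).
BITMAPINFOHEADER bihWanted;
ZeroMemory(&bihWanted, sizeof(bihWanted));
bihWanted.biSize        = sizeof(bihWanted);
bihWanted.biWidth       = lWidth;
bihWanted.biHeight      = lHeight;
bihWanted.biPlanes      = 1;
bihWanted.biBitCount    = 24;
bihWanted.biCompression = BI_RGB;

// then: pgf = AVIStreamGetFrameOpen(pStream, &bihWanted);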


PGETFRAME pgf;
pgf = AVIStreamGetFrameOpen(pStream, NULL);
if(!pgf)
    // error

Now that the decompression system has been initialized, we can enter the loop that plucks and displays each frame of the video stream. AVIFile is organized so you only need one frame in memory at a time, allowing you to quickly play large files, but you can copy or buffer frames as needed.

Next we'll determine which frame to pluck from the stream. Usually that's done by incrementing the value lTime each millisecond via a multimedia timer, or by calculating it from the difference in time between frame renderings. The API functions are then used to calculate a frame number based on the amount of time elapsed since the beginning of play. This gives accurate playback at whatever speed the file specifies, regardless of how long it takes us to render a frame, and lets you synchronize the video with speech or game events. Alternatively, you can calculate the lFrame variable by any means you like, depending on the effect you're after.


LONG lTime = 0, lFrame = 0, lEndTime;

// Precalculated: when the stream is opened
lEndTime = AVIStreamEndTime(pStream);

// Calculated just before the next frame is blitted
if(lTime <= lEndTime)
    lFrame = AVIStreamTimeToSample(pStream, lTime);
else
    ;   // the video is done, so stop or loop playback here
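
One simple way to drive lTime, sketched below with names of my own, is to take the difference between the system clock now and its value when playback started; timeGetTime lives in winmm.lib, which we're already linking.


// Captured once, when playback of the stream begins
DWORD dwPlaybackStart = timeGetTime();

// Recomputed each trip through the render loop, just before the check above
lTime = (LONG)(timeGetTime() - dwPlaybackStart);   // elapsed milliseconds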

With that information, it’s easy to pluck a packed DIB from the video stream.


LPBITMAPINFOHEADER lpbi;
lpbi = (LPBITMAPINFOHEADER)AVIStreamGetFrame(pgf, lFrame);
if(!lpbi)
    // error

The packed DIB consists of a BITMAPINFOHEADER structure, followed by the palette information if needed, and then the bitmap data. All of this is one sequential block of memory, so the palette and bitmap pointers can be calculated with pointer arithmetic.


// For 16-, 24-, or 32-bit image formats
LPBYTE pData = (LPBYTE)lpbi + lpbi->biSize;

// For 8-bit image formats, skip the palette that follows the header
// (256 entries, or biClrUsed entries if that member is nonzero)
LPBYTE pData = (LPBYTE)lpbi + lpbi->biSize + 256 * sizeof(RGBQUAD);

The image data that has been extracted is only good until the next time we call AVIStreamGetFrame, so it’s important to display it to screen, copy it to a texture or bmp file, or whatever you want to do with it. I don’t go into such details here, but you’ll see how in the provided sample code.
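
As a quick illustration outside the DirectDraw sample, a packed DIB can be drawn straight to a window with plain GDI. This sketch of mine assumes hdc is a device context for your window and that pData was computed with the pointer math above.


int cx = lpbi->biWidth;
int cy = (lpbi->biHeight < 0) ? -lpbi->biHeight : lpbi->biHeight;

StretchDIBits(hdc,                 // hdc: a DC for your window (assumed)
    0, 0, cx, cy,                  // destination rectangle
    0, 0, cx, cy,                  // source rectangle
    pData,                         // the bits
    (LPBITMAPINFO)lpbi,            // header (plus palette for 8-bit frames)
    DIB_RGB_COLORS, SRCCOPY);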

When all of the frames have been processed, it’s important to close down the decompression system as follows.


if(AVIStreamGetFrameClose(pgf))
    // error

Cleaning Up

After you're done with the streams and the file, release them as follows. That's all there is to it! I hope this has helped you understand the file format. Feel free to email me if you have any questions.


// Remember to release every stream you opened
for(int i = 0; i < nNumAudioStreams; i++)
    AVIStreamRelease(pAudio[i]);
for(int i = 0; i < nNumVideoStreams; i++)
    AVIStreamRelease(pVideo[i]);

AVIFileRelease(pAviFile);
AVIFileExit();

Conclusion

Did you skip the document just to get to the samples?

How can you have any pudding if you haven’t finished your meat?

- Pink Floyd, “The Wall”

* Sample One

The first sample plays an AVI file made by Klowner. It uses my CDirectDraw wrapper class to do the rendering. The file is played forwards and then in reverse so the action is seamless. It's time-synchronized, so the visual frame rate matches the frame rate Klowner specified when he made the file. Each frame is also blitted with transparency, so the sequence can be superimposed over a backdrop during a game's action.

* Sample Two, featuring the PowerRender API

The second sample shows a rotating teapot superimposed over the AVI file's action. It may look like the teapot is being stretched, skewed, and zoomed in and out, but I used PR's camera features, namely field of view and aspect ratio, to achieve those effects. This sample requires a hardware accelerator compatible with Microsoft's Direct3D, but it should also work with OpenGL, GLIDE, or software rendering depending on what you recompile it for. If you want more information on the professional PowerRender API, here's their site: Egerter Software.

[Get the Samples Now!]

[Visit my Web Site!]

©1999 Jonathan Nix. All Rights Reserved.

All sample code is subject to the most current copyrights and/or disclaimers posted on my website.

PowerRender is a trademark of Egerter Software.

Direct3D and Microsoft are trademarks of Microsoft Corporation.
