An Introduction to DirectShow Parser Filters Page 1 of 10
http://www.gdcl.co.uk
An Introduction to DirectShow
Parser Filters
Geraint Davies
Introduction & Overview
DirectShow is a multimedia architecture on Windows® that is widely used for video
playback as well as other tasks. Instead of a monolithic core with optional plug-ins, it is
made up of separate, replaceable components that are connected together to make a filter
graph. These filter graphs are built to perform the playback, recording or media
processing task.
If there is a playback engine at the heart of DirectShow, it is the parser filter. This
component is responsible for selecting the range of media data to be played and setting
the timing of playback, as well as preparation of demultiplexed elementary stream data
for decoding and delivery. The requirements and behaviour of the parser are not widely
understood, although there are a number of cases where developers may need to develop
their own. This article, and the accompanying sample parser, sets out to show the how and
why of DirectShow parser development.
Shown below is a diagram of a very simple playback graph.
[Figure: Source → Parser, with the parser's outputs feeding Audio Decoder → Audio Renderer and Video Decoder → Video Renderer.]
The source filter reads data from the original media source. This might be simply a read
from a local file, but it could equally well fetch data from a URL or receive a UDP
multicast. Essentially, the source filter provides access to the data without understanding
it at all. The parser separates the elementary video and audio data, timestamps them to
keep them in sync with each other, and delivers them for decoding.
The parser’s tasks are:
• Identify the format of each elementary stream that it wishes to expose,
and create an appropriate media type for the corresponding output pin.
• Locate the start point. If the source provides random access to a file,
this might mean use of an index or making estimates of the file
location. If the source is a stream without random access, this may
mean simply discarding incoming data until an appropriate point is
located.
• Identify the elementary stream frames or packets and deliver them to
each pin.
• Timestamp each frame or packet if possible. For a seekable file, this
will be based on the known file timing. For a live stream, the basis of
timestamps will be determined on the fly when streaming begins.
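For a seekable, constant-bit-rate file, the position-to-time mapping behind the last two tasks can be sketched as follows. This is an illustration, not code from the sample parser; the only DirectShow fact it relies on is that reference time is a 64-bit count of 100-nanosecond units. A variable-bit-rate file needs an index or a scan instead.

```cpp
#include <cstdint>

// DirectShow reference time is an int64 counting 100-nanosecond units.
using RefTime = int64_t;
const RefTime UNITS = 10000000;  // 100ns units per second

// For a constant-bit-rate file, byte position maps linearly to
// presentation time: time = bytes * 8 / bitrate.
RefTime PosToTime(int64_t bytePos, int64_t bitsPerSec) {
    return bytePos * 8 * UNITS / bitsPerSec;
}

// The inverse mapping is what a seek uses: which byte offset holds the
// data for a requested start time?
int64_t TimeToPos(RefTime t, int64_t bitsPerSec) {
    return t * bitsPerSec / (8 * UNITS);
}
```

At 1.2 Mbit/s, for example, 150,000 bytes correspond to exactly one second of media.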
To accompany this article, there is a fully working MPEG-1 parser available for
download in source form from http://www.gdcl.co.uk. This is a seekable parser for local
file playback in pull mode, which illustrates a number of the points discussed.
Identification and Connection
Getting the right filters
DirectShow plays back multimedia using a graph of connected filters. But it has no hard-
wired knowledge of the graph layout required for any particular type of file. Instead, each
graph is custom-built using a set of simple rules. The basic principles are the following:
• The source file is checked using a pattern-matching table that matches
bit patterns at fixed locations in the file. This gives a source filter
CLSID, and a media type (major/minor GUIDs) for the file type.
• Each output pin is rendered by trying in turn the filters available in the
graph, and the filters registered for that media type.
• Output pin rendering is repeated recursively until the stream is
rendered.
So how do you get your filter into the right place in the graph, and what role does it have
in building the rest of the graph?
The first method uses the pattern-matching table in HKEY_CLASSES_ROOT\Media
Type. The keys and sub-keys listed here represent the known major types and subtypes
and each value entry contains a pattern to match against fixed byte positions in the file.
The graph manager’s AddSourceFilter method scans this table, and when a match is
found, the table gives the source filter’s CLSID together with the major type and subtype
for the file.
You can of course combine your source filter and parser, reading from the file as
necessary. In this case the source filter CLSID points to your filter; the major
type/subtype pair is not really used. However, most parsers will use the common source
filter, so that they can work with other interchangeable source filters, such as the
progressive download URL source. In this case, the source filter CLSID points to the File
Source (Async), and the major type/subtype pair is used as a media type for the output pin
of the source filter, and hence the input pin of your parser. This will typically be
MEDIATYPE_Stream, with a subtype describing the file format.
Your parser is then brought into the graph because it is registered as a handler for that
type pair. You will then create output pins representing the elementary streams that you
are going to output, and the rest of the graph is built stepwise from there.
The pattern match entries in the table specify bytes at a fixed position in the file, together
with an AND pattern to be applied before the test. Each entry is a set of four strings (file
position, length, mask and test); you can have multiple tests in a single entry, in which
case all of them must apply. You can have several different entries for a particular
subtype, and only one needs to apply. For an example, here is the entry for an MPEG-1
System file:
[HKEY_CLASSES_ROOT\Media Type\
{e436eb83-524f-11ce-9f53-0020af0ba770}\
{E436EB84-524F-11CE-9F53-0020AF0BA770}]
"0"="0, 16, FFFFFFFFF100010001800001FFFFFFFF,
000001BA2100010001800001000001BB"
"Source Filter"="{E436EBB5-524F-11CE-9F53-0020AF0BA770}"
This matches a pack header and system packet header at the very beginning of the file. Of
course not all file types have known patterns at fixed positions. MPEG-2 Transport Stream
is easily recognisable, since there are sync bytes with the value 0x47 every 188 bytes.
However, if the file is part of a stream, it may not begin at a 0x47 sync byte. In this case,
the pattern matching scheme cannot be used.
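Where it does apply, the mask-and-test check that the table encodes is straightforward: AND the file bytes with the mask, and compare against the test value. The sketch below is a stand-alone illustration of that check, not the graph manager's actual implementation; the sample bytes are a plausible MPEG-1 file start built to match the registry entry above.

```cpp
#include <cstddef>
#include <cstdint>

// One pattern group from a table entry: at 'offset', read 'len' bytes,
// AND each byte with the mask, and compare with the test value. All
// groups within one registry value must match for the entry to apply.
bool PatternMatches(const uint8_t* file, size_t fileSize,
                    size_t offset, size_t len,
                    const uint8_t* mask, const uint8_t* test) {
    if (offset + len > fileSize) return false;
    for (size_t i = 0; i < len; i++) {
        if ((file[offset + i] & mask[i]) != test[i]) return false;
    }
    return true;
}

// The MPEG-1 system entry shown above, as bytes.
const uint8_t kMask[16] = {0xFF,0xFF,0xFF,0xFF,0xF1,0x00,0x01,0x00,
                           0x01,0x80,0x00,0x01,0xFF,0xFF,0xFF,0xFF};
const uint8_t kTest[16] = {0x00,0x00,0x01,0xBA,0x21,0x00,0x01,0x00,
                           0x01,0x80,0x00,0x01,0x00,0x00,0x01,0xBB};

// A plausible start of an MPEG-1 system file: pack start code, SCR and
// mux-rate fields (whose variable bits the mask ignores, keeping only
// the fixed marker bits), then the system header start code.
const uint8_t kMpeg1Start[16] = {0x00,0x00,0x01,0xBA,0x21,0x44,0x45,0x00,
                                 0x03,0xC5,0x3A,0x81,0x00,0x00,0x01,0xBB};
```

Note how the mask lets the SCR and mux-rate bits vary from file to file while still pinning down the start codes and marker bits.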
When the graph manager fails to match any entry in the table, it loads the default source
filter File Source (Async) with a media type of MEDIATYPE_Stream and
MEDIASUBTYPE_NULL. Then any parsers registered for this wild card type will be
loaded, and each can try to recognise the input format.
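Recognising a format like Transport Stream then happens in code rather than in the table. A sketch of such a probe (illustrative, not taken from any shipped filter): look for an offset in the buffer at which the 0x47 sync byte recurs every 188 bytes. The three-packet threshold in the demo is an assumption; a real probe would also tolerate occasional errors.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

const size_t TS_PACKET = 188;  // transport packet length
const uint8_t TS_SYNC = 0x47;  // sync byte value

// Search the probe buffer for a starting offset at which every 188th
// byte is 0x47, with at least 'packetsNeeded' packets in a row.
// Returns the offset, or -1 if no such offset exists.
long FindTsSync(const uint8_t* buf, size_t len, size_t packetsNeeded) {
    for (size_t start = 0; start < TS_PACKET && start < len; start++) {
        size_t found = 0;
        for (size_t pos = start; pos < len && buf[pos] == TS_SYNC;
             pos += TS_PACKET) {
            found++;
        }
        if (found >= packetsNeeded) return (long)start;
    }
    return -1;
}

// Usage sketch: a capture that joined the stream mid-packet, so the
// first sync byte falls at offset 10.
std::vector<uint8_t> MakeDemoStream() {
    std::vector<uint8_t> v(10 + 3 * TS_PACKET, 0);
    for (int i = 0; i < 3; i++) v[10 + i * TS_PACKET] = TS_SYNC;
    return v;
}
```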
This is the mechanism chosen by the sample parser. You will see that the input pin is
registered with this wild card subtype:
{
&MEDIATYPE_Stream,
&MEDIASUBTYPE_NULL
},
The input pin’s CheckMediaType accepts any media type of MEDIATYPE_Stream. This
means that it can be used whether or not the pattern-match table has succeeded in
identifying the file format as MPEG-1.
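The effect of that check can be sketched with stand-in types. The real code compares DirectShow GUIDs in a CMediaType; the Guid struct and the marker values below are simplified stand-ins for illustration only.

```cpp
#include <cstdint>

// Stand-in for a DirectShow GUID - just enough to compare. The values
// below are arbitrary markers, not the real GUIDs.
struct Guid {
    uint32_t v;
    bool operator==(const Guid& o) const { return v == o.v; }
};

const Guid MediaTypeStream = {1};  // stands in for MEDIATYPE_Stream
const Guid SubtypeNull     = {0};  // stands in for MEDIASUBTYPE_NULL
const Guid MediaTypeVideo  = {2};  // some other major type

struct MediaType {
    Guid major;
    Guid subtype;
};

// The input pin accepts any media type whose major type is Stream,
// whatever the subtype: the parser will probe the data itself.
bool CheckInputType(const MediaType& mt) {
    return mt.major == MediaTypeStream;
}
```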
Push and Pull
In the early days of DirectShow development, there was a good deal of nearly religious
debate about the merits of push and pull as models for data delivery. In the push mode,
the supplier delivers data when it is ready, but it is limited by flow control mechanisms,
such as a limited buffer count. In the pull mode, the consumer requests data when it is
ready to process it, but it is limited by the availability of data at the supplier. In the end,
there is not much difference between them, and the push model was selected for all filters.
However, there is a case where the push model does not work effectively. The standard
source filter is just a wrapper around the ReadFile API (although the sector-aligned
unbuffered reads are efficient). But placing this functionality in a separate filter means
that data can be received from other sources, most notably the URL source used for
progressive download from web sites. Even here, a seekable push model would work for
MPEG files, which can be read and played sequentially as a stream. But AVI files (and
other table-based formats such as QuickTime and MPEG-4) require the reader to use an
index. This means that the parser needs to have random access to the file during reading
(even if, most of the time, the reads are very localised and sequential).
To allow the AVI parser to use a common file source, the IAsyncReader interface was
designed. This is intended to permit random access to the source data using efficient,
overlapped I/O. However, it has to be said that it introduces some complexity, and in
many cases the benefits of interoperability with other parsers or sources are outweighed
by the added complexity.
MPEG files can be parsed and read as a stream, and so the sample parser does not need
random access to the source data. It therefore uses a class at the input pin whose
worker thread simulates a push-mode interface – the worker thread simply requests
each block in turn and delivers it to the input pin’s Receive method. If you use this
scheme, your input pin needs only three interactions with the CPullPin object:
• Create an object derived from CPullPin when your input pin is connected.
• Activate the CPullPin object when the filter leaves stop mode. Note that, since
the pulling and delivery occurs on another thread, the first data could be
delivered before the Active() call has returned. You therefore need to position
the Active call carefully to make sure that the filter state has already been
changed. In the sample filter, the filter’s Pause method calls a method on the
input pin to set the start position and activate the pulling thread.
• De-activate the CPullPin class when the filter enters stop mode – in the example,
this is in the input pin’s Inactive method.
In addition, the Receive method needs to be overridden to forward the data to the parser’s
main processing function – and that’s essentially it.
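The pull-to-push simulation can be sketched without any DirectShow types. This is a much-simplified stand-in for CPullPin: a reader callback stands in for IAsyncReader, a delivery callback stands in for the input pin's Receive, and error handling, alignment and flushing are all omitted.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <functional>
#include <thread>
#include <vector>

// Stand-in for IAsyncReader::SyncReadAligned: read 'len' bytes at
// 'pos' into 'out', returning the number actually read (0 at end).
using Reader = std::function<size_t(int64_t pos, uint8_t* out, size_t len)>;
// Stand-in for the input pin's Receive method.
using Deliver = std::function<void(const uint8_t* data, size_t len)>;

// A worker thread pulls sequential blocks from the reader and pushes
// each one to the delivery callback.
class PullThread {
public:
    PullThread(Reader read, Deliver deliver, size_t blockSize)
        : read_(std::move(read)), deliver_(std::move(deliver)),
          blockSize_(blockSize) {}

    // Start pulling from 'startPos'. Note that data can be delivered
    // before this call returns, exactly as the text warns for Active().
    void Active(int64_t startPos) {
        worker_ = std::thread([this, startPos] {
            std::vector<uint8_t> buf(blockSize_);
            int64_t pos = startPos;
            for (;;) {
                size_t got = read_(pos, buf.data(), buf.size());
                if (got == 0) break;        // end of stream
                deliver_(buf.data(), got);  // "push" downstream
                pos += (int64_t)got;
            }
        });
    }

    // Stop the worker; the real CPullPin must also cancel any
    // outstanding asynchronous read.
    void Inactive() {
        if (worker_.joinable()) worker_.join();
    }

private:
    Reader read_;
    Deliver deliver_;
    size_t blockSize_;
    std::thread worker_;
};

// Usage sketch: pull a 10-byte in-memory "file" in 4-byte blocks and
// reassemble it on the receiving side.
std::vector<uint8_t> DemoPull() {
    std::vector<uint8_t> src{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, out;
    PullThread pin(
        [&](int64_t pos, uint8_t* o, size_t len) -> size_t {
            if ((size_t)pos >= src.size()) return 0;
            size_t n = std::min(len, src.size() - (size_t)pos);
            std::memcpy(o, src.data() + pos, n);
            return n;
        },
        [&](const uint8_t* d, size_t n) { out.insert(out.end(), d, d + n); },
        4);
    pin.Active(0);
    pin.Inactive();
    return out;
}
```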
I have made the sample parser more complicated because it supports variable bit rate
files, and this requires calls to the Seek function of CPullPin made on the worker thread
itself – but this is covered more clearly below. For this reason, you will see that the
sample parser uses a CPullPin2 class.
Input recognition
One of the jobs of the parser is to identify the elementary streams contained in the
multiplex and create correctly-typed output pins for each one. How this is done depends
on the source format, but typically this requires the parser to scan parts of the file. This
must be done when the input pin is connected, so that the output pins can be correctly
rendered for playback.
In the sample parser, the input pin’s CompleteConnect method passes control to the
parser filter to allow it to verify the input format and create the correct output pins. The
filter checks that this really is an MPEG-1 format that it can understand (see section
Getting the right filters above). Then it scans the file for elementary stream headers that
are used to create the fully-detailed media types required to connect to the decoders. At
the same time, the sample parser establishes the time base and duration of the file.
The IAsyncReader interface was designed to allow parser filters to read the data at
connection time for this very reason. The SyncRead method can be called even when the
filter is in Stop state – for the standard source filter, this is simply a call to ReadFile. You
can see that in the sample parser, the CompleteConnect method uses
IAsyncReader::SyncRead, via PESScanner::SyncFill, to read the beginning and the end of the file.
Not all source/parsers have this option. A parser for live data would not be able to have
random access to the data, and would not access the data until the source is active. For
this situation, two solutions have become common:
• For parsers that work with push-mode (IMemInputPin) source filters, it
is common to implement IStream::Read on the source output pin so that
the parser can analyse the data during connection. Alternatively,
combine the source and parser, so that the parser can read the source
data via a private interface.
• Create the output pins using default media types. When the first data is
received, detect the correct media type detail (and time base) and attach
the media type as a dynamic type change to the first sample. This will
only work if the number of elementary streams extracted is fixed in
advance, and if the changes to the type detail do not require a different
decoder.
Operation
Thread organisation
A DirectShow filter needs to be aware of a number of different threads that may execute
code inside the filter at any time. These threads fall into two groups: state change threads
such as the application’s main thread and graph manager background threads, and worker
threads that deliver data.
The normal rule in DirectShow is that each graph segment should have a separate
delivery thread. You do not need a separate thread for each filter, but wherever a stream
begins, or is split into multiple outputs, you will normally have a thread for each stream.
So in the typical playback graph, there is a thread which delivers data to the parser, and
then a thread in each parser output pin which delivers the data downstream. Both
decoding and rendering take place on this parser output thread. So for our pull-mode
sample parser, there will be three delivery threads, shown here in red, blue and green.
[Figure: the playback graph again – Source → Parser (red thread), Parser → Audio Decoder → Audio Renderer (blue thread), Parser → Video Decoder → Video Renderer (green thread).]
This shows three delivery threads:
• The red thread in the input pin calls the source output pin’s
SyncReadAligned method, and returns the data to the filter’s Receive
method. This thread is created in the CPullPin2 class, and executes the
CPullPin2::Process method.
• The blue thread in the audio output pin delivers audio data downstream,
taking packets off the queue and calling the Receive method on the
audio decoder’s input pin. This method will typically decompress the
data and deliver the decompressed data to the renderer, where – still on
the same thread inside the Receive call – the decompressed data will be
delivered to the device driver for rendering.
• The green thread delivers compressed video data to the video decoder,
where it will be decompressed and delivered to the video renderer. The
video renderer will typically block this thread until it is time to
complete the drawing of the frame – if the graph is paused, this might
block indefinitely.
If the red source thread were used to deliver data downstream after parsing, it would not
be possible to perform audio decoding and video decoding in parallel. More importantly,
when the first video frame was delivered to the video renderer, this thread would block
until the graph left Paused state, and no more audio would be decompressed. Of course,
the graph might be waiting for decompressed audio before it can leave Paused state, so
the graph could easily deadlock.
For this reason, parsers typically use a worker thread on each output pin. This is contained
in the COutputQueue class in the base class library: sample buffers are placed on a queue
at the output pin, and the COutputQueue worker thread collects them and delivers them to
the connected input pin’s Receive method.
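The queue-and-worker arrangement can be sketched like this. It is a much-simplified stand-in for COutputQueue: batching, flushing, end-of-stream and sample reference counting are all omitted.

```cpp
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

using Sample = std::vector<uint8_t>;

// The parser's source thread pushes samples without blocking, and a
// per-pin worker thread pops them and calls the downstream Receive.
class OutputQueue {
public:
    explicit OutputQueue(std::function<void(const Sample&)> receive)
        : receive_(std::move(receive)), done_(false),
          worker_([this] { Run(); }) {}

    // Drains anything still queued, then stops the worker.
    ~OutputQueue() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Called on the source thread: only queues, never blocks on the
    // downstream filter.
    void Push(Sample s) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push_back(std::move(s));
        }
        cv_.notify_one();
    }

private:
    void Run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(m_);
            cv_.wait(lock, [this] { return done_ || !q_.empty(); });
            if (q_.empty()) return;  // done and fully drained
            Sample s = std::move(q_.front());
            q_.pop_front();
            lock.unlock();
            receive_(s);  // may block; only this worker thread waits
        }
    }

    std::function<void(const Sample&)> receive_;
    bool done_;
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<Sample> q_;
    std::thread worker_;
};

// Usage sketch: three samples pushed on one thread arrive, in order,
// on the worker thread.
std::vector<int> DemoQueue() {
    std::vector<int> seen;
    {
        OutputQueue q([&](const Sample& s) { seen.push_back((int)s[0]); });
        q.Push({1});
        q.Push({2});
        q.Push({3});
    }  // destructor drains the queue and joins the worker
    return seen;
}
```

Because a single worker drains a FIFO queue, delivery order matches push order, which is what downstream decoders require.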
Some Receive implementations will immediately queue the data without blocking, and
then process it on some other thread. In this case, it is inefficient to have a worker thread
in the COutputQueue class, as it just introduces an unnecessary thread switch. This is why
the IMemInputPin interface contains a method ReceiveCanBlock. The COutputQueue
class will only create a worker thread if ReceiveCanBlock reports that the downstream
pin’s Receive can block. If it cannot, the sample buffers are not queued, but are delivered
immediately downstream. So if you have a filter that queues
data immediately and does not block (within itself or further downstream), you can
override the default implementation of this and avoid unneeded worker threads.
There are a few other points worth mentioning about the COutputQueue class:
• To minimise the cost of thread switching, the queue does not activate
the worker thread until the batch limit is reached. The sample parser
avoids problems with this by forcing a thread activation at the end of
every input buffer: this is the reason for the SendAnyway() calls in the
filter’s Receive method.
• When a sample buffer is passed to the queue, a reference count is
passed with it. That is, the COutputQueue class does not AddRef the
sample, but it will Release it.
State changes between Stopped, Paused and Running states occur on the application’s
main thread, or sometimes on graph manager background threads. The parser does not
need to handle Running state – paused and running are the same except to renderers – so
there are three relevant methods in the filter:
Pause – The filter’s Pause method is called when the filter is going
active. Transitions to Run will always call the Pause method first. The
point to note here is that the three worker threads will be started
during this call, and may start work before this method returns, so the
order of operations is important. Remember that during your Pause
method, downstream filters will already be paused, but those upstream
will not.

Stop – The Inactive methods on all three pins will be called during the
Stop processing. Since all the graph delivery threads are created by
the parser, these Inactive methods will need to stop the worker
threads. Since the graph manager has already stopped the downstream
filters, there is no danger of the threads being blocked downstream.

Seek – This method is the most complicated, since it involves
suspending the input pin thread while the rest of the graph is still
running. This is discussed in Seeking.
Buffering
DirectShow is designed to minimise the number of unnecessary buffer copies introduced
during processing. This was a serious problem with Video for Windows under some
circumstances, and the DirectShow design tries to avoid forcing buffer copies between
filters. Each pin-to-pin connection negotiates its own allocator for buffer space. To allow
both sides to use memory efficiently, the negotiation includes the number of buffers
available, the size of each buffer and details such as prefix space and buffer alignment.
Both pins can propose an allocator, but the output pin has the final decision.
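The negotiated quantities can be illustrated with a stand-in for DirectShow's ALLOCATOR_PROPERTIES. The field names follow the real structure, but the merge policy shown is an assumption for illustration, not the base-class behaviour.

```cpp
// Stand-in for DirectShow's ALLOCATOR_PROPERTIES; the field names
// follow the real structure.
struct AllocProps {
    long cBuffers;  // number of buffers
    long cbBuffer;  // size of each buffer in bytes
    long cbAlign;   // required buffer alignment
    long cbPrefix;  // bytes reserved in front of the payload
};

inline long MaxL(long a, long b) { return a > b ? a : b; }

// One plausible output-pin policy: satisfy both sides by taking the
// larger of each quantity, then round the buffer size up to the agreed
// alignment. The output pin has the final say on the result.
AllocProps DecideBufferSize(const AllocProps& ours, const AllocProps& request) {
    AllocProps out;
    out.cBuffers = MaxL(ours.cBuffers, request.cBuffers);
    out.cbBuffer = MaxL(ours.cbBuffer, request.cbBuffer);
    out.cbAlign  = MaxL(ours.cbAlign,  request.cbAlign);
    out.cbPrefix = MaxL(ours.cbPrefix, request.cbPrefix);
    // round the buffer size up to a multiple of the alignment
    out.cbBuffer = (out.cbBuffer + out.cbAlign - 1) / out.cbAlign * out.cbAlign;
    return out;
}
```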
The default behaviour is an allocator which allocates a fixed number of buffers, all of the
same size. This is the simplest to use since it does not involve any custom allocator
creation. It works fine for data in a stream format (such as uncompressed audio) or with a
fixed size of sample (such as uncompressed video, where each buffer contains one frame).
However for the compressed elementary stream data in a parser, it is not ideal.
The sample parser uses a default allocator like this. There are two drawbacks: the
compressed data must be copied from the input buffer into the output buffer, and the
output buffers are all of a fixed size. The copy of data is unlikely to be a problem in our
sample parser, since the bandwidth of compressed MPEG-1 is so small that the copy will
use minimal resources. However, the fixed buffer size may be more of a problem. For
compatibility with the decoders, we are expected to place one PES packet in each buffer.
A PES packet can be up to 64KB in length (or more in MPEG-2 Transport Stream).
However, many are only a few hundred bytes. Using a default allocator means we have to
make all our buffers 64KB in length, and waste a large proportion of the memory we allocate.
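A rough calculation shows the scale of that waste. The buffer count and average packet size below are assumed, illustrative figures, not values taken from the sample parser.

```cpp
// Illustrative figures only: the buffer count and average packet size
// are assumptions, not values from the sample parser.
const long kBuffers   = 20;        // buffers in the allocator
const long kBufSize   = 64 * 1024; // one maximum-size PES packet each
const long kAvgPacket = 512;       // a typical small PES packet

long TotalAllocated() { return kBuffers * kBufSize; }    // bytes reserved
long TypicallyUsed()  { return kBuffers * kAvgPacket; }  // bytes carrying data
```

With these figures, well under one percent of the allocated space would typically carry data.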