This paper is a revised version of an article by the same
title and author which appeared in the April 1991 issue
of Communications of the ACM.
Abstract
For the past few years, a joint ISO/CCITT committee
known as JPEG (Joint Photographic Experts Group)
has been working to establish the first international
compression standard for continuous-tone still images,
both grayscale and color. JPEG’s proposed standard
aims to be generic, to support a wide variety of
applications for continuous-tone images. To meet the
differing needs of many applications, the JPEG
standard includes two basic compression methods, each
with various modes of operation. A DCT-based method
is specified for “lossy” compression, and a predictive
method for “lossless” compression. JPEG features a
simple lossy technique known as the Baseline method,
a subset of the other DCT-based modes of operation.
The Baseline method has been by far the most widely
implemented JPEG method to date, and is sufficient in
its own right for a large number of applications. This
article provides an overview of the JPEG standard, and
focuses in detail on the Baseline method.
1 Introduction
Advances over the past decade in many aspects of
digital technology - especially devices for image
acquisition, data storage, and bitmapped printing and
display - have brought about many applications of
digital imaging. However, these applications tend to be
specialized due to their relatively high cost. With the
possible exception of facsimile, digital images are not
commonplace in general-purpose computing systems
the way text and geometric graphics are. The majority
of modern business and consumer usage of photographs
and other types of images takes place through more
traditional analog means.
The key obstacle for many applications is the vast
amount of data required to represent a digital image
directly. A digitized version of a single color picture
at TV resolution contains on the order of one million
bytes; 35mm resolution requires ten times that amount.
Use of digital images often is not viable due to high
storage or transmission costs, even when image capture
and display devices are quite affordable.
Modern image compression technology offers a
possible solution. State-of-the-art techniques can
compress typical images from 1/10 to 1/50 their
uncompressed size without visibly affecting image
quality. But compression technology alone is not
sufficient. For digital image applications involving
storage or transmission to become widespread in
today’s marketplace, a standard image compression
method is needed to enable interoperability of
equipment from different manufacturers. The CCITT
recommendation for today’s ubiquitous Group 3 fax
machines [17] is a dramatic example of how a standard
compression method can enable an important image
application. The Group 3 method, however, deals with
bilevel images only and does not address photographic
image compression.
For the past few years, a standardization effort known
by the acronym JPEG, for Joint Photographic Experts
Group, has been working toward establishing the first
international digital image compression standard for
continuous-tone (multilevel) still images, both
grayscale and color. The “joint” in JPEG refers to a
collaboration between CCITT and ISO. JPEG
convenes officially as the ISO committee designated
JTC1/SC2/WG10, but operates in close informal
collaboration with CCITT SGVIII. JPEG will be both
an ISO Standard and a CCITT Recommendation. The
text of both will be identical.
Photovideotex, desktop publishing, graphic arts, color
facsimile, newspaper wirephoto transmission, medical
imaging, and many other continuous-tone image
applications require a compression standard in order to
develop significantly beyond their present state. JPEG
has undertaken the ambitious task of developing a
general-purpose compression standard to meet the
needs of almost all continuous-tone still-image
applications.
The JPEG Still Picture Compression Standard
Gregory K. Wallace
Multimedia Engineering
Digital Equipment Corporation
Maynard, Massachusetts
Submitted in December 1991 for publication in IEEE Transactions on Consumer Electronics
If this goal proves attainable, not only will individual
applications flourish, but exchange of images across
application boundaries will be facilitated. This latter
feature will become increasingly important as more
image applications are implemented on general-purpose
computing systems, which are themselves becoming
increasingly interoperable and internetworked. For
applications which require specialized VLSI to meet
their compression and decompression speed
requirements, a common method will provide
economies of scale not possible within a single
application.
This article gives an overview of JPEG’s proposed
image-compression standard. Readers without prior
knowledge of JPEG or compression based on the
Discrete Cosine Transform (DCT) are encouraged to
study first the detailed description of the Baseline
sequential codec, which is the basis for all of the
DCT-based decoders. While this article provides many
details, many more are necessarily omitted. The reader
should refer to the ISO draft standard [2] before
attempting implementation.
Some of the earliest industry attention to the JPEG
proposal has been focused on the Baseline sequential
codec as a motion image compression method - of the
“intraframe” class, where each frame is encoded as a
separate image. This class of motion image coding,
while providing less compression than “interframe”
methods like MPEG, has greater flexibility for video
editing. While this paper focuses only on JPEG as a
still picture standard (as ISO intended), it is interesting
to note that JPEG is likely to become a “de facto”
intraframe motion standard as well.
2 Background: Requirements and Selection Process
JPEG’s goal has been to develop a method for
continuous-tone image compression which meets the
following requirements:
1) be at or near the state of the art with regard to
compression rate and accompanying image
fidelity, over a wide range of image quality ratings,
and especially in the range where visual fidelity to
the original is characterized as “very good” to
“excellent”; also, the encoder should be
parameterizable, so that the application (or user)
can set the desired compression/quality tradeoff;
2) be applicable to practically any kind of
continuous-tone digital source image (i.e. for most
practical purposes not be restricted to images of
certain dimensions, color spaces, pixel aspect
ratios, etc.) and not be limited to classes of imagery
with restrictions on scene content, such as
complexity, range of colors, or statistical
properties;
3) have tractable computational complexity, to make
feasible software implementations with viable
performance on a range of CPUs, as well as
hardware implementations with viable cost for
applications requiring high performance;
4) have the following modes of operation:
• Sequential encoding: each image component is
encoded in a single left-to-right, top-to-bottom
scan;
• Progressive encoding: the image is encoded in
multiple scans for applications in which
transmission time is long, and the viewer
prefers to watch the image build up in multiple
coarse-to-clear passes;
• Lossless encoding: the image is encoded to
guarantee exact recovery of every source
image sample value (even though the result is
low compression compared to the lossy
modes);
• Hierarchical encoding: the image is encoded at
multiple resolutions so that lower-resolution
versions may be accessed without first having
to decompress the image at its full resolution.
In June 1987, JPEG conducted a selection process
based on a blind assessment of subjective picture
quality, and narrowed 12 proposed methods to three.
Three informal working groups formed to refine them,
and in January 1988, a second, more rigorous selection
process [19] revealed that the “ADCT” proposal [11],
based on the 8x8 DCT, had produced the best picture
quality.
At the time of its selection, the DCT-based method was
only partially defined for some of the modes of
operation. From 1988 through 1990, JPEG undertook
the sizable task of defining, documenting, simulating,
testing, validating, and simply agreeing on the plethora
of details necessary for genuine interoperability and
universality. Further history of the JPEG effort is
contained in [6, 7, 9, 18].
3 Architecture of the Proposed Standard
The proposed standard contains the four “modes of
operation” identified previously. For each mode, one
or more distinct codecs are specified. Codecs within a
mode differ according to the precision of source image
samples they can handle or the entropy coding method
they use. Although the word codec (encoder/decoder)
is used frequently in this article, there is no requirement
that implementations must include both an encoder and
a decoder. Many applications will have systems or
devices which require only one or the other.
The four modes of operation and their various codecs
have resulted from JPEG’s goal of being generic and
from the diversity of image formats across applications.
The multiple pieces can give the impression of
undesirable complexity, but they should actually be
regarded as a comprehensive “toolkit” which can span a
wide range of continuous-tone image applications. It is
unlikely that many implementations will utilize every
tool -- indeed, most of the early implementations now
on the market (even before final ISO approval) have
implemented only the Baseline sequential codec.
The Baseline sequential codec is inherently a rich and
sophisticated compression method which will be
sufficient for many applications. Getting this minimum
JPEG capability implemented properly and
interoperably will provide the industry with an
important initial capability for exchange of images
across vendors and applications.
4 Processing Steps for DCT-Based Coding
Figures 1 and 2 show the key processing steps which
are the heart of the DCT-based modes of operation.
These figures illustrate the special case of
single-component (grayscale) image compression. The
reader can grasp the essentials of DCT-based
compression by thinking of it as essentially
compression of a stream of 8x8 blocks of grayscale
image samples. Color image compression can then be
approximately regarded as compression of multiple
grayscale images, which are either compressed entirely
one at a time, or are compressed by alternately
interleaving 8x8 sample blocks from each in turn.
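As a rough illustration of this stream-of-blocks view, the sketch below splits a grayscale image into 8x8 blocks in left-to-right, top-to-bottom order and interleaves blocks from several components. Two details are assumptions, not taken from the text: partial blocks at the right and bottom edges are filled by replicating the last column and row, and no component is subsampled, so blocks alternate one for one. The function names are illustrative only.

```python
from typing import Iterator, List

Block = List[List[int]]  # 8 rows of 8 samples


def split_into_blocks(image: List[List[int]]) -> List[Block]:
    """Split a grayscale image (a list of rows) into 8x8 blocks,
    left to right, top to bottom."""
    height, width = len(image), len(image[0])
    # Round each dimension up to a multiple of 8.
    padded_h = (height + 7) // 8 * 8
    padded_w = (width + 7) // 8 * 8

    def sample(row: int, col: int) -> int:
        # Assumed edge handling: clamping into the image replicates the
        # last row and column across any partial block.
        return image[min(row, height - 1)][min(col, width - 1)]

    blocks = []
    for top in range(0, padded_h, 8):
        for left in range(0, padded_w, 8):
            blocks.append([[sample(top + r, left + c) for c in range(8)]
                           for r in range(8)])
    return blocks


def interleave(components: List[List[Block]]) -> Iterator[Block]:
    """Yield one block from each component in turn
    (assumes equal block counts, i.e. no subsampling)."""
    for group in zip(*components):
        yield from group


# Usage: a 10-row x 12-column image needs a 2 x 2 grid of blocks.
image = [[(r * 12 + c) % 256 for c in range(12)] for r in range(10)]
blocks = split_into_blocks(image)
```

Replicating edges rather than padding with zeros avoids introducing an artificial sharp edge inside the last block, which would otherwise add high-frequency energy.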
For DCT sequential-mode codecs, which include the
Baseline sequential codec, the simplified diagrams
indicate how single-component compression works in a
fairly complete way. Each 8x8 block is input, makes
its way through each processing step, and yields output
in compressed form into the data stream. For DCT
progressive-mode codecs, an image buffer exists prior
to the entropy coding step, so that an image can be
stored and then parceled out in multiple scans with
successively improving quality. For the hierarchical mode
of operation, the steps shown are used as building
blocks within a larger framework.
4.1 8x8 FDCT and IDCT
At the input to the encoder, source image samples are
grouped into 8x8 blocks, shifted from unsigned integers
with range $[0, 2^P - 1]$ to signed integers with range
$[-2^{P-1}, 2^{P-1} - 1]$, where P is the sample precision
in bits, and input to the Forward DCT (FDCT).
At the output from the decoder, the Inverse DCT
(IDCT) outputs 8x8 sample blocks to form the
reconstructed image. The following equations are the
idealized mathematical definitions of the 8x8 FDCT
and 8x8 IDCT:
$$F(u,v) = \frac{1}{4} C(u)C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16} \tag{1}$$
$$f(x,y) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)C(v)F(u,v) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16} \tag{2}$$
where $C(u), C(v) = 1/\sqrt{2}$ for $u, v = 0$; $C(u), C(v) = 1$ otherwise.
The DCT is related to the Discrete Fourier Transform
(DFT). Some simple intuition for DCT-based
compression can be obtained by viewing the FDCT as a
harmonic analyzer and the IDCT as a harmonic
synthesizer. Each 8x8 block of source image samples
is effectively a 64-point discrete signal which is a
function of the two spatial dimensions x and y. The
FDCT takes such a signal as its input and decomposes
it into 64 orthogonal basis signals. Each contains one
of the 64 unique two-dimensional (2D) “spatial
frequencies” which comprise the input signal’s
“spectrum.” The output of the FDCT is the set of 64
basis-signal amplitudes or “DCT coefficients” whose
values are uniquely determined by the particular
64-point input signal.
The DCT coefficient values can thus be regarded as the
relative amount of the 2D spatial frequencies contained
in the 64-point input signal. The coefficient with zero
frequency in both dimensions is called the “DC
coefficient” and the remaining 63 coefficients are
called the “AC coefficients.” Because sample values
typically vary slowly from point to point across an
image, the FDCT processing step lays the foundation
for achieving data compression by concentrating most
of the signal in the lower spatial frequencies. For a
typical 8x8 sample block from a typical source image,
most of the spatial frequencies have zero or near-zero
amplitude and need not be encoded.
At the decoder the IDCT reverses this processing step.
It takes the 64 DCT coefficients (which at that point
have been quantized) and reconstructs a 64-point output
image signal by summing the basis signals.
Mathematically, the DCT is a one-to-one mapping for
64-point vectors between the image and the frequency
domains. If the FDCT and IDCT could be computed
with perfect accuracy and if the DCT coefficients were
not quantized as in the following description, the
original 64-point signal could be exactly recovered. In
principle, the DCT introduces no loss to the source
image samples; it merely transforms them to a domain
in which they can be more efficiently encoded.
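The level shift and equations (1) and (2) can be sketched as a literal, deliberately slow evaluation of the sums (not one of the fast algorithms mentioned next). Blocks are assumed to be nested lists indexed `f[x][y]` and `F[u][v]` to mirror the equations, and the helper names `level_shift`, `fdct` and `idct` are illustrative. The usage lines apply the transform to a smooth ramp block, where only a handful of low-frequency coefficients are significant, and check that the IDCT recovers the shifted samples to within floating-point error.

```python
import math
from typing import List

Block = List[List[float]]


def _c(k: int) -> float:
    # Normalization factor C(k) from equations (1) and (2).
    return 1 / math.sqrt(2) if k == 0 else 1.0


def _basis(a: int, k: int) -> float:
    # Cosine term cos((2a + 1) k pi / 16) shared by both equations.
    return math.cos((2 * a + 1) * k * math.pi / 16)


def level_shift(block: List[List[int]], precision: int = 8) -> Block:
    # Map unsigned [0, 2^P - 1] onto signed [-2^(P-1), 2^(P-1) - 1].
    offset = 1 << (precision - 1)
    return [[s - offset for s in row] for row in block]


def fdct(f: Block) -> Block:
    # Equation (1): f[x][y] -> F[u][v], evaluated term by term.
    return [[0.25 * _c(u) * _c(v) * sum(f[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(8) for y in range(8))
             for v in range(8)]
            for u in range(8)]


def idct(F: Block) -> Block:
    # Equation (2): F[u][v] -> f[x][y], summing the weighted basis signals.
    return [[0.25 * sum(_c(u) * _c(v) * F[u][v] * _basis(x, u) * _basis(y, v)
                        for u in range(8) for v in range(8))
             for y in range(8)]
            for x in range(8)]


# A smooth 8-bit ramp block, indexed ramp[x][y].
ramp = [[16 * x + 2 * y for y in range(8)] for x in range(8)]
shifted = level_shift(ramp)
coeffs = fdct(shifted)
significant = [(u, v) for u in range(8) for v in range(8) if abs(coeffs[u][v]) > 1.0]
restored = idct(coeffs)
max_error = max(abs(restored[x][y] - shifted[x][y]) for x in range(8) for y in range(8))
print(round(coeffs[0][0], 6), len(significant), max_error < 1e-9)  # -520.0 8 True
```

Only 8 of the 64 coefficients exceed magnitude 1 for this ramp, and every coefficient with both u and v nonzero is essentially zero, which is the energy concentration described above. The reconstruction error comes only from floating-point rounding, not from the transform itself.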
Some properties of practical FDCT and IDCT
implementations raise the issue of what precisely
should be required by the JPEG standard. A
fundamental property is that the FDCT and IDCT
equations contain transcendental functions.
Consequently, no physical implementation can
compute them with perfect accuracy. Because of the
DCT’s application importance and its relationship to
the DFT, many different algorithms by which the
FDCT and IDCT may be approximately computed have
been devised [16]. Indeed, research in fast DCT
algorithms is ongoing and no single algorithm is
optimal for all implementations. What is optimal in
software for a general-purpose CPU is unlikely to be
optimal in firmware for a programmable DSP and is
certain to be suboptimal for dedicated VLSI.
Even in light of the finite precision of the DCT inputs
and outputs, independently designed implementations
of the very same FDCT or IDCT algorithm which differ
even minutely in the precision by which they represent
cosine terms or intermediate results, or in the way they
sum and round fractional values, will eventually
produce slightly different outputs from identical inputs.
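To make this concrete, the sketch below evaluates equation (1) twice on the same block: once with double-precision cosines and once with cosines rounded to 8 fractional bits. The rounded version is an assumed stand-in for a fixed-point implementation, and `make_fdct` and `cos_bits` are hypothetical names. Both are reasonable implementations of the same equation, yet their outputs differ slightly.

```python
import math
from typing import Callable, List

Block = List[List[float]]


def make_fdct(cos_bits: int = 0) -> Callable[[Block], Block]:
    """Build a direct FDCT (equation (1)) whose cosine table is rounded to
    cos_bits fractional bits; cos_bits=0 keeps full double precision."""
    def quantize_cos(value: float) -> float:
        if cos_bits == 0:
            return value
        scale = 1 << cos_bits
        return round(value * scale) / scale

    # table[a][k] holds cos((2a + 1) k pi / 16), possibly rounded.
    table = [[quantize_cos(math.cos((2 * a + 1) * k * math.pi / 16))
              for k in range(8)] for a in range(8)]
    c = [1 / math.sqrt(2)] + [1.0] * 7  # C(0) = 1/sqrt(2), otherwise 1

    def fdct(f: Block) -> Block:
        return [[0.25 * c[u] * c[v] * sum(f[x][y] * table[x][u] * table[y][v]
                                          for x in range(8) for y in range(8))
                 for v in range(8)] for u in range(8)]

    return fdct


# A level-shifted test block with plenty of high-frequency content.
block = [[(37 * x + 11 * y) % 256 - 128 for y in range(8)] for x in range(8)]
exact = make_fdct()(block)
coarse = make_fdct(cos_bits=8)(block)
max_diff = max(abs(exact[u][v] - coarse[u][v]) for u in range(8) for v in range(8))
print(f"largest coefficient difference: {max_diff:.4f}")
```

The DC coefficient matches exactly, because its cosine terms are all 1 and survive rounding, while the AC coefficients drift. This is why compliance has to be judged by an accuracy bound rather than bit-for-bit agreement.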
To preserve freedom for innovation and customization
within implementations, JPEG has chosen to specify
neither a unique FDCT algorithm nor a unique IDCT
algorithm in its proposed standard. This makes
compliance somewhat more difficult to confirm,
because two compliant encoders (or decoders)
generally will not produce identical outputs given
identical inputs. The JPEG standard will address this
issue by specifying an accuracy test as part of its
compliance tests for all DCT-based encoders and
decoders; this is to ensure against crudely inaccurate
cosine basis functions which would degrade image
quality.
Figure 1. DCT-Based Encoder Processing Steps (source image data in 8x8 blocks → FDCT → Quantizer → Entropy Encoder → compressed image data; the Quantizer and Entropy Encoder each take table specifications).
Figure 2. DCT-Based Decoder Processing Steps (compressed image data → Entropy Decoder → Dequantizer → IDCT → reconstructed image data; the Entropy Decoder and Dequantizer each take table specifications).
For each DCT-based mode of operation, the JPEG
proposal specifies separate codecs for images with 8-bit
and 12-bit (per component) source image samples. The
12-bit codecs, needed to accommodate certain types of
medical and other images, require greater
computational resources to achieve the required FDCT
or IDCT accuracy. Images with other sample
precisions can usually be accommodated by either an
8-bit or 12-bit codec, but this must be done outside the
JPEG standard. For example, it would be the
responsibility of an application to decide how to fit or
pad a 6-bit sample into the 8-bit encoder’s input
interface, how to unpack it at the decoder’s output, and
how to encode any necessary related information.
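One way an application might handle the 6-bit case, sketched under stated assumptions: scale each sample into the 8-bit range before encoding and invert the scaling after decoding. The function names are hypothetical, and this mapping is an application choice, not something the standard defines. Because decoding is lossy, the inverse rounds and clamps rather than simply shifting back.

```python
def pack_6_to_8(sample6: int) -> int:
    """Hypothetical pre-encoding step: scale a 6-bit sample (0..63) into the
    8-bit encoder's input range by shifting it left two bits (0..252)."""
    if not 0 <= sample6 <= 63:
        raise ValueError("expected a 6-bit sample in 0..63")
    return sample6 << 2


def unpack_8_to_6(sample8: int) -> int:
    """Hypothetical post-decoding step: a lossy decoder may return any value
    in 0..255, so round to the nearest multiple of 4 before shifting back,
    and clamp so that 254 and 255 still map to 63."""
    return min((sample8 + 2) >> 2, 63)


# Unperturbed values round-trip; perturbed ones round to the nearest level.
print(unpack_8_to_6(pack_6_to_8(25)), unpack_8_to_6(101), unpack_8_to_6(255))  # 25 25 63
```

Scaling (rather than placing the 6 bits in the low end of the byte) uses the encoder's full dynamic range, so quantization error is spread over a quarter as many 6-bit levels. The application would still need to record, outside JPEG, that the data is 6-bit.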
4.2 Quantization
After output from the FDCT, each of the 64 DCT
coefficients is uniformly quantized in conjunction with
a 64-element Quantization Table, which must be
specified by the application (or user) as an input to the
encoder. Each element can be any integer value from 1
to 255, which specifies the step size of the quantizer for
its corresponding DCT coefficient. The purpose of
quantization is to achieve further compression by
representing DCT coefficients with no greater precision
than is necessary to achieve the desired image quality.
Stated another way, the goal of this processing step is
to discard information which is not visually significant.
Quantization is a many-to-one mapping, and therefore
is fundamentally lossy. It is the principal source of
lossiness in DCT-based encoders.
Quantization is defined as division of each DCT
coefficient by its corresponding quantizer step size,
followed by rounding to the nearest integer:
$$F^{Q}(u,v) = \text{Integer Round}\left(\frac{F(u,v)}{Q(u,v)}\right) \tag{3}$$
This output value is normalized by the quantizer step
size. Dequantization is the inverse function, which in
this case means simply that the normalization is
removed by multiplying by the step size, which returns
the result to a representation appropriate for input to the
IDCT:
$$F^{Q\prime}(u,v) = F^{Q}(u,v) \times Q(u,v) \tag{4}$$
When the aim is to compress the image as much as
possible without visible artifacts, each step size ideally
should be chosen as the perceptual threshold or “just
noticeable difference” for the visual contribution of its
corresponding cosine basis function. These thresholds
are also functions of the source image characteristics,
display characteristics and viewing distance. For
applications in which these variables can be reasonably
well defined, psychovisual experiments can be
performed to determine the best thresholds. The
experiment described in [12] has led to a set of
Quantization Tables for CCIR-601 [4] images and
displays. These have been used experimentally by
JPEG members and will appear in the ISO standard as a
matter of information, but not as a requirement.
4.3 DC Coding and Zig-Zag Sequence
After quantization, the DC coefficient is treated
separately from the 63 AC coefficients. The DC
coefficient is a measure of the average value of the 64
image samples. Because there is usually strong
correlation between the DC coefficients of adjacent 8x8
blocks, the quantized DC coefficient is encoded as the
difference from the DC term of the previous block in
the encoding order (defined in the following), as shown
in Figure 3. This special treatment is worthwhile, as
DC coefficients frequently contain a significant fraction
of the total image energy.
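Quantization, dequantization and the DC differencing just described can be sketched as follows. The example table is the luminance table from the standard's informative annex (the kind of CCIR-601 table mentioned above), used here only as sample input. Arrays are assumed to be laid out with rows as vertical frequency v and columns as horizontal frequency u. The tie-breaking rule in `integer_round` (halves away from zero) is an assumption, and the DC predictor starts at zero, as it does at the start of a JPEG scan.

```python
import math
from typing import List

Matrix = List[List[float]]

# Example luminance table from the standard's informative annex; rows are
# vertical frequency v, columns horizontal frequency u. Sample input only.
LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def integer_round(x: float) -> int:
    # Nearest integer, halves rounded away from zero (assumed tie rule).
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize(F: Matrix, Q: List[List[int]]) -> List[List[int]]:
    # Equation (3): divide each coefficient by its step size, then round.
    return [[integer_round(F[v][u] / Q[v][u]) for u in range(8)] for v in range(8)]


def dequantize(FQ: List[List[int]], Q: List[List[int]]) -> List[List[int]]:
    # Equation (4): multiply by the step size to get the IDCT input.
    return [[FQ[v][u] * Q[v][u] for u in range(8)] for v in range(8)]


def dc_differences(dcs: List[int]) -> List[int]:
    # Code each block's quantized DC as the difference from the previous
    # block's; the first block is predicted from 0.
    diffs, previous = [], 0
    for dc in dcs:
        diffs.append(dc - previous)
        previous = dc
    return diffs


F = [[0.0] * 8 for _ in range(8)]
F[0][0], F[0][1], F[1][0], F[7][7] = -520.0, -36.4, -291.5, 40.0
FQ = quantize(F, LUMA_Q)           # -520/16 = -32.5 rounds to -33; 40/99 rounds to 0
restored = dequantize(FQ, LUMA_Q)  # restored[0][0] is -528, an error of 8
print(dc_differences([-33, -30, -30, -35]))  # [-33, 3, 0, -5]
```

The large step sizes toward the bottom right of the table zero out high-frequency coefficients of moderate size, and the DC differences between similar neighboring blocks are much smaller than the DC values themselves.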