The wavelet transform is a method of signal analysis and
synthesis. The technology analyzes and represents signals in terms
of wavelets, functions that are localized in both the time and
frequency domains. Wavelet technology is relatively new, yet
measured against the fast pace of technological history it has
been around for "eons". The algorithms for wavelets build on the
work of Joseph Fourier, who in the early 1800s discovered
the utility of superimposing sines and cosines to represent other
functions. In wavelet analysis, the scale one uses to look at the
data plays an important role, essentially defining subbands. If you
look at a signal with a large window, you notice the gross
features. If you use a small window (similar to zooming in), you see
the details and discontinuities. Hence, in video/image processing,
a series of high-pass and low-pass filters is typically applied to
an input signal. Sub-band coders transform two-dimensional spatial
video data into spatial frequency filtered sub-bands. Then adaptive
quantization and entropy encoding processes provide compression
(Figure 1). Performing wavelet transforms on a stream of video data
requires considerable computational power.
Figure 1: Block Diagram of wavelet-based
compression technology
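The filtering step described above can be illustrated with a minimal
sketch. It assumes the simplest possible filter pair, the Haar
averaging and differencing filters, which are used here only for
brevity and are not claimed to be the filters in any product
discussed in this article. The function names (haar_1d, haar_2d) are
hypothetical. One pass splits an image into a low-pass sub-band (LL)
that holds the gross features and three high-pass sub-bands that
hold the details.

```python
def haar_1d(values):
    """Split an even-length sequence into a low-pass half (pairwise
    averages) and a high-pass half (pairwise half-differences)."""
    low = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return low, high


def transpose(rows):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(col) for col in zip(*rows)]


def split_columns(half):
    """Apply haar_1d down every column and return (low, high) images."""
    lows, highs = [], []
    for col in transpose(half):
        low, high = haar_1d(col)
        lows.append(low)
        highs.append(high)
    return transpose(lows), transpose(highs)


def haar_2d(image):
    """One level of a separable 2D split: filter rows, then columns.

    Band names give the horizontal filter first, then the vertical
    one (an assumed convention; other texts order them differently).
    """
    row_low, row_high = [], []
    for row in image:
        low, high = haar_1d(row)
        row_low.append(low)
        row_high.append(high)
    ll, lh = split_columns(row_low)
    hl, hh = split_columns(row_high)
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}


# Illustrative 4x4 image: a dark first column next to a bright area.
image = [[10, 50, 50, 50] for _ in range(4)]
bands = haar_2d(image)
for name, band in bands.items():
    print(name, band)
```

In this sample the only sharp transition is a vertical edge, so it
appears only in the HL band. The other detail bands are all zero,
which is what makes them cheap to encode.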
Wavelet theory was first introduced as a mathematical tool in
the mid-1980s by Morlet and Grossmann in their geophysics work.
It was quickly adopted in theoretical physics and applied
mathematics, as well as in music, MRI, speech discrimination,
optics, geophysics, and civil engineering. Since the late 1980s
and early 1990s, wavelet theory has been applied to image processing,
particularly compression.
Table 1: Comparison of various videoconferencing
formats
Wavelet technology enables digital video to be compressed by
removing redundancy and using only the data which can be perceived
by the human eye. Like the human ear, the human eye is less
sensitive to high frequencies. Frequency sensitivity also varies
by color, so different compression schemes can be used on the
different color components of a video signal. For instance,
the ADV601LC, which is used by Intelect Visual Communications (IVC)
as the engine for their videoconferencing system, filters the video
signal into 42 separate frequency bands, 28 for color information
(14 for Cb and 14 for Cr) and 14 for luminance (Figure 2).
Each band is then optimized to include only those frequencies which
can be seen by the human eye. Reassembling all the transformed
blocks would result in a complete reconstruction of the original
image. Much of the information below discusses the IVC
implementation of the wavelet algorithm, but the principles can be
generalized to other applications as well.
Figure 2: Transformation of the Y component of
video color signal into 14 new images
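Treating luminance and the two color-difference signals separately
presupposes that the picture is available as Y, Cb and Cr planes.
The sketch below shows one common way to obtain them from RGB. It
assumes the full-range BT.601 coefficients used by JFIF; video
hardware typically works with a studio-range variant, and the source
does not say which form the ADV601LC expects. The function names are
hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cb, Cr).

    Assumption: full-range BT.601 coefficients as used by JFIF.
    Results are left unrounded and unclamped.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def split_components(pixels):
    """Turn rows of RGB tuples into three separate planes."""
    y_plane, cb_plane, cr_plane = [], [], []
    for row in pixels:
        converted = [rgb_to_ycbcr(*pixel) for pixel in row]
        y_plane.append([c[0] for c in converted])
        cb_plane.append([c[1] for c in converted])
        cr_plane.append([c[2] for c in converted])
    return y_plane, cb_plane, cr_plane


# Illustrative 2x2 picture: gray, white, black, pure red.
pixels = [
    [(128, 128, 128), (255, 255, 255)],
    [(0, 0, 0), (255, 0, 0)],
]
y_plane, cb_plane, cr_plane = split_components(pixels)
print(y_plane)
print(cb_plane)
print(cr_plane)
```

Each of the three planes can then be filtered and quantized on its
own, which is what allows the color planes to be compressed
differently from luminance.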
Wavelet functions offer three key advantages:
- They correspond more accurately to the broadband nature of
images than do the sinusoidal waves of Fourier transforms
- They can be implemented with simple filters that result in
low-cost silicon chips
- They also provide full-image filtering to eliminate
block-shaped artifacts in the compressed image.
Under certain conditions, JPEG, MPEG, and the H.261 video
compression standard display block-shaped artifacts because, being
based on the discrete cosine transform, they start with 8x8 blocks
of pixels. With wavelets, picture quality degrades gracefully as the
compression rate increases, and the end user has full
control over the compression rate.
Once the image has been transformed, the data can be used
to:
- Implement compression that appears to be nearly lossless
- Achieve lossy compression at either constant quality or
constant bit rate
- Create high-quality scaled images without computational
overhead
- Create an error-resilient compressed bit stream, because each
block contains information about the whole image.
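The point about scaled images can be sketched as follows, assuming
Haar averaging filters (an illustrative choice, not necessarily what
any real codec uses) and hypothetical function names. With these
filters the low-pass band at each level is simply a half-size copy
of the image, so a decoder that stops early already holds a
reduced-size picture.

```python
def low_pass_half(image):
    """Return the half-resolution image made of 2x2 block averages.

    With averaging Haar filters this equals the low-pass (LL) band of
    a one-level split. Width and height must be even.
    """
    height, width = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


def scaled_versions(image, levels):
    """Collect the original image plus `levels` successively halved copies."""
    versions = [image]
    for _ in range(levels):
        versions.append(low_pass_half(versions[-1]))
    return versions


# Illustrative 8x8 ramp image with values 0..63.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
pyramid = scaled_versions(image, 3)
sizes = [(len(v), len(v[0])) for v in pyramid]
print(sizes)
```

Each entry in the pyramid is a by-product of the transform itself,
so no separate resampling pass is needed to produce it.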
As the image is transformed, a set of statistics is extracted
for all 42 blocks. The statistics include the sum of the squares (or
energy), as well as the minimum and maximum pixel value for each
block. The adaptive quantizer on the chip receives this information
and uses it in conjunction with a "human visual model", relating
the importance of each block to what the human eye would see. The
quantizer then takes all this information, considers the
user-programmed bit rate, and calculates 42 "bin widths" or "binary
widths" for each field (Figure 3).
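A minimal sketch of the per-block statistics described above. The
block contents and the names (block_statistics, low_band, high_band)
are made up for illustration; only the three statistics themselves
come from the text.

```python
def block_statistics(block):
    """Return the energy (sum of squares), minimum and maximum of a block."""
    values = [v for row in block for v in row]
    return {
        "energy": sum(v * v for v in values),
        "min": min(values),
        "max": max(values),
    }


# Two hypothetical transformed blocks: one low-frequency, one high-frequency.
blocks = {
    "low_band": [[30, 50], [30, 50]],
    "high_band": [[-20, 0], [-20, 0]],
}
stats = {name: block_statistics(block) for name, block in blocks.items()}
for name, values in stats.items():
    print(name, values)
```

A quantizer could compare such figures across blocks to decide where
bits matter most; how the chip actually weighs them against its
human visual model is not specified in the text.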
In the case where high quality video is required while
maintaining an accurate bit rate, low frequency bands are given the
maximum bin width to ensure perfect reconstruction. The high
frequency bands are provided with as large a bin width as possible,
based on the complexity of the image and the needed bit rate. Some
of the high frequency information must be given up to maintain the
needed bit rate. But since the human eye does not perceive high
spatial frequencies (darker areas of an image) as well as it
perceives low spatial frequencies, blocks with darker areas may be
compressed more. In the case where extremely high compression is
required (over 100:1), 99 percent of the bits in each
field must be eliminated. Only the smallest block gets a large bin
width. The remaining bits are dispersed across the remaining blocks
as determined by the bin-width allocator. Compression schemes based
solely on information within each field usually fail at such high
compression ratios. The wavelet transform's ability to maintain adequate
information about the entire image is a very important factor in
providing high quality video under circumstances that require
extremely high compression.
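The 100:1 figure can be turned into a rough bit budget. The field
size (720 x 243 pixels) and the 16 bits per pixel for 8-bit 4:2:2
video are assumptions chosen for illustration; the source does not
state them. The function names are hypothetical.

```python
def fraction_removed(ratio):
    """Fraction of the original bits discarded at a given compression ratio."""
    return 1 - 1 / ratio


def field_bit_budget(width, height, bits_per_pixel, ratio):
    """Bits left for one field after compressing by `ratio`."""
    return width * height * bits_per_pixel / ratio


# Assumed field geometry and sample depth (not taken from the source).
WIDTH, HEIGHT, BITS_PER_PIXEL = 720, 243, 16

raw_bits = WIDTH * HEIGHT * BITS_PER_PIXEL
removed = fraction_removed(100)
budget = field_bit_budget(WIDTH, HEIGHT, BITS_PER_PIXEL, 100)

print(f"raw bits per field: {raw_bits}")
print(f"fraction removed at 100:1: {removed:.2%}")
print(f"bits remaining per field: {budget:.1f}")
```

Under these assumptions roughly 28,000 bits remain to describe all
42 blocks of a field, which shows why the allocation of bits among
blocks matters so much at this ratio.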
Intelect Visual Communications uses the wavelet compression
algorithm in the company's LANscape 2.0 system, which supports
real-time videoconferencing, video-on-demand, video servers, video
mail, and video distribution/multimedia over wired or wireless
IP-based networks. Users get full motion transmission quality while
operating on any network that transports IP. The company's original
product used a Motion-JPEG CODEC.
IVC chose wavelet technology because of its superior picture
quality and data handling capabilities. Wavelets have some
commonality with the M-JPEG compression used by IVC's earlier
products:
- Each full frame of video is the subject of compression, so each
frame can be located for editing and for more complete
decompression
- Luminance and chrominance sampling is 4:2:2.
About H.261 and H.263 CODECs
These international standards use only 4:2:0 luminance and
chrominance sampling (sometimes loosely written as 4:1:1). Hence, the
image contains less data than is available in either M-JPEG or
wavelet compression. Also, M-JPEG and the H.261 /
H.263 standards divide the image into 8x8 blocks, then form
macroblocks, groups of blocks, and a picture. At higher levels of
compression, or during intense motion segments, this block
structure produces an image in which square blocks of pixels can be
perceived as artifacts by the human eye. In addition, the H.261 and
H.263 compression standards do not compress each full frame of video,
but instead create intra frames, predictive frames, and bidirectional
frames (I, P, and B frames). Sampling is performed on blocks of data
within the image. Because of this, if a packet is dropped during
transport and data is lost, or if there is rapid movement in the
video, the quality degrades rapidly: the eye perceives the
blocks as pixelation and motion appears blurred.
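The claim that reduced chrominance sampling leaves less data to work
with can be checked with simple arithmetic. The sketch below uses
the usual J:a:b reading of the notation (J luminance samples per
row, a chrominance samples in the first row, b in the second, over a
block J pixels wide and 2 rows high). The function name is
hypothetical. It compares sample counts at equal resolution only and
says nothing about the picture sizes a given standard supports.

```python
def samples_per_pixel(j, a, b):
    """Average number of samples (Y + Cb + Cr) stored per pixel.

    Counts a reference block J pixels wide and 2 rows high: 2*J
    luminance samples plus (a + b) samples for each of Cb and Cr.
    """
    luma = 2 * j
    chroma = 2 * (a + b)
    return (luma + chroma) / (2 * j)


schemes = {
    "4:2:2": (4, 2, 2),
    "4:1:1": (4, 1, 1),
    "4:2:0": (4, 2, 0),
}
density = {name: samples_per_pixel(*jab) for name, jab in schemes.items()}
for name, value in density.items():
    print(name, value)
```

4:1:1 and 4:2:0 carry the same number of samples and differ only in
where the chrominance samples sit; both carry three quarters of the
samples of 4:2:2.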
The H.261 and H.263 protocols were designed to provide very high
compression and, hence, low bandwidth consumption over ISDN lines
for videoconferencing over Wide Area Networks. Bandwidth is reduced
to fit the communications pipeline at the expense of picture
quality. As higher bandwidth lines become more widely available, or
as videoconferencing moves to the LAN, there is less need for high
levels of compression and the quality disadvantages begin to
clearly outweigh the bandwidth advantages. While the H.261/H.263
standards offer low bandwidth usage, the trade-off is that the
images contain visible pixelation and loss of detail, and noticeable
blurring occurs when the video contains intense motion
segments. M-JPEG provides a clearer, less pixelated image than either
H.261 or H.263 because initially there is more data to work with
(Luminance/Chrominance/Chrominance at 4:2:2) and because the
compression is done on each full video frame. Wavelet compression
technology improves the picture quality even further: it not only
provides 4:2:2 sampling and compression on each full frame, but also
uses a type of compression that results in artifacts that are less
noticeable to the human eye while maintaining or improving
bandwidth usage.
Thanks to Analog Devices, Norwood, MA, and
Intelect Visual Communications, NY, NY, for information which
contributed to this Wavelet Introduction.