
A Quick Introduction to Wavelets





TechOnline


The wavelet transform is a method of signal analysis and synthesis. It analyzes and represents signals in terms of wavelets: functions that are localized in both time and frequency. Wavelet technology is often described as a recent breakthrough, yet by the standards of fast-paced technological history it has been around for "eons". Its algorithms build on the work of Joseph Fourier, who discovered in the early 1800s that sines and cosines can be superimposed to represent other functions. In wavelet analysis, the scale at which one looks at the data plays an important role, essentially defining the subbands. If you look at a signal through a large window, you notice the gross features. If you use a small window (similar to zooming in), you see the details and discontinuities. Hence, in video and image processing, a series of high-pass and low-pass filters is typically applied to the input signal. Subband coders transform two-dimensional spatial video data into spatial-frequency-filtered subbands. Adaptive quantization and entropy encoding then provide the compression (Figure 1). Performing a wavelet transform on a stream of video data requires considerable computational horsepower.
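To make the filtering idea concrete, here is a minimal sketch of one level of analysis and synthesis on a one-dimensional signal. It assumes the simplest possible filter pair (pairwise average as the low-pass filter, pairwise difference as the high-pass filter, i.e. an unnormalized Haar wavelet); this is an illustrative stand-in, not the filter bank of any particular product. The function names are hypothetical.

```python
def analyze(signal):
    """Split an even-length signal into a low-pass and a high-pass subband.

    Assumption: unnormalized Haar filters (average and difference of
    neighbouring samples), each followed by downsampling by two.
    """
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high


def synthesize(low, high):
    """Rebuild the original signal from the two subbands."""
    out = []
    for average, difference in zip(low, high):
        out.append(average + difference)  # first sample of the pair
        out.append(average - difference)  # second sample of the pair
    return out


signal = [4.0, 4.0, 4.0, 4.0, 10.0, 10.0, 2.0, 8.0]
low, high = analyze(signal)
rebuilt = synthesize(low, high)
```

The low band carries the gross features at half the resolution, while the high band is zero wherever the signal is flat and nonzero only at the discontinuity between the last two samples. With no quantization applied, synthesis recovers the input exactly.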

Figure 1:  Block Diagram of wavelet-based compression technology

Wavelet theory was first introduced as a mathematical tool in the mid-1980s by Morlet and Grossmann in their geophysics work. It was quickly adopted in theoretical physics and applied mathematics, as well as in music, MRI, speech discrimination, optics, geophysics, and civil engineering. Since the late 1980s and early 1990s, wavelet theory has been applied to image processing, particularly compression.

Table 1:  Comparison of various videoconferencing formats

Wavelet technology enables digital video to be compressed by removing redundancy and keeping only the data that the human eye can perceive. Like the human ear, the human eye is less sensitive to high frequencies. Frequency sensitivity also varies by color, so different compression schemes can be used on the different color components of a video signal. For instance, the ADV601LC, which Intelect Visual Communications (IVC) uses as the engine of its videoconferencing system, filters the video signal into 42 separate frequency bands: 28 for color information (14 for Cb and 14 for Cr) and 14 for luminance (Figure 2). Each band is then optimized to include only those frequencies that the human eye can see. Reassembling all the transformed blocks would result in a complete reconstruction of the original image. Much of the information below discusses the IVC implementation of the wavelet algorithm, but the principles can be generalized to other applications as well.

Figure 2:  Transformation of the Y component of video color signal into 14 new images
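As a rough illustration of how a single image component turns into a set of smaller subband images, the sketch below applies a two-dimensional split (rows first, then columns) and repeats it on the low-pass result. It assumes the same average/difference (Haar) filters purely for simplicity and uses a standard dyadic layout, which yields 3n + 1 bands for n levels; it does not attempt to reproduce the 14-band layout described above. All function names are hypothetical.

```python
def split_1d(values):
    """Pairwise average (low-pass) and difference (high-pass), downsampled by two."""
    low = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return low, high


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def split_rows(image):
    """Filter every row, returning the low-pass and high-pass half-width images."""
    lows, highs = [], []
    for row in image:
        low, high = split_1d(row)
        lows.append(low)
        highs.append(high)
    return lows, highs


def split_2d(image):
    """One decomposition level: horizontal filtering, then vertical filtering."""
    row_low, row_high = split_rows(image)
    # Vertical filtering is done by transposing, filtering rows, transposing back.
    ll, lh = (transpose(band) for band in split_rows(transpose(row_low)))
    hl, hh = (transpose(band) for band in split_rows(transpose(row_high)))
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}


def decompose(image, levels):
    """Return a list of (name, subband) pairs for a dyadic decomposition."""
    bands = []
    current = image
    for level in range(1, levels + 1):
        parts = split_2d(current)
        for name in ("LH", "HL", "HH"):
            bands.append((f"{name}{level}", parts[name]))
        current = parts["LL"]  # only the low-pass image is split again
    bands.append((f"LL{levels}", current))
    return bands


# 8x8 test image with a single vertical edge (assumed sample data).
image = [[8.0 if column >= 3 else 0.0 for column in range(8)] for _ in range(8)]
bands = decompose(image, levels=2)
```

For this test image, only the bands that respond to horizontal change contain nonzero values; the others are all zero, which is the kind of redundancy a later quantization and entropy-coding stage can exploit. The total number of coefficients across all bands equals the number of input pixels.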

Wavelet functions offer three key advantages:

  1. They correspond more accurately to the broadband nature of images than do the sinusoidal waves of Fourier transforms.

  2. They can be implemented with simple filters that result in low-cost silicon chips.

  3. They provide full-image filtering, which eliminates block-shaped artifacts in the compressed image.

Under certain conditions, JPEG, MPEG, and the H.261 video compression standard display block-shaped artifacts because, being based on the discrete cosine transform, they operate on 8x8 blocks of pixels. With wavelets, picture quality degrades gracefully as the compression ratio increases, and the end user has full control over the compression ratio.

Once the image has been transformed, the data can be used to:

  1. Implement what appears to be a nearly lossless compression

  2. Achieve lossy compression at either constant quality or constant bit rate

  3. Create high-quality scaled images without computational overhead

  4. Create an error-resilient compressed bit stream, because each block contains information about the whole image.

As the image is transformed, a set of statistics is extracted for all 42 blocks. The statistics include the sum of the squares (the energy) as well as the minimum and maximum pixel value for each block. The adaptive quantizer on the chip receives this information and uses it in conjunction with a "human visual model" that relates the importance of each block to what the human eye would see. The quantizer then takes all this information, considers the user-programmed bit rate, and calculates 42 "bin widths" or "binary widths" for each field (Figure 3).
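The per-block statistics named above are simple to compute. The following sketch assumes a block is represented as a list of rows of numeric coefficients; the function name and the sample values are hypothetical and only illustrate the three quantities (energy, minimum, maximum).

```python
def block_statistics(block):
    """Return the energy (sum of squares), minimum, and maximum of one block.

    Assumption: `block` is a non-empty list of rows of numbers.
    """
    values = [value for row in block for value in row]
    return {
        "energy": sum(value * value for value in values),
        "min": min(values),
        "max": max(values),
    }


# Assumed sample block: mostly zero, with one column of detail coefficients.
sample_block = [
    [0, -4, 0, 0],
    [0, -4, 0, 0],
]
stats = block_statistics(sample_block)
```

A block with low energy and a narrow value range contributes little to the picture, which is the kind of information a quantizer can weigh when deciding how many bits each block deserves.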

When high-quality video is required while maintaining an accurate bit rate, the low-frequency bands are given the maximum bin width to ensure perfect reconstruction. The high-frequency bands are given as large a bin width as possible, based on the complexity of the image and the required bit rate. Some of the high-frequency information must be given up to maintain that bit rate. But since the human eye does not perceive high spatial frequencies (darker areas of an image) as well as it perceives low spatial frequencies, blocks with darker areas may be compressed more. When extremely high compression is required (over 100:1), more than 99 percent of the bits in each field must be eliminated. Only the smallest block gets a large bin width. The remaining bits are dispersed across the remaining blocks as determined by the bin-width allocator. Compression schemes based solely on information within each field usually fail at such high compression ratios. The wavelet transform's ability to maintain adequate information about the entire image is a very important factor in providing high-quality video under circumstances that require extremely high compression.
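The relationship between compression ratio and discarded bits is plain arithmetic, sketched below. The field dimensions and bit depth are assumptions chosen only to give round, plausible numbers (720 x 243 samples per field at 16 bits per pixel, as for 8-bit 4:2:2 video); they are not taken from the text above.

```python
def fraction_removed(compression_ratio):
    """Fraction of the original bits that must be discarded at a given ratio."""
    return 1.0 - 1.0 / compression_ratio


def field_budget_bits(raw_bits_per_field, compression_ratio):
    """Bits left to spend on one field after compression."""
    return raw_bits_per_field / compression_ratio


# Assumed field format (hypothetical values for illustration only).
ASSUMED_WIDTH = 720
ASSUMED_LINES_PER_FIELD = 243
ASSUMED_BITS_PER_PIXEL = 16  # 8-bit luma plus 8 bits of shared chroma (4:2:2)

raw_bits = ASSUMED_WIDTH * ASSUMED_LINES_PER_FIELD * ASSUMED_BITS_PER_PIXEL
budget = field_budget_bits(raw_bits, 100)
removed = fraction_removed(100)
```

Under these assumptions an uncompressed field holds about 2.8 million bits, and a 100:1 target leaves roughly 28,000 bits to distribute across all blocks of that field.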

Intelect Visual Communications uses the wavelet compression algorithm in the company's LANscape 2.0 system, which provides real-time videoconferencing, video-on-demand, video servers, video mail, and video distribution/multimedia on wired or wireless IP-based networks. Users get full-motion transmission quality while operating on any network that transports IP. The company's original product used a Motion-JPEG CODEC.

IVC chose wavelet technology because of its superior picture quality and data handling capabilities. Wavelets have some commonality with the M-JPEG compression used by IVC's earlier products:

  1. Each full frame of video is compressed individually, so each frame can be located for editing and for more complete decompression.

  2. Luminance and chrominance sampling is 4:2:2.


About H.261 and H.263 CODECs

These international standards use only 4:1:1 luminance and chrominance sampling. Hence, the image contains less data than is available in either M-JPEG or wavelet compression. Also, M-JPEG and the H.261/H.263 standards divide the image into 8x8 blocks, then form macroblocks, groups of blocks, and a picture. At higher levels of compression, or during intense motion segments, this block structure produces an image in which square blocks of pixels can be perceived as artifacts by the human eye. In addition, the H.261 and H.263 compression standards do not compress each full frame of video, but instead create intra frames, predictive frames, and bidirectional frames (I, P, and B frames). Sampling is performed on blocks of data within the image. Because of this, if a packet is dropped during transport and data is lost, or if there is rapid movement in the video, the quality degrades rapidly: the eye perceives the blocks as pixelation, and motion appears blurred.
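The difference in data volume between the two sampling schemes can be worked out directly. The sketch below takes the J:a:b notation at face value (for every four luminance samples on a line, 4:2:2 keeps two Cb and two Cr samples, while 4:1:1 keeps one of each) and assumes a 720-sample line purely for illustration. The names are hypothetical.

```python
# (Cb, Cr) samples kept for every four luminance samples on a line.
CHROMA_PER_FOUR_LUMA = {
    "4:2:2": (2, 2),
    "4:1:1": (1, 1),
}


def samples_per_line(width, scheme):
    """Total luma plus chroma samples on one line of `width` luma samples.

    Assumption: `width` is a multiple of four.
    """
    cb, cr = CHROMA_PER_FOUR_LUMA[scheme]
    groups = width // 4
    return groups * (4 + cb + cr)


ASSUMED_WIDTH = 720  # assumed line length, for illustration only
samples_422 = samples_per_line(ASSUMED_WIDTH, "4:2:2")
samples_411 = samples_per_line(ASSUMED_WIDTH, "4:1:1")
relative_data = samples_411 / samples_422
```

Under these assumptions a 4:1:1 line carries three quarters of the samples of a 4:2:2 line, with the entire reduction coming out of the color information.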

The H.261 and H.263 protocols were designed to provide very high compression and, hence, low bandwidth consumption over ISDN lines for videoconferencing across wide area networks. Bandwidth is reduced to fit the communications pipeline at the expense of picture quality. As higher-bandwidth lines become more widely available, or as videoconferencing moves to the LAN, there is less need for high levels of compression, and the quality disadvantages begin to clearly outweigh the bandwidth advantages. While the H.261/H.263 standards offer low bandwidth usage, the trade-off is visible pixelation, loss of detail, and noticeable blurring when the video contains intense motion. M-JPEG provides a clearer, less pixelated image than either H.261 or H.263 because there is more data to work with initially (luminance/chrominance/chrominance at 4:2:2) and because compression is done on each full video frame. Wavelet compression improves picture quality even further: it not only provides 4:2:2 sampling and compression of each full frame but also produces artifacts that are less noticeable to the human eye, while maintaining or improving bandwidth usage.

Thanks to Analog Devices, Norwood, MA, and Intelect Visual Communications, NY, NY, for information that contributed to this wavelet introduction.



 





