Video processing loads have driven processor architects to consider a wide range of architectural approaches to deal with the sheer volume of pixel data. Because video is the preferred medium for communication and entertainment, its resolutions and frame rates show no signs of leveling off. Processor clock speeds, while having risen dutifully in step with Moore's Law, still can't keep up with the ever-increasing demands of the video marketplace. Similarly, compression algorithms have continued to grow in implementation complexity, despite an order-of-magnitude jump in network capacity every couple of years and the rapid drop in storage costs.

This paradox has given rise to a variety of architectural approaches for delivering the throughput that video processing applications demand. Among them, multiprocessor DSPs, VLIW DSPs, vector/array DSPs, and dual-core DSPs all present valid programmable approaches to harnessing the parallelism in video tasks. Of these, we examine which has the most merit for delivering the maximum usable overall throughput.