Two key enablers of tera-scale processor performance under a constant power envelope will be the addition of special-purpose hardware accelerators and the ability of those accelerators to operate at ultra-low supply voltages. For key compute-intensive applications, special-purpose hardware accelerators can improve energy efficiency by an order of magnitude compared to general-purpose cores. Performance per watt increases as the supply voltage is reduced, but the degraded transistor on/off current ratios at lower supply voltages can limit the minimum operational supply voltage. Circuit solutions for robust operation at ultra-low supply voltages must also minimize any performance impact on operation at the nominal supply voltage, so that the design remains highly scalable. This article describes ultra-low-voltage design techniques and lessons learned from a video motion estimation engine fabricated in 65 nm CMOS technology.