Embedded systems often have bottlenecks that prevent them from meeting their performance requirements. To meet these requirements, designers may employ higher-performance processors or modify the code.

We will explore techniques for profiling a system and present the data that can be extracted from profiles to help designers identify system bottlenecks. We will then present ways to modify the design, in both hardware and software, to improve performance. Depending on the cause of the bottleneck, the remedy could be a change in arbitration logic, a code modification that keeps certain variables in registers, a change to the data organization such as using global variables, inlining code, or migrating functions from software to hardware.
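As a rough illustration of two of the software changes named above (keeping a variable in a register and inlining), consider the following minimal C sketch. It is not taken from the case studies: the function names `frame_energy_baseline`, `frame_energy_tuned`, and `square_sample` are hypothetical, and whether a given compiler already performs these transformations on its own depends on the toolchain and optimization level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Baseline: the running sum lives in memory behind a pointer, so the
   generated code may load and store *energy on every iteration. */
void frame_energy_baseline(const int16_t *samples, size_t n, int32_t *energy)
{
    assert(samples != NULL && energy != NULL);
    *energy = 0;
    for (size_t i = 0; i < n; i++) {
        *energy += (int32_t)samples[i] * (int32_t)samples[i];
    }
}

/* Small helper declared static inline so the compiler can expand it at the
   call site and avoid call overhead inside the loop. */
static inline int32_t square_sample(int16_t s)
{
    return (int32_t)s * (int32_t)s;
}

/* Tuned: accumulate in a local variable, which the compiler can keep in a
   register, and write the result to memory once after the loop.
   Assumption: n is small enough that the sum fits in 32 bits. */
void frame_energy_tuned(const int16_t *samples, size_t n, int32_t *energy)
{
    assert(samples != NULL && energy != NULL);
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += square_sample(samples[i]);
    }
    *energy = acc;
}
```

Both functions compute the same sum of squares, so a profile taken before and after the change would show only the difference in memory traffic and call overhead, if any.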

The paper will present real-life examples, including a case study of a low-bit-rate speech encoder. We will walk the audience through several changes to the design and their impact on performance. Finally, we will show that the tuned design runs at a much lower clock frequency, saving power and cost.