This paper first provides a detailed introduction to the basic operation of caches and presents an overview of the TMS320C64x DSP cache architecture. To guarantee that embedded applications function correctly on a cache-based DSP architecture, programmers must be aware of potential cache coherence issues. While cache coherence on C6x1x DSPs is maintained automatically for accesses to internal memory, maintaining it for accesses to external memory is the programmer’s responsibility. A simple set of programming guidelines is established to ensure that coherence is maintained correctly. Further, the main purpose of a cache is to reduce the average memory access time. When the CPU requests data that is not held in cache, a longer access time is incurred. Therefore, the goal is to keep in cache those address locations that are used repeatedly or are likely to be used soon. Since cache capacity is limited, memory layout and access patterns have a large impact on which addresses can be kept in cache and which are evicted. Various optimization techniques are discussed for designing cache-friendly memory layouts that improve cache efficiency and thus application performance.