The newly released Freescale SC3850 StarCore DSP implements new cache hint instructions that allow a compiler or an assembly language programmer to minimize cache-miss latency by allocating data into a cache before it is accessed. These instructions affect the performance, but not the functionality, of the software in which they are used. In theory, intelligent use of L1 cache hints can eliminate memory stall overhead entirely, but in many situations restructuring an existing algorithm by hand to take advantage of them is hard.

This article examines an automatic flow that uses profiler feedback to insert cache hint instructions into existing code. Two kinds of hints are inserted:

  • DFETCH/PFETCH instructions prefetch cache lines ahead of the DSP core's request, so that reads which would otherwise miss are served from the cache
  • DMALLOC is used to allocate cache lines before sequential writes
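
The effect of a read prefetch hint can be illustrated with a toy timing model. The sketch below is not SC3850 code and does not model the real cache: the function name `stall_cycles`, the per-line work, and the miss latency are all assumptions chosen for illustration. It walks sequentially over cache lines and counts how many cycles the core would wait, with and without a hint issued a fixed number of lines ahead. Write allocation (DMALLOC) and program prefetch (PFETCH) are left out.

```python
def stall_cycles(n_lines, work_per_line, miss_latency, prefetch_distance=0):
    """Count stall cycles for a sequential read over n_lines cache lines.

    prefetch_distance is how many lines ahead a prefetch hint is issued.
    A value of 0 means no hints, so every line is a demand miss.
    All parameters are illustrative, not measured SC3850 values.
    """
    ready_at = {}  # line index -> cycle at which a prefetched line arrives
    now = 0
    stalls = 0
    for line in range(n_lines):
        # Issue the hint for a future line before touching the current one.
        if prefetch_distance > 0:
            target = line + prefetch_distance
            if target < n_lines and target not in ready_at:
                ready_at[target] = now + miss_latency
        if line in ready_at:
            # Prefetched: wait only if the line is still in flight.
            wait = max(0, ready_at[line] - now)
        else:
            # Never prefetched: pay the full miss latency.
            wait = miss_latency
        stalls += wait
        now += wait + work_per_line
    return stalls


if __name__ == "__main__":
    for distance in (0, 1, 3):
        print(distance, stall_cycles(100, 20, 60, distance))
```

In this model the stalls disappear once the hint is issued far enough ahead that the work done in between covers the miss latency; only the first few lines, which nothing prefetched, still pay the full cost.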

This evaluation found that automatically inserting cache hint instructions into existing code can raise effective core frequency by 5-15% by removing memory stalls. The improvement is greater for applications running from slow memory, where miss latency is higher. The resulting flow can be applied directly to assembly code, built into the compiler, or used as a separate tool.
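
As a rough sketch of how profiler feedback might drive hint insertion, the code below selects the access sites that account for a meaningful share of stall cycles, picks a hint type, and estimates the gain in effective frequency if those stalls were removed. It is an illustration under stated assumptions, not the flow evaluated in this article: the `MissSite` record, the 1% selection threshold, the distance rule (miss latency divided by loop-body cycles, rounded up) and the fixed distance of 1 for write allocation are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MissSite:
    name: str             # hypothetical label for a load/store site
    kind: str             # "read" or "write"
    stall_cycles: int     # stall cycles the profiler attributes to the site
    cycles_per_iter: int  # loop-body cost excluding stalls
    miss_latency: int     # miss latency in cycles seen at this site


def plan_hints(sites, total_cycles, min_share=0.01):
    """Return (site, hint, distance) for sites worth instrumenting."""
    plan = []
    for s in sites:
        # Skip sites whose stalls are a negligible share of run time.
        if s.stall_cycles / total_cycles < min_share:
            continue
        if s.kind == "read":
            # Prefetch far enough ahead to cover the miss latency.
            distance = math.ceil(s.miss_latency / s.cycles_per_iter)
            plan.append((s.name, "DFETCH", distance))
        else:
            # Assumption: allocation needs no memory read, so one
            # iteration of lead is taken as enough.
            plan.append((s.name, "DMALLOC", 1))
    return plan


def effective_frequency_gain(total_cycles, removed_stalls):
    """Fractional speedup when removed_stalls cycles are eliminated."""
    return total_cycles / (total_cycles - removed_stalls) - 1.0


if __name__ == "__main__":
    sites = [
        MissSite("fir_in", "read", 80_000, 20, 60),
        MissSite("out_buf", "write", 20_000, 10, 60),
        MissSite("rare", "read", 500, 20, 60),
    ]
    print(plan_hints(sites, 1_000_000))
    print(effective_frequency_gain(1_000_000, 100_000))
```

With these made-up numbers, stalls make up 10% of the run, and removing them corresponds to a gain of about 11% in effective frequency, since the same work finishes in 90% of the cycles.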