Cache misses consideration

Other Parts Discussed in Thread: OMAP-L137

Why does the C6455 have a larger cache miss penalty than the C6416?

For example, the cache miss penalty on the C6455 is 12.5 cycles per miss, but on the C6416 it is only 6 cycles per miss. How come?

  • The 64x+ architecture moves to a multi-master DMA architecture. This allows much greater overall data throughput in the device, e.g. EDMA can move data from DDR2 to L1D SRAM while the CPU simultaneously accesses another chunk of on-chip memory (e.g. the 128 KB chunk of SRAM on OMAP-L137). Of course, this adds an extra layer, because accesses from these masters now have to be arbitrated.

    The 64x+ architecture was also designed to scale to faster frequencies and larger cache sizes.

    Although the miss penalty is greater, other improvements generally more than offset the change:

    • L1 cache sizes are 2x greater (sometimes more) than on older 64x devices, i.e. fewer cache misses due to the larger L1 cache.
    • Ability to configure L1 memory as SRAM instead of cache, i.e. capability to make certain algorithms/data always accessible in a single cycle.
    • Wider bus between L1 and L2 to get more throughput (though with more initial latency).
    • Compact instructions give better code density, i.e. more instructions now fit in the same amount of SRAM (fewer misses).
    • Improvements in snooping mechanisms to reduce the number of misses.
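
To put rough numbers on the trade-off above, here is a minimal sketch that uses only the two per-miss penalties quoted in the question (6 and 12.5 cycles). It models total stall time as misses times penalty, which is a simplification that ignores miss pipelining and the other effects listed above. The function names and the miss counts in the example are hypothetical, chosen purely for illustration and not measured on either device.

```python
# Per-miss penalties quoted in the question (cycles per miss).
C6416_PENALTY = 6.0
C6455_PENALTY = 12.5


def stall_cycles(misses, penalty_cycles):
    """Total cycles stalled on cache misses: miss count times per-miss penalty."""
    return misses * penalty_cycles


def break_even_miss_ratio(old_penalty, new_penalty):
    """Fraction of the old miss count at which both devices stall equally long."""
    return old_penalty / new_penalty


if __name__ == "__main__":
    ratio = break_even_miss_ratio(C6416_PENALTY, C6455_PENALTY)
    print(f"Break-even miss ratio: {ratio:.2f}")

    # Hypothetical miss counts for the same code on each device (assumption).
    c6416_misses = 1000
    c6455_misses = 400

    print("C6416 stall cycles:", stall_cycles(c6416_misses, C6416_PENALTY))
    print("C6455 stall cycles:", stall_cycles(c6455_misses, C6455_PENALTY))
```

Under this simple model, the C6455 comes out ahead whenever it takes fewer than 48% of the misses the C6416 would take for the same code, i.e. when the larger L1, the SRAM option, and the better code density together cut the miss count by more than about half.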