Hello,
I want to understand the effect of L1P on straight-line code, that is, code with no for-loops and no if-branches. I expected L1P disabled to be faster than L1P enabled, because with straight-line code every fetch that goes through L1P to L2SRAM is a cache miss. I benchmarked both configurations, and the results are below: L1P enabled is faster. What causes this result? Please see the attached project file. (CCSv6.1.3 / CGTv7.4.16)
L1P enabled: 841 cycles
L1P disabled: 1133 cycles
I read the document listed below. I understand that part of the cache miss overhead can overlap with dispatch stalls that occur in the fetch pipeline. However, I don't think that explains the difference between L1P enabled and disabled, since dispatch stalls also occur when L1P is disabled. Any advice would be appreciated.
TMS320C66x DSP CorePac User Guide (SPRUGW0C), Section 2.6 "L1P Performance" (p. 39)
Regards,
Kazu