CCS/OMAPL138B-EP: Is there Cache Monitoring?

Part Number: OMAPL138B-EP
Other Parts Discussed in Thread: OMAPL138

Tool/software: Code Composer Studio

Dear all,

We are using the DSP of the OMAP-L138 with a very fast task (125 us period). We rely on most of the functions in this task being loaded from DDR RAM only once and then residing in L2. We therefore configured 50% of L2 as cache and linked some of the functions to L2 RAM. However, it was not possible to link all functions to L2, especially since there are many alternative functions that are selected by run-time configuration parameters.
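
For reference, our current setup looks roughly like this (function and section names are made up for illustration; the CSL call is from csl_cacheAux.h and its exact name and include path may differ between CSL versions):

    /* OMAP-L138 has 256KB of L2, so 50% cache = 128K cache + 128K SRAM. */
    #include <csl_cacheAux.h>

    /* Pin a hot function into the L2 SRAM half via the linker
       (.l2_code > L2SRAM in the .cmd file):                    */
    #pragma CODE_SECTION(hot_task_fn, ".l2_code")
    void hot_task_fn(void)
    {
        /* ... time-critical work ... */
    }

    void init_cache(void)
    {
        CACHE_setL2Size(CACHE_128KCACHE);  /* 128K cache, rest is addressable L2 SRAM */
    }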

I observed that a piece of code involving a number of function calls, but only a few lines of effective code, took 3 us; after linking one of these functions to L2, the time dropped to below 500 ns. I suspect this was caused by cache misses, but I was not able to prove it.

Is there a way to monitor cache hits/misses at run-time?

Is there a "good practice" of reducing cache misses by using a certain link pattern?

Based on our configuration parameters, it would be possible at start-up to determine the "n out of m" functions that are actually needed for the current configuration. We had the idea of building a memory manager, copying the necessary functions into internal RAM, and calling them through pointers. (I know the gcc overlay mechanism, but I think we cannot use it because we need "n out of m".) Has this already been done, or are there any hints or sample code?
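
To make the idea more concrete, here is a rough sketch of what we have in mind, based on the TI linker's copy-table mechanism (all function, section, and symbol names are made up; the cache-coherence calls are from the C6000 CSL and should be checked against your CSL version):

    /* Linker .cmd fragment: each selectable function group gets its own
     * output section with load placement in DDR and run placement in L2
     * SRAM, and the table() operator emits a copy table for it:
     *
     *   .ovly_fir: load = DDR, run = L2SRAM, table(_ovly_fir_ctbl)
     *   .ovly_iir: load = DDR, run = L2SRAM, table(_ovly_iir_ctbl)
     */
    #include <cpy_tbl.h>        /* COPY_TABLE, copy_in() from the TI RTS   */
    #include <csl_cacheAux.h>   /* cache coherence calls (verify the names) */

    extern far COPY_TABLE ovly_fir_ctbl;    /* generated by table() */
    extern far COPY_TABLE ovly_iir_ctbl;

    #pragma CODE_SECTION(fir_variant, ".ovly_fir")
    void fir_variant(void) { /* ... */ }

    #pragma CODE_SECTION(iir_variant, ".ovly_iir")
    void iir_variant(void) { /* ... */ }

    void (*process)(void);      /* the task calls the selected variant by pointer */

    void select_variant(int use_fir)
    {
        COPY_TABLE *tbl = use_fir ? &ovly_fir_ctbl : &ovly_iir_ctbl;

        copy_in(tbl);           /* copy the code from DDR to its L2 run address */

        /* The copy went through L1D, and L1P may hold stale lines for the
           run-address range, so clean up before the first call:          */
        CACHE_wbL1d((void *)tbl->recs[0].run_addr,  tbl->recs[0].size, CACHE_WAIT);
        CACHE_invL1p((void *)tbl->recs[0].run_addr, tbl->recs[0].size, CACHE_WAIT);

        process = use_fir ? fir_variant : iir_variant;
    }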

Thanks for any hints

Alexander

  • Alexander,

    We here in the CCS forum are not very experienced with the cache mechanism of C674x devices. However, upfront I can provide a few answers:

    >> Is there a way to monitor cache hits/ cache misses during run-time?
    Not from a CCS perspective. We used to have this feature in very old releases, but it did not yield reliable results and it was therefore discontinued.
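
    One workaround at run-time is to bracket the suspect code with the C64x+/C674x free-running time-stamp counter and compare cold (first-call) vs. warm timings; a minimal sketch, assuming the TI compiler's c6x.h (the function name is a placeholder):

        #include <c6x.h>        /* TSCL register access (TI compiler) */

        extern void suspect_region(void);

        unsigned int time_region_cycles(void)
        {
            unsigned int t0, t1;

            TSCL = 0;           /* any write starts the counter; it never stops */
            t0 = TSCL;
            suspect_region();
            t1 = TSCL;

            return t1 - t0;     /* CPU cycles; a large cold-vs-warm gap points
                                   at cache misses                             */
        }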

    >> Is there a "good practice" of reducing cache misses by using a certain link pattern?
    There are a few references that can help you with this:
    processors.wiki.ti.com/.../Program_Cache_Layout
    www.ti.com/.../spra756.pdf
    www.ti.com/.../spru656a.pdf
    processors.wiki.ti.com/.../C6000_Compiler:_Recommended_Compiler_Options

    (this last page has some outdated information in section 6, but overall it is a good reference)

    I will move your post to the device forum so the experts there may be able to help you with additional insights.

    Hope this helps,
    Rafael
  • Alexander,

    Some points to add to your consideration:

    1. L2 Cache is shared between data and program, always tending to keep the most-recently accessed cache lines of data or program.

    2. The L1P and L1D caches are also very important to this consideration. You may in fact want to allocate some of them as SRAM and link extremely critical functions there. This is not a common practice, but it is one to consider; see the sketch after this list.

    3. There is a 128KB shared RAM on the OMAPL138 that could be used for more internal space. I have not benchmarked its speed against L2 SRAM, but I expect it to be slower than L2 SRAM and faster than DDR.
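
    A sketch for point 2 (section and function names are made up; the CSL calls and the L1 base addresses should be verified against your CSL version and the OMAP-L138 data sheet):

        #include <csl_cacheAux.h>

        /* Shrink L1P/L1D cache so the remainder becomes addressable SRAM,
           e.g. 16K cache + 16K SRAM out of each 32K L1:                  */
        void init_l1_sram(void)
        {
            CACHE_setL1pSize(CACHE_L1_16KCACHE);
            CACHE_setL1dSize(CACHE_L1_16KCACHE);
        }

        /* Pin the most critical function and data there via the linker:  */
        #pragma CODE_SECTION(critical_kernel, ".l1p_code")
        void critical_kernel(void) { /* ... */ }

        #pragma DATA_SECTION(coeffs, ".l1d_data")
        int coeffs[64];

        /* Matching .cmd fragment (L1P RAM at 0x11E00000, L1D RAM at
           0x11F00000 on OMAP-L138 -- double-check which half remains
           SRAM for your cache-size setting):
             L1P_SRAM: org = 0x11E00000, len = 0x4000
             L1D_SRAM: org = 0x11F00000, len = 0x4000
             SECTIONS { .l1p_code > L1P_SRAM   .l1d_data > L1D_SRAM }
        */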

    And some various comments:

    I believe the C674x core supports cache visibility in CCS. When you open a Memory Browser window, there are three check boxes that highlight memory contents according to whether those locations currently reside in the L1P, L1D, or L2 cache. You can uncheck the boxes to turn off the highlighting and see the contents at the physical, non-cached memory location. Set the address to an area of interest in DDR, and you can see whether some program locations are currently cached in L1P and/or L2; do the same with an area of data memory to check its current caching status.

    You can use this by setting a breakpoint in your code and then taking a snapshot of the cache status. It is a tedious, manual operation, but it can be very useful, especially if there are only a few areas of particular interest.

    You can use the MAR bits to control the cacheability of memory regions. The granularity is very coarse (16MB per MAR bit), but this could allow you to place some of your less critical program sections at non-cacheable addresses and make sure all of your critical program sections are cacheable and less likely to be evicted between uses.
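
    For example (MAR numbering per the C674x megamodule reference; the CSL calls are in csl_cacheAux.h -- verify the names against your CSL version):

        #include <csl_cacheAux.h>

        void setup_cacheability(void)
        {
            /* DDR starts at 0xC0000000 on OMAP-L138, and each MAR bit
               covers 16MB, so MAR 192 covers 0xC0000000-0xC0FFFFFF:    */
            CACHE_enableCaching(192);    /* critical sections: cacheable */

            /* Link rarely-used sections into the next 16MB page and
               leave it non-cacheable so it can never evict hot lines:  */
            CACHE_disableCaching(193);   /* 0xC1000000-0xC1FFFFFF       */
        }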

    All of this depends very strongly on the exact nature of your algorithms: how often they are called, how much code is called, what percentage of it is used on each call, how much data is used, whether the same or different data sections are worked on each time, and whether anything else interferes with the caching outside the critical 125 us task.

    Regards,
    RandyP