DRA80XMEVM: Reconfigure C7x L2 Memory as cache

Part Number: DRA80XMEVM
Other Parts Discussed in Thread: DRA829, TDA4VM

Hello TI support,

I'm running some tests on the C7x using the J721E_DRA829_TDA4VM EVM board and the newest PSDK 6.1. I can start bare-metal applications on the C7x, with all my data in the board's main memory. By default, the C7x L2 is configured entirely as SRAM, with no cache. This significantly hurts my performance.

I looked in the J721E.gel file to see where the C7x is configured, but could not find anything. Through the CCS register view of the C7x I found a 4-byte L2CFG register whose bits 0-2 configure the L2 mode. However, I could not find the address of this register, so I don't know how to modify the GEL file correctly.
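
To illustrate the bit manipulation involved, here is a minimal sketch that assumes only what the register view shows: a 32-bit L2CFG value with the L2 mode in bits 0-2. The macro and function names are hypothetical, and the sketch works on a plain value; it does not show how the register itself is accessed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption from the post: the L2 mode occupies bits 0-2 of a 32-bit value. */
#define L2CFG_MODE_MASK 0x7u

/* Extract the L2 mode field from a raw L2CFG value. */
static inline uint32_t l2cfg_get_mode(uint32_t l2cfg)
{
    return l2cfg & L2CFG_MODE_MASK;
}

/* Return a copy of l2cfg with the mode field replaced; other bits are kept. */
static inline uint32_t l2cfg_with_mode(uint32_t l2cfg, uint32_t mode)
{
    assert(mode <= L2CFG_MODE_MASK);
    return (l2cfg & ~L2CFG_MODE_MASK) | mode;
}
```

Replacing only the masked field keeps any other configuration bits in the register value unchanged.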

I did find another XML file in the CCS folder that mentions an offset of 0x280 for L2CFG, but it gave no base address to which the offset has to be added.

How can I set the L2 mode correctly, so that all of the L2 is used as cache?

Thank you and kind regards,

Florian

  • Hi Florian,

    I've answered this in another thread (below link). Perhaps we can close this one?

    Regards,
    Shyam

  • Shyam Jagannathan said:

    Hi Florian,

    Yes, C7x L2 SRAM supports cache/SRAM configurations similar to C66x. The C7x DSP also requires an MMU page-table setup similar to the A72. This is different from the MAR setup in C66x. Using CSL, we can partition the L2/L1 memories into cache/SRAM, but an MMU setup is required to make memory cacheable.

    You can take a look at ti-processor-sdk-rtos-automotive-j7-evm-06_01_00_15/vision_apps/apps/basic_demos/app_tirtos/tirtos_linux/c7x_1/c7x_1.cfg

    You will need to register a callback that sets up a page table as per your requirements. For example, take a look at the appMmuInit() function in ti-processor-sdk-rtos-automotive-j7-evm-06_01_00_15/vision_apps/apps/basic_demos/app_tirtos/tirtos_linux/c7x_1/main.c.

    To learn about the C7x architecture, programming and some optimized examples, I recommend going over the C7x Training material v0.5, which should be available along with ti-processor-sdk-rtos-automotive-j7-evm-06_01_00_15 in MySecureSoftware.

    Regards,
    Shyam

    Hi Shyam,

    I'd like to continue this discussion in this thread and rather close the other one. I clicked the resolve button in the other thread because previous questions were answered and this thread is specific to this topic.

    I'm not quite sure whether the approach from the vision app example applies to me at this point. I boot the board in the no-boot configuration and connect to it via the debugger. The A72 is not running as a host and TI-RTOS is not running on the DSPs. I load the launch script found under ti-processor-sdk-rtos-automotive-j7-evm-06_01_00_15/pdk/packages/ti/drv/sciclient/tools/ccsLoadDmsc/j721e/launch.js, which initializes all the processors. I'm under the impression that I need to add something to this script in order to start the DSP with the correct cache mode. Is that correct?

    Kind regards,

    Florian

  • Hi Florian,

    On reset, the C7x DSP defaults to 32KB of L1D cache, 32KB of L1P cache and 0KB of L2 cache (that is, 512KB of L2 SRAM).

    If you are not using BIOS on the C7x and are running bare-metal, you will have to fill the C7x MMU page table by writing a small piece of code that runs on the C7x.

    You can take a look at one of the examples in the C7x training package v0.5, for example:

    <C7x Training package v0.5 install path>/c7x_dsp_code_samples_adv/c7x_cellSum_4x4/main.c

    In this file you will notice that it includes a couple of header files, "enable_cache_mmu.h" and "csl_dspcahe7.h", and calls the function enable_cache_mmu(), which accepts a pre-filled MMU table array. Please note that MMU table programming is similar to the A72 and the recommended approach is to use the BIOS APIs.

    These files, functions and MMU table array can be found under <C7x Training package v0.5 install path>/c7x_dsp_code_samples_adv/c7x_mmu folder.

    So basically, you need to include all the files in the c7x_mmu folder in your project and call the enable_cache_mmu() function as done in the main.c file.

    But this still leaves the L2 cache size at 0KB, so you need to set the L2 cache size explicitly by calling the CSL function below with the required value.

      CSL_c7xSetL2CacheSize(2); // value 2 means 64KB cache, 448KB SRAM

      CSL_c7xSetL2CacheSize(3); // value 3 means 128KB cache, 384KB SRAM

      CSL_c7xSetL2CacheSize(4); // value 4 means 256KB cache, 256KB SRAM

      CSL_c7xSetL2CacheSize(5); // value 5 means 512KB cache, 0KB SRAM
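
    The values above can be summarized in a small lookup helper. This is only a sketch built from the four values listed in this reply and the 512KB total L2 size; the helper names are hypothetical, and arguments not listed here (including 0 and 1) are reported as unknown rather than guessed.

```c
#include <assert.h>
#include <stdint.h>

#define C7X_L2_TOTAL_KB 512 /* total L2 size stated in this thread */

/* Cache size in KB for a size argument, using only values from this thread.
 * Returns -1 for arguments the thread does not document. */
static int l2_cache_kb(uint32_t size_arg)
{
    switch (size_arg) {
    case 2: return 64;
    case 3: return 128;
    case 4: return 256;
    case 5: return 512;
    default: return -1;
    }
}

/* Remaining L2 SRAM in KB, or -1 if the argument is not documented here. */
static int l2_sram_kb(uint32_t size_arg)
{
    int cache_kb = l2_cache_kb(size_arg);
    if (cache_kb < 0) {
        return -1;
    }
    assert(cache_kb <= C7X_L2_TOTAL_KB);
    return C7X_L2_TOTAL_KB - cache_kb;
}
```

    For instance, l2_sram_kb(4) evaluates to 256, matching the 256KB cache / 256KB SRAM split above.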

     

    Do try this out and let me know. Ideally we recommend that all this should be done by running BIOS on C7x DSP.

    Regards,
    Shyam

  • Hi Florian,

    Did my previous reply help you?

    Regards,
    Shyam

  • Hi Shyam,

    Sorry for the late reply. I was able to compile the code using the CSL_c7xSetL2CacheSize() function and set the value to 4, but there was no detectable speedup in my function. The data resides in main DDR, so it should benefit from having an L2 cache, shouldn't it?

    My benchmark basically reads a block of data from the LPDDR4, adds some other data to it and writes the result back to main memory. I would expect some speedup when reading the data with L2 used as cache instead of SRAM.
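
    A minimal sketch of a kernel of this shape is shown below. The element type, function name and in-place layout are assumptions for illustration, not the actual benchmark code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read each element, add the matching addend, and write the sum back in place. */
static void add_in_place(int32_t *data, const int32_t *addend, size_t n)
{
    assert(data != NULL && addend != NULL);
    for (size_t i = 0; i < n; ++i) {
        data[i] += addend[i];
    }
}
```

    Each iteration performs two loads and one store, so with both buffers placed in DDR the run time depends heavily on how those accesses are served.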

    Best,

    Florian

  • Hi Florian,

    Just setting the cache size will not cause data to be cached; you will have to set up the MMU page tables as well. Did you try enabling the MMU as shown in the examples?


    Regards,
    Shyam

  • Hi Shyam,

    yes, I call

        enable_cache_mmu((uint64_t)pte_lvl0);
        CSL_c7xSetL2CacheSize(4); // value 4 means 256KB cache, 256KB SRAM
    

    in the main, similar to the main of the example you mentioned.

    Best,

    Florian

  • Hi Florian,

    I apologize: the MMU table present in the examples does not map the DDR sections, hence any data/program in DDR will not be cached. This could be the reason why you are seeing bad performance. This is something I will take care of in the next release of the training material.

    For now, I recommend using the files in the TIDL package present in PSDKRA 6.1, here:

    <PSDKRA 6.1 install path>/tidl_src_j7_01_00_00_00/common/c7x_mmu

    Use all the files here instead of the c7x_mmu folder in the training material. Set the cache size before you call enable_cache_mmu().
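
    The ordering can be sketched as follows. The two stub functions are hypothetical stand-ins for CSL_c7xSetL2CacheSize() and enable_cache_mmu() so that the sequence can be compiled and checked on a host; on the target, the real functions from the CSL and the c7x_mmu folder would be called in the same order.

```c
#include <assert.h>
#include <stdint.h>

/* Records the order of calls so the sequence can be checked on a host. */
enum { CALL_SET_L2_SIZE = 1, CALL_ENABLE_MMU = 2 };
static int call_log[2];
static int call_count = 0;

/* Hypothetical stand-in for CSL_c7xSetL2CacheSize(). */
static void stub_set_l2_cache_size(uint32_t size_arg)
{
    (void)size_arg;
    assert(call_count < 2);
    call_log[call_count++] = CALL_SET_L2_SIZE;
}

/* Hypothetical stand-in for enable_cache_mmu(). */
static void stub_enable_cache_mmu(uint64_t table_addr)
{
    (void)table_addr;
    assert(call_count < 2);
    call_log[call_count++] = CALL_ENABLE_MMU;
}

/* Set the L2 cache size first, then enable the MMU and caches. */
static void init_l2_cache(uint64_t table_addr, uint32_t size_arg)
{
    stub_set_l2_cache_size(size_arg);
    stub_enable_cache_mmu(table_addr);
}
```

    This is the reverse of the call order in the snippet posted earlier in the thread.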

    Please try it and let me know.

    Regards,
    Shyam

  • Hi Shyam,

    I'm currently attending a training course and have a couple of vacation days afterwards, so it could be a while until I can try it out. I'll give you feedback once I have time. Thanks.

    Kind regards,

    Florian

  • Hi Shyam.

    I finally had the time to test the caching with the files from the TIDL package. It worked! Thank you for your input.

    Have a nice day and a great Christmas :)

    Best,

    Florian