TDA2EX17EVM: OCMC RAM execution speed

Part Number: TDA2EX17EVM

Hi,

We have some observations regarding the PRCM module.

Since this is part of the Vision SDK, we need your input to resolve the issue we are observing.

We need to ensure that code executing from OCMC RAM also performs well.

We debugged the issue and identified some factors that may be affecting performance.

 

Our understanding/observation:

 

  1. We checked the PRCM configuration by running Get_PRCM_config.gel.

According to the message displayed on the CCS console, the DPLL for the MPU is configured for 1 GHz.

  2. While measuring the API timings, we came up with the following query:

 

  • The MPU (A15) accesses the OCMC RAM through L3 main.
  • L3 main is derived from CORE_CLK and CORE_X2_CLK.
  • CORE_X2_CLK is configured in the register CM_DIV_H12_DPLL_CORE at address 0x4A00513C.
  • The current configuration in VSDK 2_12_3 is CM_CORE_AON__CKGEN_CM_DIV_H12_DPLL_CORE (0x4A00513C) = 0x00000204. With this setting, the observed API timing for CAN_write() is 40 ms.
  • When this register setting was changed to 0x00000202, the observed API timing for CAN_write() was approximately 30 ms.

 

Please respond with your comments on this.

Regards,

Shivanand

  • Hi Shivanand,

    L3 does not support a frequency higher than 266 MHz.
    You should not modify this setting, as L3 cannot be overclocked.
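To make the overclocking concern concrete, here is a minimal sketch of what the two register values imply. It assumes the H12 divider sits in bits [5:0] of CM_DIV_H12_DPLL_CORE and that bit 9 is a read-only clock status flag; that layout, and the helper names `h12_divider` and `l3_mhz_after_change`, are assumptions for illustration and should be checked against the TRM. It also assumes the DPLL output itself stays unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout of CM_DIV_H12_DPLL_CORE: divider in bits [5:0],
 * read-only clock status in bit 9. Verify against the TRM for the device. */
#define H12_DIV_MASK   0x3Fu
#define H12_CLKST_BIT  (1u << 9)

/* Extract the H12 post-divider from a raw register value. */
static uint32_t h12_divider(uint32_t reg)
{
    return reg & H12_DIV_MASK;
}

/* Scale a known L3 frequency to what a new divider would produce,
 * assuming the DPLL output feeding the divider is left unchanged. */
static uint32_t l3_mhz_after_change(uint32_t l3_mhz_now, uint32_t reg_now,
                                    uint32_t reg_new)
{
    uint32_t div_now = h12_divider(reg_now);
    uint32_t div_new = h12_divider(reg_new);

    assert(div_now != 0 && div_new != 0);
    return l3_mhz_now * div_now / div_new;
}
```

Under these assumptions, 0x00000204 decodes to a divider of 4 and 0x00000202 to a divider of 2, so `l3_mhz_after_change(266, 0x204, 0x202)` gives 532, twice the stated 266 MHz limit.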

    Regards,
    Rishabh
  • Hi Rishabh,

    According to the PRCM GEL output, the A15 is configured for 1 GHz.
    We toggled a GPIO pin in a loop and observed a frequency of 18 MHz.

    From this we infer that the bootloader is executing instructions from OCMC RAM at a lower frequency.
    1. How can this be improved?

    2. If L3 can be clocked at a maximum of 266 MHz, does that mean SW that runs from OCMC will execute at 266 MHz?

    Regards,
    Shivanand
  • Hi Shivanand,

    1. When you toggle a GPIO pin, every write is an access from the A15 to the peripheral, and each access incurs latency. The IOs/balls used for GPIO do not support a 1 GHz frequency. GPIO registers are also in non-cacheable space. Hence you can't measure the CPU frequency by toggling GPIO pins in a loop.
    2. The bootloader executes instructions on the A15, which runs at 1 GHz. The OCMC memory runs at 266 MHz.
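As a rough back-of-envelope check of the point above, the 18 MHz figure can be converted into time per register write. This sketch assumes one GPIO register write per output edge (two writes per period) and nothing else in the loop; the thread does not show the loop, so that is an assumption, and the helper names are hypothetical.

```c
#include <assert.h>

/* Nanoseconds spent per GPIO register write, assuming one write per edge
 * (two writes per output period) and no other work in the loop. */
static double ns_per_gpio_write(double toggle_mhz)
{
    assert(toggle_mhz > 0.0);
    return 1000.0 / (2.0 * toggle_mhz);
}

/* Convert a duration in nanoseconds to clock cycles at a given frequency. */
static double cycles_at(double ns, double clock_mhz)
{
    return ns * clock_mhz / 1000.0;
}
```

With the observed 18 MHz, each write takes roughly 28 ns, which is about 28 MPU cycles at 1 GHz or about 7 interconnect cycles at 266 MHz. The measurement therefore reflects the latency of an uncached peripheral access rather than the instruction rate of the A15.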

    Regards,
    Rishabh
  • Hi Shivanand,

    Can you let us know what the end goal is here?

    Regards,
    Rishabh
  • Hi Rishabh,

    We have an MCAL driver built for the A15 and executing from DDR.
    The API timing for CAN_Write() is observed to be around 100 microseconds.

    The same driver is now integrated with the bootloader, which executes from OCMC RAM.
    There, we observe the API timing for CAN_write() to be around 40 milliseconds.

    This timing is quite high and will affect the download speed.
    Our intention is to improve the performance.

    Regards,
    Shivanand
  • Hi Shivanand,

    You should see similar numbers with OCMC and DDR.
    1. Can you run the gel file and check the frequency at which L3 is running? It should be 266 MHz.
    2. When you execute CAN_write(), is the I-cache enabled? RBL enables the I-cache by default. The I-cache should be enabled in both cases.
    3. What is the state of the D-cache when you execute CAN_write()? What is the MMU configuration for both OCMC and DDR? The D-cache should be enabled globally as well as for the MMU pages that map OCMC and DDR.

    Regards,
    Rishabh
  • Hi Rishabh,

    1. When we run PRCM_Config.gel, we see the DPLL frequencies for MPU, CORE, IPU and so on:
    DDR is at 388 MHz
    MPU is at 1 GHz
    IPU is at 212 MHz

    However, L3 main is not listed. How can we check the L3 frequency?

    2. Cache is enabled for the MCAL running from DDR, but not for OCMC RAM. We had previously tried to enable cache using the VSDK examples,
    but we did not observe any change in performance. It would be helpful if you could guide us in configuring cache for OCMC RAM.

    Regards,
    Shivanand
  • Hi Shivanand,

    Can you share the PRCM_Config.gel you are using?
    You should look at cache_a15.h to see how to use the cache.
    Can you answer questions 2 and 3?
    You can use the CACHEA15GetEnabled API to see which caches are enabled.

    Regards,
    Rishabh
  • Hi Rishabh,

    In CACHEA15GetEnabled() we observe both I and D-cache being enabled.

    Can I share the gel files on this platform?

    Regards,
    Shivanand
  • Hi Shivanand,

    You can share the gel file here.
    I would appreciate a more detailed reply so that there is no scope for confusion.
    I am assuming that the I-cache and D-cache are enabled in both cases, i.e. when CAN_Write is executed from OCMC as well as from DDR.
    If this is the case, what is the cache policy for OCMC? Can you share the code snippet where you have enabled the MMU?

    Regards,
    Rishabh
  • Hi Rishabh,

    I have shared the gel files and code snippet over email.

    Regards,
    Shivanand
  • Hi Shivanand,

    I have not received any email.
    Also, I would like to keep this support on e2e, so please share the code snippet and gel file here.
    Thanks.

    Regards,
    Rishabh

  • Hi Rishabh,

    I have shared the gel file and code snippet for enabling cache, in the above post.

    Regards,
    Shivanand
  • Hi Shivanand,

    I see that you have changed the MMU enable code compared to the version that was shared by TI.
    Can you try the MMU configuration with the original code?
    There are multiple issues with the modifications you have made:
    1. For the 0x4000_0000 region you are setting the first-level descriptor as a block, which means a second-level descriptor is not needed. Defining a second-level descriptor won't have any effect.
    2. The size of a second-level descriptor is 2 MB, so you can't start from 0x4030_0000.
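Both points can be checked with a little address arithmetic. The sketch below assumes the ARMv7-A long-descriptor format with a 32-bit input address, where a first-level entry covers 1 GB and a second-level entry covers 2 MB, and where bits [1:0] of an entry select block (0b01) or table (0b11). The helper names are hypothetical and are not taken from the TI code discussed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define L1_SHIFT 30u               /* each first-level entry covers 1 GB  */
#define L2_SHIFT 21u               /* each second-level entry covers 2 MB */
#define L2_SIZE  (1u << L2_SHIFT)

/* Entry type encoded in bits [1:0] of a first- or second-level entry. */
enum desc_type { DESC_INVALID, DESC_BLOCK, DESC_TABLE };

static enum desc_type desc_type_of(uint64_t desc)
{
    switch (desc & 0x3u) {
    case 0x1u: return DESC_BLOCK;
    case 0x3u: return DESC_TABLE;
    default:   return DESC_INVALID;
    }
}

/* The walk only reaches the second level if the first-level entry is a table. */
static bool second_level_is_walked(uint64_t l1_desc)
{
    return desc_type_of(l1_desc) == DESC_TABLE;
}

static uint32_t l1_index(uint32_t va) { return va >> L1_SHIFT; }
static uint32_t l2_index(uint32_t va) { return (va >> L2_SHIFT) & 0x1FFu; }

/* A second-level block must start on a 2 MB boundary. */
static bool is_2mb_aligned(uint32_t addr)
{
    assert(L2_SIZE == 0x200000u);
    return (addr & (L2_SIZE - 1u)) == 0;
}
```

For 0x4030_0000, `l1_index` is 1 and `l2_index` is 1, but `is_2mb_aligned` is false; the enclosing block starts at 0x4020_0000. If first-level entry 1 is a block, `second_level_is_walked` is false and any second-level table for that range is ignored.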

    Regards,
    Rishabh
  • Hi Rishabh,

    These changes to the cache and MMU code were made by Prasad J. during a debug session.
    Anyway, we replaced the functions with the ones you shared previously and tested again.

    1. The API timing of CAN_Write() dropped to 11 ms.

    This is a good improvement, but it is still quite high compared to 100 microseconds.

    What can be done about this?

    2. The above observation was made in debug mode.

    In standalone / QSPI mode, the current through the board is around 0.15 A and we suspect that booting is not occurring.

    In contrast, the SW without cache enabled boots without any issue.

    Why is this so?

    Regards,
    Shivanand

  • Hi Shivanand,

    I would suggest that you explain the issues you are facing in detail.
    What exactly is your SW doing?
    What are the changes between DDR and OCMC?
    Why did you try to debug the cache and MMU?
    Did the shared code not work the first time?
    When cache is enabled, it means the boot has occurred and the bootloader is already running.
    What exactly do you mean by "boot is not happening"?

    Regards,
    Rishabh
  • Hi Rishabh,

    The code you shared previously was not working in standalone mode.

    During the debug session with Prasad J., some modifications were made to the cache and MMU code.
    However, that also did not work in standalone mode.

    Previously, we did not measure API timings with that code.
    Now, we observe a difference in the API timings.

    The differences between the SW running from DDR and from OCMC are as below:
    DDR - cache enabled for DDR - API timing: 100 microseconds - checked in debug mode
    OCMC - cache enabled as per the code snippet shared by you - API timing: 10 milliseconds - checked in debug mode

    This does not work in QSPI mode.
    Code with cache disabled- works fine and we observe frames.
    Code with cache enabled- does not respond to any requests.

    The current to the board also varies w.r.t these 2 SWs.
    Code with cache disabled - 0.24 A
    Code with cache enabled - 0.15 A

    Regards,
    Shivanand
  • Hi Rishabh,

    Please reply to the comments above.

    It would be helpful if we conclude on this soon.

    Regards,

    Shivanand

  • Hi Shivanand,

    Can you reply to my first question? What exactly is your SW doing? Can you explain the SW flow?

    Also, are you saying that the code was previously not working in standalone mode and the same code is now working in standalone mode?

    I am not able to understand the difference between the three modes (standalone, QSPI and debug) referred to in your question.

    Also, for SBL boot you need all the changes I had shared on 22nd Dec, 2017. Did you try after adding the Cache Invalidate API call in sbl_lib_tda2xx_platform.c?

    Regards,

    Rishabh

  • Hi Rishabh,

    1. Standalone refers to QSPI boot mode.
    2. The SW is a bootloader developed on top of VSDK 2.12.3. We have integrated VSDK with the MCAL driver.
    The SW performs the initialization/booting as in VSDK and, in the absence of an APP image, the CAN driver processes the received requests. The API timings during this operation are high.

    3. This SW had been checked ONLY in debug mode. Now, when we check the same in QSPI mode, it fails.

    Yes, the CacheInvalidate() API is added as per our previous discussion. We still observe the same issue.

    Regards,
    Shivanand
  • Hi Shivanand,

    What is the difference in flow between DDR and OCMC? SBL always runs from OCMC.
    Where is the CAN driver running?

    Regards,
    Rishabh
  • Hi Rishabh,

    The SW for DDR includes mcu_init, enabling cache, and then the CAN driver processing requests.

    The SW for OCMC includes the CAN driver and VSDK.

    Regards,
    Shivanand
  • Hi Shivanand,

    There are two problems here:
    1. Use case not working when CAN driver is running in OCMC RAM with cache enabled for OCMC region.
    2. Performance issue with respect to DDR (100 micro seconds) vs OCMC (10 ms) in debug mode.

    My question is regarding the second issue. What is the difference in the flow? Can you explain it in flow-chart form, like A -> B -> C?

    Regards,
    Rishabh
  • Hi Rishabh,

    I have attached screenshots of the cache-related snippets used in the CAN driver built for OCMC and the CAN driver built for DDR, respectively.

    There are some differences between the cache code you shared and the code built for DDR.

    I have highlighted them in the attachment.

    After including these differences in the code running from OCMC, the cache code executes without raising any exception.

    This works in both debug and QSPI boot mode.

    Can you please tell me what these differences are?

    What is their impact?

    Is the code on the left side of the image correct? It seems functional.

    How can we verify that cache is enabled correctly?

    We want to enable cache only for CAN.

    How can we check whether cache is enabled for the entire bootloader or only for CAN?

    Regards,

    Shivanand

  • Hi Shivanand,

    I have answered your questions below:

    Cache cannot be enabled for a peripheral; it is enabled for a memory region. Second-level descriptors have a size of 2 MB, so cache can be enabled for a region of at least 2 MB, e.g. 0x4000_0000 to 0x4020_0000.

    The OCMC_RAM1 start address is 0x4030_0000 and its size is 512 KB, so cache will be enabled for the full 0x4020_0000 to 0x4040_0000 region.
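The region arithmetic above, and what a cacheable second-level block entry for that region could look like, can be sketched as follows. This is an illustration only, not the code shared in this thread. The bit positions follow the ARMv7-A long-descriptor block format (AttrIndx in bits [4:2], AP in [7:6], SH in [9:8], AF in bit 10). `ATTR_IDX_NORMAL_WB` is a hypothetical MAIR slot assumed to hold 0xFF (Normal, write-back, read/write-allocate), and the access permission and shareability values are assumptions as well.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_2MB       0x200000u
#define OCMC_RAM1_BASE  0x40300000u   /* from the thread */
#define OCMC_RAM1_SIZE  0x80000u      /* 512 KB, from the thread */

/* Long-descriptor block attribute fields (ARMv7-A LPAE, stage 1). */
#define DESC_BLOCK      0x1ull
#define DESC_ATTRIDX(n) ((uint64_t)((n) & 0x7u) << 2)
#define DESC_AP_RW_PL1  (0x0ull << 6)  /* read/write, privileged only */
#define DESC_SH_INNER   (0x3ull << 8)  /* inner shareable */
#define DESC_AF         (1ull << 10)   /* access flag */

/* Hypothetical MAIR slot assumed to hold 0xFF; the real slot depends on
 * how MAIR0/MAIR1 are programmed in the actual MMU setup code. */
#define ATTR_IDX_NORMAL_WB 1u

/* Start of the 2 MB block that contains addr. */
static uint32_t block_base(uint32_t addr)
{
    return addr & ~(BLOCK_2MB - 1u);
}

/* Build a second-level block entry for a 2 MB-aligned physical address. */
static uint64_t l2_block_desc(uint32_t pa, unsigned attr_idx)
{
    assert((pa & (BLOCK_2MB - 1u)) == 0);
    return (uint64_t)pa | DESC_AF | DESC_SH_INNER | DESC_AP_RW_PL1 |
           DESC_ATTRIDX(attr_idx) | DESC_BLOCK;
}
```

Both the first and the last byte of OCMC_RAM1 fall in the block starting at 0x4020_0000, which is why the whole 0x4020_0000 to 0x4040_0000 range takes on the same attributes. Such an entry only takes effect if the first-level entry for 0x4000_0000 is a table descriptor pointing at the second-level table.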

    The code on the left side is the same as the one you had shared earlier. I had pointed out the issues below previously as well:
    1. For the 0x4000_0000 region you are setting the first-level descriptor as a block, which means a second-level descriptor is not needed. Defining a second-level descriptor won't have any effect.
    2. The size of a second-level descriptor is 2 MB, so you can't start from 0x4030_0000.
    To summarize, cache is not enabled for OCMC in the code on the left-hand side.

    I would suggest that you go through chapter "B3.6 Long-descriptor translation table format" of the ARM documentation static.docs.arm.com/.../DDI0406C_C_arm_architecture_reference_manual.pdf.

    Regards,
    Rishabh

  • Hi Rishabh,

    A correction to what I mentioned in my previous post:
    the code on the right-hand side is what is configured right now, and it seems to be functional, as the API timing has reduced.

    Can you please answer the questions below with respect to the right-hand side snippet?

    Is the right-hand side configuration correct?
    How can we verify that cache is enabled correctly?

    Regards,
    Shivanand
  • Hi Shivanand,

    The code on the right-hand side enables cache for the DDR memory region.
    Cache is not enabled for OCMC.
    You can use the CACHEA15GetEnabled API to check whether the caches are enabled.
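The return format of CACHEA15GetEnabled is not shown in this thread, so the sketch below decodes the architectural enable bits directly instead. It assumes an ARMv7-A SCTLR value has already been read on the target (in a privileged mode, with `MRC p15, 0, <Rt>, c1, c0, 0`). The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCTLR_M (1u << 0)    /* MMU enable */
#define SCTLR_C (1u << 2)    /* data and unified cache enable */
#define SCTLR_I (1u << 12)   /* instruction cache enable */

struct cache_state {
    bool mmu;
    bool dcache;
    bool icache;
};

/* Decode the global enable bits from a raw SCTLR value. */
static struct cache_state decode_sctlr(uint32_t sctlr)
{
    struct cache_state s;

    s.mmu    = (sctlr & SCTLR_M) != 0;
    s.dcache = (sctlr & SCTLR_C) != 0;
    s.icache = (sctlr & SCTLR_I) != 0;
    return s;
}

/* With the MMU off, data accesses are not cached even if the C bit is set,
 * so data caching needs both bits. */
static bool data_caching_possible(struct cache_state s)
{
    return s.mmu && s.dcache;
}
```

These bits are global switches only. Whether accesses to OCMC are actually cached also depends on the memory attributes of the translation table entry covering that region, which is why both I-cache and D-cache can be reported as enabled while OCMC remains uncached.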

    Regards,
    Rishabh

  • Hi Rishabh,

    What modifications need to be made to the right-side code to enable cache for OCMC RAM?

    Regards,
    Shivanand
  • Shivanand,

    I had already shared the modified code previously (on 22nd Dec). The code on the left-hand side contains changes made on top of the original code I shared.
    Prasad has scheduled a remote debug session to resolve this issue quickly. Can you check your mail?

    Regards,
    Rishabh
  • Hi Shivanand,

    I will close this e2e thread as it is being supported by Prasad over WebEx.
    If you need any help, feel free to post a reply here, or start a new thread if this thread has been locked due to inactivity.

    Regards,
    Rishabh