AM5728: MPU Write Latency

Part Number: AM5728


Hello,


For research purposes, I am measuring how long stores and loads take when they target the DDR3 memory and the OCMC RAM. To do so, I execute them in streams and record the time using either the performance counters of the ARM Cortex-A15 cores or the time stamp register of the C66x DSPs. Everything runs bare-metal with the data caches disabled. I am using the default Sitara AM5728 GEL configuration files.
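
The measurement described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the exact code used for the numbers in this thread: `cycle_reader_fn`, `avg_cycles_per_store`, `avg_cycles_per_load` and `fake_counter` are hypothetical names, and the counter hook stands in for whatever wrapper reads the Cortex-A15 performance counter or the C66x time stamp register on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook returning the current cycle count. On the target it would
 * wrap the Cortex-A15 PMU cycle counter or the C66x time stamp register;
 * those wrappers are not shown here. */
typedef uint64_t (*cycle_reader_fn)(void);

/* Issue a stream of n word stores and return the average cycles per store.
 * `volatile` keeps the compiler from removing or merging the accesses.
 * Loop overhead is included in the result. */
static double avg_cycles_per_store(volatile uint32_t *target, size_t n,
                                   cycle_reader_fn read_cycles)
{
    assert(target != NULL && n > 0 && read_cycles != NULL);
    uint64_t start = read_cycles();
    for (size_t i = 0; i < n; i++) {
        target[i] = (uint32_t)i;
    }
    uint64_t end = read_cycles();
    return (double)(end - start) / (double)n;
}

/* Same idea for a stream of n word loads. */
static double avg_cycles_per_load(const volatile uint32_t *target, size_t n,
                                  cycle_reader_fn read_cycles)
{
    assert(target != NULL && n > 0 && read_cycles != NULL);
    uint32_t sink = 0;
    uint64_t start = read_cycles();
    for (size_t i = 0; i < n; i++) {
        sink += target[i];
    }
    uint64_t end = read_cycles();
    (void)sink;
    return (double)(end - start) / (double)n;
}

/* Host-side stand-in counter, for illustration only:
 * it advances by 1000 "cycles" on every call. */
static uint64_t fake_counter(void)
{
    static uint64_t now = 0;
    now += 1000;
    return now;
}
```

On the target, `target` would point into the DDR3 or OCMC RAM region under test, and the stand-in counter would be replaced by the real one. A long stream amortizes the cost of reading the counter, but the loop overhead remains part of the per-access figure.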

When I measure stores and loads from the C66x to the DDR3 memory, I obtain around 8 and 178 cycles respectively. This is consistent with results that I have obtained on other Texas Instruments MPSoCs such as Keystone II. However, I am confused by the results for the Cortex-A15. In this case, I get around 58 and 27 cycles for stores and loads respectively. The store value is surprising for two reasons: (1) it is very different from the one obtained on the Keystone II and (2) I would always have expected stores to take less time than loads. I observe the same behavior when using the DDR3 memory controller performance counters.

In addition, when measuring the store time from the Cortex-A15 to the OCMC RAM, I see that it is much higher than to the DDR3 memory. This does not happen with the DSPs on this SoC, nor with either core type on the Keystone II MSMC SRAM.

Therefore, my questions are the following:

  1. Is this a problem caused by a wrong register configuration, or is this the expected behavior?
  2. It seems wrong to me that writing to the on-chip RAM takes more time than writing to the SDRAM. What should I expect here?


Thank you in advance for your help!


Regards,

  • Hi,

    It seems wrong to me that writing to the on-chip RAM takes more time than writing to the SDRAM. What should I expect here?

    I will check to see if we have collected similar data.

    A few comments though:

    • I would generally agree with your expectation that A15 stores and loads should take less time to on-chip memory than to the DDR3 memory.
    • The A15 on AM5728 does have a direct path to the DDRSS and does not need to go through the L3 interconnect (see Figure 14-1 in the TRM, https://www.ti.com/lit/ug/spruhz6l/spruhz6l.pdf). The AM5728 architecture may differ from that of Keystone II devices, which could be contributing to the discrepancy between the results.
    • You might consider reviewing this performance application note to see if it has any relevant information (DRA75x is similar to AM5728): https://www.ti.com/lit/pdf/sprac46

    Regards,
    Kevin

  • Hi Kevin,

    I have been taking a look at the DRA75x performance application note. This solves my OCMC RAM problem. I would never have thought that writes to the OCMC RAM could take longer than writes to the DDR. I have learned something new.

    According to the application note, writes to the SDRAM take less time than reads. Therefore, my DDR write latency problem remains. As you point out, AM5728 has a direct path to the EMIFs, which should benefit the transmission of requests. That is why I would expect the MPU write latency on AM5728 to be no worse than on Keystone II (and now also DRA74x_75x). I have increased the MPU frequency from 1 GHz to 1.5 GHz to see whether the problem was due to the EMIF Dependency on MPU Clock Rate (Section 15.3.4.2.2 in SPRUHZ6L). There is an improvement, but the writes still take more time than the reads. The data suggests to me that the MPU stores are being delayed by something that does not affect the loads. As I see it, it may be the MPU memory adapter or the MPU interface with the DDR memory.
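
When comparing measurements taken at different MPU frequencies, it can help to convert cycle counts to time first, since one cycle is shorter at 1.5 GHz than at 1 GHz. A minimal sketch, assuming the cycle counter ticks once per MPU clock cycle (no divider enabled); `cycles_to_ns` is a hypothetical helper name:

```c
#include <assert.h>

/* Convert a cycle count to nanoseconds for a given clock frequency in MHz.
 * Assumption: the counter advances once per clock cycle at freq_mhz. */
static double cycles_to_ns(double cycles, double freq_mhz)
{
    assert(freq_mhz > 0.0);
    return cycles * 1000.0 / freq_mhz;
}
```

Under that assumption, the 58-cycle store and 27-cycle load figures quoted earlier for a 1 GHz MPU clock correspond to 58 ns and 27 ns. A cycle count measured at 1.5 GHz would be converted with `freq_mhz = 1500.0` before comparing it with those values.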

    Thank you for your help.

    Best regards,

    Alfonso