AM6441: Execution Speed for R5f in SRAM

Part Number: AM6441
Other Parts Discussed in Thread: SYSCONFIG

Hi TI Experts,

The customer is trying to run a small application on an R5F core out of the 2 MB on-chip SRAM.

We think the execution speed would then correspond to the SRAM clock rather than the core clock.

Could you help check whether the code still executes at the R5F's 800 MHz, or whether it drops to 400 MHz or some other value?

Thanks a lot!

Kevin

  • Edited April 15 2024: incorrect information is crossed out. Please see later responses for more details

    Hello Kevin,

    I'll provide an initial response, so that you can get additional information from the customer if needed. If the customer needs additional input, I can reassign to another team member.

    What is actually going into the SRAM? Case 1: Instruction memory 

    You can put both instructions and data into SRAM (as well as the local memories in the R5F subsystem). The R5F has both an instruction cache and a data cache, so the rest of "case 1" is incorrect. Refer to later responses.

    If your instructions are stored in SRAM, the speed of execution drops to however many clock cycles it takes the R5F to fetch the next instruction from SRAM. Accesses to the instruction RAM (IRAM) of the R5F core are single cycle, so instructions there execute at the core clock speed. However, if the R5F is fetching instructions stored in SRAM, it can only execute one instruction per SRAM access time.

    https://www.ti.com/lit/spracv1 section "Memory Read Latencies" shows that we measured it takes 64 nanoseconds (best case) to read a 32-bit word from SRAM to an unspecified R5F. Let's say you are running the R5F core at 800 MHz. Then the absolute FASTEST you could run while reading instructions from SRAM is about 1/64 ns ≈ 15.6 MHz, or more than 50 times slower than if you were running instructions from the R5F subsystem's internal memory.

    What is actually going into the SRAM? Case 2: Data 

    The answer here is NOT as straightforward, for a couple of reasons.

    1) Cache exists. So even if an R5F core takes at least 64 nanoseconds to read the data from SRAM the FIRST time, the data can be kept in the R5F's internal data cache, so that any later use of that memory is effectively immediate.

    2) Even if the data is NOT held in cache, and you need to read from SRAM each time the data is used, you are not accessing SRAM on every single clock cycle. One read that takes at least 64 nanoseconds every 10 assembly instructions has a very different impact on processor execution speed than one read every 10,000 assembly instructions (see the rough estimate below).
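
    To make point 2 a bit more concrete, here is a rough, hypothetical estimate (my own numbers, assuming the ~64 ns SRAM read latency quoted above and an 800 MHz core, i.e. 1.25 ns per cycle). It is only meant to show how strongly the effective speed depends on how often the SRAM read actually happens:

        #include <stdio.h>

        /* Illustrative back-of-the-envelope estimate only. The 1.25 ns cycle time
         * (800 MHz core) and the ~64 ns uncached SRAM read are assumptions taken
         * from the numbers quoted earlier in this thread. */
        int main(void)
        {
            const double cycle_ns = 1.25;  /* ideal cost of one instruction       */
            const double sram_ns  = 64.0;  /* cost of one uncached SRAM data read */
            const int    period[] = { 10, 1000, 10000 };

            for (unsigned i = 0; i < sizeof period / sizeof period[0]; i++) {
                /* average cost per instruction if one SRAM read occurs
                 * every period[i] instructions */
                double avg_ns = cycle_ns + sram_ns / period[i];
                printf("1 SRAM read per %5d instructions -> ~%.2f ns/instr (~%.0f MHz effective)\n",
                       period[i], avg_ns, 1000.0 / avg_ns);
            }
            return 0;
        }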

    Regards,

    Nick

  • Hello Kevin,

    Dominic was very kind and pointed out to me that parts of my response above apply to the M4F, NOT to the R5F cores! I apologize for the confusion. I am going to modify my response accordingly and reassign your thread to another team member.

    I will include some additional details that Dominic provided. I'll see if I can get my team member to validate his information:

    The instruction cache actually greatly improves latency when fetching instructions from SRAM, because you "pay" the SRAM read latency only once per cache line, even if the instructions were not yet in the cache. So it is roughly 64 ns of latency plus about 1.25 ns per additional instruction in the same cache line, which gives a much better average cost per instruction.
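
    As a hypothetical worked example (my own numbers, not from Dominic): a Cortex-R5 cache line is 32 bytes, i.e. eight 32-bit instructions, so the average fetch cost with the instruction cache enabled works out to roughly:

        #include <stdio.h>

        /* Back-of-the-envelope sketch, not measured data: assumes a 32-byte cache
         * line (eight 32-bit instructions), ~64 ns of SRAM latency to fill the line,
         * and one 800 MHz cycle (1.25 ns) per further instruction from that line. */
        int main(void)
        {
            const double sram_latency_ns = 64.0;
            const double cycle_ns        = 1.25;
            const int    instrs_per_line = 8;

            double avg_ns = (sram_latency_ns + (instrs_per_line - 1) * cycle_ns)
                            / instrs_per_line;

            /* prints roughly 9.1 ns average, vs 64 ns if every fetch went to SRAM */
            printf("~%.1f ns average per instruction with I-cache, vs ~%.0f ns without\n",
                   avg_ns, sram_latency_ns);
            return 0;
        }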

    On the R5F you don't really have "IRAM" (I believe that's a Cortex-M term), but rather two banks of TCM.

    If the customer is concerned about high-speed SRAM accesses, there's the option of using an R5FSS in single-core mode to get twice the TCM (i.e., instead of dividing the TCM in an R5F subsystem evenly across two R5F cores, only enable one core so that it can use all of the TCM in the subsystem).

    Regards,

    Nick

  • Hi Nick,

    Thanks for your response. I have the following questions.

    1. How can I determine whether the R5F has the cache enabled to improve latency? Is it enabled by default?

    2. Can other code (code not executed from TCM) still use the cache if I use TCM?

    3. Does "local TCM" mean that R5F0_0 and R5F0_1 share one 128 KB TCM, and R5F1_0 and R5F1_1 share the other 128 KB TCM? Is my understanding correct?

  • Hello Jerry,

    I'm not TI, so you might want to wait for Nick's confirmation, but here's what I know:

    1. How can I determine whether the R5F has the cache enabled to improve latency? Is it enabled by default?

    You can check e.g. in CCS via the "ARM advanced features" tab in the "Control Panel" view.

    The hardware default is having the caches disabled, but the usual TI startup code enables them. You need entries in the MPU (memory protection unit) to specify which areas should be cached. This can be configured via SysConfig. The hello_world_am64x-evm_r5fss0-0_freertos_ti-arm-clang example configures the whole SRAM (2 MB @ 0x70000000) and the full DDR memory (2 GB @ 0x80000000) as cacheable normal memory.

    2. Can other code (code not executed from TCM) still use the cache if I use TCM?

    Yes. That's how you achieve decent performance with the R5F cores in the AM6x. Put your most critical code in TCM, your less critical code in SRAM, and the rest in DDR memory (see the placement sketch after question 3 below). TCM and caches are separate things: code in TCM is never cached, but is always accessible at single-cycle latency.

    3. Does "local TCM" mean that R5F0_0 and R5F0_1 share one 128 KB TCM, and R5F1_0 and R5F1_1 share the other 128 KB TCM? Is my understanding correct?

    Yes. Each subsystem has 128 KB of TCM, or 64 KB per core. If you put a subsystem in single-core mode, you get 128 KB of TCM for that single core. You really only have the option of "64 KB each" or "128 KB and just one core".
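
    To illustrate the split mentioned under question 2, here is a small placement sketch. It is illustrative only: the section names ".text.tcm" and ".bss.sram" are placeholders and need to match whatever your linker command file / SysConfig-generated memory map actually defines.

        #include <stdint.h>

        /* Hot, time-critical routine placed in TCM: single-cycle fetch, never cached.
         * ".text.tcm" is a placeholder section name - map it to the TCM region in
         * your linker command file. */
        __attribute__((section(".text.tcm")))
        void fast_control_loop(void)
        {
            /* time-critical work goes here */
        }

        /* Larger, less critical buffer placed in the 2 MB on-chip SRAM. Accesses go
         * through the L1 data cache if the MPU marks that region as cacheable
         * (e.g. via the SysConfig MPU entries mentioned above). */
        __attribute__((section(".bss.sram")))
        uint8_t log_buffer[64u * 1024u];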

    Regards,

    Dominic

  • Hello Dominic,

    Thank you very much for your help. The answers to all three questions are clear to me now.

  • Hello Jerry,

    Dominic's statements above are correct.

    Thanks, Dominic and Nick, for responding quickly.

    Regards,

    S. Anil.