AMIC110: Disabling Cache, but the Execution time is varied.

Hideaki Matsumoto

Genius 9640 points

Part Number: AMIC110

Hi,

I’ve received a customer question about the AMIC110 cache. Could you help answer it?

< Questions >

Is it possible for the execution time to vary depending on the placement address of .text when caches are disabled?

If so, could you explain the cause?

< Background >

On the AMIC110 (Cortex-A8), we implemented a simple “for” loop that iterates approximately 33.5 million times (2^25), as follows:

for (int i = 0; i < 33554432; i++) {
    // simple operation
}

Even though the “for” loop itself has not been modified, changing unrelated parts of the code causes a difference in execution time of the “for” loop.

When the function is moved out of .text into a custom section at a fixed address, the execution time no longer varies even if unrelated code is modified.
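For readers who want to try the same placement, here is a minimal sketch assuming a GCC-compatible toolchain (the thread does not say which compiler is used). The section name `.text.timed_loop` and the function names are hypothetical, and the linker script would still have to map that section to a fixed internal SRAM address. Note that aligning the function entry to 32 bytes does not by itself guarantee that the loop body stays inside one 32-byte block; that depends on the generated code.

```c
#include <stdint.h>

/* Hypothetical section name; a linker script must place it at a fixed address. */
#if defined(__ELF__)
#define TIMED_SECTION __attribute__((section(".text.timed_loop")))
#else
#define TIMED_SECTION /* section syntax differs on non-ELF targets */
#endif

/* Keep the function out of line and start it on a 32-byte boundary. */
TIMED_SECTION __attribute__((noinline, aligned(32)))
uint32_t timed_loop(uint32_t iterations)
{
    volatile uint32_t acc = 0;  /* volatile so the loop is not optimized away */
    for (uint32_t i = 0; i < iterations; i++) {
        acc += i;               /* stand-in for the "simple operation" */
    }
    return acc;
}

/* Returns nonzero if the function entry is 32-byte aligned.
   Bit 0 is masked because Thumb function pointers use it as a mode flag. */
int timed_loop_is_aligned(void)
{
    uintptr_t addr = (uintptr_t)&timed_loop & ~(uintptr_t)1;
    return (addr % 32u) == 0;
}
```

Checking the map file or disassembly afterwards is still the only way to confirm where the loop instructions themselves ended up.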

Upon investigation, we found that if the “for” loop instructions fit within a single 32-byte-aligned block, execution completes faster; if they cross a 32-byte boundary, it becomes slower.

Enabling the I-cache eliminates this timing difference, so we believe it is related to some hardware-specific behavior when fetching instructions from internal SRAM.

< Conditions >

No operating system is used.
Interrupts are disabled.
I-cache and D-cache are disabled.
The function’s .text section is placed in internal SRAM.

Thanks and regards,

Hideaki

3 months ago

0 Paula Carrillo 3 months ago

TI__Mastermind 40580 points

Hi Hideaki, which RTOS SDK version are they using? Is this a new development?

thank you,

Paula

0 Hideaki Matsumoto 3 months ago in reply to Paula Carrillo

TI__Genius 9640 points

Hi Paula,

Thank you for your reply. They are not using an OS; the application is bare-metal. This is not a new development; they are modifying the current model.

Thanks and regards,

Hideaki

0 Hideaki Matsumoto 2 months ago in reply to Hideaki Matsumoto

TI__Genius 9640 points

Hi,

Could I get an update on this?

Regards,

Hideaki

0 Nick Saulnier 2 months ago in reply to Hideaki Matsumoto

TI__Guru 106910 points

Hello Hideaki-san,

We can no longer provide design support for AMIC110 software, as per the title of the RTOS SDK:
https://www.ti.com/tool/download/PROCESSOR-SDK-RTOS-AM335X

Please refer to this announcement:
Notice regarding Processor SDK TI-RTOS for AM335x, AM437x, OMAP-L13x, C674x, K2G, AMIC110, AMIC120 devices

Regards,

Nick

+1 PratheeshGangadhar 2 months ago in reply to Nick Saulnier

TI__Guru 50201 points

Hideaki Matsumoto said:
Upon investigation, we found that if the “for” loop instructions fit within a single 32-byte-aligned block, execution completes faster; if they cross a 32-byte boundary, it becomes slower.

The typical burst size to SRAM over the interconnect is 32 bytes, so I can imagine extra overhead when the loop is not aligned: code that crosses a 32-byte boundary translates into multiple read transactions to SRAM.
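To make the boundary effect concrete, the following sketch counts how many 32-byte-aligned blocks a code range touches. The function name `blocks_spanned` and the addresses in the usage note are hypothetical, and the code assumes that `start + size` does not wrap around the 32-bit address space.

```c
#include <stdint.h>

#define BURST_BYTES 32u  /* burst size quoted in the reply above */

/* Number of 32-byte-aligned blocks touched by the byte range
   [start, start + size). Returns 0 for an empty range.
   Assumes start + size does not overflow 32 bits. */
uint32_t blocks_spanned(uint32_t start, uint32_t size)
{
    if (size == 0u) {
        return 0u;
    }
    uint32_t first = start / BURST_BYTES;
    uint32_t last  = (start + size - 1u) / BURST_BYTES;
    return last - first + 1u;
}
```

For example, a 16-byte loop body (four 4-byte ARM instructions) starting at offset 0x00 within a block touches one block, while the same loop starting at offset 0x18 runs past the next 32-byte boundary and touches two.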

Hideaki Matsumoto said:
Enabling the I-cache eliminates this timing difference, so we believe it is related to some hardware-specific behavior when fetching instructions from internal SRAM.

The cache fetches whole lines (32 bytes), and as long as there is locality of reference (as with the for loop in this case), the instructions stay in the cache and do not translate into SRAM transactions.
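A deliberately simplified model of this explanation is sketched below. It assumes one SRAM read per 32-byte line per loop iteration when the I-cache is off, and one read per line in total when it is on. That is an illustration of the reasoning in this thread, not a description of the actual Cortex-A8 fetch unit, and it ignores prefetching, pipeline effects, and the real fetch width. All names are hypothetical.

```c
#include <stdint.h>

#define LINE_BYTES 32u  /* cache line / burst size mentioned in the thread */

/* 32-byte lines touched by [start, start + size); size must be > 0. */
static uint32_t lines_spanned(uint32_t start, uint32_t size)
{
    return (start + size - 1u) / LINE_BYTES - start / LINE_BYTES + 1u;
}

/* Toy estimate of SRAM read transactions for a loop body.
   Cache off: every iteration re-reads each line from SRAM.
   Cache on:  each line is read once, then served from the cache. */
uint64_t sram_reads(uint32_t start, uint32_t size,
                    uint32_t iterations, int icache_on)
{
    if (size == 0u || iterations == 0u) {
        return 0u;
    }
    uint64_t lines = lines_spanned(start, size);
    return icache_on ? lines : lines * (uint64_t)iterations;
}
```

Under this model, a 16-byte loop run 33554432 times with the cache off costs 33554432 reads when it sits inside one line and 67108864 when it straddles a boundary, while with the cache on the count drops to 1 or 2, which is consistent with the timing difference disappearing.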
