TMS570LC4357: Execution Time is huge when compared to TMS570LS3137

Part Number: TMS570LC4357
Other Parts Discussed in Thread: TMS570LS3137, HALCOGEN

Hi,

I have two TI controllers: TMS570LC4357 and TMS570LS3137. I executed a simple addition-and-assignment statement on both controllers and toggled a GPIO before and after it. When I compared the timings, the TMS570LC4357 took far longer. The setup is listed below:

| S. No. | Property                | TMS570LC4357                          | TMS570LS3137                          |
|--------|-------------------------|---------------------------------------|---------------------------------------|
| 1      | Instruction Executed    | Uint32_V1 = Uint32_V2 + Uint32_V3     | Uint32_V1 = Uint32_V2 + Uint32_V3     |
| 2      | Code Running from       | Flash                                 | Flash                                 |
| 3      | GCLK                    | 240MHz                                | 160MHz                                |
| 4      | HCLK                    | 120MHz                                | 160MHz                                |
| 5      | CCS Version             | 6.1.3.00033                           | 6.1.3.00033                           |
| 6      | Compiler Version        | TIv15.12.1.LTS                        | TIv5.2.5                              |
| 7      | Debugger Used           | XDS220 USB Debug Probe                | XDS220 USB Debug Probe                |
| 8      | Flash Wait States       | 2                                     | 3                                     |
| 9      | Time measurement method | GPIO Toggle – checked on Oscilloscope | GPIO Toggle – checked on Oscilloscope |
| 10     | Cache Memory Enabled    | Disabled                              | Disabled (Not available I think)      |

Given these settings and the capability of the TMS570LC4357, we expected it to take far less time than the TMS570LS3137, but the measurements show exactly the opposite, by a wide margin (see row 4 below, for example):

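| S. No. | No. of times the instruction was executed in series* | Time consumed in TMS570LC4357 | Time consumed in TMS570LS3137 |
|--------|------------------------------------------------------|-------------------------------|-------------------------------|
| 1      | Once                                                 | 520ns                         | 400ns                         |
| 2      | Three Times                                          | 1380ns                        | 500ns                         |
| 3      | Six Times                                            | 2632ns                        | 650ns                         |
| 4      | Eighteen Times                                       | 7720ns                        | 1250ns                        |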

*No loop was used. The statement was simply repeated in the source code the stated number of times.
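
Each reading above also includes the fixed cost of the two GPIO toggles, so the per-statement cost is easier to see from the difference between two rows. The following is a minimal sketch of that arithmetic in C, using only the numbers from the tables above; the function names are made up for illustration.

```c
#include <stdio.h>

/* Incremental cost of one extra statement, in ns, taken from two scope
 * readings: t_n_ns for n repeated statements, t_1_ns for a single one.
 * Subtracting the readings cancels the fixed GPIO toggle overhead. */
static double ns_per_statement(double t_n_ns, double t_1_ns, int n)
{
    return (t_n_ns - t_1_ns) / (double)(n - 1);
}

/* Convert a duration in ns to CPU cycles at the given clock (in MHz). */
static double ns_to_cycles(double ns, double clk_mhz)
{
    return ns * clk_mhz / 1000.0;
}

/* Print the per-statement cost for both parts (rows 1 and 4, GCLK from row 3). */
static void print_per_statement_cost(void)
{
    double lc_ns = ns_per_statement(7720.0, 520.0, 18);  /* TMS570LC4357 */
    double ls_ns = ns_per_statement(1250.0, 400.0, 18);  /* TMS570LS3137 */

    printf("LC4357: %.1f ns, %.1f GCLK cycles per statement\n",
           lc_ns, ns_to_cycles(lc_ns, 240.0));
    printf("LS3137: %.1f ns, %.1f GCLK cycles per statement\n",
           ls_ns, ns_to_cycles(ls_ns, 160.0));
}
```

With rows 1 and 4, this works out to roughly 424 ns (about 102 GCLK cycles at 240MHz) per statement on the TMS570LC4357, against 50 ns (8 GCLK cycles at 160MHz) on the TMS570LS3137.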

Could you please tell me if I am missing anything here? We selected this controller for critical functionality because of its capability, but as of now it is not serving the purpose.

Thanks in advance.

Gobind Singh

  • Hello Gobind,

    The CPU interconnect architecture is significantly different between these two parts of the TMS570 series. The LC4357 has a cached architecture, so accesses to the level-2 flash and RAM memories take significantly more cycles than on the LS3137, which uses a tightly-coupled memory architecture for flash and RAM.

    It is not useful to compare execution cycles between these two MCUs with the cache disabled on the LC4357.

    You can run a representative benchmark for your application on the cached LC4357 by enabling the instruction and data caches, and also run this benchmark on the LS3137 from the tightly-coupled memories. This will be more representative of the actual performance that you can expect from these parts. Also, I would suggest using the CPU's performance monitoring unit for the benchmarking.
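
As a rough illustration of the PMU suggestion, the sketch below reads the Cortex-R cycle counter (PMCCNTR) around the statement under test. It assumes GCC-style inline assembly and privileged mode, which may not match the TI compiler setup used in this thread; the helper names are hypothetical, and the `#else` branch is only a stand-in so the code also builds on a host PC.

```c
#include <stdint.h>

#if defined(__GNUC__) && defined(__arm__)
/* Target build: assumes GCC-style inline assembly and privileged mode. */
static inline void pmu_start_cycle_counter(void)
{
    uint32_t pmcr;
    __asm__ volatile("mrc p15, 0, %0, c9, c12, 0" : "=r"(pmcr)); /* read PMCR */
    pmcr |= (1u << 0) | (1u << 2); /* E: enable counters, C: reset cycle counter */
    pmcr &= ~(1u << 3);            /* D = 0: count every cycle, not every 64th */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 0" : : "r"(pmcr));
    /* PMCNTENSET bit 31 enables the cycle counter itself. */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 1" : : "r"(1u << 31));
    __asm__ volatile("isb");
}

static inline uint32_t pmu_read_cycles(void)
{
    uint32_t cycles;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(cycles)); /* PMCCNTR */
    return cycles;
}
#else
/* Host stand-in (not real hardware): each read advances a fake counter. */
static uint32_t fake_cycles;
static inline void pmu_start_cycle_counter(void) { fake_cycles = 0u; }
static inline uint32_t pmu_read_cycles(void) { fake_cycles += 10u; return fake_cycles; }
#endif

static volatile uint32_t v1, v2 = 1u, v3 = 2u;

/* Cycles spent on one addition, including the overhead of the two reads. */
static uint32_t measure_addition_cycles(void)
{
    pmu_start_cycle_counter();
    uint32_t start = pmu_read_cycles();
    v1 = v2 + v3;                       /* statement under test */
    uint32_t end = pmu_read_cycles();
    return end - start;                 /* unsigned wrap-around is harmless */
}

/* Convert a cycle count to ns for a CPU clock given in MHz. */
static uint32_t cycles_to_ns(uint32_t cycles, uint32_t clk_mhz)
{
    return (uint32_t)(((uint64_t)cycles * 1000u) / clk_mhz);
}
```

Measuring an empty region first and subtracting that count removes the overhead of the counter reads, much like subtracting the single-statement row in the GPIO measurements.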

    Best regards,
    Sunil
  • Thank you for the reply. Could you also please help me with the steps to enable cache memory and how to check (which register) whether cache has been enabled? I found some code to enable cache memory but could not find which register to check if it is enabled.

    Also, how to configure it in Write-Back, No Write-Allocate modes or other modes?

    Thank you.
  • Hi Gobind,

    HALCoGen generates code for initializing the part, which includes enabling/disabling the cache. Check the "R5-MPU-PMU" tab in HALCoGen. On this same tab, you can configure the MPU to set up the memory regions and choose whether each region is cacheable, as well as other attributes.
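
On the question of which register shows whether the cache is enabled: on ARMv7-R cores such as the Cortex-R5, the enable bits are in the CP15 System Control Register (SCTLR). Below is a minimal sketch of decoding that register; the helper names are hypothetical, and the target-only read assumes GCC-style inline assembly and privileged mode.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCTLR_M (1u << 0)   /* MPU enable */
#define SCTLR_C (1u << 2)   /* data cache enable */
#define SCTLR_I (1u << 12)  /* instruction cache enable */

/* Pure helpers: decode a SCTLR value obtained from the target or the debugger. */
static bool mpu_enabled(uint32_t sctlr)    { return (sctlr & SCTLR_M) != 0u; }
static bool dcache_enabled(uint32_t sctlr) { return (sctlr & SCTLR_C) != 0u; }
static bool icache_enabled(uint32_t sctlr) { return (sctlr & SCTLR_I) != 0u; }

#if defined(__GNUC__) && defined(__arm__)
/* Target-only: read SCTLR (CP15 c1, c0, 0). Requires privileged mode. */
static inline uint32_t read_sctlr(void)
{
    uint32_t sctlr;
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(sctlr));
    return sctlr;
}
#endif
```

On the target, `dcache_enabled(read_sctlr())` and `icache_enabled(read_sctlr())` would then report the current state after the HALCoGen startup code has run.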

    Regards,
    Sunil
  • OK, so now I have configured the cache and there is a big reduction in the execution time.
    I have a few questions:

    1. I did not enable the MPU, which means the memory attributes are the default ones and the whole of flash and RAM is in Write-Back, Write-Allocate (WBWA) mode. (Is this assumption correct?) The execution time for a given set of instructions was 734ns. When I enabled the MPU, configured (I think) for Write-Through (WT) mode, the execution time for the same set of instructions was 610ns. Is this expected behavior? If yes, why is WT mode faster than WBWA mode?

    2. In the CCS Expressions window, while the code is running with the cache enabled, all the values show as zero. As soon as execution is paused, the actual values appear. Is there a way to see these values live in the Expressions window while the cache is enabled? With the cache disabled, the values are fine.
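
For reference on question 1, the cache policy of an MPU region is selected by the TEX, C and B fields of that region's access control register. The sketch below lists the encodings, assuming the standard ARMv7-R (PMSA) layout with TEX in bits [5:3], C in bit 1 and B in bit 0; the enum and function names are made up for illustration and are not HALCoGen identifiers.

```c
#include <stdint.h>

/* Memory types selectable per MPU region (illustrative names). */
typedef enum {
    MEM_STRONGLY_ORDERED,  /* TEX=000, C=0, B=0 */
    MEM_DEVICE_SHAREABLE,  /* TEX=000, C=0, B=1 */
    MEM_WT_NO_WA,          /* TEX=000, C=1, B=0: write-through, no write-allocate */
    MEM_WB_NO_WA,          /* TEX=000, C=1, B=1: write-back, no write-allocate */
    MEM_NON_CACHEABLE,     /* TEX=001, C=0, B=0: normal memory, not cached */
    MEM_WB_WA              /* TEX=001, C=1, B=1: write-back, write-allocate */
} mem_type_t;

/* Pack TEX[5:3], C[1] and B[0] for the region access control register. */
static uint32_t pack_attr(uint32_t tex, uint32_t c, uint32_t b)
{
    return (tex << 3) | (c << 1) | b;
}

/* Attribute bits for a memory type; AP, XN and S bits must be OR-ed in separately. */
static uint32_t region_attr_bits(mem_type_t type)
{
    switch (type) {
    case MEM_STRONGLY_ORDERED: return pack_attr(0u, 0u, 0u);
    case MEM_DEVICE_SHAREABLE: return pack_attr(0u, 0u, 1u);
    case MEM_WT_NO_WA:         return pack_attr(0u, 1u, 0u);
    case MEM_WB_NO_WA:         return pack_attr(0u, 1u, 1u);
    case MEM_NON_CACHEABLE:    return pack_attr(1u, 0u, 0u);
    case MEM_WB_WA:            return pack_attr(1u, 1u, 1u);
    }
    return 0u;  /* unreachable for valid enum values */
}
```

Reading a region's access control register back and comparing its low bits with these values is one way to confirm which policy the MPU configuration actually selected.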
  • Another question: is it not possible to execute code from RAM when the cache is enabled? I tried, but I am facing problems. Sometimes exceptions are raised, and sometimes control stays in the function running from RAM and never returns.
  • Could someone please answer these questions above?

    Thank you.
  • Gobind,

    These questions are no longer related to the original post and title. Please start a new thread for questions / topics unrelated to the original post.

    See my comments below:

    1. I did not enable the MPU, which means the memory attributes are the default ones and the whole of flash and RAM is in Write-Back, Write-Allocate (WBWA) mode. (Is this assumption correct?) The execution time for a given set of instructions was 734ns. When I enabled the MPU, configured (I think) for Write-Through (WT) mode, the execution time for the same set of instructions was 610ns. Is this expected behavior? If yes, why is WT mode faster than WBWA mode?

    >> Please share your code configuring the MPU, as well as the instructions that you are executing. How do you measure the execution time?

    2. In the CCS Expressions window, while the code is running with the cache enabled, all the values show as zero. As soon as execution is paused, the actual values appear. Is there a way to see these values live in the Expressions window while the cache is enabled? With the cache disabled, the values are fine.

    >> The debugger memory window uses the processor to refresh its contents. This requires the processor to stop executing code, switch to debug state (refer to the ARM Architecture Reference Manual for debug state details), and read the memory locations being displayed. On TMS570 MCUs there is a way to display memory locations without using or halting the processor: map the debugger memory window so that it is refreshed by the DAP (Debug Access Port) instead of the processor.

    3. Another question: is it not possible to execute code from RAM when the cache is enabled? I tried, but I am facing problems. Sometimes exceptions are raised, and sometimes control stays in the function running from RAM and never returns.

    >> How do you set up your code to execute from RAM? What exceptions do you get?

    Regards,
    Sunil
  • I have posted these questions in another thread:

    e2e.ti.com/.../739162