TMS570LC4357: Execution Time is huge when compared to TMS570LS3137

Gobind Singh

Part Number: TMS570LC4357
Other Parts Discussed in Thread: TMS570LS3137, HALCOGEN

Hi,

I have two TI controllers: TMS570LC4357 and TMS570LS3137. I executed a simple addition-and-assignment statement on both controllers and toggled a GPIO before and after it. When I compared the times, the TMS570LC4357 took far longer. The setup is below:

S. No.	Property	TMS570LC4357	TMS570LS3137
1	Instruction Executed	Uint32_V1 = Uint32_V2 + Uint32_V3	Uint32_V1 = Uint32_V2 + Uint32_V3
2	Code Running from	Flash	Flash
3	GCLK	240MHz	160MHz
4	HCLK	120MHz	160MHz
5	CCS Version	6.1.3.00033	6.1.3.00033
6	Compiler Version	TIv15.12.1.LTS	TIv5.2.5
7	Debugger Used	XDS220 USB Debug Probe	XDS220 USB Debug Probe
8	Flash Wait States	2	3
9	Time measurement method	GPIO Toggle – checked on Oscilloscope	GPIO Toggle – checked on Oscilloscope
10	Cache Memory Enabled	Disabled	Disabled (Not available I think)

Given these settings and the capability of the TMS570LC4357, we expected it to take far less time than the TMS570LS3137, but the data showed exactly the opposite, with a huge difference (see row 4 below, for example):

S. No.	No. of times the instruction was executed in series*	Time consumed on TMS570LC4357	Time consumed on TMS570LS3137
1	Once	520ns	400ns
2	Three Times	1380ns	500ns
3	Six Times	2632ns	650ns
4	Eighteen Times	7720ns	1250ns

*No loop was used. The statement was simply duplicated in the source for each additional execution.
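One way to read the table is to look at the incremental cost of each extra statement rather than the totals, since every row also includes the GPIO toggle overhead. The following is a minimal sketch in C that uses only the numbers from the table; the function name `incremental_ns` and the array names are made up for illustration.

```c
/* Measured totals from the table above, in nanoseconds. */
const int n_statements[] = {1, 3, 6, 18};
const int lc4357_ns[]    = {520, 1380, 2632, 7720};
const int ls3137_ns[]    = {400, 500, 650, 1250};

/*
 * Average added time per extra statement, taken between the first and
 * last rows so that the fixed GPIO-toggle overhead cancels out.
 */
double incremental_ns(const int *total_ns, const int *count, int rows)
{
    return (double)(total_ns[rows - 1] - total_ns[0]) /
           (double)(count[rows - 1] - count[0]);
}

/* Convert a time in nanoseconds to CPU cycles at a given core clock in MHz. */
double ns_to_cycles(double ns, double gclk_mhz)
{
    return ns * gclk_mhz / 1000.0;
}
```

With the figures above, `incremental_ns` gives about 423.5 ns per statement on the LC4357 (about 102 cycles at 240 MHz) against 50 ns on the LS3137 (8 cycles at 160 MHz).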

Could you please help me find out whether I am missing anything here? We selected this controller for critical functionality because of its capability, but it is not serving the purpose as of now.

Thanks in advance.

Gobind Singh

over 7 years ago

0 Sunil Oak over 7 years ago

TI__Mastermind 49120 points

Hello Gobind,

The CPU interconnect architecture is significantly different between these two parts of the TMS570 series. The LC4357 has a cached architecture, so accesses to the level-2 flash and RAM memories take significantly more cycles than on the LS3137, which uses a tightly-coupled memory architecture for flash and RAM.

It is not useful to compare execution cycles between these two MCUs with the cache disabled on the LC4357.

You can run a representative benchmark for your application on the cached LC4357 by enabling the instruction and data caches, and also run this benchmark on the LS3137 from the tightly-coupled memories. This will be more representative of the actual performance that you can expect from these parts. Also, I would suggest using the CPU's performance monitoring unit for the benchmarking.
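As a rough sketch of what PMU-based benchmarking could look like: the target-only part below assumes the HALCoGen-generated `sys_pmu.h` driver and uses its function names as I recall them (`_pmuInit_`, `_pmuStartCounters_`, `_pmuGetCycleCount_`, and so on), so please verify them against your generated files. `USE_HALCOGEN_PMU`, `measure_add_cycles`, and `cycles_to_ns` are hypothetical names introduced only for this example.

```c
#include <stdint.h>

/* Convert a CPU cycle count to nanoseconds for a given core clock (GCLK) in Hz. */
uint32_t cycles_to_ns(uint32_t cycles, uint32_t gclk_hz)
{
    return (uint32_t)(((uint64_t)cycles * 1000000000ULL) / gclk_hz);
}

#ifdef USE_HALCOGEN_PMU      /* hypothetical switch: define it only in the target build */
#include "sys_pmu.h"         /* assumed: HALCoGen-generated PMU driver */

/* Same operands as in the original test; volatile so the add is not optimized away. */
volatile uint32 Uint32_V1, Uint32_V2, Uint32_V3;

/* Returns the cycle-counter difference around one add-and-assign statement. */
uint32 measure_add_cycles(void)
{
    uint32 start, stop;

    _pmuInit_();
    _pmuEnableCountersGlobal_();
    _pmuResetCycleCounter_();
    _pmuStartCounters_(pmuCYCLE_COUNTER);

    start = _pmuGetCycleCount_();
    Uint32_V1 = Uint32_V2 + Uint32_V3;
    stop = _pmuGetCycleCount_();

    _pmuStopCounters_(pmuCYCLE_COUNTER);
    return stop - start;     /* includes the overhead of the counter reads themselves */
}
#endif
```

Measuring an empty region first and subtracting that count removes most of the read overhead. Compared with a GPIO toggle, this avoids the peripheral-bus write latency that otherwise ends up in every measurement.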

Best regards,
Sunil

0 Gobind Singh over 7 years ago in reply to Sunil Oak

Intellectual 720 points

Thank you for the reply. Could you also please help me with the steps to enable cache memory, and tell me which register to check to confirm that the cache is enabled? I found some code to enable cache memory but could not find which register shows whether it is enabled.

Also, how do I configure it in Write-Back, No Write-Allocate mode or other modes?

Thank you.

0 Sunil Oak over 7 years ago in reply to Gobind Singh

TI__Mastermind 49120 points

Hi Gobind,

HALCoGen generates code for initializing the part, which includes enabling/disabling the cache. Check the "R5-MPU-PMU" tab in HALCoGen. On this same tab, you can configure the MPU to set up the memory regions and choose whether each region is cacheable, as well as other attributes.
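On the question of which register to check: on ARMv7-R cores such as the Cortex-R5, the cache and MPU enables are bits in the CP15 System Control Register (SCTLR), read with `MRC p15, 0, <Rt>, c1, c0, 0`. The sketch below only decodes a value that has already been read (for example from the debugger's register view); the helper names are made up for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* SCTLR bit positions defined by the ARMv7-R architecture. */
#define SCTLR_M  (1u << 0)    /* MPU enable               */
#define SCTLR_C  (1u << 2)    /* data cache enable        */
#define SCTLR_I  (1u << 12)   /* instruction cache enable */

/* Each helper takes a raw SCTLR value and reports one enable bit. */
bool sctlr_mpu_enabled(uint32_t sctlr)    { return (sctlr & SCTLR_M) != 0u; }
bool sctlr_dcache_enabled(uint32_t sctlr) { return (sctlr & SCTLR_C) != 0u; }
bool sctlr_icache_enabled(uint32_t sctlr) { return (sctlr & SCTLR_I) != 0u; }
```

For example, a value with bits 12 and 2 set and bit 0 clear means both caches are on while the MPU is off.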

Regards,
Sunil

0 Gobind Singh over 7 years ago in reply to Sunil Oak

Intellectual 720 points

OK. So now I have configured the cache memory, and there is a big reduction in the execution time.
I have a few questions:

1. I did not enable the MPU, which means the memory attributes are the default ones and the whole of Flash and RAM is in Write-Back, Write-Allocate (WBWA) mode. (Is this assumption correct?) The execution time for a given set of instructions was 734ns. When I enabled the MPU, I think I configured it in Write-Through (WT) mode. The execution time for the same set of instructions was 610ns. Is this expected behavior? If yes, why is WT mode faster than WBWA mode?

2. In the CCS Expressions window, when the code is running with the cache enabled, the values shown are all zeros. As soon as execution is paused, the actual values are seen. Is there a way to see these values in the Expressions window while the code is running with the cache enabled? When the cache is disabled, the values are fine.
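For reference when checking which mode an MPU region is really in: in the ARMv7-R protected memory system, the cache policy of a region comes from the TEX, C and B fields of its Region Access Control Register (TEX in bits [5:3], C in bit 1, B in bit 0). The sketch below decodes the common encodings from a raw register value; the enum and function names are made up for illustration, and it says nothing about what HALCoGen writes for a given GUI selection.

```c
#include <stdint.h>

typedef enum {
    MEM_STRONGLY_ORDERED,   /* TEX=000 C=0 B=0 */
    MEM_DEVICE_SHARED,      /* TEX=000 C=0 B=1 */
    MEM_WT_NO_WA,           /* TEX=000 C=1 B=0: Write-Through, no Write-Allocate */
    MEM_WB_NO_WA,           /* TEX=000 C=1 B=1: Write-Back, no Write-Allocate    */
    MEM_NON_CACHEABLE,      /* TEX=001 C=0 B=0: Normal, non-cacheable            */
    MEM_WB_WA,              /* TEX=001 C=1 B=1: Write-Back, Write-Allocate       */
    MEM_DEVICE_NONSHARED,   /* TEX=010 C=0 B=0 */
    MEM_OTHER               /* reserved or split inner/outer policies            */
} mem_type_t;

/* Decode the memory type from a raw Region Access Control Register value. */
mem_type_t mpu_region_mem_type(uint32_t racr)
{
    uint32_t tex = (racr >> 3) & 0x7u;
    uint32_t c   = (racr >> 1) & 0x1u;
    uint32_t b   = racr & 0x1u;

    switch ((tex << 2) | (c << 1) | b) {
    case 0x00u: return MEM_STRONGLY_ORDERED;
    case 0x01u: return MEM_DEVICE_SHARED;
    case 0x02u: return MEM_WT_NO_WA;
    case 0x03u: return MEM_WB_NO_WA;
    case 0x04u: return MEM_NON_CACHEABLE;
    case 0x07u: return MEM_WB_WA;
    case 0x08u: return MEM_DEVICE_NONSHARED;
    default:    return MEM_OTHER;
    }
}
```

Reading the region registers back and decoding them this way would confirm whether a region is actually Write-Through before comparing the 734ns and 610ns figures.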

0 Gobind Singh over 7 years ago in reply to Gobind Singh

Intellectual 720 points

Another question: is it not possible to execute code from RAM when cache memory is enabled? I tried that, but I am facing problems. Sometimes exceptions are raised, and sometimes control stays in the function that is running from RAM and never returns.

0 Gobind Singh over 7 years ago in reply to Gobind Singh

Intellectual 720 points

Could someone please answer these questions above?

Thank you.

0 Sunil Oak over 7 years ago in reply to Gobind Singh

TI__Mastermind 49120 points

Gobind,

These questions are no longer related to the original post and title. Please start a new thread for questions / topics unrelated to the original post.

See my comments below:

1. I did not enable the MPU, which means the memory attributes are the default ones and the whole of Flash and RAM is in Write-Back, Write-Allocate (WBWA) mode. (Is this assumption correct?) The execution time for a given set of instructions was 734ns. When I enabled the MPU, I think I configured it in Write-Through (WT) mode. The execution time for the same set of instructions was 610ns. Is this expected behavior? If yes, why is WT mode faster than WBWA mode?

>> Please share your code configuring the MPU, as well as the instructions that you are executing. How do you measure the execution time?

2. In the CCS Expressions window, when the code is running with the cache enabled, the values shown are all zeros. As soon as execution is paused, the actual values are seen. Is there a way to see these values in the Expressions window while the code is running with the cache enabled? When the cache is disabled, the values are fine.

>> The debugger memory window uses the processor to refresh its contents. This requires the processor to stop executing code, switch to a debug state (refer to the ARM architecture reference manual for debug state details), and perform reads from the memory locations being displayed. There is a way on TMS570 MCUs to display memory locations without using or halting the processor. This requires you to map the debugger memory window to be refreshed by the DAP (Debug Access Port) instead of the processor.

3. Another question: is it not possible to execute code from RAM when cache memory is enabled? I tried that, but I am facing problems. Sometimes exceptions are raised, and sometimes control stays in the function that is running from RAM and never returns.

>> How do you set up your code to execute from RAM? What exceptions do you get?

Regards,
Sunil

0 Gobind Singh over 7 years ago in reply to Sunil Oak

Intellectual 720 points

I have posted these questions in another thread:

e2e.ti.com/.../739162
