AM263P4: Code Execution Takes longer on AM263P4 Compared to AM2634

Pouya Sabouriazad

Part Number: AM263P4
Other Parts Discussed in Thread: AM2634, LP-AM263P, LP-AM263

Hi,

I have two evaluation boards, LP-AM263P and LP-AM263. I am running the same code on both boards, and interestingly, it executes faster on the AM2634 processor than on the AM263P4. In both cases the code runs from on-chip RAM (OCSRAM), with one slight difference in placement:

Processor	Executable Code Section Memory Location	Stack, Initialized, Uninitialized Data Section Memory Location
AM2634	OCSRAM Bank 1	OCSRAM Bank 3
AM263P4	OCSRAM Bank 1	OCSRAM Bank 4

I have already updated the GEL file for AM263P4 to initialize OCSRAM banks 4 and 5. Why is the same code running slower on AM263P4? Is the access time to OCSRAM banks 4 and 5 different from that of banks 0 to 3? Or am I missing some other configuration needed for AM263P4 to use banks 4 and 5?
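To confirm which bank a given section actually landed in (for example, using a start address taken from the linker map file), a small helper can translate an address into a bank index. This is a minimal sketch, not an SDK function: `ocsram_bank` is a hypothetical helper, and it assumes OCSRAM starts at 0x70000000 and is split into six contiguous 512 KB banks. Those constants are assumptions to verify against the AM263Px memory map in the TRM.

```python
# Assumed AM263Px OCSRAM layout; verify against the device TRM memory map.
OCSRAM_BASE = 0x70000000   # assumed start address of BANK0
BANK_SIZE = 0x80000        # assumed 512 KB per bank
NUM_BANKS = 6              # BANK0..BANK5


def ocsram_bank(addr: int) -> int:
    """Return the OCSRAM bank index containing addr (hypothetical helper)."""
    offset = addr - OCSRAM_BASE
    if not 0 <= offset < NUM_BANKS * BANK_SIZE:
        raise ValueError(f"{addr:#010x} is outside the assumed OCSRAM range")
    return offset // BANK_SIZE


# Example: code section starting at 0x70080000, data section at 0x70200000.
print(ocsram_bank(0x70080000))  # -> 1
print(ocsram_bank(0x70200000))  # -> 4
```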

Note: Unfortunately, I am not allowed to share the code, but I believe you should see the same behavior with any test code.

Thanks,

Pouya

4 months ago

Pouya Sabouriazad, 4 months ago

Hi,

I found the problem: I had not configured OCSRAM bank 4 as cacheable in the MPU, hence the slower execution.
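One way to catch this kind of misconfiguration is to check, for each section's address, which MPU region actually governs it. The sketch below is a simplified Python model, not the MCU+ SDK API: `MpuRegion` and `effective_region` are hypothetical names, only the cacheable attribute is modeled, and the addresses assume OCSRAM starts at 0x70000000 with 512 KB banks. It does follow two ARMv7-R MPU rules: a region's size is a power of two (at least 32 bytes) with its base aligned to that size, and when regions overlap, the highest-numbered region takes priority.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MpuRegion:
    """Hypothetical, simplified view of one Cortex-R5 MPU region."""
    base: int
    size: int          # bytes; power of two, base must be aligned to it
    cacheable: bool

    def __post_init__(self) -> None:
        if self.size < 32 or self.size & (self.size - 1):
            raise ValueError(f"size {self.size:#x} is not a power of two >= 32")
        if self.base % self.size:
            raise ValueError(f"base {self.base:#x} not aligned to {self.size:#x}")

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def effective_region(regions: Sequence[MpuRegion], addr: int) -> Optional[MpuRegion]:
    """Return the region governing addr, or None if no region covers it.

    Overlapping regions: the highest-numbered (last) entry wins,
    so scan from the end of the list.
    """
    for region in reversed(regions):
        if region.contains(addr):
            return region
    return None


# Before the fix: one cacheable region covers BANK0..BANK3 (2 MB) only.
before = [MpuRegion(0x70000000, 0x200000, cacheable=True)]
# After the fix: an extra cacheable region covers BANK4..BANK5 (1 MB).
after = before + [MpuRegion(0x70200000, 0x100000, cacheable=True)]

bank4 = 0x70200000  # start of BANK4 under the assumed layout
print(effective_region(before, bank4))  # -> None: BANK4 not covered in this model
print(effective_region(after, bank4))
```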

Thanks,

Pouya

Sahana H G (TI Expert), 4 months ago, in reply to Pouya Sabouriazad

Glad you found the issue. However, please note the following from the device TRM (AM263Px Sitara Microcontrollers Technical Reference Manual (Rev. D), section 3.2) regarding OCSRAM bank access latencies on AM263P:

The 6 L2OCSRAM banks are placed across the 3 interconnects (R5SS0 VBUSM, R5SS1 VBUSM, and VBUSM CORE Interconnect) such that the cores in a cluster have faster access (lower latency) to the banks closest to that cluster. In other words, R5SS0_Core0 and R5SS0_Core1 have lower access latency to their near L2OCSRAM banks (BANK0 and BANK1), placed on the R5SS0 VBUSM interconnect. Similarly, R5SS1_Core0 and R5SS1_Core1 have lower access latency to their near L2OCSRAM banks (BANK2 and BANK3), placed on the R5SS1 VBUSM interconnect. All four cores have the same, but higher, access latency to the common L2OCSRAM banks (BANK4 and BANK5) compared with their near banks. Furthermore, all four cores have even higher access latency to their far L2OCSRAM banks (BANK2 and BANK3 for cluster R5SS0; BANK0 and BANK1 for cluster R5SS1) than to the common banks.

To summarize, for the cores in a given cluster, the L2OCSRAM bank access latencies compare as follows:

Access latency of near banks < Access latency of common banks < Access latency of far banks
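The near/common/far rule above can be encoded as a small lookup, which may help when deciding where to place a core's hot code and data. This is a minimal sketch: the bank-to-cluster assignment comes from the TRM text quoted above, while `access_class`, `Access`, and the cluster strings are hypothetical names chosen for illustration.

```python
from enum import IntEnum


class Access(IntEnum):
    """Ordered so that NEAR < COMMON < FAR matches the latency ordering."""
    NEAR = 0
    COMMON = 1
    FAR = 2


# Bank placement per the TRM description quoted above.
NEAR_BANKS = {"R5SS0": {0, 1}, "R5SS1": {2, 3}}
COMMON_BANKS = {4, 5}


def access_class(cluster: str, bank: int) -> Access:
    """Classify OCSRAM bank `bank` as near, common, or far for `cluster`."""
    if cluster not in NEAR_BANKS:
        raise ValueError(f"unknown cluster {cluster!r}")
    if not 0 <= bank <= 5:
        raise ValueError(f"BANK{bank} does not exist")
    if bank in COMMON_BANKS:
        return Access.COMMON
    if bank in NEAR_BANKS[cluster]:
        return Access.NEAR
    return Access.FAR


# A core in R5SS0 with code in BANK1 and data in BANK4:
print(access_class("R5SS0", 1).name)  # NEAR
print(access_class("R5SS0", 4).name)  # COMMON
```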

Pouya Sabouriazad, 4 months ago, in reply to Sahana H G

Thank you for pointing that out. It is really helpful.

Could you share the average latency differences between near, common, and far bank accesses?

Sahana H G (TI Expert), 4 months ago, in reply to Pouya Sabouriazad

Hello,

Pouya Sabouriazad said:
Could you share the average latency differences between near, common, and far bank accesses?

Please see this data below:

Read Access Latencies:

Near banks: 32.5ns
Common banks: 47.5ns (15ns slower than near)
Far banks: 57.5ns (25ns slower than near, 10ns slower than common)

Write Access Latencies:

Near banks: 25ns
Common banks: 35ns (10ns slower than near)
Far banks: 40ns (15ns slower than near, 5ns slower than common)

Summary of Latency Overhead:

Common vs Near: Read ~46% slower, Write ~40% slower
Far vs Near: Read ~77% slower, Write ~60% slower
Far vs Common: Read ~21% slower, Write ~14% slower
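The percentages in the summary follow from (slower - faster) / faster for each pair of bank types. The short script below recomputes them from the latency figures quoted above, so they can be checked or reused for other comparisons; `overhead_pct` is just an illustrative helper name.

```python
# Average access latencies in ns, as quoted above.
READ_NS = {"near": 32.5, "common": 47.5, "far": 57.5}
WRITE_NS = {"near": 25.0, "common": 35.0, "far": 40.0}


def overhead_pct(latency_ns: dict, slower: str, faster: str) -> float:
    """Percentage by which the `slower` bank type exceeds the `faster` one."""
    return 100.0 * (latency_ns[slower] - latency_ns[faster]) / latency_ns[faster]


for slower, faster in [("common", "near"), ("far", "near"), ("far", "common")]:
    read = overhead_pct(READ_NS, slower, faster)
    write = overhead_pct(WRITE_NS, slower, faster)
    print(f"{slower} vs {faster}: read ~{read:.0f}% slower, write ~{write:.0f}% slower")
```

Running it prints the same rounded values as the summary: 46/40, 77/60, and 21/14 percent.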

Regards,

Sahana
