AM263P4: Code Execution Takes Longer on AM263P4 Compared to AM2634

Part Number: AM263P4
Other Parts Discussed in Thread: AM2634, LP-AM263P, LP-AM263

Hi,

I have two evaluation boards, LP-AM263P and LP-AM263. I am running the same code on both boards and, interestingly, it executes faster on the AM2634 than on the AM263P4. On both boards the code runs from RAM, with a slight difference in memory placement, as follows:

Processor | Executable Code Section Memory Location | Stack, Initialized, Uninitialized Data Section Memory Location
AM2634    | OCSRAM Bank 1                           | OCSRAM Bank 3
AM263P4   | OCSRAM Bank 1                           | OCSRAM Bank 4

I have already updated the GEL file for the AM263P4 to initialize OCSRAM banks 4 and 5. Why is the same code running slower on the AM263P4? Is the access time to OCSRAM banks 4 and 5 different from that of banks 0 to 3? Or am I missing another configuration that the AM263P4 needs in order to use banks 4 and 5?

Note: Unfortunately, I am not allowed to share the code, but I believe you should be able to see the same behavior when testing any code.

Thanks,

Pouya

  • Hi,

    I found the problem: I had not configured bank 4 as cacheable in the MPU, hence the slower execution.

    Thanks,

    Pouya
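
To make the fix above more concrete, here is a minimal sketch of how the cacheability attributes of a Cortex-R5 (ARMv7-R) MPU region are encoded in the Region Access Control Register. The thread does not show which attributes were actually used, so the chosen values (Normal memory, write-back write-allocate, full access) and all function and constant names are assumptions for illustration only. In a real project these attributes would normally be set through the SDK or its configuration tool rather than by hand.

```python
# Bit layout of the ARMv7-R MPU Region Access Control Register.
B_BIT = 1 << 0       # Bufferable
C_BIT = 1 << 1       # Cacheable
S_BIT = 1 << 2       # Shareable
TEX_SHIFT = 3        # TEX field occupies bits [5:3]
AP_SHIFT = 8         # AP field occupies bits [10:8]
XN_BIT = 1 << 12     # Execute Never

AP_FULL_ACCESS = 0b011  # read/write at all privilege levels


def region_access_control(tex, cacheable, bufferable,
                          shareable=False, execute_never=False,
                          ap=AP_FULL_ACCESS):
    """Pack MPU region attributes into a Region Access Control value.

    Hypothetical helper for illustration; it only computes the bit pattern.
    """
    value = ((tex & 0x7) << TEX_SHIFT) | ((ap & 0x7) << AP_SHIFT)
    if cacheable:
        value |= C_BIT
    if bufferable:
        value |= B_BIT
    if shareable:
        value |= S_BIT
    if execute_never:
        value |= XN_BIT
    return value


# Normal memory, non-cacheable (TEX=001, C=0, B=0): every access goes to the bank.
NON_CACHED = region_access_control(tex=0b001, cacheable=False, bufferable=False)

# Normal memory, write-back write-allocate (TEX=001, C=1, B=1): accesses can hit the cache.
CACHED_WB_WA = region_access_control(tex=0b001, cacheable=True, bufferable=True)

if __name__ == "__main__":
    print(f"non-cached region attributes: {NON_CACHED:#06x}")
    print(f"cached region attributes:     {CACHED_WB_WA:#06x}")
```

The two values differ only in the C and B bits, which is consistent with the symptom described: a region left non-cacheable pays the full bank access latency on every access.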

  • Glad that you found the issue. However, please note the following from the Device TRM (AM263Px Sitara Microcontrollers Technical Reference Manual (Rev. D), section 3.2) regarding OCSRAM bank access latencies on AM263P:

    The placement of 6 L2OCRAM Banks across the 3 interconnects (R5SS0 VBUSM, R5SS1 VBUSM and VBUSM CORE Interconnect) has been done such that cores in a cluster can have faster access (lesser latency) to the banks closer to that particular cluster. In other words, R5SS0_Core0 and R5SS0_Core1 cores will have faster access latency to its near L2OCSRAM banks (BANK0 and BANK1) placed on R5SS0 VBUSM interconnect. Similarly, R5SS1_Core0 and R5SS1_Core1 cores will have faster access latency to its near L2OCSRAM banks (BANK2 and BANK3) placed on R5SS1 VBUSM interconnect. All the 4 cores, will have the same but slower access latency to the common L2OCSRAM banks (BANK4 and BANK5) as compared to their near banks. Furthermore, all the 4 cores will have slower access latency to their far L2OCSRAM banks (BANK2 and BANK3 for cluster R5SS0 and BANK0 and BANK1 for cluster R5SS1) as compared to common banks. To summarize, for particular cores in a cluster, below is the L2OCRAM Bank access latency comparison: Access latency of near banks < Access latency of common banks < Access latency of far banks
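
The placement rule in the TRM excerpt above can be written down as a small lookup. This is only a sketch of the rule as quoted. The cluster names come from the excerpt, while the function and table names are hypothetical.

```python
# Near banks per cluster, as described in the TRM excerpt quoted in this thread.
NEAR_BANKS = {
    "R5SS0": {0, 1},
    "R5SS1": {2, 3},
}

# Banks on the common interconnect, shared by all four cores.
COMMON_BANKS = {4, 5}

NUM_BANKS = 6


def classify_bank(cluster, bank):
    """Return 'near', 'common' or 'far' for a bank as seen from a cluster."""
    if cluster not in NEAR_BANKS:
        raise ValueError(f"unknown cluster: {cluster}")
    if bank not in range(NUM_BANKS):
        raise ValueError(f"bank must be 0..{NUM_BANKS - 1}, got {bank}")
    if bank in NEAR_BANKS[cluster]:
        return "near"
    if bank in COMMON_BANKS:
        return "common"
    # Any remaining bank is a near bank of the other cluster.
    return "far"


if __name__ == "__main__":
    for cluster in NEAR_BANKS:
        kinds = [classify_bank(cluster, b) for b in range(NUM_BANKS)]
        print(cluster, kinds)
```

Assuming the code in the original question ran on an R5SS0 core (the thread does not say which core was used), Bank 1 would be a near bank and Bank 4 a common bank.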

  • Thank you for pointing that out. It is really helpful.

    Is it possible to share the average latency differences between near, common and far bank accesses?

  • Hello,

    Please see this data below:

    Read Access Latencies:

    • Near banks: 32.5ns
    • Common banks: 47.5ns (15ns slower than near)
    • Far banks: 57.5ns (25ns slower than near, 10ns slower than common)

    Write Access Latencies:

    • Near banks: 25ns
    • Common banks: 35ns (10ns slower than near)
    • Far banks: 40ns (15ns slower than near, 5ns slower than common)

    Summary of Latency Overhead:

    • Common vs Near: Read ~46% slower, Write ~40% slower
    • Far vs Near: Read ~77% slower, Write ~60% slower
    • Far vs Common: Read ~21% slower, Write ~14% slower

    Regards,

    Sahana
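
The percentages in the summary above follow directly from the quoted latencies, and the short script below recomputes them. The conversion to core clock cycles is an addition that assumes a 400 MHz R5F core clock, which is not stated in the thread. The function and constant names are hypothetical.

```python
# Latencies in nanoseconds, copied from the figures quoted in this thread.
LATENCY_NS = {
    "read": {"near": 32.5, "common": 47.5, "far": 57.5},
    "write": {"near": 25.0, "common": 35.0, "far": 40.0},
}

# Assumption: 400 MHz R5F core clock (not stated in the thread).
ASSUMED_CORE_CLOCK_MHZ = 400.0


def overhead_percent(slow_ns, fast_ns):
    """Return how much slower slow_ns is than fast_ns, in percent."""
    return (slow_ns - fast_ns) / fast_ns * 100.0


def to_cycles(latency_ns, clock_mhz=ASSUMED_CORE_CLOCK_MHZ):
    """Convert a latency in nanoseconds to clock cycles at clock_mhz."""
    return latency_ns * clock_mhz / 1000.0


if __name__ == "__main__":
    pairs = (("common", "near"), ("far", "near"), ("far", "common"))
    for access, banks in LATENCY_NS.items():
        for slow, fast in pairs:
            pct = overhead_percent(banks[slow], banks[fast])
            print(f"{access}: {slow} vs {fast}: {pct:.1f}% slower")
        for bank, ns in banks.items():
            cycles = to_cycles(ns)
            print(f"{access}: {bank} bank = {ns} ns = {cycles:.1f} cycles (assumed clock)")
```

Under the assumed clock, every quoted latency works out to a whole number of cycles.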