SK-AM62: Memory Bandwidth may be too low

Part Number: SK-AM62

Hello.

Our team compared web-browser performance on the SK-AM62 against the AM335x and AM57x. PROCESSOR-SDK-LINUX-AM62X does not include a web browser, so we built Chromium ourselves. Using a JavaScript benchmark similar to HDBENCH, we compared CPU, FPU, memory, graphics and disk (SD card access) performance. We expected the AM625x to be very fast, and its CPU is indeed high-performance, but memory access was poor: it appeared to be only about half the speed of the AM572x. We are trying to determine whether this result is caused by the SK-AM62 board design, insufficient DDR4 tuning in PROCESSOR-SDK-LINUX-AM62X, the specifications of the AM625x SoC, our own Chromium build, or something else.

First, I checked whether the problem was in the Chromium I built myself, and found LMBench results in the Processor SDK Linux Software Developer's Guide at the following links:

https://software-dl.ti.com/processor-sdk-linux/esd/AM62X/08_03_00_19/exports/docs/devices/AM62X/linux/Linux_Performance_Guide.html#lmbench

https://software-dl.ti.com/processor-sdk-linux/esd/AM65X/latest/exports/docs/devices/AM65X/linux/Release_Specific_Performance_Guide.html#lmbench

Comparing these, we saw the same tendency as in our Chromium results.

To see whether the difference was in reads or writes, I plotted the results, but the graph did not make it clear.

It therefore seems that the memory performance of the AM625x (or the SK-AM62) is not very good. I also checked the schematics of each board: the SK-AM62 uses DDR4-3200 with a 16-bit bus and the AM57x EVM uses DDR3-1600 with a 32-bit bus, so the two looked equivalent in terms of bandwidth.

Is it possible that the memory performance of the AM625x (or SK-AM62) will improve in the future through DDR4 tuning in PROCESSOR-SDK-LINUX-AM62X?

Regards.
Y.Bando
  • Can you explain in a little more detail how you compiled and built the Chromium browser on each platform? If you used the SDK, the DDR4 configuration should already be optimized for maximum frequency, so I don't think the issue is with the memory. Something in the build is probably unoptimized and restricting browser performance.

    Regards,

    James

  • I suspected the web browser might not be optimized enough, so I also compared results from a software environment that I did not build myself.

    I'd like to ask about the LMbench values in the Performance Guide of the Processor SDK Linux Software Developer's Guide, since those were measured by TI.

    I would like to use the benchmark results published in the Developer's Guide as criteria for SoC selection, so please publish results measured in a properly optimized environment. Of course, I understand that the AM625x software may not be fully mature so soon after its release.

    ----

    By the way, I don't know all the details, because the Chromium performance comparison was done by a contractor. For the AM335x and AM572x, we used the Chromium included in the pre-built boot SD card image of the Processor SDK. For the AM625x, we built the same Chromium version as on the AM572x so the two could be compared, but as a 64-bit build. The contractor considered building the latest Chromium, since a build was needed anyway, but recent Chromium uses clang as its compiler and may not integrate well with the Arago Project environment.

    AM335x   Chromium 53.0.2785.143   32-bit
    AM572x   Chromium 75.0.3770.142   32-bit
    AM62x    Chromium 75.0.3770.142   64-bit

    -----

    I think the memory performance of the SK-AM62 is lower for the following reason:

    AM65xx EVM   DDR4-3200, 8-bit x 4 chips in parallel  -> 3200 MT/s x 8 bit x 4  = 102,400 Mbit/s

    SK-AM62      DDR4-3200, 16-bit x 1 chip              -> 3200 MT/s x 16 bit x 1 = 51,200 Mbit/s

    AM572x EVM   DDR3-1600, 16-bit x 2 chips in parallel -> 1600 MT/s x 16 bit x 2 = 51,200 Mbit/s
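    The figures above are theoretical peaks: transfer rate times total data-bus width. A minimal Python sketch of that arithmetic, using the data rates as assumed in this post rather than values checked against the TRMs; `peak_bandwidth_mbit_s` and `CONFIGS` are illustrative names only. Dividing by 8 converts Mbit/s to MB/s:

```python
# Theoretical peak DRAM bandwidth = transfer rate (MT/s) x bits per chip x chips.
# Configurations are the ones listed in this post (rates as assumed there,
# not verified against the TRMs). These are peaks, not measured bandwidth.

def peak_bandwidth_mbit_s(rate_mt_s: float, bits_per_chip: int, chips: int) -> float:
    """Peak bandwidth in Mbit/s for `chips` devices sharing one data bus."""
    return rate_mt_s * bits_per_chip * chips


CONFIGS = {
    "AM65xx EVM": (3200, 8, 4),   # DDR4-3200, four x8 devices -> 32-bit bus
    "SK-AM62": (3200, 16, 1),     # DDR4-3200, one x16 device -> 16-bit bus
    "AM572x EVM": (1600, 16, 2),  # DDR3-1600, two x16 devices -> 32-bit bus
}

if __name__ == "__main__":
    for board, (rate, bits, chips) in CONFIGS.items():
        mbit = peak_bandwidth_mbit_s(rate, bits, chips)
        # Mbit/s -> MB/s (divide by 8) -> GB/s (divide by 1000)
        print(f"{board:<11} {mbit:>9,.0f} Mbit/s = {mbit / 8 / 1000:.1f} GB/s")
```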

    I suspect that even with the same nominal bandwidth, performance does not improve because of bus-switching overhead. With only 2 GB of memory, a 32-bit OS may be a better choice than a 64-bit OS: my guess is that fetching 32-bit code from DDR4 takes about half as long as fetching 64-bit code, while 64-bit code runs faster once it is cached. Given the limited effective DDR4 bandwidth, caching becomes more and more important. A disappointing result for the long-awaited 64-bit environment...

  • I will have to forward your questions to our software experts for further comment.

    BTW, the AM572x operates DDR3 at a maximum of 1066 MT/s, not 1600 MT/s. Also, the AM62x and AM65x operate DDR4 at a maximum of 1600 MT/s, not 3200 MT/s.

    Regards,

    James

  • I misread the AM62x TRM: the limit is 1600 Mbps, not 1600 MHz.

    • Support of LPDDR4 memory type (up to 1600Mbps)

    • Support of DDR4 memory type (up to 1600Mbps)

    In other words, the upper limit is DDR4-1600, not DDR4-3200.

    Even though both use DDR4-1600, the AM65x has a 32-bit bus and the AM62x a 16-bit bus, so it is natural that the AM62x has only half the memory bandwidth.

    Now that I understand the AM62x has half the bus width of the AM65x, if the AM62x's memory bandwidth turns out to be insufficient, we can use the AM65x instead.

    The approximate relative memory bandwidth (data rate x bus width / 16 bit, normalized to the AM335x) is:

    SoC     Bus     Memory      Rate x width/16     Relative
    AM65x   32-bit  DDR4-1600   1600   x 2 = 3200   4
    AM57x   32-bit  DDR3-1066   1066.7 x 2 = 2133   2.67
    AM62x   16-bit  DDR4-1600   1600   x 1 = 1600   2
    AM335x  16-bit  DDR3-800     800   x 1 =  800   1
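    A minimal Python sketch of the same ratios, assuming the corrected maximum data rates from this thread and treating DDR3-1066 as its nominal 1066.67 MT/s. `RATES` and `relative_bandwidth` are hypothetical names, and the results are theoretical peak ratios, not measurements:

```python
# Relative peak bandwidth = data rate (MT/s) x bus width (bits), normalized
# to a base SoC. Rates are the corrected maxima discussed in this thread.

RATES = {
    # name: (data rate in MT/s, bus width in bits)
    "AM65x": (1600, 32),
    "AM57x": (3200 / 3, 32),  # DDR3-1066 is nominally 1066.67 MT/s
    "AM62x": (1600, 16),
    "AM335x": (800, 16),
}


def relative_bandwidth(base: str = "AM335x") -> dict[str, float]:
    """Peak bandwidth of each SoC divided by the peak bandwidth of `base`."""
    base_rate, base_width = RATES[base]
    base_bw = base_rate * base_width
    return {name: rate * width / base_bw for name, (rate, width) in RATES.items()}


if __name__ == "__main__":
    for name, ratio in relative_bandwidth().items():
        print(f"{name:<7} {ratio:.2f}x")
```

    Using the exact DDR3-1066 rate is what makes the AM57x entry come out as 2133 and 2.67 rather than 2132 and 2.66.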

    I'll mark this as Resolved. The high memory speed of the AM572x is still a mystery, but the document link is broken, so I currently can't confirm that the AM572x values in the document were correct.

    www.ti.com/.../PROCESSOR-SDK-AM57X

    User guides Processor SDK - Linux - Software Developer's Guide
  • Regarding "the high speed memory of AM572x is still a mystery":

    Note that the AM572x has 2 MB of L2 cache, so LMBench tests with a 1 MB footprint do not touch external memory at all on the AM572x; they are purely L2-cache numbers. In general, I suggest running LMBench bandwidth measurements multithreaded to get a better picture of achievable memory bandwidth. The automated scripts we use for the performance guide include many runs with parameters that are not very useful, and all of them are single-threaded. For example:

    bw_mem -P 4 16M bcopy 

    on AM62x will be representative of achievable memory bandwidth.
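    A minimal Python sketch of the footprint reasoning above: it checks whether a test buffer could be served entirely from L2 and builds the multithreaded `bw_mem` command line shown above. The AM572x L2 size comes from this post; the AM62x value (512 KB shared L2) is an assumption to be checked against the datasheet. `fits_in_l2` and `bw_mem_command` are hypothetical helpers, and the sketch does not run lmbench itself:

```python
import shlex

# L2 sizes used for the footprint check.
L2_BYTES = {
    "AM572x": 2 * 1024 * 1024,  # 2 MB, as stated in this thread
    "AM62x": 512 * 1024,        # assumption: verify against the AM62x datasheet
}


def fits_in_l2(footprint_bytes: int, soc: str) -> bool:
    """True if a buffer this size could be served entirely from L2."""
    return footprint_bytes <= L2_BYTES[soc]


def bw_mem_command(size_mb: int, parallelism: int = 4, op: str = "bcopy") -> list[str]:
    """Build an lmbench bw_mem argument list, e.g. bw_mem -P 4 16M bcopy."""
    return ["bw_mem", "-P", str(parallelism), f"{size_mb}M", op]


if __name__ == "__main__":
    one_mb = 1024 * 1024
    for soc in L2_BYTES:
        print(soc, "1M test fits in L2:", fits_in_l2(one_mb, soc))
    # Printable shell form of the suggested multithreaded run.
    print(shlex.join(bw_mem_command(16)))
```

    Comparing only runs whose footprint is well above the L2 size on every SoC keeps the comparison about DRAM rather than cache.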

      Pekka