MCU-PLUS-SDK-AM243X: Optimized memcpy (linker flag --use_memcpy=fast) fails on flash memory space 0x60000000, if PHY mode enabled

Part Number: MCU-PLUS-SDK-AM243X
Other Parts Discussed in Thread: LP-AM243, SYSCONFIG

Hi,

I am currently facing the problem that reads from the flash on the LP-AM243 board via octal (quad) SPI fail (some bytes are wrongly read as 0x00) if both PHY mode and the memcpy optimization (--use_memcpy=fast) are enabled.

Steps to reproduce:

  • Use the newest SDK "mcu_plus_sdk_am243x_09_02_01_05" and the newest compiler "ti-cgt-armllvm_3.2.2.LTS".
  • Import the example "ospi_flash_io_am243x-lp_r5fss0-0_nortos_ti-arm-clang".
  • Set the linker flag --use_memcpy=fast in the project settings: Project->Properties->Build->Arm Linker->Advanced Options->Linker optimizations. Choose "fast" for --use_memcpy.
  • Then adapt the example project by performing one of the following steps:
    • Either disable DMA support in SysCfg tool for Flash/OSPI;
    • Or change APP_OSPI_DATA_SIZE macro in ospi_flash_io.c to e.g. 512 (something smaller than 1024).
    • One of the two steps above is needed so that the Flash_read() function of the Flash API ultimately uses memcpy(). Otherwise (for copy operations > 1024 bytes) the DMA is used, and the DMA works fine.
  • Compile the example (I used the debug build) and either load it onto the chip with the debugger or flash it; it does not matter which.
  • The example will fail with "ERROR: ospi_flash_io_compare_buffers:153: OSPI read data mismatch !!!"

The below screenshot illustrates what went wrong:

We have just stepped over the memcpy call. pSrc (an address inside flash; flash is mapped to 0x60000000) should have been copied to pDst. At the top right we see that pSrc[7]==7, but pDst[7]==0 (I chose to display only the first 10 values so that everything fits on one screen; there are further errors afterwards). Hence, there seems to be an issue with the optimized memcpy in combination with a flash address when the PHY mode of the flash is enabled. It works if PHY mode is disabled.
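
To check the whole buffer rather than only the first 10 values in the debugger view, a small comparison helper can be used. The following is only a sketch under my own assumptions: the names first_mismatch and count_zeroed_bytes are hypothetical and not part of the SDK, and the second function simply counts bytes that should be non-matching and came back as 0x00, which is the pattern described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical debugging helper (not an SDK function).
 * Returns the index of the first byte where the buffers differ,
 * or len if both buffers are identical. */
static size_t first_mismatch(const uint8_t *expected,
                             const uint8_t *actual, size_t len)
{
    assert(len == 0u || (expected != NULL && actual != NULL));
    for (size_t i = 0u; i < len; i++) {
        if (expected[i] != actual[i]) {
            return i;
        }
    }
    return len;
}

/* Hypothetical debugging helper (not an SDK function).
 * Counts positions where the copy differs from the source AND the
 * copied byte is 0x00, i.e. bytes that were "wrongly read as 0x00". */
static size_t count_zeroed_bytes(const uint8_t *expected,
                                 const uint8_t *actual, size_t len)
{
    size_t count = 0u;
    assert(len == 0u || (expected != NULL && actual != NULL));
    for (size_t i = 0u; i < len; i++) {
        if (expected[i] != actual[i] && actual[i] == 0x00u) {
            count++;
        }
    }
    return count;
}
```

With the values described above (pSrc[7]==7, pDst[7]==0, earlier bytes equal), first_mismatch(pSrc, pDst, len) would return 7.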

Originally I stumbled upon this because the EtherCAT Beckhoff example has the --use_memcpy=fast flag set by default. It also has PHY mode disabled by default, but if you then enable it (and correct the clock frequency settings, otherwise Flash_open() will already fail), flash reads start to fail.

I hope I have provided enough information. Feel free to ask if anything is unclear. Thank you for your help.

Kind regards,

Martin

  • Greetings Martin,

    Thank you for your question along with the steps to reproduce the failure.

    While I recreate this on my end, could you please send me the MPU ARM configuration from SysConfig? The entire SysConfig file would be fine too.

    Regards,

    Vaibhav

  • Greetings Martin,

    I have been able to reproduce this issue.

    Please see the attached screenshot; the points highlighted in yellow mark values that were falsely read as 0.

    Allow me some time to investigate this and find the root cause.

    Thank you for your patience.


    UPDATE:

    I can see that writing to the flash works correctly, as 0x60200000 shows the correct values. So only the read operation needs to be investigated.

    Regards,

    Vaibhav

  • Hi Vaibhav,

    Sorry for my late reply. I was out of the office for the first two days of this week.

    Thank you very much for starting to investigate this so quickly. Please let me know if I can help you in any way.

    Kind regards,

    Martin

  • Greetings Martin,

    I have a few comments on memcpy().

    C library memory access functions like memcpy() always assume standard memory and cannot be used for accesses to special device/peripheral memory, where an unaligned access could cause a fault.

    It is true that using --use_memcpy=small results in a slow version of memcpy. The slower one does a byte-by-byte copy and works. But instead of relying on the compiler's memcpy implementation, I would recommend that you experiment with a customized memcpy variant. You can experiment with 6/7/8-byte memcpy.
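
    As a starting point for such a customized variant, here is a minimal sketch of a byte-by-byte copy. It is an illustration only, not the SDK's or the compiler's implementation: flash_copy_bytes is a hypothetical name, and the source pointer is declared volatile on the assumption that this keeps the compiler from merging the byte reads or replacing the loop with a call to the library memcpy().

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper (not an SDK function): copies len bytes from a
     * memory-mapped source to normal RAM, one byte per read.
     * The volatile qualifier on the source is intended to force one
     * byte-wide access per element. Returns dst, like memcpy(). */
    static void *flash_copy_bytes(void *dst, const volatile void *src,
                                  size_t len)
    {
        uint8_t *d = (uint8_t *)dst;
        const volatile uint8_t *s = (const volatile uint8_t *)src;

        assert(len == 0u || (dst != NULL && src != NULL));

        for (size_t i = 0u; i < len; i++) {
            d[i] = s[i];
        }
        return dst;
    }
    ```

    A call would look like flash_copy_bytes(pDst, pSrc, count), with pSrc pointing into the memory-mapped flash region. This trades speed for predictable access width, so it is mainly useful for small reads or for narrowing down the failure.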

    I would also ask you to go through: https://software-dl.ti.com/codegen/docs/tiarmclang/rel3_2_0_LTS/compiler_manual/using_compiler/compiler_options/runtime_model_options.html#cmdoption-munaligned-access

    "Originally I stumbled upon this because the EtherCAT Beckhoff example has the --use_memcpy=fast flag set by default. It also has PHY mode disabled by default, but if you then enable it (and correct the clock frequency settings, otherwise Flash_open() will already fail), flash reads start to fail."

    Please correct my understanding here. Do you mean that the EtherCAT Beckhoff example has memcpy set to fast and PHY disabled by default, and that flash reads start to fail once you enable PHY there?

    Looking forward to your response.

    Regards,

    Vaibhav

  • Hi Vaibhav,

    Thank you for your comments.

    Yes, the EtherCAT Beckhoff example has memcpy set to fast and PHY disabled by default. So, if you enable PHY, flash reads will fail.

    For now I have just disabled the --use_memcpy option and I am basically fine with this. My main motivation for creating this thread was to report a bug in the Flash IO driver of the SDK to the TI dev team, because it was not me who decided to use memcpy() on a non-standard memory area: the SDK function OSPI_readDirect() uses memcpy() on the flash area.

    Kind regards,

    Martin

  • Greetings Martin,

    Thank you for your response.

    "For now I have just disabled the --use_memcpy option and I am basically fine with this. My main motivation for creating this thread was to report a bug in the Flash IO driver of the SDK to the TI dev team."

    I have conveyed your message to the software dev team.

    I have created a JIRA ticket for this issue and will keep you updated on its progress.

    Regards,

    Vaibhav

  • Hello Martin,

    I have kept this thread open to provide you with status updates. At the moment, the fix tracked by the JIRA ticket cannot go into MCU PLUS SDK version 10.0, as the code freeze has already happened and the SDK is due to be released by mid-August 2024.

    It will most likely be accepted into the release after 10.0.

    Regards,

    Vaibhav