MCU-PLUS-SDK-AM243X: Alignment-problems with GPMC pSRAM in OSPI in release-builds

Part Number: MCU-PLUS-SDK-AM243X

Hello, we are using SDK 08.06 and LTS2.1.3

I just noticed an alignment problem with the OSPI functions when we build in release (-Oz). In debug everything works. We use an external pSRAM via GPMC. I am no longer sure, but I believe we once found a note saying that it only supports 4-byte-aligned accesses. We also use the -mno-unaligned-access compiler option.

The abort happens here:

Taking a look at the variables in question:

It seems to access an address that is only 2-byte aligned, and internally it casts a uint8_t pointer to a uint32_t pointer (which I think is always dangerous; I know ARM allows unaligned access, but our pSRAM does not...).
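To make the concern concrete, here is a minimal sketch of checking a pointer's alignment and of reading a 32-bit value without the pointer cast. The helper names `is_word_aligned` and `load_u32` are hypothetical and not part of the SDK; the sketch only illustrates the general technique of going through `memcpy` so that the compiler chooses loads that are legal for the target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if p sits on a 4-byte boundary, 0 otherwise. */
static int is_word_aligned(const void *p)
{
    return ((uintptr_t)p & 0x3u) == 0u;
}

/* Hypothetical helper: reads 4 bytes starting at p, whatever its alignment.
 * No uint8_t* to uint32_t* cast is involved, so the compiler may not assume
 * that p is 4-byte aligned. */
static uint32_t load_u32(const uint8_t *p)
{
    uint32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

With -mno-unaligned-access the compiler is expected to turn the `memcpy` into byte or halfword loads, which costs some speed but avoids word accesses to a 2-byte-aligned address.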

Also I checked the fault registers:

The 1 means alignment fault following the arm documentation:
https://developer.arm.com/documentation/ddi0406/b/System-Level-Architecture/Protected-Memory-System-Architecture--PMSA-/Fault-Status-and-Fault-Address-registers-in-a-PMSA-implementation/Fault-Status-Register-encodings-for-the-PMSA?lang=en#Bcejbahe
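As a rough illustration of how the fault status is read out of the register value, the sketch below decodes the status field of a Data Fault Status Register value. It assumes the PMSA layout from the linked ARM table (status bits in DFSR[10] and DFSR[3:0]) and lists only a few encodings; the function names are hypothetical, and the table in the ARM documentation remains the authority for all other values.

```c
#include <assert.h>
#include <stdint.h>

/* Combines DFSR bit 10 (FS[4]) and bits 3:0 (FS[3:0]) into one 5-bit value.
 * On the target the raw value would come from CP15 (c5, c0, 0). */
static uint32_t dfsr_fault_status(uint32_t dfsr)
{
    return (((dfsr >> 10) & 0x1u) << 4) | (dfsr & 0xFu);
}

/* Maps a few fault status encodings to text. Only a subset is listed here;
 * everything else is reported as "other". */
static const char *dfsr_describe(uint32_t dfsr)
{
    uint32_t fs = dfsr_fault_status(dfsr);

    assert(fs <= 0x1Fu); /* the status field is five bits wide */
    switch (fs) {
    case 0x00u: return "background fault";
    case 0x01u: return "alignment fault";
    case 0x0Du: return "permission fault";
    default:    return "other (see the ARM fault status table)";
    }
}
```

A status value of 1, as observed here, maps to the alignment fault entry regardless of the other bits in the register.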

So... what would be the options here? For now I'll try to align all of our data to 4 bytes, but I do not think that is a practical solution.

Edit: I moved the memory to be copied into internal SRAM, but the abort still happens. It happens right after calling this.


As you can see, srcPtr points to an address that is only 2-byte aligned. This only happens with -Oz (plus debug symbols for the screenshot). If I build a debug ELF with -Og, the addresses are also 2-byte aligned, but that somehow works.

Debug generates the following assembly:

while release generates:

Right after a single assembly step, the application runs into the abort.
What could we have done wrong?

Best regards

Felix

  • Update:

    We can track it down directly to the -Oz option. -Os generates different code that works. We changed back to -Os for the SDK compilation, as in 08.04. Is this a compiler issue?

    Best regards

    Felix

  • Hello Felix, 

    From the information you have provided, it looks like a compiler issue. This issue has been seen previously, and we are trying to get more data on the bug.
    We will get back to you once we have all the information on this issue.

    Regards, 
    K.Joshi

  • Hello Felix, 

    We are still trying to understand the full picture here, i.e. with which compiler version this issue was first observed and what the fix was.
    We will get back to you within a couple of days.

    Best Regards, 
    K.Joshi 

  • Hey Keval,

    so in our case it did not appear with a particular compiler version but with a change to a flash driver on our side, which is not structured like the SDK's flash driver (which, I think, only performs aligned accesses). It happens with both LTS2.1.3 and STS 3.1.0; we did not test other versions. The fix is to use -Os instead of -Oz for the SDK's driver library.

    Thanks!

  • Hello Felix, 

    Looking at the first screenshot, the cast of const char* src to uint32_t* is always dangerous.

    Moreover, glancing at the generated assembly, we can make the following argument:

    1. In debug mode the compiler generated LDR instructions, for which 2-byte-aligned accesses are allowed.

    2. In release mode, to optimize for size and because the src pointer is cast to uint32_t*, the compiler generates an LDM instruction.

    The LDM instruction does not support a 2-byte-aligned access.
    LDM and STM instructions require 4-byte alignment, which is consistent with the alignment fault you reported.

    Thus this might not be a compiler bug. In any case, this needs to be confirmed via further analysis.
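    The LDR versus LDM point above comes down to what the compiler is allowed to assume about the pointer. One possible way to withdraw that assumption, sketched below, is a packed wrapper struct. This relies on the GCC/Clang `packed` and `may_alias` attributes; the names `unaligned_u32` and `read_u32_at` are hypothetical, and the generated code has not been checked against the TI toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: a packed struct has alignment 1, so the compiler
 * cannot assume 4-byte alignment for accesses through it. may_alias allows
 * the struct to overlay a plain byte buffer. */
struct unaligned_u32 {
    uint32_t value;
} __attribute__((packed, may_alias));

/* Hypothetical helper: reads the index-th 32-bit value starting at src,
 * where src may be only 1- or 2-byte aligned. */
static uint32_t read_u32_at(const uint8_t *src, size_t index)
{
    const struct unaligned_u32 *p;

    assert(src != NULL);
    p = (const struct unaligned_u32 *)src;
    return p[index].value;
}
```

    Because the alignment is part of the type, the compiler should not merge consecutive reads into a multi-word load that requires 4-byte alignment, at the price of slower accesses when -mno-unaligned-access is in effect.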


    Regards, 
    K.Joshi

  • Hi Keval,

    Any progress on this issue?

  • Keval is OOO till 24th October. Please expect delays, he will be able to respond back once he is back.

  • Hello Ming, 

    Looking at the C code, it definitely looks like a bug, as a char* argument is cast to uint32_t*, which is an unsafe cast.
    Also, the code generated with the -Oz optimization flag, i.e. the LDM instruction, expects a 4-byte-aligned access.

    The fix for this issue will be to handle 2-byte-aligned accesses carefully.
    I will test the fix locally and update as soon as possible.

    Regards, 
    K.Joshi 

  • Hello Felix, 

    We are arranging the setup and validating the fix on our side. We will update the status on this thread as soon as possible.

    Regards, 

    K.Joshi

  • Felix, 

    Validation is in progress. 

    Apologies for the delay. 

    Regards, 
    K.Joshi

  • Hey Keval,

    thanks for all the effort. Since it was not totally clear to me: is this a fix in the compiler or in the SDK? We just want to be able to plan accordingly when changing back to -Oz.

    Regards,

    Felix

  • Hello Felix, 

    The fix is in the SDK, in the OSPI_writeFifoData() API. The fault happens because of the unsafe cast. We need to handle 2-byte-aligned addresses gracefully.

    Moreover, on our setup we did not see an LDM instruction generated with -Oz.

    If possible, can you please share a reference CCS project so that we can reproduce this on our end?

    Regards, 
    K.Joshi

  • Felix, 

    Can you please explain your setup and how the OSPI driver comes into the picture?
    GPMC + pSRAM and OSPI are two separate interfaces.

    Thanks & Regards, 
    K.Joshi

  • Hey Keval,

    we are not working with CCS projects and are not bound to any IDE, since we only use CLI-based builds. Also, we probably cannot just create an example, since there is a whole driver layer in between :(

    But maybe we can find out why the instruction is not generated at all on your side. The input to the function is not statically generated but can vary at runtime. It also depends on the input we get from external sources (for example via a webserver that uses our flash driver). I am not sure what you wrote to try to debug it, but I imagine it uses a static, 2-byte-aligned address that is known at compile time? Maybe the compiler then optimizes it correctly and only emits the LDM instruction if the address can vary at runtime and is not known at compile time?

    best regards

    Felix

  • Hey Keval, answering your second question:

    So in this case the OSPI driver is not used with the SDK's flash driver. We wrote our own flash driver, which handles all the page logic etc., but as a result the address it reads from can vary and is not guaranteed to be 4-byte aligned. For example, if we have a receive buffer for a file that is written via a webserver, the webserver puts all of its bytes into a buffer. Our flash driver makes it possible to write any size to any address you want, even odd addresses. Under the hood it has page buffers and sector buffers that are partly filled and whose start and end addresses are always 2-byte aligned, since this is the specification for octal SPI flashes in DDR mode. Underneath, it just erases sectors and writes pages.

    But when it comes to page writing, the src pointer can end up pointing to a 2-byte-aligned address, since it is passed on directly if the page lies in "the middle" of a larger data packet. For example, take writing 1374 bytes (all 0x11, say), starting at address 0xFE in the flash. The source of those 1374 bytes in RAM is at 0x70000000, for example, so well aligned.

    The first two bytes are copied into a page-sized buffer that is filled with 0xFF, so only the last two bytes of this buffer are 0x1111. The first page is written (which is fine, since the buffer is 256 bytes big and automatically 4-byte aligned). The next page, however, is not copied into a buffer but taken directly from the source. That means we previously copied the two bytes at 0x70000000 into a buffer, but now access address 0x70000002 directly and even pass this address to the OSPI driver. This should provoke the issue. The decision was made mainly to keep copy operations to a minimum; passing the src pointer directly is the fastest way to write data to the flash.

    For the GPMC we did once have alignment issues, but solved them with a compiler option (-mno-unaligned-access). Since then we have not experienced any alignment issues regarding the GPMC/pSRAM and the data stored there. And yes, this is a different topic. I imagine an abort could still occur there if we directly access a non-4-byte-aligned address, but I think that is out of scope for this topic since I already moved the buffers into SRAM.
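    The arithmetic of this example can be sketched as follows. The type `page_chunk_t` and the function `split_into_pages` are hypothetical and are not the flash driver described here; the sketch only computes where each page-bounded chunk lands in flash and where its source data starts in RAM, using the 256-byte page size and the addresses from the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLASH_PAGE_SIZE 256u /* page size taken from the example above */

/* One page-bounded piece of a larger write (hypothetical type). */
typedef struct {
    uint32_t  flash_addr; /* destination address in flash */
    uintptr_t src_addr;   /* RAM address where this piece's data starts */
    uint32_t  len;        /* number of bytes in this piece */
} page_chunk_t;

/* Splits a write into pieces that never cross a flash page boundary.
 * Returns the number of pieces stored in out (at most max_chunks). */
static size_t split_into_pages(uint32_t flash_addr, uintptr_t src_addr,
                               uint32_t len, page_chunk_t *out,
                               size_t max_chunks)
{
    size_t n = 0;

    assert(out != NULL);
    while (len > 0u && n < max_chunks) {
        uint32_t room  = FLASH_PAGE_SIZE - (flash_addr % FLASH_PAGE_SIZE);
        uint32_t chunk = (len < room) ? len : room;

        out[n].flash_addr = flash_addr;
        out[n].src_addr   = src_addr;
        out[n].len        = chunk;
        n++;

        flash_addr += chunk;
        src_addr   += chunk;
        len        -= chunk;
    }
    return n;
}
```

    For 1374 bytes at flash address 0xFE with the source at 0x70000000, this yields a 2-byte head, five full pages and a 92-byte tail. The first full page starts at source address 0x70000002, which is the 2-byte-aligned pointer that ends up in the OSPI driver.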

    An addition to my previous answer:

    It may be important to know that we compile the SDK libraries beforehand and only later link them against our application. So at the time the SDK driver library is compiled, it does not know which input it will get.

    Regards

    Felix

  • Felix , 

    Thanks for the detailed explanation. 
    In the SDK flash driver the problem described above does not occur, as unaligned accesses are handled by the software layer written on top of the OSPI driver.


    However, the implementation of OSPI_writeFifoData() is not safe. Thanks for pointing this out; we have identified this issue as a potential bug, and it will be fixed in the next SDK release.

    The solution is to handle the unaligned read access.

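    #include <string.h> /* memcpy */

    static void OSPI_writeFifoData(uintptr_t indAddr, const uint8_t *src, uint32_t wrLen)
    {
        uint32_t temp = 0;
        uint32_t remaining = wrLen;
        /* Bytes from src up to the next 4-byte boundary (0 if src is already aligned) */
        uint32_t unaligned_bytes = (4 - ((uintptr_t)src % 4)) % 4;

        /* Do not consume more leading bytes than the caller asked to write */
        if (unaligned_bytes > remaining)
        {
            unaligned_bytes = remaining;
        }

        /* Handle the unaligned leading bytes with a byte-wise copy */
        if (unaligned_bytes > 0)
        {
            uint32_t shiftVal = 8 * (4 - unaligned_bytes);

            memcpy(&temp, src, unaligned_bytes);
            temp = temp << shiftVal;
            CSL_REG32_WR(indAddr, temp);
            src += unaligned_bytes;
            remaining -= unaligned_bytes;
            temp = 0;
        }

        /* Any remaining data now starts on a 4-byte boundary */
        const uint32_t *srcPtr = (const uint32_t *)src;

        while (remaining > 0)
        {
            if (remaining >= CSL_OSPI_FIFO_WIDTH)
            {
                CSL_REG32_WR(indAddr, *srcPtr);
                remaining -= CSL_OSPI_FIFO_WIDTH;
            }
            else
            {
                /* dangling bytes: copy the tail into a zeroed word */
                memcpy(&temp, srcPtr, remaining);
                CSL_REG32_WR(indAddr, temp);
                break;
            }
            srcPtr++;
        }
    }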
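    For comparison, a different way to avoid the unaligned word read is to assemble every FIFO word with memcpy, so that no uint32_t pointer to the source is ever formed. This is only a host-runnable sketch of that alternative and not the SDK fix: `fifo_write32` and its log array are hypothetical stand-ins for CSL_REG32_WR, `FIFO_WIDTH` stands in for CSL_OSPI_FIFO_WIDTH, and the behaviour on the real controller is untested.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FIFO_WIDTH   4u  /* stands in for CSL_OSPI_FIFO_WIDTH */
#define FIFO_LOG_MAX 64u /* capacity of the mock FIFO log */

/* Mock FIFO: records every word that would be written to the register. */
static uint32_t g_fifo_log[FIFO_LOG_MAX];
static uint32_t g_fifo_count;

/* Hypothetical stand-in for CSL_REG32_WR so that the sketch runs on a host. */
static void fifo_write32(uintptr_t addr, uint32_t value)
{
    (void)addr;
    if (g_fifo_count < FIFO_LOG_MAX) {
        g_fifo_log[g_fifo_count++] = value;
    }
}

/* Writes wrLen bytes from src to the FIFO one word at a time.
 * Each word is built with memcpy, so src may have any alignment. */
static void write_fifo_data_sketch(uintptr_t indAddr, const uint8_t *src,
                                   uint32_t wrLen)
{
    uint32_t remaining = wrLen;

    assert(src != NULL || wrLen == 0u);
    while (remaining > 0u) {
        uint32_t n = (remaining >= FIFO_WIDTH) ? FIFO_WIDTH : remaining;
        uint32_t word = 0u; /* unused bytes of a short final word stay zero */

        memcpy(&word, src, n);
        fifo_write32(indAddr, word);
        src += n;
        remaining -= n;
    }
}
```

    The trade-off is that the compiler can no longer use word loads on the source unless it can prove alignment, so this variant is likely slower than the aligned fast path in the function above.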
    
    

    Regards, 
    K.Joshi