AM2431: OSPI Direct Access length

Part Number: AM2431
Other Parts Discussed in Thread: SYSCONFIG

I am working with an AM2431 and trying to get the OSPI peripheral to talk to a PSRAM. I need to control the amount of data read during DAC reads in order to meet one of the PSRAM timing requirements. The amount of data accessed seems to be a multiple of 32 bytes, but I have been unable to determine how that multiple is chosen.

Also, I found in the TRM, and have seen on the logic analyzer, that there is a relationship between the OSPI input clock frequency and the minimum divisor needed to make the bus signal timing correct. There is little guidance on which combinations of input clock frequency and divisor are valid. Could you provide more information on valid configurations?

Thanks,

|Nick

  • Hi Nick,

    The amount of data accessed seems to be multiples of 32 bytes but I have been unable to determine how it decides what that multiple will be.

    The AM243x has a 32-bit bus, but the amount of data transferred does not depend on an MMR value on the device, if that is what you are asking. In OSPI communication, the amount of data transferred is typically controlled by the commands your external memory supports. I'd suggest going over the PSRAM's datasheet, looking specifically at the multi-byte read/write commands it supports. You will most likely find commands that let you access entire pages, or smaller blocks of them, at a time.

    As for your second question, the TRM is currently the document to reference for clock configuration options. You can do this in different ways and it depends entirely on your use case.

    • You can configure the baud rate divider in OSPI_CONFIG_REG:MSTR_BAUD_DIV_FLD. Keep in mind the considerations in Section 12.3.2.4.16 OSPI PHY Module before you configure this value.
    • If the external memory supports an external loopback pin, you can use it with the AM243x to facilitate timing closure at high speeds. Section 12.3.2.2 OSPI Environment shows the connection for this.
    • OSPI_DEV_DELAY_REG also lets you introduce relative delays into the generation of the AM243x output signals. Section 12.3.2.5.2 Configuring the OSPI Controller for Optimal Use explains how to use it.
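As a rough illustration of the first bullet, the sketch below maps a desired clock divisor to a MSTR_BAUD_DIV_FLD value. It assumes the field is 4 bits wide and encodes divisor = 2 * (field + 1), so only even divisors from 2 to 32 are representable. That encoding and the helper name `ospi_baud_div_to_field` are assumptions and are not stated in this thread, so check them against the OSPI_CONFIG_REG description in the TRM before relying on them.

```c
#include <stdint.h>

/* Hypothetical helper (not an SDK function).
 * Assumed encoding: divisor = 2 * (field + 1), field is 4 bits wide.
 * Returns the field value, or -1 if the divisor cannot be represented. */
int ospi_baud_div_to_field(uint32_t divisor)
{
    if (divisor < 2u || divisor > 32u || (divisor % 2u) != 0u) {
        return -1;
    }
    return (int)(divisor / 2u) - 1;
}
```

Under that assumption, a divisor of 32 would correspond to a field value of 15, and an odd divisor would be rejected.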

    Hope you find this helpful

    Best,

    Daniel

  • Daniel,

    I have become very familiar with the PSRAM (APS6404L-SQRH) and the OSPI interface over the last few weeks. When using STIG or indirect access, it is possible to program the OSPI registers with the amount of data to transfer. During direct access, I would imagine that the amount of data accessed is related to cache, buffers, bus width, and a few other factors. The trouble is that the PSRAM has a maximum CE active time of 4 us. With a 1 GHz input clock and a divider of 28 (values I had to find experimentally), that gives 142 clocks, which is less than a 64-byte transfer including the command, address, and wait states. How can I guarantee that, while using direct access, I never create a transfer that takes too long?
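The clock budget arithmetic above can be sketched as follows. This is only an illustration: the function names are hypothetical, and `bits_per_clock` (for example 4 for quad SDR or 16 for octal DTR) and `overhead_clocks` (command, address, and wait states) are placeholders that have to come from the PSRAM datasheet and the actual bus configuration.

```c
#include <stdint.h>

/* Whole SPI clock cycles that fit inside the maximum CE active time.
 * ref_clk_hz: OSPI input clock, divisor: baud divisor, tcem_ns: max CE time in ns.
 * Result is rounded down; returns 0 for a zero divisor. */
uint32_t ce_clock_budget(uint64_t ref_clk_hz, uint32_t divisor, uint32_t tcem_ns)
{
    if (divisor == 0u) {
        return 0u;
    }
    return (uint32_t)((ref_clk_hz * tcem_ns) / ((uint64_t)divisor * 1000000000ull));
}

/* SPI clocks needed for one transfer: data phase (rounded up) plus fixed overhead.
 * bits_per_clock and overhead_clocks are assumptions supplied by the caller. */
uint32_t transfer_clocks(uint32_t data_bytes, uint32_t overhead_clocks,
                         uint32_t bits_per_clock)
{
    if (bits_per_clock == 0u) {
        return UINT32_MAX;
    }
    uint32_t data_clocks = (data_bytes * 8u + bits_per_clock - 1u) / bits_per_clock;
    return data_clocks + overhead_clocks;
}

/* Non-zero if the transfer fits in the CE active window. */
int transfer_fits(uint32_t needed_clocks, uint32_t budget_clocks)
{
    return needed_clocks <= budget_clocks;
}
```

With a 1 GHz input clock, a divisor of 28, and a 4000 ns limit, `ce_clock_budget` returns 142, matching the figure quoted above.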

    Thanks,

    |Nick

  • Hi Nick,

    During Direct Access I would imagine that the amount of data accessed is related to cache, buffers, bus width, and a few other factors

    DAC mode is just memory-mapped access to the OSPI region, so when either the CPU or the DMA accesses an address in that region, the access goes directly out to the flash. The amount of data accessed by OSPI in DAC mode depends on how much data the CPU or the DMA requests. If the R5 cores are configured to treat OSPI as normal non-cacheable memory, then the amount of data read depends on the code (a 16-bit read will read 2 bytes, a 32-bit read will read 4 bytes). For DMA, the amount is determined by ICNT0 and ICNT1 in the transfer configuration.
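As a minimal sketch of the non-cacheable case described above, the width of the pointer type used in the code selects the access size. The helper names are hypothetical, and the sketch assumes the address passed in is suitably aligned and, on the target, lies in a DAC-mapped region that the MPU marks as non-cacheable.

```c
#include <stdint.h>

/* One 16-bit load: requests 2 bytes from the addressed location.
 * volatile keeps the compiler from merging or widening the access. */
uint16_t read16(const volatile void *addr)
{
    return *(const volatile uint16_t *)addr;
}

/* One 32-bit load: requests 4 bytes from the addressed location. */
uint32_t read32(const volatile void *addr)
{
    return *(const volatile uint32_t *)addr;
}
```

On a host PC the same functions can be exercised against an ordinary array, which is enough to check the logic but says nothing about what appears on the OSPI bus.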

    How are you accessing your RAM with the OSPI controller? Are you using any of the R5 cores or the DMA?

    -Daniel

  • Daniel,

    I am using an R5 core and the OSPI. The OSPI is currently set up as follows:

    • 1 GHz input clock with a divisor of 32 to generate the SPI clock
    • Octal mode, DTR, SPI mode 0, legacy mode disabled
    • 4 address bytes (it is a 128MB PSRAM I'm trying to talk to)
    • Dual opcodes enabled, since the device uses them, with the same commands set in both the upper and lower command registers
    • DAC mode enabled and DDR commands enabled
    • Device size, read and write dummy clocks, and dev delays all set
    • WEL disabled, as it isn't used on PSRAM

    I have cache set up for the memory-mapped region, as I found that any setting other than "Region Attributes: Cached" in the MPU settings of SysConfig causes crashes when the OSPI access is attempted. I am not using the DMA and am trying to avoid the PHY because of the errata. I have also found that, with the PSRAM and not running at the limits of the parts, the PHY tuning will not converge. With this setup my DAC is memory mapped to address 0x60000000, and I access it using the test for loop copied below:

    #define DATA_AMOUNT 32u
    #define TEST_LEN (262144u-1u)
    
    static Boolean runTest(void); // forward declaration: runTest() is called before it is defined
    uint32_t fail = 0u;
    void testPsramForever(void)
    {
        //setup trigger pin
        setPinGpioDirection(TP128, GPIO_OUTPUT);
        setPinLogicHigh(TP128);
    
        dumpOspiRegs();
        uint32_t pass = 0u;
        fail = 0u;
    
        while(received2 != 'c')
        {
            runTest() ? pass++ : fail++;
            //waitMicroseconds(1000000u);
            printString("\n\rPSRAM SINGLE TEST DONE");
            printString("\n\rPSRAM PASS: %d", pass);
            printString("\n\rPSRAM FAIL: %d\n\r", fail);
        }
    }
    
    static Boolean runTest(void)
    {
        uint8_t read[DATA_AMOUNT] __attribute__((aligned(128U)));
    
        for(uint32_t o = 0u; o < TEST_LEN; ++o)
        {
            //write
            while((CSL_REG32_RD(0x0FC40000u) & 0x80000000) == 0u){}
            uint8_t* psram = (uint8_t*)(0x60000000u + (o * DATA_AMOUNT));
            for(uint32_t i = 0u; i < DATA_AMOUNT; ++i)
            {
                //psram[i] = data[i];
                psram[i] = TEST_DATA[i];
            }
    
            for(uint32_t i = 0u; i < DATA_AMOUNT; ++i)
            {
                read[i] = 0u;
            }
    
            //read
            while((CSL_REG32_RD(0x0FC40000u) & 0x80000000) == 0u){}
            psram = (uint8_t*)(0x60000000u + (o * DATA_AMOUNT));
            //CacheP_inv(read, DATA_AMOUNT, CacheP_TYPE_ALL);
            for(uint32_t i = 0u; i < DATA_AMOUNT; ++i)
            {
                read[i] = psram[i];
            }
            for(uint32_t i = 0u; i < DATA_AMOUNT; ++i)
            {
                if(read[i] != TEST_DATA[i])
                {
                    setPinLogicLow(TP128);
                    waitMicroseconds(1000000u);
                    printString("\n\rPSRAM FAIL");
                    printString("\n\raddress: %x", psram + i);
                    printString("\n\rbytes = %d\n\r", (o * DATA_AMOUNT) + (i));
                    printString("read[i]: %d, i = %d", read[i], i);
                    printString("\n\rexpected: %x,\n\rgot:      %x", TEST_DATA[i], read[i]);
                    setPinLogicHigh(TP128);
                    o--;
                    fail++;
                    CacheP_inv(read, DATA_AMOUNT, CacheP_TYPE_ALL);
                    break;
                    //return false;
                }
            }
            //printString("\ro = %d          ", o);
        }
        return true;
    }

    Under some conditions the OSPI will respect the page and block settings, but I have found that the for loop size is the biggest factor in determining which OSPI access is triggered. If I set the for loop to 16 or 32 bytes with a page size of 32 bytes, I get OSPI transactions that are multiples of 32 bytes. However, I need to guarantee that those accesses are never bigger than 64 bytes at a time, excluding overhead, so that I do not violate the PSRAM's 4 us maximum CE active time.

    Thanks,

    |Nick

  • Hi Nick,

    I have cache setup for the memory mapped region as I found that using any other setting besides "Region Attributes: Cached" in the mpu setting of sysconfig causes crashes when the ospi access is attempted

    When the R5 does a read with OSPI set as a cacheable region of memory, it does a 32-byte (cacheline-sized) read from flash. To meet your PSRAM timing requirements, you need to avoid more than two cacheline fetches happening back to back.

    Each cache miss results in a cacheline fetch from OSPI; this is the 32-byte access you observed. Multiple misses result in multiple cacheline fetches.
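To make the cacheline behaviour concrete, the sketch below counts how many 32-byte lines an access of a given length touches, which is an upper bound on the number of cacheline fetches it can trigger if every line misses. The helper name is hypothetical, the 32-byte line size is taken from the explanation above, and the sketch assumes `addr + len` does not wrap around the 32-bit address space.

```c
#include <stdint.h>

#define CACHE_LINE_BYTES 32u /* line size as described in the reply above */

/* Number of 32-byte cache lines covered by the byte range [addr, addr + len).
 * Hypothetical helper; assumes addr + len does not overflow 32 bits. */
uint32_t cache_lines_touched(uint32_t addr, uint32_t len)
{
    if (len == 0u) {
        return 0u;
    }
    uint32_t first_line = addr / CACHE_LINE_BYTES;
    uint32_t last_line = (addr + len - 1u) / CACHE_LINE_BYTES;
    return last_line - first_line + 1u;
}
```

For example, a 32-byte read that starts on a line boundary touches one line, while the same read starting 16 bytes into a line touches two.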

    I need to guarantee that those access will never be bigger than 64bytes at a time, excluding overhead, to make sure that I do not violate a 4us maximum CE active timing requirement of the psram.

    With that in mind, you would first need to determine your performance needs, mainly the lowest throughput you can tolerate, and then add memory barriers in the for loop as needed to reach your objective.

    Another option would be to put a small delay after reading 32 bytes, or even just after the first read on a 32 byte boundary.
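The two suggestions above (a barrier, or a small delay on each 32-byte boundary) might be combined along these lines. This is a sketch under stated assumptions, not a verified fix: `copy_line_by_line`, `data_barrier`, and the optional `pause` hook are hypothetical names, and the thread does not confirm that a barrier or delay is sufficient to make the controller release CE between cacheline fetches, so the result should be checked on a logic analyzer.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32u

/* DSB on ARM targets; a full fence builtin elsewhere so the sketch builds on a host. */
static inline void data_barrier(void)
{
#if defined(__arm__)
    __asm__ volatile ("dsb sy" ::: "memory");
#else
    __sync_synchronize();
#endif
}

/* Copy len bytes from a memory-mapped source. Each time the next source address
 * lands on a 32-byte boundary, issue a barrier and call the optional pause hook
 * (for example a short busy-wait). Returns how many times that happened. */
uint32_t copy_line_by_line(uint8_t *dst, const volatile uint8_t *src, size_t len,
                           void (*pause)(void))
{
    uint32_t breaks = 0u;

    for (size_t i = 0u; i < len; ++i) {
        dst[i] = src[i];
        if (((uintptr_t)&src[i + 1u] % CACHE_LINE_BYTES) == 0u) {
            data_barrier();
            if (pause != NULL) {
                pause();
            }
            ++breaks;
        }
    }
    return breaks;
}
```

Passing `NULL` as the hook gives the barrier-only variant; passing a delay function gives the delay variant, at the cost of throughput.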

    How memory-mapped (DAC) accesses are issued is an ARM architecture concept rather than something specific to the SoC. To understand a bit more about how it works, you can refer to the ARM Architecture Reference Manual:

    ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition

    The memory barriers section of the same manual explains the barriers themselves.

    Please go over this and reach out again if you still have the same question or any additional ones.

    Best,

    Daniel

  • Daniel,

    Thank you for the information. I had been experimenting with the cache in different ways beforehand; some of the remnants can be seen in the commented-out cache invalidate calls in the excerpt above. I did try putting data barriers after my for loop memory-mapped accesses but did not see an improvement. It's possible I did it wrong, and I will try again.

    My coworker is curious about the OSPI_DEV_SIZE_CONFIG_REG register. He was hoping that the page size would limit read accesses, but thinking about it, flash does not have read limitations: the page size and block size are related to write and erase access. Is that why the register has no effect on read length?

  • Hi Nick,

    The page size and block size are related to write and erase access

    Yes, this is correct. This will have no effect on read access size.

    If you have any more questions about data barriers during memory-mapped access, please feel free to reach out. I will make sure to loop in the R5FSS experts on our team.

    Best,

    Daniel