CCS/AM5728: QSPI - Read sequence with 4 lines

Sylvain PALMERI

Part Number: AM5728

Tool/software: Code Composer Studio

Hi team,

I couldn't find all the information in the TRM (despite its 8059 pages) needed to develop the QSPI driver when the master is configured to receive data on all 4 lines.

The Master (AM572x) sends a command to the Slave (FPGA) to read a block of memory.

1. The D0 line needs to switch from output to input after the command is sent. What is the expected delay (1 clock period?)

2. The shared memory in the FPGA is organized as 32-bit data words. Do you have any information about the read sequence when the 4 lines are used? Does the sequence look like the following picture?

Regards,

Sylvain.

over 5 years ago

0 z over 5 years ago

TI__Expert 4015 points

Section 24.5.4.1.2 of TRM describes the order of a QSPI read transaction:

1. SPI chip-select goes active.

2. Read command byte is issued.

3. 1 to 4 address bytes, which correspond to the first address supplied, are issued.

4. 0 to 3 dummy bytes are issued, if “fast read” is supported.

5. Data bytes are read from the external SPI flash memory.

6. SPI chip-select goes inactive.

The picture above depicts 1 command byte, 3 address bytes, 1 mode byte, 1 dummy byte, and 2 data bytes. The mode bytes are not supported by this QSPI interface. I am not sure when the D0 becomes an input, but the direction of D0 is automatically controlled by the SPI state machine.

-Zack

0 Sylvain PALMERI over 5 years ago in reply to z

Expert 1145 points

Hi Zack,

I didn't find any information about PHY in the TRM related to the QSPI module, but the clock source selected is PER_QSPI_CLK from DPLL_PER (bit 24 of CM_L4PER2_QSPI_CLKCTRL register).

The device connected to the CPU is not a flash but an FPGA, whose SPI protocol is quite different from a standard flash QSPI protocol, so I can't use the memory-mapped port (MMP) for several reasons:

1. The address needs to be sent first (in MMP the command is sent first).

2. Chip select must stay active for the whole transaction (in MMP, the CS goes back inactive after each byte received).

3. The data sent by the FPGA are 32 bits long (in MMP, I think the data are 8 bits long).

4. The delay between each byte transmitted is about 1~2us.

So I need to use the configuration port and manage the transaction manually. That's OK for now, but I'm facing several issues. The SPI protocol fixed by the FPGA is "3 bytes address + 1 byte command + N data". The 32-bit data follow the command, and the MSB is present on the next SCLK rising edge. The number of data words depends on the command, so it's possible to read the whole FPGA memory without any interruption.
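As an illustration of the "3 bytes address + 1 byte command" header, here is a minimal sketch that packs it into one 32-bit word so it can be shifted out as a single word. It assumes the word is shifted out MSB first, so the address occupies the upper 24 bits; the helper name `fpga_pack_header` and the example values are hypothetical and do not come from the FPGA specification.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build the 32-bit header "3 address bytes + 1 command
 * byte". Assumes MSB-first shifting, so the address bytes go out first. */
static uint32_t fpga_pack_header(uint32_t address, uint8_t command)
{
    assert(address <= 0x00FFFFFFu);   /* only 24 address bits are available */
    return (address << 8) | command;
}
```

For example, `fpga_pack_header(0x123456u, 0xA5u)` yields `0x123456A5`, which would go out as address bytes 0x12, 0x34, 0x56 followed by command byte 0xA5 under the MSB-first assumption.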

So I'm trying to find a way to read the whole memory without interruption, but it seems that the read sequence is limited by the number of QSPI_SPI_DATA_REG registers (which is limited to 4). That means I can read only four 32-bit words in a row. It's really an issue for me because this read sequence is part of a control task that must be as fast as possible. I can't afford to waste 1us every 4 words. Could you please tell me a way around this limitation?

The next step will be to use the 4 lines to reduce the transmission time, and that's why I need to know how the data are organized on the 4 lines. Can you confirm that the organisation shown in the picture is correct?

Regards,

Sylvain.

0 z over 5 years ago in reply to Sylvain PALMERI

TI__Expert 4015 points

I don't think I understand the interface you are trying to use. From your description, I have a few questions:

1. The address needs to be sent first (in MMP the command is sent first).

Makes sense. I understand why you can't use MMP.

2. Chip select must stay active for the whole transaction (in MMP, the CS goes back inactive after each byte received).

The CS does not go inactive after each byte. I'm not sure where you get that impression. According to the TRM:

"The QSPI supports long transfers through a frame-style sequence. In its generic SPI use mode, a word can be defined up to 128 bits and multiple words can be transferred during a single access. For each word, a device initiator must read or write the new data and then tell the QSPI to continue the current operation. Using this sequence, a maximum of 4096 128-bit words can be transferred in a single SPI read or write operation."

3. The data sent by the FPGA are 32 bits long (in MMP, I think the data are 8 bits long).

See the answer to the above. The data received can take the form of 32-128 bit words, and multiple words can be received in a single transaction. This is true of both MMP and configuration port transactions.

4. The delay between each byte transmit is about 1~2us

Are you saying that the FPGA waits 1~2us between each byte? Or do you mean there is a ~2us delay between each transaction (i.e. between CS deselecting and selecting again)?

To address your other questions:

That means I can read only four 32-bit words in a row. It's really an issue for me because this read sequence is part of a control task that must be as fast as possible. I can't afford to waste 1us every 4 words. Could you please tell me a way around this limitation?

Use the QSPI_SPI_CMD_REG[25:19] to set the length of a word. You will want to maximize this at 128 bits. QSPI_SPI_CMD_REG[11:0] sets the number of words in a frame. Once the QSPI has transmitted or received all the words in the frame, then CS will go high.
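A minimal sketch of composing the command register value from the fields named above. The bit positions [25:19], [18:16] and [11:0] are the ones quoted in this thread; the helper name `qspi_cmd_value` is hypothetical, and the "length minus one" encoding of both length fields is an assumption that should be checked against the TRM register description. The command code itself is passed in as a number, since its encoding has to be taken from the TRM.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions as quoted in the post above. */
#define QSPI_CMD_WLEN_SHIFT 19u   /* QSPI_SPI_CMD_REG[25:19]: word length  */
#define QSPI_CMD_CMD_SHIFT  16u   /* QSPI_SPI_CMD_REG[18:16]: command      */
#define QSPI_CMD_FLEN_SHIFT 0u    /* QSPI_SPI_CMD_REG[11:0]:  frame length */

/* Assumption: both length fields hold "value minus one" (check the TRM). */
static uint32_t qspi_cmd_value(uint32_t word_bits, uint32_t frame_words,
                               uint32_t cmd)
{
    assert(word_bits >= 1u && word_bits <= 128u);      /* 7-bit field  */
    assert(frame_words >= 1u && frame_words <= 4096u); /* 12-bit field */
    assert(cmd <= 7u);                                 /* 3-bit field  */
    return ((word_bits - 1u) << QSPI_CMD_WLEN_SHIFT) |
           (cmd << QSPI_CMD_CMD_SHIFT) |
           ((frame_words - 1u) << QSPI_CMD_FLEN_SHIFT);
}
```

Under these assumptions, a maximum-size frame of 4096 words of 128 bits corresponds to `qspi_cmd_value(128, 4096, cmd)`, that is 4096 x 16 = 65,536 bytes in one chip-select assertion.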

I have a few comments on the picture you sent. The timing diagram above uses all four bits as bidirectional. On the QSPI, only bit D0 is bidirectional. The rest are input only. As the TRM says: "The QSPI supports only dual and quad reads. Dual or quad writes are not supported."

When reading in quad mode, 4-bit nibbles are received into the QSPI_SPI_DATA_REG, and shifted left with each new nibble. So the first nibble received will be the most significant. The order of data received in your image appears to be correct.
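The shift-left behaviour described above can be modelled in a few lines. This is only a software model, not driver code: the function name `quad_assemble_word` is hypothetical, and it assumes each sampled nibble is presented with D3 as its most significant bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the quad-read shift: each 4-bit nibble sampled on D3..D0 is
 * shifted in from the right, so the first nibble ends up most significant. */
static uint32_t quad_assemble_word(const uint8_t *nibbles, size_t count)
{
    uint32_t word = 0;

    assert(count <= 8u);              /* 8 nibbles fill one 32-bit word */
    for (size_t i = 0; i < count; i++) {
        word = (word << 4) | (nibbles[i] & 0x0Fu);
    }
    return word;
}
```

With this model, the FPGA has to drive the most significant nibble of each 32-bit word on the first clock of that word for the value to appear unscrambled in the data register.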

Finally, using the configuration port, I don't think it's possible to write command/address/dummy bytes, switch the D0 line to an input, and then read, all in one transaction. This can be done when coordinated by the MMP, but as we established, you can't use the MMP. If you need CS to remain low between writing address and command data, and reading returned data, it may be possible to bit bang some GPIO as a CS, giving you complete control over the length of the transaction when using the configuration port. For example, a transaction could go like this:

1) Set CS (GPIO) low.

2) Load address, command, Alt, and Dummy cycles into the data registers.

3) Use the QSPI_SPI_CMD_REG to set up a single 48 bit word transaction, and trigger "4 pin write single" command using the register field QSPI_SPI_CMD_REG[18:16].

4) When the "word complete" interrupt is triggered, use the QSPI_SPI_CMD_REG to set up a 128 bit x N word transaction (where N can be up to 4096, so a maximum of 65,536 bytes read).

5) Trigger a 6-pin quad read using QSPI_SPI_CMD_REG[18:16].

a) When the "word complete" interrupt is triggered, read the received data from the data registers, and trigger another read using QSPI_SPI_CMD_REG[18:16].

b) When the "frame complete" interrupt is triggered, the transaction is complete.

6) Set CS (GPIO) high.

However, if the FPGA you are using will allow for CS to go high between the command and data portions of the transaction, you can just use the normal chip select.
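The GPIO-framed sequence in steps 1 to 6 can be sketched as follows. All of the hook functions are hypothetical stand-ins: a real driver would access the GPIO and QSPI registers inside them, while this model only records the order of operations so that the control flow can be checked on a host machine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hardware hooks. A real driver would touch the GPIO and QSPI
 * registers here; this model only records the order of the operations. */
static char trace[64];

static void record(char op)
{
    size_t n = strlen(trace);

    assert(n + 1 < sizeof trace);
    trace[n] = op;
    trace[n + 1] = '\0';
}

static void cs_gpio_low(void)         { record('L'); }  /* step 1  */
static void load_header(void)         { record('D'); }  /* step 2  */
static void start_write_single(void)  { record('W'); }  /* step 3  */
static void wait_word_complete(void)  { record('w'); }  /* word-complete event */
static void start_quad_read(void)     { record('R'); }  /* step 5  */
static void read_data_registers(void) { record('r'); }  /* step 5a */
static void cs_gpio_high(void)        { record('H'); }  /* step 6  */

/* One GPIO-framed transaction: header write, then n_words quad-read words. */
static void fpga_read_block(unsigned n_words)
{
    cs_gpio_low();
    load_header();
    start_write_single();
    wait_word_complete();
    for (unsigned i = 0; i < n_words; i++) {
        start_quad_read();
        wait_word_complete();
        read_data_registers();
    }
    cs_gpio_high();
}
```

Calling `fpga_read_block(2)` on an empty trace records `LDWwRwrRwrH`: CS low, header loaded and written, two quad-read words each followed by a register read, then CS high.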

Regards,

Zack

0 Sylvain PALMERI over 5 years ago in reply to z

Expert 1145 points

Hi Zack,

Thanks a lot for your time spent to help us.

Notes 1 to 3:

I didn't know that we could also use the configuration port registers to configure the transaction in memory-mapped port mode, in order to increase the word and frame length. The QSPI example software in the CSL folder does not show this possibility. But anyway, the protocol specified by the FPGA is too different to use MMP.

Note 4:

Please look at the following scope capture:

Yellow: D0 (MOSI) - The address/read command is sent - I initiate a WRITE_SINGLE. As the command is 32 bits, I can read the three 32-bit data words in a row.

Purple: D1 (MISO) - The first packet corresponds to the three 32-bit data words following the WRITE_SINGLE command. I don't know why it works, as the command sent to the QSPI module is a "write" command, but the QSPI_SPI_DATA_REG, QSPI_SPI_DATA_REG_1 and QSPI_SPI_DATA_REG_2 registers are updated with the data present on the D1 line following the command.

The next two packets are 128-bit read sequences performed following a READ_SINGLE command.

The CS is not shown, but it is active during the whole transaction.

Here is the code implemented:

As you can see, there is a delay of about 2.5us between each frame. As the FPGA is able to transfer 32-bit data continuously without any interruption as long as the CS stays active, I would like to find a way to read the whole FPGA memory without these delays.

QSPI 4-lines

You said "Only D0 is bidirectional..." -> I agree

You said:

If the CS goes back inactive between the write and read sequences, the FPGA will cancel the transaction anyway. So you are probably right, and using the CS as a standard IO would be the best solution. My concern is mostly the time the IO takes to switch from output to input, because if the FPGA switches the line to output first, there can be a conflict that could damage both devices.

Regards,

Sylvain.

0 z over 5 years ago in reply to Sylvain PALMERI

TI__Expert 4015 points

First I want to clarify the terminology being used:

A frame refers to all of the data transferred during a single transaction. Everything between the edges of CS during which data is transferred is a frame. So what you are setting up in lines 295 and 296 is not 3 frames; it is a single frame of 3 words, and each word is 128 bits in length. Each burst of clock activity you see on the scope is a word.

So with that said, I don't think that what you've captured on the scope is what I would recommend doing. The first word has data going out on MOSI, and later data comes in on MISO, all in the same word. If you tried to expand this to a 4-bit read, with a bidirectional D0, it would not work. Instead, you should separate the write (command and address) portion from the read (data) portion into two separate frames. As you do the following, keep the FPGA's CS line active, but monitor the CS line output by the QSPI module, as well as MOSI, MISO, and Clk:

1) The write portion should send only enough clock cycles to get the FPGA ready to send data. So set frameLength=1, and length=48. Then trigger the write command as you do in line 308. No data should come out of the FPGA during this frame. Please see the TRM, section "24.5.4.1.3 SPI Control Interface" for details on how to load the command and address data into the registers.

2) Next perform the read portion. Set frameLength=2 (or 3, or however much data you want to read), and length=128. Do the reads as you do in your example code.

Doing this will prevent an issue of data collision on D0, since you are completely separating the write and read portions of the sequence.

As for the gap between words, I understand the issue now. It's caused by the CPU reading from the data registers before initiating the next word. I'm not sure if the gap is supposed to be that big. This will take me some time to look into, and see if there is a solution.

Best,

Zack

0 Sylvain PALMERI over 5 years ago in reply to z

Expert 1145 points

Hi Zack,

I read the MISO line at the same time as I write the command because it's the only way I found to avoid any delay. In any case, I get this delay if the number of words read exceeds 3.

But you are right, I need to split the transaction to be able to read words with 4 lines. I will modify my code according to your advice.

So I really hope you'll find a fix for the delay.

Regards,

Sylvain.

0 z over 5 years ago in reply to Sylvain PALMERI

TI__Expert 4015 points

We will see what can be done to minimize the gap. Do you have a target for throughput? I have a few questions:

1) What core is this code running on, and at what frequency?

2) Where is pData pointing to?

3) If pData is in DDR memory, what is the DDR speed?

4) Is the location to which pData is pointing cached?

5) What happens inside the function QSPIreadData()?

-Zack

0 z over 5 years ago in reply to z

TI__Expert 4015 points

Also, the way the code is written, it seems that you are waiting for the QSPI bus to be idle before performing the transfer from data registers to pData. If that is true, then you are essentially wasting processor time while the QSPI bus is busy. If you run the functions in the following order:

QSPIsetCommandReg();

QSPIreadData();

QSPIwaitIdle();

Then the internal transfer of data will happen while the QSPI bus is reading. The only modification will be that you have to do the first "Read" command outside of the "for" loop, then wait for it to be idle, before going into the loop. That will make sure that the first occurrence of QSPIreadData() is reading valid data. Hopefully this solves your issue.
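The reordered loop can be sketched with a small host-side simulation. The three function names come from this thread, but their bodies and signatures here are stand-ins, since the original code is not reproduced; `read_words_pipelined`, `fpga_words`, `fpga_index`, `data_reg` and `busy` are hypothetical. The model assumes, as the suggested ordering does, that the previous word is still readable right after the next command has been issued.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated peripheral state (stand-in for the real QSPI registers). */
static const uint32_t *fpga_words;  /* data the simulated FPGA returns */
static size_t fpga_index;
static uint32_t data_reg;           /* models QSPI_SPI_DATA_REG        */
static int busy;

static void QSPIsetCommandReg(void) { busy = 1; }   /* start one word */

static void QSPIwaitIdle(void)                       /* word finished  */
{
    if (busy) {
        data_reg = fpga_words[fpga_index++];
        busy = 0;
    }
}

static void QSPIreadData(uint32_t *dst) { *dst = data_reg; }

static void read_words_pipelined(uint32_t *pData, size_t n)
{
    if (n == 0) {
        return;
    }
    QSPIsetCommandReg();            /* first read, issued outside the loop */
    QSPIwaitIdle();
    for (size_t i = 0; i + 1 < n; i++) {
        QSPIsetCommandReg();        /* start word i+1 ...                  */
        QSPIreadData(&pData[i]);    /* ... and copy word i meanwhile       */
        QSPIwaitIdle();
    }
    QSPIreadData(&pData[n - 1]);    /* last word, after the final wait     */
}
```

The copy of word i overlaps with the bus time of word i+1, so only the first wait and the last copy remain outside the overlap.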

-Zack

0 Sylvain PALMERI over 5 years ago in reply to z

Expert 1145 points

Hi Zack,

Sorry for this late answer, but I was not available till now.

I modified the order as you suggested and it significantly reduces the gap. It's really good progress. Here is a capture of the result:

The first packet is the "Read" command sent on MOSI and the second is the 32-bit data received on MISO. In purple is a GPIO toggled around the function QSPIwaitIdle();

That means that the busy bit stays active for 360ns after the transfer has finished.

Do you have any idea why the bit stays high?

Regards,

Sylvain

0 z over 5 years ago in reply to Sylvain PALMERI

TI__Expert 4015 points

Sylvain,

We're looking at the data line in this scope shot, but what is happening on the clock line during this 360ns period? And CS? If there is a similar 360ns gap between the clock stopping and CS going inactive, then I can't account for that. But if CS more closely matches the duration of clock activity, that tells me there might be a delay in updating the busy flag in the memory-mapped register. Switching to interrupts instead of register polling may help reduce this unnecessary delay in your code.

-Zack

0 Sylvain PALMERI over 5 years ago in reply to z

Expert 1145 points

Hi Zack,

I think this time comes from the "while" loop. Reading a register takes time as well, so I think we need to deal with this.

I have a last question: what is the state of the D0 line (input or output) when the CS is inactive? Between quad read transactions, it's not really clear in the TRM whether the line stays a high-impedance input.

Regards,

Sylvain

0 z over 5 years ago in reply to Sylvain PALMERI

TI__Expert 4015 points

D0 will become high-impedance after a write transaction, according to table 7-46 and figure 7-39 in the Data Manual.

Before, during, and after Quad Read transactions, D0 will be a high-impedance input.
