AM4376: Ethernet transmit DMA error

Clifford Geschke

Part Number: AM4376

I am writing a driver for the AM4376 Ethernet sub-system. My DMA engine stops sometimes with register CPSW_DMASTS == 0x00200000 = TX_HOST_ERR_CODE == 0010 = "Ownership bit not set in SOP buffer". I understand what this means and I am certain that I am setting the SOP and Ownership bits in the appropriate TX buffer descriptor. I am wondering if there are some timing issues between writing the descriptor and starting the DMA engine. Also are there any special timing issues recovering from a misqueued packet? Is there anything I can do or wait for to make sure the descriptor data I push out is actually visible to the DMA engine? The descriptors are in a part of memory that is not cached, so that should not be the problem.
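On the question of making descriptor writes visible before the DMA engine is started, a common pattern is to write the flags word last and place a barrier between the descriptor stores and the head descriptor pointer write. The following is a minimal sketch under stated assumptions, not a verified fix for this problem: the descriptor layout, the flag bit positions, the 11-bit length mask, and the names `tx_desc`, `dma_publish_barrier` and `tx_start_single` are all illustrative and should be checked against the TRM and the RTEMS port in use.

```c
#include <stdint.h>
#include <stdatomic.h>

/* Flag bits in descriptor word 3 (assumed positions; confirm in the TRM). */
#define DESC_SOP (1u << 31)
#define DESC_EOP (1u << 30)
#define DESC_OWN (1u << 29)
#define DESC_LEN_MASK 0x7FFu /* assumed 11-bit length field */

/* Hypothetical view of one CPDMA buffer descriptor in uncached memory. */
struct tx_desc {
    volatile uint32_t next;      /* bus address of next descriptor, 0 = end */
    volatile uint32_t buf;       /* bus address of the data buffer */
    volatile uint32_t off_len;   /* buffer offset and buffer length */
    volatile uint32_t flags_len; /* flags and packet length */
};

/* Barrier issued before a descriptor is handed to the DMA engine. */
static inline void dma_publish_barrier(void)
{
#if defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
    __asm__ volatile("dsb sy" ::: "memory");
#else
    atomic_thread_fence(memory_order_seq_cst); /* host-build stand-in */
#endif
}

/* Fill a single-buffer packet descriptor, then start the queue. */
static void tx_start_single(struct tx_desc *d, uint32_t desc_bus_addr,
                            uint32_t buf_bus_addr, uint32_t len,
                            volatile uint32_t *tx_hdp)
{
    len &= DESC_LEN_MASK;
    d->next = 0;
    d->buf = buf_bus_addr;
    d->off_len = len;
    /* Flags are written last so OWN never appears on a half-built descriptor. */
    d->flags_len = DESC_SOP | DESC_EOP | DESC_OWN | len;
    dma_publish_barrier();   /* descriptor stores complete before the start */
    *tx_hdp = desc_bus_addr; /* nonzero head pointer starts transmission */
}
```

Here `tx_hdp` would point at the channel's head descriptor pointer register (CPSW_STATERAM_TX0_HDP in this thread). Even with uncached descriptor memory, the stores can still sit in a write buffer, which is why the barrier comes before the register write.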

Any advice would be appreciated.

Thanks

Cliff

over 6 years ago

0 RonB over 6 years ago

TI__Mastermind 30526 points

Cliff,

Thanks for your post. Could you please confirm which software package you are using and what version?

Thank you.

0 Clifford Geschke over 6 years ago in reply to RonB

Prodigy 160 points

I am using Code Composer Studio 7.6 on Linux. I am developing a custom driver written in C for the open source RTEMS operating system.

This is not a software package issue. It is a question about how the CPSW DMA and ARM processor interact.

Thanks,

Cliff

0 Ming Wei over 6 years ago in reply to Clifford Geschke

TI__Guru 55045 points

Hi Clifford,

Are you developing your Ethernet driver for AM437x from scratch? TI provides a Processor SDK for RTOS, which includes the Network Development Kit (NDK), EDMA LLD, and Ethernet driver. Of course, TI RTOS and your RTEMS are different, but the porting work can be easily done through changes to the OS Abstraction Layer (OSAL), which is also part of the Processor SDK for RTOS.

Here is the link to the latest release (6.00.00):

http://software-dl.ti.com/processor-sdk-rtos/esd/AM437X/latest/index_FDS.html

Best regards,

Ming

0 Clifford Geschke over 6 years ago in reply to Ming Wei

Prodigy 160 points

We are porting over an RTEMS driver for a different processor, so essentially it is from scratch.

I am trying to understand the timing between the CPDMA and the host CPU during transmit. I have seen situations where the CPDMA appears to be halted (CPSW_STATERAM_TX0_HDP == 0) after a misqueue but the OWN flag on the next descriptor spontaneously changes from 1 to 0. I am guessing there are some internal timing issues with when the CPDMA changes the various flags and registers (OWN, EOQ, and CPSW_STATERAM_TX0_HDP). I would have thought that once CPSW_STATERAM_TX0_HDP was set to 0, all Tx DMA operations would have stopped. It may also be due to the bus as to when the host sees these changes. I am trying to understand how all this works, but there does not appear to be detailed information about the timing. I have experimentally determined that a 1 usec delay during the misqueue handling after seeing EOQ and CPSW_STATERAM_TX0_HDP == 0 fixes things, but I would rather wait for something specific to minimize the wait time.

0 Ming Wei over 6 years ago in reply to Clifford Geschke

TI__Guru 55045 points

Hi Clifford,

According to the AM437x TRM (spruhl7h), the CPSW DMA transmit operation works as follows:

1. Host constructs the transmit queues in memory and writes the appropriate TX DMA state head descriptor pointers. (Ownership Bit of SOP to 1, EOQ of EOP to 0)

2. The port begins TX packet transmission when the TX DMA state head descriptor pointer has been set to a nonzero value (with correct SOP and EOP settings).

3. Once a packet transmission (from SOP to EOP) is done, the port will set the ownership bit of SOP to zero and write the packet's last buffer descriptor address to the queue's TX DMA State Completion Pointer (which will generate an interrupt to the host). If the packet transmitted is the last one in the queue, then the port will also set the EOQ in the EOP to 1, clear the Ownership bit of the SOP, and set the appropriate DMA state head descriptor pointer to zero.

4. When the host gets the interrupt from the port, it should release all the buffers associated with the packet (from SOP to EOP) if the Ownership bit of the SOP is set to 0. Once the host has processed all the transmitted packets, it writes the address of the last buffer descriptor to the queue’s associated Tx Completion Pointer to de-assert the interrupt (the value has to match the address written by the port).

5. A misqueued packet condition may occur when the host adds a packet to a queue for transmission as the port finishes transmitting the previous last packet in the queue. The misqueued packet is detected by the host when queue processing detects a cleared Ownership bit in the SOP buffer descriptor, a set End of Queue bit in the EOP buffer descriptor, and a nonzero Next Descriptor Pointer in the EOP buffer descriptor.
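The checks in steps 4 and 5 can be sketched as two small predicates. This is only an illustration of the conditions as worded above: the flag bit positions and the `tx_desc` layout are assumptions to be confirmed against the TRM, and `tx_packet_done` and `tx_misqueued` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits in descriptor word 3 (assumed positions; confirm in the TRM). */
#define DESC_SOP (1u << 31)
#define DESC_EOP (1u << 30)
#define DESC_OWN (1u << 29)
#define DESC_EOQ (1u << 28)

/* Hypothetical view of one CPDMA buffer descriptor. */
struct tx_desc {
    volatile uint32_t next;      /* bus address of next descriptor, 0 = end */
    volatile uint32_t buf;
    volatile uint32_t off_len;
    volatile uint32_t flags_len; /* flags and packet length */
};

/* Step 4: the port has released the packet once OWN is clear in its SOP. */
static bool tx_packet_done(const struct tx_desc *sop)
{
    return (sop->flags_len & DESC_OWN) == 0;
}

/* Step 5: cleared OWN in SOP, set EOQ in EOP, nonzero next pointer in EOP. */
static bool tx_misqueued(const struct tx_desc *sop, const struct tx_desc *eop)
{
    bool sop_released = (sop->flags_len & DESC_OWN) == 0;
    bool end_of_queue = (eop->flags_len & DESC_EOQ) != 0;
    bool has_successor = eop->next != 0;
    return sop_released && end_of_queue && has_successor;
}
```

For a single-buffer packet, `sop` and `eop` are the same descriptor and both predicates still apply.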

From what you described, it looks like the misqueued packet condition has happened and has not been handled properly. Please check the TRM (http://www.ti.com/lit/ug/spruhl7h/spruhl7h.pdf) section 15.4.1 Transmit Operation for details.

Best regards,

Ming

0 Clifford Geschke over 6 years ago in reply to Ming Wei

Prodigy 160 points

I am very familiar with the TRM and the procedure described in it.

The problem is during misqueue handling.

I see the SOH and EOQ bit set and the next descriptor pointer is non-zero, and CPSW_STATERAM_TX0_HDP == 0 indicating the Tx DMA is idle.

But apparently it is not idle.

The problem is in the next descriptor.

In theory, the next descriptor has both SOP and OWN set because it is not processed. This is required to start the DMA again using this next descriptor.

But when I look, I initially see SOP and OWN set. But a short time later, OWN is cleared. Even though CPSW_STATERAM_TX0_HDP remains zero!

How can that be?

It is not my driver that is clearing OWN.

My work around is to wait for 1 usec, check OWN, and set it again before restarting the DMA. This is unsettling to me and I am trying to understand what is going on.

If I blindly follow the instructions in the TRM, I get a "SOP without OWN tx" error occasionally, which stops the entire DMA stream.
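For reference, the error shows up in the status value quoted at the start of this thread: CPSW_DMASTS == 0x00200000 gives TX_HOST_ERR_CODE == 0010. A small sketch of that decode follows. The field position (bits 23:20) is inferred from those two numbers, only code 2 is named in this thread, and the helper names are hypothetical.

```c
#include <stdint.h>

/* TX_HOST_ERR_CODE field, inferred from 0x00200000 decoding to code 0010b. */
#define DMASTS_TX_HOST_ERR_SHIFT 20u
#define DMASTS_TX_HOST_ERR_MASK  0xFu

/* Extract the 4-bit transmit host error code from a DMA status value. */
static unsigned dmasts_tx_host_err(uint32_t dmasts)
{
    return (unsigned)((dmasts >> DMASTS_TX_HOST_ERR_SHIFT) & DMASTS_TX_HOST_ERR_MASK);
}

/* Name the codes discussed here; look up the others in the TRM. */
static const char *tx_host_err_name(unsigned code)
{
    switch (code) {
    case 0u: return "no error";
    case 2u: return "ownership bit not set in SOP buffer";
    default: return "other (see TRM)";
    }
}
```

Logging the decoded code together with the descriptor address at the moment the DMA halts makes it easier to correlate the error with a specific misqueue.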

Thanks for looking into this,

Cliff

0 Ming Wei over 6 years ago in reply to Clifford Geschke

TI__Guru 55045 points

Hi Clifford,

You said:

The problem is during misqueue handling.

I see the SOH and EOQ bit set and the next descriptor pointer is non-zero, and CPSW_STATERAM_TX0_HDP == 0 indicating the Tx DMA is idle.

But apparently it is not idle.

I assume you meant "I see the OWN bit of SOP and the EOQ bit set to nonzero". In this case, I do not understand why the OWN bit of SOP is not cleared. Since the EOQ has been set and the HDP has been cleared, the TX queue is empty.

Can you clarify when you see both "OWN bit of SOP and EOQ have been set to nonzero"? During the TX completion interrupt processing, or when the host is adding a new TX packet?

If you send one TX packet at a time (waiting until the previous TX packet transmission completes before sending the next one), do you still see the incorrect SOP and EOQ?

Here are a few more questions:

1. How many descriptors/buffers are in each of your TX packets? (Are SOP and EOP the same or not?)

2. How many TX channels are you currently using?

3. Are there any RX channels in use at the same time?

Best regards,

Ming

0 Clifford Geschke over 6 years ago in reply to Ming Wei

Prodigy 160 points

Hello Ming,

More details:

In the current (just completed packet):

SOP is set, OWN is clear, EOP is set, EOQ is set, next descriptor pointer in EOP is non-zero. CPSW_STATERAM_TX0_HDP == 0

In the next descriptor (pointed to by current):

At first SOP is set, OWN is set.

After a short time (< 1 usec) OWN is cleared but not by software.

I do not see this with single packet transmissions. This is an intermittent problem that occurs maybe 1 or 2 times per 1000 packets. I do not see very many misqueues. And this does not happen every misqueue.

There is no handling at interrupt level. The tx is serviced by a system thread that is awakened by the interrupt or by posting a new output or by a periodic clock. It can be checking for completed packets at any time. New Tx packets are added and sent ones removed by the same thread, so there is no overlap of the two operations. Only DMA is operating asynchronously.

1. There are typically 1 to 3 descriptors per TX packet. So sometimes SOP and EOP are the same descriptor.

2. I am using only 1 Tx channel.

3. There is 1 Rx channel active at the same time and full-duplex data is flowing.

Best regards,

Cliff

0 Ming Wei over 6 years ago in reply to Clifford Geschke

TI__Guru 55045 points

Hi Cliff,

Thanks a lot for the detailed description.

From your previous post, it looks like the issue was caused by the misqueue:

1. The port reaches the end of the TX queue and sets the OWN bit of SOP and the EOQ of the EOP before the host adds a new packet to the TX queue. The port moves the current completed packet into the Tx DMA State Completion Pointer.

2. The host detects the misqueue condition and restarts the DMA by writing the address of the first buffer descriptor of the next packet (taken from the next descriptor pointer of the current completed packet) to the Tx queue head descriptor pointer.

3. The port sets the OWN bit of SOP (and also the EOQ) after the TX transmission completed (in 1us).

If the above sequence is correct, then the IP is doing what was told to do. No issue then.

What you described before was that you have to set the OWN of SOP from 0 to 1 after 1us and re-start the DMA to keep the TX going. Is it possible this case is caused by two re-starts of the DMA? The re-start of the DMA in step 2 should be enough; the host should just wait for the interrupt to come. Setting the OWN of SOP from 0 to 1 and re-starting the DMA (your workaround) seems redundant.

The bottom line is that the re-start of DMA should only be done once and it should have been done in the mis-queue handling by the host when the new packet is added and the mis-queue is happening.

One question:

Where is the mis-queue detected and handled in your program, and what steps are done during the process?

Best regards,

Ming

0 Clifford Geschke over 6 years ago in reply to Ming Wei

Prodigy 160 points

Sorry, you have it wrong.

1. The port reaches the end of the TX queue and clears the OWN bit of SOP etc.

2. The host detects the misqueue and does not restart the DMA. It checks that the DMA is idle and checks the OWN bit of the next descriptor, which is the SOP for the next packet. At first this OWN bit is set as expected. But within 1 usec, the OWN bit of the next descriptor is cleared by someone. Probably the port. At this point the DMA has not been restarted at all.

3. If the host attempts to follow the normal misqueue restart by writing the next descriptor address to the Tx queue head descriptor pointer, the DMA will see SOP set and OWN not set. This is an error that stops the DMA.

The work around is to delay 1 usec and explicitly set the OWN bit in the next descriptor before writing the next descriptor address to the Tx queue head descriptor pointer.
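The workaround as described can be sketched as below. This only illustrates the sequence stated in this post; it does not explain why OWN changes, and it is not a recommended fix. The flag positions, the `tx_desc` layout, the function name, and the optional `delay_us` callback are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdatomic.h>

/* Flag bits in descriptor word 3 (assumed positions; confirm in the TRM). */
#define DESC_SOP (1u << 31)
#define DESC_EOP (1u << 30)
#define DESC_OWN (1u << 29)

/* Hypothetical view of one CPDMA buffer descriptor. */
struct tx_desc {
    volatile uint32_t next;
    volatile uint32_t buf;
    volatile uint32_t off_len;
    volatile uint32_t flags_len; /* flags and packet length */
};

/* Platform-supplied microsecond delay; may be null in host-side tests. */
typedef void (*delay_us_fn)(unsigned usec);

/*
 * Misqueue restart with the workaround from this post: wait 1 usec,
 * re-arm OWN on the next SOP if it was cleared, then write the head
 * descriptor pointer. Returns true if OWN had to be set again.
 */
static bool tx_restart_after_misqueue(struct tx_desc *next_sop,
                                      uint32_t next_bus_addr,
                                      volatile uint32_t *tx_hdp,
                                      delay_us_fn delay_us)
{
    bool rearmed = false;

    if (delay_us) {
        delay_us(1u);
    }
    uint32_t flags = next_sop->flags_len;
    if ((flags & DESC_OWN) == 0) {
        next_sop->flags_len = flags | DESC_OWN;
        rearmed = true;
    }
    /* Stand-in for the platform barrier (for example DSB on ARMv7). */
    atomic_thread_fence(memory_order_seq_cst);
    *tx_hdp = next_bus_addr;
    return rearmed;
}
```

Counting how often the function returns true gives a direct measure of how frequently the unexpected OWN change occurs, which may help when narrowing down the cause.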

The problem appears to be that the port is clearing the SOP in the next descriptor even while it is sitting idle with the Tx queue head descriptor pointer == 0.

Under what conditions could the SOP bit be cleared when the DMA is idle?

Cliff

0 Clifford Geschke over 6 years ago in reply to Clifford Geschke

Prodigy 160 points

Sorry for confusing things. The last two sentences should be:

The problem appears to be that the port is clearing the OWN in the next descriptor even while it is sitting idle with the Tx queue head descriptor pointer == 0.

Under what conditions could the OWN bit be cleared when the DMA is idle?

Cliff

0 Ming Wei over 6 years ago in reply to Clifford Geschke

TI__Guru 55045 points

Hi Cliff,

In your previous post, you said:

In the current (just completed packet):

SOP is set, OWN is clear, EOP is set, EOQ is set, next descriptor pointer in EOP is non-zero. CPSW_STATERAM_TX0_HDP == 0

In the next descriptor (pointed to by current):

At first SOP is set, OWN is set.

After a short time (< 1 usec) OWN is cleared but not by software.

The only way all of the above can happen is the following sequence:

1. The port finishes TX transmission of the current packet, clears the OWN of the SOP, sets the EOQ of the EOP, and issues the interrupt, but has not yet cleared the TX DMA state head descriptor pointer.

2. The host writes the address of the new packet to the next pointer of the current packet and also to the TX DMA state head descriptor pointer to trigger the DMA for the new packet.

3. The port then clears the TX DMA state head descriptor pointer.

Adding a condition (has to be zero) to the host write to the TX DMA state head descriptor pointer should eliminate this condition.
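One reading of this suggestion is to write the head descriptor pointer only after it has been observed to be zero, with a bounded number of polls so the driver cannot spin forever. A minimal sketch, assuming a hypothetical helper name and an `attempts` parameter that are not part of any TI API:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Write desc_bus_addr to the head descriptor pointer only if the register
 * reads back as zero. Polls up to `attempts` times (at least 1 is needed).
 * Returns true if the write was performed, false if the register never
 * read as zero.
 */
static bool tx_hdp_write_if_zero(volatile uint32_t *tx_hdp,
                                 uint32_t desc_bus_addr,
                                 unsigned attempts)
{
    for (unsigned i = 0; i < attempts; i++) {
        if (*tx_hdp == 0u) {
            *tx_hdp = desc_bus_addr;
            return true;
        }
    }
    return false;
}
```

A false return leaves the register untouched, so the caller can retry on the next pass of the transmit service thread instead of forcing the write.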

Ming
