TCI6486CSL: TMS320TCI6486 DSP EMAC Rx DMA SOF Overrun Problem

Shalini Shukla

Part Number: TCI6486CSL

Hi,

On the TMS320TCI6486 DSP, there is a problem where, after running for some time, the EMAC0 Receive Channel DMA Head Descriptor Pointer Register becomes null. From then on every packet is dropped by the EMAC and the DMA Rx SOF Overrun counter keeps incrementing. I suspect a bug in the EMAC module that leads to a buffer overflow in the EMAC driver. Only one EMAC port is in use. Any details or hints about the problem would be helpful.

Thanks!

over 6 years ago

0 Cvetolin Shulev-XID over 6 years ago

TI__Guru 65405 points

Hi Shalini,

Do you have some console log related to the issue and could you give details about software release running on your board?

Regards,
Tsvetolin Shulev

0 Shalini Shukla over 6 years ago

Prodigy 240 points

Hi Cvetolin,

Thanks for the reply.

I don't have console logs. I gathered all the information here by reading the EMAC control and statistics registers from my DSP application, which runs the revision below.

th10, Rel:12.24.500.0, Id:14118 Jan:20:2012 02:27:14

What kind of console logs can be gathered here? Also, do you know whether we can build the CSL library and flash it onto the TMS320TCI6486?

Thank you!

0 Sahin Okur over 6 years ago in reply to Shalini Shukla

TI__Mastermind 27355 points

Hello,

Snippet from TMS320TCI6486 DSP EMAC/MDIO Module Reference Guide:

Receive channel n DMA Head Descriptor pointer. Writing a receive DMA buffer descriptor address to this location allows receive DMA operations in the selected channel when a channel frame is received. Writing to these locations when they are nonzero is an error (except at reset). Host software must initialize these locations to zero on reset.

Can you please confirm you are following the guidance in the highlighted text above?

0 Shalini Shukla over 6 years ago in reply to Sahin Okur

Prodigy 240 points

Hi Sahin,

Thank you for your reply.
Yes, these locations are initialized to zero on reset. However, the issue happens when the DSP has been running for a long time under medium traffic.
As I mentioned, the DSP EMAC0 Receive Channel DMA Head Descriptor Pointer Register becomes null. I was able to work around this by writing a new receive DMA buffer descriptor address to the register, which resolves the issue.

But there is one more scenario in which the 'DMA Rx SOF Overrun' counter keeps increasing: the EMAC still discards packets at the firmware level even though the DMA Head Descriptor Pointer Register is not null. So I suspect the issue is in the EMAC firmware, and I don't have the EMAC firmware code.

Also, the document mentions three factors that can cause the EMAC overrun issue:
1) MAC address match due to promiscuous mode
2) oversize or undersize frames received
3) cell FIFO full or no DMA buffer available at SOF

I have checked that the first two factors are not causing the overrun: promiscuous mode is disabled and no oversized or undersized frames were received.

Do you have any insight into how we can confirm the root cause of this issue?

Regards,
Shalini.

0 Mukul Bhatnagar over 6 years ago in reply to Shalini Shukla

TI__Guru* 85305 points

Hi Shalini

>>I don't have EMAC firmware code

Can you clarify this? We have limited familiarity with this device and its software. It is our understanding that your company maintains most of the software for this product for your end customer. So when you say you don't have the EMAC firmware code - I don't understand.

It would also be good to have additional background on the failures: are these field failures? Did something change over time on these products that is causing the failures?

Regards

Mukul

0 Mukul Bhatnagar over 6 years ago in reply to Mukul Bhatnagar

TI__Guru* 85305 points

Please also confirm whether you are seeing any MOF overruns, or SOF overruns only.

0 Shalini Shukla over 6 years ago in reply to Mukul Bhatnagar

Prodigy 240 points

Hi Mukul,

Thanks for the update.
I can confirm from the register readings that MOF overruns are not happening; only SOF overruns are increasing, which puts all six DSP cores into a problem state. However, I could not find the place in our code where these register values are incremented. As far as I know, nothing much has changed on this product that could cause this failure, and the issue is happening at various sites.

Please let me know if you have any other queries.

Regards,
Shalini.

0 Mukul Bhatnagar over 6 years ago in reply to Shalini Shukla

TI__Guru* 85305 points

Hi Shalini

Thanks for confirming that you do not see MOF errors. Unfortunately the support team has very limited knowledge on this and the associated software (which as per our understanding is maintained by your organization).

The only suggestion I got from some internal discussion is that this could be a software timing issue: an end of queue (EOQ) condition occurs on a receive channel and software is unable to add more resources to RXnHDP before the next packet is received, resulting in RXDMAOVERRUNS.

We recommend that you review how your software/firmware manages the receive descriptor linked list, to ensure free descriptors/buffers are always available to the receive channel.

We will not be able to provide much guidance beyond this for this device family- so I am going to mark this thread closed.

Sorry for the limited support & guidance on this.

Regards

Mukul

0 Denis Beaudoin over 6 years ago in reply to Mukul Bhatnagar

TI__Expert 3075 points

The receiver drops frames if there are no resources available.

This can be caused by slow driver software or insufficient DMA bandwidth.
Cache handling also plays a part in the HW-to-SW and SW-to-HW handoff.

The basic receive interrupt processing is as below.

When a packet reception is complete, the CPGMAC issues an interrupt to the host by writing the packet’s last buffer descriptor address to the appropriate channel queue’s RX completion pointer.

The interrupt is generated by the write, regardless of the value written.

Upon interrupt reception, the host processes one or more packets from the queue and then acknowledges one or more interrupt(s) by writing the address of the last buffer descriptor processed to the queue’s associated Rx Completion Pointer.

It is important to note that if the processor did not process any buffers in this interrupt, it should write the last buffer processed in the previous receive interrupt.

The reason for this is that if the CPU reads the descriptor before the CPGMAC write has reached memory, the CPU could read old, stale data.

If the host written buffer address to the Rx Completion Pointer value is different from the buffer address written by the HW port, then the level interrupt remains asserted (which means that the CPGMAC has received more packets than the host has processed interrupts for).

If the host written buffer address to the Rx Completion Pointer value is equal to the HW port written value (which means that the host has processed all packets that the CPGMAC has received ), then the level interrupt is de-asserted.
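The acknowledge rule above can be modeled in a few lines of C. This is only a host-side simulation under stated assumptions, not driver code: `rx_cp_model_t`, `hw_complete`, `host_ack`, and `irq_asserted` are made-up names, and on the device the completion pointer is a memory-mapped register rather than a struct.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one channel's Rx Completion Pointer. */
typedef struct {
    uint32_t hw_written;   /* last descriptor address written by the port */
    uint32_t host_written; /* last descriptor address written by the host */
} rx_cp_model_t;

/* The port finished a packet whose last descriptor lives at desc_addr. */
void hw_complete(rx_cp_model_t *cp, uint32_t desc_addr)
{
    cp->hw_written = desc_addr;
}

/* The host acknowledges by writing the last descriptor it processed. */
void host_ack(rx_cp_model_t *cp, uint32_t desc_addr)
{
    cp->host_written = desc_addr;
}

/* The level interrupt stays asserted while the two values differ. */
bool irq_asserted(const rx_cp_model_t *cp)
{
    return cp->hw_written != cp->host_written;
}
```

In this model, acknowledging an older descriptor than the one the port last wrote leaves the interrupt asserted, which is the "more packets received than processed" case.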

Now, if the CPGMAC DMA read the descriptor and the next field was zero, it sets the End Of Queue (EOQ) bit. This bit is valid only on EOP.

0 - The RX queue has more buffers available for reception.
1 - The descriptor buffer is the last buffer in the last packet in the queue.

If the EOQ bit is set during the processing of the receive packets AND the current buffer's next pointer is non-zero, that pointer should be written to the Head Descriptor Pointer register for that channel. This will cause the receiver to start receiving again.

If the next pointer is zero, the queue count should be set to zero so that when descriptors are added to that channel, the Head Descriptor Pointer register for that channel is written with the buffer address.
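The two EOQ cases can be sketched as follows. This is a simplified illustration, not the actual driver: the `rx_desc_t` and `rx_chan_t` types and the function name are hypothetical, the `hdp` field merely stands in for the channel's Head Descriptor Pointer register, and the flag values are taken from the status words quoted elsewhere in this thread.

```c
#include <stddef.h>
#include <stdint.h>

#define RX_FLAG_EOP 0x40000000u  /* assumed from the status words in this thread */
#define RX_FLAG_EOQ 0x10000000u

typedef struct rx_desc {
    struct rx_desc *next;
    uint32_t status_len;   /* flags in the upper bits, length in the lower bits */
} rx_desc_t;

typedef struct {
    rx_desc_t *hdp;        /* stands in for the channel's RXnHDP register */
    unsigned queued;       /* driver's count of descriptors owned by the port */
} rx_chan_t;

/* Call for each completed EOP descriptor. Returns 1 if the channel was restarted. */
int rx_handle_eoq(rx_chan_t *ch, const rx_desc_t *eop)
{
    const uint32_t both = RX_FLAG_EOP | RX_FLAG_EOQ;

    if ((eop->status_len & both) != both)
        return 0;              /* the port still has descriptors to use */

    if (eop->next != NULL) {   /* more were chained after the port saw next == 0 */
        ch->hdp = eop->next;   /* restart reception on this channel */
        return 1;
    }

    ch->queued = 0;            /* truly empty: the next enqueue must write HDP */
    return 0;
}
```

The design point is that the host, not the hardware, resolves the race between "port read a zero next pointer" and "software chained a new descriptor".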

If the number of buffer descriptors is too small, the host may get a single interrupt to service all of them. Reception restarts only when the host returns them, and any receive packets that arrive in the meantime are dropped as SOF overruns.

If the descriptors are in cacheable memory, the points below become important.

Cache can be managed in two ways: invalidating or pre-invalidating.

The invalidating method means invalidating the descriptor just before reading it.

The pre-invalidating method means invalidating the descriptor after an access, so that the cache has time to complete the invalidate before a future access.

That is, the HW reads what is in memory, so the CPU needs to write back and invalidate the descriptor to ensure the CPGMAC sees the expected data. And the CPU needs to ensure the cache has been invalidated before it references the descriptor in memory.

In other words, when adding buffers you should write back and invalidate the current buffer descriptor before writing through the previous descriptor's next pointer.

The buffers read by the CPU should either be pre-invalidated or invalidated prior to reading the HW state.

Care is needed when a descriptor is read in one interrupt and read again in a later interrupt.

0 Shalini Shukla over 6 years ago in reply to Denis Beaudoin

Prodigy 240 points

Hi Denis, Mukul,

Thanks for giving us this information. I'm still not clear about the SOP and EOP bits being set at the end of the list, so please correct me if I'm wrong anywhere.

The EMAC patches the SOP descriptor of the corresponding packet and clears the OWNER flag as packets are processed. This means that the EMAC is finished processing all descriptors up to and including the first with the EOP flag set. This indicates that you have reached the end of the packet. This may only be one descriptor with both the SOP and EOP flags set.
- So, it means this descriptor will have both the SOP and EOP? What if they are not in the same descriptor? Is there a chance that they are not in the same descriptor at the end of list?

The software application must always examine the Flags field of all EOP packets, looking for a special flag called end of queue (EOQ). The EOQ flag is set by the EMAC on the last descriptor of a packet when the descriptor’s next pointer is NULL, allowing the EMAC to indicate to the software application that it has reached the end of the list.
- If the next pointer is zero, the queue count should be set to zero so that when descriptors are added to that channel, the Head Descriptor Pointer register for that channel is written with the buffer address. (I have tried this and it works as expected. In my solution I did not look for the EOQ bit; instead I compared the Head Descriptor Pointer register to null directly, reset the queue count and other parameters, and rewrote the register with the current buffer address.)

If the EOQ bit is set during the processing of the receive packets AND the current buffer's next pointer is non-zero, that pointer should be written to the Head Descriptor Pointer register for that channel. This will cause the receiver to start receiving again.
- Here I checked when EOQ bit set is happening, I am just updating the Head Descriptor Pointer register with current buffer's non-zero next pointer. But, the EOQ bit set seems not happening. I'm not sure whether is it because of SOP and EOP not being set in the same descriptor.

Also, my another doubt is when we are updating the Head Descriptor Pointer register, it will be updated for one channel(there is one DMA channel per core). Will the reupdation on all the cores will happen for the EOQ bit, how this will be managed?

Thanks,
Shalini

0 Denis Beaudoin over 6 years ago in reply to Shalini Shukla

TI__Expert 3075 points

“-So, it means this descriptor will have both the SOP and EOP? What if they are not in the same descriptor? Is there a chance that they are not in the same descriptor at the end of list?”

Descriptors associate a buffer of a particular size for the CPGMAC to use. That is the CPGMAC can receive a packet up to MAXPACKET size into the associated buffers.

The CPGMAC only updates the descriptors during end-of-frame processing. If the frame spans multiple descriptors, only the start-of-frame and end-of-frame descriptors are modified; all the intermediate buffer descriptors are left untouched. That is, in the start-of-frame descriptor only SOP is set and all other status fields are cleared. In the end-of-frame descriptor the status and length are updated: EOP, EOQ (if the next pointer was zero), the error bits, and the length.

If the entire frame fits within one buffer, then SOP is set together with EOP, EOQ (if the next pointer was zero), the error bits, and the length.

If all your buffers are greater than MAXPACKET programmed, then you will always get the entire frame within a single descriptor.

Frame Example 1: Let’s say you had descriptors whose buffers are 512 bytes long. A 1536-byte frame would require three of these buffers before it could be received. The first buffer would hold the first 512 bytes, the second buffer the next 512 bytes, and the last the remaining 512 bytes. When the last byte has been written to the third buffer, the first descriptor's status is updated with SOP and the third is updated with EOP/EOQ/length.

In this example, the first buffer would have 0x80000200 in the status/length field (only SOP is set). The second buffer would be as the SW had written it; I would expect 0x20000200 in the status/length field (only the ownership bit set). The last buffer would have 0x40000200 in the status/length field (only EOP is set) if there were more buffers, and 0x50000200 (EOP and EOQ set) if there were no more buffers.

In the event that only one or two buffers exist, you would get a MOF overrun and the buffer(s) would be reused for the next frame. If you then receive a frame that fits within the buffer(s), the status/length fields are updated and the buffers are given to the host.

So if the entire frame fits within a buffer then the SOP/EOP will be set, the EOQ will be set if the next field was read as zero and the length field of the buffer will be updated with the length of the frame.

If the entire frame fits within the buffer then the status length field would be 0xc0000600 or 0xd0000600 if there were no more buffers.
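The example words above can be decoded with a small helper. This is a sketch: the bit values are inferred from the numbers in this post (0x80000000 for SOP, 0x40000000 for EOP, 0x20000000 for ownership, 0x10000000 for EOQ), the 16-bit length mask is an assumption, and `desc_status_t` and `decode_status` are illustrative names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values inferred from the example words in this post. */
#define DESC_SOP      0x80000000u
#define DESC_EOP      0x40000000u
#define DESC_OWNER    0x20000000u
#define DESC_EOQ      0x10000000u
#define DESC_LEN_MASK 0x0000FFFFu  /* assumed width of the length field */

typedef struct {
    bool sop, eop, owner, eoq;
    uint32_t length;
} desc_status_t;

/* Split a status/length word into its flags and length. */
desc_status_t decode_status(uint32_t word)
{
    desc_status_t s;
    s.sop    = (word & DESC_SOP) != 0;
    s.eop    = (word & DESC_EOP) != 0;
    s.owner  = (word & DESC_OWNER) != 0;
    s.eoq    = (word & DESC_EOQ) != 0;
    s.length = word & DESC_LEN_MASK;
    return s;
}
```

For instance, `decode_status(0xd0000600u)` reports SOP, EOP, and EOQ set with a length of 1536, matching the single-buffer, end-of-queue case.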

“-Here I checked when EOQ bit set is happening, I am just updating the Head Descriptor Pointer register with current buffer's non-zero next pointer. But, the EOQ bit set seems not happening. I'm not sure whether is it because of SOP and EOP not being set in the same descriptor. “

Are you saying you never see the EOQ bit set in an EOP buffer descriptor?

If that is the case, please look at the MacStatus - MAC Status Register (0x164) for the following field:

RX Host Error Code: This field is set to indicate CPGMAC detected RX DMA related host errors. The host should read this field after a HOST_ERR_INT to determine the error. Host error Interrupts require hardware reset in order to recover.

“Also, my another doubt is when we are updating the Head Descriptor Pointer register, it will be updated for one channel(there is one DMA channel per core). Will the reupdation on all the cores will happen for the EOQ bit, how this will be managed?”

Each DMA channel is completely independent; the fact that a single channel runs out of buffers does not indicate that another channel is in the same situation. Each channel has a separate Head Descriptor Pointer register, to be written when that channel hits EOQ. Each channel should be treated independently. If all channels are dying, it is most probably an RX host error code issue.

0 Denis Beaudoin over 6 years ago in reply to Denis Beaudoin

TI__Expert 3075 points

What is the below register programmed to?

CFIG3 MacControl - MAC Control Register (0x160)

0 Shalini Shukla over 6 years ago in reply to Denis Beaudoin

Prodigy 240 points

Hi Denis,

Thanks for the reply.

The MAC Control Register (0x160) is programmed to the value '0x00020018', i.e. the 'FULLDUPLEX', 'RXBUFFERFLOWEN', 'TXFLOWEN', 'GMIIEN', 'GIG' and 'GIGFORCE' bits are set for MAC port zero (the only port in use).

Also, could you please answer a few more questions:
1) What are the conditions which can cause 'RX Host Error'?
2) Does the statement 'Host error Interrupts require hardware reset in order to recover' means 'DSP reset is the only solution for RX Host Error'?

Thank you,
Shalini.

0 Denis Beaudoin over 6 years ago in reply to Shalini Shukla

TI__Expert 3075 points

The MAC register looks good.

“1) What are the conditions which can cause 'RX Host Error'?”

Writing a zero to the channel head pointer
The DMA reads the descriptor and the ownership bit is not set
The DMA reads the descriptor length of zero on a receive buffer
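A driver can guard against all three conditions before it hands a chain to the DMA. The check below is only a sketch: the single status/length word per descriptor follows the examples in this thread, the ownership bit value 0x20000000 is taken from those examples, and the type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DESC_OWNER    0x20000000u  /* assumed from the examples in this thread */
#define DESC_LEN_MASK 0x0000FFFFu  /* assumed width of the length field */

typedef struct rx_desc {
    struct rx_desc *next;
    uint32_t status_len;
} rx_desc_t;

typedef enum {
    RX_OK = 0,
    RX_ERR_NULL_HEAD,   /* would write zero to the head pointer */
    RX_ERR_NOT_OWNED,   /* ownership bit not set for the DMA */
    RX_ERR_ZERO_LEN     /* receive buffer length of zero */
} rx_check_t;

/* Walk a chain before queuing it and report the first problem found. */
rx_check_t rx_chain_check(const rx_desc_t *head)
{
    if (head == NULL)
        return RX_ERR_NULL_HEAD;

    for (const rx_desc_t *d = head; d != NULL; d = d->next) {
        if ((d->status_len & DESC_OWNER) == 0)
            return RX_ERR_NOT_OWNED;
        if ((d->status_len & DESC_LEN_MASK) == 0)
            return RX_ERR_ZERO_LEN;
    }
    return RX_OK;
}
```

Such a check only validates what the CPU sees; if the descriptors are cached, the DMA may still read stale contents from memory.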

“2) Does the statement 'Host error Interrupts require hardware reset in order to recover' means 'DSP reset is the only solution for RX Host Error'?”

No, only the CPSW needs to be reset to correct the DMA host error shutdown.

An Rx Host Error is the only error that can affect all channels at the same time.

Are you getting an Rx Host error?

0 Shalini Shukla over 6 years ago in reply to Denis Beaudoin

Prodigy 240 points

Hi Denis,

Thanks for the reply.
I have confirmed from recent test results that both SOF and MOF overruns are happening at the customer site.
Also, I found the MACSTATUS register value to be 0x0000241A when the issue happened. So an Rx Host error occurred, with the reason 'The DMA reads the descriptor and the ownership bit is not set'.

Do you have any suggestions about what scenario could be behind that?
What are all the conditions in which the bit needs to be set?
And why, if the issue happens on one DMA channel, do all channels fail to accept any incoming packets?

Regards,
Shalini.

0 Denis Beaudoin over 6 years ago in reply to Shalini Shukla

TI__Expert 3075 points

So we are investigating a software queuing issue!

It looks like channel 4 was the offending channel.
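For reference, the error code and channel can be pulled out of the MACSTATUS value like this. The field positions are an assumption (RX error code in bits 15:12, RX error channel in bits 10:8) that happens to agree with the reading in this thread, so please check them against the EMAC/MDIO reference guide; the function names are illustrative.

```c
#include <stdint.h>

/* Assumed layout: RX error code in bits 15:12, RX error channel in bits 10:8. */
unsigned macstatus_rx_err_code(uint32_t macstatus)
{
    return (macstatus >> 12) & 0xFu;
}

unsigned macstatus_rx_err_chan(uint32_t macstatus)
{
    return (macstatus >> 8) & 0x7u;
}
```

With the reported value 0x0000241A, these return error code 2 (read in this thread as the ownership error) and channel 4.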

This issue can be created by poor software queueing methods or can be created by mishandling of cache when descriptors are managed within cache space.

Normally channels were used by a single CPU for priority, so if the CPU messed up a single channel, it was important to stop everything in its tracks so that the issue can be found easily.

In your case the channels are being used for different DSP cores, but there was no feature to disable the system wide halt on error of a single channel.

If you are keeping the descriptors in cacheable memory space, it is important to ensure that the descriptor that you are about to chain to the current end of queue is written back and invalidated prior to updating the current end of queue next pointer.

void QueueRxFreeDescriptor(Descriptor_t *Current_descriptor, Queue_t *Queue)
{
    /* Restore the descriptor to its free state */
    Current_descriptor->length = default_length;
    Current_descriptor->Ownership = HARDWARE;
    Current_descriptor->Next = 0;
    WriteBackAndInvalidate(Current_descriptor); // Push the descriptor out

    /* Only now link it behind the current end of queue */
    Queue->Tail->Next = Current_descriptor;
    WriteBackAndInvalidate(Queue->Tail); // Push the Next pointer out
    Queue->Tail = Current_descriptor;
}

If the descriptors are not in cacheable space, it is important that they are defined as volatile so that the compiler does not reorder accesses to them with respect to each other.
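A minimal sketch of the non-cached case, assuming a simplified two-field descriptor (the `rx_desc_t` layout, `DESC_OWNER` value, and function name are illustrative). Declaring the fields volatile keeps the compiler from reordering or eliding these stores relative to each other; it does not by itself order the writes on the bus, which may need device-specific handling.

```c
#include <stddef.h>
#include <stdint.h>

#define DESC_OWNER 0x20000000u  /* assumed from the examples in this thread */

typedef struct rx_desc {
    struct rx_desc * volatile next;  /* volatile: accesses stay in program order */
    volatile uint32_t status_len;
} rx_desc_t;

/* Append d behind tail when the descriptors live in non-cached memory. */
void rx_append_uncached(rx_desc_t *tail, rx_desc_t *d, uint32_t buf_len)
{
    d->status_len = DESC_OWNER | buf_len;  /* 1: hand the buffer to the port */
    d->next = NULL;                        /* 2: terminate the new end of queue */
    tail->next = d;                        /* 3: publish last, once d is complete */
}
```

The link in step 3 is deliberately the final store, so the port can never follow the pointer to a half-initialized descriptor.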

0 Shalini Shukla over 6 years ago in reply to Denis Beaudoin

Prodigy 240 points

Hi Denis,

Thanks for sharing your suggestions. Our code flow goes like this:

1. The software application extracts a packet chain when the descriptor at the head of the master chain has SOP set. The CPPI structures containing the packet are removed from the master chain and placed onto the output chain.

master->head = eop->next;

2. To free the chain, the tail chain is added to the master chain. The ownership and the EOQ setting of the last element of the master chain are returned.

Is my understanding correct that the write back and invalidate you mentioned should be done while linking the tail of the master chain to the head of the tail chain, as below?

WriteBackAndInvalidate(tail_Chain->head);
master->tail->next=tail_Chain->head;
WriteBackAndInvalidate(master->tail);
master->tail=tail_Chain->tail;

Am I missing any considerations?

Regards,
Shalini.

0 Denis Beaudoin over 6 years ago in reply to Shalini Shukla

TI__Expert 3075 points

It is not clear for #1 whether you process multiple frames together or each frame is processed individually.

But to understand, the CPGMAC updates both the SOP and EOP of each frame, so both the SOP and EOP descriptors need to be restored back to their original state prior to placing them back onto the Rx free queue.

If the MAX Packet Length register is less than the buffer size, then every packet will fit into a single buffer; the driver is still expected to handle a multi-buffer frame in the event of an error.

As for #2, it looks OK if you return a single buffer at a time. But from the code it looks like you are returning a chain, so you should traverse the chain in the writeback loop. Otherwise you may not be returning the EOP buffer, which may be left in cache. This could be causing the ownership error! And yes, it would only occur under heavy load, as this is where the cache and the HW are getting on top of each other.

dpTmp = tail_Chain->head;
while (dpTmp)
{
    WriteBackAndInvalidate(dpTmp);
    dpTmp = dpTmp->next;
}
master->tail->next = tail_Chain->head;
WriteBackAndInvalidate(master->tail);
master->tail = tail_Chain->tail;

If you only return a single frame at a time, you could get away with:

WriteBackAndInvalidate(tail_Chain->head);
WriteBackAndInvalidate(tail_Chain->tail);
master->tail->next = tail_Chain->head;
WriteBackAndInvalidate(master->tail);
master->tail = tail_Chain->tail;

Need to ensure you restore the status and length for each buffer returned prior to the writeback and invalidate.
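Putting the restore and the writeback together, a chain return might look like the sketch below. It assumes a simplified descriptor with one status/length word, a non-empty master chain, and a stubbed `WriteBackAndInvalidate` that only counts calls; a real driver would call its platform cache routine instead. The type names and `rx_return_chain` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DESC_OWNER 0x20000000u  /* assumed from the examples in this thread */

typedef struct rx_desc {
    struct rx_desc *next;
    uint32_t status_len;
} rx_desc_t;

typedef struct { rx_desc_t *head, *tail; } rx_chain_t;

/* Placeholder for the platform cache operation; it only counts calls here. */
static unsigned writeback_calls;
static void WriteBackAndInvalidate(const volatile void *p)
{
    (void)p;
    writeback_calls++;
}

/* Return a whole chain to the master (free) chain. master->tail must be valid. */
void rx_return_chain(rx_chain_t *master, rx_chain_t *ret, uint32_t buf_len)
{
    ret->tail->next = NULL;                    /* new end of queue */
    for (rx_desc_t *d = ret->head; d != NULL; ) {
        rx_desc_t *next = d->next;             /* read before the line is invalidated */
        d->status_len = DESC_OWNER | buf_len;  /* restore before the writeback */
        WriteBackAndInvalidate(d);
        d = next;
    }
    master->tail->next = ret->head;            /* link only after everything is in memory */
    WriteBackAndInvalidate(master->tail);
    master->tail = ret->tail;
}
```

Every returned descriptor is written back before the old tail is linked to it, so the DMA cannot walk into a descriptor whose restored state is still sitting in cache.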

0 Shalini Shukla over 6 years ago in reply to Denis Beaudoin

Prodigy 240 points

Hi Denis,

Thanks for the reply. Please find the details below:

It is not clear for #1 whether you process multiple frames together or each frame is processed individually.

But to understand, the CPGMAC updates both the SOP and EOP of each frame, so both the SOP and EOP descriptors need to be restored back to their original state prior to placing them back onto the Rx free queue.

If the MAX Packet Length register is less than the buffer size, then every packet will fit into a single buffer; the driver is still expected to handle a multi-buffer frame in the event of an error.

- Yes, each packet is processed individually. Also, SOP, EOP, and EOQ are cleared before the descriptors are placed into the Rx free queue. The MAX Packet Length register is 0x190 and the rx buffer length is 0xA0.
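With those two values a maximum-length frame cannot fit in one buffer. As a rough check, assuming both values are byte counts, the number of descriptors a frame spans is the ceiling of the frame length divided by the buffer length (`buffers_per_frame` is just an illustrative helper):

```c
#include <stdint.h>

/* Descriptors needed for one frame: ceiling of frame_len / buf_len. */
uint32_t buffers_per_frame(uint32_t frame_len, uint32_t buf_len)
{
    return (frame_len + buf_len - 1u) / buf_len;
}
```

For 0x190 (400) and 0xA0 (160) this gives 3, so the SOP and EOP descriptors of a full-size frame are different descriptors with one untouched descriptor between them.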

As for #2, it looks OK if you return a single buffer at a time. But from the code it looks like you are returning a chain, so you should traverse the chain in the writeback loop. Otherwise you may not be returning the EOP buffer, which may be left in cache. This could be causing the ownership error! And yes, it would only occur under heavy load, as this is where the cache and the HW are getting on top of each other.

- Yes, I got your point. As there are multiple buffers, I will have to write back and invalidate all the buffers in a packet chain, and also the tail of the master chain.

dpTmp = tail_Chain->head;
while (dpTmp)
{
    WriteBackAndInvalidate(dpTmp);
    dpTmp = dpTmp->next;
}
master->tail->next = tail_Chain->head;
WriteBackAndInvalidate(master->tail);
master->tail = tail_Chain->tail;

Need to ensure you restore the status and length for each buffer returned prior to the writeback and invalidate.
- Correct me if I'm wrong here: since only one frame is processed per packet chain, the status and length can be restored after the complete traversal of the packet chain.

Regards,
Shalini.

0 Denis Beaudoin over 6 years ago in reply to Shalini Shukla

TI__Expert 3075 points

The hardware can process 1.5M packets per second if the memory bandwidth and buffers are available.

Most CPUs can only handle about 33K interrupts per second, so to deal with these packet rates they service multiple packets within the same interrupt service event.
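As a back-of-the-envelope check of those two figures, the number of packets each interrupt has to cover at the worst-case rate is the packet rate divided by the interrupt rate, rounded up. The helper name is illustrative, and the figures are the approximate ones quoted above:

```c
#include <stdint.h>

/* Minimum packets to service per interrupt so the host keeps up. */
uint32_t min_packets_per_irq(uint32_t packets_per_sec, uint32_t irqs_per_sec)
{
    return (packets_per_sec + irqs_per_sec - 1u) / irqs_per_sec;
}
```

At 1,500,000 packets per second and 33,000 interrupts per second this comes to 46 packets per interrupt, which also hints at how many free descriptors a channel needs to ride through one service interval.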

Since I did not write the particular software you are using, and I have seen many implementations, here are the possible approaches we have seen; each needs a slightly different solution.

1. Single buffer returned at a time.
2. Single frame, but multiple buffers, returned at a time.
3. Multiple frames and multiple buffers returned at a time.

In #1: Restore the status/length and zero the next pointer, write back the descriptor, and then chain to the channel. No assumptions!

In #2: Restore the status/length of the first and last descriptors, zero the next pointer of the last, then write back the first and last descriptors and chain to the channel. This assumes the intermediate buffer descriptors are not modified by the software.

In #3: Traverse the chain, restoring the status/length and writing back each descriptor; when you get to the end, zero the next pointer and then chain to the channel. No assumptions!

Do realize that, depending on how the system is set up, there may need to be some critical sectioning around some of the queuing.

0 Shalini Shukla over 6 years ago in reply to Denis Beaudoin

Prodigy 240 points

Hi Denis,

Thank you for the description.

We are using solution 2, i.e. a single frame but multiple buffers returned at a time.

So the final implementation for solution 2 should be like this:

1. Restore the status/length of the first and last, zero the next of the last.

2. Writeback the descriptors for the first and last -

" If you only return a single frame at a time you could get away with a:

WriteBackAndInvalidate(tail_Chain->head);

WriteBackAndInvalidate(tail_Chain->tail); "

3. Then chain to the channel -

" master->tail->next=tail_Chain->head;

WriteBackAndInvalidate(master->tail);

master->tail=tail_Chain->tail; "

Regards,
Shalini.