Workaround for RM57 EMAC EOQ race condition

Stephen Holstein

Other Parts Discussed in Thread: AM3505

In evaluating the RM57’s EMAC, I noticed that the transmit CPPI chain was stalling. I believe this happens because the EOQ flag is not set until the frame has completed transmission, even though the EMAC has already decided to stop (it read the CPPI descriptor’s next pointer as NULL). The driver we’ve developed seeks to minimize the need for the main core to manage the EMAC.

To resolve the race condition between the EMAC’s decision to continue and the EOQ flag state, I’ve tested the following workaround:

The tail of the transmit CPPI chain always points to a sacrificial descriptor, configured as follows:
  - Next Descriptor Pointer = NULL
  - Buffer Pointer = (a non-NULL address)
  - Buffer Length = 1
  - Flags = SOP | EOP | OWNER
  - Packet Length = 0

Since the chain should always continue to the sacrificial descriptor, we can now test the final real descriptor for the OWNER flag to determine if it has been transmitted. If the OWNER flag is set (EMAC owned), we assume that we can safely append to the CPPI chain. If the OWNER flag is clear (core owned), we wait for the sacrificial descriptor to have its EOQ flag set, and then start a new CPPI chain.
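
The append-or-restart decision described above can be sketched in C as follows. This is a host-side model under stated assumptions, not the driver from this thread: the names `tx_bd`, `tx_queue`, `tx_submit` and the `hdp` field (a stand-in for the TXnHDP register) are hypothetical, the flag bit positions are the usual EMAC CPPI ones and should be checked against the RM57 TRM, and byte swapping of descriptor words is ignored. Whether the EMAC tolerates a packet length of 0 is exactly the open question in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed CPPI flag positions in the flags/packet-length word. */
#define BD_SOP   0x80000000u
#define BD_EOP   0x40000000u
#define BD_OWNER 0x20000000u
#define BD_EOQ   0x10000000u

/* Host-side model of a transmit descriptor. On the target, `next` and
 * `buf` are 32-bit bus addresses rather than C pointers. */
typedef struct tx_bd {
    struct tx_bd *volatile next;
    const uint8_t *buf;
    volatile uint32_t buf_len;      /* buffer length */
    volatile uint32_t flags_pktlen; /* flags (upper 16) | packet length (lower 16) */
} tx_bd;

typedef struct {
    tx_bd *sacrificial;    /* always the tail of the chain */
    tx_bd *last_real;      /* last descriptor carrying real data, or NULL */
    const uint8_t *dummy;  /* non-NULL address for the sacrificial buffer */
    tx_bd *volatile hdp;   /* hypothetical stand-in for TXnHDP */
} tx_queue;

typedef enum { TX_APPENDED, TX_RESTARTED, TX_BUSY } tx_result;

/* Put the sacrificial descriptor back into the state listed above. */
static void arm_sacrificial(tx_queue *q)
{
    q->sacrificial->next = NULL;
    q->sacrificial->buf = q->dummy;
    q->sacrificial->buf_len = 1u;
    q->sacrificial->flags_pktlen = BD_SOP | BD_EOP | BD_OWNER; /* length 0 */
}

void tx_queue_init(tx_queue *q, tx_bd *sacrificial, const uint8_t *dummy)
{
    assert(sacrificial != NULL && dummy != NULL);
    q->sacrificial = sacrificial;
    q->last_real = NULL;
    q->dummy = dummy;
    q->hdp = NULL;
    arm_sacrificial(q);
}

/* `bd` must already have its buffer, lengths and SOP|EOP|OWNER set. */
tx_result tx_submit(tx_queue *q, tx_bd *bd)
{
    bd->next = q->sacrificial;
    if (q->last_real != NULL) {
        if (q->last_real->flags_pktlen & BD_OWNER) {
            /* Last real descriptor is still EMAC owned: append in place. */
            q->last_real->next = bd;
            q->last_real = bd;
            return TX_APPENDED;
        }
        if (!(q->sacrificial->flags_pktlen & BD_EOQ)) {
            /* EMAC is still working through the sacrificial descriptor. */
            return TX_BUSY;
        }
    }
    /* Chain has ended (or never started): re-arm and start a new chain. */
    arm_sacrificial(q);
    q->last_real = bd;
    q->hdp = bd;
    return TX_RESTARTED;
}
```

A caller that receives `TX_BUSY` would simply retry; the intent of the workaround is that this window stays short because the sacrificial descriptor carries no data.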

As I had to use trial and error to identify a sacrificial descriptor that works, I wanted to request TI's opinion on this workaround. Specifically, my concern is that these settings may affect the RM57 internals in some way that we have not identified by testing Ethernet alone.

over 9 years ago

0 Anthony F. Seely over 9 years ago

TI__Guru 68830 points

I don't know that we've got an opinion on this. I'd be careful about what non-null address you use but other than this, is this the 'Race Condition' you are referring to:

'There is a potential race condition where the EMAC may read the “next” pointer of a descriptor as NULL in
the instant before an application appends additional descriptors to the list by patching the pointer. This
case is handled by the software application always examining the buffer descriptor flags of all EOP
packets, looking for a special flag called end of queue (EOQ). The EOQ flag is set by the EMAC on the
last descriptor of a packet when the descriptor’s “next” pointer is NULL. This is the way the EMAC
indicates to the software application that it believes it has reached the end of the list. When the software
application sees the EOQ flag set, the application may at that time submit the new list, or the portion of
the appended list that was missed by writing the new list pointer to the same HDP that started the
process.'

If so what are you trying to streamline exactly?
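
For reference, the handling the quoted passage describes can be sketched like this: while reclaiming completed descriptors, the software checks each EOP descriptor for EOQ, and if EOQ is set while the descriptor's next pointer is non-NULL, the EMAC missed the append and the remainder of the list is resubmitted through the head descriptor pointer. This is only an illustration on a host-side model; `tx_bd`, `tx_reclaim` and the `hdp` argument (standing in for the HDP register) are hypothetical names, and the flag bit positions are assumptions to be verified against the TRM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed CPPI flag positions in the flags/packet-length word. */
#define BD_EOP   0x40000000u
#define BD_OWNER 0x20000000u
#define BD_EOQ   0x10000000u

/* Host-side model; on the target `next` is a 32-bit bus address. */
typedef struct tx_bd {
    struct tx_bd *volatile next;
    volatile uint32_t flags_pktlen;
} tx_bd;

/* Walk the descriptors the EMAC has released, starting at `head`.
 * If the EMAC flagged end of queue on a descriptor that has since been
 * linked to more work, restart transmission at that next descriptor.
 * Returns the first descriptor still owned by the EMAC, or NULL. */
tx_bd *tx_reclaim(tx_bd *head, tx_bd *volatile *hdp)
{
    assert(hdp != NULL);
    tx_bd *bd = head;
    while (bd != NULL && !(bd->flags_pktlen & BD_OWNER)) {
        tx_bd *next = bd->next;
        uint32_t flags = bd->flags_pktlen;
        if ((flags & BD_EOP) && (flags & BD_EOQ) && next != NULL) {
            *hdp = next; /* the append was missed: resubmit the rest */
        }
        bd = next;
    }
    return bd;
}
```

The cost of this approach is the one raised in the question: something has to run `tx_reclaim` after the last packet completes, either a polling loop or an interrupt handler.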

0 Stephen Holstein over 9 years ago in reply to Anthony F. Seely

Intellectual 290 points

Yes, that is the race condition I'm referring to. I'm trying to eliminate:
1. Requiring an EOQ polling loop that waits for the transfer to complete (this is the EOQ lifecycle I've observed in testing). With the proposed workaround, the sacrificial descriptor should incur minimal delay in setting EOQ, as the descriptor contains no data to send.
2. Requiring a future event handler to reset the TXnHDP when the final packet completes transfer, as I've been able to avoid using interrupts thus far.
3. Accepting that packets may not be sent in a timely fashion (i.e. resetting the TXnHDP only on the next addition of a transmit packet).

As for the choice of buffer-pointer address, are the only implications addressability (flash or RAM, but not memory-mapped peripherals) and MPU protections, or are there other considerations?

0 Anthony F. Seely over 9 years ago in reply to Stephen Holstein

TI__Guru 68830 points

Thanks Stephen,

It'll take some time to research this one as this EMAC has been around for quite some time now.
(it's an older IP). My instinct in the meantime is to stick w. the user's guide. It is pretty detailed in that section;
and knowing who developed the IP my gut says if there were a better way it would have been documented.
But I will still check into it once I understand this section better along w. your description above.

One thing we need to check on definitely is the validity to have the length be 0...

On the addressing - yes I'd worry that if the EMAC accesses some memory that you point to (even though the length is 0)
that there would be an exception raised by an MPU or a bus hang by addressing unmapped memory. (latter shouldn't happen but might). I don't know if I'd point it at flash necessarily as that would impact the CPU performance ever so slightly.

I forget if the buffer itself or just the descriptor can be in CPPI RAM. If both descriptor and buffer can be there, I'd probably point the sacrificial descriptor at the CPPI RAM for the buffer too - so that it's 'self contained' .. but that again is just instinct, not experience.

-Anthony

0 Stephen Holstein over 9 years ago in reply to Anthony F. Seely

Intellectual 290 points

Any news on whether packet length = 0 is going to be an issue?

0 Anthony F. Seely over 9 years ago in reply to Stephen Holstein

TI__Guru 68830 points

No, I sent an email to the person I thought would know but haven't heard back yet.
Might not get a reply and may need to research it a different way.

0 Stephen Holstein over 9 years ago in reply to Anthony F. Seely

Intellectual 290 points

Are there any new developments?

0 Anthony F. Seely over 9 years ago in reply to Stephen Holstein

TI__Guru 68830 points

Hi Stephen,

No, nothing new. I have not had time to research. I'd say try to go by what is allowed in the documentation for now and, when in doubt, be conservative.

0 Szilard Lovas over 9 years ago in reply to Stephen Holstein

Expert 1425 points

Hi Stephen,
I used the following pattern:

In the Emac_TX() function:
===================
1, Create a NEW_BD
2, DISABLE_EMAC_TX_IRQ()
3, if(EMAC_TX_HP != NULL) link the NEW_BD to the end of the previous BD chain,
   else start transmission by writing the NEW_BD address to EMAC_TX_HP
4, ENABLE_EMAC_TX_IRQ()

In the Emac TX interrupt:
=========================
1, Acknowledge EMAC IRQ by writing EMAC_TX_CP to EMAC_TX_CP
2, if(EMAC_TX_CP->next != NULL) restart the transmission by writing EMAC_TX_CP->next to EMAC_TX_HP
3, Acknowledge the EMAC_INT_CORE0_TX

notes:
======
NEW_BD: New Buffer Descriptor
EMAC_TX_HP: Emac TX Head Pointer
EMAC_TX_CP: Emac TX Completion Pointer

- clean cache before sending (emac sends data from memory)
- make sure not to give zero length packet to EMAC
- make sure not to give packet with buffer_pointer == NULL
- most of the EMAC registers are byte swapped, use EMACSwizzleData(x) or the __rev(x) intrinsic

Regards, Szilárd
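
The pattern above could look roughly like the following in C. This is a host-side sketch under assumptions, not the code from the post: `emac_regs`, `emac_tx`, `emac_tx_send`, `emac_tx_isr` and the IRQ mask helpers are hypothetical names, the registers are modelled as plain pointers, and byte swapping and cache maintenance are left out. One deliberate addition that the post does not spell out: the handler only rewrites the head pointer when it reads back as NULL, so it does not write it while the EMAC is still walking the chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side model of a descriptor; on the target `next` and `buf`
 * are 32-bit bus addresses. */
typedef struct tx_bd {
    struct tx_bd *volatile next;
    const uint8_t *buf;
    uint32_t len;
} tx_bd;

/* Hypothetical register model: head pointer and completion pointer. */
typedef struct {
    tx_bd *volatile tx_hp;
    tx_bd *volatile tx_cp;
} emac_regs;

typedef struct {
    emac_regs *regs;
    tx_bd *tail;     /* last descriptor handed to the EMAC */
    int irq_masked;  /* stands in for the real TX interrupt mask */
} emac_tx;

static void tx_irq_disable(emac_tx *t) { t->irq_masked = 1; }
static void tx_irq_enable(emac_tx *t)  { t->irq_masked = 0; }

/* Emac_TX(): link to the running chain, or start a new one. */
void emac_tx_send(emac_tx *t, tx_bd *new_bd)
{
    /* Per the notes: never hand over a NULL buffer or a zero length. */
    assert(new_bd != NULL && new_bd->buf != NULL && new_bd->len != 0u);
    new_bd->next = NULL;
    tx_irq_disable(t);
    if (t->regs->tx_hp != NULL) {
        t->tail->next = new_bd;   /* EMAC active: append to the chain */
    } else {
        t->regs->tx_hp = new_bd;  /* EMAC idle: start transmission */
    }
    t->tail = new_bd;
    tx_irq_enable(t);
}

/* TX interrupt: acknowledge, then restart if the EMAC missed a link. */
void emac_tx_isr(emac_tx *t)
{
    tx_bd *done = t->regs->tx_cp;
    t->regs->tx_cp = done;        /* acknowledge by writing CP back */
    if (done != NULL && done->next != NULL && t->regs->tx_hp == NULL) {
        t->regs->tx_hp = done->next;
    }
    /* Acknowledging EMAC_INT_CORE0_TX is target specific and omitted. */
}
```

Masking the TX interrupt around the link step keeps the handler from observing a half-updated chain; the restart itself still depends on the interrupt firing after the last packet.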

0 Stephen Holstein over 9 years ago in reply to Szilard Lovas

Intellectual 290 points

The goal of this workaround is to remove the need for interrupts. In your solution you test EMAC_TX_HP; can this be guaranteed to be NULL at the start of the EOQ CPPI entry? Otherwise you'd end up with the same transmission stalls I referred to.

0 Szilard Lovas over 9 years ago in reply to Stephen Holstein

Expert 1425 points

Hi Stephen,
Sorry for the late answer - I was on holiday. In my experience, you can add a new descriptor entry at the end of the chain by updating the NEXT field while the EMAC is working. If the EMAC has read out the NEXT field before the update (NULL value), it stops the transmission, so you have to restart it with the new BD; otherwise it continues the transfer with the new BD automatically.

This solution has been working on my desk for months, tested with flood pinging from different nodes while it serves ftp and http - so it looks dependable enough.

Regards: Szilárd

0 Theodore Witkamp over 9 years ago in reply to Stephen Holstein

Prodigy 100 points

I think I am experiencing the same problem on the AM3505.
I too am trying not to use interrupts.
Any resolution on this?

e2e.ti.com/.../543686

Thanks.

0 Stephen Holstein over 9 years ago in reply to Szilard Lovas

Intellectual 290 points

Your solution requires an interrupt. I'm specifically trying to avoid using interrupts.

0 Szilard Lovas over 9 years ago in reply to Stephen Holstein

Expert 1425 points

Hi Stephen,
Did you check the MAC Status Register (MACSTATUS)? You can read out the reason for a HOST ERROR (TX/RX) - it helped me a lot.
Regards, Szilárd
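
A small sketch of pulling the host error fields out of a MACSTATUS value is shown below. The field positions (IDLE in bit 31, TXERRCODE in bits 23:20, TXERRCH in bits 18:16, RXERRCODE in bits 15:12, RXERRCH in bits 10:8) are assumptions taken from the usual TI EMAC register layout and should be verified against the RM57 TRM; the names `macstatus_fields` and `macstatus_decode` are hypothetical, and the value is assumed to be already byte swapped if the target requires it.

```c
#include <stdint.h>

/* Decoded view of MACSTATUS; bit positions are assumptions, see above. */
typedef struct {
    unsigned idle;        /* 1 when the EMAC reports idle */
    unsigned tx_err_code; /* transmit host error code */
    unsigned tx_err_ch;   /* transmit channel that raised it */
    unsigned rx_err_code; /* receive host error code */
    unsigned rx_err_ch;   /* receive channel that raised it */
} macstatus_fields;

macstatus_fields macstatus_decode(uint32_t v)
{
    macstatus_fields f;
    f.idle        = (unsigned)((v >> 31) & 0x1u);
    f.tx_err_code = (unsigned)((v >> 20) & 0xFu);
    f.tx_err_ch   = (unsigned)((v >> 16) & 0x7u);
    f.rx_err_code = (unsigned)((v >> 12) & 0xFu);
    f.rx_err_ch   = (unsigned)((v >> 8)  & 0x7u);
    return f;
}
```

A nonzero `tx_err_code` after a stall would point at a descriptor problem rather than the EOQ race; the TRM's table of TXERRCODE values is the place to check whether a zero buffer pointer or zero length is being flagged.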
