Are EDMA CC IPR writes atomic?

Manu Bansal

Hi,

I'm wondering whether updates to the EDMA controller's Interrupt Pending Registers (IPR, IPRH) are atomic. I want multiple DSP cores to post EDMA transfers and poll the interrupt pending register. Upon completion, each core will clear its own IPR bits by writing 1s with the bit mask corresponding to its Transfer Completion Code (TCC). I understand that IPR and IPRH will be written in two separate calls, so the entire 64-bit update of the IPR register pair will not be atomic, but that is fine as long as each register is updated atomically. Then the two cores can safely read and clear IPR bits on the same channel controller instance without semaphores.

I didn't find anything on this in the EDMA User's Guide or the Multicore Programming Guide. The only mention of atomicity was that PaRAMs are updated atomically by the channel controllers.

Thanks.

over 13 years ago

RandyP over 13 years ago

Manu,

The question of atomicity applies only to a read-modify-write operation. The EDMA3 interrupt registers were specifically designed to eliminate this as an issue.

Your code will technically read IPR and then write to ICR, which is not an atomic operation from the register's point of view. But this sequence of operations applies only to the bits to which that code is responding, for example IPR.bit3 and IPR.bit4, represented by 0x00000018. The write to ICR is a WTC (Write To Clear) operation and affects only the bits that your code wants to clear; it does not force any other bits to 0, so it will not interfere with any other threads or processors.

Atomicity or designed protection exists as long as you do not have two different cores or threads that are responding to the same IPR bit and might try to clear that bit before the other thread has had a chance to read IPR. That would be a code architecture problem that needs to be solved by re-thinking your application.

But for what you care about, there is no issue: when different threads operate on separate IPR bits, there is no situation in which atomicity applies.

Regards,
RandyP

Chad Courtney over 13 years ago

The IPR & IPRH registers are read only. I'm not sure why they would need atomic access. Maybe you mean the ICR & ICRH registers, which are what you should be writing to clear the pending interrupts in IPR & IPRH.

These are not atomic in the traditional sense that only one core has complete access at any given time. Any core can access them at any time, but if two cores attempt to write at the same time, the writes will be serialized.

As long as each core is handling its own associated channel, there will be no conflicts.

Best Regards,

Chad

EDIT: Correction: I typed IPC/IPCH when I meant ICR/ICRH. I've corrected it above.

Karthik Ramana Sankar over 13 years ago in reply to Chad Courtney

For safe multicore access to EDMA CC resources, you can use the EDMA CC shadow region registers (there are 8 sets of shadow registers), and each shadow region has its own interrupt. For example, core0 uses shadow region 0, core1 uses shadow region 1, and so on, and the EDMA CC shadow region 0 interrupt (transfer completion interrupt) is routed to core0. This way, each core writes to its own copy of the shadow region registers (ICR, ICRH, etc.), and there is no contention among the cores.

Manu Bansal over 13 years ago in reply to Karthik Ramana Sankar

RandyP: It's very useful to know that bits of the IPR don't interfere.

Chad: Yes, writes will be to the IPC registers, not IPR. I misstated that. But what I am ultimately interested in is the read-modify-update cycle of the IPR register. It's useful to know that writes are serialized; that guarantees a consistent outcome of the IPR status under concurrent writes.

Karthik: I'm not sure if shadow regions alone solve the problem since they share the same IPR registers. If two IPC calls are made from different shadow regions on the same EDMA CC, the problem is the same as my original one.

All:

There is still one part of the behavior left unspecified. With write serialization, it is clear that writes will not conflict. With bit-wise partitioning, it is clear that read and write operations on different bits will not conflict. But what about a read and a write on the same bit?

I am looking at a scenario where a channel is monitored by two cores but modified/triggered by only one. So I have two readers of the IPR and one writer of the IPR (through the IPC), all for the same IPR bit. The write-atomicity question is: while the writer is updating that bit, what does the other reader see if its read is concurrent? I just need to know that the bit will not flicker. The exact time of the transition from 1 to 0 is not important.

Thanks.

RandyP over 13 years ago in reply to Manu Bansal

IPC is a completely different module in the C6678.

You mean ICR and ICRH, in this thread.

RandyP

Chad Courtney over 13 years ago in reply to RandyP

Thanks for catching this, Randy. I mistyped (must have IPC on the brain or something). I did intend the ICR/ICRH registers.

-Chad

RandyP over 13 years ago in reply to RandyP

Manu,

The scenario you described with IPR being monitored by two cores is the same as what I mentioned in my post above:

"Atomicity or designed protection exists as long as you do not have two different cores or threads that are responding to the same IPR bit and might try to clear that bit before the other thread has had a chance to read IPR. That would be a code architecture problem that needs to be solved by re-thinking your application."

You cannot do what you describe without re-thinking your application. You cannot allow one core to always test and clear the IPR and expect the other core to always be able to test the IPR and still see that bit set. You must do it differently or add some synchronization between the two cores.

1. Let the first core respond to the interrupt and clear the IPR bit. Then have it send an IPC (Inter-Processor Communication) interrupt to the second core to let it know this event occurred.

2. Let each core respond to the interrupt and test the IPR, but use a shared memory semaphore to indicate that this core has responded to the interrupt. Then when the second core responds to the interrupt, it will test that semaphore to know that it is safe to clear the IPR through ICR.

There may be a race condition with the semaphore in #2 that needs to be thought out. So #1 is safest, or you may be able to work out the logic to prove no race condition, or you may have another better way to do this.

Regards,
RandyP

Manu Bansal over 13 years ago in reply to RandyP

RandyP,

I understand the situation you are describing, but that is not what I am trying to achieve. I don't need both cores to have read the set IPR before it is cleared by the primary. I just want to ensure that the secondary core, which is only monitoring the IPR bit, only sees 1 1 1 1 1 0 0 0 0 0... for that bit over time, but never 1 1 1 (1 0 1...1) 0 0 0, where the part in parentheses is the transient period of the update of that bit caused by the primary. As long as write-from-core-1 and read-from-core-2 are "serialized" in this sense, I am good. Maybe this is trivial, but I want to make sure I know how the hardware behaves.

Manu Bansal over 13 years ago in reply to Manu Bansal

I think it comes down to whether the IPR register update is a single-cycle operation, assuming the IPR read is aligned to a cycle boundary. If the update is multi-cycle, the read could see a transient state of the IPR; otherwise the cycle boundary acts as a serialization point.

Karthik Ramana Sankar over 13 years ago in reply to Manu Bansal

Manu,

In SW, you can make sure that the write to IPR (from core0) and the read of IPR (from core1) are serialized as follows:

In core1:

IPR_LOOP:
Read IPR
while (IPR == 1)
{
    Read IPR
}
count = 0;
while (1)
{
    Read IPR
    if (IPR == 0)
        count++;
    else
        count = 0;   // the bit read 1 again: restart the count
    if (count > x)   // x should be greater than the transient period; x = 20 to 25 cycles should work
        break;
}
// continue with whatever you want to do, whenever IPR becomes zero.

Rather than polling the IPR bit in core1, a more efficient way would be for core0 to post an IPC message or IPC interrupt to core1 after it has cleared the IPR bit.

Karthik Ramana Sankar over 13 years ago in reply to Karthik Ramana Sankar

Assuming you are using a C6678 or C6670, another very nice way to communicate the EDMA CC transfer completion to both core0 and core1 is to use the CIC0 broadcast interrupts (CorePAC INTC events 102 to 109). These CIC0 outputs are broadcast to all the cores, so whenever the IPR bit is set, both core0 and core1 receive interrupts, and you can have only core0 clear the IPR bit.

Karthik Ramana Sankar over 13 years ago in reply to Karthik Ramana Sankar

The comment above describes a method that uses the global region interrupts rather than shadow regions.

Manu Bansal said:
I'm not sure if shadow regions alone solve the problem since they share the same IPR registers. If two IPC calls are made from different shadow regions on the same EDMA CC, the problem is the same as my original one.

[Karthik]: Yes. All the shadow regions share the same IPR/IPRH registers. Suppose the transfer completion interrupt for a particular DMA channel is enabled in the IER/IERH registers of both shadow region 0 and shadow region 1 (the corresponding bit in the global IER/IERH also needs to be set). Then, on transfer completion, both core0 and core1 receive interrupts, and you will not encounter any race conditions as long as you clear the IPR/IPRH bit in only one core (say core0). For more details, please refer to section 2.9.1.1 "Enabling Transfer Completion Interrupts" in the KeyStone EDMA3 user's guide.

Manu Bansal over 13 years ago in reply to Karthik Ramana Sankar

Karthik,

Using interrupts with or without shadow regions doesn't change the problem, since the receiver of the interrupt still needs to resolve it into the actual channel/IPR bit. This requires the receiving core to read the IPR register eventually.

The software solution you suggest will serve the purpose but it seems very inefficient to wait 20-25 cycles just to allow the IPR to settle. I really don't think IPR update would ever take more than 1 cycle. In particular, this must be true in the case when the EDMA controller is setting the IPR bit upon transfer completion, otherwise even a single reader could end up seeing flickering values. I find it reasonable to assume the same behavior in the case of IPC-triggered update too. I was just hoping this could be verified from hardware design specifications. Maybe the best thing to do is to just test it a large number of times:

Core 0:
    Upon IPR bit = 1, set IPC bit = 1

Core 1:
    for 1 to 1000 {  // much more than settlement time
        Read IPR bit
        Compare with previously read IPR bit
        If the difference (old bit - new bit) == -1: test FAILED
    }

RandyP over 13 years ago in reply to Manu Bansal

Manu, Karthik,

The only value I see in having a polling loop for Core1 is if you want to stall Core1 until Core0 has cleared IPR. That could be a long time depending on what other higher priority things are going on with Core0, including servicing other EDMA3 interrupts that were set before this common IPR bit.

But which core do you intend to clear the IPR bit? This is the basic problem that I have not explained very well.

Manu Bansal said:
I understand the situation you are describing, but that is not what I am trying to achieve. I don't need both cores to have read the set IPR before it is cleared by the primary. I just want to ensure that the secondary core, which is only monitoring the IPR bit, only sees 1 1 1 1 1 0 0 0 0 0... for that bit over time, but never 1 1 1 (1 0 1...1) 0 0 0, where the part in parentheses is the transient period of the update of that bit caused by the primary. As long as write-from-core-1 and read-from-core-2 are "serialized" in this sense, I am good. Maybe this is trivial, but I want to make sure I know how the hardware behaves.

The "race condition" is that the secondary core (aka Core1 here) might only ever see 0 0 0 0 0 0 0 0 0 0; it might never see the IPR bit set. This happens if Core0 is ready to service the EDMA3 interrupt immediately and reads IPR and clears it (through ICR) before Core1 ever gets to its EDMA3 ISR / Dispatcher. Core1 could be delayed because it is servicing other EDMA3 interrupts that take a long time, because it is in a non-interruptible loop, or because it is servicing a couple of other higher-priority interrupts before it can read IPR in its EDMA3 ISR / Dispatcher. In any of these cases, Core1 responds to the EDMA3 interrupt and could conceivably read IPR=0x00000000, because the bit was already read and cleared by Core0.

This same "race condition" can occur if Core1 is in a blocking polling loop doing nothing but reading IPR, unless you disable interrupts.

Karthik's advice to use IPC is the clean choice and is how the architecture was designed for handling this situation.

Regards,
RandyP

RandyP over 13 years ago in reply to RandyP

Manu,

To directly answer your specific question, Core1 will never see "1 1 1 (1 0 1...1) 0 0 0" due to Core0 clearing an IPR bit through ICR. It will not bounce like that.

Regards,
RandyP

Karthik Ramana Sankar over 13 years ago in reply to RandyP

Randy,

Thanks for the detailed explanation of the potential issues of polling the IPR bit (in cores that do not clear the IPR bit).

After thinking about this problem at length, I don't think IPC interrupts are efficient either (say, on detecting a common IPR bit set, core0 triggers IPC interrupts to all the other cores). The latency of servicing these common interrupts on the other cores still depends on the master core that triggers the IPC interrupts, so we get no improvement from IPC interrupts, and the additional IPC interrupt resources they consume make this solution less attractive.

A better solution is to improve on the polling solution discussed earlier so that there are no race conditions:

To keep things simple, let us assume that we have a single common IPR bit which is shared among 2 cores and their corresponding shadow regions are used for triggering the completion interrupts. The following pseudo-code explains how to handle common EDMA interrupts going to multiple cores:

unsigned long long shared_IPR_IPRH[NUM_CORES]; // 64-bit IPRH:IPR snapshots; place this array in common shared memory (MSMC or DDR3)

Core0 ISR () // master, responsible for clearing the common IPR bit
{
    if (common IPR bit is set)
    {
        - Write the common IPR/IPRH bit status to shared_IPR_IPRH[0] ... shared_IPR_IPRH[NUM_CORES-1]
        - Service the common IPR interrupt (if there are multiple common IPR bits, service the interrupts based on the IPR status captured in shared_IPR_IPRH[0], not on the global IPR/IPRH register values)
        - Clear the appropriate global IPR/IPRH bits (through ICR/ICRH) and also the shared_IPR_IPRH[0] bits.
    }
    else
    {
        // This is not a common interrupt; run the normal DMA routine
    }
}

Core1 ISR () // also applicable to Cores 2-7
{
    if (common IPR bit set || common IPR bit set in shared_IPR_IPRH[DNUM])
    {
        - while (common IPR bit NOT set in shared_IPR_IPRH[DNUM]); // wait for the master core to populate shared_IPR_IPRH[]
        - Service the common IPR interrupt that is set in shared_IPR_IPRH[DNUM] (if there are multiple common IPR bits, service the interrupts based on the IPR status captured in shared_IPR_IPRH[DNUM], not on the global IPR/IPRH register values)
        - Clear the appropriate shared_IPR_IPRH[DNUM] bits.
    }
    else
    {
        // This is not a common interrupt; run the normal DMA routine
    }
}

Manu Bansal over 13 years ago in reply to Karthik Ramana Sankar

Karthik and Randy,

That's my analysis too. If the intention is to ensure both cores get to process completion of an EDMA transfer at their respective earliest opportunity, the best strategy is to have some inter-core signaling. Let either core poll the TCC. The core that detects the TCC first signals this to the other core, clears the TCC and performs its own service routine. The core that polls the TCC later does not see the TCC in the IPR but sees the corresponding signal from the other core, so it only performs its own service routine. This also frees up the TCC at the earliest opportunity to allow reuse by either core. This is roughly the idea in the pseudo code in the previous post, except that I allow either core, not just the master, to clear the IPR.

Thanks for following it up.

Manu
