OMAP-L138 Async EMIFA Turnaround phase

Milan Zelenka

Intellectual 870 points

Hello,
OMAP-L138 Reference Manual says that TA precedes the Async EMIFA operation.

Could you please explain in plain English how this feature helps slow down access to a slow peripheral?

If TA causes a leading delay, then I suspect SW cannot assign a prolonged TA to a peripheral with a slow OE-to-data-bus-High-Z response time. What am I missing here?

Many thanks,
Milan

over 12 years ago

0 Shankari G over 12 years ago

TI__Mastermind 43955 points

Hi Milan,

Please let us know which section of the OMAP-L138 TRM (Technical Reference Manual) you are referring to.

What do you mean by the term "TA"?

regards,

Shankari

0 Milan Zelenka over 12 years ago in reply to Shankari G

Intellectual 870 points

Hello Shankari,
TA stands for Turnaround phase. This is an official acronym used in the OMAP-L138 documentation for the Turnaround option.

For Turnaround behavior, please refer to TRM, page 831, Table 20-19, and a few others.

Thanks,
Milan

0 Sunil Kamath over 12 years ago

TI__Expert 7470 points

Milan,

I do not completely understand the issue you see with TA behavior when interfacing with a slow peripheral. Can you expand on your concern?

I suppose you have already seen the description for TA in Table 20-15 of the TRM. TA is inserted between consecutive write-read (and vice versa) operations to the same chip select, or between accesses to different chip selects. The HOLD count can be used to add delay if TA is insufficient. EMIFA also supports Extended Wait delays to account for large tR delays.
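The insertion rule described above can be written as a small predicate. This is only a sketch of the rule as stated in this post, not of the actual EMIFA arbitration logic; the `Access` type and the `ta_inserted` function are names invented for illustration.

```python
from typing import NamedTuple


class Access(NamedTuple):
    cs: int    # chip select number, e.g. 2 for CS2
    kind: str  # "read" or "write"


def ta_inserted(prev: Access, cur: Access) -> bool:
    """Return True if a turnaround phase would precede `cur`.

    Models only the rule stated in this post: TA is inserted when the
    chip select changes, or when the direction changes on the same chip
    select. Back-to-back accesses of the same kind to the same chip
    select get no TA.
    """
    if prev.cs != cur.cs:
        return True
    return prev.kind != cur.kind
```

For example, `ta_inserted(Access(2, "read"), Access(3, "read"))` is true because the chip select changes, while two consecutive reads to CS2 get no TA.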

Regards,

Sunil Kamath

0 Milan Zelenka over 12 years ago in reply to Sunil Kamath

Intellectual 870 points

Hello Sunil,
Yes, I can expand my concern.

1. I am assuming that the purpose of the TA phase is to allow designers to connect slow peripherals to the EMIFA.
[Yes/No]?

2. If yes, I would expect the TA phase to create a trailing delay. Would you expect the same?

3. If a TA before the operation is okay with you, then please explain how to use the option in a system with 4 peripherals connected to 4 different chip selects.

4. Also, I do not think TA is fully interchangeable with Hold, as the latter is performed during every transaction whereas TA is not inserted in bursts (consecutive reads/writes). Thus, the impact on overall system performance is much smaller when TA is used. That's why I need to fully understand how to use the TA option.

Thanks,
Milan

0 Sunil Kamath over 12 years ago in reply to Milan Zelenka

TI__Expert 7470 points

Milan,

Milan Zelenka said:
1. I am assuming that the purpose of the TA phase is to allow designers to connect slow peripherals to the EMIFA.

Yes, but maybe it would be more accurate to say it helps avoid contention on the bus.

Milan Zelenka said:
2. If yes, I would expect the TA phase to create a trailing delay. Would you expect the same?
3. If a TA before the operation is okay with you, then please explain how to use the option in a system with 4 peripherals connected to 4 different chip selects.

As long as there is the required delay between accesses, does it matter whether it is a leading TA delay of the next access or a trailing TA delay of the previous access? EMIFA has CS0 active by default. So, for example, if you have a read access to CS2 immediately followed by a read access to CS3, here is what you might expect to see:

a) CS0 active

b) CS2 active (CS0 deactivated) for async read access with TA followed by Setup, Strobe and Hold for CS2

c) CS3 active (CS2 deactivated) for async read access with TA, Setup, Strobe and Hold for CS3.

d) Back to CS0 active - but since this is an Async to SDRAM access, TA for CS3 is inserted before it

Do you observe or expect a different behavior?

Milan Zelenka said:
4. Also, I do not think TA is fully interchangeable with Hold, as the latter is performed during every transaction whereas TA is not inserted in bursts (consecutive reads/writes). Thus, the impact on overall system performance is much smaller when TA is used.

Agreed. This recommendation is made for the case where timing cannot be met using the TA field; otherwise I would not recommend unnecessarily extending access cycles.

Regards,

Sunil Kamath

0 Milan Zelenka over 12 years ago in reply to Sunil Kamath

Intellectual 870 points

Sunil,
Re 1 OK

Re 2 You did not answer my question.
Action: Please say slowly and clearly whether TA phase is executed before the operation or after.

Re 3 I totally disagree. Think about different TA settings for different mem windows. See the following scenario:

Flash connected to CS2
RAM1 connected to CS3
RAM2 connected to CS4
A SLOW PERIPHERAL connected to CS4.

If TA is executed before CS4 transaction, how do I know which transaction follows the slow access to the SLOW peripheral?

Or - do you want me to slow down ALL subsequent TAs in the system? I hope not as that would lead to significant loss of performance.
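To make the concern concrete, here is a toy model of the gap around each access, assuming TA is a leading delay taken from the configuration of the access that is about to start. The device names and cycle counts are hypothetical, and the model ignores the Setup, Strobe and Hold phases entirely.

```python
# Hypothetical TA settings in EMIF clock cycles, one per device.
TA_CYCLES = {"flash": 1, "ram1": 1, "ram2": 1, "slow": 4}


def leading_ta_gap(prev_dev: str, next_dev: str, ta=TA_CYCLES) -> int:
    """Turnaround cycles between two reads when TA is a leading delay.

    Assumptions: the TA value comes from the chip select of the access
    that is about to start, and no TA is inserted between consecutive
    reads to the same device.
    """
    if prev_dev == next_dev:
        return 0
    return ta[next_dev]


# The slow device gets its long gap before it is accessed...
gap_before_slow = leading_ta_gap("ram1", "slow")
# ...but only the next device's short gap after it has driven the bus.
gap_after_slow = leading_ta_gap("slow", "ram1")
```

In this model the slow device is given 4 cycles before its own access and only 1 cycle to release the bus afterwards, which is the opposite of what it needs.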

Re 4 OK. With that being said, we have to clarify usage of the TA option.

Thanks,
Milan

0 Milan Zelenka over 12 years ago in reply to Milan Zelenka

Intellectual 870 points

Hello Sunil,
Do you think you could clarify my questions above?

Thanks,
Milan

0 Milan Zelenka over 12 years ago in reply to Milan Zelenka

Intellectual 870 points

Hello Sunil,
Do you think you could reply to my questions above?

Many thanks in advance,
Milan

0 Sunil Kamath over 12 years ago in reply to Milan Zelenka

TI__Expert 7470 points

Milan,

Sorry for the delay in getting back to you. I think the TRM makes it quite clear in the Asynchronous Read/Write Operations sections that the turn-around phase is executed before the operation. Hopefully there is no ambiguity about it.

Though the EMIF can be used to interface to some other compatible peripherals, it is mainly designed to connect to synchronous and asynchronous memories in the typical use cases of the device. Given the behavior of the turn-around phase, it will be necessary to use the worst-case TA in the scenario that you have outlined. I am not aware of any other way out of this except by redesigning the system: interfacing the slower peripheral to another interface if possible, connecting RAM to EMIFB, etc.

Thank you for the suggestion. We will definitely take the necessary steps to clarify this behavior on further evaluation.

Hope this addresses your concern.

Regards,

Sunil Kamath

0 Milan Zelenka over 12 years ago in reply to Sunil Kamath

Intellectual 870 points

Re 2 OK, I take it that TA is executed before the operation.

Re 3 I think you misunderstood my point. There is no need to discuss the worst case of my interface.

Here is the problem:
From my standpoint, it makes no sense to execute Turnaround phase BEFORE the operation as it is useless.

Question:
Please clarify WHY the OMAP executes Turnaround phase before the operation. There must be something I have been missing here.

Many thanks,
Milan

0 Milan Zelenka over 12 years ago in reply to Milan Zelenka

Intellectual 870 points

One important note:

The TRM reads:
If more turnaround cycles are required than can be programmed into the TA field, additional cycles can be added to the R_HOLD field to compensate.

Well, R_HOLD is executed AFTER the operation, right?

Question:
Please clarify WHY TRM suggests combining these two options. I do not get the point. Please explain.

Many thanks,
Milan

0 Milan Zelenka over 12 years ago in reply to Milan Zelenka

Intellectual 870 points

Hello,
do you think you could reply?

Many thanks,

Milan

0 Milan Zelenka over 12 years ago in reply to Milan Zelenka

Intellectual 870 points

Hello Sunil,
Do you think you could help me understand the TA implementation on the OMAP-L138?

Many thanks,
Milan

0 Milan Zelenka over 12 years ago in reply to Milan Zelenka

Intellectual 870 points

Hello Sunil,
I would really appreciate your clarification.

Best,
Milan

0 Milan Zelenka over 12 years ago in reply to Milan Zelenka

Intellectual 870 points

Hello Sunil,
Are you going to show up again?

Milan

0 Milan Zelenka over 12 years ago in reply to Milan Zelenka

Intellectual 870 points

Is there anybody out there?

0 Michael Williamson over 12 years ago in reply to Milan Zelenka

Expert 1995 points

Hi Milan,

Saw your posts go by....

It seems like Table 20-19 (for reads) and Table 20-20 (for writes) of the TRM (I am looking at Rev-A) have pretty good descriptions of the use of the TA cycles for async devices. They are inserted before a cycle conditionally based on the checks cited in the tables.

E.G., from Table 20-19 (reads):

Once the read operation becomes the highest priority task for the EMIFA, the EMIFA waits for the programmed number of turn-around cycles before proceeding to the setup period of the operation. The number of wait cycles is taken directly from the TA field of the asynchronous n configuration register (CEnCFG). There are two exceptions to this rule:
• If the current read operation was directly preceded by another read operation to the same chip select, no turnaround cycles are inserted.
After the EMIFA has waited for the turnaround cycles to complete, it again checks to make sure that the read operation is still its highest priority task. If so, the EMIFA proceeds to the setup period of the operation. If it is no longer the highest priority task, the EMIFA terminates the operation.

Sounds like the hardware implementation is going to force you to apply the longer TA time across all your CS devices if your SLOW device can't disable its output drivers quickly enough on a CS transition. It would seem, though, that you could raise the Hold time on the slow device's CS instead and avoid using longer TA times on your other CS spaces. With extra hold time on the slow device, OE is de-asserted and the HOLD clocks are then added to that device's access time, giving it time to disable its outputs (assuming it uses CS and OE to enable its drivers). When the next access is posted, the need for TA (for transitions from a slow-device read to any other CS access) would be reduced or removed, as the data bus should be free and clear. Agree?

-Mike

0 Milan Zelenka over 12 years ago in reply to Michael Williamson

Intellectual 870 points

Hi Mike,
Thanks for taking the existing communication thread over.

I totally agree with you, however, I think it is important to say that the approach you suggested has the following drawbacks:

1. In contrast to TA, the Hold phase executes in each and every Read. Thus, you cannot accelerate two consecutive Reads from the same (slow) device, and overall system performance would degrade.

2. If you decide to prevent potential bus contention by means of the HOLD phase, then there is literally no need for the TA phase. Unfortunately, the OMAP-L138 does not allow TA to be turned off completely.
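Drawback 1 can be put into rough numbers with a toy cycle count. Everything here is an assumption made for illustration: the timing values, the device names, the rule that a leading TA is charged whenever the device changes, and the choice to ignore any TA before the first access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CsTiming:
    ta: int      # turnaround cycles charged before an access to this device
    setup: int
    strobe: int
    hold: int


def total_cycles(sequence, cfg):
    """Sum EMIF clock cycles for a list of read accesses named by device."""
    total = 0
    prev = None
    for dev in sequence:
        t = cfg[dev]
        if prev is not None and prev != dev:
            total += t.ta  # leading TA, taken from the pending access
        total += t.setup + t.strobe + t.hold
        prev = dev
    return total


# Option A: worst-case TA on every chip select (hypothetical values).
TA_EVERYWHERE = {
    "ram": CsTiming(ta=4, setup=1, strobe=2, hold=1),
    "slow": CsTiming(ta=4, setup=1, strobe=4, hold=1),
}
# Option B: minimal TA, longer Hold on the slow device only.
HOLD_ON_SLOW = {
    "ram": CsTiming(ta=1, setup=1, strobe=2, hold=1),
    "slow": CsTiming(ta=1, setup=1, strobe=4, hold=4),
}

mixed = ["ram", "slow", "ram", "slow", "ram"]
burst = ["slow"] * 8
```

With these made-up numbers, the interleaved sequence costs 40 cycles under option A and 34 under option B, while the burst of eight reads from the slow device costs 48 cycles under option A and 72 under option B. The longer Hold wins when accesses alternate and loses on bursts.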

From my standpoint, the fact that the Technical Reference Manual combines TA with HOLD implies there is something wrong with either the documentation or the implementation. Do you agree?

If the current implementation is intentional, then I need to fully understand the rationale behind it. If the implementation is wrong, then I suggest TI clearly describe the issue in the Errata document.

Let me know what you think.

Many thanks,
Milan

0 Michael Williamson over 12 years ago in reply to Milan Zelenka

Expert 1995 points

Milan,

Just so you know, I am not a TI employee, just a passer-by who uses the EMIFA a fair amount with the OMAP-L138 and FPGAs.

I don't disagree with 1 and 2. You're basically in a trade-off, keep the inefficiency bound to your slow device CS (on every read cycle) or let the issue spread to your other CS under certain cases using the TA fields.

The TA behavior seems reasonably well documented. The only thing I think could be improved in the documentation is to clarify which CEnCFG register the TA is taken from. The way it reads, it always takes the TA value from the CS space of the pending transaction. This pretty much makes having multiple TA values for different CS spaces useless. If the TA value were read from the CEnCFG of the CS space of the transaction just completed, then it would seem the TA values would be far more useful, and it would work more or less as you would expect. Perhaps some experiments are in order (given it's unlikely TI will respond to this thread)?
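One way to structure such an experiment: program clearly different TA values on two chip selects, issue a read to the first followed immediately by a read to the second, and count the turnaround cycles between the two accesses on a logic analyzer. The helper below only compares the measurement against the two hypotheses. The function name and arguments are hypothetical, and it assumes the TA cycles can be separated from Hold and Setup in the capture.

```python
def infer_ta_source(ta_first_cs: int, ta_second_cs: int, measured_ta: int) -> str:
    """Guess which CEnCFG supplied the TA between two back-to-back reads.

    ta_first_cs  -- TA cycles programmed for the access that just completed
    ta_second_cs -- TA cycles programmed for the pending access
    measured_ta  -- TA cycles observed between the two accesses
    """
    if ta_first_cs == ta_second_cs:
        raise ValueError("program different TA values on the two chip selects")
    if measured_ta == ta_second_cs:
        return "pending"      # TA taken from the access about to start
    if measured_ta == ta_first_cs:
        return "completed"    # TA taken from the access just finished
    return "inconclusive"
```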

The only reason the TRM suggests extending HOLD is for the case where you require more clocks than the TA field will hold. All they are saying is that if your device is *that slow*, you aren't out of luck: you can use the HOLD parameter as a work-around.
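Put as arithmetic, and assuming the slow device starts releasing the bus when OE is de-asserted at the end of Strobe, the time it gets before the next access starts is roughly its own Hold count plus the TA of the following access. The sketch below uses hypothetical function names and works in EMIF clock cycles, not register field values. It leaves out the Setup period of the next access to stay conservative and does not check the limits of the CEnCFG fields.

```python
def release_window(prev_r_hold: int, next_ta: int) -> int:
    """Cycles between OE de-assertion and the start of the next access."""
    return prev_r_hold + next_ta


def r_hold_needed(required: int, next_ta: int, min_r_hold: int = 1) -> int:
    """Smallest read Hold count for the slow device that yields at least
    `required` cycles of bus-release time, given the TA of the access
    that follows it."""
    return max(min_r_hold, required - next_ta)
```

For a device that needs 6 cycles to release the bus and a following access with a TA of 1 cycle, `r_hold_needed(6, 1)` gives 5, and `release_window(5, 1)` confirms the 6-cycle window.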

I have to tell you, if you are trying to squeeze every last clock cycle out of the EMIFA, you might want to do some prototyping. If you search this forum you'll find that the path from the CPUs to the EMIFA is not exactly optimized due to the cross-bar / peripheral transaction routing in the chip, and a lot of folks have had trouble pushing the bandwidth up to what they would normally expect to see without using DMAs and some tweaking.

I wish you well in your endeavors.

-Mike

0 Milan Zelenka over 12 years ago in reply to Michael Williamson

Intellectual 870 points

Mike,
thanks for sharing your thoughts.
We agreed that TA should be implemented AFTER the slow access, and that's where the problem stems from.

I'll try to get a clarification/explanation from TI.

Best,
Milan

0 Milan Zelenka over 12 years ago in reply to Milan Zelenka

Intellectual 870 points

My pleasure talking to you!