EMIFA timing on the OMAP-L138

Dmitri Krivchitch

Other Parts Discussed in Thread: OMAP-L138

While running performance tests on the OMAP-L138 we found some timing discrepancies when accessing EMIFA that we cannot explain.

The function we are running is this:

CMP      r1,#0
BXEQ     lr
MOV      r12,#0
GetNextWord:
LDRH     r3,[r2,#0]
SUBS     r1,r1,#1
STRH     r3,[r0,r12]
ADD      r12,r12,#2
BNE      GetNextWord

It takes three arguments: a pointer to the destination buffer (r0), the number of words to read (r1), and a fixed source address (r2). The function is executed with interrupts locked.

When the source address is located in EMIFA CS2, the function takes 211 us to execute; when the source address is located in shared RAM (0x80000000), the same function takes 23 us.

In both cases destination buffer is located in DDR memory, number of words to read is 758.

CS2 is configured to produce a read cycle of 70 ns, and the EMIFA clock is 100 MHz. Turnaround time is set to 2, but it should not matter as we are not switching between reads and writes.

The reads themselves should account for 53 us (70 ns * 758), plus roughly 20 us of loop overhead, so the total time is expected to be ~75 us, but we see 211 us.

Is there a "dead" time between read cycles that is introduced by the EMIFA controller?

What takes an extra 136us when EMIFA reads are performed?
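The arithmetic in the question above can be checked with a short sketch. It uses only the figures quoted in the post; the 20 us loop overhead is the poster's own estimate, and the variable names are illustrative. The post rounds the expected total up to ~75 us, which is why it quotes 136 us rather than the slightly larger gap computed here.

```python
# Figures quoted in the post; names are illustrative only.
WORDS = 758                 # number of 16-bit reads
READ_CYCLE_NS = 70          # configured CS2 read cycle
LOOP_OVERHEAD_US = 20.0     # poster's estimate of the non-EMIFA cost
MEASURED_EMIFA_US = 211.0   # measured with the source in EMIFA CS2
MEASURED_SRAM_US = 23.0     # measured with the source in shared RAM

# Time spent in the configured read cycles alone.
read_time_us = WORDS * READ_CYCLE_NS / 1000.0

# Expected total and the part of the measurement it does not explain.
expected_us = read_time_us + LOOP_OVERHEAD_US
gap_us = MEASURED_EMIFA_US - expected_us

# The same numbers expressed per loop iteration.
emifa_per_read_ns = MEASURED_EMIFA_US * 1000.0 / WORDS
sram_per_read_ns = MEASURED_SRAM_US * 1000.0 / WORDS
gap_per_read_ns = gap_us * 1000.0 / WORDS

print(f"read cycles only : {read_time_us:.2f} us")
print(f"expected total   : {expected_us:.2f} us")
print(f"unexplained      : {gap_us:.2f} us ({gap_per_read_ns:.0f} ns per read)")
print(f"per iteration    : EMIFA {emifa_per_read_ns:.0f} ns, SRAM {sram_per_read_ns:.0f} ns")
```

Seen per read, each EMIFA access costs roughly 180 ns more than the configured cycle would suggest, which is the "dead time" the question asks about.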

over 13 years ago

0 Renjith Thomas over 13 years ago

Guru 31670 points

Dmitri,

Can you tell us more about your test environment? Are you running this from a bootloader, under an operating system, or something else?

0 Dmitri Krivchitch over 13 years ago in reply to Renjith Thomas

Prodigy 220 points

We are running this code under VxWorks on the ARM side. We see similar timing when EMIFA reads are performed by the DSP.

0 Renjith Thomas over 13 years ago in reply to Dmitri Krivchitch

Guru 31670 points

Dmitri,

I assume that i-cache is already enabled. Can you confirm the following?

1. Does the CS2 memory have cacheable permissions?

2. Is shared SRAM also cacheable?

3. Can you try the function with LDR and STR instead of LDRH and STRH? I know that you are using half-word accesses because of the 758 bytes; I just want to see whether word access has any impact.

4. Can you share more details, or the schematics, of the EMIF interface to the external memory? Which memory chip is used?

0 Dmitri Krivchitch over 13 years ago in reply to Renjith Thomas

Prodigy 220 points

Hi Renjith,

1. Yes, I-cache is enabled.

2. CS2 is marked as not cacheable. We are interfacing to an SMSC 91c111 Ethernet controller that has a 16-bit data register. It has to be read each time while clocking an Ethernet frame in.

3. Shared RAM is set to not cacheable as well, so timing should be comparable.

4. Our data bus is 16 bit wide, that is why we are using LDRH and STRH.

5. As mentioned above, the part we are interfacing to is the SMSC 91c111. We see the same timing when accessing an XR16L784CV UART (CS4). In the case of CS4 we are only using D0-D7, A0-A5, BA0, BA1, CS4, OE, and WE.

Thank you,

Dmitri

0 Renjith Thomas over 13 years ago in reply to Dmitri Krivchitch

Guru 31670 points

Dmitri Krivchitch said:

4. Our data bus is 16 bit wide, that is why we are using LDRH and STRH.

But can you still try LDR and STR? Even though the interface is 16-bit, it should overlap and write if you've configured it properly.

0 Dmitri Krivchitch over 12 years ago in reply to Renjith Thomas

Prodigy 220 points

Using LDR to read data from EMIFA did not seem to make a difference in terms of timing.

0 Renjith Thomas over 12 years ago in reply to Dmitri Krivchitch

Guru 31670 points

Dmitri,

Can you share the part number of the device that is connected to EMIF CS2? Also can you share the EMIF controller register settings?

0 B Bresnahan over 12 years ago in reply to Renjith Thomas

Prodigy 10 points

Barrie here w. Dmitrik -

Part attached is a SMSC LAN91C111i-NU

EMIF Registers set per:

AWCC: 0x100500FF

CE2CFG: 0x48522195

Thanks for help,

Barrie
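As a reading aid for the CE2CFG value above, here is a small decoder. It is a sketch that assumes the asynchronous configuration register layout (SS, EW, W_SETUP, W_STROBE, W_HOLD, R_SETUP, R_STROBE, R_HOLD, TA, ASIZE, from bit 31 down to bit 0); the bit positions are written from memory and should be checked against the EMIFA reference guide before relying on them.

```python
# Assumed CEnCFG field layout: (name, lowest bit, width in bits).
# Verify these positions against the EMIFA reference guide.
CENCFG_FIELDS = [
    ("SS", 31, 1),        # select strobe mode
    ("EW", 30, 1),        # extended wait enable
    ("W_SETUP", 26, 4),
    ("W_STROBE", 20, 6),
    ("W_HOLD", 17, 3),
    ("R_SETUP", 13, 4),
    ("R_STROBE", 7, 6),
    ("R_HOLD", 4, 3),
    ("TA", 2, 2),         # turnaround
    ("ASIZE", 0, 2),      # 0 = 8-bit bus, 1 = 16-bit bus
]

def decode_cencfg(value):
    """Split a 32-bit CEnCFG value into its named fields."""
    return {name: (value >> shift) & ((1 << width) - 1)
            for name, shift, width in CENCFG_FIELDS}

posted = decode_cencfg(0x48522195)
for name, field in posted.items():
    print(f"{name:9s} = {field}")
```

Under that layout the posted value has EW set, ASIZE selecting a 16-bit bus, and small read setup, strobe, and hold counts.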

0 Renjith Thomas over 12 years ago in reply to B Bresnahan

Guru 31670 points

Barrie,

Can you try CE2CFG = 0x08522195 and see whether it makes any difference?

If it does not, can you try using LDM/STM instructions instead of LDR/STR? This will help in performing a burst transfer.

0 Dmitri Krivchitch over 12 years ago in reply to Renjith Thomas

Prodigy 220 points

Hi Renjith,

Thank you very much for your suggestion to use LDM. At first glance it seems to double EMIFA throughput.

I am going to verify that the data we read is valid.

Disabling extended wait, on the other hand, did not affect the read timing.

Again, thank you very much for your help.

Dmitri

0 Renjith Thomas over 12 years ago in reply to Dmitri Krivchitch

Guru 31670 points

Dmitri,

If this improves the throughput, then I believe you should be enabling DMA for the transfer, which will do a larger burst transfer than this while not loading the ARM so much. If you believe that your issue is solved completely, then please mark this post as answered.

0 Dmitri Krivchitch over 12 years ago in reply to Renjith Thomas

Prodigy 220 points

Hi Renjith,

It looks like DMA is the only way to improve performance.

While the LDM instruction increases performance, it does not allow us to read from the same source address into multiple registers.

Thank you for your help,

Dmitri

0 Renjith Thomas over 12 years ago in reply to Dmitri Krivchitch

Guru 31670 points

Dmitri,

No, I don't agree. You can still use LDM/STM with an incrementing address; you don't have to use the same source address. This will work if the higher address bits (A2, A3, A4, A5, etc.) are not connected to the LAN controller. Basically, the incremented address has no effect if those lines are not physically connected. I can confirm this if you can share the schematics.

0 Dmitri Krivchitch over 12 years ago in reply to Renjith Thomas

Prodigy 220 points

Hi Renjith,

I do agree that if our address lines were shifted so that each register were mapped to a 32-byte space (2 bytes per register in the existing hardware), then LDM/STM would work.

That would require a hardware respin.

Dmitri

0 Renjith Thomas over 12 years ago in reply to Dmitri Krivchitch

Guru 31670 points

Dmitri,

Have you tried LDM/STM for at least 2 registers (8 bytes)? How are your address lines tied?

0 Dmitri Krivchitch over 12 years ago in reply to Renjith Thomas

Prodigy 220 points

Hi Renjith,

Yes, we tried the LDM/STM instructions and we do see an improvement in EMIFA throughput. Unfortunately, we cannot take advantage of the LDM instruction to clock data from the 16-bit data port (91c111) on the existing design. Each 32-bit read returns data from 2 registers, and LDM increments the source pointer after each read. As I mentioned before, mapping each register to a 32-byte space would allow us to take advantage of the LDM instruction.

Right now we have EMA_A0:EMA_A13 connected to A2:A15 (91c111), and EMA_BA1 connected to A1 (91c111). We would have to respin the hardware to change the addressing.
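The effect of this wiring on LDM can be sketched with a small model. It assumes that in 16-bit mode EMA_BA1 carries CPU byte-address bit 1 and EMA_A[k] carries bit k+2, and that each 32-bit read is split into two 16-bit bus cycles at increasing addresses. The data-port offset 0x8 and the `shift` parameter (modelling the proposed "one register per 32 bytes" rewiring) are illustrative assumptions, not values from the schematics.

```python
def halfword_offsets(start_offset, n_regs):
    """Byte offsets of the 16-bit bus cycles for an LDM of n_regs registers."""
    return [start_offset + 2 * i for i in range(2 * n_regs)]

def device_address(byte_offset, shift=0):
    """Address seen on the 91c111 pins A15..A1 (A0 is not driven).

    shift=0 models the existing wiring (BA1 -> A1, EMA_A0..A13 -> A2..A15).
    shift=4 models address lines moved up so one register spans 32 bytes.
    """
    return (byte_offset >> shift) & 0xFFFE

DATA_PORT = 0x8  # hypothetical register offset, for illustration only

# Existing wiring: an LDM of 2 registers walks across four device addresses.
existing = [device_address(o) for o in halfword_offsets(DATA_PORT, 2)]

# Shifted wiring: an LDM of 8 registers (32 bytes) stays on one device address.
shifted = [device_address(o, shift=4)
           for o in halfword_offsets(DATA_PORT << 4, 8)]

print("existing wiring:", [hex(a) for a in existing])
print("shifted wiring :", sorted({hex(a) for a in shifted}))
```

In this model the existing wiring makes every 16-bit cycle of an LDM land on a different device address, while the shifted wiring keeps a full 8-register LDM on the same register, which matches the respin argument in the post.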

We will set up an EDMA transfer to clock data in from the data port and see what throughput we get.
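A fixed-source transfer of this kind is commonly described with a zero source index. The sketch below only models the address sequence, assuming the usual EDMA3 indexing rule that array i starts at SRC + i * SRCBIDX and DST + i * DSTBIDX; it is not driver code. The two base addresses are placeholders, and the counts simply mirror the 758 half-word reads from the original test.

```python
def edma_array_addresses(src, dst, bcnt, src_bidx, dst_bidx):
    """Start addresses (src, dst) of each of the bcnt arrays in one frame."""
    return [(src + i * src_bidx, dst + i * dst_bidx) for i in range(bcnt)]

# Placeholder addresses, for illustration only.
DATA_PORT_ADDR = 0x60000008   # hypothetical data port in the CS2 window
DEST_BUFFER = 0xC0000000      # hypothetical buffer in DDR

ACNT = 2      # bytes per array: one 16-bit read
BCNT = 758    # number of arrays: one per word in the test
pairs = edma_array_addresses(DATA_PORT_ADDR, DEST_BUFFER,
                             BCNT, src_bidx=0, dst_bidx=ACNT)

print("first:", [hex(a) for a in pairs[0]])
print("last :", [hex(a) for a in pairs[-1]])
```

With a source index of 0 every array reads the same port address, while the destination advances by two bytes per array, so the buffer fills exactly as the LDRH/STRH loop does.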
