EMIFA Problem only on read access, 200ns delay after each read access with CS deasserted ("turnaround time")

Other Parts Discussed in Thread: ADS8556

I've got an A/D converter connected to a DSP6747 device (B; Rev. 2.0), configured for asynchronous R/W on CS4.

At some point I wondered why my interrupt routine, which accesses this converter at 500kHz, consumes so much CPU load.

Some data on the configuration:

100 MHz bus clock (confirmed on pin), 300 MHz DSP clock (consistent with NOP delay time)

CS4 with 1-3-1 timing, WAIT disabled, normal mode

No SDRAM or Flash at this interface
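A minimal sketch of how a timing such as 1-3-1 might be packed into CE4CFG. The bit positions are an assumption based on the usual CEnCFG layout and should be checked against the EMIFA user guide. `async_timing_t` and `emifa_async_cfg` are hypothetical names, not part of any TI library.

```c
#include <stdint.h>

/* Hypothetical helper type: one direction's timing in EMIFA clock cycles. */
typedef struct {
    unsigned setup;   /* 1..16 cycles */
    unsigned strobe;  /* 1..64 cycles */
    unsigned hold;    /* 1..8 cycles  */
} async_timing_t;

/* Assumed CEnCFG layout (verify against the EMIFA user guide):
 *   SS[31] EW[30] W_SETUP[29:26] W_STROBE[25:20] W_HOLD[19:17]
 *   R_SETUP[16:13] R_STROBE[12:7] R_HOLD[6:4] TA[3:2] ASIZE[1:0]
 * Every timing field stores (cycles - 1). */
uint32_t emifa_async_cfg(async_timing_t wr, async_timing_t rd,
                         unsigned turnaround, int bus16)
{
    uint32_t v = 0;  /* SS = 0 (normal mode), EW = 0 (WAIT disabled) */

    v |= ((uint32_t)(wr.setup  - 1) & 0x0Fu) << 26;
    v |= ((uint32_t)(wr.strobe - 1) & 0x3Fu) << 20;
    v |= ((uint32_t)(wr.hold   - 1) & 0x07u) << 17;
    v |= ((uint32_t)(rd.setup  - 1) & 0x0Fu) << 13;
    v |= ((uint32_t)(rd.strobe - 1) & 0x3Fu) << 7;
    v |= ((uint32_t)(rd.hold   - 1) & 0x07u) << 4;
    v |= ((uint32_t)(turnaround - 1) & 0x03u) << 2;
    v |= bus16 ? 1u : 0u;  /* ASIZE: 0 = 8-bit, 1 = 16-bit data bus */
    return v;
}
```

With this layout, a 1-3-1 timing for both reads and writes, one turnaround cycle and a 16-bit bus would give 0x00200101.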

The answer came from looking at the bus with an oscilloscope in a simple test: an endless loop with 16 or 32 bit accesses to the interface (6 accesses within the loop, no asm instruction between the 6 reads/writes).

Every read access is followed by 200ns to 220ns of silence on the bus, basically:

16 bit read: CS low -> 10ns later OE low -> 30ns later OE back -> 10ns later CS high -> 220ns break!!! -> CS low ...

32 bit read: CS low -> 10ns later OE low -> 30ns later OE back -> 20ns later OE low-> 30ns later OE back -> 10ns later CS high -> 220ns break!!! -> CS low ...

The funny thing: write accesses are not affected!

16 bit write: CS low -> 10ns later WR low -> 30ns later WR back -> 10ns later CS high -> 20ns break -> CS low ...

32 bit write: CS low -> 10ns later WR low -> 30ns later WR back -> 20ns later WR low-> 30ns later WR back -> 10ns later CS high -> 20ns break -> CS low ...
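A test loop of the kind described above can be sketched as follows. This is not the original test code. `read_six` and `write_six` are hypothetical names, and the CS4 base address is an assumption to be checked against the device memory map.

```c
#include <stdint.h>

/* Assumed start of the EMIFA CS4 window on the C6747 (check the memory map). */
#define EMIFA_CS4_BASE 0x64000000u

/* Six back-to-back 16-bit reads from the same address. The pointer is
 * volatile, so the compiler has to issue every access, in order. */
void read_six(volatile const uint16_t *dev, uint16_t out[6])
{
    out[0] = *dev;
    out[1] = *dev;
    out[2] = *dev;
    out[3] = *dev;
    out[4] = *dev;
    out[5] = *dev;
}

/* Six back-to-back 16-bit writes, for comparison on the scope. */
void write_six(volatile uint16_t *dev, uint16_t value)
{
    *dev = value;
    *dev = value;
    *dev = value;
    *dev = value;
    *dev = value;
    *dev = value;
}
```

On the target, the loop body would be `read_six((volatile const uint16_t *)EMIFA_CS4_BASE, buf);` inside a `for (;;)`, with the scope on CS and OE.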

Something inserts a 200ns delay after each read access. I checked, as far as I could, all possibilities:

NAND is disabled

WAIT is disabled

EMIFA powerdown is disabled

Power controller? Would probably switch the whole thing off completely...

Bandwidth controller? But there are nothing but L1PRAM accesses and EMIFA.

Asynchronous bridge? OK, some delay but 200ns for a 300MHz / 100MHz bridge?

Error Registers? empty!

Interrupts? Disabled except NMI

The impact of this effect is severe! A connected ADS8556 cannot be read within the timeframe between two conversions (700ns @ 500kHz), reading these values takes 800ns and therefore 40% CPU load!!!
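The 40% figure follows from the 2µs sample period at 500kHz. As a small sketch (`cpu_load_percent` is a hypothetical helper, integer arithmetic only):

```c
/* Share of each sample period spent in the (stalled) readout, in percent. */
unsigned cpu_load_percent(unsigned readout_ns, unsigned sample_rate_hz)
{
    unsigned period_ns = 1000000000u / sample_rate_hz;  /* 500 kHz -> 2000 ns */
    return (100u * readout_ns) / period_ns;
}
```

For an 800ns readout at 500kHz this gives 100 * 800 / 2000 = 40.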

btw: sampling the same asynchronous peripheral by DMA, I can confirm the same "turnaround" time as found here http://e2e.ti.com/support/dsp/omap_applications_processors/f/42/t/199192.aspx

Also, looking through the posts, the screenshot in http://e2e.ti.com/support/dsp/omap_applications_processors/f/42/t/193033.aspx looks very familiar in the first 2 µs before the interrupt. A read after a write follows 'immediately', but after a read there is a 200ns delay before the next write.

Additional information:

-> Address range not cached!

-> EDMA has a delay of 8 bus cycles between two "reads", CPU access 20-22

-> read after write follows according to turnaround time, write after read after the 200ns @ 100 MHz (CPU access)

-> all other timings follow the CE4CFG register settings

-> same problem with CE5 and when mixing CE3 / CE4 / CE5 access (no device connected)

-> doesn't matter whether the read accesses go to the same, consecutive or completely different addresses

-> PSC set to always on

-> Busclock out on scope without interruption

-> AWCC MAX_WAIT or PMCR CSx_PG_DEL values have no influence

-> code in L1PRAM or L2RAM(->L1P by caching ), doesn't matter

-> Writing read values to L1DRAM or EMIFB (write cache?), also no change

  • Hi,

    Please provide your timing diagrams in image format.

    -Thanks

    Balaji N

  • I only have an old, crappy scope at hand, sorry for only providing a cellphone image.

    Timing is 2-4-2-2 (Setup-Strobe-Hold-TA) @100MHz on this image.

    On the left, there are two 16 bit write access cycles, following each other directly. (which works)

    On the right, there is the result of three read access cycles triggered by three LDW CPU commands (which have the 200ns in between in which the CPU is stalled).

    Upper trace is CS4, lower trace OE (and the probes were very crappy)

    Another company that works a lot with C6xxx CPUs confirmed the behaviour and suspects the complex data path structure (and its many bridges) to be responsible for the delays.

    My only chance is to program an EDMA to read the 6 values from the ADS8556 and then have an interrupt on the EDMA complete.

  • I just tried to service the ADS8556 with EDMA.

    Since a constant addressing mode is not supported, in my configuration it is necessary to set up 3 transfers (B) of 4 bytes (A) each (servicing a channel pair and then resetting to the base address).

    This results in three transfers of 100ns (32-bit resulting in 2x16bit 10ns-30ns-10ns) separated by 100ns. (=500ns)
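    A sketch of what such a parameter set could look like, assuming the standard EDMA3 PaRAM layout and OPT bit positions (to be verified against the EDMA3 user guide). `edma_param_t` and `adc_param_init` are hypothetical names, and the addresses and TCC number come from the caller.

```c
#include <stdint.h>

/* EDMA3 PaRAM set, eight 32-bit words (little-endian field order). */
typedef struct {
    uint32_t opt;
    uint32_t src;
    uint16_t acnt;     /* bytes per array (A)   */
    uint16_t bcnt;     /* arrays per frame (B)  */
    uint32_t dst;
    int16_t  srcbidx;  /* source step between arrays      */
    int16_t  dstbidx;  /* destination step between arrays */
    uint16_t link;
    uint16_t bcntrld;
    int16_t  srccidx;
    int16_t  dstcidx;
    uint16_t ccnt;
    uint16_t rsvd;
} edma_param_t;

#define OPT_SYNCDIM_AB (1u << 2)   /* one event moves ACNT x BCNT bytes */
#define OPT_TCINTEN    (1u << 20)  /* interrupt on transfer completion  */
#define OPT_TCC(n)     (((uint32_t)(n) & 0x3Fu) << 12)

/* 3 arrays of 4 bytes; the source goes back to the same address each time. */
void adc_param_init(edma_param_t *p, uint32_t adc_addr, uint32_t dst_addr,
                    unsigned tcc)
{
    p->opt     = OPT_SYNCDIM_AB | OPT_TCINTEN | OPT_TCC(tcc);
    p->src     = adc_addr;
    p->acnt    = 4;
    p->bcnt    = 3;
    p->dst     = dst_addr;
    p->srcbidx = 0;       /* "reset to the base address" after each array */
    p->dstbidx = 4;       /* results are stored back to back              */
    p->link    = 0xFFFFu; /* no link; a real setup would link to a reload set */
    p->bcntrld = 0;
    p->srccidx = 0;
    p->dstcidx = 0;
    p->ccnt    = 1;
    p->rsvd    = 0;
}
```

    Within each 4-byte array the source address still increments, which is why the converter needs two addresses for its read register.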

    Since the EDMA, reacting directly to the BUSY pin, seems to have a response time of 240ns, the timing is still not acceptable: some transfers are delayed by 100ns or more, resulting in more than 850ns, which is more than the time between two cycles at my planned 500ns.

    Is there any chance of servicing a constant address at EMIFA within less than 500ns?

    So far I've encountered the following delays:

    a) The CPU needs 300ns (stalled) for a 32-bit read access to EMIFA (but enters the ISR quickly)

    b) The EDMA does not support constant address mode (as described)

    c) An A/B configuration of EDMA to emulate constant address mode inserts 100ns delay between every 32-bit/16-bit access

    d) The EDMA needs 250ns to setup an access

    Therefore there is almost no possibility to service an ADS8556 on the EMIFA within 750ns.

  • Carl, 
    the same problem with EMIFA was reported to me a few weeks ago.
    Did you get any feedback from TI?
    Milan

  • No, nothing!

    It seems, since it is a packet-based switching matrix in the DSP, that the following delays are 'normal' (C6747@ 300MHz / 150MHz Matrix / 100MHz EMIFA):

    Event-> DMA transfer (bus free) 250ns

    DMA second burst (AB Transfer): 80ns

    CPU read on EMIFA, 32-bit: 350ns (CPU stalled!!!; @ EMIFA 1-3-1 timing)

    CPU write: ? (probably cached, no delay)

    Since the company that designed the hardware gave the ADS8556 only two addresses for the read register, I need to program the DMA to do 3 bursts (B) of 32 bit / 4 bytes (A) with no B source increment (constant addressing mode is not supported by EMIFA). It therefore takes 250ns+100ns+80ns+100ns+80ns+100ns (1-3-1 timing on EMIFA for the ADS8556 -> 100ns for 2x16 bit). So even if the bus / switching matrix is completely free, it takes 700ns to read the A/D converter by DMA, and much longer by the CPU. Since the converter is supposed to be driven at 500 kSps or more and needs 1.26µs for a conversion, there is only a 50ns gap under ideal circumstances.
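    The budget above can be written out as a small calculation. This is only a sketch using the figures quoted in this post; `dma_readout_ns` and `readout_window_ns` are hypothetical helpers.

```c
/* Event latency, then `bursts` bursts with a gap between consecutive ones. */
unsigned dma_readout_ns(unsigned event_latency_ns, unsigned burst_ns,
                        unsigned inter_burst_ns, unsigned bursts)
{
    if (bursts == 0)
        return 0;
    return event_latency_ns + bursts * burst_ns
         + (bursts - 1) * inter_burst_ns;
}

/* Time left in one sample period after the conversion has finished. */
unsigned readout_window_ns(unsigned sample_rate_hz, unsigned conversion_ns)
{
    unsigned period_ns = 1000000000u / sample_rate_hz;
    return period_ns > conversion_ns ? period_ns - conversion_ns : 0;
}
```

    With 250ns latency, three 100ns bursts and 80ns between bursts the sum is exactly 710ns, against a window of 2000ns - 1260ns = 740ns at 500 kSps. That matches the rounded 700ns and the few tens of ns of margin mentioned above.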

    My only advice:

    Service EMIFA always by DMA, not by the CPU regarding read access.

    If you have a peripheral on the bus which needs constant addressing (FIFO register, ADS8556...), connect it to some higher address line so it can be read in one DMA burst (16x64 bit) even if the DMA makes address increments (A-transfer w/o constant address mode, not AB-transfer)!

  • We are seeing similar issues.  Really hurting us.  Taking something like 4.5 usecs to read 20 bytes of data (using x16 width) using 1-6-1 timing at 100 MHz bus.

    UPP is another option that might work for you (unless latency is the key and not throughput....)

    -Mike

  • Hi

    This thread has been unattended from our side for a while, I apologize for the delay.

    I have not studied all the numbers and cycles, but at a high level, if the disconnect is about read vs write behavior and the associated latency, I think that is generally expected given the way reads and writes behave.

    A good explanation on this can be found on the following thread from Kyle

    http://e2e.ti.com/support/dsp/omap_applications_processors/f/42/p/136305/494280.aspx#494280

    Using DMA bursting and the other suggestions such as uPP might be good options to pursue if you find the standalone performance of EMIFA via CPU accesses unacceptable.

    Regards

    Mukul

  • I finally used DMA.

    Now I have no CPU stall, but it doesn't mean that the latency is lower.

    The DMA controller needs a significant time to react to an event, and since the uPP does not support constant addressing modes, the A-B addressing introduces additional transfer setup delays. I tweaked a lot using the uPP parameters, priorities, DMA settings, PLL etc.

    It still takes 700ns to complete the readout of the ADS8556. Since the ADS8556 has no FIFO and leaves only 750ns between two cycles at full speed, it is almost impossible to run the converter at full speed without violating the specs.

    For the prototype I shipped I can live with it, but for a product development this would be a no-go.