Simple Description of Memory Stalls Required

MattB

Expert 2025 points

Hi,

We have an FPGA connected to a C6747 (OMAPL-137) via EMIFA.

Is the following correct?

"The FPGA/EMIFA is slower than the CPU so when a load from the FPGA occurs the CPU waits for the result and this is called a memory stall."

"Memory stalls happen automatically and can't be seen in the code. They are not the same as NOP instructions."

What happens when a store to the FPGA occurs?

Any thoughts or pointers to documentation greatly appreciated,

Matt

over 15 years ago

0 tlee over 15 years ago

TI__Guru 62975 points

Matt,

Yes, when the CPU requests data from EMIFA, the bus stalls while the data is fetched. For writes to EMIFA, the data is buffered in the data bus bridge FIFOs.

You can learn more about the architecture here: http://processors.wiki.ti.com/index.php/OMAP-L1x/C674x/AM1x_SOC_Architecture_and_Throughput_Overview

-Tommy

0 MattB over 15 years ago in reply to tlee

Expert 2025 points

Hi Tommy,

tlee said:
For writes to EMIFA, the data is buffered in the data bus bridge FIFOs

So for a write there is no bus stall and the CPU runs as fast as possible.

I'm asking because I've been contemplating possible optimisations of our FPGA communications.

This application mainly writes to the FPGA.

There are some memory-mapped configuration registers which are written to using C structures that reflect the natural format of the register. In other words, if it is a 16-bit register (most of them are), the C structure contains a Uint16, and the value is calculated and written to the FPGA with a C assignment, resulting in an STH (store halfword) instruction.
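To make that pattern concrete, here is a minimal sketch of such a register structure. The register names, their order, and the helper function are hypothetical (nothing in our real FPGA map is implied), and it uses the standard `uint16_t` in place of TI's `Uint16` so it also builds on a host PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block: names and order are illustrative only. */
typedef struct {
    volatile uint16_t control;    /* byte offset 0 */
    volatile uint16_t divider;    /* byte offset 2 */
    volatile uint16_t threshold;  /* byte offset 4 */
} FpgaConfigRegs;

/* Calculate a value and write it with a plain C assignment.
 * Because the member is a volatile 16-bit type, the compiler has to
 * emit a single 16-bit store for it. */
static void fpga_set_divider(FpgaConfigRegs *regs,
                             uint32_t clock_hz, uint32_t target_hz)
{
    assert(regs != NULL);
    assert(target_hz != 0);
    /* The cast truncates if the quotient does not fit in 16 bits. */
    regs->divider = (uint16_t)(clock_hz / target_hz);
}
```

On the target, `regs` would point at wherever the FPGA is mapped in the EMIFA address space; on a host it can simply point at an ordinary struct for testing.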

There is also data to send, and this is clocked into the FPGA by repeatedly writing to the same 32-bit register. Again, this is done using a C assignment in a loop.
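The data path is roughly the following loop. This is only a sketch: the function name and parameters are hypothetical, and the register address is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clock 'count' words into the FPGA by writing each one to the same
 * 32-bit register. 'data_reg' is volatile so that the compiler emits
 * every store instead of keeping only the last one. */
static void fpga_send_words(volatile uint32_t *data_reg,
                            const uint32_t *src, size_t count)
{
    size_t i;

    assert(data_reg != NULL);
    assert(src != NULL || count == 0);

    for (i = 0; i < count; ++i) {
        *data_reg = src[i];
    }
}
```

Since the destination never changes, only the source pointer advances; the FPGA sees one write per word.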

In the case of the configuration, I was contemplating using some kind of hack (perhaps the union hack) to cast the data into a longer type and get the compiler to generate longer store instructions, STW or even STDW. But if the CPU isn't waiting for the STH, and it has some processing to do to get the next value ready, what would this achieve? The EMIF hasn't got much else to do at this point, so maximising its throughput isn't important.
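For reference, the union hack I have in mind would look something like this sketch. The type and function names are hypothetical, and it assumes two 16-bit registers sit next to each other at a 4-byte-aligned address, which I haven't checked against our register map.

```c
#include <assert.h>
#include <stdint.h>

/* Two adjacent 16-bit values viewed as one 32-bit word. */
typedef union {
    uint16_t half[2];
    uint32_t word;
} RegPair;

/* Write two adjacent 16-bit registers with a single 32-bit store.
 * 'pair_reg' must be 4-byte aligned and must cover both registers.
 * half[0] lands at the lower address whatever the endianness. */
static void fpga_write_pair(volatile uint32_t *pair_reg,
                            uint16_t first, uint16_t second)
{
    RegPair p;

    assert(pair_reg != 0);
    p.half[0] = first;
    p.half[1] = second;
    *pair_reg = p.word;  /* one word store instead of two halfword stores */
}
```

Both values have to be ready before the store can be issued, which is exactly the cost I'm weighing against the saved instruction.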

In the case of the data, I was contemplating using DMA, which would unload the CPU, but I'm not sure that I've got anything else for the CPU to do, so what would it achieve?

In summary, the geek in me would love to start messing with union hacks and DMA but the software engineer in me is saying "don't optimise, simple and obvious code is best"!

tlee said:
You can learn more about the architecture here: http://processors.wiki.ti.com/index.php/OMAP-L1x/C674x/AM1x_SOC_Architecture_and_Throughput_Overview

Good link, thanks,

Matt

0 tlee over 15 years ago in reply to MattB

TI__Guru 62975 points

Matt,

Yes, the CPU can run as fast as the FIFOs can buffer the data.

In my opinion, using EDMA to move the data to EMIF would be a good idea for the following reasons:

- The CPU bandwidth that is freed by EDMA may not be needed now, but you'll probably find a good use for it in the future.
- EDMA has a programmable burst size and ACNT/BCNT/CCNT, so you can handle the data movement optimization directly through hardware parameters instead of software data structure manipulation.
- EDMA has its own system priority setting, so you can vary its importance vs CPU activity and other peripherals.
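As a rough illustration of the ACNT/BCNT/CCNT point, the sketch below models the transfer shape for streaming words into a single fixed 32-bit register. The struct and function names are hypothetical and are not a TI driver API; only the count and index terminology follows the EDMA3 parameter set, and a real transfer would also need source/destination addresses, options, and a trigger.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of the transfer dimensions. */
typedef struct {
    uint16_t acnt;      /* bytes per array (1st dimension)         */
    uint16_t bcnt;      /* arrays per frame (2nd dimension)        */
    uint16_t ccnt;      /* frames per block (3rd dimension)        */
    int16_t  src_bidx;  /* source byte step between arrays         */
    int16_t  dst_bidx;  /* destination byte step between arrays    */
} XferShape;

/* Shape for sending 'word_count' 32-bit words to one fixed register:
 * the source walks through the buffer, the destination stays put. */
static XferShape shape_for_fixed_register(uint16_t word_count)
{
    XferShape s;

    assert(word_count > 0);
    s.acnt = 4;           /* one 32-bit word per array        */
    s.bcnt = word_count;  /* one array per word               */
    s.ccnt = 1;           /* a single frame                   */
    s.src_bidx = 4;       /* advance through the source data  */
    s.dst_bidx = 0;       /* keep writing the same register   */
    return s;
}

/* Total payload described by the shape, in bytes. */
static uint64_t shape_total_bytes(const XferShape *s)
{
    return (uint64_t)s->acnt * s->bcnt * s->ccnt;
}
```

The zero destination index is what replaces the CPU loop that repeatedly stores to the same address.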

-Tommy

0 MattB over 15 years ago in reply to tlee

Expert 2025 points

Thanks for your thoughts Tommy.

I shall give using DMA some consideration while I deal with some more pressing issues!

Matt
