A customer reports using the CPU profiler in CCS to measure the clock cycles for a write and a read to an EMIFA address in the code below. The write takes about 10 clock cycles, but the read takes about 150 clock cycles. Stepping through the disassembly, the instruction LDHU.D1T1 *+A3[0], A3 appears to account for about 140 of those cycles.
Two memory-mapped registers, used to write and read samples, sit in the EMIFA external address range and are defined as follows:
#define VD32_BASE_ADDRESS (0x66000100)
#define VD32_OUTPUT *(volatile int16u*) (VD32_BASE_ADDRESS + 0x08) // for read
#define VD32_SOFT_D *(volatile int16u*) (VD32_BASE_ADDRESS + 0x00) // for write
int16u ReadBackBits[VD32_INPUT_R12_LEN + VD32_INPUT_R34_LEN]; // placed either in local memory or in internal RAM
int16u *ReadBackBitsPtr; /* Point to read back bits buffer */
ReadBackBitsPtr = &ReadBackBits[0];
int16u SoftDecisionsR34[1] = { 0x4340};
We are looking for an explanation of why reads from the EMIFA address are so much slower than writes. The EMIFA controller is set up for a 16-bit data bus on chip select 5. The read takes the same number of cycles whether the destination of the data read from EMIFA is a local buffer or L3 internal RAM.
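For reference, the access pattern being profiled can be sketched roughly as below. This is not the customer's actual code, only a minimal illustration of one timed write to VD32_SOFT_D followed by one timed read from VD32_OUTPUT. The names VD32_ON_TARGET, cycle_fn, vd32_timing, vd32_time_access, vd32_fake_regs and vd32_host_counter are hypothetical, and int16u is assumed to be an unsigned 16-bit type. Without VD32_ON_TARGET defined, the two registers are replaced by a plain array so the sketch builds and runs on a host; the cycle numbers it produces there are meaningless.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t int16u;              /* assumption: unsigned 16-bit type */

#ifdef VD32_ON_TARGET                 /* hypothetical build flag for real hardware */
#define VD32_BASE_ADDRESS (0x66000100)
#define VD32_OUTPUT (*(volatile int16u *)(VD32_BASE_ADDRESS + 0x08)) /* for read  */
#define VD32_SOFT_D (*(volatile int16u *)(VD32_BASE_ADDRESS + 0x00)) /* for write */
#else
/* Host stand-in: an array replaces the EMIFA registers (byte offset / 2). */
static volatile int16u vd32_fake_regs[8];
#define VD32_OUTPUT (vd32_fake_regs[4])   /* byte offset 0x08 */
#define VD32_SOFT_D (vd32_fake_regs[0])   /* byte offset 0x00 */

/* Host stand-in counter: advances by one per call, not a real cycle count. */
static uint32_t vd32_host_counter(void)
{
    static uint32_t ticks;
    return ++ticks;
}
#endif

/* Hypothetical hook that returns a free-running cycle count. */
typedef uint32_t (*cycle_fn)(void);

typedef struct {
    uint32_t write_cycles;   /* counter delta across the register write */
    uint32_t read_cycles;    /* counter delta across the register read  */
    int16u   value;          /* sample read back from VD32_OUTPUT       */
} vd32_timing;

/* Time one write to VD32_SOFT_D and one read from VD32_OUTPUT into *dst. */
static vd32_timing vd32_time_access(cycle_fn now, int16u soft_decision, int16u *dst)
{
    vd32_timing t;
    uint32_t t0, t1, t2;

    assert(now != NULL && dst != NULL);

    t0 = now();
    VD32_SOFT_D = soft_decision;   /* the store that profiles at ~10 cycles      */
    t1 = now();
    *dst = VD32_OUTPUT;            /* the load (LDHU) that profiles at ~150 cycles */
    t2 = now();

    t.write_cycles = t1 - t0;      /* unsigned subtraction tolerates wrap-around */
    t.read_cycles  = t2 - t1;
    t.value        = *dst;
    return t;
}
```

On the target, the call would look something like vd32_time_access(counter, SoftDecisionsR34[0], ReadBackBitsPtr), where counter wraps whatever cycle counter the device provides. Because the counter calls themselves cost a few cycles, the deltas are only useful for comparing the write against the read, not as absolute figures.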