TMS320C6748: EDMA3 / compiler optimization

Peter Hong

Part Number: TMS320C6748

I've run into an interesting situation. EDMA3 (QDMA in my case) does not work as expected when the code is optimized at level 3 with speed 5. I am using QDMA to read the same 16-bit port 16 times and transfer the readings to memory.

When code is compiled (with debug on) with optimization turned off, QDMA works as expected.

When code is compiled with optimization set as described above, QDMA does not work correctly the first time. Specifically, instead of signalling completion when all 16 values are transferred, the QDMA signals completion after the first value read/transfer.

What might be happening here?

The compiler version is 8.3.3

over 4 years ago

0 Yordan Kovachev over 4 years ago

TI__Guru**** 161600 points

Hi,

Can you share which Processor SDK RTOS version you are using?

Best Regards,
Yordan

0 lding over 4 years ago in reply to Yordan Kovachev

TI__Guru* 95265 points

Hi,

Are you able to provide a snapshot of the EDMA code, including how the ParamSet is configured? I want to decode the OPT field, and I also want to see the source and destination address alignment, ACNT, BCNT, etc. How do you trigger the QDMA?

Regards, Eric

0 Peter Hong over 4 years ago in reply to Yordan Kovachev

Intellectual 926 points

No RTOS is used.

0 Peter Hong over 4 years ago in reply to lding

Intellectual 926 points

//=============================================================================

// PaRAM Block

#define PaRAM_START 0x01c04000
#define PaRAM_SET_0 0x4000
#define PaRAM_SET_1 (PaRAM_SET_0 + 32)
#define PaRAM_SET_2 (PaRAM_SET_1 + 32)
#define RF_IN (SOC_GPIO_0_REGS + 0x70)
#define NO_LINK 0XFFFF

#define QCHMAP0 0x01c00200
#define QER 0x01c01080
#define QEER 0x01c01084
#define QEECR 0x01c01088
#define QEESR 0x01c0108c

#define IER 0x01c01050
#define IECR 0x01c01058
#define IESR 0x01c01060
#define IPR 0x01c01068
#define ICR 0x01c01070
#define QDMA_READY (HWREG(IPR) & 0x00000001)

struct __attribute__((__packed__)) T_PaRAM {
    // keep declaration order below!
    uint32 opt;
    uint32 src;
    uint16 acnt;
    uint16 bcnt;
    uint32 dst;
    int16 srcbidx;
    int16 dstbidx;
    uint16 link;
    uint16 bcntrld;
    int16 srccidx;
    int16 dstcidx;
    uint16 ccnt;
    uint16 rsvd;
};

int16 DATA_READINGS[16];

#pragma LOCATION(data_PaRAM, PaRAM_START)
struct __attribute__((__packed__)) T_PaRAM data_PaRAM; //Param Set 0

//=============================================================================

bool QDMA_Done() { // DMA transfer complete?
    return (QDMA_READY != 0);
}

//=============================================================================

void StartQDMA() {

    // clear interrupt status
    HWREG(ICR) = 0x00000001;
    // set interrupt enable
    HWREG(IESR) = 0x00000001;
    // trigger DMA
    data_PaRAM.opt = 0x0010000c; // TCINTEN=1, STATIC=1, AB synchronization
}

//=============================================================================

void InitQDMA() { // set up DMA

    // clear interrupt status
    HWREG(ICR) = 0x00000001;
    // enable QDMA Ch 0
    HWREG(QEESR) = 0x00000001;
    // initialize PaRAM Set 0
    data_PaRAM.src = RF_IN;
    data_PaRAM.acnt = 2;
    data_PaRAM.bcnt = 16;
    data_PaRAM.srcbidx = 0;
    data_PaRAM.dstbidx = 2;
    data_PaRAM.bcntrld = 0;
    data_PaRAM.srccidx = 0;
    data_PaRAM.dstcidx = 0;
    data_PaRAM.ccnt = 1;
    // adjust destination address per EDMA3
    data_PaRAM.dst = (uint32) DATA_READINGS + 0x11000000;
    data_PaRAM.link = NO_LINK;
    // set interrupt enable
    HWREG(IESR) = 0x00000001;
    // trigger word; must be the last assignment statement in the function
    data_PaRAM.opt = 0x0010000c; // TCINTEN=1, STATIC=1, AB synchronization
}
//=============================================================================

At initialization, InitQDMA() is called.

In the main loop, the following is executed:

if (QDMA_Done()) {

    // execute required stuff here

    ......

    // start next 16 sample acquisition via QDMA

    StartQDMA(); // trigger DMA action

    // do other stuff here
    rfout = AD4_GetRFOutAverage();
    ....
}
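For reference, here is a minimal host-side sketch of what the PaRAM settings above are expected to produce on one AB-synchronized trigger with CCNT = 1. It is only a software model for reasoning about the counts and indexes, not EDMA3 code. The names ModelParam, model_ab_sync_transfer and model_matches_expected are hypothetical and do not appear in the original post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Host-side model of the PaRAM fields used in the post (not the hardware layout). */
typedef struct {
    const uint8_t *src;  /* source address */
    uint8_t *dst;        /* destination address */
    uint16_t acnt;       /* bytes per array */
    uint16_t bcnt;       /* arrays per frame */
    int16_t srcbidx;     /* source byte offset between arrays */
    int16_t dstbidx;     /* destination byte offset between arrays */
} ModelParam;

/* Model one AB-synchronized trigger with CCNT = 1:
 * copy BCNT arrays of ACNT bytes, stepping by the B indexes. */
static void model_ab_sync_transfer(const ModelParam *p)
{
    assert(p != NULL);
    for (uint16_t b = 0; b < p->bcnt; b++) {
        memcpy(p->dst + (ptrdiff_t)b * p->dstbidx,
               p->src + (ptrdiff_t)b * p->srcbidx,
               p->acnt);
    }
}

/* Hypothetical helper: apply the settings from the post (ACNT=2, BCNT=16,
 * SRCBIDX=0, DSTBIDX=2) to a fake, constant port value.
 * Returns 1 if all 16 readings were filled, 0 otherwise. */
static int model_matches_expected(int16_t port_value)
{
    int16_t readings[16] = {0};
    ModelParam p = {
        .src = (const uint8_t *)&port_value,
        .dst = (uint8_t *)readings,
        .acnt = 2, .bcnt = 16, .srcbidx = 0, .dstbidx = 2
    };
    model_ab_sync_transfer(&p);
    for (int i = 0; i < 16; i++) {
        if (readings[i] != port_value) {
            return 0;
        }
    }
    return 1;
}
```

In this model every one of the 16 slots is written before the transfer is considered finished, so a buffer with only the first slot filled would correspond to a transfer that ran with a BCNT of 1 rather than 16.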

0 Peter Hong over 4 years ago in reply to Peter Hong

Intellectual 926 points

Here is what the buffer looks like when the problem occurs:

0x00810198 DATA_READINGS
0x00810198 10382 0 0 0 0 0 0
0x008101A6 0 0 0 0 0 0 0
0x008101B4 0 0

0 lding over 4 years ago in reply to Peter Hong

TI__Guru* 95265 points

Hi,

The QDMA setup looks right: A-B sync, with source and destination in incrementing address mode, a source index of 0 (essentially no increment), and a destination index of 2 bytes. Each trigger moves 16 x 2 bytes from a FIFO. Bit 20 enables the transfer completion interrupt, and the STATIC bit is set for QDMA.
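The decode above can be reproduced with a small host-side sketch. The bit positions follow the EDMA3 OPT field layout as I understand it (SAM bit 0, DAM bit 1, SYNCDIM bit 2, STATIC bit 3, TCC bits 17:12, TCINTEN bit 20, ITCINTEN bit 21). The type OptFields and the function decode_opt are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Subset of the OPT fields relevant to this thread. */
typedef struct {
    unsigned sam;        /* bit 0: source address mode (0 = increment) */
    unsigned dam;        /* bit 1: destination address mode (0 = increment) */
    unsigned syncdim;    /* bit 2: 0 = A-sync, 1 = AB-sync */
    unsigned is_static;  /* bit 3: STATIC */
    unsigned tcc;        /* bits 17:12: transfer completion code */
    unsigned tcinten;    /* bit 20: final transfer completion interrupt */
    unsigned itcinten;   /* bit 21: intermediate completion interrupt */
} OptFields;

/* Split a raw OPT word into the fields above. */
static OptFields decode_opt(uint32_t opt)
{
    OptFields f;
    f.sam       = (opt >> 0)  & 0x1u;
    f.dam       = (opt >> 1)  & 0x1u;
    f.syncdim   = (opt >> 2)  & 0x1u;
    f.is_static = (opt >> 3)  & 0x1u;
    f.tcc       = (opt >> 12) & 0x3Fu;
    f.tcinten   = (opt >> 20) & 0x1u;
    f.itcinten  = (opt >> 21) & 0x1u;
    return f;
}

/* Check the value from the posted code against the decode in this reply. */
static void check_posted_opt(void)
{
    OptFields f = decode_opt(0x0010000Cu);
    assert(f.syncdim == 1 && f.is_static == 1 && f.tcinten == 1);
    assert(f.sam == 0 && f.dam == 0 && f.tcc == 0 && f.itcinten == 0);
}
```

With TCC = 0, completion sets bit 0 of IPR, which matches the 0x00000001 mask used in QDMA_READY.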

How fast does new data arrive in the GPIO FIFO? Is it possible that the FIFO data has not yet arrived for reads 2 through 16, so that you get zeros with optimization? If you measure the time from StartQDMA() to QDMA_Done(), is the execution much faster in the optimized build?

What if you use A-sync only and enable the intermediate transfer completion interrupt (bit 21)? Do you see 16 interrupts in both the optimized and non-optimized cases?
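As a rough aid for that experiment, the sketch below counts how many triggers and completion events each synchronization mode implies. It assumes the usual EDMA3 rule that an A-synchronized transfer needs BCNT x CCNT triggers while an AB-synchronized one needs CCNT, and that intermediate completion is reported on every transfer request except the last. It ignores the effect of STATIC on PaRAM updates. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SYNC_A = 0, SYNC_AB = 1 } SyncDim;

/* Number of triggers needed to exhaust one PaRAM set. */
static uint32_t triggers_needed(SyncDim sync, uint16_t bcnt, uint16_t ccnt)
{
    return (sync == SYNC_A) ? (uint32_t)bcnt * ccnt : (uint32_t)ccnt;
}

/* Completion events seen for the whole transfer, given which of
 * TCINTEN (final) and ITCINTEN (intermediate) are enabled. */
static uint32_t completions_expected(SyncDim sync, uint16_t bcnt, uint16_t ccnt,
                                     int tcinten, int itcinten)
{
    uint32_t n = triggers_needed(sync, bcnt, ccnt);
    uint32_t total = 0;
    if (n == 0) {
        return 0;
    }
    if (itcinten) {
        total += n - 1;  /* every transfer request except the last */
    }
    if (tcinten) {
        total += 1;      /* the last transfer request */
    }
    return total;
}
```

Under these assumptions, ACNT = 2, BCNT = 16, CCNT = 1 in A-sync mode gives 16 completions only when both bit 20 and bit 21 are set; with bit 21 alone the model gives 15.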

Regards, Eric

0 Peter Hong over 4 years ago in reply to lding

Intellectual 926 points

1. The speed of execution should not matter, because QDMA_Done() checks the interrupt pending flag for DMA completion. It seems that the interrupt pending flag is somehow being set erroneously before InitQDMA() is called, and that QDMA_Done() is then also called before InitQDMA() is called.

2. I rearranged the code by inserting a small delay after initialization and before starting the QDMA. This seems to have fixed the bug.
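The thread does not confirm a root cause. One possibility that may be worth ruling out, since the symptom depends on the optimization level, is that data_PaRAM is not declared volatile. In C, plain (non-volatile) stores may be reordered or merged by an optimizing compiler, so the write to the trigger word could in principle be scheduled before the other PaRAM fields are written. Whether that happens in this build is an assumption, not something shown in the thread. The sketch below is a host-side illustration only: ParamSet mirrors the field order of T_PaRAM, fake_param stands in for the real PaRAM set, and write_param_trigger_last is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Same field order as T_PaRAM in the post; naturally aligned, so 32 bytes. */
typedef struct {
    uint32_t opt;
    uint32_t src;
    uint16_t acnt;
    uint16_t bcnt;
    uint32_t dst;
    int16_t  srcbidx;
    int16_t  dstbidx;
    uint16_t link;
    uint16_t bcntrld;
    int16_t  srccidx;
    int16_t  dstcidx;
    uint16_t ccnt;
    uint16_t rsvd;
} ParamSet;

/* Stand-in for the hardware PaRAM set (on the target this would be the
 * object placed at PaRAM_START). */
static ParamSet fake_param;

/* Every store goes through a volatile-qualified pointer, so the compiler
 * must perform the stores in program order and cannot drop or merge them.
 * The trigger word (opt) is therefore written last. */
static void write_param_trigger_last(volatile ParamSet *p,
                                     uint32_t src, uint32_t dst, uint32_t opt)
{
    assert(sizeof(ParamSet) == 32);
    p->src     = src;
    p->acnt    = 2;
    p->bcnt    = 16;
    p->srcbidx = 0;
    p->dstbidx = 2;
    p->bcntrld = 0;
    p->srccidx = 0;
    p->dstcidx = 0;
    p->ccnt    = 1;
    p->dst     = dst;
    p->link    = 0xFFFF;  /* NO_LINK */
    p->opt     = opt;     /* trigger word: must be the last store */
}
```

On the target, the equivalent change would be to add the volatile qualifier to the declaration of data_PaRAM, leaving the rest of InitQDMA() and StartQDMA() as posted.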
