C6472 target. Here's the top of the listing file:
;******************************************************************************
;* TMS320C6x C/C++ Codegen PC v7.4.8 *
;* Date/Time created: Wed Jul 16 16:16:33 2014 *
;******************************************************************************
.compiler_opts --abi=coffabi --c64p_l1d_workaround=default --endian=little --hll_source=on --l

;******************************************************************************
;* GLOBAL FILE PARAMETERS *
;* *
;* Architecture : TMS320C64x+ *
;* Optimization : Enabled at level 2 *
;* Optimizing for : Speed *
;* Based on options: -o2, no -ms *
;* Endian : Little *
;* Interrupt Thrshld : Disabled *
;* Data Access Model : Far Aggregate Data *
;* Pipelining : Enabled *
;* Speculate Loads : Enabled with threshold = 0 *
;* Memory Aliases : Presume are aliases (pessimistic) *
;* Debug Info : DWARF Debug *
;* *
;******************************************************************************
This is the loop that fails:
void runBadLoop (MB_RegInitBuf *pInitBuf, int inputEstPerLine)
{
    int i;

    pInitBuf->sumColOffset[0] = coeffTable[0].colOffset = 0;
    for (i = 1; i < inputEstPerLine; i++)
    {
        pInitBuf->sumColOffset[i] = pInitBuf->sumColOffset[i-1] + coeffTable[i].colOffset;
    }
}
/* +runBadLoop */
The details aren't important, but I can provide them. The point is that the body of the loop has the form a[i] = a[i-1] + b[i], and the optimizer fails to recognize that one of the source arrays is also the destination array. It ends up loading array elements before the earlier iterations have stored them, which makes the results useless. In effect, it treats pInitBuf->sumColOffset[i] and pInitBuf->sumColOffset[i-1] as belonging to disjoint arrays.
It seems to be the pointer dereference that throws the optimizer off. If I rewrite the loop without it, as a literal a[i] = a[i-1] + b[i] on plain arrays, then it's fine, and the optimizer does the right thing.
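To make the dependence concrete, here is a small host-side sketch (not the DSP code). The first function runs the recurrence in order. The second is only a model of the symptom: it walks the array backwards, so every a[i-1] is read before it is written, which has the same effect as treating source and destination as disjoint arrays. The function names and the test values are made up for illustration.

```c
#include <stdint.h>

/* Correct order: each iteration reads the value stored by the previous one. */
void prefix_sum_in_order(int16_t *a, const int16_t *b, int n)
{
    int i;
    for (i = 1; i < n; i++)
        a[i] = (int16_t)(a[i - 1] + b[i]);
}

/* Hypothetical model of the bad schedule: iterating from the top down means
 * a[i-1] is always loaded before any store has updated it, as if the loads
 * and stores went to two unrelated arrays. */
void prefix_sum_stale_loads(int16_t *a, const int16_t *b, int n)
{
    int i;
    for (i = n - 1; i >= 1; i--)
        a[i] = (int16_t)(a[i - 1] + b[i]);
}
```

With a = {0, 9, 9, 9} (9 standing in for uninitialized contents) and b = {0, 1, 2, 3}, the in-order version produces {0, 1, 3, 6}, while the stale-load model produces {0, 1, 11, 12}: only the first element past the seed is right.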
Here is one successful workaround for this issue:
void fixBadLoop (MB_RegInitBuf *pInitBuf, int inputEstPerLine)
{
    int i;
    Int16 lastValue;

    pInitBuf->sumColOffset[0] = coeffTable[0].colOffset = 0;
    lastValue = 0;
    for (i = 1; i < inputEstPerLine; i++)
    {
        lastValue = pInitBuf->sumColOffset[i] = lastValue + coeffTable[i].colOffset;
    }
}
/* +fixBadLoop */
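For anyone who wants to reproduce this, a rough test harness might look like the following. The type definitions are guesses made only so the two functions compile on their own: MAX_EST, the CoeffEntry name, and the struct layouts are assumptions, not our real declarations. countMismatches is a hypothetical helper that runs both versions on identically pre-filled buffers and counts the entries that differ.

```c
#include <stdint.h>
#include <string.h>

#define MAX_EST 16                 /* assumed capacity, not from our code */

typedef int16_t Int16;             /* assumed equivalent of the 16-bit Int16 */
typedef struct { Int16 sumColOffset[MAX_EST]; } MB_RegInitBuf; /* assumed layout */
typedef struct { Int16 colOffset; } CoeffEntry;                /* assumed layout */

CoeffEntry coeffTable[MAX_EST];

void runBadLoop(MB_RegInitBuf *pInitBuf, int inputEstPerLine)
{
    int i;
    pInitBuf->sumColOffset[0] = coeffTable[0].colOffset = 0;
    for (i = 1; i < inputEstPerLine; i++)
        pInitBuf->sumColOffset[i] = pInitBuf->sumColOffset[i-1] + coeffTable[i].colOffset;
}

void fixBadLoop(MB_RegInitBuf *pInitBuf, int inputEstPerLine)
{
    int i;
    Int16 lastValue = 0;
    pInitBuf->sumColOffset[0] = coeffTable[0].colOffset = 0;
    for (i = 1; i < inputEstPerLine; i++)
        lastValue = pInitBuf->sumColOffset[i] = lastValue + coeffTable[i].colOffset;
}

/* Hypothetical helper: number of entries where the two versions disagree. */
int countMismatches(int inputEstPerLine)
{
    MB_RegInitBuf bad, fixed;
    int i, mismatches = 0;

    /* Pre-fill with a recognizable pattern so stale loads show up. */
    memset(&bad, 0x55, sizeof bad);
    memset(&fixed, 0x55, sizeof fixed);

    runBadLoop(&bad, inputEstPerLine);
    fixBadLoop(&fixed, inputEstPerLine);

    for (i = 0; i < inputEstPerLine; i++)
        if (bad.sumColOffset[i] != fixed.sumColOffset[i])
            mismatches++;
    return mismatches;
}
```

A correctly compiled build should report zero mismatches for any inputEstPerLine up to MAX_EST. A non-zero count from a build made with the options in the listing header would point at the scheduling problem described above.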
I admit that the construct we use to cascade that array initialization is a little unusual, but not so unusual that the optimizer should be confused by it.
Our shop is tempted to stay away from 7.4.x and beyond until this bug is addressed.
Thanks,
BZ