I believe I've found a bug in the optimizing compiler for the TMS320C6748. I'm posting it here for the TI development team.
I'll describe the problem here and attach the source file. I've highlighted the important parts in red.
This is just the loop kernel portion of the assembly code generated by the optimizing compiler. (I'm using a recent compiler version that was released as a beta a few weeks ago.)
Notice the two instructions highlighted in red. The first one loads a data item, and the second one stores a value (time-shifted data, an intermediate value in a digital filter) to that same location. The problem is that the store occurs TOO SOON. It executes on the first pass through this multi-pass pipeline, before the value has been calculated. I suspect the instruction should have a conditional qualifier on it, to prevent the store from occurring while the pipeline is filling.
Something like this: [!A2] STDW .D1T1 wn11:wn10,*-data_ptr(16)
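To illustrate the mechanism I'm describing, here is a toy Python model of a pipelined loop whose kernel also runs while the pipeline fills. It is only a sketch under assumptions: `run_kernel`, `depth`, and `GARBAGE` are made-up names, and the stage layout is hypothetical rather than taken from the kernel below. It only shows why a store that executes during the fill passes needs a countdown predicate playing the role of A2.

```python
GARBAGE = object()  # stands for a register that has not been written yet


def run_kernel(inputs, depth, guarded):
    """Run a toy pipelined loop and return the values it stored, in order.

    inputs:  input samples, one per loop iteration.
    depth:   kernel passes between the load and the store of one
             iteration (hypothetical, must be >= 1).
    guarded: if True, the store is predicated on a countdown,
             in the spirit of [!A2] in the suggested fix.
    """
    regs = [GARBAGE] * depth        # values in flight between load and store
    fill = depth                    # countdown predicate (role of A2)
    stored = []
    passes = len(inputs) + depth    # extra passes drain the pipeline
    for p in range(passes):
        # Late stage: store the oldest in-flight value.
        # With the guard, the store is suppressed until the countdown hits 0.
        if not guarded or fill == 0:
            stored.append(regs[-1])
        if fill > 0:
            fill -= 1               # like: [ A2] SUB A2,1,A2
        # Early stage: load and process a new sample, if any remain.
        new = inputs[p] * 2 if p < len(inputs) else GARBAGE
        regs = [new] + regs[:-1]
    return stored


good = run_kernel([1, 2, 3], depth=2, guarded=True)
bad = run_kernel([1, 2, 3], depth=2, guarded=False)
print(good)                                  # only computed values are stored
print(sum(v is GARBAGE for v in bad))        # count of premature stores
```

Without the guard, the first `depth` stores write values that were never computed, which is the behavior I believe I'm seeing.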
*******************************************************************************
Here is the compiler-generated code:
$C$L2:
; PIPED LOOP KERNEL
[ cnt] ADD .S1 0xffffffff,cnt,cnt ; |87| <0,48> Decrement loop counter
|| SUBDP .L1 wn01:wn00,wn21$3:wn20$3,Rtmp31$1:Rtmp30$1 ; |75| <2,32>
|| LDDW .D1T1 *data_ptr++(16),wn21:wn20 ; |60| <6,0> Delayed DP data
|| LDDW .D2T2 *coeff_ptr++(24),Rb21:Rb20 ; |64| <6,0> Filter coefficients
MPYDP .M2X Ra01:Ra00,Rtmp31$1:Rtmp30$1,Rtmp31:Rtmp30 ; |76| <1,41>
|| SUBDP .S1 Rtmp11:Rtmp10,Rtmp31$2:Rtmp30$2,wn01:wn00 ; |74| <3,25>
|| LDDW .D1T1 *-data_ptr(8),wn11:wn10 ; |61| <6,1> Delayed DP data
|| LDDW .D2T2 *-coeff_ptr(16),Rb11:Rb10 ; |65| <6,1> Filter coefficients
[ cnt] B .S2 $C$L2 ; |88| <0,50> Loop back
|| [!A2] STDW .D1T1 wn01:wn00,*-data_ptr(72) ; |80| <2,34>
|| ROTL .M1 wn21$2,0,wn21$3 ; |60| <3,26> Split a long life
|| ADDDP .L1X Rtmp21:Rtmp20,Rtmp31$3:Rtmp30$3,Rtmp31$2:Rtmp30$2 ; |73| <4,18>
DPSP .L2 Rtmp31:Rtmp30,Rout1 ; |83| <0,51>
|| ROTL .M1 wn20$2,0,wn20$3 ; |60| <3,27> Split a long life
|| MV .S1 wn20$1,wn20$2 ; |60| <4,19> Split a long life
|| MV .D1 wn21$1,wn21$2 ; |60| <4,19> Split a long life
[!A2] LDDW .D2T2 *-coeff_ptr(104),Ra01:Ra00 ; |66| <2,36> Filter coefficients
|| MV .L1 wn21,wn21$1 ; |60| <5,12> Split a long life
|| MV .S1 wn20,wn20$1 ; |60| <5,12> Split a long life
[ A2] SUB .S1 A2,1,A2 ; <0,53>
|| MVD .M1 wn20$2,wn20$2 ; |60| <4,21> Split a long life
|| MPYDP .M2X Rb21:Rb20,wn21:wn20,Rtmp31$3:Rtmp30$3 ; |71| <6,5>
STDW .D1T1 wn11:wn10,*-data_ptr(16) ; |79| <6,6>
|| MPYDP .M1X Rb11:Rb10,wn11:wn10,Rtmp21:Rtmp20 ; |72| <6,6>
[ A1] SUB .D1 A1,1,A1 ; <0,55>
|| [!A1] STW .D2T2 Rout1,*output_ptr++ ; |85| <0,55> Output the result of this filter section
; Branch back occurs here