C6713- Real Time FFT Computation Algorithm

BAS

Other Parts Discussed in Thread: SPRC121, CCSTUDIO

Hi,

I am trying to compute the FFT of a sine wave fed as input to the C6713 DSK, but I don't know which FFT program I should use. I also downloaded the C67x DSPLIB functions and found many FFT programs in the support folder, but I am not sure which one will fulfill my requirement.

Below is the part of the program that samples the input signal and stores it in a buffer of size 1024. Now I need to compute the FFT of these 1024 data points.

interrupt void c_int11()              //interrupt service routine
{
short sample = (short)input_sample(); //read the codec once per interrupt
output_sample(sample);                //loop the input back to the output
buffer[i] = sample;                   //store the same sample in buffer
i++;                                  //increment buffer index
if (i==BUFFER_SIZE) i = 0;            //reinit index if buffer full
return;                               //return from ISR
}

Can anyone please explain which FFT program from the DSPLIB folder could be used to perform the FFT? How can I use this buffer's values as input to the FFT function?

Please guys help me out.

BR,

over 15 years ago

0 clam over 15 years ago

TI__Expert 8785 points

I'm not a DSPLIB expert, but you can find more information about the various FFT functions from the appnote and userguide located here, http://focus.ti.com/docs/toolsw/folders/print/sprc121.html#Technical%20Documents. The documents will provide a brief description of each function, as well as the required parameters needed to make the function call.
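
As a rough sketch of what that usually means for the 1024-sample buffer above: the cfftr2_dit routine, whose header is posted further down in this thread, documents its input x as N complex floats stored as successive real/imaginary pairs. The 16-bit samples therefore have to be copied into a float array of twice the length, with the imaginary parts set to zero, before the call. The helper name `pack_real_to_complex` is made up for illustration and is not a DSPLIB function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of DSPLIB): copy n real 16-bit samples
   into an interleaved complex float array re0, im0, re1, im1, ...
   with every imaginary part set to zero.
   x must have room for 2*n floats. */
void pack_real_to_complex(const short *samples, float *x, size_t n)
{
    size_t k;

    assert(samples != NULL && x != NULL);
    for (k = 0; k < n; k++) {
        x[2 * k]     = (float)samples[k];   /* real part = sample value */
        x[2 * k + 1] = 0.0f;                /* imaginary part = 0       */
    }
}
```

On the DSK the destination array would also need the 8-byte alignment that the cfftr2_dit header asks for, and the ISR should not be overwriting the buffer while it is being copied.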

- Christina

0 BAS over 15 years ago in reply to clam

Expert 2370 points

Hi,

I have a collection of data points for a sine wave captured from an oscilloscope (that is, digitized samples), and I would like to see the frequency spectrum of that data. Can anyone please help me read the data from the PC using the C6713 so that I can compute the FFT?

Waiting for your help.

Thanks

0 BAS over 15 years ago in reply to BAS

Expert 2370 points

Hey TI guys, please help me.

0 Mukul Bhatnagar over 15 years ago in reply to BAS

TI__Guru* 82775 points

Hi
Do you have access to Matlab, Simulink tools?

http://www.mathworks.com/products/new_products/release14.html#TIC6000

Typically that is a good way to accomplish what you are trying to do.

Regards

Mukul

0 BAS over 15 years ago in reply to Mukul Bhatnagar

Expert 2370 points

Hi,

I don't have access to Simulink. There was no MATLAB CD with my DSP kit; instead there is a flyer from MathWorks giving the web address for a 30-day MATLAB trial, but I am still waiting for the download link that MathWorks is supposed to send after registration.

Please help me downloading Simulink tool and elaborate how can I use it for my task.

I have data consisting of 1 Million points, how can I load them in any register in C6713 to compute FFT ?

Please respond ASAP.

Thanks.

0 Mukul Bhatnagar over 15 years ago in reply to BAS

TI__Guru* 82775 points

BAS said:
Please help me downloading Simulink tool and elaborate how can I use it for my task.

That is a MathWorks offering. We cannot help with this (I see a trial software link). Please refer to the link provided in my previous post.

BAS said:
I have data consisting of 1 Million points, how can I load them in any register in C6713 to compute FFT ?

I don't completely understand this statement. I feel like some basic concepts are missing in what you are trying to accomplish. If this is a critical product/application you are working on, please try to get local TI support in your region to work on this. IMO there are several online labs/presentations/tutorials that show how the C6713 DSK can be used for signal processing. It seems to me that you would need to start with those. I am not aware of any canned documentation or collateral to provide to you for what you are trying to accomplish.
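
For what it is worth, one common reading of "a million points" is that the record sits in a memory array (not in registers) and is processed one FFT-sized frame at a time. The sketch below only illustrates that idea under stated assumptions: `FRAME_LEN`, `frame_handler`, `for_each_frame`, and `count_samples` are hypothetical names, and the handler is a placeholder for wherever the FFT call would go.

```c
#include <assert.h>
#include <stddef.h>

#define FRAME_LEN 1024   /* assumed FFT length, same as the 1024-sample buffer above */

/* Hypothetical per-frame callback; a real program would run the FFT here. */
typedef void (*frame_handler)(const short *frame, size_t len, void *ctx);

/* Walk a long record held in memory one frame at a time.
   Returns the number of full frames handled; a shorter tail is skipped. */
size_t for_each_frame(const short *data, size_t total, size_t len,
                      frame_handler fn, void *ctx)
{
    size_t frames = 0, start;

    assert(data != NULL && fn != NULL && len > 0);
    for (start = 0; start + len <= total; start += len) {
        fn(data + start, len, ctx);   /* hand one frame to the callback */
        frames++;
    }
    return frames;
}

/* Placeholder handler: just counts how many samples were visited. */
void count_samples(const short *frame, size_t len, void *ctx)
{
    (void)frame;                      /* sample values are not used here */
    *(size_t *)ctx += len;
}
```

With 1,000,000 samples and 1024-point frames this visits 976 full frames and leaves 576 samples over, which would have to be dropped or zero-padded.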

0 BAS over 15 years ago in reply to Mukul Bhatnagar

Expert 2370 points

Hi Guys,

I am computing the FFT using TI’s optimized radix-2 FFT function (cfftr2_dit), the function that generates the index for bit reversal (digitrev_index), and the function that performs the bit reversal (bitrev). When I build this program, I get the following errors.

undefined                first referenced
symbol                            in file
---------                        ----------------
_cfftr2_dit                      C:\CCStudio_v3.1\MyProjects\FFTr2\Debug\FFTr2.obj
_bitrev                            C:\CCStudio_v3.1\MyProjects\FFTr2\Debug\FFTr2.obj

error: symbol referencing errors - './Debug/FFTr2.out' not built

>> Compilation failure

I don't know what's wrong there.

One thing I noticed is that bitrev.sa and cfftr2_dit.sa appear as text files in Code Composer Studio. I copied their contents as-is into Notepad and saved them with the .sa extension. I don't know how to use .sa files. My main code is below. Please help me run this program.

#include "dsk6713_aic23.h"
Uint32 fs=DSK6713_AIC23_FREQ_8KHZ; //set sampling rate
#include <math.h>
#define N 256                        //number of FFT points
#define RADIX 2                        //radix or base
#define PI 3.14159265358979
#define DELTA ((2*PI)/N)              //argument for sine/cosine
short i = 0;
short iTwid[N/2];                    //index for twiddle constants W
short iData[N];                        //index for bitrev X
float Xmag[N];                        //magnitude spectrum of x
typedef struct Complex_tag {float re,im;}Complex;
Complex W[N/RADIX];                    //array for twiddle constants
Complex x[N];                         //N complex data values
#pragma DATA_ALIGN(W,sizeof(Complex))   //align W on boundary
#pragma DATA_ALIGN(x,sizeof(Complex))    //align input x on boundary

void main()
{
for( i = 0 ; i < N/RADIX ; i++ )
{
   W[i].re = cos(DELTA*i);            //real component of W
   W[i].im = sin(DELTA*i);            //neg imag component
}                                    //see cfftr2_dit
digitrev_index(iTwid, N/RADIX, RADIX);//produces index for bitrev() W
bitrev(W, iTwid, N/RADIX);               //bit reverse W

comm_poll();                        //init DSK,codec,McBSP
for(i=0; i<N; i++)
     Xmag[i] = 0;                    //init output magnitude
while (1)                             //infinite loop
{
output_sample(32000);                //negative spike for reference
for( i = 0 ; i < N ; i++ )
   {
    x[i].re = (float)((short)input_sample()); //external input
    x[i].im = 0.0 ;                    //zero imaginary part
    if(i>0)    output_sample((short)Xmag[i]);    //output magnitude
   }

cfftr2_dit(x, W, N ) ;                    //TI floating-pt complex FFT
digitrev_index(iData, N, RADIX);        //produces index for bitrev() X
bitrev(x, iData, N);                    //freq scrambled->bit-reverse x
for (i =0; i<N; i++)
    Xmag[i] = sqrt(x[i].re*x[i].re+x[i].im*x[i].im)/32; //magnitude of X
}
}

BR,

0 BAS over 15 years ago in reply to BAS

Expert 2370 points

The .sa files are attached here. I just saved them with the .sa extension and added them to the project in Code Composer Studio.

bitrev.sa
===============================================================================
*
*	TEXAS INSTRUMENTS, INC.
*
*	Copyright © Texas Instruments Incorporated 1998
*
*	TI retains all right, title and interest in this code and authorizes its
*	use solely and exclusively with digital signal processing devices
*	manufactured by or for TI.  This code is intended to provide an
*	understanding of the benefits of using TI digital signal processing devices.
*	It is provided "AS IS".  TI disclaims all warranties and representations,
*	including but not limited to, any warranty of merchantability or fitness
*	for a particular purpose.  This code may contain irregularities not found
*	in commercial software and is not intended to be used in production
*	applications.  You agree that prior to using or incorporating this code
*	into any commercial product you will thoroughly test that product and the
*	functionality of the code in that product and will be solely responsible
*	for any problems or failures.  
*
*	TI retains all rights not granted herein.
*
*
*	Linear Time, Small Lookup Table: Bit Reversal
*
*	Revision Date: 5/12/98
*
*	USAGE	This routine is C Callable and can be called as:
*
*		void bitrev(float *x, short *index, int n){
*		
*		x	=	Input Array to be Bit-Reversed
*		n	=	Number of points in array (must be a power of 2)
*		index	=	Array of ~sqrt(n) created by the routine
*				digitrev_index found below to allow the fast 
*				implementation of the bit-reversal
*
*		If routine is not to be used as a C callable function
*		then all instructions relating to stack should be removed.
*		Refer to comments of individual instructions.  You will also
*		need to initialize values for all of the values passed as these
*		are assumed to be in registers as defined by the calling 
*		convention of the compiler, (refer to the C compiler reference
*		guide).
*
*	C Code 	This is the C equivalent of the Assembly Code without 
*		restrictions.  Note that the assembly code is hand optimized 
*		and restrictions may apply
*
*	TI retains all rights, title and interest in this code and only
*	authorizes the use of this code on TI TMS320 DSPs manufactured by TI.
*
*	void bitrev(float *xs, short *index, int n){
*		int    i;
*		short  i0, i1, i2, i3;
*		short  j0, j1, j2, j3;
*		double xi0, xi1, xi2, xi3;
*		double xj0, xj1, xj2, xj3;
*		short  t;
*		int    a, b, ia, ib, ibs;
*		int    mask;
*		int    nbits, nbot, ntop, ndiff, n2, halfn;
*		double *x ;
*		x = (double *)xs ;
*		
*		nbits = 0;
*		i = n;
*		while (i > 1)
*		{
*		   i = i >> 1;
*		   nbits++;
*		}
*		
*		nbot    = nbits >> 1;
*		ndiff   = nbits & 1;
*		ntop    = nbot + ndiff;
*		n2      = 1 << ntop;
*		mask    = n2 - 1;
*		halfn   = n >> 1;
*		
*		for (i0 = 0; i0 < halfn; i0 += 2)
*		{
*		    b       = i0 & mask;
*		    a       = i0 >> nbot;
*		    if (!b) ia = index[a];
*		    ib      = index[b];
*		    ibs     = ib << nbot;
*		
*		    j0      = ibs + ia;
*		    t       = i0 < j0;
*		    xi0     = x[i0];
*		    xj0     = x[j0];
*		
*		    if (t)
*		    {
*		      x[i0] = xj0;
*		      x[j0] = xi0;
*		    }
*		
*		    i1      = i0 + 1;
*		    j1      = j0 + halfn;
*		    xi1     = x[i1];
*		    xj1     = x[j1];
*		    x[i1] = xj1;
*		    x[j1] = xi1;
*		
*		    i3      = i1 + halfn;
*		    j3      = j1 + 1;
*		    xi3     = x[i3];
*		    xj3     = x[j3];
*		    if (t)
*		    {
*		      x[i3] = xj3;
*		      x[j3] = xi3;
*		    }
*		  }
*	}
*	
*	DESCRIPTION
*		This routine performs the bit-reversal of the input array x[].
*		where x[] is an array of length n 32-bit complex pairs of data.
*		This requires the index array provided by the program below.
*		This index should be generated at compile time not by the DSP.
*
*	ASSUMPTIONS
*		n is a power of 2
*
*	NOTE: If n <= 4K one can use the char (8-bit) data type for
*		the "index" variable. This would require changing the LDH when
*		loading index values in the assembly routine to LDB. This would
*		further reduce the size of the Index Table by half its size.
*
*	CYCLES	(n/4)*11 + 9
*
*	MEMORY NOTE
*		There are NO memory bank hits regardless of array alignment
*
*******************************************************************************
* 	Use This Routine To Generate the Index Table for
* 	Bit/Digit Reversing of Radix-2 and Radix-4 Routines
*******************************************************************************
* 	This routine calculates the index for digitrev of length n 
* 	(length of index is 2^(radix*ceil(k/radix)) where n = 2^k
* 	in other words
* Either:sqrt(n) when n=2^even# Or: sqrt(2)*sqrt(n) when n=2^odd# [radix 2]
*	 sqrt(n) when n=4^even# Or: sqrt(4)*sqrt(n) when n=4^odd# [radix 4]
* Note: the variable "radix" is 2 for radix-2 and 4 for radix-4
*******************************************************************************
*	 
*	void digitrev_index(short *index, int n, int radix){
*	
*		int		i,j,k;
*		short	nbits, nbot, ntop, ndiff, n2, raddiv2; 
*	
*		nbits = 0;
*		i = n;	
*		while (i > 1){
*			i = i >> 1;
*			nbits++;
*		}
*	
*		raddiv2	= radix >> 1;
*		nbot	= nbits >> raddiv2;
*		nbot	= nbot << raddiv2 - 1;
*		ndiff	= nbits & raddiv2;
*		ntop	= nbot + ndiff;
*		n2		= 1 << ntop;
*	
*		index[0] = 0;
*		for ( i = 1, j = n2/radix + 1; i < n2 - 1; i++){
*			index[i] = j - 1;
*			for (k = n2/radix; k*(radix-1) < j; k /= radix)
*					j -= k*(radix-1);
*			j += k;
*		}
*		index[n2 - 1] = n2 - 1;
*	}
*
*	TECHNIQUES
*
*	1.	The following registers are shared to reduce register pressure
*			A8  (between) xi2, xi4, n2, nbits
*			A9  (between) xj2, xj4, tmp, ndiff
*			A10 (between) xi0, cnst
*			A11 (between) xj0, ntop
*			B4  (between) index, ptr_i1
*			B6  (between) xi1, xi3
*			B7  (between) xj1, xj3
*	2.	The first set of indices ia and ib are loaded before
*			loop kernel
*	3.	Three load pointers are used to decouple 3 load/stores 
*	4.	Memory bank hits are eliminated by loading odd/even 
*		index pair in one cycle
*
*******************************************************************************

	.global	_bitrev
	.text

_bitrev:

	STW	.D2T2	B14,*B15--(48)
	STW	.D2T2	B3,*+B15(4)
	STW	.D2T1	A10,*+B15(8)
	STW	.D2T1	A11,*+B15(12)
	STW	.D2T1	A12,*+B15(16)
	STW	.D2T1	A13,*+B15(20)
	STW	.D2T1	A14,*+B15(24)
	STW	.D2T1	A15,*+B15(28)
	STW	.D2T2	B10,*+B15(32)
	STW	.D2T2	B11,*+B15(36)
	STW	.D2T2	B12,*+B15(40)
	STW	.D2T2	B13,*+B15(44)

; Begin Benchmark Timing

	LDH	.D2T1	*B4, A13		; ib = *index
||	SHR	.S2X	A6, 2, B1		; icntr = n >> 2
||	MVK	.S1	31, A10			; cnst = 31
||	LMBD	.L1	1, A6, A9		; tmp = lmbd(1, n)
||	ZERO	.L2	B13			; i0 = 0

	SHR	.S2X	A6, 1, B5		; halfn = n >> 1
||	SHL	.S1	A6, 2, A12		; halfnbytes = n << 2
||	LDH	.D2	*B4, B3			; ia = *index
||	SUB	.L1	A10, A9, A8		; nbits = cnst - tmp

	SHR	.S1	A8, 1, A14		; nbot1 = nbits >> 1
||	AND	.L1	A8, 1, A9		; ndiff = nbits & 1

	ADD	.L1	A14, A9, A11		; ntop = nbot1 + ndiff
||	MVK	.S1	1, A9			; tmp = 1

	SHL	.S1	A9, A11, A8		; n2 = tmp << ntop
||	ADD	.S2	B13,2,B2		; aa = i0 + 2
||	ADD	.L1X	B13,2,A1		; bb = i0 + 2
||	MV	.L2X	A14, B14		; nbot = nbot1

	SHL	.S1	A13, A14, A13		; ib = ib << nbot1
||	SUB	.D1	A8, 1, A15		; mask = n2 - 1

	ADD	.L1X	A13, B3, A0		; j0 = ib + ia
||	SHR	.S1	A6, 1, A6		; n = n >> 1

	SHL	.S1	A0,3,A3			; ptr_y0 = j0 << 3
||	MV	.L1X	B4, A7			; ptr_i0 = index

	MV	.L2X	A4,B11			; ptr_y1 = x
||	ADD	.L1	A4,A3,A3		; ptr_y0 = x + ptr_y0
||	AND	.S1	A1, A15, A1		; bb = bb & mask
||	SHR	.S2	B2, B14, B2		; aa = aa >> nbot


LOOP:
	LDDW	.D2T1	*++B11[1], A9:A8	; xj2:xi2 = *++ptr_y1[1]
||	LDDW	.D1T2	*++A3[A6], B7:B6	; xj3:xi3 = *++ptr_y0[n]
||	ADD	.S2	B11,8,B12		; ptr_z1 = ptr_y1 + 8
||	ADD	.L1	A3,A12,A5		; ptr_z0 = ptr_y0 + halfnbytes
||	CMPLT	.L2X	B13, A0, B0		; if (i0 < j0) {t=1} else {t=0}

	LDH	.D1	*+A7[A1], A13		; ib = *+ptr_i0[bb]
||	MV	.L1	A4, A2			; ptr_x0 = x 
||	MV	.L2X	A4, B10			; ptr_x1 = x

  [B0]	LDDW	.D2T1	*++B10[B13], A11:A10	; if (t) xj0:xi0 = *++ptr_x1[i0]
||[B0]	LDDW	.D1T2	*++A5[1], B9:B8		; if (t) xj5:xi5 = *++ptr_z0[1]
||	ADD	.L2	B13, 2, B13		; i0 = i0 + 2

  [!A1]	LDH	.D2	*+B4[B2], B3		; if (!bb) ia = *+ptr_i1[aa]

  [B0]	LDDW	.D1T2	*++A2[A0], B7:B6	; if (t) xj1:xi1 = *++ptr_x0[j0]
||[B0] 	LDDW	.D2T1	*++B12[B5], A9:A8	; if (t)xj4:xi4=*++ptr_z1[halfn]
||[B1]	SUB	.S2	B1, 1, B1		; if (icntr) icntr -=1
||	ADD	.L2	B13,2,B2		; aa = i0 + 2
||	ADD	.L1X	B13,2,A1		; bb = i0 + 2

	STW	.D1T1	A8, *A3++[1]		; *ptr_y0++[1] = xi2
||	STW	.D2T2	B7, *++B11[1]		; *++ptr_y1[1] = xj3
||[B1]	B	.S2	LOOP

	STW	.D1T1	A9, *A3--[1]		; *ptr_y0--[1] = xj2
||	STW	.D2T2	B6, *--B11[1]		; *--ptr_y1[1] = xi3
||	SHL	.S1	A13, A14, A13		; ib = ib << nbot1 

  [B0]	STW	.D1T1	A10, *A2++[1]		; if (t) *ptr_x0++[1] = xi0
||[B0]	STW	.D2T2	B9, *++B12[1]		; if (t) *++ptr_z1[1] = xj5
||	AND	.S1	A1, A15, A1		; bb = bb & mask
||	SHR	.S2	B2, B14, B2		; aa = aa >> nbot

  [B0]	STW	.D1T1	A11, *A2--[1]		; if (t) *ptr_x0--[1] = xj0
||[B0]	STW	.D2T2	B8, *--B12[1]		; if (t) *--ptr_z1[1] = xi5
||	ADD	.L1X	A13, B3, A0		; j0 = ib + ia

  [B0]	STW	.D2T2	B7, *++B10[1]		; if (t) *++ptr_x1[1] = xj1
||[B0]	STW	.D1T1	A8, *A5++[1]		; if (t) *ptr_z0++[1] = xi4
||	SHL	.S1	A0,3,A3			; ptr_y0 = j0 << 3
||	SHL	.S2	B13,3,B11		; ptr_y1 = i0 << 3

  [B0]	STW	.D2T2	B6, *--B10[1]		; if (t) *--ptr_x1[1] = xi1
||[B0]	STW	.D1T1	A9, *A5--[1]		; if (t) *ptr_z0--[1] = xj4
||	ADD	.L2X	A4,B11,B11		; ptr_y1 = x + ptr_y1
||	ADD	.L1	A4,A3,A3		; ptr_y0 = x + ptr_y0

;---------------------------------------------------------------------------
; End Benchmark Timing

	LDW	.D2	*+B15(4),B3		; pop return address
	LDDW	.D2T1	*+B15(8),A11:A10
	LDDW	.D2T1	*+B15(16),A13:A12
	LDDW	.D2T1	*+B15(24),A15:A14
	LDDW	.D2T2	*+B15(32),B11:B10
	LDDW	.D2T2	*+B15(40),B13:B12
||	B	.S2	B3			; Return to calling function

	LDW	.D2T2	*++B15(48),B14

	NOP		4
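
For reference, the two C listings in the header above can be compiled with small changes. The sketch below is an adaptation, not TI's file: the `_c` suffixes are added so the names cannot clash with the assembly symbols, the precedence in `nbot << raddiv2 - 1` is made explicit with parentheses, and elements are swapped as `Complex` structs instead of going through the `double *` cast used in the header.

```c
#include <assert.h>

typedef struct { float re, im; } Complex;   /* same layout as in the main program */

/* Index table generator, following the C listing in the bitrev.sa header. */
void digitrev_index_c(short *index, int n, int radix)
{
    int i, j, k;
    short nbits = 0, nbot, ntop, ndiff, n2, raddiv2;

    for (i = n; i > 1; i >>= 1) nbits++;        /* nbits = log2(n) */
    raddiv2 = radix >> 1;
    nbot    = nbits >> raddiv2;
    nbot    = nbot << (raddiv2 - 1);
    ndiff   = nbits & raddiv2;
    ntop    = nbot + ndiff;
    n2      = 1 << ntop;                        /* number of table entries */

    index[0] = 0;
    for (i = 1, j = n2 / radix + 1; i < n2 - 1; i++) {
        index[i] = j - 1;
        for (k = n2 / radix; k * (radix - 1) < j; k /= radix)
            j -= k * (radix - 1);
        j += k;
    }
    index[n2 - 1] = n2 - 1;
}

/* In-place bit reversal of n complex points, following the header's C listing. */
void bitrev_c(Complex *x, const short *index, int n)
{
    int i, i0, i1, i3, j0, j1, j3, a, b, ia = 0, ib, t;
    int nbits = 0, nbot, ntop, mask, halfn;
    Complex tmp;

    assert(n >= 2 && (n & (n - 1)) == 0);       /* n must be a power of 2 */
    for (i = n; i > 1; i >>= 1) nbits++;
    nbot  = nbits >> 1;
    ntop  = nbot + (nbits & 1);
    mask  = (1 << ntop) - 1;
    halfn = n >> 1;

    for (i0 = 0; i0 < halfn; i0 += 2) {
        b = i0 & mask;
        a = i0 >> nbot;
        if (!b) ia = index[a];                  /* refreshed once per block */
        ib = index[b];
        j0 = (ib << nbot) + ia;                 /* j0 = bit-reversed i0 */
        t  = i0 < j0;
        if (t) { tmp = x[i0]; x[i0] = x[j0]; x[j0] = tmp; }
        i1 = i0 + 1;
        j1 = j0 + halfn;
        tmp = x[i1]; x[i1] = x[j1]; x[j1] = tmp;
        i3 = i1 + halfn;
        j3 = j1 + 1;
        if (t) { tmp = x[i3]; x[i3] = x[j3]; x[j3] = tmp; }
    }
}
```

The index table only needs about sqrt(n) entries, as the header notes (8 entries for n = 32, 16 for n = 256), so an index array of N shorts is more than large enough.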

cfftr2_dit.sa
===============================================================================
*
* From: https://www-a.ti.com/apps/c6000/xt_download.asp?sku=C67x_cfftr2
*
*	TEXAS INSTRUMENTS, INC.
*
*	Copyright © Texas Instruments Incorporated 1998
*
*	TI retains all right, title and interest in this code and authorizes its
*	use solely and exclusively with digital signal processing devices
*	manufactured by or for TI.  This code is intended to provide an
*	understanding of the benefits of using TI digital signal processing devices.
*	It is provided "AS IS".  TI disclaims all warranties and representations,
*	including but not limited to, any warranty of merchantability or fitness
*	for a particular purpose.  This code may contain irregularities not found
*	in commercial software and is not intended to be used in production
*	applications.  You agree that prior to using or incorporating this code
*	into any commercial product you will thoroughly test that product and the
*	functionality of the code in that product and will be solely responsible
*	for any problems or failures.  
*
*	TI retains all rights not granted herein.
*	
*
*	RADIX-2 FFT (DIT)
*
*	Revision Date: 5/21/98
*	
*	USAGE	
*
*		This routine is C Callable and can be called as:
*		
*		void cfftr2_dit( float *x, const float *w, short N)
*
*		x	Pointer to Array of Dimension 2*N elements holding 
*			Input to and Outputs from function cfftr2_dit()
*		w	Pointer to an array holding the coefficient (Dimension
*			n/2 complex numbers)
*		N	Number of complex points in x
*
*		If routine is not to be used as a C callable function then
*		you need to initialize values for all of the values passed
*		as these are assumed to be in registers as defined by the 
*		calling convention of the compiler, (refer to the C compiler
*		reference guide).
*
*		ARGUMENTS PASSED   	->   REGISTER
*		---------------------------------
*		x                	->   A4
*		w                	->   B4
*		N                	->   A6
*
*	C CODE
*
*		This is the C equivalent of the assembly code.  Note that
*		the assembly code is hand optimized and restrictions may
*		apply.
*
*		void cfftr2_dit(float* x, float* w, short n)
*		{
*		   short n2, ie, ia, i, j, k, m;
*		   float rtemp, itemp, c, s;
*		
*		   n2 = n;
*		   ie = 1;
*		
*		   for(k=n; k > 1; k >>= 1)
*		   {
*		      n2 >>= 1;
*		      ia = 0;
*		      for(j=0; j < ie; j++)
*		      {
*		         c = w[2*j];
*		         s = w[2*j+1];
*		         for(i=0; i < n2; i++)
*		         {
*		            m = ia + n2;
*		            rtemp     = c * x[2*m]   + s * x[2*m+1];
*		            itemp     = c * x[2*m+1] - s * x[2*m];
*		            x[2*m]    = x[2*ia]   - rtemp;
*		            x[2*m+1]  = x[2*ia+1] - itemp;
*		            x[2*ia]   = x[2*ia]   + rtemp;
*		            x[2*ia+1] = x[2*ia+1] + itemp;
*		            ia++;
*		         }
*		         ia += n2;
*		      }
*		      ie <<= 1;
*		   }
*		}
*
*
*	DESCRIPTION
*
*		This routine performs the Decimation-in-Time (DIT) Radix-2 FFT 
*		of the input array x.
*		x has N complex floating point numbers arranged as successive
*		real and imaginary number pairs. Input array x contains N
*		complex points (N*2 elements). The coefficients for the
*		FFT are passed to the function in array w which contains
*		N/2 complex numbers (N elements) as successive real and
*		imaginary number pairs.
*		The FFT Coefficients w are in N/2 bit-reversed order
*		The elements of input array x are in normal order
*		The assembly routine performs 4 output samples (2 real and 2
*		imaginary) for a pass through inner loop.
*
*		Note that (bit-reversed) coefficients for higher order FFT (1024
*		point) can be used unchanged as coefficients for a lower order 
*		FFT (512, 256, 128 ... ,2)
*
*		The routine can be used to implement Inverse-FFT by any ONE of 
*		the following methods:
*
*		1.Inputs (x) are replaced by their Complex-conjugate values 
*		  Output values are divided by N
*		2.FFT Coefficients (w) are replaced by their Complex-conjugates
*		  Output values are divided by N
*		3.Swap Real and Imaginary values of input
*		4.Swap Real and Imaginary values of output
*		
*	TECHNIQUES
*
*		1. The inner two loops are combined into one inner loop whose 
*		   loop count is N/2. 
*		2. The first 4 cycles of inner loop prolog are scheduled in 
*		   parallel with the outer loop.
*		3. Load counter is not used, so extraneous loads are performed
*		4. Variables n and c share the register A6 and variables w and 
*		   nsave share the register B4.
*	
*	ASSUMPTIONS
*
*		N is an integral power of 2 (4, 8, 16, 32 ...) and the FFT
*		dimension (N) must be at least 4.
*		The FFT Coefficients w are in bit-reversed order
*		The elements of input array x are in normal order
*		The imaginary coefficients of w are negated as {cos(d*0), 
*		sin(d*0), cos(d*1), sin(d*1) ...} as opposed to the normal 
*		sequence of {cos(d*0), -sin(d*0), cos(d*1), -sin(d*1) ...} 
*		where d = 2*PI/N.
*		
*	MEMORY NOTE
*
*		Arrays x (data) and w (coefficients) must reside in 
*		different memory banks to avoid memory conflicts.  If Data and 
*		Coefficients do reside in the same memory bank, add (N/2) + log2(N) + 1
*		cycles to the cycles equation below. The memory bank hits are due to
*		scheduling of assembly code and also due to extraneous loads
*		causing bank hits.
*
*		Data and Coefficients must be aligned on an 8 byte boundary.
*
*	CYCLES
*
*		((2*N) + 23)*log2(N) + 6
*
*		For N=1024, Cycles = 20716
*
*	NOTATIONS
*
*		f = Function Prolog or Epilog
*		o = Outer Loop
*		p = Inner Loop Prolog
*
*===============================================================================
	.global	_cfftr2_dit
	.text

_cfftr2_dit:

	STW	.D2T2	B14,*B15--(48)
	STW	.D2T2	B3,*+B15(4)
	STW	.D2T1	A10,*+B15(8)
	STW	.D2T1	A11,*+B15(12)
	STW	.D2T1	A12,*+B15(16)
	STW	.D2T1	A13,*+B15(20)
	STW	.D2T1	A14,*+B15(24)
	STW	.D2T1	A15,*+B15(28)
	STW	.D2T2	B10,*+B15(32)
	STW	.D2T2	B11,*+B15(36)
	STW	.D2T2	B12,*+B15(40)
	STW	.D2T2	B13,*+B15(44)

* Begin Benchmark Timing

	ADDAW	.D1	A4,A6,A3	; f xx2 = x + n*4
||	MV	.L2	B4,B12		; f wsave = w
||	SHRU	.S2X	A6,1,B4		; f nsave = n>>1
||	MV	.L1X	B4,A5		; f w1 = w
||	MVK	.S1	1,A14		; f onea = 1

	MV	.S2X	A3,B8		; f xx1 = xx2
||	MV	.L2	B4,B0		; f i = nsave
||	SHL	.S1	A6,2,A10	; f k1 = n<<2

	LDDW	.D2	*B8++,B7:B6	; p @ t2_1:t2_0 = *xx1++
||	LDDW	.D1	*A5++,A7:A6	; p @ s:c = *w1
||	MPY	.M2	B0,1,B13	; f ireset = i*1
||	ADD	.L2X	A6,1,B9		; f bk = n+1
||	MV	.S2	B8,B5		; f xx3 = xx1

	MV	.L1	A4,A11		; f xx4 = x
||	SHL	.S2X	A6,2,B1		; f k = n * 4

	MV	.L1X	B9,A0		; f bk1 = bk
||[B0]	SUB	.L2	B0,1,B0		; p if (i) i = i-1
||	MV	.S1	A4,A3		; f xx2 = x
||	MVK	.S2	1,B14		; f oneb = 1

  [!B0]	ADD	.L2	B8,B1,B8	; p if (!i) xx1 = xx1 + k
||	MV	.S2	B13,B2		; f t3_ctr = ireset
||	MV	.L1X	B13,A1		; f st_ctr = ireset

oloop:

	LDDW	.D2	*B8++,B7:B6	; p @@ t2_1:t2_0 = *xx1++
||[!B0]	LDDW	.D1	*A5++,A7:A6	; p @@ if (!i) s:c = *w1
||[!B0]	MPY	.M2	B14,B13,B0	; p if (!i) i = ireset*1

	MPYSP	.M1X	A6,B6,A13	; p rtemp2 = c*t2_0
||	MPYSP	.M2X	A6,B7,B11	; p itemp2 = c*t2_1

  [B0]	SUB	.S2	B0,1,B0		; p if (i) i = i-1
||	SUB	.L1X	B4,1,A2		; f l = nsave - 1

	MPYSP	.M1X	A7,B7,A15	; p rtemp3 = s*t2_1
||	MPYSP	.M2X	A7,B6,B3	; p itemp3 = s*t2_0
||[!B0]	ADD	.L2	B8,B1,B8	; p if (!i) xx1 = xx1 + k

	LDDW	.D2	*B8++,B7:B6	; p @@@ t2_1:t2_0 = *xx1++
||[!B0]	LDDW	.D1	*A5++,A7:A6	; p @@@ if (!i) s:c = *w1
||[!B0]	MPY	.M2	B14,B13,B0	; p if (!i) i = ireset*1

	MPYSP	.M1X	A6,B6,A13	; p rtemp2 = c*t2_0
||	MPYSP	.M2X	A6,B7,B11	; p itemp2 = c*t2_1

  [B0]	SUB	.S2	B0,1,B0		; p if (i) i = i-1

	LDDW	.D1	*A3++,A9:A8	; p @ t3_1:t3_0 = *xx2++
||	MPYSP	.M1X	A7,B7,A15	; p rtemp3 = s*t2_1
||	MPYSP	.M2X	A7,B6,B3	; p itemp3 = s*t2_0
||[!B0]	ADD	.D2	B8,B1,B8	; p if (!i) xx1 = xx1 + k
||	ADDSP	.L1	A13,A15,A12	; p rtemp1 = rtemp2 + rtemp3
||	SUBSP	.L2	B11,B3,B10	; p itemp1 = itemp2 - itemp3

	LDDW	.D2	*B8++,B7:B6	; p @@@@ t2_1:t2_0 = *xx1++
||[!B0]	LDDW	.D1	*A5++,A7:A6	; p @@@@ if (!i) s:c = *w1
||[!B0]	MPY	.M2	B14,B13,B0	; p if (!i) i = ireset*1
||[B2]	SUB	.S2	B2,1,B2		; p if (t3_ctr) t3_ctr -= 1

	MPYSP	.M1X	A6,B6,A13	; p rtemp2 = c*t2_0
||	MPYSP	.M2X	A6,B7,B11	; p itemp2 = c*t2_1
||[!B2]	ADD	.S1	A3,A10,A3	; p if (!t3_ctr) xx2 = xx2 + k1

  [B0]	SUB	.S2	B0,1,B0		; p if (i) i = i-1
||[!B2]	MPY	.M2	B14,B13,B2	; p if (!t3_ctr) t3_ctr = ireset*1

	LDDW	.D1	*A3++,A9:A8	; p @@ t3_1:t3_0 = *xx2++
||	MPYSP	.M1X	A7,B7,A15	; p rtemp3 = s*t2_1
||	MPYSP	.M2X	A7,B6,B3	; p itemp3 = s*t2_0
||[!B0]	ADD	.D2	B8,B1,B8	; p if (!i) xx1 = xx1 + k
||	ADDSP	.L1	A13,A15,A12	; p rtemp1 = rtemp2 + rtemp3
||	SUBSP	.L2	B11,B3,B10	; p itemp1 = itemp2 - itemp3

	LDDW	.D2	*B8++,B7:B6	; p @@@@@ t2_1:t2_0 = *xx1++
||[!B0]	LDDW	.D1	*A5++,A7:A6	; p @@@@@ if (!i) s:c = *w1
||[!B0]	MPY	.M2	B14,B13,B0	; p if (!i) i = ireset*1
||[B2]	SUB	.S2	B2,1,B2		; p if (t3_ctr) t3_ctr -= 1
||	SUBSP	.L1	A8,A12,A15	; p rtemp3 = t3_0 - rtemp1
||	SUBSP	.L2X	A9,B10,B3	; p itemp3 = t3_1 - itemp1

	MPYSP	.M1X	A6,B6,A13	; p rtemp2 = c*t2_0
||	MPYSP	.M2X	A6,B7,B11	; p itemp2 = c*t2_1
||[!B2]	ADD	.S1	A3,A10,A3	; p if (!t3_ctr) xx2 = xx2 + k1
||	B	.S2	iloop

  [B0]	SUB	.S2	B0,1,B0		; p if (i) i = i-1
||[!B2]	MPY	.M2	B14,B13,B2	; p if (!t3_ctr) t3_ctr =ireset*1
||	ADDSP	.L1	A8,A12,A15	; p rtemp3 = t3_0 + rtemp1
||	ADDSP	.L2X	A9,B10,B3	; p itemp3 = t3_1 + itemp1

; Kernel Loop Begins
 
iloop:

	LDDW	.D1	*A3++,A9:A8	; @@@ t3_1:t3_0 = *xx2++
||	MPYSP	.M1X	A7,B7,A15	; rtemp3 = s*t2_1
||	MPYSP	.M2X	A7,B6,B3	; itemp3 = s*t2_0
||[!B0]	ADD	.D2	B8,B1,B8	; if (!i) xx1 = xx1 + k
||	ADDSP	.L1	A13,A15,A12	; rtemp1 = rtemp2 + rtemp3
||	SUBSP	.L2	B11,B3,B10	; itemp1 = itemp2 - itemp3
||[!A1]	ADD	.S2	B5,B1,B5	; if (!st_ctr) xx3 = xx3 + k
||[!A1]	ADD	.S1	A11,A10,A11	; if (!st_ctr) xx4 = xx4 + k1

	LDDW	.D2	*B8++,B7:B6	; @@@@@@ t2_1:t2_0 = *xx1++
||[!B0]	LDDW	.D1	*A5++,A7:A6	; @@@@@@ if (!i) s:c = *w1
||[!B0]	MPY	.M2	B14,B13,B0	; if (!i) i = ireset*1
||[B2]	SUB	.S2	B2,1,B2		; if (t3_ctr) t3_ctr -= 1
||	SUBSP	.L1	A8,A12,A15	; rtemp3 = t3_0 - rtemp1
||	SUBSP	.L2X	A9,B10,B3	; itemp3 = t3_1 - itemp1
||[A2]	SUB	.S1	A2,1,A2		; if (l) l = l-1
||[!A1]	MPY	.M1X	A14,B13,A1	; if (!st_ctr) st_ctr = ireset*1

	MPYSP	.M1X	A6,B6,A13	; rtemp2 = c*t2_0
||	MPYSP	.M2X	A6,B7,B11	; itemp2 = c*t2_1
||[!B2]	ADD	.S1	A3,A10,A3	; if (!t3_ctr) xx2 = xx2 + k1
||[A2]	B	.S2	iloop		; Branch iloop
||	STW	.D2T1	A15,*B5++[2]	; *xx3++[2] = rtemp3
||	STW	.D1T2	B3,*+A11[A0]	; *+xx4[bk1] = itemp3

  [B0]	SUB	.S2	B0,1,B0		; if (i) i = i-1
||[!B2]	MPY	.M2	B14,B13,B2	; if (!t3_ctr) t3_ctr = ireset*1
||	ADDSP	.L1	A8,A12,A15	; rtemp3 = t3_0 + rtemp1
||	ADDSP	.L2X	A9,B10,B3	; itemp3 = t3_1 + itemp1
||	STW	.D1	A15,*A11++[2]	; *xx4++[2] = rtemp3
||	STW	.D2	B3,*-B5[B9]	; *-xx3[bk] = itemp3
||[A1]	SUB	.S1	A1,1,A1		; if (st_ctr) st_ctr=st_ctr-1

; Kernel Loop Ends

	MV	.S2	B13,B1		; o k = ireset
||	MV	.S1X	B13,A2		; o l = ireset		

	SUB	.S1X	B1,1,A2		; o l = k - 1
||	SHRU	.S2	B13,1,B0	; o i = ireset>>1
||	ADDAW	.D1	A4,A2,A3	; o xx2 = x + 4*l

  [A2]	B	.S1	oloop
||[!A2]	LDW	.D2	*+B15(4),B3	; o if (!l) pop B3
||	MV	.S2X	A3,B8		; o xx1 = xx2

	MV	.L1X	B12,A5		; o w1 = wsave
||[!A2]	LDDW	.D2T1	*+B15(8),A11:A10; o if (!l) pop A11:A10

  [A2]	LDDW	.D2	*B8++,B7:B6	; p @ if (l) t2_1:t2_0 = *xx1++
||[A2]	LDDW	.D1	*A5++,A7:A6	; p @ if (l) s:c = *w1
||	SHL	.S1X	B1,2,A10	; f k1 = k<<2
||	MPY	.M2	B0,1,B13	; f ireset = i*1
||	ADD	.L2	B1,1,B9		; f bk = k + 1

	MV	.L1	A4,A11		; f xx4 = x
||	SHL	.S2	B1,2,B1		; f k = k<<2
||	MV	.L2X	A3,B5		; f xx3 = xx2
||[!A2]	LDDW	.D2T1	*+B15(16),A13:A12; o if (!l) pop A13:A12

	MV	.L1X	B9,A0		; f bk1 = bk
||[B0]	SUB	.L2	B0,1,B0		; p if (i) i = i - 1
||	MV	.S1	A4,A3		; f xx2 = x
||[!A2]	LDDW	.D2T1	*+B15(24),A15:A14; o if (!l) pop A15:A14

  [!B0]	ADD	.L2	B8,B1,B8	; p if (!i) xx1 = xx1 + k
||	MV	.S2	B13,B2		; f t3_ctr = ireset
||	MV	.L1X	B13,A1		; f st_ctr = ireset
||[!A2]	LDDW	.D2T2	*+B15(32),B11:B10; o if (!l) pop B11:B10

;-----------------------------------------------
* End Benchmark Timing

	LDDW	.D2T2	*+B15(40),B13:B12
||	B	.S2	B3

	LDW	.D2T2	*++B15(48),B14
	NOP		4
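
The CYCLES entries in the two headers above can be turned into a rough timing estimate. This is only a sketch: the function names are made up for illustration, the formulas are copied from the headers, and the CPU clock is a parameter the caller has to supply, since it depends on the board setup.

```c
#include <assert.h>

/* Cycle estimate for cfftr2_dit, from its header:
   ((2*N) + 23) * log2(N) + 6 */
long cfftr2_dit_cycles(long n)
{
    long log2n = 0, i;

    assert(n >= 4 && (n & (n - 1)) == 0);   /* power of 2, at least 4 */
    for (i = n; i > 1; i >>= 1) log2n++;
    return (2 * n + 23) * log2n + 6;
}

/* Cycle estimate for bitrev, from its header: (n/4)*11 + 9 */
long bitrev_cycles(long n)
{
    return (n / 4) * 11 + 9;
}

/* Convert a cycle count to microseconds for a CPU clock given in MHz.
   The clock value is an assumption supplied by the caller. */
double cycles_to_us(long cycles, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return (double)cycles / cpu_mhz;
}
```

For N = 1024 this reproduces the 20716 cycles quoted in the header. For the N = 256 case in the main program it gives 4286 cycles for the FFT and 713 for the bit reversal of x; assuming a 225 MHz clock, that is roughly 19 µs and 3 µs, compared with the 32 ms it takes to collect 256 samples at 8 kHz. The formulas do not account for memory stalls or for the cos/sin and sqrt calls in the C code.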

0 BAS over 15 years ago in reply to BAS

Expert 2370 points

Please guys, reply !
