Hi,
I am trying to compute the FFT of a sine wave fed as input to the C6713 DSK, but I don't know which FFT program I should use. I also downloaded the C67xDSPLIB functions and found many FFT programs in the support folder, but I'm not sure which one will fulfill my requirement.
Below is the part of the program that samples the input signal and stores it in a buffer of size 1024. Now I need to compute the FFT of these 1024 data points.
interrupt void c_int11() //interrupt service routine
{
    short sample = (short)input_sample(); //read the codec once per interrupt
    output_sample(sample); //input/output data
    buffer[i] = sample; //store the same sample in buffer
    i++; //increment buffer index
    if (i==BUFFER_SIZE) i = 0; //reinit index if buffer full
    return; //return from ISR
}
Can anyone please explain which FFT program from the DSPLIB folder could be used to perform the FFT? How can I use this buffer's values as input to the FFT function?
Please guys help me out.
BR,
BS
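For readers following along: the cfftr2_dit header quoted later in this thread describes its input as N complex floats stored as successive real and imaginary pairs, so a buffer of short samples has to be widened and interleaved with zeros first. Below is a minimal sketch of that step. The helper name samples_to_complex is hypothetical (it is not part of DSPLIB), and the sketch assumes the copy happens outside the ISR once the buffer has filled.

```c
#include <assert.h>
#include <stddef.h>

/* Copy n real 16-bit samples into an interleaved complex float array
   (re0, im0, re1, im1, ...) with every imaginary part set to zero.
   `out` must have room for 2*n floats. Hypothetical helper, not DSPLIB. */
static void samples_to_complex(const short *in, float *out, size_t n)
{
    size_t k;

    assert(in != NULL && out != NULL);
    for (k = 0; k < n; k++) {
        out[2 * k]     = (float)in[k];  /* real part = ADC sample */
        out[2 * k + 1] = 0.0f;          /* imaginary part = 0     */
    }
}
```

With the 1024-sample buffer above this would be called as samples_to_complex(buffer, x, 1024), where x (an assumed name) is a float array of 2048 elements aligned on an 8-byte boundary, as the cfftr2_dit header asks.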
I'm not a DSPLIB expert, but you can find more information about the various FFT functions in the app note and user guide located here: http://focus.ti.com/docs/toolsw/folders/print/sprc121.html#Technical%20Documents. The documents provide a brief description of each function, as well as the parameters required to make the function call.
- Christina
Hi,
I have a collection of digital data points for a sine wave, captured from an oscilloscope, and I am interested in seeing the frequency response of that data. Can anyone please help me read the data from the PC using the C6713 in order to compute the FFT?
Waiting for your help.
Thanks
Hi
Do you have access to Matlab, Simulink tools?
http://www.mathworks.com/products/new_products/release14.html#TIC6000
Typically that is a good way to accomplish what you are trying to do.
Regards
Mukul
Hi,
I don't have access to Simulink. There was no Matlab CD with my DSP kit; instead there is a flyer from MathWorks giving the web address for a 30-day Matlab trial, but I am still waiting for the download link that MathWorks is supposed to send after registration.
Please help me download Simulink and explain how I can use it for my task.
I have data consisting of 1 million points. How can I load them into any register in the C6713 to compute the FFT?
Please respond ASAP.
Thanks.
BAS said: Please help me download Simulink and explain how I can use it for my task.
That is a MathWorks offering, so we cannot help with this (I see a trial software link). Please refer to the link provided in my previous post.
BAS said: I have data consisting of 1 million points. How can I load them into any register in the C6713 to compute the FFT?
I don't completely understand this statement. I feel like some basic concepts are missing in what you are trying to accomplish. If this is a critical product/application you are working on, please try to get local TI support in your region to work on this. IMO there are several online labs/presentations/tutorials that show how the C6713 DSK can be used for signal processing. It seems to me that you would need to start with those. I am not aware of any canned documentation collateral that covers what you are trying to accomplish.
Hi Guys,
I am computing the FFT using TI's radix-2 optimized FFT function (cfftr2_dit), the function that generates the index for bit reversal (digitrev_index), and the function that performs the bit reversal (bitrev). When I compile this program, I get the following errors:
undefined        first referenced
 symbol              in file
---------        ----------------
_cfftr2_dit      C:\CCStudio_v3.1\MyProjects\FFTr2\Debug\FFTr2.obj
_bitrev          C:\CCStudio_v3.1\MyProjects\FFTr2\Debug\FFTr2.obj
error: symbol referencing errors - './Debug/FFTr2.out' not built
>> Compilation failure
I don't know what's wrong.
One thing I noticed is that the files bitrev.sa and cfftr2_dit.sa appear as text files in Code Composer Studio; I copied their contents into Notepad as-is and saved them with the .sa extension. I don't know how to use .sa files. My main code is below. Please help me run this program.
#include "dsk6713_aic23.h"
Uint32 fs=DSK6713_AIC23_FREQ_8KHZ; //set sampling rate
#include <math.h>
#define N 256 //number of FFT points
#define RADIX 2 //radix or base
#define PI 3.14159265358979
#define DELTA ((2*PI)/N) //argument for sine/cosine
short i = 0;
short iTwid[N/2]; //index for twiddle constants W
short iData[N]; //index for bitrev X
float Xmag[N]; //magnitude spectrum of x
typedef struct Complex_tag {float re,im;}Complex;
Complex W[N/RADIX]; //array for twiddle constants
Complex x[N]; //N complex data values
#pragma DATA_ALIGN(W,sizeof(Complex)) //align W on boundary
#pragma DATA_ALIGN(x,sizeof(Complex)) //align input x on boundary
//prototypes as given in the bitrev.sa and cfftr2_dit.sa header comments
void digitrev_index(short *index, int n, int radix);
void bitrev(float *x, short *index, int n);
void cfftr2_dit(float *x, const float *w, short n);
void main()
{
    for( i = 0 ; i < N/RADIX ; i++ )
    {
        W[i].re = cos(DELTA*i); //real component of W
        W[i].im = sin(DELTA*i); //neg imag component
    } //see cfftr2_dit
    digitrev_index(iTwid, N/RADIX, RADIX);//produces index for bitrev() W
    bitrev((float *)W, iTwid, N/RADIX); //bit reverse W
    comm_poll(); //init DSK,codec,McBSP
    for(i=0; i<N; i++)
        Xmag[i] = 0; //init output magnitude
    while (1) //infinite loop
    {
        output_sample(32000); //negative spike for reference
        for( i = 0 ; i < N ; i++ )
        {
            x[i].re = (float)((short)input_sample()); //external input
            x[i].im = 0.0 ; //zero imaginary part
            if(i>0) output_sample((short)Xmag[i]); //output magnitude
        }
        cfftr2_dit((float *)x, (float *)W, N ) ; //TI floating-pt complex FFT
        digitrev_index(iData, N, RADIX); //produces index for bitrev() X
        bitrev((float *)x, iData, N); //freq scrambled->bit-reverse x
        for (i =0; i<N; i++)
            Xmag[i] = sqrt(x[i].re*x[i].re+x[i].im*x[i].im)/32; //magnitude of X
    }
}
BR,
BS
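Once the re/im pairs are available, the usual next step for a sine-wave input is to find the strongest bin and convert it to Hz. With fs = 8 kHz and N = 256 as in the code above, the bin spacing is 8000/256 = 31.25 Hz. Here is a rough sketch of that lookup, assuming the spectrum is already in natural (unscrambled) order and stored as interleaved re/im floats. The helper names peak_bin and bin_to_hz are hypothetical.

```c
#include <assert.h>

/* Return the index of the strongest bin among 1 .. n/2-1 of an n-point
   spectrum stored as interleaved (re, im) floats. Bin 0 (DC) is skipped.
   Squared magnitudes are compared, so no sqrt is needed to find the peak. */
static int peak_bin(const float *X, int n)
{
    int k, best = 1;
    float best_pow = 0.0f;

    assert(n >= 4);
    for (k = 1; k < n / 2; k++) {
        float p = X[2 * k] * X[2 * k] + X[2 * k + 1] * X[2 * k + 1];
        if (p > best_pow) {
            best_pow = p;
            best = k;
        }
    }
    return best;
}

/* Centre frequency in Hz of bin k for sampling rate fs_hz and n points. */
static float bin_to_hz(int k, float fs_hz, int n)
{
    return (float)k * fs_hz / (float)n;
}
```

For example, a 1 kHz tone sampled at 8 kHz with N = 256 should peak at bin 32, since 32 * 31.25 Hz = 1000 Hz.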
The .sa files are attached here. I just saved them as .sa and added them to the project in Code Composer Studio.
bitrev.sa
===============================================================================
*
* TEXAS INSTRUMENTS, INC.
*
* Copyright © Texas Instruments Incorporated 1998
*
* TI retains all right, title and interest in this code and authorizes its
* use solely and exclusively with digital signal processing devices
* manufactured by or for TI. This code is intended to provide an
* understanding of the benefits of using TI digital signal processing devices.
* It is provided "AS IS". TI disclaims all warranties and representations,
* including but not limited to, any warranty of merchantability or fitness
* for a particular purpose. This code may contain irregularities not found
* in commercial software and is not intended to be used in production
* applications. You agree that prior to using or incorporating this code
* into any commercial product you will thoroughly test that product and the
* functionality of the code in that product and will be solely responsible
* for any problems or failures.
*
* TI retains all rights not granted herein.
*
*
* Linear Time, Small Lookup Table: Bit Reversal
*
* Revision Date: 5/12/98
*
* USAGE This routine is C Callable and can be called as:
*
* void bitrev(float *x, short *index, int n){
*
* x = Input Array to be Bit-Reversed
* n = Number of points in array (must be a power of 2)
* index = Array of ~sqrt(n) created by the routine
* digitrev_index found below to allow the fast
* implementation of the bit-reversal
*
* If routine is not to be used as a C callable function
* then all instructions relating to stack should be removed.
* Refer to comments of individual instructions. You will also
* need to initialize values for all of the values passed as these
* are assumed to be in registers as defined by the calling
* convention of the compiler, (refer to the C compiler reference
* guide).
*
* C Code This is the C equivalent of the Assembly Code without
* restrictions. Note that the assembly code is hand optimized
* and restrictions may apply
*
* TI retains all rights, title and interest in this code and only
* authorizes the use of this code on TI TMS320 DSPs manufactured by TI.
*
* void bitrev(float *xs, short *index, int n){
* int i;
* short i0, i1, i2, i3;
* short j0, j1, j2, j3;
* double xi0, xi1, xi2, xi3;
* double xj0, xj1, xj2, xj3;
* short t;
* int a, b, ia, ib, ibs;
* int mask;
* int nbits, nbot, ntop, ndiff, n2, halfn;
* double *x ;
* x = (double *)xs ;
*
* nbits = 0;
* i = n;
* while (i > 1)
* {
* i = i >> 1;
* nbits++;
* }
*
* nbot = nbits >> 1;
* ndiff = nbits & 1;
* ntop = nbot + ndiff;
* n2 = 1 << ntop;
* mask = n2 - 1;
* halfn = n >> 1;
*
* for (i0 = 0; i0 < halfn; i0 += 2)
* {
* b = i0 & mask;
* a = i0 >> nbot;
* if (!b) ia = index[a];
* ib = index[b];
* ibs = ib << nbot;
*
* j0 = ibs + ia;
* t = i0 < j0;
* xi0 = x[i0];
* xj0 = x[j0];
*
* if (t)
* {
* x[i0] = xj0;
* x[j0] = xi0;
* }
*
* i1 = i0 + 1;
* j1 = j0 + halfn;
* xi1 = x[i1];
* xj1 = x[j1];
* x[i1] = xj1;
* x[j1] = xi1;
*
* i3 = i1 + halfn;
* j3 = j1 + 1;
* xi3 = x[i3];
* xj3 = x[j3];
* if (t)
* {
* x[i3] = xj3;
* x[j3] = xi3;
* }
* }
* }
*
* DESCRIPTION
* This routine performs the bit-reversal of the input array x[].
* where x[] is an array of length n 32-bit complex pairs of data.
* This requires the index array provided by the program below.
* This index should be generated at compile time not by the DSP.
*
* ASSUMPTIONS
* n is a power of 2
*
* NOTE: If n <= 4K one can use the char (8-bit) data type for
* the "index" variable. This would require changing the LDH when
* loading index values in the assembly routine to LDB. This would
* further reduce the size of the Index Table by half its size.
*
* CYCLES (n/4)*11 + 9
*
* MEMORY NOTE
* There are NO memory bank hits regardless of array alignment
*
*******************************************************************************
* Use This Routine To Generate the Index Table for
* Bit/Digit Reversing of Radix-2 and Radix-4 Routines
*******************************************************************************
* This routine calculates the index for digitrev of length n
* (length of index is 2^(radix*ceil(k/radix)) where n = 2^k
* in otherwords
* Either:sqrt(n) when n=2^even# Or: sqrt(2)*sqrt(n) when n=2^odd# [radix 2]
* sqrt(n) when n=4^even# Or: sqrt(4)*sqrt(n) when n=4^odd# [radix 4]
* Note: the variable "radix" is 2 for radix-2 and 4 for radix-4
*******************************************************************************
*
* void digitrev_index(short *index, int n, int radix){
*
* int i,j,k;
* short nbits, nbot, ntop, ndiff, n2, raddiv2;
*
* nbits = 0;
* i = n;
* while (i > 1){
* i = i >> 1;
* nbits++;
* }
*
* raddiv2 = radix >> 1;
* nbot = nbits >> raddiv2;
* nbot = nbot << raddiv2 - 1;
* ndiff = nbits & raddiv2;
* ntop = nbot + ndiff;
* n2 = 1 << ntop;
*
* index[0] = 0;
* for ( i = 1, j = n2/radix + 1; i < n2 - 1; i++){
* index[i] = j - 1;
* for (k = n2/radix; k*(radix-1) < j; k /= radix)
* j -= k*(radix-1);
* j += k;
* }
* index[n2 - 1] = n2 - 1;
* }
*
* TECHNIQUES
*
* 1. The following registers are shared to reduce register pressure
* A8 (between) xi2, xi4, n2, nbits
* A9 (between) xj2, xj4, tmp, ndiff
* A10 (between) xi0, cnst
* A11 (between) xj0, ntop
* B4 (between) index, ptr_i1
* B6 (between) xi1, xi3
* B7 (between) xj1, xj3
* 2. The first set of indices ia and ib are loaded before
* loop kernel
* 3. Three load pointers are used to decouple 3 load/stores
* 4. Memory bank hits are eliminated by loading odd/even
* index pair in one cycle
*
*******************************************************************************
.global _bitrev
.text
_bitrev:
STW .D2T2 B14,*B15--(48)
STW .D2T2 B3,*+B15(4)
STW .D2T1 A10,*+B15(8)
STW .D2T1 A11,*+B15(12)
STW .D2T1 A12,*+B15(16)
STW .D2T1 A13,*+B15(20)
STW .D2T1 A14,*+B15(24)
STW .D2T1 A15,*+B15(28)
STW .D2T2 B10,*+B15(32)
STW .D2T2 B11,*+B15(36)
STW .D2T2 B12,*+B15(40)
STW .D2T2 B13,*+B15(44)
; Begin Benchmark Timing
LDH .D2T1 *B4, A13 ; ib = *index
|| SHR .S2X A6, 2, B1 ; icntr = n >> 2
|| MVK .S1 31, A10 ; cnst = 31
|| LMBD .L1 1, A6, A9 ; tmp = lmbd(1, n)
|| ZERO .L2 B13 ; i0 = 0
SHR .S2X A6, 1, B5 ; halfn = n >> 1
|| SHL .S1 A6, 2, A12 ; halfnbytes = n << 2
|| LDH .D2 *B4, B3 ; ia = *index
|| SUB .L1 A10, A9, A8 ; nbits = cnst - tmp
SHR .S1 A8, 1, A14 ; nbot1 = nbits >> 1
|| AND .L1 A8, 1, A9 ; ndiff = nbits & 1
ADD .L1 A14, A9, A11 ; ntop = nbot1 + ndiff
|| MVK .S1 1, A9 ; tmp = 1
SHL .S1 A9, A11, A8 ; n2 = tmp << ntop
|| ADD .S2 B13,2,B2 ; aa = i0 + 2
|| ADD .L1X B13,2,A1 ; bb = i0 + 2
|| MV .L2X A14, B14 ; nbot = nbot1
SHL .S1 A13, A14, A13 ; ib = ib << nbot1
|| SUB .D1 A8, 1, A15 ; mask = n2 - 1
ADD .L1X A13, B3, A0 ; j0 = ib + ia
|| SHR .S1 A6, 1, A6 ; n = n >> 1
SHL .S1 A0,3,A3 ; ptr_y0 = j0 << 3
|| MV .L1X B4, A7 ; ptr_i0 = index
MV .L2X A4,B11 ; ptr_y1 = x
|| ADD .L1 A4,A3,A3 ; ptr_y0 = x + ptr_y0
|| AND .S1 A1, A15, A1 ; bb = bb & mask
|| SHR .S2 B2, B14, B2 ; aa = aa >> nbot
LOOP:
LDDW .D2T1 *++B11[1], A9:A8 ; xj2:xi2 = *++ptr_y1[1]
|| LDDW .D1T2 *++A3[A6], B7:B6 ; xj3:xi3 = *++ptr_y0[n]
|| ADD .S2 B11,8,B12 ; ptr_z1 = ptr_y1 + 8
|| ADD .L1 A3,A12,A5 ; ptr_z0 = ptr_y0 + halfnbytes
|| CMPLT .L2X B13, A0, B0 ; if (i0 < j0) {t=1} else {t=0}
LDH .D1 *+A7[A1], A13 ; ib = *+ptr_i0[bb]
|| MV .L1 A4, A2 ; ptr_x0 = x
|| MV .L2X A4, B10 ; ptr_x1 = x
[B0] LDDW .D2T1 *++B10[B13], A11:A10 ; if (t) xj0:xi0 = *++ptr_x1[i0]
||[B0] LDDW .D1T2 *++A5[1], B9:B8 ; if (t) xj5:xi5 = *++ptr_z0[1]
|| ADD .L2 B13, 2, B13 ; i0 = i0 + 2
[!A1] LDH .D2 *+B4[B2], B3 ; if (!bb) ia = *+ptr_i1[aa]
[B0] LDDW .D1T2 *++A2[A0], B7:B6 ; if (t) xj1:xi1 = *++ptr_x0[j0]
||[B0] LDDW .D2T1 *++B12[B5], A9:A8 ; if (t)xj4:xi4=*++ptr_z1[halfn]
||[B1] SUB .S2 B1, 1, B1 ; if (icntr) icntr -=1
|| ADD .L2 B13,2,B2 ; aa = i0 + 2
|| ADD .L1X B13,2,A1 ; bb = i0 + 2
STW .D1T1 A8, *A3++[1] ; *ptr_y0++[1] = xi2
|| STW .D2T2 B7, *++B11[1] ; *++ptr_y1[1] = xj3
||[B1] B .S2 LOOP
STW .D1T1 A9, *A3--[1] ; *ptr_y0--[1] = xj2
|| STW .D2T2 B6, *--B11[1] ; *--ptr_y1[1] = xi3
|| SHL .S1 A13, A14, A13 ; ib = ib << nbot1
[B0] STW .D1T1 A10, *A2++[1] ; if (t) *ptr_x0++[1] = xi0
||[B0] STW .D2T2 B9, *++B12[1] ; if (t) *++ptr_z1[1] = xj5
|| AND .S1 A1, A15, A1 ; bb = bb & mask
|| SHR .S2 B2, B14, B2 ; aa = aa >> nbot
[B0] STW .D1T1 A11, *A2--[1] ; if (t) *ptr_x0--[1] = xj0
||[B0] STW .D2T2 B8, *--B12[1] ; if (t) *--ptr_z1[1] = xi5
|| ADD .L1X A13, B3, A0 ; j0 = ib + ia
[B0] STW .D2T2 B7, *++B10[1] ; if (t) *++ptr_x1[1] = xj1
||[B0] STW .D1T1 A8, *A5++[1] ; if (t) *ptr_z0++[1] = xi4
|| SHL .S1 A0,3,A3 ; ptr_y0 = j0 << 3
|| SHL .S2 B13,3,B11 ; ptr_y1 = i0 << 3
[B0] STW .D2T2 B6, *--B10[1] ; if (t) *--ptr_x1[1] = xi1
||[B0] STW .D1T1 A9, *A5--[1] ; if (t) *ptr_z0--[1] = xj4
|| ADD .L2X A4,B11,B11 ; ptr_y1 = x + ptr_y1
|| ADD .L1 A4,A3,A3 ; ptr_y0 = x + ptr_y0
;---------------------------------------------------------------------------
; End Benchmark Timing
LDW .D2 *+B15(4),B3 ; pop return address
LDDW .D2T1 *+B15(8),A11:A10
LDDW .D2T1 *+B15(16),A13:A12
LDDW .D2T1 *+B15(24),A15:A14
LDDW .D2T2 *+B15(32),B11:B10
LDDW .D2T2 *+B15(40),B13:B12
|| B .S2 B3 ; Return to calling function
LDW .D2T2 *++B15(48),B14
NOP 4
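The header comment above gives digitrev_index only as commented-out C. A plain C transcription that could be compiled as its own source file might look like the sketch below. It follows the listing step by step, with the shift precedence written out explicitly and int locals in place of short. Treat it as an illustration, not as TI's shipped file.

```c
#include <assert.h>

/* Index table for bitrev(), following the digitrev_index listing in the
   bitrev.sa header comment. n must be a power of two; radix is 2 or 4. */
void digitrev_index(short *index, int n, int radix)
{
    int i, j, k;
    int nbits = 0, nbot, ntop, ndiff, n2, raddiv2;

    assert(radix == 2 || radix == 4);
    for (i = n; i > 1; i >>= 1)
        nbits++;                           /* nbits = log2(n) */

    raddiv2 = radix >> 1;
    nbot = nbits >> raddiv2;
    nbot = nbot << (raddiv2 - 1);          /* same as "nbot << raddiv2 - 1" */
    ndiff = nbits & raddiv2;
    ntop = nbot + ndiff;
    n2 = 1 << ntop;                        /* number of table entries */

    index[0] = 0;
    for (i = 1, j = n2 / radix + 1; i < n2 - 1; i++) {
        index[i] = (short)(j - 1);
        for (k = n2 / radix; k * (radix - 1) < j; k /= radix)
            j -= k * (radix - 1);
        j += k;
    }
    index[n2 - 1] = (short)(n2 - 1);
}
```

For n = 256 and radix 2 the table has 16 entries and starts 0, 8, 4, 12, which is the 4-bit reversal of 0, 1, 2, 3.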
cfftr2_dit.sa
===============================================================================
*
* From: https://www-a.ti.com/apps/c6000/xt_download.asp?sku=C67x_cfftr2
*
* TEXAS INSTRUMENTS, INC.
*
* Copyright © Texas Instruments Incorporated 1998
*
* TI retains all right, title and interest in this code and authorizes its
* use solely and exclusively with digital signal processing devices
* manufactured by or for TI. This code is intended to provide an
* understanding of the benefits of using TI digital signal processing devices.
* It is provided "AS IS". TI disclaims all warranties and representations,
* including but not limited to, any warranty of merchantability or fitness
* for a particular purpose. This code may contain irregularities not found
* in commercial software and is not intended to be used in production
* applications. You agree that prior to using or incorporating this code
* into any commercial product you will thoroughly test that product and the
* functionality of the code in that product and will be solely responsible
* for any problems or failures.
*
* TI retains all rights not granted herein.
*
*
* RADIX-2 FFT (DIT)
*
* Revision Date: 5/21/98
*
* USAGE
*
* This routine is C Callable and can be called as:
*
* void cfftr2_dit( float *x, const float *w, short N)
*
* x Pointer to Array of Dimension 2*N elements holding
* Input to and Outputs from function cfftr2_dit()
* w Pointer to an array holding the coefficient (Dimension
* n/2 complex numbers)
* N Number of complex points in x
*
* If routine is not to be used as a C callable function then
* you need to initialize values for all of the values passed
* as these are assumed to be in registers as defined by the
* calling convention of the compiler, (refer to the C compiler
* reference guide).
*
* ARGUMENTS PASSED -> REGISTER
* ---------------------------------
* x -> A4
* w -> B4
* N -> A6
*
* C CODE
*
* This is the C equivalent of the assembly code. Note that
* the assembly code is hand optimized and restrictions may
* apply.
*
* void cfftr2_dit(float* x, float* w, short n)
* {
* short n2, ie, ia, i, j, k, m;
* float rtemp, itemp, c, s;
*
* n2 = n;
* ie = 1;
*
* for(k=n; k > 1; k >>= 1)
* {
* n2 >>= 1;
* ia = 0;
* for(j=0; j < ie; j++)
* {
* c = w[2*j];
* s = w[2*j+1];
* for(i=0; i < n2; i++)
* {
* m = ia + n2;
* rtemp = c * x[2*m] + s * x[2*m+1];
* itemp = c * x[2*m+1] - s * x[2*m];
* x[2*m] = x[2*ia] - rtemp;
* x[2*m+1] = x[2*ia+1] - itemp;
* x[2*ia] = x[2*ia] + rtemp;
* x[2*ia+1] = x[2*ia+1] + itemp;
* ia++;
* }
* ia += n2;
* }
* ie <<= 1;
* }
* }
*
*
* DESCRIPTION
*
* This routine performs the Decimation-in-Time (DIT) Radix-2 FFT
* of the input array x.
* x has N complex floating point numbers arranged as successive
* real and imaginary number pairs. Input array x contains N
* complex points (N*2 elements). The coefficients for the
* FFT are passed to the function in array w which contains
* N/2 complex numbers (N elements) as successive real and
* imaginary number pairs.
* The FFT Coefficients w are in N/2 bit-reversed order
* The elements of input array x are in normal order
* The assembly routine performs 4 output samples (2 real and 2
* imaginary) for a pass through inner loop.
*
* Note that (bit-reversed) coefficients for higher order FFT (1024
* point) can be used unchanged as coefficients for a lower order
* FFT (512, 256, 128 ... ,2)
*
* The routine can be used to implement Inverse-FFT by any ONE of
* the following methods:
*
* 1.Inputs (x) are replaced by their Complex-conjugate values
* Output values are divided by N
* 2.FFT Coefficients (w) are replaced by their Complex-conjugates
* Output values are divided by N
* 3.Swap Real and Imaginary values of input
* 4.Swap Real and Imaginary values of output
*
* TECHNIQUES
*
* 1. The inner two loops are combined into one inner loop whose
* loop count is N/2.
* 2. The first 4 cycles of inner loop prolog are scheduled in
* parallel with the outer loop.
* 3. Load counter is not used, so extraneous loads are performed
* 4. Variables n and c share the register A6 and variables w and
* nsave share the register B4.
*
* ASSUMPTIONS
*
* N is an integral power of 2 (4, 8, 16, 32 ...) and the FFT
* dimension (N) must be at least 4.
* The FFT Coefficients w are in bit-reversed order
* The elements of input array x are in normal order
* The imaginary coefficients of w are negated as {cos(d*0),
* sin(d*0), cos(d*1), sin(d*1) ...} as opposed to the normal
* sequence of {cos(d*0), -sin(d*0), cos(d*1), -sin(d*1) ...}
* where d = 2*PI/N.
*
* MEMORY NOTE
*
* Arrays x (data) and w (coefficients) must reside in
* different memory banks to avoid memory conflicts. If Data and
* Coefficients do reside in the same memory bank, add (N/2) + log2(N) + 1
* cycles to the cycles equation below. The memory bank hits are due to
* scheduling of assembly code and also due to extraneous loads
* causing bank hits.
*
* Data and Coefficients must be aligned on an 8 byte boundary.
*
* CYCLES
*
* ((2*N) + 23)*log2(N) + 6
*
* For N=1024, Cycles = 20716
*
* NOTATIONS
*
* f = Function Prolog or Epilog
* o = Outer Loop
* p = Inner Loop Prolog
*
*===============================================================================
.global _cfftr2_dit
.text
_cfftr2_dit:
STW .D2T2 B14,*B15--(48)
STW .D2T2 B3,*+B15(4)
STW .D2T1 A10,*+B15(8)
STW .D2T1 A11,*+B15(12)
STW .D2T1 A12,*+B15(16)
STW .D2T1 A13,*+B15(20)
STW .D2T1 A14,*+B15(24)
STW .D2T1 A15,*+B15(28)
STW .D2T2 B10,*+B15(32)
STW .D2T2 B11,*+B15(36)
STW .D2T2 B12,*+B15(40)
STW .D2T2 B13,*+B15(44)
* Begin Benchmark Timing
ADDAW .D1 A4,A6,A3 ; f xx2 = x + n*4
|| MV .L2 B4,B12 ; f wsave = w
|| SHRU .S2X A6,1,B4 ; f nsave = n>>1
|| MV .L1X B4,A5 ; f w1 = w
|| MVK .S1 1,A14 ; f onea = 1
MV .S2X A3,B8 ; f xx1 = xx2
|| MV .L2 B4,B0 ; f i = nsave
|| SHL .S1 A6,2,A10 ; f k1 = n<<2
LDDW .D2 *B8++,B7:B6 ; p @ t2_1:t2_0 = *xx1++
|| LDDW .D1 *A5++,A7:A6 ; p @ s:c = *w1
|| MPY .M2 B0,1,B13 ; f ireset = i*1
|| ADD .L2X A6,1,B9 ; f bk = n+1
|| MV .S2 B8,B5 ; f xx3 = xx1
MV .L1 A4,A11 ; f xx4 = x
|| SHL .S2X A6,2,B1 ; f k = n * 4
MV .L1X B9,A0 ; f bk1 = bk
||[B0] SUB .L2 B0,1,B0 ; p if (i) i = i-1
|| MV .S1 A4,A3 ; f xx2 = x
|| MVK .S2 1,B14 ; f oneb = 1
[!B0] ADD .L2 B8,B1,B8 ; p if (!i) xx1 = xx1 + k
|| MV .S2 B13,B2 ; f t3_ctr = ireset
|| MV .L1X B13,A1 ; f st_ctr = ireset
oloop:
LDDW .D2 *B8++,B7:B6 ; p @@ t2_1:t2_0 = *xx1++
||[!B0] LDDW .D1 *A5++,A7:A6 ; p @@ if (!i) s:c = *w1
||[!B0] MPY .M2 B14,B13,B0 ; p if (!i) i = ireset*1
MPYSP .M1X A6,B6,A13 ; p rtemp2 = c*t2_0
|| MPYSP .M2X A6,B7,B11 ; p itemp2 = c*t2_1
[B0] SUB .S2 B0,1,B0 ; p if (i) i = i-1
|| SUB .L1X B4,1,A2 ; f l = nsave - 1
MPYSP .M1X A7,B7,A15 ; p rtemp3 = s*t2_1
|| MPYSP .M2X A7,B6,B3 ; p itemp3 = s*t2_0
||[!B0] ADD .L2 B8,B1,B8 ; p if (!i) xx1 = xx1 + k
LDDW .D2 *B8++,B7:B6 ; p @@@ t2_1:t2_0 = *xx1++
||[!B0] LDDW .D1 *A5++,A7:A6 ; p @@@ if (!i) s:c = *w1
||[!B0] MPY .M2 B14,B13,B0 ; p if (!i) i = ireset*1
MPYSP .M1X A6,B6,A13 ; p rtemp2 = c*t2_0
|| MPYSP .M2X A6,B7,B11 ; p itemp2 = c*t2_1
[B0] SUB .S2 B0,1,B0 ; p if (i) i = i-1
LDDW .D1 *A3++,A9:A8 ; p @ t3_1:t3_0 = *xx2++
|| MPYSP .M1X A7,B7,A15 ; p rtemp3 = s*t2_1
|| MPYSP .M2X A7,B6,B3 ; p itemp3 = s*t2_0
||[!B0] ADD .D2 B8,B1,B8 ; p if (!i) xx1 = xx1 + k
|| ADDSP .L1 A13,A15,A12 ; p rtemp1 = rtemp2 + rtemp3
|| SUBSP .L2 B11,B3,B10 ; p itemp1 = itemp2 - itemp3
LDDW .D2 *B8++,B7:B6 ; p @@@@ t2_1:t2_0 = *xx1++
||[!B0] LDDW .D1 *A5++,A7:A6 ; p @@@@ if (!i) s:c = *w1
||[!B0] MPY .M2 B14,B13,B0 ; p if (!i) i = ireset*1
||[B2] SUB .S2 B2,1,B2 ; p if (t3_ctr) t3_ctr -= 1
MPYSP .M1X A6,B6,A13 ; p rtemp2 = c*t2_0
|| MPYSP .M2X A6,B7,B11 ; p itemp2 = c*t2_1
||[!B2] ADD .S1 A3,A10,A3 ; p if (!t3_ctr) xx2 = xx2 + k1
[B0] SUB .S2 B0,1,B0 ; p if (i) i = i-1
||[!B2] MPY .M2 B14,B13,B2 ; p if (!t3_ctr) t3_ctr = ireset*1
LDDW .D1 *A3++,A9:A8 ; p @@ t3_1:t3_0 = *xx2++
|| MPYSP .M1X A7,B7,A15 ; p rtemp3 = s*t2_1
|| MPYSP .M2X A7,B6,B3 ; p itemp3 = s*t2_0
||[!B0] ADD .D2 B8,B1,B8 ; p if (!i) xx1 = xx1 + k
|| ADDSP .L1 A13,A15,A12 ; p rtemp1 = rtemp2 + rtemp3
|| SUBSP .L2 B11,B3,B10 ; p itemp1 = itemp2 - itemp3
LDDW .D2 *B8++,B7:B6 ; p @@@@@ t2_1:t2_0 = *xx1++
||[!B0] LDDW .D1 *A5++,A7:A6 ; p @@@@@ if (!i) s:c = *w1
||[!B0] MPY .M2 B14,B13,B0 ; p if (!i) i = ireset*1
||[B2] SUB .S2 B2,1,B2 ; p if (t3_ctr) t3_ctr -= 1
|| SUBSP .L1 A8,A12,A15 ; p rtemp3 = t3_0 - rtemp1
|| SUBSP .L2X A9,B10,B3 ; p itemp3 = t3_1 - itemp1
MPYSP .M1X A6,B6,A13 ; p rtemp2 = c*t2_0
|| MPYSP .M2X A6,B7,B11 ; p itemp2 = c*t2_1
||[!B2] ADD .S1 A3,A10,A3 ; p if (!t3_ctr) xx2 = xx2 + k1
|| B .S2 iloop
[B0] SUB .S2 B0,1,B0 ; p if (i) i = i-1
||[!B2] MPY .M2 B14,B13,B2 ; p if (!t3_ctr) t3_ctr =ireset*1
|| ADDSP .L1 A8,A12,A15 ; p rtemp3 = t3_0 + rtemp1
|| ADDSP .L2X A9,B10,B3 ; p itemp3 = t3_1 + itemp1
; Kernel Loop Begins
iloop:
LDDW .D1 *A3++,A9:A8 ; @@@ t3_1:t3_0 = *xx2++
|| MPYSP .M1X A7,B7,A15 ; rtemp3 = s*t2_1
|| MPYSP .M2X A7,B6,B3 ; itemp3 = s*t2_0
||[!B0] ADD .D2 B8,B1,B8 ; if (!i) xx1 = xx1 + k
|| ADDSP .L1 A13,A15,A12 ; rtemp1 = rtemp2 + rtemp3
|| SUBSP .L2 B11,B3,B10 ; itemp1 = itemp2 - itemp3
||[!A1] ADD .S2 B5,B1,B5 ; if (!st_ctr) xx3 = xx3 + k
||[!A1] ADD .S1 A11,A10,A11 ; if (!st_ctr) xx4 = xx4 + k1
LDDW .D2 *B8++,B7:B6 ; @@@@@@ t2_1:t2_0 = *xx1++
||[!B0] LDDW .D1 *A5++,A7:A6 ; @@@@@@ if (!i) s:c = *w1
||[!B0] MPY .M2 B14,B13,B0 ; if (!i) i = ireset*1
||[B2] SUB .S2 B2,1,B2 ; if (t3_ctr) t3_ctr -= 1
|| SUBSP .L1 A8,A12,A15 ; rtemp3 = t3_0 - rtemp1
|| SUBSP .L2X A9,B10,B3 ; itemp3 = t3_1 - itemp1
||[A2] SUB .S1 A2,1,A2 ; if (l) l = l-1
||[!A1] MPY .M1X A14,B13,A1 ; if (!st_ctr) st_ctr = ireset*1
MPYSP .M1X A6,B6,A13 ; rtemp2 = c*t2_0
|| MPYSP .M2X A6,B7,B11 ; itemp2 = c*t2_1
||[!B2] ADD .S1 A3,A10,A3 ; if (!t3_ctr) xx2 = xx2 + k1
||[A2] B .S2 iloop ; Branch iloop
|| STW .D2T1 A15,*B5++[2] ; *xx3++[2] = rtemp3
|| STW .D1T2 B3,*+A11[A0] ; *+xx4[bk1] = itemp3
[B0] SUB .S2 B0,1,B0 ; if (i) i = i-1
||[!B2] MPY .M2 B14,B13,B2 ; if (!t3_ctr) t3_ctr = ireset*1
|| ADDSP .L1 A8,A12,A15 ; rtemp3 = t3_0 + rtemp1
|| ADDSP .L2X A9,B10,B3 ; itemp3 = t3_1 + itemp1
|| STW .D1 A15,*A11++[2] ; *xx4++[2] = rtemp3
|| STW .D2 B3,*-B5[B9] ; *-xx3[bk] = itemp3
||[A1] SUB .S1 A1,1,A1 ; if (st_ctr) st_ctr=st_ctr-1
; Kernel Loop Ends
MV .S2 B13,B1 ; o k = ireset
|| MV .S1X B13,A2 ; o l = ireset
SUB .S1X B1,1,A2 ; o l = k - 1
|| SHRU .S2 B13,1,B0 ; o i = ireset>>1
|| ADDAW .D1 A4,A2,A3 ; o xx2 = x + 4*l
[A2] B .S1 oloop
||[!A2] LDW .D2 *+B15(4),B3 ; o if (!l) pop B3
|| MV .S2X A3,B8 ; o xx1 = xx2
MV .L1X B12,A5 ; o w1 = wsave
||[!A2] LDDW .D2T1 *+B15(8),A11:A10; o if (!l) pop A11:A10
[A2] LDDW .D2 *B8++,B7:B6 ; p @ if (l) t2_1:t2_0 = *xx1++
||[A2] LDDW .D1 *A5++,A7:A6 ; p @ if (l) s:c = *w1
|| SHL .S1X B1,2,A10 ; f k1 = k<<2
|| MPY .M2 B0,1,B13 ; f ireset = i*1
|| ADD .L2 B1,1,B9 ; f bk = k + 1
MV .L1 A4,A11 ; f xx4 = x
|| SHL .S2 B1,2,B1 ; f k = k<<2
|| MV .L2X A3,B5 ; f xx3 = xx2
||[!A2] LDDW .D2T1 *+B15(16),A13:A12; o if (!l) pop A13:A12
MV .L1X B9,A0 ; f bk1 = bk
||[B0] SUB .L2 B0,1,B0 ; p if (i) i = i - 1
|| MV .S1 A4,A3 ; f xx2 = x
||[!A2] LDDW .D2T1 *+B15(24),A15:A14; o if (!l) pop A15:A14
[!B0] ADD .L2 B8,B1,B8 ; p if (!i) xx1 = xx1 + k
|| MV .S2 B13,B2 ; f t3_ctr = ireset
|| MV .L1X B13,A1 ; f st_ctr = ireset
||[!A2] LDDW .D2T2 *+B15(32),B11:B10; o if (!l) pop B11:B10
;-----------------------------------------------
* End Benchmark Timing
LDDW .D2T2 *+B15(40),B13:B12
|| B .S2 B3
LDW .D2T2 *++B15(48),B14
NOP 4