Accessing the hardware fft

Holo

Other Parts Discussed in Thread: TMS320C5505, TMS320C5515, TMS320VC5505

Hello,

I tried for some time to get the hardware FFT to work and found out that my code does work, but only when I do not use the printf() function. What does printf() change? Why does it corrupt the calculated data? I used the attached code to test this behaviour. When the two printf() calls in the attached code are uncommented, it no longer calculates the correct values.

Holo

over 15 years ago

0 Holo over 15 years ago

Prodigy 135 points

I found out that the array hwafft_br writes to must be aligned to a boundary twice the size of the array. So for an Int32 array of 8 elements, DATA_ALIGN must be 16. Why is this not documented in the system user guide? With correct alignment I can use printf and file I/O without crashes or data corruption. The next problem: some input values are calculated correctly, some are not. Is there a guideline available for using the hardware-accelerated FFT? I wasted precious hours trying different alignments and datasets and found no solution.

Holo

0 Brett Hancock over 15 years ago

Prodigy 20 points

Hi there Holo,

I am new to this EZ DSP c55x development stick but have some rusty experience with CCS from years past (C2000 and C54x). I'm not an expert, so let me just take a stab.

printf() typically comes from the stdio library (I don't have CCS handy here, or I'd check before posting this). If you go into your tool suite and track down the printf function, find which header file it lives in (stdio.h?) and which registers it changes, and compare that against the data sheet for the device you are compiling for, you will probably see that the problem is due to a changed register.

(shoulder shrug) ; )

Brett

0 Jason Matalka over 15 years ago in reply to Holo

Prodigy 115 points

I am having a similar problem. If I call the hardware FFT functions from my main program, then my FFT results look good. However, if I call the FFT functions from another function which is called from main, then I have problems.

It seems like the placement of the functions, or what happens to be in random registers, has something to do with whether or not the HW FFT produces the correct results.

Any ideas?

-jason

0 Rijurekha Sen over 15 years ago in reply to Jason Matalka

Intellectual 645 points

Hi,

How do I set the DATA_ALIGN of 16 that resolves the printf() issues? In which file is DATA_ALIGN defined? I need it urgently, as I need to show a demo using FFT.

Thanks in advance,

Riju

0 Holo over 15 years ago in reply to Rijurekha Sen

Prodigy 135 points

DATA_ALIGN is not defined anywhere; it is a compiler pragma and is used like this:

// Size of FFT
#define NX 8
// Alignment needed for hwafft_br
#define ALIGN (2*NX)
// Input array
Int16 in[2*NX];
// Align input array
#pragma DATA_ALIGN(in, ALIGN);
// Output array
Int16 out[2*NX];
// Align output array
#pragma DATA_ALIGN(out, ALIGN);
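
The alignment rule found in this thread (boundary of 2*NX for an NX-point buffer) can be checked at run time before calling hwafft_br. The following is only a minimal sketch in portable C: the function names are hypothetical, the address is passed as a plain integer, and the rule itself is the one observed above, not something taken from TI documentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Alignment suggested in this thread for an nx-point complex buffer:
 * twice the number of points, matching "#define ALIGN (2*NX)".
 * (Hypothetical helper; the rule is an observation, not a TI spec.) */
uint32_t hwafft_br_alignment(uint32_t nx)
{
    return 2u * nx;
}

/* Returns true when addr is a multiple of align.
 * align must be a nonzero power of two. */
bool is_aligned(uint32_t addr, uint32_t align)
{
    return (addr & (align - 1u)) == 0u;
}
```

A debug build could evaluate is_aligned() on the buffer address and stop early if it fails, instead of silently producing corrupted FFT output.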

Holo

0 steffen_ger over 15 years ago in reply to Holo

Intellectual 290 points

Hello Holo,

Have you resolved your problems? I've read your code and asked myself why you define the following to enable the FFT hardware. Where is this described?

#define ICR (*(ioport volatile unsigned *)0x0001)
#define ISTR (*(ioport volatile unsigned *)0x0002)

Furthermore, I wonder about your input data 2,0,2,0,2,0,2,0,2,0,2,0,2,0,2,0: since hwafft handles its input as complex, wouldn't that be only numbers with imaginary parts?

By the way: Mark McKeown has recently published a document that describes the hwafft. It also describes how to align data in RAM.

0 Holo over 15 years ago in reply to steffen_ger

Prodigy 135 points

We did not solve the problem and switched to a software FFT. The benefit of the hardware-accelerated FFT was not large enough to justify hours of fiddling with poorly documented features to get it working.

Regarding the macros: The code to enable the FFT hardware can be found in sprz281a.pdf, page 8.

Regarding the input data: sprufp0a.pdf, page 13, states: "The 32-bit input and output data consist of 16-bit real and 16-bit imaginary data." From that sentence I assumed that the layout in the array must be Re,Im,Re,Im,...

Thanks for pointing me to sprabb6.pdf from Mark McKeown. I will read it and evaluate if it contains new information. If it does, maybe we can switch to hardware fft.

With kind regards,

Holo

0 steffen_ger over 15 years ago in reply to Holo

Intellectual 290 points

I think the hwafft functions use a different convention than the DSP Lib. In the DSP Lib the input is Re, Im, Re, Im... or Re, Re, Re, Re... depending on the function.

But in hwafft: take a look at SPRABB6. "The 32-bit input and output data consist of 16-bit real and 16-bit imaginary data." means that the 16 MSBs of the 32-bit input are real and the 16 LSBs of the 32-bit input are imaginary.

By the way: I have problems with the DSP Lib function while the hwafft runs, so it's the opposite of your problem ;-)

Maybe you would like to look at the example code I posted in another thread and tell me what I'm doing wrong? I tracked the failure to puts(): as soon as I use it, I get wrong results. But I need this function in my project.
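
The layout described here (real part in the upper 16 bits, imaginary part in the lower 16 bits of each 32-bit element) can be written out explicitly. This is a minimal host-side sketch in standard C; the helper names are hypothetical, and it only illustrates the packing quoted from SPRABB6, not any TI library function.

```c
#include <stdint.h>

/* Pack one complex sample: real part in bits 31..16, imaginary in bits 15..0. */
uint32_t pack_complex(int16_t re, int16_t im)
{
    return ((uint32_t)(uint16_t)re << 16) | (uint32_t)(uint16_t)im;
}

/* Extract the real part (upper 16 bits). The cast back to int16_t
 * assumes the usual two's-complement conversion. */
int16_t unpack_re(uint32_t w)
{
    return (int16_t)(uint16_t)(w >> 16);
}

/* Extract the imaginary part (lower 16 bits), same assumption as above. */
int16_t unpack_im(uint32_t w)
{
    return (int16_t)(uint16_t)(w & 0xFFFFu);
}
```

With this packing, a 16-bit sequence 2,0,2,0,... read as Re,Im pairs becomes 0x00020000 per element, that is, purely real samples.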

0 Holo over 15 years ago in reply to steffen_ger

Prodigy 135 points

The 5505 is big endian, so when the 16-bit input array is cast to Int32, the ordering should be correct, with the MSB half first (real part) and the LSB half second (imaginary part).

With kind regards,

Holo

PS: I have attached a small example of how we use the dsplib FFT. Maybe it is of some use.

0 steffen_ger over 15 years ago in reply to Holo

Intellectual 290 points

Thanks for your example. I checked it with your input arrays and it works fine. I have also tried your code with an input sequence from the DSP library examples, and there the test fails (again, like my tests).

Your example with that input sequence is attached to this post. Maybe it is because of the negative input... I dunno... But I think it is enough for me that it passes with real input like in your example. So thanks a lot.

Regarding the input array: I misunderstood your interpretation of the sequence. It's OK.

0 Jiří Babka over 14 years ago in reply to Jason Matalka

Intellectual 465 points

Hello,

I have had the same problem. The hwafft_1024pts routine gives good results in some places but wrong results in others. I found that the hwafft_1024pts routine somewhere works with the AR0 - AR5 registers as if they were only 16 bits wide. So it helps to initialize these registers before calling the hwafft_1024pts routine. The buffer addresses are passed in registers AR0 and AR1 when calling the routine. I initialize registers AR2 - AR5 to the value 0x010000 because my buffers are at addresses 01xxxxh:

    asm(" AMAR *(#010000h),XAR2");
    asm(" AMAR *(#010000h),XAR3");
    asm(" AMAR *(#010000h),XAR4");
    asm(" AMAR *(#010000h),XAR5");
    out_sel = hwafft_1024pts(data_br_buf, scratch_buf, FFT_FLAG, SCALE_FLAG);

I don't know if it is necessary to initialize all of these registers. Maybe someone from TI could clarify this.
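
The effect suspected here can be modelled on a host: if a routine writes only the low 16 bits of an extended address register, the upper 7 bits keep whatever was there before. The sketch below is only an illustration of that hypothesis in plain C; the function name and mask are made up for this example, and nothing here is confirmed behaviour of the ROM routine.

```c
#include <stdint.h>

/* Upper 7 bits of a 23-bit extended address register (XARn). */
#define XAR_HIGH_MASK 0x7F0000u

/* Model of writing only the 16-bit ARn half of XARn:
 * the previous high bits survive, the low 16 bits are replaced. */
uint32_t write_low_half(uint32_t xar_before, uint16_t low)
{
    return (xar_before & XAR_HIGH_MASK) | (uint32_t)low;
}
```

For a buffer at 0x012345, write_low_half(0x000000, 0x2345) yields 0x002345 (the wrong page), while write_low_half(0x010000, 0x2345) yields 0x012345, which is what pre-loading the registers with 0x010000 would achieve under this model.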

Best regards

Jiri Babka

0 David Whitehouse over 14 years ago in reply to Jiří Babka

Prodigy 190 points

Jiri,

Have you had any confirmation from TI on this? I'm also trying to get the HWA running. The hwafft_br is behaving fine (I've got the DSPLib cfft and hwafft_br functions giving me the same result), but I'm getting different behaviour from hwafft_256pts between compiling optimised and unoptimised.

Looking at the disassembly before the call:

Unoptimised

0135D9          C$L19:
0135D9 a40c                     MOV *SP(#06h),T0
0135DB a50e                     MOV *SP(#07h),T1
0135DD ed089f                   MOV dbl(*SP(#04h)),XAR1
0135E0 ec318e008000             AMAR *(#08000h),XAR0
0135E6 6cff75de                 CALL #0xff75de
0135EA c410                     MOV T0,*SP(#08h)
0135EC 060093                   B C$L24

Optimised

013864 ece18e_2365              AMAR *AR7,XAR0 || MOV T2,T1
013869 eca19e_23e4              AMAR *AR5,XAR1 || MOV AR6,T0
01386E 6cff75de                 CALL #0xff75de
013872 06004e                   B C$L9

Stepping into the ROM function, T0, T1, XAR0 and XAR1 are loaded consistently between both builds, but in the unoptimised code the function does something (it modifies both the *data and *scratch buffers), whereas in the optimised code it doesn't touch the *data buffer. It does look like something additional needs initialisation, but the assembly listing posted with "FFT Implementation on the TMS320VC5505, TMS320C5505, and TMS320C5515 DSPs" looks fair enough.

Can anyone at TI throw any light on this?

Dave

0 Jason51114 over 14 years ago in reply to Jiří Babka

Prodigy 70 points

I pointed out the same issue in the following thread last July:

http://e2e.ti.com/support/dsp/tms320c5000_power-efficient_dsps/f/109/p/54489/193396.aspx

Not sure whether they actually updated the document as they said they would. The input/output and scratch buffers must be on the same 64KB data page in order for the HWAFFT routines to return correct results.
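
A check for this same-page requirement could look like the sketch below. It is written in portable C with addresses and lengths passed as plain integers, and it assumes that the page is identified by the address bits above bit 15 (as in the 01xxxxh addresses mentioned earlier in the thread). The function names and PAGE_SHIFT are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the page number is the address shifted right by 16 bits. */
#define PAGE_SHIFT 16u

/* True when the buffer [start, start + length) does not cross a page boundary. */
bool buffer_within_one_page(uint32_t start, uint32_t length)
{
    if (length == 0u) {
        return true;
    }
    return (start >> PAGE_SHIFT) == ((start + length - 1u) >> PAGE_SHIFT);
}

/* True when both buffers lie entirely inside the same page. */
bool buffers_share_page(uint32_t a, uint32_t a_len, uint32_t b, uint32_t b_len)
{
    return buffer_within_one_page(a, a_len)
        && buffer_within_one_page(b, b_len)
        && (a >> PAGE_SHIFT) == (b >> PAGE_SHIFT);
}
```

Calling buffers_share_page() with the data and scratch buffer addresses before the first HWAFFT call would flag a linker placement that splits the buffers across pages.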

Jason

0 David Whitehouse over 14 years ago in reply to Jason51114

Prodigy 190 points

Jason,

Nice one! The only documentation I've got is SPRABB6A–June 2010, and I can't see any mention of that requirement.

Many thanks,

Dave
