
general FIR help

Other Parts Discussed in Thread: CCSTUDIO, TMS320C6414, TMS320C6416T

I have a buffer size of 32000 and a sample rate of 32 kHz, and I'm implementing an FIR filter with 721 coefficients. I'm having trouble with the implementation: I'm confused about what to do with the extra samples produced during convolution. I tried to keep the buffer from being overwritten with these extra samples by doing some bit shifts, but no luck. Could someone give me some advice or sample code? Thanks.

  • Hi,

    Maybe you can periodically store the sample values in external memory, if available, so that you can access them whenever required, avoid losing the coefficients, and keep them from getting overwritten.

    Regards,

    Sid

  • Josrocket,

    You can find various filtering functions in DSPLIB, which is TI's optimized library for digital signal processing.

    I think you're talking about the extra output samples calculated for the first N samples. These are calculated while all of the taps are being filled.

    If you find any information, please share it with us.

    Burhan. 

  • My question is exactly that... I was wondering how other people deal with this problem. How would I go about storing it and putting it back together?

  • Are you using DSPLIB? If so, which version and which FIR function - and what are the parameters you use (length, not coefficient values)?

    Which processor? Which board? Which versions of CCS, BIOS, Code Gen?

    I thought that DSPLIB comes with example projects. Are those missing from your installation?

  • An FIR filter requires both a data buffer (e.g. your 32000 samples) plus a history buffer (num_taps-1).  Our implementation combines them into a single buffer 'x' of size nx+nh-1.  For example in your case that would be 32000+721-1 = 32720.  So the first 720 samples correspond to what would normally be the history buffer.  Initially this would be initialized to zero, but in the "steady state" you would copy the last 720 samples of the previous buffer into this space.  So your first actual data sample would be placed immediately after those 720 history samples.

    So why do we do it this way?  In the traditional method you would need to maintain a circular history buffer.  This would require a single copy for every sample being processed.  In this case we are using the data buffer itself as the history buffer and you simply need to copy the last num_taps-1 samples.  Specifically for your case the traditional method would require 32000 copies to preserve the history buffer, but the TI-method requires only 720 copies.
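
    Here is a minimal sketch of that layout in plain C (an illustration only, assuming 16-bit samples and one 32000-sample block per call; the names are placeholders):

    #define NX 32000                 /* new samples per block              */
    #define NH 721                   /* number of filter taps              */

    short x[NX + NH - 1];            /* (NH-1) history samples + NX data   */
    short h[NH];                     /* filter coefficients, Q15           */
    short r[NX];                     /* filter output                      */

    void filter_block(const short *new_samples)
    {
        int i, j;

        /* Place the new block right after the NH-1 history samples.      */
        for (i = 0; i < NX; i++)
            x[(NH - 1) + i] = new_samples[i];

        /* Plain C FIR: one output per new sample.                         */
        for (j = 0; j < NX; j++) {
            long sum = 0;            /* wide accumulator, rescaled below   */
            for (i = 0; i < NH; i++)
                sum += x[i + j] * h[i];
            r[j] = (short)(sum >> 15);
        }

        /* Copy the last NH-1 samples to the front of x so they become
           the history for the next block (the "720 copies").              */
        for (i = 0; i < NH - 1; i++)
            x[i] = x[NX + i];
    }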

  • Brad Griffis said:

    An FIR filter requires both a data buffer (e.g. your 32000 samples) plus a history buffer (num_taps-1).  Our implementation combines them into a single buffer 'x' of size nx+nh-1. [...] Specifically for your case the traditional method would require 32000 copies to preserve the history buffer, but the TI-method requires only 720 copies.

    Thanks.

  • By the way, I meant 821 coefficients, not 721. Anyway, would this work:

    void DSP_fir_gen(etc.....)
    {
        int16 i = 0;
        int16 j = 0;
        int16 sum = 0;
        int16 count = 0;

        for (j = 0; j < nfout; j++)
        {
            sum = 0;
            for (i = 0; i < ncoe; i++)
            {
                sum += inbuf[i + j] * filtercoe[i];
                if (j <= 821)
                {
                    sum = 0;
                }
            }
            if (j > 821)
            {
                count++;
                filterout[count] = sum;
            }
        }
    } // end of dsp func

    void dataplaybackfunc(etc.......)
    {
        int ncoe = 821;
        int nfout = 32000 + ncoe - 1;
        float filterout[32000];
        int16 i = 0;

        DSP_fir_gen(etc................);

        for (i = 0; i < 32000; i++)
        {
            outbuf[i] = filterout[i];
        }
    }

  • Hi Josrocket,

    Why do you rewrite "DSP_fir_gen"? Don't you use TI's DSP Library?

    And I couldn't understand why you copy the filter output to another buffer. They are the same.

    Burhan.

  • I don't really understand how the one in the DSP library is supposed to work for what I'm doing. I didn't understand what to do with all the extra samples after convolution. I'm trying to set it up so it only processes one buffer and then the program stops; for what I'm doing, that's what I need.

  • Josrocket said:

    I don't really understand how the one in the DSP library is supposed to work for what I'm doing. I didn't understand what to do with all the extra samples after convolution.

    "DSP_fir_gen" function takes (nout+ ncoef − 1) inputs and ncoef filter coefficients.  Finally, it returns nout output samples. These output samples are what you need. Function takes (ncoef - 1) extra input values which Brad Griffis calls history samples. Function needs these extra samples because in order to obtain nth output sample FIR filter needs (n-1)th, (n-2)th,..., (n-N)th input samples. So there is no extra samples after convolution. For initial conditions, you can set first (ncoef - 1) extra inputs to zero.

    Burhan.

  • So then how would I modify the example code? Because it doesn't work, and I don't see how it can.

  • for (j = 0; j < nr; j++)
    {
        sum = 0;
        for (i = 0; i < nh; i++)
        {
            sum += x[i + j] * h[i];
        }
        r[j] = sum >> 15;
    }

     

    This does not work. I don't understand why, since you guys say it's right.

  • Josrocket said:

    This does not work. I don't understand why, since you guys say it's right.

    Because I am using it. How did you determine that it doesn't work?

    Josrocket said:

    for (j = 0; j < nr; j++)
    {
        sum = 0;
        for (i = 0; i < nh; i++)
        {
            sum += x[i + j] * h[i];
        }
        r[j] = sum >> 15;
    }

    This is pseudo code, but it is logically sound, so I would expect it to work.

    But you should call the optimized function. I assume you include "dsp64pluslibe.lib" in your project - check that. And be sure the endianness is OK.

    Maybe you should try to write your own code.

    Burhan.


  • burhan türkel said:

    This is pseudo code, but it is logically sound, so I would expect it to work. [...] But you should call the optimized function. I assume you include "dsp64pluslibe.lib" in your project - check that. And be sure the endianness is OK.

    Maybe you should try to write your own code.

    Not much to go off of to write my own code... TI has sh***y documentation. I understand the logic, but it doesn't tell you any more than that. I was unable to include that lib file - how do I get it to include?

  • Some code examples would help me out a ton; once I have this, I should be fine with the rest. TI products are new to me and I've been stuck here for 2 weeks... I'm getting nowhere. I'm implementing my own lie detection algorithm, porting it from MATLAB to the DSP.

  • The first time you move to any new processor, it can be frustrating and difficult. Or at least that is often the case when moving to TI DSPs and TI tools like CCStudio.

    We may disagree on the colorful language, but some have said that TI documentation has all the answers as long as you already know what the answer is and know exactly where to look for it.

    For us to help you find those answers, we need your help narrowing the search, so please answer the following:

    Which DSP chip are you using?
    Which board are you developing on or are you using the simulator for now?
    Which versions of CCS, BIOS, Code Generation Tools are you using?

    From your comments in this thread, I can tell you are not using DSPLIB yet, so the answers to the questions above will help me point you in the right direction.

    There are a lot of reasons why moving from pseudo code to running DSP code might fail. The most likely reasons are data type (16-bit vs. 32-bit) and data format (integer vs. Q15 vs. Q31). So another pair of questions:

    What data type and data format are you using?
    Where does the data come from and go to?
    If the ADC/DAC devices (assumption) do not use the same data type and data format as your program, how do you adjust for that? For example, with a 12-bit ADC feeding a 16-bit data word, how do you place those 12 bits into the 16 bits?

  • I think it's worth noting that the C code given in the documentation is not just pseudo code.  It's real tested C code and is precisely what this function calculates.  Additional restrictions will apply to the assembly-written version because it does special loop unrolling tricks, etc., which impose additional constraints such as nh >= 5 and nr being a multiple of 4.

    One thing that I see does not appear to be documented (well) is the fact that this code is assuming a Q.15 data type.  In other words, it is dealing with 16-bit fixed point numbers where you have 1 bit to the left of the implied decimal and 15 bits to the right of the implied decimal.  Side note:  I think you could make this any Q value you want by modifying the assembly code.  I see 8 lines of code (4 in the kernel and 4 in the epilog) that do a shift-right 15 (SHR 15).  If for example you wanted Q.10 you would just need to make it SHR 10.  I haven't tested it, but I'm pretty sure that would work correctly.

    A question was raised earlier about passing a Dirac delta function through. If the number 0x0001 was passed, that's almost zero in Q.15 format. To pass a Dirac delta as a Q.15 number you want to put in 0x7FFF, which is nearly one and would give you (almost) your FIR filter taps as the output!
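
    To make the Q.15 point concrete, here is a small generic sketch (not from DSPLIB) of how data gets into that format:

    /* Q.15: value = n / 32768, so the usable range is [-1.0, +1.0).        */
    short float_to_q15(float v)
    {
        long q = (long)(v * 32768.0f);
        if (q >  32767) q =  32767;          /* saturate near +1.0          */
        if (q < -32768) q = -32768;          /* saturate at  -1.0           */
        return (short)q;
    }

    /* A Q.15 x Q.15 multiply gives a Q.30 product in 32 bits; summing the
       products and shifting right by 15 returns the result to Q.15, which
       is exactly the  r[j] = sum >> 15  in the FIR reference code.         */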

  • Good catch on the Dirac pulse value!

    The value -1.0 Q15 = 0x8000 could also be used to return a more exact coefficient list that just needs to be negated to get the right value.

  • RandyP said:

    Which DSP chip are you using?
    Which board are you developing on or are you using the simulator for now?
    Which versions of CCS, BIOS, Code Generation Tools are you using?

    I'm using a TMS320C6414 with CCS 3.1.

    I started with the sample project dsk_app. It is just simple audio input and output. I modified the buffer size to 32k and the sample rate to 32k. I am now trying to take the outbuf variable and channel it through an FIR filter before the buffer is played back. I created a DSP_fir_gen() and I call it in copyData().

  • I did figure out that it is getting stuck inside the DSP_fir_gen() function. I commented out the filter function call and let the program run (it worked), but when I uncomment the filter call - still leaving outbuf = inbuf rather than outbuf = filteroutbuf - it doesn't work, which means it is getting stuck in DSP_fir_gen(). Hope this helps.

  • RandyP said:
    Which board are you developing on or are you using the simulator for now?

    ?

    Josrocket said:
    I created a DSP_fir_gen() and I call it in the copyData()

    These are two things that are confusing: DSP_fir_gen() is your own function but named like a DSPLIB function, and copyData() does a lot more than just copy data. Am I right about both of these interpretations?

    Josrocket said:
    i started with the sample project dsk_app   It just simple audio input and output....I modified the buffer size to 32k and the sr to 32k .... I am now trying to take the outbuf variable and channel it through an fir filter before the buffer is played back.  I created a DSP_fir_gen() and I call it in the copyData()

    Is the modified buffer size 32k samples or 32k bytes? I assume samples, but assumptions do not help debug.
    Do you get different results if you cut the buffer size down?
    What was the buffer size in the original dsk_app?

    Josrocket said:
    I did figure out that its getting stuck inside the DSP_fir_gen() function.

    Good debug work to determine that it is getting stuck in this function that you wrote. If I understand correctly, this is a pure C function, no outside function calls, just multiplies and adds and memory reads/writes and for-loops. And it is getting stuck, not giving bad results or outputting all zeroes or such.

    What does "getting stuck" mean?

    Your code postings above do not give all the detail, and the detail is where bugs live. I did notice that you use a float array for the filter output, but the samples seem to be int16. That does not make sense; you probably do not want to use float here.

    I assume you are using the Debug Configuration for this build. Since your code is not optimized by structure or compiler settings, the filter loops probably take a lot of clock cycles per filter tap. It could be optimized down to a very low cycles-per-tap number, but what you have might be taking as much as 10-40 cycles per tap if it is fixed point, or hundreds of cycles if it is really float. 32000 samples * 821 taps * 100 cycles * 1 ns = 2.6 seconds on hardware. This would take a lot longer on the simulator.

  • RandyP said:

    Which board are you developing on or are you using the simulator for now?

    Is the modified buffer size 32k samples or 32k bytes? [...] What was the buffer size in the original dsk_app?

    What does "getting stuck" mean?

    I did notice that you use a float array for the filter output, but the samples seem to be int16.

    I'm using the Spectrum Digital TMS320C6416T DSK with CCS 3.1.

    I fixed the data type issue, but this still didn't give any results.

    The original buffer size was 1000 samples (I think) with a sample rate of 8k. It also works with the configuration of a 32k buffer and a 32k sample rate; it just has a short delay before it plays back audio. As you know, this is my current setting. I don't think this setting should make any difference.

    When the function gets stuck I'm not sure why, but when I compile with all the original settings that just feed audio in and then back out, it works as long as the FIR function call is commented out. If it is not commented out, it won't work even if I'm not using the filterout buffer as my audio output buffer - basically it doesn't work even if I bypass the filter by simply not using the filter output buffer.

    I started my experimentation by simply modifying copyData() with outbuf[i] = inbuf[i] * n, just to see if I could get some amplification. It worked, so I proceeded to route inbuf[i] into the FIR and then set outbuf[i] = filteroutbuf[i], but like I said above it gets stuck. Now I'm thinking that maybe I need to modify inbuf and outbuf after the copyData() function is called, within the processBuffer() function. What do you guys think?


  • FYI, if you select or highlight any portion of the previous posting which is shown above your reply box, and then click Quote, it will only quote the part that was highlighted. This helps to keep the thread shorter while also accenting the specific part that you reference in your reply.

    RandyP said:

    Good debug work to determine that it is getting stuck in this function that you wrote. If I understand correctly, this is a pure C function, no outside function calls, just multiplies and adds and memory reads/writes and for-loops. And it is getting stuck, not giving bad results or outputting all zeroes or such.

    Please confirm my understanding of your fir code.

    RandyP said:

    What does "getting stuck" mean?

    For example, if you halt the program in CCS is it executing in a loop? If so, can you watch the local variables and see that they change if you run and halt again? Or is the program "lost in the weeds"?

    Josrocket said:
    maybe i need to modify the inbuf and outbuf after the copydata() function is called within the processBuffer() function...what do you guys think?

    I think you need to figure out why your fir filter is getting stuck. Making changes outside a function that fails will usually not fix the function, unless the arguments are wrong.

  • RandyP said:
    Please confirm my understanding of your fir code.

    RandyP said:
    I think you need to figure out why your fir filter is getting stuck. Making changes outside a function that fails will usually not fix the function, unless the arguments are wrong.

    I'm sure the code is logically sound and that it doesn't get stuck in some infinite loop, so I think the problem is that I'm calling the filter in the wrong place in the code, which seems to mess up playback. I'm not sure what to do at this point. I'm sure of this because it is a pure C function, like you said, and whether or not the code for the filter is correct, it should make it through the function and do a normal unfiltered playback, since I didn't do anything with the filteroutbuf. Basically it looks like it's failing because I'm calling a function in the middle of copyData() and it doesn't like it.

     

  • With what little I know of your application code from this thread, I have given you almost all the advice I can give. When you answer my questions, some asked twice, I will be able to offer more advice, but until then this will be my last reply.

    You have a lot of confidence and have demonstrated debug skills, so I am sure you will succeed on the path you have chosen.

    My only remaining advice:

    1. Comment out the fir function and insert a dummy function with the same arguments that just returns. This may confirm "calling a function in the middle of copyData() and it doesn't like it".
    2. Add a simple for-loop in the dummy function that simply copies the first *quarter* of the input buffer to the first *quarter* of the output buffer. This adds a bit of meat to the function (see the small sketch after this list).
    3. Increase the count to *half*.
    4. Increase the count to the full buffer size. Note the difference, if any.
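
    For steps 1 and 2, the dummy function could be as small as this (placeholder names; adjust it to match the real copyData() arguments):

    /* Stand-in for the FIR call: same shape of arguments, no filtering.    */
    void dummy_fir(const short *inbuf, short *outbuf, int n)
    {
        int i;

        for (i = 0; i < n / 4; i++)          /* first quarter for step 2;   */
            outbuf[i] = inbuf[i];            /* use n/2, then n, for 3-4    */
    }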

    Best of luck,
    RandyP

  • It turns out that the reason it's not working is that it's taking too long to process the data before the data needs to be output. Is there a way to manually control the buffer reads and writes, so that my code dictates when to sample data, then let me process the input buffer with my algorithm and output it when my code dictates?

  • Under the title "general FIR help", you really do not want to start and stop your I/O streams. The math gets tricky (the next history buffer != the newest prior data), the output goes discontinuous, and you are no longer doing an FIR filter. I cannot imagine that you would be happy with the output you would get from this plan.

    The "DSP" thing to do is to optimize your application so it can run in the time required. The best chance is to use the DSPLIB functions that were referenced by burhan türkel at the beginning of this thread. These are highly optimized and just require you to work within their structure of buffers to get them to work. You can find examples with the DSPLIB library installation, and probably by searching the E2E Forum and TI Wiki pages.

    For a useful learning experience, you can try optimizing your own code to see what improvements you can make.

    1. With the current implementation, try changing the number of coefficients being used to determine how many can be used before you run out of time.
    2. Switch to the Release Configuration and try again to find how many coefficients can be used before running out of time.
    3. If that is not good enough, look at optimization techniques like "restrict" and "#pragma MUST_ITERATE" and reading up to 8 bytes at a time using the _amemd8 intrinsic (a sketch of the first two follows this list).
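
    To illustrate the first two of those hints (a sketch only - the trip counts are hard-coded from the numbers in this thread, and _amemd8 is left out to keep it short):

    /* restrict tells the compiler the arrays do not overlap, and
       MUST_ITERATE gives it the loop trip counts, both of which let it
       software-pipeline the loops far more aggressively.                   */
    void fir_q15(const short *restrict x,    /* nr + nh - 1 inputs          */
                 const short *restrict h,    /* nh Q15 coefficients         */
                 short       *restrict r,    /* nr Q15 outputs              */
                 int nh, int nr)
    {
        int i, j;

        #pragma MUST_ITERATE(32000, 32000, 4)    /* nr is a multiple of 4   */
        for (j = 0; j < nr; j++) {
            long sum = 0;

            #pragma MUST_ITERATE(821, 821)       /* nh = 821 taps           */
            for (i = 0; i < nh; i++)
                sum += x[i + j] * h[i];

            r[j] = (short)(sum >> 15);
        }
    }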

    If taking snapshots to process and outputting bursts is really what you want to do, then you will need to learn everything about the sample application in order to change the system architecture from streaming to bursting. It could be as easy as hijacking an interrupt service routine to not start the next cycle, or it could be as involved as rewriting the EDMA PARAM for single-buffer accesses and completely restart before each new buffer. As you come up with specific questions on specific features, please show us snippets of real code with the needed detail and hopefully someone will know the base app well enough or can understand what you are trying to do and help you.

  • Unfortunately this will not work for me. Everything I need it to do is necessary. The algorithm I'm implementing has been optimized to a very high degree; it still takes my MATLAB algorithm about 30 seconds to process only 0.25 seconds of voice audio recorded at 8k samples per second. I need to be able to dictate when the buffer reads and writes happen, because real-time operation seems out of the question with the current algorithm.

  • Normally we use double-buffered I/O in real-time DSP systems.  The EDMA and the CPU each "own" a buffer at any given time.  The EDMA fills one while the CPU processes the other.  Once the EDMA finishes filling a buffer, then the EDMA and CPU swap buffer ownership.  This way the data rate is dictated by the EDMA (and McBSP).  When the CPU receives a buffer it just processes the whole thing.

    Maybe I misunderstand what you're doing or asking, but it seems to me like you need the double-buffered I/O to make your system work properly.  You'll also need to watch out for cache coherence issues if your shared buffers are in external memory.
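
    As a rough illustration of that ownership swap (generic code, not taken from dsk_app; the EDMA setup itself is omitted and the names here are placeholders):

    #define NX 32000

    short bufA[NX], bufB[NX];              /* the two ping-pong buffers        */
    volatile int buffer_ready = 0;         /* set by the EDMA completion ISR   */
    volatile int edma_fills_A = 1;         /* which buffer the EDMA owns now   */

    void process_block(short *buf);        /* your FIR / algorithm             */

    void edma_complete_isr(void)           /* hooked to the EDMA completion    */
    {
        edma_fills_A = !edma_fills_A;      /* EDMA moves to the other buffer   */
        buffer_ready = 1;                  /* hand the full buffer to the CPU  */
    }

    void main_loop(void)
    {
        for (;;) {
            while (!buffer_ready)
                ;                          /* wait for the EDMA hand-off       */
            buffer_ready = 0;

            /* The CPU owns whichever buffer the EDMA is NOT filling now,
               and must finish with it before the EDMA wraps around.           */
            process_block(edma_fills_A ? bufB : bufA);
        }
    }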

  • In our training classes, like one I will take part in next week, we often have time to talk details about projects you are working on. It helps when you can hear my voice inflection as I try to point out what is a good idea, a bad idea, or just a question or suggestion. I do not like to use upper-case because it has too much of a negative connotation, and I do not know you well enough to have the right to make a judgment of your plan by any criteria other than what has been stated here. I will do my best to help you even when it means repeating myself, but please remember that I am on your side here. I do want you to succeed.

    Josrocket said:
    real-time operation seems out of the question with the current algorithm

    Correction: this should say "with the current implementation of the current algorithm". If you used the C6472 6-core DSP, you could probably get real-time performance without making much of a change to your code. And if you work on some other steps, the performance of your implementation may surprise you.

    You have done the hard part, which is to come up with the algorithm to be implemented. Trust me, that is harder than anything we are talking about in terms of implementing that algorithm. And you are in luck that you are using a very fast and very powerful DSP in the C6414T. We have made some additional improvements in some of our newer DSPs, and have placed multiple cores in a single package, but the C6414T is a very capable processor. And, again, you have done the hard part which is the science behind your algorithm and especially the science behind coming up with the 821 coefficients you need to use.

    Josrocket said:
    I need to be able to dictate when the buffer reads and writes happen, because real-time operation seems out of the question

    If real-time operation is not vital or even important to your application, then that may be the direction you end up going. After my suggestion for that path, please still read on to see if you cannot have everything.

    Most likely, the dsk_app (basis for this application, right?) captures a certain amount of data and then throws an interrupt to indicate that a buffer of data is ready. Then it continues capturing data into a second buffer to allow you to do processing in the first buffer - the well-named ping-pong buffering scheme. The easiest thing you can do in your application would be to check your algorithm's status when an interrupt comes in, and if the algorithm has not completed just return from the interrupt and ignore that buffer of data. I do not know how you will handle output audio data under these circumstances, but that problem has to be solved when you drop buffers (this plan) or when you start/stop capture - there will always be some time when something is missing and you either put out 0's or old data.
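
    In code, that check can be as small as this (placeholder names - hook it into whatever interrupt dsk_app already uses to signal a full buffer):

    volatile int algorithm_busy = 0;       /* set while your processing runs   */

    void buffer_ready_isr(void)
    {
        if (algorithm_busy)
            return;                        /* not done yet: drop this buffer   */

        algorithm_busy = 1;                /* claim the new buffer and signal  */
        /* ... set a flag / post a semaphore so the main loop runs the
           algorithm on this buffer, and clear algorithm_busy when done ...    */
    }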

    That is my recommendation for dictating when the buffer reads and writes.

    Josrocket said:
    The algorithm I'm implementing has been optimized to a very high degree.

    RandyP 9/21 2:58 PM CDT USA said:
    The "DSP" thing to do is to optimize your application so it can run in the time required. The best chance is to use the DSPLIB functions...

    What you are referring to here is your algorithm, but what I was referring to was your implementation on the C6414T. The pseudo code that you showed in your 9/16 3:16 PM CDT USA post was definitely not highly optimized code. Since there is no mention of how you have optimized your compile process (compiler switches, Debug vs. Release, etc.), I have to assume you have done none.

    But I do have to assume that you understand the requirement for real-time operation: that you are able to do your math processing in the time it takes to read in another buffer of data. So please find the C64x (not C64x+) DSPLIB Programmer's Reference and look up the number of cycles it takes to run the DSP_fir_gen, DSP_fir_r4, and DSP_fir_r8 functions. Plug in your numbers for nh = number of taps (coefficients) and nr = number of samples, and multiply the cycle count you calculate by 1 ns for running at 1.0 GHz on the C6414T. Compare that number to 1 second (32000 samples at a 32 kHz sample rate) and you will see what the best performance could be. What number do you get for the cycle count? I get a little more than 6.5 ms.
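
    As a back-of-the-envelope check of that number (the arithmetic below assumes the optimized kernels sustain roughly 4 multiply-accumulates per cycle, which is the right order of magnitude for the C64x FIR routines):

    /* Back-of-envelope timing for the optimized FIR at ~4 MACs per cycle.  */
    float fir_time_ms(long nh, long nr)
    {
        long cycles = (nh * nr) / 4;       /* 821 * 32000 / 4 = 6,568,000   */
        return cycles * 1.0e-6f;           /* 1 ns/cycle at 1.0 GHz -> ms   */
    }
    /* fir_time_ms(821, 32000) is about 6.6 ms, against the 1000 ms it
       takes to collect 32000 samples at 32 kHz.                            */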

    Let me better state some of the other comments I made earlier:

    RandyP 9/21 2:58 PM CDT USA said:
    With the current implementation, try changing the number of coefficients being used to determine how many can be used before you run out of time.

    Change this to "changing the number of coefficients or samples". The idea is to find out how far from real-time you are. If you only have to lower the number of samples from 32000 to 31500, then it will not take much optimization to get you running at real-time.

    RandyP 9/21 2:58 PM CDT USA said:
    Switch to the Release Configuration and try again to find how many coefficients can be used before running out of time.

    Most likely, changing from Debug to Release will improve your speed by 4x or more. Depending on how far you are from real-time, that could be enough and then you can stop.

    RandyP 9/21 2:58 PM CDT USA said:
    If that is not good enough, look at optimization techniques like "restrict" and "#pragma MUST_ITERATE" and reading up to 8 bytes at-a-time using the _amemd8 intrinsic.

    This is basically trying to get your code to run as well as the DSPLIB functions run. So why not just use the DSPLIB functions?

    I am sure you will succeed no matter which way you go about it. Be sure to take advantage of parallel data movement by the EDMA while the DSP is processing data. And like Brad says, pay attention to cache coherency issues since they always come up to the surface in high performance applications.

    Regards,
    RandyP

  • Brad Griffis said:

    Normally we use double-buffered I/O in real-time DSP systems.  The EDMA and the CPU each "own" a buffer at any given time. [...]

    The code I'm using is this double-buffer system, but the problem is that my algorithm takes a very long time to process. Realistically, all I need it to do is load a wav file that is about 0.25-2 seconds long, process it, then output the results. That's why I think the easiest thing to do is just use one very large buffer.

  • RandyP said:
    If real-time operation is not vital or even important to your application, then that may be the direction you end up going. After my suggestion for that path, please still read on to see if you cannot have everything.

    You are exactly right. Like I said, I'm porting my optimized MATLAB algorithm for voice stress analysis. I've got your Stellaris touch screen with the Cortex-M3, which will be the interface. It will load a wav file that is about 0.25-2.00 seconds long, with a sample rate of 8k samples per second, to the DSP, run the algorithm, and output the results (a sinusoidal waveform) back to the Stellaris interface. To make it easy, I just want to have a very large buffer that can hold the whole wav file, process it, and then output the results. Real-time operation will come later, once I get it working.

    RandyP said:
    Most likely, changing from Debug to Release will improve your speed by 4x or more. Depending on how far you are from real-time, that could be enough and then you can stop.

    What do you mean by this...what is the difference?

  • RandyP said:
    Most likely, changing from Debug to Release will improve your speed by 4x or more. Depending on how far you are from real-time, that could be enough and then you can stop.

    Josrocket said:
    What do you mean by this...what is the difference?

    [Thanks for prompting me. I only read the email version of your post and did not see this question. For my benefit to keep the forum working, did you edit your post to add the question above? If not, then the font corruption that our editor introduces affected the email I received and I would like to report that as an E2E bug, internally.]

    There should be a CCS User's Guide somewhere to help you with this, but I could not find it. It is probably integrated into the CCS 3.1 Help menu. You will get a lot more information there, and worded better, than what I can tell you here.

    When you build a project in CCS, you have first selected a Build Configuration. The default configurations are named Debug and Release, and these can be selected from a drop-down box (there are usually 3 ways to do most everything, but this is the easiest).

    The first one alphabetically is the one that CCS chooses when you open a project, so that is Debug. It is a good default, since we recommend you build your program in Debug while debugging it for basic functionality, ignoring processing-time requirements.

    Then you switch to the Release configuration using the drop-down box and selecting Release. Some of the updates to the Debug configuration may have to be repeated for the Release configuration, like include paths.

    The primary difference between the two configurations is that Debug has the -g compiler switch and Release has the -o3 compiler switch. There are other switches you can use to change the behavior of the compiler and there are ways to write your code to improve the optimization that the compiler can do. Refer back a few postings for hints of what to look for in the Optimizing Compiler User's Guide.
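
    For reference, the same distinction on the command line looks roughly like this (normally you set these in the project's Build Options dialog rather than typing them; cl6x is the code generation tools compiler):

    cl6x -g  file.c     (Debug:   symbolic debug information, little optimization)
    cl6x -o3 file.c     (Release: level 3 optimization)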