Tool/software: Code Composer Studio
Hi,
I am developing some DSP audio effects with the help of the mcasp starterware. One issue I am facing is noise within my system. I think this is mainly due to inefficient code execution (please correct me if there is another significant factor that could cause this).
/*
** Transmit buffers. If any new buffer is to be added, define it here and
** update the NUM_BUF.
*/
signed char txBuf0[AUDIO_BUF_SIZE];
signed char txBuf1[AUDIO_BUF_SIZE];

/* Array of transmit buffer pointers */
signed int txBufPtr[NUM_BUF] =
{
    (signed int) txBuf0,
    (signed int) txBuf1
};
In streamlining my code I want to figure out how to point to and assign values to my transmission buffers. Currently I am checking the lastSentTxBuf through an 'if' statement and assigning values to the arrays txBuf0 and txBuf1 within.
for (all values in buffer)
{
    if (lastSentTxBuf == 0)
    {
        txBuf0[i] = Processed data;
    }
    else
    {
        txBuf1[i] = Processed data;
    }
}
Is there a way to avoid this, potentially by using the txBufPtr array directly? I have tried multiple times to solve this but nothing seems to work. Please advise me on any solutions to this problem, along with ways to increase code efficiency and to get the most processing power out of the C6748.
Thanks,
Calum
Hello!
It's hard to tell whether the noise is related to data processing efficiency, but if you are asking whether there is room to improve your loop, the answer is more certain. One big thing on C6000 is loop pipelining. Long story short, pipelining does not like branching (read: ifs and other conditionals) inside the loop. At a quick look, you could rewrite your code as
if (lastSentTxBuf == 0)
{
    for (all values in buffer)
        txBuf0[i] = Processed data;
}
else
{
    for (all values in buffer)
        txBuf1[i] = Processed data;
}
This way the loop body has no conditional and can be pipelined more efficiently.
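On the question of using the pointer array directly, here is a minimal sketch of one way it could look. It assumes the array is changed to hold real pointers instead of (signed int) casts, and the names fill_tx_buffer and processed, as well as the small AUDIO_BUF_SIZE value, are made up for illustration. The buffer is selected once before the loop, so the loop itself contains no branch.

```c
#include <stddef.h>

#define AUDIO_BUF_SIZE 8   /* hypothetical size, for illustration only */
#define NUM_BUF        2

signed char txBuf0[AUDIO_BUF_SIZE];
signed char txBuf1[AUDIO_BUF_SIZE];

/* Assumption: store real pointers rather than (signed int) casts,
 * so the array can be indexed and dereferenced without casting back. */
signed char *const txBufPtr[NUM_BUF] = { txBuf0, txBuf1 };

/* Hypothetical helper: copy one buffer of processed data into the
 * transmit buffer selected by bufIndex (0 or 1). */
void fill_tx_buffer(unsigned int bufIndex, const signed char *restrict processed)
{
    /* Pick the destination once, outside the loop. "restrict" tells the
     * compiler that dst and processed do not overlap. */
    signed char *restrict dst = txBufPtr[bufIndex];
    size_t i;

    for (i = 0; i < AUDIO_BUF_SIZE; i++) {
        dst[i] = processed[i];   /* no conditional inside the loop */
    }
}
```

Calling fill_tx_buffer(lastSentTxBuf, processed) would write to the same buffer as the original if/else. If the array has to stay as signed int, casting once before the loop, e.g. (signed char *) txBufPtr[lastSentTxBuf], should give the same effect.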
However, there is a lot of speculation here. I suggest you find a document like "Optimizing loops on C6000 DSP" and learn to read the compiler's assembly output.
Be warned: that is as addictive as it is useful. Once you start you'll never quit, and it's worth the effort :-)
Hi,
The data is coming through the mcasp and being transferred to the DDR2 memory by the DMA (I have tried moving it to L2 memory but have not seen a noticeable improvement).
Each sample is two little-endian 8-bit signed char values, followed by two 8-bit zero-padding values (e.g. 1101 0010, 0000 0100, 0000 0000, 0000 0000 = 1234 in decimal).
I am transferring this to a short type circular array, disregarding the two zero padded char. (Through the use of the DSP)
I apply my processing to a single short 'processing' variable using the values in the circular array. (Through the use of the DSP)
Then I convert the 'processing' variable to its two constituent char values and place them in my output array, in the same format as the input array (16-bit), then add two zero-padding char values. (Through the use of the DSP)
Then output my data from DDR2 (have tried doing so from L2 as well but with minimal improvement) to the mcbsp, through the DMA.
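The unpack and repack steps above can be sketched roughly as follows. This is only an illustration of the frame layout described (low byte, high byte, two zero bytes); unpack_sample, pack_sample and BYTES_PER_FRAME are hypothetical names, not starterware functions. One detail worth checking in the real code: if the low byte is read as a signed char, it gets sign-extended when combined with the high byte, which corrupts samples and can sound like noise. The sketch reads the bytes as unsigned to avoid that.

```c
#include <stddef.h>
#include <stdint.h>

#define BYTES_PER_FRAME 4  /* 2 sample bytes (low, high) + 2 zero padding bytes */

/* Hypothetical helper: rebuild a signed 16-bit sample from the first two
 * bytes of a frame. Bytes are treated as unsigned so the low byte is not
 * sign-extended into the high byte. */
int16_t unpack_sample(const unsigned char *frame)
{
    uint16_t raw = (uint16_t)(frame[0] | ((unsigned int)frame[1] << 8));
    /* Conversion to int16_t relies on the usual two's complement wrap. */
    return (int16_t)raw;
}

/* Hypothetical helper: write a signed 16-bit sample back as one frame,
 * low byte first, followed by two zero padding bytes. */
void pack_sample(int16_t sample, unsigned char *frame)
{
    uint16_t raw = (uint16_t)sample;
    frame[0] = (unsigned char)(raw & 0xFFu);
    frame[1] = (unsigned char)(raw >> 8);
    frame[2] = 0;
    frame[3] = 0;
}
```

With the example bytes from above, unpack_sample on { 0xD2, 0x04, 0x00, 0x00 } gives 1234, and pack_sample(1234, ...) writes the same four bytes back.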
Thanks
Hello!
First of all, please don't take my considerations as immediate recipes, but rather as directions to review.
Point one: can you avoid transferring dummy data? If so, that would change the game a lot. Assuming you can do nothing about that and have to skip the fillers and unpack the data manually, there is still room for improvement.
I think EDMA can extract the data stream and skip the dummies. However, you still need to expand 8-bit values to 16-bit ones, and as they are signed, EDMA can't do that; you have to use the processor.
It's not certain that doing that job on the processor would degrade performance. If your loop is well pipelined, data access and unpacking might be just one stage of the pipeline. You could also try SIMD intrinsics to speed up packed data processing. However, here we step into the land of speculation.
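As a rough illustration of keeping the unpacking cheap, here is a sketch in plain portable C, not using any C6000 intrinsics. It assumes the receive buffer can be declared as 32-bit words, one word per frame, on a little-endian device, so that the sample sits in the low 16 bits of each word. The name unpack_frames is hypothetical. Each iteration is one load, one mask and one store, with no branch in the loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: extract one signed 16-bit sample per 32-bit frame.
 * Assumes little-endian byte order, so bytes 0..1 of the frame are the
 * low 16 bits of the word and the padding is the high 16 bits. */
void unpack_frames(const uint32_t *restrict frames,
                   int16_t *restrict samples,
                   size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        /* Keep the low half-word and drop the zero padding. */
        samples[i] = (int16_t)(uint16_t)(frames[i] & 0xFFFFu);
    }
}
```

Whether the compiler actually pipelines this well on your target is something only the generated assembly will tell you.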
Do you use optimization? Is it at the o3 level? Would you mind showing an excerpt of the processing loop?
Calum,
I am not from the TI-RTOS team, but I am jumping into the discussion in case I can help.
There is a lot of good information in the C6000 Embedded Design Workshop, particularly about using the EDMA3 to service the peripherals and chain to a ping/pong buffer scheme. There are examples of this also in the TRM, but the workshop includes labs and solutions with the pictures. And there are videos for some of the modules that might be useful, especially EDMA3. It is tricky to find on TI.com; "training c6748" does not find it but "workshop c6748" does (no quotes).
You should always use the EDMA3 to move data buffers. It is much better at it than the DSP. And you should always avoid moving data whenever possible. The workshop will help you visualize that.
Starterware is simple code to help with simple tasks, and it is not actively supported by TI - only offered as-is. You may find some value in switching to the Processor SDK, which is fully supported and portable across all recent TI processors.
What board are you using, and what ADC/DAC?
Regards,
RandyP