Hi all,
I am programming with dsplibC67x.lib on C6713 DSK. As
everyone knows, there are lots of bugs and special requirements in
dsplibC67x v2.0. For example, the function DSPF_sp_mat_mul(float *x, int r1,
int c1, float *y, int c2, float *r) performs matrix multiplication,
but it requires r1, c1, c2 > 1. Unfortunately, I need to compute an
(Mx1)x(1xN) product, where c1 = 1. How can I handle this case? (I know
that not using the function at all is one solution.)
I wonder whether there is an available substitute for this DSPLIB. If I
rebuild dsplibC674x for C671x platforms, will it work correctly on the C6713? Are
there still any restrictions (DSPF_sp_mat_mul(), for example)?
Thank you in advance.
Regards,
Li Bo
Hello Li Bo, I'm sorry to hear about the trouble you're having.
Please note that DSPLIB is provided in source form to allow for customization. The kernels are also meant to serve as examples of how the different functions can be implemented. I encourage you to try implementing the kernel you need by referring to the DSPLIB source. We would be glad to help you with this.
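As a hedged starting point, here is a plain natural-C matrix multiply that accepts any dimension >= 1, including c1 = 1. The name mat_mul_any is hypothetical, and the sketch assumes the same argument order and row-major layout as DSPF_sp_mat_mul (x is r1 x c1, y is c1 x c2, r is r1 x c2); it is not TI's implementation and is not hand-optimized.

```c
/* Hypothetical natural-C fallback (not part of DSPLIB).
 * Assumed row-major layout, mirroring DSPF_sp_mat_mul's arguments:
 *   x is r1 x c1, y is c1 x c2, r is r1 x c2.
 * Unlike the optimized kernel, any dimension >= 1 is accepted,
 * so an (Mx1)x(1xN) outer product is simply r1 = M, c1 = 1, c2 = N. */
void mat_mul_any(const float *x, int r1, int c1,
                 const float *y, int c2, float *r)
{
    int i, j, k;
    for (i = 0; i < r1; i++) {
        for (j = 0; j < c2; j++) {
            float sum = 0.0f;
            /* Dot product of row i of x with column j of y. */
            for (k = 0; k < c1; k++) {
                sum += x[i * c1 + k] * y[k * c2 + j];
            }
            r[i * c2 + j] = sum;
        }
    }
}
```

For the (3x1)x(1x2) case, calling mat_mul_any(x, 3, 1, y, 2, r) fills r with the six products x[i] * y[j], row by row.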
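For example, you can implement the (Mx1)x(1xN) product, which is an outer product giving an MxN result, as a series of vector-by-scalar multiplications, one per row of the result:
/* y[i] = m * x1[i] for i = 0 .. n-1 (vector times scalar) */
void vector_mpy_scalar(const float *x1, const float m, float *restrict y, const int n) {
    int i;
    for (i = 0; i < n; i++) {
        y[i] = m * x1[i];
    }
}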
This should perform well as long as M or N is reasonably large.
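The outer product can be built on top of vector_mpy_scalar as sketched below: row i of the MxN result is the 1xN vector scaled by the i-th element of the Mx1 vector. The wrapper name outer_product and the row-major output layout are assumptions for illustration, not a DSPLIB API.

```c
/* Vector-times-scalar helper from the reply: y[i] = m * x1[i]. */
static void vector_mpy_scalar(const float *x1, const float m,
                              float *restrict y, const int n)
{
    int i;
    for (i = 0; i < n; i++) {
        y[i] = m * x1[i];
    }
}

/* Hypothetical wrapper: r = a * b, where a is M x 1 (column vector),
 * b is 1 x N (row vector) and r is M x N, stored row-major.
 * Each of the M rows of r is the vector b scaled by a[i]. */
void outer_product(const float *a, int M, const float *b, int N,
                   float *restrict r)
{
    int i;
    for (i = 0; i < M; i++) {
        vector_mpy_scalar(b, a[i], &r[i * N], N);
    }
}
```

Here the inner loop runs over N, so this arrangement benefits most when N is the larger dimension; if M is much larger, computing the transposed (NxM) result keeps the long loop innermost.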
Regards,
Gagan
Gagan,
thank you for your reply.
I have written a C version just like the one you showed. What I was really asking for was assembly-optimized code. I will try to look into this optimization when I have enough time.
However, I would like to point out that TI should provide more basic & advanced signal processing routines.
Regards,
Bo LI
Hello Li Bo, understood, and I agree.
For this kernel, if you compile with the -o3 option, the compiler generates a schedule of 1 clock cycle per iteration. That is already good, and I don't think you can do better than that on the C6713.
Note, however, that hand-coded assembly doesn't necessarily mean better or faster code. The C6x architecture is well suited to C code development and to extracting performance from C itself. We have invested heavily in our compiler technology to enable easy and efficient high-level programming for the C6x DSP. There are very few cases where going to assembly (serial assembly at that) gives an advantage, and even then the advantage is usually small.
If you do plan to work on optimizing a kernel, I recommend focusing on C optimization. If for some reason you feel that the C compiler is not doing a good job and that performance could be improved by writing assembly, please write to us; we will most likely be able to suggest how to get the best performance while staying in C.
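To illustrate what staying in C can look like, here is a minimal sketch of the same vector-by-scalar kernel with two common hints for the TI C6000 compiler: restrict on both pointers, and the MUST_ITERATE pragma stating a minimum trip count. The function name is hypothetical, the hint value is an assumption, and the exact pragma syntax should be checked against the compiler user's guide for your version. Compilers that do not recognize the pragma, such as GCC, ignore it.

```c
/* Hypothetical variant of vector_mpy_scalar with compiler hints.
 * - restrict on both pointers promises that x and y do not overlap,
 *   so the compiler may schedule loads and stores more freely.
 * - MUST_ITERATE is a TI C6000 pragma giving trip-count information;
 *   the value 1 (at least one iteration) is an assumed example. */
void vector_mpy_scalar_hint(const float *restrict x, float m,
                            float *restrict y, int n)
{
    int i;
#pragma MUST_ITERATE(1)
    for (i = 0; i < n; i++) {
        y[i] = m * x[i];
    }
}
```

Whether such hints change the schedule depends on the kernel; compiling with -o3 and inspecting the generated assembly is a reasonable way to check.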
Regards,
Gagan