Hi everyone.
I'm using a TMS320C6455 DSP @ 1 GHz and CCS V3.3.
I need to calculate exponentials (e.g. exp() in MATLAB) and powers (e.g. x^2) many times,
but these calculations take too long.
*(*ppDiffusivity_scanline_increase + index) = (exp(-pow(abs(*(*ppGrad_Scanline_2 + index))/SCANLINE_THRESHOLD,2)));
This block takes 324169075 cycles (about 320 ms)... :(
Is there a faster way to calculate exponentials, powers and logs?
Thanks!!
I have attached my code, which uses the 'exp', 'pow' and 'log' functions.
// Grad_NEWS computation
// Scanline difference: each sample minus the sample at the same depth in the next scanline.
// Depth difference: each sample minus the next sample along depth in the same scanline.
// -07.09-
void Difference_2(short **ppROI_IMAGE_2, short **ppGrad_Scanline_2, short **ppGrad_Depth_2,
                  int ROI_SCANLINE_SIZE, int ROI_DEPTH_SIZE,
                  int FILTER_SCANLINE_SIZE, int FILTER_DEPTH_SIZE)
{
    int scanline_index = 0;
    int depth_index = 0;
    int scanline_address_L = 0;
    int scanline_address_R = 0;
    int scanline_address_OUT = 0;
    int depth_address_U = 0;
    int depth_address_D = 0;
    int depth_address_OUT = ROI_DEPTH_SIZE-1;
    int ROI_SCANLINE_MARGIN = 1;
    int ROI_DEPTH_MARGIN = 0;

    ROI_DEPTH_MARGIN = ROI_DEPTH_SIZE + FILTER_DEPTH_SIZE;

    /* Double-pointer example
    short y_index = 0;
    int a = 0;
    **ppROI_IMAGE_2 = 1212;
    for(a=0; a<5;a++)
    {
        y_index = *(*ppROI_IMAGE_2+a);
    }
    x_index = 1;
    */

    // Gradient along the scanline direction: 3-1 (W)
    //   2
    // 3 1 4
    //   5
    for (scanline_index = 0; scanline_index < ROI_SCANLINE_SIZE+1; scanline_index++)
    {
        for (depth_index = 0; depth_index < ROI_DEPTH_SIZE; depth_index++)
        {
            scanline_address_L = ROI_SCANLINE_MARGIN + depth_index + scanline_index * (ROI_DEPTH_SIZE + FILTER_DEPTH_SIZE);
            scanline_address_R = ROI_SCANLINE_MARGIN + depth_index + (scanline_index+1) * (ROI_DEPTH_SIZE + FILTER_DEPTH_SIZE);
            scanline_address_OUT = depth_index + scanline_index * ROI_DEPTH_SIZE;
            *(*ppGrad_Scanline_2 + scanline_address_OUT) = *(*ppROI_IMAGE_2 + scanline_address_L) - *(*ppROI_IMAGE_2 + scanline_address_R);
        }
    }

    // Gradient along the depth direction: 2-1 (N)
    //   2
    // 3 1 4
    //   5
    for (scanline_index = 0; scanline_index < ROI_SCANLINE_SIZE; scanline_index++)
    {
        for (depth_index = 0; depth_index < ROI_DEPTH_SIZE+1; depth_index++)
        {
            depth_address_U = ROI_DEPTH_MARGIN + depth_index + scanline_index * (ROI_DEPTH_SIZE + FILTER_DEPTH_SIZE);
            depth_address_D = ROI_DEPTH_MARGIN + depth_index+1 + scanline_index * (ROI_DEPTH_SIZE + FILTER_DEPTH_SIZE);
            depth_address_OUT = depth_index + scanline_index * (ROI_DEPTH_SIZE+1);
            *(*ppGrad_Depth_2 + depth_address_OUT) = *(*ppROI_IMAGE_2 + depth_address_U) - *(*ppROI_IMAGE_2 + depth_address_D);
        }
    }
}
The fundamental problem you face is that you are performing floating-point calculations on a processor which does not have native floating-point instructions. Perhaps it is time to use a different TI DSP. The rest of this post presumes you cannot do that.
Some options to consider: use the float versions of those operations (expf, powf or powif, logf) instead of the standard double versions. Or you could try the C62x/C64x Fast RTS Library.
Thanks and regards,
-George
In addition to what George said about using floating-point emulation, in general the best way to compute a *small* nonnegative integral power of a floating-point quantity in C is:
inline float pow_small_n(float x, unsigned n) {float xn = 1.0; for (; n != 0; --n) xn *= x; return xn;}
To compute a *general* (not necessarily small) nonnegative integral power of a floating-point quantity:
inline float pow_any_n(float x, unsigned n) {float xn = 1.0, xi = x; for (; n != 0; xi *= x, n >>= 1) if (n & 1) xn *= xi; return xn;}
Of course you can adapt these to non-float types. For negative powers, compute the corresponding positive power using one of these schemes and take its reciprocal.
I assumed that the computation will not overflow. You could add overflow checking if you think you might need it.
The C optimizer should do a good job of generating fast code for particular instances of these. For example, for small constant n it should completely unroll the loop.
Hello,
... No trace of exp, pow, nor log in C file ...
A few more suggestions:
Always prefer multiplies to divides (create some variable temp = 1.0 / SCANLINE_THRESHOLD).
You can also build a lower-precision (and faster) implementation of expf,...
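One possible lower-precision exponential, as a sketch only: it uses the approximation exp(x) ≈ (1 + x/n)^n with n = 1024, evaluated with ten squarings. The name `exp_approx` is made up for this example. The relative error is roughly x²/2048 (about 0.05% at x = -1, growing for larger |x|), so it is only suitable for moderate arguments such as the `exp(-t*t)` in the question. Whether it actually beats the library `expf` on the C6455 would have to be measured.

```c
/* Approximates exp(x) as (1 + x/1024)^1024.
   Hypothetical helper; accuracy degrades as |x| grows. */
float exp_approx(float x)
{
    float y = 1.0f + x * (1.0f / 1024.0f);
    int i;

    /* For x < -1024 the base goes negative; the true value is ~0 there. */
    if (y < 0.0f)
        return 0.0f;

    /* Ten squarings raise y to the power 2^10 = 1024. */
    for (i = 0; i < 10; i++)
        y *= y;

    return y;
}
```

The cost is one multiply-add plus ten multiplies, with no table and no library call.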
In the expression mentioned, if the main argument is 16-bit, another way is to precompute a look-up table (LUT) of the function for all permitted argument values;
the calculation is then very fast (one table lookup per point). If the other parameters are fairly fixed, the LUT can be computed at init time.
Jakez
PS: I permit myself to correct Douglas's typo:
inline float pow_any_n(float x, unsigned n) {float xn = 1.0, xi = x; for (; n != 0; xi *= xi, n >>= 1) if (n & 1) xn *= xi; return xn;}
Thanks for the correction. I actually had that while typing up the message, but when I reviewed it before posting, for some reason it looked wrong, so I changed it :-( Of course, it is always a good idea to test any such code with a handful of simple known cases before using the code in your application.
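A handful of simple known cases might look like the sketch below. It repeats the corrected `pow_any_n` (declared `static` here so it links on its own) and adds two helpers that are not from the thread: `pow_int`, a hypothetical wrapper for negative exponents, and `pow_self_test`. All test values are exactly representable in float, so exact comparison is safe.

```c
/* Square-and-multiply, as corrected in the thread (xi *= xi). */
static float pow_any_n(float x, unsigned n)
{
    float xn = 1.0f, xi = x;
    for (; n != 0; xi *= xi, n >>= 1)
        if (n & 1) xn *= xi;
    return xn;
}

/* Hypothetical wrapper: handles negative exponents via the reciprocal. */
static float pow_int(float x, int n)
{
    if (n < 0)
        return 1.0f / pow_any_n(x, 0u - (unsigned)n);  /* safe even for INT_MIN */
    return pow_any_n(x, (unsigned)n);
}

/* Returns 1 if all known cases pass, 0 otherwise. */
int pow_self_test(void)
{
    return pow_any_n(2.0f, 0)  == 1.0f
        && pow_any_n(2.0f, 1)  == 2.0f
        && pow_any_n(2.0f, 10) == 1024.0f
        && pow_any_n(3.0f, 5)  == 243.0f
        && pow_any_n(-2.0f, 3) == -8.0f
        && pow_int(2.0f, -3)   == 0.125f;
}
```

With the earlier `xi *= x` variant, the 2^10 case returns 64 instead of 1024, so even this small set catches the typo.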
The LUT approach is a good idea, which I have used many times. The idea is to build the table only the first time the function is called, and always return the corresponding table value. A similar approach is to cache already-computed cases; look-up is slower (usually we hash the inputs to make the search faster), but it works when a complete table would be too large.