Using the exp function takes too long

Yoon Chong Ook

Prodigy 10 points

Other Parts Discussed in Thread: TMS320C6455

Hi everyone,

I'm using a TMS320C6455 DSP at 1 GHz with CCS v3.3.

I need to calculate exponentials (like exp() in MATLAB) and powers (e.g. x^2) many times, but these calculations take too long.

*(*ppDiffusivity_scanline_increase + index) = (exp(-pow(abs(*(*ppGrad_Scanline_2 + index))/SCANLINE_THRESHOLD,2)));

This block takes 324,169,075 cycles (about 320 ms) :(

Is there a faster way to calculate exponentials, powers, and logarithms?

Thanks!

I have attached the code in which I use the 'exp', 'pow', and 'log' functions:

Difference_2_rev1.c

// Compute Grad_NEWS
// Notes (07.09): remarks on how the scanline difference is indexed and how the
// depth difference is formed; the rest of the original comment is unreadable
// because of an encoding error.

void Difference_2(short **ppROI_IMAGE_2, short **ppGrad_Scanline_2, short **ppGrad_Depth_2, int ROI_SCANLINE_SIZE, int ROI_DEPTH_SIZE, int FILTER_SCANLINE_SIZE, int FILTER_DEPTH_SIZE)
{

	int scanline_index		= 0;
	int depth_index			= 0;
	int scanline_address_L	= 0;
	int scanline_address_R	= 0;
	int scanline_address_OUT= 0;
	int depth_address_U		= 0;
	int depth_address_D		= 0;
	int depth_address_OUT	= ROI_DEPTH_SIZE-1;
	int ROI_SCANLINE_MARGIN	= 1;
	int ROI_DEPTH_MARGIN 	= 0;

	ROI_DEPTH_MARGIN = ROI_DEPTH_SIZE + FILTER_DEPTH_SIZE;
	
	
/*	Double-pointer test (disabled)
	short 	y_index	= 0;
	int a = 0;
	**ppROI_IMAGE_2 = 1212;
	for(a=0; a<5;a++)
	{
		y_index = *(*ppROI_IMAGE_2+a);
	}
	x_index = 1;
*/
	// Compute the gradient along the scanline direction, 3-1 (W)
	//		2
	//	3	1	4	
	//		5
	for (scanline_index = 0; scanline_index < ROI_SCANLINE_SIZE+1; scanline_index++)
	{
		for (depth_index = 0; depth_index < ROI_DEPTH_SIZE; depth_index++)
		{	
			scanline_address_L 	= ROI_SCANLINE_MARGIN + depth_index + scanline_index * (ROI_DEPTH_SIZE + FILTER_DEPTH_SIZE);
			scanline_address_R 	= ROI_SCANLINE_MARGIN + depth_index + (scanline_index+1) * (ROI_DEPTH_SIZE + FILTER_DEPTH_SIZE);
			scanline_address_OUT= depth_index + scanline_index * ROI_DEPTH_SIZE;

			*(*ppGrad_Scanline_2 + scanline_address_OUT) = *(*ppROI_IMAGE_2 + scanline_address_L ) - *(*ppROI_IMAGE_2 + scanline_address_R);
		}
	}

	// Compute the gradient along the depth direction, 2-1 (N)
	//		2
	//	3	1	4	
	//		5
	for (scanline_index = 0; scanline_index < ROI_SCANLINE_SIZE; scanline_index++)
	{
		for (depth_index = 0; depth_index < ROI_DEPTH_SIZE+1; depth_index++)
		{
			depth_address_U  = ROI_DEPTH_MARGIN + depth_index + scanline_index * (ROI_DEPTH_SIZE + FILTER_DEPTH_SIZE);
			depth_address_D  = ROI_DEPTH_MARGIN + depth_index+1 + scanline_index * (ROI_DEPTH_SIZE + FILTER_DEPTH_SIZE);
			depth_address_OUT= depth_index + scanline_index * (ROI_DEPTH_SIZE+1);

			*(*ppGrad_Depth_2 + depth_address_OUT) = *(*ppROI_IMAGE_2 + depth_address_U ) - *(*ppROI_IMAGE_2 + depth_address_D);
		}
	}

}

over 13 years ago

George Mock over 13 years ago

TI__Guru**** 243870 points

The fundamental problem you face is that you are performing floating-point calculations on a processor which does not have native floating-point instructions. Perhaps it is time to use a different TI DSP. The rest of this post presumes you cannot do that.

Some options to consider: use the float versions of those functions (expf, powf or powfi, logf) instead of the standard double versions, or try the C62x/C64x Fast RTS Library.

Thanks and regards,

-George

Douglas Gwyn over 13 years ago

Expert 2210 points

In addition to what George said about using floating-point emulation, in general the best way to compute a *small* nonnegative integral power of a floating-point quantity in C is:

inline float pow_small_n(float x, unsigned n) {float xn = 1.0; for (; n != 0; --n) xn *= x; return xn;}

To compute a *general* (not necessarily small) nonnegative integral power of a floating-point quantity:

inline float pow_any_n(float x, unsigned n) {float xn = 1.0, xi = x; for (; n != 0; xi *= x, n >>= 1) if (n & 1) xn *= xi; return xn;}

Of course you can adapt these to non-float types. For negative powers, compute the reciprocal using one of these schemes.

I assumed that the computation will not overflow. You could add overflow checking if you think you might need it.

The C optimizer should do a good job of generating fast code for particular instances of these. For example, for small constant n it should completely unroll the loop.

Jakez over 13 years ago

Expert 1950 points

Hello,

There is no trace of exp, pow, or log in the attached C file.

Some further suggestions:

Always prefer multiplies to divides (create a variable such as temp = 1.0 / SCANLINE_THRESHOLD and multiply by it).
You can also build a lower-precision (and faster) implementation of expf and the other functions.

In the expression mentioned, if the main argument is 16-bit, another way is to precompute a look-up table (LUT) of the function for all the permitted argument values;
the calculation is then very fast (one table lookup per point). If the other parameters are fairly fixed, the LUT can be computed at init time.

Jakez

PS: Allow me to correct a typo in Douglas's second function (the loop must square xi, i.e. xi *= xi rather than xi *= x):

inline float pow_any_n(float x, unsigned n) {float xn = 1.0, xi = x; for (; n != 0; xi *= xi, n >>= 1) if (n & 1) xn *= xi; return xn;}

Douglas Gwyn over 13 years ago in reply to Jakez

Expert 2210 points

Thanks for the correction. I actually had that while typing up the message, but when I reviewed it before posting, for some reason it looked wrong, so I changed it :-( Of course, it is always a good idea to test any such code with a handful of simple known cases before using the code in your application.
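
A handful of known cases of the kind mentioned above might look like this. The function is the corrected version from Jakez's post (declared static here rather than inline, to stay friendly to older compilers); the name test_pow_any_n is just for the example.

```c
#include <assert.h>

/* Corrected version from Jakez's post: xi is squared on every step. */
static float pow_any_n(float x, unsigned n)
{
    float xn = 1.0f, xi = x;
    for (; n != 0; xi *= xi, n >>= 1)
        if (n & 1)
            xn *= xi;
    return xn;
}

/* Simple known cases; all of these values are exactly representable in float. */
static void test_pow_any_n(void)
{
    assert(pow_any_n(2.0f, 0) == 1.0f);
    assert(pow_any_n(2.0f, 1) == 2.0f);
    assert(pow_any_n(2.0f, 10) == 1024.0f);
    assert(pow_any_n(3.0f, 5) == 243.0f);
    assert(pow_any_n(-2.0f, 3) == -8.0f);
    assert(pow_any_n(0.5f, 2) == 0.25f);
}
```

The variant with xi *= x would have failed the third case: for x = 2 and n = 10 it returns 64 instead of 1024.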

The LUT approach is a good idea, which I have used many times. The idea is to build the table only the first time the function is called, and always return the corresponding table value. A similar approach is to cache already-computed cases; look-up is slower (usually we hash the inputs to make the search faster), but it works when a complete table would be too large.
