DSP TMS320F28335 needs far too much time

david12870

Prodigy 40 points

The TMS320F28335 DSP needs far too much time to calculate mathematical algorithms.

Pin 31 is configured as an output pin and an oscilloscope is connected to it.

The oscilloscope shows that a multiplication of two float variables takes 1 µs. Normally it should take at most 50 ns.

// Example

float a = 8.2, b = 8.2, x;

while (1)
{
    GpioDataRegs.GPACLEAR.bit.GPIO31 = 1;  // drive GPIO31 low
    x = a * b;
    GpioDataRegs.GPASET.bit.GPIO31 = 1;    // drive GPIO31 high
}

Does anybody know why this cannot meet the expected performance?

over 16 years ago

0 Tim Love over 16 years ago

TI__Expert 5340 points

David,

What memory are you running this code from? Internal RAM has zero wait states, but the external bus and the internal Flash have wait states associated with them. If you have not configured the wait states, they will be at their default, slowest setting.

Regards,

Tim Love

0 Frank Bormann over 16 years ago in reply to Tim Love

Mastermind 7585 points

I think more important than the number of wait states is the correct compiler model and the library version in your project.

1. When you use floating-point support (Build Options - Compiler - Advanced - Floating Point Support - fpu32) and the library rts2800_fpu32.lib, the compiler generates the following for the line x=a*b:

MOV32 R0H,*-SP[2]
MOV32 R1H,*-SP[4]
MPYF32 R0H,R1H,R0H
NOP
MOV32 *-SP[2],R0H

which uses 4 floating-point unit instructions and takes 6 cycles (approx. 40 ns @ 150 MHz).

2. If you set your Build Options to "Floating Point Support: none" and use the library "rts2800_ml.lib", you get the following for the line x=a*b:

MOVL ACC,*-SP[6]
MOVL *-SP[2],ACC
MOVL ACC,*-SP[4]
LCR #FS$$MPY
MOVL *-SP[8],ACC

which uses a library function call (FS$$MPY) to multiply the two numbers. This takes some dozens of additional cycles.

So please make sure that your code does actually use the floating point hardware unit.
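One way to check this from inside the project, rather than from the build dialog, is to test the compiler's predefined macro. The sketch below assumes the TI C28x compiler defines __TMS320C28XX_FPU32__ when fpu32 support is switched on (please check the compiler guide for your Code Generation Tools version); the function name fpu32_enabled is only an illustration and not part of any TI library.

```c
/* Hypothetical helper, not a TI API.
 * Assumption: the C28x compiler predefines __TMS320C28XX_FPU32__
 * when the project is built with floating point support = fpu32. */
static int fpu32_enabled(void)
{
#if defined(__TMS320C28XX_FPU32__)
    return 1;   /* float math is compiled to FPU instructions such as MPYF32 */
#else
    return 0;   /* float math falls back to library calls such as FS$$MPY */
#endif
}
```

Calling fpu32_enabled() once at start-up and inspecting the result in a watch window would show which of the two cases above the project was really built for. A stricter variant could replace the #else branch with an #error directive so that a build without FPU support fails immediately.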

Regards.

PS: I've used CCS3.3.82.12 and Code Generation Tools 5.2.1 for the test.

0 david12870 over 16 years ago

Prodigy 40 points

Hello!

Thanks for your suggestions.

I use CCS 3.3.82.13 and Code Generation Tools 5.2.0.

The wait states are configured.

I have the correct compiler model and library version.

I use rts2800_fpu32.lib, but I cannot find the file in the specified folder.

Regards

0 Lori Heustess over 16 years ago in reply to david12870

TI__Guru* 92200 points

David,

What type of memory are you running from? And where is the data/stack? RAM, Flash, XINTF? Can you show us the disassembly?

Regards

Lori

0 Ricardo Picatoste Ruilope over 16 years ago

Prodigy 215 points

Hi David,

Just a suggestion: have you tried measuring the time in other ways? Is all of that time really due to the multiplication alone?

For example, what is the difference if you add the same lines but without the multiplication? I mean doing the following, so that you see one long and one short low level on GPIO31 (will the difference be 1 µs?):

while (1)
{
    GpioDataRegs.GPACLEAR.bit.GPIO31 = 1;  // low pulse with the multiplication
    x = a * b;
    GpioDataRegs.GPASET.bit.GPIO31 = 1;

    GpioDataRegs.GPACLEAR.bit.GPIO31 = 1;  // low pulse without the multiplication
    GpioDataRegs.GPASET.bit.GPIO31 = 1;
}

You can also use the profiler to see how many cycles each line consumes.

Let us know :-)

Regards,

Ricardo.

0 Frank Bormann over 16 years ago in reply to Ricardo Picatoste Ruilope

Mastermind 7585 points

I am sorry if my previous reply was not clear enough. Use FPU support plus the FPU library and you get about 60 ns (6 cycles at 100 MHz). A project without FPU support and with a non-FPU library will give a runtime in the range of microseconds.

Of course we assume that the device runs at maximum speed (correct PLL-setup).

Regards, Frank.

PS: Ricardo, a measurement setup like the one you described in your reply will not work, because the GPIOs have a maximum toggle speed of 20 MHz. So if you use a SET, CLEAR, SET sequence at maximum CPU speed, you will miss the CLEAR instruction at the I/O pins.

0 david12870 over 16 years ago in reply to Lori Heustess

Prodigy 40 points

Lori,

Memory: On-Chip Memory 256K x 16 Flash.

The data/stack is in the Flash.

The disassembly file is attached.

@ Frank

The project is built with FPU support and an FPU library.

@ Ricardo

I've tried it, but there is no change in the result.

The "Clock View" shows me that an assignment to a float variable needs 48 cycles (example: a = 8.1;). That's impossible.

Regards

David

disassembly-test.pdf

0 Frank Bormann over 16 years ago in reply to david12870

Mastermind 7585 points

David,

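(1) Why did you place data and stack into FLASH?? It will never work! Flash cannot be written with normal store instructions at run time, so variables and the stack must be placed in RAM.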

(2) Here's my performance test program for a 28335 floating point multiply:

#include "DSP2833x_Device.h"
extern void InitSysCtrl(void);

int main(void)
{
    float a=8.2,b=8.2,x;
    InitSysCtrl();                       // Basic Core Initialization
    EALLOW;
    GpioCtrlRegs.GPADIR.bit.GPIO0=1;     // GPIO0 (ePWM1A) = output
    EDIS;
    GpioDataRegs.GPASET.bit.GPIO0 = 1;
    x = a*b;
    GpioDataRegs.GPACLEAR.bit.GPIO0 = 1;

    while(1);                            // stop here
}

A scope measurement (see attachment) shows that the DSP spends 80 nanoseconds between the SET and CLEAR instructions for GPIO0. My 28335 runs at 100 MHz.

Result: As stated in my previous reply the 28335 consumes 6 cycles (or 60 ns @ 100 MHz) for a floating point multiply instruction.
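For anyone who wants to check the numbers in this thread, here is a small sketch of the conversion between cycles and time. The helper names cycles_to_ns and ns_to_cycles are made up for illustration, and the code is plain C arithmetic that can be run on a PC; it does not touch any device registers.

```c
#include <stdint.h>

/* Hypothetical helper: CPU cycles -> nanoseconds (rounded to nearest),
 * with the system clock given in MHz. */
static uint32_t cycles_to_ns(uint32_t cycles, uint32_t sysclk_mhz)
{
    return (cycles * 1000u + sysclk_mhz / 2u) / sysclk_mhz;
}

/* Hypothetical helper: scope-measured time in nanoseconds -> CPU cycles
 * (rounded to nearest), with the system clock given in MHz. */
static uint32_t ns_to_cycles(uint32_t ns, uint32_t sysclk_mhz)
{
    return (ns * sysclk_mhz + 500u) / 1000u;
}
```

With these, cycles_to_ns(6, 150) gives 40 and cycles_to_ns(6, 100) gives 60, which matches the figures quoted above. ns_to_cycles(80, 100) gives 8, so the 80 ns on the scope is the 6-cycle multiply plus 2 cycles that presumably belong to the GPIO write itself. The 1 µs from the original question corresponds to ns_to_cycles(1000, 150) = 150 cycles, if the device really runs at 150 MHz.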

I hope this will settle this topic.

Regards

0 david12870 over 16 years ago in reply to Frank Bormann

Prodigy 40 points

Thank you for your answers.

I have solved the problem.

Frank, you are right. Thank you very much.

Regards

David
