DSP TMS320F28335 needs far too much time

Other Parts Discussed in Thread: TMS320F28335

DSP TMS320F28335 needs far too much time for calculating mathematical algorithms.

GPIO31 is configured as an output pin, and an oscilloscope is connected to it.

The oscilloscope shows that a multiplication of two float variables takes 1 µs. Normally it should take 50 ns at most.

// Example

float a = 8.2, b = 8.2, x;

while (1)
{
    GpioDataRegs.GPACLEAR.bit.GPIO31 = 1;   // drive GPIO31 low: start of measurement
    x = a * b;
    GpioDataRegs.GPASET.bit.GPIO31 = 1;     // drive GPIO31 high: end of measurement
}

Does anybody know why this does not meet the expected performance?

  • David,

    What memory are you running this code from? Internal RAM has zero wait states, but the external bus and the internal flash have wait states associated with them. If you have not configured the wait states, they will be at their default (slowest) setting.

    Regards,

    Tim Love

  • I think more important than the number of wait states is the correct compiler model and the library version in your project.

    1. When you use floating-point support (Build Options - Compiler - Advanced - Floating Point Support - fpu32) and the library rts2800_fpu32.lib, the compiler generates for the line x = a * b:

         MOV32   R0H,*-SP[2]
         MOV32   R1H,*-SP[4]
         MPYF32  R0H,R1H,R0H
         NOP
         MOV32   *-SP[2],R0H

       which uses 4 floating-point unit instructions and takes 6 cycles (approx. 40 ns @ 150 MHz).

     

    2. If you set your Build Options to "Floating Point Support: none" and use the library rts2800_ml.lib, you get for the line x = a * b:

         MOVL    ACC,*-SP[6]
         MOVL    *-SP[2],ACC
         MOVL    ACC,*-SP[4]
         LCR     #FS$$MPY
         MOVL    *-SP[8],ACC

       which uses a library function call (FS$$MPY) to multiply the two numbers. This takes some dozens of additional cycles.

    So please make sure that your code does actually use the floating point hardware unit.
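
    The cycle counts above convert to time as cycles * 1000 / f_MHz nanoseconds. The following is a minimal host-side sketch of that arithmetic, not target code; the function names cycles_to_ns and ns_to_cycles are made up for illustration, and the 150 MHz clock is taken from the figure quoted above.

    ```c
    #include <assert.h>

    /* Convert a CPU cycle count to nanoseconds for a given CPU clock in MHz.
       One cycle lasts 1000 / f_MHz ns, so ns = cycles * 1000 / f_MHz.
       (Hypothetical helper, for illustration only.) */
    double cycles_to_ns(unsigned long cycles, double sysclk_mhz)
    {
        assert(sysclk_mhz > 0.0);
        return (double)cycles * 1000.0 / sysclk_mhz;
    }

    /* Inverse direction: how many CPU cycles fit into a measured time in ns. */
    double ns_to_cycles(double ns, double sysclk_mhz)
    {
        assert(sysclk_mhz > 0.0);
        return ns * sysclk_mhz / 1000.0;
    }
    ```

    With these helpers, cycles_to_ns(6, 150.0) gives the 40 ns quoted above, while ns_to_cycles(1000.0, 150.0) shows that a 1 µs pulse would correspond to 150 cycles if the device really runs at 150 MHz, which is far more than a hardware multiply should need.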

    Regards.

    PS: I've used CCS 3.3.82.12 and Code Generation Tools 5.2.1 for the test.

  • Hello!

    Thanks for your suggestions.

    I use CCS 3.3.82.13 and Code Generation Tools 5.2.0.

    The wait states are configured.

    I have the correct compiler model and library version.

    I use rts2800_fpu32.lib, but I cannot find the file in the specified folder.

     

    Regards

     

  • David,

    What type of memory are you running from? And where are the data and stack: RAM, flash, or XINTF? Can you show us the disassembly?

    Regards

    Lori

  • Hi David,

    Just a suggestion: have you tried measuring the time in other ways? Is all that time really due to the multiplication alone?

    For example, add the same lines again but without the multiplication and compare the two. You should see one long and one short low level on GPIO31; is the difference 1 µs?

    while (1)
    {
        GpioDataRegs.GPACLEAR.bit.GPIO31 = 1;   // low pulse with the multiplication
        x = a * b;
        GpioDataRegs.GPASET.bit.GPIO31 = 1;

        GpioDataRegs.GPACLEAR.bit.GPIO31 = 1;   // low pulse without the multiplication
        GpioDataRegs.GPASET.bit.GPIO31 = 1;
    }

     

    You can also use the profiler to see how many cycles each line consumes.

    Let us know :-)

     

    Regards,

    Ricardo.

  • I am sorry if my previous reply was not clear enough. Use FPU support plus the FPU library and you get 60 ns. A project without FPU support and with a non-FPU library will give a run time in the range of microseconds.

    Of course we assume that the device runs at maximum speed (correct PLL-setup).

    Regards, Frank.

    PS: Ricardo, a measurement setup like the one you described in your reply will not work, because the GPIOs have a maximum toggle speed of 20 MHz. So if you use a SET, CLEAR, SET sequence at maximum CPU speed, you will miss the CLEAR instruction at the I/O pins.

  • Lori,

    Memory: on-chip flash, 256K x 16.

    The data and the stack are in flash.

    The disassembly file is attached.

     

    @ Frank

    The project is built with FPU support and an FPU library.

     

    @ Ricardo

    I've tried it, but the result does not change.

    The "Clock View" shows me that assigning a value to a float variable (for example, a = 8.1;) takes 48 cycles. That's impossible.

     

    Regards

    David

    disassembly-test.pdf
  • David,

    (1) Why did you place data and stack in flash? That will never work: flash cannot be written like RAM at run time, so variables and the stack must be placed in RAM.

    (2) Here's my performance test program for a 28335 floating point multiply:

    #include "DSP2833x_Device.h"
    extern void InitSysCtrl(void);

    int main(void)
    {
        float a = 8.2, b = 8.2, x;
        InitSysCtrl();                          // basic core initialization
        EALLOW;
        GpioCtrlRegs.GPADIR.bit.GPIO0 = 1;      // GPIO0 (ePWM1A) = output
        EDIS;
        GpioDataRegs.GPASET.bit.GPIO0 = 1;      // rising edge: start of measurement
        x = a * b;
        GpioDataRegs.GPACLEAR.bit.GPIO0 = 1;    // falling edge: end of measurement

        while (1);                              // stop here
    }

    A scope measurement (see attachment) shows that the DSP spends 80 nanoseconds between the SET and CLEAR instructions for GPIO0. My 28335 runs at 100 MHz.

    Result: as stated in my previous reply, the 28335 needs 6 cycles (60 ns @ 100 MHz) for a floating-point multiply.
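
    The relation between the measured pulse and the cycle count can be checked with a few lines of arithmetic. This is only a host-side sketch: the helper names pulse_cycles and overhead_cycles are hypothetical, and attributing the leftover cycles to the GPIO write itself is an assumption, not something the measurement proves.

    ```c
    #include <assert.h>

    /* Number of whole CPU cycles covered by a measured pulse:
       cycles = t[ns] * f[MHz] / 1000 (integer division truncates). */
    long pulse_cycles(long pulse_ns, long sysclk_mhz)
    {
        assert(pulse_ns >= 0 && sysclk_mhz > 0);
        return (pulse_ns * sysclk_mhz) / 1000;
    }

    /* Cycles left in the pulse after subtracting the operation under test.
       Assumption: the remainder is measurement overhead such as the GPIO write. */
    long overhead_cycles(long pulse_ns, long sysclk_mhz, long op_cycles)
    {
        return pulse_cycles(pulse_ns, sysclk_mhz) - op_cycles;
    }
    ```

    For the numbers above, pulse_cycles(80, 100) is 8 cycles, and overhead_cycles(80, 100, 6) leaves 2 cycles (20 ns) that do not belong to the multiply sequence.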

    I hope this will settle this topic.

    Regards

     

     

  • Thank you for your answers.

    I have solved the problem.

    Frank, you are right. Thank you very much.

     

    Regards

    David