TMS320F28388D: Dual MAC Required Build Options

Part Number: TMS320F28388D
Other Parts Discussed in Thread: C2000WARE

Hi team,

The page for my previous question, "TMS320F28377D: Dual MAC Required Build Options", seems to crash, so I am continuing here.

The compiler test case I submitted is as follows: https://tidrive.itg.ti.com/a/TFG8Yt5RvEdeUbxD/84bd9166-9fd1-4611-8b65-ff67e71701c1?l

compiler options:  "C:/ti/ccs1040/ccs/tools/compiler/ti-cgt-c2000_20.2.5.LTS/bin/cl2000" -v28 -ml -mt --cla_support=cla2 --float_support=fpu64 --idiv_support=idiv0 --tmu_support=tmu0 --vcu_support=vcrc -O2 --opt_for_speed=5 --fp_mode=relaxed --define=DEBUG --define=CPU1 --preproc_with_comment --preproc_with_compile --diag_suppress=10063 --diag_warning=225 --diag_wrap=off --display_error_number --gen_func_subsections=on --abi=eabi -z -m"empty_driverlib_project.map" --heap_size=0x200 --stack_size=0x100 --warn_sections -i"C:/ti/ccs1040/ccs/tools/compiler/ti-cgt-c2000_20.2.5.LTS/lib" -i"C:/ti/ccs1040/ccs/tools/compiler/ti-cgt-c2000_20.2.5.LTS/include" --reread_libs --define=RAM --diag_wrap=off --display_error_number --xml_link_info="empty_driverlib_project_linkInfo.xml" --entry_point=code_start --rom_model -o "empty_driverlib_project.out" "./empty_driverlib_main.obj" "./device/device.obj" "./device/f2838x_codestartbranch.obj" "../2838x_RAM_lnk_cpu1.cmd" "C:/ti/c2000/C2000Ware_3_04_00_00/driverlib/f2838x/driverlib/ccs/Debug/driverlib.lib" -llibc.a 

Please help me check why the compiler does not generate the optimized DMAC code. Thanks!

  • Thank you for sending the test case.  I understand the problem now.

    The loop with the __dmac intrinsic operates on global variables.  Change it to operate on local variables instead.  Here is one way to do that ...

    Change this code ...

        long res = 0;
        long temp = 0;
        for (i=0; i < 5; i++) // N does not have to be a known constant
             __dmac(((long *)a)[i], ((long *)b)[i], res, temp, 0);
        res += temp;

    ... to call a function instead ...

        res = intrinsic_mac(a, b, 10);

    The implementation of that function is copied from page 27 of the C28x optimization presentation in the article TI Compiler Presentations.

    int32_t intrinsic_mac(int16_t *p1, int16_t *p2, int_fast16_t length)
    {
       int_fast16_t i;
       int32_t *p1_32 = (int32_t *) p1;
       int32_t *p2_32 = (int32_t *) p2;
       int32_t acc1, acc2;
    
       acc1 = acc2 = 0;
       length >>= 1;
    
       for (i = 0; i < length; i++)
          __dmac(p1_32[i], p2_32[i], acc1, acc2, 0);
    
       return acc1 + acc2;
    }

    Add a prototype for this function in an appropriate header file ...

    int32_t intrinsic_mac(int16_t *p1, int16_t *p2, int_fast16_t length);
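    To check the result of intrinsic_mac without target hardware, a host-side model can be compared against a plain dot product.  The following is only a sketch: the names dmac_model, intrinsic_mac_model and dot_ref are hypothetical, and the model assumes that __dmac with a shift of 0 adds the product of the upper 16-bit halves to the first accumulator and the product of the lower halves to the second.  It also makes visible that length >>= 1 would silently drop the last element of an odd-length array.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of one __dmac step (assumed semantics, shift = 0):
 * acc1 accumulates the product of the upper 16-bit halves,
 * acc2 accumulates the product of the lower 16-bit halves. */
static void dmac_model(const int16_t *x, const int16_t *y,
                       int32_t *acc1, int32_t *acc2)
{
    /* With little-endian layout, element 0 is the low half of the pair. */
    *acc2 += (int32_t)x[0] * y[0];
    *acc1 += (int32_t)x[1] * y[1];
}

/* Mirrors the structure of intrinsic_mac; length counts 16-bit elements. */
int32_t intrinsic_mac_model(const int16_t *p1, const int16_t *p2, int length)
{
    int32_t acc1 = 0, acc2 = 0;
    int i;

    assert(length % 2 == 0);   /* an odd trailing element would be dropped */
    length >>= 1;              /* number of 32-bit pairs */

    for (i = 0; i < length; i++)
        dmac_model(&p1[2 * i], &p2[2 * i], &acc1, &acc2);

    return acc1 + acc2;
}

/* Plain dot product used as the reference result. */
int32_t dot_ref(const int16_t *p1, const int16_t *p2, int length)
{
    int32_t sum = 0;
    int i;

    for (i = 0; i < length; i++)
        sum += (int32_t)p1[i] * p2[i];

    return sum;
}
```

    For two 10-element arrays, intrinsic_mac_model(a, b, 10) and dot_ref(a, b, 10) should return the same value; the same comparison can be run on the target against the real intrinsic_mac.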

    This implementation of intrinsic_mac presumes the arrays passed, a and b, are aligned on 32-bit boundaries.  That does not happen by default.  Add these pragmas to tell the compiler this alignment is required ...

    #pragma DATA_ALIGN(a, 2)
    #pragma DATA_ALIGN(b, 2)
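
    The alignment assumption can also be checked at run time before the arrays are handed to intrinsic_mac.  This is a minimal sketch, not part of the original test case: is_aligned32 and aligned_buf_t are hypothetical names, the check assumes that converting a pointer to uintptr_t preserves the address (true on common toolchains), and the union is only a host-side stand-in for the DATA_ALIGN pragmas.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero when p lies on an int32_t boundary.
 * sizeof(int32_t) is expressed in the target's own char units,
 * so the same expression is meaningful on a host and on a
 * target whose char is wider than 8 bits. */
int is_aligned32(const void *p)
{
    return ((uintptr_t)p % sizeof(int32_t)) == 0;
}

/* Host-side stand-in for DATA_ALIGN: the int32_t member forces
 * the int16_t view of the buffer onto a 32-bit boundary. */
typedef union {
    int32_t words[5];
    int16_t halves[10];
} aligned_buf_t;

/* Example guard that a caller might place in front of the MAC call. */
void require_aligned32(const int16_t *p1, const int16_t *p2)
{
    assert(is_aligned32(p1));
    assert(is_aligned32(p2));
}
```

    A pointer to halves[0] passes the check, while a pointer to halves[1] does not, which is the situation the pragmas are meant to rule out.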
    

    After all these changes, the DMAC loop generated is ...

            RPT       AR5
    ||      DMAC     ACC:P,*XAR4++,*XAR7++
            MOVL      XAR6,ACC

    Thanks and regards,

    -George

  • Hi George,

    Thanks for the support! The DMAC loop is generated correctly now.

    There is one more thing I would like your help with. I currently have a task to write a guidance document on compiler optimization for the local team, so that we can better support customers. I know that you are an expert in this field. Which aspects of C2000 compiler optimization are worth verifying?

    -Bruce

  • Please consider using the C28x Optimization Guide, and both the C28x compiler presentations in the article TI Compiler Presentations.

    Thanks and regards,

    -George