Matrix Multiplication Optimization - DSP C6678

Other Parts Discussed in Thread: TMS320C6678, MATHLIB

Hello, 

I'm working with CCS v6.1.3 on a TMS320C6678 EVM (C6678 multicore DSP), and I'm trying to optimize the performance of a matrix multiplication algorithm; in particular, the multiplication of a 24 x 660 matrix by a 660 x 2 matrix. To test this, I modified the code included in DSPLIB for DSPF_sup_mat_mul_66_LE_ELF to multiply matrices of sizes 24 x 660 and 660 x 2. The calculated cycle count for this operation is 12378, but the measured result I am getting is 22369 cycles, nearly double. Here are some additional details about my code:

  • My project was set to be built in Release mode, with the highest optimization level (level 3) for the compiler
  • The timer functions for the clock cycles are the same as those included in the DSPLIB file
  • The L1D-cache and L1P-cache are both enabled and set at 32KB
  • Compiler version TI v8.1.0
  • Using SYS/BIOS 6.45.1.29, DSPLIB 3.4.0.0, MATHLIB 3.1.2.1
  • All memory is on L2SRAM
  • Running on Windows 7 64-bit, Service Pack 1

So here's my question: What's the best way to bring the measured number of clock cycles closer to the calculated number of clock cycles for this particular case? 

  • I have little expertise with system memory issues.  So I presume that this ...

    Anirudh Sridhar said:
    The L1D-cache and L1P-cache are both enabled and set at 32KB

    and this ...

    Anirudh Sridhar said:
    All memory is on L2SRAM

    ... means that no cycles are lost due to waiting on memory to respond.  If that presumption is wrong, then so is the rest of this post.

    I measure the performance of code like this as described in this wiki article on tuning loops.  You inspect the compiler generated assembly, find the initiation interval (ii for short) of a key loop (or two) and focus on making it smaller.  

    Anirudh Sridhar said:
    I modified the code included in DSPLIB for DSPF_sup_mat_mul_66_LE_ELF

    If there are opportunities for improvement, they are probably available in these modifications.  So I need to see this code.  Please preprocess it and submit it.  Also show all the build options exactly as the compiler sees them.

    Thanks and regards,

    -George