C66x loop optimization

Version of MCSDK - MCSDK_HPC_03_00_01_08
Processor and board platform (EVM & its revision) - Hawking chip Evaluation Board EVMK2H Rev 3.0.

API - OpenCL 1.1, no CSL

Hello everyone,

Very specific question: the OpenCL code was compiled with the options -Wall -O3 -k. Below is the generated ASM code; ii=19. How can I optimize this loop and reduce ii? I think "Split a long life (split-join)" should help with this, but I am not sure what it means.

I would also appreciate pointers to reading material: I remember reading a document that describes all the compiler hints for optimizing loops, but I cannot find it again. Even when I had that document, I could not find any explanation of annotations such as "[B_M66] <0,15>" (to the right of every loop instruction). What do they mean?

;*----------------------------------------------------------------------------*
;*   SOFTWARE PIPELINE INFORMATION
;*
;*      Loop found in file               : Unknown
;*      Known Minimum Trip Count         : 1                    
;*      Known Max Trip Count Factor      : 1
;*      Loop Carried Dependency Bound(^) : 19
;*      Unpartitioned Resource Bound     : 6
;*      Partitioned Resource Bound(*)    : 6
;*      Resource Partition:
;*                                A-side   B-side
;*      .L units                     0        1     
;*      .S units                     1        0     
;*      .D units                     2        4     
;*      .M units                     2        2     
;*      .X cross paths               0        3     
;*      .T address paths             2        4     
;*      Logical  ops (.LS)           0        2     (.L or .S unit)
;*      Addition ops (.LSD)         13        8     (.L or .S or .D unit)
;*      Bound(.L .S .LS)             1        2     
;*      Bound(.L .S .D .LS .LSD)     6*       5     
;*
;*      Searching for software pipeline schedule at ...
;*         ii = 19 Schedule found with 2 iterations in parallel
;*
;*      Register Usage Table:
;*          +-----------------------------------------------------------------+
;*          |AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA|BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB|
;*          |00000000001111111111222222222233|00000000001111111111222222222233|
;*          |01234567890123456789012345678901|01234567890123456789012345678901|
;*          |--------------------------------+--------------------------------|
;*       0: |*** *****                       |**  ** ***      ***             |
;*       1: |*** *****                       |**  ** ***      ***             |
;*       2: |*********                       |**  ** ***      ***             |
;*       3: |*********                       |**  ** ***      ***             |
;*       4: |*********                       |**  ** ***      ***             |
;*       5: |**********                      |**  ** ***      ***             |
;*       6: |*********                       |**  ** ***      ***             |
;*       7: |*********                       |**  ******      ***             |
;*       8: |*********                       |**  ******      ***             |
;*       9: |**********                      |**  ******      ****            |
;*      10: |*** *****                       |**  ******      ***             |
;*      11: |*** *****                       |**  ******      ***             |
;*      12: |*** *****                       |**  ******      ***             |
;*      13: |*** *****                       |**  ******      ****            |
;*      14: |*** *****                       |**  ******       **             |
;*      15: |*** *****                       |**  ******      ***             |
;*      16: |*** *****                       |**  ******      ***             |
;*      17: | ********                       |**  ******      ***             |
;*      18: | ********                       |**  ******      ****            |
;*          +-----------------------------------------------------------------+
;*
;*      Done
;*
;*      Collapsed epilog stages       : 1
;*      Prolog not removed
;*      Collapsed prolog stages       : 0
;*
;*      Minimum required memory pad   : 0 bytes
;*
;*      For further improvement on this loop, try option -mh56
;*
;*      Minimum safe trip count       : 1
;*      Min. prof. trip count  (est.) : 3
;*
;*      Mem bank conflicts/iter(est.) : { min 0.000, est 0.125, max 1.000 }
;*      Mem bank perf. penalty (est.) : 0.7%
;*
;*      Effective ii                : { min 19.00, est 19.12, max 20.00 }
;*
;*
;*      Total cycles (est.)         : 6 + trip_cnt * 19        
;*----------------------------------------------------------------------------*
;*       SETUP CODE
;*
;*                  MVK     1,A2    ; [] 
;*                  MV      A2,A1   ; [] 
;*                  MV      A1,A0   ; [] 
;*
;*        SINGLE SCHEDULED ITERATION
;*
;*        $C$C205:
;*   0      [ A2]   LDW     .D2T2   *B5++(4),B4       ; [B_D64P] 
;*   1              NOP     1       ; [A_L66] 
;*   2      [ A2]   LDW     .D1T1   *A6++(4),A7       ; [A_D64P] 
;*   3              NOP     2       ; [A_L66] 
;*   5              CMPEQ   .L2     B7,B4,B0          ; [B_L66]  ^ 
;*   6      [ A1]   LDW     .D2T2   *+B8[B4],B17      ; [B_D64P] 
;*   7              MV      .L1     A7,A3             ; [A_L66] Split a long life (split-join)
;*     ||   [!B0]   LDW     .D1T1   *A5(0),A3         ; [A_D64P]  ^ 
;*   8              NOP     1       ; [A_L66] 
;*   9              SUB     .L2     B1,1,B1           ; [B_L66] 
;*  10              MV      .S1     A3,A9             ; [A_S66] Split a long life (split-join)
;*     ||   [!B1]   ZERO    .L1     A2                ; [A_L66] 
;*  11              MPYSP   .M2X    A7,B17,B6         ; [B_M66] 
;*     ||   [!B0]   MPYSP   .M1     A9,A3,A9          ; [A_M66]  ^ 
;*     ||           MV      .L1     A2,A3             ; [A_L66] Split a long life (split-join)
;*  12              MV      .L2     B4,B6             ; [B_L66] Split a long life (split-join)
;*  13              NOP     1       ; [A_L66] 
;*  14              MV      .L2     B6,B19            ; [B_L66] Split a long life (split-join)
;*  15              MV      .S2     B19,B6            ; [B_Sb66] Split a long life (split-join)
;*     ||           MPYSP   .M2X    A4,B6,B19         ; [B_M66] 
;*     ||   [!B0]   MPYSP   .M1     A4,A9,A8          ; [A_M66]  ^ 
;*     ||   [!A1]   MVK     .L2     1,B0              ; [B_L66]  ^ 
;*     ||           MV      .L1     A3,A1             ; [A_L66] Split a long life (split-join)
;*  16      [!B0]   LDW     .D2T2   *+B9[B6],B16      ; [B_D64P]  ^ 
;*  17              NOP     2       ; [A_L66] 
;*  19              FADDSP  .L2     B19,B16,B16       ; [B_L66] 
;*     ||   [ A2]   B       .S1     $C$C205           ; [A_S66] 
;*  20              NOP     1       ; [A_L66] 
;*  21      [!B0]   FADDSP  .L2X    B16,A8,B19        ; [B_L66]  ^ 
;*  22      [ A0]   MV      .L2     B16,B18           ; [B_L66] 
;*     ||           MV      .L1     A1,A3             ; [A_L66] Split a long life (split-join)
;*  23              NOP     1       ; [A_L66] 
;*  24      [!B0]   STW     .D2T2   B19,*+B9[B6]      ; [B_D64P]  ^ 
;*     ||           MV      .L1     A3,A0             ; [A_L66] Split a long life (split-join)
;*  25              ; BRANCHCC OCCURS {$C$C205}       ; [] 
;*
;*       RESTORE CODE
;*
;*                  MV      B18,B16 ; [] 
;*----------------------------------------------------------------------------*
$C$L150:    ; PIPED LOOP PROLOG
;** --------------------------------------------------------------------------*
$C$L151:    ; PIPED LOOP KERNEL
;          EXCLUSIVE CPU CYCLES: 19

   [ A1]   LDW     .D2T2   *+B8[B4],B17      ; [B_D64P] <0,6> 
|| [!B0]   LDW     .D1T1   *A5(0),A3         ; [A_D64P] <0,6>  ^ 

           MV      .L1     A7,A3             ; [A_L66] <0,7> Split a long life (split-join)
           NOP             1                 ; [A_L66] 
           SUB     .L2     B1,1,B1           ; [B_L66] <0,9> 

           MV      .S1     A3,A9             ; [A_S66] <0,10> Split a long life (split-join)
|| [!B1]   ZERO    .L1     A2                ; [A_L66] <0,10> 

           MV      .L1     A2,A3             ; [A_L66] <0,11> Split a long life (split-join)
||         MPYSP   .M2X    A7,B17,B6         ; [B_M66] <0,11> 
|| [!B0]   MPYSP   .M1     A9,A3,A9          ; [A_M66] <0,11>  ^ 

           MV      .L2     B4,B6             ; [B_L66] <0,12> Split a long life (split-join)
           NOP             1                 ; [A_L66] 
           MV      .L2     B6,B19            ; [B_L66] <0,14> Split a long life (split-join)

           MV      .S2     B19,B6            ; [B_Sb66] <0,15> Split a long life (split-join)
||         MV      .L1     A3,A1             ; [A_L66] <0,15> Split a long life (split-join)
||         MPYSP   .M2X    A4,B6,B19         ; [B_M66] <0,15> 
|| [!A1]   MVK     .L2     1,B0              ; [B_L66] <0,15>  ^ 
|| [!B0]   MPYSP   .M1     A4,A9,A8          ; [A_M66] <0,15>  ^ 

   [!B0]   LDW     .D2T2   *+B9[B6],B16      ; [B_D64P] <0,16>  ^ 
           NOP             2                 ; [A_L66] 

   [ A2]   BNOP            $C$L151,1         ; [] <0,19> 
||         FADDSP  .L2     B19,B16,B16       ; [B_L66] <0,19> 
|| [ A2]   LDW     .D2T2   *B5++(4),B4       ; [B_D64P] <1,0> 

   [!B0]   FADDSP  .L2X    B16,A8,B19        ; [B_L66] <0,21>  ^ 
|| [ A2]   LDW     .D1T1   *A6++(4),A7       ; [A_D64P] <1,2> 

           MV      .L1     A1,A3             ; [A_L66] <0,22> Split a long life (split-join)
|| [ A0]   MV      .L2     B16,B18           ; [B_L66] <0,22> 

           NOP             1                 ; [A_L66] 

           MV      .L1     A3,A0             ; [A_L66] <0,24> Split a long life (split-join)
|| [!B0]   STW     .D2T2   B19,*+B9[B6]      ; [B_D64P] <0,24>  ^ 
||         CMPEQ   .L2     B7,B4,B0          ; [B_L66] <1,5>  ^ 

;** --------------------------------------------------------------------------*
$C$L152:    ; PIPED LOOP EPILOG
;** --------------------------------------------------------------------------*
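
For reference, here is a minimal C sketch of how I read the numbers in the report above. It is not taken from any TI document. The constants are copied from the listing; the function names (`min_ii`, `total_cycles_est`, `kernel_slot`) are my own, and the interpretation of `<i,c>` as "iteration i, cycle c of the single scheduled iteration" is an assumption inferred from the listing itself (for example, `<0,24>` and `<1,5>` share one execute packet, and 24 mod 19 = 5).

```c
#include <assert.h>

/* Values copied from the SOFTWARE PIPELINE INFORMATION block above. */
enum {
    LOOP_CARRIED_BOUND = 19, /* "Loop Carried Dependency Bound(^)" */
    RESOURCE_BOUND     = 6,  /* "Partitioned Resource Bound(*)"    */
    II                 = 19, /* "ii = 19 Schedule found"           */
    FIXED_CYCLES       = 6   /* constant term of "Total cycles"    */
};

/* ii cannot be smaller than either bound, so the larger one wins. */
int min_ii(int dep_bound, int res_bound)
{
    return dep_bound > res_bound ? dep_bound : res_bound;
}

/* "Total cycles (est.) : 6 + trip_cnt * 19" from the report. */
long total_cycles_est(long trip_cnt)
{
    return FIXED_CYCLES + trip_cnt * (long)II;
}

/* ASSUMPTION: <iter,cycle> names the iteration and the cycle within the
 * single scheduled iteration; the kernel slot is the absolute cycle mod ii. */
int kernel_slot(int iter, int cycle)
{
    return (iter * II + cycle) % II;
}

/* Cross-check the assumption against pairs that share a packet in the kernel. */
void self_check(void)
{
    assert(min_ii(LOOP_CARRIED_BOUND, RESOURCE_BOUND) == II);
    assert(kernel_slot(0, 24) == kernel_slot(1, 5));  /* STW with CMPEQ   */
    assert(kernel_slot(0, 19) == kernel_slot(1, 0));  /* FADDSP with LDW  */
    assert(kernel_slot(0, 21) == kernel_slot(1, 2));  /* FADDSP with LDW  */
}
```

If this reading is right, the resource bound (6) is well below the loop-carried dependency bound (19), so ii is limited by the chain of instructions marked with `^` rather than by functional units; shortening that chain is what would have to change for ii to drop.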