Compiler/TMS320C6678: ILC Loading and SPLOOP Delay Issue

Part Number: TMS320C6678

Tool/software: TI C/C++ Compiler

Hi,

I have an urgent issue with the TMS320C6678 DSP. Intermittently, while my program is running, I get an opcode exception.

Looking at the code that the ERP register points to, and at the code before it, I couldn't see any overwrite; the code is exactly as it was when I loaded it onto the chip.

The exception is at the SPLOOP instruction.

The C66x CPU and Instruction Set Reference Guide (SPRUGH7, November 2010), paragraph 8.4.3, says: "There is a 4 cycle latency between when ILC is loaded and when its contents are available for use. When used with the SPLOOP instruction, it should be loaded 4 cycles before the SPLOOP instruction is encountered. ILC must be loaded explicitly using the MVC instruction." Based on this, I expected to see a delay between the ILC load and the following SPLOOP instruction, but there is no such delay in our generated code.

The code that raised the exception is shown below; ERP points to 0x0085f054. It was generated from C source by compiler version 7.4.1 (and also by 7.6.0). I would expect a delay between the MVC.S2 B4,ILC instruction and the SPLOOP 2 instruction (because of the paragraph from the reference guide quoted above):

"

0085f048:   1246                MV.L1X        B4,A0
0085f04a:   C22E     ||         ADD.S1        A6,A4,A6
0085f04c:   1BF6     ||         MVK.D1        0,A7
0085f04e:   A6BA         [!A0]  BNOP.S1       $C$DW$L$FUNC$4$E (PC+52 = 0x0085f074),5
0085f050:   C69003A2 ||  [ A0]  MVC.S2        B4,ILC
          $C$L22:
0085f054:   0CE6                SPLOOP        2
0085f056:   3761     ||         ADD.L2X       A6,1,B6
          $C$DW$L$FUNC$4$B, $C$L23:
0085f058:   3F5D                LDB.D2T2      *B6++[2],B5
0085f05a:   2F5C                LDB.D1T1      *A6++[2],A5
0085f05c:   EDC184B0            .fphead       n, l, W, B, br, nosat, 1101110b
0085f060:   4C6E                NOP           3
0085f062:   9AE3                EXTU.S2       B5,24,24,B4
0085f064:   01950CA0            SHL.S1        A5,0x8,A3
0085f068:   02107FF8            OR.L1X        A3,B4,A4
0085f06c:   0C6E                NOP           1
0085f06e:   7262                EXTU.S1       A4,16,16,A3
0085f070:   5CE6                SPKERNEL      4,1
0085f072:   63F0     ||         ADD.L1        A3,A7,A7
          $C$DW$L$FUNC$4$E, $C$L24, $C$L25:
0085f074:   000C0363            B.S2          B3
0085f078:   019E09A0 ||         SHRU.S1       A7,0x10,A3

"

As I said, the opcode seems valid and legal, so I can't understand why I get an exception.

Please assist ASAP.

Thanks in advance,

Elad.

  • Hi,

    Can you attach the C source code snippet and the compiler options used for this issue?

    Regards, Eric
  • Hi Eric,

    The C code is as follows:

    tuint FUNC ( tuint *p_data, tuint len, tword byteOffset )
    {
     tulong  chksum = 0;

     while (len > 0)
     {
       chksum += (tulong)pktRead16bits_m ((tword *)p_data, byteOffset);
       p_data++;
       len--;
     }

     chksum = (chksum >> 16) + (chksum & 0xFFFF); /* add in carry   */
     chksum += (chksum >> 16);                    /* maybe one more */

     return (tuint)chksum;
    }

    Compiler options are as follows:

    -c -mv6600 --abi=eabi -k -q --mem_model:data=far -al -pds1111 -pds827 -pds824 -pds837 -pds1037 -pds195 -pdsw225 -pdsw994 -pdsw262 -pds77 -pden -pds232 --consultant -mw -os -mi10000 -as -ss -o3 -Dti_targets_elf_C66 -Dxdc_target_types__=ti/targets/std.h -fc

    Regards,

    Elad.

  • Thanks Elad, I asked our colleague in compiler team for help.

    Regards, Eric
  • Unfortunately, I cannot build the test case with what you show.  While I can guess, I don't know the types for tuint, tword, etc.  It is also clear that in the loop ...

    Elad Roichman said:

    while (len > 0)
    {
    chksum += (tulong)pktRead16bits_m ((tword *)p_data, byteOffset);
    p_data++;
    len--;
    }

    pktRead16bits_m is either a macro, or a function call that is inlined.  

    Please submit a test case as described in the article How to Submit a Compiler Test Case.

    Since you use compiler version 7.4.1, please consider upgrading to version 7.4.23.  It is the same as version 7.4.1, but with many bug fixes.  There is a chance this problem has been fixed already.

    Thanks and regards,

    -George

  • Hi George,

    I'm sorry for the missing info.

    tuint is unsigned short.

    tword is unsigned char.

    tulong is unsigned 32 bit.

    pktRead16bits_m is defined as follows:

    static inline tuint pktRead16bits_m (tword *base, tuint byteOffset) 
    {
      char *wptr = ((char *)base + byteOffset);
      tuint ret;
    
      /* Shift/mask is endian-portable, but look out for stupid compilers */
      ret = (((tuint)wptr[0]) << 8) | (wptr[1] & 0xFF);
    
      return ret;
    }

    Regards,

    Elad.
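For anyone reading along who wants to try this, the pieces posted so far can be put together into one translation unit. This is only a sketch under the type definitions given above; `tulong` is assumed to map to `uint32_t`, and `run_cutdown_case` is a hypothetical helper (not from the original application) that feeds four bytes through `FUNC` so the result can be checked on a host or on the target.

```c
#include <stdint.h>

/* Types as described in the post above. */
typedef unsigned short tuint;   /* unsigned short            */
typedef unsigned char  tword;   /* unsigned char             */
typedef uint32_t       tulong;  /* unsigned 32 bit (assumed) */

/* Read one big-endian 16-bit value at base + byteOffset. */
static inline tuint pktRead16bits_m(tword *base, tuint byteOffset)
{
    char *wptr = ((char *)base + byteOffset);
    tuint ret;

    ret = (((tuint)wptr[0]) << 8) | (wptr[1] & 0xFF);
    return ret;
}

/* Sum len 16-bit words, then fold the carries back into 16 bits. */
tuint FUNC(tuint *p_data, tuint len, tword byteOffset)
{
    tulong chksum = 0;

    while (len > 0)
    {
        chksum += (tulong)pktRead16bits_m((tword *)p_data, byteOffset);
        p_data++;
        len--;
    }

    chksum = (chksum >> 16) + (chksum & 0xFFFF); /* add in carry   */
    chksum += (chksum >> 16);                    /* maybe one more */

    return (tuint)chksum;
}

/* Hypothetical driver: checksum a 4-byte packet (two 16-bit words).
   The union keeps the byte buffer aligned for tuint access. */
tuint run_cutdown_case(tword b0, tword b1, tword b2, tword b3)
{
    union { tuint words[2]; tword bytes[4]; } pkt;

    pkt.bytes[0] = b0;
    pkt.bytes[1] = b1;
    pkt.bytes[2] = b2;
    pkt.bytes[3] = b3;

    return FUNC(pkt.words, 2, 0);
}
```

Calling `run_cutdown_case(0x12, 0x34, 0x56, 0x78)` sums 0x1234 and 0x5678; a call such as `run_cutdown_case(0xFF, 0xFF, 0x00, 0x02)` exercises the carry fold. Whether the compiler emits the same SPLOOP sequence for this file depends on the compiler version and options listed earlier.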

    Thank you for the test case.  I can generate the same assembly output.  I filed the entry CODEGEN-5452 in the SDOWP system to have this investigated.  You are welcome to follow it with the SDOWP link below in my signature.

    Thanks and regards,

    -George

  • Hi George,

    First of all, thank you for filing the issue in the SDOWP.

    Secondly, do you have any estimation on how long it's going to take to solve it?

    Thirdly, I can't follow the issue on the SDOWP link that you provided. I get the following error: "ClearQuest login of user: This login is forbidden: CRMMD1881E Invalid Credentials: Either the login name or the password is incorrect."

    It's crucial for us to have this issue fixed ASAP. This issue seems to cause our operational DSP to crash due to an exception. Please forward our request for a fast solution to this issue to the appropriate team.

    Thanks in advance,

    Elad.

  • Elad Roichman said:
    Secondly, do you have any estimation on how long it's going to take to solve it?

    The issue has not been analyzed.  So, unfortunately, no estimate is available at this time.

    Elad Roichman said:
    Thirdly, I can't follow the issue on the SDOWP link that you provided.

    It appears you can't even view the first screen.  Please try again.  But, even once you are past that, you still won't see the SDOWP entry.  That is because I made an error when I filed it.  Sorry.  I corrected the error, but that change takes a few hours to propagate through the system.

    Elad Roichman said:
    It's crucial for us to have this issue fixed ASAP.

    I will convey your urgency to the team.

    Please understand that standard procedure calls for a fix to be applied to versions 8.1.x, 8.2.x, and 8.3.x.  The 7.4.x series of releases is no longer supported.  Is it practical for you to upgrade to one of those releases?

    Thanks and regards,

    -George

  • Hi George,

    We need the fix for version 7.2.24.

    It's not practical for us to update to one of 8.x.x releases for several reasons. One of these reasons is the following open issue:

    https://e2e.ti.com/support/tools/ccs/f/81/t/738993

    Please do whatever you can in order to get the fix for version 7.2.24 ASAP.

    Thanks in advance,

    Elad.

  • A workaround to consider ...

    Disable the software pipeline optimization just for the problem loop, by adding this statement inside it ...

          asm(" ; ");

    For those reading along ... I generally advise against using the asm statement at all.  But it is a useful workaround for this specific case.  This asm statement merely inserts an empty comment into the assembly instructions generated by the compiler.  More importantly, for this situation, it causes the compiler to presume some instruction is being injected into the loop, and therefore it is not possible to optimize it correctly.  

    Thanks and regards,

    -George
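As a rough illustration of where the statement goes, here is the loop from the test case with the workaround applied. This is a sketch, not the poster's actual code: `BLOCK_SW_PIPELINE`, `FUNC_workaround`, and `workaround_demo` are hypothetical names, the 16-bit read is written inline instead of through `pktRead16bits_m`, and the `__TI_COMPILER_VERSION__` guard is only there so the file also builds with a host compiler, where the macro expands to nothing.

```c
#include <stdint.h>

typedef unsigned short tuint;
typedef unsigned char  tword;
typedef uint32_t       tulong;  /* assumed mapping for "unsigned 32 bit" */

/* With the TI compiler, emit the empty assembly comment described above.
   With any other compiler, expand to a no-op (assumption for host builds). */
#if defined(__TI_COMPILER_VERSION__)
#define BLOCK_SW_PIPELINE() asm(" ; ")
#else
#define BLOCK_SW_PIPELINE() ((void)0)
#endif

tuint FUNC_workaround(tuint *p_data, tuint len, tword byteOffset)
{
    tulong chksum = 0;

    while (len > 0)
    {
        const tword *wptr = (const tword *)p_data + byteOffset;

        /* The asm statement must sit inside the loop body to affect it. */
        BLOCK_SW_PIPELINE();

        chksum += (tulong)(((tuint)wptr[0] << 8) | wptr[1]);
        p_data++;
        len--;
    }

    chksum = (chksum >> 16) + (chksum & 0xFFFF); /* add in carry   */
    chksum += (chksum >> 16);                    /* maybe one more */

    return (tuint)chksum;
}

/* Hypothetical check: the result must match the unmodified function. */
tuint workaround_demo(void)
{
    union { tuint words[2]; tword bytes[4]; } pkt;

    pkt.bytes[0] = 0x12;
    pkt.bytes[1] = 0x34;
    pkt.bytes[2] = 0x56;
    pkt.bytes[3] = 0x78;

    return FUNC_workaround(pkt.words, 2, 0);
}
```

The statement should change only how the loop is scheduled, not its result, so the checksum of a known packet can be compared before and after. Inspecting the generated assembly (the -k option in the posted options keeps it) is the way to confirm that the SPLOOP is actually gone.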

  • Hmm, something's not quite right here.  Consider the problem sequence:

           [!A0]  BNOP.S1       around,5
        || [ A0]  MVC.S2        B4,ILC
                  SPLOOP        2

    Yes, the MVC is adjacent to the SPLOOP, but the BNOP has 5 delay slots, even though its predicate is false. 

    In the "TMS320C6000 CPU and Instruction Set Reference Guide", SPRU189E (January 2000), the entry for BNOP (page 5-54) says "Note: BNOP instructions may be predicated.  The predication condition controls whether or not the branch is taken, but does not affect the insertion of NOPs.  BNOP always inserts the number of NOPs specified by N, regardless of the predication condition."

    I had a colleague test the cutdown test case on C66x hardware (I added an appropriate main function).  The compiler did generate the above instruction sequence, but the hardware did not have an exception of any kind.  We stepped through the instructions and observed the NOPs from the BNOP occurring, despite the false condition.  We also let it run. In both cases, we got the output expected.

    For the cutdown test case, at least, I conclude that the compiler is doing the right thing and that the bug must lie elsewhere.

    Are you able to single-step through your function?  Are you able to see the NOPs due to the BNOP occurring, or does the BNOP take only one cycle for you?
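The cycle-counting argument above can be written down as a tiny model. This is a deliberately simplified sketch, not a description of the real pipeline: it assumes the only thing separating the MVC packet from the SPLOOP packet is the N NOP cycles inserted by the BNOP, and it conservatively requires at least 4 such cycles. The function names are hypothetical.

```c
/* ILC load-to-use latency, from the reference guide passage quoted
   in the original question. */
#define ILC_LOAD_LATENCY_CYCLES 4

/* Idle cycles between the BNOP||MVC packet and the next execute packet.
   Per the BNOP note quoted above, the N NOPs are inserted whether or
   not the predicate is true, so the count is simply N. */
static int idle_cycles_after_bnop(int bnop_n)
{
    return bnop_n;
}

/* Returns 1 if, under this simplified model, ILC has settled by the
   time the SPLOOP packet issues; 0 otherwise. */
int ilc_ready_for_sploop(int bnop_n)
{
    return idle_cycles_after_bnop(bnop_n) >= ILC_LOAD_LATENCY_CYCLES;
}
```

For the sequence in question, `ilc_ready_for_sploop(5)` holds, which matches the conclusion that the generated code respects the latency. The case the original question was worried about, an MVC followed immediately by SPLOOP with nothing in between, corresponds to `ilc_ready_for_sploop(0)`, which does not.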

  • Elad,

    Can you provide the following info for investigation:
    1) Is this consistent: happening every time or intermittent? frequency of failure?

    2) Can you provide a CCS project that demonstrates this issue? If you cannot provide one, please provide at least the binary. I strongly recommend providing code that demonstrates the failure, because it reduces the time we need to reproduce the issue.

    3) do you see this issue on your custom board or TI EVM?

    4) what is the boot loader being used?

    5) what is the PG version of the C6678? If it is TI EVM, what is the EVM version #?

    Thank you!

    best regards,
    David Zhou
  • 6) Did it fail on one C6678 on one board, or have you tried the same test on multiple boards with all of them failing?

    regards,
    David
  • Archaeologist said:

    Hmm, something's not quite right here.  Consider the problem sequence:

           [!A0]  BNOP.S1       around,5
        || [ A0]  MVC.S2        B4,ILC
                  SPLOOP        2

    Yes, the MVC is adjacent to the SPLOOP, but the BNOP has 5 delay slots, even though its predicate is false. 

    In the "TMS320C6000 CPU and Instruction Set Reference Guide", SPRU189E (January 2000), the entry for BNOP (page 5-54) says "Note: BNOP instructions may be predicated.  The predication condition controls whether or not the branch is taken, but does not affect the insertion of NOPs.  BNOP always inserts the number of NOPs specified by N, regardless of the predication condition."

    I had a colleague test the cutdown test case on C66x hardware (I added an appropriate main function).  The compiler did generate the above instruction sequence, but the hardware did not have an exception of any kind.  We stepped through the instructions and observed the NOPs from the BNOP occurring, despite the false condition.  We also let it run. In both cases, we got the output expected.

    For the cutdown test case, at least, I conclude that the compiler is doing the right thing and that the bug must lie elsewhere.

    Are you able to single-step through your function?  Are you able to see the NOPs due to the BNOP occurring, or does the BNOP take only one cycle for you?

    Hi,
    Doing a single-step through the function's assembly code, I was able to see the NOPs due to the BNOP, despite the fact that A0 wasn't '0'.
    If an interrupt occurs between the BNOP and the SPLOOP instructions (given that A0 != '0'), could that cause an issue?
    Regards,
    Elad.

  • Hi David,

    My answers to your questions are below:

    dzhou said:
    Elad,

    Can you provide the following info for investigation:
    1) Is this consistent: happening every time or intermittent? frequency of failure?

    My problem isn't consistent. It happens from time to time. Unfortunately, I don't have the frequency of failure to give you.

    dzhou said:

    2) Can you provide a CCS project that demonstrates this issue? If you cannot provide one, please provide at least the binary. I strongly recommend providing code that demonstrates the failure, because it reduces the time we need to reproduce the issue.

    It's not really something that I can do, especially given the fact that it's not consistent.

    Is there any debug info that will be helpful once the problem has occurred?

    dzhou said:


    3) do you see this issue on your custom board or TI EVM?

    I see the issue on TI EVM.

    dzhou said:


    4) what is the boot loader being used?

    I'm using the Ethernet boot loader.

    dzhou said:


    5) what is the PG version of the C6678? If it is TI EVM, what is the EVM version #?

    The EVM is Rev 3B.


  • Hi David,

    dzhou said:
    6) Did it fail on one C6678 on one board, or have you tried the same test on multiple boards with all of them failing?

    It failed on several boards running our tests.

    Regards,

    Elad.

  • Elad,

    A few followup questions from your response to David above:

    - Did the work-around of adding asm(" ; "); suggested earlier to disable SW pipe-lining optimization work?
    - What is the value of the IERR (internal exception report register) bit 3 when you see the exception?

    Lali
  • Elad Roichman said:
    Doing a single-step through the function's assembly code, I was able to see the NOPs due to the BNOP, despite the fact that A0 wasn't '0'.
    In case an interrupt will occur between the BNOP and the SPLOOP instructions (given that A0 != '0'), could that cause an issue?

    Please bear in mind that I am a C6000 compiler expert, but I am not a C6000 hardware expert.  In particular, I cannot diagnose the hardware exception.

    I do not see any reason why an interrupt between the BNOP||MVC packet and the SPLOOP packet would cause any sort of problem.

  • Hi Lali,

    Unfortunately, I haven't been able to test the workaround yet. Please bear in mind that the problem is not consistent, so it might take a few days until I can tell whether the workaround changes the behavior.
    The OPX bit in the IERR register is '1' when the problem occurs.

    Regards,

    Elad.
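For readers capturing the register after a failure, a small decoder makes the check explicit. Only the OPX position (bit 3) is confirmed by this thread; the other bit names in the table are recalled from the C64x+/C66x CPU guide and should be treated as an assumption to verify against that document. The function names are hypothetical, and the IERR value is passed in as a parameter rather than read from the register.

```c
#include <stddef.h>

/* Bit 3 of IERR is the opcode exception flag (OPX), per the posts above. */
#define IERR_OPX_BIT 3u

/* Assumed IERR layout; verify against the CPU reference guide. */
static const char *const ierr_names[] = {
    "IFX", /* bit 0: instruction fetch exception */
    "FPX", /* bit 1: fetch packet exception      */
    "EPX", /* bit 2: execute packet exception    */
    "OPX", /* bit 3: opcode exception            */
    "RCX", /* bit 4: resource conflict exception */
    "RAX", /* bit 5: resource access exception   */
    "PRX", /* bit 6: privilege exception         */
    "LBX", /* bit 7: SPLOOP buffer exception     */
    "MSX"  /* bit 8: missed stall exception      */
};

/* Returns 1 if the captured IERR value has the OPX bit set. */
int ierr_is_opcode_exception(unsigned int ierr)
{
    return (int)((ierr >> IERR_OPX_BIT) & 1u);
}

/* Returns the assumed name for a bit position, or "reserved". */
const char *ierr_cause_name(unsigned int bit)
{
    size_t count = sizeof ierr_names / sizeof ierr_names[0];

    return (bit < count) ? ierr_names[bit] : "reserved";
}
```

If the assumed layout is right, a captured value of 0x8 is a pure opcode exception, and checking bit 7 at the same time would show whether the SPLOOP buffer itself reported a fault.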
  • Hi Elad,

    We would need to get a project that shows the problem, and is reproducible on TI's EVM in order to further debug.

    As you will understand, it would otherwise be difficult to find the root cause.

    Elad Roichman said:
    It's not really something that I can do, especially given the fact that it's not consistent.
    Is there any debug info that will be helpful once the problem has occurred?

    Can you cut down your application code to a snippet that creates this problem?

    Did you try enabling or disabling the cache around the offending instructions?

    Also, when you do have the answer to the  asm(" ; ") work-around, do let us know.

    Lali