CLA C Compiler and pipeline



I'm writing all my CLA code in C (I'm not comfortable with assembly), and I'm wondering whether the CLA C compiler takes care of inserting the proper number of MNOPs between statements that could be problematic for the 8-stage pipeline.

In other words, does the 8-stage pipeline need to be padded for ordinary C code, or does the CLA C compiler already do this for us?

For example, suppose one of my CLA tasks contains:

{
x=x*(5+12);    //x initialized to 1
y=x;
}

What is y on the first run of this task? Is y = 1 or y = 17?

Do I need seven __mnop() calls to flush the pipeline, like this?

{
x=x*(5+12);    //x initialized to 1
__mnop();__mnop();__mnop();__mnop();__mnop();__mnop();__mnop();
y=x;
}

  • Here is some example code where I placed four __mnop() calls between the statement that assigns temp1 and the next statement, which uses temp1:

    366               temp1 = 1.0 / (float32)BUF_LEN;
    00009b6e:   75001482    MI16TOF32  MR0, @0x1482
    00009b70:   7F000003    MEINVF32   MR3, MR0
    00009b72:   78413F80    MMOVIZ     MR1, #0x3f80
    00009b74:   7C000032    MMPYF32    MR2, MR0, MR3
    00009b76:   780A4000    MSUBF32    MR2, #0x4000, MR2
    00009b78:   7C00003B    MMPYF32    MR3, MR2, MR3
    00009b7a:   7C00000E    MMPYF32    MR2, MR3, MR0
    00009b7c:   780A4000    MSUBF32    MR2, #0x4000, MR2
    00009b7e:   7C00003B    MMPYF32    MR3, MR2, MR3
    00009b80:   7C00001E    MMPYF32    MR2, MR3, MR1
    00009b82:   74E08832    MMOV32     @0x8832, MR2
    367               CLA_PIPELINE_FLUSH
    00009b84:   7FA00000    MNOP       
    00009b86:   7FA00000    MNOP       
    00009b88:   7FA00000    MNOP       
    00009b8a:   7FA00000    MNOP       
    00009b8c:   7FA00000    MNOP       
    368               VoutRMS = CLAsqrt(VoutSqSum * temp1);
    00009b8e:   799F0280    MCCNDD     0x280, UNCF
    00009b90:   73C01518    MMOV32     MR0, @0x1518, UNCF
    00009b92:   7AC000F9    MMOV32     MR1, MR2, UNCF
    00009b94:   7C000004    MMPYF32    MR0, MR1, MR0
    00009b96:   74C01488    MMOV32     @0x1488, MR0

    CLA_PIPELINE_FLUSH is simply a #define: {__mnop();__mnop();__mnop();__mnop();}

    Why does it do ITRAP0 and then MOV? I thought this would just be MNOP. (I realized I was not connected to the target; now I can read the disassembly properly.)

    Anyway, do I need to flush the pipeline like this, with 7-8 MNOPs?

  • Fulano de Tal said:

    For example, suppose one of my CLA tasks contains:

    {
    x=x*(5+12);    //x initialized to 1
    y=x;
    }

    What is y on the first run of this task? Is y = 1 or y = 17?

    Do I need seven __mnop() calls to flush the pipeline, like this?

    {
    x=x*(5+12);    //x initialized to 1
    __mnop();__mnop();__mnop();__mnop();__mnop();__mnop();__mnop();
    y=x;
    }

    No, there is no reason to flush the pipeline.  Are you seeing behavior that contradicts this?
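    As a host-side illustration only (ordinary C, not compiled for the CLA), the function below runs the two statements from the question. Under C semantics the read of x must observe the value just written, so y is 17; per the rest of this thread, the CLA compiler preserves that result for a write followed by a read of the same variable without manual __mnop() padding. The function name run_task_once is hypothetical.

```c
/* Host-side illustration, not CLA code. It shows the result C semantics
   require for the statements in the question. */
static int run_task_once(void)
{
    int x = 1;          /* x initialized to 1, as in the question */
    int y;

    x = x * (5 + 12);   /* write x: 1 * 17 = 17 */
    y = x;              /* read x: must see the value just written */
    return y;
}
```

    Calling run_task_once() returns 17, not the stale value 1.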

  • Hi Lori, thank you for your response. I'm not seeing any strange behavior, but the documentation describing the unprotected CLA pipeline (spruh18e, p. 559) made it seem like I need a buffer of some number of MNOPs to clear the pipeline before reading any memory location I have just written.

    In my case, if I write to the variable x and then immediately read it and copy it to y, my concern is that an old value will be copied to y.

  • I see. Yes, you are correct: a write followed by a read of the same location can be problematic. The compiler should know that you are writing to the same location and take care of this. Can you show the disassembly of the code in question? (The disassembly you posted seems to be from a different slice of code.) I will double-check with the compiler team.

  • Hi Lori,

    Both pieces of code do the same thing, but we can use this one as an example. I have defined CLA_PIPELINE_FLUSH as:

    #define CLA_PIPELINE_FLUSH {__mnop();__mnop();__mnop();__mnop();}

    366               temp1 = 1.0 / (float32)BUF_LEN;
    00009b6e:   75001482    MI16TOF32  MR0, @0x1482
    00009b70:   7F000003    MEINVF32   MR3, MR0
    00009b72:   78413F80    MMOVIZ     MR1, #0x3f80
    00009b74:   7C000032    MMPYF32    MR2, MR0, MR3
    00009b76:   780A4000    MSUBF32    MR2, #0x4000, MR2
    00009b78:   7C00003B    MMPYF32    MR3, MR2, MR3
    00009b7a:   7C00000E    MMPYF32    MR2, MR3, MR0
    00009b7c:   780A4000    MSUBF32    MR2, #0x4000, MR2
    00009b7e:   7C00003B    MMPYF32    MR3, MR2, MR3
    00009b80:   7C00001E    MMPYF32    MR2, MR3, MR1
    00009b82:   74E08832    MMOV32     @0x8832, MR2
    367               CLA_PIPELINE_FLUSH
    00009b84:   7FA00000    MNOP       
    00009b86:   7FA00000    MNOP       
    00009b88:   7FA00000    MNOP       
    00009b8a:   7FA00000    MNOP       
    00009b8c:   7FA00000    MNOP       
    368               VoutRMS = CLAsqrt(VoutSqSum * temp1);
    00009b8e:   799F0280    MCCNDD     0x280, UNCF
    00009b90:   73C01518    MMOV32     MR0, @0x1518, UNCF
    00009b92:   7AC000F9    MMOV32     MR1, MR2, UNCF
    00009b94:   7C000004    MMPYF32    MR0, MR1, MR0
    00009b96:   74C01488    MMOV32     @0x1488, MR0

    Basically, I write to temp1 and then use temp1 in the very next statement to calculate VoutRMS. My concern is that the CLA C compiler is not inserting the proper number of MNOPs between the two statements to flush the pipeline. I inserted the MNOPs myself, and I'm not sure how many I need (7 or 8, I think). I thought the compiler would take care of this for me, but now I suspect I may need to insert __mnop() calls manually between C statements wherever I write to a variable and then use it in the next statement.

    This is the disassembly without my manually inserted CLA_PIPELINE_FLUSH:

    366               temp1 = 1.0 / (float32)BUF_LEN;
    00009a44:   75001482    MI16TOF32  MR0, @0x1482
    00009a46:   7F000003    MEINVF32   MR3, MR0
    00009a48:   78413F80    MMOVIZ     MR1, #0x3f80
    00009a4a:   7C000032    MMPYF32    MR2, MR0, MR3
    00009a4c:   780A4000    MSUBF32    MR2, #0x4000, MR2
    00009a4e:   7C00003B    MMPYF32    MR3, MR2, MR3
    00009a50:   7C00000E    MMPYF32    MR2, MR3, MR0
    00009a52:   780A4000    MSUBF32    MR2, #0x4000, MR2
    00009a54:   7C00003B    MMPYF32    MR3, MR2, MR3
    00009a56:   7C00001E    MMPYF32    MR2, MR3, MR1
    00009a58:   74E08832    MMOV32     @0x8832, MR2
    368               VoutRMS = CLAsqrt(VoutSqSum * temp1);
    00009a5a:   799F0270    MCCNDD     0x270, UNCF
    00009a5c:   73C01518    MMOV32     MR0, @0x1518, UNCF
    00009a5e:   7AC000F9    MMOV32     MR1, MR2, UNCF
    00009a60:   7C000004    MMPYF32    MR0, MR1, MR0
    00009a62:   74C01488    MMOV32     @0x1488, MR0

  • Thanks. I was looking for a case where the code used the variable immediately, without making a call.

    In this case there are several instructions before the variable is read. It is stored to @0x8832, but it is also still in MR2. It is copied to MR1 (MMOV32 MR1, MR2, UNCF), and MR1 is then used for the multiply when the call is taken (the MMPYF32 instruction), so memory is never actually read. The compiler will likely handle the original case you showed the same way (by just using the value already in the register).
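    The register flow described above can be traced with a host-side C sketch that mirrors the disassembly of source lines 366-368, one instruction per statement. The locals mr0..mr3 stand in for MR0..MR3, mem_temp1 stands in for temp1 at @0x8832, and sqrt_argument is a hypothetical name for the value handed to CLAsqrt. The sequence reads as a MEINVF32 reciprocal estimate refined by two Newton-Raphson steps, y = y * (2 - x * y), where #0x4000 and #0x3f80 are the upper 16 bits of 2.0f and 1.0f. The hardware estimate is an assumption here, modeled as the exact reciprocal with a 0.4% error. The point is the last three statements: temp1 is stored, but the multiply uses the copy still held in MR2, so temp1 is never read back from memory.

```c
#include <stdint.h>

/* Host-side sketch mirroring the CLA disassembly for:
     temp1 = 1.0 / (float32)BUF_LEN;
     VoutRMS = CLAsqrt(VoutSqSum * temp1);
   mr0..mr3 stand in for MR0..MR3 (names are hypothetical). */

static float mem_temp1;   /* stands in for temp1 stored at @0x8832 */

/* Returns the product VoutSqSum * temp1 that is passed to CLAsqrt. */
static float sqrt_argument(int16_t buf_len, float vout_sq_sum)
{
    float mr0, mr1, mr2, mr3;

    mr0 = (float)buf_len;                  /* MI16TOF32 MR0, @0x1482 (BUF_LEN)  */
    mr3 = (1.0f / mr0) * (1.0f + 0.004f);  /* MEINVF32: estimate (ASSUMED model) */
    mr1 = 1.0f;                            /* MMOVIZ MR1, #0x3f80 -> 1.0f        */

    mr2 = mr0 * mr3;                       /* Newton-Raphson step 1: x * y       */
    mr2 = 2.0f - mr2;                      /* MSUBF32 MR2, #0x4000 -> 2.0f - x*y */
    mr3 = mr2 * mr3;                       /* y = y * (2 - x*y)                  */

    mr2 = mr3 * mr0;                       /* Newton-Raphson step 2              */
    mr2 = 2.0f - mr2;
    mr3 = mr2 * mr3;                       /* 1/BUF_LEN, close to float precision */

    mr2 = mr3 * mr1;                       /* temp1 = 1.0 * (1/BUF_LEN), in MR2  */
    mem_temp1 = mr2;                       /* MMOV32 @0x8832, MR2: store temp1   */

    /* Next C statement: temp1 is NOT reloaded from @0x8832. */
    mr0 = vout_sq_sum;                     /* MMOV32 MR0, @0x1518 (VoutSqSum)    */
    mr1 = mr2;                             /* MMOV32 MR1, MR2: register copy     */
    mr0 = mr1 * mr0;                       /* MMPYF32 MR0, MR1, MR0              */
    return mr0;
}
```

    With buf_len = 64 and vout_sq_sum = 128.0f, the two refinement steps bring temp1 to within float rounding of 0.015625, and the returned product is approximately 2.0f.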

    If the variable is declared volatile, the compiler has to access memory each time the variable is used, which may yield different behavior.
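    A minimal sketch of the volatile case, assuming a hypothetical BUF_LEN of 64 and a hypothetical function name scaled_sum. Because temp1 is volatile, the compiler has to emit a store to temp1 and then a separate load from the same address for the multiply, instead of reusing the register copy. Based on the conclusion later in this thread, the compiler still protects this write-then-read of the same address, so no manual MNOPs are expected to be needed here either.

```c
#define BUF_LEN 64   /* hypothetical buffer length for illustration */

/* volatile forces every access to temp1 to go to memory. */
volatile float temp1;

float scaled_sum(float vout_sq_sum)
{
    temp1 = 1.0f / (float)BUF_LEN;   /* volatile store to temp1            */
    return vout_sq_sum * temp1;      /* volatile load from temp1's address */
}
```

    For example, scaled_sum(128.0f) stores 1/64 to temp1 and returns 2.0f.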

    I've asked the compiler team to confirm whether there are any cases where you need to take care.

  • OK, thanks Lori. It would be reassuring to know whether there are any special cases where I would have to flush the pipeline in C code for the CLA.

  • Hi Lori,

    Any word on this?

  • Fulano,

    One follow-up point that I did not cover in my previous message: if you are writing to peripheral registers, there is a case you have to take care of. This is where a write to one register (call it register A) changes the contents of another register (call it register B).

    That is:

    • write to register A  
    • -----> hardware peripheral itself changes register B in response
    • -----> wait 3 NOPs before reading register B
    • read from register B

    The compiler cannot detect this, since the two registers are at different addresses and the compiler does not know the relationship between them. In this case, you need to add the NOPs manually whenever you write to register A and then read register B.
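    The write-A/read-B sequence can be sketched as follows. The register pointers, the BUILD_FOR_CLA switch, and the names CLA_NOP, CLA_WAIT_PERIPHERAL and write_a_then_read_b are all hypothetical. On a CLA build, CLA_NOP() maps to the __mnop() intrinsic used earlier in this thread; a host build only counts the calls so the sequence can be checked. The count of three NOPs follows the guidance above; confirm it against the documentation for the specific peripheral.

```c
#include <stdint.h>

/* Hypothetical build switch: define BUILD_FOR_CLA when compiling for the
   CLA so CLA_NOP() uses the compiler's __mnop() intrinsic. On a host
   build, CLA_NOP() just counts calls. */
#ifdef BUILD_FOR_CLA
#define CLA_NOP() __mnop()
#else
static unsigned cla_nop_count;
#define CLA_NOP() (cla_nop_count++)
#endif

/* Three NOPs, per the write-A/read-B guidance in this thread.
   do { } while (0) makes the macro behave as a single statement. */
#define CLA_WAIT_PERIPHERAL() do { CLA_NOP(); CLA_NOP(); CLA_NOP(); } while (0)

/* Write register A, wait, then read register B, which the peripheral
   hardware updates in response to the write to A. */
static uint16_t write_a_then_read_b(volatile uint16_t *reg_a,
                                    volatile uint16_t *reg_b,
                                    uint16_t value)
{
    *reg_a = value;           /* write to register A                       */
    CLA_WAIT_PERIPHERAL();    /* compiler cannot infer that B depends on A */
    return *reg_b;            /* read register B after the wait            */
}
```

    Because both registers are accessed through volatile pointers, the compiler keeps the write and the read in order, but it still has no way to know that B depends on A, which is why the wait is explicit. The do { } while (0) wrapper lets the macro sit in an if/else like any other single statement.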

    Regards

    Lori

  • Fulano de Tal said:

    Hi Lori,

    Any word on this?

    We have not had complaints about this, so I'm still confident that the compiler does the right thing for a write followed by a read of the same address. That said, the compiler team is checking and will likely get back to me next week.

  • Sounds good, thank you Lori very much for your help!

  • Fulano, I've closed this out with the compiler team and our CLA experts. The peripheral register case is the one to watch for; otherwise, the compiler protects against this hazard.

    I will submit a change to the documentation to clarify this point.

    Regards
    Lori