TMS320F28388D: Understanding GSx RAM

Part Number: TMS320F28388D
Other Parts Discussed in Thread: C2000WARE

Dear Experts,
I am executing a test to verify whether any interference occurs in one CPU due to read/write operations performed by the other CPU.
In both scenarios I am using different IPC counters to capture the delta time (gf32TimeUs). In both scenarios, I have two sections
of code which get executed on the loaded core based on gu16CpuId from the System Controller module.

In all scenarios, I built the output (.out) file with the function below enabled and loaded it to CPU1, and loaded a .out file with the same function disabled to CPU2. I got the timings below.

#1. Scenario:
-------------------------
Result/Output:
The timings observed are:

CPU1 Read -> 430.125 microseconds
CPU1 Write -> 430.125 microseconds

CPU2 Read -> 409.64 microseconds
CPU2 Write -> 430.125 microseconds

  • Code snippet:

void ReadIntinCPU1due2CPU2Read_SameBlock()
{
    UINT16 i = 0;
    while (1)
    {
        i = 0;
        if( CPUID_1 == gu16CpuId )
        {
            start = guniIpcRegs.gstrCpu1IpcRegs.IPCCOUNTERL;
            for (i = 0; i < u16size; i++)
            {
                gu16GS0_Array[i];                               // For write, replace this line with "gu16GS0_Array[i] = 6;"
            }
            end = guniIpcRegs.gstrCpu1IpcRegs.IPCCOUNTERL;
            gf32TimeUs = (FLOAT32)(end - start)*0.005f;
        }

        i = 0;
        if( CPUID_2 == gu16CpuId )
        {
            start = guniIpcRegs.gstrCpu2IpcRegs.IPCCOUNTERL;
            for (i = 0; i < u16size; i++)
            {
                gu16GS0_Array[i];                               // For write, replace this line with "gu16GS0_Array[i] = 7;"
            }
            end = guniIpcRegs.gstrCpu2IpcRegs.IPCCOUNTERL;
            gf32TimeUs = (FLOAT32)(end - start)*0.005f;
        }
    }
}
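As a side note, the conversion used above can be sketched as a small helper (the function name is mine; it assumes the IPC counter ticks at 200 MHz, i.e., 5 ns per count, which is what the 0.005f factor implies):

```c
#include <stdint.h>

/* Sketch of the timing conversion used above. Assumes the IPC counter
 * ticks at 200 MHz (5 ns per count), which is what the 0.005f scale
 * factor implies. Unsigned subtraction keeps the delta correct even
 * across a single 32-bit wrap of IPCCOUNTERL. */
static float CounterDeltaToUs(uint32_t start, uint32_t end)
{
    uint32_t ticks = end - start;   /* modular arithmetic handles wrap */
    return (float)ticks * 0.005f;   /* 1 count = 0.005 us */
}
```

In these units, 430.125 microseconds corresponds to 86,025 counts and 409.64 microseconds to 81,928 counts.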

#2. Scenario:
-------------------------
I have reversed the sequence of execution of the code blocks, i.e., (CPUID_2 == gu16CpuId) followed by (CPUID_1 == gu16CpuId), executed the same test, and got the result below.

#Result/Output:
The timings observed are:


CPU1 Read -> 409.64 microseconds
CPU1 Write -> 430.125 microseconds

CPU2 Read -> 430.125 microseconds
CPU2 Write -> 430.125 microseconds

  • Code Snippet:

void ReadIntinCPU1due2CPU2Read_SameBlock()
{
    UINT16 i = 0;
    while (1)
    {
        i = 0;
        if( CPUID_2 == gu16CpuId )
        {
            start = guniIpcRegs.gstrCpu2IpcRegs.IPCCOUNTERL;
            for (i = 0; i < u16size; i++)
            {
                gu16GS0_Array[i];                               // For write, replace this line with "gu16GS0_Array[i] = 7;"
            }
            end = guniIpcRegs.gstrCpu2IpcRegs.IPCCOUNTERL;
            gf32TimeUs = (FLOAT32)(end - start)*0.005f;
        }

        i = 0;
        if( CPUID_1 == gu16CpuId )
        {
            start = guniIpcRegs.gstrCpu1IpcRegs.IPCCOUNTERL;
            for (i = 0; i < u16size; i++)
            {
                gu16GS0_Array[i];                              // For write, replace this line with "gu16GS0_Array[i] = 6;"
            }
            end = guniIpcRegs.gstrCpu1IpcRegs.IPCCOUNTERL;
            gf32TimeUs = (FLOAT32)(end - start)*0.005f;
        }
    }
}

#Queries/Observations:

  • If I compare the Read timings on CPU1, I see a difference: in Scenario #1, CPU1 Read -> 430.125 microseconds, whereas in Scenario #2, CPU1 Read -> 409.64 microseconds. The same happens for CPU2 Read: in Scenario #1, CPU2 Read -> 409.64 microseconds, whereas in Scenario #2, CPU2 Read -> 430.125 microseconds.
  • The only difference between the two functions is the order of the blocks in which the CPUs perform their tasks, but that should not affect the timings, since each task is performed only on its particular CPU.


Q1. Why is this behavior observed?
Q2. Is there any impact of mastership?
Q3. Are there any restrictions on accessing the RAM sections, e.g., should GS0-GS4 be accessed by CPU1 and the rest by CPU2?

Please share any document that explains the internals of the GSx RAM to support this test.

  • Hello Surya,

    Q1. Why is this behavior observed?

    I think it may be helpful to look at the disassembly to see what the code compiles to. I would find it strange if the timing of the accesses differed without the compiled code itself being different, unless a read and a write are happening simultaneously; but if no other code in your program is running, that shouldn't be the cause.

    Q2. Is there any impact of mastership?

    I'm not sure what you mean by this. If you're asking whether this sort of timing difference is caused by which CPU owns the RAM, I don't think it is. Looking at the Memory Architecture diagram in the Memory Controller Module chapter of the reference manual, both CPU1 and CPU2 have the same type of access path to the GSxRAM, so the CPU that doesn't own the RAM doesn't appear to go through some alternate path. Can you tell me which CPU is selected to be the owner of the particular GSxRAM you're accessing? I'll ask the design experts to see if there's anything architectural that might explain the timing differences.

    Q3. Are there any restrictions on accessing the RAM sections, e.g., should GS0-GS4 be accessed by CPU1 and the rest by CPU2?

    No; based on what I see, all GSxRAM should behave as globally shared RAM, as described in the reference manual.
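
    For reference, ownership of each GSx block is selected per block via the GSxMSEL register. With the C2000Ware driverlib this is a single call; the sketch below uses driverlib's MemCfg API, and the choice of GS0/CPU2 is just an example:

    ```c
    #include "driverlib.h"   // C2000Ware MemCfg API

    // Example: hand GS0 to CPU2. At reset every GSx block belongs to
    // CPU1; GSxMSEL is written from the CPU1 core.
    void AssignGs0ToCpu2(void)
    {
        MemCfg_setGSRAMControllerSel(MEMCFG_SECT_GS0,
                                     MEMCFG_GSRAMCONTROLLER_CPU2);
    }
    ```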

  • Hi Amir,
    Q2. Is there any impact of mastership?

    Can you tell me which CPU is selected to be the owner of the particular GSxRAM you're accessing?

    I have checked with both CPUs one by one; both show the same behavior as mentioned earlier.

  • Hello Surya,

    I will bring this up with the design experts to see if they are familiar with why this might be; I should get a response by the end of the week.

    Would it be possible for you to provide me the code for both CPU1 and CPU2 that you used to verify this? I will try to test this on an F2838x device once I get the chance.

    Also, have you validated that the assembly code generated to access the GSx memories is essentially the same for both cores? Were there any differences in the Disassembly view that you noticed?

  • Hi Amir,
    I didn't notice any differences in the Disassembly view.

  • Please find below assembly code snippet:

    #1. Scenario:

    void ReadIntinCPU1due2CPU2Read_SameBlock()
    {
        UINT16 i = 0;
        while (1)
        {
            i = 0;
            if( CPUID_1 == gu16CpuId )
            {...
            }
    
            i = 0;
            if( CPUID_2 == gu16CpuId )
            {...
            }
        }
    }
    
    //////////////////////////////////////////////////////
    
    622             if( CPUID_1 == gu16CpuId )
    0883a9:   761F025A    MOVW         DP, #0x25a
    0883ab:   920D        MOV          AL, @0xd
    0883ac:   5201        CMPB         AL, #0x1
    0883ad:   602A        SB           $C$L96, NEQ
    624                 start = guniIpcRegs.gstrCpu1IpcRegs.IPCCOUNTERL;
    0883ae:   761F1738    MOVW         DP, #0x1738
    0883b0:   060C        MOVL         ACC, @0xc
    0883b1:   761F0259    MOVW         DP, #0x259
    0883b3:   1E2C        MOVL         @0x2c, ACC
    625                 for (i = 0; i < u16size; i++)
    0883b4:   2B41        MOV          *-SP[1], #0
    0883b5:   9226        MOV          AL, @0x26
    0883b6:   5441        CMP          AL, *-SP[1]
    0883b7:   6909        SB           $C$L95, LOS
    0883b8:   8F00D000    MOVL         XAR4, #0x00d000
    627                     gu16GS0_Array[i];// = 7;
            $C$L94:
    0883ba:   5841        MOVZ         AR0, *-SP[1]
    0883bb:   9294        MOV          AL, *+XAR4[AR0]
    625                 for (i = 0; i < u16size; i++)
    0883bc:   0A41        INC          *-SP[1]
    0883bd:   9226        MOV          AL, @0x26
    0883be:   5441        CMP          AL, *-SP[1]
    0883bf:   66FB        SB           $C$L94, HI
    629                 end = guniIpcRegs.gstrCpu1IpcRegs.IPCCOUNTERL;
            $C$L95:
    0883c0:   761F1738    MOVW         DP, #0x1738
    0883c2:   060C        MOVL         ACC, @0xc
    0883c3:   761F0259    MOVW         DP, #0x259
    0883c5:   1E2E        MOVL         @0x2e, ACC
    630                 gf32TimeUs = (FLOAT32)(end - start)*0.005f;

    #2. Scenario:

    void ReadIntinCPU1due2CPU2Read_SameBlock()
    {
        UINT16 i = 0;
        while (1)
        {
            i = 0;
            if( CPUID_2 == gu16CpuId )
            {...
            }
    
            i = 0;
            if( CPUID_1 == gu16CpuId )
            {...
            }
        }
    }
    
    ///////////////////////////////////////////////////////////
    
    633             if( CPUID_1 == gu16CpuId )
            $C$L96:
    0883d7:   761F025A    MOVW         DP, #0x25a
    0883d9:   920D        MOV          AL, @0xd
    0883da:   5201        CMPB         AL, #0x1
    0883db:   60CD        SB           $C$L93, NEQ
    635                         start = guniIpcRegs.gstrCpu1IpcRegs.IPCCOUNTERL;
    0883dc:   761F1738    MOVW         DP, #0x1738
    0883de:   060C        MOVL         ACC, @0xc
    0883df:   761F0259    MOVW         DP, #0x259
    0883e1:   1E2C        MOVL         @0x2c, ACC
    636                         for (i = 0; i < u16size; i++)
    0883e2:   2B41        MOV          *-SP[1], #0
    0883e3:   9226        MOV          AL, @0x26
    0883e4:   5441        CMP          AL, *-SP[1]
    0883e5:   6909        SB           $C$L98, LOS
    0883e6:   8F00D000    MOVL         XAR4, #0x00d000
    638                             gu16GS0_Array[i];// = 7;
            $C$L97:
    0883e8:   5841        MOVZ         AR0, *-SP[1]
    0883e9:   9294        MOV          AL, *+XAR4[AR0]
    636                         for (i = 0; i < u16size; i++)
    0883ea:   0A41        INC          *-SP[1]
    0883eb:   9226        MOV          AL, @0x26
    0883ec:   5441        CMP          AL, *-SP[1]
    0883ed:   66FB        SB           $C$L97, HI
    640                         end = guniIpcRegs.gstrCpu1IpcRegs.IPCCOUNTERL;
            $C$L98:
    0883ee:   761F1738    MOVW         DP, #0x1738
    0883f0:   060C        MOVL         ACC, @0xc
    0883f1:   761F0259    MOVW         DP, #0x259
    0883f3:   1E2E        MOVL         @0x2e, ACC
    641                         gf32TimeUs = (FLOAT32)(end - start)*0.005f;

  • Find the configuration code below:

    CPU Selection:

    ScGsMemInit_Struct strGsMemInit =
    {
    .uniGsxMsel.bit.MSEL_GS0 = GS_MSEL_CPU2,
    ...

    Function invocation:

    ScConfigGsMemory(&strGsMemInit, BOOL_FALSE, BOOL_FALSE);

    Inside function "ScConfigGsMemory":

    gstrMemCfgRegs.GSxMSEL.all = 0x0;
    gstrMemCfgRegs.GSxINIT.all = 0x0000FFFF;

    # It is necessary and urgently required to understand this behavior of the GSx RAM, as this is part of an aerospace project. So, please share the internal bus architecture of the GSx RAM, if any is available.

  • Hello Surya,

    It looks like there are no notable differences in the assembly. Are you just using CPU2 as the core for your project? I'm also confused about how the different scenarios are loaded to each core: are you loading Scenario 1 to CPU1 and Scenario 2 to CPU2, vice versa, or something else?

    I tried to create a similar example (I couldn't reproduce your exact example since I don't have your full code available), and I wasn't able to find any difference in the access times for either CPU:

    #include "driverlib.h"
    #include "device.h"
    #include "board.h"
    #include "c2000ware_libraries.h"
    
    uint64_t start, end;
    volatile uint16_t gu16GS0_Array[2048];
    float32_t time;
    
    void main(void)
    {
        MemCfg_setGSRAMControllerSel(MEMCFG_SECT_GS0, MEMCFG_GSRAMCONTROLLER_CPU1);
        uint16_t i = 0;
        start = IPC_getCounter(IPC_CPU1_L_CPU2_R);
        for (i = 0; i < 2048; i++)
        {
            gu16GS0_Array[i];// = 6;
        }
        end = IPC_getCounter(IPC_CPU1_L_CPU2_R);
        time = (float32_t)(end - start)*0.005f;
        while (1);
    }
    

    This code is the same for both CPUs; the only item I changed was the macro for accessing the IPC timer. I loaded the code to both CPUs separately and tested reads vs. writes; both times, on both cores, I got 174.474991 for my time variable. I don't think I'll be able to replicate this on my side unless either 1) you provide me your entire project with all the variables/structs you used, or 2) you re-create a simplified test case from an example in C2000Ware so I can easily copy the code you added and test the timing. If neither of these can happen, then there's not much I can do to support this thread.
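
    As a back-of-the-envelope check (the helper below is hypothetical and assumes the 200 MHz / 5 ns-per-count rate implied by the 0.005f factor used in this thread), it can help to convert the measured times into cycles per loop iteration:

    ```c
    /* Hypothetical helper: CPU cycles per loop iteration from a measured
     * time, assuming a 200 MHz cycle counter (5 ns per count). */
    static double cycles_per_iter(double time_us, unsigned iterations)
    {
        return (time_us / 0.005) / (double)iterations;
    }
    ```

    Plugging in, my 174.474991 microseconds over 2048 elements works out to roughly 17 cycles per element, while your 430.125 microseconds would be roughly 42 cycles per element if your u16size is also 2048, so the two loops may not be executing with the same cycle cost per access.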

    So, share the internal bus architecture of the GSx RAM, if any available.

    I will check to see where I can find this, but I believe it may be under NDA.

  • Hi Amir,

    Are you just using CPU2 as the core for your project?

    We are using both cores for the project. Right now I am testing the timing for each CPU separately so that we can take that (no-interference) timing as a baseline and measure the deviation in timing due to interference for both CPUs.

    I'm confused how to different scenarios are loaded to each core, are you loading scenario 1 to CPU1 and scenario 2 to CPU2, vice versa, or something else?

    #1. Scenario:

    • I built the code with the function below enabled and loaded it to CPU1.
      • void ReadIntinCPU1due2CPU2Read_SameBlock()
        {
            UINT16 i = 0;
            while (1)
            {
                i = 0;
                if( CPUID_1 == gu16CpuId )
                {
                    start = guniIpcRegs.gstrCpu1IpcRegs.IPCCOUNTERL;
                    for (i = 0; i < u16size; i++)
                    {
                        gu16GS0_Array[i];   // For write, replace this line with "gu16GS0_Array[i] = 6;"
                    }
                    end = guniIpcRegs.gstrCpu1IpcRegs.IPCCOUNTERL;
                    gf32TimeUs = (FLOAT32)(end - start)*0.005f;
                }
        
                i = 0;
                if( CPUID_2 == gu16CpuId )
                {
                    start = guniIpcRegs.gstrCpu2IpcRegs.IPCCOUNTERL;
                    for (i = 0; i < u16size; i++)
                    {
                        gu16GS0_Array[i];   // For write, replace this line with "gu16GS0_Array[i] = 7;"
                    }
                    end = guniIpcRegs.gstrCpu2IpcRegs.IPCCOUNTERL;
                    gf32TimeUs = (FLOAT32)(end - start)*0.005f;
                }
            }
        }

    • For CPU2, I commented out the same (above) function, built it, and loaded it to CPU2.
    • The important point is that I have also kept the CPU2 if statement in the function loaded to CPU1, i.e.:
      •     if( CPUID_2 == gu16CpuId )
                   {
                        start = guniIpcRegs.gstrCpu2IpcRegs.IPCCOUNTERL;
                        for (i = 0; i < u16size; i++)
                       {
                            gu16GS0_Array[i];           // For write, replace this line with "gu16GS0_Array[i] = 7;"
                        }
                        end = guniIpcRegs.gstrCpu2IpcRegs.IPCCOUNTERL;
                         gf32TimeUs = (FLOAT32)(end - start)*0.005f;
                    }

    • Ideally, CPU2 in this case should not affect the CPU1 read/write timing, as we load the build with this function enabled only to CPU1.
    • But I have found that the CPU2 operations (read and write) are also affecting the CPU1 read/write timing.
    • Case_1: When the CPU2 "if statement part" contains the line of code below, it affects the timing of CPU1 and gives a different timing result.
      • for (i = 0; i < u16size; i++)
            {
                gu16GS0_Array[i];           
            }

    • Case_2: When the CPU2 "if statement part" contains the line of code below, it affects the timing of CPU1 and gives a different timing result compared to Case_1.
      • for (i = 0; i < u16size; i++)
            {
                gu16GS0_Array[i] = 7;           
            }

    #2. Scenario: Vice-versa

    I think my confusion is the context behind your numbers, i.e., which cores you were testing at the time. For example, for Scenario #1, which way did you obtain the timing?

    Case A (A1 and A2 separately run on the device):

    • A1: Core 1 given GS0 control:
      • Core 1:
        • CPU1 Read -> 430.125 microseconds
      • Core 2: 
        • CPU2 Write -> 430.125 microseconds
    • A2: Core 2 given GS0 control:
      • Core 1:
        • CPU1 Write -> 430.125 microseconds
      • Core 2: 
        • CPU2 Read -> 409.64 microseconds

    Case B (B1 and B2 separately run on the device):

    • B1: Core 1 given GS0 control:
      • Core 1:
        • CPU1 Read -> 430.125 microseconds
      • Core 2: 
        • CPU2 Read -> 409.64 microseconds
    • B2: Core 2 given GS0 control:
      • Core 1:
        • CPU1 Write -> 430.125 microseconds
      • Core 2: 
        • CPU2 Write -> 430.125 microseconds

    Case C:

    • Core 1:
      • CPU1 Read -> 430.125 microseconds
      • CPU1 Write -> 430.125 microseconds
    • Core 2: 
      • CPU2 Read -> 409.64 microseconds
      • CPU2 Write -> 430.125 microseconds

    Keep in mind cases B2 and C are invalid, since you cannot have both cores writing to a global RAM without changing the mux that selects the GS0RAM controller, and your code doesn't show any changing of the GS0 controller. If you are changing the GS0RAM controller between the cores, let me know.

    If you are doing case A, I wasn't able to re-create this on the hardware I have, so try running the code I attached previously and see if it still results in inconsistent timings. If the timings are still inconsistent, it may have something to do with your subsequent conditionals, which are not supposed to be entered; but I don't have your full project, so I can't replicate this on my side.
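
    One way to rule out the dead conditionals entirely is to build a core-specific measurement loop. Below is a sketch based on my earlier driverlib example; the MEASURE_ON_CPU1 build flag is hypothetical, and the IPC counter macros are the local-counter variants from driverlib's ipc.h:

    ```c
    #include "driverlib.h"  // C2000Ware

    uint64_t start, end;
    volatile uint16_t gu16GS0_Array[2048];
    float32_t gf32TimeUs;

    // Select the local IPC counter at build time so each .out contains
    // only its own loop -- no dead branch on either CPU.
    #ifdef MEASURE_ON_CPU1
    #define LOCAL_IPC_COUNTER  IPC_CPU1_L_CPU2_R
    #else
    #define LOCAL_IPC_COUNTER  IPC_CPU2_L_CPU1_R
    #endif

    void MeasureGs0Read(void)
    {
        uint16_t i;
        start = IPC_getCounter(LOCAL_IPC_COUNTER);
        for (i = 0; i < 2048; i++)
        {
            gu16GS0_Array[i];   // read; change to "= 6;" for the write test
        }
        end = IPC_getCounter(LOCAL_IPC_COUNTER);
        gf32TimeUs = (float32_t)(end - start) * 0.005f;
    }
    ```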