
C6678, L2RAM performance issue



Hello

I'm working on c6678, CCS 6.1.2, compiler v8.1.0

My application code is stored in MSMCSRAM and my data in L2SRAM.

I have a function that reads input data from a big buffer (57600 bytes)

I have 2 input buffers, the first allocated at 0x800000 (L2SRAM), the second at 0x80E100 (L2SRAM)

Passing a pointer to the first buffer, the function takes 800 us to complete (I measure the time with an oscilloscope).
Passing a pointer to the second buffer, the function takes 700 us to complete.

Obviously, the two input buffers contain the same test data pattern.

Can you explain the 100 us difference?

Thanks

  • Hi Fabio,
    I have assigned this thread to our expert. Thank you for your patience.
  • Before we continue, let's do the following:
    Disable the L1D and L1P caches, repeat the experiment, and report back the performance with the caches disabled.

    Ran
  • Hi
    With the L1D and L1P caches disabled, the times are equal for both buffers: 2400 us.

    Fabio
  • OK

    For the second buffer, the code is already in the L1P cache, so it takes less time.

    And you may have other variables in L2 that cause cache thrashing in L1D.


    To check for the first item do the following:

    Enable only L1P - not L1D

    Now run the routine twice on the first buffer (the vector in the first part of L2), then on the second one (the vector in the second half of L2), and compare the numbers.

    If the times are the same, this is an L1P issue.

    Report to me again

    Thanks

    Ran
  • Sorry, but I forgot to say one important thing.

    I did 3 different tests.

    First (original pseudocode):

    main()
    {
        int32 buffer1[SIZE];
        int32 buffer2[SIZE];

        // fill buffer1 and buffer2

        myFunction(buffer1);   // 800 us
        myFunction(buffer2);   // 700 us
    }

    Second:

    main()
    {
        int32 buffer1[SIZE];
        int32 buffer2[SIZE];

        // fill buffer1 and buffer2

        myFunction(buffer2);   // 700 us
        myFunction(buffer2);   // 700 us
    }

    Third:

    main()
    {
        int32 buffer1[SIZE];
        int32 buffer2[SIZE];

        // fill buffer1 and buffer2

        myFunction(buffer1);   // 800 us
        myFunction(buffer1);   // 800 us
    }

    So buffer2 takes less time even if I never call myFunction(buffer1).

  • OK, so we have solved the issue.

    The difference is that L1P fetches the program from MSMC memory only on the first run. After that the program resides in L1P, and the call takes only 700 us regardless of where the data resides.

    I close the thread

    Ran
  • Sorry, but I think I explained the problem badly, so let's start from scratch.

    My original code actually looks like this

    int32 buffer1[SIZE];
    int32 buffer2[SIZE];
    int pingpong = 1;

    // This function is called periodically after an IRQ
    void execute()
    {
        if (pingpong) myFunction(buffer1);  // 800 us
        else          myFunction(buffer2);  // 700 us

        pingpong = !pingpong;
    }

    So I use buffer1 and buffer2 alternately, and the time sequence is 800, 700, 800, 700, 800, 700, ...

    I modified the code as follows (always using buffer1):

    if (pingpong) myFunction(buffer1);
    else          myFunction(buffer1);  // always buffer1

    and the time sequence is 800, 800, 800, 800, 800, 800, ...

    I modified the code as follows (always using buffer2):

    if (pingpong) myFunction(buffer2);
    else          myFunction(buffer2);  // always buffer2

    and the time sequence is 700, 700, 700, 700, 700, 700, ...

    So, as you can see, buffer1 always takes 800 us while buffer2 always takes 700 us.

    Then you suggested disabling the L1D and L1P caches; with both disabled I always get 2400, 2400, 2400, 2400, ... in all three cases above.

  • Additional info: I tried the following.

    Instead of placing buffer1 at 0x800000 and buffer2 at 0x80E100, I placed buffer1 at 0x811000 and buffer2 at 0x81F100.

    In this case I get 700, 700, 700, 700, 700, ...
  • OK

    You should analyze your system and look at two things: the cache structure and the memory bank structure.

    I am enclosing some slides that explain how the cache and the memory banks work, in case you are not familiar with the way the cache and L1D work.

    Ran

    /cfs-file/__key/communityserver-discussions-components-files/791/4571.8004.Direct-Cache-Structure.pptx

  • I got no response for a week, so I am closing the thread.