slow memcpy

Using the 28375D, on CPU2 I am calling memcpy:

memcpy(&theComplexMatrixResult->theComplexArray[0], &theComplexMatrixSource->theComplexArray[0], 4*theComplexMatrixResult->rows*theComplexMatrixResult->cols );

 

The first few machine-language instructions of memcpy in the disassembly window are:

CMPB     AL, #0x00
MOVL     @P, XAR4
MOVL     XAR0, @XAR4
SB       C$L2, EQ
MOVZ     AR7, @AL
SUBB     XAR7, #1
MOVZ     AR6, @AR7
C$L1:
MOVZ     AR7, *XAR5++
MOV      *XAR0++, AR7
BANZ     65534, AR6--

 

Now, focusing just on the main loop (the last three lines):

When I use the timer clock to count cycles, it's taking 1 cycle for the MOVZ, 16 cycles for the MOV and 32 for the BANZ.

Is there some memory conflict with the particular memories it's copying from that causes this delay? Is there something I can do to ensure that this part of the loop takes only 3 cycles instead of 49? This adds up fast when copying memory.

Why am I observing such a long delay?

I have confirmed this delay is real by more than just your clock. I am trying to execute an interrupt between samples, and the sample rate controls how many machine cycles I have between interrupts to do this copy. By experimentation, I find that when I decrease the sample rate (increasing the number of cycles available), everything starts to work once there is enough time to copy the data. Using the estimate of 49 cycles per word copied, I can set the sample rate so that enough cycles exist, and the code does execute, so it really does appear to be running VERY slowly.

 

This is executing from RAM, not flash.

What could be the source of the conflict?

  • Rob,
    Thanks for the post, and sorry for the delay in responding. I have not yet succeeded in replicating this behaviour. It should be taking 4 cycles at the BANZ and that is what I'm seeing on my setup.
    When you step through, can you let me know what values are in XAR5 and XAR0, please? In other words, what are the physical source and destination addresses? Thanks.
    Regards,
    Richard
  • It turns out memcpy was being placed in flash, so the slowness came from executing it from flash.

    Since I am relying on the TI ramfuncs compiler option to mark all functions to be moved to RAM, it turns out that with the default setup of the supplied files, memcpy is not marked to move to RAM.

    While I can manually alter the TI-supplied files, is there any automatic way to get memcpy moved to RAM (TI ramfuncs)?

    memcpy is not the only function left in flash; it was just a critical one, so I noticed it. When I reviewed the map file, I found a few functions that were left in flash, all of them from the TI library:

    rts2800_fpu32.lib

    Is there any way to have the functions it pulls from this library (or others) included in the TI ramfuncs group?

  • Hi Rob,

    Yes, by default this is placed into the .text section.

    You can use the following method in the linker command file to allocate a library function (in this case memcpy) to any section:

    .memcpy : > MEMORY_SEGMENT_NAME PAGE = 0
    {
    --library=rts2800_fpu32.lib<memcpy.obj> (.text)
    }

    But I am not sure how you are planning to use this after putting it into RAM. Any code that runs from RAM needs to be copied there from flash first, and the memcpy function is what is used for that copy. Content allocated only to RAM can be used only for debugging with CCS connected, not in a standalone run.

    Regards,

    Vivek Singh