slow memcpy

Rob Barton

Using a 28375D, on CPU2, I am calling memcpy:

memcpy(&theComplexMatrixResult->theComplexArray[0], &theComplexMatrixSource->theComplexArray[0], 4*theComplexMatrixResult->rows*theComplexMatrixResult->cols );

The first few machine-language instructions of memcpy in the disassembly window are:

CMPB AL, #0x00
MOVL @P, XAR4
MOVL XAR0, @XAR4
SB C$L2, EQ
MOVZ AR7, @AL
SUBB XAR7, #1
MOVZ AR6, @AR7
C$L1:
MOVZ AR7, *XAR5++
MOV *XAR0++, AR7
BANZ 65534, AR6--

Now, just focusing on the main loop (the last 3 lines):

When I use the timer clock to time the cycles, it is taking 1 cycle for the MOVZ, 16 cycles for the MOV, and 32 for the BANZ.

Is there some memory conflict with the particular memories it is copying from that causes this delay? Is there something I can do to ensure that this part of the loop takes only 3 cycles instead of 49? This adds up fast when copying memory.

Why am I observing such a long delay?

I have confirmed that this delay is real by more than just the timer clock. Since I am trying to execute an interrupt between samples, I can adjust my sample rate, which controls how many machine cycles I have between interrupts to do this copy. By experimentation I find that when I decrease the sample rate (increasing the number of cycles available), everything starts to work once I give it enough time to copy the data. Using the estimate of 49 cycles per copy, I can set the sample rate so that enough cycles exist, and the code does execute, so it does appear that it is going VERY slowly.
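The cycle budget described here can be sketched as a small calculation. This is only an illustration: the helper names are made up for this example, and the matrix dimensions in the usage note are hypothetical because the real ones are not given. It assumes the length passed to memcpy counts 16-bit words, which matches the loop above moving one word per iteration.

```c
#include <stdio.h>

/* Inner-loop cost per copied word, from the timings in this post:
 * 1 (MOVZ) + 16 (MOV) + 32 (BANZ) = 49 cycles. */
#define OBSERVED_CYCLES_PER_WORD (1UL + 16UL + 32UL)

/* The per-word cost the post is hoping for. */
#define HOPED_CYCLES_PER_WORD 3UL

/* Words handed to memcpy for a rows x cols matrix, using the same
 * 4 * rows * cols expression as the call above. (Hypothetical helper.) */
static unsigned long copy_words(unsigned long rows, unsigned long cols)
{
    return 4UL * rows * cols;
}

/* Total inner-loop cycles for a copy of the given length.
 * Call/return and loop setup overhead are ignored. (Hypothetical helper.) */
static unsigned long copy_cycles(unsigned long words,
                                 unsigned long cycles_per_word)
{
    return words * cycles_per_word;
}

/* Prints the budget for one matrix size at both per-word costs. */
static void print_budget(unsigned long rows, unsigned long cols)
{
    unsigned long words = copy_words(rows, cols);

    printf("words copied:    %lu\n", words);
    printf("observed cycles: %lu\n",
           copy_cycles(words, OBSERVED_CYCLES_PER_WORD));
    printf("hoped cycles:    %lu\n",
           copy_cycles(words, HOPED_CYCLES_PER_WORD));
}
```

For a hypothetical 8 x 8 matrix, print_budget(8, 8) reports 256 words, 12544 cycles at the observed rate and 768 cycles at the hoped-for rate, which is the gap the sample rate has to absorb.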

This is executing in RAM, not flash.

What could be the source of the conflict?

over 8 years ago

0 Richard Poley over 8 years ago

TI__Mastermind 27200 points

Rob,
Thanks for the post, and sorry for the delay in responding. I have not yet succeeded in replicating this behaviour. It should be taking 4 cycles at the BANZ and that is what I'm seeing on my setup.
When you step through can you let me know what values are in XAR5 and XAR0 please? In other words, what are the physical source and destination addresses? Thanks.
Regards,
Richard

0 Rob Barton over 8 years ago in reply to Richard Poley

Expert 2220 points

It turns out it was putting memcpy in flash, so the slowness came from flash execution.

Since I am relying on the TI ramfuncs compiler option to mark all functions to be moved to RAM, it turns out that with the default file setup it does not mark memcpy to be moved to RAM.

While I can manually alter the TI-supplied files, is there any automatic way to get it to move memcpy to RAM (TI ramfuncs)?

This is not the only function that was left in flash; it was just a critical one, so I noticed it. When I reviewed the map file, there were a few functions left in flash, all of them from TI's

rts2800_fpu32.lib

Is there any way to have it include any functions it pulls from this library (or others) into the ti ramfuncs group?

0 Vivek Singh over 8 years ago in reply to Rob Barton

TI__Guru** 115281 points

Hi Rob,

Yes, by default this is placed in the .text section.

You can use the following method in the linker cmd file to allocate a function from a library (in this case memcpy) to any section.

.memcpy : > MEMORY_SEGMENT_NAME PAGE = 0
{
--library=rts2800_fpu32.lib<memcpy.obj> (.text)
}

But I am not sure how you are planning to use this after putting it into RAM. Any code in RAM needs to be copied from flash, and the memcpy function is used for that. Content in RAM can only be used for debug with CCS connected, not in a standalone run.

Regards,

Vivek Singh
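One possible way around the circular dependency raised in this reply is to do the initial flash-to-RAM copy with a small loop that does not call memcpy. The following is a minimal sketch, not TI's method: the function name is hypothetical, and it assumes the routine itself stays in flash and that the project supplies the load address, run address, and size (in 16-bit words) of the RAM-resident code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Word-by-word copy that does not depend on memcpy.
 * Assumption: on the target this function is left in flash, so it can
 * run at startup before any code has been copied to RAM.
 * (Hypothetical helper; the name is not from any TI library.)
 */
static void copy_load_to_run(uint16_t *run_start,
                             const uint16_t *load_start,
                             size_t size_in_words)
{
    /* Copy one word per iteration until the whole image has moved. */
    while (size_in_words > 0U) {
        *run_start++ = *load_start++;
        size_in_words--;
    }
}
```

Some optimizing compilers recognize a loop like this and replace it with a call to memcpy, so it is worth checking the generated assembly or the map file to confirm that the startup copy really is independent of the RAM-resident memcpy.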
