TMS320F28379D: Alternate/Intrinsics for memcpy function

Part Number: TMS320F28379D
Other Parts Discussed in Thread: C2000WARE

Hello all,

I'm looking for an alternative to the memcpy function because it is taking too long to execute. I also looked at the documentation, but I could not work out what to use.

Kindly help in this regard.

Thanks in advance,

Raju

  • Hi all,

    At the same time, I do not want to change my build settings to use --unified_memory.

    I would like to know of any other possible alternative implementation of memcpy.

    Thanks in advance,

    Raju

  • Hi,

    Can you give more details on the problem and the alternate ways you have tried?

    Thanks
    Vasudha

  • Hi Vasudha,

    Thanks for responding to my query.

    • In fact, I have gone through spru514w for the memcpy intrinsic. It gives the option of using __builtin_memcpy(), which I tried, but it gave no reduction in clock cycles.

    • Also, page 47 states the following. Can you give an opinion on that?
      • For details about intrinsics, and a list of the intrinsics, see Section 7.6. In addition to those listed, abs and memcpy are implemented as intrinsics.

    • Also, since I'm using model-based design, kindly suggest the format in which I have to specify the -mt (--unified_memory) setting.

    • Also, I read that memcpy uses the RPT and PREAD instructions, but I'm not able to see them in the disassembly window. My disassembly is shown below. If I have to use RPT and PREAD, can you suggest the appropriate lines to add?
      • memcpy():
        08117f: 5200 CMPB AL, #0x0
        081180: A8AB MOVL @P, XAR4
        081181: C5A4 MOVL XAR7, @XAR4
        081182: 6107 SB C$L2, EQ
        081183: 88A9 MOVZ AR6, @AL
        081184: DE81 SUBB XAR6, #1
        C$L1:
        081185: 5C85 MOVZ AR4, *XAR5++
        081186: 7C87 MOV *XAR7++, AR4
        081187: 000EFFFE BANZ -2,AR6--
        C$L2:
        081189: 88A9 MOVZ AR6, @AL
        08118a: 0FA6 CMPL ACC, @XAR6
        08118b: 610F SB C$L5, EQ
        08118c: 5300 CMPB AH, #0x0
        08118d: 610D SB C$L5, EQ
        08118e: 9DFF ADDB AH, #-1
        08118f: 5CA8 MOVZ AR4, @AH
        C$L3:
        081190: 76BFFFFE MOVL XAR6, #0x3ffffe
        C$L4:
        081192: 9285 MOV AL, *XAR5++
        081193: 9687 MOV *XAR7++, AL
        081194: 000EFFFE BANZ -2,AR6--
        081196: 9285 MOV AL, *XAR5++
        081197: 9687 MOV *XAR7++, AL
        081198: 000CFFF8 BANZ -8,AR4--
        C$L5:
        08119a: A9A4 MOVL @XAR4, P
        08119b: 0006 LRETR

    Thanks in advance,

    Raju.
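For readers following the listing above: it is an out-of-line memcpy() whose loops move one 16-bit word per pass (a load, a store, then a BANZ branch), so no RPT or PREAD appears in it. A rough C equivalent is sketched below. The name `copy_words_loop` is hypothetical, and the count is taken in 16-bit words because that is what each pass of the listing moves; this is an illustration of the loop's shape, not the library source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the loop in the listing: one 16-bit word is
 * loaded and stored per pass, and the counter runs down to zero. */
void *copy_words_loop(void *dst, const void *src, size_t n_words)
{
    uint16_t *d = (uint16_t *)dst;
    const uint16_t *s = (const uint16_t *)src;

    assert(dst != NULL && src != NULL);

    while (n_words-- > 0u) {
        *d++ = *s++;   /* load/store pair with post-increment, as in C$L1 and C$L4 */
    }
    return dst;        /* the listing restores XAR4, the original destination pointer */
}
```

Because every word costs a load, a store, and a branch, the per-word cost of this shape is what any faster replacement would have to improve on.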

  • Hi Vasudha,

    I would like to add one screenshot: below is the initial disassembly before the memcpy function call.

    Looking forward to your valuable inputs,

    Thanks & Regards,

    Raju 

  • Hi,

    Are you using any compiler optimization in your project? Also, can you tell us more about the requirement? Is it to reduce the cycles of memcpy() itself, or is it related to inlining?

    Thanks
    Vasudha

  • Hi Vasudha,

    I kept the optimizations off, as shown in the picture below. (Note: I'm using Simulink settings.)

    Yes, my requirement is to reduce the cycles for memcpy.

    Thanks & Regards,

    Raju

  • Hi Vasudha, 

    I read the TI document spru514w on optimization, which says the following:

    But I couldn't see any such instructions in my disassembly. I need support in optimizing memcpy().

    Thanks & Regards,

    Raju

  • Hi,

    I am looking into this. How many cycles are you observing? Did you try the same with the optimization enabled? Is there any change in the cycles for the memcpy with optimization enabled?

    Meanwhile, did you try using the memcpy_fast asm implementation available under C2000Ware_3_04_00_00\libraries\dsp\FPU\c28\source\fpu32\utility\memcpy_fast.asm?

    Thanks
    Vasudha
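A routine of this kind is normally called like memcpy(), with the destination address first, the source address second, and the count last. The sketch below assumes that shape. The stand-in `copy_words`, the buffer names, and `MSG_WORDS` are all hypothetical, and the exact prototype and count type of memcpy_fast should be taken from the header that ships with the library rather than from this example.

```c
#include <assert.h>
#include <stdint.h>

#define MSG_WORDS 8u   /* hypothetical message length in 16-bit words */

/* Hypothetical stand-in so the sketch builds anywhere; on the target the
 * library routine would be called instead, if its prototype matches. */
void copy_words(void *dst, const void *src, uint16_t n)
{
    uint16_t *d = (uint16_t *)dst;
    const uint16_t *s = (const uint16_t *)src;

    assert(dst != 0 && src != 0);
    while (n-- > 0u) {
        *d++ = *s++;
    }
}

/* Hypothetical buffers: on the device the linker command file would place
 * these in the source RAM and in the destination (shared) RAM. */
uint16_t local_buf[MSG_WORDS];
uint16_t shared_buf[MSG_WORDS];

void ipc_write_sketch(void)
{
    /* destination address first, source address second, word count third */
    copy_words(shared_buf, local_buf, MSG_WORDS);
}
```

Keeping the copy behind one small wrapper such as `ipc_write_sketch` makes it easy to swap the underlying routine and compare cycle counts.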

  • Hi Vasudha,

    Good to see you back & thanks for the response.

    I also tried enabling optimization for faster execution. The corresponding setting is shown below, but I couldn't see any improvement; the disassembler still shows the same assembly code.

    Also, we are using the TMS320F28379D with a clock setting of 200 MHz for an IPC application. The memcpy (IPC write) copies from RAMGS2 to the shared RAM (CPU1TOCPU2RAM) at address 0x3FC00.

    The IPC application shares information between the CPUs every 5 µs, and the write function takes around 307 cycles (1.535 µs) to execute, of which memcpy takes 120 cycles (0.6 µs). Note: we are running these functions from RAM for faster execution.

    Thanks for sharing the code. I added it as shown below, but I'm not sure where to provide the source address and the destination address for memcpy.

    Looking forward to your valuable inputs.

    Regards,

    Raju.
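The cycle counts quoted in the post above can be cross-checked against the clock rate with a small helper. This is only a sketch of the arithmetic; the function names are hypothetical and the clock is assumed to be exactly 200 MHz as stated.

```c
#include <assert.h>

/* Convert a CPU cycle count to microseconds for a clock given in MHz. */
double cycles_to_us(unsigned long cycles, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return (double)cycles / clock_mhz;
}

/* Number of CPU cycles that fit in a period given in microseconds. */
double us_to_cycles(double period_us, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return period_us * clock_mhz;
}
```

With the figures from the post, 307 cycles at 200 MHz is 1.535 µs and 120 cycles is 0.6 µs, while a 5 µs period corresponds to 1000 cycles.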