AM2612: AM261x-LP: Shared memory slow access on R5FSS0-0

Part Number: AM2612

Dear Experts,

I am working on IPC communication between Core 1 and Core 0 of my AM261x-LP. I modified the l2_enet_switch example so that Core 1 receives the Ethernet packet and then forwards it to Core 0. The communication works like this:

  1. An Ethernet packet is received by Core 1.
  2. An ISR is generated and a semaphore is posted.
  3. The RX task runs and writes the dequeued packet into the memory shared between Core 0 and Core 1.
  4. Core 1 notifies Core 0 (IPC Notify mechanism) that a new packet has been received.
  5. Core 0 wakes up and copies the packet from the shared RAM into its own RAM.
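The five steps above can be modeled on a host PC with two threads, which is handy for checking the ordering of lock, copy and notify before looking at timing. This is a minimal sketch, not TI SDK code: a `std::atomic<bool>` stands in for the hardware spinlock (`SPINLOCK_0`), an atomic sequence counter stands in for the IPC Notify event, and `Frame`, `SharedSlot`, `producerCopy`, `consumerTake` and `runOnce` are hypothetical names. A coherent host also hides the cache-maintenance and MPU questions that matter on the R5F cores.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>
#include <thread>

// Hypothetical stand-in for EthFrame: one 2 KB slot, the size of the shared region.
struct Frame {
    std::array<std::uint8_t, 2048> bytes;
};

struct SharedSlot {
    std::atomic<bool> locked{false};   // models the hardware spinlock (SPINLOCK_0)
    std::atomic<std::uint32_t> seq{0}; // models the IPC Notify event (step 4)
    Frame frame{};                     // models the shared buffer (ethPkt)
};

void spinLock(std::atomic<bool> &l) {
    while (l.exchange(true, std::memory_order_acquire)) {
        std::this_thread::yield();     // busy-wait until the other side releases
    }
}

void spinUnlock(std::atomic<bool> &l) {
    l.store(false, std::memory_order_release);
}

// Steps 3-4: the RX task on Core 1 copies the frame in, then notifies Core 0.
void producerCopy(SharedSlot &s, const Frame &rx) {
    spinLock(s.locked);
    std::memcpy(&s.frame, &rx, sizeof(Frame));
    spinUnlock(s.locked);
    s.seq.fetch_add(1, std::memory_order_release);
}

// Step 5: Core 0 waits for the notification, then copies into its private RAM.
void consumerTake(SharedSlot &s, Frame &out, std::uint32_t lastSeen) {
    while (s.seq.load(std::memory_order_acquire) == lastSeen) {
        std::this_thread::yield();
    }
    spinLock(s.locked);
    std::memcpy(&out, &s.frame, sizeof(Frame));
    spinUnlock(s.locked);
}

// Runs one transfer end to end and returns what the "Core 0" side received.
Frame runOnce(const Frame &rx) {
    SharedSlot slot;
    Frame out{};
    std::thread core0(consumerTake, std::ref(slot), std::ref(out), 0u);
    std::thread core1(producerCopy, std::ref(slot), std::cref(rx));
    core1.join();
    core0.join();
    return out;
}
```

The notification is raised only after the lock is released, so the consumer never spins on the lock while the producer is still copying.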

Every access to the shared RAM is protected by a spinlock. I read the post on shared RAM, and my memory map is as follows:

  1. OCRAM Bank 0: SBL + Core 0
  2. OCRAM Bank 1: CPPI descriptors + Core 1 (including the Enet DMA packet pool)
  3. OCRAM Bank 2: memory shared between Core 0 and Core 1. It is just a 2 KB region (for one full Ethernet packet) that the packet is copied to and from. The MPU is configured like this for both cores: [image]
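In the SysConfig MPU module used by the syscfg files posted later in this thread, the `size` field appears to be a power-of-two exponent: `size = 11` for the 2 KB packet region at 0x70100000, `size = 14` for the 16 KB IPC region at 0x72000000, `size = 15` for the 32 KB TCMs. Assuming that encoding, a small host-side helper can check that a shared buffer lies entirely inside its MPU region and that the region base is aligned to the region size, as the ARMv7-R PMSA requires. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: SysConfig "size" = log2(region size in bytes), e.g. 11 -> 2 KB.
// Precondition: sizeExp <= 32.
constexpr std::uint64_t mpuRegionBytes(unsigned sizeExp) {
    return std::uint64_t{1} << sizeExp;
}

// PMSAv7 regions are at least 32 bytes and their base must be size-aligned.
constexpr bool mpuRegionValid(std::uint32_t base, unsigned sizeExp) {
    return sizeExp >= 5 && sizeExp <= 32 &&
           (base & (mpuRegionBytes(sizeExp) - 1)) == 0;
}

// True if [bufStart, bufStart + bufLen) lies entirely inside the MPU region.
// Precondition: sizeExp <= 32.
constexpr bool mpuRegionCovers(std::uint32_t base, unsigned sizeExp,
                               std::uint32_t bufStart, std::uint32_t bufLen) {
    const std::uint64_t end = std::uint64_t{base} + mpuRegionBytes(sizeExp);
    return bufStart >= base && std::uint64_t{bufStart} + bufLen <= end;
}
```

For example, `mpuRegionCovers(0x70100000u, 11, 0x70100000u, 0x800)` checks the ETH_PACKET_REGION against CONFIG_MPU_REGION4 on Core 0.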

The problem arises when I perform the memcpy on Core 0, and I cannot explain it: the copy takes exactly 85 µs, versus 3.75 µs on Core 1.
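Putting the two timings on a common scale makes them easier to compare with expected OCRAM bandwidth. This sketch assumes each memcpy moves the full 2 KB slot; the actual `sizeof(EthFrame)` is not stated in the post, and the constant names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Effective copy rate: bytes per microsecond equals MB/s (1 MB = 10^6 bytes).
constexpr double copyRateMBps(double bytes, double micros) {
    return bytes / micros;
}

// Timings from the post, assuming a 2048-byte copy on both cores.
constexpr double kCore0Rate = copyRateMBps(2048.0, 85.0);   // about 24 MB/s
constexpr double kCore1Rate = copyRateMBps(2048.0, 3.75);   // about 546 MB/s
constexpr double kSlowdown  = kCore1Rate / kCore0Rate;      // about 22.7x
```

Assuming the core runs at the 500 MHz listed as the device variant in the syscfg headers, 85 µs is roughly 42,500 CPU cycles (about 21 cycles per byte), while 3.75 µs is roughly 1,875 cycles (under one cycle per byte).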


The code snippets are:

#include <string.h>             // memcpy
#include <kernel/dpl/CacheP.h>  // CacheP_inv, CacheP_wbInv

namespace Common
{
// Shared buffer, placed by the linker in ETH_PACKET_REGION (0x70100000, 2 KB) on both cores
volatile EthFrame ethPkt __attribute__((aligned(128), section(".bss.eth_pkt")));
}

namespace Core0
{
static EthFrame internalFrame;

// Called inside the IPC callback
static void ethTakePkt()
{
    spinlock_lock(SPINLOCK_0);
    // Drop any stale L1D lines before reading the frame written by Core 1
    CacheP_inv((void*)&Common::ethPkt, sizeof(EthFrame), CacheP_TYPE_L1D);
    hal_gpio_on(DEBUG_PIN_0);
    memcpy((void*)&internalFrame, (void*)&Common::ethPkt, sizeof(EthFrame));
    hal_gpio_off(DEBUG_PIN_0);
    spinlock_unlock(SPINLOCK_0);
}
}

namespace Core1
{
// Called inside RX Task
static void copyFrame(EthFrame *frame)
{
    spinlock_lock(SPINLOCK_0);
    hal_gpio_on(DEBUG_PIN_1);
    memcpy((void*)&Common::ethPkt, (void*)frame, sizeof(EthFrame));
    hal_gpio_off(DEBUG_PIN_1);
    // Write back (and invalidate) so Core 0 reads the new frame from OCRAM
    CacheP_wbInv((void*)&Common::ethPkt, sizeof(EthFrame), CacheP_TYPE_L1D);
    spinlock_unlock(SPINLOCK_0);
}
}

I inspected the generated code and confirmed that the compiler does not optimize the memcpy away.

Any idea why Core 0 takes so much longer to copy the data?
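A side note on the snippet: `ethPkt` is declared `volatile`, and the `(void*)` casts strip that qualifier before `memcpy`. In both C and C++, accessing an object defined as volatile through a non-volatile lvalue is undefined behavior, so strictly the compiler may treat that copy like any other. If the `volatile` is kept, an explicit copy that preserves it could look like the sketch below, assuming the frame is held as an array of 32-bit words. `copyFromVolatile` and `copyToVolatile` are hypothetical helpers; word-by-word volatile accesses are usually slower than a library `memcpy`, so this addresses correctness, not speed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copies nWords 32-bit words out of volatile storage; every read stays volatile.
// Assumes both buffers are 4-byte aligned and sized in whole words.
void copyFromVolatile(std::uint32_t *dst, const volatile std::uint32_t *src,
                      std::size_t nWords) {
    for (std::size_t i = 0; i < nWords; ++i) {
        dst[i] = src[i];   // each src[i] is a separate volatile load
    }
}

// Mirror image for the producer side: each store is a separate volatile write.
void copyToVolatile(volatile std::uint32_t *dst, const std::uint32_t *src,
                    std::size_t nWords) {
    for (std::size_t i = 0; i < nWords; ++i) {
        dst[i] = src[i];
    }
}
```

Alternatively, dropping `volatile` from `ethPkt` and relying on the spinlock plus cache maintenance keeps the plain `memcpy` well defined.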

  • Hi Elia,

    Please allow me to review this and run some experiments; I will get back to you by tomorrow.

    Regards,
    Shaunak

  • Hi Elia,

    Can you also share the example.syscfg files for the two cores, so I can take a look at the memory configurator and MPU configurations?

    Regards,
    Shaunak

  • Sure, here are the syscfg files.

    I ran more tests, and it actually seems that memcpy is always slow, even outside the IPC and Ethernet path. I suspect a problem in the MPU configuration, but I do not really know where to look.

    ---------------------------Core 0-------------------------------------
    /**
     * These arguments were used when this file was generated. They will be automatically applied on subsequent loads
     * via the GUI or CLI. Run CLI with '--help' for additional information on how to override these arguments.
     * @cliArgs --device "AM261x_ZFG" --part "AM2612" --package "ZFG" --context "r5fss0-0" --product "MCU_PLUS_SDK_AM261x@11.01.00"
     * @v2CliArgs --device "AM2612" --package "NFBGA (ZFG)" --variant "500MHz" --context "r5fss0-0" --product "MCU_PLUS_SDK_AM261x@11.01.00"
     * @versions {"tool":"1.26.0+4407"}
     */
    
    /**
     * Import the modules used in this configuration.
     */
    const gpio       = scripting.addModule("/drivers/gpio/gpio", {}, false);
    const gpio1      = gpio.addInstance();
    const gpio2      = gpio.addInstance();
    const ipc        = scripting.addModule("/drivers/ipc/ipc");
    const rti        = scripting.addModule("/drivers/rti/rti", {}, false);
    const rti1       = rti.addInstance();
    const uart       = scripting.addModule("/drivers/uart/uart", {}, false);
    const uart1      = uart.addInstance();
    const debug_log  = scripting.addModule("/kernel/dpl/debug_log");
    const dpl_cfg    = scripting.addModule("/kernel/dpl/dpl_cfg");
    const mpu_armv7  = scripting.addModule("/kernel/dpl/mpu_armv7", {}, false);
    const mpu_armv71 = mpu_armv7.addInstance();
    const mpu_armv72 = mpu_armv7.addInstance();
    const mpu_armv73 = mpu_armv7.addInstance();
    const mpu_armv74 = mpu_armv7.addInstance();
    const mpu_armv75 = mpu_armv7.addInstance();
    const mpu_armv76 = mpu_armv7.addInstance();
    const mpu_armv77 = mpu_armv7.addInstance();
    const general    = scripting.addModule("/memory_configurator/general", {}, false);
    const general1   = general.addInstance();
    const region     = scripting.addModule("/memory_configurator/region", {}, false);
    const region1    = region.addInstance();
    const section    = scripting.addModule("/memory_configurator/section", {}, false);
    const section1   = section.addInstance();
    const section2   = section.addInstance();
    const section3   = section.addInstance();
    const section4   = section.addInstance();
    const section5   = section.addInstance();
    const section6   = section.addInstance();
    const section7   = section.addInstance();
    const section8   = section.addInstance();
    const section9   = section.addInstance();
    const section10  = section.addInstance();
    const section11  = section.addInstance();
    
    /**
     * Write custom configuration values to the imported modules.
     */
    gpio1.$name          = "GPIO_LED";
    gpio1.pinDir         = "OUTPUT";
    gpio1.GPIO_n.$assign = "GPIO84";
    
    gpio2.$name          = "DEBUG_PIN0";
    gpio2.pinDir         = "OUTPUT";
    gpio2.GPIO_n.$assign = "GPIO49";
    
    ipc.r5fss0_1     = "notify";
    ipc.intrPriority = 1;
    
    rti1.$name          = "CONFIG_RTI0";
    rti1.counter0Enable = true;
    rti1.compare0Enable = true;
    rti1.eventCallback0 = "rtiEvent0";
    rti1.nsecPerTick0   = 1000000;
    
    uart1.$name            = "CONFIG_UART0";
    uart1.baudRate         = 1000000;
    uart1.intrEnable       = "USER_INTR";
    uart1.rxTrigLvl        = "1";
    uart1.txTrigLvl        = "63";
    uart1.intrPriority     = 10;
    uart1.UART.$assign     = "UART0";
    uart1.UART.RXD.$assign = "GPIO27";
    uart1.UART.TXD.$assign = "GPIO28";
    uart1.child.$name      = "drivers_uart_v2_uart_v2_template0";
    
    debug_log.enableLogZoneWarning = false;
    debug_log.enableLogZoneError   = false;
    debug_log.enableCssLog         = false;
    
    mpu_armv71.$name             = "CONFIG_MPU_REGION0";
    mpu_armv71.size              = 31;
    mpu_armv71.attributes        = "Device";
    mpu_armv71.accessPermissions = "Supervisor RD+WR, User RD";
    mpu_armv71.allowExecute      = false;
    
    mpu_armv72.$name             = "CONFIG_MPU_REGION1";
    mpu_armv72.size              = 15;
    mpu_armv72.accessPermissions = "Supervisor RD+WR, User RD";
    
    mpu_armv73.$name             = "CONFIG_MPU_REGION2";
    mpu_armv73.baseAddr          = 0x80000;
    mpu_armv73.size              = 15;
    mpu_armv73.accessPermissions = "Supervisor RD+WR, User RD";
    
    mpu_armv74.$name             = "CONFIG_MPU_REGION3";
    mpu_armv74.baseAddr          = 0x70000000;
    mpu_armv74.size              = 21;
    mpu_armv74.accessPermissions = "Supervisor RD+WR, User RD";
    
    mpu_armv75.$name        = "CONFIG_MPU_REGION4";
    mpu_armv75.allowExecute = false;
    mpu_armv75.size         = 11;
    mpu_armv75.isShareable  = true;
    mpu_armv75.isBufferable = false;
    mpu_armv75.baseAddr     = 0x70100000;
    mpu_armv75.attributes   = "NonCached";
    
    mpu_armv76.$name        = "CONFIG_MPU_REGION5";
    mpu_armv76.baseAddr     = 0x50D00000;
    mpu_armv76.size         = 14;
    mpu_armv76.attributes   = "Device";
    mpu_armv76.allowExecute = false;
    
    mpu_armv77.$name        = "CONFIG_MPU_REGION6";
    mpu_armv77.baseAddr     = 0x72000000;
    mpu_armv77.size         = 14;
    mpu_armv77.attributes   = "NonCached";
    mpu_armv77.allowExecute = false;
    
    general1.$name        = "CONFIG_GENERAL0";
    general1.heap_size    = 1024;
    general1.stack_size   = 8192;
    general1.linker.$name = "TIARMCLANG0";
    
    region1.$name                               = "MEMORY_REGION_CONFIGURATION0";
    region1.memory_region.create(10);
    region1.memory_region[0].type               = "TCMA";
    region1.memory_region[0].$name              = "R5F_VECS";
    region1.memory_region[0].size               = 0x40;
    region1.memory_region[0].auto               = false;
    region1.memory_region[1].type               = "TCMA";
    region1.memory_region[1].$name              = "R5F_TCMA";
    region1.memory_region[1].size               = 0x7FC0;
    region1.memory_region[2].type               = "TCMB";
    region1.memory_region[2].size               = 0x8000;
    region1.memory_region[2].$name              = "R5F_TCMB";
    region1.memory_region[3].$name              = "SBL";
    region1.memory_region[3].auto               = false;
    region1.memory_region[3].size               = 0x40000;
    region1.memory_region[4].$name              = "OCRAM";
    region1.memory_region[4].auto               = false;
    region1.memory_region[4].manualStartAddress = 0x70040000;
    region1.memory_region[4].size               = 0x40000;
    region1.memory_region[5].type               = "FLASH";
    region1.memory_region[5].auto               = false;
    region1.memory_region[5].manualStartAddress = 0x60100000;
    region1.memory_region[5].size               = 0x80000;
    region1.memory_region[5].$name              = "FLASH";
    region1.memory_region[6].type               = "CUSTOM";
    region1.memory_region[6].$name              = "RTOS_NORTOS_IPC_SHM_MEM";
    region1.memory_region[6].auto               = false;
    region1.memory_region[6].manualStartAddress = 0x72000000;
    region1.memory_region[6].size               = 0x3E80;
    region1.memory_region[6].isShared           = true;
    region1.memory_region[6].shared_cores       = ["r5fss0-1"];
    region1.memory_region[7].type               = "CUSTOM";
    region1.memory_region[7].$name              = "MAILBOX_HSM";
    region1.memory_region[7].auto               = false;
    region1.memory_region[7].manualStartAddress = 0x44000000;
    region1.memory_region[7].size               = 0x3CE;
    region1.memory_region[7].isShared           = true;
    region1.memory_region[7].shared_cores       = ["r5fss0-1"];
    region1.memory_region[8].type               = "CUSTOM";
    region1.memory_region[8].$name              = "MAILBOX_R5F";
    region1.memory_region[8].auto               = false;
    region1.memory_region[8].manualStartAddress = 0x44000400;
    region1.memory_region[8].size               = 0x3CE;
    region1.memory_region[8].isShared           = true;
    region1.memory_region[8].shared_cores       = ["r5fss0-1"];
    region1.memory_region[9].$name              = "ETH_PACKET_REGION";
    region1.memory_region[9].isShared           = true;
    region1.memory_region[9].shared_cores       = ["r5fss0-1"];
    region1.memory_region[9].size               = 0x800;
    region1.memory_region[9].auto               = false;
    region1.memory_region[9].manualStartAddress = 0x70100000;
    
    section1.load_memory                  = "R5F_VECS";
    section1.group                        = false;
    section1.$name                        = "Vector Table";
    section1.output_section.create(1);
    section1.output_section[0].$name      = ".vectors";
    section1.output_section[0].palignment = true;
    
    section2.load_memory                  = "OCRAM";
    section2.$name                        = "Text Segments";
    section2.output_section.create(5);
    section2.output_section[0].$name      = ".text.hwi";
    section2.output_section[0].palignment = true;
    section2.output_section[1].$name      = ".text.cache";
    section2.output_section[1].palignment = true;
    section2.output_section[2].$name      = ".text.mpu";
    section2.output_section[2].palignment = true;
    section2.output_section[3].$name      = ".text.boot";
    section2.output_section[3].palignment = true;
    section2.output_section[4].$name      = ".text:abort";
    section2.output_section[4].palignment = true;
    
    section3.load_memory                  = "OCRAM";
    section3.$name                        = "Code and Read-Only Data";
    section3.output_section.create(2);
    section3.output_section[0].$name      = ".text";
    section3.output_section[0].palignment = true;
    section3.output_section[1].$name      = ".rodata";
    section3.output_section[1].palignment = true;
    
    section4.load_memory                  = "OCRAM";
    section4.$name                        = "Data Segment";
    section4.output_section.create(1);
    section4.output_section[0].$name      = ".data";
    section4.output_section[0].palignment = true;
    
    section5.load_memory                             = "OCRAM";
    section5.$name                                   = "Memory Segments";
    section5.output_section.create(3);
    section5.output_section[0].$name                 = ".bss";
    section5.output_section[0].output_sections_start = "__BSS_START";
    section5.output_section[0].output_sections_end   = "__BSS_END";
    section5.output_section[0].palignment            = true;
    section5.output_section[1].$name                 = ".sysmem";
    section5.output_section[1].palignment            = true;
    section5.output_section[2].$name                 = ".stack";
    section5.output_section[2].palignment            = true;
    
    section6.load_memory                              = "OCRAM";
    section6.$name                                    = "Stack Segments";
    section6.output_section.create(5);
    section6.output_section[0].$name                  = ".irqstack";
    section6.output_section[0].output_sections_start  = "__IRQ_STACK_START";
    section6.output_section[0].output_sections_end    = "__IRQ_STACK_END";
    section6.output_section[0].input_section.create(1);
    section6.output_section[0].input_section[0].$name = ". = . + __IRQ_STACK_SIZE;";
    section6.output_section[1].$name                  = ".fiqstack";
    section6.output_section[1].output_sections_start  = "__FIQ_STACK_START";
    section6.output_section[1].output_sections_end    = "__FIQ_STACK_END";
    section6.output_section[1].input_section.create(1);
    section6.output_section[1].input_section[0].$name = ". = . + __FIQ_STACK_SIZE;";
    section6.output_section[2].$name                  = ".svcstack";
    section6.output_section[2].output_sections_start  = "__SVC_STACK_START";
    section6.output_section[2].output_sections_end    = "__SVC_STACK_END";
    section6.output_section[2].input_section.create(1);
    section6.output_section[2].input_section[0].$name = ". = . + __SVC_STACK_SIZE;";
    section6.output_section[3].$name                  = ".abortstack";
    section6.output_section[3].output_sections_start  = "__ABORT_STACK_START";
    section6.output_section[3].output_sections_end    = "__ABORT_STACK_END";
    section6.output_section[3].input_section.create(1);
    section6.output_section[3].input_section[0].$name = ". = . + __ABORT_STACK_SIZE;";
    section6.output_section[4].$name                  = ".undefinedstack";
    section6.output_section[4].output_sections_start  = "__UNDEFINED_STACK_START";
    section6.output_section[4].output_sections_end    = "__UNDEFINED_STACK_END";
    section6.output_section[4].input_section.create(1);
    section6.output_section[4].input_section[0].$name = ". = . + __UNDEFINED_STACK_SIZE;";
    
    section7.load_memory                  = "OCRAM";
    section7.$name                        = "Initialization and Exception Handling";
    section7.output_section.create(3);
    section7.output_section[0].$name      = ".ARM.exidx";
    section7.output_section[0].palignment = true;
    section7.output_section[1].$name      = ".init_array";
    section7.output_section[1].palignment = true;
    section7.output_section[2].$name      = ".fini_array";
    section7.output_section[2].palignment = true;
    
    section8.load_memory                 = "RTOS_NORTOS_IPC_SHM_MEM";
    section8.type                        = "NOLOAD";
    section8.$name                       = "IPC Shared Memory";
    section8.group                       = false;
    section8.output_section.create(1);
    section8.output_section[0].$name     = ".bss.ipc_vring_mem";
    section8.output_section[0].alignment = 0;
    
    section9.load_memory                 = "MAILBOX_HSM";
    section9.type                        = "NOLOAD";
    section9.$name                       = "SIPC HSM Queue Memory";
    section9.group                       = false;
    section9.output_section.create(1);
    section9.output_section[0].$name     = ".bss.sipc_hsm_queue_mem";
    section9.output_section[0].alignment = 0;
    
    section10.load_memory                 = "MAILBOX_R5F";
    section10.$name                       = "SIPC R5F Queue Memory";
    section10.group                       = false;
    section10.type                        = "NOLOAD";
    section10.output_section.create(1);
    section10.output_section[0].$name     = ".bss.sipc_secure_host_queue_mem";
    section10.output_section[0].alignment = 0;
    
    section11.$name                       = "ETH_PACKET";
    section11.type                        = "NOLOAD";
    section11.group                       = false;
    section11.load_memory                 = "ETH_PACKET_REGION";
    section11.output_section.create(1);
    section11.output_section[0].$name     = ".bss.eth_pkt";
    section11.output_section[0].alignment = 128;
    
    /**
     * Pinmux solution for unlocked pins/peripherals. This ensures that minor changes to the automatic solver in a future
     * version of the tool will not impact the pinmux you originally saw.  These lines can be completely deleted in order to
     * re-solve from scratch.
     */
    rti1.RTI.$suggestSolution = "RTI2";
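To see which attributes the Core 0 MPU actually applies to a given address, note that on the Cortex-R5 (ARMv7-R PMSA), when enabled regions overlap, the highest-numbered region takes priority. The host-side sketch below resolves an address against the Core 0 regions in the syscfg above, assuming `size` is log2 of the region size in bytes and that a region without `baseAddr` starts at 0. The `Region` struct and `resolve` are hypothetical; "default" marks regions whose `attributes` field is not set in the file.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

struct Region {
    int number;
    std::uint32_t base;
    unsigned sizeExp;              // assumed: region size = 2^sizeExp bytes
    std::string_view attributes;
};

// Core 0 regions as listed in the syscfg above.
constexpr std::array<Region, 7> kCore0Regions{{
    {0, 0x00000000u, 31, "Device"},
    {1, 0x00000000u, 15, "default"},
    {2, 0x00080000u, 15, "default"},
    {3, 0x70000000u, 21, "default"},
    {4, 0x70100000u, 11, "NonCached"},
    {5, 0x50D00000u, 14, "Device"},
    {6, 0x72000000u, 14, "NonCached"},
}};

// Returns the region that governs addr: the highest-numbered match wins.
// Returns nullptr if no programmed region covers the address.
template <std::size_t N>
const Region *resolve(const std::array<Region, N> &regions, std::uint32_t addr) {
    const Region *hit = nullptr;
    for (const Region &r : regions) {
        const std::uint64_t end = std::uint64_t{r.base} + (std::uint64_t{1} << r.sizeExp);
        if (addr >= r.base && addr < end && (hit == nullptr || r.number > hit->number)) {
            hit = &r;
        }
    }
    return hit;
}
```

For instance, under these assumptions the Core 0 OCRAM at 0x70040000 resolves to CONFIG_MPU_REGION3, and the packet buffer at 0x70100000 resolves to the NonCached CONFIG_MPU_REGION4.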
    
    
    ---------------------------Core 1-------------------------------------
    /**
     * These arguments were used when this file was generated. They will be automatically applied on subsequent loads
     * via the GUI or CLI. Run CLI with '--help' for additional information on how to override these arguments.
     * @cliArgs --device "AM261x_ZFG" --part "AM2612" --package "ZFG" --context "r5fss0-1" --product "MCU_PLUS_SDK_AM261x@11.01.00"
     * @v2CliArgs --device "AM2612" --package "NFBGA (ZFG)" --variant "500MHz" --context "r5fss0-1" --product "MCU_PLUS_SDK_AM261x@11.01.00"
     * @versions {"tool":"1.26.0+4407"}
     */
    
    /**
     * Import the modules used in this configuration.
     */
    const eeprom             = scripting.addModule("/board/eeprom/eeprom", {}, false);
    const eeprom1            = eeprom.addInstance();
    const ethphy_cpsw_icssg  = scripting.addModule("/board/ethphy_cpsw_icssg/ethphy_cpsw_icssg", {}, false);
    const ethphy_cpsw_icssg1 = ethphy_cpsw_icssg.addInstance();
    const gpio               = scripting.addModule("/drivers/gpio/gpio", {}, false);
    const gpio1              = gpio.addInstance();
    const gpio2              = gpio.addInstance();
    const gpio3              = gpio.addInstance();
    const i2c                = scripting.addModule("/drivers/i2c/i2c", {}, false);
    const i2c1               = i2c.addInstance();
    const ipc                = scripting.addModule("/drivers/ipc/ipc");
    const clock              = scripting.addModule("/kernel/dpl/clock");
    const debug_log          = scripting.addModule("/kernel/dpl/debug_log");
    const dpl_cfg            = scripting.addModule("/kernel/dpl/dpl_cfg");
    const mpu_armv7          = scripting.addModule("/kernel/dpl/mpu_armv7", {}, false);
    const mpu_armv71         = mpu_armv7.addInstance();
    const mpu_armv72         = mpu_armv7.addInstance();
    const mpu_armv73         = mpu_armv7.addInstance();
    const mpu_armv74         = mpu_armv7.addInstance();
    const mpu_armv75         = mpu_armv7.addInstance();
    const mpu_armv76         = mpu_armv7.addInstance();
    const mpu_armv77         = mpu_armv7.addInstance();
    const mpu_armv78         = mpu_armv7.addInstance();
    const general            = scripting.addModule("/memory_configurator/general", {}, false);
    const general1           = general.addInstance();
    const region             = scripting.addModule("/memory_configurator/region", {}, false);
    const region1            = region.addInstance();
    const section            = scripting.addModule("/memory_configurator/section", {}, false);
    const section1           = section.addInstance();
    const section2           = section.addInstance();
    const section3           = section.addInstance();
    const section4           = section.addInstance();
    const section5           = section.addInstance();
    const section6           = section.addInstance();
    const section7           = section.addInstance();
    const section8           = section.addInstance();
    const section9           = section.addInstance();
    const section10          = section.addInstance();
    const section11          = section.addInstance();
    const section12          = section.addInstance();
    const section13          = section.addInstance();
    const enet_cpsw          = scripting.addModule("/networking/enet_cpsw/enet_cpsw", {}, false);
    const enet_cpsw1         = enet_cpsw.addInstance();
    
    /**
     * Write custom configuration values to the imported modules.
     */
    eeprom1.$name      = "CONFIG_EEPROM0";
    eeprom1.i2cAddress = 0x51;
    
    gpio1.$name          = "GPIO_DEBUG0";
    gpio1.pinDir         = "OUTPUT";
    gpio1.GPIO_n.$assign = "GPIO47";
    
    gpio2.$name          = "GPIO_DEBUG1";
    gpio2.pinDir         = "OUTPUT";
    gpio2.GPIO_n.$assign = "GPIO48";
    
    gpio3.$name          = "GPIO_DEBUG2";
    gpio3.pinDir         = "OUTPUT";
    gpio3.GPIO_n.$assign = "GPIO50";
    
    i2c1.$name               = "CONFIG_I2C0";
    eeprom1.peripheralDriver = i2c1;
    i2c1.I2C.$assign         = "I2C0";
    i2c1.I2C.SCL.$assign     = "GPIO135";
    i2c1.I2C.SDA.$assign     = "GPIO134";
    i2c1.I2C_child.$name     = "drivers_i2c_v1_i2c_v1_template1";
    
    ipc.r5fss0_0     = "notify";
    ipc.intrPriority = 1;
    
    clock.instance = "RTI0";
    
    debug_log.enableLogZoneWarning = false;
    debug_log.enableMemLog         = true;
    
    mpu_armv71.$name             = "CONFIG_MPU_REGION0";
    mpu_armv71.size              = 31;
    mpu_armv71.attributes        = "Device";
    mpu_armv71.accessPermissions = "Supervisor RD+WR, User RD";
    mpu_armv71.allowExecute      = false;
    
    mpu_armv72.$name             = "CONFIG_MPU_REGION1";
    mpu_armv72.size              = 15;
    mpu_armv72.accessPermissions = "Supervisor RD+WR, User RD";
    
    mpu_armv73.$name             = "CONFIG_MPU_REGION2";
    mpu_armv73.baseAddr          = 0x80000;
    mpu_armv73.size              = 15;
    mpu_armv73.accessPermissions = "Supervisor RD+WR, User RD";
    
    mpu_armv74.$name             = "CONFIG_MPU_REGION3";
    mpu_armv74.accessPermissions = "Supervisor RD+WR, User RD";
    mpu_armv74.baseAddr          = 0x70000000;
    mpu_armv74.size              = 21;
    
    mpu_armv75.$name      = "CONFIG_MPU_REGION4";
    mpu_armv75.size       = 12;
    mpu_armv75.baseAddr   = 0x70080000;
    mpu_armv75.attributes = "NonCached";
    
    mpu_armv76.$name        = "CONFIG_MPU_REGION5";
    mpu_armv76.allowExecute = false;
    mpu_armv76.isShareable  = true;
    mpu_armv76.isBufferable = false;
    mpu_armv76.baseAddr     = 0x50D00000;
    mpu_armv76.size         = 14;
    mpu_armv76.attributes   = "Device";
    
    mpu_armv77.$name        = "CONFIG_MPU_REGION6";
    mpu_armv77.baseAddr     = 0x72000000;
    mpu_armv77.size         = 14;
    mpu_armv77.allowExecute = false;
    mpu_armv77.attributes   = "NonCached";
    
    mpu_armv78.$name        = "CONFIG_MPU_REGION7";
    mpu_armv78.baseAddr     = 0x70100000;
    mpu_armv78.size         = 11;
    mpu_armv78.isShareable  = true;
    mpu_armv78.isBufferable = false;
    mpu_armv78.allowExecute = false;
    mpu_armv78.attributes   = "NonCached";
    
    general1.$name        = "CONFIG_GENERAL0";
    general1.stack_size   = 8192;
    general1.heap_size    = 1024;
    general1.linker.$name = "TIARMCLANG0";
    
    region1.$name                               = "MEMORY_REGION_CONFIGURATION0";
    region1.memory_region.create(6);
    region1.memory_region[0].type               = "TCMA";
    region1.memory_region[0].$name              = "R5F_VECS";
    region1.memory_region[0].auto               = false;
    region1.memory_region[0].size               = 0x40;
    region1.memory_region[1].type               = "TCMA";
    region1.memory_region[1].$name              = "R5F_TCMA";
    region1.memory_region[1].size               = 0x7FC0;
    region1.memory_region[2].type               = "TCMB";
    region1.memory_region[2].size               = 0x8000;
    region1.memory_region[2].$name              = "R5F_TCMB";
    region1.memory_region[3].$name              = "OCRAM";
    region1.memory_region[3].manualStartAddress = 0x70080000;
    region1.memory_region[3].size               = 0x7C000;
    region1.memory_region[4].type               = "FLASH";
    region1.memory_region[4].auto               = false;
    region1.memory_region[4].manualStartAddress = 0x60180000;
    region1.memory_region[4].size               = 0x80000;
    region1.memory_region[4].$name              = "FLASH";
    region1.memory_region[5].$name              = "CPPI_DESC";
    region1.memory_region[5].auto               = false;
    region1.memory_region[5].manualStartAddress = 0x70080000;
    region1.memory_region[5].size               = 0x4000;
    
    section1.load_memory                  = "R5F_VECS";
    section1.group                        = false;
    section1.$name                        = "Vector Table";
    section1.output_section.create(1);
    section1.output_section[0].$name      = ".vectors";
    section1.output_section[0].palignment = true;
    
    section2.load_memory                  = "OCRAM";
    section2.$name                        = "Text Segments";
    section2.output_section.create(5);
    section2.output_section[0].$name      = ".text.hwi";
    section2.output_section[0].palignment = true;
    section2.output_section[1].$name      = ".text.cache";
    section2.output_section[1].palignment = true;
    section2.output_section[2].$name      = ".text.mpu";
    section2.output_section[2].palignment = true;
    section2.output_section[3].$name      = ".text.boot";
    section2.output_section[3].palignment = true;
    section2.output_section[4].$name      = ".text:abort";
    section2.output_section[4].palignment = true;
    
    section3.load_memory                  = "OCRAM";
    section3.$name                        = "Code and Read-Only Data";
    section3.output_section.create(2);
    section3.output_section[0].$name      = ".text";
    section3.output_section[0].palignment = true;
    section3.output_section[1].$name      = ".rodata";
    section3.output_section[1].palignment = true;
    
    section4.load_memory                  = "OCRAM";
    section4.$name                        = "Data Segment";
    section4.output_section.create(1);
    section4.output_section[0].$name      = ".data";
    section4.output_section[0].palignment = true;
    
    section5.load_memory                             = "OCRAM";
    section5.$name                                   = "Memory Segments";
    section5.output_section.create(3);
    section5.output_section[0].$name                 = ".bss";
    section5.output_section[0].output_sections_start = "__BSS_START";
    section5.output_section[0].output_sections_end   = "__BSS_END";
    section5.output_section[0].palignment            = true;
    section5.output_section[1].$name                 = ".sysmem";
    section5.output_section[1].palignment            = true;
    section5.output_section[2].$name                 = ".stack";
    section5.output_section[2].palignment            = true;
    
    section6.load_memory                              = "OCRAM";
    section6.$name                                    = "Stack Segments";
    section6.output_section.create(5);
    section6.output_section[0].$name                  = ".irqstack";
    section6.output_section[0].output_sections_start  = "__IRQ_STACK_START";
    section6.output_section[0].output_sections_end    = "__IRQ_STACK_END";
    section6.output_section[0].input_section.create(1);
    section6.output_section[0].input_section[0].$name = ". = . + __IRQ_STACK_SIZE;";
    section6.output_section[1].$name                  = ".fiqstack";
    section6.output_section[1].output_sections_start  = "__FIQ_STACK_START";
    section6.output_section[1].output_sections_end    = "__FIQ_STACK_END";
    section6.output_section[1].input_section.create(1);
    section6.output_section[1].input_section[0].$name = ". = . + __FIQ_STACK_SIZE;";
    section6.output_section[2].$name                  = ".svcstack";
    section6.output_section[2].output_sections_start  = "__SVC_STACK_START";
    section6.output_section[2].output_sections_end    = "__SVC_STACK_END";
    section6.output_section[2].input_section.create(1);
    section6.output_section[2].input_section[0].$name = ". = . + __SVC_STACK_SIZE;";
    section6.output_section[3].$name                  = ".abortstack";
    section6.output_section[3].output_sections_start  = "__ABORT_STACK_START";
    section6.output_section[3].output_sections_end    = "__ABORT_STACK_END";
    section6.output_section[3].input_section.create(1);
    section6.output_section[3].input_section[0].$name = ". = . + __ABORT_STACK_SIZE;";
    section6.output_section[4].$name                  = ".undefinedstack";
    section6.output_section[4].output_sections_start  = "__UNDEFINED_STACK_START";
    section6.output_section[4].output_sections_end    = "__UNDEFINED_STACK_END";
    section6.output_section[4].input_section.create(1);
    section6.output_section[4].input_section[0].$name = ". = . + __UNDEFINED_STACK_SIZE;";
    
    section7.load_memory                  = "OCRAM";
    section7.$name                        = "Initialization and Exception Handling";
    section7.output_section.create(3);
    section7.output_section[0].$name      = ".ARM.exidx";
    section7.output_section[0].palignment = true;
    section7.output_section[1].$name      = ".init_array";
    section7.output_section[1].palignment = true;
    section7.output_section[2].$name      = ".fini_array";
    section7.output_section[2].palignment = true;
    
    section8.load_memory                 = "RTOS_NORTOS_IPC_SHM_MEM";
    section8.type                        = "NOLOAD";
    section8.$name                       = "IPC Shared Memory";
    section8.group                       = false;
    section8.output_section.create(1);
    section8.output_section[0].$name     = ".bss.ipc_vring_mem";
    section8.output_section[0].alignment = 0;
    
    section9.load_memory                 = "MAILBOX_HSM";
    section9.type                        = "NOLOAD";
    section9.$name                       = "SIPC HSM Queue Memory";
    section9.group                       = false;
    section9.output_section.create(1);
    section9.output_section[0].$name     = ".bss.sipc_hsm_queue_mem";
    section9.output_section[0].alignment = 0;
    
    section10.load_memory                 = "MAILBOX_R5F";
    section10.$name                       = "SIPC R5F Queue Memory";
    section10.group                       = false;
    section10.type                        = "NOLOAD";
    section10.output_section.create(1);
    section10.output_section[0].$name     = ".bss.sipc_secure_host_queue_mem";
    section10.output_section[0].alignment = 0;
    
    section11.type                        = "NOLOAD";
    section11.group                       = false;
    section11.load_memory                 = "CPPI_DESC";
    section11.$name                       = "ENET_CPPI_DESC";
    section11.output_section.create(1);
    section11.output_section[0].alignment = 128;
    section11.output_section[0].$name     = ".bss:ENET_CPPI_DESC";
    
    section12.$name                       = "ENET_DMA_PKT_MEMPOOL";
    section12.type                        = "NOLOAD";
    section12.group                       = false;
    section12.load_memory                 = "OCRAM";
    section12.output_section.create(1);
    section12.output_section[0].$name     = ".bss:ENET_DMA_PKT_MEMPOOL";
    section12.output_section[0].alignment = 128;
    
    section13.$name                       = "ETH_PACKET";
    section13.type                        = "NOLOAD";
    section13.group                       = false;
    section13.load_memory                 = "ETH_PACKET_REGION";
    section13.output_section.create(1);
    section13.output_section[0].$name     = ".bss.eth_pkt";
    section13.output_section[0].alignment = 128;
    
    enet_cpsw1.$name                 = "CONFIG_ENET_CPSW0";
    enet_cpsw1.LargePoolPktCount     = 16;
    enet_cpsw1.MediumPoolPktCount    = 32;
    enet_cpsw1.cptsHostRxTsEn        = false;
    enet_cpsw1.cptsRftClkFreq        = "CPSW_CPTS_RFTCLK_FREQ_200MHZ";
    enet_cpsw1.mdioMode              = "MDIO_MODE_MANUAL";
    enet_cpsw1.txDmaChannel[0].$name = "ENET_DMA_TX_CH0";
    enet_cpsw1.rxDmaChannel[0].$name = "ENET_DMA_RX_CH0";
    
    ethphy_cpsw_icssg1.$name          = "CONFIG_ENET_ETHPHY0";
    ethphy_cpsw_icssg1.boardType      = "am261x-lp";
    ethphy_cpsw_icssg1.peripheral     = "CPSW_MAC_PORT_1";
    enet_cpsw1.ethphy1                = ethphy_cpsw_icssg1;
    ethphy_cpsw_icssg1.extendedConfig = "/* Extended PHY configuration for DP83869 */\n.txClkShiftEn         = true,\n.rxClkShiftEn         = true,\n.txDelayInPs          = 2000U,   /* Value in picoseconds. Refer to DLL_RX_DELAY_CTRL_SL field in ANA_RGMII_DLL_CTRL register of DP83869 PHY datasheet */\n.rxDelayInPs          = 2000U,   /* Value in picoseconds. Refer to DLL_TX_DELAY_CTRL_SL field in ANA_RGMII_DLL_CTRL register of DP83869 PHY datasheet */\n.txFifoDepth          = 4U,\n.impedanceInMilliOhms = 35000,  /* 35 ohms */\n.idleCntThresh        = 4U,     /* Improves short cable performance */\n.gpio0Mode            = DP83869_GPIO0_COL,\n.gpio1Mode            = DP83869_GPIO1_CONSTANT0, /* Unused */\n.ledMode              =\n{\n\tDP83869_LED_LINKED,         /* Unused */\n\tDP83869_LED_RXERR,\n\tDP83869_LED_COLLDET,\n\tDP83869_LED_LINKED_1000BT,\n},";
    
    const ethphy_cpsw_icssg2 = ethphy_cpsw_icssg.addInstance({}, false);
    ethphy_cpsw_icssg2.$name = "CONFIG_ENET_ETHPHY1";
    enet_cpsw1.ethphy2       = ethphy_cpsw_icssg2;
    
    /**
     * Pinmux solution for unlocked pins/peripherals. This ensures that minor changes to the automatic solver in a future
     * version of the tool will not impact the pinmux you originally saw.  These lines can be completely deleted in order to
     * re-solve from scratch.
     */
    enet_cpsw1.MDIO.$suggestSolution          = "MDIO0";
    enet_cpsw1.MDIO.MDC.$suggestSolution      = "GPIO42";
    enet_cpsw1.MDIO.MDIO.$suggestSolution     = "GPIO41";
    enet_cpsw1.RGMII1.$suggestSolution        = "RGMII2";
    enet_cpsw1.RGMII1.RD0.$suggestSolution    = "GPIO93";
    enet_cpsw1.RGMII1.RD1.$suggestSolution    = "GPIO94";
    enet_cpsw1.RGMII1.RD2.$suggestSolution    = "GPIO95";
    enet_cpsw1.RGMII1.RD3.$suggestSolution    = "GPIO96";
    enet_cpsw1.RGMII1.RX_CTL.$suggestSolution = "GPIO92";
    enet_cpsw1.RGMII1.RXC.$suggestSolution    = "GPIO91";
    enet_cpsw1.RGMII1.TD0.$suggestSolution    = "GPIO99";
    enet_cpsw1.RGMII1.TD1.$suggestSolution    = "GPIO100";
    enet_cpsw1.RGMII1.TD2.$suggestSolution    = "GPIO101";
    enet_cpsw1.RGMII1.TD3.$suggestSolution    = "GPIO102";
    enet_cpsw1.RGMII1.TX_CTL.$suggestSolution = "GPIO98";
    enet_cpsw1.RGMII1.TXC.$suggestSolution    = "GPIO97";
    enet_cpsw1.RGMII2.$suggestSolution        = "RGMII1";
    enet_cpsw1.RGMII2.RD0.$suggestSolution    = "GPIO109";
    enet_cpsw1.RGMII2.RD1.$suggestSolution    = "GPIO110";
    enet_cpsw1.RGMII2.RD2.$suggestSolution    = "GPIO111";
    enet_cpsw1.RGMII2.RD3.$suggestSolution    = "GPIO112";
    enet_cpsw1.RGMII2.RX_CTL.$suggestSolution = "GPIO108";
    enet_cpsw1.RGMII2.RXC.$suggestSolution    = "GPIO107";
    enet_cpsw1.RGMII2.TD0.$suggestSolution    = "GPIO115";
    enet_cpsw1.RGMII2.TD1.$suggestSolution    = "GPIO116";
    enet_cpsw1.RGMII2.TD2.$suggestSolution    = "GPIO117";
    enet_cpsw1.RGMII2.TD3.$suggestSolution    = "GPIO118";
    enet_cpsw1.RGMII2.TX_CTL.$suggestSolution = "GPIO114";
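
    For reference, a C declaration that lands in the ".bss.eth_pkt" output section created by section13 might look like the sketch below. It assumes an ELF toolchain that honours the section and aligned attributes (tiarmclang, GCC and Clang do). The EthFrame layout, ETH_FRAME_MAX_LEN and the eth_pkt_store helper are hypothetical, because the real definitions are not shown in the thread.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical frame layout: 4 + 1536 bytes fits in the 2 KB shared region. */
    #define ETH_FRAME_MAX_LEN 1536u

    typedef struct {
        uint32_t len;                     /* number of valid bytes in data[] */
        uint8_t  data[ETH_FRAME_MAX_LEN]; /* raw Ethernet frame */
    } EthFrame;

    /* Put the object in the ".bss.eth_pkt" output section, which section13
     * maps to ETH_PACKET_REGION. Non-ELF hosts skip the placement. */
    #if defined(__ELF__)
    #define ETH_PKT_SECTION __attribute__((section(".bss.eth_pkt")))
    #else
    #define ETH_PKT_SECTION /* non-ELF host: section placement skipped */
    #endif

    /* 128-byte alignment mirrors section13.output_section[0].alignment. */
    ETH_PKT_SECTION __attribute__((aligned(128))) EthFrame ethPkt;

    /* Hypothetical Core 1 helper: copy a received frame into the shared slot.
     * Spinlock handling and cache maintenance are deliberately left out. */
    static void eth_pkt_store(const uint8_t *frame, uint32_t len)
    {
        assert(len <= ETH_FRAME_MAX_LEN);
        memcpy(ethPkt.data, frame, len);
        ethPkt.len = len;
    }
    ```

    The section name has to match the one in section13 exactly, otherwise the linker places the object with the rest of .bss in the core's own OCRAM.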

  • Hi Elia,

    From my initial analysis, I believe this is what happens: you copy ethPkt from 0x70100000, which is marked as non-cached, non-bufferable shared memory, into internalFrame, which lives in a cached OCRAM region.

    1. I believe the gap between the slow and the fast copy comes from reading out of the cache in one case and out of a non-cached shared buffer in the other.

    2. Moreover, with Non-Bufferable memory (isBufferable = false), the processor must wait for each write transaction to complete on the bus before proceeding. This is generally used for peripheral access, where strict ordering is required. Here, the shared region on Core 0 is set to non-bufferable, so the copy becomes slower because writes can no longer be buffered and overlapped with the following instructions.

    One thing you can try is setting isBufferable = true for the shared region on both cores.
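
    To see what the isCacheable, isBufferable and tex settings select, the ARMv7-R (PMSAv7) MPU memory-type encoding of TEX[2:0], C and B can be decoded as below. This is a minimal sketch that assumes the SDK flags map directly onto the C, B and TEX bits of the region; pmsa_mem_type and PmsaMemType are hypothetical names, not SDK API.

    ```c
    #include <assert.h>

    /* Memory types encoded by TEX[2:0], C and B in a PMSAv7 MPU region. */
    typedef enum {
        MEM_STRONGLY_ORDERED,
        MEM_DEVICE_SHAREABLE,
        MEM_DEVICE_NONSHAREABLE,
        MEM_NORMAL_NONCACHEABLE,
        MEM_NORMAL_WT_NO_WA,   /* write-through, no write-allocate */
        MEM_NORMAL_WB_NO_WA,   /* write-back, no write-allocate */
        MEM_NORMAL_WB_WA,      /* write-back, write-allocate */
        MEM_NORMAL_CUSTOM,     /* TEX = 1BB: outer policy TEX[1:0], inner policy C,B */
        MEM_IMPL_DEFINED,
        MEM_RESERVED
    } PmsaMemType;

    /* Hypothetical helper: decode an MPU region's TEX, C and B fields. */
    static PmsaMemType pmsa_mem_type(unsigned tex, unsigned c, unsigned b)
    {
        assert(tex <= 7u && c <= 1u && b <= 1u);
        if (tex & 4u) {
            return MEM_NORMAL_CUSTOM;
        }
        switch ((tex << 2) | (c << 1) | b) {
        case 0x0: return MEM_STRONGLY_ORDERED;     /* TEX=000 C=0 B=0 */
        case 0x1: return MEM_DEVICE_SHAREABLE;     /* TEX=000 C=0 B=1 */
        case 0x2: return MEM_NORMAL_WT_NO_WA;      /* TEX=000 C=1 B=0 */
        case 0x3: return MEM_NORMAL_WB_NO_WA;      /* TEX=000 C=1 B=1 */
        case 0x4: return MEM_NORMAL_NONCACHEABLE;  /* TEX=001 C=0 B=0 */
        case 0x6: return MEM_IMPL_DEFINED;         /* TEX=001 C=1 B=0 */
        case 0x7: return MEM_NORMAL_WB_WA;         /* TEX=001 C=1 B=1 */
        case 0x8: return MEM_DEVICE_NONSHAREABLE;  /* TEX=010 C=0 B=0 */
        default:  return MEM_RESERVED;             /* other TEX=0xx encodings */
        }
    }
    ```

    Under that assumption, leaving tex at 0 with isCacheable = false and isBufferable = false gives Strongly-ordered memory, and setting only isBufferable gives shareable Device memory; neither is Normal memory. A Normal, non-cacheable mapping needs tex = 1 with C = 0 and B = 0.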

    Could you rebuild the example, run the benchmark again, and check whether this helps?

    Regards,
    Shaunak

  • Hi,

    These are my results over 1,000 acquisitions:

    MPU configuration of shared RAM     Core 0 latency [µs]    Core 1 latency [µs]
    Not configured                      23.44 ± 0.06           6.6 ± 0.11
    Shareable                           86.86 ± 0.05           6.12 ± 0.11
    Cacheable, Shareable                86.86 ± 0.08           6.11 ± 0.1
    Cacheable, Shareable, Bufferable    86.86 ± 0.08           6.12 ± 0.15

    What is more surprising, though, is that if I do the same copy (from the shared memory region to core-specific OCRAM), I get these results:

    MPU configuration of shared RAM     Core 0 latency [µs]    Core 1 latency [µs]
    Not configured                      20.5                   4.625
    Shareable                           85.875                 8.25
    Cacheable, Shareable                85.875                 8.25
    Cacheable, Shareable, Bufferable    86                     8.25

    So, from these experiments, it seems that the MPU configuration is somehow slowing down only Core 0.
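
    Expressed as ratios of the "Shareable" and "Not configured" rows of the two tables (all values in µs), configuring the MPU region slows Core 0 by roughly 4x, and with the region configured Core 0 ends up about 10 to 14 times slower than Core 1:

    ```latex
    \begin{aligned}
    \text{Table 1:}\quad & \frac{86.86}{23.44} \approx 3.71 \ \ (\text{Core 0, Shareable vs. not configured}),
      & \frac{86.86}{6.12} \approx 14.2 \ \ (\text{Shareable, Core 0 vs. Core 1}) \\
    \text{Table 2:}\quad & \frac{85.875}{20.5} \approx 4.19 \ \ (\text{Core 0, Shareable vs. not configured}),
      & \frac{85.875}{8.25} \approx 10.4 \ \ (\text{Shareable, Core 0 vs. Core 1})
    \end{aligned}
    ```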

    Any suggestions on how to proceed?

  • I found the main problem: one core was compiled with --use_memcpy=fast and the other was not. Using that option gives a speed improvement of more than 10x.
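
    The thread does not say what --use_memcpy=fast changes in the generated code. One plausible factor, stated here only as an assumption, is the access width: on a non-cached region every load is roughly its own bus transaction, so a byte-by-byte copy issues four times as many reads as a 32-bit word copy. The sketch below counts loads for both strategies; copy_bytes and copy_words are hypothetical helpers, not the toolchain's memcpy.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Byte-wise copy: one load and one store per byte.
     * Returns the number of loads issued. */
    static size_t copy_bytes(void *dst, const void *src, size_t n)
    {
        uint8_t *d = dst;
        const uint8_t *s = src;
        for (size_t i = 0; i < n; i++) {
            d[i] = s[i];
        }
        return n;
    }

    /* Word-wise copy: 32-bit loads for the bulk, single bytes for the tail.
     * Assumes both buffers are 4-byte aligned. Returns the number of loads. */
    static size_t copy_words(void *dst, const void *src, size_t n)
    {
        assert(((uintptr_t)dst % 4u) == 0 && ((uintptr_t)src % 4u) == 0);
        uint32_t *dw = dst;
        const uint32_t *sw = src;
        size_t words = n / 4u;
        for (size_t i = 0; i < words; i++) {
            dw[i] = sw[i];
        }
        size_t loads = words;
        uint8_t *db = (uint8_t *)(dw + words);
        const uint8_t *sb = (const uint8_t *)(sw + words);
        for (size_t i = 0; i < n % 4u; i++) {
            db[i] = sb[i];
            loads++;
        }
        return loads;
    }
    ```

    For a 30-byte copy, copy_bytes issues 30 loads and copy_words issues 9 (7 words plus a 2-byte tail). A real optimized memcpy may also use multi-register loads, so the actual gain on the target can differ.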

  • Hi Elia,

    Thanks for the update; it makes sense that --use_memcpy=fast produced the improvement.
    I am, however, surprised that switching isBufferable between 0 and 1 did not change performance at all.

    Regards,
    Shaunak