AM2612: AM261x-LP: Shared memory slow access on R5FSS0-0

Elia Pellegrino

Part Number: AM2612

Dear Experts,

I am working on IPC communication between Core 1 and Core 0 of my AM261x-LP. I modified the l2_enet_switch example so that Core 1 receives each packet and then forwards it to Core 0. The communication works like this:

An Ethernet packet is received by Core 1
An ISR fires and a semaphore is posted
The RX task runs and writes the dequeued packet into the memory shared between Core 0 and Core 1
Core 1 notifies Core 0 (IPC Notify mechanism) that a new packet has been received
Core 0 wakes up and copies the packet from the shared RAM into its own RAM
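The handoff in the steps above can be modeled on a host PC with a minimal sketch like the one below. Everything in it is a stand-in and not SDK code: `Frame`, `SharedSlot`, `publish` and `consume` are hypothetical names, a `std::atomic<bool>` plays the role of the hardware spinlock, and a second flag plays the role of the IPC Notify doorbell. The 1518-byte payload is an assumed maximum frame size.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical frame type; the real EthFrame layout is not shown in the post.
struct Frame {
    std::array<std::uint8_t, 1518> data{};
    std::size_t length = 0;
};

// Host-side stand-ins for the shared 2 KB slot, the spinlock and the doorbell.
struct SharedSlot {
    std::atomic<bool> locked{false};   // models the hardware spinlock
    std::atomic<bool> pending{false};  // models the IPC Notify signal
    Frame frame;                       // models the buffer in shared RAM
};

inline void slotLock(SharedSlot& s) {
    // Spin until the previous value was "unlocked".
    while (s.locked.exchange(true, std::memory_order_acquire)) {}
}

inline void slotUnlock(SharedSlot& s) {
    s.locked.store(false, std::memory_order_release);
}

// Producer side (Core 1 in the post): copy the frame in, then signal.
inline void publish(SharedSlot& s, const Frame& rx) {
    assert(rx.length <= rx.data.size());
    slotLock(s);
    s.frame = rx;
    slotUnlock(s);
    s.pending.store(true, std::memory_order_release);
}

// Consumer side (Core 0 in the post): returns true if a frame was copied out.
inline bool consume(SharedSlot& s, Frame& out) {
    if (!s.pending.exchange(false, std::memory_order_acquire)) {
        return false;  // no notification outstanding
    }
    slotLock(s);
    out = s.frame;
    slotUnlock(s);
    return true;
}
```

With a single slot, a second `publish` before `consume` overwrites the first frame, which matches the one-packet shared region described here.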

Every access to the shared RAM is protected by a spinlock. I read the post on shared RAM, and my memory map is as follows:

OCRAM Bank 0: SBL + Core 0
OCRAM Bank 1: CPPI desc + Core 1 (included the Enet DMA Packet Pool)
OCRAM Bank 2: Shared memory between Core 0 and Core 1. It is just a 2 KB region (for 1 full Ethernet packet), where the packet is copied to / from. The MPU is configured like this for both cores

The problem arises when I perform a memcpy on Core 0, and I cannot explain why: the memcpy takes exactly 85 us, against 3.75 us on Core 1.
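To put the two timings in perspective, here is a small arithmetic sketch. It rests on two assumptions that are not stated in this post: the copied frame is taken to be 1518 bytes (sizeof(EthFrame) is not given), and the core clock is taken to be 500 MHz, following the "500MHz" variant named in the SysConfig headers posted later in the thread. The function and constant names are illustrative only.

```cpp
#include <cstddef>

// Effective copy throughput in bytes per microsecond (numerically equal to MB/s).
constexpr double bytesPerMicrosecond(std::size_t bytes, double microseconds) {
    return static_cast<double>(bytes) / microseconds;
}

// CPU cycles spent per copied byte for a given duration and clock.
constexpr double cyclesPerByte(double microseconds, double clockMHz, std::size_t bytes) {
    return microseconds * clockMHz / static_cast<double>(bytes);
}

// Assumption: maximum-size Ethernet frame; the real sizeof(EthFrame) is unknown.
constexpr std::size_t kAssumedFrameBytes = 1518;
// Assumption: 500 MHz core clock, taken from the SysConfig device variant.
constexpr double kAssumedClockMHz = 500.0;

constexpr double kCore0Rate = bytesPerMicrosecond(kAssumedFrameBytes, 85.0);  // about 17.9 MB/s
constexpr double kCore1Rate = bytesPerMicrosecond(kAssumedFrameBytes, 3.75);  // about 404.8 MB/s
constexpr double kSlowdown  = 85.0 / 3.75;                                    // about 22.7x
constexpr double kCore0CyclesPerByte =
    cyclesPerByte(85.0, kAssumedClockMHz, kAssumedFrameBytes);                // about 28 cycles
```

Under these assumptions Core 0 spends roughly 28 cycles per byte, against a little over 1 cycle per byte on Core 1.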

The code snippets are:

namespace Core0
{
static EthFrame internalFrame;
// Called inside the IPC callback
static void ethTakePkt() 
{
    spinlock_lock(SPINLOCK_0);
    CacheP_inv((void*)&ethPkt, sizeof(ethPkt), CacheP_TYPE_L1D);
    hal_gpio_on(DEBUG_PIN_0);
    memcpy((void*)&internalFrame, (void*)&ethPkt, sizeof(EthFrame));
    hal_gpio_off(DEBUG_PIN_0);
    spinlock_unlock(SPINLOCK_0);
    return;
}
}

namespace Core1
{
// Called inside RX Task
static void copyFrame(EthFrame *frame)
{
    spinlock_lock(SPINLOCK_0);
    hal_gpio_on(DEBUG_PIN_1);
    memcpy((void*)&ethPkt, (void*) frame, sizeof(EthFrame));
    hal_gpio_off(DEBUG_PIN_1);
    CacheP_wbInv((void*)&ethPkt, sizeof(EthFrame), CacheP_TYPE_L1D);
    spinlock_unlock(SPINLOCK_0);
}
}

namespace Common
{
volatile EthFrame ethPkt __attribute__((aligned(128), section(".bss.eth_pkt")));
}
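Cache maintenance on the Cortex-R5F works on whole 32-byte L1 lines, so the invalidate and write-back calls in the snippets effectively cover a line-aligned span around ethPkt. The sketch below shows that rounding on plain integers. `Range` and `cacheLineRange` are hypothetical helper names, and how CacheP performs its own rounding internally is not shown in this thread.

```cpp
#include <cstddef>
#include <cstdint>

// L1 data cache line size of the Cortex-R5F.
constexpr std::size_t kCacheLineBytes = 32;

struct Range {
    std::uintptr_t start;  // first byte of the first touched cache line
    std::size_t    size;   // span in bytes, a multiple of the line size
};

// Expand [addr, addr + size) outwards to cache-line boundaries.
inline Range cacheLineRange(std::uintptr_t addr, std::size_t size) {
    const std::uintptr_t mask  = kCacheLineBytes - 1;
    const std::uintptr_t start = addr & ~mask;
    const std::uintptr_t end   = (addr + size + mask) & ~mask;
    return {start, static_cast<std::size_t>(end - start)};
}
```

For a buffer that starts on a line boundary, as the 128-byte alignment of ethPkt guarantees, only the tail is padded: an assumed 1518-byte frame at 0x70100000 gives a 1536-byte span, so no neighbouring variable shares the first line.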

I inspected the generated code and confirmed that the compiler is not optimizing the memcpy away.

Any idea why Core 0 takes so much longer to copy the data?

3 months ago

0 Shaunak Deshpande 3 months ago

TI__Mastermind 23972 points

Hi Elia,

Please allow me to review this, run some experiments and get back by tomorrow.

Regards,
Shaunak

0 Shaunak Deshpande 3 months ago

TI__Mastermind 23972 points

Hi Elia,

Can you also share the example.syscfg files for the two cores so I can take a look at the memory configurator and MPU configs?

Regards,
Shaunak

0 Elia Pellegrino 3 months ago

Prodigy 30 points

Sure, here you can find the syscfgs.

I also ran more tests, and it actually seems that memcpy is always slow (unrelated to IPC or Ethernet). I suspect a problem in the MPU configuration but do not really know where to look.
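One place to start is checking which MPU region actually applies to a given address. On the ARMv7-R MPU, when enabled regions overlap, the highest-numbered one determines the attributes. The sketch below encodes the Core 0 regions from the file that follows and resolves an address against them. It assumes that the SysConfig `size` field is log2 of the region size in bytes (so 11 means 2 KB) and that an omitted `baseAddr` defaults to 0x0. `MpuRegion`, `covers` and `effectiveRegion` are illustrative names, not SDK APIs.

```cpp
#include <cstdint>

struct MpuRegion {
    int           index;      // MPU region number
    std::uint32_t base;       // baseAddr from SysConfig (assumed 0x0 when omitted)
    unsigned      sizeField;  // "size" from SysConfig, assumed to be log2(bytes)
};

inline std::uint64_t regionBytes(const MpuRegion& r) {
    return 1ull << r.sizeField;
}

// ARMv7-R regions must start on a multiple of their own size.
inline bool isNaturallyAligned(const MpuRegion& r) {
    return (r.base & (regionBytes(r) - 1)) == 0;
}

inline bool covers(const MpuRegion& r, std::uint32_t addr) {
    return addr >= r.base &&
           static_cast<std::uint64_t>(addr - r.base) < regionBytes(r);
}

// Index of the region whose attributes apply to addr, or -1 if none covers it.
inline int effectiveRegion(const MpuRegion* regions, int count, std::uint32_t addr) {
    int winner = -1;
    for (int i = 0; i < count; ++i) {
        if (covers(regions[i], addr) && regions[i].index > winner) {
            winner = regions[i].index;  // higher region number takes priority
        }
    }
    return winner;
}

// Core 0 regions as posted in the SysConfig file below.
const MpuRegion kCore0Regions[] = {
    {0, 0x00000000u, 31}, {1, 0x00000000u, 15}, {2, 0x00080000u, 15},
    {3, 0x70000000u, 21}, {4, 0x70100000u, 11}, {5, 0x50D00000u, 14},
    {6, 0x72000000u, 14},
};
```

For 0x70100000, the address of the packet buffer, this resolves to region 4, the 2 KB region posted as NonCached, while ordinary OCRAM addresses such as 0x70040000 resolve to region 3. Whether that attribute difference accounts for the measured copy time is not established by this sketch.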

---------------------------Core 0-------------------------------------
/**
 * These arguments were used when this file was generated. They will be automatically applied on subsequent loads
 * via the GUI or CLI. Run CLI with '--help' for additional information on how to override these arguments.
 * @cliArgs --device "AM261x_ZFG" --part "AM2612" --package "ZFG" --context "r5fss0-0" --product "MCU_PLUS_SDK_AM261x@11.01.00"
 * @v2CliArgs --device "AM2612" --package "NFBGA (ZFG)" --variant "500MHz" --context "r5fss0-0" --product "MCU_PLUS_SDK_AM261x@11.01.00"
 * @versions {"tool":"1.26.0+4407"}
 */

/**
 * Import the modules used in this configuration.
 */
const gpio       = scripting.addModule("/drivers/gpio/gpio", {}, false);
const gpio1      = gpio.addInstance();
const gpio2      = gpio.addInstance();
const ipc        = scripting.addModule("/drivers/ipc/ipc");
const rti        = scripting.addModule("/drivers/rti/rti", {}, false);
const rti1       = rti.addInstance();
const uart       = scripting.addModule("/drivers/uart/uart", {}, false);
const uart1      = uart.addInstance();
const debug_log  = scripting.addModule("/kernel/dpl/debug_log");
const dpl_cfg    = scripting.addModule("/kernel/dpl/dpl_cfg");
const mpu_armv7  = scripting.addModule("/kernel/dpl/mpu_armv7", {}, false);
const mpu_armv71 = mpu_armv7.addInstance();
const mpu_armv72 = mpu_armv7.addInstance();
const mpu_armv73 = mpu_armv7.addInstance();
const mpu_armv74 = mpu_armv7.addInstance();
const mpu_armv75 = mpu_armv7.addInstance();
const mpu_armv76 = mpu_armv7.addInstance();
const mpu_armv77 = mpu_armv7.addInstance();
const general    = scripting.addModule("/memory_configurator/general", {}, false);
const general1   = general.addInstance();
const region     = scripting.addModule("/memory_configurator/region", {}, false);
const region1    = region.addInstance();
const section    = scripting.addModule("/memory_configurator/section", {}, false);
const section1   = section.addInstance();
const section2   = section.addInstance();
const section3   = section.addInstance();
const section4   = section.addInstance();
const section5   = section.addInstance();
const section6   = section.addInstance();
const section7   = section.addInstance();
const section8   = section.addInstance();
const section9   = section.addInstance();
const section10  = section.addInstance();
const section11  = section.addInstance();

/**
 * Write custom configuration values to the imported modules.
 */
gpio1.$name          = "GPIO_LED";
gpio1.pinDir         = "OUTPUT";
gpio1.GPIO_n.$assign = "GPIO84";

gpio2.$name          = "DEBUG_PIN0";
gpio2.pinDir         = "OUTPUT";
gpio2.GPIO_n.$assign = "GPIO49";

ipc.r5fss0_1     = "notify";
ipc.intrPriority = 1;

rti1.$name          = "CONFIG_RTI0";
rti1.counter0Enable = true;
rti1.compare0Enable = true;
rti1.eventCallback0 = "rtiEvent0";
rti1.nsecPerTick0   = 1000000;

uart1.$name            = "CONFIG_UART0";
uart1.baudRate         = 1000000;
uart1.intrEnable       = "USER_INTR";
uart1.rxTrigLvl        = "1";
uart1.txTrigLvl        = "63";
uart1.intrPriority     = 10;
uart1.UART.$assign     = "UART0";
uart1.UART.RXD.$assign = "GPIO27";
uart1.UART.TXD.$assign = "GPIO28";
uart1.child.$name      = "drivers_uart_v2_uart_v2_template0";

debug_log.enableLogZoneWarning = false;
debug_log.enableLogZoneError   = false;
debug_log.enableCssLog         = false;

mpu_armv71.$name             = "CONFIG_MPU_REGION0";
mpu_armv71.size              = 31;
mpu_armv71.attributes        = "Device";
mpu_armv71.accessPermissions = "Supervisor RD+WR, User RD";
mpu_armv71.allowExecute      = false;

mpu_armv72.$name             = "CONFIG_MPU_REGION1";
mpu_armv72.size              = 15;
mpu_armv72.accessPermissions = "Supervisor RD+WR, User RD";

mpu_armv73.$name             = "CONFIG_MPU_REGION2";
mpu_armv73.baseAddr          = 0x80000;
mpu_armv73.size              = 15;
mpu_armv73.accessPermissions = "Supervisor RD+WR, User RD";

mpu_armv74.$name             = "CONFIG_MPU_REGION3";
mpu_armv74.baseAddr          = 0x70000000;
mpu_armv74.size              = 21;
mpu_armv74.accessPermissions = "Supervisor RD+WR, User RD";

mpu_armv75.$name        = "CONFIG_MPU_REGION4";
mpu_armv75.allowExecute = false;
mpu_armv75.size         = 11;
mpu_armv75.isShareable  = true;
mpu_armv75.isBufferable = false;
mpu_armv75.baseAddr     = 0x70100000;
mpu_armv75.attributes   = "NonCached";

mpu_armv76.$name        = "CONFIG_MPU_REGION5";
mpu_armv76.baseAddr     = 0x50D00000;
mpu_armv76.size         = 14;
mpu_armv76.attributes   = "Device";
mpu_armv76.allowExecute = false;

mpu_armv77.$name        = "CONFIG_MPU_REGION6";
mpu_armv77.baseAddr     = 0x72000000;
mpu_armv77.size         = 14;
mpu_armv77.attributes   = "NonCached";
mpu_armv77.allowExecute = false;

general1.$name        = "CONFIG_GENERAL0";
general1.heap_size    = 1024;
general1.stack_size   = 8192;
general1.linker.$name = "TIARMCLANG0";

region1.$name                               = "MEMORY_REGION_CONFIGURATION0";
region1.memory_region.create(10);
region1.memory_region[0].type               = "TCMA";
region1.memory_region[0].$name              = "R5F_VECS";
region1.memory_region[0].size               = 0x40;
region1.memory_region[0].auto               = false;
region1.memory_region[1].type               = "TCMA";
region1.memory_region[1].$name              = "R5F_TCMA";
region1.memory_region[1].size               = 0x7FC0;
region1.memory_region[2].type               = "TCMB";
region1.memory_region[2].size               = 0x8000;
region1.memory_region[2].$name              = "R5F_TCMB";
region1.memory_region[3].$name              = "SBL";
region1.memory_region[3].auto               = false;
region1.memory_region[3].size               = 0x40000;
region1.memory_region[4].$name              = "OCRAM";
region1.memory_region[4].auto               = false;
region1.memory_region[4].manualStartAddress = 0x70040000;
region1.memory_region[4].size               = 0x40000;
region1.memory_region[5].type               = "FLASH";
region1.memory_region[5].auto               = false;
region1.memory_region[5].manualStartAddress = 0x60100000;
region1.memory_region[5].size               = 0x80000;
region1.memory_region[5].$name              = "FLASH";
region1.memory_region[6].type               = "CUSTOM";
region1.memory_region[6].$name              = "RTOS_NORTOS_IPC_SHM_MEM";
region1.memory_region[6].auto               = false;
region1.memory_region[6].manualStartAddress = 0x72000000;
region1.memory_region[6].size               = 0x3E80;
region1.memory_region[6].isShared           = true;
region1.memory_region[6].shared_cores       = ["r5fss0-1"];
region1.memory_region[7].type               = "CUSTOM";
region1.memory_region[7].$name              = "MAILBOX_HSM";
region1.memory_region[7].auto               = false;
region1.memory_region[7].manualStartAddress = 0x44000000;
region1.memory_region[7].size               = 0x3CE;
region1.memory_region[7].isShared           = true;
region1.memory_region[7].shared_cores       = ["r5fss0-1"];
region1.memory_region[8].type               = "CUSTOM";
region1.memory_region[8].$name              = "MAILBOX_R5F";
region1.memory_region[8].auto               = false;
region1.memory_region[8].manualStartAddress = 0x44000400;
region1.memory_region[8].size               = 0x3CE;
region1.memory_region[8].isShared           = true;
region1.memory_region[8].shared_cores       = ["r5fss0-1"];
region1.memory_region[9].$name              = "ETH_PACKET_REGION";
region1.memory_region[9].isShared           = true;
region1.memory_region[9].shared_cores       = ["r5fss0-1"];
region1.memory_region[9].size               = 0x800;
region1.memory_region[9].auto               = false;
region1.memory_region[9].manualStartAddress = 0x70100000;

section1.load_memory                  = "R5F_VECS";
section1.group                        = false;
section1.$name                        = "Vector Table";
section1.output_section.create(1);
section1.output_section[0].$name      = ".vectors";
section1.output_section[0].palignment = true;

section2.load_memory                  = "OCRAM";
section2.$name                        = "Text Segments";
section2.output_section.create(5);
section2.output_section[0].$name      = ".text.hwi";
section2.output_section[0].palignment = true;
section2.output_section[1].$name      = ".text.cache";
section2.output_section[1].palignment = true;
section2.output_section[2].$name      = ".text.mpu";
section2.output_section[2].palignment = true;
section2.output_section[3].$name      = ".text.boot";
section2.output_section[3].palignment = true;
section2.output_section[4].$name      = ".text:abort";
section2.output_section[4].palignment = true;

section3.load_memory                  = "OCRAM";
section3.$name                        = "Code and Read-Only Data";
section3.output_section.create(2);
section3.output_section[0].$name      = ".text";
section3.output_section[0].palignment = true;
section3.output_section[1].$name      = ".rodata";
section3.output_section[1].palignment = true;

section4.load_memory                  = "OCRAM";
section4.$name                        = "Data Segment";
section4.output_section.create(1);
section4.output_section[0].$name      = ".data";
section4.output_section[0].palignment = true;

section5.load_memory                             = "OCRAM";
section5.$name                                   = "Memory Segments";
section5.output_section.create(3);
section5.output_section[0].$name                 = ".bss";
section5.output_section[0].output_sections_start = "__BSS_START";
section5.output_section[0].output_sections_end   = "__BSS_END";
section5.output_section[0].palignment            = true;
section5.output_section[1].$name                 = ".sysmem";
section5.output_section[1].palignment            = true;
section5.output_section[2].$name                 = ".stack";
section5.output_section[2].palignment            = true;

section6.load_memory                              = "OCRAM";
section6.$name                                    = "Stack Segments";
section6.output_section.create(5);
section6.output_section[0].$name                  = ".irqstack";
section6.output_section[0].output_sections_start  = "__IRQ_STACK_START";
section6.output_section[0].output_sections_end    = "__IRQ_STACK_END";
section6.output_section[0].input_section.create(1);
section6.output_section[0].input_section[0].$name = ". = . + __IRQ_STACK_SIZE;";
section6.output_section[1].$name                  = ".fiqstack";
section6.output_section[1].output_sections_start  = "__FIQ_STACK_START";
section6.output_section[1].output_sections_end    = "__FIQ_STACK_END";
section6.output_section[1].input_section.create(1);
section6.output_section[1].input_section[0].$name = ". = . + __FIQ_STACK_SIZE;";
section6.output_section[2].$name                  = ".svcstack";
section6.output_section[2].output_sections_start  = "__SVC_STACK_START";
section6.output_section[2].output_sections_end    = "__SVC_STACK_END";
section6.output_section[2].input_section.create(1);
section6.output_section[2].input_section[0].$name = ". = . + __SVC_STACK_SIZE;";
section6.output_section[3].$name                  = ".abortstack";
section6.output_section[3].output_sections_start  = "__ABORT_STACK_START";
section6.output_section[3].output_sections_end    = "__ABORT_STACK_END";
section6.output_section[3].input_section.create(1);
section6.output_section[3].input_section[0].$name = ". = . + __ABORT_STACK_SIZE;";
section6.output_section[4].$name                  = ".undefinedstack";
section6.output_section[4].output_sections_start  = "__UNDEFINED_STACK_START";
section6.output_section[4].output_sections_end    = "__UNDEFINED_STACK_END";
section6.output_section[4].input_section.create(1);
section6.output_section[4].input_section[0].$name = ". = . + __UNDEFINED_STACK_SIZE;";

section7.load_memory                  = "OCRAM";
section7.$name                        = "Initialization and Exception Handling";
section7.output_section.create(3);
section7.output_section[0].$name      = ".ARM.exidx";
section7.output_section[0].palignment = true;
section7.output_section[1].$name      = ".init_array";
section7.output_section[1].palignment = true;
section7.output_section[2].$name      = ".fini_array";
section7.output_section[2].palignment = true;

section8.load_memory                 = "RTOS_NORTOS_IPC_SHM_MEM";
section8.type                        = "NOLOAD";
section8.$name                       = "IPC Shared Memory";
section8.group                       = false;
section8.output_section.create(1);
section8.output_section[0].$name     = ".bss.ipc_vring_mem";
section8.output_section[0].alignment = 0;

section9.load_memory                 = "MAILBOX_HSM";
section9.type                        = "NOLOAD";
section9.$name                       = "SIPC HSM Queue Memory";
section9.group                       = false;
section9.output_section.create(1);
section9.output_section[0].$name     = ".bss.sipc_hsm_queue_mem";
section9.output_section[0].alignment = 0;

section10.load_memory                 = "MAILBOX_R5F";
section10.$name                       = "SIPC R5F Queue Memory";
section10.group                       = false;
section10.type                        = "NOLOAD";
section10.output_section.create(1);
section10.output_section[0].$name     = ".bss.sipc_secure_host_queue_mem";
section10.output_section[0].alignment = 0;

section11.$name                       = "ETH_PACKET";
section11.type                        = "NOLOAD";
section11.group                       = false;
section11.load_memory                 = "ETH_PACKET_REGION";
section11.output_section.create(1);
section11.output_section[0].$name     = ".bss.eth_pkt";
section11.output_section[0].alignment = 128;

/**
 * Pinmux solution for unlocked pins/peripherals. This ensures that minor changes to the automatic solver in a future
 * version of the tool will not impact the pinmux you originally saw.  These lines can be completely deleted in order to
 * re-solve from scratch.
 */
rti1.RTI.$suggestSolution = "RTI2";






---------------------------Core 1-------------------------------------
/**
 * These arguments were used when this file was generated. They will be automatically applied on subsequent loads
 * via the GUI or CLI. Run CLI with '--help' for additional information on how to override these arguments.
 * @cliArgs --device "AM261x_ZFG" --part "AM2612" --package "ZFG" --context "r5fss0-1" --product "MCU_PLUS_SDK_AM261x@11.01.00"
 * @v2CliArgs --device "AM2612" --package "NFBGA (ZFG)" --variant "500MHz" --context "r5fss0-1" --product "MCU_PLUS_SDK_AM261x@11.01.00"
 * @versions {"tool":"1.26.0+4407"}
 */

/**
 * Import the modules used in this configuration.
 */
const eeprom             = scripting.addModule("/board/eeprom/eeprom", {}, false);
const eeprom1            = eeprom.addInstance();
const ethphy_cpsw_icssg  = scripting.addModule("/board/ethphy_cpsw_icssg/ethphy_cpsw_icssg", {}, false);
const ethphy_cpsw_icssg1 = ethphy_cpsw_icssg.addInstance();
const gpio               = scripting.addModule("/drivers/gpio/gpio", {}, false);
const gpio1              = gpio.addInstance();
const gpio2              = gpio.addInstance();
const gpio3              = gpio.addInstance();
const i2c                = scripting.addModule("/drivers/i2c/i2c", {}, false);
const i2c1               = i2c.addInstance();
const ipc                = scripting.addModule("/drivers/ipc/ipc");
const clock              = scripting.addModule("/kernel/dpl/clock");
const debug_log          = scripting.addModule("/kernel/dpl/debug_log");
const dpl_cfg            = scripting.addModule("/kernel/dpl/dpl_cfg");
const mpu_armv7          = scripting.addModule("/kernel/dpl/mpu_armv7", {}, false);
const mpu_armv71         = mpu_armv7.addInstance();
const mpu_armv72         = mpu_armv7.addInstance();
const mpu_armv73         = mpu_armv7.addInstance();
const mpu_armv74         = mpu_armv7.addInstance();
const mpu_armv75         = mpu_armv7.addInstance();
const mpu_armv76         = mpu_armv7.addInstance();
const mpu_armv77         = mpu_armv7.addInstance();
const mpu_armv78         = mpu_armv7.addInstance();
const general            = scripting.addModule("/memory_configurator/general", {}, false);
const general1           = general.addInstance();
const region             = scripting.addModule("/memory_configurator/region", {}, false);
const region1            = region.addInstance();
const section            = scripting.addModule("/memory_configurator/section", {}, false);
const section1           = section.addInstance();
const section2           = section.addInstance();
const section3           = section.addInstance();
const section4           = section.addInstance();
const section5           = section.addInstance();
const section6           = section.addInstance();
const section7           = section.addInstance();
const section8           = section.addInstance();
const section9           = section.addInstance();
const section10          = section.addInstance();
const section11          = section.addInstance();
const section12          = section.addInstance();
const section13          = section.addInstance();
const enet_cpsw          = scripting.addModule("/networking/enet_cpsw/enet_cpsw", {}, false);
const enet_cpsw1         = enet_cpsw.addInstance();

/**
 * Write custom configuration values to the imported modules.
 */
eeprom1.$name      = "CONFIG_EEPROM0";
eeprom1.i2cAddress = 0x51;

gpio1.$name          = "GPIO_DEBUG0";
gpio1.pinDir         = "OUTPUT";
gpio1.GPIO_n.$assign = "GPIO47";

gpio2.$name          = "GPIO_DEBUG1";
gpio2.pinDir         = "OUTPUT";
gpio2.GPIO_n.$assign = "GPIO48";

gpio3.$name          = "GPIO_DEBUG2";
gpio3.pinDir         = "OUTPUT";
gpio3.GPIO_n.$assign = "GPIO50";

i2c1.$name               = "CONFIG_I2C0";
eeprom1.peripheralDriver = i2c1;
i2c1.I2C.$assign         = "I2C0";
i2c1.I2C.SCL.$assign     = "GPIO135";
i2c1.I2C.SDA.$assign     = "GPIO134";
i2c1.I2C_child.$name     = "drivers_i2c_v1_i2c_v1_template1";

ipc.r5fss0_0     = "notify";
ipc.intrPriority = 1;

clock.instance = "RTI0";

debug_log.enableLogZoneWarning = false;
debug_log.enableMemLog         = true;

mpu_armv71.$name             = "CONFIG_MPU_REGION0";
mpu_armv71.size              = 31;
mpu_armv71.attributes        = "Device";
mpu_armv71.accessPermissions = "Supervisor RD+WR, User RD";
mpu_armv71.allowExecute      = false;

mpu_armv72.$name             = "CONFIG_MPU_REGION1";
mpu_armv72.size              = 15;
mpu_armv72.accessPermissions = "Supervisor RD+WR, User RD";

mpu_armv73.$name             = "CONFIG_MPU_REGION2";
mpu_armv73.baseAddr          = 0x80000;
mpu_armv73.size              = 15;
mpu_armv73.accessPermissions = "Supervisor RD+WR, User RD";

mpu_armv74.$name             = "CONFIG_MPU_REGION3";
mpu_armv74.accessPermissions = "Supervisor RD+WR, User RD";
mpu_armv74.baseAddr          = 0x70000000;
mpu_armv74.size              = 21;

mpu_armv75.$name      = "CONFIG_MPU_REGION4";
mpu_armv75.size       = 12;
mpu_armv75.baseAddr   = 0x70080000;
mpu_armv75.attributes = "NonCached";

mpu_armv76.$name        = "CONFIG_MPU_REGION5";
mpu_armv76.allowExecute = false;
mpu_armv76.isShareable  = true;
mpu_armv76.isBufferable = false;
mpu_armv76.baseAddr     = 0x50D00000;
mpu_armv76.size         = 14;
mpu_armv76.attributes   = "Device";

mpu_armv77.$name        = "CONFIG_MPU_REGION6";
mpu_armv77.baseAddr     = 0x72000000;
mpu_armv77.size         = 14;
mpu_armv77.allowExecute = false;
mpu_armv77.attributes   = "NonCached";

mpu_armv78.$name        = "CONFIG_MPU_REGION7";
mpu_armv78.baseAddr     = 0x70100000;
mpu_armv78.size         = 11;
mpu_armv78.isShareable  = true;
mpu_armv78.isBufferable = false;
mpu_armv78.allowExecute = false;
mpu_armv78.attributes   = "NonCached";

general1.$name        = "CONFIG_GENERAL0";
general1.stack_size   = 8192;
general1.heap_size    = 1024;
general1.linker.$name = "TIARMCLANG0";

region1.$name                               = "MEMORY_REGION_CONFIGURATION0";
region1.memory_region.create(6);
region1.memory_region[0].type               = "TCMA";
region1.memory_region[0].$name              = "R5F_VECS";
region1.memory_region[0].auto               = false;
region1.memory_region[0].size               = 0x40;
region1.memory_region[1].type               = "TCMA";
region1.memory_region[1].$name              = "R5F_TCMA";
region1.memory_region[1].size               = 0x7FC0;
region1.memory_region[2].type               = "TCMB";
region1.memory_region[2].size               = 0x8000;
region1.memory_region[2].$name              = "R5F_TCMB";
region1.memory_region[3].$name              = "OCRAM";
region1.memory_region[3].manualStartAddress = 0x70080000;
region1.memory_region[3].size               = 0x7C000;
region1.memory_region[4].type               = "FLASH";
region1.memory_region[4].auto               = false;
region1.memory_region[4].manualStartAddress = 0x60180000;
region1.memory_region[4].size               = 0x80000;
region1.memory_region[4].$name              = "FLASH";
region1.memory_region[5].$name              = "CPPI_DESC";
region1.memory_region[5].auto               = false;
region1.memory_region[5].manualStartAddress = 0x70080000;
region1.memory_region[5].size               = 0x4000;

section1.load_memory                  = "R5F_VECS";
section1.group                        = false;
section1.$name                        = "Vector Table";
section1.output_section.create(1);
section1.output_section[0].$name      = ".vectors";
section1.output_section[0].palignment = true;

section2.load_memory                  = "OCRAM";
section2.$name                        = "Text Segments";
section2.output_section.create(5);
section2.output_section[0].$name      = ".text.hwi";
section2.output_section[0].palignment = true;
section2.output_section[1].$name      = ".text.cache";
section2.output_section[1].palignment = true;
section2.output_section[2].$name      = ".text.mpu";
section2.output_section[2].palignment = true;
section2.output_section[3].$name      = ".text.boot";
section2.output_section[3].palignment = true;
section2.output_section[4].$name      = ".text:abort";
section2.output_section[4].palignment = true;

section3.load_memory                  = "OCRAM";
section3.$name                        = "Code and Read-Only Data";
section3.output_section.create(2);
section3.output_section[0].$name      = ".text";
section3.output_section[0].palignment = true;
section3.output_section[1].$name      = ".rodata";
section3.output_section[1].palignment = true;

section4.load_memory                  = "OCRAM";
section4.$name                        = "Data Segment";
section4.output_section.create(1);
section4.output_section[0].$name      = ".data";
section4.output_section[0].palignment = true;

section5.load_memory                             = "OCRAM";
section5.$name                                   = "Memory Segments";
section5.output_section.create(3);
section5.output_section[0].$name                 = ".bss";
section5.output_section[0].output_sections_start = "__BSS_START";
section5.output_section[0].output_sections_end   = "__BSS_END";
section5.output_section[0].palignment            = true;
section5.output_section[1].$name                 = ".sysmem";
section5.output_section[1].palignment            = true;
section5.output_section[2].$name                 = ".stack";
section5.output_section[2].palignment            = true;

section6.load_memory                              = "OCRAM";
section6.$name                                    = "Stack Segments";
section6.output_section.create(5);
section6.output_section[0].$name                  = ".irqstack";
section6.output_section[0].output_sections_start  = "__IRQ_STACK_START";
section6.output_section[0].output_sections_end    = "__IRQ_STACK_END";
section6.output_section[0].input_section.create(1);
section6.output_section[0].input_section[0].$name = ". = . + __IRQ_STACK_SIZE;";
section6.output_section[1].$name                  = ".fiqstack";
section6.output_section[1].output_sections_start  = "__FIQ_STACK_START";
section6.output_section[1].output_sections_end    = "__FIQ_STACK_END";
section6.output_section[1].input_section.create(1);
section6.output_section[1].input_section[0].$name = ". = . + __FIQ_STACK_SIZE;";
section6.output_section[2].$name                  = ".svcstack";
section6.output_section[2].output_sections_start  = "__SVC_STACK_START";
section6.output_section[2].output_sections_end    = "__SVC_STACK_END";
section6.output_section[2].input_section.create(1);
section6.output_section[2].input_section[0].$name = ". = . + __SVC_STACK_SIZE;";
section6.output_section[3].$name                  = ".abortstack";
section6.output_section[3].output_sections_start  = "__ABORT_STACK_START";
section6.output_section[3].output_sections_end    = "__ABORT_STACK_END";
section6.output_section[3].input_section.create(1);
section6.output_section[3].input_section[0].$name = ". = . + __ABORT_STACK_SIZE;";
section6.output_section[4].$name                  = ".undefinedstack";
section6.output_section[4].output_sections_start  = "__UNDEFINED_STACK_START";
section6.output_section[4].output_sections_end    = "__UNDEFINED_STACK_END";
section6.output_section[4].input_section.create(1);
section6.output_section[4].input_section[0].$name = ". = . + __UNDEFINED_STACK_SIZE;";

section7.load_memory                  = "OCRAM";
section7.$name                        = "Initialization and Exception Handling";
section7.output_section.create(3);
section7.output_section[0].$name      = ".ARM.exidx";
section7.output_section[0].palignment = true;
section7.output_section[1].$name      = ".init_array";
section7.output_section[1].palignment = true;
section7.output_section[2].$name      = ".fini_array";
section7.output_section[2].palignment = true;

section8.load_memory                 = "RTOS_NORTOS_IPC_SHM_MEM";
section8.type                        = "NOLOAD";
section8.$name                       = "IPC Shared Memory";
section8.group                       = false;
section8.output_section.create(1);
section8.output_section[0].$name     = ".bss.ipc_vring_mem";
section8.output_section[0].alignment = 0;

section9.load_memory                 = "MAILBOX_HSM";
section9.type                        = "NOLOAD";
section9.$name                       = "SIPC HSM Queue Memory";
section9.group                       = false;
section9.output_section.create(1);
section9.output_section[0].$name     = ".bss.sipc_hsm_queue_mem";
section9.output_section[0].alignment = 0;

section10.load_memory                 = "MAILBOX_R5F";
section10.$name                       = "SIPC R5F Queue Memory";
section10.group                       = false;
section10.type                        = "NOLOAD";
section10.output_section.create(1);
section10.output_section[0].$name     = ".bss.sipc_secure_host_queue_mem";
section10.output_section[0].alignment = 0;

section11.type                        = "NOLOAD";
section11.group                       = false;
section11.load_memory                 = "CPPI_DESC";
section11.$name                       = "ENET_CPPI_DESC";
section11.output_section.create(1);
section11.output_section[0].alignment = 128;
section11.output_section[0].$name     = ".bss:ENET_CPPI_DESC";

section12.$name                       = "ENET_DMA_PKT_MEMPOOL";
section12.type                        = "NOLOAD";
section12.group                       = false;
section12.load_memory                 = "OCRAM";
section12.output_section.create(1);
section12.output_section[0].$name     = ".bss:ENET_DMA_PKT_MEMPOOL";
section12.output_section[0].alignment = 128;

section13.$name                       = "ETH_PACKET";
section13.type                        = "NOLOAD";
section13.group                       = false;
section13.load_memory                 = "ETH_PACKET_REGION";
section13.output_section.create(1);
section13.output_section[0].$name     = ".bss.eth_pkt";
section13.output_section[0].alignment = 128;

enet_cpsw1.$name                 = "CONFIG_ENET_CPSW0";
enet_cpsw1.LargePoolPktCount     = 16;
enet_cpsw1.MediumPoolPktCount    = 32;
enet_cpsw1.cptsHostRxTsEn        = false;
enet_cpsw1.cptsRftClkFreq        = "CPSW_CPTS_RFTCLK_FREQ_200MHZ";
enet_cpsw1.mdioMode              = "MDIO_MODE_MANUAL";
enet_cpsw1.txDmaChannel[0].$name = "ENET_DMA_TX_CH0";
enet_cpsw1.rxDmaChannel[0].$name = "ENET_DMA_RX_CH0";

ethphy_cpsw_icssg1.$name          = "CONFIG_ENET_ETHPHY0";
ethphy_cpsw_icssg1.boardType      = "am261x-lp";
ethphy_cpsw_icssg1.peripheral     = "CPSW_MAC_PORT_1";
enet_cpsw1.ethphy1                = ethphy_cpsw_icssg1;
ethphy_cpsw_icssg1.extendedConfig = "/* Extended PHY configuration for DP83869 */\n.txClkShiftEn         = true,\n.rxClkShiftEn         = true,\n.txDelayInPs          = 2000U,   /* Value in pecosec. Refer to DLL_RX_DELAY_CTRL_SL field in ANA_RGMII_DLL_CTRL register of DP83869 PHY datasheet */\n.rxDelayInPs          = 2000U,   /* Value in pecosec. Refer to DLL_TX_DELAY_CTRL_SL field in ANA_RGMII_DLL_CTRL register of DP83869 PHY datasheet */\n.txFifoDepth          = 4U,\n.impedanceInMilliOhms = 35000,  /* 35 ohms */\n.idleCntThresh        = 4U,     /* Improves short cable performance */\n.gpio0Mode            = DP83869_GPIO0_COL,\n.gpio1Mode            = DP83869_GPIO1_CONSTANT0, /* Unused */\n.ledMode              =\n{\n\tDP83869_LED_LINKED,         /* Unused */\n\tDP83869_LED_RXERR,\n\tDP83869_LED_COLLDET,\n\tDP83869_LED_LINKED_1000BT,\n},";

const ethphy_cpsw_icssg2 = ethphy_cpsw_icssg.addInstance({}, false);
ethphy_cpsw_icssg2.$name = "CONFIG_ENET_ETHPHY1";
enet_cpsw1.ethphy2       = ethphy_cpsw_icssg2;

/**
 * Pinmux solution for unlocked pins/peripherals. This ensures that minor changes to the automatic solver in a future
 * version of the tool will not impact the pinmux you originally saw.  These lines can be completely deleted in order to
 * re-solve from scratch.
 */
enet_cpsw1.MDIO.$suggestSolution          = "MDIO0";
enet_cpsw1.MDIO.MDC.$suggestSolution      = "GPIO42";
enet_cpsw1.MDIO.MDIO.$suggestSolution     = "GPIO41";
enet_cpsw1.RGMII1.$suggestSolution        = "RGMII2";
enet_cpsw1.RGMII1.RD0.$suggestSolution    = "GPIO93";
enet_cpsw1.RGMII1.RD1.$suggestSolution    = "GPIO94";
enet_cpsw1.RGMII1.RD2.$suggestSolution    = "GPIO95";
enet_cpsw1.RGMII1.RD3.$suggestSolution    = "GPIO96";
enet_cpsw1.RGMII1.RX_CTL.$suggestSolution = "GPIO92";
enet_cpsw1.RGMII1.RXC.$suggestSolution    = "GPIO91";
enet_cpsw1.RGMII1.TD0.$suggestSolution    = "GPIO99";
enet_cpsw1.RGMII1.TD1.$suggestSolution    = "GPIO100";
enet_cpsw1.RGMII1.TD2.$suggestSolution    = "GPIO101";
enet_cpsw1.RGMII1.TD3.$suggestSolution    = "GPIO102";
enet_cpsw1.RGMII1.TX_CTL.$suggestSolution = "GPIO98";
enet_cpsw1.RGMII1.TXC.$suggestSolution    = "GPIO97";
enet_cpsw1.RGMII2.$suggestSolution        = "RGMII1";
enet_cpsw1.RGMII2.RD0.$suggestSolution    = "GPIO109";
enet_cpsw1.RGMII2.RD1.$suggestSolution    = "GPIO110";
enet_cpsw1.RGMII2.RD2.$suggestSolution    = "GPIO111";
enet_cpsw1.RGMII2.RD3.$suggestSolution    = "GPIO112";
enet_cpsw1.RGMII2.RX_CTL.$suggestSolution = "GPIO108";
enet_cpsw1.RGMII2.RXC.$suggestSolution    = "GPIO107";
enet_cpsw1.RGMII2.TD0.$suggestSolution    = "GPIO115";
enet_cpsw1.RGMII2.TD1.$suggestSolution    = "GPIO116";
enet_cpsw1.RGMII2.TD2.$suggestSolution    = "GPIO117";
enet_cpsw1.RGMII2.TD3.$suggestSolution    = "GPIO118";
enet_cpsw1.RGMII2.TX_CTL.$suggestSolution = "GPIO114";

Shaunak Deshpande 3 months ago in reply to Elia Pellegrino

TI__Mastermind 23972 points

Hi Elia,

From an initial analysis, I believe the slowdown arises because you copy ethPkt from 0x70100000, which is marked as non-cached, non-bufferable shared memory, into a destination (internalFrame) that sits in a cached OCRAM region.

1. I believe the slow copy looks slow next to the fast copy because in one case we read from cache and in the other from a non-cached shared buffer.

2. Moreover, with Non-Bufferable (isBufferable = false), the processor must wait for each write transaction to complete on the bus before proceeding. This setting is generally used for peripheral access where strict ordering is required. In this case, the Core 0 memory region is set to non-bufferable, so it becomes slower because accesses cannot be pipelined.

One solution you can try is to set isBufferable = true (on both cores).
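
To make the effect of these flags easier to reason about, here is a minimal host-side sketch of how the ARMv7-R MPU attribute bits combine into a memory type. `RegionAttrs`, `MemoryType`, `encodeAccessControl` and `memoryType` are hypothetical names used for illustration and are not SDK types. The sketch assumes that the SDK's isCacheable and isBufferable settings map directly onto the C and B bits, which this thread does not confirm, and the TEX value actually used in the project is not known.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the MPU attributes discussed above; not the SDK's own type.
struct RegionAttrs {
    std::uint8_t tex;   // TEX[2:0] type-extension field
    bool shareable;     // S bit
    bool cacheable;     // C bit
    bool bufferable;    // B bit
};

enum class MemoryType {
    StronglyOrdered,
    ShareableDevice,
    NormalWriteThrough,
    NormalWriteBack,
    NormalNonCacheable,
    NormalWriteBackWriteAllocate,
    Other               // encodings this sketch does not decode
};

// Packs the attributes into their ARMv7-R region access control positions:
// TEX at bits [5:3], S at bit 2, C at bit 1, B at bit 0.
constexpr std::uint32_t encodeAccessControl(RegionAttrs a)
{
    return (static_cast<std::uint32_t>(a.tex & 0x7u) << 3) |
           (a.shareable ? 0x4u : 0x0u) |
           (a.cacheable ? 0x2u : 0x0u) |
           (a.bufferable ? 0x1u : 0x0u);
}

// Decodes the most common TEX/C/B combinations into a memory type.
inline MemoryType memoryType(RegionAttrs a)
{
    assert(a.tex <= 7u);
    const unsigned key = (static_cast<unsigned>(a.tex) << 2) |
                         (a.cacheable ? 2u : 0u) |
                         (a.bufferable ? 1u : 0u);
    switch (key) {
    case 0x0: return MemoryType::StronglyOrdered;              // TEX=0, C=0, B=0
    case 0x1: return MemoryType::ShareableDevice;              // TEX=0, C=0, B=1
    case 0x2: return MemoryType::NormalWriteThrough;           // TEX=0, C=1, B=0
    case 0x3: return MemoryType::NormalWriteBack;              // TEX=0, C=1, B=1
    case 0x4: return MemoryType::NormalNonCacheable;           // TEX=1, C=0, B=0
    case 0x7: return MemoryType::NormalWriteBackWriteAllocate; // TEX=1, C=1, B=1
    default:  return MemoryType::Other;
    }
}
```

Under these assumptions, a region with TEX = 0 that is neither cacheable nor bufferable decodes as strongly-ordered, while TEX = 1 with the same C and B bits decodes as normal non-cacheable memory, so it may be worth checking which TEX value the generated MPU configuration uses.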

Could you rebuild the example, benchmark again, and see whether this helps?

Regards,
Shaunak

Elia Pellegrino 3 months ago in reply to Shaunak Deshpande

Prodigy 30 points

Hi,

These are my results over 1k acquisitions:

MPU Configuration of Shared RAM	Core 0 Latency [µs]	Core 1 Latency [µs]
Not Configured	23.44 ± 0.06	6.6 ± 0.11
Shareable	86.86 ± 0.05	6.12 ± 0.11
Cacheable, Shareable	86.86 ± 0.08	6.11 ± 0.1
Cacheable, Shareable, Bufferable	86.86 ± 0.08	6.12 ± 0.15

What is more surprising, though, is that if I do the same copy (from the shared memory region to core-specific OCRAM), I get these results:

MPU Configuration of Shared RAM	Core 0 Latency [µs]	Core 1 Latency [µs]
Not Configured	20.5	4.625
Shareable	85.875	8.25
Cacheable, Shareable	85.875	8.25
Cacheable, Shareable, Bufferable	86	8.25

So, from these experiments, it seems that the MPU configuration is only affecting Core 0, for some reason.

Any suggestions on how to continue?

Elia Pellegrino 3 months ago in reply to Elia Pellegrino

Prodigy 30 points

I found the main problem. One core was compiled with --use_memcpy=fast, while the other was not. Enabling it results in an improvement of over 10x in speed.
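
For anyone who wants to check how much the copy routine itself contributes, independent of the MPU settings, here is a minimal host-side sketch that times two copy implementations over the same buffer. `kFrameBytes`, `copyBytewise`, `copyLibrary` and `averageCopyNs` are hypothetical names, and the sketch uses `std::chrono` rather than a target cycle counter or a GPIO toggle, so it only illustrates the measurement pattern and not the numbers above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical buffer size, matching the 2 KB shared region described in this thread.
constexpr std::size_t kFrameBytes = 2048;

// Deliberately naive copy: one byte per loop iteration.
void copyBytewise(std::uint8_t *dst, const std::uint8_t *src, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}

// Copy through whatever memcpy the toolchain links in.
void copyLibrary(std::uint8_t *dst, const std::uint8_t *src, std::size_t n)
{
    std::memcpy(dst, src, n);
}

using CopyFn = void (*)(std::uint8_t *, const std::uint8_t *, std::size_t);

// Returns the average duration of one copy, in nanoseconds, over `iterations` runs.
double averageCopyNs(CopyFn copy, std::uint8_t *dst, const std::uint8_t *src,
                     std::size_t n, int iterations)
{
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        copy(dst, src, n);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Calling `averageCopyNs(copyBytewise, ...)` and `averageCopyNs(copyLibrary, ...)` with the same source and destination buffers on each core gives a like-for-like comparison. An optimizing compiler may turn the byte loop into a library call, so the result is only indicative, and both cores should be built with the same options.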

Shaunak Deshpande 3 months ago in reply to Elia Pellegrino

TI__Mastermind 23972 points

Hi Elia,

Thanks for the update. It makes sense that --use_memcpy=fast produced the improvement.
I am surprised, though, that switching bufferable between 0 and 1 did not change the performance at all.

Regards,
Shaunak
