
MCU-PLUS-SDK-AM243X: Using peripherals from multiple cores

Part Number: MCU-PLUS-SDK-AM243X


Hello,

We are using MCU+ SDK 09.01 with tiarmclang LTS 3.2.0 and the IPC with an additional layer of our own communication protocol. Currently we use two cores in one cluster, but we can also use all four. Please note for the following that we do not use SysConfig; instead we modified the SDK API so that the drivers do not take an index into an array but accept the config directly as a pointer parameter. The change is small and mostly affects the open() functions.
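
To illustrate that change, here is a minimal sketch of index-based vs. pointer-based open(). All types and names here are illustrative stand-ins, not the real SDK signatures:

```c
#include <assert.h>

/* Stand-ins for the SDK types -- the real MCSPI_Config bundles
 * attrs + object; names here are illustrative only. */
typedef struct { int dummyAttr; } DrvAttrs;
typedef struct { int isOpen; }    DrvObject;
typedef struct { const DrvAttrs *attrs; DrvObject *object; } DrvConfig;

/* Stock SDK style: open() indexes into a global config array
 * that SysConfig normally generates. */
static DrvObject gObj0;
static const DrvAttrs gAttrs0 = { 0 };
static DrvConfig gDrvConfig[] = { { &gAttrs0, &gObj0 } };

DrvObject *Drv_openByIndex(unsigned idx)
{
    DrvConfig *cfg = &gDrvConfig[idx];
    cfg->object->isOpen = 1;
    return cfg->object;
}

/* Modified style described above: the caller owns the config and
 * passes it directly, so no SysConfig-generated array is needed. */
DrvObject *Drv_openByConfig(DrvConfig *cfg)
{
    cfg->object->isOpen = 1;
    return cfg->object;
}
```

The pointer-based variant lets each core hold its own statically allocated configs without agreeing on global array indices.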

So in general we have a lot of freedom in the driver setup: e.g. the module clocks are enabled as soon as our own constructor of, say, the SpiDriver runs, and the correct interrupts are set up there as well. In general this already works pretty well.

We can now move our software components from one core to the other, but since we have not (yet) implemented any special handlers for peripheral access, we access the peripheral from both cores.

So assume the following setup for our example:

2 cores in one cluster (mcu0_0 and mcu0_1), both accessing the SpiDriver on the same MCSPI module, but on different channels.

Our idea for preventing concurrent access was to use spinlocks: every driver gets its own hardcoded spinlock ID. There are some constraints, but I will cover them in the PS below.

We take the spinlock as soon as anything accesses the SPI driver, be it open, close, transceive or whatever. Our current use case is some LEDs behind a shift register on one SPI channel and an EEPROM (we use our own drivers, not the SDK ones) on another SPI channel.

The interesting thing is that this seems to work perfectly when loading via CCS: the SPI can be accessed from both cores.

But it does not work when we boot from flash. We run into a data abort when the SciClient tries to write a register, it seems (CSL_REG32_WR_RAW in sciclientobj.c). Unfortunately the abort handler does not give any useful information, but the CP15 register reads 0x1808, i.e. an external abort on write. The data fault address is 0x4D001004, which according to the TRM is part of DMASS0_SEC_PROXY_SRC_TARGET_DATA.

We may need your help here, since we cannot really debug this situation. It seems to be some kind of race condition that behaves really strangely. We added a sleep of over 10 s on one core to rule out both cores activating things at the same time, but that did not prevent it. Interestingly, it works when we add a loop_forever at the start (as in the bootloader-debugging approach) and let it run afterwards.

To explain the per-core startup a bit more (since we do not use SysConfig), see this as running from main():

1. HwiP_init()
2. set up the clock (HW timer 8 for core 0_0 and HW timer 9 for core 0_1, set the interrupts correctly and call ClockP_init())
3. HwiP_open()
4. Sciclient_init(coreId)
5. setup Ipc, wait for sync
6. setup SpiDrivers:
6.1 set the interrupt and spinlock IDs and enable the clock for the SPI module (according to the MCSPI instance used)
6.2 open SpiDriver
6.3 call MCSPI_chConfig (currently for every channel on every core)
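
The per-core startup above can be sketched roughly as follows. The SDK calls are stubbed here so the sketch compiles stand-alone, and Clock_setup / Ipc_setupAndSync / Spi_setup are hypothetical wrapper names for steps 2, 5 and 6:

```c
#include <assert.h>
#include <string.h>

/* Order log -- the SDK calls are stubbed so the sketch stands alone;
 * on target these would be the real MCU+ SDK functions. */
static const char *gLog[8];
static int gLogCnt = 0;
static void logStep(const char *s) { gLog[gLogCnt++] = s; }

static void HwiP_init(void)            { logStep("HwiP_init"); }
static void Clock_setup(void)          { logStep("Clock_setup"); }   /* HW timer per core + ClockP_init() */
static void HwiP_open(void)            { logStep("HwiP_open"); }
static void Sciclient_init(int coreId) { (void)coreId; logStep("Sciclient_init"); }
static void Ipc_setupAndSync(void)     { logStep("Ipc_sync"); }
static void Spi_setup(void)            { logStep("Spi_setup"); }     /* intr + spinlock IDs + module clock, open, chConfig */

/* One core's startup sequence, mirroring steps 1-6 above. */
void core_main(int coreId)
{
    HwiP_init();
    Clock_setup();
    HwiP_open();
    Sciclient_init(coreId);
    Ipc_setupAndSync();
    Spi_setup();
}
```

The key ordering constraint is that Sciclient_init() must come before anything that needs SYSFW services (clocks, interrupts routed via the DMSC), which both cores do independently here.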

Since the error happens somewhere in the firewall range, I don't know what to do. Is this something boardcfg-related? But why does it then work when loading via CCS, or with the bootloader-debugging approach with a while loop at the beginning?

PS:

Yes, we know spinlocks are probably not a good idea. The TRM states a lock should not be held for more than about 200 ns and the guarded actions should be short. There is the theoretical problem that a low-priority task on one core takes the lock for an SPI access, and a higher-priority task on the other core then spins in a while loop trying to get the same spinlock, which could block that core for quite a while. Also, if several entities on the same core want to access the same SPI module, they could deadlock themselves: a lower-priority task takes the spinlock and is then preempted by a higher-priority task that also wants the SPI module. That last case should not happen, though, since we also use semaphores to prevent concurrent access from within one core; the spinlock is only taken right after the semaphore (and released again, of course).
To at least reduce the impact, we added a timeout and a sleep(1) to every spinlock attempt, so lower-priority tasks are guaranteed to keep running. We may refine this behaviour later.
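
The bounded acquire described above can be modelled like this; a plain variable stands in for the hardware spinlock register (on the real device, reading the lock register returns 0 when the lock was free and is thereby taken, and writing 0 releases it), and the 1 ms sleep granularity is an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for one hardware spinlock register. */
static uint32_t gLockTaken = 0;

static uint32_t hwSpinlockTryLock(void) {
    if (gLockTaken) return 1U;   /* already held elsewhere */
    gLockTaken = 1U;             /* read side-effect: now taken */
    return 0U;
}
static void hwSpinlockUnlock(void) { gLockTaken = 0U; }

static void sleep1ms(void) { /* ClockP_usleep(1000) on target */ }

/* Bounded acquire: retry with a sleep between attempts so lower-priority
 * tasks keep running, and give up after timeoutMs instead of spinning
 * forever. */
int spinlockLockTimeout(uint32_t timeoutMs)
{
    for (uint32_t elapsed = 0; elapsed <= timeoutMs; elapsed++) {
        if (hwSpinlockTryLock() == 0U) {
            return 0;   /* got the lock */
        }
        sleep1ms();
    }
    return -1;  /* timed out */
}
```

Sleeping between attempts trades lock-acquisition latency for scheduler fairness, which matches the stated goal of not starving lower-priority tasks.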

For now we are fine with that, since we can at least guarantee there is no concurrent access to the SPI module.

Do you have an idea what we can do here to get this running?

Best regards

Felix

  • Hello Felix,

    Sorry for the delayed reply as I was on leave yesterday.

    You may get a reply in one or two days.

    Regards,

    Anil.

  • Hello Felix,

    So, you are connecting one MCSPI instance with two CS lines:

    one CS is connected to the shift register and the other CS to the EEPROM.

    One CS is controlled by the R5F0_0 core and the other CS line by R5F0_1.

    Can you please confirm whether you are using DMA on both channels?

    We added a sleep of over 10 s on one core to rule out both cores activating things at the same time, but that did not prevent it.

    My initial suspicion is that both cores access the DMA at the same time when booting from flash.

    Since this has already been tested with CCS and works fine: with CCS loading there is definitely a gap between initializing all the peripherals on one core vs. the other core, so the issue might simply not show up when loading via CCS.

    One more observation: after you added the while(1) condition in one application, the issue no longer occurred, since one core had already completed its initialization. The other core is then released via the while(variable) condition, so again there is a gap between the initialization of one core vs. the other.

    Actually, there are 5 MCSPI instances available. So why not connect the EEPROM and the shift register to dedicated instances?

    Alternatively, you can add a feedback mechanism: if one core is accessing the MCSPI, the other core waits for completion (signalled via IPC) and only then uses the same module from the other core.

    I assume that for controlling the shift register you do not need to enable RX mode.

    I hope both MCSPI channels are configured identically except for the CS line.

    Since the error happens somewhere in the firewall range, I don't know what to do. Is this something boardcfg-related? But why does it then work when loading via CCS, or with the bootloader-debugging approach with a while loop at the beginning?

    Most likely this is not a firewall problem, since you have already confirmed that the code works fine via CCS.

    So I hope you are not using NO BOOT mode in the above case.

    Can you please confirm which boot mode you are using?

    NO BOOT mode?

    If you suspect a firewall issue: all firewall exceptions are routed to the SYSFW core.

    So please enable the SYSFW log with the help of the FAQ below and share the log.

    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1377336/faq-am64x-am243-how-to-enable-sysfw-trace-on-am64x-am243-devices

    Regards,

    Anil.

  • Hello Swargam,

    I see I missed some information.

    We are not using any DMA. We cannot connect it differently, because all pins are already connected and serve another purpose. Yes, both channels are configured the same.

    The boot mode is xSPI in all cases, also when connecting and loading via CCS.

    We can drop the firewall issue, since we think we have found a solution that works for now.

    So we use the spinlocks now: as soon as the SPI is accessed, it is locked immediately. It works. We think the previous problem was a call to the open() function where we had forgotten a spinlock.

    So now every core has dedicated access, without concurrent access by any other core, thanks to the spinlocks. I would consider this working.

    The only question is: is this approach legitimate? Or would you say: "Nah, not that good, there is maybe a better solution"? We are now thinking of implementing these spinlocks with dedicated IDs for each of our drivers. That is easy to implement, since we also have a wrapper around the TI drivers. It is optional, activated only when more than one core is used; otherwise it is not compiled in.

    This would mean spinlocks for UART, OSPI (we are not using XIP, so that should not be a problem from this point of view), I²C and so on. As I understand it, this way we can guarantee concurrency-free access to the peripherals, with the drawback of possible priority inversion between the tasks of each core. On the other hand, we use binary semaphores anyway, and even now a higher-priority task has to wait until a lower-priority task has finished with a resource; it just gets a bit more complex with more cores.
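
    A rough sketch of what such an optional per-driver spinlock layer could look like. All names and IDs here are hypothetical, and the hardware lock/unlock calls are stubbed with counters so the sketch stands alone:

    ```c
    #include <assert.h>

    /* One hardcoded hardware-spinlock ID per wrapped driver (values
     * illustrative; real IDs must stay below the device's lock count). */
    enum {
        SPINLOCK_ID_MCSPI0 = 0,
        SPINLOCK_ID_UART0  = 1,
        SPINLOCK_ID_OSPI0  = 2,
        SPINLOCK_ID_I2C0   = 3,
    };

    int gLockCalls = 0, gUnlockCalls = 0;
    void hwLock(int id)   { (void)id; gLockCalls++; }
    void hwUnlock(int id) { (void)id; gUnlockCalls++; }

    /* The guards compile away entirely in single-core builds. */
    #ifdef MULTI_CORE
    #define PERIPH_LOCK(id)   hwLock(id)
    #define PERIPH_UNLOCK(id) hwUnlock(id)
    #else
    #define PERIPH_LOCK(id)   ((void)0)
    #define PERIPH_UNLOCK(id) ((void)0)
    #endif

    /* Example wrapped driver entry point. */
    int uartWrite(const char *buf)
    {
        (void)buf;
        PERIPH_LOCK(SPINLOCK_ID_UART0);
        /* ... access the peripheral registers ... */
        PERIPH_UNLOCK(SPINLOCK_ID_UART0);
        return 0;
    }
    ```

    With the guards behind a compile-time switch, single-core builds pay no code-size or runtime cost, matching the "only compiled in when more than one core is used" idea.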

    Best regards

    Felix

  • The only question is: is this approach legitimate? Or would you say: "Nah, not that good, there is maybe a better solution"? We are now thinking of implementing these spinlocks with dedicated IDs for each of our drivers. That is easy to implement, since we also have a wrapper around the TI drivers. It is optional, activated only when more than one core is used; otherwise it is not compiled in.

    Hello Felix,

    I am OK with the spinlock, but I am worried about interrupt handling.

    In my analysis I found that if one core triggers TX and RX interrupts, the same interrupt goes to both cores.

    I am not sure how you are handling the ISRs.

    Can you please share your application or the IRQ handlers for each core, so that I can check and provide feedback?

    My suggestion is to always use a peripheral from one core only and not share it with the other cores.

    Your application can be implemented as follows.

    I assume that CS0 is connected to the shift register and CS1 to the EEPROM.

    So R5F0_0 controls the shift register based on its own application data. When the other core, R5F0_1, needs something, it sends a request via IPC, and after receiving the data, R5F0_0 controls CS1 from its application only.

    If you expect the EEPROM reads and writes to be large, so that moving the data through IPC would take too long,

    then move the peripheral control to R5F0_1 and fetch the shift-register data from there instead. I am sure that controlling the shift register mostly needs a few bytes and does not require large amounts of data.
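
    As a rough sketch of that idea (a toy single-slot mailbox stands in for the IPC path; all names are illustrative, not real SDK calls):

    ```c
    #include <assert.h>
    #include <string.h>

    /* Toy single-slot mailbox standing in for the IPC path: the
     * non-owning core posts a request, the SPI-owning core services
     * it and posts the reply. */
    typedef struct { int pending; char payload[16]; } Mailbox;
    static Mailbox gReq, gResp;

    /* Runs on the core that does NOT own the MCSPI instance. */
    void client_requestEepromRead(const char *what)
    {
        strncpy(gReq.payload, what, sizeof gReq.payload - 1);
        gReq.pending = 1;   /* on target: send IPC message + wait for reply */
    }

    /* Runs on the owning core's service task: the only place that
     * touches the MCSPI registers, so no spinlock is needed. */
    void owner_serviceRequests(void)
    {
        if (gReq.pending) {
            /* ... perform the MCSPI transfer on the EEPROM chip select ... */
            strcpy(gResp.payload, "data");
            gResp.pending = 1;
            gReq.pending = 0;
        }
    }
    ```

    The trade-off discussed in this thread is visible here: the owner core serializes all accesses for free, but every cross-core transfer now pays the IPC round-trip.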

    Regards,

    Anil.

  • Hey Anil,

    I see, and I think I understand; we also hit this topic when we wanted to trigger an interrupt manually, because that also goes to all cores.

    The MCSPI interrupts are handled by the SDK driver itself, as far as I understood. It uses a semaphore to wait for the interrupt, which is part of the MCSPI_Object:

        /**< Transfer Sync Semaphore - to sync between transfer completion ISR
         *   and task */
        SemaphoreP_Object       transferSemObj;

    I haven't checked what happens in case of an unintended interrupt on one core, but I think it's not a problem, judging from the program flow: if no task on that core has taken the semaphore beforehand, it also shouldn't be a problem if it is posted later on. At least I hope so?

    Let's look at the body of the SDK driver's MCSPI_peripheralIsr(void *args) that we currently still use (we have not migrated to the LLD of the latest SDKs!):

        int32_t             status = SystemP_SUCCESS;
        uint32_t            transferStatus;
        MCSPI_Config       *config;
        MCSPI_Object       *obj;
        const MCSPI_Attrs  *attrs;
        MCSPI_ChObject     *chObj;
        MCSPI_Transaction  *transaction;
        uint32_t            baseAddr, chNum;
    
        /* Check parameters */
        if(NULL == args)
        {
            status = SystemP_FAILURE;
        }
    
        if(SystemP_SUCCESS == status)
        {
            config = (MCSPI_Config *) args;
            obj = config->object;
            attrs = config->attrs;
            DebugP_assert(NULL != obj);
            DebugP_assert(NULL != config->attrs);
    
            transaction = obj->currTransaction;
            baseAddr = obj->baseAddr;
            if (transaction != NULL)
            {
                chNum = transaction->channel;
                chObj = &obj->chObj[chNum];
                transferStatus = MCSPI_continuePeripheralTxRx(obj, chObj, transaction);
                if ((MCSPI_TRANSFER_COMPLETED == transferStatus) ||
                        (MCSPI_TRANSFER_CANCELLED == transferStatus))
                {
                    /* Process the transfer completion. */
                    /* Stop MCSPI Channel */
                    MCSPI_stop(obj, attrs, chObj, chNum);
    
                    /* Disable TX and RX FIFO */
                    chObj->chConfRegVal &= ~(CSL_MCSPI_CH0CONF_FFEW_MASK | CSL_MCSPI_CH0CONF_FFER_MASK);
                    CSL_REG32_WR(baseAddr + MCSPI_CHCONF(chObj->chCfg.chNum), chObj->chConfRegVal);
    
                    /* Update the driver internal status. */
                    /* transfer completed */
                    transaction->status  = transferStatus;
                    /* Return the actual number of words transferred */
                    obj->currTransaction->count = chObj->curRxWords;
                    if (MCSPI_TR_MODE_TX_ONLY == chObj->chCfg.trMode)
                    {
                        obj->currTransaction->count = chObj->curTxWords;
                    }
                    obj->currTransaction = NULL;
    
                    /*
                    * Post transfer Sem in case of blocking transfer.
                    * Call the callback function in case of Callback mode.
                    */
                    if (obj->openPrms.transferMode == MCSPI_TRANSFER_MODE_BLOCKING)
                    {
                        SemaphoreP_post(&obj->transferSemObj);
                    }
                    else
                    {
                        obj->openPrms.transferCallbackFxn((MCSPI_Handle) config, transaction);
                    }
                }
                /*
                * Else the transfer is still pending.
                * Do nothing, wait for next interrupt.
                */
            }
            else
            {
                /* There is no ongoing transfer. Disable and clear all interrupts. */
                CSL_REG32_WR(baseAddr + CSL_MCSPI_IRQENABLE, 0U);
                MCSPI_clearAllIrqStatus(baseAddr);
            }
        }
        return;

    So if either args or the transaction is NULL, it won't proceed. And in the case when an interrupt occurs but the current core is not using the MCSPI, transaction should at least be NULL, so nothing modifies any data on that core and the interrupt just exits.
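
    That guard can be modelled in a few lines (a simplified stand-in for the ISR flow above, not the real driver code):

    ```c
    #include <assert.h>
    #include <stddef.h>

    typedef struct { int channel; int status; } Transaction;
    typedef struct {
        Transaction *currTransaction;  /* NULL when this core has no transfer */
        int irqEnabled;
        int semPosted;
    } SpiObj;

    /* Mirrors the ISR flow quoted above: with no ongoing transaction,
     * the ISR only disables/clears interrupts and returns, so a stray
     * cross-core interrupt cannot corrupt this core's driver state. */
    void spiIsr(SpiObj *obj)
    {
        if (obj == NULL) {
            return;                        /* bad args: do nothing */
        }
        if (obj->currTransaction != NULL) {
            obj->currTransaction->status = 1;  /* transfer completed */
            obj->currTransaction = NULL;
            obj->semPosted = 1;                /* SemaphoreP_post() */
        } else {
            obj->irqEnabled = 0;               /* disable + clear all IRQs */
        }
    }
    ```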

    Maybe it's useful to show our configuration for the MCSPI:

    const MCSPI_Attrs spi0Attr = {
        CSL_MCSPI0_CFG_BASE,
        50000000U,
        0, // is set by driver
        MCSPI_OPER_MODE_INTERRUPT,
        8U,
        MCSPI_CH_MODE_MULTI,
        MCSPI_PINMODE_4PIN,
        MCSPI_INITDLY_0
    };

    Please keep in mind that we use a modified SDK API, and our driver wrappers also define the hardware interrupt number of the MCSPI instance used (according to the TRM, of course).

    So MCSPI_OPER_MODE_INTERRUPT always ends in an interrupt handled by the SDK driver, where it posts this semaphore. Our application code "just" calls read and write with the spinlocks taken beforehand.

    The solution of using only one core for peripheral access was also on our mind, but it complicates things a lot, since we would then need to add handlers disguised as drivers. This is due to our design, where a SW component can be moved to another core and still keeps all its connections to the components on the other core (we implemented a layer above the IPC that works like a normal function call and connects components across all cores at startup). But this does not work that way for the drivers: they are just simple C++ interfaces (internally accessing the SDK drivers). In the end this would mean more code and more memory taken for data.
    Additionally, the transfer time would increase for large amounts of data, and accessing the EEPROM would become more complicated.

    So I see we have the same thoughts on this topic.

    I think we would need to take some measurements using the GTC to see how long it takes (and how long it blocks tasks) in each case: once via IPC and once via shared peripheral access from multiple cores. Then we could relate that to the memory footprint. Unfortunately we currently do not have the time for that.

    But if I understood everything correctly, it is mainly about the handling of the IRQ, which all cores receive, right? I think this should work: it is an unnecessary interrupt, but it shouldn't break anything.

  • Hello Felix,

    In my analysis I have only seen problems with interrupt handling, and since a semaphore is already implemented, I see no issues right now.

    As you mentioned, each core may get unnecessary interrupts, but other than that there are no issues.

    Other than the spinlock, there is the other method I shared above: access the SPI from the EEPROM application core and fetch the shift-register data from the other core through IPC or DMA.

    Other than these two methods, there is no way to use the same peripheral from two cores.

    Regards,

    Anil

  • Thanks Anil,

    We will now use one of those options (on our own responsibility, of course), but we may think about a better solution via the IPC later. Thanks so far!