Tool/software:
Hello,
We are using MCU PLUS SDK 09.01 with tiarmclang LTS 3.2.0. We use the IPC with an additional layer implementing our own communication protocol. Currently we are using two cores in one cluster, but we can also use all four. Please also note for the following: we do not use SysConfig. Instead, we modified the SDK API so that the drivers take the config directly as a pointer parameter rather than an index into an array. The change is small and mostly affects only the open() functions.
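To illustrate the change, a minimal sketch (MCSPI_openCfg is our hypothetical name for the modified function; the stock SDK MCSPI_open() takes an instance index instead):

```c
#include <drivers/mcspi.h>

/* Hypothetical signature of our modified open(): the config is passed
 * directly instead of an index into a SysConfig-generated array. */
MCSPI_Handle MCSPI_openCfg(MCSPI_Config *config, const MCSPI_OpenParams *openParams);

static MCSPI_Config gSpiConfig; /* filled by our driver constructor */

void spiOpenExample(void)
{
    MCSPI_OpenParams openParams;

    MCSPI_OpenParams_init(&openParams);
    MCSPI_Handle spiHandle = MCSPI_openCfg(&gSpiConfig, &openParams);
    (void)spiHandle;
}
```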
So in general we have a lot of freedom in the driver setup: the module clocks are enabled as soon as the constructor we wrote for, e.g., the SpiDriver runs, and the corresponding interrupts are configured there as well. In general this already works pretty well.
We now have the option of moving our software components from one core to the other, but since we have not (yet) implemented any special handlers for peripheral access, we access the peripherals from both cores.
So assume the following setup for our example:
2 cores in one cluster (mcu0_0 and mcu0_1), both accessing the SpiDriver on the same MCSPI module but on different channels.
Our idea for preventing concurrent access was to use spinlocks: every driver gets its own hardcoded spinlock ID. There are some constraints, but I will address them in the PS below.
We lock the spinlock whenever anything accesses the Spi driver at all, be it open, close, transceive or anything else. Our current use case is some LEDs behind a shift register on one SPI channel and an EEPROM on another SPI channel (we use our own drivers, not the SDK ones).
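As a sketch, the guard around every SPI entry point looks roughly like this (assuming the SDK spinlock driver in drivers/spinlock.h with Spinlock_lock()/Spinlock_unlock(); the base address and lock ID are placeholders, and the refined acquire loop is shown in the PS):

```c
#include <drivers/mcspi.h>
#include <drivers/spinlock.h>

#define SPI_SPINLOCK_ID    (0U)           /* hardcoded per driver */
#define SPINLOCK_BASEADDR  (0x2A000000U)  /* placeholder, device-specific */

int32_t spiGuardedTransfer(MCSPI_Handle handle, MCSPI_Transaction *txn)
{
    int32_t status;

    /* Busy-wait until the hardware lock is granted (0 == acquired,
     * per our reading of the spinlock driver). */
    while (Spinlock_lock(SPINLOCK_BASEADDR, SPI_SPINLOCK_ID) != 0) {
        /* see PS: the refined version sleeps between attempts */
    }

    status = MCSPI_transfer(handle, txn);

    Spinlock_unlock(SPINLOCK_BASEADDR, SPI_SPINLOCK_ID);

    return status;
}
```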
The interesting thing is that this seems to work perfectly when loading via CCS: the SPI can be accessed from both cores.
But it does not work when we boot from flash. We run into a data abort when, as far as we can tell, a register is written from the SciClient (CSL_REG32_WR_RAW in sciclientobj.c). Unfortunately the abort handler does not give any useful information, but the CP15 register reads 0x1808, i.e. an external abort on a write access. The data fault address is 0x4D001004, which according to the TRM is part of DMASS0_SEC_PROXY_SRC_TARGET_DATA.
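For reference, we read the fault information in the abort handler roughly like this (sketch, using GCC-style inline asm under tiarmclang; function names are ours):

```c
#include <stdint.h>

/* Read DFSR (fault status) and DFAR (fault address) from CP15 on the
 * Cortex-R5; these gave us 0x1808 and 0x4D001004. */
static inline uint32_t readDFSR(void)
{
    uint32_t val;
    __asm__ volatile("MRC p15, 0, %0, c5, c0, 0" : "=r"(val));
    return val;
}

static inline uint32_t readDFAR(void)
{
    uint32_t val;
    __asm__ volatile("MRC p15, 0, %0, c6, c0, 0" : "=r"(val));
    return val;
}
```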
We may need your help here, since we cannot really debug this situation. It seems to be some kind of race condition that behaves very strangely. We added a sleep of over 10 s on one core to rule out both cores activating something at the same time, but that did not prevent the abort. Interestingly, it does work when we add a loop_forever at the start (as in the bootloader debugging flow) and let the program run afterwards.
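The loop we add is just the usual debug-halt pattern (sketch):

```c
#include <stdint.h>

/* Cleared via the debugger (e.g. from CCS) to let the core continue. */
volatile uint32_t gDebugHalt = 1U;

void debugHaltAtStartup(void)
{
    while (gDebugHalt != 0U) {
        ; /* spin until a debugger attaches and clears gDebugHalt */
    }
}
```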
To explain the startup per core a bit more, since we do not use SysConfig (see this as running from main(); a sketch follows the list):
1. HwiP_init()
2. set up the clock (HWTimer 8 for core 0_0 and HWTimer 9 for core 0_1, configure the interrupts and call ClockP_init())
3. HwiP_open()
4. Sciclient_init(coreId)
5. set up the Ipc, wait for sync
6. set up the SpiDrivers:
6.1 set the interrupt and spinlock IDs and enable the clock for the SPI module (according to the SPI module used)
6.2 open SpiDriver
6.3 call MCSPI_chConfig (currently for every channel on every core)
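As a sketch, the per-core main() therefore looks roughly like this (setupClock(), setupIpc() and setupSpiDrivers() stand in for our own code from the steps above; the core ID constant is device-specific):

```c
#include <kernel/dpl/HwiP.h>
#include <kernel/dpl/ClockP.h>
#include <drivers/sciclient.h>

/* Our own helpers from steps 2, 3, 5 and 6 above (names are ours). */
extern void setupClock(void);       /* HWTimer 8/9, IRQs, then ClockP_init() */
extern void HwiP_open(void);        /* step 3 above */
extern void setupIpc(void);         /* includes waiting for core sync */
extern void setupSpiDrivers(void);  /* spinlock IDs, module clock, open, chConfig */

int main(void)
{
    HwiP_init();                              /* 1 */
    setupClock();                             /* 2 */
    HwiP_open();                              /* 3 */
    Sciclient_init(CSL_CORE_ID_R5FSS0_0);     /* 4: this core's ID (mcu0_0 here) */
    setupIpc();                               /* 5 */
    setupSpiDrivers();                        /* 6 */

    /* application tasks start here */
    return 0;
}
```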
Since the error happens somewhere in the firewall range, I don't know what to do. Is this something boardcfg-related? But why does it then work when loading via CCS, or with the bootloader-debugging approach with a while loop at the beginning?
PS:
Yes, we know spinlocks are probably not a good idea here. The TRM states that locks shouldn't be held for more than 200 ns and that the protected actions should be short, etc. The theoretical problem exists that a low-priority task on one core accesses the Spi and locks it, while a higher-priority task on the other core that tries to access the same peripheral then spins in a while loop waiting for the spinlock, which could block that core for quite a long time. Also, if multiple entities on the same core want to access the same SPI module, they could deadlock themselves if a lower-priority task has taken the spinlock and a higher-priority task that also uses the SPI module preempts it. But that last case shouldn't happen, since we also use semaphores to prevent concurrent access from within one core; the spinlocks are locked directly afterwards (and released again, of course).
To at least reduce the impact, we added a timeout and a sleep(1) to every spinlock check, so lower-priority tasks are guaranteed to keep running. We may refine this behaviour later on.
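The acquire with timeout and sleep looks roughly like this (sketch; the function name and the millisecond granularity are ours):

```c
#include <stdbool.h>
#include <stdint.h>
#include <kernel/dpl/ClockP.h>
#include <drivers/spinlock.h>

/* Bounded spinlock acquire: retry with a 1 ms sleep between attempts so
 * lower-priority tasks on this core keep running. */
bool spiSpinlockAcquire(uint32_t baseAddr, uint32_t lockId, uint32_t timeoutMs)
{
    uint32_t elapsedMs = 0U;

    /* 0 == lock granted, per our reading of the spinlock driver. */
    while (Spinlock_lock(baseAddr, lockId) != 0) {
        if (elapsedMs >= timeoutMs) {
            return false;        /* caller decides how to handle the timeout */
        }
        ClockP_usleep(1000U);    /* the sleep(1) mentioned above */
        elapsedMs++;
    }
    return true;
}
```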
For now we are fine with this, since at least we can guarantee that there is no concurrent access to the SPI module.
Do you have an idea what we can do here to get this running?
Best regards
Felix