AM2612: AM261 issue using SPI + DMA with Zephyr OS

Part Number: AM2612

I am using the AM261 LaunchPad board and am attempting to read SPI data as a slave. An existing SPI master sends 22 bytes of process data. When I read the SPI data using the synchronous API (i.e., spi_transceive), the received data matches my expectations. However, when I switch to the asynchronous API (i.e., spi_transceive_cb), the receive buffer no longer contains valid data.

app.overlay:

#include <zephyr/dt-bindings/gpio/gpio.h>
#include <zephyr/dt-bindings/memory-attr/memory-attr.h>

#define DMA_REGION_SIZE (0x80)  /* 128 bytes of DMA data */

/ {
  aliases {
    processdataspi = &mcspi2;
  };
  soc {
    /* Define the custom memory region for DMA, put it at the end of SRAM for now. */
    pd_rx_dma_region: memory@700fff80 {
      compatible = "zephyr,memory-region",
                   "mmio-sram";
      reg = <0x700fff80 DMA_REGION_SIZE>; /* 128 byte region for DMA data */
      zephyr,memory-region = "PD_RX_DMA_REGION";
      zephyr,memory-attr = <(DT_MEM_DMA)>;
    };
  };
};

&mcspi2 {
  // "Process Data"
  status = "okay";
};

&sram0 {
  reg = <0x70000000 (DT_SIZE_M(1) - DMA_REGION_SIZE)>;
};

prj.conf:

#
# C Library
#
CONFIG_NEWLIB_LIBC=y

#
# C++ Language Support
#
CONFIG_CPP=y
CONFIG_STD_CPP17=y
CONFIG_REQUIRES_FULL_LIBCPP=y
CONFIG_GLIBCXX_LIBCPP=y
CONFIG_CPP_EXCEPTIONS=y
CONFIG_CPP_RTTI=y
CONFIG_DYNAMIC_THREAD=y
CONFIG_DYNAMIC_THREAD_ALLOC=y
CONFIG_EVENTS=y
CONFIG_REBOOT=y
CONFIG_THREAD_STACK_INFO=y
CONFIG_SERIAL=y
CONFIG_SPI=y
CONFIG_SPI_DMA=y
CONFIG_SPI_SLAVE=y
# Configure stack/heap
CONFIG_HEAP_MEM_POOL_SIZE=131072
CONFIG_MAIN_STACK_SIZE=8192

west.yml:

manifest:
  remotes:
    - name: zephyrproject-rtos-ti
      url-base: https://github.com/TexasInstruments

  projects:
    - name: zephyr
      remote: zephyrproject-rtos-ti
      revision: REL.AM261X-v4.2.0-ti-01.01.01
      clone-depth: 1
      import:
        name-allowlist:
          - cmsis
          - cmsis_6
    - name: asm-hal_ti
      remote: zephyrproject-rtos-ti
      revision: v4.2.0-ti-11.00.02

Working synchronous SPI sample code:

#include <zephyr/kernel.h>
#include <zephyr/device.h>
#include <zephyr/devicetree.h>
#include <zephyr/drivers/spi.h>
#include <zephyr/sys/printk.h>

#include <array>
#include <cstdint>

namespace
{
const struct device* g_pSpi = DEVICE_DT_GET(DT_ALIAS(processdataspi));  //!< Pointer to SPI device tree instance
bool g_newDataReceived{false};                                          //!< Flag indicating new data is available
std::uint8_t g_rxBuffer[22U] __aligned(32);                             //!< Receive buffer

//*********************************************************************************************
//*********************************************************************************************
auto receiveSync(std::uint8_t* pBuffer, std::size_t len) -> bool
{
  /* NOTE: This is static so the SPI configuration doesn't need to be reapplied with every transfer.  This is due to an internal pointer compare to check equality of SPI configs. */
  static struct spi_config config = {
      .frequency = 1'000'000U,
      .operation = SPI_OP_MODE_SLAVE | SPI_TRANSFER_MSB | SPI_MODE_CPHA | SPI_MODE_CPOL | SPI_FRAME_FORMAT_MOTOROLA | SPI_WORD_SET(8),
      .slave = 0U,
      .cs = NULL,
  };
  struct spi_buf rxBuf = {.buf = pBuffer, .len = len};
  struct spi_buf_set rxSet = {.buffers = &rxBuf, .count = 1};

  /* Perform a blocking (synchronous) SPI transfer. */
  const auto ret = spi_transceive(g_pSpi, &config, NULL, &rxSet);
  if (0 > ret)
  {
    printk("SPI Transceive failed: %d\n", ret);
    return false;
  }

  return true;
}

}

//***************************************************************************************
//***************************************************************************************
int main()
{
  if (!device_is_ready(g_pSpi))
  {
    printk("%s: device not ready.\n", g_pSpi->name);
    return -1;
  }

  std::uint32_t count{0U};
  std::size_t bufferSize{0U};
  std::array<std::uint8_t, 22U> rxBuffer{};

  /* Enter program loop. */
  while (true)
  {
    bufferSize = rxBuffer.size();

    /* Read new data from driver. */
    if (!receiveSync(rxBuffer.data(), bufferSize))
    {
      printk("Failed to receive new process data.\n");
      continue;
    }
    
    printk("Process Data [%u bytes, %u]: %02X-%02X %02X-%02X-%02X-%02X %02X-%02X-%02X-%02X %02X-%02X %02X-%02X-%02X-%02X %02X-%02X-%02X-%02X %02X-%02X\n", 
        bufferSize, count,
        rxBuffer[0], rxBuffer[1], rxBuffer[2], rxBuffer[3], rxBuffer[4], rxBuffer[5], rxBuffer[6], rxBuffer[7],
        rxBuffer[8], rxBuffer[9], rxBuffer[10], rxBuffer[11], rxBuffer[12], rxBuffer[13], rxBuffer[14], rxBuffer[15],
        rxBuffer[16], rxBuffer[17], rxBuffer[18], rxBuffer[19], rxBuffer[20], rxBuffer[21]);

    count++;
  }

  return 0;
}

Output:

Process Data [22 bytes, 0]: 00-00 00-02-4E-D9 00-09-9E-B6 C9-9D 90-5F-19-07 7E-DD-00-59 A1-7B
Process Data [22 bytes, 1]: 00-01 00-01-AF-3E 00-06-C0-1F CA-B4 39-BF-15-2F 6F-0E-00-63 99-BD
Process Data [22 bytes, 2]: 00-00 FF-FF-F6-C2 00-08-33-D6 C9-CC 9C-9B-19-12 C5-B5-00-56 A1-17
Process Data [22 bytes, 3]: 00-00 00-01-1C-F4 00-04-DE-FD C9-9B 2B-9A-19-2D B8-78-00-5C A1-97
Process Data [22 bytes, 4]: 00-00 00-02-53-00 00-07-B0-33 C9-D3 32-AE-19-14 49-86-00-4D A0-DB
Process Data [22 bytes, 5]: 00-02 00-06-0B-AE 00-05-67-6C CF-F7 17-84-25-95 77-BD-00-4A A3-02

Non-working asynchronous SPI sample code:

#include <zephyr/kernel.h>
#include <zephyr/device.h>
#include <zephyr/devicetree.h>
#include <zephyr/drivers/spi.h>
#include <zephyr/sys/printk.h>

#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

#define PD_RX_DMA_REGION_NAME Z_GENERIC_SECTION(LINKER_DT_NODE_REGION_NAME(DT_NODELABEL(pd_rx_dma_region)))

namespace
{

/**
 * @brief Enumerated process data thread events.
 *
 */
enum Event : std::uint8_t
{
  eShutdown = 0x01, //!< Stop process data task
  eNewData = 0x02,  //!< New data available
  eMask = 0xFF,     //!< Mask of all events
};

struct k_event g_syncEvent;                                             //!< Triggered events
const struct device* g_pSpi = DEVICE_DT_GET(DT_ALIAS(processdataspi));  //!< Pointer to SPI device tree instance
bool g_newDataReceived{false};                                          //!< Flag indicating new data is available
PD_RX_DMA_REGION_NAME std::uint8_t g_rxBuffer[22U] = {0};               //!< Receive buffer
}

#ifdef __cplusplus
extern "C" {
#endif

//*********************************************************************************************
//*********************************************************************************************
static void spiCallback(const struct device* dev, int result, void* data)
{
  (void)dev;
  (void)data;

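  /* A negative result is an errno code.  In slave mode a successful transfer may
   * report a non-negative value (the number of frames received), so only negative
   * values are treated as failures. */
  if (0 > result)
  {
    printk("SPI transfer failed: %d\n", result);
    return;
  }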

  /* Transfer successful */
  g_newDataReceived = true;
  k_event_post(&g_syncEvent, Event::eNewData);
}

#ifdef __cplusplus
}
#endif

namespace
{

//*********************************************************************************************
//*********************************************************************************************
auto beginReceiveAsync() -> bool
{
  /* NOTE: This is static so the SPI configuration doesn't need to be reapplied with every transfer.  This is due to an internal pointer compare to check equality of SPI configs. */
  static struct spi_config config = {
      .frequency = 1'000'000U,
      .operation = SPI_OP_MODE_SLAVE | SPI_TRANSFER_MSB | SPI_MODE_CPHA | SPI_MODE_CPOL | SPI_FRAME_FORMAT_MOTOROLA | SPI_WORD_SET(8),
      .slave = 0U,
      .cs = NULL,
  };
  struct spi_buf rxBuf = {.buf = g_rxBuffer, .len = sizeof(g_rxBuffer)};
  struct spi_buf_set rxSet = {.buffers = &rxBuf, .count = 1};

  /* Begin asynchronous SPI transfer. */
  const auto ret = spi_transceive_cb(g_pSpi, &config, NULL, &rxSet, ::spiCallback, NULL);
  if (0 > ret)
  {
    printk("SPI Transceive failed: %d\n", ret);
    return false;
  }

  return true;
}

//*********************************************************************************************
//*********************************************************************************************
auto receiveProcessData(std::uint8_t* pBuffer, std::size_t* pSize) -> bool
{
  assert(nullptr != pBuffer);
  assert(nullptr != pSize);

  /* Return an error if no new data is available or destination is too small. */
  if (!g_newDataReceived || (*pSize < sizeof(g_rxBuffer)))
  {
    *pSize = 0U;
    printk("No new data available or destination buffer too small.\n");
    return false;
  }
  
  /* Copy data to destination. */
  std::memcpy(pBuffer, g_rxBuffer, sizeof(g_rxBuffer));
  *pSize = sizeof(g_rxBuffer);

  /* Clear new data flag. */
  g_newDataReceived = false;
  return true;
}

}

//***************************************************************************************
//***************************************************************************************
int main()
{
  k_event_init(&g_syncEvent);

  if (!device_is_ready(g_pSpi))
  {
    printk("%s: device not ready.\n", g_pSpi->name);
    return -1;
  }

  std::uint32_t count{0U};
  std::uint32_t events{0U};
  std::size_t bufferSize{0U};
  std::array<std::uint8_t, 22U> rxBuffer{};

  beginReceiveAsync();

  /* Enter program loop. */
  while (true)
  {
    k_event_wait(&g_syncEvent, Event::eMask, false, K_FOREVER);
    events = k_event_test(&g_syncEvent, 0x00FFFFFF);
    k_event_clear(&g_syncEvent, Event::eMask);

    if (Event::eNewData & events)
    {
      bufferSize = rxBuffer.size();

      /* Read new data from driver. */
      if (!receiveProcessData(rxBuffer.data(), &bufferSize))
      {
        printk("Failed to receive new process data.\n");
        continue;
      }

      if (0U == (count % 50U))
      {
        printk("Process Data [%u bytes, %u]: %02X-%02X %02X-%02X-%02X-%02X %02X-%02X-%02X-%02X %02X-%02X %02X-%02X-%02X-%02X %02X-%02X-%02X-%02X %02X-%02X\n", 
          bufferSize, count,
          rxBuffer[0], rxBuffer[1], rxBuffer[2], rxBuffer[3], rxBuffer[4], rxBuffer[5], rxBuffer[6], rxBuffer[7],
          rxBuffer[8], rxBuffer[9], rxBuffer[10], rxBuffer[11], rxBuffer[12], rxBuffer[13], rxBuffer[14], rxBuffer[15],
          rxBuffer[16], rxBuffer[17], rxBuffer[18], rxBuffer[19], rxBuffer[20], rxBuffer[21]);
      }

      count++;

      /* Retrigger driver to receive more data. */
      beginReceiveAsync();
    }
  }

  return 0;
}

Output w/DMA memory region:

Process Data [22 bytes, 0]: 00-02 00-05-00-EF 00-06-43-A8 CF-F3 E8-52-25-92 2E-5F-00-4B A2-FF
Process Data [22 bytes, 50]: 00-02 00-05-00-EF 00-06-43-A8 CF-F3 E8-52-25-92 2E-5F-00-4B A2-FF
Process Data [22 bytes, 100]: 00-02 00-05-00-EF 00-06-43-A8 CF-F3 E8-52-25-92 2E-5F-00-4B A2-FF
Process Data [22 bytes, 150]: 00-02 00-05-00-EF 00-06-43-A8 CF-F3 E8-52-25-92 2E-5F-00-4B A2-FF
Process Data [22 bytes, 200]: 00-02 00-05-00-EF 00-06-43-A8 CF-F3 E8-52-25-92 2E-5F-00-4B A2-FF
Process Data [22 bytes, 250]: 00-02 00-05-00-EF 00-06-43-A8 CF-F3 E8-52-25-92 2E-5F-00-4B A2-FF
Process Data [22 bytes, 300]: 00-02 00-05-00-EF 00-06-43-A8 CF-F3 E8-52-25-92 2E-5F-00-4B A2-FF

Output w/o DMA memory region:

Process Data [22 bytes, 0]: 00-00 00-00-00-00 00-00-00-00 00-00 00-00-00-00 00-00-00-00 00-00
Process Data [22 bytes, 50]: 00-00 00-00-00-00 00-00-00-00 00-00 00-00-00-00 00-00-00-00 00-00
Process Data [22 bytes, 100]: 00-00 00-00-00-00 00-00-00-00 00-00 00-00-00-00 00-00-00-00 00-00
Process Data [22 bytes, 150]: 00-00 00-00-00-00 00-00-00-00 00-00 00-00-00-00 00-00-00-00 00-00
Process Data [22 bytes, 200]: 00-00 00-00-00-00 00-00-00-00 00-00 00-00-00-00 00-00-00-00 00-00

In the asynchronous case I tried defining a DMA-capable memory region, fearing cache issues, but it did not help: the buffer either never changes (with the region) or stays all zeros (without it). When building the asynchronous SPI sample code I also noticed compiler warnings emitted from the EDMA driver, and I wonder whether they are related to what I am seeing.

EDMA compiler warnings:

[127/136] Building C object modules/hal_ti/am261x/CMakeFiles/..__asm-hal_ti__am261x.dir/source/drivers/edma/v0/edma.c.obj
./asm-hal_ti/am261x/source/drivers/edma/v0/edma.c: In function 'EDMA_getPaRAM':
./asm-hal_ti/am261x/source/drivers/edma/v0/edma.c:807:5: warning: converting a packed 'EDMACCPaRAMEntry' pointer (alignment 1) to a 'uint32_t' {aka 'unsigned int'} pointer (alignment 4) may result in an unaligned pointer value [-Waddress-of-packed-member]
  807 |     uint32_t *ds = (uint32_t *) currPaRAM;
      |     ^~~~~~~~
In file included from ./asm-hal_ti/am261x/source/drivers/edma.h:44,
                 from ./asm-hal_ti/am261x/source/drivers/edma/v0/edma.c:47:
./asm-hal_ti/am261x/source/drivers/edma/v0/edma.h:375:9: note: defined here
  375 | typedef struct {
      |         ^~~~~~
./asm-hal_ti/am261x/source/drivers/edma/v0/edma.c: In function 'EDMA_qdmaGetPaRAM':
./asm-hal_ti/am261x/source/drivers/edma/v0/edma.c:824:5: warning: converting a packed 'EDMACCPaRAMEntry' pointer (alignment 1) to a 'uint32_t' {aka 'unsigned int'} pointer (alignment 4) may result in an unaligned pointer value [-Waddress-of-packed-member]
  824 |     uint32_t *ds     = (uint32_t *) currPaRAM;
      |     ^~~~~~~~
./asm-hal_ti/am261x/source/drivers/edma/v0/edma.h:375:9: note: defined here
  375 | typedef struct {
      |         ^~~~~~
./asm-hal_ti/am261x/source/drivers/edma/v0/edma.c: In function 'EDMA_setPaRAM':
./asm-hal_ti/am261x/source/drivers/edma/v0/edma.c:840:5: warning: converting a packed 'EDMACCPaRAMEntry' pointer (alignment 1) to a 'uint32_t' {aka 'unsigned int'} pointer (alignment 4) may result in an unaligned pointer value [-Waddress-of-packed-member]
  840 |     uint32_t          *sr = (uint32_t *) newPaRAM;
      |     ^~~~~~~~
./asm-hal_ti/am261x/source/drivers/edma/v0/edma.c: In function 'EDMA_qdmaSetPaRAM':
./asm-hal_ti/am261x/source/drivers/edma/v0/edma.c:859:5: warning: converting a packed 'EDMACCPaRAMEntry' pointer (alignment 1) to a 'uint32_t' {aka 'unsigned int'} pointer (alignment 4) may result in an unaligned pointer value [-Waddress-of-packed-member]
  859 |     uint32_t *sr = (uint32_t *) newPaRAM;
      |     ^~~~~~~~

I'm not sure whether I am missing a setup step for using SPI + DMA or whether there is in fact a bug related to the warnings. Either way, some working sample code for DMA with SPI from your end would be very helpful.

  • Hi Michael,

    Thank you for the detailed description of the issue, we really appreciate it! We encountered similar issues while testing, and the cause was cache alignment and coherence. It was solved by explicitly aligning the RX buffer and invalidating the cache after the RX callback.

    1. The EDMA driver warnings are known and have been checked thoroughly; they do not cause any issue with DMA usage. (UART + DMA and SPI + DMA were validated in this state.)
    2. The cache invalidate APIs may need to be called explicitly. Could you try them, using sys_cache_data_invd_range?
    3. I notice the rx_buffer is aligned in the synchronous case but not in the asynchronous case. Could you explicitly add that alignment to the declaration and try again?
      1. The DTS-based DT_MEM_DMA attribute might not help, as the MPU regions are static configurations outside of the devicetree.

    Please try these out. If they don't work, we can share a working sample in this thread.

    Thanks and regards,

    Madhava

  • Invalidating the cache fixed my problem. May I suggest suppressing those warnings so they don't appear in user builds?
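
A minimal sketch of what the alignment and invalidation steps discussed above might look like; it is not the exact code used in this thread. It assumes a 32-byte data cache line (taken from the `__aligned(32)` in the synchronous sample) and pads the 22-byte buffer to a whole cache line so that invalidation cannot discard unrelated data sharing the same line. The names `kCacheLine`, `kFrameLen`, `roundUp` and `invalidateRxBuffer` are hypothetical. The `#else` branch is only a host-side stand-in so the sketch compiles outside Zephyr. In a Zephyr build, `sys_cache_data_invd_range` only takes effect when cache management is enabled (`CONFIG_CACHE_MANAGEMENT`, plus `CONFIG_DCACHE`); otherwise it returns `-ENOTSUP`.

```cpp
#include <cstddef>
#include <cstdint>

#if __has_include(<zephyr/cache.h>)
#include <zephyr/cache.h>
#else
// Host-side stand-in (NOT the Zephyr implementation): does nothing, reports success.
static int sys_cache_data_invd_range(void*, std::size_t) { return 0; }
#endif

// ASSUMPTION: 32-byte D-cache line, matching __aligned(32) in the synchronous sample.
constexpr std::size_t kCacheLine = 32U;
constexpr std::size_t kFrameLen = 22U;  // Process data frame size from the post

// Round n up to the next multiple of a.
constexpr std::size_t roundUp(std::size_t n, std::size_t a)
{
  return ((n + a - 1U) / a) * a;
}

// Buffer starts on a cache-line boundary and occupies whole cache lines.
alignas(kCacheLine) std::uint8_t g_rxBuffer[roundUp(kFrameLen, kCacheLine)];

static_assert(sizeof(g_rxBuffer) % kCacheLine == 0U, "buffer must cover whole cache lines");

// Call once the transfer has completed and before the CPU reads g_rxBuffer,
// so stale cached contents are dropped and the DMA-written bytes are fetched.
inline int invalidateRxBuffer()
{
  return sys_cache_data_invd_range(g_rxBuffer, sizeof(g_rxBuffer));
}
```

In the asynchronous sample, `invalidateRxBuffer()` could be called at the top of `receiveProcessData()` before the `std::memcpy`, with the SPI transfer length kept at the 22-byte frame size rather than `sizeof(g_rxBuffer)`.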

  • Hi Michael,

    Thanks for the suggestion. We are working on suppressing these.

    regards,

    Madhava