PROCESSOR-SDK-J721S2: C7x: Transpose using SE is randomly hanging

Part Number: PROCESSOR-SDK-J721S2
Other Parts Discussed in Thread: TDA4VH

PROCESSOR-SDK-RTOS-J721S2 Version: 09.00.00.02

Hi,

We implemented a transpose function on the DSP core using the C7x intrinsics as shown below:

void Transpose(
    uint32_t* restrict inPtr,
    uint32_t* restrict outPtr,
    uint32_t inRows,
    uint32_t inCols)
{
    uint32_t rows;
    uint32_t remainRows;
    uint32_t blocks;
    uint32_t maxVecLen = c7x::element_count_of<uint16>::value;

    if (inRows <= maxVecLen)
    {
        rows = inRows;
        blocks = 1;
        remainRows = 0;
    }
    else
    {
        rows = maxVecLen;
        blocks = inRows / maxVecLen;
        remainRows = inRows % maxVecLen;
    }

    __SE_TEMPLATE_v1 seTemplate = __gen_SE_TEMPLATE_v1();
    
    seTemplate.ELETYPE = __SE_ELETYPE_32BIT;
    seTemplate.VECLEN = __SE_VECLEN_16ELEMS;
    seTemplate.TRANSPOSE = __SE_TRANSPOSE_32BIT;

    seTemplate.DIMFMT = __SE_DIMFMT_3D;
    seTemplate.ICNT0 = inCols;
    seTemplate.ICNT1 = rows;
    seTemplate.ICNT2 = blocks;
    seTemplate.DIM1 = inCols;
    seTemplate.DIM2 = inCols * maxVecLen;

    __SE0_OPEN((void *)inPtr, seTemplate);

    __vpred vPred = __mask_int((uint32_t)rows);

    for(int32_t blk = 0; blk < blocks; blk++)
    {
        for(int32_t col = 0; col < inCols; col++)
        {
            uint32_t outPtrOffset = (col * inRows) + (blk * maxVecLen);
            uint16 vIn = __SE0ADV(uint16);
            __vstore_pred(vPred, (uint16*)&outPtr[outPtrOffset], vIn);
        }
    }

    __SE0_CLOSE();

    if (0 < remainRows)
    {
        seTemplate = __gen_SE_TEMPLATE_v1();
    
        seTemplate.ELETYPE = __SE_ELETYPE_32BIT;
        seTemplate.VECLEN = __SE_VECLEN_16ELEMS;
        seTemplate.TRANSPOSE = __SE_TRANSPOSE_32BIT;

        seTemplate.DIMFMT = __SE_DIMFMT_3D;
        seTemplate.ICNT0 = inCols;
        seTemplate.ICNT1 = remainRows;
        seTemplate.ICNT2 = 1;
        seTemplate.DIM1 = inCols;
        seTemplate.DIM2 = 0;

        __SE0_OPEN((void *)&inPtr[blocks * maxVecLen * inCols], seTemplate);

        vPred = __mask_int((uint32_t)remainRows);

        for(int32_t col = 0; col < inCols; col++)
        {
            uint32_t outPtrOffset = (col * inRows) + (blocks * maxVecLen);
            uint16 vIn = __SE0ADV(uint16);
            __vstore_pred(vPred, (uint16*)&outPtr[outPtrOffset], vIn);
        }

        __SE0_CLOSE();
    }
}
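
For checking the output off-target, the blocking used by the loop above can be modelled in plain C++. This is only a minimal sketch under stated assumptions: `TransposeModel`, `TransposeNaive` and `kLanes` are illustrative names that do not appear in the post, and the model assumes that each `__SE0ADV` fetch delivers one input column of up to 16 rows, which is how the code above uses it. It does not model SE timing or the hang itself.

```cpp
#include <cstdint>
#include <vector>

// Number of 32-bit lanes in one vector (uint16 in the post): an assumption of this model.
constexpr uint32_t kLanes = 16;

// Host-side model that visits the matrix in the same order as the SE loop:
// one inner "lane" loop stands in for one vector fetch plus one predicated store.
void TransposeModel(const uint32_t* in, uint32_t* out, uint32_t inRows, uint32_t inCols)
{
    const uint32_t fullBlocks  = inRows / kLanes;
    const uint32_t remainRows  = inRows % kLanes;
    const uint32_t totalBlocks = fullBlocks + ((remainRows != 0u) ? 1u : 0u);

    for (uint32_t blk = 0; blk < totalBlocks; blk++)
    {
        // Full blocks carry kLanes rows; the trailing block carries the remainder.
        const uint32_t rows = (blk < fullBlocks) ? kLanes : remainRows;

        for (uint32_t col = 0; col < inCols; col++)
        {
            for (uint32_t lane = 0; lane < rows; lane++)
            {
                const uint32_t row = (blk * kLanes) + lane;
                out[(col * inRows) + row] = in[(row * inCols) + col];
            }
        }
    }
}

// Straightforward element-by-element transpose used as the expected result.
std::vector<uint32_t> TransposeNaive(const std::vector<uint32_t>& in, uint32_t inRows, uint32_t inCols)
{
    std::vector<uint32_t> out(in.size());
    for (uint32_t row = 0; row < inRows; row++)
    {
        for (uint32_t col = 0; col < inCols; col++)
        {
            out[(col * inRows) + row] = in[(row * inCols) + col];
        }
    }
    return out;
}
```

Comparing the two results for a row count that is not a multiple of 16 exercises both the full-block path and the remainder path.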

This code was tested on version: 09.00.00.02 for both:

  • PROCESSOR-SDK-RTOS-J721S2
  • PROCESSOR-SDK-RTOS-J721E

When executed in a loop on the TIVX_CPU_ID_DSP_C7_1 core, with inPtr and outPtr in L2 SRAM, this function works correctly on J721E; however, it randomly hangs on J721S2.

Is there a difference between the two products, or a bug in our code, that leads to this behaviour?

Regards.

  • Hi Amine,

    I am conferring with the development team about why you are seeing this runtime difference between J721E and J721S2 as I am not sure why that would be the case. I will get back to you next week after I receive a response and spend some time looking at the code you have provided. 

    Best,

    Asha

  • Hi Amine,

    Would you be able to share the code where the memory is allocated and where you are setting the inPtr and outPtr before calling the Transpose function? 

    Best,

    Asha

  • Hi Asha,

    Below is the code used for memory allocation

    constexpr uint32_t L2SRAM_ALIGNMENT = 128;
    
    constexpr uint32_t TRANSPOSE_BLK_WIDTH = L2SRAM_ALIGNMENT;
    
    constexpr uint32_t TRANSPOSE_BLK_HEIGHT = L2SRAM_ALIGNMENT;
    
    constexpr uint32_t L2SRAM_BLK_STRIDE = TRANSPOSE_BLK_HEIGHT * TRANSPOSE_BLK_WIDTH * sizeof(uint32_t);
    
    vx_status L2SramInit(TensorTransposeParams* prms)
    {
        vx_status status = VX_SUCCESS;
    
        tivxMemFree(NULL, 0, TIVX_MEM_INTERNAL_L2);
    
        prms->l2Size = (L2SRAM_BLK_STRIDE * 4) + L2SRAM_ALIGNMENT;
    
        prms->pL2Base = tivxMemAlloc(prms->l2Size, TIVX_MEM_INTERNAL_L2);
    
        if (NULL == prms->pL2Base)
        {
            status = VX_FAILURE;
            VX_PRINT(VX_ZONE_ERROR, "Unable to allocate L2SRAM memory\n");
        }
        else
        {
            uint64_t l2Base = (uint64_t)(((uintptr_t)prms->pL2Base + (L2SRAM_ALIGNMENT - 1)) & ~(uintptr_t)(L2SRAM_ALIGNMENT - 1));
    
            prms->srcL2[0] = l2Base;
            prms->srcL2[1] = l2Base + L2SRAM_BLK_STRIDE;
            prms->dstL2[0] = l2Base + (L2SRAM_BLK_STRIDE * 2);
            prms->dstL2[1] = l2Base + (L2SRAM_BLK_STRIDE * 3);
        }
    
        return status;
    }
    
    void L2SramDeinit(TensorTransposeParams* prms)
    {
        if (NULL != prms->pL2Base)
        {
            tivxMemFree(prms->pL2Base, prms->l2Size, TIVX_MEM_INTERNAL_L2);
        }
    
        tivxMemFree(NULL, 0, TIVX_MEM_INTERNAL_L2);
    }

    The srcL2[0], srcL2[1], dstL2[0] and dstL2[1] buffers are used for a ping-pong mechanism implemented using DRU channels.

    Regards
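
The address arithmetic in L2SramInit can be checked on a host with a small sketch. It assumes the same constants as the post (128-byte alignment, four blocks of 128 x 128 uint32_t); `AlignUp`, `PingPongLayout` and `MakeLayout` are hypothetical names introduced only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

constexpr uintptr_t kAlign     = 128;                          // L2SRAM_ALIGNMENT in the post
constexpr uintptr_t kBlkStride = 128 * 128 * sizeof(uint32_t); // L2SRAM_BLK_STRIDE in the post

// Round addr up to the next multiple of align (align must be a power of two).
constexpr uintptr_t AlignUp(uintptr_t addr, uintptr_t align)
{
    return (addr + (align - 1)) & ~(align - 1);
}

// Hypothetical container for the four ping-pong buffer addresses.
struct PingPongLayout
{
    uintptr_t src[2];
    uintptr_t dst[2];
};

// Carve two source and two destination blocks out of one raw allocation.
inline PingPongLayout MakeLayout(uintptr_t rawBase)
{
    const uintptr_t base = AlignUp(rawBase, kAlign);
    PingPongLayout layout;
    layout.src[0] = base;
    layout.src[1] = base + kBlkStride;
    layout.dst[0] = base + (2 * kBlkStride);
    layout.dst[1] = base + (3 * kBlkStride);
    return layout;
}

static_assert(AlignUp(129, 128) == 256, "129 rounds up to the next 128-byte boundary");
```

Aligning the base skips at most 127 bytes, which is why the allocation of four blocks plus L2SRAM_ALIGNMENT bytes always leaves room for all four aligned buffers.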

  • Hi Amine,

    Transpose mode is not supported for J721S2 DRU which explains the behavior differences you are seeing between it and J721E. I believe your issue is similar to this previous thread which you can view for reference. Let me know if that resolves your question.

    Best,

    Asha

  • Hi Asha,

    We are not currently using the DRU transpose mode. We are using the DRU channels to simply copy the data.

    However, we are using the SE transpose mode. Is it also not supported for SE on the J721S2?

    Regards

  • Hi Amine,

    Thank you for clarifying that! SE transpose mode is supported on J721S2 (as well as J721E). 

    Would you be able to run your code in debug mode and see exactly where the execution hangs? Would you also be able to send the values of the following C7x registers? In Code Composer Studio they are listed under the CPU registers for the C7x core. That will help us debug the issue.

    IERR

    IEAR

    IESR

    IEDR

    Best,

    Asha

  • Hi Asha,

    Unfortunately we currently don't have the full setup to debug the C7X cores. However you can find attached the cpp test file that contains the implementation and the test function to reproduce this issue on the J721S2.

    #include <cstdint>
    #include <iostream>
    #include <sstream>
    #include <type_traits>
    #include <cstring>
    
    #include <TI/tivx.h>
    #include <c7x_scalable.h>
    #include <TI/tivx_mem.h>
    #include <utils/mem/include/app_mem.h>
    
    namespace transpose_tests
    {
    
    /* ********************************************************************* */
    /* Private Defines/Constexpr                                        **** */
    /* ********************************************************************* */
    
    // The SE can only load a max of 16 elements in transpose mode
    constexpr int32_t ROW_BLK_SIZE = c7x::element_count_of<uint16>::value;
    
    // Logger macros
    
    #define __FILENAME__ \
        (strrchr(__FILE__, '/') ? (strrchr(__FILE__, '/') + 1) : __FILE__)
    
    #define PRINT_TEST_HEADER() \
        std::cout << std::endl << "*********** " << __func__ << std::endl
    
    #define LOG() \
        std::cout << __FILENAME__ << "(" << __LINE__ << ")::" << __func__ << ": "
    
    #define LOG_VAL(val) \
        LOG() << #val << " = " << val << std::endl;
    
    #define LOG_RESULT(success) \
        LOG() << (success ? "PASS" : "FAIL") << std::endl
    
    #define RETURN_SUCCESS()    \
        do {                    \
            LOG_RESULT(true);   \
            return true;        \
        } while (false)
    
    #define RETURN_FAILURE()    \
        do {                    \
            LOG_RESULT(false);  \
            return false;       \
        } while (false)
    
    #define RETURN_STATUS(success)  \
        do {                        \
            if (success)            \
            {                       \
                RETURN_SUCCESS();   \
            }                       \
            else                    \
            {                       \
                RETURN_FAILURE();   \
            }                       \
        } while (false)
    
    #define REQUIRE(x)                                          \
        do {                                                    \
            if (!(x))                                           \
            {                                                   \
                LOG() << "Failed at { " #x " }" << std::endl;   \
                RETURN_FAILURE();                               \
            }                                                   \
        } while (false)
    
    
    /* ********************************************************************* */
    /* Private Function Definition                                      **** */
    /* ********************************************************************* */
    
    static vx_status Transpose(
        uint32_t* inPtr,
        uint32_t* outPtr,
        int32_t inRows,
        int32_t inCols)
    {
        vx_status status = VX_SUCCESS;
    
        if ((NULL == inPtr) || (NULL == outPtr) ||
            (0 >= inRows)   || (0 >= inCols))
        {
            VX_PRINT(VX_ZONE_ERROR,
                     "Invalid param!\n"
                     "    inPtr  = %p\n"
                     "    outPtr = %p\n"
                     "    inRows = %d\n"
                     "    inCols = %d\n",
                     inPtr,
                     outPtr,
                     inRows,
                     inCols);
            status = VX_ERROR_INVALID_PARAMETERS;
        }
    
        if (VX_SUCCESS == status)
        {
            int32_t rowBlocks;
            int32_t remainRows;
            constexpr int32_t MAX_VEC_LEN = c7x::element_count_of<uint16>::value;
    
            rowBlocks = inRows / MAX_VEC_LEN;
            remainRows = inRows % MAX_VEC_LEN;
    
            if (0 < rowBlocks)
            {
                __SE_TEMPLATE_v1 seTemplate = __gen_SE_TEMPLATE_v1();    
                seTemplate.ELETYPE   = __SE_ELETYPE_32BIT;
                seTemplate.VECLEN    = __SE_VECLEN_16ELEMS;
                seTemplate.TRANSPOSE = __SE_TRANSPOSE_32BIT;
                seTemplate.DIMFMT    = __SE_DIMFMT_3D;
                seTemplate.ICNT0     = inCols;
                seTemplate.ICNT1     = MAX_VEC_LEN;
                seTemplate.ICNT2     = rowBlocks;
                seTemplate.DIM1      = inCols;
                seTemplate.DIM2      = inCols * MAX_VEC_LEN;
    
                __SA_TEMPLATE_v1 saTemplate = __gen_SA_TEMPLATE_v1();
                saTemplate.VECLEN           = __SA_VECLEN_16ELEMS;
                saTemplate.DIMFMT           = __SA_DIMFMT_3D;
                saTemplate.ICNT0            = MAX_VEC_LEN;
                saTemplate.ICNT1            = inCols;
                saTemplate.ICNT2            = rowBlocks;
                saTemplate.DIM1             = inRows;
                saTemplate.DIM2             = MAX_VEC_LEN;
    
                __SE0_OPEN((void *)&inPtr[0], seTemplate);
                __SA0_OPEN(saTemplate);
    
                for (int32_t blk = 0; blk < rowBlocks; blk++)
                {
                    for (int32_t col = 0; col < inCols; col++)
                    {
                        uint16 vIn = __SE0ADV(uint16);
                        *__SA0ADV(uint16, &outPtr[0]) =  vIn;
                    }
                }
    
                __SE0_CLOSE();
                __SA0_CLOSE();
            }
    
            if (0 < remainRows)
            {
                uint32_t inputOffset = inCols * rowBlocks * MAX_VEC_LEN;
                uint32_t outputOffset = rowBlocks * MAX_VEC_LEN;
    
                __SE_TEMPLATE_v1 seTemplate = __gen_SE_TEMPLATE_v1();    
                seTemplate.ELETYPE   = __SE_ELETYPE_32BIT;
                seTemplate.VECLEN    = __SE_VECLEN_16ELEMS;
                seTemplate.TRANSPOSE = __SE_TRANSPOSE_32BIT;
                seTemplate.DIMFMT    = __SE_DIMFMT_3D;
                seTemplate.ICNT0     = inCols;
                seTemplate.ICNT1     = remainRows;
                seTemplate.ICNT2     = 1;
                seTemplate.DIM1      = inCols;
                seTemplate.DIM2      = 0;
    
                __SA_TEMPLATE_v1 saTemplate = __gen_SA_TEMPLATE_v1();
                saTemplate.VECLEN           = __SA_VECLEN_16ELEMS;
                saTemplate.DIMFMT           = __SA_DIMFMT_3D;
                saTemplate.ICNT0            = remainRows;
                saTemplate.ICNT1            = inCols;
                saTemplate.ICNT2            = 1;
                saTemplate.DIM1             = inRows;
                saTemplate.DIM2             = 0;
    
                __SE0_OPEN((void *)&inPtr[inputOffset], seTemplate);
                __SA0_OPEN(saTemplate);
    
                for (int32_t col = 0; col < inCols; col++)
                {
                    vpred vpStore = __SA0_VPRED(uint16);
                    uint16 vIn = __SE0ADV(uint16);
    
                    __vstore_pred(vpStore, __SA0ADV(uint16, &outPtr[outputOffset]), vIn);
                }
    
                __SE0_CLOSE();
                __SA0_CLOSE();
            }
        }
    
        return status;
    }
    
    static bool Test_Failure()
    {
        int32_t inCols = 10;
        int32_t inRows = 10;
        uint32_t inputBuffer[100];
        uint32_t outputBuffer[100];
        uint32_t* inPtr = &inputBuffer[0];
        uint32_t* outPtr = &outputBuffer[0];
    
        auto ResetValues = [&]()
        {
            inCols = 10;
            inRows = 10;
            inPtr = &inputBuffer[0];
            outPtr = &outputBuffer[0];
        };
    
        // inPtr == NULL
        inPtr = NULL;
        REQUIRE(VX_SUCCESS != Transpose(inPtr, outPtr, inRows, inCols));
    
        ResetValues();
        // outPtr == NULL
        outPtr = NULL;
        REQUIRE(VX_SUCCESS != Transpose(inPtr, outPtr, inRows, inCols));
    
        ResetValues();
        // inCols == 0
        inCols = 0;
        REQUIRE(VX_SUCCESS != Transpose(inPtr, outPtr, inRows, inCols));
    
        ResetValues();
        // inCols < 0
        inCols = -1;
        REQUIRE(VX_SUCCESS != Transpose(inPtr, outPtr, inRows, inCols));
    
        ResetValues();
        // inRows == 0
        inRows = 0;
        REQUIRE(VX_SUCCESS != Transpose(inPtr, outPtr, inRows, inCols));
    
        ResetValues();
        // inRows < 0
        inRows = -1;
        REQUIRE(VX_SUCCESS != Transpose(inPtr, outPtr, inRows, inCols));
    
        RETURN_SUCCESS();
    }
    
    static bool Test_Transpose(int32_t inRows, int32_t inCols, uint32_t inMem, uint32_t outMem)
    {
        uint32_t* inPtr = NULL;
        uint32_t* outPtr = NULL;
        int32_t size = inCols * inRows * sizeof(uint32_t);
    
        tivxMemFree(NULL, 0, TIVX_MEM_INTERNAL_L2);
    
        auto CleanUp = [&]()
        {
            if (NULL != inPtr)
            {
                tivxMemFree(inPtr, size, inMem);
            }
    
            if (NULL != outPtr)
            {
                tivxMemFree(outPtr, size, outMem);
            }
    
            tivxMemFree(NULL, 0, TIVX_MEM_INTERNAL_L2);
        };
    
        inPtr = (uint32_t*)tivxMemAlloc(size, inMem);
        if (NULL == inPtr)
        {
            CleanUp();
            return false;
        }
    
        outPtr = (uint32_t*)tivxMemAlloc(size, outMem);
        if (NULL == outPtr)
        {
            CleanUp();
            return false;
        }
    
        for (int32_t iter = 0; iter < 1000; iter++)
        {
            // Setup in/out buffers (size is in bytes, so iterate over the element count)
            for (int32_t idx = 0; idx < (inRows * inCols); idx++)
            {
                inPtr[idx] = idx + 1;
                outPtr[idx] = 0;
            }
    
            // Execute transpose
            if (VX_SUCCESS != Transpose(inPtr, outPtr, inRows, inCols))
            {
                CleanUp();
                return false;
            }
    
            // Verify results
            auto* inArray  = (uint32_t(*)[inCols])inPtr;
            auto* outArray = (uint32_t(*)[inRows])outPtr;
    
            for (int32_t row = 0; row < inRows; row++)
            {
                for (int32_t col = 0; col < inCols; col++)
                {
                    if (inArray[row][col] != outArray[col][row])
                    {
                        LOG() << "row: " << row << ", col: " << col << std::endl;
                        CleanUp();
                        return false;
                    }
                }
            }
        }
    
        CleanUp();
    
        return true;
    }
    
    static bool Test_RowBlockRemainderOneColum(uint32_t inMem, uint32_t outMem)
    {
        REQUIRE(Test_Transpose(ROW_BLK_SIZE - 5, 1, inMem, outMem));
    
        RETURN_SUCCESS();
    }
    
    static bool Test_RowBlockRemainderMultipleColum(uint32_t inMem, uint32_t outMem)
    {
        REQUIRE(Test_Transpose(ROW_BLK_SIZE - 5, 128, inMem, outMem));
    
        RETURN_SUCCESS();
    }
    
    static bool Test_OneRowBlockOneColum(uint32_t inMem, uint32_t outMem)
    {
        REQUIRE(Test_Transpose(ROW_BLK_SIZE, 1, inMem, outMem));
    
        RETURN_SUCCESS();
    }
    
    static bool Test_OneRowBlockMultipleColum(uint32_t inMem, uint32_t outMem)
    {
        REQUIRE(Test_Transpose(ROW_BLK_SIZE, 128, inMem, outMem));
    
        RETURN_SUCCESS();
    }
    
    static bool Test_OneRowBlockWithRemainderOneColum(uint32_t inMem, uint32_t outMem)
    {
        REQUIRE(Test_Transpose(ROW_BLK_SIZE + 10, 1, inMem, outMem));
    
        RETURN_SUCCESS();
    }
    
    static bool Test_OneRowBlockWithRemainderMultipleColum(uint32_t inMem, uint32_t outMem)
    {
        REQUIRE(Test_Transpose(ROW_BLK_SIZE + 10, 128, inMem, outMem));
    
        RETURN_SUCCESS();
    }
    
    
    static bool Test_MultipleRowBlockOneColum(uint32_t inMem, uint32_t outMem)
    {
        REQUIRE(Test_Transpose(10 * ROW_BLK_SIZE, 1, inMem, outMem));
    
        RETURN_SUCCESS();
    }
    
    static bool Test_MultipleRowBlockMultipleColum(uint32_t inMem, uint32_t outMem)
    {
        REQUIRE(Test_Transpose(10 * ROW_BLK_SIZE, 128, inMem, outMem));
    
        RETURN_SUCCESS();
    }
    
    static bool Test_MultipleRowBlockWithRemainderOneColum(uint32_t inMem, uint32_t outMem)
    {
        REQUIRE(Test_Transpose((10 * ROW_BLK_SIZE) + 10, 1, inMem, outMem));
    
        RETURN_SUCCESS();
    }
    
    static bool Test_MultipleRowBlockWithRemainderMultipleColum(uint32_t inMem, uint32_t outMem)
    {
        REQUIRE(Test_Transpose((10 * ROW_BLK_SIZE) + 10, 128, inMem, outMem));
    
        RETURN_SUCCESS();
    }
    
    /* ********************************************************************* */
    /* Public Function Definition                                       **** */
    /* ********************************************************************* */
    
    bool TestSuite(void)
    {
        PRINT_TEST_HEADER();
    
        bool res = true;
    
        res &= Test_Failure();
    
        for (int32_t idx = 0; idx < 100; idx++)
        {
            std::cout << "Iteration nbr: " << idx << std::endl;
    
            res &= Test_RowBlockRemainderOneColum(TIVX_MEM_INTERNAL_L2, TIVX_MEM_INTERNAL_L2);
            res &= Test_RowBlockRemainderMultipleColum(TIVX_MEM_INTERNAL_L2, TIVX_MEM_INTERNAL_L2);
            res &= Test_OneRowBlockOneColum(TIVX_MEM_INTERNAL_L2, TIVX_MEM_INTERNAL_L2);
            res &= Test_OneRowBlockMultipleColum(TIVX_MEM_INTERNAL_L2, TIVX_MEM_INTERNAL_L2);
            res &= Test_OneRowBlockWithRemainderOneColum(TIVX_MEM_INTERNAL_L2, TIVX_MEM_INTERNAL_L2);
            res &= Test_OneRowBlockWithRemainderMultipleColum(TIVX_MEM_INTERNAL_L2, TIVX_MEM_INTERNAL_L2);
            res &= Test_MultipleRowBlockOneColum(TIVX_MEM_INTERNAL_L2, TIVX_MEM_INTERNAL_L2);
            res &= Test_MultipleRowBlockMultipleColum(TIVX_MEM_INTERNAL_L2, TIVX_MEM_INTERNAL_L2);
            res &= Test_MultipleRowBlockWithRemainderOneColum(TIVX_MEM_INTERNAL_L2, TIVX_MEM_INTERNAL_L2);
            res &= Test_MultipleRowBlockWithRemainderMultipleColum(TIVX_MEM_INTERNAL_L2, TIVX_MEM_INTERNAL_L2);
        }
    
        return res;
    }
    
    }
    

    Regards

  • Hi Amine,

    Can you provide steps on how to integrate your program on top of the 9.0 SDK so we can replicate your setup on our end?

    Best,

    Asha

  • Hi Asha, I'll try to answer the question on behalf of Amine since he is unavailable today.

    Using the vision apps app_init.c is an easy way to run it:

    1. Create a .cpp file in "/platform/j721s2/rtos/c7x_1" and paste Amine's implementation into it (let's call it transposeE2E.cpp).
    2. Add extern "C" before the public function at the bottom of the file: bool TestSuite(void) -> extern "C" bool TestSuite(void)
    3. Add the line "CPPSOURCES  += transposeE2E.cpp" to the concerto file at "/platform/j721s2/rtos/c7x_1/concerto.mak".
    4. Add the function declaration

    #ifdef CPU_c7x_1
    bool TestSuite();
    #endif

    to the file "vayadrive-tda4-fw/platform/j721s2/rtos/common/app_init.c".

    5. Add the function call

        #ifdef CPU_c7x_1
        TestSuite();
        #endif

    at the end of the appInit() function in the same app_init.c file.

    6. Build the C7x firmware, flash it to a board and reboot.

    Best,

    Yotam

  • Thank you for the steps. Please take note.

  • Hi Yotam, 

    Thank you for providing the build instructions. I am giving an update after our meeting on 10/25.

    I was able to replicate the build steps you have given and run the code on the EVM. I have been able to reproduce the issue that you described. The code execution halts at a random iteration in the test suite. So far, I have not determined what the core issue of this behavior is. I am working with our C7x development team to debug and see if we can pinpoint a reason for this behavior. This might take a few days for us to figure out - I will keep you updated on this issue. 

    In the meantime, would you be able to clarify where the DRU is copying data from and to? You mentioned:

    "We are using the DRU channels to simply copy the data."

    "The srcL2[0], srcL2[1], dstL2[0] and dstL2[1] buffers are used for a ping-pong mechanism implemented using DRU channels."

    Best,

    Asha

  • Hi Asha,

    In our implementation we use the DRU to ping-pong the data from DRAM. However, the problem is not related to the DRU: the code above reproduces it without using the DRU at all.

    Thanks,

    Yotam

  • Hi Yotam,

    Thank you for clarifying that. I am giving an update after our meeting on 10/26.

    We are still currently debugging the root cause of the SE issue. In particular, it seems to be linked specifically to using SE in transpose mode. 

    A potential workaround that I found is disabling interrupts before calling your TestSuite() function. For example, based on the setup you described in your previous post, the following code called at the end of the appInit() function does not seem to hang.

        uint32_t interState;
        interState = HwiP_disable();
    #ifdef CPU_c7x_1
        TestSuite();
    #endif
        HwiP_restore(interState);
        appLogPrintf("APP: TestSuite ... Done !!!\n");

    Would you be able to try this workaround and see if it works on your setup?

    Also, have you tried running this on C7x_2?

    I will keep you updated as we determine the root cause for this issue. 

    Best,

    Asha
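
One way to make sure interrupts are restored on every exit path is a small scope guard around the call. The sketch below is an assumption-laden illustration: `ScopedHwiDisable` and `RunWithInterruptsDisabled` are hypothetical names, and `HwiP_disable`/`HwiP_restore` are replaced by host-side stand-ins whose signatures simply mirror the usage in the post, so the exact types on the target may differ.

```cpp
#include <cstdint>

// Host-side stand-ins so the sketch compiles off-target. On the device the
// real functions come from the SDK; these signatures are assumptions.
static int g_disableDepth = 0;
static uint32_t HwiP_disable() { return static_cast<uint32_t>(g_disableDepth++); }
static void HwiP_restore(uint32_t key) { g_disableDepth = static_cast<int>(key); }

// Disables interrupts on construction and restores the saved state on
// destruction, including early returns from the guarded scope.
class ScopedHwiDisable
{
public:
    ScopedHwiDisable() : key_(HwiP_disable()) {}
    ~ScopedHwiDisable() { HwiP_restore(key_); }

    ScopedHwiDisable(const ScopedHwiDisable&) = delete;
    ScopedHwiDisable& operator=(const ScopedHwiDisable&) = delete;

private:
    uint32_t key_;
};

// Runs fn with interrupts masked and forwards its return value.
template <typename Fn>
auto RunWithInterruptsDisabled(Fn&& fn) -> decltype(fn())
{
    ScopedHwiDisable guard;
    return fn();
}
```

A call would then look like `RunWithInterruptsDisabled([&] { return Transpose(inPtr, outPtr, inRows, inCols); });`. Interrupts stay masked for the whole call, so keeping the guarded region as short as possible limits the added interrupt latency.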

  • Hi Asha,

    Thanks for the fast update.

    We will test the two options you mentioned to see if we can use them as a workaround.

    Best,

    Yotam

  • Hi Yotam,

    To clarify, try the workaround on C7x_1. This was the setup that was working for me.

    I was curious if you had seen the same behavior on C7x_2. 

    Best,

    Asha

  • Hi Asha,

    The problem is reproducible on both C7x_1 and C7x_2.

    Adding the code below fixed the problem on both cores.

    uint32_t interState;
    interState = HwiP_disable();
    /* SE transpose code runs here */
    HwiP_restore(interState);

    Regards

  • Hi Amine and Yotam,

    Our hardware team has investigated, and the behavior is due to a silicon bug that affects the C7x variant in J721S2 and J784S4 (TDA4VE and TDA4VH). For now, please use disabling interrupts as the software workaround for this bug. 

    Best,

    Asha

  • Thanks Asha, we will add the workaround to our implementation.