CCSv4 MQT_1_21 evmC6474 Problem running smmqttest example

Hi all

I am trying to get the MQT package built and running on the C6474.

I was able to build the libraries and smmqttest_64P.pjt by creating the CCSv4 projects from scratch (I tried importing but ran into migration log errors that didn't make any sense to me). Actually, not totally from scratch: I borrowed the .tcf and .tci files from the original MQT_1_21 package so that I didn't have to worry about re-mapping the memory or creating tasks.

I'm able to get the smmqttest_64P example to load to main and run to the end of main, but then it will not hit breakpoints in the worker or boss tasks (I also put in log_printf() calls, which are not producing output either). When I stop the emulator, it's stuck in HSEMCRIT_getLock(). Any thoughts on what I should try in order to debug it?

Is HSEMCRIT_getLock() run after main() exits? Knowing that would help me debug the problem.

Cheers

  • Hi

    Now that I've read some of "Shared memory message queue Transport (MQT) sprab09.pdf", I think I know what the problem is. If I stop the debug session at a point where the boss thread has the semaphore, the C6474 sem location 0x02B4 0100 is set to 0x0000 0100. This means that core 1 now has the semaphore. If I stop and reload the worker core and then run, it gets hung in hsemcrit_getLock() because the semaphore was never released.
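
For readers following along, here is a minimal sketch of how a value such as 0x00000100 could be decoded. It assumes the semaphore register read-back places a FREE flag in bit 0 and the owner's ID in bits 15:8, which is consistent with the value reported above but should be checked against the C6474 semaphore documentation. The macro and function names are hypothetical and are not part of the CSL.

```c
#include <stdint.h>

/* ASSUMPTION: bit 0 = FREE, bits 15:8 = OWNER in the value read back
 * from a semaphore register. Names below are illustrative only. */
#define SEM_FREE_MASK   0x00000001u
#define SEM_OWNER_MASK  0x0000FF00u
#define SEM_OWNER_SHIFT 8

/* Returns 1 if the semaphore is free, 0 if some master owns it. */
int sem_is_free(uint32_t reg)
{
    return (reg & SEM_FREE_MASK) != 0;
}

/* Returns the ID of the owning master (only meaningful when not free). */
unsigned sem_owner(uint32_t reg)
{
    return (unsigned)((reg & SEM_OWNER_MASK) >> SEM_OWNER_SHIFT);
}
```

Under that assumed layout, 0x00000100 decodes as "not free, owned by master 1", which matches the observation that core 1 still holds the semaphore after the reload.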

    When I reload the boss thread and run, it is also getting hung at hsemcrit_getLock().

    It appears that the only way to reset the semaphore is to power cycle the board.

    I'm going to see if I can get the boss core to reset the semaphore in main. Seems like it should be doing this so that we start from a known state.

    Cheers

  • The next chip we make with hardware semaphores will have a way to completely reset the module with one operation. In this version of the semaphore module, though, a full reset can be tricky to do in software.

    Depending on the semaphore-request modes you use, there can be pending requests for semaphores that have to be released before they can be cleared. I wrote the following code to (I believe) fully reset the semaphore module. Any core that may have owned a semaphore may have to run this code simultaneously before all the requests can be cleared. This means that once you load your code and start running again, all three cores must be running and must be able to get to this code while the others are also running this code.

    #include <c6x.h>
    #include <csl_chip.h>
    #include <csl_sem.h>

    CSL_SemContext SemContext;
    CSL_SemVal response,query;
    CSL_InstNum SemInst = 0;
    CSL_SemObj pSemObj;
    CSL_Status SemStat;
    CSL_SemHandle hSemHandle;
    CSL_SemParam Param;
    CSL_SemFaultStatus FaultStatus = { 0, (CSL_SemError)0, 0 };

        CSL_SemFlagSetClear_Arg FlagSetClear;
        CSL_SemEOISet EOISet;
        unsigned int NoArg;

        /* init CSL semaphore module */
        CSL_semInit(&SemContext);
        Param.flags = 0; // Semaphore number = 0
        hSemHandle = CSL_semOpen(&pSemObj, SemInst,&Param, &SemStat);
        if ((hSemHandle == NULL) || (SemStat != CSL_SOK)) { exit(1); }

        // clear semaphore module
        {
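            int nQueryCnt;
            int i;  // loop index (not declared in the original fragment)

            // clear any owned semaphores
            do
            {
                nQueryCnt = 0;

                for ( i = 0; i < 32; i++ )
                {
                    // read the status of semaphore i (the original read QUERY[0] on every pass)
                    volatile unsigned int *pSemQuery = &hSemHandle->regs->QUERY[i];
                    unsigned int uQuery = *pSemQuery;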

                    if ( CSL_FEXT(uQuery, SEM_QUERY_FREE) == 0 )  // a semaphore is owned
                    {
                        nQueryCnt++;  // keep track of any owned semaphores

                        // if this semaphore is owned by this core, release it
                        if ( CSL_FEXT(uQuery, SEM_QUERY_OWNER) == CSL_chipReadReg( CSL_CHIP_DNUM ) )
                        {
                            hSemHandle->regs->DIRECT[i] = HW_SEM_RELEASE;
                        }
                    }
                }

            } while ( nQueryCnt > 0 );  // even if another core owns a semaphore, keep repeating this in case an old request is pending

            // clear any pending semaphore interrupt flags
            FlagSetClear.mask = 0xffffffff;
            FlagSetClear.masterId = (CSL_SemOwnerId)((int)CSL_SEM_ID0+CSL_chipReadReg( CSL_CHIP_DNUM ));
            CSL_semHwControl(hSemHandle, CSL_SEM_CMD_CLEAR_FLAGS, &FlagSetClear);

            // re-arm semaphore module interrupts
            EOISet = (CSL_SemEOISet)((int)CSL_SEM_REARM_SEMINT0+CSL_chipReadReg( CSL_CHIP_DNUM ));
            CSL_semHwControl(hSemHandle, CSL_SEM_CMD_EOI_WRITE, &EOISet);

            // clear any pending error flags
            CSL_semHwControl(hSemHandle, CSL_SEM_CMD_CLEAR_ERR, &NoArg);

            // re-arm error interrupts (every core will do this, possible race condition?)
            EOISet = CSL_SEM_REARM_SEMINT_ALL;
            CSL_semHwControl(hSemHandle, CSL_SEM_CMD_EOI_WRITE, &EOISet);
        }
        /* end of semaphore module reset */

  • Yikes, this is even scarier than I thought. The boss thread does not initialize the ->num member of the HSEMCRIT_handle in HSEMCRIT_open(). This is not good, as this variable is used to access the semaphore table. For some reason the passed-in "init" variable is set to zero when HSEMCRIT_open() is called.

    But for the worker thread, init is set to 1, hence the ->num variable is set properly when HSEMCRIT_open() is called.

    Now to figure out why the boss thread doesn't have init set to "1".

    Cheers

     

  • The reason hsemcrit_open() does not have "init" set to "1" for the boss thread is that the calling function, SMMQT_init(), passes FALSE in the else branch below, which runs when the executing core is not procIdToInit:

    if (smmqtState.systemConfig->procIdToInit == GBL_getProcId()) {
            /*
             *  Determine if local reset for executing core occured.
             *  If not, then initialize the smmmqtState array of QUEs.
             */
            resetStat = _SMMQT_isLocalReset(DNUM);
            ...

    }
    else {
            CRIT_open(smmqtState.systemConfig->critHandle, FALSE);   /* here is where the boss's HSEMCRIT_open() gets called */
    }

    So I thought that maybe there is a shared mem structure for *handle in the hsemcrit_open() function below. When I broke in the function, the address of handle and params both point to the 0x0080 0000 L2 RAM. According to the C6474 data sheet (see section below), this address is for shared memory. So I'll assume that there isn't a problem with not setting the "->num" variable in the boss application.

    You can see below where I added code to clear the semaphore. Seems to have fixed the problem.

    Int _HSEMCRIT_open(Ptr *handle, Ptr params, Bool init)
    {
        volatile Uint32 *sem = _HSEMCRIT_BASEADDR;  // I added this to release the semaphore
        HSEMCRIT_Handle hsemCritHandle = (HSEMCRIT_Handle)(*handle);

        if (init) {
            hsemCritHandle->num = ((HSEMCRIT_Params *)params)->num;
        }

        sem[hsemCritHandle->num] = 1;  // I added this to release the semaphore

        return (SYS_OK);
    }

    -----------From the C6474 data sheet sprs552d.pdf ------------------------------------------------------------------------------

    Megamodule and allows for common code to be run unmodified on multiple cores. For example, address
    location 0x10800000 is the global base address for C64x+ Megamodule Core 0's L2 memory. C64x+
    Megamodule Core 0 can access this location by either using 0x10800000 or 0x00800000. Any other
    master on the device must use 0x10800000 only. Conversely, 0x00800000 can be used by any of the
    three cores as their own L2 base addresses. For C64x+ Megamodule Core 0, as mentioned this is
    equivalent to 0x10800000, for C64x+ Megamodule Core 1 this is equivalent to 0x11800000, and for
    C64x+ Megamodule Core 2 this is equivalent to 0x12800000. Local addresses should only be used for
    shared code or data, allowing a single image to be included in memory. Any code/data targeted to a
    specific core, or a memory region allocated during run-time by a particular core should always use the
    global address only.

     

  • The example code should work as-is without having to do all this debug that you are doing. Why did it not work right out of the box? What changes did you do to it in the first place to get where you are now?

    Eddie said:
    I was able to build libraries and smmqttest_64P.pjt by creating the CCSv4 projects from scratch (I tried importing but ran into migration log errors that didn't make any sense to me).

    I fear that this original statement has led to some problem that is sending you off in too many directions that you should not have to go.

    From the C6474 data sheet sprs552d.pdf said:
    Local addresses should only be used for shared code or data, allowing a single image to be included in memory.

    This is an unfortunate use of the word "shared". In this case, we are referring to the case where you want to build a single program image that can be loaded and run on each of the 3 cores. By using the local L2 addresses starting at 0x00800000, every core will be able to run its own copy of the same program without impacting the data or program on another core.

    This is not what you would refer to as "shared memory", but from the system point-of-view this is a way to share the same program and data addresses on each of the cores.

    And the comment that this is "a single image ... in memory" means that the same image is copied to all three cores. So there are three identical copies of a single image.
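
The local/global aliasing described in the data sheet excerpt can be sketched as a small helper. This is only an illustration built from the three addresses quoted in the thread (0x10800000, 0x11800000, 0x12800000); the function name and the base-address macro are hypothetical, and the sketch assumes the same pattern holds for every local L2 address.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: global alias = 0x10000000 + (core number << 24) + local address,
 * inferred from the three L2 base addresses quoted from the data sheet. */
#define L2_GLOBAL_BASE 0x10000000u

/* Convert a core-local L2 address (e.g. 0x00800000) to the global address
 * that other masters on the device would have to use to reach it. */
uint32_t l2_local_to_global(uint32_t local_addr, unsigned core)
{
    assert(core < 3u);  /* the C6474 has three cores */
    return L2_GLOBAL_BASE | ((uint32_t)core << 24) | local_addr;
}
```

So a variable linked at local address 0x00800000 is a different physical word on each core, and data that must genuinely be seen by all cores needs either a global address of one specific core's L2 or a memory that is shared at the device level.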

    Since this basic point is a confusing one, I would like to recommend that you take a look at some of the online training material we have available. In the Training section of TI.com, there is a training video set for the C6474 at http://focus.ti.com/docs/training/catalog/events/event.jhtml?sku=OLT110002 .

    Also, if your questions are really related to the use of the smmqt software, then you may want to summarize your question(s) in a new posting in the Embedded Software -> BIOS forum. I see you have already posted there, so you know about it. It is more likely to be seen by an LLD expert than it is here. And you would not have to put up with my details on hardware initialization, etc.

  • Hi Randy

     

    RandyP said:
    1. The example code should work as-is without having to do all this debug that you are doing. Why did it not work right out of the box? What changes did you do to it in the first place to get where you are now?

    Fortunately, it worked "out of the box" for CCSv3. CCSv4 was another story. The import wasn't smooth at all (see http://e2e.ti.com/support/development_tools/code_composer_studio/f/81/p/35543/124517.aspx#124517).

    In the example, if I stop my debug session while the boss has the semaphore, I can't run the example again until I do a HW reset.

    Thanks for the sem reset SW, the training video and pointing me to the BIOS group.

    Cheers

  • Hi,

    I do not know if this question is trivial, but I thought I would go ahead and ask.

    I am trying to synchronize the 3 cores, but I am not using semaphores; I am using shared memory instead.

    I am using three memory locations (addresses held in shared volatile pointers), one for each core to set to 1.

    So each core waits until all three locations are 1.

    The logic seems fine so far, but when I debug in CCS I find the following.

    Suppose core 1 is run first; the values of the three locations are (1,0,0).

    Now if core 2 is run, the values become (0,1,0) instead of (1,1,0).

    Is this because core 2 starts executing from the beginning and initializes everything to 0 beforehand?

    Is this approach wrong?

    Kindly let me know your opinion.

    Thanks!

     

  • Apologies.

    I got the answer...

    Thanks

  • M. Faraday, please post your solution for the forum to have. The solution I have used is

        volatile int *pCommonMem = (int *)0x80000000;  // or whatever address you want, need three words in a row

        // sync all three cores to be starting at the same time
        for ( i = 0; i < 3; i++ ) pCommonMem[i] = 0;  // clear 3 words to make sure they're all 0
        while ( pCommonMem[0]+pCommonMem[1]+pCommonMem[2] != 3 )  // wait until all 3 are set to 1
            if ( pCommonMem[CSL_chipReadReg( CSL_CHIP_DNUM )] == 0 )  // if this core's is 0, set to 1
                pCommonMem[CSL_chipReadReg( CSL_CHIP_DNUM )] = 1;

    The repeated calls to CSL_chipReadReg() could be optimized a bit, but the optimizer may take care of that anyway. The time to go through the while-loop is the remaining worst-case uncertainty between the three cores.
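
To see why re-setting a core's own flag inside the wait loop matters, the loop above can be broken into two host-testable steps. This is a sketch for stepping through the logic on a PC, not target code: the function names are hypothetical, the core number is passed in rather than read from DNUM, and the flags are an ordinary array rather than non-cached shared memory.

```c
#define NUM_CORES 3

/* Step 1: what each core does once on entry, clear all three flags. */
void sync_enter(volatile int *flags)
{
    int i;
    for (i = 0; i < NUM_CORES; i++) {
        flags[i] = 0;
    }
}

/* Step 2: one pass of the wait loop for the given core.
 * Returns 1 once all three flags are set, otherwise re-asserts this
 * core's flag if another core's clear wiped it and returns 0. */
int sync_poll(volatile int *flags, int core)
{
    if (flags[0] + flags[1] + flags[2] == NUM_CORES) {
        return 1;
    }
    if (flags[core] == 0) {
        flags[core] = 1;
    }
    return 0;
}
```

If one core calls sync_enter() after another has already set its flag, the values go from (1,0,0) to (0,0,0) and then (0,1,0), the same pattern reported earlier in the thread. The earlier core recovers on its next sync_poll() pass because it sets its flag again, and no core can leave the loop until the last core has both cleared and set.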

    The volatile memory needs to be non-cached at this point. If you have simple code to force that, it would be nice to show here, too.

  • Hello Randy,

    I am also using a code snippet similar to yours.

    Here it is.

        volatile unsigned int *ptr_CoreA = (unsigned int *)0x88000000;  // shared memory
        volatile unsigned int *ptr_CoreB = (unsigned int *)0x88000004;
        volatile unsigned int *ptr_CoreC = (unsigned int *)0x88000008;

     

    inside function

        if( coreID == 0)      // coreID here is read from DNUM
            *ptr_CoreA = 1;
        else if (coreID == 1)
            *ptr_CoreB = 1;
        else{
            *ptr_CoreC = 1;
        }
           
      for(;;)
      {
          if( (*ptr_CoreA == 1) & (*ptr_CoreB == 1) & (*ptr_CoreC == 1) )
           break ;
      }

     

    The only thing is that I am not initializing all 3 locations to 0 beforehand. I know this is not good practice, but it works!

     

    Thanks!

  • Since I do not know who you are, I cannot send this privately, so please forgive the critique. Without initializing all 3 locations to 0 beforehand, your solution does *not* work. Consider the case where you load all three cores and run them past this point. If you then halt, edit the code, reload and run again - all 3 locations are still 1 so each core will pass out of the for-loop without synchronization.

    It is more common practice to use logical && rather than bit-wise & in the if-statement, is it not? It might be interesting to see how the optimizer handles the two choices differently.
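
One difference between the two operators that does not depend on the optimizer is how many of the volatile flags get read. The sketch below counts reads through a hypothetical helper (read_flag and read_count are illustrative names, not from the posted code): bit-wise & evaluates all three operands, while logical && stops at the first flag that is not 1.

```c
/* Counts how many flag reads a single test performs (illustrative only). */
static int read_count = 0;

/* Read one volatile flag, record the access, and report whether it is 1. */
static int read_flag(const volatile unsigned int *p)
{
    read_count++;
    return *p == 1u;
}

/* Bit-wise AND: all three flags are always read. */
int all_set_bitwise(const volatile unsigned int *f)
{
    return read_flag(&f[0]) & read_flag(&f[1]) & read_flag(&f[2]);
}

/* Logical AND: evaluation stops at the first flag that is not set. */
int all_set_logical(const volatile unsigned int *f)
{
    return read_flag(&f[0]) && read_flag(&f[1]) && read_flag(&f[2]);
}
```

Both forms give the same answer for this test because each comparison yields 0 or 1. With the flags at (0,1,1), the bit-wise version performs three reads and the logical version performs one.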

  • Hello Randy,

    Thanks for the critique!!.... It only helps ...:)

    Well, I had not considered this case. And yes, I should have used the logical AND... don't know how I missed it... my bad...

     

    But yes... my code only works if the code is run once... I will make the corrections.

     

    Best Regards,

    Kishore Ramaiah. :)