
MSA/UIA Questions

Other Parts Discussed in Thread: SYSBIOS, TMS320C6474

A couple of small questions.

I'm at the point in my project where I need to start utilizing MSA's load analysis functionality.  Eventually, I need to capture data for 9 cores (3 C6474 DSPs), but right now I am having problems with just a single DSP (3 cores).

First, if I just run continuously and have everything disabled but the main logger, I keep getting "Out_Of_Seq Data" warnings.  (Each core is just printing out a benchmark number every 1 second.)  I have disableMulticoreEventCorrelation set to false, so other than that, is there some other setting that I need to be setting, or is this just one of those informational-only warnings that I should just ignore?

Second, what's the relationship between the logger's transfer buffer size and the size specified in the LoggingSetup object?  I have mounds of DDR memory (512 MB total, so I gave each core about 170 MB [exclusive, not shared]), so I am creating and assigning my own loggers like so:

LoggingSetup.mainLogger    = CreateLogger( 32*1024, "(UIA) Main Logger",    ".logger:UIA_Main"    );
LoggingSetup.loadLogger    = CreateLogger( 64*1024, "(UIA) Load Logger",    ".logger:UIA_Load"    );
LoggingSetup.sysbiosLogger = CreateLogger( 64*1024, "(UIA) SysBios Logger", ".logger:UIA_SysBios" );

(CreateLogger is just a helper function I wrote to set up the logger.  [see below])

So for the main logger, if the transfer buffer size is set to 32K as shown, what do I need to set LoggingSetup.mainLoggerSize to?  (not to mention the Event size, too, which I believe just defaults to 128 MAUs)

Finally, when enabling anything other than main logging, I have quickly learned that continuous monitoring is unfortunately just not feasible.  (Way too much data for CCS to process!)  Very saddening.  So, okay, I let my program run (each core is just spinning on a single task which does a simple filter operation, sleeps for 1 second, and then repeats) and just launched the System Analyzer, telling it to only collect data for 2 seconds.  It will go through the motions and then start churning through the collected records, but then, without fail, I'll get an emulator error about not being able to read the target memory.  It says the target is not responding to the request.  (The address cited is always within the load logger's memory section.)

Well, this seems to only happen if I enable EVERYTHING.  If I leave hwi and swi logging disabled, I don't get the error.  Anyway, my hardware is thoroughly sound (we brutally stress test it, including the JTAG path). In the months I have been working with this particular board, I've never had emulator problems until now.  (Maybe MSA is even more brutal of a stress test, eh? :-) )  I lowered the JTAG's TCLK clock down to the legacy 10 MHz, and I changed the TMS/TDO output timing to the standard falling edge, but still I get the error.  Any ideas?

I really need to be able to monitor the HWIs and SWIs as those are what drive the entire system, and I need to be able to measure and visualize the latency between the incoming data and the plethora of tasks I will have processing that data before the next frame comes in, so leaving hwi and swi monitoring disabled just so I don't get this emulator error is not a viable option.  Could it be my misconfiguration of the log buffers above?  Perhaps more is not better, I mean.

 

Edit: Oh, forgot.  For completeness, here is my little CreateLogger() function:

////////////////////////////////////////////////////////////////////////////////
//  Function:       CreateLogger()
//  Description:    Helper function used to create a logger.   
////////////////////////////////////////////////////////////////////////////////
function CreateLogger( p_nTxfrSize, p_szLoggerName, p_szSectionName )
{
    var logger;                             //  logger module (ti.uia.runtime.LoggerCircBuf)
    var loggerParams;                       //  logger instance parameters
    var hLogger;                            //  handle to new logger


    logger                           = xdc.useModule( 'ti.uia.runtime.LoggerCircBuf' );
    loggerParams                     = new logger.Params();
    loggerParams.transferBufSize     = p_nTxfrSize;
    loggerParams.bufSection          = p_szSectionName;
    Program.sectMap[p_szSectionName]             = new Program.SectionSpec();
    Program.sectMap[p_szSectionName].loadSegment = OffChip;    //  OffChip = name of the DDR memory segment, defined elsewhere in the .cfg
   
    hLogger                          = logger.create( loggerParams );
    hLogger.instance.name            = p_szLoggerName;
   
    return( hLogger );
}

  • Come on.  Silence?  Okay, you can skip the emulator read error.  How about the Out_Of_Seq Data warning and the logger memory sizes?  Those should be easy to answer.

  • Does TI not officially support their Multicore System Analyzer component or something?  Since it is now built into CCS v.5.1 M6 and above, I find this highly unusual.

  • Hi Alex, Sorry for the delay in replying!   TI does support both Multicore System Analyzer and UIA. 

     Re: First, if I just run continuously and have everything disabled but the main logger, I keep getting "Out_Of_Seq Data" warnings.  (Each core is just printing out a benchmark number every 1 second.)  I have disableMulticoreEventCorrelation set to false, so other than that, is there some other setting that I need to be setting, or is this just one of those informational-only warnings that I should just ignore?

    Are you using JTAG as a transport or Ethernet?   Could you post your .cfg file?

    Re: So for the main logger, if the transfer buffer size is set to 32K as shown, what do I need to set LoggingSetup.mainLoggerSize to?  (not to mention the Event size, too, which I believe just defaults to 128 MAUs)

    LoggingSetup.mainLoggerSize should be set to the same number you used for loggerParams.transferBufSize for the main Logger.
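
    In other words, something like this, reusing the CreateLogger() helper from your first post (sizes shown for the 32K main logger):

        LoggingSetup.mainLogger     = CreateLogger( 32*1024, "(UIA) Main Logger", ".logger:UIA_Main" );
        LoggingSetup.mainLoggerSize = 32*1024;    //  keep in sync with the transferBufSize passed above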

    Re: Could it be my misconfiguration of the log buffers above?  Perhaps more is not better, I mean.

    Yes, it could be due to a misconfiguration.  If the data structures don't map into real memory, then the write pointer for the buffer could be invalid and cause bad memory writes.  The fact that the target is not responding to requests often means it has gone into the weeds. 
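
    One way to sanity-check the placement at config time is to dump the platform's memory map from the .cfg script and confirm that the segment your .logger:* sections land in actually exists and is large enough.  A minimal sketch, assuming the standard xdc.cfg.Program API:

        //  List every memory segment the platform defines, with base address and length
        var memMap = Program.cpu.memoryMap;
        for (var i = 0; i < memMap.length; i++)
        {
            print( memMap[i].name + ": base = 0x" + Number(memMap[i].base).toString(16)
                                  + ", len = 0x"  + Number(memMap[i].len).toString(16) );
        }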

    I'm in the process of setting up an EVM so that I can better work through these things with you, and will post when I have more info.

    Regards,

      Brian

  • Hi Alex,

       I've got a C6474 EVM with Mezzanine at my desk now. 

    Re: the sequence errors:  From your previous posts, it looks like you have the NDK installed, so I'll assume you're using Ethernet as a transport.  One of the things that may be happening is that, if multicore event correlation is enabled, System Analyzer waits for 'synchronization events' in order to correlate events, and if it doesn't get a valid sync point event, it will tag the events as being 'out of sequence' because it can't correlate them.

    Do you see the out of sequence events with the following configuration?

    • LoggingSetup.disableMulticoreEventCorrelation = true;
    • LoggingSetup.mainLogging = true;
    • LoggingSetup.sysbiosHwiLogging = false;
    • LoggingSetup.sysbiosSwiLogging = false;
    • LoggingSetup.sysbiosTaskLogging = false;

    One other problem you'll encounter is that UIA does not come with a global timestamp module that is preconfigured to work with the C6474.  You should be able to configure the ti.uia.family.c64p.TimestampC6472Timer module to use on the C6474, however.  I'm testing this out right now to verify that it works.

    Regards,

     Brian

  • No, actually, I am just using JTAG, not Ethernet.  I do have Ethernet available to one of my three DSPs, and that's good enough for what I am testing now, but I would need a step-by-step guide describing how to set that up; the MSA documentation on the subject is written, it seems, for an audience that already knows how to do it, not for a beginner.  So, for now, I'm just using JTAG.

    Yes, in another application I have taken the NDK and your HelloWorld application and modified them to work with my specific PHY, but that's the extent of my expertise with NDK so far. :-)  It's still on my task list to merge that into my application here (the one I'm doing the load balancing on with your MSA), but I haven't yet.  But even if I spend a day or two doing that, I would still have no clue how to utilize it or configure it to work with CCS.

    Anyway, about the .cfg file: it's a program in and of itself, so I'll just zip it up and attach it rather than paste it here.

    The other thing is that I changed my .cfg file to reflect what you stated above (BufSize = TransferSize), but things just got way worse.  Just trying to capture a single execution of my lone debug task at the start is too much.  With this increased buffer size, I'm now losing 400,000 records (51 gaps), and the task execution data is just not showing up at all (probably in the lost records, eh? -:P).  I'm including the session data for you to load in and see for yourself.

    ...

    bah.. Maybe it's a proxy thing, but I never can upload anything to this forum from here at work.  I'll upload the zip file tonight when I get home.

  • [promised attachment: .cfg file and MSA session capture]

    0525.Cfg_and_MSA Session.zip

     

    Oh, and yes, with LoggingSetup.disableMulticoreEventCorrelation = true, the out of sequence error seemed to go away as you predicted.  Hopefully, I can use the 6472 timestamp module as you suggested, but I might need some tips from you on how to properly set that up.  Technically, I am already using all six of my timers (three for BIOS and three for my apps), but I guess I can go back to having BIOS share timer 0 across all three cores, which would free up 1 and 2.

     

    Edit:  Okay, switching to JTAGSTOPMODE rather than JTAGRUNMODE prevented the data/packet loss (obviously), but as I asked in my other thread, the timings are all wrong.  (My 11 ms task, as shown in the log, is measuring 1.5 us in the execution graph. [see attachment below])  Undoubtedly, this is related to the timestamp module not being configured for my C6474, so if you can help me get that working, I should be in business. :-)

    2072.systemAnalyzerData.zip

    (first thread starts around 510 us, and the other two start 634 us -- NOTE: when loading from the binary, the Pong_3B core does not show any data... it did when I captured it.  Guess I found a bug?)

  • Thanks Alex.

    Please note that, on page 4-6 of the System Analyzer User's Guide (spruh43b), there's the following disclaimer:  [screenshot of the disclaimer]

    What this really means is that it doesn't work out of the box (i.e. you need to add some additional API calls to your code), there's no documentation on it, and it has not been tested on a broad range of platforms so there might be some bugs encountered along the way.  If you're ok with that, I'll do my best to get it working for you.   We do intend to make this a fully supported feature in an upcoming release. 

     Another problem is that the base addresses of the Timers for the C6474 device are different than those used by the C6472.  The TimestampC6472Timer module uses an enumeration type to define the addresses of the 12 timers on the C6472 device, and it's not clear if we can assign the address of the C6474 timer to the TimestampC6472Timer.timerBaseAdrs config option without getting compile errors.  I'm working on it now - if I can't get a clean way to configure this, I might be able to provide a custom TimestampC6474Timer module that you can use for your application.   I'll post again once I've got something worked out.

    Regards,

      Brian

     

  • That would be great.  This timing analysis, particularly across cores, is critical to my application, so I by no means mind any type of "hacking" I may need to do to get it working properly.

  • Also, maybe this is just a bug in 1.0 and has been resolved in 2.0 -- meaning I am out of luck -- but is there some special trick in getting the Execution Graph to sort the tasks in a consistent order?  (The little up arrow only sorts the cores, not the tasks within.)

     

    Pong_0B: Task_Idle(), Task_Sync(), Task_Debug(), TSK_Idle()

    vs.

    Pong_0C: Task_Idle(), Task_Debug(), Task_Sync(), TSK_Idle()

  • Alex,

    Currently the graph only sorts at the channel level (which in this case is the CPU).  We'll file an enhancement to extend sorting to the state (in this case the task) level.

    If this is critical to your analysis, we can look at getting you a patch for CCS 5.1 when the enhancement is available.

    Regards,

    Imtaz.

  • It's not critical, per se.  It just makes it more difficult to visually inspect the Core-to-Tasks load balancing.  Admittedly, though, until I can get the timestamp module working, it's sort of a moot point. :-)

  • Hi Alex,

       I've created a new ti.uia.family.c64p.TimestampC6474Timer module - you can install it as follows:

    • make a copy of your uia_xxx\packages\ti\uia\family\c64p folder and save it as a backup, in case you want to revert back to it later
    • unzip the attached c64p.zip file into your uia_xxx\packages\ti\uia\family folder.  This will install the new module
    • make a backup copy of your uia_xxx\packages\ti\uia\runtime\LogSync.xs file
    • unzip the attached LogSyncXs.zip file into your uia_xxx\packages\ti\uia\runtime folder.

    You'll then need to add the following to your .cfg file:  [...

    // By default, the sync point events will be logged to a dedicated
    // LoggerCircBuf buffer named 'SyncLog' that is assigned to the LogSync
    // module.  Using a dedicated event logger buffer is recommended
    // in order to ensure that sufficient timing information
    // is captured to enable accurate multicore event correlation.
    // Configure LogSync.defaultSyncLoggerSize to specify a custom buffer size.
    var LogSync = xdc.useModule('ti.uia.runtime.LogSync');

    // Configure a shared timer to act as a global time reference to enable
    // multicore correlation.  The TimestampC6474Timer module implements the
    // IUIATimestampProvider interface, so assigning this timer to
    // LogSync.GlobalTimestampProxy will configure the LogSync module's global
    // clock parameters automatically.
    var GlobalTimestampTimer = xdc.useModule('ti.uia.family.c64p.TimestampC6474Timer');

    // Select which timer you want to use
    GlobalTimestampTimer.timerBaseAdrs = GlobalTimestampTimer.TimerInstance_Timer2;

    // Output the selected address to the console to verify it - see sprs552h for Timer addresses
    var adrs = GlobalTimestampTimer.timerBaseAdrs;
    print("GlobalTimestampTimer.timerBaseAdrs = 0x" + Number(adrs).toString(16));

    // Configure the speed of the device's bus clock
    GlobalTimestampTimer.maxBusClockFreq.lo = 1044000000;
    GlobalTimestampTimer.maxBusClockFreq.hi = 0;

    LogSync.GlobalTimestampProxy = GlobalTimestampTimer;

    ...]

    Your idle function should be configured to log a sync point event when required:

    #include <ti/uia/runtime/LogSync.h>

    Void myIdleFunc()
    {
        if (LogSync_isSyncEventRequired()) {
            LogSync_writeSyncPoint();
        }
    }
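
    If it isn't already hooked in, the idle function can be registered from the .cfg via the standard SYS/BIOS Idle module (a minimal sketch):

        var Idle = xdc.useModule('ti.sysbios.idle.Idle');
        Idle.addFunc( '&myIdleFunc' );    //  call myIdleFunc() from the Idle loop on each core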

    I still haven't got things logging sync points properly on the C6474 EVM board yet - they seem to be disabled for some reason.  I'll post when I figure out what else is required.  You might want to wait until then, as some of the above might need to be changed.

     Regards,

        Brian

     

  • You're awesome!  I'll test this out later tonight and report back.

     

    Update #1

    So I was too excited to wait as you suggested, and here are my findings:

    • (disturbing news) I had to increase my LoggingSetup.sysbiosLoggerSize to 2 MB to stop losing packets and (still) getting those Out_Of_Seq warnings.  (I chalk those warnings up to the record loss, though, unless you say otherwise.  With the 2 MB buffer and zero packet loss, I no longer get those warnings at all.)  The one-line change is shown after this list.
    • (disturbing news) [Eclipse related] I can only make two runs (run... halt... restart [app and then MSA]... run... halt) before CCS crashes complaining about a Java heap error.  (And yes, I have already increased my heap size to 512 MB in the eclipse.ini file.  I'll try 1 GB.)
    • (good news) My test task is now synchronized across all three cores.
    • (great news) The timings are spot on!  (as far as I can tell, give or take a few hundred microseconds)
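
    For reference, the buffer increase from the first point above is just this one setting:

        LoggingSetup.sysbiosLoggerSize = 2*1024*1024;    //  2 MB; big enough here to avoid dropped records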

    Interestingly, setting LogSync.globalTimestampCpuCyclesPerTick to 1 rather than leaving it at its default value of 6 seems to make no difference as far as timing measurements go (the task measures 11.17 ms, which is about right, and the sleep time measures 99.84 ms, which is about right [should be 100 ms]).  But it does make a difference in the inter-core task synchronization: with 6, their start times are staggered by a few hundred microseconds (which is what I would expect, I guess), but with 1, they all start at exactly the same time (which is not very believable).

    So, in summary, the timing is now fixed (thanks to your new Timestamp module... THANK YOU!), but the synchronization still seems a little bit off.  It's probably good enough, though, I don't know.  I have 20 ms to do a lot of data processing and analysis (hence the need for the 9 total cores), so being off even by a millisecond can be detrimental.  Still, though, even "as is," this is of enormous benefit to me, and I thank you wholeheartedly for all the effort you have put into this.

    BTW, like you, even though the Idle task is running (i.e., not perpetually blocked), LogSync_isSyncEventRequired() is never returning true, and no sync points are getting written.  I just mention it since I am running on my own platform, so you won't think it's just an EVM thing.  (Not that you do...)

     

    Update #2

    Sync points become required (and do get written) when eventUploadMode is set to JTAGRUNMODE (as opposed to JTAGSTOPMODE):

        LoggingSetup.eventUploadMode = LoggingSetup.UploadMode_JTAGRUNMODE;

    Attached below is my .cfg file (I added a couple of Sync related options, including the CyclesPerTick mentioned above) and my captured session:

    2804.Cfg_and_SystemAnalyzerData_005.zip

    By the way, are you even able to open my MSA binary file, or is that tied to my specific debug session and platform?

  • Okay, I have been trying to figure this out all morning, and I'm stumped.

    As I reported above, with your new UIA C6474 timestamp module, everything was working hunky-dory this weekend on my platform at home.  Today, though, here at work--same platform, same app, same CCS tools, same UIA patch [provided by you]--there are no records being recorded at all for any task other than the Idle task I spawn that runs at the end of the test sequence.  This is in the Execution Graph.  Look over at the message log pane, though, and everything is executing properly and in order.

    I'm attaching my MSA session and .cfg file.  You didn't answer my question above about whether or not you are even able to load these, but I hope you can, so you can see what I mean.  If not, let me know, and I guess I'll post screenshots.

    7752.systemAnalyzerData.zip

     

    So you can follow the message logs easier, here's the application sequence I am using for this test:

    main() --> Bios_start() -->
    Task_Sync() [waits for everyone to initialize themselves]
    [Task_Debug() --> Task_sleep()] x 3 on all three cores
    Task_Idle() [here's where I put my breakpoint to collect/view the trace data]

     

  • (Forum is acting odd.  I posted this reply, but it magically got added to my other thread.  I'm trying again, but now the forum is forbidding duplicate threads...)

    I think I figured out the problem, but I do not know the cause.

    1. The Execution Graph always goes from 0 ms to 100 ms, exactly, every time.  (symptom of problem)
    2. I just inspected the log messages more closely, and every single timestamp is identical.

    So it seems the timer I am telling the GlobalTimestampTimer object to use (TimerInstance_Timer5) is not being initialized.

     

    Dumb question, but since it was working this weekend without me doing anything special, maybe not so dumb:  Will the GlobalTimestampTimer object create the timer, or do I need to do it myself?  Like with a clock object, is there a way to specify an owner so all three cores don't try to initialize it at the same time?

  • Hi Alex,

        I was out of the office on Monday - just going through your posts now.   The 'dumb question' in your last post is not dumb at all!  The GlobalTimestampTimer objects were originally intended to be O/S independent and do not currently leverage the BIOS timer management facilities, so it is very possible for BIOS to think it owns the same timer that the GlobalTimestampTimer has been assigned to use.  (I'll look into what's involved in having the ti.uia.sysbios.LoggingSetup module reserve the timer used by the global timestamp module and will post an update when I know more.)  You shouldn't create the timer.  The way the TimestampC6474Timer module is written, it should initialize the timer registers upon application startup.  It's OK for multiple cores to initialize the same timer registers as long as they use the same configuration values.
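
    Until then, one possible workaround is to pin the BIOS Clock to a known timer so the two can't collide.  A minimal sketch, assuming the standard ti.sysbios.knl.Clock config option:

        var Clock = xdc.useModule('ti.sysbios.knl.Clock');
        Clock.timerId = 3;    //  keep the BIOS Clock on Timer 3, away from the timer assigned to the timestamp module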

    Also, the way the LogSync_isSyncEventRequired() API works is that it checks to see if the global variable ti_uia_runtime_LogSync_gNumTimesHalted is different than it was the last time it read it.  If it has changed, then it logs a sync point event.  This should cause the API to return true the first time it is called AND whenever the JTAG transport module increments the global variable (which should happen whenever CCS halts the target due to the target hitting a breakpoint or being halted by the user).    If System Analyzer connects to the target after the LoggerCircBuf buffer used by the LogSync module has wrapped around, then it will not get a sync point event.   A simple way to avoid this type of scenario is to periodically log a sync point event (e.g. every 100ms) by calling LogSync_writeSyncPoint() directly. 
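
    A sketch of that periodic approach using a SYS/BIOS Clock instance (logSyncPointClkFxn is a hypothetical C function that just calls LogSync_writeSyncPoint(); I'm assuming it is safe to call from the Clock's Swi context):

        var Clock     = xdc.useModule('ti.sysbios.knl.Clock');
        var clkParams = new Clock.Params();
        clkParams.period    = 100;     //  fire every 100 ticks (100 ms at the default 1 ms tick)
        clkParams.startFlag = true;    //  start automatically at BIOS_start()
        Clock.create( '&logSyncPointClkFxn', 100, clkParams );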

    I'm looking into some of the other points you raised in your previous posts, and will post an update when I have more info.

    Regards,

      Brian

  • This timer still has me stumped.  Went home, and again, everything worked great.  Back in the office now, and nothing works.  Exact same code and hardware, so the only culprit left is maybe my UIA installation.

    I have BIOS sharing timer 3 across all three cores, and I have the timestamp module using timer 5.  I don't have a screenshot of my log from home, but here, you can see that the timestamps are erroneous:  same value across the board, and the value is way too large.  (This is after a fresh power-up, i.e., values should be starting around the 10,000,000,000 range tops, not 9,223,372,036,854,775,808.)

  • First run, so I say this with trepidation, but I switched to Timer 4 and switched back to JTAGSTOPMODE, and everything somewhat started working again.  The only snag is that now I am getting those Out_Of_Seq Data warnings again.

     

    Update:  I found the culprit.  It was Timer 5.  My colleague, who independently wrote our little boot-loader, was having BIOS (bootloader uses BIOS5... don't ask) use Timer 5 as its kernel timer.  So I guess this is just an issue of the timestamp module not properly reinitializing it.

  • Hi Alex,

       Re:  By the way, are you even able to open my MSA binary file, or is that tied to my specific debug session and platform?

    I can open the binary file, but since I don't have your .out file or the UIA metadata files (the *.uia.xml and *.rta.xml files in the <Debug or Release>\configPkg\package\cfg\ folder of your project), there is not enough information to resolve any text strings that were logged.   Saving the data as a .csv file is probably the simplest way to capture event data in a way that preserves the message text.  (Right-click on the System Analyzer Live Session: Logs view table, select Data / Export All, specify a file name, ensure the "Save as type" field is *.csv, and save.)

    Re: The only snag is now I am getting those Out_Of_Seq Data warnings again

    Could you try to add a call to LogSync_writeSyncPoint() in your code somewhere near where you want to start analyzing the data, and after the point in your code where the target halts after being loaded?

    Re: timestamp values of 9,223,372,036,854,775,808

    This equals 0x8000000000000000 - I'll check to see what significance this has (might be due to no sync point event being detected.)

    More later...

  • Alex,

       I've created a test project that uses most of your .cfg settings, and it looks like the sync points are working ok if you log them explicitly using LogSync_writeSyncPoint().  I've also read up more on the C6474 bus architecture (sprs552h - page 51) and, since the timer is driven by a CPU/3 clock frequency (generated by the PLL1 controller), the GlobalTimestamp module needs to be configured with cpuCyclesPerTick = 3 (instead of the default value in the module, which was 6).   There's also a bug in the module that prevents you from manually configuring this value, so I've attached an updated version of the TimestampC6474Timer.xs file which fixes that - please unzip it into the <uia install dir>\packages\ti\uia\family\c64p folder.  3583.TimestampC6474Timer.zip

    You'll then need to add the following line to your .cfg file:

    GlobalTimestampTimer.cpuCyclesPerTick = 3;

    I'll fix the default settings for the module and will post the updated timer tomorrow.

    Regards,

      Brian

  • Thanks for the update.

    I updated the file and made the .cfg change, and everything seems to be still working, but now things are slightly off compared to before.  My point of reference is BIOS's kernel clock tick.  Since I have all three cores sharing the same timer, it makes sense (to me at least) that if CCS were properly synchronizing the data from each core, then the tick() task should line up across all cores.  Before, they did (give or take 10 or 20 nanoseconds).  Now, with the new patch and setting, there is an 80 us gap between Core 0 and Core 1.

    So my question is: since the shared timer will generate an interrupt to each core at the same time, why don't SYS/BIOS's 1 ms clock ticks line up in the execution graph?

    May be hard to make out, but...

     

    Edit:  (At home... haven't applied your new patch or .cfg setting yet.) Okay, I guess the alignment wasn't as close as I thought:

     

     

    But 80 us vs. 2 us actually has a huge impact on my analysis, as part of my load balancing deals with pre-processing the data we are receiving externally every 200 us.  So my question still stands: is this timing accurate, and if so, why aren't the ticks aligned (since the kernel HWI tick on all cores is being driven by a single, shared interrupt line)?  If not accurate, what degree of accuracy should I expect?

    BTW, BIOS's timers (kernel and now timestamp) are the only HWI's in the system active at the time of the above snapshots.  But even if the interrupts all happened at the same time, I wouldn't expect 2 us worth of latency, much less the 80 us I get with the new patch.

  • I struck out my reply above because I am chasing a red herring here.  My apologies.  Because of the way the Execution Graph cuts off (clips) the task name, I did not realize until I accidentally moved my mouse over it and saw the tooltip that the kernel tick task was not the core HWI task, but just the worker task, and it only seems to be called when the system is idle.

    (Sorry for the width.  Click it to see the full image.)

     

    It's still puzzling, though, why the new timer settings shift things.  The absolute values should change, sure, but since the application did not change, the offsets relative to each other should not have.

  • After further analysis, it's clear that the cores are still not fully synchronized.

    8863.Logs_and_Graph.zip (.csv export of above graph and also event logs)

    With respect to the Task_Debug() task, there is an IPC notification event generated at the end of Task_Sync() making it impossible for Pong_0B (core 1/slave) and Pong_0C (core 2/slave) to begin before Pong_0A (core 0/master), yet the above shows otherwise.

    Also, if you review the log messages (filter on the printf event and "Task_Sync" in the message), whether you sort by timestamp or by sequence ID, it is showing the slave cores receiving notifications before the master even sends them.

     

  • Thanks Alex - I'm looking into it and will post when I have more info. 

    Regards,

       Brian

  • Alex,

       What is the frequency of the CPU clock on your board?  Also, what platform file are you using (e.g. ti.platforms.evm6474)?

    Regards,

      Brian

    1 GHz, and the platform is a custom platform.  But the platform just defines the device (TMS320C6474), frequency (1000 MHz), and memory layout (not applicable to this problem).

    So even after making the synchronization tighter by polling memory flags and putting EVERYTHING on-chip (to remove DDR resource contention), the graphs only line up somewhat better (21 us max differential [see image below]), and the out-of-order log events still concern me.  (Having a received event logged [both by timestamp and seqID] before it is actually sent just seems wrong to me.)

    3288.Logs_and_Graph.zip

     

  • Alex,

       I've been writing up additional unit tests for the time correlation code, and it looks like there are some timing errors being introduced due to truncation.  The fact that the Global timer runs off of a CPU/3 clock means that its frequency is 333333333.333... Hz, which gets truncated to an integer value of 333333333 Hz.  Unfortunately, it doesn't look like this fully explains what you are seeing.  The truncated vs. corrected timestamp values for the Task_Sync(): (rcvd) events in your .csv file are as follows:

    Core name    Truncated timestamp    Corrected timestamp
    Pong_0A      1216158549136          1216158547922
    Pong_0B      1216158263815          1216158262602
    Pong_0C      1216158002383          1216158001169

    So the corrected event timestamps don't change the order of the events.  (0C, then 0B, then 0A)
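
    For a sense of scale, here's the back-of-the-envelope arithmetic behind those corrections (my own illustration, not output from the tool):

        //  Rough size of the truncation error:
        //      true frequency = 1 GHz / 3 = 333333333.333... Hz
        //      used frequency = 333333333 Hz (truncated)
        //      relative error = 0.333 / 333333333 ~ 1.0e-9
        //  At t ~ 1.216e12 ticks:  error ~ 1.216e12 * 1.0e-9 ~ 1216 ticks,
        //  which matches the ~1214-tick deltas in the table above.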

    I'm going to do some further tests to verify that the hardware timer module's frequency is actually what I've inferred it should be from the documents, and will post the results of that test.

    Regards

      Brian

     

  • I believe it actually does use the 1/6-rate clock, not the 1/3.  SPRS552H, Section 7.8.1.1, p.118:

    SYSCLK10: 1/6-rate clock (chip_clk6) for other peripherals (PLL Controllers, McBSPs, Timer64s,
    Semaphore, EMAC, GPIO, I2C, PSC) and L3 ROM.

  • Changing back to your original UIA patch, where the timestamp defaults to 6 (and explicitly setting it has no effect), we are almost there.  Core 0 and 1 differ only by 40 ns, but Core 2 is pushed out ~5 us.  In the log, if sorting by seqID, things are still out of order, but if sorting by timestamp, then everything is being logged in the correct order.  Could this now be just a matter of LogSync_writeSyncPoint() not being called at the right time?

    Keep in mind all three cores are running the same application image, everything is on-chip, and all three cores are sharing the same timer, so I can't think of why Core 2 would stall for 5 us; hence I think (but may be wrong) that this is still an MSA synchronization issue.

    I'll experiment more after lunch.

    7382.Logs_and_Graph.zip

  • Thanks Alex, you're right on: cpuCyclesPerTick = 6 for the global timer on the C6474.  (Just to be 100% sure, I've verified it by writing some tests that run on the 6474 EVM.)
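
    Pulling the thread's corrections together, the C6474 global timestamp configuration presumably ends up looking like this (a sketch based on the snippets above; Timer 4 per your working setup, assuming the module defines a TimerInstance_Timer4 enum value):

        var GlobalTimestampTimer = xdc.useModule('ti.uia.family.c64p.TimestampC6474Timer');
        GlobalTimestampTimer.timerBaseAdrs    = GlobalTimestampTimer.TimerInstance_Timer4;  //  a timer not owned by BIOS or the bootloader
        GlobalTimestampTimer.cpuCyclesPerTick = 6;    //  the Timer64 runs on the CPU/6 clock (SPRS552H, section 7.8.1.1)
        LogSync.GlobalTimestampProxy          = GlobalTimestampTimer;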

    Also, the System Analyzer event correlation code has been updated to avoid the truncation error problem I mentioned earlier - the fix should be available in the upcoming 5.1 GA release.   This will reduce the amount of low-level 'jitter' in the timing correlation. 

    Re: Core 2 is pushed out ~5 us

    Does your application write anything to the console (i.e. use printf instead of Log_printf)?   The way that CCS is able to display text on the console is through a hidden breakpoint that is triggered when text is to be output to stderr or stdout.   Hitting a breakpoint will cause the timing relationship between the CPU's local timestamp and the global timer's timestamp to shift, breaking the correlation between the cores. 

    One thing you might want to try is to add a call to LogSync_writeSyncPoint() at the same place in your code that you currently log the "Task_Sync(): (rcvd)..." event.  This should 're-calibrate' the local CPU timestamp and global timestamp relationship, ensuring that it is accurate and up-to-date at that point in time.  This, in turn, should allow you to check whether the application code is behaving in the way it should or not.

    Regards,

      Brian

  • No, my app is deliberately console-free:  no printf's or System_printf's.  I am, though, using SysStd rather than SysMin, so maybe something hidden is going on.  I'll try SysMin.
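
    For reference, switching the System provider is a small .cfg change (standard xdc.runtime configuration; SysMin buffers output in RAM rather than hitting a CIO breakpoint on every write):

        var System = xdc.useModule('xdc.runtime.System');
        var SysMin = xdc.useModule('xdc.runtime.SysMin');
        System.SupportProxy = SysMin;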

    I also already have a sync point after the point they are all truly aligned (execution-wise).  Here's what I have at the end of my Task_Sync() task:

        //--------------------------------------
        //  Now Wait for Others
        //--------------------------------------
        *pbReadyFlag[MultiProc_self()] = true;
        while( (*pbReadyFlag[nOtherCore[0]] != true) || (*pbReadyFlag[nOtherCore[1]] != true) )     //  spin until BOTH other cores are ready
        {
            NOP(1); //  spin
        }
        LogSync_writeSyncPoint();


        //--------------------------------------
        //  Signal App to Begin Execution
        //--------------------------------------
        Semaphore_post( hSem_Run );


        //--------------------------------------
        //  Terminate This Task
        //--------------------------------------
        Task_exit();

    In short, after all the IPC synchronization, every core has to set its Ready flag to begin, and theoretically, all three cores should be calling Semaphore_post() at the exact same time.  Again, since everything is on-chip and there is no contention anywhere, I don't know why Core 2 would stall unless, I guess, there was a hidden breakpoint like you suggested.  I'll also turn off the "Enable CIO function use (requires setting a breakpoint)" option in the debug configuration.

     

    Update:  SysMin/No-CIO didn't help; the new sync point didn't help; and even taking Core 1 out of the equation (by having it spin instead of posting the semaphore) didn't help.  Core 2 always lags Core 0 by ~5 us.

  • Hi Brian!

    I'm working with the MCSA and the EVM C6472, and I want to verify that the multicore correlation is working properly.  I also have an issue regarding the CPU and Task loads, since I cannot see this information displayed in the analyzer.

    Can you have a look at my post?

    http://e2e.ti.com/support/embedded/f/355/p/140223/506251.aspx

     

    Thanks,

    Miguel

  • Miguel,

        I'm setting up a C6472 evm to look into the problems you've reported - I'll respond further on your other thread.

    Alex,

        Could you take a look at the global timestamp values that are reported in the sync point event logged at the point where they should all be aligned:

    Sync Point:  SerialNumber=0x1, CpuTStamp [LSW=0xb54ae63, MSW=0x278], GlobalTStamp [LSW=0x62de0c7a, MSW=0x5e], CpuCyclesPerCpuTimerTick=1, CpuFreq [LSW=0x3b9aca00, MSW=0x0]

    This 64b value is directly read from the hardware timer you've configured for use by the TimestampC6474Timer, so by comparing the timestamp values from all 3 cores you should get a clear picture of what the time alignment between the tasks is.  Since it's a CPU/6 timer frequency and your CPU = 1 GHz, multiplying the difference between the value logged by one core and the value logged by another by 6 will give you the number of nanoseconds elapsed.   Does the value you calculate this way match what System Analyzer is telling you?  If so, then we can try to dig into what could be stalling that task.  If not, then there may be a bug in System Analyzer that is skewing the results, and I'll try to figure out what is going wrong there.
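
    To make the arithmetic concrete, here's a quick sketch with made-up tick values (illustration only):

        //  Hypothetical GlobalTStamp values from two cores' sync point events:
        var ticksCore0 = 0x62de0c7a;
        var ticksCore2 = ticksCore0 + 1488;    //  core 2 logs its sync point 1488 timer ticks later

        //  CPU/6 timer on a 1 GHz CPU ==> 6 ns per tick
        var skewNs = (ticksCore2 - ticksCore0) * 6;
        print( "skew = " + skewNs + " ns" );   //  8928 ns, i.e. ~8.9 us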

    Regards,

      Brian

     

  • So I have two sync points per core:  one called during my Idle task while my Sync_Task() is pending, and another at the end of the Sync_Task() immediately after all three cores have updated their flags.

    So using my last sync point as my reference, the Global Timestamp values somewhat match what I see in the execution graph:  Core 1 starts 216 ns later, and Core 2 starts 8928 ns later.

    So what's the Local Time (a.k.a. CpuTStamp)?  Isn't that just a reading of the TSCH:TSCL register pair?  Those all match up pretty much exactly (within 72 ns), which is what I would expect since they are all from the same DSP, i.e., each core powered up at the same time.

     

    Same subject, different problem.  At this stage, I am not really going to fret over the 8.9 us discrepancy.  I would greatly like it to match, don't get me wrong, but I have bigger fish to fry at this point. :-)  With everything enabled (main, SWI, HWI, and task), the application will only run for maybe 15 ms before I start getting Out_of_Seq errors.  (This is with a 2 MB buffer.)  Worse, if I throw a second DSP into the mix (6 total cores), forget it.  MSA utterly hangs or crashes.  (Soon, like within a week or two, I will need to monitor 9 cores for at least 50 ms.)

    So for the load balancing and task monitoring with everything on, sure, I understand that the system is being hammered with task switch messages, so maybe it's just a matter of massively increasing my buffer size.  What I don't understand is that even if I turn everything off except for the main logger, and then only have each of the 6 cores spam a benchmark score once every five seconds (I put the task to sleep for 5000 ms), I still get Out_of_Seq warnings--again, though, only if I add a second DSP to the mix.

    So I guess the question is now that we almost, kind of, sort of got 3 cores on a single DSP to sync (thanks to your awesome new C6474 timestamp module), how do I get two separate DSPs to sync (in the eyes of the MSA, I mean)?  Oh, and if the gaps and Out_of_Seq warnings are a resource/bandwidth issue, I can begrudgingly disable all SWI & HWI task monitoring, but I definitely need Task monitoring since, you know, it's kind of required to perform any kind of effective load balancing analysis. :-)

  • I was going to start a new thread and report this as a bug, but Miguel doesn't seem to have a problem (link), so it must be my system.

    I've tried twice now to install RC1, once without any prep work, and again after uninstalling all things TI (hoping it was a conflict or something).  Both times, though, the UIA component is completely broken in that I almost instantly get a Corrupted Data error once I start running my application:

    Exported Log of Above:  0726.UIA_CorruptedData.log

    I have to completely uninstall RC1 and then reinstall M7 to get things working again.  (I can't install M8/8.5 since they don't recognize the NDK package.)

    I'm running under Windows 7 (64-bit) if that makes a difference.

    Oh, and I tried creating a new workspace, too, but that didn't make a difference.

     

    Update:  It's the Silicon Real-Time Mode, or, at least, it's related.  I have that off by default (because it's been buggy ever since CCSv.4), but I just noticed that when I first open an MSA session, it gets auto-enabled.  If I disable it again, things change.  A couple of times, it actually worked fine.  Other times, it was a different data corruption error (complaining about the payload size), and other times CCS just completely hangs (indefinite hourglass).

  • Update #2:  Alright, so unlike this weekend, I was able to switch back to my M7 install without uninstalling RC1, and everything is working fine again.  Same hardware, same JTAG emulator, same project, same workspace, and UIA 1.00.02.22 gives me data corruption and UIA 1.00.01.17 doesn't.

    Next test, I'm going to copy UIA 1.00.01.17 into my RC1 install directory, switch to it, and see if the problems persist.

    Update...

    Using UIA 1.00.01.17 under RC1, the Trace Server now crashes every time I launch the debugger.  I can close it and continue, but I still get the Data Corruption error, undoubtedly due to the now non-existent trace server.

    I guess I am stuck with M7 forever.

  • Can you please share your workspace and the files from the .TI-trace directory (in your user directory)?

    We need some details to help resolve this.

  • Hi Alex,

       Wow, apologies for all of the RC1 problems you've hit!  The Trace Server is actually separate from System Analyzer and UIA, so you should be able to safely ignore it.  (It's only relevant to CPU trace - this problem has been flagged and is under investigation.)

    One other problem is that the wrong release notes were included with UIA_1_00_02_22.  The correct ones are attached:

    5557.UIA_1_00_02_22 release notes.zip 

    Also, an important change was not documented in the release notes:

    • The whole_program and whole_program_debug build-profiles are no longer supported (not just for UIA, but also for other packages such as IPC, sysbios etc.).  Please use release or debug instead.
      • This was done to cut down on build time, address toolchain and compiler compatibility issues, reduce install footprint and download size, but unfortunately will increase the overhead associated with logging an event and the target code footprint w.r.t. older versions of UIA that supported the whole_program build profile. 

    I've forwarded your error logs to the appropriate developers here, and will post when I have more info. 

    On a more positive note, in response to one of your earlier posts (http://e2e.ti.com/support/development_tools/code_composer_studio/f/81/p/137922/500184.aspx#500184 ), I've posted a tutorial on how to get the Ethernet transport working with UIA that covers the C6474 device: it's available at http://processors.wiki.ti.com/index.php/System_Analyzer_Tutorial_3B .

    (BTW, please note that automatic email notification for threads like this doesn't get triggered when you update an existing post.  For faster response to your posts, please use 'reply' instead of editing the last post.    Otherwise I won't see your updates until I manually browse through the thread.  )

    Regards,

      Brian

     

  • Vikas,

    I'm back in the office now, so I'll post them tonight.  That said, though, I have already uninstalled RC1 and have been using my working M7 install all afternoon.  I didn't change workspaces throughout this whole endeavor today (it was last weekend I tried a new workspace), so I'm not sure if it will be of any use.  (I'll still upload it, though.)

     

    Brian,

    Great tutorial!  Awesome!  Thanks.  I'll test it out tonight with my EMAC project.  If it works (no reason it shouldn't, I guess), I'll merge my EMAC kernel into my application ASAP.

    About the build type, I've always used Release.  In fact, the only two options are Release and Debug.  (I'm referring to my RTSC settings.)  I haven't seen "whole_program" for quite a while now, come to think of it.

  • Thanks.  I look forward to receiving those files.

    I need some more information to help with this crash issue.

    - Which version of Windows are you using?

    - Which emulator are you using?

    - Did you install RC1 in the same directory as M7? (I believe the answer is no).

    - Was there any manual change in PATH environment variable (CCS related and beyond what gets done with CCS installer)?

  • Here are the directories you requested, Vikas.

    2337.WS_and_TI-trace.zip

    OS: Windows 7 (64-bit)
    Emulator:  Blackhawk USB-560 (Part # BH-USB-560m)
    Install Directory:  New Directory (full install, not an upgrade or repair)
    PATH changes:  No

    About the install, I do disable the EzFlo thing (not relevant I would think), but, also, I only install the Blackhawk and Spectrum Digital emulation drivers and uncheck your XDS ones.

  • Brian,

    Thanks for writing up that tutorial.  Worked like a charm!  Who would have thought it would only take 7 lines of code:

    var ServiceMgr           = xdc.useModule('ti.uia.runtime.ServiceMgr');
    var NdkConfig            = xdc.loadPackage('ti.ndk.config');
    var Global               = xdc.useModule('ti.ndk.config.Global');
    var Ip                   = xdc.useModule('ti.ndk.config.Ip');
    var Udp                  = xdc.useModule('ti.ndk.config.Udp');
    ServiceMgr.transportType = ServiceMgr.TransportType_ETHERNET;
    Ip.address               = "192.168.250.250";

    Amazing!

    When I incorporate my Ethernet driver into my main application (I tested the above with my HelloWorld demo app), no doubt it will help tremendously when dealing with UIA messages coming in from all nine cores.  (1 Gbit/s vs. 150 KB/s... yeah, I think it will.)

  • Thanks.

    I was not able to reproduce the crash with the workspace and other cache files.  I will continue to dig deeper and see if there is any other way to reproduce this.  There may be some PC-specific dependency here (like firewall or antivirus software)... I'm not fully sure until I can reproduce it.

     

  • By any chance, were you using CCS 3.3, with CCS3.3/cc/bin in your PATH?  Just a query.

  • I do have CCS 3.3 installed, but no, it's not in my path.

  • Ok thanks.

    Is the Windows firewall active on your system?

  • Not Windows (I usually keep that turned off), but I do run Norton Internet Security.  I'll check when I get home.

    (In case you haven't been able to infer this, I work 14-16 hours a day, both here at work and then resume when I get home.  To reduce risk, I generally only upgrade at home on a weekend, and only if everything works out do I upgrade my office workstation.)

     

  • Yes, here at home, Windows Firewall is disabled and replaced with Norton Internet Security, or so Windows tells me:

    BTW, I looked at my PATH variable here at home, and though CCSv3.3 wasn't there, I see that CCSv5.1(M7) put itself there in my User PATH.

    As one final attempt, I completely uninstalled M7, manually uninstalled DVT (since all of the CCSv5.1 installers since at least M5 always fail near the end and never finish the uninstall), removed M7 from the path (since the uninstaller didn't), and then installed RC1 again.  Still get the Trace Server crash.  Still get the corrupted UIA data.

    In case it helps, here again are the directories you requested above, taken immediately after the crash and data corruption.  I'm including the export of Eclipse's error log (about the Data Corruption), too.

    6888.TI-trace.zip

    I then disabled the firewall, and the Trace Server no longer crashes, but that may be because it crashed previously and the driver is disabled (and I need to reboot, which I can't do right now since I'll lose this post).  Interestingly, though, at least on this first run, UIA didn't error out, but I did get weird results: 3 gaps, out_of_seq warnings, and the last entry is (null) with its timestamp and sequence # way off.

    8686.MSA_Live_Log.zip

    I'll try rebooting now and then also disable the Antivirus and post back in a minute.

  • Rebooted, and the Trace Server still no longer crashes.  I'm using Norton Internet Security 2010 (v.18.6.0.29).  I guess it's silently (I get no pop-ups asking me yea or nay) blocking your Trace Server, and your Trace Server gets annoyed and dies. :-)

     

    Unfortunately, though, this time, I got the usual UIA data corruption.  I have a lot of work to do, so I guess I will uninstall again and go back to M7. :-(

    Oh, and though I am getting way off-topic in my own thread, each time I uninstall RC1, I always get the following error (no option to copy text, so sorry for the big screenshot):

     

    And yes, I'm in the Administrator group, and I have UAC disabled.

     

  • Alex,

    Thanks again for all the feedback you're providing.  A few things I'd like to mention:

    • The Trace Server crash and UIA data corruption are not related.  The Trace Server crash should not be affecting any functionality you are using in CCS.
    • We've been able to reproduce the UIA data corruption.  It seems to only occur with the JTAG transport (not Ethernet).  We're investigating and will let you know when a fix is available.
    • We'll investigate the uninstall issue you mentioned above.

    Regards,

    Imtaz.