MSA/UIA Questions

Alex

Other Parts Discussed in Thread: SYSBIOS, TMS320C6474

A couple of small questions.

I'm at the point in my project where I need to start utilizing MSA's load analysis functionality. Eventually, I need to capture data for 9 cores (3 C6474 DSPs), but right now I am having problems with just a single DSP (3 cores).

First, if I just run continuously and have everything disabled but the main logger, I keep getting "Out_Of_Seq Data" warnings. (Each core is just printing out a benchmark number every 1 second.) I have disableMulticoreEventCorrelation set to false, so other than that, is there some other setting that I need to set, or is this just one of those informational-only warnings that I should ignore?

Second, what's the relationship between the logger's transfer buffer size and the size specified in the LoggingSetup object? I have mounds of DDR memory (512 MB total, so I gave each core about 170 MB [exclusive, not shared]), so I am creating and assigning my own loggers like so:

LoggingSetup.mainLogger = CreateLogger( 32*1024, "(UIA) Main Logger", ".logger:UIA_Main" );
LoggingSetup.loadLogger = CreateLogger( 64*1024, "(UIA) Load Logger", ".logger:UIA_Load" );
LoggingSetup.sysbiosLogger = CreateLogger( 64*1024, "(UIA) SysBios Logger", ".logger:UIA_SysBios" );

(CreateLogger is just a helper function I wrote to setup the logger. [see below])

So for the main logger, if the transfer buffer size is set to 32K as shown, what do I need to set LoggingSetup.mainLoggerSize to? (not to mention the Event size, too, which I believe just defaults to 128 MAUs)
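
As a rough sanity check on the numbers above, the sketch below works out how many maximum-size events fit in one transfer buffer. It only does the arithmetic: it assumes 8-bit MAUs (as on the C64x+), takes the 128-MAU event size quoted above at face value, and says nothing about how LoggingSetup.mainLoggerSize interacts with a logger that is created by hand. The function name is made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: number of worst-case (maximum-size) events that
 * fit in one transfer buffer. Both sizes are in MAUs (bytes on C64x+). */
static size_t max_events_per_buffer(size_t transfer_buf_size, size_t max_event_size)
{
    if (max_event_size == 0) {
        return 0;               /* avoid dividing by zero */
    }
    return transfer_buf_size / max_event_size;
}
```

With the sizes from the post, max_events_per_buffer(32 * 1024, 128) gives 256 for the main logger, and max_events_per_buffer(64 * 1024, 128) gives 512 for the load and SysBios loggers.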

Finally, when enabling anything other than main logging, I have quickly learned that continuous monitoring is just not feasible unfortunately. (Way too much data for CCS to process!) Very saddening. So, okay, I let my program run (each core is just spinning on a single task which is doing a simple filter operation and then sleeps for 1 second and then repeats) and just launched the System Analyzer telling it to only collect data for 2 seconds. It will go through the motions and then start churning through the collected records, but then without fail, I'll get an emulator error about not being able to read the target memory. Says target is not responding to request. (Address cited is always within the load logger's memory section.)

Well, this seems to only happen if I enable EVERYTHING. If I leave hwi and swi logging disabled, I don't get the error. Anyway, my hardware is thoroughly sound (we brutally stress test it, including the JTAG path). In the months I have been working with this particular board, I've never had emulator problems until now. (Maybe MSA is even more brutal of a stress test, eh? :-) ) I lowered the JTAG's TCLK clock down to the legacy 10 MHz, and I changed the TMS/TDO output timing to the standard falling edge, but still I get the error. Any ideas?

I really need to be able to monitor the HWIs and SWIs as those are what are driving the entire system, and I need to be able to measure and visualize the latency between the incoming data and the plethora of tasks I will have processing that data before the next frame comes in, so leaving hwi and swi monitoring disabled just so I don't get this emulator error is not a viable option. Could it be my misconfiguration of the log buffers above? Perhaps more is not better, I mean.

Edit: Oh, forgot. For completeness, here is my little CreateLogger() function:

////////////////////////////////////////////////////////////////////////////////
// Function:       CreateLogger()
// Description:    Helper function used to create a logger.
////////////////////////////////////////////////////////////////////////////////
function CreateLogger( p_nTxfrSize, p_szLoggerName, p_szSectionName )
{
    var logger;                             // logger object
    var loggerParams;                       // logger object properties
    var hLogger;                            // handle to new logger

    logger                           = xdc.useModule( 'ti.uia.runtime.LoggerCircBuf' );
    loggerParams                     = new logger.Params();
    loggerParams.transferBufSize     = p_nTxfrSize;
    loggerParams.bufSection          = p_szSectionName;
    Program.sectMap[p_szSectionName] = new Program.SectionSpec();
    Program.sectMap[p_szSectionName] = OffChip;    // OffChip is defined elsewhere in the .cfg (not shown)

    hLogger                          = logger.create( loggerParams );
    hLogger.instance.name            = p_szLoggerName;

    return( hLogger );
}

over 13 years ago

0 Alex over 13 years ago in reply to BrianC

Expert 2430 points

Brian, in regard to using an Ethernet transport, this may have already been fixed, but there seems to be a bug where if I close the MSA session (just MSA, not disconnect or terminate the debugger), I get spammed with endless mmAlloc errors. (May be just info messages... whatever.)

[Ping_3A] 23:48 ( 35%) 15:96 ( 46%) 1:128 ( 4%) 6:256 ( 50%)
[Ping_3A] 2:512 ( 33%) 0:1536 0:3072
[Ping_3A] (15360/49152 mmAlloc: 51/0/46, mmBulk: 6/1/5)
[Ping_3A]
[Ping_3A] 1 blocks alloced in 512 byte page
[Ping_3A] (0000)
[Ping_3A] 3 blocks alloced in 48 byte page
[Ping_3A] (000D) (000B) (000D)
[Ping_3A] 1 blocks alloced in 256 byte page
[Ping_3A] (000A)
[Ping_3A]
[Ping_3A] 00050.747 mmAlloc: PIT Used Sync
[Ping_3A] 00050.748 mmAlloc: PIT Used Sync
[Ping_3A] 00050.749 mmAlloc: PIT Used Sync
[Ping_3A] 00050.750 mmAlloc: PIT Used Sync
[Ping_3A] 00050.751 mmAlloc: PIT Used Sync
[Ping_3A] 00050.752 mmAlloc: PIT Used Sync
[Ping_3A] 00050.753 mmAlloc: PIT Used Sync
[Ping_3A] 00050.754 mmAlloc: PIT Used Sync
[Ping_3A] 00050.755 mmAlloc: PIT Used Sync
[Ping_3A] 00050.756 mmAlloc: PIT Used Sync
[Ping_3A] 00050.757 mmAlloc: PIT Used Sync
[Ping_3A] 00050.758 mmAlloc: PIT Used Sync
...

It's up to 15000 now. :) This was after running the target then opening an MSA session (timed session or manually stopped doesn't matter). After a few seconds of seeing it respond (i.e., display log messages) to my PC application, I closed MSA session.

0 Alex over 13 years ago in reply to Alex

Expert 2430 points

Actually, something just occurred to me which made me very sad. For real-estate reasons, we only have one DSP (of the six on a single card) connected to the PHY (Ethernet). So even if I did merge my NDK driver code into this application (which is what I was working on tonight), it won't do me any good for MSA sessions since it will only be able to communicate with one DSP and not the other five.

I'm stuck with JTAG. Given this, I hope you all find a fix for that Data Corruption problem soon. :-)

0 BrianC over 13 years ago in reply to Alex

TI__Expert 3745 points

Alex,

Re: I hope you all find a fix for that Data Corruption problem soon.

We found the problem - it's in the emulation driver, and will be fixed for the GA release. You can do the following as a workaround to repair your RC1 installation:

1. close CCS and rename the ccsv5\ccs_base\emulation\drivers folder to ccsv5\ccs_base\emulation\drivers_orig

2. copy the ccsv5\ccs_base\emulation\drivers folder from your CCS M8.5 release into the ccsv5\ccs_base\emulation\drivers folder for RC1.

You should then be able to use JTAG Run-mode (or stop-mode) transports with System Analyzer in RC1.

(Thanks for your help and your patience!)

Re: it won't do me any good for MSA sessions since it will only be able to communicate with one DSP and not the other five.

Could you clarify whether by DSP you mean the C6474 multicore device (with 3 C64X+ cores), and that your system has 18 DSP cores, or whether you mean that you have two C6474 devices with a total of 6 C64X+ cores?

FWIW, we've designed the UIA software to support interprocessor communication using MessageQ, with one core acting as 'master' (running the NDK stack) and the other cores acting as 'slaves'. (See http://processors.wiki.ti.com/index.php/SystemAnalyzerTutorial4 for info on how to do this - it's currently under construction, but contains a link to another post as well as to the relevant section of the User's Guide that covers this. I will try to complete Tutorial 4 early next week).

From your previous posts it looks like you are already using IPC - is this configured to work with all of the cores on your board? If so, it should be possible to use this infrastructure to move the event data between the slave cores and the master core. Alternatively, if you have any shared memory that all of the cores can access, it may be possible to locate the LoggerCircBuf event buffers in this shared memory and to have the master CPU core pull out the events directly from these event buffers so that all of the events can be uploaded. If you're interested in some type of approach like this I'll try my best to help you get System Analyzer and UIA to work with it.
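
To make the shared-memory idea above a little more concrete, here is a minimal sketch of one core filling a circular buffer while another core drains it. The struct layout and function names are invented for illustration and are not the LoggerCircBuf layout; a real multicore version would also need cache write-back/invalidate and memory barriers, which are left out here.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_WORDS 8u   /* illustrative size; must be a power of two */

/* Hypothetical layout, NOT the LoggerCircBuf layout: the slave core only
 * advances `head`, the master core only advances `tail`. */
typedef struct {
    volatile uint32_t head;             /* total words written by the slave   */
    volatile uint32_t tail;             /* total words consumed by the master */
    uint32_t          data[RING_WORDS];
} SharedRing;

/* Slave side: append one word; returns 0 if the ring is full. */
static int ring_put(SharedRing *r, uint32_t word)
{
    if ((uint32_t)(r->head - r->tail) == RING_WORDS) {
        return 0;
    }
    r->data[r->head & (RING_WORDS - 1u)] = word;
    r->head = r->head + 1u;             /* publish after the data is in place */
    return 1;
}

/* Master side: copy up to `max` pending words into `out`; returns the count. */
static size_t ring_drain(SharedRing *r, uint32_t *out, size_t max)
{
    size_t n = 0;
    while (n < max && r->tail != r->head) {
        out[n++] = r->data[r->tail & (RING_WORDS - 1u)];
        r->tail = r->tail + 1u;
    }
    return n;
}
```

Because each index has exactly one writer, no lock is needed in this simple form; the counters are free-running and rely on unsigned wraparound.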

Re: if I close the MSA session (just MSA, not disconnect or terminate the debugger), I get spammed with endless mmAlloc errors.

Haven't seen this before. How much external DDR memory do you have? Are you using the same .cfg file you posted earlier? If not, please post the updated one and the .map file and I'll try to figure out what the problem might be.

Also, back to your original problem (i.e. that the master CPU core is responding to the synchronization semaphore later than expected): if you have any HWI or SWI interrupt service routines, do you think that they might be the reason that your master CPU core is being delayed? We can instrument the HWIs and SWIs in a couple of different ways to provide visibility into this if you're interested.

Regards,

Brian

0 Alex over 13 years ago in reply to BrianC

Expert 2430 points

My system has two identical daughter cards, each sporting six C6474 DSPs, so for message logging purposes, MSA will be connected to 36 cores. That's the worst case scenario, but given how we have the DSPs functionally partitioned, I could live with having the MSA only connect to eight of them: 5 from one card and 3 from the other (so I can track the data exchange), so 24 cores. For load balancing purposes, I only need to connect to 3 of the DSPs, or 9 cores if you will.

All DSPs communicate with each other via SRIO--there is no shared memory among the physical DSPs. And in reference to your IPC and MessageQ suggestion, as it happens, I removed that Thursday and implemented my own custom IPC subsystem because with your IPC component, the system would crash if loaded from the DSP's on-chip bootloader. Works fine if loaded from emulator. I put everything on-chip since the SRIO bootloader is limited to L2RAM only, so it wasn't because the DDR wasn't initialized or anything. Generally, boot vs. emulator load problems are a .cinit type problem, but I am behind schedule, so I don't have time to debug it at the moment and it was quicker just implementing my own core-to-core interrupts and event notifications via the DSP's IPC registers directly.

Anyway, I followed your work-around steps, and yes, indeed, the data corruption seems to have gone away. Thanks! That said, the emulator is extraordinarily sluggish now. For instance, halting a core now takes 10-15 seconds. For the MSA, right now I am connected to 6 cores (2 DSPs), and I only have the main logger enabled. A few minutes ago under M7, same debug configuration, everything was very responsive like it's always been. I'll reboot and try a few more sessions hoping maybe the DVT still needs time to cache some things in or something.

BTW, even with my firewall disabled, I got the Trace Server crash again. I'm not using trace, so I'm not blocked by this, but I'm just reporting.

So going forward with my current configuration (RC1, your timer patch, and the emulator driver downgrade), I am still getting Out_Of_Seq warnings when I run on more than one DSP. I'll check out the tutorial if and when I can get your IPC working again, but in the meantime, I am not understanding this. I grouped all six cores and started them at the same time, and each is running the same code, so they are all setting their sync points correctly, so why is MSA complaining? Do you get this on your EVM with its two DSPs?

About the memory alloc info message spam, no, it's a different .cfg file since it's just the NDK's HelloWorld application modified to work with my PHY. The UIA stuff I added to its .cfg file is a copy-and-paste from what I sent you above. Anyway, for now, I'll just defer this "problem" for later. That HelloWorld app is full debug info output, and it's probably just that. When I add networking support to my real application where I am fully controlling the debug message, we can revisit this if the problem persists (which I doubt it will).

0 Alex over 13 years ago in reply to Alex

Expert 2430 points

(new reply instead of message edit per your request)

For the Out_Of_Seq warnings when running on two DSPs, a reboot seemed to do the trick. (In reality, simply unplugging the emulator probably would have sufficed.) When I powered up my emulator, Windows decided it needed new drivers which is very odd since I normally never see that unless I change physical pods.

Anyway, at least for this one test run, I let it run for a bit watching all six cores print out their benchmark results every 100 ms : no errors, warnings, gaps, or anything. Coolness.

Also, Trace Server didn't crash this time (after the reboot), but I still have the firewall disabled.

Update: Nuts. After about 1000 messages, I started getting Out_Of_Seq warnings. No gaps or data loss, but I did have one of the cores halted (while the other 5 continued churning out their log messages). To this end, I restarted all cores and ran again without a breakpoint, but sadly, I still get the Out_Of_Seq warnings.
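
The thread does not say how System Analyzer decides that a record is out of sequence. Purely as an illustration of the general idea, the sketch below classifies a record by comparing a per-core sequence number with the previous one. The 16-bit counter width, the enum, and the function name are all assumptions, not UIA's actual record format or algorithm.

```c
#include <stdint.h>

typedef enum { SEQ_IN_ORDER, SEQ_GAP, SEQ_OUT_OF_ORDER } SeqStatus;

/* Classify `next` relative to the last sequence number seen from one core.
 * The counter is assumed to be 16 bits wide and to wrap, so the code
 * compares the wrapped distance rather than the raw values. */
static SeqStatus seq_classify(uint16_t last, uint16_t next)
{
    uint16_t delta = (uint16_t)(next - last);

    if (delta == 1u) {
        return SEQ_IN_ORDER;            /* the very next record */
    }
    if (delta != 0u && delta < 0x8000u) {
        return SEQ_GAP;                 /* skipped ahead: records are missing */
    }
    return SEQ_OUT_OF_ORDER;            /* repeated or older record */
}
```

In this scheme a wrap from 0xFFFF to 0 still counts as in order, a jump from 5 to 9 is a gap, and seeing 5 after 9 is out of order.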

0 Miguel Aguilar over 13 years ago in reply to Alex

Intellectual 855 points

Brian, Alex,

Can you provide me the minimum configuration steps (Eg. .cfg file) for running the System Analyzer on the C6678 over Ethernet?

I already did this on the C6472 with no issues. I also checked the tutorial for this, and it is more or less the same as what I did for the C6472: I simply loaded the NDK, and since all the EMAC drivers are already included in the NDK, the configuration is simple. I also want the NDK to use its own stack functions. However, with the C6678 I was following the configuration from the Image Processing demonstration, and its .cfg has many other things somehow related to the NDK, CPPI, and QMSS, so I would like to know the minimum configuration.

Thanks,

Miguel

0 BrianC over 13 years ago in reply to Miguel Aguilar

TI__Expert 3745 points

Hi Miguel,

I'm in the process of writing up a tutorial that will cover this ( http://processors.wiki.ti.com/index.php/System_Analyzer_Tutorial_4C ), and will post when it is complete.

Regards,

Brian

0 Alex over 13 years ago in reply to BrianC

Expert 2430 points

0 BrianC over 13 years ago in reply to Alex

TI__Expert 3745 points

Hi Alex,

Sorry I missed the update to your Oct 22 post. Were you able to overcome the out of sequence errors?

Also, I've been working out some approaches to support multi-device event correlation using non TI IPC mechanisms - please let me know if you would like more info on this. It's still 'theoretical' at this point (i.e. I haven't had a chance to actually try it out yet!)

Regards,

Brian

0 Alex over 13 years ago in reply to BrianC

Expert 2430 points

(I posted and then reposted [via edit] my post above, and both times nothing shows up on the forum--only an empty reply. Hopefully, this post won't be the same.)

No, I still get sequence errors, or worse, a perpetual "Waiting for Sync" message. Because of the latter, I generally now conditionally comment out the entire Sync portion of my configuration because the "Waiting for Sync" condition blocks any and all main logger messages.

I haven't had time to investigate my IPC boot problems, so I'm still using my own solution, as minimalistic as it is. Sure, I'm interested, but unless you are somehow going to coordinate things host-side, I am not sure if anything you come up with will apply to me. (My Ethernet is only connected to one of my DSPs, and all the DSPs are isolated. We wrote our own SRIO driver and interface as the only means of communication.)

I do have a general question, though. I think I know the answer (that being "masks"), and I think the documentation covers this, but I haven't had time to experiment. The main problem I have when I turn on task logging (and worse, HWI monitoring which I really, really need) is I get hammered with too many messages. Ideally, I would like to have all task/hwi switching events masked off, but when I get an external event (I'm controlling this event), I need to then turn on all the task event messages; and when I am done processing my data 15 ms later, I then want to turn them back off again. (Of course, all the while, I still want to keep my main logger active.) Is that possible?

Edit: Okay, instead of acting like a helpless idiot, I read through the user guide and took the time to experiment. In .cfg, set the masks to RUNTIME_OFF, and then while the program is executing, I can just use Diags_setMask() to temporarily enable them. But Section 5.2.3 is a little generic in describing which mask does what. So if I were only interested in log messages and brief task/HWIs event tracing, which of all those "default Diags.RUNTIME_ON" masks can I safely disable, and which do I need to toggle? And I would have to disable them for each and every module, correct? (Task, Swi, and Hwi?)

0 BrianC over 13 years ago in reply to Alex

TI__Expert 3745 points

Hi Alex,

Re: So if I were only interested in log messages and brief task/HWIs event tracing, which of all those "default Diags.RUNTIME_ON" masks can I safely disable, and which do I need to toggle?

I've put together a list of the various SysBios events and their associated Diags mask settings in Tutorial 3A. The Hwi module uses Diags_USER1 and Diags_USER2 to control event logging. Here's the .cfg code you can use to configure the masks so that they are dynamically reconfigurable at run time, and default to 'off':

    var Diags = xdc.useModule('xdc.runtime.Diags');
    var Hwi = xdc.useModule('ti.sysbios.hal.Hwi');
    Hwi.common$.diags_USER1 = Diags.RUNTIME_OFF;
    Hwi.common$.diags_USER2 = Diags.RUNTIME_OFF;

And here's the C code you can use to dynamically configure the masks at run time:

Diags_setMask("ti.sysbios.hal.Hwi-12"); // disable Hwi events (the "-12" means turn off both Diags_USER1 and Diags_USER2)

Diags_setMask("ti.sysbios.hal.Hwi+12"); // enable Hwi events (the "+12" means turn on both Diags_USER1 and Diags_USER2)

Re: And I would have to disable them for each and every module, correct? (Task, Swi, and Hwi?)

Yes, you can use the same technique for Swi and Task events as well - they also use Diags_USER1 and Diags_USER2 to control event logging.
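
Putting the two replies above together, a short trace window around an external event could be sketched as follows. The function names are made up, the module names for Swi and Task (ti.sysbios.knl.Swi, ti.sysbios.knl.Task) are assumed by analogy with the Hwi example, and the xdc_target_name__ check is only an assumed way to tell a target build from a host build. The host-side stand-in simply counts calls so the sketch can be compiled and tried off-target.

```c
#ifdef xdc_target_name__                /* assumed to be defined by XDCtools target builds */
#include <xdc/std.h>
#include <xdc/runtime/Diags.h>
#else
#include <string.h>
/* Host-side stand-in for Diags_setMask(): records what was requested. */
static int stub_calls;                  /* number of mask changes issued       */
static int stub_enabled;                /* net number of modules switched on   */
static void Diags_setMask(const char *control)
{
    stub_calls++;
    if (strchr(control, '+') != NULL) { stub_enabled++; }
    if (strchr(control, '-') != NULL) { stub_enabled--; }
}
#endif

/* Call when the external event arrives: turn on USER1/USER2 events. */
void trace_window_open(void)
{
    Diags_setMask("ti.sysbios.hal.Hwi+12");
    Diags_setMask("ti.sysbios.knl.Swi+12");
    Diags_setMask("ti.sysbios.knl.Task+12");
}

/* Call when the frame has been processed: turn the events back off. */
void trace_window_close(void)
{
    Diags_setMask("ti.sysbios.hal.Hwi-12");
    Diags_setMask("ti.sysbios.knl.Swi-12");
    Diags_setMask("ti.sysbios.knl.Task-12");
}
```

The main logger is untouched by these calls. If the masks do not appear to change on the target, it may be worth checking that Diags.setMaskEnabled is set to true in the .cfg.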

Another approach would be to disable the logger that is being used to log sysbios events. Here's the .cfg code you can use to create a global symbol that can be used in your C code to configure the logger:

var LoggingSetup = xdc.useModule( 'ti.uia.sysbios.LoggingSetup' );

... // configure the LoggingSetup module as usual

Program.global.hSysbiosLogger = LoggingSetup.sysbiosLogger; // define a global symbol (hSysbiosLogger) for use in C code

Here's the C code you can use to disable / enable the sysbios logger:

#include <xdc/runtime/IFilterLogger.h>
#include <ti/uia/runtime/LoggerCircBuf.h>
#include <ti/sysbios/hal/Hwi.h>

/* ----------------------------------- To get globals from .cfg Header */
#include <xdc/cfg/global.h>
extern const xdc_runtime_IFilterLogger_Handle hSysbiosLogger;

...

LoggerCircBuf_disable(ti_uia_runtime_LoggerCircBuf_Handle_downCast3(hSysbiosLogger));

...

LoggerCircBuf_enable(ti_uia_runtime_LoggerCircBuf_Handle_downCast3(hSysbiosLogger));

(Note: if you are using stop-mode logging, you will need to replace LoggerCircBuf in the above with LoggerStopMode).

Hope this helps. I'll have more on the sync logs and IPC / multi-device event correlation in an upcoming post.

Regards,

Brian

0 Alex over 13 years ago in reply to BrianC

Expert 2430 points

Thanks for the info. Very helpful, especially the part about disabling the logger altogether.

Okay, so I received the notice about RC2 becoming the official release. Your UIA C6474 timer didn't seem to make it into it. (I'm still having to manually hack the UIA install.) What's up with that? Is the UIA component going to be updated independently soon?

0 BrianC over 13 years ago in reply to Alex

TI__Expert 3745 points

Hi Alex,

The ti.uia.family.c64p.TimestampC6474Timer module wasn't ready in time to meet the code freeze deadline for the last UIA release (UIA_1_00_03_25). We're planning another release in the next month or so, which will include it. In the meantime, I'll be posting the module as a zip file on the UIA Wiki as part of Tutorial 4. It will definitely be part of the UIA product release going forward, and will be in the UIA package that is installed with the next update release of CCSv5.1.

Regards,

Brian

0 BrianC over 13 years ago in reply to BrianC

TI__Expert 3745 points

Hi Miguel,

Re: Can you provide me the minimum configuration steps (Eg. .cfg file) for running the System Analyzer on the C6678 over Ethernet?

Sorry for the delay. I checked with the Low Level Driver (LLD) team that owns the various components. The official answer is that you need to pull in the LLD packages in order to use the NDK. i.e.

You technically don't have to include the PA package, but if you don't you would then have to interface the queues to the switch yourself.

The NDK/NIMU also requires application to provide resource manager and OSAL implementation. Information on RM and OSAL is available at http://processors.wiki.ti.com/index.php/BIOS_MCSDK_2.0_User_Guide#Platform_Development_Kit_.28PDK.29.

Tutorial 4C now contains a section on how to build an application using the NDK and MCSDK 2.X, which includes a project that has files that implement all of the functions required by the RM and OSAL packages. This should provide a good starting point for your application.

I hope this helps. Please let me know if you have any questions or any suggestions for improving the tutorial.

Regards,

Brian

0 Miguel Aguilar over 13 years ago in reply to BrianC

Intellectual 855 points

Hi Brian,

Thanks for your detailed information. I will take a look at your tutorial and will give you feedback if I find anything.

Regards,

Miguel

0 Miguel Aguilar over 13 years ago in reply to BrianC

Intellectual 855 points

Hi Brian,

The application that I built for the C6678 uses the Multicore Navigator, which I configured in a specific way. If I now use the System Analyzer with Ethernet enabled, I believe that I would have a conflict in the Multicore Navigator, since the resource manager file configures it as well. Can you clarify this for me?

If there is a conflict, is it possible to use JTAG mode on the C6678 with multicore correlation as well: http://processors.wiki.ti.com/index.php/System_Analyzer_Tutorial_4B ?

Thanks

Miguel

0 BrianC over 13 years ago in reply to Miguel Aguilar

TI__Expert 3745 points

Hi Miguel,

You should be able to use System Analyzer / UIA and Multicore Navigator (CPPI) at the same time. The MCSDK 2.X image processing demos provide examples of this: e.g. C:\Program Files\Texas Instruments\mcsdk_2_00_03_15\demos\image_processing\ipc\evmc6678 . If you run into any specific problems, please post them and I'll do my best to help you get things working properly.

It is possible to use JTAG to upload events on the C6678 with multicore event correlation as an alternative if you prefer. One thing that isn't well enough documented is that you need to ensure that the UIA ServiceMgr module is NOT used or configured in your application if you are using JTAG to upload events (this is discussed in http://e2e.ti.com/support/embedded/f/355/p/148109/537730.aspx#537730 ). I'll be adding a note to http://processors.wiki.ti.com/index.php/System_Analyzer_Tutorial_4B in the near future to highlight this.

Regards,

Brian
