NDK 2.21 crashes after NC_NetStop

Other Parts Discussed in Thread: SYSBIOS

I am using SYS/BIOS 6.33.5.46, NDK 2.21.0.32 and NSP 1.10.0.03 on a custom 6748 board. I am driving the NDK through its API calls myself, which means I call NC_NetStop, NC_NetStart, etc. directly. I am basing this project on a previous project that used NDK 2.20, and that worked just fine. Code generation is off, so the SYS/BIOS NDK task does not run; my own task does everything.

My problem is that when I call NC_NetStop, I get an error from SYS/BIOS after my custom netstop routine returns but before NC_NetStart returns. So far I get one of two errors:

[C674X_0] ti.sysbios.heaps.HeapMem: line 309: assertion failure: A_invalidFree: Invalid free
xdc.runtime.Error.raise: terminating execution

and

[C674X_0] ti.sysbios.family.c64p.Hwi: line 234: E_handleNotFound: Hwi handle not found: 0xc1db64b8
xdc.runtime.Error.raise: terminating execution

I do not know what the NDK is trying to free, or what it cannot find the handle for, and I have no idea which HWI could be triggering that error.

Does anyone have any idea what causes these errors? Am I blowing my system stack? If so, how can I find that in the SYS/BIOS ROV? I remember that in DSP/BIOS it was under krnl, but I can't find anything like that in the SYS/BIOS ROV view. I know I am not blowing the heap.
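On the stack question: besides ROV, a common way to measure stack headroom is a "high-water mark". Fill the stack with a known byte before it is used, then later scan for how much of the pattern survived. The sketch below is a host-side model only; the fill byte 0xBE and the function names are assumptions (I believe SYS/BIOS can pre-fill task and Hwi stacks itself when its stack-initialization options are enabled, but check your configuration for the actual pattern).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed fill pattern; check what your kernel configuration actually uses. */
#define STACK_FILL 0xBEu

/* Fill a stack region with the pattern before the task or ISR ever runs on it. */
void stack_fill(uint8_t *base, size_t size)
{
    assert(base != NULL);
    memset(base, STACK_FILL, size);
}

/* C6000 stacks grow downward, from the high end toward base.
 * Scan upward from the low end: every byte still equal to the pattern was
 * never touched. Returns the number of bytes that were ever used. */
size_t stack_high_water(const uint8_t *base, size_t size)
{
    size_t untouched = 0;

    assert(base != NULL);
    while (untouched < size && base[untouched] == STACK_FILL) {
        untouched++;
    }
    return size - untouched;
}
```

If the returned value approaches the stack size, the stack came close to overflowing at some point, even if it looks fine in a snapshot taken later.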

Any ideas?

  • Mike,

    Can you try setting the following in your *.cfg file?

    var Task = xdc.useModule('ti.sysbios.knl.Task');
    Task.deleteTerminatedTasks = true;

    I suspect it may be due to the problem mentioned in the User's Guide section "5.2.2 TaskCreate(), TaskExit(), and TaskDestroy()".

    Also, to check the Hwi stack, have a look at the Module tab of the Hwi view in ROV.

    Steve

  • I have tasks enabled with that check box checked. Do I still need to put in

    Task.deleteTerminatedTasks = true;

    or should I be good since the checkbox is checked?

    Thanks for showing me the stack location. It appears I am not blowing the stack.

  • I am now also getting this exception after the project runs for a little while and drops into idle for the first time. It doesn't happen every time, just sometimes:

    [C674X_0] 3=0x11832528
    B24=0xf1 B25=0xf
    B26=0xee B27=0xcb
    B28=0x9d B29=0x71
    B30=0xc1ad6074 B31=0xc1dbdfa0
    NTSR=0x1000e
    ITSR=0xf
    IRP=0xc1dc8160
    SSR=0x0
    AMR=0x0
    RILC=0x0
    ILC=0x0
    Exception at 0xc1db6488
    EFR=0x2 NRP=0xc1db6488
    Internal exception: IERR=0x10
    Resource conflict exception
    ti.sysbios.family.c64p.Exception: line 248: E_exceptionMin: pc = 0xc1dc8160, sp = 0xc1ae14b0.
    To see more exception detail, use ROV or set 'ti.sysbios.family.c64p.Exception.enablePrint = true;'
    xdc.runtime.Error.raise: terminating execution

    So I have a conflict of some sort. Out of all that data, how do I find out what my conflict is?
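    The registers that identify the cause in a dump like this are EFR, IERR and NRP. Below is a small decoder sketch, assuming the IERR bit layout from the C64x+/C674x CPU reference (bit 0 IFX through bit 8 MSX) and that EFR bit 1 (IXF) flags an internal exception; verify both against the CPU guide for your device. With the values above, EFR=0x2 and IERR=0x10 decode to a resource conflict (RCX), which matches the "Resource conflict exception" line, and NRP=0xc1db6488 is the address to look up in the .map file or disassembly. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed IERR (internal exception report register) bit layout, bits 0..8. */
static const char *const ierr_names[] = {
    "IFX: instruction fetch exception",
    "FPX: fetch packet exception",
    "EPX: execute packet exception",
    "OPX: opcode exception",
    "RCX: resource conflict exception",
    "RAX: resource access exception",
    "PRX: privilege exception",
    "LBX: loop buffer exception",
    "MSX: missed stall exception",
};

#define IERR_NBITS (sizeof ierr_names / sizeof ierr_names[0])

/* Store the names of the IERR bits set in ierr into out[] (at most max
 * entries) and return how many were stored. */
size_t decode_ierr(uint32_t ierr, const char *out[], size_t max)
{
    size_t n = 0;

    assert(out != NULL || max == 0);
    for (size_t bit = 0; bit < IERR_NBITS && n < max; bit++) {
        if (ierr & (1u << bit)) {
            out[n++] = ierr_names[bit];
        }
    }
    return n;
}

/* Assumed: EFR bit 1 (IXF) set means an internal exception, so IERR holds the cause. */
int efr_is_internal(uint32_t efr)
{
    return (int)((efr >> 1) & 1u);
}
```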

  • Mike,

    Please see the SYS/BIOS FAQ for how to resolve this:

    http://processors.wiki.ti.com/index.php/SYS/BIOS_FAQs#4_Exception_Dump_Decoding_Using_the_CCS_Register_View

    Steve

  • Alright, I will use that next time I get an exception.

    I just got this error now:

    [C674X_0] ti.sysbios.knl.Task: line 736: assertion failure: A_badTaskState: Can't delete a task in RUNNING state.
    xdc.runtime.Error.raise: terminating execution

    We have no tasks that exit. We literally have a "while(1)" in every task we have, so this must be the NDK trying to shut down a task.

    I have now run into four individual errors using this setup, all of which seem to happen randomly, even though I run the project exactly the same way each time. Is it possible that I have something wrong with my NDK or BIOS files? Is there something horribly wrong with my .cfg file?

  • So I got an exception (several, actually) and debugged a little.

    Several of the exceptions occur in memcpy, but the stack pointer gave no information about what called memcpy, so I was kind of stuck on that one.

    This time I got an exception at ti_sysbios_hal_Hwi_checkStack() at Hwi_stack.c:119 (0xC1DC4D34), with a return address of 0x11832528 (which is kind of in the middle of la-la land). When I open ROV and check the stack in Hwi, it looks fine, and so do the heap and all my tasks. As far as the tasks were concerned, it appears I was in idle.

    I must have something seriously wrong with my BIOS .cfg. I am going to attach it so somebody can look at it, because I have no idea what could be wrong.

    app.cfg
  • Mike,

    I took a look at your *.cfg file.  I see that you're redefining the Ethernet driver's transmit and receive functions:

    var hwi5Params = new Hwi.Params();
    hwi5Params.instance.name = "HWI_EMAC_Tx";
    hwi5Params.eventId = 32;
    Program.global.HWI_EMAC_Tx = Hwi.create(10, "&HwTxInt", hwi5Params);

    var hwi6Params = new Hwi.Params();
    hwi6Params.instance.name = "HWI_EMAC_Rx";
    hwi6Params.eventId = 31;
    Program.global.HWI_EMAC_Rx = Hwi.create(9, "&HwRxInt", hwi6Params);

    Is there a reason you're doing this?  Note that the driver code does this for you automatically at run time (mapping HWI vectors 5 and 6 to Rx and Tx).  You may be running into a conflict because of this.
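    One way to catch this kind of double registration is to list every "event X goes to vector Y" hookup, from both the static .cfg and the driver's run-time code, and check them for overlaps. The checker below is a hypothetical host-side sketch; HwiRequest and find_hwi_conflict are illustrative names, not NDK or SYS/BIOS APIs. The usage uses the events and vectors from the .cfg above plus the driver's run-time Rx hookup on vector 5, where the driver's event number is assumed to be the same EMAC Rx event (31).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one "plug event E into CPU vector V" request,
 * whether it comes from the static .cfg or from driver code at run time. */
typedef struct {
    const char *owner;    /* who makes the request, for reporting */
    unsigned    vector;   /* C6000 maskable vectors 4..15 */
    unsigned    event_id; /* interrupt controller event number */
} HwiRequest;

/* Returns 1 if two requests claim the same vector or the same event, and
 * stores the indices of the first clashing pair in *a and *b; returns 0
 * if every hookup is unique. */
int find_hwi_conflict(const HwiRequest *req, size_t n, size_t *a, size_t *b)
{
    for (size_t i = 0; i < n; i++) {
        assert(req[i].vector >= 4 && req[i].vector <= 15);
        for (size_t j = i + 1; j < n; j++) {
            if (req[i].vector == req[j].vector ||
                req[i].event_id == req[j].event_id) {
                *a = i;
                *b = j;
                return 1;
            }
        }
    }
    return 0;
}
```

    With the two .cfg entries alone there is no clash; adding the driver's own Rx hookup makes the same event appear on two vectors, which is the situation Steve describes.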

    Steve

  • I changed the ethdriver.c file (and the #defines of the interrupts) so they don't get initialized there. We do this because we do not want the Ethernet HWI to be above our audio interrupts. I rebuilt the EMAC libraries and restarted CCS completely.

    I do, however, think that I am being tripped up by the HWI restore function somewhere, because I keep getting the bad HWI error, and when I look, HWI 15 (one I don't use) is posted. When I create a function that does nothing, tie it to HWI 15 in the HWI module, don't enable it, and don't tie it to any system event (basically just a routine to give BIOS a handle), I don't have a problem with the NDK rebooting. So when the NDK reboots, for some reason it posts to HWI 15, and my system has nothing to handle that. Still looking into that...
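    The stub-handler workaround can be modeled as a per-vector dispatch table in which an empty slot is what produces a "handle not found" style error. This is a hypothetical host-side sketch (isr_table, dispatch and spurious_isr are illustrative names, not SYS/BIOS internals). Having the stub count its hits lets you confirm whether vector 15 really fires around NC_NetStop, though it only hides the stray post rather than explaining where it comes from.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*IsrFn)(unsigned vector);

/* Hypothetical per-vector dispatch table; NULL means nothing was created
 * for that vector. */
static IsrFn isr_table[16];
static unsigned spurious_count;

/* Returns 0 if a handler ran, -1 if the vector had no handler
 * (where a real kernel would raise an error instead). */
int dispatch(unsigned vector)
{
    assert(vector < 16);
    if (isr_table[vector] == NULL) {
        return -1;
    }
    isr_table[vector](vector);
    return 0;
}

/* Do-nothing handler that only records that the vector fired. */
void spurious_isr(unsigned vector)
{
    (void)vector;
    spurious_count++;
}
```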

  • OK, so now I can't get past rebooting the NDK even with HWI 15 defined. This thing is so inconsistent; it is quickly becoming impossible to debug.

  • Some progress. I guess you NEED to run the interrupt init so the NDK knows which interrupts it is using (I guess it initializes them again when you call NC_NetStop; 2.20 didn't give me this issue when I commented out the init routine). You can't just comment out that code and set up the HWIs yourself. I disabled the EMAC in SYS/BIOS, brought in the .c and .h files from the NSP, altered ethdriver.c to point the interrupts at the vectors I wanted, kept the HWI init, and now I don't get the HWI error.

    But I've had this false hope before, so hopefully this will stick. Now I have to try to get some of the other errors to happen so I can debug them too... unless they were all related to this.

  • Mike,

    I guess I don't see why you were hacking on the driver like this. As you've discovered, the driver is meant to run in a certain way, and changing the order of things could lead to problems. I'd highly recommend leaving the driver code as is, as this could be the cause of the problems you've been experiencing.

    Anyway, please keep the updates coming and I'll chime in when I can.

    Steve

  • Actually, I took the driver out of the project via SYS/BIOS, and I am using just the C files for now.

    We had to change the drivers because:

    We are using the higher-priority interrupts for audio. Our product is a phone, and if we placed the audio interrupts lower than the EMAC, the EMAC interrupts would create pops and clicks in our audio, so I had to move the interrupts to lower slots.

    We want to apply a bitmask to the interrupts in the dispatcher, so we wanted to take them out of the initialization. We wanted direct control over the creation of all our interrupts, so I commented out the HWI init routine, as I had done for NDK 1.94 and for NDK/NSP 2.20. This is the first NDK version that has given me any issues.

    Lastly, the "hwi_disable" calls the NDK makes in previous versions seemed to disable all interrupts, not just the ones the EMAC uses, so I wanted to replace the function the macro calls with a routine that disables only the Ethernet HWIs, since we don't want the NDK turning off interrupts and, again, messing up our audio.
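    The EMAC-only disable in the last point amounts to saving, clearing and later restoring just the EMAC bits of the interrupt enable register. This is a host-side model: `ier` is a plain variable standing in for the C6000 IER, the vector numbers 9 and 10 come from the .cfg quoted earlier in the thread, and the function names are hypothetical. On the target I would use the SYS/BIOS Hwi module's per-interrupt calls (Hwi_disableInterrupt()/Hwi_restoreInterrupt()) rather than writing IER directly, because an unprotected read-modify-write of IER can race with code that changes it. Also note that an EMAC-only disable is only a drop-in replacement if the NDK's critical sections protect data shared with the EMAC ISRs and nothing else.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-in for the C6000 IER register (bit n enables vector n). */
static volatile uint32_t ier;

/* Hypothetical vector assignments, taken from the .cfg quoted in this thread. */
#define EMAC_RX_VECTOR 9u
#define EMAC_TX_VECTOR 10u
#define EMAC_MASK ((1u << EMAC_RX_VECTOR) | (1u << EMAC_TX_VECTOR))

/* Disable only the EMAC vectors; return their previous enable bits as the key. */
uint32_t emac_irq_disable(void)
{
    uint32_t key = ier & EMAC_MASK;

    ier &= ~EMAC_MASK;
    return key;
}

/* Restore exactly the EMAC bits saved in key, leaving all other vectors alone. */
void emac_irq_restore(uint32_t key)
{
    assert((key & ~EMAC_MASK) == 0);
    ier = (ier & ~EMAC_MASK) | key;
}
```

    Because the key only carries the EMAC bits, nested disable/restore pairs unwind correctly, and audio vectors are never touched.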

    I am actually really surprised that, in an embedded space, TI did not expect people to modify this and offer these as options in the EMAC module. Why would an embedded programmer not want to change interrupt slots, interrupt masking, HWI disable routines, and the links/routes/hooks for EMAC init, EMAC get config and EMAC link status? These seem like really obvious options to give a user, and it seems really lazy that they were not included in the module.

    I'll keep you up to date if I have any more issues; hopefully this solves it.

  • cobsonchael said:

    We are using the higher-priority interrupts for audio. Our product is a phone, and if we placed the audio interrupts lower than the EMAC, the EMAC interrupts would create pops and clicks in our audio, so I had to move the interrupts to lower slots.

    Since you state that moving the interrupts helped remove the clicks and pops, I assume that you're masking the ethernet interrupt during the audio ISR (i.e., the audio Hwi object uses a bitmask that contains the ethernet interrupt bit). As a point of clarification: if you are *not* masking the ethernet interrupt, then, because SYS/BIOS's Hwi dispatcher enables nested interrupts, placing the audio interrupt at a higher priority vector essentially causes the ethernet ISR to be serviced *before* the audio ISR whenever both interrupts are pending at the moment the CPU decides which one to service (i.e., both fire in the same cycle, or both fire while interrupts are globally disabled). The CPU takes the audio interrupt first, but as soon as the dispatcher re-enables global interrupts, the ethernet interrupt preempts it and runs to completion. In the same simultaneous-interrupt situation, if you *are* masking the ethernet interrupt for the audio ISR, placing the audio interrupt at a higher priority vector lets its ISR be reached sooner: if audio were the lower priority, the ethernet interrupt would be dispatched first and then preempted by the audio interrupt as soon as the Hwi dispatcher enabled global interrupts, before the ethernet ISR was even called.

    I write this to point out some common confusion with "interrupt priority" (not that I think you're confused, but I just want to make sure it is understood) - there's physical (or hardware) priority inherent in the CPU vs. logical priority that is achieved through interrupt bitmasking.  The logical priority is much more important than the physical priority, since without the logical priority the physical priority does nothing except decide which of two or more simultaneous interrupts to process (which is the exception, IMO, since most interrupts happen on their own without others firing at the same time).  Plus, if nested interrupts are enabled (which they are with the Hwi dispatcher) and interrupt masking is not done correctly, as I mentioned above the lower physical priority interrupt's ISR will get serviced (and run to completion) *before* the higher physical priority interrupt's ISR.  Only masking prevents preemption, since *any* interrupt (including lower physical priority ones) can preempt *any* ISR if global interrupts are enabled.
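    The difference between physical and logical priority can be reduced to two bit operations. The sketch below is a host-side model with hypothetical names: it assumes C6000 behavior where, among pending enabled maskable vectors (4..15), the lowest number is taken first, and a nesting dispatcher that clears the running ISR's disable mask from IER before re-enabling global interrupts. Vector 4 for audio and 9 for Ethernet are illustrative. In the .cfg, I believe the c64p Hwi instance parameters maskSetting (e.g. Hwi.MaskingOption_BITMASK) together with disableMask/restoreMask are what select this mask; check the Hwi module documentation for your SYS/BIOS version.

```c
#include <assert.h>
#include <stdint.h>

/* Physical priority: among pending, enabled maskable vectors (4..15),
 * the lowest-numbered one is taken first. Returns -1 if none is ready. */
int next_vector(uint32_t pending, uint32_t ier)
{
    uint32_t ready = pending & ier & 0xFFF0u;

    for (int v = 4; v <= 15; v++) {
        if (ready & (1u << v)) {
            return v;
        }
    }
    return -1;
}

/* Logical priority: while an ISR runs under a nesting dispatcher, IER has
 * the ISR's disable mask cleared and global interrupts are on again, so any
 * pending vector whose IER bit is still set can preempt, whatever its number. */
int can_preempt(uint32_t ier, uint32_t disable_mask, int pending_vector)
{
    uint32_t ier_during_isr = ier & ~disable_mask;

    assert(pending_vector >= 4 && pending_vector <= 15);
    return (int)((ier_during_isr >> pending_vector) & 1u);
}
```

    For example, with audio on vector 4 and Ethernet on vector 9 both enabled, next_vector() picks audio when both are pending, yet can_preempt() still reports that Ethernet can interrupt the audio ISR unless bit 9 is in the audio ISR's disable mask.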

    Regards,

    - Rob

  • Mike,

    Ok, I discussed your issue with Rob and I see the reason for taking the approach you have now, thanks for clarifying.

    Regarding code modifications, I am certainly not surprised that customers would modify the code; in fact, that is exactly why we ship the code in the product. However, it is reference code, and I typically lean towards an "if it ain't broke ..." methodology unless there really is a good reason, because once you start modifying code like that you are heading into somewhat uncharted (untested) territory. Obviously, given your application's audio degradation issues, this constituted a good reason to modify it, but my understanding of HWI priorities was slightly off, so your explanation of changing the vector IDs around originally didn't make sense to me.

    Again, keep me posted on where you're at and I'll help you get these issues resolved.

    Steve