PDK6678 NIMU

Sergey Vasilev

I have some questions on NIMU-driver for PDK6678.

1. The driver's functionality is not consistent with TI's recommendations for developing NIMU drivers. Link-control functions for the PHY and SGMII are not implemented, and multiple interfaces are not supported. Compared with the drivers for the 6455, the 6678 driver looks like an early demo.

Will the driver be developed further to add this functionality?

2. The hardware initialization sequence implemented in the driver and demo application does not comply with the documentation. SGMII is initialized first, even before main is called. Then QMSS, CPPI and PA are started, but the FDQs, flows, etc. are not configured. This sequence works only after a SoC reset and does not work when the application is reloaded or restarted.
Deinitialization is accordingly implemented incorrectly too: not all resources are released. When I try to stop the driver and then run it again, problems with queues and descriptors arise.

Will these bugs be fixed in the driver, or is the problem considered not relevant, given that there have been no updates since May 2012?

3. I tried to implement code to restart the driver, using the IBL code as a basis: reset the PA, tear down PKTDMA, release the queues, and zero the QMSS memory region registers. If I apply this fixup before initializing the hardware, the driver runs with no error messages, but packets placed in queue 648 do not reach the network. Incoming packets are not registered either.

Is it possible to restart the network-application without reset of the SoC?

How can I debug the cause of the "silence" in my example?

over 12 years ago

0 Ivan Pang over 12 years ago

TI__Intellectual 2220 points

Hi Sergey,

Link-control functions for the PHY and SGMII are not implemented, and multiple interfaces are not supported.

What further functions are you looking for exactly? What interfaces do you plan to use?

2. The hardware initialization sequence implemented in the driver and demo application does not comply with the documentation. SGMII is initialized first, even before main is called. Then QMSS, CPPI and PA are started, but the FDQs, flows, etc. are not configured. This sequence works only after a SoC reset and does not work when the application is reloaded or restarted.
Deinitialization is accordingly implemented incorrectly too: not all resources are released. When I try to stop the driver and then run it again, problems with queues and descriptors arise.

Will these bugs be fixed in the driver, or is the problem considered not relevant, given that there have been no updates since May 2012?

The examples and demos we have are designed with a System Reset between runs.

3. I tried to implement code to restart the driver, using the IBL code as a basis: reset the PA, tear down PKTDMA, release the queues, and zero the QMSS memory region registers. If I apply this fixup before initializing the hardware, the driver runs with no error messages, but packets placed in queue 648 do not reach the network. Incoming packets are not registered either.

Can you explain how you are "restarting the driver"?

Is it possible to restart the network-application without reset of the SoC?

This would require a number of changes. There is currently no effort to implement this, but it should be doable with proper resets. Consider these at the start of your application: 1) cycling the clock+power domains of your peripherals - this will be similar to what is done on a SOC reset, and 2) reset the queue manager by clearing the memory regions and pushing a NULL pointer on each queue (this should destroy the descriptors of each queue).

How can I debug the cause of the "silence" in my example?

I'll have to ask you what you mean by silence. What do you see, not see, or expect to see?

-Ivan

0 Sergey Vasilev over 12 years ago in reply to Ivan Pang

Intellectual 280 points

Hi Ivan.

What further functions are you looking for exactly? What interfaces do you plan to use?

Support for two or more NDK interfaces (e.g. one per SGMII port), monitoring of the SGMII and copper link state, support for NDK IOCTL calls, and so on.

The examples and demos we have are designed with a System Reset between runs.

I want to know whether the NIMU driver will remain in the "examples and demos" status, or move to the status of system software, such as SYS/BIOS?

Can you explain how you are "restarting the driver"?

In the Helloworld application, I added a Stop function, which is called at the beginning of the stack task.
The Stop function does the following (this sequence is taken from the IBL code):

- reset PA

- tear down PKTDMA

- release queues by pushing 0 to each queue

- zero the QMSS memory region registers
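The last two steps of this sequence could be sketched as below. This is only an illustration under stated assumptions: the two struct layouts (four 32-bit registers per queue, with register D last, and three setup registers per memory region) and the function name qm_soft_reset are hypothetical, and the base pointers and counts must come from the device documentation. It relies on the behaviour described in this thread, that pushing 0 to a queue empties it.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical per-queue register block: writing 0 to reg_d is
 * assumed to empty the queue, as described in the steps above. */
typedef struct {
    volatile uint32_t reg_a;
    volatile uint32_t reg_b;
    volatile uint32_t reg_c;
    volatile uint32_t reg_d;
} qm_queue_regs_t;

/* Hypothetical per-region setup block for a descriptor memory region. */
typedef struct {
    volatile uint32_t base_addr;
    volatile uint32_t start_index;
    volatile uint32_t desc_setup;
    volatile uint32_t reserved;
} qm_region_regs_t;

/* Empty every queue, then clear every memory region setup.
 * Queues go first so that no queue still holds descriptors from a
 * region whose setup has already been erased. */
static void qm_soft_reset(qm_queue_regs_t *queues, size_t num_queues,
                          qm_region_regs_t *regions, size_t num_regions)
{
    size_t i;

    for (i = 0; i < num_queues; i++) {
        queues[i].reg_d = 0;
    }
    for (i = 0; i < num_regions; i++) {
        regions[i].base_addr = 0;
        regions[i].start_index = 0;
        regions[i].desc_setup = 0;
    }
}
```

Passing the base pointers in as arguments keeps the device addresses out of the sketch, so the same routine can also be exercised against ordinary memory.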

I verified the application with the ping utility. The application is run from CCS 5.3.
The first time, everything starts and runs normally.
I stopped the application (I tried two ways: simply halting it in CCS, or setting the terminate variable, which leads to a legitimate end of the demo application) and tried to reload it.

The second time, the application runs without error messages, but the interrupt handler is never invoked (no incoming packets), and none of the outgoing packets placed in queue 648 by the Send function appear on the network. Meanwhile, the SGMII and copper links are up and the GbE switch lookup table is updated, i.e. activity from the external network reaches at least the switch. This behavior is what I call "silence".

Please explain what this means:

cycling the clock+power domains of your peripherals - this will be similar to what is done on a SOC reset

According to the TI documentation and my experimental results, the 6678 power domains, once turned on, cannot be turned off.

A local reset of the PA power modules has no visible impact on the application's behavior.

My ultimate goal is to write a boot loader that can load applications via TFTP, where the applications in turn work with the network. Now I see the problem is how to restart the network without losing data in the DDR3.

0 Sergey Vasilev over 12 years ago in reply to Sergey Vasilev

Intellectual 280 points

Hi Ivan.

The problem is still relevant. Do you have any ideas?

0 Ivan Pang over 12 years ago in reply to Sergey Vasilev

TI__Intellectual 2220 points

Hi Sergey,

I apologize for a delayed response due to other priorities.

Support for two or more NDK interfaces (e.g. one per SGMII port), monitoring of the SGMII and copper link state, support for NDK IOCTL calls, and so on.

I want to know whether the NIMU driver will remain in the "examples and demos" status, or move to the status of system software, such as SYS/BIOS?

There are currently no plans for further development on NIMU other than bug fixes. It will remain as a part of PDK and not as a separate, standalone package. However, we may provide application examples for 2-SGMII-port support or PHY monitoring. There have been a number of requests and discussions initiated, but no tentative target date yet.

I stopped the application (I tried two ways: simply halting it in CCS, or setting the terminate variable, which leads to a legitimate end of the demo application) and tried to reload it.

Just to confirm, your first run works, but on your second run, you do not see any network activities? And there are no error messages? Did adding your "Stop" function at the beginning change anything?

cycling the clock+power domains of your peripherals - this will be similar to what is done on a SOC reset

I will need to double check on this - if it even has any effect. Power domains can be toggled for peripherals, but the navigator might be on the always-on domain.

Now I see the problem is how to restart the network without losing data in the DDR3.

Can you confirm that your DDR3 data is not lost on a program reload? This will help us isolate the problem to just restarting the network without a SOC reset.

I can see the problem on the network restart needing a SOC reset, but I do not have a definite solution yet. I will see if I can loop in colleagues who are more expert on this area.

-Ivan

0 Sergey Vasilev over 12 years ago in reply to Ivan Pang

Intellectual 280 points

Hi Ivan.

Just to confirm, your first run works, but on your second run, you do not see any network activities? And there are no error messages? Did adding your "Stop" function at the beginning change anything?

The first run works well. On the second run, I don't see any network activity and there are no error messages (all messages look the same as on the first run).

Without my Stop function I see a lot of error messages and messages from the QM verification.

Can you confirm that your DDR3 data is not lost on a program reload? This will help us isolate the problem to just restarting the network without a SOC reset.

Yes, DDR3 data is not lost. I just stop and restart the application. I also tried stop, reload, and start. In both cases I do not use the CPU reset or SoC reset operations.

I tried to debug the situation. Packets put into the appropriate PA queues are returned as expected, i.e. it looks as if QMSS and CPPI work. I figured out how to check the passage of a packet through the PA and GbESW to the PHY.

0 Ivan Pang over 12 years ago in reply to Sergey Vasilev

TI__Intellectual 2220 points

Sergey,

Unfortunately, I still don't have a perfect answer for this and I will raise this question to my colleagues. I did a bit of testing, and I believe there may be some Packet DMA (CPPI) or PA routing error on a re-run or reload. It looks like out-going packets are still being transmitted, but packets are not being received (they are not hitting the high priority queue that we set up our interrupt to check for).

I will update again when I have more information.

-Ivan

0 Ivan Pang over 12 years ago in reply to Sergey Vasilev

TI__Intellectual 2220 points

Sergey,

I believe there may be one more problem with the packet DMA. On a reload or restart, we are re-configuring the CPPI flow with some pre-existing settings, and I'm afraid that this may cause the "silence" as you described.

If you are running one of our existing demos/examples using the NDK stack, you can try one thing: telnet to the EVM and issue the "shutdown" command. This will call the EmacStop command to free up some memory, close queues, and close the opened CPPI channels. You should be able to restart the application without a system reset at this point.

I'll update again when I have a better answer. So far, my attempts to do a software reset at the beginning of an application have failed, since there's no global packet DMA reset and I lose all the channel handles from the previous run.

-Ivan

0 Sergey Vasilev over 12 years ago in reply to Ivan Pang

Intellectual 280 points

Hi, Ivan.

A little more information.

About EmacStop:

I tried a graceful stop via the demo's "terminate" variable. In this case EmacStop is called. But as I wrote above, EmacStop releases only part of the resources taken in EmacStart. In addition, resources acquired by the resource manager are not freed at all. So the EmacStop call does not give the desired effect.

About CPPI:

1. In my Stop function I tear down all CPPI channels and clear all 32 flows. So at the application start point, the CPPI registers look as they do after reset.

2. Packets put into the appropriate PA TX queue (648) are returned to the PA return queue as expected. So I think at least the TX CPPI channels work correctly. But the data does not appear on the network.

And some statistics.
I tried connecting two modules directly to each other. In this case there are no "foreign" packets from the network. With this configuration, all packets pass for the first one or two restarts, but then traffic stops again for good until the processor is reset.

When connected to the network through a hub, the silence begins at the first restart.
As a hypothesis: could this be an effect of a packet getting "stuck" at the moment of the on/off transition?

0 Ivan Pang over 12 years ago in reply to Sergey Vasilev

TI__Intellectual 2220 points

Hi Sergey,

I noticed this thread hasn't gained much traction and I wanted to sync up on any progress made.

EmacStop releases only part of the resources taken in EmacStart. In addition, resources acquired by the resource manager are not freed at all. So the EmacStop call does not give the desired effect.

Does this apply to your application only, or do you see this on the TI-provided demos and examples? On my end, it looks like the EmacStop function correctly releases the resources used and allows me to re-run without a SoC reset.

In my Stop function I tear down all CPPI channels and clear all 32 flows. So at the application start point, the CPPI registers look as they do after reset.

Can you explain how you are tearing down the CPPI channels? If you're using Cppi_channelClose, let me know how you are passing the CPPI handles from your previous run to your current run.

As a hypothesis: could this be an effect of a packet getting "stuck" at the moment of the on/off transition?

I would say unlikely. The failure is too consistent and this would assume that the SoC line rates are slow enough to get caught.

I took the approach of debugging this on the PA example projects (no NDK, no NIMU). You can see the same PA timeout errors on restart. However, the restart will work if you end the previous run by clearing the queues and closing the CPPI channels. This continues to lead me to believe that the SoC reset requirement is due to open DMA channels or pre-populated queues. This is just another approach if you wish to continue debugging, and why I feel that ending your previous run gracefully with EmacStop should work. Let me know your thoughts and feedback.

-Ivan

0 Sergey Vasilev over 12 years ago in reply to Ivan Pang

Intellectual 280 points

Hi Ivan.

In the TI-provided demos, resources are acquired in several stages.

The first stage is the resource manager function calls from the StackThread function. At this stage QMSS, CPPI and FDQ initialization is performed.
The second stage is the EMAC_init code. At this stage PA, SW, CPPI flow and QMSS DQ initialization is performed.
So, EmacStop does not clear gTxCmdFreeQHnd, gTxCmdReturnQHnd, the RX flow, the QMSS accumulator, the QMSS memory regions, the event combiner, or the PA. It also calls res_mgr_stop_qmss() before res_mgr_stop_cppi(). In addition, Cppi_channelClose is used instead of Cppi_channelTeardown (teardown is the recommendation in the TI documentation), and the PA still pushes packets to the host when CPPI stops.

Can you explain how you are tearing down the CPPI channels? If you're using Cppi_channelClose, let me know how you are passing the CPPI handles from your previous run to your current run.

I am using CSL functions, not the QMSS LLD, because on the second run my Stop function does not yet have any CPPI channel instances.

As I wrote before, I tried using EmacStop and it did not solve the problem.
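For readers following along, a register-level channel teardown without LLD handles might look roughly like the sketch below. It is not the code used in this thread. The bit positions (enable in bit 31, teardown request in bit 30 of the channel's global configuration register), the assumption that hardware clears the enable bit once teardown completes, and the name channel_teardown are all assumptions to be checked against the Multicore Navigator documentation. The poll is bounded so that a channel that never finishes cannot hang the caller.

```c
#include <stdint.h>

#define CH_ENABLE   (1u << 31)  /* assumed position of the channel enable bit */
#define CH_TEARDOWN (1u << 30)  /* assumed position of the teardown request bit */

/* Request teardown of one channel and wait for it to report disabled.
 * Returns 0 when the channel is disabled, -1 if it is still enabled
 * after max_polls reads of the register. */
static int channel_teardown(volatile uint32_t *global_cfg, uint32_t max_polls)
{
    uint32_t n;

    if ((*global_cfg & CH_ENABLE) == 0) {
        return 0;                   /* channel already idle, nothing to do */
    }

    *global_cfg |= CH_TEARDOWN;     /* keep enable set, add the teardown request */

    for (n = 0; n < max_polls; n++) {
        if ((*global_cfg & CH_ENABLE) == 0) {
            return 0;               /* hardware cleared enable: teardown done */
        }
    }
    return -1;                      /* gave up: channel never went idle */
}
```

A caller would run this over every TX and RX channel and treat a -1 result as a reason to fall back to a full device reset.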

0 Ivan Pang over 12 years ago in reply to Sergey Vasilev

TI__Intellectual 2220 points

Sergey,

Simply calling EmacStop from your application will not exit the NDK stack, and therefore you will still see packets trying to be pushed/received. In my case, I executed EmacStop through the shutdown telnet command, which in reality does more than what EmacStop does.

Assuming you want to reload or restart your application (in any arbitrary point in your run), I see two ways you may want to explore:

1. Using the RSTCTRL register. Writing 0x05A69 twice to RSTCTRL (0x023100E8) will reset the device and reenter the bootrom, running the bootmode you have it set to. This may be applicable in your case since you are using a bootloader to load applications via tftp.

2. Modify your application(s) to reset resources needed to restart the NDK stack. I did some experiments on this, and in reality you only need to disable interrupts, clear any interrupt flags from QM, clear the queue used by the accumulator, and power cycle the NetCP. (The last bit, power cycling the NetCP, may not be necessary. I have seen some issues with the PA, and this seemed to help). I attached the client example I experimented with that should be able to restart/reload without a need for SoC reset.

If you are going with #2, let me know if you run into any memory leaks/errors. It also does not clear all resources - you may still see resource contention between different applications.
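Option 1 above could be sketched as follows. The address and the value are taken as quoted in this post; the function name request_device_reset is made up for illustration, and the behaviour after the second write (device reset and re-entry into the boot ROM) is as described above, not something this sketch verifies.

```c
#include <stdint.h>

#define RSTCTRL_ADDR  0x023100E8u  /* RSTCTRL address as quoted in this thread */
#define RSTCTRL_VALUE 0x5A69u      /* value as quoted in this thread (0x05A69) */

/* Write the quoted value to RSTCTRL twice, as suggested in option 1.
 * The register is passed in as a pointer so the routine can also be
 * exercised against an ordinary variable off-target. */
static void request_device_reset(volatile uint32_t *rstctrl)
{
    *rstctrl = RSTCTRL_VALUE;   /* first write */
    *rstctrl = RSTCTRL_VALUE;   /* second write: expected to trigger the reset */
}
```

On the target the call would be request_device_reset((volatile uint32_t *)RSTCTRL_ADDR), and control is not expected to return to the caller.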

Again, I apologize I wasn't able to jump back on this issue sooner due to other priorities. Let me know if this helps.

-Ivan

0246.client_6678_restart_test.zip

0 Harsha J K over 12 years ago in reply to Ivan Pang

Expert 1010 points

Hi Ivan,

Thanks for the project, it helped me fix my issue. See the link below.

http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/261743.aspx

Warm Regards,

0 Ivan Pang over 12 years ago in reply to Harsha J K

TI__Intellectual 2220 points

Harsha,

Good to hear. Let me know if you encounter any related errors. I did not do any extensive testing and fear that my implementation may have memory leaks (from not freeing buffers queued) or resources blocked for other operations.

-Ivan

0 Sergey Vasilev over 12 years ago in reply to Ivan Pang

Intellectual 280 points

Hi Ivan.

I looked at your program, compared it with mine, and tried to run it.

1. The biggest difference is in EVM_INIT. You added PA stop and power domain off code. But the power domain stop code may loop forever, because sometimes the transition from the "on" state to the "off" state never ends. I ran the program on three EVMs, and several times it hung in EVM_init.

2. The shutdown telnet command simply calls NC_Stop, so I do the same. It did not help.

3. I tried using a DSP software restart (RSTCTRL), but it led to new problems with DSP initialization. I opened a new thread, http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/265993.aspx; maybe you can help me.
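Regarding the hang in point 1, one way to keep a stuck power domain transition from blocking startup is to bound the wait. The sketch below is not taken from the attached project. It assumes the usual PSC pattern of writing a per-domain GO bit to a command register (PTCMD) and polling the matching bit in a status register (PTSTAT) until it clears, and that the caller has already programmed the desired next state. The type psc_regs_t and the function psc_go_and_wait are hypothetical names.

```c
#include <stdint.h>

/* Pointers to the two PSC registers used here; the caller supplies the
 * real addresses from the device documentation. */
typedef struct {
    volatile uint32_t *ptcmd;   /* transition command register (assumed: PTCMD) */
    volatile uint32_t *ptstat;  /* transition status register (assumed: PTSTAT) */
} psc_regs_t;

/* Start the transition for one power domain and wait for it to finish.
 * Returns 0 when the status bit clears, -1 on an invalid domain number
 * or if the transition is still in progress after max_polls reads. */
static int psc_go_and_wait(const psc_regs_t *psc, unsigned domain,
                           uint32_t max_polls)
{
    uint32_t mask;
    uint32_t n;

    if (domain >= 32u) {
        return -1;              /* one bit per domain in a 32-bit register */
    }
    mask = 1u << domain;

    *psc->ptcmd = mask;         /* kick off the transition for this domain */

    for (n = 0; n < max_polls; n++) {
        if ((*psc->ptstat & mask) == 0) {
            return 0;           /* transition completed */
        }
    }
    return -1;                  /* timed out: report it instead of hanging */
}
```

With a return code instead of an endless loop, the init code can log the failure and continue without the power cycle, which this thread suggests may not always be necessary.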

0 Pavithra Shankar over 12 years ago in reply to Ivan Pang

Prodigy 75 points

Hi Ivan,

Thank you for your suggestions and example project. It worked for me and helped me fix the issues I had.

Best regards,

Pavithra

0 Varabei Dzmitry over 10 years ago in reply to Ivan Pang

Prodigy 60 points

Have you seen anything new on this issue?
I tried your solution. Yes, it allows the server to be restarted, and the kernel is running. It is just that the packets do not arrive, as if they were not sent.

The restart problem is still unsolved and still relevant.
I would be very grateful for an answer.

0 Varabei Dzmitry over 10 years ago in reply to Varabei Dzmitry

Prodigy 60 points

I am sorry, it does work, and very well. In multi-core projects the cleanup just needs to be done very carefully: only on core 0. )))))

My great respect to the authors.
