AM3359 exception with Ethercat

Cazaban Laurence

Other Parts Discussed in Thread: TLK110

Hello,

We are currently developing a new drive controller based on several Sitara AM3359 DSPs and a real-time master, all linked by an EtherCAT network.

The real-time master is an INtime core with a KPA stack.

We have developed our own DSP card, which holds 2 DSPs. The design of this card is very close to that of two TI ICE boards combined on a single card. The main difference is the PHY component: we chose not to use the TLK110 but another Ethernet PHY, the LAN8710.

I can give you some other technical details:

My code is compiled with CCSv6 and I'm testing in debug mode with a USB probe
I use SDK 1.1.0.8 (I will update the SDK later)
The XDCTools version is 3.3.6.60_core
The SYS/BIOS is 6.41.5.54

To summarize the test configuration, we have an EtherCAT network with:

INtime master with KPA stack
Slave 1 with AM3359
Slave 2 with AM3359

We run the test as follows: start the code in debug mode on both DSPs, start the master with KPA Studio, scan the bus, find the 2 slaves, and attach the slaves. After a short time, an exception occurs on DSP 1 (see the attached screenshot).

We then tested another configuration by swapping the order of the slaves on the network (slave 2 before slave 1). We ran the test and there was no error! We can reach the operational mode and run the test for hours.

We then tested yet another configuration with slave 1 alone on the network. We ran the test and, again, there was no error! We can reach the operational mode and run the test for hours.

We have made many timing measurements on the clock, the data, and so on. Everything seems to be normal. We also tried modifying the values of the parameters ESC_ADDR_TI_PORT0_TX_START_DELAY and ESC_ADDR_TI_PORT1_TX_START_DELAY: these parameters don't seem to have any effect on the data clocking.

We cannot find any reason that would explain this exception, and we are really stuck at the moment.

Can anyone give us some explanation of this exception (what does it mean, when does it normally occur) and of the delay parameters (what is their role)? And if someone has an idea to solve the problem, that would be wonderful!

Thanks a lot for your help.

Laurence

over 9 years ago

0 Biser Gatchev-XID over 9 years ago

TI__Guru**** 393215 points

Hi,

I will forward this to the ISDK team.

0 Frank Walzer over 9 years ago

TI__Mastermind 44306 points

Hi Laurence,

to be honest I don't fully understand how we can help here. Obviously this is an ARM crash on your board with your software. A specific master configuration and program is probably in use too. How should we figure out what is wrong based on the provided info?

This requires a lot more debugging (if the issue can be reproduced) in order to generate the data for questions we can answer. It may be that your program crashes due to a 'normal' SW issue such as a memory leak under certain conditions. Or an error condition (that only occurs rarely) enters a part of your code that then behaves badly...

P.S. Please don't call the AM335x devices 'DSP'... TI people understand something different when they are asked about a DSP...

Regards,

0 Cazaban Laurence over 9 years ago in reply to Frank Walzer

Expert 1500 points

Hello Frank!

First of all, I'm sorry for using the wrong word. We are used to saying "DSP" because our former drive really used a DSP 2812.
In fact, the code currently implemented on the ARM is nearly the same as the one in ecta_appl full.
To be more specific, the exception is located in the function "get_app_reload_flag()".
At the beginning of this function, you read the register: HWREG(SOC_PRM_PER_REGS + PRM_PER_PM_PER_PWRSTST);
This register is located at address 0x44E00C08. In the nominal case I can read this memory, but once the exception has occurred this memory is no longer readable.
So, I wonder: what can put this memory in a non-readable state? Is it related to a PRU crash?
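As a reading aid for the access described above, here is a minimal sketch of what the HWREG expression resolves to. The two constant values and the HWREG definition are assumptions (they are not quoted in this thread and should be checked against the SDK headers); the only figure taken from the post is the resulting address 0x44E00C08. The helper names pwrstst_address and read_pwrstst are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED values: not quoted in the thread, verify against the SDK headers.
 * Their sum must give the address quoted in the post (0x44E00C08). */
#define SOC_PRM_PER_REGS        0x44E00C00u  /* assumed base of the PRM_PER block */
#define PRM_PER_PM_PER_PWRSTST  0x8u         /* assumed offset of PM_PER_PWRSTST  */

/* ASSUMED definition: a plain 32-bit volatile memory-mapped access. */
#define HWREG(addr) (*((volatile uint32_t *)(uintptr_t)(addr)))

/* Compile-time check that the assumed constants match the quoted address. */
static_assert(SOC_PRM_PER_REGS + PRM_PER_PM_PER_PWRSTST == 0x44E00C08u,
              "base + offset should match the address quoted in the post");

/* Hypothetical helper: the address that the HWREG expression dereferences. */
static inline uint32_t pwrstst_address(void)
{
    return SOC_PRM_PER_REGS + PRM_PER_PM_PER_PWRSTST;
}

/* Hypothetical helper: the read itself. Only meaningful on the target;
 * on a host PC this address is not mapped and the call would fault. */
static inline uint32_t read_pwrstst(void)
{
    return HWREG(pwrstst_address());
}
```

If these assumptions hold, the macro names suggest that the register belongs to a PRM (power and reset management) block rather than to the PRU memory map, which may be worth keeping in mind when reading the replies.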
Moreover, can you tell me what the parameters ESC_ADDR_TI_PORT0_TX_START_DELAY and ESC_ADDR_TI_PORT1_TX_START_DELAY are for?
Thanks a lot for your help.

0 Frank Walzer over 9 years ago in reply to Cazaban Laurence

TI__Mastermind 44306 points

Hi Laurence,

thanks for providing more data here... I am currently in a meeting and it is difficult to check the code section mentioned. Obviously there should be no issue accessing memory in the PRU area as we never disable that for any reason.
I have seen such an issue show up in very rare cases, but it was not really reproducible. How systematic is that error? I just want to avoid checking on a bad board or device that is only a single broken instance.

I think the START_DELAY parameters should be commented but again I need to defer to the engineering team to confirm.

Regards,

0 Cazaban Laurence over 9 years ago in reply to Frank Walzer

Expert 1500 points

Frank,

I tried commenting out the entire body of the function "get_app_reload_flag", and then the test is OK: I reach the operational mode, no exception occurs, and if I halt the program and read the memory where the PRU registers are stored, the memory is readable and everything is fine.
Then, if the function only accesses the register with the statement HWREG(SOC_PRM_PER_REGS+PRM_PER_PM_PER_PWRSTST), the exception occurs and the memory no longer seems to be readable.
What is very surprising is that if I swap slave 1 and slave 2 there is no error...
Thanks for your help

Laurence

0 AnBer over 9 years ago in reply to Cazaban Laurence

TI__Mastermind 30210 points

Hi Laurence,

Cazaban Laurence said:
I tried commenting out the entire body of the function "get_app_reload_flag", and then the test is OK: I reach the operational mode, no exception occurs, and if I halt the program and read the memory where the PRU registers are stored, the memory is readable and everything is fine.

So you mean that in this case you can successfully access the PRU memory map via the debugger. Correct?

Cazaban Laurence said:
Then, if the function only accesses the register with the statement HWREG(SOC_PRM_PER_REGS+PRM_PER_PM_PER_PWRSTST), the exception occurs and the memory no longer seems to be readable.

Right after that (when you were able to access the PRU memory map via the debugger), if you access the PRU memory map with the ARM CPU, it generates an exception. Correct?

What happens if you reset the CPU and PRU using the debugger and restart the above procedure from the beginning? Do you get the same result (i.e. a CPU exception) consistently?

Some more things to try:
- Could you try to identify more precisely when the PRU memory stops being accessible? Maybe it relates to a specific chain of events. For example, could you place some PRU memory accesses at different places in the code?

- Also, I have searched the SDK 1.0.0.x and SDK 2.1.0.x source code for "get_app_reload_flag".
I have not seen any instance of it in SDK 2.1.0.x.
Have you tried updating to SDK 2.1.x.x to see whether a comparable issue happens?

- If you replace your 2nd on-board slave with an ICEv2 board instead, do you get the same result (i.e. an exception)?
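The first suggestion above (placing memory accesses at different places in the code) could be sketched as follows. This is only an illustration under stated assumptions: the probe_* names, the log layout, and the marker value are all hypothetical and not part of any TI SDK, and the read is passed in as a function pointer so the sketch can be compiled on a host PC. Each probe records its checkpoint id before performing the read, so if the read raises an exception, the last log entry (inspected with the debugger) names the checkpoint that failed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROBE_LOG_SIZE   16u
#define PROBE_UNFINISHED 0xDEADBEEFu  /* hypothetical marker: read did not complete */

/* Hypothetical reader type: on target this would wrap the real register read. */
typedef uint32_t (*probe_read_fn)(uint32_t addr);

typedef struct {
    uint32_t checkpoint;  /* id chosen by the caller for this code location */
    uint32_t value;       /* value read, or PROBE_UNFINISHED if the read faulted */
} probe_entry;

typedef struct {
    probe_entry entries[PROBE_LOG_SIZE];  /* ring buffer of the latest probes */
    uint32_t count;                       /* total number of probes started */
} probe_log;

/* Log the checkpoint first, then attempt the read. */
static inline void probe_checkpoint(probe_log *log, uint32_t checkpoint,
                                    probe_read_fn read, uint32_t addr)
{
    assert(log != NULL && read != NULL);
    probe_entry *e = &log->entries[log->count % PROBE_LOG_SIZE];
    e->checkpoint = checkpoint;
    e->value = PROBE_UNFINISHED;
    log->count++;
    e->value = read(addr);  /* on target, an exception here leaves the marker in place */
}

/* Most recent entry, or NULL if no probe has been taken yet. */
static inline const probe_entry *probe_last(const probe_log *log)
{
    if (log->count == 0u) {
        return NULL;
    }
    return &log->entries[(log->count - 1u) % PROBE_LOG_SIZE];
}

/* Host-only stand-in for the register read, so the sketch runs on a PC. */
static inline uint32_t fake_read(uint32_t addr)
{
    return addr ^ 0xFFFFFFFFu;
}
```

On the target, one would call probe_checkpoint with a different id at each point of interest (for example after initialization, after the bus scan, after each state change) and pass a reader that performs the actual access. The log would need to live in memory that the debugger can still read after the exception.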

0 AnBer over 9 years ago in reply to AnBer

TI__Mastermind 30210 points

Hi Laurence,

Also, as a test to validate your HW (and reduce the number of variables), you could use the same SW tools as we do. The wiki pages below document how to use TwinCAT with our boards:

http://processors.wiki.ti.com/index.php/Running_AM335x_EtherCAT_Application_in_DC_Mode
http://processors.wiki.ti.com/index.php/Configuring_TwinCAT_For_TI_EtherCAT_Slave

The PRU-ICSS EtherCAT APIs are also documented at:
http://processors.wiki.ti.com/index.php/PRU_ICSS_EtherCAT_firmware_API_guide
