[AM3359 ICE] Ethercat EoE malloc/free issue?

eugenio

Other Parts Discussed in Thread: AM3359, SYSBIOS, CODECOMPOSER

I'm evaluating SSC v5.0 with the AM3359 ICE demo board. Everything is working fine except for Ethernet over EtherCAT (EoE).

The ESC responds correctly to ping requests, but every EoE mailbox message allocates 144 bytes of heap memory and *never releases* it. After a few ping iterations (how many depends on the heap size) the CPU locks up somewhere, out of memory.

The memory allocation occurs in MBX_CheckAndCopyMailbox() [file mailbox.c, line 903]:

#if MAILBOX_QUEUE
        psWriteMbx = (TMBX MBXMEM *) APPL_AllocMailboxBuffer(MBX_BUFFER_SIZE);

        /* if there is no more memory for mailbox buffer, the mailbox should not be read */
        if (psWriteMbx == NULL)
...

This function is called by ECAT_Main() [file ecatslv.c line 2782] as soon as an EoE mailbox message is received.
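One way to confirm a leak of this kind is to count buffers that were allocated but never released. The sketch below is not SSC code: `tracked_alloc`, `tracked_free`, `outstanding_buffers` and `handle_message` are hypothetical names used only for illustration, and plain `malloc`/`free` stand in for whatever the real allocator does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tracking wrappers; these names are not part of the SSC. */
static size_t g_outstanding_buffers = 0;

/* Allocate a buffer and count it as outstanding if the allocation succeeds. */
void *tracked_alloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL) {
        g_outstanding_buffers++;
    }
    return p;
}

/* Release a buffer and remove it from the outstanding count. */
void tracked_free(void *p)
{
    if (p != NULL) {
        assert(g_outstanding_buffers > 0);
        g_outstanding_buffers--;
        free(p);
    }
}

/* Number of buffers allocated through tracked_alloc and not yet freed. */
size_t outstanding_buffers(void)
{
    return g_outstanding_buffers;
}

/*
 * Simulates handling one mailbox message of the given size.
 * With release_when_done == 0 the buffer is deliberately dropped without
 * being freed, which reproduces the symptom described above.
 * Returns 0 on success, -1 if no memory was available.
 */
int handle_message(size_t size, int release_when_done)
{
    void *buf = tracked_alloc(size);
    if (buf == NULL) {
        return -1;
    }
    /* ... the message would be processed here ... */
    if (release_when_done) {
        tracked_free(buf);
    }
    return 0;
}
```

If the outstanding count grows by one for every handled message and never comes back down, some code path is taking ownership of the buffer without releasing it.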

Any suggestions?

over 13 years ago

0 PratheeshGangadhar over 13 years ago

TI__Mastermind 48061 points

Hi,

If you are using SDK 1.00.00.04 and SSC 5.01, I suggest trying the attached patch. We found this memory leak in EoE mode with SSC 5.01. Note that this patch is still under review with Beckhoff and may not be the final fix for the problem.

2352.0006-Fix-memory-leaks-in-SSC5.01-stack-in-EoE-mode.zip

0 eugenio over 13 years ago in reply to PratheeshGangadhar

Expert 1505 points

Working fine now, thank you!

The problem was present since SDK 1.00.00.03 and SSC ver. 5.0.

I hope the guys at Beckhoff will apply this patch to their stack.

0 eugenio over 13 years ago in reply to PratheeshGangadhar

Expert 1505 points

A new issue seems to have appeared: the "task1" task is running out of stack memory.

I've increased the stack size from 0x800 to 0x1000, but the problem is still there. The CPU exits with this dump in the error console:

[CortxA8] le: 0x800241a8.

Task stack base: 0x800241f0.
Task stack size: 0x1000.
R0 = 0x00000060 R8 = 0xffffffff
R1 = 0x80026a5b R9 = 0xffffffff
R2 = 0x00000060 R10 = 0xffffffff
R3 = 0x00000043 R11 = 0xffffffff
R4 = 0xffffffff R12 = 0x80026a56
R5 = 0xffffffff [CortxA8] SP(R13) = 0x80025128
R6 = 0xffffffff LR(R14) = 0x8000a55c
R7 = 0xffffffff PC(R15) = 0x80015854
PSR = 0x2000019f
ti.sysbios.family.arm.exc.Exception: line 174: E_dataAbort: pc = 0x80015854, lr = 0x8000a55c.[CortxA8]
xdc.runtime.Error.raise: terminating execution

This problem does not seem to be related to EoE. The context is:

- SDK version 1.00.00.04, SSC version 5.1, CodeComposer 5.2

- ICE board connected to an EtherCAT master, with only this slave present in the network and free run mode enabled.

- CoE - Online tab in TwinCAT System Manager with the "Auto update" option enabled, to continuously trigger a CoE data refresh.

With SDK version 1.00.00.03 and SSC version 5.0 everything works fine. The issue appears with 1.00.00.04 and SSC v5.1.

0 Frank Walzer over 13 years ago in reply to eugenio

TI__Mastermind 44306 points

Eugenio,

the register dump alone doesn't tell us what issue you are seeing. It might be the stack size, but that is not certain. If you debug the issue you should be able to see what is happening at the point where your code traps into the exception.
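As a rough sanity check, the stack base, stack size and SP values from the dump can be compared directly. The sketch below assumes that the dumped SP(R13) is the task's stack pointer at the time of the abort and that the stack grows downward from base + size; neither assumption is confirmed in this thread, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if sp lies within [base, base + size], 0 otherwise. */
int sp_in_stack(uint32_t base, uint32_t size, uint32_t sp)
{
    return sp >= base && sp <= base + size;
}

/*
 * Bytes currently in use on a descending stack, i.e. the distance
 * from the top of the stack (base + size) down to sp.
 * Assumption: the stack grows from high addresses to low addresses.
 */
uint32_t stack_bytes_used(uint32_t base, uint32_t size, uint32_t sp)
{
    assert(sp_in_stack(base, size, sp));
    return (base + size) - sp;
}
```

With the values from the dump (base 0x800241f0, size 0x1000, SP 0x80025128), `sp_in_stack` returns 1 and `stack_bytes_used` returns 200. Under these assumptions the SP at the time of the abort is well inside the task stack, but this says nothing about the peak stack usage before the exception, so it cannot rule out an earlier overflow.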

We do not extensively test the higher-level protocols in the stack, as this is not our code. As Pratheesh mentioned, we passed the patch on to ETG/Beckhoff, but it is up to them to integrate it into a new version of SSC. I saw you also posted in the ETG forum.

Regards.

0 eugenio over 13 years ago in reply to Frank Walzer

Expert 1505 points

This is a screenshot of CCS after the exception:

The CPU jumps to C$$EXIT(), but I can't figure out what threw this exception. There was no EoE activity before the exception, just polling of CoE variables. Everything is fine with the previous versions of the SDK and the ETG slave stack. These are my first few weeks working with Code Composer (/Eclipse) and I'm not as experienced as I'd like.

Disabling EoE support in ecat_def.h solves the issue (but I'll need Ethernet in the future). I can't tell whether this is related to the PRU firmware or to the ETG slave stack.

0 Frank Walzer over 13 years ago in reply to eugenio

TI__Mastermind 44306 points

Eugenio,

that is too late in the processing. But you can see the program counter (PC, R15) in the dump. If you look at that address in the disassembly window, you can get closer to where in the application code the issue occurred. E_dataAbort exceptions can have a variety of causes, I think. To say more, we need to know the instructions that caused it.

I really don't think this is PRU firmware related if it only occurs with the EoE support...

Regards.

0 PratheeshGangadhar over 13 years ago in reply to eugenio

TI__Mastermind 48061 points

Hi,

How long does it take for this crash to occur?

Is EoE still enabled at the master?

Regarding the patch for EoE: I forgot to mention that I also increased the system heap and stack size in the BIOS config.

Increasing BIOS.heapSize to 12288 and Program.stack to 4096 in app.cfg (the application's SYS/BIOS config file) improved the behavior. I still think there are more issues in the EoE implementation, as a memory leak (now at a much slower rate) still occurs after ~30,000 pings or so. We have raised this issue with Beckhoff and are hoping to get a resolution. Let me know if this helps. Meanwhile, we will try to reproduce this issue with 1.00.00.04.
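To put these numbers in perspective, here is a back-of-the-envelope estimate of how many leaking messages a heap can absorb. It is only an upper bound: it assumes the whole heap is available for mailbox buffers and ignores allocator overhead and every other heap user. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Upper bound on the number of messages that can be handled before the
 * heap is exhausted, if every message leaks leak_per_message bytes.
 * Ignores allocator overhead and any other allocations (assumption).
 */
size_t messages_until_exhaustion(size_t heap_bytes, size_t leak_per_message)
{
    assert(leak_per_message > 0);
    return heap_bytes / leak_per_message;
}
```

For a 12288-byte heap and 144 bytes leaked per message, `messages_until_exhaustion(12288, 144)` gives 85. A system that survives on the order of 30,000 pings therefore cannot be leaking a full 144-byte buffer on every message; if that figure marks the point of exhaustion, the remaining leak would have to occur only occasionally.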

0 eugenio over 13 years ago in reply to PratheeshGangadhar

Expert 1505 points

It takes thousands of pings (polling CoE variables in the meantime).

EoE is enabled at the master.

Disabling EoE support in ecat_def.h eliminates the problem, so the issue (as stated by Frank in this thread) is strictly related to the EoE code in the Beckhoff stack.

The engineers at the ETG headquarters in Nuremberg, Germany, have been informed, but I have not received an answer yet.

Thank you for your support!

0 Rainer Hoffmann over 13 years ago in reply to eugenio

Prodigy 125 points

Hi,

the following patch file should fix the problem.

2133.SSC_V5i01_eoe_patch.zip

0 eugenio over 13 years ago in reply to Rainer Hoffmann

Expert 1505 points

Rainer,

the patch solved the issue.

Thank you all for your support.
