Improvements to the C6748 NSP

Dean Hofstetter

Other Parts Discussed in Thread: SYSBIOS, OMAPL138

I'm going to start this post with something unrelated to the NDK/NSP, but in my mind needs to be fixed ASAP. Please see: https://e2e.ti.com/support/embedded/tirtos/f/355/t/2612

DSP: C6748
Code Composer: 5.5
SYSBIOS: 6.37.05.35
XDCTools: 3.25.06.96
NDK: 2.24.02.31
NSP: 1.10.02.09

The C6748 NSP provided by TI appears to be a SW port from some other C6x part. It is poorly written, poorly commented, and contains bugs. With that said, it is better than starting from scratch, so we should be thankful we have at least something to use. This discussion is meant to help me understand the NSP, and to hopefully have TI release an updated NSP for the C6748.

Layer 1: NIMU. This is the layer that the NDK talks to, so no matter what device you use, the NDK expects certain functions to be implemented. The files for this layer are: nimu_eth.c and nimu_eth.h.

Problem #1. The NDK main stack thread starts the NDK by calling NC_NetStart( ). This is called from a dowhile loop, so as long as NC_NetStart( ) doesn't return with an error, the NDK will just be restarted. Every time NC_NetStart( ) is called, it calls the initialization function provided in the NIMUDeviceTable[ ] array. For the C6748 NSP that init function is EmacInit( ). EmacInit( ) acquires some memory via two mmAlloc( ) calls. That memory is never freed when EmacStop( ) is called. Therefore every time the NDK is restarted, a little bit of memory is lost (memory leak). After many restarts, eventually the NDK stops working. My suggested fix is to implement the memory required as a static variables. For example: static PDINFO PrivateEMACData and static NETIF_DEVICE C6748EMAC.

Problem #2. Too many places where the Ethernet MAC address is defined. EmacInit( ) has a hard coded MAC address, ethdriver.c has a hard coded static MAC address, and finally HwPktInit( ) calls EMAC_getConfig( ) where you're supposed to load up the real Ethernet MAC. Yikes. How about removing the hard coded Ethernet MAC addresses from EmacInit( ) and ethdriver.c, and pass the a pointer to the PrivateEMACData structure into HwPktInit( ). That way once HwPktInit( ) get the "real" Ethernet MAC address, it just loads that right into the PrivateEMACData structure. Finally copy that MAC address into the NETIF_DEVICE structure before calling NIMURegister( ). The NDK knows the MAC address right from the get go, and doesn't have to be overridden during EmacStart( ). Simple, easy, less confusion.

Problem #3. Undocumented configuration option. EmacStart( ) sets the Emac filter. According to the NDK the options are: NOTHING, DIRECT, BROADCAST, MULTICAST, ALLMULTICAST, ALL. The NSP forces this to ETH_PKTFLT_MULTICAST. Our applications never use multicast, so I've set this to ETH_PKTFLT_BROADCAST. I think this will reduce processing time further down in the NSP layers.

Problem #4. EmacStop( ) calls HwPktShutdown( ) which does nothing and returns. Why is it in the code? Remove the call to HwPktShutdown( ). This is a common problem with the NSP. Code that does nothing, code that is commented out, variables created for no reason, etc.

Problem #5. Emacioctl( ) only supports (2) of the (13) defined options in the NDK. The two that are supported are for multicast, which we don't use. In fact we don't use any of the defined options. I wouldn't care if this function was empty, but some people might. The NDK has code defined to use these options, therefore shouldn't the NSP support them?

Problem #6. EmacSend( ) does a min check verification on the packet length. Why here? It looks like EMAC_sendPacket( ) (located in the low level emac driver file) has the logic to handle packets less than the Ethernet minimum.

Problem #7. The C6748 NSP appears to be really old. I've reviewed the NSP for the C6657 and it is pretty different. The biggest difference appears to be how it handles RAW Ethernet packets. The PDINFO structure includes a PBMQ queue for raw received packets, where the C6748 implementation doesn't have this. Not sure if this means the C6748 can't send / receive raw Ethernet. Finally the C6647 NSP has no code for the function EmacPoll( ), it just returns. For the C6748 NSP EmacPoll( ) samples the MDIO layer to see if the Ethernet link is up or down.

I'll continue the rest of the NSP layers in another post. I've attached my re-write of the NIMU layer at the bottom of this post. Please review.

Thanks, Dean

/cfs-file/__key/communityserver-discussions-components-files/355/6457.Ethernet_5F00_NSP_5F00_nimu.c

over 8 years ago

Emmanuel Trinidad - Alicea over 8 years ago

TI__Genius 10260 points

Hi Dean,

Thanks for the feedback, I will bring this to the attention of the team in charge of the NSP. In the meantime, to load a file, press the reply button followed by the "Use rich formatting" text on the right hand side. You should be able to see and "Attach" paperclip icon in the tool bar.

Regards,

-- Emmanuel

Dean Hofstetter over 8 years ago in reply to Emmanuel Trinidad - Alicea

Expert 2525 points

Bump. Can we get an EMAC / NSP expert to comment on my observations so far? Is there one person or group within TI that is responsible for the NSP layer?

Thanks, Dean

Dean Hofstetter over 8 years ago in reply to Dean Hofstetter

Expert 2525 points

Layer 2: Ethdriver

Problem 1. Use of disable ALL interrupts vs. just disabling the EMAC interrupts. The reason for this heavy handed approach is the following link: http://processors.wiki.ti.com/index.php/DM648_Fix_for_NDK_Crash_in_High_Network_Traffic. We have real time audio requirements, and shutting down the world won't work for us. I think a better solution would be to declare each HWI interrupt routine as MASK ALL to disable HWI nesting. Then use Hwi_disableInterrupt() to only shut off the EMAC interrupts where needed. Comments???

Problem 2. PKT_MAX is poorly defined. There should be more comments around this #define. If you increase PKT_MAX (which I recommend) then you must increase the PBM in the *.cfg file (NDK -> Buffers -> PBM Buffers -> Number of Frames). Number of Frames should be set to 3X whatever PKT_MAX is set to.

Problem 3. Lots of unused, un-necessary stuff. Remove __TMS470__ #if stuff. Un-need variables: hMDIO, cls_errors, emac_fatal_error, coreNum, PktMTU, bMacAddr[], FlashActiveLED, and memory_squeeze_error.

Problem 4. Poorly defined #defines that control stuff. CORENUM can be 0 or 1 (can't be 2 as the C6748 doesn't have those interrupt vectors mapped). RXINT and TXINT may have to change, depends on your system interrupt structure. EXTERNAL_MEMORY and EXTMEM probably will never change for the C6748, so just get rid of the #defines (leave the code however). Basically you make it hard for the user to understand what they can.

Problem 5. Why have the hApplication stuff? Extra code just to be cute? Get rid of it.

Problem 6. RxPacket( ) returns a buffer of size (1514 + 4 + PKT_PREPAD). HwPktOpen( ) initializes buffers of size 1514. Why the difference? I would think these need to be the same.

Problem 7. StatusUpdate( ) needs to alert the user that something very bad has happened. There should be a mandatory call back function to the users code when this function is called.

Problem 8. HwPktInit( ) should return the Ethernet MAC address back up to the NIMU layer. The current NSP has way to many locations where the MAC address is defined. It should come from the user and no where else.

Problem 9. Why is the EMAC_CONFIG_MODEFLG_RXOFFLENBLOCK option set?

Problem 10. The C6748 only has one core, so why are both of these defined: ecfg.ChannelInfo[0].RxChanEnable and ecfg.ChannelInfo[1].RxChanEnable?

Problem 11. HwPktSetRX( ) is writing to undefined memory locations. Clearing out the multicast MAC's is hard coded with 32. The C6748 only has 8 multicast MAC's.

Problem 12. Loading the MACADDRLO register seems wrong. It shifts in CORENUM, when I believe it should shift in (tmp1 + 1).

Problem 13. No HASH_MCAST for the C6748, so get rid of the code.

Problem 14. HwPktTxNext( ) uses the "register" keyword on the buffer and length variables. Why? The other routines that use OEMCacheClean( ) don't use the "register" keyword.

Problem 15. HwPktPoll( ) should only run its code if fTimerTick == 1. The NSP code forces this variable to 1 always. Why?

Problem 16. Get rid of LED_TOGGLE( ) code. Must be left over from some EVM board.

Problem 17. The HWI interrupts are turned on after hEMAC is defined. Therefore the HWI vectors don't need to check for that. Just wastes CPU cycles.

Problem 18. The code for EMAC_rxEoiWrite( ) and EMAC_txEoiWrite( ) just wastes CPU cycles. Touch the registers directly.

Problem 19. The rcv_pkt[ ] buffer seems to be uninitialized. Some of the items in the structure are set in HwPktOpen( ), but most are not. I would think at a minimum as you load each record in the array you would first do: memset(&rcv_pkt[i], 0, sizeof(EMAC_pkt));. How else are Flags, ValidLen, DataOffset, etc going to get initialized?

1768.Ethernet_NSP_ethdriver.c

Dean Hofstetter over 8 years ago in reply to Dean Hofstetter

Expert 2525 points

I realize we are in the summer vacation time frame, but can I please get in touch with the team that handles the NSP? The NDK doesn't work without a functional NSP, and I'm not impressed with the C6748 NSP. Whenever a company releases source code (examples, drivers, etc), it is their chance to shine. Show that you follow a coding standard, clear code and comments, etc. The C6748 NSP is a poor effort by TI, and doesn't make them look good as a company.

Steven Connell over 8 years ago

TI__Mastermind 45025 points

Dean,

I can't argue with you that the driver is in bad shape. It's unfortunate that it was released with all of these issues present.

Thanks for taking the time to sort through this; I am in the process of reviewing the code, as well as the changes you have made. I'm also applying as many of your changes as I can. Note that I may not apply all of them; there are a couple so far that I don't entirely agree with, but that can be open for discussion. Once complete, I'll release a new NSP to get these updates out.

So far, I've mainly focused on the nimu_eth files. I'll work on ethernet.c next.

Dean Hofstetter said:
Problem #1. The NDK main stack thread starts the NDK by calling NC_NetStart( ). This is called from a dowhile loop, so as long as NC_NetStart( ) doesn't return with an error, the NDK will just be restarted. Every time NC_NetStart( ) is called, it calls the initialization function provided in the NIMUDeviceTable[ ] array. For the C6748 NSP that init function is EmacInit( ). EmacInit( ) acquires some memory via two mmAlloc( ) calls. That memory is never freed when EmacStop( ) is called. Therefore every time the NDK is restarted, a little bit of memory is lost (memory leak). After many restarts, eventually the NDK stops working. My suggested fix is to implement the memory required as a static variables. For example: static PDINFO PrivateEMACData and static NETIF_DEVICE C6748EMAC.

Good catch. The driver was definitely leaking memory as there is no corresponding mmFree() call for the private data struct that's allocated in EmacInit().

While you can choose to allocate these structs statically, if you do, you must set the flag NIMU_DEVICE_NO_FREE in the NETIF_DEVICE struct so the stack doesn't do an invalid free.

I fixed the memory leak by freeing the private data in the EmacStop() function (which is called upon both reboot and shut down of the stack)

Dean Hofstetter said:
Problem #2. Too many places where the Ethernet MAC address is defined. EmacInit( ) has a hard coded MAC address, ethdriver.c has a hard coded static MAC address, and finally HwPktInit( ) calls EMAC_getConfig( ) where you're supposed to load up the real Ethernet MAC. Yikes. How about removing the hard coded Ethernet MAC addresses from EmacInit( ) and ethdriver.c, and pass the a pointer to the PrivateEMACData structure into HwPktInit( ). That way once HwPktInit( ) get the "real" Ethernet MAC address, it just loads that right into the PrivateEMACData structure. Finally copy that MAC address into the NETIF_DEVICE structure before calling NIMURegister( ). The NDK knows the MAC address right from the get go, and doesn't have to be overridden during EmacStart( ). Simple, easy, less confusion.

What a confusing mess. I took your changes a step further and just called EMAC_getConfig directly from within EmacInit(). This also eliminates the need to call HwPktInit()

On a similar note:

HwPktInit()
This function seems to be a remnant of the old driver model that the NDK followed, prior to the NIMU design. Its functionality is pretty much covered by the init function that's set in the NIMUDeviceTable[]. The exception is that this function returns the number of supported devices. Getting the number of devices may be useful for some h/w platforms/drivers, but it doesn't seem to matter here. I think I'll remove that function.

Similar for HwPktShutdown.

Dean Hofstetter said:
Problem #3. Undocumented configuration option. EmacStart( ) sets the Emac filter. According to the NDK the options are: NOTHING, DIRECT, BROADCAST, MULTICAST, ALLMULTICAST, ALL. The NSP forces this to ETH_PKTFLT_MULTICAST. Our applications never use multicast, so I've set this to ETH_PKTFLT_BROADCAST. I think this will reduce processing time further down in the NSP layers.

Still need to look into this one ...

Dean Hofstetter said:
Problem #4. EmacStop( ) calls HwPktShutdown( ) which does nothing and returns. Why is it in the code? Remove the call to HwPktShutdown( ). This is a common problem with the NSP. Code that does nothing, code that is commented out, variables created for no reason, etc.

Agreed. Removed it. Also see response to problem #2 above.

Dean Hofstetter said:
Problem #5. Emacioctl( ) only supports (2) of the (13) defined options in the NDK. The two that are supported are for multicast, which we don't use. In fact we don't use any of the defined options. I wouldn't care if this function was empty, but some people might.

Yes, some people might not want this empty. For the sake of "general support" I'm planning to leave in the multicast IOCTL support that's there.

Dean Hofstetter said:
The NDK has code defined to use these options, therefore shouldn't the NSP support them?

And in reply to your changes to move the parameter checks in Emacioctl():

These argument checks assume that pBuf will contain a MAC address. I'm assuming the driver author put these parameter checks that are specific to pBuf containing a MAC address within each specific case to handle the event that the IOCTL function was called with a command in which pBuf did not contain a MAC address.

Looking more closely at the NDK code, it looks like the only possible command that's not handled by this driver or by NIMUIoctl() itself is NIMU_SET_DEVICE_MAC. In this case, pBuf would also contain a MAC address and so it would be OK to move the param checks to the top of Emacioctl.

However, it is possible that more NIMU IOCTL commands could be added to the
NDK in the future, which could be passed to the driver's IOCTL. In this case,
such an addition could cause a compatability issue wit the NSP and a new NDK
containing said change.

In any case, still thinking about this one.

Dean Hofstetter said:
Problem #6. EmacSend( ) does a min check verification on the packet length. Why here? It looks like EMAC_sendPacket( ) (located in the low level emac driver file) has the logic to handle packets less than the Ethernet minimum.

I'll look at whether or not this location is appropriate or not. In the meantime, I've fixed the following bug:

SDOCM00112837 OMAPL138 driver adds redundant 4 byte FCS padding causing errors to show in Wireshark

Dean Hofstetter said:
Problem #7. The C6748 NSP appears to be really old. I've reviewed the NSP for the C6657 and it is pretty different. The biggest difference appears to be how it handles RAW Ethernet packets. The PDINFO structure includes a PBMQ queue for raw received packets, where the C6748 implementation doesn't have this. Not sure if this means the C6748 can't send / receive raw Ethernet. Finally the C6647 NSP has no code for the function EmacPoll( ), it just returns. For the C6748 NSP EmacPoll( ) samples the MDIO layer to see if the Ethernet link is up or down.

I can't really comment on the 6647 driver, I'm not familiar with that one. I believe the 6748 can send/receive raw Ethernet, although you have to set some flags in registers as well as some other tweaks.

------------------

Here's some other comments/questions that came up while reviewing your code changes:

a. Updated header files (Ethernet_NDK_Hooks.h, Ethernet_NSP_{ethdriver, nimu}.h)

What are the changes here? I didn't see these attached.

b. Where is definition of NDKFatalError()?

c. Removal of static scope for NIMU registered functions

Not sure I agree on this one. The static scope acts as a deterrent against others calling these functions directly. Making them public opens them up for anyone to call.

d. Updates to comments, Comment format, and coding standards

Agreed that the comments for APIs were lacking. I'm going to take most of your changes to the comments (minus those expressing frustrations with the NDK ;)

The comment format is for generating doxygen. Now, I realize that the NSP does not generate doxygen, making these comments pointless to a certain extent, but I'll look at getting doxygen docs generated from these comments for the NSP.

Lastly, the code formatting is definitely a mess. I'm updating it to conform to our coding standard.

e. NDK not checking return value of driver init function:

Good catch. I've filed a bug for this:

SDOCM00118387 return value of NIMU registered init function not checked!

f. Removal of EMAC_DATA struct

Agreed. I don't see the point of having this extra layer of indirection on top of PDINFO.

g. Leading underscore in "_HwPktPoll()"

Planning to keep this, only because it's specified this way in the Ethernet Driver Design Guide (sprufp2a.pdf). From the guide:

"Note that this function is not called in kernel mode (hence, the underscore in the name). This is the only
mini-driver function called from outside kernel mode (to support polling drivers)."

h. Removal of call to PBMQ_init()

No need to call this if the driver instance struct is zeroed out first. Removing it.

i. Reordering of functions

Not sure why you wanted to reorder the functions. Maybe to avoid the need for prototypes at the top of the file? Planning to keep the ordering as it is. A plus of this is that it makes viewing the diff of changes made much
easier between engineering builds.

Steve

Dean Hofstetter over 8 years ago in reply to Steven Connell

Expert 2525 points

NIMU layer comments.
Problem #1: I prefer static variables vs. dynamic, to easy to cause a memory leak (just like the one we found). Thanks for pointing out the NIMU_DEVICE_NO_FREE flag option. nimuif.h should have the comments updated for the flag variable within the NETIF_DEVICE structure. Currently it says driver authors shouldn't touch, but this is where you need to load the NIMU_DEVICE_NO_FREE option.

Problem #2: Agree that moving EMAC_getConfig() to EmacInit() makes the most sense. I send in the pointer to the private EMAC data structure (bMacAddr) and then copy that into the NETIF_DEVICE structure(mac_address). That way both the private data structure and the NDK has the correct MAC address.

Problem #3: The filter option definitely configures low level EMAC registers. See RXMBPENABLE as an example. Looks like the C6748 EMAC can be configured to ignore certain Ethernet packets (multicast, all cast, etc). From what I've read on the Internet, multicast seems to be little used. I'm in the process of ripping out all the multicast stuff from the NSP, but understand if you want to leave it in. My only argument would be what if a customer wanted ALL CAST? The NSP doesn't support that the way it is written, so why support the little used multicast?

Problem #4: Agree.

Problem #5: Seems to be a little used function of the NDK. I'm removing it from my NSP.

Problem #6: How / Where did you fix SD0CM00112837? We have also see wireshark errors, so I'd like to get that fix rolled into my NSP as soon as possible.

Problem #7: I only commented on the C6647 NSP, as that part is newer than the C6748 and may include a "better" way to write the NSP. I think we can agree the C6748 NSP was written a long time ago and kind of neglected.

Responses to your questions
a) Sorry I didn't include the header files. I'm trying to remove the complexity of linking in the NSP at all, remove the CSL layer, etc. Is there a way I can share my project with you directly instead of posting it here on E2E?

b) NDKFatalError() exists in Ethernet_NDK_Hooks.h (one of your missing header files). Basically it a function I wrote that locks up our device with error information on the screen. Certainty not needed in tested / releasable code, but during my development helps me quickly find the source of the problem.

c) See your point, however when I see the static qualifier in front of a function in a file, I assume that function can ONLY be called by that file. The NDK is just playing games trying to hide stuff. Plus I thought the optimizer made assumptions on what it could do if it saw the static qualifier on a function. Maybe I'm wrong.

d) Thank you for taking the time to update the comments and coding style. Code shouldn't be "hard" or "confusing" to understand if written well.

e) There may be other NDK functions not checking the return value, I just happened to find this one.

f) Agreed. Extra layers for the sake of extra layers only add to the confusion.

g) I thought in the C6000 C compiler document it was highly suggested NOT to have your functions start with an underscore. If not there, I no I saw it somewhere. Plus most coding guidelines forbid starting off with an underscore. Your call if you want to leave it alone, but I've changed it for my NSP.

h) Agreed.

I) Re-ordered functions to better conform to our coding style. I get your point about not shuffling stuff too much for file compares. I'm treating this NSP as a complete re-write, so I'm move stuff around to suit our needs.

Finally I want to say thank you for looking into this. I was beginning to think I would be on my own for this project.

- Dean

Steven Connell over 8 years ago in reply to Dean Hofstetter

TI__Mastermind 45025 points

No problem, Dean. This work was needed and overdue.

Dean Hofstetter said:
nimuif.h should have the comments updated for the flag variable within the NETIF_DEVICE structure. Currently it says driver authors shouldn't touch, but this is where you need to load the NIMU_DEVICE_NO_FREE option.

Yes, that comment slipped by me. I filed a bug to get this fixed:

SDOCM00118886 update NIMU device struct's comment regarding setting of flags for driver's mem allocation model

Dean Hofstetter said:
Problem #6: How / Where did you fix SD0CM00112837? We have also see wireshark errors, so I'd like to get that fix rolled into my NSP as soon as possible.

In EmacSend (still need to look at moving that check elsewhere).

Since the h/w is configured to append the 4 byte FCS, setting the minimum frame size to 64 bytes is incorrect. The fix is to set it to 60 bytes instead of 64, then when the h/w appends the 4 byte FCS, the frame will meet the 64 byte minimum:

if (PBM_getValidLen(hPkt) < 60) {
PBM_setValidLen (hPkt, 60);
}

Dean Hofstetter said:
a) Sorry I didn't include the header files. I'm trying to remove the complexity of linking in the NSP at all, remove the CSL layer, etc. Is there a way I can share my project with you directly instead of posting it here on E2E?

Yes, we have to become friends on the forums. Then it's possible to send private messages. I'll send you a request.

Dean Hofstetter said:
g) I thought in the C6000 C compiler document it was highly suggested NOT to have your functions start with an underscore. If not there, I no I saw it somewhere. Plus most coding guidelines forbid starting off with an underscore. Your call if you want to leave it alone, but I've changed it for my NSP.

I'll double check on this.

Steve

Dean Hofstetter over 8 years ago in reply to Steven Connell

Expert 2525 points

Steve,

Any progress on the ethdriver layer? Can you please also look at this post and comment if you can: e2e.ti.com/.../433599

Thanks, Dean

Steven Connell over 8 years ago in reply to Dean Hofstetter

TI__Mastermind 45025 points

Dean,

Yes, I've made progress but still working on it. I'm also trying to track down someone who's knowledgeable of the EMAC hardware in order to find out more about the fatal HOSTPEND errorr and how to best handle it.

Steve

Steven Connell over 8 years ago in reply to Dean Hofstetter

TI__Mastermind 45025 points

Dean,

I'm getting ready to make an NSP release update here. I've made good progress on the bigger items you had pointed out, although some of the smaller ones won't make it into this iteration.

The main update I have is that I figured out how to generate a fatal HOSTPEND error, and even better, how to recover from it - without having to power cycle. In short, once the fatal error occurs, the EMAC is in a bad state (as expected). What I saw it doing at that point, is constantly trigger the EMAC RX interrupt, to the point that the whole system is starved. Upon closer inspection, I found that BIOS was still running, but constantly overloaded with servicing the interrupt.

With some reworking of the Hwi module settings (using Hwi.disableMask) to disable the interrupt automatically, and then re-enable it whenever NO fatal error is detected, this allowed everything to function under normal conditions, and also cause the RX interrupt to be ignored upon hitting the fatal error.

Additionally, this also allowed the NDK to be rebooted, which tears down and re-inits the driver, and after the reboot, the app recovers - I'm able to ping it and Telnet into it, so TCP works, too. (Note that these fixes are slightly different for the ARM, and you will see that in the code surrounded by #ifdefs ... please remember that this driver has to support the OMAPL138 with ARM, also. You should just ignore the ARM specific code, of course).

Regarding disabling of interrupts ...

I changed the DSP code to only disable the EMAC RX/TX interrupts by means of Hwi_disableIER. Note that the ARM code still disables ALL interrupts, as before. There is no Hwi_disableIER function for ARM, and when I tried to make the equivalent calls to only disable these 2 interrupts, the stack became unresponsive, and so I left it as is for ARM. I'm guessing you shouldn't care too much, since you're on the C674.

I'll let you know once the NSP is available for download.

Steve

Steven Connell over 8 years ago in reply to Steven Connell

TI__Mastermind 45025 points

Dean,

Please find the NSP update here.

Cheers,

Steve

Dean Hofstetter over 8 years ago in reply to Steven Connell

Expert 2525 points

Steve,

Thanks for taking the time to improve this NSP. I'm still reviewing it, and one thing caught my eye. In HwPktOpen( ) the calls to PBM_alloc( ) use PKT_MTU as the input. In RxPacket( ) the calls to PBM_alloc( ) use PKT_MTU + 4 + PKT_PREPAD as the input. PKT_MTU is set to 1514. I've checked a couple of other NSP implementations, and those use the same input for both calls to PBM_alloc( ), which is 1514.

So either the C6748 NSP is shorting the buffer space on the first few RX packets (because they are only PKT_MTU big), or the after many Ethernet packets are received, we are trying to pull more memory than needed when we refresh (because they are PKT_MTU + 4 + PKT_PREPAD big).

Can you clarify if there is a risk to the first few packets not having enough memory allocated?

Thanks, Dean

Steven Connell over 8 years ago in reply to Dean Hofstetter

TI__Mastermind 45025 points

Dean,

That sure doesn't look right at first glance. My first thought is that it may be related to the 4 byte CRC issue (adding space for it unnecessarily). But I'll have to take a closer look at this tomorrow.

Steve

Dean Hofstetter over 8 years ago in reply to Steven Connell

Expert 2525 points

Steve,

To further muddy the waters on this PBM_alloc() issue, the following also doesn't look right. In HwPktOpen( ) only AppPrivate, pDataBuffer, and BufferLen are defined. In RxPacket( ) AppPrivate, pDataBuffer, BufferLen, and now a new one: DataOffset is defined. In addition, PBM_setDataOffset(hPkt, PKT_PREPAD) used in RxPacket( ) but not in HwPktOpen( ). Something not right.

Other observations.
1. TxFree doesn't seemed to be used correctly. In this code, TxFree is 0 until the PHY link is up. Once the link is up, it then tries to TX any queued packets. That doesn't make any sense. Why would the NDK (or application code) try to send Ethernet packet without a physical link? And what good is sending those packet out maybe minutes or hours later (whenever the link is finally up)? Other NSP code bases (C66x for example) only use TxFree to signal that EmacStart( ) has finished. It has nothing to do with the link status.

2. I would have thought EmacSend( ) would check if the link was up. If so, then queue up the packet and send it down to the low level driver. If not, don't queue the packet and return -1 to let the NDK know something wrong.

Thanks, Dean

Steven Connell over 8 years ago in reply to Dean Hofstetter

TI__Mastermind 45025 points

Dean,

Dean Hofstetter said:
In RxPacket( ) the calls to PBM_alloc( ) use PKT_MTU + 4 + PKT_PREPAD as the input.

As far as I can tell, this is incorrect. PPPoE uses a standard Ethernet frame to encapsulate its data. Such a frame will still have a 14 byte header, consisting of the source and destination MAC addresses, and the 2 byte type field.

For PPPoE, the type field will be either 0x8863 or 0x8864, both of which are checked for in the NDK (in NIMUReceivePacket). The PPPoE header is then found in the beginning of the following payload, just as with IP, or any other protocol.

There is also a comment about PPPoE in inc/nimu_eth.h about PKT_PREPAD and the extra size being needed for PPPoE, and that this size must be 22 bytes, but I can't seen anything in the code to justify this. There is a check to ensure that the layer 2 header size is not greater than 22 bytes, but nothing about it needing to be 22 bytes. Furthermore, the layer 2 header size is set to 14 bytes in NIMUReceivePacket.

As for the magic number "4", I believe the driver author was trying to account for the 4 byte CRC. However, this driver implementation allows the hardware to tack on the CRC, so there is no need to account for it here. And since the CRC is stripped by the NIC, it won't even make it this far up anyway.

I'll file a bug for this and reply back with it here.

I'll also look into your other comments/feedback next and respond back here, too.

Steve

Steven Connell over 8 years ago in reply to Dean Hofstetter

TI__Mastermind 45025 points

Dean,

Here's the bug ID:

SDOCM00120148 (NSP) size passed in PBM_alloc() should be 1514

To further muddy the waters on this PBM_alloc() issue, the following also doesn't look right. In HwPktOpen( ) only AppPrivate, pDataBuffer, and BufferLen are defined. In RxPacket( ) AppPrivate, pDataBuffer, BufferLen, and now a new one: DataOffset is defined. In addition, PBM_setDataOffset(hPkt, PKT_PREPAD) used in RxPacket( ) but not in HwPktOpen( ). Something not right.

This issue looks to be directly related to the issue covered by the above bug. The call to PBM_setDataOffset() is making space for the PKT_PREPAD that was used in the allocation just prior. I don't think it's necessary and should be removed. I can't explain why that call doesn't also take into account the additional 4 bytes that was specified in the PBM_alloc() call right now, but if it did, I would say that should be removed as well.

1. TxFree doesn't seemed to be used correctly. In this code, TxFree is 0 until the PHY link is up. Once the link is up, it then tries to TX any queued packets. That doesn't make any sense. Why would the NDK (or application code) try to send Ethernet packet without a physical link? And what good is sending those packet out maybe minutes or hours later (whenever the link is finally up)? Other NSP code bases (C66x for example) only use TxFree to signal that EmacStart( ) has finished. It has nothing to do with the link status.

From what I can tell, the code of EmacSend is following the Ethernet driver design guide (sprufp2a.pdf). Looking at the flow chart in Section 1.3, the code looks to be following these steps:

The Text highlighted in green implies to me that TxFree is related to the link status. The text in orange implies that packets that are enqueued here, but not sent because the transmitter is busy or link is down, need to be sent at a later time.

And this is the issue I saw when reviewing the code. There was never any place that drained the TX queue (orange part), in the event of TX packets being queued up in EmacSend (due to the link being down or transmitter busy). Also, HwPktTxNext only dequeues a single packet from the TX queue. But what if EmacSend were called multiple times, with multiple packets enqueued? The code in HwPktPoll, that drains the TX queue, was added to take care of this.

One example of this that I can think of is the cable being physically disconnected during a file transfer. If someone unplugs a cable in the middle of the transfer, multiple packets would queue up here. Now, if hours have gone by, it may not make sense to send them out at this point, but for a brief interruption, it seems that it would.

2. I would have thought EmacSend( ) would check if the link was up. If so, then queue up the packet and send it down to the low level driver. If not, don't queue the packet and return -1 to let the NDK know something wrong.

You could certainly do it this way, especially if you don't see any need to queue up TX packets like this, and don't want to send possibly stale packets. The code of EmacSend is just following the guide, so I think it's ok as it is. Also, the The TI-RTOS EMAC driver for Tiva also checks the link status in the polling function.

Steve

Michael Greenspan over 8 years ago in reply to Steven Connell

Intellectual 565 points

This is very important for us and perhaps other users as well...the Tech Ref says "Host error interrupts require hardware reset in order to recover". If you have found a way which does not require hardware reset, could you provide the details on the level of the EMAC registers? That is, what do we have to write, to which registers, to clear this interrupt. Also, a related question...is a watchdog reset sufficient, or does one need to pull the reset pin low? Thanks

Michael Greenspan over 8 years ago in reply to Steven Connell

Intellectual 565 points

Steven Connell over 8 years ago in reply to Michael Greenspan

TI__Mastermind 45025 points

Hi Michael,

Please download the latest NSP for the evm6748/omapl138 and see the updates made to the driver to handle this. I would like to give you more details, but I am out of the office on leave right now. You should be able to see how it's handled in the driver code (running a diff between the latest and previous versions of the driver might help to see the changes).

Steve

Processors

Processors forum

Improvements to the C6748 NSP