SMSC9221 chip reset and recovery

Hi,

We've created a board with an OMAP 3530 processor (3730 in the next revision) and the SMSC9221 chip.  We use WinCE 6.0 (we plan to move to 7.0), and the TI BSP has a driver for it (actually for the 9118, but it should be compatible).

Once in a while, in very specific circumstances, we lose Ethernet connectivity, and after about 30 seconds the SMSC chip recovers and continues to work fine.  I have several questions related to this:

1. There is a GPIO pin defined for this chipset, LAN9115_RESET_GPIO, and it is used only in the boot loader (EBOOT\main.c). Do we hard reset the SMSC chipset only in the boot loader? Later, under the OS, is only the soft reset used? Can the hard reset be used there as well?

2. Why do you think we have this 30-second delay? Is it related to the TIMEOUT_VALUE define (lan91c.c)?

3. There is the Smsc9118CheckForHang function in the smsc9118.c file, and it's accessible from outside the driver.  Is it being used?  Is the SMSC chip reset after too many packets are lost?  This function should be executed every 2 seconds by default, so why does the chip recover only after 30 seconds?

4. Is the number 1500 (in the Smsc9118CheckForHang function) a number of lost packets or of bytes? From the description it looks like a number of packets.  It seems to be a very large number.  Why is it so large?  Can we make it lower?  Is it possible that we have this 30s delay because we wait for 1500 lost packets?

5. I had problems with the RETAILMSG macro: I couldn't make it work inside the SMSC driver.  It seemed to be doing something, because the OS was running slower, but I couldn't see any output messages on the serial terminal, even though I enabled the OAL messages in the boot loader menu and I saw messages from different pieces of code.  Everything seems to be defined: RETAILMSG should be defined as SMSCPrintfW and should send the strings to the serial port.  Do I need to do something special?  I've even enabled the SMSC_DBG define and set values in g_DebugMode to activate all the debug messages, but the result was the same: no output, just a delay.  So it looks like the strings are processed, but not sent.

 Please, let me know.

Thanks a lot!

Zack

 

  • I've made a little more progress.

    I know that the Smsc9118CheckForHang function is being used and is executed every 2 seconds (the default value).  The problem we have is related to ESD, and we can simulate it by pulling the Reset line low for a short period of time (less than 30ms).  However, the SMSC chip then reacts differently than before: it doesn't recover at all.  Anyway, I've modified the Smsc9118CheckForHang function.  Before we check QUEUE_COUNT(&pAdapter->TxDeferedPkt), I call the Smsc9118QueryInformation function with OID_GEN_MEDIA_CONNECT_STATUS and check whether the result differs from NdisMediaStateConnected (without this change the communication loss is never detected, and Smsc9118CheckForHang always returns false).  If it does, I call Lan_Initialize, InitializeQueues, ChipSetup and ChipStart.  So basically I reinitialize the chip.  After that, the chip recovers and is fully functional again.
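
    The control flow of that modification can be sketched roughly as follows. This is a stand-alone illustration, not the BSP code: adapter_t, query_media_state, reinit_chip and check_for_hang are hypothetical stand-ins for the driver's adapter structure, the OID_GEN_MEDIA_CONNECT_STATUS query, the Lan_Initialize / InitializeQueues / ChipSetup / ChipStart sequence and Smsc9118CheckForHang. Returning false after a local reinitialization and comparing the deferred count with "greater than 1500" are assumptions.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical stand-ins for the driver's types; the real BSP definitions differ. */
    typedef enum { MEDIA_CONNECTED, MEDIA_DISCONNECTED } media_state_t;

    typedef struct {
        media_state_t link;     /* what a media-connect-status query would report */
        unsigned deferred_tx;   /* packets waiting in the deferred TX queue */
        unsigned reinit_count;  /* how many times the chip was reinitialized */
    } adapter_t;

    #define MAX_DEFERRED_TX 1500u  /* threshold mentioned in the thread */

    /* Stand-in for querying OID_GEN_MEDIA_CONNECT_STATUS. */
    static media_state_t query_media_state(const adapter_t *a)
    {
        return a->link;
    }

    /* Stand-in for Lan_Initialize, InitializeQueues, ChipSetup and ChipStart.
       On real hardware the link needs time to come back; here it is instant. */
    static void reinit_chip(adapter_t *a)
    {
        a->deferred_tx = 0;
        a->link = MEDIA_CONNECTED;
        a->reinit_count++;
    }

    /* Modified hang check: look at the link state first, then at the queue. */
    static bool check_for_hang(adapter_t *a)
    {
        assert(a != NULL);
        if (query_media_state(a) != MEDIA_CONNECTED) {
            reinit_chip(a);
            return false;  /* assumption: recovered locally, no NDIS reset requested */
        }
        return a->deferred_tx > MAX_DEFERRED_TX;  /* assumed form of the original test */
    }
    ```

    With this ordering, a lost link triggers reinitialization on the next poll even when no packets have been deferred, which is the case the unmodified check missed.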

    Now, it does recover after about 5 seconds.  Is it possible to recover sooner?  It may not be possible, because it takes a similar time to initialize the network connection after powering up the entire device… But if you have any idea how to speed it up, please let me know.

    Is changing the polling interval from 2 seconds to, e.g., 1 second recommended?  I could probably make recovery 1 second faster that way.
    The other question I have is: why does the chip react in such an unexpected way to the Reset signal?

    Actually, even if I do the full-length reset (30ms or more), it doesn't recover.  Please let me know if this is expected behavior.  My understanding is that after a hard reset, the chip should initialize itself and start working properly after some initialization.

    Please, let me know.

    Thank you!!

    Best Regards,

    Zack

  • Since you mention this is looking like an ESD related problem, are you looking into hardware fixes (adjusting layout, shielding, etc.) to get rid of the hang problem entirely?

    Zbigniew Zadecki said:
    Now, it does recover after about 5 seconds.  Is it possible to recover sooner?  It may not be possible, because it takes a similar time to initialize the network connection after powering up the entire device… But if you have any idea how to speed it up, please let me know.

    It is my impression that it will still take the time to recover just like when you power up the device if the SMSC device has to be fully reinitialized.

    Zbigniew Zadecki said:
    Is changing the polling interval from 2 seconds to, e.g., 1 second recommended?  I could probably make recovery 1 second faster that way.
    The other question I have is: why does the chip react in such an unexpected way to the Reset signal?

    I would have to let the WinCE driver folks comment on any other potential problems with doing this.  My initial feeling is that it will just add a bit of overhead because you are checking more often, but if your hardware is unstable it may be worth a shot.

    Zbigniew Zadecki said:
    Actually, even if I do the full-length reset (30ms or more), it doesn't recover.  Please let me know if this is expected behavior.

    This might be a better question for SMSC.  If their chip is not resetting properly, that is a bit unusual; on the other hand, it could be that the driver is not aware of the reset and does not recover because of some internal lock-up.

  • Zack,

    Answers to your questions:

    1. Yes, RESET_GPIO is only used in EBOOT. It can be used in the CE OS if needed. The driver is designed to use the soft reset.

    2. lan91c.c is the driver for KITL/EBOOT, not the miniport driver. TIMEOUT_VALUE is the longest time the driver will wait for a packet to be sent out. It does not mean there is a 30s delay for every packet.

    3. Smsc9118CheckForHang is called from the CE NDIS layer every 2s; I am not sure whether this can be changed. When the SMSC hardware is not available to send packets, the TX packets are placed in a deferred packet queue. Smsc9118CheckForHang() checks whether too many packets have been deferred; if so, it assumes there may be a hardware issue.

    4. 1500 is the number of deferred TX packets. You can make it smaller.

    5. As for the RETAILMSG problem: when the driver is initialized, it should print DRIVER_VERSION and DATECODE. Did you see those messages? Nothing special should be required.
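
    To make answers 3 and 4 concrete, here is a small back-of-the-envelope sketch of how long a deferred-packet threshold could take to trip. It is illustrative arithmetic only: the helper names are hypothetical, the "strictly greater than the limit" comparison is an assumption about the driver, and the traffic rate used in the example is invented rather than measured.

    ```c
    #include <assert.h>

    /* Values taken from the thread: a 2 s polling interval and a limit of 1500. */
    #define POLL_INTERVAL_S 2u
    #define DEFERRED_LIMIT  1500u

    /* Number of CheckForHang polls until the deferred count first exceeds
       `limit`, assuming a steady `deferred_per_poll` packets are queued
       between polls. Returns 0 if nothing is deferred, meaning the check
       would never report a hang. */
    static unsigned polls_until_detect(unsigned limit, unsigned deferred_per_poll)
    {
        if (deferred_per_poll == 0u) {
            return 0u;
        }
        /* smallest n such that n * deferred_per_poll > limit */
        return (limit + deferred_per_poll) / deferred_per_poll;
    }

    /* Same result expressed in seconds for a given polling interval. */
    static unsigned seconds_until_detect(unsigned limit,
                                         unsigned deferred_per_poll,
                                         unsigned interval_s)
    {
        assert(interval_s > 0u);
        return polls_until_detect(limit, deferred_per_poll) * interval_s;
    }
    ```

    For example, if 100 packets were deferred in every 2 s window, seconds_until_detect(DEFERRED_LIMIT, 100, POLL_INTERVAL_S) gives 32 seconds, while a limit of 100 would trip after 4 seconds. If no packets are deferred at all, the threshold is never reached regardless of its value.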

    Thanks,

    Tao

     

  • Zack,

    First of all, I need to understand why you need to reset the SMSC miniport driver.

    When Smsc9118CheckForHang() returns TRUE, it means the miniport is not responding and hence NDIS will call Smsc9118Reset to reset the Miniport driver. 

    If there is a connectivity issue, we should solve that issue. It is not a good idea to rely on the reset mechanism.

    The polling time for checkForHang() can be changed through the following NDIS call during driver initialization (RegisterAdapter() in smsc9118.c)

    VOID NdisMSetAttributesEx(
        NDIS_HANDLE MiniportAdapterHandle,
        NDIS_HANDLE MiniportAdapterContext,
        UINT CheckForHangTimeInSeconds,
        ULONG AttributeFlags,
        NDIS_INTERFACE_TYPE AdapterType
    );
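
    A call with a 1-second interval might look like the sketch below. To keep it compilable without the CE NDIS headers, the typedefs and NdisMSetAttributesEx itself are replaced by local stubs; CHECK_FOR_HANG_SECONDS and register_adapter_attributes are hypothetical names, and the flags and adapter type should be whatever RegisterAdapter() already passes.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Local stand-ins so the sketch builds outside the BSP; in the driver
       these come from the NDIS headers. */
    typedef void *NDIS_HANDLE;
    typedef unsigned int UINT;
    typedef unsigned long ULONG;
    typedef enum { NdisInterfaceInternal } NDIS_INTERFACE_TYPE;

    static UINT g_last_interval;  /* records what the stub was asked for */

    /* Stub with the same parameter list as the prototype above. */
    static void NdisMSetAttributesEx(NDIS_HANDLE MiniportAdapterHandle,
                                     NDIS_HANDLE MiniportAdapterContext,
                                     UINT CheckForHangTimeInSeconds,
                                     ULONG AttributeFlags,
                                     NDIS_INTERFACE_TYPE AdapterType)
    {
        (void)MiniportAdapterHandle;
        (void)MiniportAdapterContext;
        (void)AttributeFlags;
        (void)AdapterType;
        g_last_interval = CheckForHangTimeInSeconds;
    }

    /* Hypothetical constant: poll every second instead of the 2 s default. */
    #define CHECK_FOR_HANG_SECONDS 1u

    /* Hypothetical helper showing where the interval is passed; the flags and
       adapter type are forwarded unchanged from the existing call. */
    static void register_adapter_attributes(NDIS_HANDLE handle,
                                            NDIS_HANDLE context,
                                            ULONG existing_flags,
                                            NDIS_INTERFACE_TYPE existing_type)
    {
        assert(context != NULL);
        NdisMSetAttributesEx(handle, context, CHECK_FOR_HANG_SECONDS,
                             existing_flags, existing_type);
    }
    ```

    Only the third argument changes; everything else in the existing call stays as it is.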

    Thanks,
    Tao
  • Bernie,

    Thank you for your response.

    Bernie Thompson said:
    Since you mention this is looking like an ESD related problem, are you looking into hardware fixes (adjusting layout, shielding, etc.) to get rid of the hang problem entirely?

    Yes, we are looking into hardware fixes, but we'd like to have a software protection too.

    Bernie Thompson said:
    It is my impression that it will still take the time to recover just like when you power up the device if the SMSC device has to be fully reinitialized.

    OK, thank you for confirmation on that.

    Bernie Thompson said:
    I would have to let the WinCE driver folks comment on any other potential problems with doing this.  My initial feeling is that it will just add a bit of overhead because you are checking more often, but if your hardware is unstable it may be worth a shot.

    OK. In the meantime I've tested it, and it has some impact on the delay.

    Bernie Thompson said:
    This might be a better question for SMSC.  If their chip is not resetting properly, that is a bit unusual; on the other hand, it could be that the driver is not aware of the reset and does not recover because of some internal lock-up.

    Yes, it looks like the driver is not aware of the reset.  I need to investigate it further.  Thank you.

    Best Regards,

    Zack

  • Tao,

    Thank you for your responses!

    For the RETAILMSG problem, I cannot see DRIVER_VERSION and DATECODE in the output.  I worked around this problem by using the LED port and MessageBox.

    In our case, CheckForHang didn't detect the faulty situation. The TX packets were not sent to the deferred packet queue. After the SMSC chip went into that error state, nothing in the driver was updated.  I had to query the driver directly for the current state (see the second post for details).

    Also, even if I just returned TRUE, the chip didn't recover from that state; I had to call all the functions listed above to fully initialize the chip.  Probably Smsc9118Reset didn't initialize everything, and that's why it didn't recover.

    Tao Zhang said:
    The polling time for checkForHang() can be changed through the following NDIS call during driver initialization (RegisterAdapter() in smsc9118.c)

    VOID NdisMSetAttributesEx(
        NDIS_HANDLE MiniportAdapterHandle,
        NDIS_HANDLE MiniportAdapterContext,
        UINT CheckForHangTimeInSeconds,
        ULONG AttributeFlags,
        NDIS_INTERFACE_TYPE AdapterType
    );

    Yes, I found it and tested 1 second.  It has some impact on the delay.
    Thank you!!
    Best Regards,
    Zack