Hi,
I work for Distech-controls, and we have been working on a device based on an AM3352 for three years. Until now, we have only used eth0 on the device. Now we need to use eth1 as well for a new feature, but we have an issue when we configure the interfaces.
ETH1 is set up in init.am335xevm.rc at boot.
ETH0 is set up by our Android application about 15 seconds later.
When the application sets up eth0, there is a chance that the interfaces stop working, but it does not happen every time.
When ETH1 is set up in the RC file, cpsw_ndo_open is called and everything is configured correctly. At the end of the open function, NAPI is enabled and then the interrupts are enabled in the hardware. The INTC interrupts were already configured and enabled by request_irq() earlier, in the probe function.
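To make that ordering concrete, here is a minimal user-space model in C; it is not driver code. It relies on two kernel facts: a new NAPI context starts out disabled (netif_napi_add() leaves NAPI_STATE_SCHED set), and napi_schedule() only queues the poll when it can set that bit. Everything prefixed with model_ is hypothetical, and a plain bool stands in for the state bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of one CPSW slave port; not kernel code. */
struct model_port {
    bool running;      /* stands in for netif_running() */
    bool irq_enabled;  /* stands in for the CPSW interrupt enable */
    bool napi_sched;   /* stands in for the NAPI_STATE_SCHED bit */
};

/* netif_napi_add() leaves a new NAPI context disabled (SCHED bit set). */
static void model_napi_add(struct model_port *p)
{
    p->napi_sched = true;
}

/* Like napi_schedule(): queues the poll only if the SCHED bit was clear. */
static bool model_napi_schedule(struct model_port *p)
{
    if (p->napi_sched)
        return false;  /* disabled or already scheduled: nothing happens */
    p->napi_sched = true;
    return true;
}

/* Tail of the open path as described: enable NAPI first, then unmask the
 * interrupt, so the first interrupt after the unmask can always schedule. */
static void model_ndo_open(struct model_port *p)
{
    assert(p->napi_sched);  /* NAPI must still be disabled before open */
    p->running = true;
    p->napi_sched = false;  /* napi_enable() */
    p->irq_enabled = true;  /* interrupts enabled in the hardware */
}
```

Calling model_napi_schedule() before model_ndo_open() fails, and the same call succeeds right after it; that gap is exactly what matters later for eth0.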
As TI's Linux Core CPSW User's Guide says: "If eth0 is up, then eth0 napi is used. eth1 napi is used when eth0 interface is down." That is our situation, so when we enter cpsw_interrupt we use the ETH1 NAPI. Here is our interrupt handler. I am pasting it because there are many versions of, and many patches for, that file.
static irqreturn_t cpsw_interrupt(int irq, void *dev_id)
{
    struct cpsw_priv *priv = dev_id;

    /* eth0 is up: service the shared interrupt with eth0's NAPI */
    if (likely(netif_running(priv->ndev))) {
        cpsw_intr_disable(priv);     /* mask in the CPSW hardware */
        cpsw_disable_irq(priv);      /* mask the line in the INTC */
        napi_schedule(&priv->napi);
    } else {
        /* eth0 is down: fall back to slave 1 (eth1) and its NAPI */
        priv = cpsw_get_slave_priv(priv, 1);
        if (likely(priv) && likely(netif_running(priv->ndev))) {
            cpsw_intr_disable(priv);
            cpsw_disable_irq(priv);
            napi_schedule(&priv->napi);
        }
    }
    return IRQ_HANDLED;
}
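The NAPI selection rule that this handler implements can be isolated as a small pure function. This is a hypothetical sketch: pick_napi() and enum napi_owner are invented for illustration, and the two bool arguments stand in for netif_running() on each net_device.

```c
#include <assert.h>
#include <stdbool.h>

/* Which NAPI context services the shared CPSW interrupt (illustrative). */
enum napi_owner { NAPI_NONE, NAPI_ETH0, NAPI_ETH1 };

/* Mirrors the if/else in cpsw_interrupt(): the arguments stand in for
 * netif_running() on eth0 and eth1. */
static enum napi_owner pick_napi(bool eth0_running, bool eth1_running)
{
    if (eth0_running)
        return NAPI_ETH0;  /* "If eth0 is up, then eth0 napi is used" */
    if (eth1_running)
        return NAPI_ETH1;  /* "eth1 napi is used when eth0 interface is down" */
    return NAPI_NONE;      /* neither is up: nothing is scheduled */
}
```

Note that the decision depends only on whether eth0 is running, not on whether eth0's NAPI is enabled yet.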
As I said, we enter the interrupt handler, and because ETH0 is down, we take the else branch and use the ETH1 NAPI. We also disable the interrupt both in the hardware and in the INTC. Re-enabling it later is the job of cpsw_poll, which NAPI schedules, once the DMA processing is done.
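The mask-in-the-handler, unmask-in-the-poll handshake can be sketched in user space as follows. The struct and function names are hypothetical, one bool models both the CPSW and INTC enables, and another models NAPI_STATE_SCHED. The point is only that nothing but the poll routine ever unmasks the interrupt.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the handshake; names are illustrative only. */
struct model_ctx {
    bool irq_enabled;  /* CPSW hardware enable and INTC line, combined */
    bool napi_sched;   /* stands in for NAPI_STATE_SCHED */
    int  poll_runs;    /* number of times the poll routine has run */
};

/* Top half: mask first, then try to hand the work to NAPI.
 * Returns true if the poll was actually queued. */
static bool model_isr(struct model_ctx *c)
{
    c->irq_enabled = false;   /* cpsw_intr_disable() + cpsw_disable_irq() */
    if (c->napi_sched)
        return false;         /* napi_schedule() cannot queue the poll */
    c->napi_sched = true;
    return true;
}

/* Bottom half, called only for a queued poll: after the DMA work it
 * completes NAPI and is the only place that unmasks the interrupt. */
static void model_poll(struct model_ctx *c)
{
    assert(c->napi_sched);    /* the kernel only polls a scheduled NAPI */
    c->poll_runs++;           /* process the RX/TX descriptors */
    c->napi_sched = false;    /* napi_complete() */
    c->irq_enabled = true;    /* re-enable in the hardware and the INTC */
}
```

If model_isr() ever returns false, model_poll() is never called and irq_enabled stays false.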
The problem happens when our application sets up ETH0. We enter cpsw_ndo_open, but the interrupts are not disabled there, because the condition if (!cpsw_common_res_usage_state(priv)) evaluates to false (ETH1 is already open). Since there is traffic on ETH1, it is then very likely that the interrupt fires. When we enter the handler, we are now in the case where the ETH0 NAPI is supposed to be used, because likely(netif_running(priv->ndev)) returns true, but that NAPI is not enabled yet. So napi_schedule() cannot schedule it, cpsw_poll is never called, and the INTC interrupts are never re-enabled, because nothing in cpsw_ndo_open enables them.
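The failing window can be replayed with a small user-space model under the conditions just described: eth0 already reports netif_running() inside cpsw_ndo_open, its NAPI is still disabled, and the shared interrupt was left enabled because ETH1 holds the common resources. All names are hypothetical; the code only illustrates why the interrupt ends up masked with no poll pending to unmask it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the race; not kernel code. */
struct model_port {
    bool running;     /* stands in for netif_running() */
    bool napi_sched;  /* NAPI_STATE_SCHED: set while disabled or scheduled */
};

/* Original dispatch: eth0's NAPI wins whenever eth0 is running.
 * Returns the port whose poll was queued, or NULL. */
static struct model_port *model_isr(bool *irq_enabled,
                                    struct model_port *eth0,
                                    struct model_port *eth1)
{
    struct model_port *owner = eth0->running ? eth0
                             : eth1->running ? eth1 : NULL;
    if (owner == NULL)
        return NULL;
    *irq_enabled = false;      /* masked in the hardware and the INTC */
    if (owner->napi_sched)
        return NULL;           /* napi_schedule() silently does nothing */
    owner->napi_sched = true;
    return owner;
}

/* cpsw_poll() stand-in: completes NAPI and unmasks the interrupt. */
static void model_poll(bool *irq_enabled, struct model_port *p)
{
    p->napi_sched = false;     /* napi_complete() */
    *irq_enabled = true;
}

/* eth1 is up with traffic; eth0 is inside ndo_open (running, but
 * napi_enable() not called yet). Returns true if the interrupt ends up
 * masked with no poll queued to unmask it. */
static bool race_leaves_irq_masked(void)
{
    struct model_port eth1 = { .running = true, .napi_sched = false };
    struct model_port eth0 = { .running = true, .napi_sched = true };
    bool irq_enabled = true;

    struct model_port *owner = model_isr(&irq_enabled, &eth0, &eth1);
    if (owner != NULL)
        model_poll(&irq_enabled, owner);  /* only a queued NAPI polls */

    return owner == NULL && !irq_enabled;
}
```

In this model race_leaves_irq_masked() returns true: the handler picks eth0, masks the line, fails to schedule, and no poll ever runs.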
I am posting here because maybe somebody has been in the same situation, or maybe it is a known issue, or maybe it is no longer a bug in recent cpsw versions.
For now, we have added a check in the interrupt handler so that the ETH1 NAPI is still used if the ETH0 NAPI is not yet enabled. So far it seems to do the job: I have not been able to reproduce the bug with it.
    /* Use eth0's NAPI only once napi_enable() has cleared NAPI_STATE_SCHED */
    if (likely(netif_running(priv->ndev)) &&
        !test_bit(NAPI_STATE_SCHED, &priv->napi.state)) {
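The effect of this workaround can be checked against the same kind of model. In this hypothetical sketch, pick_owner() adds the NAPI_STATE_SCHED test from the fragment, so while eth0's NAPI is still disabled the interrupt is serviced by eth1's NAPI, and that poll re-enables it. All names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; not kernel code. */
struct model_port {
    bool running;     /* stands in for netif_running() */
    bool napi_sched;  /* stands in for NAPI_STATE_SCHED */
};

/* Patched selection: eth0's NAPI is used only if eth0 is running AND its
 * SCHED bit is clear; otherwise fall back to eth1. */
static struct model_port *pick_owner(struct model_port *eth0,
                                     struct model_port *eth1)
{
    if (eth0->running && !eth0->napi_sched)
        return eth0;
    if (eth1->running)
        return eth1;
    return NULL;
}

/* Handler: mask, then schedule the chosen NAPI. Returns the port whose
 * poll was queued, or NULL if nothing was scheduled. */
static struct model_port *model_isr(bool *irq_enabled,
                                    struct model_port *eth0,
                                    struct model_port *eth1)
{
    struct model_port *owner = pick_owner(eth0, eth1);
    if (owner == NULL)
        return NULL;
    *irq_enabled = false;      /* masked in the hardware and the INTC */
    if (owner->napi_sched)
        return NULL;           /* napi_schedule() would do nothing */
    owner->napi_sched = true;
    return owner;
}

/* Poll stand-in: completes NAPI and unmasks the interrupt. */
static void model_poll(bool *irq_enabled, struct model_port *p)
{
    p->napi_sched = false;     /* napi_complete() */
    *irq_enabled = true;
}
```

One caveat, as far as the model shows: NAPI_STATE_SCHED is also set while eth0's NAPI is genuinely scheduled. In the model the handler cannot run in that state, because it masks the interrupt before scheduling and only the poll unmasks it; whether that holds on the real hardware for your cpsw version is worth confirming.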
If someone has a better way to patch this, I will be very happy to hear it.