TMS320F28067: CAN Tx stops working at a random time

Part Number: TMS320F28067

Hello,

Colleagues of mine found that CAN communication in a test setup stops at a random time on a TMS320F28067 processor. We connected to the non-communicating CPU via JTAG (without loading a new firmware image and without resetting the CPU, in order to preserve the error state).

We use Mailbox 0 in the Tx direction. We see that Mailbox 0 holds a message and that the corresponding transmit request bit in register CANTRS is set but never cleared by the peripheral. So it looks like the CAN controller has stopped transmitting the message located in Mailbox 0. We also measured with an oscilloscope that the CAN_TX pin of the CPU is always at a high level, so no messages are being transmitted.

Here is a screen-shot of the relevant registers:

The registers that we checked are:

CANME = 63 -> Mailbox 0…5 are enabled.

CANMD = 62 -> Mailbox 0 is selected as a Tx Mailbox.

CANTRS bit 0 -> TRS0=1, which means a transmit request has been set for Mailbox 0. We expect this bit to return to 0 once the message has been shifted out, but this never happens.

CANES = 16 -> No power down or bus-off state. So no error indication that would prevent a transmission of a CAN telegram.

CANTEC = 0 -> No transmission error.

CANTIOC = 9 -> CANTX pin is used for transmission.

CANTSC is incrementing, which means the CAN controller is clocked.

Can you please help us find out why the CPU suddenly stops shifting our messages onto the CAN bus? Are there any further registers we can look at?

Best regards,

Andreas
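
The register values listed in the post above can be decoded mechanically. The following is a minimal sketch, assuming the posted values are decimal and assuming the CANES bit positions shown in the macros (quoted from memory of the eCAN reference guide, so they should be checked against the guide for the device). The helper names `mailbox_is_tx` and `describe_canes` are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* CANES flag positions, quoted from memory of the eCAN reference guide
   (an assumption; check them against the guide for your device). */
#define CANES_TM   (UINT32_C(1) << 0)   /* transmit mode */
#define CANES_RM   (UINT32_C(1) << 1)   /* receive mode */
#define CANES_PDA  (UINT32_C(1) << 3)   /* power-down acknowledge */
#define CANES_CCE  (UINT32_C(1) << 4)   /* change configuration enable */
#define CANES_SMA  (UINT32_C(1) << 5)   /* suspend mode acknowledge */
#define CANES_EW   (UINT32_C(1) << 16)  /* warning status */
#define CANES_EP   (UINT32_C(1) << 17)  /* error passive */
#define CANES_BO   (UINT32_C(1) << 18)  /* bus-off */

/* Returns 1 if mailbox n is enabled (CANME bit set) and configured as a
   transmit mailbox (CANMD bit clear), otherwise 0. */
int mailbox_is_tx(uint32_t canme, uint32_t canmd, unsigned n)
{
    assert(n < 32u);
    return ((canme >> n) & 1u) != 0u && ((canmd >> n) & 1u) == 0u;
}

/* Writes the names of the flags set in canes into buf, separated by spaces. */
void describe_canes(uint32_t canes, char *buf, size_t len)
{
    static const struct { uint32_t mask; const char *name; } flags[] = {
        { CANES_TM, "TM" },   { CANES_RM, "RM" },   { CANES_PDA, "PDA" },
        { CANES_CCE, "CCE" }, { CANES_SMA, "SMA" }, { CANES_EW, "EW" },
        { CANES_EP, "EP" },   { CANES_BO, "BO" },
    };
    size_t used = 0;

    if (len == 0) {
        return;
    }
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof flags / sizeof flags[0] && used < len; ++i) {
        if (canes & flags[i].mask) {
            int n = snprintf(buf + used, len - used, "%s%s",
                             used ? " " : "", flags[i].name);
            if (n < 0) {
                break;
            }
            used += (size_t)n;
        }
    }
}
```

With the posted values, `mailbox_is_tx(63, 62, 0)` returns 1, and `describe_canes(16, buf, sizeof buf)` leaves `CCE` in `buf`. Under the stated assumptions, CANES = 16 therefore shows no power-down or bus-off flag, but it does show the change-configuration-enable flag set.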

  • It appears you are encountering the "Unexpected Cessation of Transmit Operation" issue described on page 22 of SPRZ342N. Can you check whether the module continues to receive normally under this condition?
  • Hello Hareesh,

    Can you please post a link to that document? Google doesn't find it ;-)

    Well, I can only speak on behalf of my colleague and will ask him to verify my next statement.

    "Usually we get an interrupt as soon as the CAN controller receives any message but obviously we do not get any interrupt anymore."

    But we can of course read the Rx Mailboxes manually in some background task, clear them, and check afterwards whether new messages arrive (regardless of whether an IRQ is triggered). So we can check the receive direction.

    Please note that it can take about one week until I come back with further information, since the problem occurs at my colleague's office (we have limited access to the test station) and I simply debugged the issue with my colleague via desktop sharing.

    Andreas
  • Andreas,

    www.ti.com/.../sprz342n.pdf

    Whether or not the node continues to receive normally in this anomalous condition is critical to determine if it is indeed the issue mentioned in the errata or something else. I will await further inputs from you.
  • Hello Hareesh,

    OK, I know already that the IRQ function, which is usually being called upon a received message, is not called anymore in our error condition. That means I must write a function, which reads the Rx Mailboxes inside a background task and increment an own Rx read-counter, since something is not working with the IRQ generation.

    Anyway, if we are really facing the ‘Unexpected Cessation of Transmit Operation’ issue, then I expect the following:

    1) We reproduce the error situation.

    2) I ensure that the following function is being called for one time:

    3) I check the following:

        a) I see either on the CAN bus that the message pending inside Mailbox 0 is being directly transmitted.

        b) Or I will insert an own test message inside Mailbox 0 and set bit 0 of CANTRS before I re-enable the IRQs via EINT in my above function.

    Can you please verify if the above function that I wrote called ‘LeaveUnexpectedCessationOfCanTransmit’ contains everything that is needed to get out of that situation that is described inside the errata?

    Can you please advise if we expect the message inside Mailbox 0 is being send when calling the function or if we need to set the transmit request bit again (point 3a vs. 3b)?

    If we don’t see anything on the CAN bus, then we are facing most likely a different issue, independent if Rx continues to work or not (since the errata states that with the above function we get out of that error situation). Would you agree?

    Andreas

  • "OK, I know already that the IRQ function, which is usually being called upon a received message, is not called anymore in our error condition."

    This means that reception has stopped as well, so this doesn't sound like the errata issue. Have you checked if the messages are indeed transmitted on the bus? If you are using a CAN bus analyzer, did you see any error frames on the bus preceding this condition? Is the node in the bus-off condition? If so, you should see the CCR bit set. Do you?

    "That means I must write a function, which reads the Rx Mailboxes inside a background task and increment an own Rx read-counter, since something is not working with the IRQ generation."

    I disagree. If the receive interrupts are not being generated, you need to understand why.

    "Anyway, if we are really facing the ‘Unexpected Cessation of Transmit Operation’ issue, then I expect the following:

    1) We reproduce the error situation."

    Based on our past experience, we have been unsuccessful in reproducing this error condition "at will" in the lab.

    "2) I ensure that the following function is being called for one time:"

    That takes care of the situation, if it is indeed the errata issue. At this point, we are not sure that it is. Also note that your function would work just as well to clear a bus-off condition. Also, what would lead your code to execute that function? Your application should sense some kind of time-out condition in order to call this function, correct?

    "a) I see either on the CAN bus that the message pending inside Mailbox 0 is being directly transmitted.

    b) Or I will insert an own test message inside Mailbox 0 and set bit 0 of CANTRS before I re-enable the IRQs via EINT in my above function."

    Why not take advantage of the Mailbox timeout feature of eCAN? The hardware will automatically take care of this for you, instead of the code.

    "Can you please verify if the above function that I wrote called ‘LeaveUnexpectedCessationOfCanTransmit’ contains everything that is needed to get out of that situation that is described inside the errata?"

    As emphasized before, it is important to determine whether what you are seeing is indeed the errata issue. Considering the fact that reception has ceased as well, I wonder if the node is really in bus-off (BO). On a different note, you are not employing 32-bit R/W while checking the CCE bit. You should do something like the below:

    do
    {
        ECanaShadow.CANES.all = ECanaRegs.CANES.all;
    } while (ECanaShadow.CANES.bit.CCE != 1);    // Wait for CCE bit to be set

    "Can you please advise if we expect the message inside Mailbox 0 is being send when calling the function or if we need to set the transmit request bit again (point 3a vs. 3b)?"

    I think this is what you are asking: Let's assume the module is in the erroneous state. Code calls the LeaveUnexpectedCessationOfCanTransmit() function. Will the message in MBX0 be automatically sent after that, or should the code set TRS once again? Assuming that is your question, the answer is as follows: I expect the data to go out without having to set TRS again. However, I have not verified this in silicon. As mentioned before, I have not been successful in reproducing this condition in the lab.

    "If we don’t see anything on the CAN bus, then we are facing most likely a different issue, independent if Rx continues to work or not (since the errata states that with the above function we get out of that error situation). Would you agree?"

    I am not clear what you are asking here.

  • Hello Hareesh,

    First some answers and comments to your questions:

    1) I agree that we need to identify why the CAN IRQ isn't called anymore when telegrams are on the bus (that is, why no IRQ is raised for received messages). I also agree that we cannot yet say that we are facing the situation described in the errata.

    2) My colleague checked the error history in our device, and we did indeed face a bus-off condition earlier. My colleague cleared that bus-off condition via a function from the CAN stack vendor we are using (the company Port), so the bus-off condition was supposed to be cleared. I can provide you with the source code for that function if you want.

    3) I also have a test station, which unfortunately shows the problem only very rarely. I have not yet been able to connect my debugger to my test station in the error state, but I enabled the auto-bus-on (ABO) bit and therefore stopped calling the software function that clears the bus-off condition. I was still able to recreate the situation even with ABO set. Frankly, though, I cannot yet say whether the situation on my colleague's test station and my problem are the same; I need to run more tests.

    4) We only measure the bus traffic once the error situation has appeared, so I cannot yet say whether there were error frames before. But the previous bus-off condition appears only when there have been error frames, right?

    5) For my colleague's test station, I know that a bus-off condition was present earlier but was cleared afterwards. The CCR bit is definitely 0 when the error situation occurs; it was the first thing we checked. Maybe the source code that clears the bus-off condition has bad timing or does not wait for the CCE bit to be set/cleared as an acknowledgment of the CCR setting. I need to check that.

    6) Yes, my code must sense some timeout for the Tx and then call my function 'LeaveUnexpectedCessationOfCanTransmit'. But this is something we will implement once we have managed to get the messages onto the CAN bus again.

    Thanks for the hint that I should apply a 32-bit read instruction using 'ECanaShadow.CANES.all'. I will update my code.

    7) About my statement:

    "If we don’t see anything on the CAN bus, then we are facing most likely a different issue, independent if Rx continues to work or not (since the errata states that with the above function we get out of that error situation). Would you agree?"

    If we are in the error situation that I described and calling my function does not get us out of it in the Tx direction (that is, we still do not see any message being sent by our device), then we are facing a different problem, because the function I wrote will definitely ensure that a Tx message placed in our mailbox is transmitted on the CAN bus.

    8) About the HW timeout feature, I simply need to read up on it. I do not yet fully understand the functionality, but we will certainly consider it as an option.
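
The software-side Tx timeout mentioned in point 6) could be sketched as below. This is only an illustration under assumptions: `tx_watchdog_t` and `tx_watchdog_poll` are hypothetical names, the caller is assumed to sample the mailbox's TRS bit and a free-running 32-bit counter (for example CANTSC) periodically, and the timeout value is left to the application. It does not model the eCAN hardware time-out feature from point 8).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* State of a software Tx watchdog for one mailbox (hypothetical type). */
typedef struct {
    bool     pending;    /* true while a transmit request is being tracked */
    uint32_t start_tsc;  /* counter value when the request was first seen */
} tx_watchdog_t;

/* Call periodically with the sampled TRS bit and the current counter value.
   Returns true once TRS has stayed set for at least timeout_ticks. */
bool tx_watchdog_poll(tx_watchdog_t *wd, bool trs_set,
                      uint32_t now_tsc, uint32_t timeout_ticks)
{
    assert(wd != NULL);

    if (!trs_set) {
        wd->pending = false;          /* request completed or none pending */
        return false;
    }
    if (!wd->pending) {
        wd->pending = true;           /* first sample with TRS set */
        wd->start_tsc = now_tsc;
        return false;
    }
    /* Unsigned subtraction stays correct across a counter wrap-around. */
    return (uint32_t)(now_tsc - wd->start_tsc) >= timeout_ticks;
}
```

A caller would invoke the recovery function once `tx_watchdog_poll` returns true. Because the sketch only watches the TRS bit, back-to-back requests that are never sampled with TRS cleared would be treated as one long request; an application that queues messages that quickly would need to restart the watchdog on each new request.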

    I suggest that I focus first on the issue that my colleague could reproduce quite easily:

    1) I update my function to ensure that the CANES register is read via 32-bit instruction (the value is loaded into a shadow register).

    2) I let my colleague reproduce the problem.

    3) I let my colleague run the function that I wrote after I updated the function.

    4) After calling my function we expect the CAN controller to transmit messages again on the bus, since we don't see any bad value in the CAN controller registers. Do you agree?

    5) We also try to investigate if Rx is working (why the CAN IRQ isn't called anymore upon receiving messages).

    PS: Our test station is located in Israel and next week the guys celebrate the Pessach holidays, so I am not sure if I can give you further updates before April 30th.

    PPS: We could also show you the problem in a WebEx session. Would that be OK for you (just in case we run out of ideas)?

    Thank you for your support so far. I will keep you updated.

  • Closing this thread since it is being worked offline. Will post resolution, if warranted.
  • Hello Hareesh

    It seems that the situation is as follows:

    - We somehow managed to put the CAN controller into a state in which the CCE bit in the CANES register does not reflect the CCR bit in the CANMC register. This situation is visible in the very first picture that I posted.

    - The CAN controller does not communicate while the CCE bit is set. This is natural behavior, but it would be nice if it were mentioned in the bit description in the CAN controller manual (which only states that you have access to the configuration registers).

    - The error situation was most likely caused by the fact that, after a bus-off condition, we called a third-party ClearBusOff() function too frequently (every 62.5 µs). In addition, this function did not wait for the CCE bit to follow the CCR bit, which violates the following flow diagram.

    - We finally wrote our own function to get us out of a bus-off situation. It follows the above flow diagram and additionally includes a timeout in case the CCE bit does not follow CCR within a certain time (in that case we repeat the sequence 500 ms later). The new function is called 500 ms after a bus-off condition is detected, to allow communication on the bus to stabilize.
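
The recovery sequence described in the last point can be sketched as a small state machine. This is a host-testable illustration and not the function from the thread: the names `can_recovery_t` and `can_recovery_step` are hypothetical, and the 10 ms handshake limit is an assumed value (the post only says "within a certain time"). Only the two 500 ms delays come from the post. The caller is assumed to read BO and CCE from CANES with a 32-bit shadow read, call the step function periodically with a millisecond tick, and then write the `ccr` field to CANMC.CCR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SETTLE_MS            500u  /* delay before each attempt (from the post) */
#define HANDSHAKE_TIMEOUT_MS  10u  /* assumed limit for CCE to follow CCR */

typedef enum {
    REC_IDLE,            /* normal operation, watching for bus-off */
    REC_SETTLE,          /* waiting SETTLE_MS before (re)starting the sequence */
    REC_WAIT_CCE_SET,    /* CCR = 1 requested, waiting for CCE = 1 */
    REC_WAIT_CCE_CLEAR   /* CCR = 0 requested, waiting for CCE = 0 */
} rec_state_t;

typedef struct {
    rec_state_t state;
    uint32_t    t0_ms;   /* time at which the current state was entered */
    bool        ccr;     /* value the caller should write to CANMC.CCR */
} can_recovery_t;

static void enter(can_recovery_t *r, rec_state_t s, uint32_t now_ms)
{
    r->state = s;
    r->t0_ms = now_ms;
}

/* Advances the recovery sequence by one step. bus_off and cce are the
   sampled BO and CCE flags, now_ms is a free-running millisecond tick. */
void can_recovery_step(can_recovery_t *r, bool bus_off, bool cce, uint32_t now_ms)
{
    assert(r != NULL);
    uint32_t elapsed = now_ms - r->t0_ms;   /* wrap-safe unsigned difference */

    switch (r->state) {
    case REC_IDLE:
        if (bus_off) {
            enter(r, REC_SETTLE, now_ms);
        }
        break;
    case REC_SETTLE:
        if (elapsed >= SETTLE_MS) {
            r->ccr = true;                  /* request configuration mode */
            enter(r, REC_WAIT_CCE_SET, now_ms);
        }
        break;
    case REC_WAIT_CCE_SET:
        if (cce) {
            r->ccr = false;                 /* request normal mode again */
            enter(r, REC_WAIT_CCE_CLEAR, now_ms);
        } else if (elapsed >= HANDSHAKE_TIMEOUT_MS) {
            enter(r, REC_SETTLE, now_ms);   /* CCE did not follow: retry later */
        }
        break;
    case REC_WAIT_CCE_CLEAR:
        if (!cce) {
            enter(r, REC_IDLE, now_ms);     /* handshake complete */
        } else if (elapsed >= HANDSHAKE_TIMEOUT_MS) {
            enter(r, REC_SETTLE, now_ms);   /* CCE stuck: retry later */
        }
        break;
    }
}
```

The point of the design is that CCR is never toggled again before CCE has either acknowledged the previous change or the timeout has expired, which is the condition the too-frequent ClearBusOff() calls violated. Whether bus-off should be re-checked after the settle delay is left open here.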

    It looks like the problem is solved.

    Thank you very much for the support.

    Andreas