EDMA misses a McBSP1 data receive event when too busy?

Other Parts Discussed in Thread: TMS320C6713B, SM320C31

Hi, I have my C6713 communicating with three serial devices through three serial ports (McASP1, McBSP0 and McBSP1) using the EDMA.

I trigger the three devices simultaneously (and repeatedly) and expect all three to send their response data back to the C6713 at the same time after each trigger.

I monitored the data being sent back to the C6713 with an oscilloscope and set a breakpoint in my program to detect when it goes astray.

I discovered that the EDMA CIPR register occasionally fails to register a data receive event from McBSP1!

Has anyone seen a similar problem when multiple serial ports on the C6713 are serviced by the EDMA?

Can the EDMA CIPR miss an event like this when the EDMA is too busy?

Are there any solutions to this??? Please help!

C.M.

  • CMA,

    There is not enough information to give you a really good answer. How fast is the DSP running? How fast are the serial ports running? What data width are you handling with the EDMA?

    To offer at least two uneducated guesses: the first that comes to mind is that a race condition in the part of your ISR that clears the CIPR bits is causing you to miss or wrongly clear a CIPR bit.

    The second would be loading, as you suggest. If you want to eliminate system loading as the cause, slow the serial clock rates down by a factor of 10 and see if the problem goes away or not.

    Regards,
    RandyP

  • Hi Randy,

    The DSP core is running at 200MHz, the McBSP and McASP transmit clocks are at 12.5MHz, and the receive clocks are at 10MHz. I'm no longer sure what may be causing this crashing problem. It happens randomly while the same process is being repeated: sometimes after less than a second, sometimes after hours! The shortest repetition period I have used is 300 microseconds. I tried a 10ms repetition period and it still crashes randomly.

    I'm implanting many debugging nibblets all over the program to generate pulses on different unused GPIO pins to see if I can catch the cause now. It's frustrating!

    C.M.

  • It does sound like you have a lot of good experience at debugging. Best of luck to you.

    What evidence leads you to conclude that the CIPR is missing an event?

    I had some other questions in my other post, but you can decide what level of detail you want us to work with.

    Regards,
    RandyP

    Oh, I'm receiving just four 16-bit words through each of the three McASP1/McBSP0/McBSP1 ports from the external devices. Since I strongly suspect that the random timing of the EDMA receive processes may be causing this random crashing problem, I have now tied the FSRs and the CLKRs of the three ports together to receive one identical data set simultaneously. I noticed that the order in which the EDMA handles the receive processes for the three ports still sometimes changes, contrary to my expectation.

    I noticed that one can set a 'priority' for an EDMA receive event. However, the EDMA user guide does not give many details about this setting. Do you know how the behavior of the EDMA receive process is affected by giving the serial ports different priorities, or the same priority? (I tried to figure this out by experimenting before, but could not draw any solid conclusion from what I observed.)

    C.M.

     

  • C.M.,

    CMA said:
    I'm receiving just four 16-bit words through each of the three McASP1/McBSP0/McBSP1 ports

    Do you get an interrupt from each port after each 16-bit half-word comes in? That would be a total of 12 expected interrupts.

    CMA said:
    I have now tied the FSRs and the CLKRs of the three ports together to receive one identical data set simultaneously.

    At 10MHz receive clock rate for 16-bit half-words, that is 1.6us. At 200MHz CPU clock, that is 320 DSP clocks for every three interrupts from the EDMA. That is not an infinite amount of time.
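    The same budget can be recomputed for other clock settings with a couple of lines of integer arithmetic. This is only a sketch; the helper names are hypothetical, and like the estimate above it assumes one EDMA event per received 16-bit word.

```c
#include <stdint.h>

/* Hypothetical helper: CPU cycles that elapse while one serial word is
   shifted in, i.e. (bits / serial clock) * CPU clock, in integer math. */
static uint32_t cycles_per_word(uint64_t cpu_hz, uint64_t serial_hz,
                                unsigned bits)
{
    return (uint32_t)(cpu_hz * bits / serial_hz);
}

/* Hypothetical helper: duration of one serial word in nanoseconds. */
static uint32_t word_time_ns(uint64_t serial_hz, unsigned bits)
{
    return (uint32_t)(1000000000ull * bits / serial_hz);
}
```

    For example, cycles_per_word(200000000, 10000000, 16) gives 320 and word_time_ns(10000000, 16) gives 1600 (1.6us).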

    If you slow down the clock rate and the problem goes away, then the problem is most likely DSP loading which also means system design.

    If the problem remains, then it is probably the logic within the ISR and EDMA interrupt dispatcher.

    CMA said:
    I noticed that one can set a 'priority' for an EDMA receive event.

    Please be more specific on where you noticed this. I doubt very much that there would be anything to improve your situation, because you only have three EDMA events and they all have to be done at the same time.

    The big benefit of using the EDMA is that the DSP does not have to respond with an interrupt after every word comes in on the serial ports. Let the EDMA read in all four samples before sending an interrupt, and you will save a lot of DSP ISR overhead.
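    To make "let the EDMA read in all four samples before sending an interrupt" concrete, here is a sketch that packs the OPT and CNT words for a receive channel moving four 16-bit elements with one completion interrupt. The bit positions are our reading of the C621x/C671x OPT and CNT layouts in spru234 and are assumptions to verify there; the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed C621x/C671x OPT field positions (verify against spru234). */
#define OPT_PRI_SHIFT   29u          /* PRI: 001 = high, 010 = low        */
#define OPT_ESIZE_SHIFT 27u          /* ESIZE: 00 = 32-bit, 01 = 16-bit   */
#define OPT_DUM_SHIFT   21u          /* DUM: 01 = increment destination   */
#define OPT_TCINT       (1u << 20)   /* request a completion interrupt    */
#define OPT_TCC_SHIFT   16u          /* TCC: CIPR bit set on completion   */
#define OPT_LINK        (1u << 1)    /* reload from the linked set        */

/* Hypothetical helper: OPT for a receive channel that moves 16-bit
   elements from a fixed source (SUM = 00) into an incrementing buffer
   and sets CIPR bit `tcc` when the transfer completes. */
static uint32_t rx_opt(unsigned tcc, int link)
{
    return (1u << OPT_PRI_SHIFT)            /* high-priority queue      */
         | (1u << OPT_ESIZE_SHIFT)          /* 16-bit elements          */
         | (1u << OPT_DUM_SHIFT)            /* destination increments   */
         | OPT_TCINT
         | ((uint32_t)(tcc & 0xFu) << OPT_TCC_SHIFT)
         | (link ? OPT_LINK : 0u);
}

/* Hypothetical helper: CNT word, FRMCNT (frames minus one) in bits 31:16
   and ELECNT (elements per frame) in bits 15:0. */
static uint32_t rx_cnt(unsigned frames, unsigned elements)
{
    return ((uint32_t)(frames - 1u) << 16) | (elements & 0xFFFFu);
}
```

    With these assumptions, rx_opt(13, 0) requests completion on CIPR bit 13 (0x2000, the bit the ISR later in this thread tests for McBSP0), and rx_cnt(1, 4) yields ELECNT = 4 with FRMCNT = 0, i.e. a single frame.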

    My opinions, at least.

    Regards,
    RandyP

  • Hi Randy,

    [Do you get an interrupt from each port after each 16-bit half-word comes in? That would be a total of 12 expected interrupts.]

    I set up the EDMA PaRAM to generate one receive event after the whole set of four half-words has been received, so there should be one interrupt per receive event: one interrupt for each serial port, and at most three interrupts per cycle for the three ports.

    Wait ... I made my interrupt service routine clear as many 'receive events' as it sees in the CIPR. Does the EDMA generate one interrupt to the CPU for every bit set in the CIPR??? If it does, and my ISR clears more than one bit in the CIPR, does that leave some interrupt(s) unhandled??? (Just a thought that came to mind.) Or does clearing, say, two bits in the CIPR settle two interrupts to the CPU?????

    [At 10MHz receive clock rate for 16-bit half-words, that is 1.6us. At 200MHz CPU clock, that is 320 DSP clocks for every three interrupts from the EDMA. That is not an infinite amount of time.]

    There should be 3 receive events per cycle, and the cycles I have tested repeat only every 300us or longer, so there should be at most 3 interrupts per 300us, if I understood correctly.

    [If you slow down the clock rate and the problem goes away, then the problem is most likely DSP loading which also means system design.]

    I implanted debugging nibblets to generate pulses on unused GPIO pins when some flags go abnormal, but ever since I implanted them I haven't had a single crash in more than a full day of continuous testing!

    [If the problem remains, then it is probably the logic within the ISR and EDMA interrupt dispatcher.]

    I haven't caught any solid evidence of the cause yet. So far, whenever the process goes astray, one of the three flags that should be cleared when the three expected interrupts are serviced is always left uncleared, which made me suspect that an EDMA interrupt is being missed. However, because I can't see all the signals over an entire process cycle in sufficient detail on my scope, this is still a questionable 'suspicion'.

    [Please be more specific on where you noticed this. I doubt very much that there would be anything to improve your situation, because you only have three EDMA events and they all have to be done at the same time.]

    This is in Section 3.1 of the EDMA Reference Guide ('spru234').

    [The big benefit of using the EDMA is that the DSP does not have to respond with an interrupt after every word comes in on the serial ports. Let the EDMA read in all four samples before sending an interrupt, and you will save a lot of DSP ISR overhead.]

    Yes, this is why I'm using it.

    Thanks for the discussions so far. It has given me some ideas to try out.

    C.M.

     

  • C.M.,

    [Please forgive me if you know this already, but you can easily make quotation boxes when editing a reply to a post. The text from the previous post is shown above your edit window for your reply. If you select some text in that previous post and then click the red Quote link at the bottom of the previous post, a quote box will be inserted with the name of the previous post's author and the text that they have written.]

    CMA said:
    Wait ... I made my interrupt service routine clear as many 'receive events' as it sees in the CIPR.

    Honestly, I have not used this processor for a very long time, so I do not know the detailed structure and timing of the interrupt generation. The EDMA Reference Guide Section 1.15 explains what needs to be done when servicing an EDMA interrupt. You may have a race condition in that code. I do not know the details, but the Ref Guide will help you.
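    As an illustration of how such a race could lose an event, here is a small host-side model. It is not TI code, and it rests on an assumption to check against spru234: that the EDMA drives one shared interrupt to the CPU, and that the CPU only registers a new interrupt when the enabled pending bits (CIPR & CIER) go from zero to non-zero. Under that assumption, an ISR that reads CIPR once, clears only what it read and returns can strand a completion that lands in between. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Host-side model of the ASSUMED behavior, not the real hardware. */
typedef struct {
    uint32_t cipr;            /* pending completion bits           */
    uint32_t cier;            /* enabled completion bits           */
    int      cpu_interrupts;  /* interrupts delivered to the CPU   */
} edma_model;

/* A transfer completes: set its CIPR bit and deliver a CPU interrupt
   only on a 0 -> non-zero transition of the enabled pending bits. */
static void complete_transfer(edma_model *m, uint32_t bit)
{
    int was_idle = (m->cipr & m->cier) == 0;
    m->cipr |= bit;
    if (was_idle && (m->cipr & m->cier) != 0)
        m->cpu_interrupts++;
}

/* CIPR is write-1-to-clear: writing a mask clears exactly those bits. */
static void write_cipr(edma_model *m, uint32_t mask)
{
    m->cipr &= ~mask;
}

/* Single-pass ISR: read once, let `late` complete mid-ISR (0 = none),
   clear only the bits that were read, then return. */
static uint32_t single_pass_isr(edma_model *m, uint32_t late)
{
    uint32_t seen = m->cipr & m->cier;
    if (late != 0)
        complete_transfer(m, late);
    write_cipr(m, seen);
    return seen;
}
```

    Starting from edma_model m = {0, 0xA080, 0}, complete_transfer(&m, 0x2000) delivers one interrupt; single_pass_isr(&m, 0x8000) then returns 0x2000 and leaves m.cipr == 0x8000 with m.cpu_interrupts still 1, so in this model the McBSP1 completion stays pending and later completions raise no interrupt either.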

    CMA said:
    [Please be more specific on where you noticed this. I doubt very much that there would be anything to improve your situation, because you only have three EDMA events and they all have to be done at the same time.]

    This is in Section 3.1 of the EDMA Reference Guide ('spru234').

    That section is just saying that events are not prioritized as they are placed into the queues, but the queues themselves have different priorities. I still doubt this is related to your problem.

    CMA said:
    I implanted debugging nibblets to generate pulses on unused GPIO pins when some flags go abnormal, but ever since I implanted them I haven't had a single crash in more than a full day of continuous testing!

    Did you put some of the debugging nibblets in the ISR? They may have changed some critical timing.

    Good debugging.

    Regards,
    RandyP

     

  • Hi Randy, after another full week of trying to catch the cause of the occasional timeout crashes, and ruling out every possible cause I could find in my own program, I have only one guess/suspicion/conclusion left.

    That is, after the McBSP/McASP received a full set of data, either the port did not trigger an EDMA transfer, or the EDMA did not complete the transfer request, or the EDMA did not set the CIPR bit after completing the transfer request. I'm sure that it's a 'miss' and not a 'delay', because the problem happens regardless of how fast or slowly I repeat the process.

    In other words, whenever the crash occurred (I stopped the timer upon detecting that the program had gone astray), I could clearly see with an oscilloscope that the data for the McBSP/McASP had arrived on time as normal, but the ISR that should be invoked upon receiving the data was not invoked, which caused my program to time out and go astray.

    Is there any possibility that I still have not set up some register(s) properly?

    (Judging from the fact that it sometimes runs properly for hours, repeating at 4500Hz, I doubt there is any solution to this problem, but I thought I'd ask anyway since I don't know how this chip is designed internally.)

    C.M.

  • Are you using EDMA_intDispatcher?  I think the clearing of bits in the CIPR is probably key to this whole issue.  I recommend posting your ISR code (at least the parts that touch EDMA/CPU registers).

  • Hi Brad,

    I saw the word 'Dispatcher' somewhere in CCS a long time ago, but now I can't even find it anymore. In any case, I have never touched this setting, so it should still be in its default state. I never understood what this setting is for. Is there something I should understand here?

    Below is my complete original ISR code (I've removed the debugging code I inserted after detecting the problem). I hope you can spot some problems, because I'm very frustrated with this problem.

    interrupt void ReceiveDataEDMA (void)
    {
          temp_flags = *EDMACIPR & 0x0000a080;   // Read to find out which serial port interrupts are pending?

          while (temp_flags != 0)
          {
                if ((temp_flags & 0x00002000) != 0)    // Check whether McBSP0 received a data frame from C50-2 by EDMA?
                {            // If yes, register the event

                      Data_Pending_Flags &= 0xffffdfff;   // Clear the data pending flag bit corresponding to C50-2 since a data set received interrupt event occurred.
                      *PaRAMMcBSP0RCNT = rx_no_of_words;  // Reset the CNT part of McBSP0 rcv PaRAM to enable another rcv event
                }

                if ((temp_flags & 0x00008000) != 0)    // Check whether McBSP1 received a data frame from C50-1 (Right Eye) by EDMA?
                {            // If yes, register the event

                      Data_Pending_Flags &= 0xffff7fff;   // Clear the data pending flag bit corresponding to C50-1 (Right Eye) since a data set received interrupt event occurred.
                      *PaRAMMcBSP1RCNT = rx_no_of_words;  // Reset the CNT part of McBSP1 rcv PaRAM to enable another rcv event
                }

                if ((temp_flags & 0x00000080) != 0)    // Check whether McASP1 received a data frame from C50-3 (Left Eye) by EDMA?
                {            // If yes, register the event

                      Data_Pending_Flags &= 0xffffff7f;   // Clear the data pending flag bit corresponding to C50-3 (Left Eye) since a data set received interrupt event occurred.
                      *PaRAMMcASP1RCNT = rx_no_of_words;  // Reset the CNT part of McASP1 rcv PaRAM to enable another rcv event
                }

                *EDMACIPR = temp_flags;      // Clear CIPR bits (i.e., clear interrupts) which were set upon entry to this ISR by writing 1 to each
          }

    }

  • You may have a race condition with Data_Pending_Flags, depending on how it is used in your code. If it is tested and set without disabling interrupts around the test & set, then an EDMA interrupt could disturb the test & set.

    You should not need to write to RCNT. You can use the linking process to have the EDMA automatically restore all PARAM values from a Link PARAM Set. The name RCNT implies to me that this is the ELERLD field, but the usage implies the FRMCNT/ELECNT word.
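    For reference, a host-side sketch of the linking approach: the channel set and a reload set carry the same transfer description, and both link to the reload set, so each completion restores the full count without the ISR rewriting CNT. The layout (six 32-bit words per set, PaRAM at 0x01A00000 with the 16 channel sets followed by the reload sets, LINK in the low half of RLD) is our reading of spru234 for the C6713 and should be verified; the type and function names are hypothetical.

```c
#include <stdint.h>

/* One PaRAM set: six 32-bit words (24 bytes), in this order. */
typedef struct {
    uint32_t opt, src, cnt, dst, idx, rld;
} param_set;

#define PARAM_BASE      0x01A00000u  /* assumed C6713 PaRAM base         */
#define PARAM_SET_BYTES 24u
#define NUM_CHANNELS    16u          /* channel sets precede reload sets */

/* Byte address of reload set n (assumed layout). */
static uint32_t reload_set_addr(unsigned n)
{
    return PARAM_BASE + (NUM_CHANNELS + n) * PARAM_SET_BYTES;
}

/* RLD word: ELERLD in bits 31:16, LINK (low 16 bits of the address of
   the set to reload from) in bits 15:0. */
static uint32_t make_rld(uint32_t elerld, uint32_t link_addr)
{
    return (elerld << 16) | (link_addr & 0xFFFFu);
}

/* Copy one transfer description into the channel set and the reload set.
   Both link to the reload set, so after every completion the EDMA
   reloads the channel from an unmodified copy. proto->opt is expected
   to have the OPT LINK bit set. */
static void setup_self_linked(param_set *chan, param_set *reload,
                              const param_set *proto, uint32_t reload_addr)
{
    param_set p = *proto;
    p.rld = make_rld(0u, reload_addr);
    *chan = p;
    *reload = p;
}
```

    On the target, chan and reload would point into PaRAM (the channel's own set and, for example, reload_set_addr(0)); as plain structs the packing is easy to check.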

    How does your ISR avoid being an infinite loop? I do not see any code that changes temp_flags within the while-loop. If this is really all of your ISR code and it does not result in an infinite loop, then you must be getting some lucky side-effect from the global variable and re-entrant interrupts.

    Regards,
    RandyP

  • RandyP said:

    You may have a race condition with Data_Pending_Flags, depending on how it is used in your code. If it is tested and set without disabling interrupts around the test & set, then an EDMA interrupt could disturb the test & set.

    Data_Pending_Flags is set equal to 0x0000A080 only after the time is up for the next repetition, which is exactly when I declare a fatal timeout if the ISR has not yet cleared Data_Pending_Flags to 0. Everywhere else in the program, Data_Pending_Flags is only tested, never modified.
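    If the timeout check and the re-arm described here can ever be interrupted by the EDMA ISR, they can be done as one step with interrupts masked, which is what the test & set concern points to. A minimal sketch; irq_save() and irq_restore() are hypothetical placeholders (no-ops here) that you would map to your environment's global interrupt disable/restore calls, for example DSP/BIOS HWI_disable()/HWI_restore().

```c
#include <stdint.h>

/* Hypothetical placeholders: no-ops on a host. On the target, map them to
   the global interrupt disable/restore your environment provides. */
static uint32_t irq_save(void)            { return 0u; }
static void     irq_restore(uint32_t key) { (void)key; }

static volatile uint32_t data_pending_flags;   /* cleared bit by bit in the ISR */

/* Main-loop side, run when the time for the next repetition is up:
   return the bits the ISR has not cleared (non-zero means timeout) and
   re-arm all expected bits, with no ISR able to run in between. */
static uint32_t check_and_rearm(uint32_t expected)
{
    uint32_t key = irq_save();
    uint32_t missing = data_pending_flags;
    data_pending_flags = expected;
    irq_restore(key);
    return missing;
}
```

    A return value of 0x00008000 from check_and_rearm(0x0000A080) would mean only the McBSP1 bit was left set.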

    RandyP said:

    You should not need to write to RCNT. You can use the linking process to have the EDMA automatically restore all PARAM values from a Link PARAM Set. The name RCNT implies to me that this is the ELERLD field, but the usage implies the FRMCNT/ELECNT word.

    I worked on the PARAM sets more than half a year ago. At that time I found that, for some reason, when I used the linking process to have all PARAM values restored automatically, my program would hang and appear to be locked in the EDMA process (in an infinite loop). I then found that simply setting the FRMCNT/ELECNT word in the linking PARAMs to 0 resolved the hang. Since then I've been manually setting the FRMCNT/ELECNT in the main PARAMs to a nonzero value to re-enable the EDMA process, as done in my current ISR. I don't think this should cause the occasional failure at issue now, though. Or is it possible???

    RandyP said:

    How does your ISR avoid being an infinite loop? I do not see any code that changes temp_flags within the while-loop. If this is really all of your ISR code and it does not result in an infinite loop, then you must be getting some lucky side-effect from the global variable and re-entrant interrupts.

    temp_flags is set equal to (*EDMACIPR & 0x0000a080) in the very first line of my ISR. It is not tested or used anywhere else in the program. It was originally declared as a local variable of the ISR, but because I suspected everything, I changed it to a global to see whether that made any difference to the problem, and it has stayed global since.

    C.M.

  • Gentlemen, I think I finally have very solid evidence that there is something wrong with the McBSP1-EDMA part of the C6713B chip.

         I have done three major tests over the past few weeks, and below are my findings:

    Test 1:

    I connected the FSRs and CLKRs of McBSP1, McBSP0 and McASP1 in parallel so that they repeatedly received the same single set of data from one of my three external devices (the devices that produce the data my C6713B needs to receive), and ran the repetition tests. I found no intermittent failures at all (it ran through an entire weekend at a 210us repetition period without a problem).

    Test 2:

    I connected the FSRs and CLKRs of the three serial ports (McBSP0, McBSP1 and McASP1) to my three independent but identical external devices, and always saw intermittent failures (i.e., at least one port timed out without receiving the data from its associated external device). Sometimes it took hours before a failure; sometimes it failed within a second.

    Test 3:

    Although I never observed any missing data from the external devices, I (being a good engineer???) still suspected that maybe my oscilloscope or my eyes were no good, and that one of the external devices actually failed to produce its data as it should, causing the intermittent failure. So I tied the FSRs and CLKRs of McBSP0 and McBSP1 in parallel to receive the same data from one of the external devices, while leaving the McASP1 port to receive data from a second external device.

       Guess what .... my C6713 still failed to receive all data! And ... the only port that timed out without receiving the data is the McBSP1 port, the port tied in parallel with the McBSP0!!!

    My Analysis:

    The McBSP serial ports in the C6713 have some kind of race-type timing problem. The problem would appear when data frames are received by the three serial ports in a certain order and the timing between the receive instants meets some narrow condition(s). This explains why the failure does not happen in my Test 1, where the data received by the three ports are handled in a fixed order, with no randomness between the receive instants to push the ports into the narrow problem condition. In Test 3, the data received by McASP1 provided the randomness that occasionally put McBSP1's operation into the problem situation.

    After 4 years of development, two by myself and two by my predecessor!!!

    C.M.

  • [ed - This post is in response to the post starting with "Data_Pending_Flags is set equal to". Our posts crossed in the process of writing them. I believe the debug code below will still benefit this process, and it would be interesting for you to analyze the results from all three tests using this data capture technique. -RP]

    C.M.,

    My point-of-view on these in reverse order:

    temp_flags: The code shown, without any external side-effects or re-entrant ISRs is an infinite loop. That is how I read the C code. If we pull out everything that only tests temp_flags and everything that does not access temp_flags, you have

    interrupt void ReceiveDataEDMA (void)
    {
          temp_flags = *EDMACIPR & 0x0000a080;   // Read to find out which serial port interrupts are pending?

          while (temp_flags != 0)
          {
          }
    }

    Since you do not seem to have the symptoms of an infinite loop, my diagnosis is that you have some problems in your system code and functional behavior.

     

    PARAM: We can document how things work and how we recommend you use the EDMA, but if you find something different that works, there is no direct reason to change it. On the other hand, if you have problems, we cannot help you fix code that is written differently than we recommend. It is not wrong, just not feasible for us to predict how everything you could write could work or fail.

     

    Data_Pending_Flag: You understand the timing of your system, and I do not. I am not really clear when things are happening "after the time is up for the next repetition". Your explanations do tell me that you have a clear direction with how you expect things to work in your code and how to proceed with it. I fully expect you will have this figured out soon, and I look forward to you posting what the problem was and the solution, so future readers will benefit from your experience.

    What I would do as a debug technique is to create a large array and an index in global memory, and save some information in it like this:

    unsigned int debug_array[100];  // be sure to clear the contents somewhere
    int debug_index = 0;

    interrupt void ReceiveDataEDMA (void)
    {
          temp_flags = *EDMACIPR & 0x0000a080;   // Read to find out which serial port interrupts are pending?

          if ( debug_index < 100 )
          {
                debug_array[debug_index++] = temp_flags;
                debug_array[debug_index++] = CLK_gethtime();  // or read some free-running timer
          }


          while (temp_flags != 0)
          {
          }
    }

    After running the code, take a look at what the values are in the debug_array so you can figure out what values are read from CIPR and how far apart those reads are. That will be an excellent datapoint.

    Then change the if-statement to

          if ( debug_index >= 100 )
                debug_index = 0;
          debug_array[debug_index++] = temp_flags;
          debug_array[debug_index++] = CLK_gethtime();  // or read some free-running timer

    Assuming that interrupts stop when you detect the failure (or that you put a breakpoint in the code so CCS halts the processor), you will be able to see the last 50 flag/timestamp pairs. This will also be a valuable datapoint.
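    A variant of the same capture idea keeps each flags/timestamp pair together as one record in a ring buffer, so the newest records can be read back in order at a breakpoint. The names and the 8-record size are illustrative assumptions; on the target, call trace_isr(temp_flags, CLK_gethtime()) (or another free-running timer) at ISR entry, and, as with debug_array, make sure the buffer and count are cleared at startup.

```c
#include <stdint.h>

#define TRACE_LEN 8u   /* power of two, so the unsigned index wraps cleanly */

typedef struct {
    uint32_t flags;   /* CIPR bits seen on ISR entry */
    uint32_t stamp;   /* timer value at ISR entry    */
} trace_entry;

static trace_entry       trace_buf[TRACE_LEN];
static volatile uint32_t trace_count;   /* total records ever written */

/* Called at ISR entry; once the buffer is full, overwrites the oldest. */
static void trace_isr(uint32_t flags, uint32_t stamp)
{
    trace_entry *e = &trace_buf[trace_count % TRACE_LEN];
    e->flags = flags;
    e->stamp = stamp;
    trace_count++;
}

/* k = 0 is the newest record; valid for k < min(trace_count, TRACE_LEN). */
static trace_entry trace_recent(uint32_t k)
{
    return trace_buf[(trace_count - 1u - k) % TRACE_LEN];
}
```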

    Beyond this, I am not sure what help to offer.

    Regards,
    RandyP

  • RandyP said:

    temp_flags: The code shown, without any external side-effects or re-entrant ISRs is an infinite loop. That is how I read the C code. If we pull out everything that only tests temp_flags and everything that does not access temp_flags, you have

    interrupt void ReceiveDataEDMA (void)
    {
          temp_flags = *EDMACIPR & 0x0000a080;   // Read to find out which serial port interrupts are pending?

          while (temp_flags != 0)
          {
          }
    }

    Since you do not seem to have the symptoms of an infinite loop, my diagnosis is that you have some problems in your system code and functional behavior.

    Hi Randy,  I guess you missed the line within my while loop:

         *EDMACIPR = temp_flags;        // Clear CIPR bits (i.e., clear interrupts) which were set upon entry to this ISR by writing 1 to each

    This assignment would clear the CIPR bits which were set on entry to the ISR, because these bits are of the 'W1C' (write 1 to clear) type.

    RandyP said:

    Data_Pending_Flag: You understand the timing of your system, and I do not. I am not really clear when things are happening "after the time is up for the next repetition". Your explanations do tell me that you have a clear direction with how you expect things to work in your code and how to proceed with it. I fully expect you will have this figured out soon, and I look forward to you posting what the problem was and the solution, so future readers will benefit from your experience.

    Yes, I certainly appreciate your point here. However, I think the timing of my program no longer matters: two serial ports tied in parallel occasionally disagree, with one reporting that it received the data and the other reporting that it didn't, and that alone should be enough to show that something is wrong with the serial ports.

    RandyP said:

    What I would do as a debug technique is to create a large array and an index in global memory, and save some information in it like this:

    unsigned int debug_array[100];  // be sure to clear the contents somewhere
    int debug_index = 0;

    interrupt void ReceiveDataEDMA (void)
    {
          temp_flags = *EDMACIPR & 0x0000a080;   // Read to find out which serial port interrupts are pending?

          if ( debug_index < 100 )
          {
                debug_array[debug_index++] = temp_flags;
                debug_array[debug_index++] = CLK_gethtime();  // or read some free-running timer
          }


          while (temp_flags != 0)
          {
          }
    }

    After running the code, take a look at what the values are in the debug_array so you can figure out what values are read from CIPR and how far apart those reads are. That will be an excellent datapoint.

    I have already done something similar over the past few weeks to keep track of the last few states before exiting the ISR, though with more simple-minded code, as follows:

    interrupt void ReceiveDataEDMA (void)
    {
         temp_flags = *EDMACIPR & 0x0000a080;   // Read to find out which serial port interrupts are pending?

         while (temp_flags != 0)
         {
                if ((temp_flags & 0x00002000) != 0)    // Check whether McBSP0 received a data frame from C50-2 (Center Eye) by EDMA?
                {            

                      Data_Pending_Flags &= 0xffffdfff;   // Clear the data pending flag bit corresponding to C50-2 since a data set received interrupt event occurred.

                      *PaRAMMcBSP0RCNT = rx_no_of_words;  // Reset the CNT part of McBSP0 rcv PaRAM to enable another rcv event
                }

                if ((temp_flags & 0x00008000) != 0)    // Check whether McBSP1 received a data frame from C50-1 (Right Eye) by EDMA?
                {            

                       Data_Pending_Flags &= 0xffff7fff;   // Clear the data pending flag bit corresponding to C50-1 (Right Eye) since a data set received interrupt event occurred.
                       *PaRAMMcBSP1RCNT = rx_no_of_words;  // Reset the CNT part of McBSP1 rcv PaRAM to enable another rcv event
                }

                if ((temp_flags & 0x00000080) != 0)    // Check whether McASP1 received a data frame from C50-3 (Left Eye) by EDMA?
                {            

                      Data_Pending_Flags &= 0xffffff7f;   // Clear the data pending flag bit corresponding to C50-3 (Left Eye) since a data set received interrupt event occurred.
                      *PaRAMMcASP1RCNT = rx_no_of_words;  // Reset the CNT part of McASP1 rcv PaRAM to enable another rcv event
                }

                *EDMACIPR = temp_flags;     // Clear CIPR bits (i.e., clear interrupts) which were set upon entry to this ISR by writing 1 to each


                temp_flags8=temp_flags7;     // Keep the previous temp_flags (CIPR snapshot) for debugging; temp_flags8 is the oldest
                temp_flags7=temp_flags6;     // Keep the previous temp_flags for debugging
                temp_flags6=temp_flags5;     // Keep the previous temp_flags for debugging
                temp_flags5=temp_flags4;     // Keep the previous temp_flags for debugging
                temp_flags4=temp_flags3;     // Keep the previous temp_flags for debugging
                temp_flags3=temp_flags2;     // Keep the previous temp_flags for debugging
                temp_flags2=temp_flags1;     // Keep the previous temp_flags for debugging
                temp_flags1=temp_flags;      // Most recent temp_flags value

         }
    }

    In fact, it's this additional global 'temp_flags' tracking code that made me so certain about the problem with this chip.

    Could you report this to the chip designer(s) to see what they say about it? After spending so much time trying to make use of this chip, it would be nice to get some feedback from the responsible people.

    C.M.

  • CMA said:

    Hi Randy,  I guess you missed the line within my while loop:

         *EDMACIPR = temp_flags;        // Clear CIPR bits (i.e., clear interrupts) which were set upon entry to this ISR by writing 1 to each

    This assignment would clear the CIPR bits which were set on entry to the ISR, because these bits are of the 'W1C' (write 1 to clear) type.

    That clears the bits in EDMACIPR.  It does not, however, clear the bits in temp_flags, which is what your loop is conditioned upon (not EDMACIPR).  That's why Randy and I  both think you have an infinite loop.  Though I wouldn't expect your program to function at all if that were the case, so I'm a little puzzled.

    One tweak I'd recommend is to write to EDMACIPR to clear the pending flags immediately after reading the register.  Something like this:

    temp = EDMACIPR & EDMACIER;

    EDMACIPR = temp;

    // process bits set in temp

    That will allow new events to get latched immediately and reduce the possibility of a dropped event.
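    Putting this suggestion together with a loop that re-reads CIPR on every pass gives something like the sketch below. It is written to run on a host: the register stand-ins, cipr_clear() (which models the write-1-to-clear behavior) and the test hook are hypothetical, and on the C6713 the reads and writes would go through the EDMACIPR/EDMACIER pointers and the function would carry the interrupt keyword. The PaRAM CNT rewrites are left out, assuming the channels are reloaded by linking.

```c
#include <stdint.h>

#define RX_MCBSP0 0x00002000u   /* CIPR bit 13, as tested in the ISR above */
#define RX_MCBSP1 0x00008000u   /* CIPR bit 15 */
#define RX_MCASP1 0x00000080u   /* CIPR bit 7  */
#define RX_MASK   (RX_MCBSP0 | RX_MCBSP1 | RX_MCASP1)

/* Host stand-ins for the memory-mapped registers (hypothetical). */
static volatile uint32_t cipr_reg, cier_reg;
static volatile uint32_t data_pending_flags;   /* same bit positions as CIPR */

/* Optional harness hook, run right after the clear to simulate a
   transfer completing while the ISR is running. */
static void (*mid_isr_hook)(void);

static uint32_t cipr_read(void) { return cipr_reg; }

/* Models write-1-to-clear; on the target this would be *EDMACIPR = mask. */
static void cipr_clear(uint32_t mask) { cipr_reg &= ~mask; }

/* On the target this would be declared with the `interrupt` keyword. */
static void receive_data_edma(void)
{
    uint32_t pending;

    /* Re-read CIPR on every pass and leave only when nothing enabled is
       pending, so a completion that arrives mid-ISR is not left behind. */
    while ((pending = cipr_read() & cier_reg & RX_MASK) != 0u) {
        cipr_clear(pending);              /* clear first so new events latch */
        if (mid_isr_hook)
            mid_isr_hook();
        data_pending_flags &= ~pending;   /* mark these ports as received */
    }
}

/* Example hook: one McBSP1 completion lands after the first clear. */
static void late_mcbsp1_once(void)
{
    static int fired;
    if (!fired) {
        fired = 1;
        cipr_reg |= RX_MCBSP1;
    }
}
```

    The loop exits only when no enabled bit is pending, so the late McBSP1 completion injected by late_mcbsp1_once() is handled on a second pass before the ISR returns.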

     

    CMA said:
    I think the timing of my program no longer matters: two serial ports tied in parallel occasionally disagree, with one reporting that it received the data and the other reporting that it didn't, and that alone should be enough to show that something is wrong with the serial ports.

    With all due respect this device has been out for more than 8 years.  We have hundreds of customers with thousands of designs shipping millions of units.  We've been through multiple silicon revisions to fix critical bugs and this device is rock solid.  I've had several customers come to me over the years that were "sure" they had found a silicon errata, but in every single case it was an issue with the customer's software or hardware.  Nothing you've revealed so far has me even moderately convinced that there is an issue.  I think we need to dig deeper into what's going on here as the problem almost without a doubt is in your hardware/software.

    The EDMA parameters should definitely be reconfigured through linking.  The fact that things were not working properly for you earlier with linking is indicative that you had something wrong.  Your "workaround" seems to just be covering one problem with another.

    Here's another related question:  What address are you programming into the EDMA parameters for reading from the McBSPs and McASP?

  • C.M.,

    50 samples may be more than you need; 8 may be too few; I cannot say without seeing the data. Please add 8 temp_timerN variables to your debug code to capture a time reference, then post here the hex values for the 16 variables after hitting a breakpoint.

    This thread does not have adequate information to form a bug report. From your insights into your system, you can tell that there is definitely a problem when you make board changes and that this problem is detected in your software by observing a debug flag word.

    The process to follow is to reduce your application code to a standalone test case that demonstrates the bug. This will usually be a small subset of your application code, and users often find a solution while generating that test case, since it requires eliminating things that have an unintended side-effect.

    My recommendation, for what it is worth, is to continue the debugging process before trying to create the test case.

    Regards,
    RandyP

  • Brad Griffis said:

    That clears the bits in EDMACIPR.  It does not, however, clear the bits in temp_flags, which is what your loop is conditioned upon (not EDMACIPR).  That's why Randy and I  both think you have an infinite loop.  Though I wouldn't expect your program to function at all if that were the case, so I'm a little puzzled.

    Hi Brad, the 'temp_flags' is reassigned at the top of the ISR as well as at the bottom of the while loop, so it would not be the same each time through the loop, unless, of course, the instructions within the loop did not change *EDMACIPR.

    Brad Griffis said:

    One tweak I'd recommend is to write to EDMACIPR to clear the pending flags immediately after reading the register.  Something like this:

    I've tried this. It did not make any observable difference. Please see my next reply to Randy.

  • Brad Griffis said:

    The EDMA parameters should definitely be reconfigured through linking.  The fact that things were not working properly for you earlier with linking is indicative that you had something wrong.  Your "workaround" seems to just be covering one problem with another.

    Here's another related question:  What address are you programming into the EDMA parameters for reading from the McBSPs and McASP?

    I certainly hope that fixing the PaRAM linking problem would solve the intermittent failure problem, though I think the probability is remote. Anyway, my PaRAM settings are below. Hope you can help to find what's wrong with it which caused the program to hang (apparently went into an infinite loop) whenever I set the CNT reg of a linking PaRAM table to be nonzero.

    void initEdma_cch(void)
    {
    // 2010 Oct 21, cchma. Below for setting up EDMA control registers and Parameter RAMs
    //  Section 3.5 of "TMS320C6000 DSP Enhanced Direct Memory Access (DMA) Controller Reference Guide"

     // Setup the Null parameter RAM for termination chaining by all DMA operations
     PaRAMNull  = (unsigned int *) PaRAMNullAddr;                        // Set up pointer to the Null PaRAM
     *PaRAMNull++ = 0x00000000;     // OPT reg
     *PaRAMNull++ = 0x00000000;     // SRC reg
     *PaRAMNull++ = 0x00000000;     // CNT reg
     *PaRAMNull++ = 0x00000000;     // DST reg
     *PaRAMNull++ = 0x00000000;     // IDX reg
     *PaRAMNull++ = 0x00000000;     // RLD reg

    // -------------------------------------------------------
     // Setup parameter RAM table for data transmission to C50-2 thru McBSP0 by EDMA (at 0x01A00120) 
     // Setup a linking PaRAM table for McBSP0 data transfer by EDMA (at 0x01A00180)
    // --------------------
    // Setup parameter RAM table for data receiving thru McBSP0 by EDMA (at 0x01A00138)
     PaRAMMcBSP0R = (unsigned int *) PaRAMMcBSP0RAddr;         // Initialize the McBSP0 rcvr-by-DMA channel PaRAM address

     *PaRAMMcBSP0R++ = 0x287D0002;                                            // OPT reg - High priority, 16bit element, fixed source addr, increment destination per element index and frame index, receive completion interrupt on, TCC=D, linking on, element sync.
     *PaRAMMcBSP0R++ = McBSP0DPAddr;                                      // SRC reg - Data source to be from the McBSP0 data port (#define McBSP0DPAddr 0x30000000)
     *PaRAMMcBSP0R++ = rx_no_of_words;                                       // CNT reg - Always 4 16-bit words before interrupting the CPU
     *PaRAMMcBSP0R++ = (unsigned int) &c50_rx_arry12;                // DST reg - Set to the buffer for receiving the next data set from C50-2

     *PaRAMMcBSP0R++ = 0x00000004;                                             // IDX reg - Make address increment to the next word (4 bytes) boundary
     *PaRAMMcBSP0R++ = LinkPaRAMMcBSP0RAddrL;                    // RLD reg - Set LINK to the lower 16-bits of LinkPaRAMMcBSP0RAddr

     // Setup a linking PaRAM table for McBSP0 data receiving by EDMA (at 0x01A00198)
     PaRAMMcBSP0R = (unsigned int *) LinkPaRAMMcBSP0RAddr;  // Initialize the McBSP0 rcvr-by-DMA channel PaRAM address

     *PaRAMMcBSP0R++ = 0x287D0002;                                            // OPT reg - High priority, 16bit element, fixed source addr, increment destination per element index and frame index, receive completion interrupt on, TCC=D, linking on, element sync.
     *PaRAMMcBSP0R++ = McBSP0DPAddr;                                      // SRC reg - Data source to be from the McBSP0 data port (#define McBSP0DPAddr 0x30000000)
     *PaRAMMcBSP0R++ = 0x00000000;                                             // CNT reg - McBSP0 would malfunction if this number is not 0! (To be set to equal rx_no_of_words within the isr)
     *PaRAMMcBSP0R++ = (unsigned int) &c50_rx_arry12;                // DST reg - Set to the buffer for receiving the next data set from C50-2

     *PaRAMMcBSP0R++ = 0x00000004;                                             // IDX reg - Make address increment to the next word (4 bytes) boundary
     *PaRAMMcBSP0R++ = LinkPaRAMMcBSP0RAddrL;                    // RLD reg - Set LINK to the lower 16-bits of LinkPaRAMMcBSP0RAddr

     

    // -------------------------------------------------------
     // Setup parameter RAM table for data transmission to C50-1 thru McBSP1 by EDMA (at 0x01A00150)
     // Setup a linking PaRAM table for McBSP1 data transmission by EDMA (at 0x01A001B0)
    // --------------------
    // Setup parameter RAM table for data receiving from C50-1 thru McBSP1 by EDMA (at 0x01A00168)
     PaRAMMcBSP1R = (unsigned int *) PaRAMMcBSP1RAddr;       // Initialize the McBSP1 rcvr-by-DMA channel PaRAM address

     *PaRAMMcBSP1R++ = 0x287F0002;                                           // OPT reg - High priority, 16bit element, fixed source addr, increment destination per element index and frame index, receive completion interrupt on, TCC=F, linking on, element sync.
     *PaRAMMcBSP1R++ = McBSP1DPAddr;                                    // SRC reg - Data source to be from the McBSP1 data port (#define McBSP1DPAddr 0x34000000)
     *PaRAMMcBSP1R++ = rx_no_of_words;                                     // CNT reg - Always 4 16-bit words before interrupting the CPU
     *PaRAMMcBSP1R++ = (unsigned int) &c50_rx_arry22;              // DST reg - Set to the buffer for receiving the next data set from C50-1

     *PaRAMMcBSP1R++ = 0x00000004;                                           // IDX reg - Make address increment to the next word (4 bytes) boundary
     *PaRAMMcBSP1R++ = LinkPaRAMMcBSP1RAddrL;                  // RLD reg - Set LINK to the lower 16-bits of LinkPaRAMMcBSP1RAddr

     // Setup a linking PaRAM table for McBSP1 data receiving by EDMA (at 0x01A001C8)
     PaRAMMcBSP1R = (unsigned int *) LinkPaRAMMcBSP1RAddr; // Initialize the McBSP1 rcvr-by-DMA channel PaRAM address

     *PaRAMMcBSP1R++ = 0x287F0002;                                           // OPT reg - High priority, 16bit element, fixed source addr, increment destination per element index and frame index, receive completion interrupt on, TCC=F, linking on, element sync.
     *PaRAMMcBSP1R++ = McBSP1DPAddr;                                    // SRC reg - Data source to be from the McBSP1 data port (#define McBSP1DPAddr 0x34000000)
     *PaRAMMcBSP1R++ = 0x00000000;                                           // CNT reg - McBSP1 would malfunction if this number is not 0! (To be set to equal rx_no_of_words within the isr)
     *PaRAMMcBSP1R++ = (unsigned int) &c50_rx_arry22;              // DST reg - Set to the buffer for receiving the next data set from C50-1

     *PaRAMMcBSP1R++ = 0x00000004;                                           // IDX reg - Make address increment to the next word (4 bytes) boundary
     *PaRAMMcBSP1R++ = LinkPaRAMMcBSP1RAddrL;                 // RLD reg - Set LINK to the lower 16-bits of LinkPaRAMMcBSP1RAddr

     

    // ----------------------------------------------------
     // Setup parameter RAM table for data transmission to C50-3 thru McASP1 by EDMA (at 0x01A00090)
     // Setup a linking PaRAM table for McASP1 data transmission by EDMA (at 0x01A001E0)
    // --------------------
     // Setup parameter RAM table for data receiving from C50-3 thru McASP1 by EDMA (at 0x01A000A8)
     PaRAMMcASP1R = (unsigned int *) PaRAMMcASP1RAddr;     // Initialize the McASP1 rcvr-by-DMA channel PaRAM address

     *PaRAMMcASP1R++ = 0x28770002;                                         // OPT reg - High priority, 16bit element, fixed source addr, increment destination per element index and frame index, receive completion interrupt on, TCC=7, linking on, element sync.
     *PaRAMMcASP1R++ = McASP1DPAddr;                                 // SRC reg - Data source to be from the McASP1 data port (#define McASP1DPAddr 0x3C100000)
     *PaRAMMcASP1R++ = rx_no_of_words;                                  // CNT reg - Always 4 16-bit words before interrupting the CPU
     *PaRAMMcASP1R++ = (unsigned int) &c50_rx_arry32;           // DST reg - Set to the buffer for receiving the next data set from C50-3

     *PaRAMMcASP1R++ = 0x00000004;                                        // IDX reg - Make address increment to the next word (4 bytes) boundary
     *PaRAMMcASP1R++ = LinkPaRAMMcASP1RAddrL;               // RLD reg - Set LINK to the lower 16-bits of LinkPaRAMMcASP1RAddr

     // Setup a linking PaRAM table for McASP1 data receiving by EDMA (at 0x01A001F8)
     PaRAMMcASP1R = (unsigned int *) LinkPaRAMMcASP1RAddr; // Initialize the McASP1 rcvr-by-DMA channel PaRAM address

     *PaRAMMcASP1R++ = 0x28770002;                                           // OPT reg - High priority, 16bit element, fixed source addr, increment destination per element index and frame index, receive completion interrupt on, TCC=7, linking on, element sync.
     *PaRAMMcASP1R++ = McASP1DPAddr;                                    // SRC reg - Data source to be from the McASP1 data port (#define McASP1DPAddr 0x3C100000)
     *PaRAMMcASP1R++ = 0x00000000;                                           // CNT reg - McASP1 would malfunction if this number is not 0! (To be set to equal rx_no_of_words within the isr)
     *PaRAMMcASP1R++ = (unsigned int) &c50_rx_arry32;              // DST reg - Set to the buffer for receiving the next data set from C50-3

     *PaRAMMcASP1R++ = 0x00000004;                                           // IDX reg - Make address increment to the next word (4 bytes) boundary
     *PaRAMMcASP1R++ = LinkPaRAMMcASP1RAddrL;                  // RLD reg - Set LINK to the lower 16-bits of LinkPaRAMMcASP1RAddr

    // -----------------------------------------------------
     // Set up EDMA control registers
     *EDMAESEL1  &= 0x0000FFFF;                                                  // Clear the default values for EVTSEL7 and EVTSEL6 of the EDMA Event SELector 1
     *EDMAESEL1  |= 0x2B280000;                                                    // Set EDMA channel 6/7 to be driven by McASP1 data transmit/receive events respectively
     *EDMAEER  = 0x00000000;                                                         // Disable all EDMA channels
     *EDMACIER  = 0x0000A080;                                                        // Enable the rcv EDMA channel interrupts
     *EDMACCER  = 0x00000000;                                                       // Disable all channel chaining (default)
     *EDMAECR  = 0xFFFFFFFF;                                                        // Clear all previous events
     *EDMAEER  = 0x0000F0C0;                                                         // Enable EDMA channels for receive & xmit McBSP0/McBSP1/McASP1 data by EDMA
    }
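
    Decoding the OPT words field by field is a quick way to check them against their comments. The sketch below assumes the C621x/C671x OPT layout listed in its header comment (please verify against the EDMA reference guide); the type and function names are hypothetical. By this decoding, 0x287D0002, 0x287F0002 and 0x28770002 all have LINK = 1 (bit 1) and FS = 0, with TCC values of 13, 15 and 7, which match the CIER value 0x0000A080.

```c
#include <stdint.h>

/* Assumed C621x/C671x EDMA OPT layout:
 * PRI[31:29] ESIZE[28:27] 2DS[26] SUM[25:24] 2DD[23] DUM[22:21]
 * TCINT[20] TCC[19:16] LINK[1] FS[0] */
typedef struct {
    unsigned pri, esize, two_ds, sum, two_dd, dum, tcint, tcc, link, fs;
} edma_opt_fields;

static edma_opt_fields decode_opt(uint32_t opt)
{
    edma_opt_fields f;
    f.pri    = (opt >> 29) & 0x7u;
    f.esize  = (opt >> 27) & 0x3u;  /* 0 = 32-bit, 1 = 16-bit, 2 = 8-bit */
    f.two_ds = (opt >> 26) & 0x1u;
    f.sum    = (opt >> 24) & 0x3u;  /* source address update mode */
    f.two_dd = (opt >> 23) & 0x1u;
    f.dum    = (opt >> 21) & 0x3u;  /* destination address update mode */
    f.tcint  = (opt >> 20) & 0x1u;  /* transfer-complete interrupt enable */
    f.tcc    = (opt >> 16) & 0xFu;  /* transfer-complete code -> CIPR bit */
    f.link   = (opt >> 1)  & 0x1u;  /* reload from RLD.LINK when done */
    f.fs     = opt & 0x1u;          /* 0 = element sync, 1 = frame sync */
    return f;
}
```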

     

     

  • CMA said:

    I certainly hope that fixing the PaRAM linking problem would solve the intermittent failure problem, though I think the probability is remote.

    I concur that this one fix will not fix your project. But this should be done right, as has been explained above.

    CMA said:

    Hope you can help to find what's wrong with it which caused the program to hang (apparently went into an infinite loop) whenever I set the CNT reg of a linking PaRAM table to be nonzero.

    If you have linking correct, then you must not write to CNT in the ISR.

    CMA said:
     *EDMAEER  = 0x0000F0C0;                                                         // Enable EDMA channels for receive & xmit

    Why do you enable the xmit channels without initializing the xmit EDMA PaRAM sets?

    CMA said:
    Hi Brad, the 'temp_flags' is reassigned at the top of the ISR as well as at the bottom of the while loop

    Please show me the instruction that reassigns temp_flags at the bottom of the while loop. Writing to EDMACIPR does not change the value of temp_flags.
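
    The point about temp_flags can be illustrated with a small host-testable sketch: the loop condition must come from a fresh read of CIPR, because writing to the register does not change a local copy. The ISR body below reads CIPR once, clears exactly the bits it read with a single write-1-to-clear, updates the pending flags, and then re-reads. cipr_reg, cipr_read, cipr_write and service_rx are hypothetical; on the target, the read and write would simply be *EDMACIPR and *EDMACIPR = flags.

```c
#include <stdint.h>

#define RX_MASK 0x0000A080u  /* CIPR bits 15 (McBSP1), 13 (McBSP0), 7 (McASP1) */

/* Host-side stand-in for EDMA CIPR: reads return the pending bits,
 * writes clear every bit written as 1 (write-1-to-clear). */
typedef struct { uint32_t pending; } cipr_reg;

static uint32_t cipr_read(const cipr_reg *r)        { return r->pending; }
static void     cipr_write(cipr_reg *r, uint32_t v) { r->pending &= ~v; }

/* Read once, clear only what was read, record it, then re-read so that
 * completions arriving during the service are handled before returning.
 * The loop tests the freshly re-read value, never a stale copy.
 * Returns the number of passes through the loop. */
static unsigned service_rx(cipr_reg *cipr, uint32_t *data_pending)
{
    unsigned passes = 0;
    uint32_t flags = cipr_read(cipr) & RX_MASK;

    while (flags != 0u) {
        cipr_write(cipr, flags);   /* one write clears all bits just seen */
        *data_pending &= ~flags;   /* mark those ports as received */
        passes++;
        flags = cipr_read(cipr) & RX_MASK;
    }
    return passes;
}
```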

    Regards,
    RandyP

  • RandyP said:

    C.M.,

    50 samples may be more than you need; 8 may be too few; I cannot say without seeing the data. Please add 8 temp_timerN variables to your debug code to capture a time reference, then post the hex values for the 16 variables here after hitting a breakpoint.

    OK, I did as you suggested. My program structure and the new EDMA ISR code, which includes Brad's last suggestion, are now as follows:

    // ---- The PROGRAM ----

    Initialization;
    Setup system to receive data through McASP1, McBSP0, McBSP1 by EDMA;
    Setup a First-In-Last-Out (FILO) Temp_Flags_buffer for debugging purpose
    Idle till received a ‘repetition period’ value from user
    Set Period Register of Timer0 to the ‘repetition period’ (say 300us)
    Start Timer0;
    Trigger three external devices to each send a data frame to each of McASP1, McBSP0 and McBSP1 respectively;
    Data_Pending_Flags = 0x0000A080;
    NewResRcvd = 0;
    Enable EDMAINT and TINT0 interrupts;

    While (1)
    {
        If (NewResRcvd != 0)
        {   // Enter when new data from C50-1, C50-2, C50-3 have been received as flagged by NewResRcvd
            NewResRcvd = 0;
            Process the received data;
            0 >> Temp_Flags_buffer;                 // Shift a 0 into the FILO buffer for tracking the data receive interrupt sequence
        }
        Do other tasks such as receive and decode user commands
    }

    Interrupt void ISR_TINT0 (void)
    {
        If (Data_Pending_Flags != 0) Fatal_Timeout! Halt!;
        NewResRcvd = 1;
        Trigger the three external devices to each send a new data frame to each of McASP1, McBSP0 and McBSP1 respectively;
        Data_Pending_Flags = 0x0000A080;
    }

    Interrupt void ISR_EDMAINT (void)
    {
        Temp_Flags = *EDMACIPR;
        While (Temp_Flags != 0)
        {
            If ((Temp_Flags & 0x00000080) != 0)
            {   // McASP1 received data by EDMA and EDMA receive event interrupted
                Data_Pending_Flags &= 0xffffff7f;   // Clear the bit corresponding to the interrupting CIPR bit
                *EDMACIPR = 0x00000080;             // Clear the interrupt pending bit
            }
            If ((Temp_Flags & 0x00002000) != 0)
            {   // McBSP0 received data by EDMA and EDMA receive event interrupted
                Data_Pending_Flags &= 0xffffdfff;   // Clear the bit corresponding to the interrupting CIPR bit
                *EDMACIPR = 0x00002000;             // Clear the interrupt pending bit
            }
            If ((Temp_Flags & 0x00008000) != 0)
            {   // McBSP1 received data by EDMA and EDMA receive event interrupted
                Data_Pending_Flags &= 0xffff7fff;   // Clear the bit corresponding to the interrupting CIPR bit
                *EDMACIPR = 0x00008000;             // Clear the interrupt pending bit
            }
            Temp_Flags >> Temp_Flags_buffer;        // Shift Temp_Flags into the FILO buffer for tracking the data receive interrupt sequence

            // All interrupt pending bits set prior to entering this while-loop should be cleared at this point
            Temp_Flags = *EDMACIPR;                 // Check to see if more interrupts happened during the above interrupt
                                                    // service? If yes, repeat this interrupt service before exiting.
        }
    }

    // ---- End of the PROGRAM ----

    /* ---------------------- The EDMA Interrupt Service Routine -------------------- */
    /*
     *  interrupt void ReceiveDataEDMA (void) - 2011 May 5, cchma.
     *    Interrupt service routine for data receiving from C50s by EDMA.
     *    This is triggered when a complete frame of data has been
     *              received from C50-1, C50-2 and/or C50-3 by EDMA thru either McBSP0,
     *    McBSP1, and/or McASP1 respectively.
     *    'intvecs_bootload_cch.asm' directed the CPU to this ISR.
     */
    interrupt void ReceiveDataEDMA (void)
    {
         temp_flags = *EDMACIPR & 0x0000a080;   // Read to find out which serial port interrupts are pending?

         while (temp_flags != 0)
         {
              if ((temp_flags & 0x00002000) != 0)                     // Check whether McBSP0 received a data frame from C50-2 (Center Eye) by EDMA?
              {                                                                             // If yes, register the event
                     Data_Pending_Flags &= 0xffffdfff;                 // Clear the data pending flag bit corresponding to C50-2 since a data set received interrupt event occurred.
                     *EDMACIPR   = 0x00002000;                        // Clear bit-13 of the CIPR by writing 1 to it
                     *PaRAMMcBSP0RCNT = rx_no_of_words;  // Reset the CNT part of McBSP0 rcv PaRAM to enable another rcv event
              }

              if ((temp_flags & 0x00008000) != 0)                    // Check whether McBSP1 received a data frame from C50-1 (Right Eye) by EDMA?
              {           // If yes, register the event
                    Data_Pending_Flags &= 0xffff7fff;                 // Clear the data pending flag bit corresponding to C50-1 (Right Eye) since a data set received interrupt event occurred.
                    *EDMACIPR   = 0x00008000;                        // Clear bit-15 of the CIPR by writing 1 to it
                    *PaRAMMcBSP1RCNT = rx_no_of_words;  // Reset the CNT part of McBSP1 rcv PaRAM to enable another rcv event
              }

              if ((temp_flags & 0x00000080) != 0)                   // Check whether McASP1 received a data frame from C50-3 (Left Eye) by EDMA?
              {           // If yes, register the event
                   Data_Pending_Flags &= 0xffffff7f;                 // Clear the data pending flag bit corresponding to C50-3 (Left Eye) since a data set received interrupt event occurred.
                   *EDMACIPR   = 0x00000080;                        // Clear bit-7 of the CIPR by writing 1 to it
                   *PaRAMMcASP1RCNT = rx_no_of_words;  // Reset the CNT part of McASP1 rcv PaRAM to enable another rcv event
              }

    // 2011 Sept 26, cchma. Below for debugging why Data_Pending_Flags stayed nonzero despite having received all data and caused fatal CapTimedOut!!!
              temp_flags16=temp_flags15;                              // Keep the previous temp_flags values for debugging
              temp_flags15=temp_flags14;                              // Keep the previous temp_flags values for debugging
              temp_flags14=temp_flags13; 
              temp_flags13=temp_flags12;
              temp_flags12=temp_flags11; 
              temp_flags11=temp_flags10; 
              temp_flags10=temp_flags9;
              temp_flags9=temp_flags8;
              temp_flags8=temp_flags7;
              temp_flags7=temp_flags6;
              temp_flags6=temp_flags5;
              temp_flags5=temp_flags4;
              temp_flags4=temp_flags3;
              temp_flags3=temp_flags2;

              temp_flags2=temp_flags1;
              temp_flags1=temp_flags;
    // 2011 Sept 26, cchma. Above for debugging why Data_Pending_Flags stayed nonzero despite having received all data and caused fatal CapTimedOut!!!

              temp_flags = *EDMACIPR & 0x0000a080;          // Read to check whether any new interrupts came in during the above process?
         }
         Flags_at_end_of_last_ISR = Data_Pending_Flags; // Keep a record of the Data_Pending_Flags before exiting to see if it gets changed outside of the ISR
    }

    The values of the flags when the program hit a fatal time out are as follows:

              temp_flags16= 0x00002080
              temp_flags15= 0x00008000
              temp_flags14= 0 
              temp_flags13= 0x00000080
              temp_flags12= 0x0000a000 
              temp_flags11= 0 
              temp_flags10= 0x00002080
              temp_flags9=   0x00008000
              temp_flags8=   0
              temp_flags7=   0x00002080
              temp_flags6=   0x00008000
              temp_flags5=   0
              temp_flags4=   0x00002080
              temp_flags3=   0x00008000

              temp_flags2=   0
              temp_flags1=   0x00002080
              temp_flags =    0
              Flags_at_end_of_last_ISR = 0x00008000
              Data_Pending_Flags =   0x00008000

    These flag values indicate that the data frame received by McBSP1 (which is tied in parallel with McBSP0) never got registered as an EDMA receive event.

    There are variations of these values when the program hit a fatal timeout, but all value sequences are consistent in indicating the same. 
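
    The reading of the history above (group the entries between the 0 markers and check which expected bits never appear) can be automated, which may help when comparing the variations of the failure signature. A sketch with a hypothetical function name, assuming the history is copied into an array oldest-first (temp_flags16 first, temp_flags1 last) and that 0 is the marker the main loop shifts in after each processed period:

```c
#include <stdint.h>
#include <stddef.h>

/* hist[0] is the oldest entry (temp_flags16) and hist[n-1] the newest
 * (temp_flags1). Nonzero entries are CIPR snapshots taken in the ISR;
 * 0 is the marker shifted in by the main loop after each period.
 * Returns the expected bits that never appeared since the newest marker. */
static uint32_t missing_in_last_period(const uint32_t *hist, size_t n,
                                       uint32_t expected)
{
    uint32_t seen = 0u;

    /* Walk backwards from the newest entry until the last 0 marker. */
    while (n > 0u && hist[n - 1u] != 0u) {
        seen |= hist[n - 1u];
        n--;
    }
    return expected & ~seen;
}
```

    With the 16 values posted above and expected = 0x0000A080, the newest period contains only 0x2080, so the function returns 0x8000, the same value left in Data_Pending_Flags.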

    I hope you can help me figure out how else I can debug this problem. By the way, is it possible for CCS to cause such an intermittent problem by interfering with my program?

    C.M.

  • I see a couple programming errors which I'll highlight below:

    CMA said:
               If (NewResRcvd != 0)
    {                                                                                     // Enter when new data from C50-1, C50-2, C50-3 have been received as flagged by NewResRcvd
               NewResRcvd = 0;
    Process the received data;
    0 >> Temp_Flags_buffer;                                  // Shift a 0 into the FILO buffer For tracking the data receive interrupt sequence
               }
               Do other tasks such as receive and decode user commands
    }

     

    CMA said:
    Interrupt      void   ISR_EDMAINT     (void)
    (
               Temp_Flags = *EDMACIPR;
               While (Temp_Flags != 0)
               {
    If ((Temp_Flags & 0x00000080) != 0)
    {                                                                                    // McASP1 received data by EDMA and EDMA receive event interrupted
    Data_Pending_Flags &= 0xffffff7f;            // Clear the bit corresponding to the interrupting CIPR bit
    *EDMACIPR = 0x00000080;                        // Clear the interrupt pending bit
                         }
    If ((Temp_Flags & 0x00002000) != 0)
    {                                                                                    // McBSP0 received data by EDMA and EDMA receive event interrupted
    Data_Pending_Flags &= 0xffffdfff;            // Clear the bit corresponding to the interrupting CIPR bit
    *EDMACIPR = 0x00002000;                        // Clear the interrupt pending bit
                         }
    If ((Temp_Flags & 0x00008000) != 0)
    {                                                                                    // McBSP1 received data by EDMA and EDMA receive event interrupted
    Data_Pending_Flags &= 0xffff7fff;            // Clear the bit corresponding to the interrupting CIPR bit
    *EDMACIPR = 0x00008000;                       // Clear the interrupt pending bit
                         }
    Temp_Flags >> Temp_Flags_buffer;                   // Shift Temp_Flags into the FILO buffer for tracking the data receive interrupt sequence
     
                         // All interrupt pending bits set prior to entering this while-loop should be cleared at this point
                         Temp_Flags = *EDMACIPR;                                   // Check to see if more interrupts happened during the above interrupt
    //  service? If yes, repeat this interrupt service before exiting.
               }
    }

     

    As far as I can tell the highlighted lines of code don't actually do anything because there is no assignment taking place.  The ">>" operator is a right-shift in C.  It looks to me like you're trying to use it similar to a bash shell or something.

  • Sorry Brad, those two highlighted lines are meant to represent

    {          temp_flags16=temp_flags15;                              // Keep the previous Data_Pending_Flags for debugging
              temp_flags15=temp_flags14;                              // Keep the previous Data_Pending_Flags for debugging
              temp_flags14=temp_flags13; 
              temp_flags13=temp_flags12;
              temp_flags12=temp_flags11; 
              temp_flags11=temp_flags10; 
              temp_flags10=temp_flags9;
              temp_flags9=temp_flags8;
              temp_flags8=temp_flags7;
              temp_flags7=temp_flags6;
              temp_flags6=temp_flags5;
              temp_flags5=temp_flags4;
              temp_flags4=temp_flags3;
              temp_flags3=temp_flags2;

              temp_flags2=temp_flags1;
              temp_flags1=0;

    }
    and

     {         temp_flags16=temp_flags15;                              // Keep the previous Data_Pending_Flags for debugging
              temp_flags15=temp_flags14;                              // Keep the previous Data_Pending_Flags for debugging
              temp_flags14=temp_flags13; 
              temp_flags13=temp_flags12;
              temp_flags12=temp_flags11; 
              temp_flags11=temp_flags10; 
              temp_flags10=temp_flags9;
              temp_flags9=temp_flags8;
              temp_flags8=temp_flags7;
              temp_flags7=temp_flags6;
              temp_flags6=temp_flags5;
              temp_flags5=temp_flags4;
              temp_flags4=temp_flags3;
              temp_flags3=temp_flags2;

              temp_flags2=temp_flags1;
              temp_flags1=temp_flags;
    }

    respectively, to shorten what you guys need to read. They are not actual instructions.
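
    As an aside, the sixteen-variable shift chain can be replaced by a ring buffer, so each logged entry costs one store and one increment instead of sixteen copies. A sketch with hypothetical names (flag_history, hist_log, hist_get):

```c
#include <stdint.h>

#define HIST_LEN 16u   /* power of two, so the index wraps with a mask */

/* Hypothetical replacement for temp_flags1..temp_flags16. */
typedef struct {
    uint32_t entry[HIST_LEN];
    uint32_t count;            /* total entries ever written */
} flag_history;

/* Log one value: one store plus one increment. */
static void hist_log(flag_history *h, uint32_t value)
{
    h->entry[h->count & (HIST_LEN - 1u)] = value;
    h->count++;
}

/* age 0 = newest (old temp_flags1), age 15 = oldest kept (temp_flags16).
 * Before 16 entries have been logged, older ages return whatever the
 * array was initialized to. */
static uint32_t hist_get(const flag_history *h, uint32_t age)
{
    return h->entry[(h->count - 1u - age) & (HIST_LEN - 1u)];
}
```

    If the main loop also logs entries (the 0 markers), it should do so with interrupts disabled, since the EDMA ISR could otherwise run between the store and the increment.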

    C.M.

  • RandyP said:

    If you have linking correct, then you must not write to CNT in the ISR.

    Yes, I knew this. I only wrote to CNT within the ISR after finding that linking failed whenever CNT was set nonzero in the linking PaRAM table.

    RandyP said:

    Why do you enable the xmit channels without initializing the xmit EDMA PaRAM sets?

    I do have PaRAM and linking PaRAM tables for the xmit EDMA operations too. I just deleted them to make the posting shorter for you guys, since there have apparently been no problems with them.

    RandyP said:

    Please show me the instruction that reassigns temp_flags at the bottom of the while loop. Writing to EDMACIPR does not change the value of temp_flags.

    Sorry, I accidentally deleted this line from my original code when I removed the many extra comments before posting. This line is definitely at the bottom of the while loop in my latest (long) posting of the code, as follows:

        temp_flags = *EDMACIPR & 0x0000a080;   // Read to find out which serial port interrupts are pending?

         while (temp_flags != 0)
         {
              if ((temp_flags & 0x00002000) != 0)                     // Check whether McBSP0 received a data frame from C50-2 (Center Eye) by EDMA?
              {                                                                             // If yes, register the event
                   ...

              }

              if ((temp_flags & 0x00008000) != 0)                    // Check whether McBSP1 received a data frame from C50-1 (Right Eye) by EDMA?
              {           // If yes, register the event
                   ...

              }

              if ((temp_flags & 0x00000080) != 0)                   // Check whether McASP1 received a data frame from C50-3 (Left Eye) by EDMA?
              {           // If yes, register the event
                  ...

              }

              ...

              temp_flags = *EDMACIPR & 0x0000a080;          // Read to check whether any new interrupts came in during the above process?
         }

    C.M.

  • C.M.,

    We have tried sincerely to help you. You have a difficult situation, and we all understand how hard it can be to debug an embedded system. CCS is a helpful tool for exploring the situation with breakpoints and watch windows. And we have made suggestions how to change your code and how to insert debug structures into your code.

    It is not possible for me to debug your code without the information that has been suggested, and it is not possible to assess the situation when other problems may be masking each other, like the EDMA Linking problem. Mixing pseudocode with C code and leaving out critical lines both make it unlikely that we will be able to help you.

    My suggestion is to make the following changes to try to get more visibility into the timing of the failures and the true signature of the failures:

    unsigned int debug_array[100];  // be sure to clear the contents somewhere
    int debug_index = 0;

    interrupt void ReceiveDataEDMA (void)
    {
          temp_flags = *EDMACIPR & 0x0000a080;   // Read to find out which serial port interrupts are pending?

          if ( debug_index >= 100 )
                debug_index = 0;
          debug_array[debug_index++] = temp_flags;
          debug_array[debug_index++] = CLK_gethtime();  // or read your timer's cnt value

          if (temp_flags != 0)  // no while loop, simplify the situation to get it debugged first
          {
                // Do not write to EDMACIPR here, only write once at the end.
                // This is what the EDMA User's Guide says to do in the last
                //     sentence of section 1.15.1 .


                *EDMACIPR = temp_flags;
          }

    }

    In ISR_TINT0 when the failure is detected, add

          if ( debug_index >= 100 )
                debug_index = 0;
          debug_array[debug_index++] = *EDMACIPR & 0x0000a080;
          debug_array[debug_index++] = CLK_gethtime();  // or read your timer's cnt value


    In ISR_TINT0 when the failure is not detected, add

          if ( debug_index >= 100 )
                debug_index = 0;
          debug_array[debug_index++] = 0xffffffff;
          debug_array[debug_index++] = CLK_gethtime();  // or read your timer's cnt value

    If you are using the xmit side and generating interrupts on the xmit side, those ISRs could also be contributing. But perhaps you are not generating interrupts since they are not serviced in the same place?

    Even if we are unable to understand well enough to help, we would still like to know your status and progress.

    Regards,
    RandyP

  • Hi Randy,

    I am 99% certain of the cause of the problem now. It is not due to the software; it is due to the C6713 not being fast enough to receive all of the data from the 3 devices simultaneously.

    I found this out by making the McBSP0 port interrupt the CPU and receive the data in an ISR (called 'ReceiveDataMcBSP0') instead of triggering an EDMA receive request. Within ReceiveDataMcBSP0, I generate a pulse on the GP8 line to show when the CPU interrupt occurred, and I see on the oscilloscope that every time the system timed out fatally, at least one of the ReceiveDataMcBSP0 interrupts occurred after two halfwords had been received. So the problem has been that one RRDY=1 event did not get serviced by the ReceiveDataMcBSP0 ISR before the next halfword was received. The first RRDY=1 event was masked by the second one, so ReceiveDataMcBSP0 was invoked only once for two halfwords received; the first halfword was overwritten by the second, hence the timeout.
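
    The McBSP can also report this condition directly: its SPCR register has an RFULL (receive overrun) flag alongside RRDY. A sketch of a debug counter, assuming RRDY is bit 1 and RFULL is bit 2 of SPCR as described in the C6000 McBSP reference guide (please verify for your device); the names are hypothetical. Called with an SPCR snapshot at the top of ReceiveDataMcBSP0, a nonzero count would confirm an overrun, while a count of 0 at a fatal timeout would suggest looking elsewhere.

```c
#include <stdint.h>

/* Assumed McBSP SPCR receive status bits (verify in the reference guide). */
#define SPCR_RRDY  (1u << 1)   /* DRR holds unread received data */
#define SPCR_RFULL (1u << 2)   /* receive overrun condition */

/* Explicit initializer so the counter starts at 0 even under an
 * auto-init model that does not zero uninitialized globals. */
static unsigned overrun_count = 0;

/* Hypothetical debug hook: pass a snapshot of SPCR read in the ISR. */
static void check_rx_overrun(uint32_t spcr)
{
    if (spcr & SPCR_RFULL)
        overrun_count++;
}
```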

    If the EDMA does not work any faster than a CPU interrupt service can read a word off a serial port, then the same thing would have happened when the EDMA was originally used to receive and store the data. This is why I never saw any missing data when the program timed out for not receiving 4 halfwords in time. I have already slowed the data transmission clock of the 3 devices from 10MHz to 6.66MHz, but the occasional timeout still happens, though less frequently now.

    One of the reasons I chose the C6713 is its claimed higher power and operating speed (MIPS). I cannot keep decreasing the CLKR frequency. Is there any specification of how many halfwords, at what CLKR frequency, this chip can receive by EDMA, at least when the CPU is not doing anything uninterruptible?

    C.M.

     

  • CMA said:
    One of the reasons I chose to use C6713 is because of the claimed higher power and operating speed (MIPS).

    The C6713B is a good part with good performance, excellent in its day. We have faster choices now, but after 4 years of development with the C6713B, I doubt you are interested in a change. And the C6713B may be adequate for your job.

    CMA said:
    I'm wondering is there any specification on how many halfwords, at what CLKR frequency, can this chip receive by EDMA under at least the condition when the CPU is not doing anything uninterruptible?

    We have an EDMA Performance Application Note that you can find from the TMS320C6713B Product Folder -> Technical Documents. It may not specifically address your situation servicing the McBSP port, but it will give you good insight into how the EDMA can perform for different data and different memory endpoints.

    The interruptibility of the CPU will not directly affect the performance of the EDMA. The EDMA is an independent bus master, so the only dependencies would be with sharing common memory endpoints or internal buses.

    CMA said:
    If the EDMA does not work any faster than a CPU interrupt service can read a word off a serial port, then the same thing would happen when the EDMA was used to receive and store the data originally.

    The EDMA will work faster than a CPU ISR. How long is the delay between the end of the last bit of received data and the GP8 pin toggling? Does it toggle at the beginning or the end of the ISR? That delay tells you the latency from the McBSP0 to the ISR servicing.

    CMA said:
    I have slowed down the data transmission clock frequency of the 3 devices from 10MHz to 6.66MHz already, but this occasional time out problem is still happening, though less frequently now.

    For testing purposes, can you turn off the transmit side, or does that corrupt the system's operation?

    At about 10MHz for a total of 6 devices sending and receiving 16 bits with a SYSCLK1 of 200MHz, you get about 60 SYSCLK1s for each of the 6 halfwords to be read or written. That is not a huge number, and could mean you need to be careful with timing. But it is probably enough, since the McBSP has the RBR to double-buffer the data and the RSR to hold one more halfword as it shifts in. Since you have only 4 halfwords to read per "burst", there should be plenty of time. This is not a scientific or empirical analysis, just a rough approximation to see how close you are.

    If it is working for many cycles at 10MHz before failing, and lowering the speed by 33% does not make the situation go away completely, it is very unlikely that this is a pure EDMA bandwidth issue. You might have other things using the EDMA and together they do cause that type of problem, but that is another discussion.

    Regards,
    RandyP

  • CMA -- is your code/data in internal or external memory?  For testing purposes you should create a test that runs entirely from internal memory.  Any CPU requests to external memory will generate an EDMA request to external memory, which gets serviced at the highest priority level (fixed priority, not programmable until 64x).  Perhaps that's related.

    Newer devices like the c6748 feature a FIFO on the front-end of the McBSP which significantly reduces the hard real-time deadline in servicing the McBSP.  It also allows for more efficient bursting transactions to be performed.  The EDMA has changed significantly from 6713 -> 6748 so there would be some migration work.  There's an app note that discusses the migration.

    Brad

  • Brad Griffis said:

    CMA -- is your code/data in internal or external memory?  For testing purposes you should create a test that runs entirely from internal memory. 

    It starts at 0x0001139c, internal memory.

    Brad Griffis said:

    Any CPU requests to external memory will generate an EDMA request to external memory, which gets serviced at the highest priority level (fixed priority, not programmable until 64x).  Perhaps that's related.

    Interesting, requests to 'external' memory get highest priority! I thought it was the reverse.

    Brad Griffis said:

    Newer devices like the c6748 feature a FIFO on the front-end of the McBSP which significantly reduces the hard real-time deadline in servicing the McBSP.  It also allows for more efficient bursting transactions to be performed.  The EDMA has changed significantly from 6713 -> 6748 so there would be some migration work.  There's an app note that discusses the migration.

    Great information, because I bet TI is going to discontinue the C6713 again pretty soon, since it has been out for 6 (?) years already. (I think the C31 was discontinued after just 10 years, which is why I still have to work on this thing now.) I'll check out the C6748 soon too.

    CM

     

  • Brad Griffis said:
    Any CPU requests ... get serviced at the highest priority level

    CMA said:
    Interesting, requests to 'external' memory get highest priority! I thought it was the reverse.

    It is the CPU requests that get highest priority, not external accesses.

    CPU requests that go to external memory use the EDMA logic in the C6713B. Other peripherals may use the EDMA logic to reach external memory, also. And the EDMA may be programmed to access external memory in response to peripheral events like from the McBSP. The CPU requests will be prioritized at the highest priority level.

    CMA said:
    I bet TI is going to discontinue the C6713 pretty soon

    Do not bet very much unless you are very rich. Technically, you are correct because the C6713 was discontinued a while ago; you are using the C6713B. It will be around for a while, but for new designs there may be better choices to use, for cost, power, performance, usability, tools tradeoffs. But there may be reasons for you to choose the C6713B, and we are happy for you to use it for the success of your project.

    Regards,
    RandyP

  • CMA said:

    Brad Griffis said:
    CMA -- is your code/data in internal or external memory?  For testing purposes you should create a test that runs entirely from internal memory.

    It starts at 0x0001139c, internal memory.

    First, that's a weird address. I would have expected it to be aligned to a 32-byte boundary if it's code.  Second, it's not clear if ALL of your code and data is in internal memory.

     

    CMA said:

    Brad Griffis said:
    Any CPU requests to external memory will generate an EDMA request to external memory, which gets serviced at the highest priority level (fixed priority, not programmable until 64x).  Perhaps that's related.

    Interesting, requests to 'external' memory get highest priority! I thought it was the reverse.

    You should read the C671x EDMA Architecture application note. It will help you understand the architecture much better. You are misunderstanding my point above: a CPU access to internal memory does not use the EDMA at all. My comment above states that when an L2 cache miss and an EDMA Channel Controller event (e.g., servicing the McBSP) access external memory simultaneously, the L2 cache miss always gets priority, because the L2 cache is mapped to Queue 0 (urgent priority) of the EDMA transfer controller, while EDMA Channel Controller events are programmable but can only be mapped to Queue 1 (high priority) or Queue 2 (low priority).

     

    CMA said:

    Great information, because I bet TI is going to discontinue the C6713 again pretty soon, since it has been out for 6 (?) years already. (I think the C31 was discontinued after just 10 years, which is why I still have to work on this thing now.) I'll check out the C6748 soon too.

    You can still purchase the C31 today:

    http://www.ti.com/product/sm320c31

    TI puts forth great effort to keep devices around as long as possible. 

     

  • RandyP said:

    The C6713B is a good part with good performance, excellent in its day. We have faster choices now, but after 4 years of development with the C6713B, I doubt you are interested in a change. And the C6713B may be adequate for your job.

    I've looked at the C6748, but this chip would require tremendous changes and further development before we could use it in place of the C6713. The BGA packaging alone would be a huge struggle for us during further development. So you're right, I'm not keen to change the chip for now, although I am already worrying about how much longer it will be before the C6713 becomes obsolete again! It's so terrifying to use TI products these days.

    RandyP said:
    The EDMA will work faster than a CPU ISR. How long is the delay between the end of the last bit of received data and the GP8 pin toggling? Does it toggle at the beginning or the end of the ISR? That delay tells you the latency from the McBSP0 to the ISR servicing.

    It varies because the three 4-halfword data frames coming from the three devices are not perfectly synchronized. Their arrival order keeps changing: sometimes the McBSP1 data frame arrives first, sometimes the McBSP0 or McASP1 data frame does.

    As I pointed out before, and it is still true: if the three serial ports' FSRs and CLKRs are connected in parallel, so that the C6713 receives the three data frames in perfect synchronism, then there is no problem at all. However, the problem appears as soon as one of them is not perfectly synchronized with the other two data frames (i.e., tying only two ports' FSRs and CLKRs in parallel, or not tying any two ports in parallel).

    The pulse on GP8 is generated at the start of the CPU ISR. I used a CPU ISR only to service the McBSP0 data receive event; McBSP1 and McASP1 continued to receive their data frames by EDMA. Because the data frames arrive in random order, the 'delay' was not constant.

    RandyP said:
    For testing purposes, can you turn off the transmit side, or does that corrupt the system's operation?

    The transmit side of each of the three serial ports sends just a single halfword trigger command to its device at the start of each 210us cycle. The devices generate their data frames upon receiving the trigger commands, and the frames are supposed to arrive before the end of the cycle. The time between triggering a device and receiving its data frame is at least 160us. So if I turn off the transmit side, the C6713 will not receive any data from the devices at all.

    RandyP said:
    At about 10MHz for a total of 6 devices sending and receiving 16 bits with a SYSCLK1 of 200MHz, you get about 60 SYSCLK1s for each of the 6 halfwords to be read or written. That is not a huge number, and could mean you need to be careful with timing. But it is probably enough, since the McBSP has the RBR to double-buffer the data and the RSR to hold one more halfword as it shifts in. Since you have only 4 halfwords to read per "burst", there should be plenty of time. This is not a scientific or empirical analysis, just a rough approximation to see how close you are.

    In fact, since the transmit sides are never transmitting anything when the data frames arrive, the CPU has 16 bits / 6.66MHz = 2.4us to read three halfwords (one from each device, assuming the worst case). That is 480 instruction cycles just to read 3 halfwords! I don't think the instruction set of the C6713 or the compiler can be that bad, so I think it's more likely some kind of hardware timing problem between the serial ports and the EDMA that appears only when the data frame arrivals meet a certain rare condition.

    RandyP said:
    If it is working for many cycles at 10MHz before failing, and lowering the speed by 33% does not make the situation go away completely, it is very unlikely that this is a pure EDMA bandwidth issue. You might have other things using the EDMA and together they do cause that type of problem, but that is another discussion.

    Now that McBSP0 receives its data by CPU ISR, I set the EDMA_EER register to 0x0000D0C0. It was originally 0x0000F0C0, when McBSP0 also received its data by EDMA. So if anything else is using the EDMA, it's certainly not my program. I also agree that it's not a bandwidth problem, because when I tied all three FSRs together, and all three CLKRs together as well, it ran without a problem for days at 10MHz!

    C.M.

  • RandyP said:
    How long is the delay between the end of the last bit of [McBSP0] received data and the GP8 pin toggling?

    "It varies" does not let me offer advice. Please look at the time from McBSP0 FSR to GP8 and see how it varies (use numbers): for example, whether there are 3 distinct times or it is purely random. Look at it for just the first of the 4 pulses to get an idea of how it works in the "new" condition. Look at it for the last of the 4 pulses to see whether it changes across the sequence of the 4 halfwords. If there is a significant difference, also look at the second and third pulses.

    McBSP0 FSR to GP8 should not vary much unless there is internal contention. Since I only said "last bit of received data" originally, there was randomness introduced by the other channels.

    Try this GP8 pulse also at the beginning of the ISR. It could be interesting to know if the delays vary from the first sample to the fourth sample.

    CMA said:
    This is 480 instruction cycles for just reading 3 halfwords! I don't think the instruction set of the C6713 or the compiler can be this bad,

    You understand that instructions are not being used for the EDMA transfers, right? The instruction set and compiler do not have a part in that process.

    CMA said:
    So I think it's more like there is some kind of hardware timing problem between the serial ports and the EDMA that prevails only when the data frame arrivals meet certain rare condition.

    Without all the data I have been asking for (timestamps, GP8 measurements), you do not establish what "rare condition" means.

    If you only send 3 halfwords, does it ever fail? If you send 5 halfwords, does it fail more often?

    Regards,
    RandyP

  • Brad Griffis said:

    You can still purchase the C31 today:

    http://www.ti.com/product/sm320c31

    TI puts forth great effort to keep devices around as long as possible. 

    Hi Brad, this is extremely interesting information for us. We have been painfully searching surplus vendors for 100 pieces of the TMS320C31PQL80 for months. A vendor had told us that they had the TMS320C31PQL80 and all the other parts we need to build our last batch of products based on it, but yesterday they suddenly told us that they had the wrong part! So now we are stuck, unable to get the TMS320C31PQL80 for our last C31-based shipment, while I'm still unable to get the C6713B replacement design working reliably after years of development! Frankly, we are in a very precarious situation, all because we chose TI parts and trusted what we were told years ago! (Ten years ago I called one of your TI product department managers in Texas, who personally told me that as long as we kept using that other TI part, TI would always make it available to us. But now I cannot even find that manager, and no one will take responsibility for that other part.) If this technical forum really is another channel for us small (but very high-tech) manufacturers to get through to your higher management, I would be really grateful and extremely happy to know. (Other people in our company tried to get through to your upper management via the TI Product Information Center but had to give up.)

    Anyway, I quickly browsed the sm320c31 link above but found no explicit mention that it can replace the TMS320C31PQL80. It seems the sm320c31 chips only go up to '-60', which as I understand means their top clock frequency is only 60MHz. Do you know offhand whether it is an easy replacement for the TMS320C31PQL80? Can it run as fast at just 60MHz???

    C.M.