Hi,
On our system, the McBsp1 FIFO stops generating EDMA3 RX events when there is heavy traffic on the EDMA3.
We use the McBsp for data reception only, with EDMA3_0 (queue 0) in a “ping-pong” configuration reading the data out of the McBsp1 FIFO into two shared-RAM sections. As long as nothing else is happening apart from the McBsp and the EDMA3 servicing it, everything works like a charm. But if the EDMA3 also has to service SPI events (RX and TX), QDMA data transfers to the EMIFA (NAND flash), and QDMA servicing ordinary 1D memcpy, the McBsp1 FIFO stops generating RX events. The McBsp1 itself, however, does not stop operating immediately.
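A minimal sketch of the ping-pong linking idea described above, simulated on a plain array instead of real PaRAM so it can be run on a PC. The field layout follows the PaRAM dump shown in this thread; the set numbers, buffer addresses, and the function names (`pingpong_setup`, `pingpong_event`) are assumptions made for illustration only, and the OPT bit positions are my reading of the TRM, so please check them against SPRUH77.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* PaRAM set layout as shown by the register dump in this thread (32 bytes). */
typedef struct {
    uint32_t opt;
    uint32_t src;
    uint16_t acnt, bcnt;
    uint32_t dst;
    int16_t  src_bidx, dst_bidx;
    uint16_t link, bcntrld;
    int16_t  src_cidx, dst_cidx;
    uint16_t ccnt, rsvd;
} param_set_t;

#define PARAM_BASE_OFFSET 0x4000u /* PaRAM offset inside the EDMA3CC region */
#define PARAM_SET_SIZE    0x20u   /* bytes per PaRAM set */

/* LINK field value that points at PaRAM set 'index'. */
static uint16_t param_link_value(unsigned index)
{
    return (uint16_t)(PARAM_BASE_OFFSET + PARAM_SET_SIZE * index);
}

/* Fill the channel set plus two reload sets so that ping links to pong
   and pong links back to ping (hypothetical helper, not a driver API). */
static void pingpong_setup(param_set_t pram[], unsigned chan,
                           unsigned ping, unsigned pong,
                           uint32_t fifo, uint32_t buf0, uint32_t buf1,
                           uint16_t acnt, uint16_t events_per_buf)
{
    param_set_t s;
    memset(&s, 0, sizeof s);
    /* Assumed OPT bits: SYNCDIM (AB-sync) = bit 2, TCC = bits 17:12, TCINTEN = bit 20. */
    s.opt = (1u << 2) | ((uint32_t)chan << 12) | (1u << 20);
    s.src = fifo;
    s.acnt = acnt;
    s.bcnt = 1;
    s.bcntrld = 1;
    s.dst_bidx = (int16_t)acnt;
    s.dst_cidx = (int16_t)acnt;   /* advance destination after every event */
    s.ccnt = events_per_buf;

    s.dst = buf0;
    s.link = param_link_value(pong);
    pram[ping] = s;               /* reload set for the ping buffer */
    pram[chan] = s;               /* the channel starts on the ping buffer */

    s.dst = buf1;
    s.link = param_link_value(ping);
    pram[pong] = s;               /* reload set for the pong buffer */
}

/* Simulate one AB-synchronized event on the channel set.
   Returns 1 when a buffer has just been completed and the set was relinked. */
static int pingpong_event(param_set_t pram[], unsigned chan)
{
    param_set_t *p = &pram[chan];
    assert(p->ccnt > 0);
    p->dst += (uint32_t)p->dst_cidx;
    if (--p->ccnt != 0)
        return 0;
    unsigned next = (p->link - PARAM_BASE_OFFSET) / PARAM_SET_SIZE;
    *p = pram[next];              /* link update: copy the reload set */
    return 1;
}
```

In this simulation, every `events_per_buf` events the channel set is replaced by the other reload set, which is the point at which the completion interrupt would fire on the real hardware.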
To me it looks like the EMIFA access is the central building block causing the issue, because moving all the data addressed to NAND flash over to the SDMMC module prevents the error from occurring. Please note that the SDMMC module also uses EDMA0 for data transfer.
To verify that this problem is not self-made, I created a “nutshell” project that reproduces the issue. The project sets up the McBsp and EDMA3_0 for McBsp servicing. After that, 3 processes start to create the additional load on EDMA3_0 as follows:
Process 1 (testApl) endlessly creates EDMA3 memcpy jobs (QDMA, queue 1)
Process 2 (testApl2) endlessly erases one flash block and write-verifies that block (with QDMA support on queue 1)
Process 3 (testApl3) fills a buffer with random data and sends it to an SPI display (SPI RX/TX is also done with EDMA). SPI is running at 20 MHz
Each process places its job on EDMA0 and waits for transfer completion before placing a new one. On every successful McBsp1 completion interrupt a port pin is toggled for observation. However, after a variable number of _minutes_ of perfect operation, the McBsp stops. Checking the McBsp and EDMA settings showed that the EDMA is waiting for further McBsp RX events (PaRAM set not yet exhausted); McBsp->spcr shows RFULL = 1 and RRDY = 1, but RFIFOSTS stays static (neither empty nor full), so no further events are generated. All other jobs, such as EDMA3 memcpy, SPI transfers, and NAND flashing, keep working.
I tried to turn several optimization knobs such as busmaster-priority, default-burst-size, number of bytes per A-Dimension within PaRAM, moving events from queue0 to queue1 and vice versa. All those changes seem to impact the frequency of error occurrence, but none of those knobs could completely prevent the error.
Please see attached an export of all EDMA0 and McBsp registers under nominal (error-free) system conditions.
See below the McBsp, FIFO, and PaRAM4 setup after the error occurred:
((mcbsp_regs_t*) (0x01D11000)) struct mcbsp_s * 0x01D11000
*(((mcbsp_regs_t*) (0x01D11000))) struct mcbsp_s {...} 0x01D11000
drr unsigned int 0x00000002 0x01D11000
dxr unsigned int 0x00000000 0x01D11004
spcr unsigned int 0x02002031 0x01D11008
rcr unsigned int 0x00011040 0x01D1100C
xcr unsigned int 0x00000000 0x01D11010
srgr unsigned int 0x00000000 0x01D11014
mcr unsigned int 0x00000000 0x01D11018
rcere0 unsigned int 0x00000000 0x01D1101C
xcere0 unsigned int 0x00000000 0x01D11020
pcr unsigned int 0x00000081 0x01D11024
rcere1 unsigned int 0x00000000 0x01D11028
xcere1 unsigned int 0x00000000 0x01D1102C
rcere2 unsigned int 0x00000000 0x01D11030
xcere2 unsigned int 0x00000000 0x01D11034
rcere3 unsigned int 0x00000000 0x01D11038
xcere3 unsigned int 0x00000000 0x01D1103C
((mcbsp_fifo_regs_t*) 0x01D11800) struct mcbsp_fifo_s * 0x01D11800
*(((mcbsp_fifo_regs_t*) 0x01D11800)) struct mcbsp_fifo_s {...} 0x01D11800
bfiforev unsigned int 0x44311100 0x01D11800
rsvd0 unsigned int[3] 0x01D11804 0x01D11804
wfifoctl unsigned int 0x00001004 0x01D11810
wfifosts unsigned int 0x00000000 0x01D11814
rfifoctl unsigned int 0x00011001 0x01D11818
rfifosts unsigned int 0x00000007 0x01D1181C
*(&((edma3_pram_regs_t *)(0x01C04000))->pram_set4) struct unknown {...} 0x01C04080
opt unsigned int 0x80104004 0x01C04080
src unsigned int 0x01F11000 0x01C04084
acnt unsigned short 0x0040 0x01C04088
bcnt unsigned short 0x0001 0x01C0408A
dst unsigned int 0x8000164C 0x01C0408C
src_bidx short 0x0000 0x01C04090
dst_bidx short 0x0040 0x01C04092
link unsigned short 0x4880 0x01C04094
bcntrld unsigned short 0x0001 0x01C04096
src_cidx short 0x0000 0x01C04098
dst_cidx short 0x0024 0x01C0409A
ccnt unsigned short 0x0001 0x01C0409C
rsvd unsigned short 0x0000 0x01C0409E
While searching E2E for existing information on this issue, I found the following, potentially unsolved post, which seems to describe exactly the same issue:
http://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/112/t/217915.aspx
Cheers
Stefan
Stefan,
It might be worth trying to split the different EDMA0 transfers over TC0 and TC1.
Please look at section 18.2.14 of the OMAP-L138 TRM - SPRUH77a for general performance consideration.
Also, the performance-related wiki articles below might help, especially the McASP-related one:
OMAP-L1x/C674x/AM1x LCD Controller (LCDC) Throughput and Optimization Techniques
OMAP-L1x/C674x/AM1x Multichannel Audio Serial Port (McASP) Throughput and Optimization Techniques
OMAP-L1x/C674x/AM1x SOC Architecture and Throughput Overview
OMAP-L1x/C674x/AM1x SoC Architectural Overview
OMAP-L1x/C674x/AM1x SoC Constraints
OMAP-L1x/C674x/AM1x SoC Level Optimizations
OMAP-L1x/C674x/AM1x UART Throughput and Optimization Techniques
http://processors.wiki.ti.com/index.php/Category:OMAPL1
Hope it helps.
Anthony
Anthony,
As I already mentioned in my first post, I tried moving transfers from TC0 to TC1. I also tried all the optimization techniques mentioned in the links you provided (as stated in my first post). None of them helped.
There is still the central issue that the McBsp freezes without indicating _any_ error! I implemented _every_ error interrupt (BcBsp_errint, armEdma3_0Cc0Errint, armEdma3_0Tc0Errint, armEdma3_0Tc1Errint) and _none_ of them is entered. The register analysis after the McBsp freeze also showed no error. So the questions are:
Why does the McBsp freeze? There is simply no more EDMA3 event generation!
Why is there no notification of this freeze?
How can this freeze be prevented?
Just for clarification, and to prevent any further link postings:
I have spent days unsuccessfully optimizing with every knob I could find.
I have found the following configuration, which seems to be the most performant one, although the McBsp1 still stalls:
- McBsp1 RX and SPI RX are on TC0;
- SPI TX is on TC1
- all QDMA channels are on TC1; servicing the 8-bit NAND Flash and MEMCOPIES
- readrate of TC0 == 0, readrate of TC1 == 3
- DBS of both TCs == 16
- masterpriority of TC0 == 0, masterpriority of TC1 == 2, all other masters default
- keeping the MEMCPY acnt as small as possible has a huge impact on performance and on the time until the McBsp freezes
This configuration was arrived at by turning the following knobs:
- changing the TCs did change the performance, but McBsp still freezes
- Optimizing of the McBsp1 PaRAM (acnt / bcnt) did not solve the issue
- Optimizing of the MEMCPY PaRAMs (smaller acnt while bcnt increases) has impact on performance but did not solve the issue
- Optimizing the PaRAMs of QDMA servicing the NAND-Flash did not solve the issue
- Optimizing the masterpriorities could not prevent the McBsp from freezing
- Optimizing the DBS of all EDMA3TCs could not prevent the McBsp from freezing
- Optimizing the readrate of all EDMA3TCs could not prevent the McBsp from freezing
- Accessing PaRAM with 32-bit accesses only changed nothing
- Optimizing the EMIFB Command Re-ordering could not prevent the McBsp from freezing
- Changing the RNUMEVT in McBsp FIFO control to any useful value could not prevent the McBsp from freezing
- and probably many more I forgot to mention
All of those changes impacted the performance; none of them remedied the freezing McBsp.
Stefan
Stefan,
Resource conflicts in a complex device while multiple things are running do happen in some cases like yours. There are limits to what you can do, and there are improvements that can be done or at least tried.
Anthony said: It might be worth trying to split the different EDMA0 transfers over TC0 and TC1.
Specifically, put only the McBSP1 read transfer on EDMA0_TC0. Put all of the other transfers that have to be on EDMA0 on EDMA0_TC1. It was not obvious that you did this, since the SPI operation did not have the TCn specified.
Which memory endpoints are the other transfers using? Since your McBSP1 operation is writing to the internal Shared RAM, is this a common point for all the transfers and may be a stall point?
What data rates are you running the McBSP1, DSP and DDR?
Is the McBSP1 data a continuous stream? Do you get one FS per 16-bit sample? It could be helpful for throughput if you can get the incoming data as 32-bit words and store that data half as often.
Stefan Hittmeyer said: EDMA is waiting for further McBsp RX events (PaRAM set not yet exhausted)
The PaRAM you show at the end of your first post shows OPT.ABSYNC=1, ACNT=0x40, BCNT=1, CCNT=1. Whenever an event gets to the EDMA to read the RFIFO, it will read 0x40 bytes and then do a Link to 0x4880. There is nothing to see there that would indicate an "exhausted" state or not. It would be useful to see the EDMA0's IER, IFR, EMR, and SEC registers after the error condition has occurred, to add to the status you have already shown.
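To make that reading of the PaRAM set reproducible, here is a small sketch that decodes the dumped values on a PC. The OPT bit positions (SAM bit 0, DAM bit 1, SYNCDIM bit 2, STATIC bit 3, TCC bits 17:12, TCINTEN bit 20) are taken from my reading of the EDMA3 chapter and should be checked against the TRM; the function names are made up for this example. The PaRAM offset 0x4000 and the 0x20 bytes per set follow from the addresses in the dump.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned sam;       /* source address mode (0 = increment) */
    unsigned dam;       /* destination address mode (0 = increment) */
    unsigned ab_sync;   /* SYNCDIM: 1 = AB-synchronized, 0 = A-synchronized */
    unsigned is_static; /* STATIC: 1 = no link update after completion */
    unsigned tcc;       /* transfer completion code */
    unsigned tcinten;   /* completion interrupt enable */
} opt_fields_t;

/* Split an OPT word into the fields discussed in the thread. */
static opt_fields_t decode_opt(uint32_t opt)
{
    opt_fields_t f;
    f.sam       = opt & 1u;
    f.dam       = (opt >> 1) & 1u;
    f.ab_sync   = (opt >> 2) & 1u;
    f.is_static = (opt >> 3) & 1u;
    f.tcc       = (opt >> 12) & 0x3Fu;
    f.tcinten   = (opt >> 20) & 1u;
    return f;
}

/* Bytes moved per synchronization event: ACNT for A-sync, ACNT*BCNT for AB-sync. */
static uint32_t bytes_per_event(uint32_t opt, uint16_t acnt, uint16_t bcnt)
{
    return ((opt >> 2) & 1u) ? (uint32_t)acnt * bcnt : acnt;
}

/* Convert a LINK field value into a PaRAM set index. */
static unsigned link_to_set(uint16_t link)
{
    assert(link >= 0x4000u);
    return (link - 0x4000u) / 0x20u;
}
```

Fed with the dump from the first post (`opt = 0x80104004`, `acnt = 0x0040`, `bcnt = 1`, `link = 0x4880`), this gives AB-sync, TCC = 4 with the completion interrupt enabled, 64 bytes per event, and a link to PaRAM set 68.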
If you turn off only one of the processes 1, 2, and 3, does each of those 3 cases stop failing? Or is it only when you remove the process 2 to the NAND?
Do you have the facility to pulse different GPIOs at the start/end of each of the processes so you can see which one was coincident with the last successful McBSP1 read, or perhaps with the time when the next McBSP1 read should have occurred?
How much jitter do you have in your McBSP1 GPIO pulse? Does it always move around or does it tend to stay close to a fixed time after the 16th sample comes in, then jump a lot only during certain events? How much would that max jump be (same as jitter, maybe)?
If you turn off the EDMA0.IER bit for the McBSP1 Rx, will the freeze go away? I am curious if it is related to the DSP having to do anything.
Those are my ideas for now.
Regards,
RandyP
Hi,
please find below the answers to your questions
- McBSP internal CLK speed and serial clock speed?
ASYNC3 = PLL1_SYSCLK2 -> PLL1_SYSCLK2 = 150 MHz, 10.4857 MHz serial clock speed
- Seems that the EDMA uses the shared RAM. Have you tried to use part of the L2 SRAM as RAM for the McBSP buffers?
No. We need to use the shared RAM or DDR2 (but thought shared would be better for performance)
because ARM manages the memory while DSP calculates with those data
- What is the RCV buffer mechanism scheme (one single buffer, ping/pong buffers, ..etc)?
2 ping-pong buffers filled continuously by EDMA3 with completion interrupt
- Is the McBSP the only peripheral serviced by the EDMA TC1?
McBSP is serviced by EDMA0 TC0 together with SPI1 Rx
EDMA0 TC1 services QDMA (memcpy + NAND-Flash), SPI1 Tx, MMCSD0 Rx/Tx
EDMA1 TC0 services memsorting
- Have you tried to use a memory destination not used by any other resources?
Yes, the shared RAM is only used for the McBSP Data and therefore ARM and DSP also access those data for calculation and management
- What SW are you using to program the McBSP and EDMA ? The DSP BIOS/SYSBIOS drivers, starterware or custom code?
custom code
With these answers, all the prerequisites for a conf call should be met.
with regards,
Stefan Hittmeyer
Thanks for the clarification.
Just to be sure about the different processes and the SRC/DST of all the data transfers:
- For process 1: what memory is the SRC/DST of the QDMA memcpy transfer?
- For process 2: How is the write-verify done? I guess that you use memory for this: what memory is it (shared RAM, DDR, etc.)? What are the SRC/DST of the QDMA transfer?
- For process 3: Where are the RX/TX SPI buffers located (shared RAM, DDR, etc.)?
The idea here is to be able to identify whether the shared RAM is the bottleneck of the system (i.e., whether several masters contend for access to it). Thanks in advance for the additional info.
Anthony
Also one more question:
- Does the occurrence of the missed event change when you make the RX FIFO size bigger or smaller?
In your setup, RFIFOCTL.RNUMEVT seems to be 0x10, i.e. 16 words.
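For reference, a small sketch of how the RFIFOCTL value from the dump can be split into its fields, assuming the layout RNUMDMA = bits 7:0, RNUMEVT = bits 15:8, RENA = bit 16 (my reading of the McBSP FIFO chapter; please verify against the TRM). The helper `rfifo_event_due` encodes the assumption that the read FIFO only raises an event once its level has reached RNUMEVT; both function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    unsigned rnumdma; /* words read per DMA event */
    unsigned rnumevt; /* FIFO level (in words) that triggers an event */
    unsigned rena;    /* read FIFO enable */
} rfifoctl_t;

/* Split RFIFOCTL into its fields (assumed bit layout, see text). */
static rfifoctl_t decode_rfifoctl(uint32_t v)
{
    rfifoctl_t f;
    f.rnumdma = v & 0xFFu;
    f.rnumevt = (v >> 8) & 0xFFu;
    f.rena    = (v >> 16) & 1u;
    return f;
}

/* Assumption: an RX event is only due once the level reaches RNUMEVT. */
static int rfifo_event_due(uint32_t rfifosts_level, unsigned rnumevt)
{
    return rfifosts_level >= rnumevt;
}
```

With `rfifoctl = 0x00011001` from the first post this gives RENA = 1, RNUMEVT = 16, and RNUMDMA = 1. Under the assumption above, the dumped `rfifosts = 7` is below the threshold, so no event would be due until more words arrive in the FIFO.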
Anthony,
- For process 1 what memory is the SRC/DST of the QDMA memcpy transfer?
DDR2
- For process 2: How is the writeverify done? I guess that you use memory for this: what memory is it (shared RAM, DDR, .etc)? What are the SRC/DST of the QDMA transfer?
DDR2
- For process 3: Where are the RX/TX SPI buffers located (Shared RAM, DDR, ..etc)?
DDR2
Please have a look at the attachment of my very first post! It should be a cinch for you to interpret those registers with the help of SPRUH77a ;-)
Stefan
Yes, I have looked at the attachment and I did not find any DMA transfer using shared RAM (apart from the McBSP FIFO).
So apart from the EDMA, is the CPU accessing the shared RAM at all? For example, is process 2 using shared RAM for the verification?
For all the processes, is the CPU accessing the different buffers in DDR2? If so, is the cache enabled, and is any cache coherency handling done?
Some comments following the info you gave:
Looking at the different PaRAM values, it seems that the different transfers are well spread over EDMA0 TC0, EDMA0 TC1 and EDMA1 TC0.
The EDMA0 TC0 handles the McBSP FIFO to shared RAM transfer and the SPI RX (SPI RCV register) to DDR2 transfer. The transfers scheduled on TC0 should not take the complete bandwidth on the SCR bus.
So I guess that the issue might be more at the EDMA CC event queue end (see page 558, fig. 18-2 of the SPRUH77A TRM), which actually schedules the transfers on TC0/TC1.
There are some ways to debug the Event queue usage (see page 591 section 18.2.10.2 and following of TRM).
There is a way to track the max usage of all event queues.
- Could you look at the different registers mentioned in those sections?
The max queue length is 16. If more events than that are coming in, then I think that the queue just stalls. This could make you miss a real-time event.
Also as mentioned earlier the depth of the McBSP FIFO (RFIFOCTL.RNUMEVT) has an influence on how many EDMA events are generated. See the below post about the McBSP TFIFO. Same concept applies to McBSP RFIFO:
http://e2e.ti.com/support/dsp/omap_applications_processors/f/42/p/60093/219105.aspx#219105
Anthony
Anthony,
For clarification, as I described in my first post:
To verify that this problem is not self-made, I created a “nutshell” project that reproduces the issue. The project sets up the McBsp and EDMA3_0 for McBsp servicing. After that, 3 processes start to create the additional load on EDMA3_0 as follows:
So these 3 additional processes perform no real function except increasing the load. It is correct that McBSP + EDMA3 continuously push data to the shared RAM (no cache active) and that (in the nutshell project) no other bus master accesses this data in shared RAM (in the real application they would).
Those other 3 processes (and with that also the EDMA jobs) use the DDR2 for any buffering, read/verify etc.
For DDR2 the read-cache is active and no write-cache (write-through). EDMA invalidates the DDR2 cache whenever EDMA wrote data to the DDR2.
I already debugged those EDMA queue registers and tried to trace back, but 16 events is just way too little history to identify suspects. Moreover, the watermark never reached values higher than "1" (even though a watermark of "1" seems a little unlikely to me, but not impossible).
on Dec 19 2014 02:14 AM I posted:
- Changing the RNUMEVT in McBsp FIFO control to any useful value could not prevent the McBsp from freezing
So also this point was evaluated.
We found the following "workarounds", which may be a hint for you (this is not the nutshell project, this is the real application program):
- To keep the McBSP alive and be able to proceed with application development, we moved all data residing in NAND flash to the SD card. So for now there is no runtime access to NAND flash at all (neither read nor write).
- We assembled a quick-n-dirty sample of the next-gen hardware, which uses a 16-bit-wide NAND flash instead of the 8-bit-wide one. With that hardware we were not able to reproduce the issue in a very quick-n-dirty test. But please mind that this does not mean we won't have this issue in the future when the application grows and the load with it (which will definitely happen).
Stefan
Hi Stefan,
Just to be sure. Here are the EDMA transfers setup in the system:
EDMA0 TC0: McBSP FIFO to shared RAM
SPI RX to DDR
EDMA0 TC1: QDMA (DDR to EMIF NAND and EMIF NAND to DDR)
DDR to SPI TX
QDMA (TX MMCSDO to DDR and RX MMCSDO from DDR)
For EDMA1 TC0 you mention mem sorting? Can you be more precise? is it QDMA or EDMA? From DDR to DDR?
Thanks.
Anthony
Hi Anthony,
EDMA0 TC0: McBSP FIFO to shared RAM
SPI RX to DDR
EDMA0 TC1: QDMA (DDR to EMIF NAND and EMIF NAND to DDR)
DDR to SPI TX
QDMA (DDR to DDR)
EDMA1 TC0 memsorting would only be active in the real application. But the issue also appears without it!
QDMA (TX MMCSDO to DDR and RX MMCSDO from DDR) is only active instead of EMIF NAND for workaround
We checked using the McBSP without FIFO which works, but the McBSP still stalls.
The raw NAND rates measured with the 8-bit NAND are:
Read: ~12.2 MB/s
WriteVerify: ~4.2 MB/s
Stefan
Thanks.
Could you post the McBSP registers at the time the problem occurs when the FIFO is not used?
Could you also post the EDMA registers used for debug (QSTAT, QWMTHRA, etc.)?
Since the FIFO is not used, you should see a higher threshold for the event queue usage.
Anthony
Anthony,
Please find the requested data below:
(edma3cc_regs_t *)0x01c00000 struct edma3cc_regs_t * 0x01C00000
*((edma3cc_regs_t *)0x01c00000) struct edma3cc_regs_t {...} 0x01C00000
pid unsigned int 0x40019B00 0x01C00000
cccfg unsigned int 0x00213344 0x01C00004
rsvd0 unsigned int[126] 0x01C00008 0x01C00008
qchmap0 unsigned int 0x00000C0C 0x01C00200
qchmap1 unsigned int 0x00000C3C 0x01C00204
qchmap2 unsigned int 0x00000C5C 0x01C00208
qchmap3 unsigned int 0x00000C7C 0x01C0020C
qchmap4 unsigned int 0x00000C9C 0x01C00210
qchmap5 unsigned int 0x00000CBC 0x01C00214
qchmap6 unsigned int 0x00000CDC 0x01C00218
qchmap7 unsigned int 0x00000CE4 0x01C0021C
rsvd1 unsigned int[8] 0x01C00220 0x01C00220
dmaqnum0 unsigned int 0x00000000 0x01C00240
dmaqnum1 unsigned int 0x00000000 0x01C00244
dmaqnum2 unsigned int 0x10001011 0x01C00248
dmaqnum3 unsigned int 0x00011111 0x01C0024C
rsvd2 unsigned int[4] 0x01C00250 0x01C00250
qdmaqnum unsigned int 0x11111111 0x01C00260
rsvd3 unsigned int[8] 0x01C00264 0x01C00264
quepri unsigned int 0x00000000 0x01C00284
rsvd4 unsigned int[30] 0x01C00288 0x01C00288
emr unsigned int 0x00000000 0x01C00300
rsvd5 unsigned int 0x00000000 0x01C00304
emcr unsigned int 0x00000000 0x01C00308
rsvd6 unsigned int 0x00000000 0x01C0030C
qemr unsigned int 0x00000000 0x01C00310
qemcr unsigned int 0x00000000 0x01C00314
ccerr unsigned int 0x00000000 0x01C00318
ccerrclr unsigned int 0x00000000 0x01C0031C
eeval unsigned int 0x00000000 0x01C00320
rsvd7 unsigned int[7] 0x01C00324 0x01C00324
drae0 unsigned int 0xFFFFFFFF 0x01C00340
rsvd8 unsigned int 0x00000000 0x01C00344
drae1 unsigned int 0x00000000 0x01C00348
rsvd9 unsigned int 0x00000000 0x01C0034C
drae2 unsigned int 0x00000000 0x01C00350
rsvd10 unsigned int 0x00000000 0x01C00354
drae3 unsigned int 0x00000000 0x01C00358
rsvd11 unsigned int[9] 0x01C0035C 0x01C0035C
qrae0 unsigned int 0x000000FF 0x01C00380
qrae1 unsigned int 0x00000000 0x01C00384
qrae2 unsigned int 0x00000000 0x01C00388
qrae3 unsigned int 0x00000000 0x01C0038C
rsvd12 unsigned int[28] 0x01C00390 0x01C00390
q0e0e15 unsigned int[16] 0x01C00400 0x01C00400
q1e0e15 unsigned int[16] 0x01C00440 0x01C00440
rsvd13 unsigned int[96] 0x01C00480 0x01C00480
qstat0 unsigned int 0x00020000 0x01C00600
qstat1 unsigned int 0x0002000D 0x01C00604
rsvd14 unsigned int[6] 0x01C00608 0x01C00608
qwmthra unsigned int 0x00001010 0x01C00620
rsvd15 unsigned int[7] 0x01C00624 0x01C00624
ccstat unsigned int 0x00000000 0x01C00640
rsvd16 unsigned int[623] 0x01C00644 0x01C00644
er unsigned int 0x00002400 0x01C01000
rsvd17 unsigned int 0x00000000 0x01C01004
ecr unsigned int 0x00000000 0x01C01008
rsvd18 unsigned int 0x00000000 0x01C0100C
esr unsigned int 0x00000000 0x01C01010
rsvd19 unsigned int 0x00000000 0x01C01014
cer unsigned int 0x00000000 0x01C01018
rsvd20 unsigned int 0x00000000 0x01C0101C
eer unsigned int 0x000C0010 0x01C01020
rsvd21 unsigned int 0x00000000 0x01C01024
eecr unsigned int 0x00000000 0x01C01028
rsvd22 unsigned int 0x00000000 0x01C0102C
eesr unsigned int 0x00000000 0x01C01030
rsvd23 unsigned int 0x00000000 0x01C01034
ser unsigned int 0x00000000 0x01C01038
rsvd24 unsigned int 0x00000000 0x01C0103C
secr unsigned int 0x00000000 0x01C01040
rsvd25 unsigned int[3] 0x01C01044 0x01C01044
ier unsigned int 0x3FCF0010 0x01C01050
rsvd26 unsigned int 0x00000000 0x01C01054
iecr unsigned int 0x00000000 0x01C01058
rsvd27 unsigned int 0x00000000 0x01C0105C
iesr unsigned int 0x00000000 0x01C01060
rsvd28 unsigned int 0x00000000 0x01C01064
ipr unsigned int 0x00000000 0x01C01068
rsvd29 unsigned int 0x00000000 0x01C0106C
icr unsigned int 0x00000000 0x01C01070
rsvd30 unsigned int 0x00000000 0x01C01074
ieval unsigned int 0x00000000 0x01C01078
rsvd31 unsigned int 0x00000000 0x01C0107C
qer unsigned int 0x00000000 0x01C01080
qeer unsigned int 0x000000FF 0x01C01084
qeecr unsigned int 0x00000000 0x01C01088
qeesr unsigned int 0x00000000 0x01C0108C
qser unsigned int 0x00000000 0x01C01090
qsecr unsigned int 0x00000000 0x01C01094
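A minimal sketch for reading the queue status values in the dump above, assuming the QSTATn layout STRTPTR = bits 3:0, NUMVAL = bits 12:8, WM = bits 20:16, THRXCD = bit 24 and the QWMTHRA layout Q0 = bits 4:0, Q1 = bits 12:8 (my reading of the EDMA3CC register descriptions; please verify against the TRM). The function names are made up for this example. `list_set_bits` simply lists which bit positions are set in registers such as ER or EER.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    unsigned strtptr; /* start pointer into the 16-entry queue */
    unsigned numval;  /* number of valid (outstanding) entries */
    unsigned wm;      /* watermark: maximum NUMVAL seen so far */
    unsigned thrxcd;  /* 1 if the QWMTHRA threshold was exceeded */
} qstat_t;

/* Split a QSTATn word into its fields (assumed bit layout, see text). */
static qstat_t decode_qstat(uint32_t v)
{
    qstat_t q;
    q.strtptr = v & 0xFu;
    q.numval  = (v >> 8) & 0x1Fu;
    q.wm      = (v >> 16) & 0x1Fu;
    q.thrxcd  = (v >> 24) & 1u;
    return q;
}

/* Threshold for queue 0 or 1 from QWMTHRA. */
static unsigned qwmthra_threshold(uint32_t v, unsigned queue)
{
    assert(queue < 2);
    return (v >> (8u * queue)) & 0x1Fu;
}

/* Write the indices of all set bits into idx[] and return their count. */
static unsigned list_set_bits(uint32_t reg, unsigned idx[32])
{
    unsigned n = 0;
    assert(idx != NULL);
    for (unsigned b = 0; b < 32; ++b)
        if (reg & (1u << b))
            idx[n++] = b;
    return n;
}
```

With the dumped values, `qstat0 = 0x00020000` and `qstat1 = 0x0002000D` both decode to NUMVAL = 0 and WM = 2, below the threshold of 16 in `qwmthra = 0x00001010`. `er = 0x00002400` has bits 10 and 13 set, while bit 4 (the channel of the PaRAM set 4 dump, which I assume is the McBsp1 RX channel) is clear; `eer = 0x000C0010` has bits 4, 18, and 19 set.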
(mcbsp_regs_t*)0x01D11000 struct mcbsp_s * 0x01D11000
*((mcbsp_regs_t*)0x01D11000) struct mcbsp_s {...} 0x01D11000
drr unsigned int 0xFFFFFFFF 0x01D11000
dxr unsigned int 0x00000000 0x01D11004
spcr unsigned int 0x02002037 0x01D11008
rcr unsigned int 0x00011040 0x01D1100C
xcr unsigned int 0x00000000 0x01D11010
srgr unsigned int 0x00000000 0x01D11014
mcr unsigned int 0x00000000 0x01D11018
rcere0 unsigned int 0x00000000 0x01D1101C
xcere0 unsigned int 0x00000000 0x01D11020
pcr unsigned int 0x00000081 0x01D11024
rcere1 unsigned int 0x00000000 0x01D11028
xcere1 unsigned int 0x00000000 0x01D1102C
rcere2 unsigned int 0x00000000 0x01D11030
xcere2 unsigned int 0x00000000 0x01D11034
rcere3 unsigned int 0x00000000 0x01D11038
xcere3 unsigned int 0x00000000 0x01D1103C
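The receive-side status bits in the SPCR value above can be pulled out with a few shifts. This sketch assumes the usual McBSP SPCR layout (RRST = bit 0, RRDY = bit 1, RFULL = bit 2, RSYNCERR = bit 3, RINTM = bits 5:4), which should be double-checked against the TRM; the names `decode_spcr_rx` and `rx_overrun` are hypothetical.

```c
#include <stdint.h>

typedef struct {
    unsigned rrst;     /* 1 = receiver out of reset (enabled) */
    unsigned rrdy;     /* 1 = data waiting in DRR */
    unsigned rfull;    /* 1 = receive shift/buffer registers full, DRR not read */
    unsigned rsyncerr; /* 1 = unexpected receive frame sync detected */
    unsigned rintm;    /* receive interrupt mode */
} spcr_rx_t;

/* Extract the receive-side fields of SPCR (assumed bit layout, see text). */
static spcr_rx_t decode_spcr_rx(uint32_t spcr)
{
    spcr_rx_t s;
    s.rrst     = spcr & 1u;
    s.rrdy     = (spcr >> 1) & 1u;
    s.rfull    = (spcr >> 2) & 1u;
    s.rsyncerr = (spcr >> 3) & 1u;
    s.rintm    = (spcr >> 4) & 3u;
    return s;
}

/* Receiver enabled, DRR unread, and no room left for incoming data. */
static int rx_overrun(spcr_rx_t s)
{
    return s.rrst && s.rrdy && s.rfull;
}
```

For `spcr = 0x02002037` this yields RRST = 1, RRDY = 1, RFULL = 1, and RSYNCERR = 0, i.e. the receiver is enabled and full, but nobody is reading DRR.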
Stefan
Some points to investigate:
1) On the MCBSP and HW side:
a) The fact that the McBSP suddenly stops functioning might be linked to some system level issues. Some HW checks need to be done:
- The power rails (CVDD for the IOs)
- The power up sequence (it needs to be monotonic)
- The noise level on the oscillator/CLKIN and PLL pins
- The reset signal timings
At the McBSP signal level:
- The noise level on the serial CLK
- The signal integrity on all McBSP ctrl pins (clk, FS).
- Can you capture a CVDD rail measurement just before and during the failure (with 100 mV resolution) to ensure it is within spec?
- Can you do the same with the MMCSD workaround?
- Do you see any differences?
b) Also, since the McBSP seems to be stuck, how can you recover from the problem?
- Can you simply re-run the McBSP init sequence so that the McBSP starts again?
- Or do you need to clear the RRST bit of the McBSP in order to get it working again?
This could indicate that there is some noise on the serial CLK or on some of the ctrl lines.
Could you provide some screenshots of the input CLK in both scenarios (failing and MMCSD workaround)?
- Have you followed the exact McBSP init sequence provided in section 26.2.12 of the TRM?
For the measurements (CVDD and serial CLK), try to set up the system in the worst-case scenario so that you reproduce the problem as often as possible.
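Regarding the recovery question in b): since none of the error interrupts fire when the McBSP stalls, one option is to detect the stall in software and only then try the re-init or the RRST toggle. Below is a minimal sketch of such a watchdog in plain C. All names (`rx_watchdog_t`, `rx_watchdog_init`, `rx_watchdog_poll`) are hypothetical, and it assumes the application increments a counter in the McBSP completion interrupt (the place where the port pin is toggled) and polls from a periodic timer tick.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t last_count; /* completion counter value seen at the last change */
    uint32_t idle_polls; /* consecutive polls without a new completion */
    uint32_t limit;      /* polls without progress before a stall is reported */
} rx_watchdog_t;

static void rx_watchdog_init(rx_watchdog_t *w, uint32_t limit)
{
    assert(w != NULL && limit > 0);
    w->last_count = 0;
    w->idle_polls = 0;
    w->limit = limit;
}

/* Call periodically with the current completion count.
   Returns 1 once the count has not changed for 'limit' consecutive polls. */
static int rx_watchdog_poll(rx_watchdog_t *w, uint32_t completion_count)
{
    assert(w != NULL);
    if (completion_count != w->last_count) {
        w->last_count = completion_count;
        w->idle_polls = 0;
        return 0;
    }
    if (w->idle_polls < w->limit)
        w->idle_polls++;
    return w->idle_polls >= w->limit;
}
```

When the poll function returns 1, the application could dump the registers for analysis and then try the recovery options listed above one at a time, which would also show which of them actually restarts the McBSP.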
2) Regarding EDMA and throughput tuning:
a) What data rate can you achieve with the MMCSD?
If NAND is more “efficient” than MMC/SD, it might be that the (slow) MMC/SD traffic allows better interleaving of McBSP and MMC/SD packets compared to EMIFA/NAND reads.
b) For the EDMA settings for NAND vs. MMCSD:
Is it truly the same chunk size of data per DMA/QDMA event in both cases?
The TC optimization rules need to be applied to both MMCSD and EMIF NAND.
c) What is the exact value of the EMIF BPRIO register?
d) How was the NAND EDMA driver written compared to the MMC/SD EDMA driver?
NAND and MMCSD differ in nature: for NAND, some CPU involvement is needed to set up the address to read/write, followed by the EDMA for the actual data movement.
MMC/SD has sync events to the EDMA, so are you not using those and using QDMA instead?
I hope the driver differences and the difference in CPU/EDMA involvement (address setup followed by data) are not changing the "dynamics of this scenario" beyond a throughput issue.
e) Does the McBSP freeze happen within a fixed amount of time, with an increase in activity, or at random?
f) It seems you are running the device at 300 MHz; any chance of leveraging increased CPU speed?
Anthony