C6747 EVM - McASP to SDRAM memory EDMA3 LLD transfer problem

Other Parts Discussed in Thread: ADS1278

I set up EDMA3 LLD 01_11_00_03 to transfer 8 McASP channels to external SDRAM using double (ping-pong) buffering. When an acquisition completes, EDMA3 chaining starts a transfer that sorts the samples into another 8 SDRAM buffers. When the final TCC completes, I immediately copy those 8 buffers to another 8 for data integrity checking. Next there is a capture of one of the 8 buffers.

The same disturbance appears in the first 8 samples of all 8 buffers in every acquisition. I guess this is due to some memory stall, but I'm not sure. How could I check or debug this issue? Thanks in advance,

gaston

  • Gaston,

    Is your system using the McASP FIFOs?

    -Tommy

  • No. Should I use them? Do you have a McASP example with FIFOs enabled?

    Regards,

    Gaston

  • Gaston,

    I'm just trying to understand if the unexpected data is coming from system latency issues.  Are you using the BIOS drivers?  The BIOS McASP drivers for OMAPL1 should have hardware FIFO support.

    -Tommy

  • Tommy, I'm working with this environment:

    • PSP drivers v01_30_00_06
    • McASP configuration using CSL
    • EDMA3 settings using LLD v01_11_00_03
    • DSP BIOS v5_41_02_14

    I tried to enable the McASP FIFO with RFIFOCTL.RNUMEVT set to 8 and RFIFOCTL.RNUMDMA set to 8, because I use 8 read serializers. After this, the same disturbance appears in the buffers, but it affects the first 32 samples instead of the first 8. I also tried enabling 32 KB of L2 cache following these guidelines, with no luck. Moreover, I cannot use internal SRAM because the buffers are too large.

    I hope this helps you to understand what is going on.

    Regards,

    Gaston

  • Gaston,

    I asked a few colleagues about this and they have a few questions.

    1. Are you able to use the McASP BIOS driver instead of CSL configurations?  The driver has been tested and does not have any data glitch issues.
    2. Is the data being touched by the DSP?  There might be cache coherency issues.
    3. Are the McASP acquisitions continuous or is the McASP reconfigured each time?
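
On the cache coherency point in question 2: when the DSP reads a buffer that the EDMA has written, the affected cache lines normally have to be invalidated first, and the invalidated range is usually expanded to whole cache lines. The helper below is only a sketch of that range calculation. The 128-byte line size is an assumption that should be checked against the device documentation, and `cache_range_t` and `cache_aligned_range` are hypothetical names, not part of any TI API. The resulting start and length would then be handed to whatever invalidate call the system provides (for example DSP/BIOS `BCACHE_inv`, if that is the API in use).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes; verify for the actual device. */
#define CACHE_LINE_BYTES 128u

/* Hypothetical helper type: a cache-line-aligned address range. */
typedef struct {
    uintptr_t start;   /* first byte, rounded down to a line boundary */
    size_t    length;  /* byte count, a multiple of the line size     */
} cache_range_t;

/* Expand [addr, addr + bytes) outwards to whole cache lines. */
static cache_range_t cache_aligned_range(uintptr_t addr, size_t bytes)
{
    const uintptr_t mask = (uintptr_t)(CACHE_LINE_BYTES - 1u);
    cache_range_t r;
    uintptr_t end;

    assert(bytes > 0u);

    end      = (addr + bytes + mask) & ~mask;  /* round end up     */
    r.start  = addr & ~mask;                   /* round start down */
    r.length = (size_t)(end - r.start);
    return r;
}
```

A buffer that shares a cache line with unrelated data is a common source of trouble here, so aligning and padding the DMA buffers to the line size is the usual companion to this calculation.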

    -Tommy

  • tlee said:
    Are you able to use the McASP BIOS driver instead of CSL configurations?  The driver has been tested and does not have any data glitch issues.

    Well, at first I wanted to do it this way, but the driver buffer has a maximum of 32 Kbytes; you can check Mariana's last post right here. In my application I have to get 10240 32-bit samples from an ADC with 8 simultaneous channels, which means a buffer of 10240 x 4 x 8 = 327680 bytes (320 Kbytes) is necessary.
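
The buffer-size arithmetic in this reply can be written out as a small sketch. The sample count, channel count and word size are the figures quoted in the post, and the 32 Kbyte driver limit is the one the post mentions. The function and macro names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Driver buffer limit as stated in the post (32 Kbytes). */
#define DRIVER_MAX_BYTES (32u * 1024u)

/* Bytes needed to hold one full acquisition of all channels. */
static uint32_t acquisition_bytes(uint32_t samples, uint32_t channels,
                                  uint32_t bytes_per_sample)
{
    uint64_t total = (uint64_t)samples * channels * bytes_per_sample;

    assert(total <= UINT32_MAX);  /* guard against 32-bit overflow */
    return (uint32_t)total;
}

/* Returns 1 if the acquisition fits in the driver buffer, 0 otherwise. */
static int fits_in_driver_buffer(uint32_t bytes)
{
    return bytes <= DRIVER_MAX_BYTES;
}
```

With 10240 samples, 8 channels and 4 bytes per sample the result is 327680 bytes, ten times the stated driver limit, which is why the buffers are managed directly with EDMA3 here.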

    tlee said:
    Is the data being touched by the DSP?  There might be cache coherency issues.

    I'm afraid so. I'll try to explain the scenario: the McASP acquisitions are continuous (this also answers the last question) and I wait for the ping/pong transfer to complete. Next I trigger another transfer that sorts the data into another buffer (I use a toggle flag to select the ping or pong source and set a single destination buffer called 'large_buffer').

    To solve the problem, I tried these options:

    • Configure 32 KB of L2 cache. It didn't work.
    • Change the EMIFB BPRIO bits to 0x20 or 0x30. It didn't work either.
    • I set EDMA3_RM_EventQueue to 1 for the EDMA3 McASP request channel (and also for the two ping/pong link channels), and it improved the acquisition dramatically. Now sometimes there is a glitch (of 1 sample) and sometimes not (at any position of the large buffer).

    I'll have to investigate a little deeper to solve the problem completely and apreciate any help from you and your colleagues.

    regards

    Gaston

  • Hi Gaston

    Can you try a few more things

    Gaston said:
    I set EDMA3_RM_EventQueue to 1 for the EDMA3 McASP request channel (and also for the two ping/pong link channels), and it improved the acquisition dramatically. Now sometimes there is a glitch (of 1 sample) and sometimes not (at any position of the large buffer).

    1) Can you split your EDMA transfers so that the McASP to/from SDRAM transfers are on Queue 0/TC0 and your memory to memory transfers are on Queue 1/TC1? This assumes that, in your current scenario, your McASP transfers run concurrently with a previous data sorting transfer and use the same Queue/TC.

    2) Can you share with us the details of your PARAM programming for the data sorting EDMA transfers (ACNT, BCNT  etc)

    3) This should be tried last, but you can also see whether changing the DSP.MDMA priority from its default of 2 to 0 (highest priority, equal to the Transfer Controller priority) helps. The priority can be configured using the MSTPRI0 register (details in the system guide, Pg 173). We typically recommend keeping this at a lower priority than the DMA transfer controllers etc., but this is just to understand whether the "concurrency" bottlenecks come from CPU accesses manipulating the buffers, or whether the McASP is simply missing real-time deadlines because the same EDMA queue is possibly also being used for the memory to memory transfers of your data sorting.
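
As a sketch of suggestion 3: the master priorities discussed in this thread take values 0 (highest) to 7 (lowest), so each one can be treated as a 3-bit field inside an MSTPRI register image. The helper below only shows the read-modify-write of one such field. The function name is hypothetical, and the bit position of each master's field is deliberately left as a parameter because it has to be taken from the System Reference Guide for the device.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Replace one 3-bit priority field in a 32-bit register image.
 * 'shift' is the bit position of the field's LSB (an assumption to be
 * looked up in the device documentation); 'priority' is 0 (highest)
 * to 7 (lowest).
 */
static uint32_t set_priority_field(uint32_t reg, unsigned shift,
                                   unsigned priority)
{
    uint32_t mask;

    assert(shift <= 29u);     /* field must fit inside 32 bits */
    assert(priority <= 7u);   /* 3-bit field                   */

    mask = (uint32_t)0x7u << shift;
    return (reg & ~mask) | ((uint32_t)priority << shift);
}
```

On the target this would typically be applied through a `volatile uint32_t *` pointing at the register, reading the current value, passing it through the helper and writing the result back, so that the other masters' fields are left unchanged.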

     

    Regards

    Mukul

  • Mukul Bhatnagar said:
    1) Can you split your EDMA transfers so that the McASP to/from SDRAM transfers are on Queue 0/TC0 and your memory to memory transfers are on Queue 1/TC1? This assumes that, in your current scenario, your McASP transfers run concurrently with a previous data sorting transfer and use the same Queue/TC.

    Here is the channel request configuration for the McASP to SDRAM EDMA transfer. Link channels are set up for ping/pong buffering, and the transfer waits for tcc1 completion:

        // Request any DMA channel
        if (result == EDMA3_DRV_SOK)
            result = EDMA3_DRV_requestChannel ( hEdma, &chId, tcc1,
                                                (EDMA3_RM_EventQueue)1,
                                                NULL, NULL);

        // If successful, allocate one link channel.
        if (result == EDMA3_DRV_SOK)
            result = EDMA3_DRV_requestChannel ( hEdma, &chXId, NULL,
                                                (EDMA3_RM_EventQueue)1,
                                                NULL, NULL);

        // If successful, allocate next link channel.
        if (result == EDMA3_DRV_SOK)
            result = EDMA3_DRV_requestChannel ( hEdma, &chYId, NULL,
                                                (EDMA3_RM_EventQueue)1,
                                                NULL, NULL);

        if (result == EDMA3_DRV_SOK)
            result = EDMA3_DRV_linkChannel (hEdma, chId, chYId);

        if (result == EDMA3_DRV_SOK)
            result = EDMA3_DRV_linkChannel (hEdma, chYId, chXId);

        if (result == EDMA3_DRV_SOK)
            result = EDMA3_DRV_linkChannel (hEdma, chXId, chYId);

        // enable the transfer!
        if (result == EDMA3_DRV_SOK)
            result = EDMA3_DRV_enableTransfer (    hEdma, chId,
                                                EDMA3_DRV_TRIG_MODE_EVENT);


    And here is the SDRAM mem-to-mem data sorting EDMA transfer. For now, I enable this transfer using manual trigger mode and wait for tcc2 completion:

        // Request any active DMA channel
        if (result == EDMA3_DRV_SOK)
            result = EDMA3_DRV_requestChannel ( hEdma, &chn, &tcc2,
                                                (EDMA3_RM_EventQueue)0,
                                                NULL, NULL);
        // Request any link DMA channel
        if (result == EDMA3_DRV_SOK)
            result = EDMA3_DRV_requestChannel ( hEdma, &lnkchn, &tcc2,
                                                (EDMA3_RM_EventQueue)0,
                                                NULL, NULL);
        // link channels
        if (result == EDMA3_DRV_SOK)
            result = EDMA3_DRV_linkChannel (hEdma, chn, lnkchn);

        // link channels
        if (result == EDMA3_DRV_SOK)
            result = EDMA3_DRV_linkChannel (hEdma, lnkchn, chn);

    Therefore I am using a different Queue/TC in each case. If this is correct, I'm giving higher priority to the mem-to-mem transfer by assigning it to Queue0/TC0. I also tried reversing this configuration, but it does not work.

    Mukul Bhatnagar said:
    2) Can you share with us the details of your PARAM programming for the data sorting EDMA transfers (ACNT, BCNT  etc)

            edma3_cfg.srcbidx = 0;                        //
            edma3_cfg.srccidx = 0;                        //
            edma3_cfg.desbidx = 4;                        // 32-bit word
            edma3_cfg.descidx = 32;                       // 8 Serializers (32-bit word per serializer)
            edma3_cfg.acnt = 4;                           // 32-bit word
            edma3_cfg.bcnt = 8;                           // 8 Serializers
            edma3_cfg.ccnt = 10240;                       // Samples per serializer
            edma3_cfg.sync = EDMA3_DRV_SYNC_AB;           // AB sync
            edma3_cfg.BRCnt = edma3_cfg.bcnt;             // Set B count reload as B count.
            edma3_cfg.tcomplete = 1u;
            edma3_cfg.itcomplete = 0u;
            edma3_cfg.tchcomplete = 0u;
            edma3_cfg.itchcomplete = 0u;

            edma3_cfg.BufferSrc = (Int32 *)0x01D06000;           // McASP1 address
            edma3_cfg.BufferDst1 = (Int32 *)_ping_buffer;        // Buffer address for ping transfer
            edma3_cfg.BufferDst2 = (Int32 *)_pong_buffer;        // buffer address for pong transfer

    and the data sorting transfer settings:

        edma3_cfg.tcomplete = 1u;
        edma3_cfg.itcomplete = 0u;
        edma3_cfg.tchcomplete = 0u;
        edma3_cfg.itchcomplete = 0u;
        edma3_cfg.srcbidx = 32u;                    //
        edma3_cfg.srccidx = 0u;                     //
        edma3_cfg.desbidx = 4u;                     // 32-bit word
        edma3_cfg.descidx = 0u;                     //
        edma3_cfg.acnt = 4u;                        // 32-bit word
        edma3_cfg.bcnt = 10240u;                    //
        edma3_cfg.ccnt = 1u;                        //
        edma3_cfg.sync = EDMA3_DRV_SYNC_AB;         // AB sync
        edma3_cfg.BRCnt = edma3_cfg.bcnt;           //

    Next, the ping/pong flag updates the buffer address as follows:

        // EDMA3 sorting data transfer for ping buffer
        for(i = 0; i < N_CHANNELS; i++){
            edma3_cfg.BufferSrc = (Int32 *)(_ping_buffer+i);
            edma3_cfg.BufferDst1 = (Int32 *)(_large_buffer+(N_SAMPLES_LARGE*i));
            if (result == EDMA3_DRV_SOK)
                result = edma3_sort_cfg(hEdma, &tccA[i], edma3_cfg, &chnA[i]);
        }

        // EDMA3 sorting data transfer for pong buffer
        for(i = 0; i < N_CHANNELS; i++){
            edma3_cfg.BufferSrc = (Int32 *)(_pong_buffer+i);
            edma3_cfg.BufferDst1 = (Int32 *)(_large_buffer+(N_SAMPLES_LARGE*i));
            if (result == EDMA3_DRV_SOK)
                result = edma3_sort_cfg(hEdma, &tccB[i], edma3_cfg, &chnB[i]);
        }
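
To make the effect of these settings easier to check, here is a host-side model of an AB-synchronized copy, written as a sketch rather than as anything taken from the EDMA3 LLD. It assumes the usual AB-sync addressing, where each array of ACNT bytes is offset by the B index within a frame and each frame by the C index. `ab_sync_params_t` and `ab_sync_copy` are hypothetical names. With acnt = 4, srcbidx = 32 and desbidx = 4 it picks one channel out of an 8-channel interleaved buffer, which is what the sorting transfer above is configured to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the transfer counts and indices used above. */
typedef struct {
    size_t    acnt;     /* bytes per array       */
    size_t    bcnt;     /* arrays per frame      */
    size_t    ccnt;     /* frames per transfer   */
    ptrdiff_t srcbidx;  /* source byte step between arrays      */
    ptrdiff_t dstbidx;  /* destination byte step between arrays */
    ptrdiff_t srccidx;  /* source byte step between frames      */
    ptrdiff_t dstcidx;  /* destination byte step between frames */
} ab_sync_params_t;

/* CPU reference copy that follows the AB-sync addressing pattern. */
static void ab_sync_copy(const uint8_t *src, uint8_t *dst,
                         const ab_sync_params_t *p)
{
    size_t b, c;

    assert(src != NULL && dst != NULL && p != NULL);

    for (c = 0; c < p->ccnt; c++) {
        const uint8_t *s = src + (ptrdiff_t)c * p->srccidx;
        uint8_t       *d = dst + (ptrdiff_t)c * p->dstcidx;

        for (b = 0; b < p->bcnt; b++) {
            memcpy(d + (ptrdiff_t)b * p->dstbidx,
                   s + (ptrdiff_t)b * p->srcbidx,
                   p->acnt);
        }
    }
}
```

Running the same parameters through this model on a known test pattern gives an expected result to compare against the DMA output, which helps separate addressing mistakes from genuine data corruption.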

    Mukul Bhatnagar said:
    3) This should be tried last, but you can also see whether changing the DSP.MDMA priority from its default of 2 to 0 (highest priority, equal to the Transfer Controller priority) helps. The priority can be configured using the MSTPRI0 register (details in the system guide, Pg 173). We typically recommend keeping this at a lower priority than the DMA transfer controllers etc., but this is just to understand whether the "concurrency" bottlenecks come from CPU accesses manipulating the buffers, or whether the McASP is simply missing real-time deadlines because the same EDMA queue is possibly also being used for the memory to memory transfers of your data sorting.

    I found MSTPRI0 in the system reference guide for my platform (C6747) at Pg. 139. I changed the MDMA bits from the default value of 2 to 0, but it fails again. After a few seconds I get the next acquisition buffer.

    More hints:

    • McASP receiver is fed by an external clock at 13.107200 MHz. Since the 8-channel ADS1278 oversamples by 256, the McASP data rate is 51.2 kHz
    • I see no difference between Release and Debug configuration
    • BIOS data, BIOS code, Compiler Sections and Buffer Manager are placed in IRAM
    • For capturing the glitch I calculate the difference of every sample with its prior value and check the threshold. I set a breakpoint to catch this condition.
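
The glitch check described in the last hint can be sketched as follows. This is not the original code from the project. The function name and the choice of returning an index are assumptions made for illustration. It compares each sample with the previous one and reports the first position where the absolute difference exceeds a threshold, which is a convenient place to put a breakpoint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the first sample whose absolute difference from
 * the previous sample exceeds 'threshold', or -1 if there is none.
 */
static long find_first_glitch(const int32_t *samples, size_t count,
                              uint32_t threshold)
{
    size_t i;

    assert(samples != NULL || count == 0u);

    for (i = 1; i < count; i++) {
        /* 64-bit arithmetic so the subtraction cannot overflow. */
        int64_t diff = (int64_t)samples[i] - (int64_t)samples[i - 1];

        if (diff < 0) {
            diff = -diff;
        }
        if ((uint64_t)diff > threshold) {
            return (long)i;
        }
    }
    return -1;
}
```

The threshold has to be larger than the biggest step the real input signal can produce between two consecutive samples, otherwise a fast but valid signal is reported as a glitch.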

    Regards,

    Gaston

  • Hi Gaston

    I could not allocate much time to go through your last post today. Hope to do so tomorrow and look at your code snippets more carefully.

    However based on your updates, I am still latched on the fact that when you are submitting McASP transfers on Q1 (including ping pong transfers) you saw improvements in acquisition --> So when you posted your query originally was everything (McASP and mem to mem transfers) going on Q0?

    Gaston said:
    Therefore I am using a different Queue/TC in each case. If this is correct, I'm giving higher priority to the mem-to-mem transfer by assigning it to Queue0/TC0. I also tried reversing this configuration, but it does not work.

    Can you clarify what you meant by reverse configuration does not work? Does putting McASP transfers on Q0 and mem to mem on Q1 make the acquisition/data mismatch worse? Please note that even though within the CC Q0 has a higher priority than Q1, the more "dominant" prioritization is the TC priorities, i.e. TC0 vs TC1, as in your case both TCs would be accessing SDRAM memory (TC1 for McASP to SDRAM and TC0 for SDRAM to SDRAM). If you want one set of EDMA transfers to have higher priority than the other based on Q-TC, I would recommend changing the TC priority via MSTPRI registers (default is 0 for both); this is also in the system guide (sorry for pointing you to the wrong system guide).

    Gaston said:
    BIOS data, BIOS code, Compiler Sections and Buffer Manager are placed in IRAM

    Can you clarify what you meant by buffer manager?

    You mentioned chaining transfers; I didn't catch that from your code snippets. Do the sorting transfers after McASP ping/pong completion need to be triggered manually, or could they be chained?

    Also, in one of your previous posts you mentioned that on enabling FIFOs you saw 32 samples getting corrupted (not 64, i.e. 8x8?). Can you confirm that? It is still intriguing why you were seeing this only at the beginning of every transfer set.

    Lot of questions from our side, but remote debug has its caveats :).

    Regards

    Mukul

     

  • Mukul Bhatnagar said:
    However based on your updates, I am still latched on the fact that when you are submitting McASP transfers on Q1 (including ping pong transfers) you saw improvements in acquisition --> So when you posted your query originally was everything (McASP and mem to mem transfers) going on Q0?

    Mukul, forgive me. I have changed my original configuration. The first one chained the McASP ping/pong transfer with a mem-to-mem data sorting transfer. The 8 per-channel data sorting transfers were also chained to each other (from 1 to 8), so I had to wait for only one TCC (final completion of the 8th channel's data sorting). Later I decided to split these transfers and trigger them separately for debugging purposes. My later posts correspond to this scenario. I'm sorry again for the misunderstanding. However, the first configuration also used Q1 for the McASP ping/pong and Q0 for the mem-to-mem transfers.

    Mukul Bhatnagar said:
    Can you clarify what you meant by reverse configuration does not work?

    I meant that Q0 for McASP and Q1 for the mem-to-mem transfer does not work either.

    Mukul Bhatnagar said:
    I would recommend changing the TC priority via MSTPRI registers

    Ok, I'll check it carefully.

    Mukul Bhatnagar said:
    Can you clarify what you meant by buffer manager?

    It seems to be a fast, deterministic, fixed-size buffer allocation component of BIOS, but I don't think I use it. I'm not very experienced with the DSP/BIOS environment, at least for the time being.

    Mukul Bhatnagar said:
    Also, in one of your previous posts you mentioned that on enabling FIFOs you saw 32 samples getting corrupted (not 64, i.e. 8x8?). Can you confirm that? It is still intriguing why you were seeing this only at the beginning of every transfer set.

    Ok, let me work a little bit on enabling FIFOs as well. I'll keep you informed about my progress. Mukul, I really appreciate your help.

    Regards,

    Gaston

  • Using the last McASP+EDMA3 configuration (with no chaining), I tried to set the TC0 and TC1 priorities by changing the MSTPRI register. I set EDMATC1=6 and EDMATC0=7 (assuming that McASP to SDRAM gets the maximum priority and the SDRAM to SDRAM data sorting a lower one), but unfortunately that made no difference. After a few seconds I see the glitches again in the destination buffers. Enabling the McASP read FIFO (RFIFOCTL.RNUMEVT=8 and RFIFOCTL.RNUMDMA=8) does not seem to work either.

    I really do not know how to solve the problem, and I think it may be similar to the thread started by Stephen a few days ago.

    regards,

    Gaston

     

     

     

  • Gaston,

    Do you mean to set the TC priorities to the two highest (least latency) priorities?  If so, the priorities should be [0 and 1] instead of [6 and 7].

    -Tommy

  • Hi Gaston

    Sorry to hear that the issue is still not resolved.

    Some more questions from my side (I have also notified the PSP team to monitor this thread)

    1) Can you tell us what other masters and peripherals are in this system, apart from your DMA transfers to/from McASP and memory to memory transfers? Trying to understand what else could be vying for external memory bandwidth.

    2) Can you tell us how the failure patterns are showing up now, initially you mentioned it was only for the first x samples of every acquisition. Is this still the case and can you clarify what beginning of every acquisition implies from a EDMA/McASP standpoint , as you also mentioned that your McASP transfers are continuous.

    3) You are using an older PSP release. I am not sure whether it would have a direct impact on the failures you are observing, but for easier debuggability and to ensure that you are using the latest drivers, can you please update to GA release 1.3.00?

    Note it mentions a few McASP issues that have been fixed, in the release notes of the package

    CQ SDOCM00062991: The audio would not play after using the channel reset IOCTL. This was because the IOCTL was resetting the device to default values. This has been fixed to reset only the channel data structures and channel states. Files changed: ti\pspiom\mcasp\src\Mcasp_ioctl.c

    CQ SDOCM00063254: In non-loopjob mode, during channel deletion, the driver would attempt to restart the EDMA and clocks, which would generate error interrupts. However, the clocks and EDMA should be restarted only in case a new packet is submitted in non-loopjob mode. This has been fixed. Files changed: ti\pspiom\mcasp\src\Mcasp.c

    Regards

    Mukul

     

  • Mukul Bhatnagar said:
    1) Can you tell us what other masters and peripherals are in this system, apart from your DMA transfers to/from McASP and memory to memory transfers? Trying to understand what else could be vying for external memory bandwidth.

    There are no other masters in the system. At the moment these are the used masters: EDMA3TC0, EDMA3TC1 and DSP. Slaves are: GPIO and McASP1

    Mukul Bhatnagar said:
    2) Can you tell us how the failure patterns are showing up now, initially you mentioned it was only for the first x samples of every acquisition. Is this still the case and can you clarify what beginning of every acquisition implies from a EDMA/McASP standpoint , as you also mentioned that your McASP transfers are continuous.

    Yes, the McASP transfers are continuous. Please let me check the whole system one more time before continuing. I mentioned there were x corrupted samples at the first buffer positions, but now I find wrong samples in different locations. I want to know whether there is something wrong with the signal pattern generation or the front-end circuit.

    Mukul Bhatnagar said:
    3) You are using an older PSP release. I am not sure whether it would have a direct impact on the failures you are observing, but for easier debuggability and to ensure that you are using the latest drivers, can you please update to GA release 1.3.00?

    I'm using the very last PSP version. I have downloaded it from here.

    Regards,

    Gaston

  • Hi Tommy,

    I was referring to the default priority values in Table 3-1 on page 26 of SPRUFK4D. I'm trying to understand how the system interconnect works, but I find it a bit difficult. What do you mean by latencies, and what is the relationship between priority and latency?

    regards,

    gaston

  • Gaston,

    "Latency" was the wrong word on my part.  I pictured lower-priority masters with delayed accesses when requesting shared resources because of arbitration.  Somehow, that translated to "higher latency" in my mind.  The priority setting does not have any effect on actual datapath latency. 

    There are a couple of wiki articles for system interconnect that may help.  Here's the overview article: http://processors.wiki.ti.com/index.php/OMAP-L1x/C674x/AM1x_SoC_Architectural_Overview

    By the way, do you have a local FAE contact?

    -Tommy

  • Hi Gaston

    Gaston said:
    I'm using the very last PSP version. I have downloaded it from here.

    The latest one available is actually 1.30.01 (you indicated you have 1.30.00.06); you can find the release (+release notes) on this link

    Regards

    Mukul

  • This wiki article is very helpful, thanks! Yes, we should contact our local FAE, Massimo Martelli.

    regards,

    gaston

  • Hi,

    In the end, this is what is happening:

    • Corrupted samples at the first positions appear when the McASP to SDRAM and SDRAM to SDRAM transfers use the same EDMA TC. This is solved by placing the transfers on Queue 0/TC0 and Queue 1/TC1, in that order.
    • Glitches are due to ringing on the wire connections between the ADC and the McASP interface, since the serial clock and data lines run at 13 MHz with no shielding. As a result, the MSB of some samples can be missed, which shows up in the buffers as a signal peak. I have found a provisional workaround by changing the McASP RDATDLY setting to a 1- or 2-bit delay relative to the frame sync signal.

    Thanks to everyone for the support provided.

    Regards,

    Gaston

     

  • Hi Gaston

     Thanks for the update. Good to see your issue is resolved.

    As a rule of thumb, it is always recommended to keep the McASP/McBSP transfers (real-time transfers) on a dedicated queue/TC, with the rest of the system traffic that relies on the EDMA3 on other queues/TCs. Additionally, since by default all TCs are at the same priority (highest, aka 0), it is always recommended to lower the priority of the queue/TC servicing the other transfers (like your mem to mem).

    Regards

    Mukul