QDMA transfers using EDMA LLD on C6678 - What all parameters need to be written if a series of linked QDMA transfers are to be done if none of the parameters vary between transfers?



Hi,

I am trying to do linked QDMA transfers using the EDMA LLD on the C6678 to transfer some data from DDR3 to L2SRAM. I took the code from the sample app. I have linked 8 QDMA channels together and triggered the transfer. Now, if I have to do the same set of 8 linked transfers again with no change in any parameters, which parameters need to be written again on the subsequent runs?

I tried writing only the trigger word (which is CCNT in my case) for the subsequent transfers. The transfer did happen and the interrupt was raised. However, it looked like only the last link in the transfer got new data from DDR to L2; the rest of the links did not get any new data. I have ensured that I invalidate the cache for the L2 destination address before starting the next transfer.

But if I free the channel, request it again and write all the parameters for every subsequent transfer, it works fine.

Can anyone please let me know which parameters need to be written again for subsequent QDMA transfers?

Thanks,

Harinandan

  • Hi Harinandan,

    Let us try to solve this with a bit more information provided. Can you be specific about the following, please:

    1. What are the values of the PaRAM set that you are setting for the QDMA transfer? Can you please list their values here?

    2. Since it's a link, we need to trigger the QDMA 8 times using the core, unlike in a chain, where we just need to trigger it once.
    3. Is the value of CCNT > 1? Are you using AB-sync? After the first link is completed, the EDMA3 module decrements the value of CCNT. Are you making sure that you write the decremented value and not the original value? CCNT is 1 in the sample app.

    4. Since the source buffer is in DDR3, are you doing a cache flush?

    5. Is there any specific reason to use only QDMA for linking? Triggering a QDMA is faster than a normal DMA, but setting up the linked QDMA takes longer. Plus, it's not recommended to use QDMA for linking or chaining! QDMA is used to perform a quick data transfer. Address/count updates and linking are not performed. CCNT = 1 (single event transfer).
    Please look at slide 109 of the attached document for more info about QDMA.

    5736.Intro_to_ EDMA.pdf

    Regards

    Sud

     

  • Hi Sudarshan,

    Thanks for the reply. Please find answers to the questions below.

    1. The PaRAM set values are not out of the ordinary. I used ACNT = 2560, BCNT = 90, CCNT = 1, and B index = 2560. The C index is not required since I am using AB-sync and CCNT = 1, so I just set it to 1.

    2. No, I need not trigger the QDMA 8 times. I set CCNT as the trigger word, and the static bit of the PaRAM set is set to 0 for all the links except the last one. I only manually trigger the first link. Since the static bit is 0, it reloads the PaRAM set. Since CCNT is set as the trigger word, it automatically triggers the next one in the link. It goes on until the last one, whose static bit is set to 1, which ends the linked transfers.

    3. CCNT = 1 and I am using AB-sync mode. I know EDMA decrements CCNT but I am not sure if a QDMA also decrements CCNT. I need not write any decremented value to CCNT because the reload happens automatically and triggers the next transfer in the link.

    4. The source is in DDR3 and the destination is in L2SRAM. Since the core only deals with L2SRAM, I invalidate the cache for L2SRAM. I don't think I need to invalidate the cache for DDR3 since the core is not involved in the DMA transfers.

    5. One of the main reasons to use QDMA was that I could link 8 channels and trigger just once to achieve 8 transfers. In my use case, a video frame is divided into 8 parts and each part is transferred to the L2SRAM of a core. Hence I could just set up the links, trigger once, and get an interrupt when all transfers are complete. This way the core does not need to manually trigger individual transfers in a link, which leads to fewer context switches and is also easier.
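
    For readers following the numbers above, here is a minimal sketch (plain C, not EDMA3 LLD code; the type and function names are made up for illustration) of how the sync mode determines how many bytes one trigger moves and how many triggers one PaRAM set needs. It assumes the usual EDMA3 definitions of A-sync and AB-sync.

```c
#include <stdint.h>

/* Illustrative model only; these names are hypothetical, not part of the EDMA3 LLD. */
typedef enum { SYNC_A, SYNC_AB } sync_mode_t;

/* Bytes moved by a single trigger (one sync event). */
uint32_t bytes_per_trigger(sync_mode_t mode, uint32_t acnt, uint32_t bcnt)
{
    /* A-sync moves one array of ACNT bytes; AB-sync moves BCNT arrays. */
    return (mode == SYNC_A) ? acnt : acnt * bcnt;
}

/* Number of triggers needed to exhaust one PaRAM set. */
uint32_t triggers_needed(sync_mode_t mode, uint32_t bcnt, uint32_t ccnt)
{
    return (mode == SYNC_A) ? bcnt * ccnt : ccnt;
}
```

    With ACNT = 2560, BCNT = 90 and CCNT = 1 in AB-sync mode, one trigger moves 2560 x 90 = 230400 bytes and exhausts the set, which is why a single write to the trigger word is enough for each link.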

    I am not very sure about the slide in the pdf. I know QDMA is mainly used for quick single transfers. But if you look at the EDMA LLD, the sample application links 2 transfers. I just extended it for 8 transfers. 

    One of my main suspicions now is that the docs say the PaRAM set is reloaded if the static bit is set to 0. What does reload mean here? Is the PaRAM set of a link channel copied to the QDMA channel? If this is the case, I suspect that by the end of the linked transfer the last PaRAM set ends up in the QDMA channel. Hence when I try to start the same linked list of transfers again, only the last transfer is actually happening and the rest are not.

    I'll also look at EDMA and see how I can use that. If that works, maybe I'll drop the idea of using QDMA linked transfers.

    Regards,

    Hari

  • Hari,

    What you are doing is a trick with the QDMA. It will certainly work when done correctly, but it can be difficult to get it working exactly the way you want.

    In the Training section of TI.com, there is a training video set for the C66x SOC architecture. It may be helpful for you to review some of the modules. There does not seem to be an EDMA3 module there, though. You can find the complete video set here.

    In the Training section of TI.com, there is also a training video set for the C6474. In particular, the EDMA3/QDMA/IDMA Module may help you understand some of the terms used in the EDMA3 user Guide and some of the features and options available within the EDMA3 module. You can find the complete video set here. There are some differences for the C6678 EDMA3 modules in terms of TC count and channel count, but the basic operation is the same for any EDMA3 module.

    Harinandan Srinivasamurthy said:
    5. One of the main reasons to use QDMA was that I could link 8 channels and trigger just once to achieve 8 transfers. In my use case, a video frame is divided into 8 parts and each part is transferred to the L2SRAM of a core. Hence I could just set up the links, trigger once, and get an interrupt when all transfers are complete. This way the core does not need to manually trigger individual transfers in a link, which leads to fewer context switches and is also easier.

    One term that was confusing was "channel". There are 8 QDMA channels available in each of the EDMA3 modules in the C6678. However, you are not linking 8 "channels", instead you are using the Link mechanism to reload and trigger the same QDMA channel 8 times using different Link PaRAM sets. Your explanation above has made your meaning clear, and I do believe you are doing it all correctly for the first set of transfers. What you are not doing correctly is your attempt to repeat the transfers.

    Harinandan Srinivasamurthy said:
    One of my main suspicions now is that the docs say the PaRAM set is reloaded if the static bit is set to 0. What does reload mean here? Is the PaRAM set of a link channel copied to the QDMA channel? If this is the case, I suspect that by the end of the linked transfer the last PaRAM set ends up in the QDMA channel. Hence when I try to start the same linked list of transfers again, only the last transfer is actually happening and the rest are not.

    The terms "reload" and "link" are often used to mean the same action. When any QDMA or DMA operation completes (and OPT.STATIC=0), the PaRAM set indicated by the LINK field is copied to the active PaRAM set for the QDMA or DMA channel. That copy action should technically be called a "load" instead of a "reload", but since this is often done to repeatedly put the same values into the active PaRAM set, we tend to call it reloading. A subtle detail, or maybe just my opinion.

    If you look at the QDMA channel's PaRAM after one sequence of 8 QDMA operations has completed, you will see the contents from the last Link step, where OPT.STATIC=1 and all of the parameters match what was in that last Link PaRAM set. This is what you should expect.

    To restart the sequence exactly as it was done the first time, you will have to have the DSP manually recreate the first transfer in the QDMA channel's active PaRAM set, the PaRAM set that is assigned to this QDMA channel. By only writing to CCNT, you trigger the QDMA channel but it will run with the other contents of the active PaRAM which are not what you wanted.
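
    The link/load behavior described above can be sketched as a small software model. This is only an illustration under simplifying assumptions: it is not EDMA3 LLD code, the names param_t, setup_sequence and run_qdma are hypothetical, the LINK field is modeled as an index into an array rather than a PaRAM byte offset, each set is assumed to finish in one trigger (AB-sync, CCNT=1), and the addresses are made up.

```c
#include <stdint.h>
#include <stddef.h>

#define OPT_STATIC (1u << 3)   /* STATIC bit of the PaRAM OPT word */
#define LINK_NULL  0xFFFFu     /* "no link" marker */

/* Reduced PaRAM model: only the fields needed to show the link copy. */
typedef struct {
    uint32_t opt;
    uint32_t src;
    uint16_t link;   /* simplified: index into the link-set array */
} param_t;

/* Layout from the thread: transfer 0 in the active set, transfers 1..7 in link sets. */
void setup_sequence(param_t *active, param_t links[7])
{
    for (uint16_t i = 0; i < 7; i++) {
        links[i].opt  = (i == 6) ? OPT_STATIC : 0u;
        links[i].src  = 0x1000u * (i + 1u);            /* made-up addresses */
        links[i].link = (i == 6) ? LINK_NULL : (uint16_t)(i + 1u);
    }
    active->opt  = 0u;
    active->src  = 0u;
    active->link = 0u;
}

/* One trigger of the QDMA channel: run the active set, then keep loading the
 * linked set (which re-triggers, since the copy writes the trigger word)
 * until a set with STATIC=1 is reached. Returns the number of transfers. */
size_t run_qdma(param_t *active, const param_t *links, size_t n_links,
                uint32_t *src_seen, size_t max_transfers)
{
    size_t done = 0;
    while (done < max_transfers) {
        src_seen[done++] = active->src;                /* "transfer" happens here */
        if ((active->opt & OPT_STATIC) || active->link >= n_links)
            break;                                     /* sequence ends */
        uint16_t next = active->link;
        *active = links[next];                         /* the link "reload" */
    }
    return done;
}
```

    In this model the first call to run_qdma performs 8 transfers and leaves the last link set (STATIC=1) in the active set. Calling it again without rewriting the active set performs only 1 transfer, from the last source address, which matches the symptom reported at the top of the thread.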

    You have at least four choices how to handle this:

    1. Rewrite all 8 PaRAM values to the active PaRAM set exactly as you did at the end of the process when you prepared for the first set of transfers. It was the last of those 8 writes that triggered the QDMA channel and started the sequence of 8 transfers by linking.

    2. Put all 8 of the transfer PaRAM sets in link sets, including the first one. Set the last one to have its LINK field pointing to the first transfer's link set, but with OPT.STATIC=1 and the proper interrupt fields set. To start the sequence the first time, in the QDMA channel's active PaRAM set write 0x00000000 to the OPT register, write the proper LINK field pointing to the first transfer's link set, write ACNT=1, and then write 0x00000000 to the CCNT register. The write to CCNT will trigger the operation. The operation will not copy anything because CCNT=0, but it will do the Link operation, which will copy the first transfer's link set into the QDMA's active PaRAM set and trigger that first operation to start. When the last transfer has completed, OPT.STATIC=1 will properly stop the sequence. To start the exact same process again, in the QDMA channel's active PaRAM set write 0x00000000 to the OPT register, be aware that the proper LINK field pointing to the first transfer's link set is already set correctly, be aware that ACNT has a valid non-0 value, and then write 0x00000000 to the CCNT register. Now the DUMMY transfer will start and will Link/load the first transfer's PaRAM contents and start the copying.

    3. Use 8 DMA channels and chain from one to the next. Each can use OPT.STATIC=1 so none of them will have to do a Link from anywhere and those parameters will remain unchanged in the associated active PaRAM sets. Chaining will have the same effect as the Linking/loading does for the QDMA channel, which is partly why chaining and linking get confused as terms quite often. The slides in the PDF that Sud offered have a section that shows the chaining process. And this is also discussed in the C6474 EDMA3 module of the training videos.

    4. For another "trick", use one DMA channel with "self-chaining" where it does a Link from a new set of transfer PaRAM values while simultaneously triggering itself through chaining. Then the last transfer will not chain anywhere but will still do a Link to load the first transfer PaRAM link set. This will require 9 PaRAM sets: the 8 transfer sets plus the active PaRAM set for the DMA channel being used. Normally, the active PaRAM set will be initialized with the exact same contents as the first transfer's PaRAM, linking to the second PaRAM set. Then you start the first and all subsequent transfers by writing to the ESR register or using a CHANNEL_SET CSL function, or the similar function from the LLD.
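
    As an illustration of choice 2 above, here is a small software model of the dummy-transfer restart. It is a sketch under assumptions, not EDMA3 LLD code: param_t, setup_link_sets, arm_dummy and run_sequence are hypothetical names, LINK is modeled as an array index instead of a PaRAM byte offset, every real set is assumed to complete in one trigger, and the addresses are made up.

```c
#include <stdint.h>
#include <stddef.h>

#define OPT_STATIC (1u << 3)   /* STATIC bit of the PaRAM OPT word */
#define N_SETS     8u

typedef struct {
    uint32_t opt;
    uint32_t src;
    uint16_t acnt;
    uint16_t ccnt;
    uint16_t link;   /* simplified: index into the link-set array */
} param_t;

/* All 8 transfers live in link sets; the last one is STATIC=1 but still
 * has its LINK field pointing back at the first set. */
void setup_link_sets(param_t sets[N_SETS])
{
    for (uint16_t i = 0; i < N_SETS; i++) {
        sets[i].opt  = (i == N_SETS - 1u) ? OPT_STATIC : 0u;
        sets[i].src  = 0x1000u * i;                    /* made-up addresses */
        sets[i].acnt = 2560u;
        sets[i].ccnt = 1u;
        sets[i].link = (uint16_t)((i + 1u) % N_SETS);
    }
}

/* What the DSP writes to the active set before each run. */
void arm_dummy(param_t *active, int first_time)
{
    active->opt = 0u;                 /* STATIC=0 so the link will be taken */
    if (first_time) {
        active->link = 0u;            /* later runs: already points at set 0 */
        active->acnt = 1u;            /* later runs: already non-zero */
    }
    active->ccnt = 0u;                /* trigger word; CCNT=0 moves no data */
}

/* Returns the number of real (non-dummy) transfers performed. */
size_t run_sequence(param_t *active, const param_t sets[N_SETS],
                    uint32_t *src_seen, size_t max_seen)
{
    size_t real = 0;
    for (size_t step = 0; step < 2u * N_SETS; step++) {   /* guard against loops */
        if (active->ccnt != 0u) {
            if (real < max_seen)
                src_seen[real] = active->src;
            real++;
        }
        if (active->opt & OPT_STATIC)
            break;
        uint16_t next = active->link;
        *active = sets[next];                             /* link/load */
    }
    return real;
}
```

    In this model, arm_dummy followed by run_sequence gives 8 real transfers on the first run and again on every later run, because the last set leaves LINK pointing at the first set and only OPT and CCNT need to be rewritten.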

    Regards,
    RandyP

  • Good comment, Randy! That's about as elaborate an answer as one could get. I would just like to add a few points.

    Harinandan,
    So you're using OPT.STATIC=0 and implementing QDMA linking, in which, after the QDMA transfer completes, the PaRAM set pointed to by the link field is copied (loaded) to the active PaRAM set of the QDMA. Using CCNT as the trigger word ensures that all the fields in the PaRAM set are copied before the QDMA is triggered. This is a great way to implement linking that avoids manual triggering of all the links. But as Randy said, implementing such a linking mechanism is tricky; with one slight mistake, the required result will not be achieved.
     

    Harinandan Srinivasamurthy said:
    5. One of the main reasons to use QDMA was that I could link 8 channels and trigger just once to achieve 8 transfers. In my use case, a video frame is divided into 8 parts and each part is transferred to the L2SRAM of a core. Hence I could just set up the links, trigger once, and get an interrupt when all transfers are complete. This way the core does not need to manually trigger individual transfers in a link, which leads to fewer context switches and is also easier.

    I still don't get the exact reason for using only QDMA to perform such an action. And I would not agree with the last line, since we can see here that implementing such a linking mechanism is tricky and difficult. If you want a transfer mechanism in which you just need to trigger once and all the subsequent transfers complete, then DMA chaining would be a better option. Since we have plenty of DMA channels available, 8 DMA channels can be used without any worry.

    Even though QDMA is much faster, there would not be much difference in the profile value (time taken to complete the whole transfer) between chained DMA and linked QDMA, since in linking, after the transfer of each link completes, the PaRAM set pointed to by the link field is copied to the active PaRAM set, which takes some time.

    Nothing restricts us from using QDMA for linking; the sample project just shows an example of how to set up linking for QDMA. As the name suggests, QDMA is used when the transfer is to be started quickly, which is done by just writing the trigger word. So it's recommended (but not compulsory) to use QDMA only for single event transfers.

    To sum it up, if the time constraint is important in your application, and you have enough time to spend on implementing the QDMA linking mechanism, then go for it. Otherwise, I would suggest going for DMA chaining, where setting up the chain is really easy. You can refer to the EDMA3 LLD and the example project to see how to set up chained DMA. The document which I attached previously would also be useful.

    Regards

    Sud 

  • I would like to add one more point. Since you're using the EDMA3 LLD, you can use the API EDMA3_DRV_getPaRAMPhyAddr() to get the physical address of the active PaRAM set. Using this address, you can view the contents of the PaRAM set in the memory browser. This would help you better understand the linking mechanism. It might also help you in implementing the above QDMA linking mechanism.
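
    To make the memory-browser view easier to read, here is a hedged sketch of a decoder for the 8 words of a PaRAM set. It assumes the standard EDMA3 PaRAM layout (OPT, SRC, BCNT/ACNT, DST, index words, BCNTRLD/LINK, CCNT) and the usual OPT bit positions; param_view_t and decode_param are hypothetical names, not LLD functions. The pointer passed in would be the address obtained from EDMA3_DRV_getPaRAMPhyAddr(); check the LLD header for that function's exact signature.

```c
#include <stdint.h>

/* Decoded view of one PaRAM set (hypothetical helper type). */
typedef struct {
    uint32_t src, dst;
    uint16_t acnt, bcnt, ccnt, link;
    uint8_t  tcc;        /* OPT bits 17:12 */
    int      ab_sync;    /* OPT.SYNCDIM, bit 2 */
    int      is_static;  /* OPT.STATIC,  bit 3 */
    int      tcinten;    /* OPT.TCINTEN, bit 20 */
    int      tcchen;     /* OPT.TCCHEN,  bit 22 */
} param_view_t;

/* p points at the first of the 8 32-bit words of a PaRAM set. */
param_view_t decode_param(const volatile uint32_t *p)
{
    param_view_t v;
    uint32_t opt = p[0];

    v.src       = p[1];
    v.acnt      = (uint16_t)(p[2] & 0xFFFFu);   /* low half: ACNT  */
    v.bcnt      = (uint16_t)(p[2] >> 16);       /* high half: BCNT */
    v.dst       = p[3];
    v.link      = (uint16_t)(p[5] & 0xFFFFu);   /* low half: LINK  */
    v.ccnt      = (uint16_t)(p[7] & 0xFFFFu);
    v.ab_sync   = (int)((opt >> 2) & 1u);
    v.is_static = (int)((opt >> 3) & 1u);
    v.tcc       = (uint8_t)((opt >> 12) & 0x3Fu);
    v.tcinten   = (int)((opt >> 20) & 1u);
    v.tcchen    = (int)((opt >> 22) & 1u);
    return v;
}
```

    Decoding the active set before and after a linked sequence should show is_static change from 0 to 1 and the other fields change to those of the last link set.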

    Regards 

    Sud

  • Hi Randy and Sudarshan,

    Thanks for helping me out. As you see, I'm pretty much new to TI DSP world!

    I had missed looking at DMA chaining closely. Linking was the only thing in my head when I attempted it. Since I found out that QDMA linking does not require multiple triggers, I jumped at it. Now, I'll go back and look at the chaining example and material to see how to set it up and get it working. In summary, these are the only two requirements I have for DMA in my case:

    1. I should be able to trigger once and let all 8 transfers take place. I'll enable the interrupt only for the last transfer, so the core will be interrupted when all transfers are complete.

    2. The more DMA channels available, the better. So far I have only attempted getting data from DDR3 to L2SRAM. I also have to do the opposite, to store data back from L2SRAM to DDR3. If I have two separate sets of linked channels available, one for each direction, that is even better.

    Also, setup time was not a primary consideration at all when I attempted QDMA. It was just the ease of extending the sample app to what I wanted that made me consider it. Thanks again for the help. I will look at the DMA chaining example in the EDMA3 LLD, try it, and post whether I could achieve my desired result.

    Regards,

    Hari

  • Hi,

    Following EDMA3 LLD DMA chaining example, I wrote the code to request 8 DMA channels and set them up for chaining. I am running the code on Core 0. I am using EDMA3 instance 0, region 0 as defined in LLD.

    Requesting two DMA channels is fine. When I try to request the third DMA channel, I get a "channel not available" error. I checked the datasheet for the C6678 and found that there are 16 DMA channels associated with CC0 and 64 each with CC1 and CC2. Hence there are plenty of them available.

    I looked at a structure in the EDMA3 LLD for the C6678 (sampleEdma3GblCfgParams) which specifies the above values as the total number of DMA channels supported, but there is another structure, sampleInstInitConfig, where "ownDmaChannels" has only two entries for instance 0, region 0. So is the request for the third channel failing because there are only two entries in sampleInstInitConfig for the C6678?

    Thanks,

    Hari

  • Hari,

    CC0 is pretty special, so I recommend using it only for transfers in which the source and destination are both in the set of [ MSMCSRAM, DDR3 ]. For transfers like yours that have source or destination in a CorePac's L2, you will get the same or similar performance with CC1 or CC2, and there are many more DMA channels available.

    The EDMA3 LLD includes Resource Management, with the structs that you mentioned and others. You are correct that you need to configure these so that the right channels and PaRAMs and TCCs are available. Make sure you keep everything unique between the cores/regions.

    Sud,

    I look forward to your comments on this. I have a feeling you know a lot more than I do about the use of the LLD. That is a weakness of mine, but you obviously know your way around the EDMA3 module very well.

    Regards,
    RandyP

  • Hi Hari,
    Yes, you are right. The structure "sampleInstInitConfig" defines the EDMA3 resource allocations for the various EDMA3 instances and shadow regions. In the present sample project, the reason why only 2 channels are allowed to be opened (4 channels to be specific: 2 reserved, 2 unreserved) is that 2 channels are sufficient for the sample project (2 channels max for chaining). And I believe you're assigning ch=EDMA3_DRV_DMA_CHANNEL_ANY. In that case the API EDMA3_DRV_requestChannel() will return a channel number from the pool of free DMA channels and not from the pool of reserved DMA channels.

    So we need to define the resource allocations as needed by our application. I had the same issue myself, and Randy and I were able to solve it together. Check out the link below for the forum thread which describes the same issue. You will find a much more detailed explanation there.

    http://e2e.ti.com/support/embedded/bios/f/355/t/242609.aspx
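
    As a rough illustration of what editing the ownership entries involves, here is a sketch that builds a channel-ownership bitmask. It assumes that "ownDmaChannels" is an array of 32-bit words in which bit n of word n/32 marks DMA channel n as owned by that instance/region; verify this against the LLD headers before relying on it. CH_WORDS, own_channels and owns_channel are hypothetical names, not LLD identifiers.

```c
#include <stdint.h>

#define CH_WORDS 2u   /* 64 DMA channels / 32 bits per word (assumed layout) */

/* Mark `count` channels starting at `first` as owned. */
void own_channels(uint32_t mask[CH_WORDS], unsigned first, unsigned count)
{
    for (unsigned ch = first; ch < first + count && ch < 32u * CH_WORDS; ch++)
        mask[ch / 32u] |= 1u << (ch % 32u);
}

/* Returns 1 if channel `ch` is marked as owned, 0 otherwise. */
int owns_channel(const uint32_t mask[CH_WORDS], unsigned ch)
{
    return ch < 32u * CH_WORDS && ((mask[ch / 32u] >> (ch % 32u)) & 1u);
}
```

    For example, owning channels 0 to 7 gives the words {0x000000FF, 0x00000000}. The masks of different cores/regions should not overlap, in line with Randy's advice to keep everything unique between them.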

    Hi Randy,
    Thanks a lot for the compliments. Well, in fact, I'm just an entry-level engineer/fresher working in a small company who was assigned to work on the EDMA3 module. I must say, I really enjoy working on the TI DSPs. We get a lot of free software and a lot of help and support from the TI engineers. It just feels great to work and learn. I'm also looking forward to helping people who have issues related to EDMA. I also believe that by helping, we get to learn a lot.

    Thanks and Regards

    Sud 


  • Hi Sudarshan and Randy,

    Thanks for the help and for pointing to the other thread for details. I am now using CC1 and have edited sampleInstInitConfig to give more DMA channels to instance 1, region 0. This way I was able to open 8 DMA channels on region 0. Hence I'll close this thread by marking it as answered.

    But I am facing some issues with chaining. I see that only the first DMA channel is working and the rest are not. They also seem very slow. So something has gone wrong in setting up the chain and transfers. I am currently debugging it. I will create a new thread if I cannot solve it, since the current thread title seems inappropriate for the issue I am experiencing now.

    Thanks again for the help. 

    Regards,

    Hari

  • Harinandan Srinivasamurthy said:
     Also they seem very slow.

    How exactly are you profiling it? On the simulator or the EVM? Note that EDMA3 is a separate hardware module in the chip, so the cycle values for data transfers using EDMA3 on the simulator will be inaccurate.

    Yes, do create a new thread if the issue still persists.

    Regards

    Sud

  • Hi Sudarshan,

    Thanks for responding. I am running on the actual hardware and profiling using timestamp_get32. It was not just slow; only the first DMA was working. But it's solved now. I see almost the same time as with QDMA. I do not know what exactly went wrong, but I saw this behavior when I tried the sequence below:

    LOOP FOR EACH VIDEO FRAME { Request EDMA channels, Set up parameters & chaining, Enable first transfer, Wait for completion of all transfers, Free all the channels, Do a dummy processing on L2 buffer, Copy the L2 buffer back to DDR (memcpy)}

    By just moving the "Free all the channels" block after the processing, I was able to overcome the issue. I am still not sure what exactly was wrong with the original sequence.

    Now, I am back at my original intention of repeating such transfers. I tried requesting the channels once at the start and then only writing the PaRAM set for each channel every time before initiating the transfer. The first transfer, done right after the channels are requested, works fine. The second transfer, initiated without freeing the channels but just by writing the PaRAM sets again and enabling the transfer, results in an exception in the "process" block after the DMA is complete. But if I free and request the channels for every transfer, there is no exception. So I am still wondering what is wrong. I will try to debug and see if there is something wrong in the way I am doing it, and will then create another thread with the exact issue if I am still lost.

    Regards,

    Hari