Reading/Writing speed of EMIF/EDMA

Other Parts Discussed in Thread: TMS320C6747

Hi,

I am using the C6747.

I am serving an asynchronous FPGA with EDMA through EMIFA.

The EMIF (100 MHz clock) is configured with setup = 1, strobe = 3 and hold = 1, for both writing and reading.

The EDMA configuration is the same for writing (from internal memory to the FPGA) and for reading (from the FPGA to internal memory).

By checking the WE and OE signals with the oscilloscope, I noticed that writing one 16-bit word to the FPGA takes about 70 ns, while reading one 16-bit word takes 140 ns!

Any idea about the reason for this difference?

Thanks,

Sara Salvador

  • Sara,

    Please show a picture from the scope or logic analyzer that shows the waveforms and timing points you are describing.

    Between what points are the measurements being taken?

    Are the values 1/3/1 the programmed N-1 values or the resulting N values?

    What is the EDMA configuration used for both reads and writes?

    The results you see should be about the same whether you are using the EDMA3 or the DSP to do the reads and writes. Could you confirm the waveforms and timing using the DSP code match what you saw with the EDMA3, please?

    Regards,
    RandyP

  • Hi Randy,

    I checked the WE and OE signals of EMIFA and attach pictures from the oscilloscope (WE in the first picture, OE in the second). The time is measured between two successive falling edges of the WE/OE signals.

    The values 1/3/1 are the resulting N values, i.e. I configured the register with values 0/2/0.
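    The N versus N-1 distinction can be sketched as a small helper. The struct and function names below are hypothetical and not taken from any TI header; the sketch only assumes, as stated above, that each field is programmed with the resulting cycle count minus one.

```c
#include <assert.h>
#include <stdint.h>

/* Setup/strobe/hold of one asynchronous access, in EMIF clock cycles. */
typedef struct {
    uint32_t setup;
    uint32_t strobe;
    uint32_t hold;
} emif_async_cycles_t;

/* Convert resulting cycle counts (N) into the values programmed
 * into the register fields (N - 1). Hypothetical helper. */
static emif_async_cycles_t emif_cycles_to_fields(emif_async_cycles_t n)
{
    assert(n.setup >= 1 && n.strobe >= 1 && n.hold >= 1);
    emif_async_cycles_t fields = { n.setup - 1, n.strobe - 1, n.hold - 1 };
    return fields;
}
```

    With N = 1/3/1 this gives the programmed values 0/2/0 quoted above.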

    The EDMA transfers for read and write are configured as AB-synchronized, with Acnt = 2, Bcnt = 256 and Ccnt = 1024.

    The first transfer (writing to the FPGA) is triggered by an interrupt, and the second one (reading) is chained to the first.

    I tried to use the CPU instead of the EDMA, and the transfer becomes faster. The problem is that I cannot waste CPU time serving the FPGA; I really would like to use the DMA!

    Sara

  • Sara,

    The pictures and the extra description are very helpful to explain what you are seeing and measuring. It is possible the timing you see is what can be expected under these conditions, but it is also possible that some improvement can be found.

    What are all of the EDMA PARAM values? Please list all 8 of the registers fully.

    What timing do you see when using the DSP code to do the writes and reads?

    Please run the EDMA test with EMIFA_SDCR.SR=1. I assume it is currently at the default value of 0. This is just a test for evaluation to see if we can improve your timing.

    Regards,
    RandyP

    Here are the PaRAM values.

    For the writing (triggered by an interrupt) - PaRAM 22:
    W_OPT           = 0x00C07004  (channel 7 is the channel used for the reading)
    W_SRC           = internal_memory_buffer
    W_A_B_CNT       = 0x02000002
    W_DST           = 0x62002000
    W_SRC_DST_BIDX  = 0x00000002
    W_LINK_BCNTRLD  = 0x00004100
    W_SRC_DST_CIDX  = 0x00000400
    W_CCNT          = 0x00000020

    For the reading - PaRAM 7:
    R_OPT           = 0x00C0B004
    R_SRC           = 0x62001800
    R_A_B_CNT       = 0x02000002
    R_DST           = internal_memory_buffer
    R_SRC_DST_BIDX  = 0x00020000
    R_LINK_BCNTRLD  = 0x00004120
    R_SRC_DST_CIDX  = 0x04000000
    R_CCNT          = 0x00000020
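
    To make the raw words above easier to read, here is a minimal decoding sketch. The bit positions are an assumption based on the usual EDMA3 PaRAM layout (SYNCDIM in OPT bit 2, TCC in bits 17:12, TCINTEN in bit 20, TCCHEN in bit 22, ITCCHEN in bit 23, and the count and index words split into 16-bit halves); please check them against the EDMA3 User Guide. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Selected OPT fields (bit positions are assumptions, see text). */
typedef struct {
    unsigned syncdim;  /* bit 2: 0 = A-synchronized, 1 = AB-synchronized */
    unsigned tcc;      /* bits 17:12: transfer completion code */
    unsigned tcinten;  /* bit 20: completion interrupt enable */
    unsigned tcchen;   /* bit 22: completion chaining enable */
    unsigned itcchen;  /* bit 23: intermediate completion chaining enable */
} edma_opt_fields_t;

static edma_opt_fields_t edma_decode_opt(uint32_t opt)
{
    edma_opt_fields_t f;
    f.syncdim = (opt >> 2) & 0x1u;
    f.tcc     = (opt >> 12) & 0x3Fu;
    f.tcinten = (opt >> 20) & 0x1u;
    f.tcchen  = (opt >> 22) & 0x1u;
    f.itcchen = (opt >> 23) & 0x1u;
    return f;
}

/* Low half: ACNT or SRCBIDX. High half: BCNT or DSTBIDX. */
static uint16_t param_lo16(uint32_t w) { return (uint16_t)(w & 0xFFFFu); }
static uint16_t param_hi16(uint32_t w) { return (uint16_t)(w >> 16); }

/* Decode the write-channel values quoted in the post above. */
static void decode_posted_write_param(void)
{
    edma_opt_fields_t w = edma_decode_opt(0x00C07004u);
    assert(w.syncdim == 1 && w.tcc == 7 && w.tcchen == 1 && w.itcchen == 1);
    assert(param_lo16(0x02000002u) == 2);      /* ACNT    = 2 bytes */
    assert(param_hi16(0x02000002u) == 0x200);  /* BCNT    = 0x200   */
    assert(param_lo16(0x00000002u) == 2);      /* SRCBIDX = 2       */
    assert(param_hi16(0x00000002u) == 0);      /* DSTBIDX = 0       */
}
```

    Under these assumptions, W_OPT decodes to an AB-synchronized transfer that chains to channel 7, and the destination B index of 0 means every write goes to the same FPGA address.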

    I also did some tests by manually triggering the transfers one at a time, and tried changing the B/C counts of the transfers; the behaviour was exactly the same.

    I did not mention this before, but the data are transferred correctly; the only issue is the timing.


    When using the DSP code I see exactly what I expect: 50 ns for a 16-bit transfer with setup = 1, strobe = 3 and hold = 1, and if I change these values, the time changes accordingly.
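
    The expected figure follows from simple arithmetic: (setup + strobe + hold) EMIF clock cycles per access. A minimal sketch, with a hypothetical function name, that ignores any turnaround or arbitration cycles:

```c
#include <assert.h>
#include <stdint.h>

/* Nominal length of one asynchronous access in nanoseconds:
 * (setup + strobe + hold) cycles of the EMIF clock. */
static uint32_t emif_access_ns(uint32_t setup, uint32_t strobe,
                               uint32_t hold, uint32_t emif_clk_mhz)
{
    assert(emif_clk_mhz > 0);
    uint32_t cycles = setup + strobe + hold;
    return (cycles * 1000u) / emif_clk_mhz;
}
```

    For setup = 1, strobe = 3, hold = 1 at 100 MHz, `emif_access_ns(1, 3, 1, 100)` evaluates to 50.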

    I ran the EDMA test with SDCR.SR = 1 as you suggested, but it does not make any difference.

    While reading the Megamodule manual I found a chapter about the bandwidth manager. I read that it has to do with the arbitration of DMA transfers.
    Does it have anything to do with my problem? The registers listed there are not mentioned in any of the other manuals (TMS320C6747, DMA ...), so I am not sure that they actually exist. In any case, I tried changing the values at the addresses listed in the Megamodule manual, but my signals still do not change.

    Sara

  • Sorry, in the previous post I gave the wrong value for R_OPT.

    It is:

    R_OPT = 0x00106004 (it does not chain to any other transfer).

    Sara

  • Sara,

    What I believe is happening is that the EDMA3 Channel Controller (CC) is breaking the whole transfer into multiple Transfer Requests (TRs) to the Transfer Controller (TC), based on what the TC can handle. This is strongly affected by your use of the FPGA's address as a single-address FIFO port.

    I would have expected identical behavior for reads and writes, but that is obviously not the case, based on your experience, which is well documented here. So my theory is:

    1. For the write to the FPGA, a single TR is sent to the TC for a full ABsync'd transfer of 0x200*0x2 bytes. The read portion of this TR is optimized by the TC since it is a sequential set of bytes being read from internal memory, and these get copied into the TC's write buffer FIFO (see the EDMA3 User Guide for pictures and more detail) from which they are then written in single bus commands from the TC to the EMIFA peripheral.

    I am not absolutely sure of this, and I cannot tell the cycle granularity in the WE picture to know why or explain why there is so much space between the WE pulses. It seems to be more than there should be. Were the WE pulses closer together when you used the DSP to do the writes?

    2. For the read from the FPGA, multiple TRs are sent to the TC, for 2 bytes each since the addresses are not sequential for the reads of this TR. Each read will copy 2 bytes into the write buffer FIFO, then they will be written to internal memory. Only then does the next TR get sent to the TC, for another 2 bytes to be read from the same address.

    One thing that is easy to try, but has only a small chance of helping, is to change TCCMODE to EARLY. That might get the read TRs sent sooner. It is a long shot, but it seems worth mentioning in case you want to try it.

    The best solution requires a change to your FPGA. If you can change the address decode of the FPGA's address to which you are writing and reading so that it will accept any address in a range, then the EDMA can do a sequential transfer on EMIFA. For example, let the FPGA accept any address in the range of 0x62002000-0x620023FF for writes and 0x62001800-0x62001BFF for reads. If the master address map defined for the FPGA allows this, then it would be as simple as removing some address bits from the decode of the single address location being used for the FIFO right now. If the address map does not have this much space available, then it will require redefining the address map, and that will make your FPGA designer mad at me and your software team mad at me.

    But you will be happy, because the EDMA PARAM can be changed to use the BIDX = 2 instead of 0. You may need to keep the corresponding CIDX = 0 so you will start back at the same beginning point for each interrupt/event trigger.
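
    As a rough sketch of the PaRAM change described above, the helpers below pack a SRC_DST_BIDX word and check that a sequential transfer stays inside the address window the FPGA would decode. The function names are hypothetical, and the packing (SRCBIDX in the low half, DSTBIDX in the high half) is assumed from the PaRAM values posted earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Pack SRCBIDX (low 16 bits) and DSTBIDX (high 16 bits). */
static uint32_t edma_pack_bidx(uint16_t src_bidx, uint16_t dst_bidx)
{
    return ((uint32_t)dst_bidx << 16) | src_bidx;
}

/* Return 1 if BCNT arrays of ACNT bytes, spaced BIDX bytes apart and
 * starting at base, stay inside [win_start, win_end] (inclusive). */
static int edma_fits_window(uint32_t base, uint32_t acnt, uint32_t bcnt,
                            uint32_t bidx, uint32_t win_start,
                            uint32_t win_end)
{
    assert(acnt > 0 && bcnt > 0);
    uint32_t last = base + (bcnt - 1u) * bidx + acnt - 1u;
    return base >= win_start && last <= win_end;
}
```

    For the write channel, `edma_pack_bidx(2, 2)` gives 0x00020002 in place of the current 0x00000002, and with ACNT = 2, BCNT = 0x200 and BIDX = 2 the last byte written lands exactly on 0x620023FF, the end of the example write window.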

    What do you think about this being feasible?

    You can test this by just changing the PARAM, even though it will not work well with the FPGA. You can look at the WE and OE pulses to see whether they are closer together after the sequential accesses, or not.

    Regards,
    RandyP

  • Hi Randy,

    When using the DSP, the WE pulses were a bit closer together: 50 ns instead of the 70 ns that I see using the DMA (by calculation, the duration should be 50 ns, i.e. setup + strobe + hold).

    I changed the EDMA configuration to do sequential transfers, and both the reading and the writing became faster, but there is still something weird. With the sequential transfer I see 8 data transfers of the correct duration (50 ns), then the signal stays high for a longer time (40 ns for writing and 110 ns for reading, instead of the 20 ns that I would expect), and then the timing is correct again for 8 transfers, and so on.
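
    To put a number on the cost of these gaps, the sketch below averages the extra high time over each group of 8 transfers. It assumes the extra gap is the observed high time minus the expected 20 ns (so 20 ns for writes and 90 ns for reads); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Average time per word when every group of words_per_burst accesses
 * of word_ns each is followed by extra_gap_ns of additional idle time. */
static double avg_word_ns(uint32_t words_per_burst, double word_ns,
                          double extra_gap_ns)
{
    assert(words_per_burst > 0);
    return (words_per_burst * word_ns + extra_gap_ns) / words_per_burst;
}
```

    Under that assumption, `avg_word_ns(8, 50.0, 20.0)` gives 52.5 ns per word for writes and `avg_word_ns(8, 50.0, 90.0)` gives 61.25 ns per word for reads, compared with the 70 ns and 140 ns measured with fixed addressing.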

    Probably the TC is still doing something that we don't know about. Could it be latency from arbitration between different transfers? I don't understand why this extra time occurs every 8 data words (in both reading and writing), since 8 is not a dimension of my transfers.

    In any case, the changes in the FPGA for working with sequential transfers are minimal, so nobody is mad at you for now :) ... I still have to check everything in detail, but I think I can work with this configuration.

    Still, it would be great to understand why the timing is not what I expect; something might be managed incorrectly somewhere.

    Sara

  • Sara,

    This is probably the best performance you will get with this EDMA3 configuration. You could try increasing the size of ACNT to be 0x400, but you should be getting the equivalent of that already due to the TC's automatic optimization of TRs.

    I do not expect we will be able to reach any deeper understanding of the internal operation of the EDMA3 module. You are now doing what my other customers have done for optimization in the past.

    Regards,
    RandyP

  • Sara,

    Someone watching this thread sent me a recommendation to pass on to you. There is a parameter called DBS, Default Burst Size, that is defined for each Transfer Controller. The default value is 16 bytes, meaning that all reads or writes from a TC will be limited to 16 bytes per command. This probably explains why the gap between fast pulses occurs every 8 beats, or 16 bytes.

    This parameter is programmable, somewhere on the device. You can set it to 32 or 64, although he said that the bridge to EMIFA will break 64-byte bursts into two 32-byte bursts. The gap might still be smaller between the split bursts.
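
    The relation between DBS and the spacing of the gaps can be sketched as follows. The function name is hypothetical, and the 32-byte bridge limit is simply the figure quoted above, passed in as a parameter rather than taken from documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bus beats between gaps: the burst is DBS bytes, capped at
 * what the bridge passes in one piece, divided by the bus width. */
static uint32_t beats_between_gaps(uint32_t dbs_bytes, uint32_t bus_bytes,
                                   uint32_t bridge_max_bytes)
{
    assert(bus_bytes > 0 && dbs_bytes >= bus_bytes);
    uint32_t burst = dbs_bytes < bridge_max_bytes ? dbs_bytes
                                                  : bridge_max_bytes;
    return burst / bus_bytes;
}
```

    On a 16-bit bus (2 bytes per beat), a DBS of 16 bytes gives 8 beats per burst, while 32 and 64 bytes both give 16 beats if 64-byte bursts are split in two.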

    I did not do a search to find where to program DBS, but if you want or need to take a little more overhead out of the transfer, please do that search. I would start with the EDMA3 User Guide, since it is a TC parameter, but it may be in another module at the SoC level. If you do that search and find it, please reply back to help future users; if you want it and do not find it, please reply back so someone may be able to make that search.

    Regards,
    RandyP

  • Hello Randy,

    The register for programming the DBS is CFGCHIP0; the explanation is in the System Reference Guide (SPRUFK4).
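
    For future readers, a minimal sketch of updating the DBS field is shown below. The field positions (TC0DBS in bits 1:0, TC1DBS in bits 3:2) and the encoding (0 = 16, 1 = 32, 2 = 64 bytes) are assumptions that should be verified against SPRUFK4 before use, and all names are hypothetical. The helper only computes the new register value; reading and writing the actual register, and any unlock sequence the device may require, are left out.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED field layout of CFGCHIP0; verify against SPRUFK4. */
#define TC0DBS_SHIFT 0u
#define TC1DBS_SHIFT 2u
#define TCDBS_MASK   0x3u

/* ASSUMED encoding of the default burst size. */
typedef enum {
    DBS_16_BYTES = 0,
    DBS_32_BYTES = 1,
    DBS_64_BYTES = 2
} tc_dbs_t;

/* Return the CFGCHIP0 value with the DBS field of TC0 or TC1 replaced,
 * leaving all other bits unchanged. */
static uint32_t cfgchip0_with_dbs(uint32_t cfgchip0, unsigned tc,
                                  tc_dbs_t dbs)
{
    assert(tc <= 1u);
    uint32_t shift = (tc == 0u) ? TC0DBS_SHIFT : TC1DBS_SHIFT;
    cfgchip0 &= ~(TCDBS_MASK << shift);
    cfgchip0 |= ((uint32_t)dbs & TCDBS_MASK) << shift;
    return cfgchip0;
}
```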

    You are right: if I set the DBS to 16 bytes, the gap between fast pulses occurs every 8 16-bit words, while if I set it to 32 or 64 bytes the gap occurs every 16 words.

    Thank you very much for your help, now everything makes sense!

    Sara

  • Hello Randy,

    I just want to let you know that in further tests I saw that by changing ACNT to 0x400 and BCNT to 0x1 with fixed addressing, I obtain the same performance that I have with sequential addressing (i.e. bursts of DBS-sized fast transfers).

    Regards,

    Sara