MCASP/EDMA3 with I2S 16-bit words

I am looking for info on the best way to use EDMA3 to service a simple MCASP I2S transmitter (i.e., 2-slot TDM, 32-bit slot, word-width FS and a 1-clock delay) with a 16-bit word size. The source data format is 16-bit packed words. From what I can gather, you cannot use EDMA3 to access the data port as a 16-bit word. The only way I can find to do this is to use 32-bit port access and pad the EDMA buffer so that it is 2x its normal size. You would then be pulling consecutive 32-bit words from the buffer and sending only 16 bits per 32-bit word read. You can get rid of the buffer padding by making the source BINDX 2 instead of 4, but then every other 32-bit read comes from an address that is not 32-bit aligned, which is inefficient. Any ideas?
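
The buffer-padding workaround described in the question can be sketched in C as follows. This is only an illustration: the function name `pad_samples_to_words` and the `shift` parameter are hypothetical, and which half of the 32-bit word must carry the sample depends on the McASP format settings, which the post does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand packed 16-bit samples into one 32-bit word per sample so that the
 * EDMA can feed the data port with plain 32-bit accesses.
 * shift (0 or 16) selects the half of the word that carries the sample;
 * the correct value is an assumption that depends on the McASP format setup.
 * Returns the size in bytes of the padded buffer (2x the packed size). */
size_t pad_samples_to_words(const int16_t *packed, size_t n,
                            uint32_t *padded, unsigned shift)
{
    assert(shift == 0 || shift == 16);
    for (size_t i = 0; i < n; ++i) {
        /* Go through uint16_t first so negative samples are not sign-extended. */
        padded[i] = (uint32_t)(uint16_t)packed[i] << shift;
    }
    return n * sizeof(uint32_t);
}
```

The cost is visible in the return value: the EDMA source buffer occupies four bytes per sample instead of two.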

  • Which specific device are you using? Most will have the same restrictions on the Config Bus activity, meaning that both the CPU and the EDMA3 will have to do 32-bit transactions internally. But there might be ways to accomplish this depending on the exact device and its particular peripherals.

  • Primarily C6746/6748 (EDMA3). This uses MCASP to interface with an HDMI receiver which delivers 16-bit I2S.

    This interfaces also to a DM6467 (EDMA3) and a C6713 (EDMA2).

  • OK ... for anyone interested, I have some answers. Understanding how the EDMA TC functions on both ends of a transfer is fundamental here. First, a few summary points:

    ====

    1) MCASP/EDMA3 can support both 16-bit and 8-bit MCASP data port access for single or multiple serializers and any number of slots per frame. (As discussed earlier in this thread, the reason to do this would be to service a MCASP that implements a 16-bit or 8-bit word size without either padding the source memory or inefficiently accessing it with an overlapping pattern.)

    ====

    2) Although a single Transfer Controller access can be as wide as 64 bits, data quantities are limited to 32 bits at the EDMA_TC/MCASP_DPORT interface. Individual EDMA_TC/MCASP_DPORT accesses can also be 16 bits or 8 bits wide, as mentioned above.

    ====

    3) For a 16-bit word size to/from packed memory, the single-serializer scenario is simple: use A-Synchronization, set ACNT=2, BCNT=n, and MCASP_DPORT-side indexing can be any value that keeps the entire transfer within the MCASP_DPORT address window.

    ====

    4) For a 16-bit word size and multiple serializers to/from packed memory, it is necessary to use the AB-synchronization mode and to configure the MCASP_DPORT-side indexing to zero. This prevents multiple sub-32-bit accesses within a single event from being automatically aggregated into wider accesses within the Transfer Controller. (The TC optimizes contiguous sub-32-bit data items, but it cannot optimize sub-32-bit repeat transfers to a fixed address.) When the MCASP needs to be serviced for multiple serializers that each require 16 bits or less, it cannot intelligently fragment data quantities from the EDMA that are delivered as 32-bit words. Instead, it will access the serializers 32 bits at a time until the EDMA data volume for that event is exhausted. This is not the desired behavior, as it will starve (Tx) or overflow (Rx) one or more serializers. The granularity of the data delivery must be explicitly controlled by the EDMA parameters.

     

    In the examples below, I refer to the EDMA setup parameters for a MCASP transmitter. The DESTINATION ADDRESS is always MCASP_DPORT and BINDX/CINDX both refer to the destination indices. 'Event' refers to a MCASP-to-EDMA signaling event that occurs once per slot. The EDMA must service all serializers per slot at this time.

     

    Case 1:

    -------

    A-Synchronized

    ACNT=4

    BCNT=n

    BINDX=anything

    -------

    Per event, a single 32-bit word is written to DPORT.

    Use case is single serializer, 32-bit word.

     

    Case 2:

    -------

    A-Synchronized

    ACNT=8,12,16,...

    BCNT=n

    BINDX=anything

    -------

    Per event, multiple 32-bit words are written to DPORT sequentially.

    Use case is multiple serializers, 32-bit word per serializer.

     

    Case 3:

    -------

    AB-Synchronized

    ACNT=4

    BCNT=s

    BINDX=0 or 4

    CCNT=n

    CINDX=0 or 4*s

    -------

    Per event, 's' 32-bit words are written to DPORT sequentially.

    Use case is 's' serializers, 32-bit word per serializer.

    When BINDX/CINDX=4/4*s, the EDMA-->DPORT performance is improved at the EDMA backend.

     

    Case 4:

    -------

    A-Synchronized

    ACNT=2

    BCNT=n

    BINDX=anything

    -------

    Per event, a single 16-bit word is written to DPORT.

    Use case is single serializer, 16-bit word.

    (Note: If data is MSB-first and the slot width is greater than 16, the mask/pad/rotate feature is used to shift the valid bits to the correct position.)

     

    Case 5:

    -------

    AB-Synchronized

    ACNT=2

    BCNT=s

    BINDX=0

    CCNT=n

    CINDX=0

    -------

    Per event, 's' 16-bit words are written to DPORT sequentially.

    Use case is 's' serializers, 16-bit word per serializer.

     

    Case 6:

    -------

    AB-Synchronized

    ACNT=2

    BCNT=s

    BINDX=2

    CCNT=n

    CINDX=2*s

    -------

    Per event, 's'/2 32-bit words are written to DPORT sequentially.

    Use case is none. This is a demonstration case to show how the contiguous addressing in the DPORT address window caused by BINDX/CINDX=2/2*s makes the Transfer Controller aggregate the data into 32-bit words at the EDMA/DPORT interface. This is efficient, but it feeds the DPORT incorrectly if the intent of ACNT=2 was to submit individual 16-bit words to the serializers.
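
The difference between Cases 3, 5 and 6 can be summarized with a toy model of the DPORT-side behaviour as described in this post. It is not a specification of the Transfer Controller: the function name `dport_writes_per_event` is hypothetical, and the merging rule (contiguous sub-32-bit elements are combined into 32-bit writes, everything else is issued element by element) is simply the behaviour reported above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: for one event, list the width in bytes of each write that
 * reaches the DPORT, given ACNT, BCNT and the destination BINDX.
 * Assumed rule (taken from the post, not from TI documentation):
 *   - sub-32-bit elements at contiguous addresses (BINDX == ACNT) are
 *     merged into 32-bit writes;
 *   - otherwise every element is written individually at its own width.
 * Returns the number of writes stored in widths[]. */
size_t dport_writes_per_event(unsigned acnt, unsigned bcnt, unsigned dst_bindx,
                              unsigned widths[], size_t max_writes)
{
    size_t count = 0;
    assert(acnt == 1 || acnt == 2 || acnt == 4);

    if (acnt < 4 && dst_bindx == acnt) {
        /* Contiguous sub-32-bit elements: aggregated into 32-bit writes. */
        unsigned remaining = acnt * bcnt;
        while (remaining > 0 && count < max_writes) {
            unsigned w = remaining >= 4 ? 4 : remaining;
            widths[count++] = w;
            remaining -= w;
        }
    } else {
        /* Fixed or non-contiguous addressing: one write per element. */
        for (unsigned i = 0; i < bcnt && count < max_writes; ++i) {
            widths[count++] = acnt;
        }
    }
    return count;
}
```

With ACNT=2 and BCNT=4, a destination BINDX of 0 (Case 5) gives four 2-byte writes, while a BINDX of 2 (Case 6) gives two 4-byte writes, which is the mis-feeding described above.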

     

    Case 5 is the primary case of interest here, as it describes how to service the MCASP with multiple individual 16-bit words per event. Note that, although the EDMA/MCASP interface necessarily issues multiple individual 16-bit words, the data source indexing can be set up for contiguous memory access so that the mem/EDMA interface operates with the most efficiently sized memory accesses.

     

    Cases 4 and 5 can easily be altered to support an 8-bit word size.

    I have actually tested each of these cases. It seems like all of this should apply directly to MCBSP as well as some other serial peripherals.
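
As a rough illustration of Case 5, the sketch below fills in a transmit PaRAM set for `s` serializers with packed 16-bit source data. The struct `param_set_t`, the helper `case5_tx_param` and the macro `OPT_SYNCDIM_AB` are hypothetical names; the eight-word layout, the field packing and the SYNCDIM bit position are assumptions that should be checked against the EDMA3 reference guide for the device in use. Event, interrupt and linking setup are deliberately left out. As reported later in this thread, the approach was found not to work on C674x parts.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of one EDMA3 PaRAM set (eight 32-bit words). */
typedef struct {
    uint32_t opt;           /* transfer options                      */
    uint32_t src;           /* source address                        */
    uint32_t a_b_cnt;       /* BCNT[31:16]    | ACNT[15:0]           */
    uint32_t dst;           /* destination address                   */
    uint32_t src_dst_bidx;  /* DSTBIDX[31:16] | SRCBIDX[15:0]        */
    uint32_t link_bcntrld;  /* BCNTRLD[31:16] | LINK[15:0]           */
    uint32_t src_dst_cidx;  /* DSTCIDX[31:16] | SRCCIDX[15:0]        */
    uint32_t ccnt;          /* CCNT[15:0]                            */
} param_set_t;

#define OPT_SYNCDIM_AB (1u << 2)  /* assumed bit position of OPT.SYNCDIM */

/* Case 5: AB-synchronized, ACNT=2, BCNT=s, CCNT=n, destination BINDX=CINDX=0.
 * The source indices walk a packed buffer (2 bytes per sample, 2*s bytes
 * per event) so that the memory side stays contiguous. */
param_set_t case5_tx_param(uint32_t src_addr, uint32_t dport_addr,
                           uint16_t serializers, uint16_t events)
{
    param_set_t p = {0};
    assert(serializers > 0 && events > 0);

    p.opt          = OPT_SYNCDIM_AB;
    p.src          = src_addr;
    p.dst          = dport_addr;
    p.a_b_cnt      = ((uint32_t)serializers << 16) | 2u;
    p.src_dst_bidx = (0u << 16) | 2u;                                /* DSTBIDX=0, SRCBIDX=2   */
    p.src_dst_cidx = (0u << 16) | ((2u * serializers) & 0xFFFFu);    /* DSTCIDX=0, SRCCIDX=2*s */
    p.link_bcntrld = 0xFFFFu;                                        /* null link, BCNTRLD unused */
    p.ccnt         = events;
    return p;
}
```

Changing the `2u` constants to `1u` gives the 8-bit variant mentioned above, under the same assumptions.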

  • Excellent analysis. Thank you for posting this to the community.

  • ***Update:

    The solution described was tested on a DM6467 in both the transmit and receive directions. However, this method fails on a 674x. TI has confirmed that, on the 6748 in particular, 32-bit-only access to the DPORT is enforced. I am not sure why this was done. It is very common to have 32 clocks per I2S slot/channel where only 16 bits are valid. This limitation makes it impossible to pick off only the valid data and store it in packed memory via EDMA. I am trying to find out what other devices have this limitation.

  • splonge386 said:

    TI has confirmed that on the 6748 in particular, the 32-bit-only access to the DPORT is enforced.

    Could you elaborate, please, on "TI has confirmed" and "is enforced"? Hopefully you can reference some documentation that they sent you. It would be helpful to some of us to understand this fully.

    splonge386 said:

    This limitation makes it impossible to pick off only the valid data and store it in packed memory via EDMA.

    This is a perfect opportunity for the flexibility of the EDMA3 to provide you with a solution: one channel to read the peripheral chaining to another channel to pack the buffer. Or the other way around for the transmit side.

    Program the active I2S read-event channel to read one 32-bit word from the peripheral DPORT and write that word to any 32-bit word location of your choosing. To avoid the cost of a reload PaRAM set, set OPT.STATIC=1, ACNT=4, BCNT=CCNT=1. The response to every read event will be to copy one word to a single-word location, and the addresses do not need to change. Set OPT.TCCHEN=1 and OPT.TCC=ChB where ChB is the buffer-packing channel.

    Program a new buffer-packing channel, ChB, to copy 16 bits from the single-word location above, with ACNT=2, DSTIDX=2 and SRCIDX=0. This channel would be programmed the same as you would have programmed the I2S read channel if you could read 16 bits from DPORT, but with the SRCADDR being this new single-word location instead of the DPORT address.

    Once you have programmed these, there will be no CPU interaction required more than what you would have done previously. The I2S read-event channel will never need to be updated since it is STATIC, and the buffer-packing ChB will be updated just like you would have before, assuming that you move the destination buffer around from time-to-time. If you only use a single buffer or a ping-pong buffer pair, then you would have no CPU interaction other than responding to interrupts when a buffer is full.
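
The data movement performed by the two chained channels can be modelled in software as below. This is a sketch of the effect only, not EDMA3 driver code: `chained_pack_model` and `src_offset` are hypothetical names, little-endian byte order is assumed, and `src_offset` (0 or 2) stands for the choice of which half of the single-word location ChB uses as its SRCADDR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of the chained transfer for n read events.
 * Channel A (STATIC, ACNT=4): copies one 32-bit word from the DPORT to a
 *   fixed single-word location, here the byte array scratch[].
 * Channel B (ACNT=2, SRCBIDX=0, DSTBIDX=2): copies 16 bits from that
 *   location into the packed destination buffer.
 * Memory is modelled as little-endian bytes (an assumption). */
void chained_pack_model(const uint32_t *dport_words, size_t n,
                        unsigned src_offset, uint8_t *packed)
{
    uint8_t scratch[4];  /* the single-word location shared by both channels */
    assert(src_offset == 0 || src_offset == 2);

    for (size_t i = 0; i < n; ++i) {
        /* Channel A: one 32-bit word per event, same destination every time. */
        for (unsigned k = 0; k < 4; ++k) {
            scratch[k] = (uint8_t)(dport_words[i] >> (8 * k));
        }
        /* Channel B: 16 bits out of the word, destination advancing by 2. */
        packed[2 * i]     = scratch[src_offset];
        packed[2 * i + 1] = scratch[src_offset + 1];
    }
}
```

With `src_offset` set to 2 the upper 16 bits of each received word end up in the packed buffer; with 0, the lower 16 bits do.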

     

  • After demonstrating that 16-bit DPORT access did not work on a 6748, I learned from TI app support that it would not work on this part because the EDMA/DPORT interface is different; this was confirmed by 'the factory.'

    Certainly, you can make multiple passes on the data buffers with EDMA, but the point was to transfer this data efficiently. Also, if memory space is your constraint, you would need to fit at least one instance of the data in its unpacked format. No good.

     

  • When transferring 32-bit contiguous data one would normally program ACNT=4 and SRCBIDX=4 (non-overlapping, contiguous data).  For the case of desiring to send 16-bit data, but needing a 32-bit transfer it seems that you could accomplish this by setting ACNT=4 and SRCBIDX=2.  This would avoid doing multiple passes.

  • Sure. That method is understood (see original post). But it is inefficient, not because of 32-bit reads vs. 16-bit reads, but because of the misaligned 32-bit access on every odd word. So, that's 3*n/2 individual accesses vs. n.
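
The 3*n/2 figure can be reproduced with a small counting model. It assumes, as the argument above does, that a 32-bit read that is not 32-bit aligned costs two aligned accesses; how a particular Transfer Controller actually handles such reads is not stated in this thread. The function name `aligned_access_count` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the aligned 32-bit accesses needed to read n 4-byte elements
 * starting at byte address base and stepping bidx bytes per element.
 * Assumption: a read that straddles a 32-bit boundary is split in two. */
size_t aligned_access_count(uintptr_t base, size_t n, size_t bidx)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        uintptr_t addr = base + i * bidx;
        count += (addr % 4 == 0) ? 1 : 2;
    }
    return count;
}
```

For an aligned base and even n, a step of 2 (ACNT=4, SRCBIDX=2) gives n/2 aligned reads plus n/2 split reads, i.e. 3*n/2 accesses, whereas a step of 4 over a padded buffer gives n.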

     

  • This "extra work" for the EDMA3 is trivial given the rate of data coming in the McASP.

  • This has less to do with "work" and more to do with clogging up the SCR. In heavily loaded multi-core TI SoCs, it is not hard to get the MCASP to underflow/overflow. This peripheral, inconceivably, has no HW FIFO and the extra buffer stage in the MCBSP is a half-measure.

     

  • I just looked in the 6748 data sheet.  Both McASP and McBSP include a FIFO.  I don't understand your comment.

  • Ah. Yes - the C674x implements a different version of the MCASP peripheral which has R/W FIFOs. The DM family MCASP has no FIFO.

     

  • Hi,

    I am also facing a similar problem with I2S and EDMA.
    I am using following:
    DSKDA830EVM from Spectrum Digital
    bios_6_32_02_39
    ccsv4
    ipc_1_23_02_27
    xdctools_3_22_01_21
    pspdrivers_02_00_01
    edma3_lld_02_00_01_04

    I am trying to modify the audio example in the PSP to do a sine tone transmission in 16-bit I2S mode. The final aim is to make both receive and transmit work in 16-bit I2S mode. I am attaching the code here. With the present setup, the tone is corrupted when we listen to it on speakers. Can somebody please tell me what needs to be done for correct playback?
    There are changes in \ti\psp\platforms\evmDA830\audio.cfg
          \ti\psp\platforms\codec\Aic31.xdc
          \ti\psp\examples\evmDA830\audio\config\audioSample.cfg
          \ti\psp\examples\evmDA830\audio\sample\src\audioSample_io.c
         
    Thanks,

    Rajaram