TM4C1290NCPDT: DMA Performance Questions on TM4C1290

Part Number: TM4C1290NCPDT
Other Parts Discussed in Thread: EK-TM4C1294XL

I am doing some design feasibility work.  I am trying to understand the performance of the uDMA in the TM4C1290.  My primary concern is whether I will have adequate DMA performance to keep up with asynchronous data acquisition from a peripheral port register without losing any data.  Hence, I am very interested in determining how long a data capture via DMA will take, prior to the next data sample’s arrival.

The central design is to read from a peripheral port register and transfer to a circular RAM buffer for later processing.  Although there is very little time between some sample bursts, on average the rate is such that background processing of the data buffer is feasible.
I cannot use Ping-Pong because I need to avoid disturbing the existing elaborate interrupt structure by adding another ISR (for supporting ping-pong).
Therefore, I plan to use Scatter-Gather, with one ‘task’ for each transfer from the data port into the buffer.  I will have an N entry circular buffer with a parallel list of N+1 tasks.  Each task will be dedicated to transferring data to the corresponding position in the data buffer. 

The last task in the list (N+1) will rewrite the primary control structure’s control field, so that the buffer is made circular.  For expediency, only the 32-bit control field will be written by the task.  I believe that there is no need to re-write the source and destination fields in the primary control structure, because they are not altered while the task list runs.

Q: Any comments on this overall design?

My analysis of my performance constraints will be greatly enhanced if you can answer the following questions.  Please reference the question number in your answers.
Thanks!!

1. Cost of Scatter-Gather Unused Field - When using Scatter-Gather, it appears that there is the waste of transferring the ‘Unused’ field of a task in the task list to the alternate control structure.  In other words, I am obliged to transfer all 4 words of the task (wasting one read and one write bus cycle), even though only the first 3 words are ‘live’.  This is based on Figure 9-3 Memory Scatter-Gather, Setup and Configuration [DS-TM4C1290NCDT-15863.2743, dated June 18, 2014].  On the right column for the Channel Control Table, the example lists the control field as ITEMS = 12.  There are indeed 3 tasks with 4 words each = 12 words.  Question:  Is the DMA controller smart enough to skip transferring the 4th (unused) field?

2. DMA Idle Cycles - During a Scatter-Gather DMA transfer, are there any bus idle cycles inserted by the DMA controller?  Asked differently: Is every available bus cycle used to transfer data to (or from) the DMA controller and the transferred region (or the RAM-based control structures / task list)?  When answering, disregard any bus cycles that are taken by the CPU. 

3. Striped RAM - Can the DMA run at full speed concurrent with the CPU, given the following assumptions?  In other words, will the DMA ever be slowed by bus contention with the CPU, given the following conditions?  Assume:

a. The Scatter-Gather DMA mode is used to transfer data from a peripheral to a RAM buffer.

b. All RAM used by DMA is located in a dedicated section of striped RAM (say the top quarter), including:

i. Primary control structure.

ii. Alternate control structure.

iii. Task list.

iv. Destination RAM buffer.

c. The CPU never accesses any RAM in the DMA’s striped area while DMA is running.

d. Ignore any potential occasional contention between the CPU and DMA for access to the peripherals.

4. Peripheral Bus - When accessing a peripheral, does DMA have to wait for the CPU to have an idle bus cycle?  Asked differently: Is DMA undelayed when accessing a peripheral as long as the CPU is not also accessing a peripheral?  The crux of the question: Is there a peripheral bus that is separate from (and concurrent with) the main flash memory bus?

5. Read Frequency of Source and Destination Fields - During a DMA transfer, how often are the source and destination fields read from the RAM control structure?  I believe that the answer is probably one of the following.  Which one?  (Alternatively, is there another answer?)

a. The fields are read once during startup of the DMA transfer.  The fields are not reloaded after losing and subsequently regaining channel arbitration.  In other words, the fields are remembered internally by the DMA controller for channel resumption after regaining arbitration.

b. The fields are read once during startup of the DMA transfer.  The fields are read again, after resuming a transfer that has been interrupted by losing channel arbitration.  In other words, the fields are forgotten by the DMA controller if the channel loses arbitration.

c. The fields are read once during startup of the DMA transfer.  The fields are read again after every channel arbitration.  The reloading occurs even if the current channel has not lost the arbitration.

d. The source field is re-read prior to every read cycle from the transferred memory area.  The destination field is re-read prior to every write cycle to the transferred memory area.  In other words, the source and destination are not maintained internally in the DMA controller between data transfers.

6. Write Frequency of Control Field - During a DMA transfer, how often is the control field written back to the RAM control structure?  I believe that the answer is probably one of the following.  Which one?  (Alternatively, is there another answer?)

a. The control field is written only when the channel loses arbitration after a burst transfer.

b. The control field is written after each memory write cycle of transferred data in a burst transfer.  During a burst transfer with N items, the control field will be written N times.

7. Read Frequency of Control Field – During a DMA transfer, how often is the control field read from the RAM control structure?  I believe that the answer is probably one of the following.  Which one?  (Alternatively, is there another answer?)

a. The control field is read once during startup of the DMA transfer.  The control field is not read again, even after losing and subsequently regaining channel arbitration.  In other words, the field is remembered internally by the DMA controller for channel resumption after regaining arbitration.

b. The control field is read once during startup of the DMA transfer.  The channel’s control field is read again, after resuming a transfer that has been interrupted by losing channel arbitration.

c. The control field is read once during startup of the DMA transfer.  The control field is read again after every channel arbitration, even if arbitration was not lost.

8. Scenario: In Scatter-Gather mode, done with running one task in the alternate control structure, and ready to load the next task into the alternate control structure.  Is the primary control structure reloaded into the DMA controller before the next task is DMA’ed from the task list to the alternate control structure?  Alternatively, is the primary control structure remembered internally in the DMA controller while it runs the alternate control structure?

  • Just to let you know I have received and read your post. I will need to research the answers when I get back in the office on Monday.
  • Sorry, I needed to do some research and I wanted to consult one of my colleagues before answering you.  

    Dave Kellogg88 said:
    Any comments on this overall design?

    You were not clear on which "peripheral port register" you will be reading. The Scatter-Gather mode triggers off of a single request. It will not work on UART, I2C or SPI receive registers, as you need a separate request for each byte received. The GPIO port does not generate uDMA requests. The only peripheral that might work is the EPI. Even then, I do not encourage it, as the DMA cycle overhead to create a circular buffer using Scatter-Gather is significant. Also, the EPI port would stall the uDMA machine, which may be significant if you are using other uDMA channels.

    You also did not mention the maximum burst size. If the burst size is less than or equal to 1024 elements, you can use ping-pong mode and poll the uDMA between bursts. If your current interrupt routines will not allow reconfiguring the uDMA after 1024 items are received, you should seriously reconsider whether some of the operations in the interrupt routine can be moved to the main loop.

    Dave Kellogg88 said:
    Is the DMA controller smart enough to skip transferring the 4th (unused) field?

    No.
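The cost of the unused word can be put into numbers. The following is a minimal sketch in C, assuming the four-word task layout described in the question; the struct and function names are illustrative and do not come from TivaWare or the data sheet. It reproduces the ITEMS = 12 value quoted for three tasks, and assumes the 1024-item limit per transfer mentioned elsewhere in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* One scatter-gather task as copied into the alternate control structure:
 * four 32-bit words, the last of which is unused.
 * Field names are illustrative only. */
typedef struct {
    uint32_t src_end;   /* source end pointer */
    uint32_t dst_end;   /* destination end pointer */
    uint32_t control;   /* control word */
    uint32_t unused;    /* copied with the rest, never interpreted */
} sg_task_t;

/* Words per task, including the unused one. */
#define SG_WORDS_PER_TASK ((uint32_t)(sizeof(sg_task_t) / sizeof(uint32_t)))

/* ITEMS value the primary structure needs to copy a list of n_tasks tasks. */
uint32_t sg_primary_items(uint32_t n_tasks)
{
    /* Assumption: a single transfer is limited to 1024 items. */
    assert(n_tasks * SG_WORDS_PER_TASK <= 1024u);
    return n_tasks * SG_WORDS_PER_TASK;
}

/* Words copied per pass through the list that carry no information. */
uint32_t sg_wasted_words(uint32_t n_tasks)
{
    return n_tasks; /* one unused word per task */
}
```

For the three-task example, `sg_primary_items(3)` gives 12 and `sg_wasted_words(3)` gives 3, so a quarter of the task-copy traffic is spent on the unused field.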

    Dave Kellogg88 said:
    During a Scatter-Gather DMA transfer, are there any bus idle cycles inserted by the DMA controller?

    Sorry, I do not know and have been unable to find an absolute answer.

    Dave Kellogg88 said:
    Striped RAM - Can the DMA run at full speed concurrent with the CPU, given the following assumptions? 

    The RAM is configured as 4-way interleaved. It is not separated into memory quarters. That means that the uDMA will be reading one word of the task list from each of the four banks. There will be uDMA stalls due to RAM bus contention with the CPU. (The CPU gets priority.)

    Dave Kellogg88 said:
    Is there a peripheral bus that is separate from (and concurrent with) the main flash memory bus?

    Yes.

    Dave Kellogg88 said:
    During a DMA transfer, how often are the source and destination fields read from the RAM control structure?

    Answer B:

    Dave Kellogg88 said:
    b. The fields are read once during startup of the DMA transfer.  The fields are read again, after resuming a transfer that has been interrupted by losing channel arbitration.  In other words, the fields are forgotten by the DMA controller if the channel loses arbitration.

    Dave Kellogg88 said:
    During a DMA transfer, how often is the control field written back to the RAM control structure?

    Answer A:

    Dave Kellogg88 said:
    The control field is written only when the channel loses arbitration after a burst transfer.

    Dave Kellogg88 said:
    During a DMA transfer, how often is the control field read from the RAM control structure?

    Answer B:

    Dave Kellogg88 said:
    The control field is read once during startup of the DMA transfer.  The channel’s control field is read again, after resuming a transfer that has been interrupted by losing channel arbitration.

    Dave Kellogg88 said:
    In Scatter-Gather mode, done with running one task in the alternate control structure, and ready to load the next task into the alternate control structure.  Is the primary control structure reloaded into the DMA controller before the next task is DMA’ed from the task list to the alternate control structure? 

    No, the primary structure is remembered while the alternate structure is executed unless the channel loses arbitration.

  • Hi Bob,
    Thank you for the detailed answers – it helps a lot! However, it stirred up some more questions:

    Regarding my use of peripherals: I was intending to use Scatter-Gather on Port M GPIO, with the expectation that I could configure each bit to trigger an interrupt on both a rising and falling edge. My (apparently wrong) expectation was that when the single common IRQ for Port M was asserted by any combination of edges on the 8 bits, that could be used to initiate a uDMA request.

    I based part of my understanding on Table 9-1 in the TM4C1290NCDT data sheet. In that table, channel #14 is shown as supporting PORT M. So I’m still puzzled. Further, Table 9-2 shows that a GPIO can generate a trigger event for a Burst Request. In section 1.3.6.7 Programmable GPIOs, it mentions that each port “can be used to initiate an ADC sample sequence or a uDMA transfer”.

    So is there something special about Scatter-Gather mode that precludes its use with GPIO? Can the GPIO ports be used only with some of the other DMA modes? If so, which ones?

    Instead of bringing my signals in as Port M GPIO, I can route them through the same pins into the General Purpose Timer 2. Then I can configure the timer to raise a uDMA request after an edge change.

    Is Scatter-Gather DMA supported by the timers? You did not mention that class of peripheral.

    +++++++++++
    Just to help me clarify my new understanding of what the “4-way interleave” means for the SRAM, please confirm my view as follows. (Thanks for being patient with me!!)

    Is the following true: Is the 4-way interleave essentially a mechanism to match a slower speed RAM with the faster CPU?

    In Table 1-1, the SRAM is noted as “256 KB single-cycle system SRAM”. How should I interpret the term “single cycle”?

    Address 2000.0000-2000.0003 is a 4-byte access in the first interleave bank, 2000.0004-2000.0007 accesses 4 bytes in the second interleave bank, 2000.0008-2000.000B is in the third interleave bank, and 2000.000C-2000.000F is in the 4th interleave bank. Then everything repeats over at 2000.0010, 16 bytes at a time. Each of these accesses can occur 8.33 nsec after the preceding access, right?

    However, can the CPU access a 4-byte word at 2000.0000-2000.0003 and then access another word at 2000.0010-2000.0013 with no delay?

    In the data sheet in section 8.2.1 SRAM, there is a note about the interleaving. It says that a write followed by a read “in the same bank” incurs a “stall of a single clock cycle”. The clock cycle used here is the period of the CPU at 120 MHz (ie, 8.33 nsec), so the stall is for one CPU cycle = 8.33 nsec, right?

    The second paragraph of the note speaks about multiple masters with simultaneous access. So if I am lucky enough for uDMA and the CPU to be reading separate banks at the same moment, then the action is indeed in parallel, right?

    However, if uDMA tries to access the same bank as the CPU, then uDMA is delayed by 8.33 nsec, right?

    +++++++++
    Thanks again for your help - I appreciate it.
  • Hi Dave,

    First, let me correct something. The GPIO ports can generate uDMA requests. You can configure all 8 GPIO port M pins to generate a uDMA request on either a rising or falling edge.  This may not work well for parallel port communications. First, if you are transferring the same byte twice, no pins change and no DMA request would be made. The second problem is when two pins change at "the same time". If that change happens coincident with an internal clock edge, even a small difference in the timing of the two edges may cause two uDMA requests, resulting in two reads.

    If neither of those issues will affect you, let's discuss the Scatter-Gather mode. In this mode a single uDMA request will cause the entire "task list" to be executed, not just one read. If you want to read the 8-bit GPIO port each time any of the lines change, then you should use the ping-pong mode with an arbitration size of 1. Then each uDMA request results in one read of the GPIO port which gets stored in the buffer. After up to 1024 of such reads, you get an interrupt and the alternate channel can control up to another 1024 reads. That gives you the time of 1024 byte reads before you have to service the uDMA interrupt and reset the primary channel.
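The ping-pong behaviour described above can be modelled on a host machine. This is only a behavioural sketch under the assumptions stated in the reply (one sample stored per request, 1024 items per control structure); it does not touch any hardware registers, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PP_ITEMS 1024u  /* maximum items per control structure, per the reply */

/* Host-side model: two buffers, one of them active at any time. */
typedef struct {
    uint8_t  buf[2][PP_ITEMS];
    uint32_t count;     /* samples stored so far in the active buffer */
    int      active;    /* 0 = primary, 1 = alternate */
    int      armed[2];  /* nonzero while a structure still has a transfer set up */
} pingpong_t;

void pp_init(pingpong_t *pp)
{
    pp->count = 0;
    pp->active = 0;
    pp->armed[0] = 1;
    pp->armed[1] = 1;
}

/* One uDMA request with an arbitration size of 1: store one port sample.
 * Returns 1 when the active buffer has just filled (the point where the
 * interrupt would fire), 0 otherwise, and -1 if the sample is lost because
 * the structure that should take over was never re-armed. */
int pp_request(pingpong_t *pp, uint8_t sample)
{
    if (!pp->armed[pp->active]) {
        return -1;
    }
    pp->buf[pp->active][pp->count++] = sample;
    if (pp->count < PP_ITEMS) {
        return 0;
    }
    pp->armed[pp->active] = 0;  /* this structure is now used up */
    pp->active ^= 1;            /* hand over to the other structure */
    pp->count = 0;
    return 1;
}

/* What the interrupt handler has to do: set the finished structure up again. */
void pp_rearm(pingpong_t *pp, int which)
{
    assert(which == 0 || which == 1);
    pp->armed[which] = 1;
}
```

In this model the handler has the time of a full second buffer, 1024 requests, to call `pp_rearm` before samples start to be dropped.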

    Dave Kellogg88 said:
    Is the 4-way interleave essentially a mechanism to match a slower speed RAM with the faster CPU?

    No, it is a mechanism to prevent a single bus master from blocking accesses to a bank of RAM for long periods of time.

    Dave Kellogg88 said:
    How should I interpret the term “single cycle”?

    That means the RAM is read or written in a single cycle. For this device, that is as little as 8.3 ns. However, the M4 CPU uses a single-level write buffer, so the write happens one cycle delayed.

    Dave Kellogg88 said:
    Address 2000.0000-2000.0003 is a 4-byte access in the first interleave bank, 2000.0004-2000.0007 accesses 4 bytes in the second interleave bank, 2000.0008-2000.000B is in the third interleave bank, and 2000.000C-2000.000F is in the 4th interleave bank. Then everything repeats over at 2000.0010, 16 bytes at a time. Each of these accesses can occur 8.33 nsec after the preceding access, right?

    That is correct.

    Dave Kellogg88 said:
    However, can the CPU access a 4-byte word at 2000.0000-2000.0003 and then access another word at 2000.0010-2000.0013 with no delay?

    If both accesses are reads, yes. However, if the first is a write and the second a read, since the write is delayed one cycle by the write buffer, and the next read is from the same bank, that read will be delayed one cycle.

    Dave Kellogg88 said:
    In the data sheet in section 8.2.1 SRAM, there is a note about the interleaving. It says that a write followed by a read “in the same bank” incurs a “stall of a single clock cycle”. The clock cycle used here is the period of the CPU at 120 MHz (ie, 8.33 nsec), so the stall is for one CPU cycle = 8.33 nsec, right?

    Yes

    Dave Kellogg88 said:
    The second paragraph of the note speaks about multiple masters with simultaneous access. So if I am lucky enough for uDMA and the CPU to be reading separate banks at the same moment, then the action is indeed in parallel, right?

    However, if uDMA tries to access the same bank as the CPU, then uDMA is delayed by 8.33 nsec, right?

    Yes to both.
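The bank mapping and the contention rule confirmed in this exchange can be written down compactly. The sketch below is a host-side model only, assuming the word-by-word interleave starting at 0x2000.0000 and the one-cycle uDMA delay on a same-bank collision; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BASE 0x20000000u

/* Interleave bank (0..3) of a byte address: consecutive 32-bit words
 * rotate through the four banks and repeat every 16 bytes. */
unsigned sram_bank(uint32_t addr)
{
    assert(addr >= SRAM_BASE);
    return (unsigned)((addr - SRAM_BASE) >> 2) & 3u;
}

/* Cycles the uDMA is delayed when it and the CPU access SRAM in the same
 * cycle. The CPU has priority, so a same-bank collision costs the uDMA
 * one cycle; accesses to different banks proceed in parallel. */
unsigned udma_stall_cycles(uint32_t cpu_addr, uint32_t dma_addr)
{
    return sram_bank(cpu_addr) == sram_bank(dma_addr) ? 1u : 0u;
}

/* Clock period in picoseconds, rounded down. */
uint32_t cycle_ps(uint32_t sysclk_hz)
{
    assert(sysclk_hz > 0);
    return (uint32_t)(1000000000000ull / sysclk_hz);
}
```

Note that 0x2000.0000 and 0x2000.0010 land in the same bank, which is why the write-then-read case discussed above incurs a stall, while 0x2000.0000 and 0x2000.0004 do not collide.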

  • Hi Bob,

    I have a few more questions that relate to my use of DMA.

    First off - Thanks for reminding me about Ping-Pong mode.  It does seem to be a better solution than Scatter-Gather.

    1.  I understand that the CPU’s use of the bus(s) will block DMA’s use. I am trying to (roughly) quantify how much this will occur, so I can get an approximation of the latency and throughput rate for DMA.  I believe that this will not be deterministic.  Right?

    2.  Figure 8-1 “Internal Memory Block Diagram” seems to indicate that DMA contends with the CPU only for access to RAM and peripherals. (I am assuming that I never configure DMA to reference Flash.)  Right?

    3.  In particular, DMA is never delayed due to accesses by the CPU to Flash, ROM, or EEPROM. Right?

    4.  Said differently, DMA will be blocked by the CPU only by concurrent CPU accesses to RAM or the peripherals. Right?

    5.  You responded earlier about this, but I want to confirm: When the CPU is accessing a peripheral, DMA will not be delayed when accessing RAM.  Right?

    6.  Is there any CPU timing penalty for 32-bit accesses to RAM that do not lie on a natural word boundary? i.e., If the CPU accesses a 32-bit word (or 16-bit half-word) in RAM that straddles two of the interleaved RAM banks, does this take longer than 1 cycle?

    7.  In Figure 8-1, there are two small blocks each labeled “SPB” (near the Cortex M4 block). What are these?  (I searched the PDF for “SPB” and had no other hits.)

    8.  Are there any app notes or similar material that could help me to understand the internal bus structure of the TM4C129?

     

    Thank you again for the assistance,

    Dave Kellogg

  • Dave Kellogg88 said:
    1.  I understand that the CPU’s use of the bus(s) will block DMA’s use. I am trying to (roughly) quantify how much this will occur, so I can get an approximation of the latency and throughput rate for DMA.  I believe that this will not be deterministic.  Right?

    Yes, I agree.

    Dave Kellogg88 said:

    2.  Figure 8-1 “Internal Memory Block Diagram” seems to indicate that DMA contends with the CPU only for access to RAM and peripherals. (I am assuming that I never configure DMA to reference Flash.)  Right?

    3.  In particular, DMA is never delayed due to accesses by the CPU to Flash, ROM, or EEPROM. Right?

    4.  Said differently, DMA will be blocked by the CPU only by concurrent CPU accesses to RAM or the peripherals. Right?

    Right, right, right.

    Dave Kellogg88 said:
    6.  Is there any CPU timing penalty for 32-bit accesses to RAM that do not lie on a natural word boundary? i.e., If the CPU accesses a 32-bit word (or 16-bit half-word) in RAM that straddles two of the interleaved RAM banks, does this take longer than 1 cycle?

    Yes, the internal bus is 32 bits wide. Any unaligned access that crosses a 32-bit boundary is converted into two accesses.
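Whether a given CPU access falls into this case is easy to check. The helper below is a small sketch with a hypothetical name; it only tests whether an access crosses a 32-bit boundary and says nothing about how the core actually splits the transaction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a size-byte access at addr spans two 32-bit words, and therefore
 * two adjacent interleave banks. */
bool crosses_word_boundary(uint32_t addr, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4);
    return (addr & 3u) + size > 4u;
}
```

For example, a 32-bit access at 0x2000.0002 and a 16-bit access at 0x2000.0003 both cross a boundary, while a 16-bit access at 0x2000.0001 stays inside one word.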

    Dave Kellogg88 said:
    In Figure 8-1, there are two small blocks each labeled “SPB” (near the Cortex M4 block).  What are these?  (I searched the PDF for “SPB” and had no other hits.)

    It is just showing a connection to the System Peripheral Bus.

    Dave Kellogg88 said:
    Are there any app notes or similar material that could help me to understand the internal bus structure of the TM4C129? 

    I can't think of a good reference. There is a lot of information in the ARM Cortex M4 Technical Reference Manual, but almost too much information.

  • Hi (Again) Bob,

    As I mentioned earlier, I want to generate a DMA request on any rising or falling edge for several input bits on port M GPIO pins.  Using DMA Ping-Pong with ArbSize = 1, for each edge detected, I’ll transfer a single copy of the port’s data register into a DMA buffer.  I do not want to interrupt the CPU when the DMA starts.  Nor do I want to interrupt the CPU when the DMA finishes. 

    1. I believe that I must do the following configuration.  Is this correct?  Anything missing or out of order?
    a. Enable clocking to Port M by setting bit 11 of RCGCGPIO.

    For each bit position on Port M:
    b. Write bits in GPIODEN to 1 to enable the digital functions.
    c. Write bits in GPIODIR to 0 to select input direction.
    d. Write bits in GPIOAFSEL to 0 to treat as an I/O bit.
    e. Write bits in GPIOIS to 0 to select edge triggered interrupts.
    f. Write bits in GPIOIBE to 1 to select interrupt on both rising and falling edges.
    g. Ignore GPIOIEV since using IBE.
    h. Ignore GPIOPCTL since no alternate muxed function is used.
    i. Write bits in GPIODMACTL to 1 to enable DMA (per section 10.3.2.3).
    j. Write bits in GPIOIM to 1 to enable interrupts (per section 10.3.2.3).


    2. How do I avoid interrupting the CPU?  In particular, I am puzzled by section 10.3.2.3 directing setting bits in GPIOIM.  At some low level, are interrupts needed to generate a dma_req signal? 

    3. If I were using edge-triggered interrupts (instead of DMA), I know that I would have to write the bits in GPIOICR to clear the interrupt.  Since I am using Ping-Pong DMA, how does the edge interrupt clearing take place?

    4. In the data sheet for TM4C1290, section 9.2.8.2 Trigger Peripherals, the second paragraph talks about overlapping DMA requests with potential dma_req loss for more than two waiting requests.  Therefore, can I think of dma_req as a brief pulse?  So the request does not need to be cleared or acknowledged?  Can the DMA controller buffer up one (but not two) outstanding request if busy?

    5. Several question cycles ago, I understood that Peripheral Scatter-Gather DMA (PSG) was undesirable, because it could not work with transfers from a peripheral such as initiated by an edge on a GPIO port.  However, in section 9.2.6.6 “Peripheral Scatter-Gather”, it says that uDMA will wait for a subsequent request from the peripheral before continuing. 
    Although I like the design of Ping-Pong DMA better than PSG, I can see that PSG combined with Memory Scatter-Gather may be necessary if interrupt clearing must be done after transferring each byte from port M.  Is there any reason why Ping-Pong DMA would be a bad fit for my use case?

    6. This question is more about the relative timing of detection of an edge on port M, the subsequent reading of the data register, and an immediate second edge.  I looked through Chapter 26 – Electrical Characteristics and did not see anything of help.  The basic question is: “Is the edge detection clocked by the system clock?”  I.e., an edge change on the port is not detected until the clock's next cycle.  Is this right?   So the detection of every edge is delayed by up to one system clock cycle?  When reading the port data register, is there a latch between the bus and the raw port pins, so that a bit that changes mid-cycle while reading the port  is not ambiguous?

    7. In the prior Q&A response, you mentioned that any unaligned access that crosses the 32-bit boundary between interleaved RAM pages incurs an extra bus cycle.  I assume that this is true for both accesses by the CPU and accesses by the DMA, right?

    Again, Thank You for all your help.

    Dave Kellogg

  • Sorry for the delay in getting back to you, but I thought an example might be more helpful for you. It took a while to get working (too many interruptions.)

    I have attached a .zip file that contains two Code Composer projects. Both run on an EK-TM4C1294XL LaunchPad. The first project, 8BitPortMOut, simply writes an endless sequence of incrementing 8-bit values out GPIO port M. The second project, 8BitPortMIn, uses the uDMA to capture the changing values on Port M. The M ports of the two LaunchPads are connected by 8 wires plus a ground wire.

    /cfs-file/__key/communityserver-discussions-components-files/908/8BitPortM.zip

    Looking at the first logic analyzer picture, you can see that the first EVM was outputting values roughly every 110 ns. Also note that sometimes there was a delay of a few nanoseconds between the transition on one signal and the transition of another for the same count. In my test case, I never saw an extra capture or a missed capture. The distinct values came far enough apart that all could be captured, and the edges came close enough together that multiple DMA reads were not generated for a single count.

    My example used the ping-pong mode of the uDMA with two 1024-byte buffers. I get an interrupt after each buffer is filled. If I have not reset the DMA transfer before the second buffer is filled, I will miss some of the counts. In my routine, the main loop is doing nothing, so interrupts were serviced almost immediately. I toggled an extra pin to show when I was in the interrupt routine. The time between interrupts was 110.5 µs.

    The length of the interrupt routine was 2.34 µs.
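The measured figures fit a simple budget calculation. The sketch below uses the numbers from this post (1024 samples per buffer, roughly 110 ns per sample, 110.5 µs between interrupts, 2.34 µs in the handler); the function names are illustrative, and the nominal 110 ns period is an approximation, which is why the computed fill time comes out slightly above the measured interval.

```c
#include <assert.h>
#include <stdint.h>

/* Time to fill one ping-pong buffer, in nanoseconds. */
uint64_t buffer_fill_ns(uint32_t items, uint32_t sample_period_ns)
{
    return (uint64_t)items * sample_period_ns;
}

/* Share of the interval between interrupts that is spent in the handler,
 * in parts per thousand, rounded down. */
uint32_t isr_load_permille(uint32_t isr_ns, uint32_t interval_ns)
{
    assert(interval_ns > 0);
    return (uint32_t)(((uint64_t)isr_ns * 1000u) / interval_ns);
}
```

With these inputs, `buffer_fill_ns(1024, 110)` returns 112640 ns, and `isr_load_permille(2340, 110500)` returns 21, meaning the handler uses about 2 % of each interval in this otherwise idle test program.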

  • Dave Kellogg88 said:
    1. I believe that I must do the following configuration.  Is this correct?  Anything missing or out of order?

    Don't do direct register writes. See the example in the post above and use the TivaWare function calls.

    Dave Kellogg88 said:
    2. How do I avoid interrupting the CPU?  In particular, I am puzzled by section 10.3.2.3 directing setting bits in GPIOIM.  At some low level, are interrupts needed to generate a dma_req signal?

    In ping-pong mode interrupts are required if you are going to capture more than 2048 samples. I only enabled the DMA interrupt mask bit in GPIOIM.

    Dave Kellogg88 said:
     If I were using edge-triggered interrupts (instead of DMA), I know that I would have to write the bits in GPIOICR to clear the interrupt.  Since I am using Ping-Pong DMA, how does the edge interrupt clearing take place?

    You just have to clear bit 8, the DMA interrupt request bit. See the example in the project posted above.

    Dave Kellogg88 said:
     In the data sheet for TM4C1290, section 9.2.8.2 Trigger Peripherals, the second paragraph talks about overlapping DMA requests with potential dma_req loss for more than two waiting requests.  Therefore, can I think of dma_req as a brief pulse?  So the request does not need to be cleared or acknowledged?  Can the DMA controller buffer up one (but not two) outstanding request if busy?

    In the case of GPIO port M, there is only one request that is generated when any of the enabled pin edges occurs. Therefore, if the read of the GPIO data port cannot be done before the value changes again, you will miss a value. If two lines change to create a new value, and the time between the two changes is long enough, you will read two values.

    Dave Kellogg88 said:
    6. This question is more about the relative timing of detection of an edge on port M, the subsequent reading of the data register, and an immediate second edge.  I looked through Chapter 26 – Electrical Characteristics and did not see anything of help.  The basic question is: “Is the edge detection clocked by the system clock?”  I.e., an edge change on the port is not detected until the clock's next cycle.  Is this right?   So the detection of every edge is delayed by up to one system clock cycle?  When reading the port data register, is there a latch between the bus and the raw port pins, so that a bit that changes mid-cycle while reading the port  is not ambiguous?

    The pins are synchronized to the system clock both for the edge detection and for the port read. For a valid reading, the pins should not be changing during the read cycle. All pin values will resolve to 1 or 0, but an input pin whose voltage is more than Vil and less than Vih at the time it is read may resolve to either state.

    Dave Kellogg88 said:
    In the prior Q&A response, you mentioned that any unaligned access that crosses the 32-bit boundary between interleaved RAM pages incurs an extra bus cycle.  I assume that this is true for both accesses by the CPU and accesses by the DMA, right?

    The DMA does not support unaligned accesses. The data to be transferred must be aligned in memory according to the data size (8, 16, or 32 bits).
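A buffer address can be checked against this rule before it is handed to the uDMA. The helper below is a minimal sketch with a hypothetical name, assuming only the three item sizes listed in the reply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is naturally aligned for a uDMA item of item_bits bits. */
bool udma_addr_ok(uint32_t addr, unsigned item_bits)
{
    assert(item_bits == 8 || item_bits == 16 || item_bits == 32);
    return (addr % (item_bits / 8u)) == 0u;
}
```

For the 8-bit port captures discussed in this thread any address passes; the check only matters once 16-bit or 32-bit items are transferred.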