CC430 SD Card ADC Transfer Sampling Rate

Other Parts Discussed in Thread: CC430F5137, MSP430F1611

Hi,

I am using a CC430F5137 connected directly to an SD card via SPI.

One of my design objectives is to sample a single ADC channel at 40kHz continuously for four seconds and store the data on the SD card.

I suspect this is not possible given SD card write speeds on a single CPU, but I would like to ask before moving forward.

My code has two 512-byte buffers in RAM and uses the ADC interrupt service routine to place sample data into one buffer until it is full, then switches buffers and initiates a DMA write of the full buffer to the SD card via SPI.

I believe the bottleneck here is the DMA write speed, which takes two MCLK cycles per byte. This results in a 17MHz write speed to the SD card, which would be fine except for this excerpt from the DMA Operation chapter of the CC430 User's Guide:

During a block transfer, the CPU is halted until the complete block has been transferred. The block transfer takes 2 × MCLK × DMAxSZ clock cycles to complete. CPU execution resumes with its previous state after the block transfer is complete.

Doesn't this defeat the purpose of DMA, which is to perform memory transfers while leaving the CPU available for other tasks?

If the CPU is indeed halted, then the 1024 clock cycles it takes to perform a 512-byte SD block write will stall the sample transfer in the ADC ISR.

I have successfully taken a little over 9000 samples (about 19,000 bytes of data) of a 1kHz sine wave and observed periodic interruptions in the waveform.

Any advice? Is a 40kHz ADC sampling rate written to the SD card achievable?

  • Andrew Thornton said:
    Doesn't this defeat the purpose of DMA, which is to perform memory transfers while leaving the CPU available for other tasks?

    Well, the purpose of DMA is to relieve the CPU from executing code that checks for data availability and moves it. The transfer itself, however, takes the time it takes. In each MCLK cycle, only one data byte/word can be read from or written to any peripheral; there is only one data bus. So a DMA transfer requires 2 MCLK cycles for synchronization (to ensure the bus is ready and the CPU is temporarily halted), then one cycle to read from the source address and one to write to the destination address, for each byte/word.

    While a transfer takes place, the CPU cannot do anything else, because it cannot fetch its next instruction: the data bus is occupied by the DMA transfer. However, DMA is still much faster than even the best optimized code loop, not to mention its immediate reaction to a DMA trigger (as opposed to an IRQ trigger, which requires executing quite some code before finally reaching the CPU instruction that moves the data).

    For burst transfers, there is a DMA mode that interleaves the DMA transfer with CPU execution, slowing down the DMA transfer but not completely halting the CPU.

    However, writing to the SD card is a serial transfer. Let's assume the CPU is running at 16MHz. If the SPI is running at 16MHz too, the transfer of one byte requires 8 MCLK cycles. A DMA transfer triggered by the TX-empty flag takes 4 MCLK cycles, so the CPU still has 4 MCLK cycles per byte for its own, independent work (effectively running at 8MHz now).
    However, checking the TX register for being empty and moving the next data byte in software would definitely take more than 8 MCLK cycles per byte. So even if the CPU just waits for the DMA transfer to complete, it is still faster than doing it in plain software.
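
    To illustrate, here is a minimal sketch of such a TXIFG-triggered DMA channel feeding the SPI module. This is not code from this thread: the channel number, the trigger value (check the DMAxTSEL table in the device datasheet), and the buffer name are assumptions.

    #include <msp430.h>

    // Hypothetical sketch: stream one 512-byte sector to the SD card through
    // USCI_B0, one byte per TXIFG trigger, while the CPU keeps running in between.
    static unsigned char sdbuffer[512];

    void spi_dma_write_sector(void)
    {
        DMACTL0 = DMA0TSEL_19;                      // USCIB0 TXIFG trigger (device-specific value!)
        __data16_write_addr((unsigned short)&DMA0SA, (unsigned long)sdbuffer);
        __data16_write_addr((unsigned short)&DMA0DA, (unsigned long)&UCB0TXBUF);
        DMA0SZ  = 512;                              // one SD sector
        DMA0CTL = DMADT_0                           // single transfer: one byte per trigger
                + DMASRCINCR_3                      // increment source address
                + DMASBDB                           // byte-to-byte transfer
                + DMAEN;                            // arm the channel
        UCB0IFG &= ~UCTXIFG;                        // force a fresh TXIFG edge to start the chain
        UCB0IFG |=  UCTXIFG;
        while (DMA0CTL & DMAEN) ;                   // or do useful work here and check later
    }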

    Now for the data acquisition. You'll need three buffers of 512 bytes each. One buffer fills at a rate of 2 × 40,000 bytes/s, so it is full in 6.4ms. You can use one DMA channel to fill it from the ADC. Once a block is complete, switch to buffer two for filling. Buffer 1 is then written to the SD card. The plain transfer of 512 bytes with a 16MHz system clock takes 2.56ms. Add some overhead for initializing the write, say 3ms total. That leaves 3 to 3.4ms for the flash write itself (SD card internal), which shouldn't be a problem at all if the SD card is empty (pre-erased) before the write. (Even if not, the card should be fast enough.)
    While buffer 1 is being written, buffer 2 fills. When it is full, to relax things a bit, the software should switch to buffer 3 for filling; by then buffer 1 has hopefully been written and buffer 2 can be written next.
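
    To make that rotation concrete, here is a sketch of the three-buffer bookkeeping; the names (on_adc_block_done, adc_dma_retarget, sd_start_write) are hypothetical placeholders for your own routines.

    // Hypothetical sketch of the buffer rotation described above;
    // the two helper functions are placeholders for your own DMA/SD code.
    static unsigned short buf[3][256];              // three 512-byte buffers
    static unsigned char fill = 0;                  // buffer the ADC DMA is currently filling

    void on_adc_block_done(void)                    // run once per completed 256-word block
    {
        unsigned char done = fill;                  // this buffer is now ready for the card
        fill = (fill + 1) % 3;
        adc_dma_retarget(buf[fill]);                // re-point and re-arm the capture DMA
        sd_start_write(buf[done]);                  // fire-and-forget SD write
    }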

    Andrew Thornton said:
    If the CPU is indeed halted, then the 1024 clock cycles it takes to perform a 512-byte SD block write will stall the sample transfer in the ADC ISR.

    As I said, during serial transfer to the card the CPU runs at about 50% speed, which is still enough to read the ADC and write to the buffer. A 40kHz sampling rate means 25µs per sample = 400 CPU cycles at 16MHz. At 50% CPU speed, you still have 200 cycles per sample, which is enough for cleverly written software.
    However, if you use a second DMA channel for the ADC-to-buffer move, you won't need any CPU cycles per sample. The second DMA also takes 4 cycles per transfer, but this time they are word transfers, so you are still left with about a quarter of the CPU speed, or roughly 100 clock cycles between two conversions. Plenty of time to switch the sampling buffer. And during the next 256 conversions you'll have more than enough time to prepare and start the write operation for the next block.

    The bottleneck I see here is the write speed of the SD card itself. You only have ~3ms per 512-byte sector. Depending on the card this might not be enough (especially if the target sector needs to be erased first). But 80kB/s write speed really shouldn't be a problem. I did four times as much (dummy writes) on an 8MHz MSP430F1611 (with 4MHz SPI speed and no DMA). Even more is possible on a 5x device with 16 or even 25MHz MCLK and full SPI speed.

    So I believe the bottleneck in your application is your current code/approach, not the capabilities of the hardware. (Well, you're not the first who has to learn to work with finite resources rather than solving problems with the power of a total overkill processor. Most people starting with a microcontroller face the same problem.)

  • Thank you for such a detailed response! It looks like it should work with that methodology.

    I almost have the code working but I am having a strange issue with my SD write function.

    Before posting this thread I was using the ADC ISR to transfer data into RAM, switch buffers, and issue SD write commands; it all worked fine but wasn't fast enough.

    Following your advice (very much appreciated!) I am now using the DMA to transfer data into RAM and switching buffers in a while loop that waits on the DMAEN bit.

    For some reason the sd_write_block command is okay with writing the buffer in the first (ISR) method above, but throws the code into reset in the second (DMA) method. Placing a breakpoint at the sd_write_block function call, the breakpoint is reached and the code immediately goes into reset after taking a step. I originally switched buffers through pointers in the switch statement, with the DMA enable and sd_write below the switch using the pointers, but I got rid of the pointers in favor of brute force to rule out dereferencing errors.

    If I comment out the SD write command the code runs through with the DMA operating just fine. The SD Card read/write operations are also working fine elsewhere in the application.

    I am stumped as to what could be causing this issue. Nothing has changed with the SD card code. If you have a moment could you please take a look?

    static unsigned short adcbuffer1[256];
    static unsigned short adcbuffer2[256];
    static unsigned short adcbuffer3[256];

    void main()
    {
        ...
    }

    void ADC_Capture()
    {
        // set up ADC (code omitted)
        ...

        // Initialize DMA channel 2
        DMACTL1 = DMA2TSEL_24;              // ADC12IFGx triggered
        DMA2SZ  = 256;                      // 256 word transfers = one 512-byte block
        __data16_write_addr((unsigned short)&DMA2SA, (unsigned long)&ADC12MEM0); // this style is seen in TI example code
        __data16_write_addr((unsigned short)&DMA2DA, (unsigned long)ADC_buffer_ptr);
        DMA2CTL = DMADT_0                   // single transfer
                + DMAEN                     // DMA enable
                + DMASWDW                   // word read/write
                + DMADSTINCR_3;             // increment destination address

        // Kick off the transfer by starting the ADC conversion
        ADC12CTL0 |= ADC12ENC;              // enable conversions
        ADC12CTL0 |= ADC12SC;               // start conversion

        // Wait for the first block to complete
        while ((DMA2CTL & DMAEN) != 0) { }

        while (sdBlockNumber != 79)
        {
            // select which buffer we are using
            switch (bufferFull)
            {
            case 0:
                DMACTL1 = 0;                // this is how the SD card app "resets" the DMA
                __data16_write_addr((unsigned short)&DMA2DA, (unsigned long)adcbuffer2);
                DMACTL1 = DMA2TSEL_24;
                DMA2CTL |= DMAEN;

                sd_write_block(&sdc, sdBlockNumber, (unsigned char*)adcbuffer1);
                bufferFull = 1;
                break;

            case 1:
                DMACTL1 = 0;
                __data16_write_addr((unsigned short)&DMA2DA, (unsigned long)adcbuffer3);
                DMACTL1 = DMA2TSEL_24;
                DMA2CTL |= DMAEN;

                sd_write_block(&sdc, sdBlockNumber, (unsigned char*)adcbuffer2);
                bufferFull = 2;
                break;

            case 2:
                DMACTL1 = 0;
                __data16_write_addr((unsigned short)&DMA2DA, (unsigned long)adcbuffer1);
                DMACTL1 = DMA2TSEL_24;
                DMA2CTL |= DMAEN;

                sd_write_block(&sdc, sdBlockNumber, (unsigned char*)adcbuffer3);
                bufferFull = 0;
                break;
            }

            sdBlockNumber++;

            // Wait for this ADC-DMA block to complete
            while ((DMA2CTL & DMAEN) != 0) { }
        }

        // disable ADC, write the last SD block
        ...

    EDIT

    I also raised MCLK to 34MHz and adjusted the ADC sampling frequency to ~38kHz so the timing should be okay.

    EDIT#2

    I have done some additional experiments with interesting results. Omitting the entire ADC and DMA code, leaving just the while loop with the pseudo buffer swapping and dummy data, the SD card function works. Adding the ADC back in (while still omitting the DMA) results in the code going into reset once it reaches a case in the while loop. This same SD write function worked fine in the ADC ISR when the interrupt was enabled, so this selective behavior is confusing.


  • Andrew Thornton said:
    I also raised MCLK to 34MHz and adjusted the ADC sampling frequency to ~38kHz so the timing should be okay.


    The maximum MCLK frequency allowed on any MSP is 25MHz. Since your system doesn't crash immediately, I guess you're still driving MCLK from DCOCLKDIV (the power-on default) rather than DCOCLK. And unless you changed the FLL pre-divider, DCOCLKDIV is DCOCLK/2, resulting in 17MHz MCLK. Same for SMCLK.

    About your code, I have an idea: you may have missed a very small detail. An ISR normally cannot be interrupted by another ISR.
    Even if a higher-priority interrupt is triggered while you execute an ISR, this ISR will first run to its end before the ISR for the higher-priority interrupt is executed.
    Interrupt priority only applies when two interrupts compete for entering their ISRs, e.g. if they happen at the very same time, or if GIE was cleared while they were latched and is now re-enabled with more than one interrupt pending.

    If you have to start a time-consuming process from inside an ISR, it must be of the fire-and-forget type. If a whole chain of events is required, then a state machine is the way to go: each time an ISR is called, it checks what to do next.

    Example: in the DMA ISR that signals a filled buffer, the first thing to do is switch the DMA to the next buffer and start it. The further DMA operation is fire-and-forget; you don't busy-wait until the DMA is done. Fine. But then you call sd_write_block, which most likely isn't fire-and-forget.
    Here I guess a state machine is best:

    reset state machine to 'STATE_NEW_TRANSFER' (each STATE_x is just a symbol with any unique value; an enumeration, if you want)
    Pull CS low
    enable TX interrupt
    return

    In the TX interrupt, the TX ISR checks the state. STATE_NEW_TRANSFER writes the first byte of the write command to TXBUF, advances the state to STATE_SEND_WRITE_0, and exits.
    On the next TX interrupt, the state is STATE_SEND_WRITE_0, which causes the ISR to write the next byte of the write command to TXBUF, advance to the next state, and exit. And so on, until the DMA block transfer is needed.

    Since your SPI speed is equal to MCLK, this can be shortened by just stuffing the header bytes into TXBUF one after another, ensuring that you do not write them too fast (best is hand-crafted assembly code, so you can count processor cycles exactly: one transfer takes only 8).
    Then you program the DMA transfer and arm it, disable the TX interrupt, advance the state to STATE_DMA_RUNNING, and exit.
    At this point it takes at least 512 × 8 = 4096 MCLK cycles until the transfer is done: cycles you waste if you wait for them. Instead, you just do nothing until the next DMA interrupt. Once the DMA is done and the DMA ISR is called, it changes the state to STATE_DMA_DONE and does what's necessary after a write, e.g. enables the RX interrupt and writes a dummy byte to TXBUF so it starts polling the write operation result from the card. The next ISR called is the RX ISR, which sees STATE_DMA_DONE, knows that it expects a status message from the card, and either gets a 'success' or 'fail' result or continues to write a dummy to TXBUF and exits.
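
    A compressed sketch of that state machine follows. All names are hypothetical, the response handling is simplified, and the DMA ISR (not shown) would set STATE_DMA_DONE and enable the RX interrupt:

    #include <msp430.h>

    // Hypothetical sketch of the interrupt-driven SD write state machine.
    enum { STATE_IDLE, STATE_SEND_CMD, STATE_DMA_RUNNING, STATE_DMA_DONE };
    static volatile unsigned char state = STATE_IDLE;
    static unsigned char sd_cmd[6];                 // write command header, prepared elsewhere
    static unsigned char cmd_idx;

    #pragma vector = USCI_B0_VECTOR
    __interrupt void usci_b0_isr(void)
    {
        switch (state)
        {
        case STATE_SEND_CMD:                        // TX interrupt: feed the command header
            UCB0TXBUF = sd_cmd[cmd_idx++];
            if (cmd_idx == sizeof sd_cmd)
            {
                UCB0IE &= ~UCTXIE;                  // header done: hand TXIFG over to the DMA
                DMA0CTL |= DMAEN;                   // fire-and-forget block transfer
                state = STATE_DMA_RUNNING;
            }
            break;
        case STATE_DMA_DONE:                        // RX interrupt: poll the card's busy response
            if (UCB0RXBUF != 0xFF)
                state = STATE_IDLE;                 // card answered: evaluate success/fail here
            else
                UCB0TXBUF = 0xFF;                   // still busy: clock out another dummy byte
            break;
        }
    }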

    A little bit more straightforward and less complex, but not usable in all cases, is to put all of this into main. Then it goes this way:
    The DMA ISR only sets a global flag 'buffer x filled' and exits. This flag must be declared volatile, as it is accessed from both the ISR and main.

    Main sees the flag magically set and begins writing. This can be through your sd_write_block function, as you did it. While main is waiting for the SPI responses or for the DMA to finish, the other ISRs can interrupt if necessary, including the DMA ISR itself reporting the next buffer being filled (this should, however, not happen before you're done writing, or you'll sooner or later overflow your buffers).
    This is the simpler, and even faster, variant, since you don't have the overhead of checking states or entering/exiting ISRs, but then main cannot do anything else during the process. It just waits for a filled buffer, writes it, and returns to the waiting point.
    But it may be okay for your project.
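
    A minimal sketch of this flag-based variant, again with hypothetical helper names standing in for your own buffer and DMA code:

    #include <msp430.h>

    // Hypothetical sketch of the flag-based main-loop variant; the helper
    // functions are placeholders for your own buffer/DMA bookkeeping.
    static volatile unsigned char buffer_filled = 0;    // set by the DMA ISR
    static volatile unsigned char filled_index  = 0;    // which buffer just completed

    #pragma vector = DMA_VECTOR
    __interrupt void dma_isr(void)
    {
        filled_index  = current_buffer();           // note which buffer is full
        retarget_dma_to_next_buffer();              // fire-and-forget re-arm
        buffer_filled = 1;                          // signal main
    }

    void capture_loop(void)
    {
        while (more_blocks_to_write())              // e.g. sdBlockNumber != 79
        {
            while (!buffer_filled) ;                // idle until the DMA ISR signals
            buffer_filled = 0;
            sd_write_block(&sdc, sdBlockNumber++,
                           (unsigned char *)buffer_address(filled_index));
        }
    }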

    [erroneously posted code snippet removed]

  • The ADC capture occurs in a function called from the main work loop, so there shouldn't be any ISRs involved in the process. The only things interrupting main execution in the code I posted should be the DMA transfers, which are handled automatically. Execution only waits for the ADC DMA channel to complete its 256 transfers before first switching the buffer and re-enabling the DMA (fire and forget), and then setting up and issuing the SD write command and DMA. Once that is done, any time left is spent waiting for the current ADC DMA block to finish before repeating the process. The most time-critical component is the ADC DMA, followed by the SD DMA. The sd_write_block function also waits for any previous SD DMA block writes that haven't finished, so it will never overflow the buffers; it will instead just miss ADC samples by preventing the ADC DMA from being re-enabled in time.

    The DMA ISR is replaced by the "wait" code:

    while ((DMA2CTL & DMAEN) != 0) { }

    EDIT: And yes, there are no other priorities when an ADC capture is needed, so having a dedicated section in main is acceptable.

    So in the worst case, the code I have written is not fast enough.
    But this still does not explain why the code crashes in the way that it does.

    EDIT:

    I adjusted the DCO to ~20MHz and lowered the ADC sampling rate, and the code ran through. The highest I've been able to test working so far is a 25kHz sampling rate in code, but the actual data looks like it's about a 10kHz sampling rate. Adjusting the ADCCLK divider from /8 to /6 to increase the sampling rate sends the code into reset like before, so the ADC sampling rate seems to be a critical dependency. Still more testing to do, but ideas are always welcome. Thanks!

  • It could be of interest to know why the MSP is resetting.

    On startup, you can read the SYSRSTIV register. It returns one reset cause on every read, until you finally get BOR (power-on) as the initial reset cause and after that only 0x00.
    Multiple resets will stack, and the highest value (not the last reset!) is delivered first, each one only once, no matter how often it happened.
    So if you read this register and get 0x20, the cause was a PMM password violation; if it is 0x16, it was a WDT timeout, etc.
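
    For example, a minimal startup sketch (the causes[] array and its size are illustrative):

    #include <msp430.h>

    // Hypothetical sketch: drain SYSRSTIV early in main() so every pending
    // reset cause is recorded before it is lost.
    static unsigned int causes[8];

    void record_reset_causes(void)
    {
        unsigned int i = 0, iv;
        while ((iv = SYSRSTIV) != 0 && i < 8)       // each read returns one cause, then 0x00
            causes[i++] = iv;                       // e.g. 0x16 = WDT timeout, 0x20 = PMM key violation
    }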

  • Now I am actually not sure if it is successfully resetting, or just hanging near the reset vector. Breaking the code in the IAR debugger shows execution halted at address 0x10002, which doesn't exist in the memory space. The code never recovers. What does this mean?

  • Andrew Thornton said:
    Breaking the code in the IAR debugger shows execution halted at address 0x10002, which doesn't exist in the memory space. The code never recovers. What does this mean?

    This rather looks like you trigger an interrupt but have no ISR for it. The MSP then reads the ISR address from the vector table, but since there is no ISR, the vector entry still contains 0xFFFF, and that is where the MSP jumps to execute the 'ISR'.
    Depending on the content of the reset vector at 0xFFFE (the LSB of the 'ISR address' is ignored, as code always executes from even addresses), the MSP executes the data at this address as if it were an instruction (while it is actually the address of the reset function and not a code instruction at all) and then continues program execution from that point. Usually this eventually leads to a reset, as the CPU is running wild, but in your case it seems to lead to an endless loop.

    So the question is: which IE bits did you ever set in your code? One of them causes an interrupt to be triggered for which you don't have an ISR.
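
    As a defensive measure, you can trap every unused vector. A sketch in IAR syntax; the vector list here is abbreviated and must be extended to all unused vectors of your device:

    #include <msp430.h>

    // Hypothetical sketch: catch unexpected interrupts instead of letting the
    // CPU run into an empty (0xFFFF) vector entry.
    #pragma vector = UNMI_VECTOR, SYSNMI_VECTOR, WDT_VECTOR
    __interrupt void unexpected_isr(void)
    {
        while (1) ;                                 // halt here so the debugger shows which vector fired
    }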

  • It would make sense, but it doesn't explain why the code works with no issue at lower ADC sampling rates, and also works at higher ADC sampling rates if the SD card writing is excluded.

    I will try disabling all interrupts beforehand.

  • I have an interesting experiment to present on this topic. I was able to run three iterations of the 512-byte ADC sample -> 512-byte SD write process, running through all three RAM buffers, with no error. I then enclosed this exact same process in a for loop for a reasonable number of iterations, and the execution-going-to-0xFFFF error occurs. I can only imagine this indicates some sort of depleted system resource.

    Here is the code. With the for loop declaration commented out as shown, the no-op at the end is reached. With the for loop uncommented, the code dies. Before this code, the ADC and DMA are set up and enabled. I have verified the setup part to be working by pointing the DMA at flash at 13kHz for 9000+ samples, and the data looks good.

    Also notable are the sd_wait_notbusy lines: waiting is the first thing each sd_write_block does anyway, but the code does not work if the additional call is not made before each write.

    //for (u8 i = 0; i < 39; i++) {
    /**************** Iteration 1 *************************/
    while ((DMA2CTL & DMAEN) != 0) { }

    DMACTL1 = 0;                        // Reset DMA
    DMACTL1 = DMA2TSEL_24;              // Select ADC conversion complete as trigger

    __data16_write_addr((unsigned short)&DMA2DA, (unsigned long)adcbuffer2);
    DMA2CTL |= DMAEN;                   // Enable the DMA

    sd_wait_notbusy(&sdc);
    sd_write_block(&sdc, sdBlockNumber++, (unsigned char*)adcbuffer1);

    /**************** Iteration 2 *************************/
    while ((DMA2CTL & DMAEN) != 0) { }

    DMACTL1 = 0;                        // Reset DMA
    DMACTL1 = DMA2TSEL_24;              // Select ADC conversion complete as trigger

    __data16_write_addr((unsigned short)&DMA2DA, (unsigned long)adcbuffer3);
    DMA2CTL |= DMAEN;                   // Enable the DMA

    sd_wait_notbusy(&sdc);
    sd_write_block(&sdc, sdBlockNumber++, (unsigned char*)adcbuffer2);

    /**************** Iteration 3 *************************/
    while ((DMA2CTL & DMAEN) != 0) { }

    DMACTL1 = 0;                        // Reset DMA
    DMACTL1 = DMA2TSEL_24;              // Select ADC conversion complete as trigger

    __data16_write_addr((unsigned short)&DMA2DA, (unsigned long)adcbuffer1);
    DMA2CTL |= DMAEN;                   // Enable the DMA

    sd_wait_notbusy(&sdc);
    sd_write_block(&sdc, sdBlockNumber++, (unsigned char*)adcbuffer3);
    //}

    ADC_Disable();

    __no_operation();

  • That's interesting.

    My guess is that the SD write function is somewhat buggy.

    That the write does a wait_notbusy but you need a second one anyway seems to indicate that something is wrong with the SD access. It's even possible that the SD card becomes inaccessible after some time, or when certain timing requirements are not met.
    I've had cards that stopped working after some hours; others were sensitive to the transfer protocol timings, while others were unimpressed by even serious timing violations. It's possible that your SD card code was written and tested with one of the third category and now fails with your card. You should try a different brand for verification, and then take a close look at the SD card code.

    The wait_notbusy should always be sufficient on the first call. If not, you cannot be sure that under certain conditions even 3 or 4 calls are enough. :)

    However, a locked card shouldn't make your code run wild. It may cause the code to enter an endless loop, but it shouldn't ever leave the coded trail, except if the endless loop is not a loop but a recursion, or allocates resources that aren't freed within the loop. That can cause a stack overflow, but I'd consider it a serious code design flaw.

  • I figured out a few things that were wrong and it is now working almost correctly.

    I had one mismatched register setting enabling an overflow interrupt, which caused the code to go to 0xFFFF.

    After that fix, the code would hang waiting for the DMA to complete after the first buffer rotation. This was fixed by disabling the ADC before rotating buffers, then re-enabling the DMA, then re-enabling the ADC. If the ADC is left enabled, the DMA will miss the rising edge of the ADC12IFG flag, which occurs during the rotation before the DMA is re-enabled; at least that's what I think is happening.

    Embedded in that last fix is a very particular way to correctly disable the ADC:

    ADC12CTL1 &= ~ADC12CONSEQ_2;        // Stop conversion immediately
    ADC12CTL0 &= ~ADC12ENC;             // Disable ADC12 conversion

    At first I thought the second line alone was enough to disable the ADC, but that is not the case. The first line is also necessary.

    The code now works up to 44kHz (haven't tested faster), except for one bug where the sampling breaks for some time at least once every 8000 samples. I am assuming the SD card's internal microcontroller occasionally gets hung up writing to its flash. We are working with a Class 2 card and have ordered a few Class 10s to see if they resolve the issue.

    Thanks for all your help Mr. Gross :)

  • Andrew Thornton said:
     At first I thought the second line alone was enough to disable the ADC, but that is not the case. The first line is also necessary.

    Indeed. Clearing ENC will only stop the ADC in single-channel, single-conversion mode. In all other modes, it will first complete the current conversion.

    However, ~ADC12CONSEQ_2 will only work if that is the current setting, as it only clears the ADC12CONSEQ1 bit. Performing this with a current setting of ADC12CONSEQ_1 would have no effect, and ADC12CONSEQ_3 would be turned into ADC12CONSEQ_1.

    When clearing a bitfield (all defines with an underscored enumeration are bitfields), you should do so with the highest setting, ~ADC12CONSEQ_3 in this case, as it clears all bits no matter what the current setting is.
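
    So the robust version of the stop sequence from above is:

    ADC12CTL1 &= ~ADC12CONSEQ_3;        // clear the whole CONSEQ bitfield, whatever its value
    ADC12CTL0 &= ~ADC12ENC;             // then disable conversions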

    Andrew Thornton said:
    I am assuming the SD card's internal microcontroller occasionally gets hung up writing to its flash.

    That's possible. It may be that the controller needs to update and store its internal tables for wear leveling. If you write 10 times to the same logical block on the SD card, you're not writing 10 times to the same physical block; otherwise the FAT area on a flash-based device, which is updated with every write operation, would wear out in a short time. The physical memory is mapped to logical memory, and the mapping needs to be stored somewhere. It's even possible that if you write 2 million times to the same logical sector of a 1GB card, you have written to all physical sectors without ever erasing anything. And from that point on, every write operation takes much longer than before, since now an erase is required. That happens with some cheaper solid state disks.
    However, often the controller writes first to an unused block and then erases the previously used physical block once you set the card to idle. Or, if the two blocks are on different physical flash chips, the two operations can even happen simultaneously.

    Only the older cards (the really slow ones) have a 1:1 mapping and erase each sector before writing the new data.

  • Hi Michael,

    I'm working with a CC430F5137 and want to have a dual image in the device's flash memory.

    I experimented with changing the vectors, including the reset vector:

    FLASH : origin = 0x8000, length = 0x7F80
    INT00 : origin = 0x9080, length = 0x0002
    INT01 : origin = 0x9082, length = 0x0002
    ...
    INT61 : origin = 0x90FA, length = 0x0002
    INT62 : origin = 0x90FC, length = 0x0002
    RESET : origin = 0x90FE, length = 0x0002

    After doing this, my CPU halts at 0x10002!

    What might be the problem?

    Thanks,

    Rahul

  • You can't simply put your vectors in a new location. While you can of course tell the linker to put the vectors anywhere, the CPU expects them at 0xFF80 and up; that's a hardware requirement. So when an interrupt happens, the CPU fetches the ISR address from 0xFFFx, reads 0xFFFF (the erased-flash value), and jumps to 0xFFFE (the LSB is ignored). From there it will continue to execute 'code' at 0x10000, 0x10002, etc.

  • Thanks Gross!

  • Jens-Michael Gross said:
    The plain transfer of 512 bytes with a 16MHz system clock takes 2.56ms.

    Could you explain the math behind this calculation, please? I had understood from the first paragraph that a DMA transfer takes 4 cycles per byte. That would mean (1/16MHz) × 4 × 512 = 0.128ms. In a later reply you then say a transfer takes 8 cycles/byte, but that would lead to 0.256ms. Exactly how many cycles does a DMA transfer take, and where is this information found?

  • A DMA transfer takes 2 clock cycles per byte/word. One for reading memory and one for writing. For the first transfer of a block (or a single transfer) it may take up to 4 clock cycles for additional synchronization with the CPU.

    An SPI master transfer takes 8 SPI clock cycles, which can be 1/2 or 1/1 of the maximum system frequency (depending on SPI module: USART, USI or USCI).

    Generally, the DMA clock is MCLK and the SPI clock is whatever you choose for it (including MCLK, but also SMCLK, ACLK, or an external clock signal), so the two need some synchronization, e.g. via the USCI TXIFG flag.
    But when using the TXIFG trigger for DMA, the CPU runs interleaved with the DMA, and the DMA needs to synchronize with the CPU code execution, causing additional DMA wait states.

    You could set MCLK to 4MHz and the SPI clock to 16MHz (USCI), and then start an unsynchronized DMA block transfer. It should work and result in maximum throughput.

    However, you're right: the total transfer time then isn't 2.56ms, it is 0.256ms, or 2MB/s throughput.
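
    A sketch of that unsynchronized variant (clock setup omitted; the divider choice and buffer name are assumptions, and the DMA and SPI byte rates must match exactly for this to work):

    #include <msp430.h>

    // Hypothetical sketch: MCLK = 4 MHz, SMCLK = 16 MHz, SPI clocked from SMCLK.
    // A block DMA moves one byte per 2 MCLK cycles = 2 MB/s, exactly matching
    // the SPI drain rate of 16 Mbit/s = 2 MB/s.
    static unsigned char sdbuffer[512];

    void spi_dma_burst_sector(void)
    {
        DMACTL0 = DMA0TSEL_0;                       // DMAREQ (software) trigger
        __data16_write_addr((unsigned short)&DMA0SA, (unsigned long)sdbuffer);
        __data16_write_addr((unsigned short)&DMA0DA, (unsigned long)&UCB0TXBUF);
        DMA0SZ  = 512;
        DMA0CTL = DMADT_1                           // block transfer: CPU halts meanwhile
                + DMASRCINCR_3                      // increment source address
                + DMASBDB                           // byte-to-byte transfer
                + DMAEN;
        DMA0CTL |= DMAREQ;                          // software trigger starts the whole block
    }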
