Here's a code snippet for those who need to bit-bang SPI output as quickly as possible. It shifts out one bit every seven cycles, so its baud rate is MCLK/7. I haven't seen this technique posted anywhere, so please forgive me if I'm rehashing the obvious.
The most important trick to writing optimized code, besides writing in assembler, is to remove any jumps, so the pipeline stays full. An unoptimized SPI routine has two jumps: one to repeat the loop and one conditional jump to set or clear the SPISIMO line. We remove the loop jump trivially by unrolling the loop. The other jump is trickier, but there's a work-around: the GPIO set and clear registers sit two words apart, so the (inverted) data bit, multiplied by 2, can be used as an index that selects which of the two registers receives the SPISIMO bit mask. Here is the code. It assumes that SPISIMO is IO24 and SPICLK is IO26 (both can be trivially modified), that chip select assertion/deassertion is handled elsewhere, and that DP points to the IO register bank. Input data is in AL and is shifted out MSB first.
XOR AL,#0xFFFF ;invert the data
MOVL XAR7,#GPASETHI ;point XAR7 to set register of SPISIMO
MOV AR1,#(1<<8) ;bit mask register = location of the SPISIMO bit
MOVB AH,#0 ;pre-clear AH
LSL ACC,#1 ;shift next bit from AL MSB to AH LSB
LSL AH,#1 ;and multiply by 2
MOV AR0,AH ;the result (0 or 2) will be used as an offset index
MOV *+XAR7[AR0],AR1 ;write bit mask register to either GPASETHI or GPACLEARHI
MOVB AH,#0 ;pre-clear AH
MOV @GPASETHI, #(1 <<10) ;bring SPICLK high
LSL ACC,#1 ;shift next bit from AL MSB to AH LSB
MOV @GPACLEARHI, #(1 <<10) ;bring SPICLK low
Repeat the previous 7 instructions (from LSL AH,#1 through the MOV that brings SPICLK low) once for each additional bit you want to shift.
The clock is high for two cycles of the seven. Data is set up two cycles before the clock goes high.
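To make the indexing trick easier to follow, here is a rough host-side model in C. It is only a sketch, not target code: the `pins` variable, the `gpio_write` helper, and the `SET_HI`/`CLEAR_HI` indices are hypothetical stand-ins for the GPASETHI/GPACLEARHI registers, assuming they sit two words apart as the assembly expects. The function returns what a receiver sampling SPISIMO on each rising clock edge would see.

```c
#include <stdint.h>

/* Hypothetical register model: offset 0 stands in for GPASETHI and
   offset 2 for GPACLEARHI, mirroring the two-word spacing assumed above. */
enum { SET_HI = 0, CLEAR_HI = 2 };

#define SIMO_MASK (1u << 8)   /* IO24 is bit 8 of the high word  */
#define CLK_MASK  (1u << 10)  /* IO26 is bit 10 of the high word */

static uint16_t pins;         /* modeled state of IO16..IO31 */

/* Model a write to the set or clear register chosen by the offset.
   The branch here only imitates the hardware address decode; the
   CPU-side sequence being modeled contains no conditional jump. */
static void gpio_write(unsigned offset, uint16_t mask)
{
    if (offset == SET_HI) {
        pins |= mask;
    } else {
        pins &= (uint16_t)~mask;
    }
}

/* Shift 16 bits out MSB first; return the bits seen on SIMO at each
   rising clock edge, packed MSB first. */
uint16_t spi_model_send16(uint16_t data)
{
    uint16_t al = (uint16_t)~data;       /* XOR AL,#0xFFFF */
    uint16_t received = 0;

    for (int i = 0; i < 16; i++) {
        unsigned bit = (al >> 15) & 1u;  /* LSL ACC,#1: MSB of AL lands in AH */
        al = (uint16_t)(al << 1);
        unsigned offset = bit << 1;      /* LSL AH,#1: offset is 0 or 2 */

        gpio_write(offset, SIMO_MASK);   /* MOV *+XAR7[AR0],AR1 */
        gpio_write(SET_HI, CLK_MASK);    /* bring SPICLK high */
        received = (uint16_t)((received << 1) |
                              ((pins & SIMO_MASK) ? 1u : 0u));
        gpio_write(CLEAR_HI, CLK_MASK);  /* bring SPICLK low */
    }
    return received;
}
```

Because the data is inverted first, a 1 bit produces offset 0 (the set register) and a 0 bit produces offset 2 (the clear register), so calling `spi_model_send16(x)` should hand back `x` unchanged.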
A more general, less nosebleedy version rolls the loop back up and uses BANZ to let the user send a programmable number of bits, at the cost of the extra branch cycles on every bit.
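The rolled-up variant can be sketched the same way, as a host-side C model rather than target code. The function name and parameters are hypothetical. It assumes the counter is loaded with the bit count minus one and that the loop test behaves like BANZ (branch if the counter is nonzero, then decrement), and it assumes the caller left-aligns the data so that the first bit to send sits in bit 15, since the routine always shifts from the MSB of AL.

```c
#include <stdint.h>

/* Hypothetical model of the rolled-up loop. Returns the bits placed on
   SIMO, in transmission order, packed into the low nbits of the result.
   Returns 0 if nbits is outside 1..16. */
uint16_t spi_model_send_bits(uint16_t left_aligned, unsigned nbits)
{
    if (nbits == 0 || nbits > 16) {
        return 0;
    }

    uint16_t al = (uint16_t)~left_aligned;  /* invert the data once, up front */
    uint16_t sent = 0;
    unsigned counter = nbits - 1;           /* BANZ-style: body runs counter+1 times */

    do {
        unsigned offset = ((al >> 15) & 1u) << 1;  /* 0 selects set, 2 selects clear */
        al = (uint16_t)(al << 1);

        unsigned simo = (offset == 0) ? 1u : 0u;   /* set register drives the line high */
        sent = (uint16_t)((sent << 1) | simo);
    } while (counter-- != 0);               /* test for nonzero, then decrement */

    return sent;
}
```

For example, to send the three bits 101, the caller would pass `0x5 << 13` (that is, `0xA000`) with `nbits` set to 3.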
If anyone knows of a way to do this with fewer than 7 cycles/bit, I'd love to hear about it.
-Jim MacArthur