TMS320C6678: delay slot and Functional Unit Latency for c6678 instruction

Part Number: TMS320C6678

Hello champs,

How do I calculate the number of CPU cycles an assembly instruction takes? Do I need to add the delay slots and the functional unit latency together?

For example, how many CPU cycles does the MPYDP instruction need?

Instruction Type MPYDP
Delay Slots 9
Functional Unit Latency 4
http://www.ti.com/lit/ug/sprugh7/sprugh7.pdf

Thanks.
Rgds
Shine

  • We're looking into this.

    Best Regards,
    Yordan
  • Hi,

    As I understand it, the guide says: "The lower 32 bits of src1 are read on E1 and E2, and the upper 32 bits of src1 are read on E3 and E4. The lower 32 bits of src2 are read on E1 and E3, and the upper 32 bits of src2 are read on E2 and E4. The lower 32 bits of the result are written on E9, and the upper 32 bits of the result are written on E10". So I think it takes 13 CPU cycles (4 functional + 9 delay).

    Regards, Eric
  • SPRUGH7 is the right document to look at. The most important sections for this question are 3.4 (Delay Slots) and 5.1.4 (Pipeline Operation Summary).

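    From Table 3-8, the functional unit latency is usually already included in the number of delay slots, so the two values should not be added together. In general, I believe an instruction takes a number of cycles equal to 7 plus the number of delay slots: four program fetch cycles, two program decode cycles, and one execute cycle, followed by the delay slots.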

    From Figure 5-5, Figure 5-6, and Table 5-1, MPYDP uses ten execute cycles (E1 through E10), in addition to four program fetch cycles and two program decode cycles, for a total of 16 cycles. Of course, when the pipeline is full, most of those cycles overlap with other instructions. Table 5-4 explains what MPYDP does during each of the ten execute cycles, and section 5.2.15 shows (for MPYDP) where the four cycles of functional unit latency fall relative to the register reads and writes.

    In the case of MPYDP, section 3.4 also mentions that the arithmetically equivalent FMPYDP was added with smaller values for both delay slots and functional unit latency.
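
The arithmetic in the reply above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a cycle-accurate model: the function names are hypothetical, the stage counts (four fetch, two decode) are the ones quoted in the reply, and the back-to-back estimate assumes that functional unit latency is the number of cycles the unit stays busy before it can accept the next instruction, with no stalls or resource conflicts.

```python
# Rough cycle arithmetic for a C66x instruction, based on the numbers
# quoted in this thread. All helper names here are hypothetical.

FETCH_STAGES = 4   # program fetch cycles, as quoted in the reply
DECODE_STAGES = 2  # program decode cycles, as quoted in the reply


def execute_cycles(delay_slots):
    """Execute cycles: E1 plus one further cycle per delay slot."""
    return 1 + delay_slots


def total_pipeline_cycles(delay_slots):
    """Cycles for one instruction to pass through fetch, decode, and execute."""
    return FETCH_STAGES + DECODE_STAGES + execute_cycles(delay_slots)


def back_to_back_cycles(count, delay_slots, unit_latency):
    """Estimate for `count` identical instructions issued on the same unit.

    Assumption: a new instruction can start on the unit every
    `unit_latency` cycles, and nothing else stalls the pipeline.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return (count - 1) * unit_latency + total_pipeline_cycles(delay_slots)


# MPYDP values from the question: 9 delay slots, functional unit latency 4.
print(execute_cycles(9))             # 10 execute cycles (E1..E10)
print(total_pipeline_cycles(9))      # 16 cycles through the whole pipeline
print(back_to_back_cycles(3, 9, 4))  # 24 cycles for three MPYDPs in a row
```

Under these assumptions the delay slots determine when the result becomes available, while the functional unit latency only limits how often the same unit can start a new MPYDP, which is why the two numbers are not simply added.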