[C6416] Pipeline behavior when an EP spans two FPs

Hi,

I see in spru732j and spru610c that an execute packet is allowed to span two fetch packets, but I could not find a detailed description of how the PR and DP stages behave in that case.

I show an example in the following figure:

   ...
   inst 1     ; EP1               ; FP1     ; ---------- DC stage
|| inst 2
   inst 3     ; EP2                         ; ---------- DP stage
|| inst 4
   inst 5     ; EP3                         ; ---------- PR stage
|| inst 6
|| inst 7
|| inst 8
----------
|| inst 9     ; EP3 (spans FPs)   ; FP2     ; ---------- PW stage
|| inst 10
|| inst 11
|| inst 12
   inst 13    ; EP4
|| inst 14
|| inst 15
|| inst 16

There are two fetch packets, FP1 and FP2. FP1 contains three execute packets (EP1, EP2, and the first part of EP3); FP2 contains the rest of EP3 and EP4. Execute packet EP3 contains 8 instructions and spans the two fetch packets.
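The layout above follows from the p-bit of each instruction: a p-bit of 1 means the next instruction executes in parallel with the current one, and a p-bit of 0 ends the execute packet. As a rough sketch (Python, with p-bit values assumed to match the figure; `group_execute_packets` and `spans_fetch_packets` are hypothetical helper names, not TI tools), the grouping and the check for an execute packet that crosses the 8-instruction fetch packet boundary look like this:

```python
FP_SIZE = 8  # a C64x fetch packet holds eight 32-bit instructions


def group_execute_packets(p_bits):
    """Group 0-based instruction indices into execute packets.

    p_bits[i] == 1 means instruction i+1 runs in parallel with instruction i;
    p_bits[i] == 0 ends the current execute packet.
    """
    packets, current = [], []
    for i, p in enumerate(p_bits):
        current.append(i)
        if p == 0:
            packets.append(current)
            current = []
    if current:  # trailing instructions whose packet is not closed yet
        packets.append(current)
    return packets


def spans_fetch_packets(packet):
    """True if the first and last instruction lie in different fetch packets."""
    return packet[0] // FP_SIZE != packet[-1] // FP_SIZE


# Assumed p-bits for inst 1..16, chosen to reproduce EP1..EP4 in the figure.
p_bits = [1, 0,  1, 0,  1, 1, 1, 1, 1, 1, 1, 0,  1, 1, 1, 0]
eps = group_execute_packets(p_bits)
for n, ep in enumerate(eps, start=1):
    note = " (spans FPs)" if spans_fetch_packets(ep) else ""
    print(f"EP{n}: inst {ep[0] + 1}-{ep[-1] + 1}{note}")
```

Only EP3 (inst 5-12) is flagged as spanning, because inst 8 is the last slot of FP1 and still has its p-bit set.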

In this cycle, EP1 has moved to the DC stage and EP2 has advanced to the DP stage. The first part of EP3, in FP1, is still in the PR stage, and all of FP2 is still in the PW stage.

What happens in the next cycle? Will the two parts of EP3 (which spans two fetch packets) move through the DP/DC/E1 stages one after the other, instructions 5-8 first and then instructions 9-12? Or will the whole execute packet EP3 move to DP in the same cycle?

If it is the latter, how is that achieved? What happens in the pipeline?

Thanks and best regards,

  • Did I make a mistake in the above?

    I am looking at Figure 4-20 in spru732j again, and I think there may be some kind of buffer between PR and DP that holds the fetch packet that was in PR in the previous cycle.

    So the whole of FP1 advances from the PR stage into the buffer, and the first execute packet of FP1 moves to the DP stage at the same time. FP2 moves to the PR stage in the same cycle.

    Cycle 1 -----

        FP1 --> DP / buffer

            EP1 --> DP

            EP2, EP3(1) --> stay in buffer

        FP2 --> PR

    Cycle 2 -----

        FP1 --> DC / DP / buffer

            EP1 --> DC

            EP2 --> DP

            EP3(1) --> stays in buffer

        FP2 --> stays in PR

    Cycle 3 -----

        FP1 --> E1 / DC / DP

            EP1 --> E1

            EP2 --> DC

            EP3(1) --> DP

        FP2 --> PR / buffer

            EP3(2) --> buffer --> DP

            EP4 --> stays in PR

    That is, in cycle 3, when EP3(1) advances to the DP stage, all the instructions in FP1 have been consumed (the buffer is empty). Thus FP2 can be pre-loaded into the buffer, even though FP2 is still in the PR stage.

    Since the pipeline detects that EP3(1) spans fetch packets, and the remaining part EP3(2) has been pre-loaded into the buffer, it can advance EP3(2) directly into the DP stage in the same cycle.

    Is that right?
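The buffer idea above can be written down as a small simulation. This is only a model of the guess in this post, not documented hardware behavior: `dispatch_timeline`, the buffer, and the `span_penalty` parameter (extra cycles charged when an execute packet needs a refill part-way through) are all assumptions made for illustration.

```python
from collections import deque


def dispatch_timeline(fetch_packets, ep_sizes, span_penalty=0):
    """Return [(cycle, instructions)] for each execute packet entering DP.

    fetch_packets: lists of instruction numbers, in program order.
    ep_sizes: number of instructions in each execute packet, in order.
    span_penalty: hypothetical extra cycles for a packet that spans two
                  fetch packets (0 reproduces the guess in this post).
    """
    waiting = deque(fetch_packets)  # fetch packets still in PR or earlier
    buf = deque()                   # undispatched instructions of the current FP
    timeline, cycle = [], 0
    for size in ep_sizes:
        cycle += 1
        if not buf:                 # ordinary refill at a fetch packet boundary
            buf.extend(waiting.popleft())
        packet = []
        while len(packet) < size:
            if not buf:             # packet continues in the next fetch packet
                buf.extend(waiting.popleft())
                cycle += span_penalty
            packet.append(buf.popleft())
        timeline.append((cycle, packet))
    return timeline


fp1, fp2 = list(range(1, 9)), list(range(9, 17))
for cycle, insts in dispatch_timeline([fp1, fp2], [2, 2, 8, 4]):
    print(f"cycle {cycle}: inst {insts[0]}-{insts[-1]} --> DP")
```

With `span_penalty=0` the four execute packets enter DP in cycles 1, 2, 3 and 4, which is the timeline listed above. With `span_penalty=1` they enter in cycles 1, 2, 4 and 5. Which of the two, if either, matches the real pipeline is exactly the open question.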

    Thanks and best regards,

  • Lu,

    In the C64x and C64x+ architecture, execute packets can span fetch packets in order to save code size. Execute packets are meant to execute together, so, as shown in Figure 4-20 of the C64x/C64x+ CPU and Instruction Set guide (SPRU732), the pipeline will stall in order for that to happen.

    Below is the text from that page with figure 4-20 (I am sorry but I do not know how to paste it here):

    "In Figure 4-20, fetch packet n, which contains three execute packets, is shown followed by six fetch packets (n + 1 through n + 6), each with one execute packet (containing eight parallel instructions). The first fetch packet (n) goes through the program fetch phases during cycles 1-4. During these cycles, a program fetch phase is started for each of the fetch packets that follow.

    In cycle 5, the program dispatch (DP) phase, the CPU scans the p-bits and detects that there are three execute packets (k through k + 2) in fetch packet n. This forces the pipeline to stall, which allows the DP phase to start for execute packets k + 1 and k + 2 in cycles 6 and 7. Once execute packet k + 2 is ready to move on to the DC phase (cycle 8), the pipeline stall is released.

    The fetch packets n + 1 through n + 4 were all stalled so the CPU could have time to perform the DP phase for each of the three execute packets (k through k + 2) in fetch packet n. Fetch packet n + 5 was also stalled in cycles 6 and 7: it was not allowed to enter the PG phase until after the pipeline stall was released in cycle 8. The pipeline continues operation as shown with fetch packets n + 5 and n + 6 until another fetch packet containing multiple execution packets enters the DP phase, or an interrupt occurs."
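The timing in the quoted text can be reproduced with a small sketch. It assumes only what the quote states: one execute packet enters DP per cycle, the first one in cycle 5, and a fetch packet with k execute packets holds back the packets behind it for k - 1 cycles. It does not model an execute packet that spans fetch packets, and `dp_schedule` and `stall_cycles` are hypothetical names used for illustration.

```python
def dp_schedule(eps_per_fetch_packet, first_dp_cycle=5):
    """Return (fetch_packet_offset, execute_packet_index, dp_cycle) tuples.

    eps_per_fetch_packet[i] is the number of execute packets in fetch
    packet n+i. One execute packet enters DP per cycle, in program order.
    """
    schedule, cycle = [], first_dp_cycle
    for fp, count in enumerate(eps_per_fetch_packet):
        for ep in range(count):
            schedule.append((fp, ep, cycle))
            cycle += 1
    return schedule


def stall_cycles(eps_per_fetch_packet):
    """Cycles in which the fetch phases are held: k - 1 per fetch packet."""
    return sum(count - 1 for count in eps_per_fetch_packet)


# Figure 4-20 as described: fetch packet n has three execute packets,
# fetch packets n+1 through n+6 have one each.
layout = [3, 1, 1, 1, 1, 1, 1]
for fp, ep, cycle in dp_schedule(layout):
    print(f"fetch packet n+{fp}, execute packet {ep}: DP in cycle {cycle}")
print("stall cycles:", stall_cycles(layout))
```

Under these assumptions the three execute packets of fetch packet n are in DP in cycles 5, 6 and 7, fetch packet n+1 reaches DP in cycle 8, and the two stall cycles correspond to cycles 6 and 7 in the quote.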

    I hope this helps.

    Jackie Brenner

    Texas Instruments

  • Hi Jackie,

    I'm sorry for mentioning Figure 4-20, which was misleading. What I really want to know is the pipeline behaviour when an execute packet spans two fetch packets.

    I now understand that these details are not public, so I will not press for an answer.

    Still, out of curiosity, I would like to know whether an additional stall of one CPU cycle is needed in the dispatch pipeline to bring the two pieces of the execute packet together, given that the two pieces are loaded in two different fetch packets.

    Thanks and best regards,

    Lu

  • Lu,

    The C64x/C64x+ CPU and Instruction set guide (SPRU732) is in the public domain. I was just not able to copy Figure 4-20 into this post, which is a limitation on my part.

    I do not understand your question though. The diagram and explanation show that there is a pipeline stall so that the execute packets can execute together.

    Since this may be a language barrier issue, I suggest you contact your local TI representative/distributor to ensure proper understanding of your question.

    Jackie Brenner

    Texas Instruments