
Updated: Question on VLIW fetch & execute packets

Anonymous

Hi,

 

I would like to ask some questions about fetch packets and execute packets.

The C6000 DSPs implement a VLIW architecture. In the example given on page 512 of SPRU732J, TMS320C64x/C64x+ DSP CPU and Instruction Set, there are two execute packets in the instruction dispatch phase.

 

 

Question 1:

 

Why are there execute packets, rather than all fetched instructions being executed simultaneously? I guess the reasons might include:

1. Competition for resources: for example, a fetch packet may contain more than two load/store instructions, whereas there are only two .D functional units in total. The dispatch of the excess instructions must therefore be deferred to later cycles.

2. Interdependency: since the DM6437 DSP uses a VLIW architecture, the compiler is responsible for detecting interdependencies in the C code and generating assembly code accordingly. In the attached picture, the first two (shaded) instructions in the packet are parallel and were decoded in the previous cycle, and the remaining six instructions are parallel and are to be decoded in the present cycle.

3. Are there any other reasons?

 

However, whatever the reason for dividing the fetch packet into execute packets, the information about

1. Execute packets, and the instructions they contain

2. The order of execution for different execute packets

3. The number of delay cycles between execute packet i and packet i+1

must be somehow encoded and placed somewhere in the fetch packet. How is this done and where is it placed?

 

Furthermore, does the compiler place the instructions belonging to each execute packet contiguously, or may they be scattered among other instructions? It seems to me that if they are placed contiguously, the information is easier to encode and requires fewer bits.

 

And are execute packets placed in fetch packets according to their execution order? If the compiler does this work, it might also save several bits that would otherwise be needed to encode this information, as well as making the hardware logic less complicated.
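
For reference, here is a minimal sketch of how execute-packet boundaries can be recovered from a fetch packet. It assumes the p-bit scheme described in SPRU732J: bit 0 of each 32-bit instruction is the p-bit, and p = 1 means the next instruction executes in parallel with this one. The function name and the example words are hypothetical, and 16-bit compact instructions are not handled.

```python
def split_execute_packets(words):
    """Group 32-bit instruction words into execute packets using the p-bit.

    Assumption: bit 0 of each word is the p-bit. p = 1 chains the NEXT
    instruction into the same execute packet; p = 0 ends the packet.
    """
    packets, current = [], []
    for word in words:
        current.append(word)
        if word & 1 == 0:      # p = 0: this instruction closes the packet
            packets.append(current)
            current = []
    if current:                # chain continues past the last word given
        packets.append(current)
    return packets


# Hypothetical fetch packet: only bit 0 of each word is meaningful here.
fetch_packet = [0x1, 0x0, 0x1, 0x1, 0x1, 0x1, 0x1, 0x0]
sizes = [len(p) for p in split_execute_packets(fetch_packet)]
print(sizes)  # [2, 6]: one packet of 2 instructions, then one of 6
```

Under this scheme the instructions of one execute packet are contiguous and the packets appear in execution order, so one bit per instruction is enough and no separate ordering field is needed.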

 

An interesting thing I have noticed when debugging with the disassembly view is that I frequently see fewer than 8 instructions executing together, with parallel arrows on the left indicating this, like:

 

 

In the above figure, does it mean that the next execute packet consists of three instructions, and the emulator/CPU is awaiting user input to execute (probably fetch, or instruction dispatch) them?

 

Question 2:

 

 

Not only is there an order of execution for the different execute packets within a single fetch packet, but delay cycles may also be necessary between

(1). The last execute packet of fetch packet i

(2). The first execute packet of fetch packet i+1

 

The reasons for this could be that

1. There is an interdependency between at least one instruction in (2) and at least one instruction in (1),

and

2. The result that (1) writes into a register doesn’t become available immediately and requires a delay of one or more cycles.

 

When this happens, it is imperative that the necessary delay cycles be encoded somewhere into fetch packet i+1. Does the compiler actually do this? How is this encoded and where is it stored in the fetch packet?
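
A minimal sketch of how a compiler could account for such delays, assuming (as SPRU732J describes for the C6000) that the hardware does not interlock on register results, so the required delay is expressed as explicit NOP cycles or filled with independent instructions rather than stored as a field in the fetch packet. The delay-slot values are typical ones stated as assumptions, and the function and its arguments are hypothetical.

```python
# Delay slots for a few instruction classes (typical C64x values, stated
# here as assumptions; check the instruction reference for each opcode).
DELAY_SLOTS = {"single_cycle": 0, "mpy16": 1, "load": 4}


def nops_needed(producer_kind, producer_cycle, consumer_cycle):
    """Return how many NOP cycles must be inserted before the consumer.

    The result of an instruction issued in cycle c with d delay slots can
    first be read by an instruction issued in cycle c + d + 1.
    """
    earliest = producer_cycle + DELAY_SLOTS[producer_kind] + 1
    return max(0, earliest - consumer_cycle)


print(nops_needed("load", 0, 1))          # 4
print(nops_needed("mpy16", 0, 1))         # 1
print(nops_needed("single_cycle", 0, 1))  # 0
```

If the consumer is already scheduled late enough (for example, a load in cycle 0 and its use in cycle 5), the function returns 0 and no NOP is needed.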

 

Question 3:

 

 

How many instructions are there in a single VLIW fetch packet?

Each fetch packet is 8 words long, i.e. 256 bits. Most instructions have a standard 32-bit version, and many also have a compact 16-bit version. If all bits were used for instructions, each fetch packet could contain 256/16 = 16 instructions. Since some bits are needed to encode which instructions belong to which execute packet as well as the execution order (see questions 1 and 2), there should still be at least 192 bits available (a rough lower-bound estimate), allowing roughly 192/16 = 12 instructions.

If there are 12 (> 8) instructions in the fetch packet, they certainly cannot all be dispatched in the same cycle. Therefore, even if there are no interdependencies at all between instructions, they would still need to be dispatched over more than one cycle.

 

Does this really happen?
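
The upper-bound arithmetic above can be checked with a short sketch. The `reserved_words` parameter is a hypothetical knob for bits set aside for packet bookkeeping: 2 reproduces the 192-bit estimate, and 1 corresponds to assuming that a single 32-bit word of the fetch packet is used as a header. The limit of 8 instructions per cycle is taken from the reasoning above.

```python
FETCH_PACKET_BITS = 256   # 8 words of 32 bits, as stated above
WORD_BITS = 32
COMPACT_BITS = 16         # size of a compact instruction
MAX_PER_CYCLE = 8         # at most 8 instructions dispatched per cycle


def max_compact_instructions(reserved_words=0):
    """Upper bound on 16-bit instructions that fit in one fetch packet."""
    usable_bits = FETCH_PACKET_BITS - reserved_words * WORD_BITS
    return usable_bits // COMPACT_BITS


def min_dispatch_cycles(instruction_count):
    """Fewest cycles needed if no more than MAX_PER_CYCLE issue per cycle."""
    return -(-instruction_count // MAX_PER_CYCLE)  # ceiling division


print(max_compact_instructions())     # 16
print(max_compact_instructions(2))    # 12
print(min_dispatch_cycles(12))        # 2
```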

 

Where can I find information on the VLIW instruction structure? Which parts does it consist of? How does the compiler generate it?

I would like detailed, in-depth answers to the above questions. I hope someone can help with this.

 

 

 

 

Thanks,

Zheng

  • Please see the collateral from this C6000 training workshop.  It probably answers most of your questions.

    Thanks and regards,

    -George

  • Anonymous, in reply to George Mock

    Hi,

     

    I would like to ask a question about browser support.

    I posted "Question on VLIW fetch & execute packets" several hours ago.

    1.    It was pasted from MS Word 2007 into the web editing window in Firefox 4.01. I didn't use IE because my IE9 cannot insert a file into the post.

    2.    It is viewable in both Firefox 4.01 and Opera 11.11, but nothing is displayed in IE9 or IE8.

    The question is urgent for me and I need support on that topic. Could anyone look into this?

     

     

     

     

    Zheng

     

  • Greetings,

    I've updated the post; let me know if you are still seeing any issues. When pasting anything from Microsoft Office products, we recommend using the "Paste from Word" button in the editor (to the immediate left of the spellchecker button). More details on using it are available (here).

    It looks like we have since responded, so let me know whether you can see the response. You should get an email version of the response as well.

    If there is anything else we can do to help, let us know.

     

    Blake

  • Anonymous, in reply to George Mock

    Question 2:

     

    Not only is there an order of execution for the different execute packets within a single fetch packet, but delay cycles may also be necessary between

    (1). The last execute packet of fetch packet i

    (2). The first execute packet of fetch packet i+1

     

    The reasons for this could be that

    1.    There is an interdependency between at least one instruction in (2) and at least one instruction in (1),

    and

    2.    The result that (1) writes into a register doesn't become available immediately and requires a delay of one or more cycles.

     

    When this happens, it is imperative that the necessary delay cycles be encoded somewhere into fetch packet i+1. Does the compiler actually do this? How is this encoded and where is it stored in the fetch packet?

     

     

  • Blake,

     

     

    "Paste from Word" is perfect. The top post was pasted directly from the clipboard and preserves the indentation, whereas the post immediately above was pasted using the "Paste from Word" button, and its indentation is lost.

     

    Heading in Light Blue

     

    Meanwhile, both "CTRL+V clipboard paste" and "Paste from Word" lose the color of the headings "Question 1", "Question 2", and "Question 3", which were originally light blue in MS Word.

     

    Font color in red

     

    Colored normal-style text in Word (shown in red above) also loses its color with "Paste from Word", whereas "CTRL+V clipboard paste" preserves it.

     

     

     

    Zheng

     

  • Anonymous, in reply to George Mock

    George,

    George said:

    Please see the collateral from this C6000 training workshop.  It probably answers most of your questions.

    What do you mean by "collateral"? I didn't find an appropriate definition for this context in either dictionaries or http://en.wikipedia.org/wiki/Collateral.

     

    Zheng

  • We use "collateral" as if it were a noun, but it should really be taken as an adjective.  The real phrase is "collateral material." 

    One of the definitions of "collateral" as an adjective (http://dictionary.reference.com/browse/collateral) is "serving to support or corroborate", which is the sense meant here.

  • Anonymous, in reply to Archaeologist

    Archaeologist:

    Thanks very much for looking up the definition for me and for the explanation.

    I think it is neither "supportive" nor "corroborative" in this context, nor the same as "supplemental". The word itself is unique and probably the most appropriate one here.

     

    Zheng

  • Anonymous, in reply to George Mock

    George,

    I found that the answers to all my questions are contained in the referenced documents and in SPRU732J, TMS320C64x/C64x+ DSP CPU and Instruction Set Reference Guide. Thanks very much.

     

    Zheng