Help explain C55 instruction

Hi,

I need to use the Viterbi decoding algorithm from spra776a.pdf. I find that the C55 instruction set is quite different from the C6000's, and I cannot easily understand the following code and description on page 11.

; AR5: pointer to the old metrics table

; AR4: pointer to the new metrics table

; T2 = SD(2*j) − SD(2*j+1)

;Compute New_metric (i)&(i+8)

hi(AC0) = *AR5+ − T2, ;AC0=Old_Met(2*j) +T2

lo(AC0) = *AR5+ + T2 ;AC0=Old_met(2*j+1)−T2

hi(AC1) = *AR5+ + T2, ;AC1=Old_Met(2*j) −T2

lo(AC1) = *AR5+ − T2 ;AC1=Old_met(2*j+1)+T2

max_diff(AC0, AC1, AC2, AC1) ;Compare AC0, AC1

||*AR4(T0) = lo(AC2), ;Store New_metric(i−1)&(i−1+8)

*AR4+ = hi(AC2)

Three instructions are required to update two states. The states are updated in consecutive order to simplify pointer manipulation. In many systems, the same local distance is used in consecutive butterflies.

Which three instructions does "Three instructions are required to update two states" refer to?

Which states in the code does "update two states" mean?

Thanks,

 

  •  Hi,

     In C6000, "||" marks a parallel instruction. In the following code, does ||*AR4(T0) = lo(AC2) mean that AC2 is first updated by the first line and then stored to *AR4(T0)? Does the first line still take some cycles even though "||" is present, with *AR4+ = hi(AC2) running in the next cycle? How many cycles do these three instructions need?

    Thanks,

     

    max_diff(AC0, AC1, AC2, AC1) ;Compare AC0, AC1

    ||*AR4(T0) = lo(AC2), ;Store New_metric(i−1)&(i−1+8)

    *AR4+ = hi(AC2)

     

  • Because the C55x has a multi-stage pipeline, it is a bit misleading to talk about how many "cycles" an instruction takes.  I'll ignore that problem for this post.

    The code fragment you show should be thought of in the following fashion, which is how it is actually encoded:

    HI(AC0) = *AR5+ - T2, LO(AC0) = *AR5+ + T2
    HI(AC1) = *AR5+ + T2, LO(AC1) = *AR5+ - T2
    *AR4(AR0) = LO(AC2), *AR4+ = HI(AC2) || max_diff(AC0, AC1, AC2, AC1)
    

    Parallel instructions happen all at once, except that some effects may happen in earlier pipeline stages.  max_diff operates in the execute stage, so it writes to AC2 after AC2 has been read for the dual store.  Thus, the dual store is storing the previous calculation, not the one being calculated at the same time.  The comments in SPRA776 seem to suggest that this is the case, but I'll admit I haven't studied the whole example closely.
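The read-before-write behaviour described above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not a cycle-accurate C55x simulation: the helper names (`wrap16`, `dual_add_sub`, `max_select`, `run_butterflies`) are hypothetical, arithmetic wraps at 16 bits rather than saturating, the old-metric operands are passed in directly instead of being read through AR5, and the difference and transition-register outputs of max_diff are left out. It shows only that the store paired with max_diff sees the AC2 value from the previous pass.

```python
def wrap16(x):
    """Wrap an integer to signed 16 bits (assumption: no saturation)."""
    return ((x + 0x8000) & 0xFFFF) - 0x8000


def dual_add_sub(m_hi, m_lo, t, hi_adds):
    """Model one dual 16-bit add/subtract; returns the (hi, lo) halves."""
    if hi_adds:
        return wrap16(m_hi + t), wrap16(m_lo - t)
    return wrap16(m_hi - t), wrap16(m_lo + t)


def max_select(ac0, ac1):
    """Per-half maximum, i.e. the value max_diff leaves in its third operand.

    The difference output and decision bits are deliberately not modelled.
    """
    return max(ac0[0], ac1[0]), max(ac0[1], ac1[1])


def run_butterflies(operands, t2_values):
    """Run the three-packet sequence once per entry.

    operands: list of ((a_hi, a_lo), (b_hi, b_lo)) old metrics feeding
              AC0 and AC1 (hypothetical layout for illustration).
    Returns (stored, pending): the values written by the dual store, and
    the AC2 value still waiting to be stored after the last pass.
    """
    stored = []
    ac2 = None  # nothing has been computed before the first pass
    for (a, b), t2 in zip(operands, t2_values):
        ac0 = dual_add_sub(a[0], a[1], t2, hi_adds=False)  # packet 1
        ac1 = dual_add_sub(b[0], b[1], t2, hi_adds=True)   # packet 2
        # Packet 3: the dual store reads AC2 first, so it sees the
        # previous pass's result...
        if ac2 is not None:
            stored.append(ac2)
        # ...and only then does max_diff overwrite AC2.
        ac2 = max_select(ac0, ac1)
    return stored, ac2


if __name__ == "__main__":
    ops = [((10, 20), (10, 20)), ((5, 7), (5, 7))]
    print(run_butterflies(ops, [3, -2]))
```

After two passes, only the first pass's result has been stored and the second is still sitting in AC2, which matches the "(i−1)" in the SPRA776 store comment. In this model the last result would need one more store after the loop.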

  • Although the original paper writes this over five lines, it is in fact three instructions, as you wrote. Is that right?

    HI(AC0) = *AR5+ - T2, LO(AC0) = *AR5+ + T2
    HI(AC1) = *AR5+ + T2, LO(AC1) = *AR5+ - T2
    *AR4(AR0) = LO(AC2), *AR4+ = HI(AC2) || max_diff(AC0, AC1, AC2, AC1)
    
    
    Another question is about the AR0 in the last line. The original document uses T0 there. Are the two equivalent, or does T0 actually hold the contents of AR0?
    Thanks,
  • Hi,

    The C6000 has double-word access. What is the memory data bus access width on the C55? I have not found that information yet. Thanks

  • The example shows 4 instructions, two of which are in parallel.  There are 3 execution packets.

    As to AR0 vs. T0, I forgot to use the "CPL" setting for the disassembler.  Whether that instruction uses AR0 or T0 depends on the CPL bit.  Yes, it should be T0 for this example.

  • The widest load available on C55x is 32 bits.