Help explain C55 instruction

Robert W

Hi,

I need to use the Viterbi decoding algorithm from spra776a.pdf. I find that the C55x instructions are quite different from those of the C6000. I cannot easily understand the following code and description on page 11.

; AR5: pointer to the old metrics table
; AR4: pointer to the new metrics table
; T2 = SD(2*j) - SD(2*j+1)
; Compute New_metric(i) & (i+8)
hi(AC0) = *AR5+ - T2,            ; AC0 = Old_Met(2*j) + T2
lo(AC0) = *AR5+ + T2             ; AC0 = Old_met(2*j+1) - T2
hi(AC1) = *AR5+ + T2,            ; AC1 = Old_Met(2*j) - T2
lo(AC1) = *AR5+ - T2             ; AC1 = Old_met(2*j+1) + T2
max_diff(AC0, AC1, AC2, AC1)     ; Compare AC0, AC1
|| *AR4(T0) = lo(AC2),           ; Store New_metric(i-1) & (i-1+8)
*AR4+ = hi(AC2)

Three instructions are required to update two states. The states are updated in consecutive order to simplify pointer manipulation. In many systems, the same local distance is used in consecutive butterflies.

Which three instructions does "Three instructions are required to update two states" refer to?

Which states in the code does "update two states" mean?

Thanks,

over 12 years ago

0 Robert W over 12 years ago

Mastermind 7100 points

Hi,

On the C6000, "||" marks a parallel instruction. In the following code, does "|| *AR4(T0) = lo(AC2)" mean that the value is saved to *AR4(T0) after AC2 is updated by the first line? Does the first line still take some cycles even though the "||" is there, with *AR4+ = hi(AC2) then running in the next cycle? How many cycles do these three instructions need?

Thanks,

max_diff(AC0, AC1, AC2, AC1)     ; Compare AC0, AC1
|| *AR4(T0) = lo(AC2),           ; Store New_metric(i-1) & (i-1+8)
*AR4+ = hi(AC2)

0 Archaeologist over 12 years ago in reply to Robert W

TI__Guru* 84285 points

Because the C55x has a multi-stage pipeline, it is a bit misleading to talk about how many "cycles" an instruction takes. I'll ignore that problem for this post.

The code fragment you show should be thought of in the following fashion, which is how it is actually encoded:

HI(AC0) = *AR5+ - T2, LO(AC0) = *AR5+ + T2
HI(AC1) = *AR5+ + T2, LO(AC1) = *AR5+ - T2
*AR4(AR0) = LO(AC2), *AR4+ = HI(AC2) || max_diff(AC0, AC1, AC2, AC1)

Parallel instructions happen all at once, except that some effects may happen in earlier pipeline stages. max_diff operates in the execute stage, so it writes to AC2 after AC2 has been read for the dual store. Thus, the dual store is storing the previous calculation, not the one being calculated at the same time. The comments in SPRA776 seem to suggest that this is the case, but I'll admit I haven't studied the whole example closely.
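
The read-before-write ordering described above can be pictured with a small C model. This is only a sketch under stated assumptions: acc_t, max_halves and run_packets are hypothetical names, and max_halves is a simplified stand-in that keeps only the larger value in each 16-bit half (it does not model any other effect of the real max_diff instruction).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an accumulator's two 16-bit halves. */
typedef struct { int16_t hi; int16_t lo; } acc_t;

/* Simplified stand-in for max_diff: keeps the larger value in each half.
   Other effects of the real instruction are deliberately not modelled. */
static acc_t max_halves(acc_t x, acc_t y)
{
    acc_t z;
    z.hi = (x.hi > y.hi) ? x.hi : y.hi;
    z.lo = (x.lo > y.lo) ? x.lo : y.lo;
    return z;
}

/* Models n packets of "dual store || max_diff".
   stored[j] receives what packet j writes to memory. The return value is
   the AC2 content left after the last packet, which this model has not
   stored yet. */
acc_t run_packets(const acc_t *ac0, const acc_t *ac1, int n,
                  acc_t initial_ac2, acc_t *stored)
{
    acc_t ac2 = initial_ac2;
    assert(n >= 0);
    for (int j = 0; j < n; j++) {
        stored[j] = ac2;                   /* the store reads AC2 first  */
        ac2 = max_halves(ac0[j], ac1[j]);  /* max_diff writes AC2 after  */
    }
    return ac2;
}
```

In this model stored[0] holds whatever AC2 contained before the first packet, and stored[j] holds the result selected in packet j-1, which matches the "(i-1)" in the SPRA776 comment on the store.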

0 Robert W over 12 years ago in reply to Archaeologist

Mastermind 7100 points

Although the original paper writes this code on five lines, these are in fact three instructions, as you wrote. Is that right?

HI(AC0) = *AR5+ - T2, LO(AC0) = *AR5+ + T2
HI(AC1) = *AR5+ + T2, LO(AC1) = *AR5+ - T2
*AR4(AR0) = LO(AC2), *AR4+ = HI(AC2) || max_diff(AC0, AC1, AC2, AC1)

Another question is about the AR0 in the last line. The original document uses T0. Are they equivalent, or does T0 actually hold the contents of AR0?

Thanks,

0 Robert W over 12 years ago in reply to Robert W

Mastermind 7100 points

Hi,

The C6000 supports double-word access. What is the widest memory access on the C55x data bus? I have not found that information yet. Thanks

0 Archaeologist over 12 years ago in reply to Robert W

TI__Guru* 84285 points

The example shows 4 instructions, two of which are in parallel. There are 3 execution packets.

As to AR0 vs. T0, I forgot to use the "CPL" setting for the disassembler. Whether that instruction uses AR0 or T0 depends on the CPL bit. Yes, it should be T0 for this example.
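
To make "update two states" concrete, here is a minimal C sketch of one add-compare-select butterfly. It is the textbook form, not a transcription of the SPRA776 listing: the function name acs_butterfly, the 16-state size (inferred from the "(i) & (i+8)" comment) and the sign convention for the local distance t are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_STATES 16  /* assumed from the "(i) & (i+8)" comment */

static int16_t max16(int16_t a, int16_t b)
{
    return (a > b) ? a : b;
}

/* One butterfly: reads old states 2j and 2j+1, writes new states j and
   j + NUM_STATES/2. t plays the role of the local distance held in T2.
   No saturation is modelled; the sums are simply narrowed to 16 bits. */
void acs_butterfly(const int16_t *old_met, int16_t *new_met, int j, int16_t t)
{
    assert(j >= 0 && j < NUM_STATES / 2);

    int16_t a = old_met[2 * j];      /* old metric of state 2j   */
    int16_t b = old_met[2 * j + 1];  /* old metric of state 2j+1 */

    /* Each new state keeps the better of its two candidate paths. */
    new_met[j]                  = max16((int16_t)(a + t), (int16_t)(b - t));
    new_met[j + NUM_STATES / 2] = max16((int16_t)(a - t), (int16_t)(b + t));
}
```

Each call produces two new metrics from two old ones, which is the pair of states that the three execution packets update per pass.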

0 Archaeologist over 12 years ago in reply to Robert W

TI__Guru* 84285 points

The widest load available on C55x is 32 bits.
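
One way to picture a 32-bit access in this context is as two 16-bit metrics moved together, like the hi and lo halves of an accumulator. The C sketch below is only an illustration: pack_metrics and unpack_metrics are hypothetical helpers, and placing the first metric in the upper 16 bits is an assumption, not something the thread or SPRA776 confirms.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a 16-bit pattern as a signed value without relying on
   implementation-defined narrowing. */
static int16_t to_signed16(uint16_t v)
{
    return (v >= 0x8000u) ? (int16_t)((int32_t)v - 0x10000) : (int16_t)v;
}

/* Pack two 16-bit metrics into one 32-bit word (hi in bits 31..16,
   lo in bits 15..0; this ordering is an assumption). */
uint32_t pack_metrics(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Split a 32-bit word back into its two signed 16-bit halves. */
void unpack_metrics(uint32_t word, int16_t *hi, int16_t *lo)
{
    assert(hi && lo);
    *hi = to_signed16((uint16_t)(word >> 16));
    *lo = to_signed16((uint16_t)(word & 0xFFFFu));
}
```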
