Viterbi ACS optimization in C with intrinsics (C6740)

Could someone help me optimize the ACS (add-compare-select) butterfly of a Viterbi decoder in C? I would prefer to go as far as possible without writing assembly. I am compiling with the optimizer at level 3.

Is there a good example out there? Even for the C64x? Here is what I have so far (just started optimizing):

typedef union
{
    uint32_t integer;
    int16_t shorts[2];
} opt_u;

test_path_metrics_1.shorts[0] = path_metrics[start_states[0]] + current_branch_metric; // First pair goes in the lower bits
test_path_metrics_2.shorts[0] = path_metrics[start_states[1]] - current_branch_metric; // First pair goes in the lower bits

test_path_metrics_1.shorts[1] = path_metrics[start_states[0]] - current_branch_metric; // Second pair goes in the upper bits
test_path_metrics_2.shorts[1] = path_metrics[start_states[1]] + current_branch_metric; // Second pair goes in the upper bits

max_result.integer = _max2(test_path_metrics_1.integer, test_path_metrics_2.integer);

// A 1-true means select start state 1, a 0-false means select start state 0 (for both pairs)
cmp_result = _cmpgt2(test_path_metrics_2.integer, test_path_metrics_1.integer);

path_history[out_bit_idx][end_state_0] = start_states[cmp_result&1]; // Low result is in the LEAST significant bit
path_metrics_new[end_state_0] = max_result.shorts[0];

path_history[out_bit_idx][end_state_1] = start_states[(cmp_result>>1)&1]; // High result is in the MORE significant bit
path_metrics_new[end_state_1] = max_result.shorts[1];
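
For checking the packed logic on a PC before running it on the DSP, here is a minimal host-side sketch of the same butterfly. The helpers emu_max2 and emu_cmpgt2 are hypothetical stand-ins written for this example, not TI functions; they assume the behavior described in the comments above (per-halfword signed maximum, and a compare result with the low halfword in bit 0 and the high halfword in bit 1). The sketch also packs with shifts instead of the union, so it does not depend on byte order.

```c
#include <stdint.h>

/* Hypothetical host-side helpers; none of these names are TI APIs. */
static int16_t lo16(uint32_t v) { return (int16_t)(v & 0xFFFFu); }
static int16_t hi16(uint32_t v) { return (int16_t)(v >> 16); }

static uint32_t pack16(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Assumed _max2 behavior: signed maximum of each 16-bit half. */
static uint32_t emu_max2(uint32_t a, uint32_t b)
{
    int16_t h = hi16(a) > hi16(b) ? hi16(a) : hi16(b);
    int16_t l = lo16(a) > lo16(b) ? lo16(a) : lo16(b);
    return pack16(h, l);
}

/* Assumed _cmpgt2 behavior: bit 0 = (low a > low b), bit 1 = (high a > high b). */
static uint32_t emu_cmpgt2(uint32_t a, uint32_t b)
{
    return (uint32_t)(lo16(a) > lo16(b)) |
           ((uint32_t)(hi16(a) > hi16(b)) << 1);
}

typedef struct
{
    int16_t  metric0, metric1; /* surviving metrics for end states 0 and 1 */
    unsigned sel0, sel1;       /* 1 = start state 1 won, 0 = start state 0 won */
} acs_result_t;

/* One butterfly: pm0/pm1 are the metrics of the two start states. */
static acs_result_t acs_butterfly(int16_t pm0, int16_t pm1, int16_t bm)
{
    /* Low half holds the end-state-0 candidate, high half the end-state-1 candidate. */
    uint32_t t1 = pack16((int16_t)(pm0 - bm), (int16_t)(pm0 + bm));
    uint32_t t2 = pack16((int16_t)(pm1 + bm), (int16_t)(pm1 - bm));

    uint32_t mx  = emu_max2(t1, t2);
    uint32_t cmp = emu_cmpgt2(t2, t1);

    acs_result_t r;
    r.metric0 = lo16(mx);
    r.metric1 = hi16(mx);
    r.sel0 = cmp & 1u;
    r.sel1 = (cmp >> 1) & 1u;
    return r;
}
```

On a tie the compare yields 0, so start state 0 is kept, which matches the posted code.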

Thank you

Chris

  • Chris35513 said:
    Is there a good example out there? Even for the C64x? Here is what I have so far (just started optimizing): 

    According to http://www.ti.com/product/tms320tci6484, the TCI6484 device has a Viterbi Decoder Coprocessor (VCP2).

    The following post mentions the C64x devices that support VCP2:

    http://e2e.ti.com/support/dsp/omap_applications_processors/f/42/t/51108.aspx

    Regards,

    Shankari.

     


  • Thank you for the answer. I posted this on the C67x Single Core DSP Forum because I am using a C6747, although I did not make that clear in the post. I asked about a C64x example because, as far as I know, the C6747 instruction set is a superset of the C64x's, so such an example would apply. I did not mention the C64x because I wanted to use the coprocessor; I don't have one. As far as I understand, not all C64x devices have the coprocessor anyway.

    Is there anyone who has used intrinsics to optimize a Viterbi decoder? My current plan will look something like this for the inner loop (4 end states per iteration):

    _addsub2()
    _subadd2()
    _max2()
    _max2()
    _cmpgt2()
    _cmpgt2()

    Rather than what I had posted previously.
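
To make the data layout behind that plan concrete, here is a rough host-side sketch that processes two butterflies (4 end states) per call. It is only an illustration under assumptions: the emu_* helpers are hypothetical PC stand-ins for the packed 16-bit operations, not TI functions, and the packing (butterfly A in the low halfword, butterfly B in the high halfword) is my own choice. On the target, each add/subtract pair is the work a combined add-and-subtract intrinsic would be expected to do; which part of its result holds the sums and which the differences should be checked against the compiler documentation.

```c
#include <stdint.h>

/* Hypothetical host-side helpers; none of these names are TI APIs. */
static int16_t lo16(uint32_t v) { return (int16_t)(v & 0xFFFFu); }
static int16_t hi16(uint32_t v) { return (int16_t)(v >> 16); }

static uint32_t pack16(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Packed 16-bit add and subtract, wrapping (no saturation). */
static uint32_t emu_add2(uint32_t a, uint32_t b)
{
    return pack16((int16_t)(hi16(a) + hi16(b)), (int16_t)(lo16(a) + lo16(b)));
}

static uint32_t emu_sub2(uint32_t a, uint32_t b)
{
    return pack16((int16_t)(hi16(a) - hi16(b)), (int16_t)(lo16(a) - lo16(b)));
}

/* Packed signed maximum of each 16-bit half. */
static uint32_t emu_max2(uint32_t a, uint32_t b)
{
    int16_t h = hi16(a) > hi16(b) ? hi16(a) : hi16(b);
    int16_t l = lo16(a) > lo16(b) ? lo16(a) : lo16(b);
    return pack16(h, l);
}

/* bit 0 = (low a > low b), bit 1 = (high a > high b). */
static uint32_t emu_cmpgt2(uint32_t a, uint32_t b)
{
    return (uint32_t)(lo16(a) > lo16(b)) |
           ((uint32_t)(hi16(a) > hi16(b)) << 1);
}

typedef struct
{
    uint32_t metrics_end0; /* new end-state-0 metrics: B in high half, A in low half */
    uint32_t metrics_end1; /* new end-state-1 metrics, same packing */
    uint32_t sel_end0;     /* bit 0 = butterfly A, bit 1 = butterfly B; 1 = start state 1 won */
    uint32_t sel_end1;
} acs2_result_t;

/* pm_s0, pm_s1 and bm each hold butterfly B in the high half and A in the low half. */
static acs2_result_t acs_two_butterflies(uint32_t pm_s0, uint32_t pm_s1, uint32_t bm)
{
    uint32_t s0_plus  = emu_add2(pm_s0, bm); /* start 0 -> end 0 candidates */
    uint32_t s0_minus = emu_sub2(pm_s0, bm); /* start 0 -> end 1 candidates */
    uint32_t s1_plus  = emu_add2(pm_s1, bm); /* start 1 -> end 1 candidates */
    uint32_t s1_minus = emu_sub2(pm_s1, bm); /* start 1 -> end 0 candidates */

    acs2_result_t r;
    r.metrics_end0 = emu_max2(s0_plus, s1_minus);
    r.metrics_end1 = emu_max2(s0_minus, s1_plus);
    r.sel_end0 = emu_cmpgt2(s1_minus, s0_plus);
    r.sel_end1 = emu_cmpgt2(s1_plus, s0_minus);
    return r;
}
```

With this packing both start states use the same add and subtract pair, only with the roles of the two results swapped, so a separate subtract-then-add step may not be needed.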