Could someone help me optimize the add-compare-select (ACS) butterfly of a Viterbi decoder written in C? I would like to get as far as possible without resorting to assembly. I am compiling with optimization level 3 (-o3).
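For reference, the operation in question is the radix-2 ACS butterfly. A plain scalar version makes a handy bit-exact reference for testing any optimized variant. The sketch below assumes antipodal branch metrics (+bm on one branch, -bm on the other, with the same pairing as the packed code that follows) and "larger metric wins" selection. `acs_butterfly_ref` and its parameters are hypothetical names. Ties keep start state 0, which matches the strict greater-than of `_cmpgt2`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference for one radix-2 ACS butterfly (not TI code).
 * Start states s0 and s1 both feed end states e0 and e1. The branch metric bm
 * is added on s0->e0 and s1->e1 and subtracted on s0->e1 and s1->e0.
 * The larger metric survives; on a tie, start state s0 is kept. */
static void acs_butterfly_ref(const int16_t *pm_old, int16_t *pm_new,
                              uint8_t *history,
                              uint8_t s0, uint8_t s1,
                              uint8_t e0, uint8_t e1, int16_t bm)
{
    assert(s0 != s1 && e0 != e1);

    /* Add: assumes metrics are normalized so 16-bit sums cannot overflow. */
    int16_t a0 = (int16_t)(pm_old[s0] + bm); /* s0 -> e0 */
    int16_t b0 = (int16_t)(pm_old[s1] - bm); /* s1 -> e0 */
    int16_t a1 = (int16_t)(pm_old[s0] - bm); /* s0 -> e1 */
    int16_t b1 = (int16_t)(pm_old[s1] + bm); /* s1 -> e1 */

    /* Compare-select for e0: store the surviving start state and metric. */
    history[e0] = (b0 > a0) ? s1 : s0;
    pm_new[e0]  = (b0 > a0) ? b0 : a0;

    /* Compare-select for e1. */
    history[e1] = (b1 > a1) ? s1 : s0;
    pm_new[e1]  = (b1 > a1) ? b1 : a1;
}
```

For example, with old metrics {10, 6}, start states 0 and 1, end states 0 and 1, and bm = 3, end state 0 keeps metric 13 from start state 0, and end state 1 keeps metric 9 from start state 1.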
Is there a good example out there, perhaps even one for the C64x? Here is what I have so far (I have only just started optimizing):
```c
#include <stdint.h>
#include <c6x.h> /* TI C6000 intrinsics such as _max2 and _cmpgt2 */

typedef union
{
    uint32_t integer;
    int16_t shorts[2]; /* shorts[0] is the low halfword only in little-endian builds */
} opt_u;

opt_u test_path_metrics_1, test_path_metrics_2, max_result;
int cmp_result; /* _cmpgt2 returns its two compare bits in bits 1:0 */

test_path_metrics_1.shorts[0] = path_metrics[start_states[0]] + current_branch_metric; // First pair goes in the lower bits
test_path_metrics_2.shorts[0] = path_metrics[start_states[1]] - current_branch_metric; // First pair goes in the lower bits
test_path_metrics_1.shorts[1] = path_metrics[start_states[0]] - current_branch_metric; // Second pair goes in the upper bits
test_path_metrics_2.shorts[1] = path_metrics[start_states[1]] + current_branch_metric; // Second pair goes in the upper bits
max_result.integer = _max2(test_path_metrics_1.integer, test_path_metrics_2.integer);
// A 1 (true) means select start state 1, a 0 (false) means select start state 0 (for both pairs)
cmp_result = _cmpgt2(test_path_metrics_2.integer, test_path_metrics_1.integer);
path_history[out_bit_idx][end_state_0] = start_states[cmp_result & 1]; // Low-pair result is in bit 0
path_metrics_new[end_state_0] = max_result.shorts[0];
path_history[out_bit_idx][end_state_1] = start_states[(cmp_result >> 1) & 1]; // High-pair result is in bit 1
path_metrics_new[end_state_1] = max_result.shorts[1];
```
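Because `_max2` and `_cmpgt2` exist only in the TI C6000 compiler, one way to check the packed logic before moving to the target is to emulate the two intrinsics on a host PC. This is a sketch under stated assumptions, not TI code. `max2_emu`, `cmpgt2_emu`, `pack2_lanes` and `acs_butterfly_packed` are hypothetical names. The emulations follow the documented behaviour: a lane-wise signed 16-bit max, and a signed greater-than that puts the low-lane result in bit 0 and the high-lane result in bit 1. The lanes are packed with shifts rather than through `opt_u`, so lane 0 is always the low halfword whether the build is little- or big-endian.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side helpers; they only mimic two packed 16-bit lanes. */
static int16_t lane_lo(uint32_t w) { return (int16_t)(w & 0xFFFFu); }
static int16_t lane_hi(uint32_t w) { return (int16_t)(w >> 16); }

static uint32_t pack2_lanes(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* Stand-in for _max2: signed max of each 16-bit lane. */
static uint32_t max2_emu(uint32_t a, uint32_t b)
{
    int16_t lo = lane_lo(a) > lane_lo(b) ? lane_lo(a) : lane_lo(b);
    int16_t hi = lane_hi(a) > lane_hi(b) ? lane_hi(a) : lane_hi(b);
    return pack2_lanes(hi, lo);
}

/* Stand-in for _cmpgt2: bit 0 = (low a > low b), bit 1 = (high a > high b). */
static unsigned cmpgt2_emu(uint32_t a, uint32_t b)
{
    return (unsigned)(lane_lo(a) > lane_lo(b)) |
           ((unsigned)(lane_hi(a) > lane_hi(b)) << 1);
}

/* Packed butterfly mirroring the fragment: low lane -> e0, high lane -> e1. */
static void acs_butterfly_packed(const int16_t *pm_old, int16_t *pm_new,
                                 uint8_t *history, const uint8_t start_states[2],
                                 uint8_t e0, uint8_t e1, int16_t bm)
{
    assert(start_states[0] != start_states[1] && e0 != e1);

    int16_t p0 = pm_old[start_states[0]];
    int16_t p1 = pm_old[start_states[1]];
    uint32_t from_s0 = pack2_lanes((int16_t)(p0 - bm), (int16_t)(p0 + bm));
    uint32_t from_s1 = pack2_lanes((int16_t)(p1 + bm), (int16_t)(p1 - bm));

    uint32_t best = max2_emu(from_s0, from_s1);
    unsigned sel = cmpgt2_emu(from_s1, from_s0); /* 1 selects start state 1 */

    history[e0] = start_states[sel & 1u];
    pm_new[e0]  = lane_lo(best);
    history[e1] = start_states[(sel >> 1) & 1u];
    pm_new[e1]  = lane_hi(best);
}
```

With old metrics {-20, -5}, start states {0, 1} and a branch metric of 4, both end states select start state 1, with metrics -9 and -1. The negative values exercise the signed lane comparison. On the target, the two emulated calls would be replaced by the real intrinsics.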
Thank you
Chris