Hi,
I recently upgraded my code generation tools from 6.0.28 to 6.1.19 and then to 7.0.4, and found that my optimized loop, which fit into the SPLOOP buffer before the upgrade (with the optimization flag set to -o2), no longer fits. The generated asm code is also different: it is longer and less efficient. Below are the software pipeline information headers from the asm files generated by the different versions of the code generation tools from the same C source file:
Using TMS320C6x Assembler PC v7.0.4:
;*----------------------------------------------------------------------------*
;*   SOFTWARE PIPELINE INFORMATION
;*
;*      Loop source line                 : 191
;*      Loop opening brace source line   : 192
;*      Loop closing brace source line   : 253
;*      Known Minimum Trip Count         : 1
;*      Known Maximum Trip Count         : 65536
;*      Known Max Trip Count Factor      : 1
;*      Loop Carried Dependency Bound(^) : 19
;*      Unpartitioned Resource Bound     : 14
;*      Partitioned Resource Bound(*)    : 14
;*      Resource Partition:
;*                                A-side   B-side
;*      .L units                     0        0
;*      .S units                    11       11
;*      .D units                    13       12
;*      .M units                     7        6
;*      .X cross paths               9        9
;*      .T address paths            13       13
;*      Long read paths              0        0
;*      Long write paths             0        0
;*      Logical  ops (.LS)           5        4     (.L or .S unit)
;*      Addition ops (.LSD)         12       13     (.L or .S or .D unit)
;*      Bound(.L .S .LS)             8        8
;*      Bound(.L .S .D .LS .LSD)    14*      14*
;*
;*      Searching for software pipeline schedule at ...
;*         ii = 19 Did not find schedule
;*         ii = 20 Did not find schedule
;*         ii = 21 Schedule found with 3 iterations in parallel
;*      Done
;*
;*      Epilog not entirely removed
;*      Collapsed epilog stages       : 1
;*
;*      Prolog not removed
;*      Collapsed prolog stages       : 0
;*
;*      Minimum required memory pad   : 0 bytes
;*
;*      For further improvement on this loop, try option -mh14
;*
;*      Minimum safe trip count       : 2
;*----------------------------------------------------------------------------*
Using TMS320C6x Assembler PC v6.1.19 and TMS320C6x Assembler PC v6.1.12:
;*----------------------------------------------------------------------------*
;*   SOFTWARE PIPELINE INFORMATION
;*
;*      Loop source line                 : 191
;*      Loop opening brace source line   : 192
;*      Loop closing brace source line   : 253
;*      Known Minimum Trip Count         : 1
;*      Known Maximum Trip Count         : 65536
;*      Known Max Trip Count Factor      : 1
;*      Loop Carried Dependency Bound(^) : 14
;*      Unpartitioned Resource Bound     : 13
;*      Partitioned Resource Bound(*)    : 14
;*      Resource Partition:
;*                                A-side   B-side
;*      .L units                     0        0
;*      .S units                    11       11
;*      .D units                    13       12
;*      .M units                     7        6
;*      .X cross paths               9        9
;*      .T address paths            13       13
;*      Long read paths              0        0
;*      Long write paths             0        0
;*      Logical  ops (.LS)           5        4     (.L or .S unit)
;*      Addition ops (.LSD)         11       13     (.L or .S or .D unit)
;*      Bound(.L .S .LS)             8        8
;*      Bound(.L .S .D .LS .LSD)    14*      14*
;*
;*      Searching for software pipeline schedule at ...
;*         ii = 14 Did not find schedule
;*         ii = 15 Did not find schedule
;*         ii = 16 Did not find schedule
;*         ii = 17 Schedule found with 4 iterations in parallel
;*      Done
;*
;*      Epilog not entirely removed
;*      Collapsed epilog stages       : 2
;*
;*      Prolog not entirely removed
;*      Collapsed prolog stages       : 1
;*
;*      Minimum required memory pad   : 0 bytes
;*
;*      For further improvement on this loop, try option -mh28
;*
;*      Minimum safe trip count       : 2
;*----------------------------------------------------------------------------*
Using TMS320C6x Assembler PC v6.0.28:
;*----------------------------------------------------------------------------*
;*   SOFTWARE PIPELINE INFORMATION
;*
;*      Loop source line                 : 191
;*      Loop opening brace source line   : 192
;*      Loop closing brace source line   : 253
;*      Known Minimum Trip Count         : 1
;*      Known Maximum Trip Count         : 65536
;*      Known Max Trip Count Factor      : 1
;*      Loop Carried Dependency Bound(^) : 12
;*      Unpartitioned Resource Bound     : 11
;*      Partitioned Resource Bound(*)    : 13
;*      Resource Partition:
;*                                A-side   B-side
;*      .L units                     0        0
;*      .S units                    10       10
;*      .D units                    11        9
;*      .M units                     7        6
;*      .X cross paths               9        7
;*      .T address paths            13*      13*
;*      Long read paths              0        0
;*      Long write paths             0        0
;*      Logical  ops (.LS)           3        5     (.L or .S unit)
;*      Addition ops (.LSD)          9        9     (.L or .S or .D unit)
;*      Bound(.L .S .LS)             7        8
;*      Bound(.L .S .D .LS .LSD)    11       11
;*
;*      Searching for software pipeline schedule at ...
;*         ii = 13 Did not find schedule
;*         ii = 14 Schedule found with 3 iterations in parallel
;*      Done
;*
;*      Loop will be splooped
;*      Collapsed epilog stages       : 0
;*      Collapsed prolog stages       : 0
;*      Minimum required memory pad   : 0 bytes
;*
;*      Minimum safe trip count       : 1
;*----------------------------------------------------------------------------*
Note also the difference in ii: it goes from 14 (with 6.0.28) to 17 (with 6.1.19) and then to 21 (with 7.0.4). The loop carried dependency bound grows as well, from 12 to 14 to 19. This degradation affects not only this loop but other loops too. Having these loops fit into SPLOOP is essential to our project, and we need to update our code generation tools to 7.0.4. We have spent a lot of effort optimizing these loops to fit into SPLOOP, and we can't afford to examine unexpected behavior and make major changes with every compiler/tool upgrade.
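To make the comparison easier to read, here is a small sketch that tabulates the three reports above. It takes the lower bound on ii to be the larger of the loop carried dependency bound and the partitioned resource bound, which matches the first ii each compiler version tries in its search. The SPLOOP limit of 14 cycles is my assumption about the C64x+ loop buffer and should be checked against the CPU reference guide; the names `REPORTS`, `min_ii`, and `summarize` are only for illustration.

```python
# Assumption: the SPLOOP loop buffer accepts loops with ii of at most 14 cycles.
SPLOOP_MAX_II = 14

# Numbers copied from the three software pipeline reports in this post.
REPORTS = {
    "6.0.28": {"dep_bound": 12, "res_bound": 13, "found_ii": 14},
    "6.1.19": {"dep_bound": 14, "res_bound": 14, "found_ii": 17},
    "7.0.4": {"dep_bound": 19, "res_bound": 14, "found_ii": 21},
}


def min_ii(report):
    """Lower bound on ii: the larger of the dependency and resource bounds."""
    return max(report["dep_bound"], report["res_bound"])


def summarize(reports, sploop_limit=SPLOOP_MAX_II):
    """Return (version, lower bound, found ii, gap, fits SPLOOP) per version."""
    rows = []
    for version, report in reports.items():
        lower = min_ii(report)
        found = report["found_ii"]
        rows.append((version, lower, found, found - lower, found <= sploop_limit))
    return rows


if __name__ == "__main__":
    for version, lower, found, gap, fits in summarize(REPORTS):
        print(f"{version}: min ii {lower}, found ii {found}, "
              f"gap {gap}, fits SPLOOP: {fits}")
```

If the 14-cycle assumption holds, only the 6.0.28 schedule is eligible for SPLOOP, and with 7.0.4 the dependency bound alone (19) already rules it out before the scheduler even starts searching.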
Can you recommend any compiler flag changes (beyond -o2) that would give me the same performance as before? Or is this a bug (or undocumented feature) in the latest code generation tools that will be fixed soon?
Thanks,
-- Louis