Tool/software: TI C/C++ Compiler
Previous releases of DSPLIB on ti.com were compiled using the C6000 7.4.2 code generation tools. Starting with Processor-SDK 5.2, DSPLIB is included in Processor-SDK and is compiled with the C6000 8.x code generation tools.
However, with the C6000 8.3.3 code generation tools, the following DSPLIB kernels for C66x are functionally incorrect (they fail their unit tests) when compiled with -o3 optimization:
DSPF_sp_fftSPxSP, DSPF_sp_ifftSPxSP, DSPF_sp_svd_cmplx.
Attached are CCS projects for these kernels and a readme on how to reproduce the failures.
Attachments: DSPF_sp_ifftSPxSP.zip, DSPF_sp_svd_cmplx.zip
Here are the steps to reproduce the failures of 3 DSPLIB kernels:
- DSPF_sp_fftSPxSP,
- DSPF_sp_ifftSPxSP,
- DSPF_sp_svd_cmplx

=================
DSPF_sp_fftSPxSP:
=================
1. Import project at: DSPF_sp_fftSPxSP\c66\DSPF_sp_fftSPxSP_66_LE_ELF
2. Set project configuration to Debug, and build project
3. Load DSPF_sp_fftSPxSP\c66\DSPF_sp_fftSPxSP_66_LE_ELF\Debug\DSPF_sp_fftSPxSP_66_LE_ELF.OUT to C66x DSP and run.
   CCS console should print the following (cycle numbers may not be exactly the same):
   DSPF_sp_fftSPxSP Iter#: 1 Intrinsic Successful SA Successful N = 8 radix = 2 natC: 4292 optC: 1166 SA: 433
   DSPF_sp_fftSPxSP Iter#: 2 Intrinsic Successful SA Successful N = 16 radix = 4 natC: 7115 optC: 2246 SA: 835
   DSPF_sp_fftSPxSP Iter#: 3 Intrinsic Successful SA Successful N = 32 radix = 2 natC: 17506 optC: 6327 SA: 2320
   DSPF_sp_fftSPxSP Iter#: 4 Intrinsic Successful SA Successful N = 64 radix = 4 natC: 32839 optC: 13265 SA: 4902
   DSPF_sp_fftSPxSP Iter#: 5 Intrinsic Successful SA Successful N = 128 radix = 2 natC: 81841 optC: 34259 SA: 12639
   DSPF_sp_fftSPxSP Iter#: 6 Intrinsic Successful SA Successful N = 256 radix = 4 natC: 158393 optC: 71552 SA: 26428
   DSPF_sp_fftSPxSP Iter#: 7 Intrinsic Successful SA Successful N = 512 radix = 2 natC: 384393 optC: 174389 SA: 64253
   DSPF_sp_fftSPxSP Iter#: 8 Intrinsic Successful SA Successful N = 1024 radix = 4 natC: 750925 optC: 361993 SA: 133625
   Memory: 1216 bytes
   Cycles: 12639 (N=128) 26428 (N=256)
4. Change project configuration to Release, and rebuild the project
5. Load DSPF_sp_fftSPxSP\c66\DSPF_sp_fftSPxSP_66_LE_ELF\Release\DSPF_sp_fftSPxSP_66_LE_ELF.OUT to C66x DSP and run.
   CCS console should print the following (notice the failure at the second line):
   DSPF_sp_fftSPxSP Iter#: 1 Intrinsic Successful SA Successful N = 8 radix = 2 natC: 411 optC: 179 SA: 256
   DSPF_sp_fftSPxSP Iter#: 2 Intrinsic Successful SA Failure max_pct_diff = 5007338130857002139.091492 N = 16 radix = 4 natC: 576 optC: 207 SA: 213
   DSPF_sp_fftSPxSP Iter#: 3 Intrinsic Successful SA Successful N = 32 radix = 2 natC: 1256 optC: 354 SA: 258
   DSPF_sp_fftSPxSP Iter#: 4 Intrinsic Successful SA Successful N = 64 radix = 4 natC: 2355 optC: 531 SA: 457
   DSPF_sp_fftSPxSP Iter#: 5 Intrinsic Successful SA Successful N = 128 radix = 2 natC: 5530 optC: 1414 SA: 986
   DSPF_sp_fftSPxSP Iter#: 6 Intrinsic Successful SA Successful N = 256 radix = 4 natC: 10930 optC: 2731 SA: 1915
   DSPF_sp_fftSPxSP Iter#: 7 Intrinsic Successful SA Successful N = 512 radix = 2 natC: 25586 optC: 6250 SA: 4308
   DSPF_sp_fftSPxSP Iter#: 8 Intrinsic Successful SA Successful N = 1024 radix = 4 natC: 51038 optC: 12336 SA: 8408
   Memory: 2496 bytes
   Cycles: 986 (N=128) 1915 (N=256)

==================
DSPF_sp_ifftSPxSP:
==================
Import and build the project with Debug and Release configurations respectively, similarly to DSPF_sp_fftSPxSP.
1. With Debug configuration, program execution generates console output below:
   DSPF_sp_ifftSPxSP Iter#: 1 Intrinsic Successful SA Successful N = 8 radix = 2 natC: 4413 optC: 1205 SA: 471
   DSPF_sp_ifftSPxSP Iter#: 2 Intrinsic Successful SA Successful N = 16 radix = 4 natC: 7307 optC: 2311 SA: 907
   DSPF_sp_ifftSPxSP Iter#: 3 Intrinsic Successful SA Successful N = 32 radix = 2 natC: 17664 optC: 6440 SA: 2454
   DSPF_sp_ifftSPxSP Iter#: 4 Intrinsic Successful SA Successful N = 64 radix = 4 natC: 33405 optC: 13488 SA: 5171
   DSPF_sp_ifftSPxSP Iter#: 5 Intrinsic Successful SA Successful N = 128 radix = 2 natC: 82393 optC: 34662 SA: 13128
   DSPF_sp_ifftSPxSP Iter#: 6 Intrinsic Successful SA Successful N = 256 radix = 4 natC: 160801 optC: 72431 SA: 27451
   DSPF_sp_ifftSPxSP Iter#: 7 Intrinsic Successful SA Successful N = 512 radix = 2 natC: 387183 optC: 176077 SA: 66293
   DSPF_sp_ifftSPxSP Iter#: 8 Intrinsic Successful SA Successful N = 1024 radix = 4 natC: 762014 optC: 365599 SA: 137716
   Memory: 1344 bytes
   Cycles: 13128 (N=128) 27451 (N=256)
2. With Release configuration, program execution generates console output below:
   DSPF_sp_ifftSPxSP Iter#: 1 Intrinsic Successful SA Successful N = 8 radix = 2 natC: 534 optC: 177 SA: 262
   DSPF_sp_ifftSPxSP Iter#: 2 Intrinsic Successful SA Failure max_pct_diff = 5007338130857002139.091492 N = 16 radix = 4 natC: 690 optC: 189 SA: 272
   DSPF_sp_ifftSPxSP Iter#: 3 Intrinsic Successful SA Successful N = 32 radix = 2 natC: 1398 optC: 306 SA: 296
   DSPF_sp_ifftSPxSP Iter#: 4 Intrinsic Successful SA Successful N = 64 radix = 4 natC: 2537 optC: 535 SA: 561
   DSPF_sp_ifftSPxSP Iter#: 5 Intrinsic Successful SA Successful N = 128 radix = 2 natC: 5856 optC: 1465 SA: 1066
   DSPF_sp_ifftSPxSP Iter#: 6 Intrinsic Successful SA Successful N = 256 radix = 4 natC: 11480 optC: 2888 SA: 2441
   DSPF_sp_ifftSPxSP Iter#: 7 Intrinsic Successful SA Successful N = 512 radix = 2 natC: 26840 optC: 6534 SA: 4410
   DSPF_sp_ifftSPxSP Iter#: 8 Intrinsic Successful SA Successful N = 1024 radix = 4 natC: 53444 optC: 13142 SA: 10485
   Memory: 2688 bytes
   Cycles: 1066 (N=128) 2441 (N=256)

==================
DSPF_sp_svd_cmplx:
==================
1. Import project at: DSPF_sp_svd_cmplx\c66\DSPF_sp_svd_cmplx_66_LE_ELF
2. Download MATHLIB 3.1.2.1 from http://software-dl.ti.com/sdoemb/sdoemb_public_sw/mathlib/latest/index_FDS.html and install it
3. Open project DSPF_sp_svd_cmplx_66_LE_ELF, go to Build Settings -> Build -> Variables, and set MATHLIB_INSTALL_DIR to MATHLIB 3.1.2.1 installation root folder.
4. Build the project with Debug and Release configurations respectively, similarly to DSPF_sp_fftSPxSP.
5. With Debug configuration, program execution generates console output below:
   DSPF_sp_svd_cmplx Iter#: 0 Result Successful order= 3x 3 natC: 143512 optC: 45931
   DSPF_sp_svd_cmplx Iter#: 1 Result Successful order= 4x 2 natC: 79790 optC: 28271
   DSPF_sp_svd_cmplx Iter#: 2 Result Successful order= 3x 3 natC: 142265 optC: 45497
   DSPF_sp_svd_cmplx Iter#: 3 Result Successful order= 3x 3 natC: 142769 optC: 45871
   DSPF_sp_svd_cmplx Iter#: 4 Result Successful order= 3x 3 natC: 140245 optC: 45198
   DSPF_sp_svd_cmplx Iter#: 5 Result Successful order= 5x 4 natC: 73164 optC: 26094
   DSPF_sp_svd_cmplx Iter#: 6 Result Successful order= 8x 5 natC: 414471 optC: 147010
   DSPF_sp_svd_cmplx Iter#: 7 Result Successful order= 5x 5 natC: 71462 optC: 29171
   DSPF_sp_svd_cmplx Iter#: 8 Result Successful order= 4x 5 natC: 121316 optC: 27559
   DSPF_sp_svd_cmplx Iter#: 9 Result Successful order= 2x 4 natC: 79745 optC: 23870
   DSPF_sp_svd_cmplx Iter#: 10 Result Successful order=16x16 natC: 5632326 optC: 1916451
   DSPF_sp_svd_cmplx Iter#: 11 Result Successful order=32x32 natC: 31444014 optC: 12341916
   DSPF_sp_svd_cmplx Iter#: 12 Result Successful order=64x64 natC: 208170164 optC: 91753342
   Memory: 0 bytes
   Cycles: 12341916 (order=32) 91753342 (order=64)
6. With Release configuration, program execution generates console output below:
   DSPF_sp_svd_cmplx Iter#: 0
   opt decomp: orig=4.000000e+00 3.000000e+00 calc=2.901682e+00 4.097286e+00 error=9.831762e-02 9.728622e-02
   opt decomp: orig=6.000000e+00 5.000000e+00 calc=5.133464e+00 5.860123e+00 error=1.334643e-01 1.398768e-01
   opt decomp: orig=1.000000e+01 9.000000e+00 calc=8.964866e+00 1.002669e+01 error=3.513432e-02 2.669334e-02
   opt decomp: orig=1.200000e+01 1.100000e+01 calc=1.097500e+01 1.203735e+01 error=2.500439e-02 3.734875e-02
   opt decomp: orig=1.600000e+01 1.500000e+01 calc=1.502695e+01 1.597042e+01 error=2.695179e-02 2.958012e-02
   opt decomp: orig=0.000000e+00 0.000000e+00 calc=2.897382e-03 -1.086049e-03 error=2.897382e-03 1.086049e-03
   Result Failure order= 3
   DSPF_sp_svd_cmplx Iter#: 1
   opt decomp: orig=3.000000e+00 4.000000e+00 calc=4.038859e+00 2.964571e+00 error=3.885889e-02 3.542852e-02
   opt decomp: orig=7.000000e+00 8.000000e+00 calc=8.020570e+00 6.980567e+00 error=2.056980e-02 1.943254e-02
   opt decomp: orig=1.100000e+01 1.200000e+01 calc=1.200228e+01 1.099657e+01 error=2.282143e-03 3.433228e-03
   opt decomp: orig=1.500000e+01 1.600000e+01 calc=1.598400e+01 1.501256e+01 error=1.600456e-02 1.256084e-02
   Result Failure order= 4
   DSPF_sp_svd_cmplx Iter#: 2 Result Successful order= 3x 3 natC: 114729 optC: 24938
   DSPF_sp_svd_cmplx Iter#: 3 Result Successful order= 3x 3 natC: 114739 optC: 24998
   DSPF_sp_svd_cmplx Iter#: 4 Result Successful order= 3x 3 natC: 112961 optC: 24774
   DSPF_sp_svd_cmplx Iter#: 5
   opt decomp: orig=0.000000e+00 0.000000e+00 calc=-5.656855e+00 -5.656852e+00 error=5.656855e+00 5.656852e+00
   Result Failure order= 5
   DSPF_sp_svd_cmplx Iter#: 6 Result Successful order= 8x 5 natC: 341907 optC: 63431
   DSPF_sp_svd_cmplx Iter#: 7 Result Successful order= 5x 5 natC: 43460 optC: 11813
   DSPF_sp_svd_cmplx Iter#: 8
   opt decomp: orig=0.000000e+00 0.000000e+00 calc=-5.656855e+00 -5.656852e+00 error=5.656855e+00 5.656852e+00
   Result Failure order= 4
   DSPF_sp_svd_cmplx Iter#: 9 Result Successful order= 2x 4 natC: 63173 optC: 11494
   DSPF_sp_svd_cmplx Iter#: 10
   opt decomp: orig=5.345317e-01 3.086337e-01 calc=6.736217e-01 -9.613143e-02 error=3.649880e-01 6.306632e-01
   opt decomp: orig=1.717277e-01 9.476302e-01 calc=-1.710862e-02 9.526858e-01 error=9.647388e-01 7.809581e-01
   opt decomp: orig=2.264168e-01 7.022309e-01 calc=9.213583e-01 1.006981e+00 error=2.191274e-01 7.805644e-01
   opt decomp: orig=1.246986e-01 4.947661e-01 calc=7.281303e-01 7.588238e-02 error=2.333643e-01 4.881626e-02
   opt decomp: orig=3.896298e-01 8.389539e-02 calc=9.840196e-01 6.467397e-01 error=9.001242e-01 2.571099e-01
   opt decomp: orig=3.680532e-01 2.772301e-01 calc=3.895895e-01 -1.534027e-01 error=1.123593e-01 5.214559e-01
   opt decomp: orig=5.353862e-01 9.834590e-01 calc=1.091957e+00 1.232446e+00 error=1.084980e-01 6.970596e-01
   opt decomp: orig=6.464736e-01 7.656789e-01 calc=9.002686e-03 4.127785e-01 error=7.566762e-01 2.336951e-01
   opt decomp: orig=7.802362e-01 7.671438e-01 calc=6.346341e-01 -1.226493e-01 error=1.325097e-01 9.028854e-01
   opt decomp: orig=1.519211e-01 8.229621e-01 calc=6.530185e-01 3.441887e-01 error=1.699436e-01 1.922676e-01
   opt decomp: orig=3.146763e-01 6.254768e-01 calc=-1.995233e-01 5.356525e-01 error=8.250001e-01 2.209761e-01
   opt decomp: orig=9.172033e-01 3.469039e-01 calc=5.908008e-01 6.356075e-01 error=2.438969e-01 2.815958e-01
   opt decomp: orig=4.011658e-01 5.197607e-01 calc=1.008279e+00 3.441169e-01 error=4.885182e-01 5.704895e-02
   opt decomp: orig=7.854244e-01 6.067690e-01 calc=7.777394e-01 7.605299e-01 error=1.709704e-01 2.489448e-02
   opt decomp: orig=8.699301e-01 9.315470e-01 calc=9.965378e-01 5.573298e-01 error=6.499082e-02 3.126003e-01
   opt decomp: orig=5.818964e-01 7.584155e-01 calc=6.268492e-01 1.052595e+00 error=1.315663e-01 4.706985e-01
   opt decomp: orig=3.556322e-01 3.892331e-01 calc=8.805515e-01 1.533649e+00 error=4.913184e-01 1.178017e+00
   opt decomp: orig=8.269295e-01 2.002319e-01 calc=5.081270e-01 5.852357e-01 error=3.078950e-01 2.416938e-01
   opt decomp: orig=4.635151e-01 4.159063e-01 calc=3.925760e-01 -2.755134e-01 error=2.333024e-02 7.390285e-01
   opt decomp: orig=1.264382e-01 9.791864e-01 calc=5.740923e-01 -4.270523e-01 error=4.050940e-01 5.534905e-01
   opt decomp: orig=9.584643e-01 2.126224e-01 calc=7.625121e-01 7.130163e-01 error=5.498896e-01 2.454481e-01
   opt decomp: orig=4.090396e-01 7.374798e-01 calc=9.346265e-02 5.518734e-03 error=6.440172e-01 4.035209e-01
   opt decomp: orig=7.578967e-01 7.801141e-01 calc=5.286452e-01 6.019846e-01 error=2.514690e-01 1.559120e-01
   opt decomp: orig=2.807703e-02 9.568468e-01 calc=6.185585e-01 -1.667596e-01 error=3.382883e-01 1.948366e-01
   opt decomp: orig=7.569506e-01 3.187353e-01 calc=1.322550e-01 5.682697e-01 error=1.864803e-01 1.886809e-01
   opt decomp: orig=5.895566e-01 2.429884e-01 calc=4.547104e-01 8.814226e-01 error=2.117220e-01 2.918661e-01
   opt decomp: orig=9.560533e-01 4.339732e-02 calc=5.732488e-01 1.535505e+00 error=5.298515e-01 5.794517e-01
   opt decomp: orig=5.935850e-02 3.191321e-01 calc=4.168135e-02 -2.175532e-01 error=2.774507e-01 2.769117e-01
   opt decomp: orig=9.150364e-01 4.418775e-01 calc=3.879997e-01 8.951935e-01 error=5.387777e-02 1.984298e-02
   opt decomp: orig=1.188391e-01 5.722526e-01 calc=5.535980e-01 2.873653e-01 error=1.865453e-02 1.685263e-01
   opt decomp: orig=2.367321e-01 4.958647e-01 calc=3.929330e-01 2.839139e-02 error=1.029318e-01 2.083407e-01
   opt decomp: orig=4.060793e-01 4.769738e-01 calc=5.255923e-01 7.101768e-01 error=4.861856e-02 3.040975e-01
   opt decomp: orig=4.269539e-01 8.730125e-01 calc=8.175417e-01 8.140490e-01 error=5.547076e-02 3.870951e-01
   opt decomp: orig=3.819697e-01 3.582263e-01 calc=2.343700e-01 3.702796e-01 error=1.238563e-01 1.169002e-02
   ....
   ....
   ....
   Result Failure order=64
   Memory: 0 bytes
   Cycles: 3709381 (order=32) 27881787 (order=64)
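For readers unfamiliar with this kind of check, the C sketch below shows one plausible way a test driver could compare an optimized kernel's output against a reference and report a worst-case percent difference, similar in spirit to the max_pct_diff value in the logs above. This is an assumption for illustration only: the names max_percent_diff and outputs_match, the zero-reference fallback, and the tolerance handling are hypothetical and are not taken from the DSPLIB test sources.

```c
#include <math.h>

/* Hypothetical helper (not DSPLIB code): returns the largest percent
 * difference between a reference output and an output under test. */
double max_percent_diff(const float *ref, const float *out, int n)
{
    double worst = 0.0;
    for (int i = 0; i < n; i++) {
        double diff  = fabs((double)ref[i] - (double)out[i]);
        double denom = fabs((double)ref[i]);
        /* Assumption: when the reference is (near) zero, fall back to the
         * absolute difference so the ratio stays finite. */
        if (denom < 1e-12) {
            denom = 1.0;
        }
        double pct = 100.0 * diff / denom;
        /* Keep NaN once seen, so corrupted outputs cannot pass silently. */
        if (pct > worst || isnan(pct)) {
            worst = pct;
        }
    }
    return worst;
}

/* Hypothetical pass/fail rule: every element must be within tol_pct percent.
 * A NaN result compares false and is therefore reported as a mismatch. */
int outputs_match(const float *ref, const float *out, int n, double tol_pct)
{
    return max_percent_diff(ref, out, n) <= tol_pct;
}
```

With a check of this shape, a single badly wrong or uninitialized output element is enough to produce an enormous percent difference, which would be consistent with the very large max_pct_diff reported for N = 16 in the Release builds.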
Thanks for your help.
Jianzhong
Attachment: DSPF_sp_fftSPxSP.zip