Hello,
I would like to know the difference between *.lib and *.ae66 files. Could you tell me?
Also, how do I use *.ae66 files in a CCS v5 project?
I added tsu_a.ae66 and tsu_c.ae66 to the project, but when I compile it I get the following errors:
I have added the source and header files to the project as follows:
Thank you.
They're both extensions for library files. In general, .ae66 identifies a library in ELF format and .lib identifies one in COFF format, so that the two can be told apart.
Best Regards,
Chad
Hi tianxing,
It looks like you are using the TSU component from MCSDK Video 2.0. If that is true, please define tsuContext (and its entries) in your application to address the last linking error.
The first two linking errors can be due to project settings. If it is possible, please provide the complete compilation log or the CCS project so that we can take a look.
Thanks,
Hongmei
Thank you, I have resolved the question.
Hi Hongmei,
I am using the TSU component from MCSDK Video 2.0. However, I found that it consumes too much time.
Could you tell me the expected performance of the TSU component?
The number of cycles consumed by TSU depends on input/output resolutions, as well as memory/cache configuration for the application.
Below please find the number of cycles (in millions) taken per frame in our benchmarking.
These numbers are obtained with:
L1D cache: 32K
L1P cache: 32K
L2 cache: 64K
DDR: cache enabled; pre-fetch enabled.
TSU scratch: placed in local L2
Program: placed in MSMC
We are also optimizing the cycle performance of TSU. The optimized TSU will be packaged in the next MCSDK Video release.
Thanks, Hongmei.
I found that the TSU has two interpolation algorithms: one based on the bicubic algorithm and another based on the polyphase algorithm. What are the differences between them in performance?
I have tried them without any memory/cache configuration, and the result is not satisfactory. Could you provide an example for us?
In the TSU component directory, I can't find a datasheet with the benchmarking or further information.
I have some other questions about the TSU in the forum thread below. Could you give me some advice?
http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/198056.aspx
Thank you for your help again.
Tianxing
Hi Tianxing,
Attached below please find our optimized TSU (CPU copy replaced with EDMA transfers) along with the TSU unit test application. Please unzip it into <mcsdk_video_2_0_0_10_install_dir>\components\ti\mas\tsu and try it out.
1830.tsu.zip
Thank you for your reply.
I have built the project and it runs successfully for an image resolution of 176*144. If my image resolution is 1920*1088, what should the value of SIU_TSU_SCRATCH_SIZE be? I don't know the relationship between the scratch size and the image resolution.
When the image resolution is 1920*1088, I can't read the file successfully; the program hangs at line 339 of main.c. My file is attached below:
4760.yuv420_1080p.rar
I also have some questions about the use of EDMA3. You used the ECPY APIs; where can I find the documentation for them? What should I do if I want to use EDMA3?
Tianxing.
Glad to know that you can build and run the TSU unit test application.
The TSU unit test provides two ways for data IO: 1) fread and fwrite; 2) read input data from pre-loaded DDR (starting from 0x85000000) and also save output data in DDR (starting from 0x88000000). For testing HD, e.g., 1920x1088 as you tried, please use method 2 to avoid the slow fread and fwrite, as follows:
1) Use READ_INPUT_FROM_DDR2 in line 8 of tsu\test\testVecs\config\testVecs.cfg
2) Pre-load input YUV to 0x85000000 through "Memory Browser"
3) Run .out file
4) Save output YUV from 0x88000000 to PC through "Memory Browser"
The program is not getting stuck when running with 1920x1088. Instead, it is reading the input, and it can take ~18 minutes to read a single 1920x1088 frame when using XDS560v2. If you are using an XDS100 USB emulator, it will take even longer.
SIU_TSU_SCRATCH_SIZE in the unit test has the same value as TSU_SCRATCH_SIZE in tsu\src\tsuinit.c. Currently this scratch size is defined to support up to 1920x1088. The cross check on the scratch size is in tsu\src\polyphase\tsuPolyphaseScaling.c, lines 374-376.
As for your question about EDMA, the optimized TSU is using ECPY/RMAN/IRES modules from framework components to achieve EDMA based data transfers. Underneath, it still uses the EDMA3 peripheral on C6678. Hope this clarifies. For details of ECPY/RMAN/IRES, please refer to link of framework components @ http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/fc/index.html.
Following your instructions, I have run the project successfully. I tried 1920*1088 to 720*480, and it consumes 6784480 cycles on average. Thank you very much for your help.
I have a question about the TSU. The TSU currently supports resolutions only up to 1080p, but my image resolution is 2432*2048. What should I do if I want to resize the image to another resolution, for example 1080p or D1?
I have modified the code as follows:
#define GG_TSU_BLOCK_SIZE 24064 --> #define GG_TSU_BLOCK_SIZE 35840
#define IN_OUT_SIZE 3133440 --> #define IN_OUT_SIZE 7471104
I don't know how to modify SIU_TSU_SCRATCH_SIZE. I tried changing its value, but the program did not run successfully. I would like to know whether the scratch size has an upper limit, and whether that limit is related to the scratch being placed in local L2.
The YUV data of the image is attached below:
3326.yuv.rar
I also have some more questions.
1. What is the role of tsuContext? Is it used only in the polyphase filter algorithm?
2. Why is EDMA3 or memcpy used in the program? Is it used only in the polyphase filter algorithm?
Thank you again for your guidance over these days; it has been very helpful to us.
To use TSU for resolutions higher than 1080p, we need to make changes in TSU source code and then recompile TSU libs. The following defines in tsu\src\bicubic\tsuCubic.h need to be increased for higher resolutions:
#define MAX_SIZE_X 1920
#define MAX_SIZE_Y 1088
The steps of recompiling TSU libs:
1) In a command window, go to dsp\mkrel, and then run "setupenvMsys.bat bypass" (as for sv01 or sv04 described in http://processors.wiki.ti.com/index.php/MCSDK_VIDEO_2.0_Getting_Started_Guide#Set_up_environment_variables)
2) go to TSU directory: bash-3.1$ cd ../../components/ti/mas/tsu
3) Run xdc command to rebuild: bash-3.1$ xdc XDCARGS="c66le_elf src"
Your changes for GG_TSU_BLOCK_SIZE and IN_OUT_SIZE are good. For SIU_TSU_SCRATCH_SIZE, you can start with a big number, say "#define SIU_TSU_SCRATCH_SIZE 177824". As there is a cross check in tsu\src\polyphase\tsuPolyphaseScaling.c, an exception will be reported if this size is still not large enough. If no exception is reported, the actual scratch size needed can be found by recording the maximal value of (store_index + prev_pos) as used in the cross check. You can use a global variable to record this maximal usage in tsuPolyphaseScaling.c, recompile the TSU lib and unit test, and then find its value in the watch window after transizing is completed.
/* Cross check on size of the TSU scratch */
if( (store_index + prev_pos) > tsuContext.scratchSize) {
    tsu_exception(instId, TSU_EXC_UNEXPECTED_ERROR);
}
With above changes, I tried your 2432*2048 YUV input and it can be transized to 720p successfully.
Thank you for your help. I have run the project successfully and implemented 2432*2048 --> 1920*1088; it consumes 3185471 cycles on average. Thank you very much.
I have some more questions about the program.
1. What do tsuContext.alloc, tsuContext.free, tsuContext.availCoef, and tsuContext.coeffHandle mean? I don't understand how to use the tsuContext struct. I found tsuContext.dataCopy and tsuContext.dataWait used in tsuPolyphaseDat.h, where they replace DAT_copy and DAT_wait in tsuPolyphaseScaling.c. However, I can't find coeffHandle, availCoeff, alloc, or free used anywhere in the TSU code; they are only initialized in main.c.
2. There are some changes between your code and the earlier code. For example, you modified the tsuContext_t struct and added DataCopy and DataWait to it. What is the purpose of that?
3. I tried to set SIU_TSU_SCRATCH_SIZE to a very large value, for example 2432*2048, but there are errors when I build the project, as follows:
#define SIU_TSU_SCRATCH_SIZE 77824 --> #define SIU_TSU_SCRATCH_SIZE (2432*2048)
4. Is the TSU compliant with the XDAIS standard? If I place the scratch in L2SRAM, can other XDAIS algorithms place their scratch in L2SRAM too?
5. What are the GMP and GMC modules?
6. I used the cubic algorithm, and it consumes 1798293871 cycles. Are more modifications needed when using the cubic algorithm?
Glad that we can help.
tsuContext allows the test application (instead of the TSU lib) to control such items as buffer assignment, memory allocation/free, how data copying is done, etc. This enables a more generic TSU. The structure tsuContext_t is defined in tsu.h. The content of tsuContext is supplied by the test application (e.g., main.c), including function pointers and the base addresses and sizes of buffers. The test application also implements the related functions and allocates the related buffers. Internally, TSU just uses the function pointers and buffers supplied by the test application. You can search for "tsuContext." inside the TSU lib to find out how the tsuContext entries are used.
For example, as you pointed out, dataCopy and dataWait are newly added as two entries of tsuContext. This allows application to choose how to do data copying and how to wait until data copying is completed. In test application (main.c), tsuContext.dataCopy is pointing to function siutsu_data_xfer, which implements data copying and application can choose either "EDMA" or "memcpy" for it. For tsuContext.dataWait (siutsu_data_wait()), no actions are needed for memcpy, while wait is needed to complete the data copying with EDMA before the output data is used and/or input data is modified.
As for how to set SIU_TSU_SCRATCH_SIZE, please refer to our earlier post on 07/03 to find the maximal usage. There is no need to over-allocate. As the scratch buffer is allocated from local L2 (tsu\test\ccsProject\linker_c6678.cmd), a very large size that exceeds local L2 will result in a linking error, as you reported.
TSU is not compliant with XDAIS standard. If it is ensured that TSU and other XDAIS based algorithms will not access the scratch at the same time, you can use the same scratch. If not, you can allocate another scratch from local L2 for other XDAIS based algorithms to use, as long as it can fit in local L2.
GMP and GMC are global memory pool and global memory cell. Implementation details can be found from tsu\test\src\siuVigdkGmp.c and siuVigdkGmp.h.
For cycles with bicubic interpolation, is "1798293871 cycles" collected from your application or the TSU unit test we recently provided? If it's the former, please recollect with TSU unit test. Cache settings can largely affect the cycle performance.
Tianxing,
Bicubic interpolation is not hand-optimized for DSP, whereas the polyphase filter is optimized with scheduled assembly. Please use polyphase for your application.
Regards,
Vivek
Thank you for your reply; it is very useful to us.