Develop codec on DM6446

RexDSP

Hi all,

I want to develop an H.264 baseline encoder on the DM6446, and I have two questions:

1. Is there a way to get documentation on the VICP of the DM6446?

2. Is it possible to develop an H.264 baseline encoder at 720x576 resolution using about 300 MHz of the DM6446's C64x+, without the VICP?

over 16 years ago

0 Juan Gonzales over 16 years ago

TI__Mastermind 37340 points

Hi,

1. Access to VICP is limited to TI third parties writing codecs for general availability, and even then you might have a tough time. If you think your company can fit this support model, you will need to work with your local TI sales representative:

http://focus.ti.com/general/docs/salesrep/salesrep.tsp?DCMP=TIHomeTracking&HQS=Other+OT+home_b_salesreprepresentative

2. This would be very difficult to achieve with 300 MHz; it will likely take 400 MHz or more, based on our current H.264 tests.
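To put the 300 MHz versus 400 MHz figures in perspective, the cycle budget per 16x16 macroblock can be worked out directly. The sketch below is only illustrative arithmetic: the function name is hypothetical, and the 25 fps frame rate used in the usage note is an assumption (typical for 720x576 PAL video), since the thread does not state a frame rate. It says nothing about how many cycles a real encoder actually needs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Cycles available per 16x16 macroblock for a given DSP clock,
 * frame size and frame rate. Hypothetical helper for budgeting only.
 */
uint32_t cycles_per_macroblock(uint32_t clock_hz, uint32_t width,
                               uint32_t height, uint32_t fps)
{
    /* Round each dimension up to a whole number of macroblocks. */
    uint32_t mbs_per_frame = ((width + 15u) / 16u) * ((height + 15u) / 16u);

    assert(fps > 0 && mbs_per_frame > 0);
    return clock_hz / (mbs_per_frame * fps);
}
```

At an assumed 25 fps, 720x576 is 1620 macroblocks per frame, so `cycles_per_macroblock(300000000u, 720, 576, 25)` gives 7407 cycles per macroblock and the same call with 400000000u gives 9876.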

0 Juan Gonzales over 16 years ago in reply to Juan Gonzales

TI__Mastermind 37340 points

Zhouger,

With regard to customer demand to use VICP, we have been working on a library that gives customers access to the extra processing power provided by VICP (and it is finally here). It can be found here:

http://focus.ti.com/docs/toolsw/folders/print/sprc831.html

While this does not detail all the inner workings of our VICP (an NDA through your local sales rep is still required for that), it will allow you to take advantage of VICP to develop your own codecs.

0 RexDSP over 16 years ago in reply to Juan Gonzales

Prodigy 165 points

Juan Gonzales:

I got it. Thanks a lot!

BR,

Zhouger

0 RexDSP over 16 years ago in reply to RexDSP

Prodigy 165 points

Juan Gonzales:

How can I use 16-bit compressed instructions on the C6400+ platform?

Are there any options that make the assembler produce 16-bit compressed instructions at assembly time?

by, zhouger

0 Bernie Thompson TI over 16 years ago in reply to RexDSP

TI__Mastermind 41665 points

Assuming you are using the C compiler, how much of your code uses compressed instructions is determined by your optimization options, in particular the --opt_for_space=n flag; the maximum, n = 3, yields the most compressed instructions. Note that the more space you save with compression, the less optimal the code typically is from a performance perspective, so the use of compressed instructions must be weighed against your performance needs. These options are discussed in section 3.5 of SPRU187.

0 RexDSP over 16 years ago in reply to Bernie Thompson TI

Prodigy 165 points

Hi, Bernie:

In fact, more than 80% of the code in our codec may be assembly source,

so I want to know whether we can use compressed instructions directly in assembly source.

0 Bernie Thompson TI over 16 years ago in reply to RexDSP

TI__Mastermind 41665 points

You certainly can. By default the assembler automatically uses compressed/compact instructions as often as possible (you can suppress them with the --no_compress option).

If you are already writing in assembly, then the key is to use instructions that are capable of being compact. The reason the C compiler produces slower code with compact instructions is that only a limited instruction set is available for compact/compressed fetch packets. There is a list of compactable instructions in section 3.9.5 of SPRU732; if you code your assembly so that fetch packets use only these instructions, you should see compressed fetch packets.

0 RexDSP over 16 years ago in reply to Bernie Thompson TI

Prodigy 165 points

Hi Bernie,

I ran a test of code access efficiency, and the result puzzled me.

I wrote a test function 32 KB in size that contains no load or store instructions (so that those instructions do not disturb the measurement).

I used a DM6446 timer to record the run time in the following cases:

/************** case 1 **************/

Level 1 program cache = 32 KB, Level 2 SRAM = 64 KB, Level 2 cache = 0 KB; the test function is located in Level 2 SRAM.

Result: the first execution takes 15158 core cycles; each of the following 100 executions takes 15334 core cycles.

/************** case 2 **************/

Level 1 program cache = 32 KB, Level 2 SRAM = 32 KB, Level 2 cache = 32 KB; the test function is located in external DDR.

Result: the first execution takes 48620 core cycles; each of the following 100 executions takes 14300 core cycles.

/************** my question **************/

My question is about the execution time of the later 100 runs.

Why is case 2 faster than case 1? During the later 100 executions, all of the function's code should already be in the Level 1 program cache, so what causes the difference?

Thanks in advance!

BR,

Zhouger
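When comparing cases like these, it can help to report the first (cold) run separately from the warm runs, to record the minimum, maximum and average of the warm runs, and to subtract the cost of reading the timer itself. The following is a minimal sketch of such a harness, not the code used in the test above. All names are hypothetical: `cycle_counter_fn` stands in for whatever free-running counter is read on the target, and `sim_now`/`sim_work` are host-side stand-ins that simply replay the case 2 numbers so the sketch can run off-target.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*cycle_counter_fn)(void); /* hypothetical: free-running cycle count */
typedef void (*workload_fn)(void);          /* function under test */

typedef struct {
    uint32_t overhead;  /* cost of two back-to-back counter reads */
    uint32_t first_run; /* cold run, caches not yet filled */
    uint32_t warm_min;
    uint32_t warm_max;
    uint32_t warm_avg;
} run_stats;

/* Time one call, subtracting the counter-read overhead. */
static uint32_t timed_call(workload_fn work, cycle_counter_fn now,
                           uint32_t overhead)
{
    uint32_t start = now();
    work();
    uint32_t elapsed = now() - start; /* unsigned math tolerates one wrap */
    return elapsed > overhead ? elapsed - overhead : 0;
}

run_stats measure_runs(workload_fn work, cycle_counter_fn now,
                       unsigned warm_runs)
{
    run_stats s;
    uint64_t sum = 0;
    unsigned i;

    assert(warm_runs > 0);

    uint32_t t0 = now();
    s.overhead = now() - t0;

    s.first_run = timed_call(work, now, s.overhead);
    s.warm_min = UINT32_MAX;
    s.warm_max = 0;
    for (i = 0; i < warm_runs; i++) {
        uint32_t c = timed_call(work, now, s.overhead);
        if (c < s.warm_min) s.warm_min = c;
        if (c > s.warm_max) s.warm_max = c;
        sum += c;
    }
    s.warm_avg = (uint32_t)(sum / warm_runs);
    return s;
}

/* Host-side stand-ins (hypothetical): replay the case 2 figures. */
uint32_t sim_cycles = 0;
int sim_calls = 0;
uint32_t sim_now(void) { return sim_cycles; }
void sim_work(void) { sim_cycles += (sim_calls++ == 0) ? 48620u : 14300u; }
```

On the target, `sim_now` would be replaced by a function that reads the actual timer, and `sim_work` by the 32 KB test function. A spread between `warm_min` and `warm_max` would indicate that the warm runs are not all identical, which a single averaged figure hides.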
