CCS: TDA4x: Layer dump size issue

Tool/software: Code Composer Studio

Dear Sir,

Observation: For TDA2x:

Resolution is 512x512x3

Layer dump 0 (data layer): trace_dump_0_512x512.zip

Size: 512x512x3 = 786432 bytes

Layer 0 visualized through YUView is below:

Observation: For TDA4x:

Resolution is 512x512x3

Layer dump 0 (data layer): caffe_tidl_infer_msi_mobilenet_pd.txt_0000_00003_00512x00512.zip

Size: 512x512x3 x 2 = 1572864 bytes (double the expected size)

Layer 0 visualized through YUView is below:

For TDA4x: 

Why does the layer dump have double the size of the input (512x512x3)?

How can we confirm that the input to the model is correct?

Kindly do the needful.

Thanks and Regards,

Vyom Mishra

  • Is featureParamBits set to 16? That might lead to double the buffer size for a tensor.

    Let me have a look at the files you have sent and get back to you.

  • Dear Sir,

    As per the experiment,

    "numfeaturebits"  and "numparambits" is not affecting the size of the Data Layer dump(.y).

    It is constant as 512x512x3 (x2)extra.

    Thanks and Regards,

    Vyom Mishra

  • Gentle Reminder!

  • Vyom Mishra,

    I am attaching an image. Can you verify that this is your intended output, with the black pads in the top and bottom rows?

    This is a 512 x 512 image with size = 786432 bytes, derived from the file that you attached.

    In your file, each feature is 16 bits wide, and I had to run the following code to get the correct image:

    #include <stdint.h>

    /* buffer holds the raw 16-bit trace read from the file; new_buffer is
     * assumed to be uint8_t[512*512*3]. The narrowing assignment keeps the
     * low byte of each 16-bit element, one 512x512 plane at a time. */
    uint8_t new_buffer[512 * 512 * 3];
    for (int i = 0; i < 512 * 512; i++) {
        new_buffer[i]                 = ((uint16_t *)buffer)[i];
        new_buffer[512 * 512 + i]     = ((uint16_t *)buffer)[512 * 512 + i];
        new_buffer[512 * 512 * 2 + i] = ((uint16_t *)buffer)[512 * 512 * 2 + i];
    }

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/81/caffe_5F00_tidl_5F00_infer_5F00_msi_5F00_mobilenet_5F00_pd.txt_5F00_0000_5F00_00003_5F00_00512x00512_2D00_modified.y

    - Subhajit

  • Dear Sir,

    Now the image visualized is correct, with black pads in the top and bottom rows.

    Can you please let us know where these changes need to be made so that the trace dumps are generated correctly and we can do the layer-by-layer matching?

    I have several queries regarding the same.

    With reference to the post at https://e2e.ti.com/support/processors/f/791/t/876743#pi320966=2 (a parallel query running for the same padded model):

    We are getting some detections (some are missed), but no FPs, with the below configuration:

    numParamBits = numFeatureBits = 8 and quantizationStyle = 3

    However, by increasing the above parameters to 12/16, we are getting detections with FPs.

    Can this increase in the above parameters (from 8 to 12/16) be a reason for the FPs?

    How can we tackle this FP issue?

    Kindly do the needful.

    Thanks and Regards,

    Vyom Mishra

  • Vyom Mishra,

    Can you share your import and infer config files?

    From the trace that you have shared, it looks like the model was imported for the 16-bit flow.

  • Dear Sir,

    I am sharing the import config file and the infer config file for your reference.

    Import config:

    modelType          = 0
    inputNetFile       = "/home/vyom/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/models/mando/fvc/od/L1_mob_final/deploy.prototxt"
    inputParamsFile    = "/home/vyom/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/models/mando/fvc/od/L1_mob_final/mob.caffemodel"
    outputNetFile      = "/home/vyom/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/models/mando/fvc/od/L1_mob_final/tidl_net_msi_mobilenet_pd_l1.bin"
    outputParamsFile   = "/home/vyom/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/models/mando/fvc/od/L1_mob_final/tidl_io_msi_mobilenet_pd_l1"
    numParamBits = 12
    numFeatureBits = 12
    quantizationStyle = 2
    inDataFormat = 0
    inElementType  = 0 
    inWidth = 512
    inHeight = 512
    inNumChannels = 3
    perfSimConfig = "../../test/testvecs/config/import/perfsim_base.cfg"
    inData = "/home/vyom/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/config/det_bck.txt"
    numFrames = 1
    postProcType = 2
    inFileFormat = 2

    Infer config:

    inFileFormat    = 2
    postProcType = 2
    numFrames   = 1
    padInBuffInTB = 1
    netBinFile      = "/home/vyom/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/models/mando/fvc/od/L1_mob_final/tidl_net_msi_mobilenet_pd_l1.bin"
    ioConfigFile    = "/home/vyom/psdk_rtos_auto_j7_06_01_00_15/tidl_j7_01_00_00_00/ti_dl/test/testvecs/models/mando/fvc/od/L1_mob_final/tidl_io_msi_mobilenet_pd_l11.bin"
    outData =   "testvecs/output/msi_mobilenet.bin"
    inData  =   "testvecs/config/det_bck.txt"
    debugTraceLevel = 1
    writeTraceLevel = 3
    numFrames = 1

    Just informing you again regarding the size of the first layer dump:

    a) numParamBits = numFeatureBits = 8

        - The first layer dump has the expected size, i.e., 512x512x3.

    b) numParamBits = numFeatureBits = 12/16

        - The dump has size 512x512x3x2.

    So the traces shared earlier were with numParamBits = numFeatureBits = 12.

    Could you please share the changes Subhajit made to generate the correct layer dumps? We need to compare the PC and import tool dumps.

    Thanks and Regards,

    Vyom Mishra

  • Vyom,

    For numParamBits > 8 or numFeatureBits > 8, the dump is going to use W x H x 3 x 2 bytes, as each element is 2 bytes (16 bits).
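
    For illustration, here is a minimal sketch of that size relation, assuming (as stated above) 1-byte elements when numFeatureBits <= 8 and 2-byte elements otherwise; expected_dump_bytes is a hypothetical helper, not a TIDL API:

    #include <stdio.h>

    /* Hypothetical helper: expected trace size in bytes for a W x H x C
     * tensor, under the element-size assumption described above. */
    static size_t expected_dump_bytes(size_t w, size_t h, size_t c,
                                      int numFeatureBits)
    {
        size_t elem_bytes = (numFeatureBits <= 8) ? 1 : 2;
        return w * h * c * elem_bytes;
    }

    int main(void)
    {
        printf("%zu\n", expected_dump_bytes(512, 512, 3, 8));  /* 786432  */
        printf("%zu\n", expected_dump_bytes(512, 512, 3, 12)); /* 1572864 */
        return 0;
    }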

    - Subhajit

  • Dear Sir,

    As per the experiments, I am getting WxHx3x2 only for configuration (b) below:

    a) numParamBits = numFeatureBits = 8

        - The first layer dump has the expected size, i.e., 512x512x3.

    b) numParamBits = numFeatureBits = 12/16

        - The dump has size 512x512x3x2.

    Thanks and Regards,

    Vyom Mishra

  • Refer to the below page for debugging accuracy-mismatch issues.

    Comparing the input tensor to TIDL with the reference:

    • It is important to match the input tensor of the TIDL net with the input tensor of the network as it was trained.
    • Save the input tensor from the training code that you are using in float format.
    • Use writeTraceLevel = 3 to write the layer-level traces from TIDL to files.
    • By default, the data-normalizing batchNorm layer is merged into the following convolution layer, so set foldPreBnConv2D = 0 to avoid this.
    • Compare the output of this batchNorm layer with the input tensor from the training code (a minimal comparison sketch follows after the link). Refer Link

    http://software-dl.ti.com/jacinto7/esd/processor-sdk-rtos-jacinto7/latest/exports/docs/tidl_j7_01_01_00_10/ti_dl/docs/user_guide_html/md_tidl_fsg_steps_to_debug_mismatch.html
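
    As a minimal sketch of that comparison, assuming both the reference tensor and the TIDL trace have been saved as raw float32 files with the same element count (the file names and float32 layout here are assumptions, not TIDL conventions):

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Reads two raw float32 tensor dumps and reports the maximum
     * absolute element-wise difference. */
    int main(int argc, char **argv)
    {
        if (argc != 4) {
            fprintf(stderr, "usage: %s ref.bin trace.bin num_elems\n", argv[0]);
            return 1;
        }
        size_t n = strtoul(argv[3], NULL, 10);
        float *a = malloc(n * sizeof *a);
        float *b = malloc(n * sizeof *b);
        FILE *fa = fopen(argv[1], "rb");
        FILE *fb = fopen(argv[2], "rb");
        if (!a || !b || !fa || !fb)
            return 1;
        if (fread(a, sizeof *a, n, fa) != n || fread(b, sizeof *b, n, fb) != n)
            return 1;
        float max_diff = 0.0f;
        for (size_t i = 0; i < n; i++) {
            float d = fabsf(a[i] - b[i]);
            if (d > max_diff)
                max_diff = d;
        }
        printf("max abs diff = %g\n", max_diff);
        return 0;
    }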

  • Dear Sir,

    We have followed the suggestion to set "foldPreBnConv2D = 0", but the results were the same as before, i.e., multiple boxes on a single object.

    We have also compared the output of this batchNorm layer with the input tensor; the results can be found in the below folder for your reference:

    TI Query.zip

    Observations:

    The data layer and batch norm layer output data match for both the PC and the import tool.

    Kindly provide feedback on the same.

    I have a query regarding the "Feature Map Scale Analysis".

    As observed, the min/max values for layer 0 (data layer) and layer 1 (batch norm) are greater than 32, which means these layers have the maximum quantization loss. For your reference, please find the console output in the zip file shared above.

    But, as observed earlier when we followed the suggestion of the "Weights Quantization Statistic Analysis", we found and shared with you that the depthwise convolution (dw) has the maximum quantization loss.

    The above two conclusions are different. Kindly help us understand if we have misunderstood something.
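
    For intuition on how the value range relates to quantization loss: for a fixed bit depth, the quantization step grows with the range, roughly step = range / (2^bits - 1). A back-of-envelope sketch (illustrative only; TIDL's actual per-layer scale selection may differ):

    #include <stdio.h>

    /* Quantization step for a given value range and bit depth:
     * step = range / (2^bits - 1). */
    int main(void)
    {
        double range = 64.0; /* e.g. a feature map spanning roughly [-32, 32] */
        for (int bits = 8; bits <= 16; bits += 4)
            printf("%2d bits -> step = %g\n", bits, range / ((1 << bits) - 1));
        return 0;
    }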

    Kindly do the needful.

    Thanks and Regards,

    Vyom Mishra

  • Dear Sir,

    We are trying to do layer-by-layer matching between the target and the PC.

    Below are the visual observations so far:

    a) The pooling layer output from the board side matches the PC.

    b) Out of the first 5 convolutions, the first convolution layer output doesn't match the PC, but the other four convolution outputs match.

    c) Detection output layer (the final layer of the model):

        - Target side: we have two bounding boxes, one of which matches the PC, whereas the other does not, as there are multiple bounding boxes on the same object.

    What could be a possible reason for point (a)?

    Kindly do the needful.

    Thanks and Regards,

    Vyom Mishra

  • Layer-level traces of the PC and target execution are supposed to match (for the same input).

    Is the trace of the input tensor (data ID 0) matching? If yes, can you share the sample model (the layers with the issue) so that we can reproduce the issue at our end?

  • The model has been shared with Sujith and Kartik.

    We have made certain changes to the aspect ratios of the prior boxes. These changes are present in the deploy.prototxt. We are not sure if these changes are being taken forward into the model after running the import tool.

    Following is a snippet of the deploy.prototxt showing the kind of aspect ratio values for the prior boxes:

    layer {
      name: "ctx_output1/sep/relu_mbox_priorbox"
      type: "PriorBox"
      bottom: "ctx_output1/sep"
      bottom: "data"
      top: "ctx_output1/sep/relu_mbox_priorbox"
      prior_box_param {
        min_size: 35.0
        max_size: 109.800003052
        aspect_ratio: 0.5
        aspect_ratio: 0.333333343267

    Regards,

    Sankalp

  • Hi Sankalp,

    Can you share one input image and the expected output from Caffe for the same (along with the layer-level float tensors from Caffe)?

    We will try to reproduce this issue at our end using the model that you have shared.

    Regarding the aspect_ratio, the import tool is expected to read these aspect ratios from deploy.prototxt during import.

    Regards,

    Kumar.D

  • Hi Sankalp,

    I have also noticed the following in the model that you have shared:

    layer {
      name: "ctx_output2/sep/relu_mbox_priorbox"
      type: "PriorBox"
      bottom: "ctx_output2/sep"
      bottom: "data"
      top: "ctx_output2/sep/relu_mbox_priorbox"
      prior_box_param {
        min_size: 109.800003052
        max_size: 184.600006104
        aspect_ratio: 0.5
        aspect_ratio: 0.40000000596
        aspect_ratio: 0.001
        aspect_ratio: 0.285714298487
        flip: true
        clip: false
        variance: 0.10000000149
        variance: 0.10000000149
        variance: 0.20000000298
        variance: 0.20000000298
        offset: 0.5
      }
    }

    This means an aspect ratio of 1:1000. Is this expected?
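
    For reference, the standard Caffe SSD PriorBox layer generates, for each aspect ratio ar, a box of width min_size * sqrt(ar) and height min_size / sqrt(ar) (assuming the stock Caffe SSD semantics). A quick sketch of what ar = 0.001 produces for this layer's min_size:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double s = 109.800003052; /* min_size from the prototxt above */
        double ars[] = { 0.5, 0.40000000596, 0.001, 0.285714298487 };
        for (int i = 0; i < 4; i++) {
            /* box width = s * sqrt(ar), box height = s / sqrt(ar) */
            printf("ar=%g -> w=%.1f, h=%.1f\n",
                   ars[i], s * sqrt(ars[i]), s / sqrt(ars[i]));
        }
        return 0;
    }

    With ar = 0.001 this gives a prior of roughly 3.5 x 3472 pixels, far taller than the 512-pixel input, which is why the value looks suspicious.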

    Regards,

    Kumar.D