PROCESSOR-SDK-AM57X: Example to train CNN network on tensorflow and run using TIDL

Manu Iyengar

Hi,

I'm trying to convert a TensorFlow model to TIDL and run it on an AM57x processor. To do this, I followed the instructions shared in this thread, and the model converted successfully.

For simplicity, I used the network from the TensorflowExample (cifar10_train.py) as a reference. The only difference in my model is that I train and test on the MNIST dataset instead of the CIFAR-10 dataset. I have validated the results of my modified network in TensorFlow, so I expect it to work with TIDL as well (given that the network in TensorflowExample is validated on TIDL). However, when I run the application with the converted model, I get the following issues:

1. Application terminates with a segmentation fault (core dump analysis below) if the target device type is EVE.

(gdb) bt
#0 0xb663b522 in std::local_Rb_tree_decrement (__x=0x5714e0)
at /home/tcwg-buildslave/workspace/tcwg-make-release/label/docker-trusty-amd64-tcwg-build/target/arm-linux-gnueabihf/snapshots/gcc-linaro-6.2-2016.11/libstdc++-v3/src/c++98/tree.cc:98
#1 0xb6d75c5e in Coal::Object::Object(Coal::Object::Type, Coal::Object*) () from /usr/lib/libOpenCL.so.1
#2 0xb6d6f91a in Coal::Event::Event(Coal::CommandQueue*, Coal::Event::Status, unsigned int, _cl_event* const*, int*) () from /usr/lib/libOpenCL.so.1
#3 0xb6d72820 in Coal::KernelEvent::KernelEvent(Coal::CommandQueue*, Coal::Kernel*, unsigned int, unsigned int const*, unsigned int const*, unsigned int const*, unsigned int, _cl_event* const*, int*) () from /usr/lib/libOpenCL.so.1
#4 0xb6d72cca in Coal::TaskEvent::TaskEvent(Coal::CommandQueue*, Coal::Kernel*, unsigned int, _cl_event* const*, int*) () from /usr/lib/libOpenCL.so.1
#5 0xb6d6d51a in clEnqueueTask () from /usr/lib/libOpenCL.so.1
#6 0x0001d8ea in tidl::Kernel::RunAsync() ()
#7 0x0001b0f2 in tidl::ExecutionObject::ProcessFrameStartAsync() ()
#8 0x00014b38 in RunConfiguration (config_file=..., num_devices=num_devices@entry=1, device_type=device_type@entry=tidl::DeviceType::DLA, format=format@entry=0, input_file=...) at main.cpp:229
#9 0x000137a2 in main (argc=3, argv=0xbeb1dc84) at main.cpp:116

2. Application terminates with a segmentation fault if the target device type is DSP and the number of devices configured is 2 instead of 1.

ERROR: [ Line: 312] CL_INVALID_PROGRAM_EXECUTABLE

core dump analysis:

(gdb) bt
#0 0xb6d3fd02 in std::_Rb_tree<Coal::Object*, Coal::Object*, std::_Identity<Coal::Object*>, std::less<Coal::Object*>, std::allocator<Coal::Object*> >::_M_erase(std::_Rb_tree_node<Coal::Object*>*) () from /usr/lib/libOpenCL.so.1
#1 0xbee35614 in ?? ()

Is there some step that I'm missing?

Regards,
Manu

over 5 years ago

0 RonB over 5 years ago

TI__Mastermind 30276 points

Manu, our expert on this is traveling and the response to this issue may be delayed as a result. I'm sorry for any inconvenience.

0 Ajay Jayaraj over 5 years ago in reply to RonB

TI__Expert 3170 points

Manu,

Can you share the following information? This will help us reproduce the problem and analyze the issues you are seeing.
* Version of the Processor Linux SDK used to run the example
* Configuration file used for inference
* Sample input
* Imported bin files (two files: *net*.bin, *param*.bin)

Thanks,
Ajay

0 Manu Iyengar over 5 years ago in reply to Ajay Jayaraj

Intellectual 590 points

Hi Ajay,

Thanks for getting back to me on my query. I have since made some progress.

After some investigation, I found that the crash was caused by the type of images used for training and evaluating the network (my network uses grayscale images, unlike the other TIDL examples, which use RGB images).
The application did not handle this correctly, which resulted in the crash. I have corrected this, and the application no longer crashes (tested only with DSP, not with EVE).
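To illustrate the kind of mismatch described above, here is a minimal sketch of a size check that rejects an image whose channel count does not match the network input. It is not the actual fix made in the application and does not use the TIDL API: the function names `frame_size_bytes` and `load_frame` are hypothetical, and the 28x28 single-channel input size is an assumption based on the standard MNIST image format.

```python
# Hypothetical illustration; none of these names come from the TIDL API.

def frame_size_bytes(width, height, channels, bytes_per_value=1):
    """Number of bytes one input frame occupies."""
    return width * height * channels * bytes_per_value


def load_frame(pixels, width, height, channels):
    """Copy raw pixel bytes into a buffer sized for the network input.

    Raises ValueError instead of overrunning the buffer when the image
    layout does not match what the network expects.
    """
    expected = frame_size_bytes(width, height, channels)
    if len(pixels) != expected:
        raise ValueError(
            f"image has {len(pixels)} bytes, network input expects {expected}"
        )
    return bytearray(pixels)


# Assumed MNIST-style input: 28x28 pixels, one grayscale channel.
gray = bytes(28 * 28)
frame = load_frame(gray, 28, 28, channels=1)

# The same image decoded as 3-channel RGB is three times larger and would
# not fit a buffer allocated for the grayscale network.
rgb = bytes(28 * 28 * 3)
```

Passing `rgb` to `load_frame(rgb, 28, 28, channels=1)` raises `ValueError`, whereas an unchecked copy in native code could write past the end of the input buffer.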

I'm now facing another problem: the results I'm getting do not match the expected result (snapshot of the output below).

Input: digit7.png => this is input image containing the digit '7'
frame[ 0]: Time on DSP0: 43.01 ms, host: 44.12 ms API overhead: 2.53 %
1: 2
2: 4
3: 3
4: 7
5: 1

As you can see, although the expected result was '7', it is not the best match (not even in the top 3). However, the result is correct when tested with TensorFlow.
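One way to compare the two runs is to rank the raw output scores from each and check whether the top-1 class agrees. The sketch below is a hedged illustration only: `top_k` and `same_top1` are hypothetical helpers, and both score lists are made-up values (the `device` list is chosen merely to reproduce the ordering 2, 4, 3, 7, 1 shown in the snapshot above), not data from the attached files.

```python
import heapq


def top_k(scores, k=5):
    """Return (class_index, score) pairs for the k highest scores, best first."""
    return heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])


def same_top1(scores_a, scores_b):
    """True when both score vectors pick the same best class."""
    return top_k(scores_a, 1)[0][0] == top_k(scores_b, 1)[0][0]


# Made-up scores for the ten digit classes (index = digit); not real data.
reference = [0.01, 0.02, 0.03, 0.02, 0.01, 0.01, 0.01, 0.85, 0.02, 0.02]
device = [5, 40, 90, 60, 70, 3, 2, 45, 1, 4]

for rank, (digit, score) in enumerate(top_k(device), start=1):
    print(f"{rank}: {digit}")

print("top-1 agrees:", same_top1(reference, device))
```

Comparing rankings rather than raw values is a deliberate choice here, because the scores from a quantized run and a floating-point run may be on different scales.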

Please find attached (below) a file containing the imported files, sample input, and config file for your reference.
Processor Linux SDK version: v5.01.00.11. Let me know if you need any other details.

It would be helpful if you could shed some light on what might be going wrong.

/cfs-file/__key/communityserver-discussions-components-files/791/4505.sample.zip

Regards,
Manu

0 Ajay Jayaraj over 5 years ago in reply to Manu Iyengar

TI__Expert 3170 points

Manu,

Thanks for the update and the artifacts to reproduce the issue. We have also run into incorrect results with the MNIST dataset on our end and are investigating. We will post an update as soon as we discover what is causing the failure.

Ajay

0 Manu Iyengar over 5 years ago in reply to Ajay Jayaraj

Intellectual 590 points

Hi Ajay,

Thanks for the information. I look forward to your update on this.

Regards,
Manu

0 Manisha Agrawal over 5 years ago in reply to Manu Iyengar

TI__Mastermind 22386 points

Hi Manu,

I am closing this thread for now. I will update it once we find the root cause of the failure.

0 Manisha Agrawal over 5 years ago in reply to Manisha Agrawal

TI__Mastermind 22386 points

Hi Manu,

The reported issues have been fixed, and the fix will be available in the next Processor SDK release (version 5.2), scheduled for the end of this month. Please watch for the release. In this release, an MNIST example is also provided in the same directory as the other TIDL examples.

Regards,
Manisha
