Hello,
Does the TDA4VM SDK support OpenCL?
If not, what do you suggest as the most efficient way to perform image processing / matrix manipulation?
Thanks,
Nikolay.
Hello,
Currently, the SDK does not support OpenCL natively. There are ways of building the filesystem to include OpenCL support for the GPU, but this is still being tested.
There are dedicated accelerators on our device that can assist with various image-processing and matrix-multiplication needs. One option is the on-chip C7x DSPs.
Can you elaborate on what kind of image processing you want to do? That will help identify which component of the SoC would be best to use.
Regards,
Erick
Hello,
Thanks for the swift answer.
I think that using the accelerators would be a better solution for me.
The operations are:
1. Affine transformations: resize (bilinear), rotation, translation.
2. Morphological operations: erode, dilate.
If there are examples for these, I will gladly accept them.
Thanks,
Nikolay.
Nikolay,
I'm wondering if you meant OpenCV? It is available, and it would be the most straightforward way to do these transformations.
If you want to offload these tasks to other cores, that will require more custom implementations. For example, on the GPU you would need to develop an OpenGL/OpenCL/Vulkan implementation. For the other accelerators, we would need to see what form your image files are in before exploring those, as they usually don't have native Linux interfaces.
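For reference, erosion on a single-channel 8-bit image is just a sliding-window minimum over each pixel's neighborhood (dilation is the same with a maximum), so it is easy to prototype in plain C before committing to an accelerator. A minimal sketch, assuming a tightly packed row-major uint8 buffer (`erode3x3` is a hypothetical helper, not part of any SDK):

```c
#include <stdint.h>

/* 3x3 grayscale erosion: each output pixel becomes the minimum of its
 * 3x3 neighborhood. Border pixels are copied unchanged here to keep
 * the sketch short; a real implementation would pick a border policy. */
static void erode3x3(const uint8_t *src, uint8_t *dst, int w, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            if (x == 0 || y == 0 || x == w - 1 || y == h - 1) {
                dst[y * w + x] = src[y * w + x];
                continue;
            }
            uint8_t m = 255;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    uint8_t v = src[(y + dy) * w + (x + dx)];
                    if (v < m)
                        m = v;
                }
            dst[y * w + x] = m;
        }
    }
}
```

Swapping the minimum for a maximum gives dilation with the same structure.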
Regards,
Erick
Hey,
Well, I want to get maximum performance; that's why I asked about OpenCL support.
The images I use are uint8 tensors.
How can I compile the kernel in order to use OpenCL? Is there any manual for that, or should I prepare for an adventure?
Is there any manual for using the dedicated hardware?
Thanks,
Nikolay.
Nikolay,
The GPU would probably be your best bet right now. I was able to compile and run an OpenCL application on my system. I can share the filesystem I am booting here:
https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/tisdk_2D00_default_2D00_image_2D00_j721e_2D00_evm.tar.xz
It has the OpenCL header files and library built in.
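Before running anything bigger, a quick way to confirm the GPU is visible through that library is to enumerate the OpenCL platforms and devices. A minimal sketch (the `clinfo` utility does the same thing more thoroughly, if it happens to be on the filesystem):

```c
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;

    if (clGetPlatformIDs(8, platforms, &num_platforms) != CL_SUCCESS ||
        num_platforms == 0) {
        fprintf(stderr, "No OpenCL platforms found\n");
        return 1;
    }

    for (cl_uint p = 0; p < num_platforms; p++) {
        char name[256] = {0};
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME,
                          sizeof(name), name, NULL);
        printf("Platform %u: %s\n", p, name);

        cl_device_id devices[8];
        cl_uint num_devices = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL,
                           8, devices, &num_devices) != CL_SUCCESS)
            continue;

        for (cl_uint d = 0; d < num_devices; d++) {
            char dev_name[256] = {0};
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME,
                            sizeof(dev_name), dev_name, NULL);
            printf("  Device %u: %s\n", d, dev_name);
        }
    }
    return 0;
}
```

If the GPU does not show up here, the saxpy example below will fail at `clGetDeviceIDs` as well.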
These are my boot images:
https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/3000.sysfw.itb
https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/3000.tiboot3.bin
https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/1452.tispl.bin
https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/5165.u_2D00_boot.img
You could boot your board with these in your boot partition, and the rootfs in the other partition.
I used this example that I found online:
```c
#include <stdio.h>
#include <stdlib.h>

#ifdef __APPLE__
#include <OpenCL/cl.h>
#else
#include <CL/cl.h>
#endif

#define VECTOR_SIZE 1024

// OpenCL kernel which is run for every work-item created.
const char *saxpy_kernel =
    "__kernel                                    \n"
    "void saxpy_kernel(float alpha,              \n"
    "                  __global float *A,        \n"
    "                  __global float *B,        \n"
    "                  __global float *C)        \n"
    "{                                           \n"
    "    // Get the index of the work-item       \n"
    "    int index = get_global_id(0);           \n"
    "    C[index] = alpha * A[index] + B[index]; \n"
    "}                                           \n";

int main(void)
{
    int i;

    // Allocate space for vectors A, B and C
    float alpha = 2.0;
    float *A = (float *)malloc(sizeof(float) * VECTOR_SIZE);
    float *B = (float *)malloc(sizeof(float) * VECTOR_SIZE);
    float *C = (float *)malloc(sizeof(float) * VECTOR_SIZE);
    for (i = 0; i < VECTOR_SIZE; i++) {
        A[i] = i;
        B[i] = VECTOR_SIZE - i;
        C[i] = 0;
    }

    // Get platform and device information
    cl_platform_id *platforms = NULL;
    cl_uint num_platforms;
    cl_int clStatus = clGetPlatformIDs(0, NULL, &num_platforms);
    platforms = (cl_platform_id *)malloc(sizeof(cl_platform_id) * num_platforms);
    clStatus = clGetPlatformIDs(num_platforms, platforms, NULL);

    // Get the device list and choose the device you want to run on
    cl_device_id *device_list = NULL;
    cl_uint num_devices;
    clStatus = clGetDeviceIDs(platforms[0], CL_DEVICE_TYPE_GPU, 0, NULL, &num_devices);
    device_list = (cl_device_id *)malloc(sizeof(cl_device_id) * num_devices);
    clStatus = clGetDeviceIDs(platforms[0], CL_DEVICE_TYPE_GPU, num_devices, device_list, NULL);

    // Create one OpenCL context for each device in the platform
    cl_context context = clCreateContext(NULL, num_devices, device_list, NULL, NULL, &clStatus);

    // Create a command queue
    cl_command_queue command_queue = clCreateCommandQueue(context, device_list[0], 0, &clStatus);

    // Create memory buffers on the device for each vector
    cl_mem A_clmem = clCreateBuffer(context, CL_MEM_READ_ONLY,  VECTOR_SIZE * sizeof(float), NULL, &clStatus);
    cl_mem B_clmem = clCreateBuffer(context, CL_MEM_READ_ONLY,  VECTOR_SIZE * sizeof(float), NULL, &clStatus);
    cl_mem C_clmem = clCreateBuffer(context, CL_MEM_WRITE_ONLY, VECTOR_SIZE * sizeof(float), NULL, &clStatus);

    // Copy buffers A and B to the device
    clStatus = clEnqueueWriteBuffer(command_queue, A_clmem, CL_TRUE, 0, VECTOR_SIZE * sizeof(float), A, 0, NULL, NULL);
    clStatus = clEnqueueWriteBuffer(command_queue, B_clmem, CL_TRUE, 0, VECTOR_SIZE * sizeof(float), B, 0, NULL, NULL);

    // Create a program from the kernel source
    cl_program program = clCreateProgramWithSource(context, 1, (const char **)&saxpy_kernel, NULL, &clStatus);

    // Build the program
    clStatus = clBuildProgram(program, 1, device_list, NULL, NULL, NULL);

    // Create the OpenCL kernel
    cl_kernel kernel = clCreateKernel(program, "saxpy_kernel", &clStatus);

    // Set the arguments of the kernel
    clStatus = clSetKernelArg(kernel, 0, sizeof(float),  (void *)&alpha);
    clStatus = clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&A_clmem);
    clStatus = clSetKernelArg(kernel, 2, sizeof(cl_mem), (void *)&B_clmem);
    clStatus = clSetKernelArg(kernel, 3, sizeof(cl_mem), (void *)&C_clmem);

    // Execute the OpenCL kernel
    size_t global_size = VECTOR_SIZE; // Process the entire vector
    size_t local_size = 64;           // Work-group size: 64 work-items per group
    clStatus = clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL, &global_size, &local_size, 0, NULL, NULL);

    // Read the device buffer C_clmem back into the host variable C
    clStatus = clEnqueueReadBuffer(command_queue, C_clmem, CL_TRUE, 0, VECTOR_SIZE * sizeof(float), C, 0, NULL, NULL);

    // Clean up and wait for all the commands to complete
    clStatus = clFlush(command_queue);
    clStatus = clFinish(command_queue);

    // Display the result on the screen
    for (i = 0; i < VECTOR_SIZE; i++)
        printf("%f * %f + %f = %f\n", alpha, A[i], B[i], C[i]);

    // Finally, release all OpenCL-allocated objects and host buffers
    clStatus = clReleaseKernel(kernel);
    clStatus = clReleaseProgram(program);
    clStatus = clReleaseMemObject(A_clmem);
    clStatus = clReleaseMemObject(B_clmem);
    clStatus = clReleaseMemObject(C_clmem);
    clStatus = clReleaseCommandQueue(command_queue);
    clStatus = clReleaseContext(context);
    free(A);
    free(B);
    free(C);
    free(platforms);
    free(device_list);
    return 0;
}
```
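One thing the example omits is error checking; the most common failure point is `clBuildProgram`, which returns `CL_BUILD_PROGRAM_FAILURE` without saying why. The kernel compiler's log can be retrieved with `clGetProgramBuildInfo`. A sketch of a small helper for that (`print_build_log` is a hypothetical name, not part of the example above):

```c
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

/* Print the kernel compiler's log for `program` on `device`.
 * Call this when clBuildProgram() returns a non-success status. */
static void print_build_log(cl_program program, cl_device_id device)
{
    size_t log_size = 0;

    // First query the log size, then fetch the log itself
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                          0, NULL, &log_size);
    char *log = malloc(log_size + 1);
    if (!log)
        return;
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                          log_size, log, NULL);
    log[log_size] = '\0';
    fprintf(stderr, "OpenCL build log:\n%s\n", log);
    free(log);
}
```

In the example above it would be called right after the build step, e.g. `if (clStatus != CL_SUCCESS) print_build_log(program, device_list[0]);`.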
And my build command on the target was: gcc cl_example.c -lOpenCL (with the library after the source file, so the linker can resolve the OpenCL symbols).
ld complained that it could not find -lOpenCL, so I added a soft link to fix it: ln -s /usr/lib/libOpenCL.so.1 /usr/lib/libOpenCL.so
Let me know if this helps!
Thanks,
Erick