May 1, 2012

How to measure OpenCL Kernel execution time

I need to measure kernel execution time to validate some options. For a very long kernel you could use wall-clock time, but that is not the right way to do it, since it also includes host-side and queueing overhead. There are a few steps to measure the kernel execution time accurately:

Create the queue with profiling enabled
command_queue = clCreateCommandQueue(context, devices[deviceUsed], CL_QUEUE_PROFILING_ENABLE, &err);

Ensure all previously enqueued tasks have finished
clFinish(command_queue);

Launch the kernel with an associated event
err = clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL, workGroupSize, NULL, 0, NULL, &event);

Wait for the kernel execution to finish
clWaitForEvents(1, &event);

Get the profiling data
cl_ulong time_start, time_end;
double total_time;

clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(time_start), &time_start, NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(time_end), &time_end, NULL);
total_time = (double)(time_end - time_start); /* profiling timestamps are in nanoseconds */
printf("\nExecution time in milliseconds = %0.3f ms\n", total_time / 1000000.0);
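
The nanosecond-to-millisecond conversion in the last step can be pulled into a small helper. This is only a sketch: elapsed_ms is a hypothetical name, not part of the OpenCL API, and uint64_t stands in for cl_ulong (a 64-bit unsigned integer) so that it compiles without the OpenCL headers. Returning a negative value for invalid timestamps is also just an assumption made for this example.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the OpenCL API.
 * uint64_t stands in for cl_ulong so this compiles without CL/cl.h. */
static double elapsed_ms(uint64_t start_ns, uint64_t end_ns)
{
    /* Guard against end < start: an unsigned subtraction would wrap
     * around to a huge value instead of going negative. */
    if (end_ns < start_ns) {
        return -1.0; /* assumption: negative means "invalid timestamps" */
    }
    /* Profiling timestamps are in nanoseconds; 1 ms = 1,000,000 ns. */
    return (double)(end_ns - start_ns) / 1000000.0;
}
```

With the variables from the profiling step, it would be called as elapsed_ms(time_start, time_end) and the result printed with %0.3f as before.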

That’s it :)
