Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Execution: failed" when running simple dot product kernel #67

Open
kernhanda opened this issue Mar 27, 2019 · 21 comments
Open

"Execution: failed" when running simple dot product kernel #67

kernhanda opened this issue Mar 27, 2019 · 21 comments
Labels

Comments

@kernhanda
Copy link
Contributor

Using the most recent commits from VC4CLStdLib, VC4C, and VC4CL (while commenting out the offending lines from #66 ), I'm running into a failure when running the code at https://github.com/kernhanda/opencl_dot_product.

To repro:

git clone https://github.com/kernhanda/opencl_dot_product
make
sudo ./dot_product

Output:

$ sudo ./dot_product
On the host, the dot product took 0.013763 seconds.
[VC4CL](    dot_product): API call: void* clGetExtensionFunctionAddressForPlatform(cl_platform_id 0x18bf75c, const char* "clIcdGetPlatformIDsKHR")
[VC4CL](    dot_product): get extension function address: clIcdGetPlatformIDsKHR
[VC4CL](    dot_product): API call: void* clGetExtensionFunctionAddressForPlatform(cl_platform_id 0x18bf75c, const char* "clGetPlatformInfo")
[VC4CL](    dot_product): get extension function address: clGetPlatformInfo
[VC4CL](    dot_product): API call: cl_int clIcdGetPlatformIDsKHR(cl_uint 0, cl_platform_id* 0, cl_uint* 0x7e7e05c8)
[VC4CL](    dot_product): API call: cl_int clIcdGetPlatformIDsKHR(cl_uint 1, cl_platform_id* 0x18a9468, cl_uint* 0)
[VC4CL](    dot_product): API call: cl_int clGetPlatformInfo(cl_platform_id 0x18bf75c, cl_platform_info 2308, size_t 0, void* 0, size_t* 0x7e7e0558)
[VC4CL](    dot_product): API call: cl_int clGetPlatformInfo(cl_platform_id 0x18bf75c, cl_platform_info 2308, size_t 159, void* 0x18c02f8, size_t* 0)
[VC4CL](    dot_product): API call: cl_int clGetPlatformInfo(cl_platform_id 0x18bf75c, cl_platform_info 2336, size_t 0, void* 0, size_t* 0x7e7e0558)
[VC4CL](    dot_product): API call: cl_int clGetPlatformInfo(cl_platform_id 0x18bf75c, cl_platform_info 2336, size_t 6, void* 0x18b7548, size_t* 0)
[VC4CL](    dot_product): API call: cl_int clGetDeviceIDs(cl_platform_id 0x18bf75c, cl_device_type 4, cl_uint 1, cl_device_id* 0x7e7e067c, cl_uint* 0)
[VC4CL](    dot_product): API call: cl_context clCreateContext(const cl_context_properties* 0, cl_uint 1, const cl_device_id* 0x7e9e066c, void(CL_CALLBACK*)(const char* errinfo, const void* private_info, size_t cb, void* user_data) 0x7e7e05a8, void* 0, cl_int* 0x7e9e0660)
[VC4CL](    dot_product): Tracking live-time of object: cl_context
[VC4CL](    dot_product): API call: cl_int clGetDeviceInfo(cl_device_id 0x18bf770, cl_device_info 4100, size_t 4, void* 0x7e9e0668, size_t* 0)
[VC4CL] base=0x3fc00000, mem=0x76feb000
[VC4CL] V3D base: 0x76feb000
[VC4CL](    dot_product): API call: cl_program clCreateProgramWithSource(cl_context 0x18bffa4, cl_uint 1, const char** 0x7e7e0678, const size_t* 0x7e7e0674, cl_int* 0x7e7e066c)
[VC4CL](    dot_product): Tracking live-time of object: cl_program
[VC4CL](    dot_product): API call: cl_int clBuildProgram(cl_program 0x1893d4c, cl_uint 0, const cl_device_id* 0, const char* (null), void(CL_CALLBACK*)(cl_program program, void* user_data) 0x7e7e0650, void* 0)
[VC4CL](    dot_product): Precompiling source with: 
[VC4CL](    dot_product): Dumping program sources to /tmp/vc4cl-source-15489783.cl
[VC4CL](    dot_product): Precompilation complete with status: 0
[VC4CL](    dot_product): Compilation log: [W] Wed Mar 27 07:48:17 2019: Warnings in precompilation:
[W] Wed Mar 27 07:48:17 2019: <stdin>:22:1: warning: null character ignored
<U+0000>
^
1 warning generated.


[VC4CL](    dot_product): Compiling source with: 
[VC4CL](    dot_product): Compilation complete with status: 0
[VC4CL](    dot_product): Dumping program sources to /tmp/vc4cl-binary-1608230074.bin
[VC4CL](    dot_product): API call: cl_kernel clCreateKernel(cl_program 0x1893d4c, const char* "dot_product", cl_int* 0x7e9e0660)
[VC4CL](    dot_product): Tracking live-time of object: cl_kernel
[VC4CL](    dot_product): API call: cl_mem clCreateBuffer(cl_context 0x18bffa4, cl_mem_flags 36, size_t 1048512, void* 0x7e8e0678, cl_int* 0x7e9e0660)
[VC4CL] Mailbox file descriptor opened: 4
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 00030012 00000008 00000004 00000001 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 00030012 00000008 80000004 00000000 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 00010006 00000008 00000000 00000000 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 00010006 00000008 80000008 3b400000 04c00000 00000000
[VC4CL](    dot_product): Mailbox request: succeeded
[VC4CL](    dot_product): Tracking live-time of object: cl_mem
[VC4CL](    dot_product): Mailbox buffer before: 00000024 00000000 0003000c 0000000c 0000000c 000fffc0 00001000 0000000c 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000024 80000000 0003000c 0000000c 80000004 00000019 00001000 0000000c 00000000
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 0003000d 00000008 00000004 00000019 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 0003000d 00000008 80000004 be3a9000 00000000 00000000
[VC4CL] base=0x3e3a9000, mem=0x70d00000
[VC4CL](    dot_product): Allocated 1048512 bytes of buffer: handle 25, device address 0xbe3a9000, host address 0x70d00000
[VC4CL](    dot_product): API call: cl_mem clCreateBuffer(cl_context 0x18bffa4, cl_mem_flags 33, size_t 1048512, void* 0x7e7e06b8, cl_int* 0x7e9e0660)
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 00010006 00000008 00000000 00000000 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 00010006 00000008 80000008 3b400000 04c00000 00000000
[VC4CL](    dot_product): Mailbox request: succeeded
[VC4CL](    dot_product): Tracking live-time of object: cl_mem
[VC4CL](    dot_product): Mailbox buffer before: 00000024 00000000 0003000c 0000000c 0000000c 000fffc0 00001000 0000000c 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000024 80000000 0003000c 0000000c 80000004 0000001a 00001000 0000000c 00000000
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 0003000d 00000008 00000004 0000001a 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 0003000d 00000008 80000004 be2a8000 00000000 00000000
[VC4CL] base=0x3e2a8000, mem=0x70c00000
[VC4CL](    dot_product): Allocated 1048512 bytes of buffer: handle 26, device address 0xbe2a8000, host address 0x70c00000
[VC4CL](    dot_product): API call: cl_mem clCreateBuffer(cl_context 0x18bffa4, cl_mem_flags 2, size_t 21844, void* 0, cl_int* 0x7e9e0660)
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 00010006 00000008 00000000 00000000 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 00010006 00000008 80000008 3b400000 04c00000 00000000
[VC4CL](    dot_product): Mailbox request: succeeded
[VC4CL](    dot_product): Tracking live-time of object: cl_mem
[VC4CL](    dot_product): Mailbox buffer before: 00000024 00000000 0003000c 0000000c 0000000c 00005554 00001000 0000000c 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000024 80000000 0003000c 0000000c 80000004 0000001b 00001000 0000000c 00000000
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 0003000d 00000008 00000004 0000001b 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 0003000d 00000008 80000004 be2a1000 00000000 00000000
[VC4CL] base=0x3e2a1000, mem=0x76fe5000
[VC4CL](    dot_product): Allocated 21844 bytes of buffer: handle 27, device address 0xbe2a1000, host address 0x76fe5000
[VC4CL](    dot_product): API call: cl_command_queue clCreateCommandQueue(cl_context 0x18bffa4, cl_device_id 0x18bf770, cl_command_queue_properties 0, cl_int* 0x7e9e0660)
[VC4CL](    dot_product): Starting queue handler thread...
[VC4CL](    dot_product): Tracking live-time of object: cl_command_queue
[VC4CL](    dot_product): API call: cl_int clSetKernelArg(cl_kernel 0x1894904, cl_uint 0, size_t 4, const void* 0x7e7e06b4)
[VC4CL](    dot_product): Set kernel arg 0 for kernel 'dot_product' to 0x7e7e06b4 (25770348) with size 4
[VC4CL](    dot_product): Kernel arg 0 for kernel 'dot_product' is float4* 'a_vec' with size 4
[VC4CL](    dot_product): Setting kernel-argument 0 to pointer 0x0x1893960
[VC4CL](    dot_product): API call: cl_int clSetKernelArg(cl_kernel 0x1894904, cl_uint 1, size_t 4, const void* 0x7e7e06b0)
[VC4CL](    dot_product): Set kernel arg 1 for kernel 'dot_product' to 0x7e7e06b0 (25774588) with size 4
[VC4CL](    dot_product): Kernel arg 1 for kernel 'dot_product' is float4* 'b_vec' with size 4
[VC4CL](    dot_product): Setting kernel-argument 1 to pointer 0x0x18949f0
[VC4CL](    dot_product): API call: cl_int clSetKernelArg(cl_kernel 0x1894904, cl_uint 2, size_t 4, const void* 0x7e7e06ac)
[VC4CL](    dot_product): Set kernel arg 2 for kernel 'dot_product' to 0x7e7e06ac (26322148) with size 4
[VC4CL](    dot_product): Kernel arg 2 for kernel 'dot_product' is float* 'output' with size 4
[VC4CL](    dot_product): Setting kernel-argument 2 to pointer 0x0x191a4d8
[VC4CL](    dot_product): API call: cl_int clSetKernelArg(cl_kernel 0x1894904, cl_uint 3, size_t 192, const void* 0)
[VC4CL](    dot_product): Set kernel arg 3 for kernel 'dot_product' to 0 (0) with size 192
[VC4CL](    dot_product): Kernel arg 3 for kernel 'dot_product' is float4* 'partial_dot' with size 4
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 00010006 00000008 00000000 00000000 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 00010006 00000008 80000008 3b400000 04c00000 00000000
[VC4CL](    dot_product): Mailbox request: succeeded
[VC4CL](    dot_product): Setting kernel-argument 3 to pointer 0x0
[VC4CL](    dot_product): API call: cl_int clEnqueueNDRangeKernel(cl_command_queue 0x191f64c, cl_kernel 0x1894904, cl_uint 1, const size_t* 0, const size_t* 0x7e9e0664, const size_t* 0x7e9e0668, cl_uint 0, const cl_event* 0, cl_event* 0x7e9e064c)
[VC4CL](    dot_product): Mailbox buffer before: 00000024 00000000 0003000c 0000000c 0000000c 000000c0 00001000 0000000c 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000024 80000000 0003000c 0000000c 80000004 0000001c 00001000 0000000c 00000000
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 0003000d 00000008 00000004 0000001c 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 0003000d 00000008 80000004 be2a0000 00000000 00000000
[VC4CL] base=0x3e2a0000, mem=0x76fe4000
[VC4CL](    dot_product): Allocated 192 bytes of buffer: handle 28, device address 0xbe2a0000, host address 0x76fe4000
[VC4CL](    dot_product): Tracking live-time of object: cl_event
[VC4CL](    dot_product): API call: cl_int clEnqueueReadBuffer(cl_command_queue 0x191f64c, cl_mem 0x191a4e4, cl_bool 1, size_t 0, size_t 21844, void* 0x188e358, cl_uint 0, const cl_event* 0, cl_event* 0)
[VC4CL](VC4CL Queue Han): Running kernel 'dot_product' with 901 instructions...
[VC4CL](VC4CL Queue Han): Local sizes: 12 1 1 -> 12 QPUs
[VC4CL](VC4CL Queue Han): Global sizes: 65532 1 1 -> 5461 work-groups (1 run at once)
[VC4CL](    dot_product): Tracking live-time of object: cl_event
[VC4CL](VC4CL Queue Han): Mailbox buffer before: 00000024 00000000 0003000c 0000000c 0000000c 00002000 00001000 0000000c 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer after: 00000024 80000000 0003000c 0000000c 80000004 0000001d 00001000 0000000c 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer before: 00000020 00000000 0003000d 00000008 00000004 0000001d 00000000 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer after: 00000020 80000000 0003000d 00000008 80000004 be29d000 00000000 00000000
[VC4CL] base=0x3e29d000, mem=0x76fe2000
[VC4CL](VC4CL Queue Han): Allocated 8192 bytes of buffer: handle 29, device address 0xbe29d000, host address 0x76fe2000
[VC4CL](VC4CL Queue Han): Copied 0 bytes of global data to device buffer
[VC4CL](VC4CL Queue Han): Reserving space for 12 stack-frames of 0 bytes each
[VC4CL](VC4CL Queue Han): Copied 7208 bytes of kernel code to device buffer
[VC4CL](VC4CL Queue Han): Setting work-item infos:
	1 dimensions with offsets: 0, 0, 0
	Global IDs (sizes): 0(65532), 0(1), 0(1)
	Local IDs (sizes): 0(12), 0(1), 0(1)
	Group IDs (sizes): 0(5461), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 3191508992
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 3190456320
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 3190427648
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 3190423552
[VC4CL](VC4CL Queue Han): 9 parameters set.
[VC4CL](VC4CL Queue Han): Setting work-item infos:
	1 dimensions with offsets: 0, 0, 0
	Global IDs (sizes): 1(65532), 0(1), 0(1)
	Local IDs (sizes): 1(12), 0(1), 0(1)
	Group IDs (sizes): 0(5461), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 3191508992
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 3190456320
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 3190427648
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 3190423552
[VC4CL](VC4CL Queue Han): 9 parameters set.
[VC4CL](VC4CL Queue Han): Setting work-item infos:
	1 dimensions with offsets: 0, 0, 0
	Global IDs (sizes): 2(65532), 0(1), 0(1)
	Local IDs (sizes): 2(12), 0(1), 0(1)
	Group IDs (sizes): 0(5461), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 3191508992
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 3190456320
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 3190427648
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 3190423552
[VC4CL](VC4CL Queue Han): 9 parameters set.
[VC4CL](VC4CL Queue Han): Setting work-item infos:
	1 dimensions with offsets: 0, 0, 0
	Global IDs (sizes): 3(65532), 0(1), 0(1)
	Local IDs (sizes): 3(12), 0(1), 0(1)
	Group IDs (sizes): 0(5461), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 3191508992
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 3190456320
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 3190427648
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 3190423552
[VC4CL](VC4CL Queue Han): 9 parameters set.
[VC4CL](VC4CL Queue Han): Setting work-item infos:
	1 dimensions with offsets: 0, 0, 0
	Global IDs (sizes): 4(65532), 0(1), 0(1)
	Local IDs (sizes): 4(12), 0(1), 0(1)
	Group IDs (sizes): 0(5461), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 3191508992
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 3190456320
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 3190427648
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 3190423552
[VC4CL](VC4CL Queue Han): 9 parameters set.
[VC4CL](VC4CL Queue Han): Setting work-item infos:
	1 dimensions with offsets: 0, 0, 0
	Global IDs (sizes): 5(65532), 0(1), 0(1)
	Local IDs (sizes): 5(12), 0(1), 0(1)
	Group IDs (sizes): 0(5461), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 3191508992
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 3190456320
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 3190427648
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 3190423552
[VC4CL](VC4CL Queue Han): 9 parameters set.
[VC4CL](VC4CL Queue Han): Setting work-item infos:
	1 dimensions with offsets: 0, 0, 0
	Global IDs (sizes): 6(65532), 0(1), 0(1)
	Local IDs (sizes): 6(12), 0(1), 0(1)
	Group IDs (sizes): 0(5461), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 3191508992
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 3190456320
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 3190427648
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 3190423552
[VC4CL](VC4CL Queue Han): 9 parameters set.
[VC4CL](VC4CL Queue Han): Setting work-item infos:
	1 dimensions with offsets: 0, 0, 0
	Global IDs (sizes): 7(65532), 0(1), 0(1)
	Local IDs (sizes): 7(12), 0(1), 0(1)
	Group IDs (sizes): 0(5461), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 3191508992
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 3190456320
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 3190427648
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 3190423552
[VC4CL](VC4CL Queue Han): 9 parameters set.
[VC4CL](VC4CL Queue Han): Setting work-item infos:
	1 dimensions with offsets: 0, 0, 0
	Global IDs (sizes): 8(65532), 0(1), 0(1)
	Local IDs (sizes): 8(12), 0(1), 0(1)
	Group IDs (sizes): 0(5461), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 3191508992
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 3190456320
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 3190427648
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 3190423552
[VC4CL](VC4CL Queue Han): 9 parameters set.
[VC4CL](VC4CL Queue Han): Setting work-item infos:
	1 dimensions with offsets: 0, 0, 0
	Global IDs (sizes): 9(65532), 0(1), 0(1)
	Local IDs (sizes): 9(12), 0(1), 0(1)
	Group IDs (sizes): 0(5461), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 3191508992
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 3190456320
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 3190427648
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 3190423552
[VC4CL](VC4CL Queue Han): 9 parameters set.
[VC4CL](VC4CL Queue Han): Setting work-item infos:
	1 dimensions with offsets: 0, 0, 0
	Global IDs (sizes): 10(65532), 0(1), 0(1)
	Local IDs (sizes): 10(12), 0(1), 0(1)
	Group IDs (sizes): 0(5461), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 3191508992
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 3190456320
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 3190427648
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 3190423552
[VC4CL](VC4CL Queue Han): 9 parameters set.
[VC4CL](VC4CL Queue Han): Setting work-item infos:
	1 dimensions with offsets: 0, 0, 0
	Global IDs (sizes): 11(65532), 0(1), 0(1)
	Local IDs (sizes): 11(12), 0(1), 0(1)
	Group IDs (sizes): 0(5461), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 3191508992
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 3190456320
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 3190427648
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 3190423552
[VC4CL](VC4CL Queue Han): 9 parameters set.
[VC4CL](VC4CL Queue Han): Running work-group 0, 0, 0
[VC4CL](VC4CL Queue Han): Execution: failed
[VC4CL](VC4CL Queue Han): Mailbox buffer before: 00000020 00000000 0003000e 00000008 00000004 0000001d 00000000 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer after: 00000020 80000000 0003000e 00000008 80000004 00000000 00000000 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer before: 00000020 00000000 0003000f 00000008 00000004 0000001d 00000000 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer after: 00000020 80000000 0003000f 00000008 80000004 00000000 00000000 00000000
[VC4CL](VC4CL Queue Han): Deallocated 8192 bytes of buffer: handle 29, device address 0xbe29d000, host address 0x76fe2000
[VC4CL](    dot_product): Releasing live-time of object: cl_event
[VC4CL](    dot_product): API call: cl_int clGetEventProfilingInfo(cl_event 0x191c45c, cl_profiling_info 4738, size_t 8, void* 0x7e9e0640, size_t* 0)
[VC4CL](    dot_product): API call: cl_int clGetEventProfilingInfo(cl_event 0x191c45c, cl_profiling_info 4739, size_t 8, void* 0x7e9e0638, size_t* 0)
On the device, the dot product kernel completed in 0 ns.
Dot product failed.
[VC4CL](    dot_product): API call: cl_int clReleaseMemObject(cl_mem 0x189396c)
[VC4CL](    dot_product): Releasing live-time of object: cl_mem
[VC4CL](    dot_product): API call: cl_int clReleaseMemObject(cl_mem 0x18949fc)
[VC4CL](    dot_product): Releasing live-time of object: cl_mem
[VC4CL](    dot_product): API call: cl_int clReleaseMemObject(cl_mem 0x191a4e4)
[VC4CL](    dot_product): Releasing live-time of object: cl_mem
[VC4CL](    dot_product): API call: cl_int clReleaseKernel(cl_kernel 0x1894904)
[VC4CL](    dot_product): API call: cl_int clReleaseCommandQueue(cl_command_queue 0x191f64c)
[VC4CL](    dot_product): API call: cl_int clReleaseProgram(cl_program 0x1893d4c)
[VC4CL](    dot_product): API call: cl_int clReleaseContext(cl_context 0x18bffa4)
terminate called without an active exception
[1]    20278 abort      sudo ./dot_product

clinfo works as expected:

$ sudo ./clinfo
[VC4CL](         clinfo): API call: void* clGetExtensionFunctionAddressForPlatform(cl_platform_id 0x6c7e14, const char* "clIcdGetPlatformIDsKHR")
[VC4CL](         clinfo): get extension function address: clIcdGetPlatformIDsKHR
[VC4CL](         clinfo): API call: void* clGetExtensionFunctionAddressForPlatform(cl_platform_id 0x6c7e14, const char* "clGetPlatformInfo")
[VC4CL](         clinfo): get extension function address: clGetPlatformInfo
[VC4CL](         clinfo): API call: cl_int clIcdGetPlatformIDsKHR(cl_uint 0, cl_platform_id* 0, cl_uint* 0x7e81f5a0)
[VC4CL](         clinfo): API call: cl_int clIcdGetPlatformIDsKHR(cl_uint 1, cl_platform_id* 0x6ba6e0, cl_uint* 0)
[VC4CL](         clinfo): API call: cl_int clGetPlatformInfo(cl_platform_id 0x6c7e14, cl_platform_info 2308, size_t 0, void* 0, size_t* 0x7e81f530)
[VC4CL](         clinfo): API call: cl_int clGetPlatformInfo(cl_platform_id 0x6c7e14, cl_platform_info 2308, size_t 159, void* 0x6c99b8, size_t* 0)
[VC4CL](         clinfo): API call: cl_int clGetPlatformInfo(cl_platform_id 0x6c7e14, cl_platform_info 2336, size_t 0, void* 0, size_t* 0x7e81f530)
[VC4CL](         clinfo): API call: cl_int clGetPlatformInfo(cl_platform_id 0x6c7e14, cl_platform_info 2336, size_t 6, void* 0x6c2100, size_t* 0)
Number of platforms                               1
[VC4CL](         clinfo): API call: cl_int clGetPlatformInfo(cl_platform_id 0x6c7e14, cl_platform_info 2306, size_t 0, void* 0, size_t* 0x7e81f5a4)
[VC4CL](         clinfo): API call: cl_int clGetPlatformInfo(cl_platform_id 0x6c7e14, cl_platform_info 2306, size_t 1024, void* 0x696008, size_t* 0)
  Platform Name                                   OpenCL for the Raspberry Pi VideoCore IV GPU
[VC4CL](         clinfo): API call: cl_int clGetPlatformInfo(cl_platform_id 0x6c7e14, cl_platform_info 2307, size_t 0, void* 0, size_t* 0x7e81f5a4)
[VC4CL](         clinfo): API call: cl_int clGetPlatformInfo(cl_platform_id 0x6c7e14, cl_platform_info 2307, size_t 1024, void* 0x696008, size_t* 0)
  Platform Vendor                                 doe300
[VC4CL](         clinfo): API call: cl_int clGetPlatformInfo(cl_platform_id 0x6c7e14, cl_platform_info 2305, size_t 0, void* 0, size_t* 0x7e81f5a4)
[VC4CL](         clinfo): API call: cl_int clGetPlatformInfo(cl_platform_id 0x6c7e14, cl_platform_info 2305, size_t 1024, void* 0x696008, size_t* 0)
  Platform Version                                OpenCL 1.2 VC4CL 0.4.9999
[VC4CL](         clinfo): API call: cl_int clGetPlatformInfo(cl_platform_id 0x6c7e14, cl_platform_info 2304, size_t 0, void* 0, size_t* 0x7e81f5a4)
[VC4CL](         clinfo): API call: cl_int clGetPlatformInfo(cl_platform_id 0x6c7e14, cl_platform_info 2304, size_t 1024, void* 0x696008, size_t* 0)
  Platform Profile                                EMBEDDED_PROFILE
[VC4CL](         clinfo): API call: cl_int clGetPlatformInfo(cl_platform_id 0x6c7e14, cl_platform_info 2308, size_t 0, void* 0, size_t* 0x7e81f5a4)
[VC4CL](         clinfo): API call: cl_int clGetPlatformInfo(cl_platform_id 0x6c7e14, cl_platform_info 2308, size_t 1024, void* 0x696008, size_t* 0)
  Platform Extensions                             cl_khr_il_program cl_khr_spir cl_khr_create_command_queue cl_altera_device_temperature cl_altera_live_object_tracking cl_khr_icd cl_vc4cl_performance_counters
[VC4CL](         clinfo): API call: cl_int clGetPlatformInfo(cl_platform_id 0x6c7e14, cl_platform_info 2336, size_t 0, void* 0, size_t* 0x7e81f5a4)
[VC4CL](         clinfo): API call: cl_int clGetPlatformInfo(cl_platform_id 0x6c7e14, cl_platform_info 2336, size_t 1024, void* 0x696008, size_t* 0)
  Platform Extensions function suffix             VC4CL
[VC4CL](         clinfo): API call: cl_int clGetDeviceIDs(cl_platform_id 0x6c7e14, cl_device_type 4294967295, cl_uint 0, cl_device_id* 0, cl_uint* 0x69e048)
[VC4CL](         clinfo): API call: cl_int clGetDeviceIDs(cl_platform_id 0x6c7e14, cl_device_type 4294967295, cl_uint 1, cl_device_id* 0x6c3f48, cl_uint* 0)

  Platform Name                                   OpenCL for the Raspberry Pi VideoCore IV GPU
Number of devices                                 1
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4139, size_t 0, void* 0, size_t* 0x7e81f21c)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4139, size_t 1024, void* 0x696410, size_t* 0)
  Device Name                                     VideoCore IV GPU
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4140, size_t 0, void* 0, size_t* 0x7e81f21c)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4140, size_t 1024, void* 0x696410, size_t* 0)
  Device Vendor                                   Broadcom
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4097, size_t 4, void* 0x7e81f260, size_t* 0)
  Device Vendor ID                                0xa5c
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4143, size_t 0, void* 0, size_t* 0x7e81f21c)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4143, size_t 1024, void* 0x696410, size_t* 0)
  Device Version                                  OpenCL 1.2 VC4CL 0.4.9999
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4141, size_t 0, void* 0, size_t* 0x7e81f21c)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4141, size_t 1024, void* 0x696410, size_t* 0)
  Driver Version                                  0.4.9999
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4157, size_t 0, void* 0, size_t* 0x7e81f21c)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4157, size_t 1024, void* 0x696410, size_t* 0)
  Device OpenCL C Version                         OpenCL C 1.2 
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4144, size_t 0, void* 0, size_t* 0x7e81f21c)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4144, size_t 1024, void* 0x696410, size_t* 0)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4096, size_t 8, void* 0x7e81f260, size_t* 0)
  Device Type                                     GPU
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4142, size_t 0, void* 0, size_t* 0x7e81f21c)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4142, size_t 1024, void* 0x696410, size_t* 0)
  Device Profile                                  EMBEDDED_PROFILE
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4135, size_t 4, void* 0x7e81f260, size_t* 0)
  Device Available                                Yes
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4136, size_t 4, void* 0x7e81f260, size_t* 0)
  Compiler Available                              Yes
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4158, size_t 4, void* 0x7e81f260, size_t* 0)
  Linker Available                                Yes
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4098, size_t 4, void* 0x7e81f260, size_t* 0)
  Max compute units                               1
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 16575, size_t 8, void* 0x7e81f260, size_t* 0)
  Available core IDs                              0, 64
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4108, size_t 4, void* 0x7e81f260, size_t* 0)
[VC4CL] Mailbox file descriptor opened: 3
[VC4CL](         clinfo): Mailbox buffer before: 00000020 00000000 00030012 00000008 00000004 00000001 00000000 00000000
[VC4CL](         clinfo): Mailbox buffer after: 00000020 80000000 00030012 00000008 80000004 00000000 00000000 00000000
[VC4CL](         clinfo): Mailbox buffer before: 00000020 00000000 00030004 00000008 00000004 00000005 00000000 00000000
[VC4CL](         clinfo): Mailbox buffer after: 00000020 80000000 00030004 00000008 80000008 00000005 11e1a300 00000000
[VC4CL](         clinfo): Mailbox request: succeeded
  Max clock frequency                             300MHz
  Device Partition                                (core)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4163, size_t 4, void* 0x7e81f260, size_t* 0)
    Max number of sub-devices                     0
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4164, size_t 0, void* 0, size_t* 0x7e81f204)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4164, size_t 4, void* 0x696de0, size_t* 0)
    Supported partition types                     None
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4165, size_t 8, void* 0x7e81f260, size_t* 0)
    Supported affinity domains                    (n/a)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4099, size_t 4, void* 0x7e81f260, size_t* 0)
  Max work item dimensions                        3
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4101, size_t 0, void* 0, size_t* 0x7e81f1f0)
[VC4CL] base=0x3fc00000, mem=0x76f3e000
[VC4CL] V3D base: 0x76f3e000
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4101, size_t 12, void* 0x6c7750, size_t* 0)
  Max work item sizes                             12x12x12
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4100, size_t 4, void* 0x7e81f260, size_t* 0)
  Max work group size                             12
[VC4CL](         clinfo): API call: cl_context clCreateContext(const cl_context_properties* 0x7e81f1dc, cl_uint 1, const cl_device_id* 0x7e81f2c0, void(CL_CALLBACK*)(const char* errinfo, const void* private_info, size_t cb, void* user_data) 0x7e81f0d0, void* 0, cl_int* 0x7e81f240)
[VC4CL](         clinfo): Tracking live-time of object: cl_context
[VC4CL](         clinfo): API call: cl_program clCreateProgramWithSource(cl_context 0x697144, cl_uint 6, const char** 0x32274, const size_t* 0, cl_int* 0x7e81f240)
[VC4CL](         clinfo): Tracking live-time of object: cl_program
[VC4CL](         clinfo): API call: cl_int clBuildProgram(cl_program 0x6973fc, cl_uint 1, const cl_device_id* 0x7e81f2c0, const char* (null), void(CL_CALLBACK*)(cl_program program, void* user_data) 0x7e81f1b8, void* 0)
[VC4CL](         clinfo): Precompiling source with: 
[VC4CL](         clinfo): Dumping program sources to /tmp/vc4cl-source-1804289383.cl
[VC4CL](         clinfo): Precompilation complete with status: 0
[VC4CL](         clinfo): Compilation log: [W] Wed Mar 27 07:47:58 2019: Warnings in precompilation:
[W] Wed Mar 27 07:47:58 2019: <stdin>:11:1: warning: null character ignored
<U+0000>
^
1 warning generated.


[VC4CL](         clinfo): Compiling source with: 
[VC4CL](         clinfo): Compilation complete with status: 0
[VC4CL](         clinfo): Dumping program sources to /tmp/vc4cl-binary-846930886.bin
[VC4CL](         clinfo): API call: cl_kernel clCreateKernel(cl_program 0x6973fc, const char* "sum", cl_int* 0x7e81f240)
[VC4CL](         clinfo): Tracking live-time of object: cl_kernel
[VC4CL](         clinfo): API call: cl_int clGetKernelWorkGroupInfo(cl_kernel 0x69802c, cl_device_id 0x6c7e28, cl_kernel_work_group_info 4531, size_t 4, void* 0x7e81f224, size_t* 0)
[VC4CL](         clinfo): API call: cl_int clReleaseKernel(cl_kernel 0x69802c)
[VC4CL](         clinfo): Releasing live-time of object: cl_kernel
[VC4CL](         clinfo): API call: cl_int clReleaseProgram(cl_program 0x6973fc)
[VC4CL](         clinfo): Releasing live-time of object: cl_program
[VC4CL](         clinfo): API call: cl_int clReleaseContext(cl_context 0x697144)
[VC4CL](         clinfo): Releasing live-time of object: cl_context
  Preferred work group size multiple              1
  Preferred / native vector sizes                 
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4102, size_t 4, void* 0x7e81f1f8, size_t* 0)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4150, size_t 4, void* 0x7e81f1f4, size_t* 0)
    char                                                16 / 16      
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4103, size_t 4, void* 0x7e81f1f8, size_t* 0)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4151, size_t 4, void* 0x7e81f1f4, size_t* 0)
    short                                               16 / 16      
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4104, size_t 4, void* 0x7e81f1f8, size_t* 0)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4152, size_t 4, void* 0x7e81f1f4, size_t* 0)
    int                                                 16 / 16      
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4105, size_t 4, void* 0x7e81f1f8, size_t* 0)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4153, size_t 4, void* 0x7e81f1f4, size_t* 0)
    long                                                 0 / 0       
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4148, size_t 4, void* 0x7e81f1f8, size_t* 0)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4156, size_t 4, void* 0x7e81f1f4, size_t* 0)
    half                                                 0 / 0        (n/a)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4106, size_t 4, void* 0x7e81f1f8, size_t* 0)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4154, size_t 4, void* 0x7e81f1f4, size_t* 0)
    float                                               16 / 16      
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4107, size_t 4, void* 0x7e81f1f8, size_t* 0)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4155, size_t 4, void* 0x7e81f1f4, size_t* 0)
    double                                               0 / 0        (n/a)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4147, size_t 8, void* 0x7e81f260, size_t* 0)
  Half-precision Floating-point support           (n/a)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4123, size_t 8, void* 0x7e81f260, size_t* 0)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 Yes
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4146, size_t 8, void* 0x7e81f260, size_t* 0)
  Double-precision Floating-point support         (n/a)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4109, size_t 4, void* 0x7e81f260, size_t* 0)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4134, size_t 4, void* 0x7e81f260, size_t* 0)
  Address bits                                    32, Little-Endian
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4127, size_t 8, void* 0x7e81f260, size_t* 0)
[VC4CL](         clinfo): Mailbox buffer before: 00000020 00000000 00010006 00000008 00000000 00000000 00000000 00000000
[VC4CL](         clinfo): Mailbox buffer after: 00000020 80000000 00010006 00000008 80000008 3b400000 04c00000 00000000
[VC4CL](         clinfo): Mailbox request: succeeded
  Global memory size                              79691776 (76MiB)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4132, size_t 4, void* 0x7e81f260, size_t* 0)
  Error Correction support                        No
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4112, size_t 8, void* 0x7e81f260, size_t* 0)
[VC4CL](         clinfo): Mailbox buffer before: 00000020 00000000 00010006 00000008 00000000 00000000 00000000 00000000
[VC4CL](         clinfo): Mailbox buffer after: 00000020 80000000 00010006 00000008 80000008 3b400000 04c00000 00000000
[VC4CL](         clinfo): Mailbox request: succeeded
  Max memory allocation                           79691776 (76MiB)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4149, size_t 4, void* 0x7e81f260, size_t* 0)
  Unified memory for Host and Device              Yes
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4122, size_t 4, void* 0x7e81f260, size_t* 0)
  Minimum alignment for any data type             64 bytes
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4121, size_t 4, void* 0x7e81f260, size_t* 0)
  Alignment of base address                       512 bits (64 bytes)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4124, size_t 4, void* 0x7e81f260, size_t* 0)
  Global Memory cache type                        Read/Write
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4126, size_t 8, void* 0x7e81f260, size_t* 0)
  Global Memory cache size                        32768 (32KiB)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4125, size_t 4, void* 0x7e81f260, size_t* 0)
  Global Memory cache line size                   64 bytes
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4118, size_t 4, void* 0x7e81f260, size_t* 0)
  Image support                                   No
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4130, size_t 4, void* 0x7e81f260, size_t* 0)
  Local memory type                               Global
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4131, size_t 8, void* 0x7e81f260, size_t* 0)
[VC4CL](         clinfo): Mailbox buffer before: 00000020 00000000 00010006 00000008 00000000 00000000 00000000 00000000
[VC4CL](         clinfo): Mailbox buffer after: 00000020 80000000 00010006 00000008 80000008 3b400000 04c00000 00000000
[VC4CL](         clinfo): Mailbox request: succeeded
  Local memory size                               79691776 (76MiB)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4129, size_t 4, void* 0x7e81f260, size_t* 0)
  Max number of constant args                     64
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4128, size_t 8, void* 0x7e81f260, size_t* 0)
[VC4CL](         clinfo): Mailbox buffer before: 00000020 00000000 00010006 00000008 00000000 00000000 00000000 00000000
[VC4CL](         clinfo): Mailbox buffer after: 00000020 80000000 00010006 00000008 80000008 3b400000 04c00000 00000000
[VC4CL](         clinfo): Mailbox request: succeeded
  Max constant buffer size                        79691776 (76MiB)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4119, size_t 8, void* 0x7e81f260, size_t* 0)
  Max size of kernel argument                     256
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4138, size_t 8, void* 0x7e81f260, size_t* 0)
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4168, size_t 4, void* 0x7e81f260, size_t* 0)
  Prefer user sync for interop                    Yes
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4133, size_t 4, void* 0x7e81f260, size_t* 0)
  Profiling timer resolution                      1ns
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4137, size_t 8, void* 0x7e81f260, size_t* 0)
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4169, size_t 4, void* 0x7e81f260, size_t* 0)
  printf() buffer size                            0
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4159, size_t 0, void* 0, size_t* 0x7e81f21c)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4159, size_t 1024, void* 0x696410, size_t* 0)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_nv_pragma_unroll cl_arm_core_id cl_ext_atomic_counters_32 cl_khr_initialize_memory cl_arm_integer_dot_product_int8 cl_arm_integer_dot_product_accumulate_int8 cl_arm_integer_dot_product_accumulate_int16 cl_arm_integer_dot_product_accumulate_saturate_int8

NULL platform behavior
[VC4CL](         clinfo): API call: cl_int clGetPlatformInfo(cl_platform_id 0x6c7e14, cl_platform_info 2306, size_t 1024, void* 0x6cb0d0, size_t* 0)
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  OpenCL for the Raspberry Pi VideoCore IV GPU
[VC4CL](         clinfo): API call: cl_int clGetDeviceIDs(cl_platform_id 0x6c7e14, cl_device_type 4294967295, cl_uint 0, cl_device_id* 0, cl_uint* 0x7e81f560)
[VC4CL](         clinfo): API call: cl_int clGetDeviceIDs(cl_platform_id 0x6c7e14, cl_device_type 4294967295, cl_uint 1, cl_device_id* 0x7e81f568, cl_uint* 0)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4145, size_t 4, void* 0x7e81f564, size_t* 0)
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [VC4CL]
[VC4CL](         clinfo): API call: cl_context clCreateContext(const cl_context_properties* 0, cl_uint 1, const cl_device_id* 0x6c3f48, void(CL_CALLBACK*)(const char* errinfo, const void* private_info, size_t cb, void* user_data) 0x7e81f480, void* 0, cl_int* 0x7e81f5c0)
[VC4CL](         clinfo): Tracking live-time of object: cl_context
[VC4CL](         clinfo): API call: cl_int clReleaseContext(cl_context 0x697144)
[VC4CL](         clinfo): Releasing live-time of object: cl_context
  clCreateContext(NULL, ...) [default]            Success [VC4CL]
[VC4CL](         clinfo): API call: cl_context clCreateContextFromType(const cl_context_properties* 0, cl_device_type 1, void(CL_CALLBACK*)(const char* errinfo, const void* private_info, size_t cb, void* user_data) 0x7e81f0d0, void* 0, cl_int* 0x7e81f0e0)
[VC4CL](         clinfo): API call: cl_int clGetDeviceIDs(cl_platform_id 0x6c7e14, cl_device_type 1, cl_uint 1, cl_device_id* 0x7e81f098, cl_uint* 0x7e81f09c)
[VC4CL](         clinfo): API call: cl_context clCreateContext(const cl_context_properties* 0, cl_uint 1, const cl_device_id* 0x7e81f098, void(CL_CALLBACK*)(const char* errinfo, const void* private_info, size_t cb, void* user_data) 0x7e81ef78, void* 0, cl_int* 0x7e81f0e0)
[VC4CL](         clinfo): Tracking live-time of object: cl_context
[VC4CL](         clinfo): API call: cl_int clGetContextInfo(cl_context 0x697144, cl_context_info 4225, size_t 0, void* 0, size_t* 0x7e81f170)
[VC4CL](         clinfo): API call: cl_int clGetContextInfo(cl_context 0x697144, cl_context_info 4225, size_t 32, void* 0x6c7e50, size_t* 0)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4145, size_t 4, void* 0x7e81f16c, size_t* 0)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4139, size_t 867, void* 0x6cb16d, size_t* 0x7e81f14c)
[VC4CL](         clinfo): API call: cl_int clReleaseContext(cl_context 0x697144)
[VC4CL](         clinfo): Releasing live-time of object: cl_context
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 OpenCL for the Raspberry Pi VideoCore IV GPU
    Device Name                                   VideoCore IV GPU
[VC4CL](         clinfo): API call: cl_context clCreateContextFromType(const cl_context_properties* 0, cl_device_type 2, void(CL_CALLBACK*)(const char* errinfo, const void* private_info, size_t cb, void* user_data) 0x7e81f0d0, void* 0, cl_int* 0x7e81f0e0)
[VC4CL](         clinfo): API call: cl_int clGetDeviceIDs(cl_platform_id 0x6c7e14, cl_device_type 2, cl_uint 1, cl_device_id* 0x7e81f098, cl_uint* 0x7e81f09c)
[VC4CL](         clinfo): Error in '/home/pi/gpu/VC4CL/src/Device.cpp:491', returning status -1:No device for the given criteria: platform 0x6c7e14, type: 2!
[VC4CL](         clinfo): Error in '/home/pi/gpu/VC4CL/src/Context.cpp:264', returning status -1:Failed to get device ID!
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
[VC4CL](         clinfo): API call: cl_context clCreateContextFromType(const cl_context_properties* 0, cl_device_type 4, void(CL_CALLBACK*)(const char* errinfo, const void* private_info, size_t cb, void* user_data) 0x7e81f0d0, void* 0, cl_int* 0x7e81f0e0)
[VC4CL](         clinfo): API call: cl_int clGetDeviceIDs(cl_platform_id 0x6c7e14, cl_device_type 4, cl_uint 1, cl_device_id* 0x7e81f098, cl_uint* 0x7e81f09c)
[VC4CL](         clinfo): API call: cl_context clCreateContext(const cl_context_properties* 0, cl_uint 1, const cl_device_id* 0x7e81f098, void(CL_CALLBACK*)(const char* errinfo, const void* private_info, size_t cb, void* user_data) 0x7e81ef78, void* 0, cl_int* 0x7e81f0e0)
[VC4CL](         clinfo): Tracking live-time of object: cl_context
[VC4CL](         clinfo): API call: cl_int clGetContextInfo(cl_context 0x697144, cl_context_info 4225, size_t 0, void* 0, size_t* 0x7e81f170)
[VC4CL](         clinfo): API call: cl_int clGetContextInfo(cl_context 0x697144, cl_context_info 4225, size_t 32, void* 0x6c7e50, size_t* 0)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4145, size_t 4, void* 0x7e81f16c, size_t* 0)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4139, size_t 867, void* 0x6cb16d, size_t* 0x7e81f14c)
[VC4CL](         clinfo): API call: cl_int clReleaseContext(cl_context 0x697144)
[VC4CL](         clinfo): Releasing live-time of object: cl_context
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 OpenCL for the Raspberry Pi VideoCore IV GPU
    Device Name                                   VideoCore IV GPU
[VC4CL](         clinfo): API call: cl_context clCreateContextFromType(const cl_context_properties* 0, cl_device_type 8, void(CL_CALLBACK*)(const char* errinfo, const void* private_info, size_t cb, void* user_data) 0x7e81f0d0, void* 0, cl_int* 0x7e81f0e0)
[VC4CL](         clinfo): API call: cl_int clGetDeviceIDs(cl_platform_id 0x6c7e14, cl_device_type 8, cl_uint 1, cl_device_id* 0x7e81f098, cl_uint* 0x7e81f09c)
[VC4CL](         clinfo): Error in '/home/pi/gpu/VC4CL/src/Device.cpp:491', returning status -1:No device for the given criteria: platform 0x6c7e14, type: 8!
[VC4CL](         clinfo): Error in '/home/pi/gpu/VC4CL/src/Context.cpp:264', returning status -1:Failed to get device ID!
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
[VC4CL](         clinfo): API call: cl_context clCreateContextFromType(const cl_context_properties* 0, cl_device_type 16, void(CL_CALLBACK*)(const char* errinfo, const void* private_info, size_t cb, void* user_data) 0x7e81f0d0, void* 0, cl_int* 0x7e81f0e0)
[VC4CL](         clinfo): API call: cl_int clGetDeviceIDs(cl_platform_id 0x6c7e14, cl_device_type 16, cl_uint 1, cl_device_id* 0x7e81f098, cl_uint* 0x7e81f09c)
[VC4CL](         clinfo): Error in '/home/pi/gpu/VC4CL/src/Device.cpp:491', returning status -1:No device for the given criteria: platform 0x6c7e14, type: 16!
[VC4CL](         clinfo): Error in '/home/pi/gpu/VC4CL/src/Context.cpp:264', returning status -1:Failed to get device ID!
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
[VC4CL](         clinfo): API call: cl_context clCreateContextFromType(const cl_context_properties* 0, cl_device_type 4294967295, void(CL_CALLBACK*)(const char* errinfo, const void* private_info, size_t cb, void* user_data) 0x7e81f0d0, void* 0, cl_int* 0x7e81f0e0)
[VC4CL](         clinfo): API call: cl_int clGetDeviceIDs(cl_platform_id 0x6c7e14, cl_device_type 4294967295, cl_uint 1, cl_device_id* 0x7e81f098, cl_uint* 0x7e81f09c)
[VC4CL](         clinfo): API call: cl_context clCreateContext(const cl_context_properties* 0, cl_uint 1, const cl_device_id* 0x7e81f098, void(CL_CALLBACK*)(const char* errinfo, const void* private_info, size_t cb, void* user_data) 0x7e81ef78, void* 0, cl_int* 0x7e81f0e0)
[VC4CL](         clinfo): Tracking live-time of object: cl_context
[VC4CL](         clinfo): API call: cl_int clGetContextInfo(cl_context 0x697144, cl_context_info 4225, size_t 0, void* 0, size_t* 0x7e81f170)
[VC4CL](         clinfo): API call: cl_int clGetContextInfo(cl_context 0x697144, cl_context_info 4225, size_t 32, void* 0x6c7e50, size_t* 0)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4145, size_t 4, void* 0x7e81f16c, size_t* 0)
[VC4CL](         clinfo): API call: cl_int clGetDeviceInfo(cl_device_id 0x6c7e28, cl_device_info 4139, size_t 867, void* 0x6cb16d, size_t* 0x7e81f14c)
[VC4CL](         clinfo): API call: cl_int clReleaseContext(cl_context 0x697144)
[VC4CL](         clinfo): Releasing live-time of object: cl_context
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 OpenCL for the Raspberry Pi VideoCore IV GPU
    Device Name                                   VideoCore IV GPU

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.11
  ICD loader Profile                              OpenCL 2.1

[VC4CL](         clinfo): Mailbox buffer before: 00000020 00000000 00030012 00000008 00000004 00000000 00000000 00000000
[VC4CL](         clinfo): Mailbox buffer after: 00000020 80000000 00030012 00000008 80000004 00000000 00000000 00000000
[VC4CL] Mailbox file descriptor closed: 3
@doe300
Copy link
Owner

doe300 commented Mar 27, 2019

Unless you explicitly disabled REGISTER_POKE_KERNEL, the "Execution: failed" error means that the kernel execution timed out.

This could be due to one of two problems:

  • The execution of a single kernel instance takes longer than the timeout (currently 30 seconds)
  • The execution somehow hangs the QPU or runs into an infinite loop (more likely)

Can you run the kernel in emulation mode and send the output (or at least the error)?
To do this, you will have to recompile VC4CL with the MOCK_HAL option enabled and execute the code as before. Instead of the GPU, this will execute the code in the built-in emulator which will take longer than the normal execution but should provide more detailed error information.

@kernhanda
Copy link
Contributor Author

Building VC4CL with cmake .. -DCMAKE_BUILD_TYPE=Debug -DBUILD_DEBUG=ON -DMOCK_HAL=ON and then running clinfo results in a segfault.

The crash is in src/extensions.cpp:

65| void* clGetExtensionFunctionAddress(const char* name)
66| {
67+>    return VC4CL_FUNC(clGetExtensionFunctionAddressForPlatform)(Platform::getVC4CLPlatform().toBase(), name);
68| }

I believe it's because VC4CL_FUNC(clGetExtensionFunctionAddressForPlatform) isn't found, so it just segfaults.

Let me know if I'm building VC4CL incorrectly.

@doe300
Copy link
Owner

doe300 commented Mar 28, 2019

When VC4CL is built in emulation mode (MOCK_HAL enabled), it is built without ICD loader support.

Thus, to run clinfo with the emulation version, you need to make sure, clinfo loads the libVC4CL.so as libOpenCL.so:

  1. Create a symlink ln --symbolic libVC4CL.so libOpenCL.so within the build/src directory
  2. Run clinfo with LD_PRELOAD=path/to/libOpenCL.so clinfo

Any program which relies on the ICD loader (any program linking in libOpenCL.so and not directly libVC4CL.so) will need to be started with the LD_PRELOAD environment variable.

@kernhanda
Copy link
Contributor Author

kernhanda commented Mar 28, 2019 via email

@kernhanda
Copy link
Contributor Author

Alright, with MOCK_HAL enabled, it seems that the kernel ran to almost the end, but then some unknown error happened.

Compared to running directly on the device, with emulation, all work groups ran.

Any other ideas about what could be going on here?

Snipped output below (removed the middle part that was repetitive, full output here)

pi@raspberrypi:~/gpu-on-pi/opencl_dot_product$ sudo LD_PRELOAD=$PWD/../VC4CL/build/src/libOpenCL.so ./dot_product
On the host, the dot product took 0.000456 seconds.
[VC4CL](    dot_product): API call: cl_int clGetPlatformIDs(cl_uint 1, cl_platform_id* 0x7ee5b9c0, cl_uint* 0)
[VC4CL](    dot_product): API call: cl_int clGetDeviceIDs(cl_platform_id 0xad15bc, cl_device_type 4, cl_uint 1, cl_device_id* 0x7ee5b9bc, cl_uint* 0)
[VC4CL](    dot_product): API call: cl_context clCreateContext(const cl_context_properties* 0, cl_uint 1, const cl_device_id* 0x7ee6e62c, void(CL_CALLBACK*)(const char* errinfo, const void* private_info, size_t cb, void* user_data) 0x7ee5b8e8, void* 0, cl_int* 0x7ee6e620)
[VC4CL](    dot_product): Tracking live-time of object: cl_context
[VC4CL](    dot_product): API call: cl_int clGetDeviceInfo(cl_device_id 0xad15cc, cl_device_info 4100, size_t 4, void* 0x7ee6e628, size_t* 0)
[VC4CL] base=0x1f6db000, mem=(nil)
[VC4CL] V3D base: 0xeef
[VC4CL](    dot_product): API call: cl_program clCreateProgramWithSource(cl_context 0xad1d4c, cl_uint 1, const char** 0x7ee5b9b8, const size_t* 0x7ee5b9b4, cl_int* 0x7ee5b9ac)
[VC4CL](    dot_product): Tracking live-time of object: cl_program
[VC4CL](    dot_product): API call: cl_int clBuildProgram(cl_program 0xad322c, cl_uint 0, const cl_device_id* 0, const char* (null), void(CL_CALLBACK*)(cl_program program, void* user_data) 0x7ee5b990, void* 0)
[VC4CL](    dot_product): Precompiling source with:
[VC4CL](    dot_product): Dumping program sources to /tmp/vc4cl-source-1863408870.cl
[VC4CL](    dot_product): Precompilation complete with status: 0
[VC4CL](    dot_product): Compilation log: [W] Thu Mar 28 20:50:32 2019: Warnings in precompilation:
[W] Thu Mar 28 20:50:32 2019: <stdin>:22:1: warning: null character ignored
<U+0000>
^
1 warning generated.

[W] Thu Mar 28 20:50:33 2019: Warnings in precompilation:
[W] Thu Mar 28 20:50:33 2019:


[VC4CL](    dot_product): Compiling source with:
[VC4CL](    dot_product): Compilation complete with status: 0
[VC4CL](    dot_product): Dumping program sources to /tmp/vc4cl-binary-1149089301.bin
[VC4CL](    dot_product): API call: cl_kernel clCreateKernel(cl_program 0xad322c, const char* "dot_product", cl_int* 0x7ee6e620)
[VC4CL](    dot_product): Tracking live-time of object: cl_kernel
[VC4CL](    dot_product): API call: cl_mem clCreateBuffer(cl_context 0xad1d4c, cl_mem_flags 36, size_t 38400, void* 0x7ee64ff8, cl_int* 0x7ee6e620)
[VC4CL] Mailbox file descriptor opened: 48879
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 00030012 00000008 00000004 00000001 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 00030012 00000008 00000004 00000000 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 00010006 00000008 00000000 00000000 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 00010006 00000008 00000000 00000000 00800000 00000000
[VC4CL](    dot_product): Mailbox request: succeeded
[VC4CL](    dot_product): Tracking live-time of object: cl_mem
[VC4CL](    dot_product): Mailbox buffer before: 00000024 00000000 0003000c 0000000c 0000000c 00009600 00001000 0000000c 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000024 80000000 0003000c 0000000c 00000004 00000001 00001000 0000000c 00000000
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 0003000d 00000008 00000004 00000001 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 0003000d 00000008 00000004 00000000 00000000 00000000
[VC4CL] base=0x0, mem=0xaef2b0
[VC4CL](    dot_product): Allocated 38400 bytes of buffer: handle 1, device address 0x0, host address 0xaef2b0
[VC4CL](    dot_product): API call: cl_mem clCreateBuffer(cl_context 0xad1d4c, cl_mem_flags 33, size_t 38400, void* 0x7ee5b9f8, cl_int* 0x7ee6e620)
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 00010006 00000008 00000000 00000000 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 00010006 00000008 00000000 00000000 00800000 00000000
[VC4CL](    dot_product): Mailbox request: succeeded
[VC4CL](    dot_product): Tracking live-time of object: cl_mem
[VC4CL](    dot_product): Mailbox buffer before: 00000024 00000000 0003000c 0000000c 0000000c 00009600 00001000 0000000c 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000024 80000000 0003000c 0000000c 00000004 00000002 00001000 0000000c 00000000
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 0003000d 00000008 00000004 00000002 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 0003000d 00000008 00000004 00800000 00000000 00000000
[VC4CL] base=0x800000, mem=0xb05b30
[VC4CL](    dot_product): Allocated 38400 bytes of buffer: handle 2, device address 0x800000, host address 0xb05b30
[VC4CL](    dot_product): API call: cl_mem clCreateBuffer(cl_context 0xad1d4c, cl_mem_flags 2, size_t 800, void* 0, cl_int* 0x7ee6e620)
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 00010006 00000008 00000000 00000000 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 00010006 00000008 00000000 00000000 00800000 00000000
[VC4CL](    dot_product): Mailbox request: succeeded
[VC4CL](    dot_product): Tracking live-time of object: cl_mem
[VC4CL](    dot_product): Mailbox buffer before: 00000024 00000000 0003000c 0000000c 0000000c 00000320 00001000 0000000c 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000024 80000000 0003000c 0000000c 00000004 00000003 00001000 0000000c 00000000
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 0003000d 00000008 00000004 00000003 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 0003000d 00000008 00000004 01000000 00000000 00000000
[VC4CL] base=0x1000000, mem=0xb346b8
[VC4CL](    dot_product): Allocated 800 bytes of buffer: handle 3, device address 0x1000000, host address 0xb346b8
[VC4CL](    dot_product): API call: cl_command_queue clCreateCommandQueue(cl_context 0xad1d4c, cl_device_id 0xad15cc, cl_command_queue_properties 0, cl_int* 0x7ee6e620)
[VC4CL](    dot_product): Starting queue handler thread...
[VC4CL](    dot_product): Tracking live-time of object: cl_command_queue
[VC4CL](    dot_product): API call: cl_int clSetKernelArg(cl_kernel 0xb34b8c, cl_uint 0, size_t 4, const void* 0x7ee5b9f4)
[VC4CL](    dot_product): Set kernel arg 0 for kernel 'dot_product' to 0x7ee5b9f4 (11352148) with size 4
[VC4CL](    dot_product): Kernel arg 0 for kernel 'dot_product' is float4* 'a_vec' with size 4
[VC4CL](    dot_product): Setting kernel-argument 0 to pointer 0x0xad3848
[VC4CL](    dot_product): API call: cl_int clSetKernelArg(cl_kernel 0xb34b8c, cl_uint 1, size_t 4, const void* 0x7ee5b9f0)
[VC4CL](    dot_product): Set kernel arg 1 for kernel 'dot_product' to 0x7ee5b9f0 (11353828) with size 4
[VC4CL](    dot_product): Kernel arg 1 for kernel 'dot_product' is float4* 'b_vec' with size 4
[VC4CL](    dot_product): Setting kernel-argument 1 to pointer 0x0xad3ed8
[VC4CL](    dot_product): API call: cl_int clSetKernelArg(cl_kernel 0xb34b8c, cl_uint 2, size_t 4, const void* 0x7ee5b9ec)
[VC4CL](    dot_product): Set kernel arg 2 for kernel 'dot_product' to 0x7ee5b9ec (11350412) with size 4
[VC4CL](    dot_product): Kernel arg 2 for kernel 'dot_product' is float* 'output' with size 4
[VC4CL](    dot_product): Setting kernel-argument 2 to pointer 0x0xad3180
[VC4CL](    dot_product): API call: cl_int clSetKernelArg(cl_kernel 0xb34b8c, cl_uint 3, size_t 192, const void* 0)
[VC4CL](    dot_product): Set kernel arg 3 for kernel 'dot_product' to 0 (0) with size 192
[VC4CL](    dot_product): Kernel arg 3 for kernel 'dot_product' is float4* 'partial_dot' with size 4
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 00010006 00000008 00000000 00000000 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 00010006 00000008 00000000 00000000 00800000 00000000
[VC4CL](    dot_product): Mailbox request: succeeded
[VC4CL](    dot_product): Setting kernel-argument 3 to pointer 0x0
[VC4CL](    dot_product): API call: cl_int clEnqueueNDRangeKernel(cl_command_queue 0xad1dc4, cl_kernel 0xb34b8c, cl_uint 1, const size_t* 0, const size_t* 0x7ee6e624, const size_t* 0x7ee6e628, cl_uint 0, const cl_event* 0, cl_event* 0x7ee6e60c)
[VC4CL](    dot_product): Mailbox buffer before: 00000024 00000000 0003000c 0000000c 0000000c 000000c0 00001000 0000000c 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000024 80000000 0003000c 0000000c 00000004 00000004 00001000 0000000c 00000000
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 0003000d 00000008 00000004 00000004 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 0003000d 00000008 00000004 01800000 00000000 00000000
[VC4CL] base=0x1800000, mem=0xb00e38
[VC4CL](    dot_product): Allocated 192 bytes of buffer: handle 4, device address 0x1800000, host address 0xb00e38
[VC4CL](    dot_product): Reserved 192 bytes of buffer for local/struct parameter: float4* partial_dot
[VC4CL](    dot_product): Tracking live-time of object: cl_event
[VC4CL](    dot_product): API call: cl_int clEnqueueReadBuffer(cl_command_queue 0xad1dc4, cl_mem 0xad318c, cl_bool 1, size_t 0, size_t 800, void* 0xad2a68, cl_uint 0, const cl_event* 0, cl_event* 0)
[VC4CL](VC4CL Queue Han): Running kernel 'dot_product' with 901 instructions...
[VC4CL](VC4CL Queue Han): Local sizes: 12 1 1 -> 12 QPUs
[VC4CL](VC4CL Queue Han): Global sizes: 2400 1 1 -> 200 work-groups (8 run at once)
[VC4CL](    dot_product): Tracking live-time of object: cl_event
[VC4CL](VC4CL Queue Han): Mailbox buffer before: 00000024 00000000 0003000c 0000000c 0000000c 00004000 00001000 0000000c 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer after: 00000024 80000000 0003000c 0000000c 00000004 00000005 00001000 0000000c 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer before: 00000020 00000000 0003000d 00000008 00000004 00000005 00000000 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer after: 00000020 80000000 0003000d 00000008 00000004 02000000 00000000 00000000
[VC4CL] base=0x2000000, mem=0x73074d20
[VC4CL](VC4CL Queue Han): Allocated 16384 bytes of buffer: handle 5, device address 0x2000000, host address 0x73074d20
[VC4CL](VC4CL Queue Han): Copied 0 bytes of global data to device buffer
[VC4CL](VC4CL Queue Han): Reserving space for 12 stack-frames of 0 bytes each
[VC4CL](VC4CL Queue Han): Copied 7208 bytes of kernel code to device buffer
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 0(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 0(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 12(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 1(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 24(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 2(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 36(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 3(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 48(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 4(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 60(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 5(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 72(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 6(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 84(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 7(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): 72 parameters set.
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 1(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 0(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 13(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 1(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 25(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 2(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 37(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 3(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 49(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 4(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 61(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 5(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 73(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 6(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 85(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 7(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): 72 parameters set.
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 0(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 14(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 1(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 26(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 2(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 38(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 3(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 50(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 4(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 62(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 5(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 74(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 6(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 86(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 7(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): 72 parameters set.
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 3(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 0(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 15(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 1(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 27(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 2(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 39(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 3(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 51(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 4(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 63(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 5(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 75(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 6(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 87(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 7(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): 72 parameters set.
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 4(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 0(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 16(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 1(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 28(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 2(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 40(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 3(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 52(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 4(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 64(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 5(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 76(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 6(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 88(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 7(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): 72 parameters set.
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 5(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 0(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 17(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 1(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 29(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 2(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 41(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 3(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 53(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 4(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 65(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 5(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 77(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 6(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 89(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 7(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): 72 parameters set.
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 6(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 0(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 18(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 1(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 30(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 2(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 42(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 3(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 54(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 4(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 66(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 5(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 78(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 6(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 90(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 7(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): 72 parameters set.
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 7(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 0(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 19(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 1(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 31(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 2(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 43(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 3(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 55(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 4(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 67(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 5(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 79(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 6(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 91(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 7(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): 72 parameters set.
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 8(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 0(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 20(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 1(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 32(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 2(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 44(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 3(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 56(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 4(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 68(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 5(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 80(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 6(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 92(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 7(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): 72 parameters set.
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 9(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 0(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 21(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 1(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 33(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 2(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 45(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 3(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 57(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 4(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 69(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 5(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 81(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 6(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 93(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 7(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): 72 parameters set.
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 10(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 0(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 22(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 1(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 34(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 2(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 46(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 3(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 58(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 4(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 70(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 5(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 82(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 6(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 94(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 7(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): 72 parameters set.
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 11(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 0(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 23(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 1(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 35(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 2(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 47(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 3(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 59(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 4(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 71(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 5(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 83(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 6(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 95(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 7(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 4 to buffer 0
[VC4CL](VC4CL Queue Han): Setting parameter 5 to buffer 8388608
[VC4CL](VC4CL Queue Han): Setting parameter 6 to buffer 16777216
[VC4CL](VC4CL Queue Han): Setting parameter 7 to temporary buffer 25165824
[VC4CL](VC4CL Queue Han): 72 parameters set.
[VC4CL](VC4CL Queue Han): Running work-group 0, 0, 0
[VC4CL](VC4CL Queue Han): Mailbox buffer before: 00000028 00000000 00030011 00000010 00000010 0000000c 020029a8 00000000 00007530 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer after: 00000028 80000000 00030011 00000010 00000004 00000000 020029a8 00000000 00007530 00000000
[VC4CL](VC4CL Queue Han): Execution: successful
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 96(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 8(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 108(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 9(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 120(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 10(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 132(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 11(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 144(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 12(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 156(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 13(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 168(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 14(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 180(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 15(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 97(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 8(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 109(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 9(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 121(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 10(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 133(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 11(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 145(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 12(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 157(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 13(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 169(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 14(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 181(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 15(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 98(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 8(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 110(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 9(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 122(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 10(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 134(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 11(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 146(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 12(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 158(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 13(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 170(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 14(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 182(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 15(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 99(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 8(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 111(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 9(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 123(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 10(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 135(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 11(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 147(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 12(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 159(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 13(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 171(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 14(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 183(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 15(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 100(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 8(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 112(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 9(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 124(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 10(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 136(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 11(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 148(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 12(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 160(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 13(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 172(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 14(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 184(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 15(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 101(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 8(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 113(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 9(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 125(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 10(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 137(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 11(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 149(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 12(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 161(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 13(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 173(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 14(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 185(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 15(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 102(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 8(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 114(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 9(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 126(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 10(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 138(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 11(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 150(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 12(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 162(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 13(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 174(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 14(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 186(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 15(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 103(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 8(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 115(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 9(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 127(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 10(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 139(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 11(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 151(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 12(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 163(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 13(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 175(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 14(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 187(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 15(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 104(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 8(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 116(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 9(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 128(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 10(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 140(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 11(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 152(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 12(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 164(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 13(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 176(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 14(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 188(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 15(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 105(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 8(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 117(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 9(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 129(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 10(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 141(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 11(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 153(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 12(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 165(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 13(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 177(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 14(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 189(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 15(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 106(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 8(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 118(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 9(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 130(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 10(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 142(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 11(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 154(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 12(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 166(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 13(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 178(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 14(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 190(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 15(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 107(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 8(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 119(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 9(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 131(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 10(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 143(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 11(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 155(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 12(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 167(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 13(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 179(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 14(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 191(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 15(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Running work-group 8, 0, 0
[VC4CL](VC4CL Queue Han): Mailbox buffer before: 00000028 00000000 00030011 00000010 00000010 0000000c 020029a8 00000001 00007530 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer after: 00000028 80000000 00030011 00000010 00000004 00000000 020029a8 00000001 00007530 00000000
[VC4CL](VC4CL Queue Han): Execution: successful
<snip/>
[VC4CL](VC4CL Queue Han): Running work-group 184, 0, 0
[VC4CL](VC4CL Queue Han): Mailbox buffer before: 00000028 00000000 00030011 00000010 00000010 0000000c 020029a8 00000001 00007530 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer after: 00000028 80000000 00030011 00000010 00000004 00000000 020029a8 00000001 00007530 00000000
[VC4CL](VC4CL Queue Han): Execution: successful
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2304(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 192(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2316(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 193(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2328(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 194(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2340(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 195(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2352(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 196(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2364(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 197(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2376(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 198(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2388(2400), 0(1), 0(1)
        Local IDs (sizes): 0(12), 0(1), 0(1)
        Group IDs (sizes): 199(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2305(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 192(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2317(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 193(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2329(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 194(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2341(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 195(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2353(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 196(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2365(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 197(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2377(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 198(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2389(2400), 0(1), 0(1)
        Local IDs (sizes): 1(12), 0(1), 0(1)
        Group IDs (sizes): 199(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2306(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 192(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2318(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 193(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2330(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 194(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2342(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 195(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2354(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 196(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2366(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 197(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2378(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 198(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2390(2400), 0(1), 0(1)
        Local IDs (sizes): 2(12), 0(1), 0(1)
        Group IDs (sizes): 199(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2307(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 192(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2319(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 193(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2331(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 194(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2343(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 195(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2355(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 196(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2367(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 197(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2379(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 198(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2391(2400), 0(1), 0(1)
        Local IDs (sizes): 3(12), 0(1), 0(1)
        Group IDs (sizes): 199(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2308(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 192(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2320(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 193(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2332(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 194(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2344(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 195(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2356(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 196(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2368(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 197(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2380(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 198(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2392(2400), 0(1), 0(1)
        Local IDs (sizes): 4(12), 0(1), 0(1)
        Group IDs (sizes): 199(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2309(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 192(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2321(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 193(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2333(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 194(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2345(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 195(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2357(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 196(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2369(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 197(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2381(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 198(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2393(2400), 0(1), 0(1)
        Local IDs (sizes): 5(12), 0(1), 0(1)
        Group IDs (sizes): 199(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2310(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 192(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2322(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 193(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2334(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 194(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2346(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 195(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2358(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 196(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2370(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 197(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2382(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 198(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2394(2400), 0(1), 0(1)
        Local IDs (sizes): 6(12), 0(1), 0(1)
        Group IDs (sizes): 199(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2311(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 192(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2323(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 193(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2335(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 194(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2347(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 195(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2359(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 196(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2371(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 197(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2383(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 198(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2395(2400), 0(1), 0(1)
        Local IDs (sizes): 7(12), 0(1), 0(1)
        Group IDs (sizes): 199(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2312(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 192(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2324(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 193(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2336(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 194(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2348(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 195(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2360(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 196(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2372(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 197(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2384(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 198(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2396(2400), 0(1), 0(1)
        Local IDs (sizes): 8(12), 0(1), 0(1)
        Group IDs (sizes): 199(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2313(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 192(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2325(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 193(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2337(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 194(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2349(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 195(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2361(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 196(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2373(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 197(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2385(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 198(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2397(2400), 0(1), 0(1)
        Local IDs (sizes): 9(12), 0(1), 0(1)
        Group IDs (sizes): 199(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2314(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 192(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2326(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 193(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2338(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 194(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2350(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 195(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2362(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 196(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2374(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 197(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2386(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 198(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2398(2400), 0(1), 0(1)
        Local IDs (sizes): 10(12), 0(1), 0(1)
        Group IDs (sizes): 199(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2315(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 192(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2327(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 193(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2339(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 194(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2351(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 195(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2363(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 196(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2375(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 197(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2387(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 198(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2399(2400), 0(1), 0(1)
        Local IDs (sizes): 11(12), 0(1), 0(1)
        Group IDs (sizes): 199(200), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Running work-group 192, 0, 0
[VC4CL](VC4CL Queue Han): Mailbox buffer before: 00000028 00000000 00030011 00000010 00000010 0000000c 020029a8 00000001 00007530 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer after: 00000028 80000000 00030011 00000010 00000004 00000000 020029a8 00000001 00007530 00000000
[VC4CL](VC4CL Queue Han): Execution: successful
[VC4CL](VC4CL Queue Han): Mailbox buffer before: 00000020 00000000 0003000e 00000008 00000004 00000004 00000000 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer after: 00000020 80000000 0003000e 00000008 00000004 00000000 00000000 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer before: 00000020 00000000 0003000f 00000008 00000004 00000004 00000000 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer after: 00000020 80000000 0003000f 00000008 00000004 00000000 00000000 00000000
[VC4CL](VC4CL Queue Han): Deallocated 192 bytes of buffer: handle 4, device address 0x1800000, host address 0xb00e38
[VC4CL](VC4CL Queue Han): Mailbox buffer before: 00000020 00000000 0003000e 00000008 00000004 00000005 00000000 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer after: 00000020 80000000 0003000e 00000008 00000004 00000000 00000000 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer before: 00000020 00000000 0003000f 00000008 00000004 00000005 00000000 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer after: 00000020 80000000 0003000f 00000008 00000004 00000000 00000000 00000000
[VC4CL](VC4CL Queue Han): Deallocated 16384 bytes of buffer: handle 5, device address 0x2000000, host address 0x73074d20
[VC4CL](    dot_product): Releasing live-time of object: cl_event
[VC4CL](    dot_product): API call: cl_int clGetEventProfilingInfo(cl_event 0xafb26c, cl_profiling_info 4738, size_t 8, void* 0x7ee6e600, size_t* 0)
[VC4CL](    dot_product): API call: cl_int clGetEventProfilingInfo(cl_event 0xafb26c, cl_profiling_info 4739, size_t 8, void* 0x7ee6e5f8, size_t* 0)
On the device, the dot product kernel completed in 36 ns.
Dot product failed.
[VC4CL](    dot_product): API call: cl_int clReleaseMemObject(cl_mem 0xad3854)
[VC4CL](    dot_product): Releasing live-time of object: cl_mem
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 0003000e 00000008 00000004 00000001 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 0003000e 00000008 00000004 00000000 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 0003000f 00000008 00000004 00000001 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 0003000f 00000008 00000004 00000000 00000000 00000000
[VC4CL](    dot_product): Deallocated 38400 bytes of buffer: handle 1, device address 0x0, host address 0xaef2b0
[VC4CL](    dot_product): API call: cl_int clReleaseMemObject(cl_mem 0xad3ee4)
[VC4CL](    dot_product): Releasing live-time of object: cl_mem
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 0003000e 00000008 00000004 00000002 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 0003000e 00000008 00000004 00000000 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 0003000f 00000008 00000004 00000002 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 0003000f 00000008 00000004 00000000 00000000 00000000
[VC4CL](    dot_product): Deallocated 38400 bytes of buffer: handle 2, device address 0x800000, host address 0xb05b30
[VC4CL](    dot_product): API call: cl_int clReleaseMemObject(cl_mem 0xad318c)
[VC4CL](    dot_product): Releasing live-time of object: cl_mem
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 0003000e 00000008 00000004 00000003 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 0003000e 00000008 00000004 00000000 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer before: 00000020 00000000 0003000f 00000008 00000004 00000003 00000000 00000000
[VC4CL](    dot_product): Mailbox buffer after: 00000020 80000000 0003000f 00000008 00000004 00000000 00000000 00000000
[VC4CL](    dot_product): Deallocated 800 bytes of buffer: handle 3, device address 0x1000000, host address 0xb346b8
[VC4CL](    dot_product): API call: cl_int clReleaseKernel(cl_kernel 0xb34b8c)
[VC4CL](    dot_product): API call: cl_int clReleaseCommandQueue(cl_command_queue 0xad1dc4)
[VC4CL](    dot_product): API call: cl_int clReleaseProgram(cl_program 0xad322c)
[VC4CL](    dot_product): API call: cl_int clReleaseContext(cl_context 0xad1d4c)
terminate called without an active exception
Aborted

@doe300
Copy link
Owner

doe300 commented Mar 29, 2019

I tried it out myself too yesterday and it took a while but finished. Which is bad in this case, since it doesn't help to find the problem.

So from the Dot product failed. line, it looks like the result of the execution is wrong. I will look into this, but it still does not explain why the original execution failed to execute at all.

The terminate called without an active exception and Aborted occurs if a VC4CL internal thread had no time to shut down before the program exits, but this has no effect on the result. I will also try to fix this one.

One thing you could check:
On the Raspberry Pi (with native execution), does the time the execution takes before failing correspond to the timeout? E.g. if you increase/decrease the timeout in https://github.com/doe300/VC4CL/blob/master/src/executor.cpp#L28, does the actual time taken also increase/decrease?

@kernhanda
Copy link
Contributor Author

Your suspicions seem to be correct.

Without any changes:

pi@raspberrypi:~/gpu-on-pi/opencl_dot_product$ time sudo ./dot_product
<snip/>

real    0m34.003s
user    0m33.750s
sys     0m0.140s

Changing the 30s timeout to 60s:

pi@raspberrypi:~/gpu-on-pi/opencl_dot_product$ time sudo ./dot_product
<snip/>

real    1m3.891s
user    1m3.760s
sys     0m0.110s

Changing the 60s timeout to 120s:

pi@raspberrypi:~/gpu-on-pi/opencl_dot_product$ time sudo ./dot_product
<snip/>

real    2m3.949s
user    2m3.720s
sys     0m0.130s

@kernhanda
Copy link
Contributor Author

Ping. :) Any luck?

@doe300
Copy link
Owner

doe300 commented Apr 3, 2019

Sorry, didn't have time to look into that.
Since the error is not reproducible on the emulator, I will have to take a look at the generated source code, but I'm not sure when I will have time to do that.

@doe300 doe300 added the bug label Apr 3, 2019
@doe300
Copy link
Owner

doe300 commented May 4, 2019

So it looks like for a work-group size of 12 (the maximum), the execution hangs, probably somewhere in barrier(...). For a work-group-size of less than 12, it at least passes, though the result is wrong on the Raspberry Pi, but correct on the emulator. This looks like there is a dead-lock/infinite stall somewhere in barrier(...) which the emulator does not catch...

A side note: clGetEventProfilingInfo can only be queried if the command-queue is created with CL_QUEUE_PROFILING_ENABLE.
Also you need to call clReleaseEvent(prof_event) on cleanup, this also removes the termination and abortion issue.

doe300 added a commit to doe300/VC4C that referenced this issue May 4, 2019
* Improves a few optimizations
* Adapts emulator to semaphore changes

See doe300/VC4CL#67

Fixes:
boost_compute/test_reduce_by_key
OpenCL-CTS/basic/kernel_limit_constants
@doe300
Copy link
Owner

doe300 commented May 4, 2019

So, turns out I did semaphore access wrong, that is why the execution timed out.
Now the execution passes, but the result seems to still be wrong.

@rNoz
Copy link

rNoz commented Dec 21, 2019

Hi.
First, thank you for you effort with this project. It is a massive complex work.

I tried to reproduce this bug, and I find that VC4CL in my Raspberry Pi 3 B+, has some float point precision errors.

I did 2 tests: dot_product (the same as this issue) and a basic saxpy program ( dest[i] = src[i] * 3.14 ). In both cases I get different values than in the host:

[0] Host: 3313.5571 Device: 3313.5571
[1] Host: 6579.5039 Device: 6579.5034
[FAILURE] at index 1: 6579.5039 != 6579.5034
[2] Host: 6708.3955 Device: 6708.3950
[FAILURE] at index 2: 6708.3955 != 6708.3950
(as you can see, in many indexes but not all)

The dot product has problems with local memory and computation. I don't know why. I checked your other issues but I didn't find a workaround. For example, if I do a simple operation, the final debug values are just wrong:

    __local float4 partial_dot[12]; // instead of __local in parameters
   if (BUG){
      partial_dot[lid] = a_vec[gid] + (float4)1.0f;
   }else{
      partial_dot[lid] = a_vec[gid];
   }
   output_debug[lid] = partial_dot[lid];

Without BUG:
[4] 40.00000
[5] 50.00000
[6] 60.00000
[7] 70.00000

With BUG (the same with multiplication, etc):
[4] -360777220096.00000
[5] -360777220096.00000
[6] -360777220096.00000
[7] -360777220096.00000

Results:
571.67017 (check: 19.49584)
683.61450 (check: 16.47419)
781.39270 (check: 16.03123)
661.78967 (check: 13.76786)
Total: 2698.46704 (check: 68.02542)
Dot product failed.

What can be wrong?

@doe300
Copy link
Owner

doe300 commented Dec 22, 2019

The difference between 6579.5039 and 6579.5034 is 1 ULP, same for 6708.3955 and 6708.3950. The problem here is probably, how the constant 3.14 gets stored (3.140000 vs. 3.1399998). Can you given me the full kernel code you used?

For the second part:
With -DBUG=0, the code is compiled to a simple memory copy and with -DBUG=1 it is compiled to read-add-write. So yes, there seems to be something wrong with the lowered __local buffer.

@rNoz
Copy link

rNoz commented Dec 23, 2019

I have created this repo: https://github.com/rNoz/opencl_embedded_tests

Precision

See the differences when using FACTOR=3.1415 or FACTOR=2.0.

It is the expected behavior? It is the first time I read about ULP. I checked that OpenCL is less restrictive (less precision) than CUDA, and I assume in Embedded Profile it is even further. Any advice regarding doing "unit test"/bench check for embedded?
I was surprise the dot_product OpenCL repo has a final if (abs(opencl_value - host_value) > 10.0)) failure(), because 10.0 seems quite a lot of error for that calc.

$ FACTOR=3.1415 VECTOR=12 CHECK=1 sudo -E ./build/saxpy saxpy.cl
vector_len: 12
check results: true
factor: 3.141500
using platform.device: 0.0
saxpy.cl
operation: saxpy
=== 1 OpenCL platform(s) found: ===
  -- 0 --
  PROFILE = EMBEDDED_PROFILE
  VERSION = OpenCL 1.2 VC4CL 0.4.9999
  NAME = OpenCL for the Raspberry Pi VideoCore IV GPU
  VENDOR = doe300
  EXTENSIONS = cl_khr_il_program cl_khr_spir cl_khr_create_command_queue cl_altera_device_temperature cl_altera_live_object_tracking cl_khr_icd cl_vc4cl_performance_counters
=== 2 OpenCL device(s) found on platform:
  -- 0 --
  DEVICE_NAME = VideoCore IV GPU
  DEVICE_VENDOR = Broadcom
  DEVICE_VERSION = OpenCL 1.2 VC4CL 0.4.9999
  DRIVER_VERSION = 0.4.9999
  DEVICE_MAX_COMPUTE_UNITS = 1
  DEVICE_MAX_CLOCK_FREQUENCY = 300
  DEVICE_GLOBAL_MEM_SIZE = 67108864
  DEVICE_MAX_WG_SIZE X=12,Y=12,Z=12
Creating context...
Creating command queue...
Creating program...
Building program from source...
attempting to create input buffer
attempting to create output buffer
attempting to create kernel
setting up kernel args cl_mem: 0x7f4800bc38
attempting to enqueue write buffer
attempting to enqueue kernel
Enqueue'd kerenel
time(ns):590937
Result:
[0] Host: 263.944977  Device: 263.944977
[1] Host: 123.895401  Device: 123.895393
[FAILURE] at index 1:  123.895401 != 123.895393
[2] Host: 246.010620  Device: 246.010605
[FAILURE] at index 2:  246.010620 != 246.010605
[FAILURE] at index 4:  286.394043 != 286.394012
[FAILURE] at index 6:  105.310226 != 105.310219
[FAILURE] at index 9:  174.029678 != 174.029663

computed 12 elements

$ FACTOR=2.0 VECTOR=12 CHECK=1 sudo -E ./build/saxpy saxpy.cl
vector_len: 12
check results: true
factor: 2.000000
using platform.device: 0.0
saxpy.cl
operation: saxpy
=== 1 OpenCL platform(s) found: ===
  -- 0 --
  PROFILE = EMBEDDED_PROFILE
  VERSION = OpenCL 1.2 VC4CL 0.4.9999
  NAME = OpenCL for the Raspberry Pi VideoCore IV GPU
  VENDOR = doe300
  EXTENSIONS = cl_khr_il_program cl_khr_spir cl_khr_create_command_queue cl_altera_device_temperature cl_altera_live_object_tracking cl_khr_icd cl_vc4cl_performance_counters
=== 2 OpenCL device(s) found on platform:
  -- 0 --
  DEVICE_NAME = VideoCore IV GPU
  DEVICE_VENDOR = Broadcom
  DEVICE_VERSION = OpenCL 1.2 VC4CL 0.4.9999
  DRIVER_VERSION = 0.4.9999
  DEVICE_MAX_COMPUTE_UNITS = 1
  DEVICE_MAX_CLOCK_FREQUENCY = 300
  DEVICE_GLOBAL_MEM_SIZE = 67108864
  DEVICE_MAX_WG_SIZE X=12,Y=12,Z=12
Creating context...
Creating command queue...
Creating program...
Building program from source...
attempting to create input buffer
attempting to create output buffer
attempting to create kernel
setting up kernel args cl_mem: 0x7f5400bdc8
attempting to enqueue write buffer
attempting to enqueue kernel
Enqueue'd kerenel
time(ns):592813
Result:
[0] Host: 168.037552  Device: 168.037552
[1] Host: 78.876587  Device: 78.876587
[2] Host: 156.619843  Device: 156.619843

computed 12 elements

Non deterministic behavior?

I ran:

$ CHECK=1 VECTOR=12 sudo -E ./build/vectors vecmul.cl
vector: 12
check results: true
using platform.device: 0.0
operation: vecmul
max wg size: 12
[0] OpenCL (2.00000) Host (2.00000)
[1] OpenCL (8.00000) Host (8.00000)
[2] OpenCL (18.00000) Host (18.00000)
[3] OpenCL (32.00000) Host (32.00000)
[8] OpenCL (162.00000) Host (162.00000)
[9] OpenCL (200.00000) Host (200.00000)
[10] OpenCL (242.00000) Host (242.00000)
[11] OpenCL (288.00000) Host (288.00000)
Everything seems to work fine!

$ CHECK=1 VECTOR=1024 sudo -E ./build/vectors vecmul.cl
...
Everything seems to work fine!

$ CHECK=1 VECTOR=102400 sudo -E ./build/vectors vecmul.cl
... 
[FAILURE] [102371] OpenCL (20960051200.00000) Host (20960053248.00000)
[FAILURE] [102372] OpenCL (20960460800.00000) Host (20960462848.00000)
[FAILURE] [102373] OpenCL (20960870400.00000) Host (20960872448.00000)
[FAILURE] [102374] OpenCL (20961280000.00000) Host (20961282048.00000)
[FAILURE] [102375] OpenCL (20961689600.00000) Host (20961691648.00000)
[FAILURE] [102376] OpenCL (20962099200.00000) Host (20962101248.00000)
[102396] OpenCL (20970291200.00000) Host (20970291200.00000)
[102397] OpenCL (20970700800.00000) Host (20970700800.00000)
[102398] OpenCL (20971110400.00000) Host (20971110400.00000)
[102399] OpenCL (20971520000.00000) Host (20971520000.00000)

And then, since this execution with all these failures, even the 1024 fails now:

$ CHECK=1 VECTOR=1024 sudo -E ./build/vectors vecmul.cl
vector: 1024
check results: true
using platform.device: 0.0
operation: vecmul
max wg size: 12
[0] OpenCL (2.00000) Host (2.00000)
[1] OpenCL (8.00000) Host (8.00000)
[2] OpenCL (18.00000) Host (18.00000)
[3] OpenCL (32.00000) Host (32.00000)
[FAILURE] [16] OpenCL (3447362.00000) Host (578.00000)
[FAILURE] [17] OpenCL (3650184.00000) Host (648.00000)
[FAILURE] [18] OpenCL (3853010.00000) Host (722.00000)
[FAILURE] [19] OpenCL (4055840.00000) Host (800.00000)
[FAILURE] [20] OpenCL (4258674.00000) Host (882.00000)
[FAILURE] [21] OpenCL (4461512.00000) Host (968.00000)
[FAILURE] [22] OpenCL (4664354.00000) Host (1058.00000)
[FAILURE] [23] OpenCL (4867200.00000) Host (1152.00000)
[FAILURE] [24] OpenCL (5070050.00000) Host (1250.00000)
...
[FAILURE] [1020] OpenCL (209094672.00000) Host (2084882.00000)
[1021] OpenCL (209301504.00000) Host (2088968.00000)
[FAILURE] [1021] OpenCL (209301504.00000) Host (2088968.00000)
[1022] OpenCL (209508352.00000) Host (2093058.00000)
[FAILURE] [1022] OpenCL (209508352.00000) Host (2093058.00000)
[1023] OpenCL (209715200.00000) Host (2097152.00000)
[FAILURE] [1023] OpenCL (209715200.00000) Host (2097152.00000)

# with 128, to see that not always starts failing in 16:
$ CHECK=1 VECTOR=128 sudo -E ./build/vectors vecadd.cl
vector: 128
check results: true
using platform.device: 0.0
operation: vecadd
max wg size: 12
[0] OpenCL (3.00000) Host (3.00000)
[1] OpenCL (6.00000) Host (6.00000)
[2] OpenCL (9.00000) Host (9.00000)
[3] OpenCL (12.00000) Host (12.00000)
[FAILURE] [28] OpenCL (58.00000) Host (87.00000)
[FAILURE] [29] OpenCL (60.00000) Host (90.00000)
[FAILURE] [30] OpenCL (62.00000) Host (93.00000)
[FAILURE] [31] OpenCL (64.00000) Host (96.00000)
[124] OpenCL (375.00000) Host (375.00000)
[125] OpenCL (378.00000) Host (378.00000)
[126] OpenCL (381.00000) Host (381.00000)
[127] OpenCL (384.00000) Host (384.00000)

But small tests like 12 or 24 works:

 ❯ CHECK=1 VECTOR=24 sudo -E ./build/vectors vecmul.cl
vector: 24
check results: true
using platform.device: 0.0
operation: vecmul
max wg size: 12
[0] OpenCL (2.00000) Host (2.00000)
[1] OpenCL (8.00000) Host (8.00000)
[2] OpenCL (18.00000) Host (18.00000)
[3] OpenCL (32.00000) Host (32.00000)
[20] OpenCL (882.00000) Host (882.00000)
[21] OpenCL (968.00000) Host (968.00000)
[22] OpenCL (1058.00000) Host (1058.00000)
[23] OpenCL (1152.00000) Host (1152.00000)
Everything seems to work fine!

It is interesting (but chaotic):

  • Two executions doesn't produce the same results (previously was Ok, now it Fails, but eventually it will work in some execution - eg. tested with random executions of 12,24,128,1024)
  • It fails at a random index
  • If the length is small, it continues working.

Maybe you know what is going on here.

Local memory

I understood that is a bug and you will work on it. If you need any more test from my side, please, let me know. Also, I don't know if the kernel/device/building process would affect.

Thank you.

@doe300
Copy link
Owner

doe300 commented Dec 23, 2019

Thanks for investing time to better analyse this error.

See the differences when using FACTOR=3.1415 or FACTOR=2.0.

The difference here is that 2.0 can be exactly represented as float, while 3.1415 cannot. So if or 3.1415 the VC4C compiler chooses the wrong of the two adjacent actually representable values, the result will be off by this one ULP (which then can expand to a larger error depending on the calculations done on the value).

I don't know what is going wrong with the indeterministic results, I have to look further into that.

@doe300
Copy link
Owner

doe300 commented Dec 23, 2019

So I can partially reproduce the behavior you are seeing:
Up to 10240, everything is okay, for 102400 I get a lot of errors.
When I get back to <=10240 samples, the successful results are reported again, so I don't get you indeterministic behavior.
The pattern of the wrong values also seems to suggest that there is a off-by-one-ULP error somewhere.

@rNoz
Copy link

rNoz commented Dec 23, 2019

What can I do to help detecting the issue? Maybe executing low-level tests regarding memory? float ops?
Do you think the distro (Manjaro Arm), kernel version (latest 4.19), device (Rpi 3B+) or anything in the building process (VC4CL opts) would affect?
If you provide me more advices or guidelines I can do further tests.

@doe300
Copy link
Owner

doe300 commented Dec 23, 2019

So for the float multiplication, I did some testing and it looks like the fmul instruction does different rounding than the host CPU. I will have to try to find out which rounding is done by the CPU/GPU, which rounding is allowed by OpenCL and how to round correctly (if we have to change something).

No, I don't think the build process has anything to do with that.

About the memory issue: Did you run the code with the latest VC4C/VC4CL version? Or which version are you using?

@doe300
Copy link
Owner

doe300 commented Dec 24, 2019

So it very much looks like the CPU uses the "normal" IEEE 754 round-to-nearest-even rounding mode, which is also the default rounding modes for the OpenCL 1.2 full profile.
The fmul operation seems to use the IEEE 754 round-to-zero rounding mode, which is why for some calculations, the absolute value of the result for the GPU is 1 ULP less than the CPU.
According to the OpenCL 1.2 standard, for the embedded profile (which is the one supported by the VC4CL implementation) an implementation is allowed to either use the round-to-nearest-even or round-to-zero rounding modes, so the floating point multiplication is correct according to the specification.

So to test for correctness, instead of the equality comparison a comparison with 1 ULP of allowed error has to be used.

@rNoz
Copy link

rNoz commented Dec 24, 2019

Thank you for your time.

I contribute with further tests:

  • Rpi3B: Raspbian Stretch fully updated, VC4CL nightly build installed (Wiki, deb packages from circleci).
  • Rpi3B+: Manjaro Arm fully updated, VC4CL from git, built and installed with -DCMAKE_BUILD_TYPE=Release but also with -DCMAKE_BUILD_TYPE=Debug -DBUILD_DEBUG=On -DBUILD_TESTING=On -DCMAKE_VERBOSE_MAKEFILE=On to dump logs.

They differ partially. Important note: I started writting this issue 6h ago, but I wanted to be completely sure everything I wrote is correct, so, I re-run again the experiments.... oh my bad. Now, they differ, just like yesterday with the non deterministic behavior. Finally, I have decided to skip the inconsistent results from this morning and I will explore further.

The main thing was that this morning I saw how it gave failures at the 11584 index, but I cannot reproduce anymore. I write the two sections with the only relevant (bus error, ULP):

6h ago experiments

Rpi3B:

$ CHECK=1 VECTOR=$(( 10240 * 1 + 512 * 2 + 128 * 2 + 64 + 2 )) sudo -E ./build/vectors vecmul.cl
vector: 11586
...
[11584] OpenCL (268424448.00000) Host (268424448.00000)
[11585] OpenCL (268470784.00000) Host (268470784.00000)
Everything seems to work fine!

$ CHECK=1 VECTOR=$(( 10240 * 1 + 512 * 2 + 128 * 2 + 64 + 3 )) sudo -E ./build/vectors vecmul.cl
...
[11585] OpenCL (268470784.00000) Host (268470784.00000)
[11586] OpenCL (268517120.00000) Host (268517152.00000)
[FAILURE] [11586] OpenCL (268517120.00000) Host (268517152.00000)

$ CHECK=1 VECTOR=$(( 10240 * 1 + 512 * 2 + 128 * 2 + 64 + 4 )) sudo -E ./build/vectors vecmul.cl
vector: 11588
...
[11586] OpenCL (268517120.00000) Host (268517152.00000)
[FAILURE] [11586] OpenCL (268517120.00000) Host (268517152.00000)
[11587] OpenCL (268563488.00000) Host (268563488.00000)

Rpi3B+:

$ CHECK=1 VECTOR=$(( 10240 * 1 + 512 * 2 + 128 * 2 + 64 )) sudo -E ./build/vectors vecmul.cl
vector: 11584
...
[11582] OpenCL (268331776.00000) Host (268331776.00000)
[11583] OpenCL (268378112.00000) Host (268378112.00000)
Everything seems to work fine!

$ CHECK=1 VECTOR=$(( 10240 * 1 + 512 * 2 + 128 * 2 + 64 + 1)) sudo -E ./build/vectors vecmul.cl
vector: 11585
check results: true
using platform.device: 0.0
operation: vecmul
max wg size: 12
[1]    1802 bus error  CHECK=1 VECTOR=$(( 10240 * 1 + 512 * 2 + 128 * 2 + 64 + 1)) sudo -E  vecmul.c

$ CHECK=1 VECTOR=$(( 10240 * 1 + 512 * 2 + 128 * 2 + 64 + 2 )) sudo -E ./build/vectors vecmul.cl
vector: 11586
...
[11584] OpenCL (268424448.00000) Host (268424448.00000)
[11585] OpenCL (268470784.00000) Host (268470784.00000)
Everything seems to work fine!

$ CHECK=1 VECTOR=$(( 10240 * 1 + 512 * 2 + 128 * 2 + 64 + 4 )) sudo -E ./build/vectors vecmul.cl
vector: 11588
...
[11585] OpenCL (268470784.00000) Host (268470784.00000)
[11586] OpenCL (268517120.00000) Host (268517152.00000)
[FAILURE] [11586] OpenCL (268517120.00000) Host (268517152.00000)
[11587] OpenCL (268563488.00000) Host (268563488.00000)

It starts failing in the 11586:

A[11586] = 11586+1;
B[11586] = (11586+1) * 2;
C[11586] = A[11586] * B[11586];
C[11586] = 11587 * 23174 = 268517138 (host, if using double)
C[11586] = 11587 * 23174 = 268517152 (host, if using float)

Note that sometimes I get "bus error", maybe some problem when I use odd number of elements (alignment?), but it only affects the Rpi3B+, not the Rpi3B (it finishes).

Also, we are quite far from the memory limits because 11588 elements * 4 bytes/element * 3 buffers / 1024B/KiB = 135 KiB and clinfo says 64MiB is the max memory allocation.

All the failures may? be related with the floating point precision, but I would like to know how I can do the C test for this kernel. I was doing some tests using nextafterf/nexttowardf, but I didn't get it completely. Also, I found in your Wiki https://github.com/doe300/VC4CL/wiki/NumericalCompliance

You refer to fmul but in the Built-in Functions it does not appear. Maybe I miss something.

So, if I have the first two operations performed in the host (assigning values to the array) and the last one performed in the device (get, multiply, assign), what can I expect regarding the 1ULP?

#include <math.h>
#include <stdio.h>

void main(){
  float f0, f1, f;
  long double d0, d1;

  float floats[] = {
    11587.0f,
    23174.0f,
    268540312.0f,
  };

  int nfloats = sizeof(floats)/sizeof(float);
  for (int nfloat=0; nfloat<nfloats; ++nfloat){
    float f0 = floats[nfloat];
    int times = 3;
    float fto = f0 * 1.10;
    float fton = f0 - f0 * 0.10;
    float fprev = nextafterf(f0, fton); // get the previous
    float ffrom = fprev;
    printf("Finding above %10.10f (starting with %10.10f)\n", f0, ffrom);
    for (int i=0; i<times; ++i){
      f = nextafterf(ffrom, fto);
      printf("[%d] %12s f0 %10.10f f1 %10.10f => %f\n", i, "nextafterf", ffrom, fto, f);
      ffrom = f;
      fto = f * 1.10;
    }
  }

  // ints to float
  for (int i=0; i<200000; ++i){
    float ffrom = (float)i;
    if ((float)(int)ffrom != ffrom){
      printf("not representable %10.10f (%d)\n", ffrom, i);
      break;
    }
  }

  float a = 11587.0;
  float b = (11586+1) * 2.0f;
  printf("a %10.10f\n", a);
  printf("b %10.10f\n", b);
  float c = a * b;
  printf("c = a * b = %10.10f\n", c);
  printf("double c  = %10.10f\n", ((double)(11586 + 1)) * (((double)(11586 + 1) * 2.0d)));

}

In the Rpi host:

Finding above 11587.0000000000 (starting with 11586.9990234375)
[0]   nextafterf f0 11586.9990234375 f1 12745.7001953125 => 11587.000000
[1]   nextafterf f0 11587.0000000000 f1 12745.7001953125 => 11587.000977
[2]   nextafterf f0 11587.0009765625 f1 12745.7011718750 => 11587.001953

Finding above 23174.0000000000 (starting with 23173.9980468750)
[0]   nextafterf f0 23173.9980468750 f1 25491.4003906250 => 23174.000000
[1]   nextafterf f0 23174.0000000000 f1 25491.4003906250 => 23174.001953
[2]   nextafterf f0 23174.0019531250 f1 25491.4023437500 => 23174.003906

Finding above 268540320.0000000000 (starting with 268540288.0000000000)
[0]   nextafterf f0 268540288.0000000000 f1 295394368.0000000000 => 268540320.000000
[1]   nextafterf f0 268540320.0000000000 f1 295394368.0000000000 => 268540352.000000
[2]   nextafterf f0 268540352.0000000000 f1 295394400.0000000000 => 268540384.000000

a 11587.0000000000
b 23174.0000000000
c = a * b = 268517152.0000000000
double c  = 268517138.0000000000

With this, I am checking that in the host, the values 11587 and 23174 are correctly representable. Isn't it?

Then, in the GPU, it fetches both values, multiplies and stores them, getting a final 268517120.0 compared with 268517152.0 in the host.

  1. When it fetches the values, what is the value that the GPU "sees"?
  2. You specify the relative ULP is 2^-24. Should I apply it to the fetched values and then to the multiplication itself?
  3. Imagine that instead of a multiplication I want to do the division. I see x / y allowed ULP = 3. How can be applied? I assume I need to propagate the errors for every operation I add inside the kernel to get the final error (eg. 2 multiplications give different error than 1 multiplication and 1 exp)

Maybe are "basic" questions but I would really appreciate to know how to calculate the host code to check this and more complex kernels.

Last hour experiments

In both cases, Rpi3B+ and Rpi3B, perform the next:

$ for i in `seq 1 1 10`; do for offset in 2 4 8 16 32; do CHECK=1 VECTOR=$(( 10240 * 1 + 512 * 2 + 128 * 2 + 62 + $offset )) sudo -E ./build/vectors vecmul.cl | grep -E '(vector:|FAILURE)' | head; done; done
vector: 11584
vector: 11586
vector: 11590
[FAILURE] [11586] OpenCL (268517120.00000) Host (268517152.00000)
[FAILURE] [11588] OpenCL (268609824.00000) Host (268609856.00000)
vector: 11598
[FAILURE] [11586] OpenCL (268517120.00000) Host (268517152.00000)
[FAILURE] [11588] OpenCL (268609824.00000) Host (268609856.00000)
[FAILURE] [11594] OpenCL (268888032.00000) Host (268888064.00000)
[FAILURE] [11596] OpenCL (268980800.00000) Host (268980832.00000)
vector: 11614
[FAILURE] [11586] OpenCL (268517120.00000) Host (268517152.00000)
[FAILURE] [11588] OpenCL (268609824.00000) Host (268609856.00000)
[FAILURE] [11594] OpenCL (268888032.00000) Host (268888064.00000)
[FAILURE] [11596] OpenCL (268980800.00000) Host (268980832.00000)
[FAILURE] [11602] OpenCL (269259200.00000) Host (269259232.00000)
[FAILURE] [11604] OpenCL (269352032.00000) Host (269352064.00000)
[FAILURE] [11610] OpenCL (269630624.00000) Host (269630656.00000)
[FAILURE] [11612] OpenCL (269723520.00000) Host (269723552.00000)
... # all the same

The only difference here is that the Rpi3B+ needs 3m52s to finish, while the Rpi3B 1m12s. Rebooted both and re-run again. Same results (+-4s).

$ grep '^[^#]' /boot/config.txt
dtparam=spi=on
dtoverlay=w1-gpio
dtparam=audio=on

With clinfo, the Platform/Device version are:

  • OpenCL 1.2 VC4CL 0.4.9999 in Rpi3B+
  • OpenCL 1.2 VC4CL 0.4.138 in Rpi3B
4c4
<   Platform Version                                OpenCL 1.2 VC4CL 0.4.9999
---
>   Platform Version                                OpenCL 1.2 VC4CL 0.4.138
14,15c14,15
<   Device Version                                  OpenCL 1.2 VC4CL 0.4.9999
<   Driver Version                                  0.4.9999
---
>   Device Version                                  OpenCL 1.2 VC4CL 0.4.138
>   Driver Version                                  0.4.138
22a23
>   Available core IDs                              0, 64
24c25
<   Core Temperature (Altera)                       42 C
---
>   Core Temperature (Altera)                       54 C
53c54
<   Global memory size                              79691776 (76MiB)
---
>   Global memory size                              67108864 (64MiB)
55c56
<   Max memory allocation                           79691776 (76MiB)
---
>   Max memory allocation                           67108864 (64MiB)
64c65
<   Local memory size                               79691776 (76MiB)
---
>   Local memory size                               67108864 (64MiB)
66c67
<   Max constant buffer size                        79691776 (76MiB)
---
>   Max constant buffer size                        67108864 (64MiB)
87a89
>

My next test will be to install a full raspbian in the Rpi3B+ to discard issues with Manjaro Arm or its config files/drivers.
Also, I will try again the local memory tests __local.

I have a question for you, since I was having a look to some parts of your code. How did you do this massive work? I couldn't find any funding/project info. Do you work for free on VC4CL? Or did you extracted this as an open-source project but it is backed up/funded by real industry/projects where you get your salary/money from? To me, it seems a full-time job that you carried out for 2 years...

@doe300
Copy link
Owner

doe300 commented Dec 24, 2019

You refer to fmul but in the Built-in Functions it does not appear. Maybe I miss something.

You didn't miss something, I did. The floating point multiplication has to be correctly rounded (0 ULP), similar to the addition and subtraction according to the OpenCL 1.2 specification. I will update the wiki page with the correct allowed and the newly tested behavior (e.g. rounding mode).

  1. When it fetches the values, what is the value that the GPU "sees"?

The GPU sees (as values read from memory) the values you give into the buffer, they are only memory copied, so no conversion there.

  1. You specify the relative ULP is 2^-24. Should I apply it to the fetched values and then to the multiplication itself?

Its the error of the operation (or of the result of the operation).

  1. Imagine that instead of a multiplication I want to do the division. I see x / y allowed ULP = 3. How can be applied? I assume I need to propagate the errors for every operation I add inside the kernel to get the final error (eg. 2 multiplications give different error than 1 multiplication and 1 exp)

About the huge run-time difference: Maybe the debug build is so much slower? But the difference is really huge, so I am not sure this is the sole reason.

Yes, the total error of e.g. an algorithm is calculated by adding up the allowed/actual errors of the operations performed.
So having 2 multiplications gives you an allowed error of (2 * 0 =) 0 ULP. And having a multiplication and an exp gives you (0 + 4 = ) 4 ULP allowed error.

How did you do this massive work? I couldn't find any funding/project info. Do you work for free on VC4CL?

It started out as the programming part of my Masters Thesis pretty much 3 years ago, 2 years ago I published it first on github with the finalization of my thesis and from then I basically continue it as hobby. I sadly do not get paid for it and have to work an actual job, this is also why sometimes the progress is slower than I hope ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants