[ GPU ] GPU Kernel creation time #2723
Comments
cibot: Thank you for posting issue #2723. The person in charge will reply soon.
I'm thinking of changing the structure of the current cl context implementation.
PR #2732 has been created for addressing this issue. Following is the plan:
[Suggestion / Need Discussion] As I understand it, the current GPU ClContext is expected to be created with LayerNode.
- This commit updates transpose_cl.cpp/h to inherit LayerImplCl.
- This commit implements registerClKernels() of transpose_cl layer.
- This commit updates cl_context.cpp (applying transpose_cl's update)
- This is the last commit to complete nnstreamer#2723.
- This can close nnstreamer#2723.

Self evaluation:
Build test: [X]Passed [ ]Failed [ ]Skipped
Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Eunju Yang <[email protected]>
For now, it seems `clCreateKernel` is called whenever any type of `_cl` wrapper function is invoked. For example,
`FullyConnectedLayer::forwarding -> dotCl -> dot_cl -> clCreateKernel`
Considering when the `_cl` functions are called (in the forwarding path), this could potentially slow down performance. (Of course, it already avoids duplicate registration. However, that may not be enough for a speedup.)
Since NNTrainer already has a compilation phase, what about moving the kernel registration process into the compilation stage?
During the compilation step, we can identify which computational units are utilized by each layer and generate the corresponding kernels accordingly.
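The proposal above could look roughly like the following. This is only a minimal sketch, not NNTrainer's actual code: `FakeKernel`, `KernelRegistry`, `Layer::requiredKernels()` and `compileModel()` are hypothetical names made up for illustration, and kernel creation is simulated with a counter instead of a real `clCreateKernel` call so the example builds without OpenCL.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a kernel handle (cl_kernel in real OpenCL code).
struct FakeKernel {
  std::string name;
};

// Hypothetical registry: creates each kernel at most once and counts creations.
class KernelRegistry {
public:
  // Stand-in for the expensive creation path (clCreateKernel in real code).
  const FakeKernel &registerKernel(const std::string &name) {
    auto it = kernels_.find(name);
    if (it == kernels_.end()) {
      ++creations_; // count only real creations, not cache hits
      it = kernels_.emplace(name, FakeKernel{name}).first;
    }
    return it->second;
  }

  // Lookup only, intended for the forwarding path; never creates a kernel.
  const FakeKernel *find(const std::string &name) const {
    auto it = kernels_.find(name);
    return it == kernels_.end() ? nullptr : &it->second;
  }

  std::size_t creations() const { return creations_; }

private:
  std::unordered_map<std::string, FakeKernel> kernels_;
  std::size_t creations_ = 0;
};

// Hypothetical layer interface: each layer reports the kernels it will use.
struct Layer {
  virtual ~Layer() = default;
  virtual std::vector<std::string> requiredKernels() const = 0;
};

struct FullyConnected : Layer {
  std::vector<std::string> requiredKernels() const override {
    return {"dot_cl", "add_cl"}; // kernel names are illustrative
  }
};

struct Transpose : Layer {
  std::vector<std::string> requiredKernels() const override {
    return {"transpose_cl"};
  }
};

// Hypothetical compile stage: register every kernel any layer needs, once,
// before the first forwarding call.
void compileModel(const std::vector<const Layer *> &layers,
                  KernelRegistry &registry) {
  for (const Layer *layer : layers) {
    for (const std::string &name : layer->requiredKernels()) {
      registry.registerKernel(name);
    }
  }
}
```

With this shape, forwarding would only call `find()`, so the creation cost is paid once at compile time. Two `FullyConnected` layers and one `Transpose` layer would trigger three creations in total, not one per forwarding call.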