machine learning pipeline #41
Interesting. I'll take a look at Oclgrind to get a better understanding of what you want, and I'll get back to you soon. Perhaps the Collective Knowledge framework might also be of some help: https://github.com/ctuning/ck
Wow, I wasn't even aware that something like CK existed... I'll have to do some reading now.
I found some time and I think I understand your idea. By the way, in your first post you meant "emulate an OpenCL device", not "emulate an OpenCL kernel", right? I am not sure CLTune is what you are looking for, though. What is your use case exactly? I can interpret your goal in two ways:
In the first case CLTune is really not the right choice: it can only perform 'optimisations' that are pre-programmed into a kernel using pre-processor variables. CLTune is a tool to help you explore those options, optionally using machine learning to guide you more quickly towards a decision. I would say it is better to hook this up in the compiler itself. For the second case it might be a better fit, but I am not so sure this extra information will help to train a model. With the extra data we might also need to look at larger models that can capture this new information. Keep in mind that I am currently not even using the static data that is readily available (number of instructions of some sort, number of branches, vector width, architecture details); I am only using the current user-defined 'configuration'. So perhaps it is better to start there, instead of using run-time information from device emulation?
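To make the "pre-programmed using pre-processor variables" point concrete: a tuner in this style enumerates combinations of parameters that the kernel source reads as macros, and each combination becomes a set of `-D` options handed to the OpenCL compiler (the `options` string of `clBuildProgram`). The following is a minimal Python sketch of that enumeration under stated assumptions, not CLTune's actual API; the parameter names `WPT` and `VW` and the constraint are hypothetical.

```python
import itertools
from typing import Callable, Dict, List

Config = Dict[str, int]


def configurations(params: Dict[str, List[int]],
                   constraint: Callable[[Config], bool] = lambda c: True) -> List[Config]:
    """Enumerate every combination of parameter values that passes the constraint."""
    names = list(params)
    valid = []
    for values in itertools.product(*(params[name] for name in names)):
        config = dict(zip(names, values))
        if constraint(config):
            valid.append(config)
    return valid


def build_options(config: Config) -> str:
    """Turn one configuration into OpenCL compiler options, one -D macro per parameter."""
    return " ".join(f"-D{name}={value}" for name, value in config.items())


if __name__ == "__main__":
    # Hypothetical parameters: work per thread (WPT) and vector width (VW).
    space = {"WPT": [1, 2, 4], "VW": [1, 2, 4]}
    # Hypothetical constraint: WPT must be a multiple of VW.
    for config in configurations(space, lambda c: c["WPT"] % c["VW"] == 0):
        print(build_options(config))
```

Only configurations the kernel author exposed as macros can ever be generated this way, which is exactly the limitation that makes this approach a poor fit for compiler-style transformations.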
Yes, I meant "device" like you said. Your 2) describes the idea pretty well, i.e. it has more to do with kernel-specific runtime information and using that to come up with, or guide, different transformations. I will have to do some reading to see if this is really feasible, for all the reasons you mentioned; however, I did reference a few papers that basically describe doing this sort of thing. So it really is more about narrowing down and guiding the search space based on kernel-specific information that can be gathered via emulated execution.
OK! Which papers are those? I'm also interested to see what's possible.
I basically worked through the referenced paper and its references section: http://arxiv.org/pdf/1506.00842v1.pdf
This is exactly what CLTune is also doing. I mainly wrote CLTune because the other paper's authors did not make any tool available. However, I did not evaluate the machine learning part much; the paper actually doesn't include it at all (http://www.cedricnugteren.nl/downloads/Nugteren2015a.pdf). Perhaps someone should do some more experiments using CLTune and a small neural network? Or are you perhaps referring to the future work part of the paper:
I am not sure exactly what the authors mean by this, but could this be what you are referring to? In that case I would also contact the authors and check whether they have already done this.
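One way to run the experiment suggested above (a tuner plus a learned model): measure a subset of configurations, fit a model on those measurements, and use its predictions to decide which remaining configurations are worth running. The sketch below swaps in ordinary least-squares linear regression as a simple stand-in for the small neural network. The features (here `WPT` and `VW`, but static kernel data such as instruction or branch counts could be appended) and the timings are synthetic placeholders chosen to be exactly linear, not real measurements.

```python
from typing import List, Sequence

Vector = Sequence[float]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(features: Sequence[Vector], times: Sequence[float]) -> List[float]:
    """Least-squares fit of time ~ w0 + w . features via the normal equations."""
    rows = [[1.0, *map(float, f)] for f in features]  # prepend a bias column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, times)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights: Sequence[float], feature: Vector) -> float:
    """Predicted run time for one configuration's feature vector."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], feature))


def rank_by_prediction(weights: Sequence[float], candidates: List[Vector]) -> List[Vector]:
    """Order unmeasured configurations from fastest to slowest predicted time."""
    return sorted(candidates, key=lambda f: predict(weights, f))


if __name__ == "__main__":
    # Synthetic data: features are (WPT, VW), target is a made-up run time in ms.
    measured = [(1, 1), (2, 1), (2, 2), (4, 1)]
    times = [8.5, 7.5, 7.0, 5.5]
    weights = fit_linear(measured, times)
    print(rank_by_prediction(weights, [(4, 2), (4, 4)]))
```

A linear model is only a baseline; whether a neural network (or extra runtime features) actually beats it is precisely what such an experiment would have to show.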
A really interesting thread. /cc @hughperkins
This is related to another discussion currently taking place here: jrprice/Oclgrind#109 (comment)
The idea is to emulate an OpenCL kernel using Oclgrind and use it to gather kernel-specific runtime information (think dataflow, variable lifetime), then feed this information into the ML pipeline to enable more sophisticated transformations based on much more comprehensive, and better, information about the kernel's runtime behavior.
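As an illustration of what "variable lifetime" could mean as a runtime feature: given a per-work-item trace of which values each executed instruction defines and uses, a value's live range runs from its first definition to its last use, and the peak number of simultaneously live values is a rough register-pressure signal. The trace format (`TraceEvent`) is hypothetical; it is not Oclgrind's actual output or plugin interface. The sketch also assumes each value is defined once (SSA-like), so a redefinition inside a loop is not tracked separately.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class TraceEvent:
    step: int           # position of the executed instruction in the trace
    defines: List[str]  # values written by this instruction
    uses: List[str]     # values read by this instruction


def live_ranges(trace: Iterable[TraceEvent]) -> Dict[str, Tuple[int, int]]:
    """Map each value to (first definition step, last use step)."""
    ranges: Dict[str, Tuple[int, int]] = {}
    for event in trace:
        # Uses are processed first: an instruction reads its operands before writing.
        for name in event.uses:
            if name in ranges:
                start, _ = ranges[name]
                ranges[name] = (start, event.step)
        for name in event.defines:
            if name not in ranges:
                ranges[name] = (event.step, event.step)
    return ranges


def max_live(ranges: Dict[str, Tuple[int, int]]) -> int:
    """Largest number of values live at any single step."""
    if not ranges:
        return 0
    first = min(start for start, _ in ranges.values())
    last = max(end for _, end in ranges.values())
    return max(sum(1 for start, end in ranges.values() if start <= step <= end)
               for step in range(first, last + 1))


if __name__ == "__main__":
    # Hypothetical trace of one work-item: c = f(gid) + g(gid), then store c.
    trace = [
        TraceEvent(0, ["gid"], []),
        TraceEvent(1, ["a"], ["gid"]),
        TraceEvent(2, ["b"], ["gid"]),
        TraceEvent(3, ["c"], ["a", "b"]),
        TraceEvent(4, [], ["c"]),
    ]
    ranges = live_ranges(trace)
    print(ranges, max_live(ranges))
```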
To pull this off, some kind of interface would need to be established between the kernel virtualization and the tuner components, even if that just means serializing kernel-specific data to a file on disk and using that for the ML pipeline.
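A minimal sketch of that file-based interface, assuming JSON Lines as the exchange format: the emulation side appends one record per kernel, and the tuner side reads the records back and flattens them into fixed-order numeric feature vectors. The file layout, the record fields (`kernel`, `static`, `runtime`) and the feature keys are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def write_profile(path: Path, kernel: str,
                  static: Dict[str, float], runtime: Dict[str, float]) -> None:
    """Emulator side: append one kernel profile as a single JSON line."""
    record = {"kernel": kernel, "static": static, "runtime": runtime}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")


def read_features(path: Path, keys: List[str]) -> Dict[str, List[float]]:
    """Tuner side: turn each stored profile into a feature vector ordered by `keys`."""
    features: Dict[str, List[float]] = {}
    with path.open(encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            merged = {**record["static"], **record["runtime"]}
            # Missing features default to 0.0 so every vector has the same length;
            # a later record for the same kernel replaces an earlier one.
            features[record["kernel"]] = [float(merged.get(k, 0.0)) for k in keys]
    return features


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "profiles.jsonl"
        write_profile(path, "gemm", {"instructions": 120, "branches": 4},
                      {"max_live_values": 3})
        print(read_features(path, ["instructions", "branches", "max_live_values"]))
```

Appending one line per kernel keeps the emulator side trivial and lets the tuner decide which keys become model inputs, so new runtime features can be added without breaking older files.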