Define a new file format and implement InputFormat and RecordReader for it #77
Comments
I agree with your concerns.
By the way, in practice we use neither the MNIST vector format nor the CIFAR-10 pickle files. We mostly use only the original files, such as JPEG.
|
Thank you for your suggestion, @dongjoon-hyun. I will consider ND4J serialization and discuss this with @jsjason. If it is okay to use it, I will let you know and start implementing it. I can connect to the SKT cluster through VPN, thanks to @jsjason. |
Thank you for considering it. By the way, I found the following code in DL4J and ND4J. Actually, the file is a plain text file delimited by spaces. DL4J
ND4J
I think we already have a NumPy-compatible read function in ND4J. |
Thank you for letting me know. In addition, I saw the code of |
Yep. That is right. But I think we can depend on that part in ND4J layer. |
By the way, for efficiency, we have to distinguish between the input file format and the internal storage format. The following are my opinions so far.
|
@dongjoon-hyun When you say 'internal storage format', are you referring to the intermediate and final output data? |
One thing I am concerned about is our dependency on ND4J. I don't know much about scientific computing libraries, but is it okay to rely on ND4J this much? We could search for and use a library with a larger community. |
@dongjoon-hyun, (image) (delimiter) (label) (newline) By using |
@jsjason , I meant 'internal storage format' for really For dependency, I always welcome your further research and proposal for better BLAS library supporting CPU/GPU. :) |
Ur, @beomyeol , I meant |
For the train/test data and labels, you can read them in a way similar to what you described, i.e., as an m x (n + 1) matrix.
In addition, |
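A minimal sketch of reading such an m x (n + 1) matrix, assuming each line holds n space-delimited feature values followed by one integer label. `LabeledMatrix` and its `parse` method are hypothetical names used only for illustration; they are not part of DL4J, ND4J, or this project.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for an m x (n + 1) text matrix split into
// an m x n feature block and an m-element label column.
final class LabeledMatrix {
    final float[][] features; // m rows, n columns
    final int[] labels;       // one label per row

    private LabeledMatrix(float[][] features, int[] labels) {
        this.features = features;
        this.labels = labels;
    }

    // Assumption: rows are separated by newlines, values by whitespace,
    // and the last value of every row is an integer class label.
    static LabeledMatrix parse(String text) {
        List<float[]> rows = new ArrayList<>();
        List<Integer> rowLabels = new ArrayList<>();
        int width = -1;
        for (String line : text.split("\n")) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] tokens = line.split("\\s+");
            if (tokens.length < 2) {
                throw new IllegalArgumentException("need at least one feature and a label: " + line);
            }
            if (width == -1) {
                width = tokens.length;
            } else if (tokens.length != width) {
                throw new IllegalArgumentException("ragged row: " + line);
            }
            int n = tokens.length - 1;
            float[] row = new float[n];
            for (int i = 0; i < n; i++) {
                row[i] = Float.parseFloat(tokens[i]);
            }
            rows.add(row);
            rowLabels.add(Integer.parseInt(tokens[n]));
        }
        int[] labels = new int[rowLabels.size()];
        for (int i = 0; i < labels.length; i++) {
            labels[i] = rowLabels.get(i);
        }
        return new LabeledMatrix(rows.toArray(new float[0][]), labels);
    }
}
```

Rejecting ragged rows up front keeps the result a true matrix, which matters if it is later handed to a BLAS-backed library as one dense block.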
The pre-trained model equals the initial parameter set for the DNN case, right? Unlike the other algorithms, for DNNs we are trying to provide a |
@dongjoon-hyun. I am still confused a little bit. What is the format of file which in addition, for |
@jsjason , that's right. |
Thanks, @beomyeol and @dongjoon-hyun. Let's keep this issue open, since we'll probably have more discussions when PRs start to come in. |
Thank you, @dongjoon-hyun, for your comment :) |
Different datasets store their data in different file formats. For example, the MNIST database uses its own file format, and the CIFAR-10 database is distributed both as Python pickle files and in its own binary format. Supporting all of these file formats is too burdensome.
So, I suggest defining a new file format that our DNN uses to load data from files. To support a variety of datasets such as MNIST and ImageNet, we can convert these datasets to our file format and provide them to the DNN.
After defining the new file format, we also need an InputFormat and a RecordReader for it to run our neural network on REEF.
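To make the proposal concrete, here is a rough sketch of what a reader for such a format could look like. Everything in it is an assumption for illustration: the record layout (space-delimited image values, a `,` delimiter, an integer label, then a newline), the class names `ImageRecord` and `SketchRecordReader`, and the choice of the record index as the key. The method names loosely follow Hadoop's `RecordReader`, but the class deliberately does not extend it, so that the example stays self-contained; the real implementation would plug into whichever Hadoop API the REEF integration requires.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical value type: one image with its label.
final class ImageRecord {
    final float[] image;
    final int label;

    ImageRecord(float[] image, int label) {
        this.image = image;
        this.label = label;
    }
}

// Hypothetical reader for lines shaped like "<v1> <v2> ... <vn>,<label>".
final class SketchRecordReader implements AutoCloseable {
    private static final String DELIMITER = ","; // assumed image/label delimiter

    private final BufferedReader in;
    private long index = -1;      // key: zero-based record index
    private ImageRecord current;  // value: most recently read record

    SketchRecordReader(Reader source) {
        this.in = new BufferedReader(source);
    }

    // Advances to the next non-blank record; returns false at end of input.
    boolean nextKeyValue() {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                int cut = line.lastIndexOf(DELIMITER);
                if (cut <= 0) {
                    throw new IllegalArgumentException("malformed record: " + line);
                }
                String[] values = line.substring(0, cut).trim().split("\\s+");
                float[] image = new float[values.length];
                for (int i = 0; i < values.length; i++) {
                    image[i] = Float.parseFloat(values[i]);
                }
                int label = Integer.parseInt(line.substring(cut + 1).trim());
                current = new ImageRecord(image, label);
                index++;
                return true;
            }
            current = null;
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long getCurrentKey() {
        return index;
    }

    ImageRecord getCurrentValue() {
        return current;
    }

    @Override
    public void close() {
        try {
            in.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would loop with `while (reader.nextKeyValue())` and pull each image and label from `getCurrentValue()`. A text layout is used here only because it is easy to inspect; a binary layout would likely be more compact for large datasets such as ImageNet.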