error when training by cifar10 #4

Open
Super-1123 opened this issue Apr 10, 2019 · 1 comment

Comments

@Super-1123

Hi, I tried to rerun this code to test the model's performance using 'python3.6 main.py --cfg cfgs/cifar10/aognet_cifar10_ps_4_bottleneck_1M.yaml --gpus 1,2'. At first everything seemed to go smoothly; however, at epoch 280 training stopped with an error:
Traceback (most recent call last):
  File "main.py", line 133, in <module>
    main()
  File "main.py", line 120, in main
    epoch_end_callback = checkpoint)
  File "/home/amax/anaconda3/lib/python3.6/site-packages/mxnet/module/base_module.py", line 575, in fit
    callback(epoch, self.symbol, arg_params, aux_params)
  File "/home/amax/anaconda3/lib/python3.6/site-packages/mxnet/callback.py", line 89, in _callback
    save_checkpoint(prefix, iter_no + 1, sym, arg, aux)
  File "/home/amax/anaconda3/lib/python3.6/site-packages/mxnet/model.py", line 409, in save_checkpoint
    nd.save(param_name, save_dict)
  File "/home/amax/anaconda3/lib/python3.6/site-packages/mxnet/ndarray/utils.py", line 273, in save
    keys))
  File "/home/amax/anaconda3/lib/python3.6/site-packages/mxnet/base.py", line 252, in check_call
    raise MXNetError(py_str(LIB.MXGetLastError()))
mxnet.base.MXNetError: [09:52:25] src/io/local_filesys.cc:39: Check failed: std::fwrite(ptr, 1, size, fp) == size FileStream.Write incomplete
I can't find a suitable solution to this problem. Could you please tell me how to solve it?

@xilaili
Owner

xilaili commented Apr 10, 2019

Sorry, I haven't run into this problem before. It looks like an error while saving the checkpoint to a file.
In any case, the code is based on an old version of MXNet, and its results cannot match those reported in the paper. We'll release our new PyTorch code very soon. Stay tuned.
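
Since the traceback ends in a failed std::fwrite during checkpoint saving, one thing that may be worth ruling out is that the disk holding the checkpoints filled up (or hit a quota) after many epochs of saved files. This is only a guess; the thread does not confirm the cause. A minimal sketch for checking free space, where the function name `free_space_report`, the 1 GiB threshold, and the example path are all hypothetical and not part of this repository:

```python
import os
import shutil


def free_space_report(path, min_free_bytes=1 << 30):
    """Return (free_bytes, ok) for the filesystem that would hold `path`.

    `path` may be a checkpoint prefix that does not exist yet; in that
    case the nearest existing parent directory is inspected instead.
    The 1 GiB default threshold is an arbitrary assumption.
    """
    directory = os.path.abspath(path)
    # Walk up until we reach a directory that actually exists.
    while not os.path.isdir(directory):
        parent = os.path.dirname(directory)
        if parent == directory:  # reached the filesystem root
            break
        directory = parent
    usage = shutil.disk_usage(directory)
    return usage.free, usage.free >= min_free_bytes


if __name__ == "__main__":
    # "." is a placeholder; point this at the directory used for checkpoints.
    free, ok = free_space_report(".")
    print("free bytes:", free)
    print("enough space for another checkpoint (assumed 1 GiB):", ok)
```

If the check reports little free space, deleting older checkpoint files or saving to a different disk would be the first thing to try.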
