
Getting ResourceExhaustedError when Running train.py file #13

Open
ashamy97 opened this issue Mar 8, 2024 · 1 comment
ashamy97 commented Mar 8, 2024

Hello, I am testing your repo. I tried to run the train.py script in a Jupyter notebook and got this error:

Namespace(D_learning_rate=0.0002, G_learning_rate=0.0002, batch_size=2, data_folder='C:\Users\youss\Desktop\Research\augmented_data_zoom_cropped', diff_augment=False, discriminator_weights=None, epochs=50000, fid=False, fid_frequency=1, fid_number_of_images=128, generator_weights=None, name='experiment', override=True, resolution=256)
[Model G] output shape: (2, 256, 256, 3)
[Model D] real_fake output shape: (2, 5, 5, 1)
[Model D] image output shape(2, 128, 128, 3)
[Model D] image part output shape(2, 128, 128, 3)
Epoch 0 -------------

ResourceExhaustedError Traceback (most recent call last)
~\Desktop\Research\Adding ML TO AR\GAN and cGAN\cFastGAN to generate motor images\SLE-GAN\train.py in
83 D_optimizer=D_optimizer,
84 images=image_batch,
---> 85 diff_augmenter_policies=diff_augment_policies)
86
87 G_loss_metric(G_loss)

~\anaconda3\envs\tens-gpu\lib\site-packages\tensorflow\python\eager\def_function.py in call(self, *args, **kwds)
883
884 with OptionalXlaContext(self._jit_compile):
--> 885 result = self._call(*args, **kwds)
886
887 new_tracing_count = self.experimental_get_tracing_count()

~\anaconda3\envs\tens-gpu\lib\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
948 # Lifting succeeded, so variables are initialized and we can run the
949 # stateless function.
--> 950 return self._stateless_fn(*args, **kwds)
951 else:
952 _, _, _, filtered_flat_args = \

~\anaconda3\envs\tens-gpu\lib\site-packages\tensorflow\python\eager\function.py in call(self, *args, **kwargs)
3038 filtered_flat_args) = self._maybe_define_function(args, kwargs)
3039 return graph_function._call_flat(
--> 3040 filtered_flat_args, captured_inputs=graph_function.captured_inputs) # pylint: disable=protected-access
3041
3042 @property

~\anaconda3\envs\tens-gpu\lib\site-packages\tensorflow\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
1962 # No tape is watching; skip to running the function.
1963 return self._build_call_outputs(self._inference_function.call(
-> 1964 ctx, args, cancellation_manager=cancellation_manager))
1965 forward_backward = self._select_forward_and_backward_functions(
1966 args,

~\anaconda3\envs\tens-gpu\lib\site-packages\tensorflow\python\eager\function.py in call(self, ctx, args, cancellation_manager)
594 inputs=args,
595 attrs=attrs,
--> 596 ctx=ctx)
597 else:
598 outputs = execute.execute_with_cancellation(

~\anaconda3\envs\tens-gpu\lib\site-packages\tensorflow\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
58 ctx.ensure_initialized()
59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60 inputs, attrs, num_outputs)
61 except core._NotOkStatusException as e:
62 if name is not None:

ResourceExhaustedError: OOM when allocating tensor with shape[2,128,128,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node generator/StatefulPartitionedCall/up_sampling_block_4/up_sampling2d_4/resize/ResizeNearestNeighbor}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[Op:__inference_train_step_20754]

Function call stack:
train_step
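For scale, the single allocation named in the OOM message is small: a float32 tensor of shape [2, 128, 128, 128] is only 16 MiB. A quick back-of-the-envelope check (plain Python, nothing from the repo assumed):

```python
# Size of the tensor named in the OOM message:
# shape [2, 128, 128, 128], dtype float32 (4 bytes per element).
shape = (2, 128, 128, 128)

num_elements = 1
for dim in shape:
    num_elements *= dim

bytes_needed = num_elements * 4                  # float32 = 4 bytes
mib = bytes_needed / (1024 ** 2)
print(f"{bytes_needed} bytes = {mib:.0f} MiB")   # 16777216 bytes = 16 MiB
```

Since this tensor is tiny relative to typical GPU memory, the card was most likely already close to full from the earlier layers of the 256x256 generator, rather than this one allocation being the real problem.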

ashamy97 commented Mar 8, 2024

This is the line I typed in the Jupyter notebook:

%run train.py --data-folder ./my-path --override --resolution 256 --G-learning-rate 0.0002 --D-learning-rate 0.0002 --batch-size 2 --epochs 50000
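Two common mitigations for this kind of OOM, independent of this repo: lower `--resolution` (the batch size is already at 2), or let TensorFlow allocate GPU memory on demand instead of reserving it eagerly. A minimal sketch of the latter, assuming a TensorFlow 2.x environment (guarded so it is a no-op where TensorFlow is not installed):

```python
# Hedged sketch (assumes TensorFlow 2.x): enable GPU memory growth so
# TF allocates VRAM incrementally rather than grabbing it all at start.
# This can change when an OOM occurs, but it will not rescue a model
# that genuinely needs more memory than the GPU has.
try:
    import tensorflow as tf

    for gpu in tf.config.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)
    configured = True
except ImportError:  # TensorFlow not available in this environment
    configured = False
```

This has to run before any TensorFlow op touches the GPU (e.g. at the very top of train.py, or in the notebook cell before `%run`); once the device is initialized, `set_memory_growth` raises an error.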
