OOM issue #27

Open · Yaqing2023 opened this issue Dec 12, 2024 · 13 comments

@Yaqing2023

The current implementation in inference_basic.py has

video_frames = pipeline(
    image=validation_image,
    image_pose=validation_control_images,
    ...
)

which passes all pose images to a single pipeline call. This requires a significant amount of GPU memory and limits the length of the video. I tried splitting validation_control_images into smaller chunks (e.g., 40 images per sublist) and running the pipeline in a for loop. This works well for me and the final video quality looks good. I'm not sure whether this affects ID consistency, but from what I tried the results are good.

@nitinmukesh

@Yaqing2023

Could you please share the updated code with batch processing? It would be useful.

@Yaqing2023
Author

In inference_basic.py, first add a function:

def split_into_sublists(image_list, max_images_per_list):
    sublists = [image_list[i:i + max_images_per_list] for i in range(0, len(image_list), max_images_per_list)]

    # If the last sublist has fewer than 10 images, merge it into the previous one
    if len(sublists[-1]) < 10 and len(sublists) > 1:
        sublists[-2].extend(sublists[-1])
        sublists.pop()  # Remove the now-merged last sublist

    return sublists
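
For example, 90 pose frames split into chunks of [40, 40, 10], while a tail shorter than 10 frames is merged into the previous chunk:

chunks = split_into_sublists(list(range(90)), 40)
print([len(c) for c in chunks])   # [40, 40, 10]

# An 85-frame list would leave a 5-frame tail, which gets merged:
print([len(c) for c in split_into_sublists(list(range(85)), 40)])  # [40, 45]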

then change the pipeline call to the following:

# Hard-code the sublist size to 40 images
max_images_per_list = 40
pose_sublists = split_into_sublists(validation_control_images, max_images_per_list)
all_video_frames = []
print("Total sublist groups: ", len(pose_sublists))
for pose_sublist in pose_sublists:
    sub_num_frames = len(pose_sublist)
    video_frames = pipeline(
        image=validation_image,
        image_pose=pose_sublist,
        height=args.height,
        width=args.width,
        num_frames=sub_num_frames,
        tile_size=args.tile_size,
        tile_overlap=args.frames_overlap,
        decode_chunk_size=decode_chunk_size,
        motion_bucket_id=127.,
        fps=7,
        min_guidance_scale=args.guidance_scale,
        max_guidance_scale=args.guidance_scale,
        noise_aug_strength=args.noise_aug_strength,
        num_inference_steps=args.num_inference_steps,
        generator=generator,
        output_type="pil",
        validation_image_id_ante_embedding=validation_image_id_ante_embedding,
    ).frames[0]
    all_video_frames.extend(video_frames)

out_file = os.path.join(args.output_dir, "animation_video.mp4")

# Convert the accumulated PIL frames to numpy arrays for export
for i in range(len(all_video_frames)):
    all_video_frames[i] = np.array(all_video_frames[i])

png_out_file = os.path.join(args.output_dir, "animated_images")
os.makedirs(png_out_file, exist_ok=True)
export_to_gif(all_video_frames, out_file, 8)
save_frames_as_png(all_video_frames, png_out_file)

@nitinmukesh

@Yaqing2023

Thank you very much.
Could you attach the updated file? I am not a developer, just trying out these AI apps.

@Yaqing2023
Author

inference_basic.py.txt

@Francis-Rings
Owner

> In inference_basic.py, first add a function split_into_sublists(...), then change the pipeline call accordingly (full code quoted from the comment above).

Cool! Thank you for your contribution!

@nitinmukesh

@Yaqing2023

Appreciate your help, thank you.

@Francis-Rings

The existing code works fine on low VRAM at 512 x 512. If the changes contributed by Yaqing can be incorporated into the code behind a --batch_processing True flag, it will help many.

@Francis-Rings
Owner

> The existing code works fine on low VRAM at 512 x 512. If the changes contributed by Yaqing can be incorporated into the code behind a --batch_processing True flag, it will help many.

Sure, I will check whether the modified pipeline affects the performance of StableAnimator. If it maintains the original performance while reducing GPU memory consumption, I’ll incorporate it into the inference code.
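
As a rough sketch of what that flag could look like (hypothetical argument names, assuming inference_basic.py already builds an argparse parser; store_true is the idiomatic form of --batch_processing True):

parser.add_argument("--batch_processing", action="store_true",
                    help="Split the pose frames into sublists to reduce peak VRAM")
parser.add_argument("--max_images_per_list", type=int, default=40,
                    help="Pose frames per sublist when --batch_processing is set")
args = parser.parse_args()

if args.batch_processing:
    pose_sublists = split_into_sublists(validation_control_images, args.max_images_per_list)
else:
    pose_sublists = [validation_control_images]  # original single-batch behavior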

@nitinmukesh

It increases the processing time tenfold.

@Yaqing2023
Author

It is indeed a trade-off between low memory and execution time. Many of us may have only a single 4090 GPU, which can generate just a few seconds of video. This change removes that limit.
The other memory bottleneck I removed later is the video_frames list. The current code stores all frames and runs tensor2vid on them in one batch, which can also cause OOM. I changed that to small batches and write the frames to disk immediately, so I don't need to keep the huge video_frames list in memory. This way I can handle driving videos of basically any length without memory limitations.
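
A minimal sketch of that second change, with a hypothetical helper (not the author's exact code): save each batch's frames as soon as the pipeline returns them, then drop the references and clear the CUDA cache so the full frame list never lives in memory:

import os
import torch

def save_batch_frames(frames, out_dir, start_index):
    # Write one batch of PIL frames to disk; return the next global frame index
    os.makedirs(out_dir, exist_ok=True)
    for offset, frame in enumerate(frames):
        frame.save(os.path.join(out_dir, f"frame_{start_index + offset:04d}.png"))
    return start_index + len(frames)

frame_index = 0
for pose_sublist in pose_sublists:
    video_frames = pipeline(
        image=validation_image,
        image_pose=pose_sublist,
        num_frames=len(pose_sublist),
        # ... remaining arguments as in the loop above ...
        output_type="pil",
    ).frames[0]
    frame_index = save_batch_frames(video_frames, "basic_infer/animated_images", frame_index)
    del video_frames              # no growing all_video_frames list
    torch.cuda.empty_cache()      # release cached GPU memory between batches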

@nitinmukesh

nitinmukesh commented Dec 14, 2024

and here I am, using this on a 4060 with 8 GB VRAM + 8 GB shared. :)
Please share the updated code if you can; I would love to try it.
The approach of writing to disk is also good. I guess that if each image is saved to disk as soon as it is processed, and the torch cache is cleared, VRAM usage will drop significantly. The frames can then be merged using ffmpeg, which takes only a few seconds. Unfortunately, I am not a developer, so I can't code these approaches myself.

Here is the benchmark for the video I am trying
Duration : 12 s 423 ms
Frame rate : 30.000 FPS
Resolution : 512 x 512

Inference time
100%|██████████████| 775/775 [35:14<00:00, 2.73s/it]

[Screenshot 2024-12-14 133418]

After inference it does some additional processing (I will post the total processing time). I think this is where saving each image to disk would save a lot of VRAM.

[Screenshot 2024-12-14 140100]

Total Inference Time: 0:53:49.142157

@Yaqing2023
Author

inference_basic.py.txt
@nitinmukesh this generates images in basic_infer/animated_images/ directly for each batch of frames, instead of waiting until the end. You can then run ffmpeg manually to create an mp4 file from the frames in basic_infer/animated_images/.
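
For example, assuming the frames are written as zero-padded PNGs like frame_0000.png (hypothetical naming; adjust the filename pattern and -framerate to match the actual output):

ffmpeg -framerate 8 -i basic_infer/animated_images/frame_%04d.png -c:v libx264 -pix_fmt yuv420p animation_video.mp4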

@nitinmukesh

Thank you very much, @Yaqing2023. Appreciate your help.
Will try it now.

@Yaqing2023
Author

The intention is not to speed up the process but to save memory, so that less powerful machines can run this project.
