Classifier-free guidance? #6
I might have missed it in the code, but I can't see whether we randomly drop the captions for classifier-free guidance (which is already used at inference).
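For reference, inference-time classifier-free guidance runs the UNet on both an unconditional (empty-caption) text embedding and the real caption embedding, then extrapolates away from the unconditional prediction. A minimal sketch of that computation in the diffusers style, with illustrative names rather than this repo's API:

```python
import torch

def cfg_noise_pred(unet, latents, t, text_emb, uncond_emb, guidance_scale=7.5):
    # Batch the unconditional and conditional inputs so a single UNet
    # forward pass produces both noise predictions.
    latent_in = torch.cat([latents, latents])
    emb_in = torch.cat([uncond_emb, text_emb])
    noise_uncond, noise_text = unet(
        latent_in, t, encoder_hidden_states=emb_in
    ).sample.chunk(2)
    # Guidance: push the prediction away from the unconditional output.
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)
```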
Hi, thanks for the question. I didn't notice a practical difference between no text dropout and some text dropout in my experiments, so I left it out of this repo. However, I can push a branch later today and potentially merge after some testing. For reference, the implementation is just randomly substituting the input string with the empty string, similar to how it's done at inference time in diffusers (https://github.com/huggingface/diffusers/blob/384c83aa9a1f268e5587d5ea1ea9f4c040845167/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L371)
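A minimal sketch of that kind of caption dropout, assuming captions arrive as a list of strings per batch (the helper name is hypothetical, not necessarily what the branch uses):

```python
import random

def apply_text_dropout(captions: list[str], p_drop: float = 0.1) -> list[str]:
    # With probability p_drop, substitute the empty string for a caption so
    # the model also learns the unconditional prediction that classifier-free
    # guidance needs at inference time.
    return ["" if random.random() < p_drop else c for c in captions]
```

Dropout would be applied to the raw strings before tokenization, mirroring the empty-prompt encoding that diffusers performs at inference.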
Ah, you mean you could apply classifier-free guidance at inference even though the model never encountered empty strings during training (as done in the repo you mention)? Isn't this unexpected?
Yes, the
I think it's interesting that it worked anyway. Which dataset were you training on?
@mehdidc I've used several datasets with this setup, mostly various filtered versions of LAION-2B (e.g. LAION Aesthetics and LAION High Res). I've added text dropout in the
I haven't been able to test this code recently due to lack of resources. Let me know if you get a chance to try this out, and I can merge it into
Thanks @vkramanuj for the implementation. I can try to do some runs. Do you maybe have the config file you used in your tests with LAION Aesthetics and/or High Res, so that we can compare more or less directly?
Here's one. I removed my paths:

```yaml
system:
  gradient_accumulation: 1
  batch_size: 32
  workers: 6
  dist_backend: ${distributed.dist_backend}
  dist_url: ${distributed.dist_url}
distributed:
  dist_backend: 'nccl'
  dist_url: 'env://'
experiment:
  log_dir: <path>/sd-logs
  name: "laion-2b-aesthetics-hr"
  project: "diffusion"
  num_examples_to_see: 2000000000
  save_every: 2000
  requeue: True
optimizer:
  name: adamw
  params:
    learning_rate: 0.0001
    beta1: 0.9
    beta2: 0.98 # changed from initial sd value for training stability
    weight_decay: 0.01
    epsilon: 0.00000001
model:
  vae:
    pretrained: "<path>/stable-diffusion-v1-5-fp32"
  text_encoder:
    pretrained: "<path>/stable-diffusion-v1-5-fp32"
  tokenizer:
    pretrained: "<path>/stable-diffusion-v1-5-fp32"
  scheduler:
    pretrained: "<path>/stable-diffusion-v1-5-fp32"
  unet:
    target: UNet2DConditionModel
    params:
      act_fn: "silu"
      attention_head_dim: 8
      block_out_channels: [320, 640, 1280, 1280]
      center_input_sample: False
      cross_attention_dim: 768
      down_block_types: ["CrossAttnDownBlock2D", "CrossAttnDownBlock2D", "CrossAttnDownBlock2D", "DownBlock2D"]
      downsample_padding: 1
      flip_sin_to_cos: true
      freq_shift: 0
      in_channels: 4
      layers_per_block: 2
      mid_block_scale_factor: 1
      norm_eps: 1e-05
      norm_num_groups: 32
      out_channels: 4
      sample_size: 32
      up_block_types: [
        "UpBlock2D",
        "CrossAttnUpBlock2D",
        "CrossAttnUpBlock2D",
        "CrossAttnUpBlock2D"
      ]
  use_ema: True
  mixed_precision: bf16
  gradient_checkpointing: True
  xformers: True
dataset:
  type: WebDataset
  params:
    path: "pipe:aws s3 cp s3://s-datasets/laion5b/laion2B-data/{000000..231349}.tar -"
    batch_size: ${system.batch_size}
    workers: ${system.workers}
    num_examples_to_see: ${experiment.num_examples_to_see}
    resolution: 512
    text_dropout: 0.0
lr_scheduler:
  scheduler: "ConstantWithWarmup"
  params:
    learning_rate: ${optimizer.params.learning_rate}
    warmup_length: 500
```
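Note that `text_dropout: 0.0` in this config means no captions were dropped for these runs; presumably setting it to a nonzero value such as `0.1` (the caption-dropout rate reported for Stable Diffusion's original training) would enable the classifier-free guidance training discussed above.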