does finetuning the DVAE component make any difference? #9

Closed
thivux opened this issue Sep 23, 2024 · 4 comments

Comments

thivux commented Sep 23, 2024

Hi, I am fine-tuning this model on a Vietnamese dataset and have a problem with generating short sentences: parts of the audio are unintelligible. After doing some research, I found a comment saying that fine-tuning the DVAE on my dataset will get rid of this problem. I am not sure this will work, as the DVAE is already trained on a lot of data and I would expect it to generalize well to new data. What is your experience with XTTS, with and without fine-tuning the DVAE component? Does fine-tuning the DVAE help with short sentences?

thivux changed the title from "does finetuning DVAE component make any difference?" to "does finetuning the DVAE component make any difference?" on Sep 23, 2024

thivux (Author) commented Sep 23, 2024

If fine-tuning the DVAE does make a difference, how many epochs did you fine-tune it for?

anhnh2002 (Owner) commented Sep 23, 2024

> If fine-tuning the DVAE does make a difference, how many epochs did you fine-tune it for?

Here are the hyperparameters that gave the best performance in my experience.

CUDA_VISIBLE_DEVICES=0 python train_dvae_xtts.py \
--output_path=checkpoints/ \
--train_csv_path=datasets/metadata_train.csv \
--eval_csv_path=datasets/metadata_eval.csv \
--language="vi" \
--num_epochs=5 \
--batch_size=512 \
--lr=5e-6
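
To put these settings in perspective, a batch size of 512 over 5 epochs can amount to very few gradient updates on a small dataset. The sketch below estimates that count. It is not part of `train_dvae_xtts.py`; the function name `total_updates` and the dataset size are hypothetical, and it assumes one optimizer step per batch with the last partial batch of each epoch kept.

```python
import math


def total_updates(num_samples: int, batch_size: int = 512, num_epochs: int = 5) -> int:
    """Estimate the number of optimizer steps for a training run.

    Assumes one step per batch and that the final, partial batch of each
    epoch is kept rather than dropped. Defaults mirror the command above.
    """
    if num_samples <= 0 or batch_size <= 0 or num_epochs <= 0:
        raise ValueError("num_samples, batch_size and num_epochs must be positive")
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * num_epochs


# Hypothetical dataset of 10,000 clips, for illustration only.
print(total_updates(10_000))  # 20 steps per epoch * 5 epochs = 100
```

If the real training script drops the last partial batch or accumulates gradients, the actual count will differ from this estimate.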

vcstack commented Sep 27, 2024

@thivux Hello,
I also want to train this model for Vietnamese speech. I followed the instructions and am currently running into an error.
If you have trained this model successfully, could you make it public so I can use it as a reference?

ukemamaster commented

> > If fine-tuning the DVAE does make a difference, how many epochs did you fine-tune it for?
>
> Here are the hyperparameters that gave the best performance in my experience.
>
> CUDA_VISIBLE_DEVICES=0 python train_dvae_xtts.py \
> --output_path=checkpoints/ \
> --train_csv_path=datasets/metadata_train.csv \
> --eval_csv_path=datasets/metadata_eval.csv \
> --language="vi" \
> --num_epochs=5 \
> --batch_size=512 \
> --lr=5e-6

If I continue for more epochs, will the model overfit?
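
The thread does not answer this question. One generic way to check for overfitting is to track the loss on the eval set after each epoch and stop once it stops improving. The following is a minimal sketch of that patience-based early-stopping rule, assuming per-epoch eval losses are available; the function `early_stop_epoch` and the loss values are hypothetical and not taken from `train_dvae_xtts.py`.

```python
def early_stop_epoch(eval_losses, patience=2, min_delta=0.0):
    """Return (best_epoch, stop_epoch) as 0-based indices into eval_losses.

    Training is considered finished once `patience` epochs have passed
    without the eval loss improving on the best value by more than
    `min_delta`. If that never happens, stop_epoch is the last epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(eval_losses):
        if loss < best_loss - min_delta:
            # New best eval loss: remember it and keep training.
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: likely overfitting.
            return best_epoch, epoch
    return best_epoch, len(eval_losses) - 1


# Made-up eval losses, for illustration only.
losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]
print(early_stop_epoch(losses))  # (2, 4): best at epoch 2, stop at epoch 4
```

A rising eval loss while the training loss keeps falling is the usual sign that further epochs are no longer helping.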
