Can I use an English dataset for this repo? #7
Comments
No, you just have to prepare the English dataset |
I can see that your dataset is relatively small, so there are only 5 update steps per epoch. Have you tried a longer run to check whether the behavior remains? Also take a look at the vocab.json file to see whether it contains the correct English characters. |
I encountered the same problem even with a larger dataset (91 steps and 20 epochs). |
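For anyone who wants to run the vocab.json check suggested above, here is a minimal sketch. It assumes vocab.json is a flat JSON object mapping each token to an integer id, and that spaces are stored as a `|` word delimiter (the usual Hugging Face Wav2Vec2 CTC tokenizer convention). Neither assumption is confirmed for this repo, and the helper names `load_vocab` and `missing_characters` are made up for illustration.

```python
import json


def load_vocab(path):
    """Load vocab.json (assumed layout: {"token": integer_id, ...})."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def missing_characters(vocab, transcripts, word_delimiter="|"):
    """Return the characters used in the transcripts that have no vocab entry.

    Spaces are mapped to `word_delimiter` first, on the assumption that the
    tokenizer stores the space between words under that symbol.
    """
    seen = set()
    for text in transcripts:
        seen.update(text.replace(" ", word_delimiter))
    return seen - set(vocab)


if __name__ == "__main__":
    # Tiny in-memory example; for a real check, use load_vocab("vocab.json")
    # and the transcripts from your own training file.
    demo_vocab = {"a": 0, "b": 1, "|": 2, "[UNK]": 3, "[PAD]": 4}
    demo_transcripts = ["ab ba", "abc"]
    print(sorted(missing_characters(demo_vocab, demo_transcripts)))
```

If the result is not empty, those characters are likely to be mapped to the unknown token during training. A common cause is a case mismatch, for example uppercase transcripts with a lowercase vocabulary.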
I have not tried other language datasets yet. Can you share more information about your dataset, config, TensorBoard logs, etc.? |
I can see that your model has not converged yet; the train loss is still high. Try increasing the learning rate for faster training. |
Ping me at mail [email protected] for better debugging since I rarely check the GitHub notifications |
Already, thanks |
Is it possible to get an update on this question? What is the minimum dataset size? I want to train the model with a 20-minute dataset. Do you think that is possible?
|
I will take a look at my code, run some experiments on English datasets, and respond to you soon @Shaobo-Z |
In the source code, you used Vietnamese for training and validation. If I want to fine-tune an English model on an English dataset, is there anything that I should change?