TTS-EgyptianArabic-Tacotron2

Tacotron2 TTS models trained on the EGYARA dataset from the MASRY TTS paper, bundled with the HiFi-GAN vocoder for direct text-to-speech inference.

Papers:

Tacotron2 | Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions (arXiv)

MASRY TTS | Masry: A Text-to-Speech System for the Egyptian Arabic (SCITEPRESS)

HiFi-GAN | HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis (arXiv)

Quick Setup

Required packages: torch torchaudio pyyaml

Additional packages for training: librosa matplotlib tensorboard

Download the pretrained weights for the Egyptian Arabic Tacotron2 model (https://drive.google.com/file/d/1etruUB2hNsYfvn5_zsDrQM6uVJW62u8u/view?usp=drive_link), then put them in the pretrained folder.

We used a diacritization model from Camel Tools (https://github.com/CAMeL-Lab/camel_tools) to diacritize Egyptian Arabic.

Download the HiFi-GAN vocoder weights (link). Either put them into pretrained/hifigan-asc-v1 or edit the following lines in configs/basic.yaml.

# vocoder
vocoder_state_path: pretrained/hifigan-asc-v1/hifigan-asc.pth
vocoder_config_path: pretrained/hifigan-asc-v1/config.json
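
Before running inference, it can help to confirm that both vocoder paths actually point at files. The following is a minimal sketch, not part of the repository: `load_config` and `missing_vocoder_files` are hypothetical helper names, and the sketch assumes the two keys sit at the top level of configs/basic.yaml, as the excerpt above suggests.

```python
from pathlib import Path

# Keys taken from the config excerpt above.
VOCODER_KEYS = ("vocoder_state_path", "vocoder_config_path")


def load_config(path="configs/basic.yaml"):
    """Read the YAML config into a dict (needs PyYAML, a required package)."""
    import yaml  # imported lazily so the path check below works without it

    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f)


def missing_vocoder_files(config, root="."):
    """Return the vocoder keys whose value is absent or is not an existing file."""
    missing = []
    for key in VOCODER_KEYS:
        value = config.get(key)
        if value is None or not (Path(root) / value).is_file():
            missing.append(key)
    return missing
```

Calling `missing_vocoder_files(load_config())` from the repository root should return an empty list once the weights are in place.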

Using the models

The Tacotron2 class from models.tacotron2 is a wrapper that simplifies text-to-mel inference. The Tacotron2Wave class additionally includes the HiFi-GAN vocoder for direct text-to-speech inference.

Inferring the Mel spectrogram

from models.tacotron2 import Tacotron2
# load the pretrained checkpoint, then move the model to the GPU (requires CUDA)
model = Tacotron2('pretrained/tacotron2_ar_adv.pth')
model = model.cuda()
mel_spec = model.ttmel("ازيك عامل ايه")

End-to-end Text-to-Speech

from models.tacotron2 import Tacotron2Wave
# load the pretrained checkpoint, then move the model to the GPU (requires CUDA)
model = Tacotron2Wave('pretrained/tacotron2_ar_adv.pth')
model = model.cuda()
wave = model.tts("اَزيك عامل ايه")
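
To listen to the result, the waveform has to be written to an audio file. The sketch below uses only the Python standard library and makes several assumptions that this README does not state: that the output is a one-dimensional sequence of floats in [-1, 1], and that the sample rate is 22050 Hz (check the vocoder's config.json for the actual value). `save_wav` is a hypothetical helper, not part of the repository.

```python
import struct
import wave as wave_io  # aliased so it cannot clash with a variable named `wave`


def save_wav(samples, path, sample_rate=22050):
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file.

    sample_rate=22050 is an assumption; use the rate from the vocoder config.
    """
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip to the valid range
        pcm += struct.pack("<h", int(round(s * 32767)))
    with wave_io.open(str(path), "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 2 bytes = 16 bit
        f.setframerate(sample_rate)
        f.writeframes(bytes(pcm))
```

If the model returns a torch tensor, something like `save_wav(wave.cpu().tolist(), "sample.wav")` should work under the assumptions above.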

By default, Arabic script in the input is converted to Buckwalter transliteration. Text that is already in Buckwalter transliteration can also be passed directly. If the input is never expected to contain Arabic script, you can set arabic_in=False.
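
To illustrate what the Buckwalter conversion does, here is a minimal sketch of the standard Buckwalter table as a character-for-character mapping. It is not the repository's converter, which may treat punctuation, digits, or Egyptian-specific characters differently; `to_buckwalter` is a hypothetical name.

```python
def _span(first_codepoint, symbols):
    """Map consecutive Unicode code points to the given Buckwalter symbols."""
    return {first_codepoint + i: s for i, s in enumerate(symbols)}


# Standard Buckwalter table, keyed by Unicode code point.
BUCKWALTER = {
    **_span(0x0621, "'|>&<}AbptvjHxd*rzs$SDTZEg"),  # hamza .. ghain
    **_span(0x0640, "_fqklmnhwYyFNKaui~o"),         # tatweel .. sukun
    0x0670: "`",  # dagger alif
    0x0671: "{",  # alif wasla
}


def to_buckwalter(text):
    """Replace Arabic letters and diacritics; leave all other characters as-is."""
    return text.translate(BUCKWALTER)


# The undiacritized sentence from the mel spectrogram example, written with escapes:
print(to_buckwalter("\u0627\u0632\u064a\u0643 \u0639\u0627\u0645\u0644 \u0627\u064a\u0647"))
# prints: Azyk EAml Ayh
```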

Inference from text file

python inference.py
# default parameters:
python inference.py --list data/infer_text.txt --out_dir samples/results --model tacotron2 --checkpoint pretrained/tacotron2_ar_adv.pth --batch_size 2 --denoise 0
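
The file passed via --list can be prepared from Python. The sketch below assumes the file holds one utterance per line in UTF-8, which this README does not state explicitly; compare with data/infer_text.txt before relying on it. `write_text_list` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def write_text_list(sentences, path):
    """Write non-empty sentences to `path`, one per line, encoded as UTF-8.

    Returns the number of lines written.
    """
    lines = [s.strip() for s in sentences if s.strip()]
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

The resulting file can then be passed to the script, for example `python inference.py --list data/my_text.txt`, where data/my_text.txt is a placeholder name.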

Testing the model

To test the model, run:

python test.py
# default parameters:
python test.py --model tacotron2 --checkpoint pretrained/tacotron2_ar_adv.pth --out_dir samples/test

Training the model

Before training, the audio files must be resampled. The released model was trained on files preprocessed with scripts/preprocess_audio.py.

To train the model with the options specified in the config file, run:

python train.py
# default parameters:
python train.py --config configs/EGYARA.yaml