This Text-to-Text Translation System is designed to provide accurate and efficient real-time translation of text. By leveraging machine learning, the system processes text input and returns translations in multiple languages, catering to a diverse global audience.
The project includes:
- Preprocessing of text translation data
- Training machine learning models
- Evaluation of model performance
This directory contains the English-to-Spanish dataset in .txt format.
/data/raw/English
: Holds the raw English-to-Spanish text dataset.
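As a rough illustration of reading such a dataset, the sketch below assumes each line of the .txt file holds an English sentence and its Spanish translation separated by a tab. That layout is an assumption, not something this README specifies, and `parse_pairs` and `load_pairs` are hypothetical helper names rather than functions from this project.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def parse_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Split tab-separated lines into (english, spanish) pairs.

    Assumes the layout "english<TAB>spanish"; extra columns are ignored.
    """
    pairs = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank or malformed lines instead of failing on them.
        if len(fields) < 2 or not fields[0].strip() or not fields[1].strip():
            continue
        pairs.append((fields[0].strip(), fields[1].strip()))
    return pairs


def load_pairs(path: str) -> List[Tuple[str, str]]:
    """Read a UTF-8 text file and return its sentence pairs."""
    with Path(path).open(encoding="utf-8") as handle:
        return parse_pairs(handle)


# In-memory sample standing in for the real file.
sample = ["Hello.\tHola.\n", "\n", "Thank you.\tGracias.\textra column\n"]
pairs = parse_pairs(sample)
print(pairs)
```

If the actual file uses a different separator or column order, only the `split("\t")` line and the field indices would need to change.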
This directory contains the translation models, covering GRU, LSTM, and Transformer architectures.
/models/Tet_Text/gru_model.py
: Contains the code for the GRU-based translation model.
/models/Tet_Text/lst_model.py
: Contains the code for the LSTM-based translation model.
/models/Tet_Text/transformer_model.py
: Contains the code for the Transformer-based translation model.
This directory contains the preprocessing code that prepares the dataset for training.
/scripts/preprocessing/data_processing.py
: Contains the functions for processing the raw data, such as tokenization, padding, etc.
/scripts/preprocessing/utils.py
: Contains utility functions used for preprocessing, like vocabulary creation or text cleaning.
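The preprocessing steps named above (text cleaning, tokenization, vocabulary creation, padding) could look roughly like the following. This is a minimal sketch using only the standard library. The function names, the special tokens, and the whitespace-based tokenizer are all assumptions made for illustration and may differ from what `data_processing.py` and `utils.py` actually do.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical special tokens; the project may use different ones.
PAD, SOS, EOS, UNK = "<pad>", "<sos>", "<eos>", "<unk>"


def clean_text(text: str) -> str:
    """Lowercase, put spaces around punctuation, and collapse whitespace."""
    text = text.lower().strip()
    text = re.sub(r"([.!?¿¡,])", r" \1 ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> List[str]:
    """Split cleaned text on whitespace (an empty string gives an empty list)."""
    return clean_text(text).split()


def build_vocab(sentences: Iterable[str], min_freq: int = 1) -> Dict[str, int]:
    """Map each sufficiently frequent token to an integer id."""
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    vocab = {PAD: 0, SOS: 1, EOS: 2, UNK: 3}
    for token in sorted(counts):  # sorted so ids are reproducible
        if counts[token] >= min_freq:
            vocab[token] = len(vocab)
    return vocab


def encode(sentence: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert a sentence to ids with start/end markers, padded to max_len."""
    if max_len < 2:
        raise ValueError("max_len must leave room for <sos> and <eos>")
    tokens = tokenize(sentence)[: max_len - 2]  # truncate but keep <eos>
    ids = [vocab[SOS]] + [vocab.get(t, vocab[UNK]) for t in tokens] + [vocab[EOS]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


vocab = build_vocab(["¿Cómo estás?", "Hola."])
encoded = encode("Hola, amigo.", vocab, max_len=8)
print(encoded)
```

Words missing from the vocabulary (here "," and "amigo") map to the `<unk>` id, and every encoded sentence has the same length, which is what fixed-size batches need.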
This directory holds the scripts for training the model.
/scripts/training/process.py
: Contains the functions for loading the data, preparing the dataset, and handling the training loop.
/scripts/training/train.py
: Main script for training the translation model.
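To make the shape of a training loop concrete, here is a framework-agnostic sketch. The README does not say which deep learning library the project uses, so the model update is represented by a caller-supplied `train_step` function that takes a batch and returns a loss. `iter_batches`, `run_training`, and `dummy_step` are hypothetical names, not the project's actual API.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple

Pair = Tuple[List[int], List[int]]  # (source token ids, target token ids)


def iter_batches(data: Sequence[Pair], batch_size: int,
                 rng: random.Random) -> Iterator[List[Pair]]:
    """Yield shuffled mini-batches; the last batch may be smaller."""
    indices = list(range(len(data)))
    rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]


def run_training(data: Sequence[Pair],
                 train_step: Callable[[List[Pair]], float],
                 epochs: int, batch_size: int, seed: int = 0) -> List[float]:
    """Call train_step on every batch and return the mean loss per epoch."""
    if not data:
        raise ValueError("training data is empty")
    rng = random.Random(seed)
    history = []
    for epoch in range(epochs):
        losses = [train_step(batch) for batch in iter_batches(data, batch_size, rng)]
        history.append(sum(losses) / len(losses))
        print(f"epoch {epoch + 1}: mean loss {history[-1]:.4f}")
    return history


# Stand-in for a real model update: records batch sizes and returns a
# made-up loss that shrinks with every call.
calls: List[int] = []


def dummy_step(batch: List[Pair]) -> float:
    calls.append(len(batch))
    return 1.0 / len(calls)


toy_data = [([1, 4, 2], [1, 5, 2])] * 5
history = run_training(toy_data, dummy_step, epochs=2, batch_size=2)
```

In a real run, `train_step` would perform the forward pass, loss computation, and parameter update for whichever of the GRU, LSTM, or Transformer models is being trained.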
This file lists the Python packages needed to run the project.
/requirements.txt
: A text file listing the Python dependencies required for the project.