Develop a Telegram Bot that can:
- Save audio messages to a DB by user IDs. Audio should be converted to .wav format with a 16kHz sampling rate. Recording format: uid -> [audio_message_0, audio_message_1, ..., audio_message_N].
- Determines whether there is a face in the photos being sent or not, saves only those where it is.
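A minimal sketch of how the uid -> [audio_message_0, ..., audio_message_N] recording format could be kept in a relational table, assuming SQLite (suggested by the database_prod.db file in the project tree). The table and column names (audio_messages, uid, file_name) and the three helper functions are hypothetical and are not taken from database_handler.py.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    # One row per saved voice message; the autoincrement id keeps arrival order.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS audio_messages (
            id        INTEGER PRIMARY KEY AUTOINCREMENT,
            uid       INTEGER NOT NULL,
            file_name TEXT    NOT NULL
        )
        """
    )


def add_audio(conn: sqlite3.Connection, uid: int) -> str:
    # The next index is the number of messages this user has already stored.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM audio_messages WHERE uid = ?", (uid,)
    ).fetchone()
    file_name = f"audio_message_{count}.wav"
    conn.execute(
        "INSERT INTO audio_messages (uid, file_name) VALUES (?, ?)",
        (uid, file_name),
    )
    conn.commit()
    return file_name


def list_audio(conn: sqlite3.Connection, uid: int) -> list:
    # Rebuild the uid -> [audio_message_0, ...] list in insertion order.
    rows = conn.execute(
        "SELECT file_name FROM audio_messages WHERE uid = ? ORDER BY id", (uid,)
    ).fetchall()
    return [name for (name,) in rows]
```

Calling add_audio twice for the same uid and then list_audio for that uid gives back the two file names in the order they were saved; a uid with no messages gives an empty list.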
Open your terminal, go to the folder where you want to save this bot, and type:
git clone https://github.com/SrVladyslav/TelegramBotPOC.git
cd TelegramBotPOC
python -m venv .venv
source .venv/Scripts/activate  # on Linux/macOS: source .venv/bin/activate
pip install -r requirements.txt
NOTE: It might take some time, be patient 😅.
To use the bot live, you need a Telegram token, which is used to create your own bot in Telegram.
Follow the steps to obtain yours from scratch:
- Search for BotFather in the Telegram search section.
- Click on the Start button.
- Type /newbot and press enter to create your own bot. You will then be asked to provide the bot name and username. The username must end with the word bot or _bot.
NOTE: It is not possible to change the bot username later!
- If BotFather approves your username, you will receive a message that contains YOUR_TELEGRAM_BOT_TOKEN. Copy this token.
echo "TELEGRAM_BOT_TOKEN='<YOUR_TELEGRAM_BOT_TOKEN here>'" > .env
python main.py
Press Ctrl-C to stop.
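The echo command above stores the token in .env wrapped in single quotes. A minimal sketch of reading it back using only the standard library follows; the load_env and get_token helpers are hypothetical, and the project itself may rely on a package such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines, stripping optional surrounding quotes."""
    env_file = Path(path)
    if not env_file.exists():
        return {}
    values = {}
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_token(path: str = ".env") -> str:
    # Prefer a real environment variable, then fall back to the .env file.
    token = os.environ.get("TELEGRAM_BOT_TOKEN") or load_env(path).get(
        "TELEGRAM_BOT_TOKEN"
    )
    if not token:
        raise RuntimeError("TELEGRAM_BOT_TOKEN is not set")
    return token
```

Failing early with a clear error when the token is missing is usually easier to debug than an authentication failure raised later by the bot library.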
- Implement a connection to a remote database (e.g. RDS + S3).
- Implement a logger to keep track of everything that happens in the program.
- Here is a notebook with readings from the DB.
- Database handler code here.
- Audio processing main code.
- Image processing main code.
NOTE: According to the statement: "Determines whether there is a face in the photos being sent or not, saves only those where it is", it is not clear whether we need to record the relation between images and users in a DB, whether the images should carry a user_id, or whether we need to rescale the images.
Here I'm assuming that we are building a face dataset, so all images are named [image_0.jpg, ..., image_N.jpg] and kept at their original size. Since Telegram rescales large images to a maximum of 1280px, we do not need to worry about receiving very large images.
Also, no information about the user ID is stored with the images (if it were needed, the code would be the same as in the audio part).
NOTE: I'm using the Haar Cascade algorithm included in cv2, since it is good enough for the given task; it trades precision for speed. If we have a good server, we can use an ML model here instead, for example, see: insightface.ai.
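A minimal sketch of the face check and the image_N.jpg naming described in the notes above, assuming the opencv-python package and its bundled haarcascade_frontalface_default.xml file. The function names, the detectMultiScale parameter values, and the choice to pass the detector in as a callable are illustrative assumptions, not the code in image_utils.py.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def haar_face_detector() -> Callable[[str], bool]:
    """Build a detector backed by OpenCV's bundled frontal-face Haar cascade."""
    import cv2  # imported here so the storage logic below works without OpenCV

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )

    def has_face(image_path: str) -> bool:
        image = cv2.imread(image_path)
        if image is None:  # unreadable or non-image file
            return False
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        return len(faces) > 0

    return has_face


def save_if_face(
    src: str, out_dir: str, has_face: Callable[[str], bool]
) -> Optional[str]:
    """Copy src to out_dir/image_<N>.jpg if a face is found; return the new path."""
    if not has_face(src):
        return None
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    index = len(list(out.glob("image_*.jpg")))  # next index, assuming no gaps
    dst = out / f"image_{index}.jpg"
    shutil.copyfile(src, dst)  # keep the original bytes and image size
    return str(dst)
```

A call such as save_if_face(downloaded_path, "data/image_data", haar_face_detector()) would keep the photo only when the cascade reports at least one face. Passing the detector as an argument also makes it straightforward to swap in a heavier ML model later.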
📦TelegramBotPOC
┣ 📂.venv
┣ 📂data
┃ ┣ 📂audio_data # All the preprocessed audio records will be stored here
┃ ┃ ┣ 📂<USER_TELEGRAM_UID>
┃ ┃ ┃ ┣ 📜audio_message_0.wav
┃ ┃ ┃ ┣ ...
┃ ┃ ┃ ┗ 📜audio_message_N.wav
┃ ┃ ┗ 📜.gitkeep
┃ ┣ 📂db
┃ ┃ ┗ 📜database_prod.db # DB Tables will be created after running python main.py
┃ ┣ 📂image_data # The images that contain a face will be stored here
┃ ┃ ┣ 📜.gitkeep
┃ ┃ ┣ 📜image_0.jpg
┃ ┃ ┣ ...
┃ ┃ ┗ 📜image_N.jpg
┃ ┣ 📜database_handler.py # All the database SQL functions are here
┃ ┗ 📜__init__.py
┣ 📂docs
┃ ┗ 📜opencv24.pdf
┣ 📂utils
┃ ┣ 📜audio_utils.py # All the main functions related to the audio processing are here
┃ ┣ 📜image_utils.py # All the main functions related to the image processing are here
┃ ┗ 📜__init__.py
┣ 📜.env
┣ 📜.gitignore
┣ 📜audioVisualizer.ipynb # Contains some plots to check visually that some changes on the audio were made
┣ 📜dbVisualizer.ipynb # Contains GET SQLs with Pandas representation of the DB Tables
┣ 📜main.py
┣ 📜README.md
┣ 📜requirements.txt
┗ 📜__init__.py
- python-telegram-bot documentation page.
- librosa.load was used to read the audio data and automatically resample to 16kHz.
- Haar Cascade face detection explanation.
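The librosa.load note above, combined with the data/audio_data/<USER_TELEGRAM_UID> layout from the project tree, can be sketched as follows. Only the use of librosa.load for 16kHz resampling comes from this README; writing the result with the soundfile package, the 16-bit PCM subtype, and both function names are assumptions. Whether a given Telegram voice file can be decoded depends on the audio backends installed alongside librosa.

```python
from pathlib import Path

TARGET_SR = 16_000  # sampling rate required by the task


def next_audio_path(base_dir: str, uid: int) -> Path:
    """Return base_dir/<uid>/audio_message_<N>.wav for the next free N."""
    user_dir = Path(base_dir) / str(uid)
    user_dir.mkdir(parents=True, exist_ok=True)
    index = len(list(user_dir.glob("audio_message_*.wav")))  # assumes no gaps
    return user_dir / f"audio_message_{index}.wav"


def convert_to_wav_16k(src: str, dst: Path) -> None:
    """Decode src, resample it to 16 kHz mono, and write a 16-bit PCM .wav file."""
    # Third-party packages, imported lazily so the path helper has no dependencies.
    import librosa
    import soundfile as sf

    # librosa.load resamples to the requested rate while decoding.
    samples, sr = librosa.load(src, sr=TARGET_SR, mono=True)
    sf.write(str(dst), samples, sr, subtype="PCM_16")
```

A handler could then call convert_to_wav_16k(downloaded_file, next_audio_path("data/audio_data", uid)) for each incoming voice message.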