Abstraction

Boilest is my solution to:

  • Having video media in lots of different formats, but wanting to consolidate it into one format
  • Wanting my video content to consume less space in my NAS
  • Wanting to do this work at scale

Why 'Boilest'?

Because I am terrible at naming things, and it was the first agreeable thing to come out of a random name generator.


What about Tdarr, Unmanic, or other existing distributed solutions??

Tdarr is a great platform, but it didn't set up or scale as well as I would have liked. I also found it under-documented and closed source, with some design oddities and features hidden behind a paywall.

As frenzied as Tdarr fans are on Reddit, I just can't commit/subscribe to a service like that.

Unmanic is magic... I am a big fan, and Unmanic is easily the inspiration for this project.

I would be using Unmanic today, instead of writing spaghetti code, but Josh5 has hardcoded the platform to an older version of FFmpeg, doesn't currently support AV1, has a container build complex enough that coding my own support would be difficult, and doesn't seem to be keeping up with the repo or accepting PRs.


Why not Handbrake?

Handbrake is awesome, but:

  • It's not distributed and doesn't scale past the node with the GUI open
  • Its 'watch folder' functionality doesn't do any file checking, sorting, or filtering to decide whether it should actually process a file
  • It doesn't have functionality for monitoring an existing media collection

How does Boilest work?

  • Boilest kicks off a job that searches directories for video files
  • Boilest then checks each individual video file to see whether its codecs match a spec. If any codec doesn't match the spec, the file is dispatched for encoding. In this step, Boilest also prioritizes the files with the highest ROI for encoding (large media, x264, MP4, etc.) so it doesn't waste time up front on diminishing returns (small, x265, MKV files); see the sketch after this list.
  • If the above step determines that encoding is required, the file undergoes a series of validations. Assuming the file passes those validations, it is encoded, and the encoded output file is then validated as well. If the output file passes validation, it replaces the original file.
  • Once encoding is complete, the results are stored in a DB for stats.
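
The prioritization logic isn't finalized (see the set_priority item in the todo list), but a minimal sketch of the idea might look like the following. The function name, weights, and thresholds here are hypothetical, not the project's actual code:

def set_priority(container: str, video_codec: str, file_size_bytes: int) -> int:
    """Hypothetical ROI score, 0 (lowest) to 9 (highest RabbitMQ priority).

    Large, x264, MP4 files promise the biggest savings, so they score high;
    small, already-efficient MKV files score low.
    """
    score = 0
    if container.lower() in ('mp4', 'avi', 'mov'):            # not yet MKV
        score += 3
    if video_codec.lower() in ('h264', 'mpeg2video', 'vc1'):  # older codecs
        score += 3
    if file_size_bytes > 5 * 1024**3:                          # > 5 GiB
        score += 3
    elif file_size_bytes > 1 * 1024**3:                        # > 1 GiB
        score += 1
    return min(score, 9)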

What will Boilest change?

In any given media file:

Area        | Target Change
Container   | Media containers that are not MKV (like MP4) are changed to MKV
Video       | Video streams that are not AV1 are encoded to AV1
Audio       | No changes to audio streams at this time; audio streams are copied
Subtitles   | No changes to subtitle streams at this time; subtitle streams are copied
Attachments | No changes to attachments at this time; attachments are copied

Once I make some final decisions around what is optimal for TV/device streaming, there will be targets for audio, subtitles, and attachments as well.
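
As a rough illustration (not the actual encoding string built in tasks.py), the kind of FFmpeg call this table implies could be assembled like this, assuming the container's FFmpeg build includes the SVT-AV1 encoder (libsvtav1):

import subprocess

def encode_to_av1(input_path: str, output_path: str) -> None:
    # Keep every stream, copy them as-is, then override only the video codec;
    # the .mkv output extension selects the Matroska container.
    cmd = [
        'ffmpeg', '-i', input_path,
        '-map', '0',          # keep all streams (audio, subtitles, attachments)
        '-c', 'copy',         # copy everything by default
        '-c:v', 'libsvtav1',  # ...except video, which is re-encoded to AV1
        output_path,          # e.g. '/tmp/output.mkv'
    ]
    subprocess.run(cmd, check=True)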


Prerequisites


RabbitMQ

The backbone of Boilest is a distributed task queue library for Python called Celery. Celery needs a message transport (a place to store the task queue), and we leverage RabbitMQ for that.
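
For reference, Celery's AMQP broker URL for a setup like this follows the usual amqp://user:password@host:port/vhost pattern. A minimal sketch, assuming the worker reads the container variables documented under 'How to deploy' (the real code's variable handling may differ):

import os
from celery import Celery

broker_url = 'amqp://{user}:{pwd}@{host}:{port}/{vhost}'.format(
    user=os.environ.get('celery_user', 'celery'),
    pwd=os.environ.get('celery_password', 'celery'),
    host=os.environ.get('celery_host', '192.168.1.110'),
    port=os.environ.get('celery_port', '31672'),
    vhost=os.environ.get('celery_vhost', 'celery'),
)

app = Celery('boilest', broker=broker_url)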

RabbitMQ will need to be deployed with its management plugin.

From the management plugin:

  • Create a 'celery' vhost
  • Create a user with the username/password celery/celery
  • Give the celery user '.*' configure, write, and read permissions on the celery vhost
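
If you'd rather script this than click through the UI, the same three steps can be done against the management plugin's HTTP API. A sketch only, assuming the management UI is reachable at the rabbitmq_host/rabbitmq_port values from the deployment table and that you have an existing admin login (replace 'guest'/'guest' as needed):

import requests

base = 'http://192.168.1.110:32311/api'  # management UI host/port
auth = ('guest', 'guest')                # an existing admin user

# 1. Create the 'celery' vhost
requests.put(f'{base}/vhosts/celery', auth=auth)

# 2. Create the 'celery' user with password 'celery'
requests.put(f'{base}/users/celery', auth=auth,
             json={'password': 'celery', 'tags': ''})

# 3. Grant configure/write/read on the 'celery' vhost
requests.put(f'{base}/permissions/celery/celery', auth=auth,
             json={'configure': '.*', 'write': '.*', 'read': '.*'})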

MariaDB

Technically, the workflow works fine (at this time) without access to MariaDB (MySQL). MariaDB is where the results of the encoding are tracked. If MariaDB is not deployed, the final task will fail, and this will only be noticeable in the logs.

In Maria, create a database called 'boilest'.

In the 'boilest' database, create a table called 'ffmpeghistory' with the following columns:

Column Name              | Type
unique_identifier        | varchar(100)
recorded_date            | datetime
file_name                | varchar(100)
file_path                | varchar(100)
config_name              | varchar(100)
new_file_size            | int(11)
new_file_size_difference | int(11)
old_file_size            | int(11)
watch_folder             | varchar(100)
ffmpeg_encoding_string   | varchar(1000)

In a future iteration, I'll include a Python script that creates the database and table in MariaDB automatically.
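
A rough sketch of what that script could look like, assuming the mysql-connector-python package, the sql_* container variables documented under 'How to deploy', and a user with rights to create the database (this is not part of the repo yet):

import os
import mysql.connector

conn = mysql.connector.connect(
    host=os.environ.get('sql_host', '192.168.1.110'),
    port=int(os.environ.get('sql_port', '32053')),
    user=os.environ.get('sql_user', 'boilest'),
    password=os.environ.get('sql_pswd', 'boilest'),
)
cursor = conn.cursor()

# Create the database and the history table from the schema above
cursor.execute("CREATE DATABASE IF NOT EXISTS boilest")
cursor.execute("""
    CREATE TABLE IF NOT EXISTS boilest.ffmpeghistory (
        unique_identifier        VARCHAR(100),
        recorded_date            DATETIME,
        file_name                VARCHAR(100),
        file_path                VARCHAR(100),
        config_name              VARCHAR(100),
        new_file_size            INT(11),
        new_file_size_difference INT(11),
        old_file_size            INT(11),
        watch_folder             VARCHAR(100),
        ffmpeg_encoding_string   VARCHAR(1000)
    )
""")
conn.commit()
cursor.close()
conn.close()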


How to deploy

  • Create your deployment (Docker/Kubernetes/etc) with the ghcr.io/goingoffroading/boilest-worker:latest container image.
  • Change the container variables to reflect your environment:
ENV             | Default Value | Notes
celery_user     | celery        | The user set up for Celery in your RabbitMQ
celery_password | celery        | The password set up for Celery in your RabbitMQ
celery_host     | 192.168.1.110 | The IP address of RabbitMQ
celery_port     | 31672         | The port that RabbitMQ's port 5672 or 5673 is mapped to
celery_vhost    | celery        | The RabbitMQ vhost set up for Boilest
rabbitmq_host   | 192.168.1.110 | The IP address of the RabbitMQ management UI
rabbitmq_port   | 32311         | The port of the RabbitMQ management UI
sql_host        | 192.168.1.110 | The IP address of MariaDB
sql_port        | 32053         | The port mapped to MariaDB's port 3306
sql_database    | boilest       | The database name set up for Boilest
sql_user        | boilest       | The username set up for Boilest
sql_pswd        | boilest       | The password set up for Boilest
  • Deploy the container.
  • SSH into any one of the containers and run 'python start.py'. This will kick off all of the workflows.

Done.

  • See 'boilest_kubernetes.yml' for an example of a Kubernetes deployment

How to start the Boilest/video encoding workflow

Either:

  • Deploy the Boilest Management GUI container and either wait for the cron job, or SSH into the container and run start.py
  • SSH into one of the Boilest-Worker containers and run start.py:

In both SSH cases, literally run

python start.py  

SSH in Kubernetes is:

kubectl exec -it (your pod name) -- /bin/sh

SSH in Docker is:

docker exec -it (container ID) /bin/sh

Starting the workflow only needs to be done once from any one of the relevant containers. That start will trickle into the other containers via the RabbitMQ broker.


Q&A

  • If Celery can use Redis or RabbitMQ for its message transport, can Boilest use Redis?

    Not in Boilest's current state, and probably never. Redis doesn't support prioritization of messages, or at least not as well as RabbitMQ does. Boilest currently relies on RabbitMQ's message priorities to encode the video files with the highest ROI for encoding time first.
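
    For context, this is the mechanism being relied on. A minimal sketch (not Boilest's actual task code) of how Celery exposes RabbitMQ priority queues:

    from celery import Celery
    from kombu import Queue

    app = Celery('boilest', broker='amqp://celery:celery@192.168.1.110:31672/celery')

    # A queue declared with x-max-priority lets RabbitMQ reorder messages by priority
    app.conf.task_queues = [Queue('encode', queue_arguments={'x-max-priority': 10})]
    app.conf.task_default_queue = 'encode'

    @app.task
    def process_ffmpeg(file_path):
        ...  # probe, validate, encode

    # Higher numbers are consumed first, so high-ROI files jump the queue
    process_ffmpeg.apply_async(args=['/media/big_x264_movie.mp4'], priority=9)
    process_ffmpeg.apply_async(args=['/media/small_x265_show.mkv'], priority=1)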


Todo List

  • Set up a set_priority function for ffprobe based on container, file size, and video codec (i.e. the things that have the greatest impact on ROI)
  • Set up the function to write the results to the DB
  • Replace the prints with logging
  • Make decisions on audio codec
  • Make decisions on subtitle codec
  • Research ffprobe flags for HDR content
  • Figure out how to pass the watch folder forward for the SQL write
  • Figure out how to pass the ffmpeg string forward for the SQL write
  • Stand up repo for management UI
  • Make tweaks to the prioritization scoring
  • Create a 'create database, table' script
  • Having write_results be its own task is stupid. Incorporate it into process_ffmpeg.
  • Tasks.py is stupidly big. Break it up into different files for readability/management.
  • Revisit string formatting i.e. f"Name: {name}, Age: {age}" instead of name + ", Age:" + str(age)
  • Explore using the Pydantic Model
  • Remove hard-coding related
  • Move UniqueID in the SQL to a GUID
  • Explore using pathlib instead of os
  • Remove the archive
  • Some day... Remove the celery task function for write_results
  • Consider moving queue_workers_if_queue_empty to the manager container