Updates
GoingOffRoading committed Jul 21, 2024
1 parent ef3aa91 commit 067d7d7
Showing 4 changed files with 96 additions and 35 deletions.
108 changes: 84 additions & 24 deletions README.md
# Note

This project is being refactored. Stay tuned.


# Boilest

Boilest is my solution to:

- Having video media in lots of different formats, but wanting to consolidate it into one format
- Having video media consume a lot of disk space, and desiring to compress it
- Wanting my video content to consume less space in my NAS
- Wanting to do this work at scale

---
# Why 'Boilest'?

Because I am terrible at naming things, and it was the first agreeable thing to come out of a random name generator.

---
# What about Tdarr, Unmanic, or other existing distributed solutions??

[Tdarr](https://home.tdarr.io/) is a great platform, but it didn't set up or scale as well as I would have liked. I also found it difficult to use: under-documented, closed source, with some design oddities, and with features hidden behind a paywall.
Expand All @@ -23,47 +20,108 @@ As frenzied as Tdarr fans are on Reddit, I just can't commit/subscribe to a serv

[Unmanic](https://github.com/Unmanic/unmanic/tree/master) is magic... I am a big fan, and Unmanic is comfortably the inspiration for this project.

I would be using Unmanic today, instead of writing spaghetti code, but Josh5 had previously [hardcoded the platform on an older version of FFmpeg](https://github.com/Unmanic/unmanic/blob/master/docker/Dockerfile#L82), doesn't currently support AV1, [has some complexities to build the container](https://github.com/Unmanic/unmanic/blob/master/docker/README.md) that make it difficult to code my own support, and doesn't seem to be keeping up on the repo or accepting PRs.

---
# Why not Handbrake?

Handbrake is awesome, but:

- It's not distributed/doesn't scale past the node with the GUI open
- Its 'watch folder' functionality doesn't do any file checking, sorting, or filtering to decide whether it should actually process a file
- It does not have functionality for monitoring an existing media collection

---
# How does Boilest work?

- Boilest kicks off a job that searches directories for video files
- Boilest then checks each individual video file to see if the various codecs match a spec
- If any of the codecs don't match spec, the file is dispatched for encoding
- Once encoding is complete, the results are stored in a DB
- Boilest then checks each individual video file to see whether its codecs match a spec. In this step, Boilest also prioritizes the files with the highest encoding ROI (large media, x264, MP4, etc.) first, so as not to waste time up front on diminishing returns (small, x265, MKV files). If any of the codecs don't match spec, the file is dispatched for encoding.
- If the above step determines that encoding is required, the file undergoes a series of validations. Assuming the file passes those validations, it is encoded. The encoded output file is then validated as well; if it passes, it replaces the original file.
- Once encoding is complete, the results are stored in a DB for stats.
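The ROI-based prioritization described above can be sketched roughly like this. The weight tables and helper name here are illustrative assumptions, not Boilest's actual scoring:

```python
import os

# Hypothetical weights: big files in old codecs and non-MKV containers
# have the most to gain from re-encoding. The real scoring in tasks.py may differ.
CODEC_WEIGHTS = {'h264': 3, 'mpeg4': 3, 'hevc': 1, 'av1': 0}
CONTAINER_WEIGHTS = {'.mp4': 2, '.avi': 2, '.mkv': 1}

def encode_priority(file_path: str, size_bytes: int, video_codec: str) -> int:
    """Score a file's encoding ROI: large, old-codec, non-MKV files score highest."""
    ext = os.path.splitext(file_path)[1].lower()
    size_score = min(size_bytes // (1024 ** 3), 5)  # 1 point per GiB, capped at 5
    codec_score = CODEC_WEIGHTS.get(video_codec, 2)
    container_score = CONTAINER_WEIGHTS.get(ext, 1)
    return size_score + codec_score + container_score

# A large x264 MP4 outranks a small x265 MKV:
print(encode_priority('/tv/show.mp4', 8 * 1024 ** 3, 'h264'))   # high score
print(encode_priority('/tv/show.mkv', 500 * 1024 ** 2, 'hevc')) # low score
```

The score would then be attached to the task so the queue serves the biggest wins first.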

---
# What will Boilest change?

In any given media file:

| Area | Target Change |
|------|---------------|
| Container | Media containers that are not MKV (like MP4) are changed to MKV |
| Video | Video streams that are not AV1 are encoded to AV1 |
| Audio | No changes to audio streams at this time. Audio streams are copied. |
| Subtitles | No changes to subtitle streams at this time. Subtitle streams are copied. |
| Attachments | No changes to attachments at this time. Attachments are copied. |

Once I make some final decisions around what is optimal for TV/device streaming, there will be target changes for audio, subtitles, and attachments as well.
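As a sketch, the table above corresponds to an ffmpeg invocation along these lines. The encoder and flag choices here (e.g. `libsvtav1`) are illustrative assumptions, not necessarily Boilest's exact command string:

```python
def build_ffmpeg_args(input_file: str, output_file: str) -> list:
    """Remux to MKV, encode video to AV1, copy all other stream types."""
    return [
        'ffmpeg', '-i', input_file,
        '-map', '0',             # keep every stream from the input
        '-c:v', 'libsvtav1',     # video -> AV1 (SVT-AV1 encoder assumed)
        '-c:a', 'copy',          # audio untouched
        '-c:s', 'copy',          # subtitles untouched
        '-c:t', 'copy',          # attachments untouched
        output_file,             # an .mkv extension selects the MKV container
    ]

print(' '.join(build_ffmpeg_args('show.mp4', 'show.mkv')))
```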

---
# How to deploy

- Create your deployment (Docker/Kubernetes/etc) with the ghcr.io/goingoffroading/boilest-worker:latest container image.
- Change the container variables to reflect your environment:

| ENV | Default Value |
|---------------------|---------------|
| user | celery |
| password | celery |
| celery_host | 192.168.1.110 |
| celery_port | 31672 |
| celery_vhost | celery |
| rabbitmq_host | 192.168.1.110 |
| rabbitmq_port | 32311 |
| sql_host | 192.168.1.110 |
| sql_port | 32053 |
| sql_database | boilest |
| sql_user | boilest |
| sql_pswd | boilest |
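Inside the worker, these variables would be read with environment lookups that fall back to the defaults above. A sketch (the actual wiring in tasks.py may differ):

```python
import os

def get_config() -> dict:
    """Read container ENV vars, falling back to the documented defaults."""
    return {
        'user': os.environ.get('user', 'celery'),
        'password': os.environ.get('password', 'celery'),
        'celery_host': os.environ.get('celery_host', '192.168.1.110'),
        'celery_port': os.environ.get('celery_port', '31672'),
        'celery_vhost': os.environ.get('celery_vhost', 'celery'),
        'sql_host': os.environ.get('sql_host', '192.168.1.110'),
        'sql_port': os.environ.get('sql_port', '32053'),
        'sql_database': os.environ.get('sql_database', 'boilest'),
    }

print(get_config()['sql_database'])
```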

- Deploy the container.
- SSH into any one of the containers and run `python start.sh`. This will kick off all of the workflows.

Done.

- See 'boilest_kubernetes.yml' for an example of a Kubernetes deployment

---
# Prerequisites

---
## RabbitMQ

The backbone of Boilest is a distributed task Python library called [Celery](https://docs.celeryq.dev/en/stable/getting-started/introduction.html). Celery needs a message transport (a place to store the task queue), and we leverage RabbitMQ for that.
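With the ENV values from the deployment section, the Celery broker URL for RabbitMQ would be assembled along these lines (standard `amqp://` URL format; the exact wiring in tasks.py may differ):

```python
def rabbitmq_broker_url(user: str, password: str, host: str, port: int, vhost: str) -> str:
    """Build the amqp:// transport URL Celery uses to reach RabbitMQ."""
    return f'amqp://{user}:{password}@{host}:{port}/{vhost}'

# e.g. Celery('tasks', broker=rabbitmq_broker_url('celery', 'celery', '192.168.1.110', 31672, 'celery'))
print(rabbitmq_broker_url('celery', 'celery', '192.168.1.110', 31672, 'celery'))
```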

RabbitMQ will need to be deployed with its management plugin.

---
## MariaDB

Technically, the workflow works fine (at this time) without access to MariaDB (MySQL). MariaDB is where the results of the encoding are tracked. If MariaDB is not deployed, the final task will fail, and this will only be noticeable in the logs.

In Maria, create a database called 'boilest'.

In the 'boilest' database, create a table called 'ffmpeghistory' with the following columns:

| Column Name | Type |
|--------------------------|---------------------------------|
| unique_identifier | varchar(100) |
| recorded_date | datetime |
| file_name | varchar(100) |
| file_path | varchar(100) |
| config_name | varchar(100) |
| new_file_size | int(11) |
| new_file_size_difference | int(11) |
| old_file_size | int(11) |
| watch_folder | varchar(100) |
| ffmpeg_encoding_string | varchar(1000) |
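The schema above can be created with DDL along these lines. This is a sketch held as a Python string; actually executing it requires a MySQL/MariaDB client (not shown here):

```python
# DDL matching the column table above. Column names and sizes come from the
# README; constraints/indexes are left out as they are not documented.
FFMPEGHISTORY_DDL = """
CREATE TABLE IF NOT EXISTS ffmpeghistory (
    unique_identifier        VARCHAR(100),
    recorded_date            DATETIME,
    file_name                VARCHAR(100),
    file_path                VARCHAR(100),
    config_name              VARCHAR(100),
    new_file_size            INT(11),
    new_file_size_difference INT(11),
    old_file_size            INT(11),
    watch_folder             VARCHAR(100),
    ffmpeg_encoding_string   VARCHAR(1000)
)
"""

print(FFMPEGHISTORY_DDL.strip().splitlines()[0])
```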

In a future iteration, I'll include a Python script that creates the database and table in MariaDB automatically.

---
# Q&A

- If Celery can use Redis or RabbitMQ for its message transport, can Boilest use Redis?

Not in Boilest's current state, and probably never. Redis doesn't support message prioritization at all (or at least not as well as RabbitMQ does), and Boilest currently uses RabbitMQ's message priorities to encode the video files with the highest encoding-time ROI first.
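The effect of broker-side priorities can be illustrated with a plain max-heap: higher-ROI files dequeue first regardless of arrival order. This is a conceptual sketch, not the Celery/RabbitMQ API:

```python
import heapq

def dequeue_by_priority(tasks):
    """tasks: (priority, file) pairs; yields files highest-priority first."""
    # heapq is a min-heap, so negate priorities to pop the largest first
    heap = [(-priority, file) for priority, file in tasks]
    heapq.heapify(heap)
    while heap:
        _, file = heapq.heappop(heap)
        yield file

queued = [(2, 'small_x265.mkv'), (10, 'large_x264.mp4'), (5, 'medium.avi')]
print(list(dequeue_by_priority(queued)))  # large_x264.mp4 comes out first
```

A plain Redis list, by contrast, is FIFO: the small x265 file would be processed first simply because it was found first.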

- Why does Boilest use RabbitMQ over Redis?


They're worth keeping around for references/discussion on [r/learnpython](https://www.reddit.com/r/learnpython/)



---
# Todo List

- [x] Set up a set_priority function for ffprobe based on container, file size, and video codec (i.e. the things that have the greatest impact on ROI)
- [ ] Research ffprobe flags for HDR content
- [x] Figure out how to pass the watch folder forward for the SQL write
- [x] Figure out how to pass the ffmpeg string forward for the SQL write

- [ ] Stand up repo for management UI
- [ ] Make tweaks to the prioritization scoring
- [ ] Create a 'create database, table' script
- [ ] Having write_results be its own task is stupid. Incorporate it into process_ffmpeg.
- [ ] tasks.py is stupidly big. Break it up into different files for readability/management.
7 changes: 4 additions & 3 deletions boilest kubernetes.yml
---
# Kubernetes deployment example using a Daemonset and labeled nodes.
# Label a node 'boilest':'worker' and Boilest will automatically deploy to it
kind: DaemonSet
apiVersion: apps/v1
metadata:
spec:
spec:
containers:
- name: boilest
image: ghcr.io/goingoffroading/boilest-worker:latest
imagePullPolicy: Always
volumeMounts:
- name: boilestmedia
mountPath: "/boil_watch"
nodeSelector:
boilest: worker
nodeName: node101-desktop
volumes:
- name: boilestmedia
1 change: 0 additions & 1 deletion dockerfile
ENV sql_pswd boilest
USER appuser

# Start supervisord
CMD ["celery", "-A", "tasks", "worker"]
15 changes: 8 additions & 7 deletions tasks.py

# create logger
logger = logging.getLogger('boilest_logs')
logger.setLevel(logging.INFO)

# create console handler and set level to debug
ch = logging.StreamHandler()
def locate_files(arg):
directories = ['/anime', '/tv', '/movies']
extensions = ['.mp4', '.mkv', '.avi']

logger.info(f'Searching directories: {directories}')
logger.info(f'File extensions: {extensions}')

for file_located in find_files(directories, extensions):
logger.debug('File located, sending to ffprobe function')
def process_ffmpeg(file_located_data):
logger.debug(file + ' has passed ffmpeg_postlaunch_checks')
if move_media(file_located_data) == True:
logger.debug(file + ' has passed move_media')
logger.info ('ffmpeg is done')
file_path = file_located_data['file_path']
ffmepg_output_file_name = file_located_data['ffmepg_output_file_name']
file_located_data['new_file_size'] = get_file_size_kb(destination_file_name_function(file_path, ffmepg_output_file_name))
def run_ffmpeg(file_located_data):
ffmpeg_stringffmpeg_command = file_located_data['ffmpeg_command']
ffmpeg_stringffmepg_output_file_name = file_located_data['ffmepg_output_file_name']
output_ffmpeg_command = f"{ffmpeg_string_settings} \"{ffmpeg_stringfile_path}\" {ffmpeg_stringffmpeg_command} \"{ffmpeg_stringffmepg_output_file_name}\""
logger.info ('ffmpeg_command is: ' + output_ffmpeg_command)
logger.info ('running ffmpeg now')
try:
process = subprocess.Popen(output_ffmpeg_command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,universal_newlines=True)
for line in process.stdout:
def write_results(file_located_data):
# the varchar for ffmpeg_encoding_string is 999 characters. This is to keep the db write from failing at 1000 characters
ffmpeg_encoding_string = ffmpeg_encoding_string[:999]

logger.info('Writing results')
insert_record(unique_identifier, file_name, file_path, config_name, new_file_size, new_file_size_difference, old_file_size, watch_folder, ffmpeg_encoding_string)
logger.info('Writing results complete')


def insert_record(unique_identifier, file_name, file_path, config_name, new_file_size, new_file_size_difference, old_file_size, watch_folder, ffmpeg_encoding_string):
