ML Pipeline v2 (#684)
* Create backend model

* Create backend status endpoint

* Return server status and available pipelines

* Use pipeline slug

* Fix .gitignore

* Update backend status endpoint, test pipeline process images

* fix: missing import in ml models

* Add Backend to admin, update pipeline/backend model, register_pipelines action

* Fix type checking

* Add backend id to test pipeline processing

* Constant and Random pipeline processing

* Add test fixture

* Don't use same project id for all tests

* Added Backend created_at and updated_at serializer fields

* Update models and display backends last checked

* Resolve merge conflicts

* Remove unused variables

* Remove unused file

* Register pipelines via frontend

* Add missing fields to backend, fix migration error after merging with main

* Add backend details dialog

* Display backend details

* Fix backend details displayed values

* Select first backend associated with pipeline

* Fix linting errors

* Remove backend_id

* Remove version/version name, fix adding project, make endpoint required

* Use ErrorState component

* Add serializer details

* API test to check that pipelines are created

* Add edit backend default values

* Process images using backend with lowest latency

* Remove projects from ML schemas

* Resolve todos

* Raise exception if no backends online

* Fail the job if no backends are online

* feat: begin storing category maps with algorithms in the DB

* feat: begin saving all logits and scores from all predictions

* fix: complete rename of softmax_scores field (now any calibrated score)

* feat: admin sections for Classifications and Category Maps

* fix: save simple labels along with category map data

* feat: use function to generate fake classifications

* feat: update schema in example ML backend. comments

* fix: formatting

* fix: remove bad admin filter

* fix: update formatting (line-lengths)

* fix: reset line-length to existing project value

* feat: API views for algorithm category maps

* feat: define schema for Algorithm & AlgorithmCategoryMap in the ML backend API

* feat: continue writing schema for algorithms

* feat: update image fetching utils based on latest AMI data companion

* feat: bring schemas related to the ML backend responses in sync

* fix: fix import

* feat: update schema for algorithms

* feat: update tests for processing pipeline responses

* chore: update formatting (line-lengths)

* fix: use only the key field for keeping algorithms unique

* feat: update logging when saving pipeline results

* feat: features from live ml backend schema

* feat: improve matching existing algorithms & labels to incoming data

* chore: reduce number of test images

* fix: attempt to fix reprocessing when there is a new algorithm

* chore: refactor pipeline results into multiple functions

* feat: support to add logits & scores to existing classifications

* fix: logging and comments

* fix: improve job status updates and failure handling

* fix: allow specifying job type on create, update tests

* feat: move job logs to their own field to not mess with progress updates

* feat: basic tests for category maps

* feat: improve bulk saving & logging for pipeline results

* feat: update occurrence determinations when saving results

* feat: ensure determination is not considering intermediate classifications

* feat: allow agreeing with any prediction/identification

* fix: updates to detection occurrences

* fix: update detections with occurrences

* fix: fallback to non-terminal classifications if need be (non-moth)

* fix: creating occurrences in bulk

* feat: kill long queries during development

* feat: add test for mapping taxa from category maps

* feat: add more classification and detection details to API responses

* feat: optionally return pipeline results

* fix: associate category map with each classification in addition to algo

* chore: more logging when saving results

* feat: allow filtering captures by project

* fix: update formatting

* feat: make the classification list view more lightweight

* feat: use redis for primary cache locally

* feat: support for retrying failed requests

* Change MLBackend to ProcessingService

* Change all instances of backend to processing service

* Fix ui formatting, fix tests, and add migrations

* Update comment to processing service

* Update process_images error handling

* Fix last_checked_live and processing services online

* Change Sync to Add Pipelines

* Remove updated at column for processing services

* Display column of num pipelines added

* Change status label of pipelines online to pipelines available

* Use slugify to add processing service

* fix: clean up some logging, type warnings and extra code

* feat: remove slug field, update naming

* fix: update phrasing

* Remove print statements

* Fix log formatting

* Squash migrations

* task: rename directory and container name to avoid upcoming conflicts

* fix: missing check for no-project

* fix: field name

* Filter processing services by project ID

* Button indicates pipeline registration error

* fix: cleanup naming of Random vs Dummy pipelines

* feat: new schema for service info response

* feat: new schema for the processing service info response

* feat: change the main "process" endpoint path

* fix: fixes to pipeline status / get_info response

* fix: job should fail if sub-tasks fail (saving results)

* fix: ensure pipeline slugs are unique

* feat: improve error handling for results saved in subtasks

* feat: improve usage of pipeline configs and schema names

* fix: show blank date instead of undefined in the UI

* feat: slightly improve robustness of status check responses

* fix: slightly improve handling of unique pipeline identifiers

* feat: save category maps for algorithms when registering pipelines

* fix: skip slow counting in occurrences_per_day chart

* chore: shorten column name

* fix: selection of existing pipelines & algorithms

* fix: undefined variable

* feat: increase batch size per request

* chore: use old name for processing_service for now

* fix: update tests

* fix: use old endpoint name for now

* fix: conflicting migrations

* Fix processing service error handling

* feat: show algorithm key/slug in UI

* fix: don't rename existing algorithms if not necessary

* feat: update existing matching algorithms

* fix: ensure scores and labels align

* fix: missing env var during build time

---------
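
Several bullets above describe routing work to the processing service with the lowest latency and failing the job when no services are online. A minimal sketch of that selection logic, with all names hypothetical (none are taken from the diff below), might look like:

```python
# Hypothetical sketch of lowest-latency service selection, as described in
# the commit log. Services with no measured latency are treated as offline.
def choose_processing_service(services: list[dict]) -> dict:
    online = [s for s in services if s.get("latency_ms") is not None]
    if not online:
        # The commit log notes the job should fail when nothing is online.
        raise RuntimeError("No processing services are online")
    return min(online, key=lambda s: s["latency_ms"])
```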

Co-authored-by: Vanessa Mac <[email protected]>
mihow and vanessavmac authored Jan 26, 2025
1 parent e5b1aed commit b5d6885
Showing 61 changed files with 2,981 additions and 737 deletions.
53 changes: 53 additions & 0 deletions ami/jobs/migrations/0013_add_job_logs.py
@@ -0,0 +1,53 @@
# Generated by Django 4.2.10 on 2024-12-17 20:01

import django_pydantic_field.fields
from django.db import migrations

import ami.jobs.models


def migrate_logs_forward(apps, schema_editor):
"""Move logs from Job.progress to Job.logs"""
Job = apps.get_model("jobs", "Job")
jobs_to_update = []
for job in Job.objects.filter(progress__isnull=False):
if job.progress.logs or job.progress.errors:
# Move logs from progress to the new logs field
job.logs.stdout = job.progress.logs
job.logs.stderr = job.progress.errors
jobs_to_update.append(job)
# Update all jobs in a single query
Job.objects.bulk_update(jobs_to_update, ["logs"])


def migrate_logs_backward(apps, schema_editor):
"""Move logs from Job.logs back to Job.progress"""
Job = apps.get_model("jobs", "Job")
jobs_to_update = []
for job in Job.objects.filter(logs__isnull=False):
# Move logs back to progress
job.progress.logs = job.logs.stdout
job.progress.errors = job.logs.stderr
jobs_to_update.append(job)
# Update all jobs in a single query
Job.objects.bulk_update(jobs_to_update, ["progress"])


class Migration(migrations.Migration):
dependencies = [
("jobs", "0012_alter_job_limit"),
]

operations = [
migrations.AddField(
model_name="job",
name="logs",
field=django_pydantic_field.fields.PydanticSchemaField(
config=None, default={"stderr": [], "stdout": []}, schema=ami.jobs.models.JobLogs
),
),
migrations.RunPython(
migrate_logs_forward,
migrate_logs_backward,
),
]
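
The `ami.jobs.models.JobLogs` schema referenced by the `PydanticSchemaField` above is not shown in this diff. A minimal pydantic sketch consistent with the migration default `{"stderr": [], "stdout": []}` could look like this (the field types are assumptions inferred from how the data migration appends log and error lists):

```python
from pydantic import BaseModel, Field


class JobLogs(BaseModel):
    """Assumed shape of ami.jobs.models.JobLogs, inferred from the
    migration default {"stderr": [], "stdout": []}."""

    stdout: list[str] = Field(default_factory=list)
    stderr: list[str] = Field(default_factory=list)
```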
23 changes: 23 additions & 0 deletions ami/jobs/migrations/0014_alter_job_progress.py
@@ -0,0 +1,23 @@
# Generated by Django 4.2.10 on 2024-12-17 20:13

import ami.jobs.models
from django.db import migrations
import django_pydantic_field.fields


class Migration(migrations.Migration):
dependencies = [
("jobs", "0013_add_job_logs"),
]

operations = [
migrations.AlterField(
model_name="job",
name="progress",
field=django_pydantic_field.fields.PydanticSchemaField(
config=None,
default={"errors": [], "logs": [], "stages": [], "summary": {"progress": 0.0, "status": "CREATED"}},
schema=ami.jobs.models.JobProgress,
),
),
]
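
Likewise, the `ami.jobs.models.JobProgress` schema is only visible here through its new default. A pydantic sketch consistent with `{"errors": [], "logs": [], "stages": [], "summary": {"progress": 0.0, "status": "CREATED"}}` might be (field types and the nested summary model are assumptions):

```python
from pydantic import BaseModel, Field


class JobProgressSummary(BaseModel):
    """Assumed nested summary model, inferred from the migration default."""

    progress: float = 0.0
    status: str = "CREATED"


class JobProgress(BaseModel):
    """Assumed shape of ami.jobs.models.JobProgress; the `logs` and
    `errors` fields are what migration 0013 moves into Job.logs."""

    summary: JobProgressSummary = Field(default_factory=JobProgressSummary)
    stages: list = Field(default_factory=list)
    logs: list = Field(default_factory=list)
    errors: list = Field(default_factory=list)
```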
12 changes: 12 additions & 0 deletions ami/jobs/migrations/0015_merge_20250117_2100.py
@@ -0,0 +1,12 @@
# Generated by Django 4.2.10 on 2025-01-17 21:00

from django.db import migrations


class Migration(migrations.Migration):
dependencies = [
("jobs", "0013_merge_0011_alter_job_limit_0012_alter_job_limit"),
("jobs", "0014_alter_job_progress"),
]

operations = []