ML Pipeline v2 (#684)
* Create backend model

* Create backend status endpoint

* Return server status and available pipelines

* Use pipeline slug

* Fix .gitignore

* Update backend status endpoint, test pipeline process images

* fix: missing import in ml models

* Add Backend to admin, update pipeline/backend model, register_pipelines action

* Fix type checking

* Add backend id to test pipeline processing

* Constant and Random pipeline processing

* Add test fixture

* Don't use same project id for all tests

* Added Backend created_at and updated_at serializer fields

* Update models and display backends last checked

* Resolve merge conflicts

* Remove unused variables

* Remove unused file

* Register pipelines via frontend

* Add missing fields to backend, fix migration error after merging with main

* Add backend details dialog

* Display backend details

* Fix backend details displayed values

* Select first backend associated with pipeline

* Fix linting errors

* Remove backend_id

* Remove version/version name, fix adding project, make endpoint required

* Use ErrorState component

* Add serializer details

* API test to check that pipelines are created

* Add edit backend default values

* Process images using backend with lowest latency

* Remove projects from ML schemas

* Resolve todos

* Raise exception if no backends online

* Fail the job if no backends are online

* feat: begin storing category maps with algorithms in the DB

* feat: begin saving all logits and scores from all predictions

* fix: complete rename of softmax_scores field (now any calibrated score)

* feat: admin sections for Classifications and Category Maps

* fix: save simple labels along with category map data

* feat: use function to generate fake classifications

* feat: update schema in example ML backend. comments

* fix: formatting

* fix: remove bad admin filter

* fix: update formatting (line-lengths)

* fix: reset line-length to existing project value

* feat: API views for algorithm category maps

* feat: define schema for Algorithm & AlgorithmCategoryMap in the ML backend API

* feat: continue writing schema for algorithms

* feat: update image fetching utils based on latest AMI data companion

* feat: bring schemas related to the ML backend responses in sync

* fix: fix import

* feat: update schema for algorithms

* feat: update tests for processing pipeline responses

* chore: update formatting (line-lengths)

* fix: use only the key field for keeping algorithms unique

* feat: update logging when saving pipeline results

* feat: features from live ml backend schema

* feat: improve matching existing algorithms & labels to incoming data

* chore: reduce number of test images

* fix: attempt to fix reprocessing when there is a new algorithm

* chore: refactor pipeline results into multiple functions

* feat: support to add logits & scores to existing classifications

* fix: logging and comments

* fix: improve job status updates and failure handling

* fix: allow specifying job type on create, update tests

* feat: move job logs to their own field to not mess with progress updates

* feat: basic tests for category maps

* feat: improve bulk saving & logging for pipeline results

* feat: update occurrence determinations when saving results

* feat: ensure determination is not considering intermediate classifications

* feat: allow agreeing with any prediction/identification

* fix: updates to detection occurrences

* fix: update detections with occurrences

* fix: fallback to non-terminal classifications if need be (non-moth)

* fix: creating occurrences in bulk

* feat: kill long queries during development

* feat: add test for mapping taxa from category maps

* feat: add more classification and detection details to API responses

* feat: optionally return pipeline results

* fix: associate category map with each classification in addition to algo

* chore: more logging when saving results

* feat: allow filtering captures by project

* fix: update formatting

* feat: make the classification list view more lightweight

* feat: use redis for primary cache locally

* feat: support for retrying failed requests

* Change MLBackend to ProcessingService

* Change all instances of backend to processing service

* Fix ui formatting, fix tests, and add migrations

* Update comment to processing service

* Update process_images error handling

* Fix last_checked_live and processing services online

* Change Sync to Add Pipelines

* Remove updated at column for processing services

* Display column of num pipelines added

* Change status label of pipelines online to pipelines available

* Use slugify to add processing service

* fix: clean up some logging, type warnings and extra code

* feat: remove slug field, update naming

* fix: update phrasing

* Remove print statements

* Fix log formatting

* Squash migrations

* task: rename directory and container name to avoid upcoming conflicts

* fix: missing check for no-project

* fix: field name

* Filter processing services by project ID

* Button indicates pipeline registration error

* fix: cleanup naming of Random vs Dummy pipelines

* feat: new schema for service info response

* feat: new schema for the processing service info response

* feat: change the main "process" endpoint path

* fix: fixes to pipeline status / get_info response

* fix: job should fail if sub-tasks fail (saving results)

* fix: ensure pipeline slugs are unique

* feat: improve error handling for results saved in subtasks

* feat: improve usage of pipeline configs and schema names

* fix: show blank date instead of undefined in the UI

* feat: slightly improve robustness of status check responses

* fix: slightly improve handling of unique pipeline identifiers

* feat: save category maps for algorithms when registering pipelines

* fix: skip slow counting in occurrences_per_day chart

* chore: shorten column name

* fix: selection of existing pipelines & algorithms

* fix: undefined variable

* feat: increase batch size per request

* chore: use old name for processing_service for now

* fix: update tests

* fix: use old endpoint name for now

* fix: conflicting migrations

* Fix processing service error handling

* feat: show algorithm key/slug in UI

* fix: don't rename existing algorithms if not necessary

* feat: update existing matching algorithms

* fix: ensure scores and labels align

* fix: missing env var during build time

---------
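
Several bullets above describe routing work to the processing service with the lowest latency and failing the job when no services are online. A minimal sketch of that selection logic, with all names hypothetical (none are taken from the diff below), might look like:

```python
# Hypothetical sketch of lowest-latency service selection, as described in
# the commit log. Services with no measured latency are treated as offline.
def choose_processing_service(services: list[dict]) -> dict:
    online = [s for s in services if s.get("latency_ms") is not None]
    if not online:
        # The commit log notes the job should fail when nothing is online.
        raise RuntimeError("No processing services are online")
    return min(online, key=lambda s: s["latency_ms"])
```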

Co-authored-by: Vanessa Mac <[email protected]>
mihow and vanessavmac authored Jan 26, 2025
1 parent e5b1aed commit b5d6885
Showing 61 changed files with 2,981 additions and 737 deletions.
53 changes: 53 additions & 0 deletions ami/jobs/migrations/0013_add_job_logs.py
@@ -0,0 +1,53 @@
# Generated by Django 4.2.10 on 2024-12-17 20:01

import django_pydantic_field.fields
from django.db import migrations

import ami.jobs.models


def migrate_logs_forward(apps, schema_editor):
"""Move logs from Job.progress to Job.logs"""
Job = apps.get_model("jobs", "Job")
jobs_to_update = []
for job in Job.objects.filter(progress__isnull=False):
if job.progress.logs or job.progress.errors:
# Move logs from progress to the new logs field
job.logs.stdout = job.progress.logs
job.logs.stderr = job.progress.errors
jobs_to_update.append(job)
# Update all jobs in a single query
Job.objects.bulk_update(jobs_to_update, ["logs"])


def migrate_logs_backward(apps, schema_editor):
"""Move logs from Job.logs back to Job.progress"""
Job = apps.get_model("jobs", "Job")
jobs_to_update = []
for job in Job.objects.filter(logs__isnull=False):
# Move logs back to progress
job.progress.logs = job.logs.stdout
job.progress.errors = job.logs.stderr
jobs_to_update.append(job)
# Update all jobs in a single query
Job.objects.bulk_update(jobs_to_update, ["progress"])


class Migration(migrations.Migration):
dependencies = [
("jobs", "0012_alter_job_limit"),
]

operations = [
migrations.AddField(
model_name="job",
name="logs",
field=django_pydantic_field.fields.PydanticSchemaField(
config=None, default={"stderr": [], "stdout": []}, schema=ami.jobs.models.JobLogs
),
),
migrations.RunPython(
migrate_logs_forward,
migrate_logs_backward,
),
]
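
The `ami.jobs.models.JobLogs` schema referenced by the `PydanticSchemaField` above is not shown in this diff. A minimal pydantic sketch consistent with the migration default `{"stderr": [], "stdout": []}` could look like this (the field types are assumptions inferred from how the data migration appends log and error lists):

```python
from pydantic import BaseModel, Field


class JobLogs(BaseModel):
    """Assumed shape of ami.jobs.models.JobLogs, inferred from the
    migration default {"stderr": [], "stdout": []}."""

    stdout: list[str] = Field(default_factory=list)
    stderr: list[str] = Field(default_factory=list)
```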
23 changes: 23 additions & 0 deletions ami/jobs/migrations/0014_alter_job_progress.py
@@ -0,0 +1,23 @@
# Generated by Django 4.2.10 on 2024-12-17 20:13

import ami.jobs.models
from django.db import migrations
import django_pydantic_field.fields


class Migration(migrations.Migration):
dependencies = [
("jobs", "0013_add_job_logs"),
]

operations = [
migrations.AlterField(
model_name="job",
name="progress",
field=django_pydantic_field.fields.PydanticSchemaField(
config=None,
default={"errors": [], "logs": [], "stages": [], "summary": {"progress": 0.0, "status": "CREATED"}},
schema=ami.jobs.models.JobProgress,
),
),
]
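
Likewise, the `ami.jobs.models.JobProgress` schema is only visible here through its new default. A pydantic sketch consistent with `{"errors": [], "logs": [], "stages": [], "summary": {"progress": 0.0, "status": "CREATED"}}` might be (field types and the nested summary model are assumptions):

```python
from pydantic import BaseModel, Field


class JobProgressSummary(BaseModel):
    """Assumed nested summary model, inferred from the migration default."""

    progress: float = 0.0
    status: str = "CREATED"


class JobProgress(BaseModel):
    """Assumed shape of ami.jobs.models.JobProgress; the `logs` and
    `errors` fields are what migration 0013 moves into Job.logs."""

    summary: JobProgressSummary = Field(default_factory=JobProgressSummary)
    stages: list = Field(default_factory=list)
    logs: list = Field(default_factory=list)
    errors: list = Field(default_factory=list)
```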
12 changes: 12 additions & 0 deletions ami/jobs/migrations/0015_merge_20250117_2100.py
@@ -0,0 +1,12 @@
# Generated by Django 4.2.10 on 2025-01-17 21:00

from django.db import migrations


class Migration(migrations.Migration):
dependencies = [
("jobs", "0013_merge_0011_alter_job_limit_0012_alter_job_limit"),
("jobs", "0014_alter_job_progress"),
]

operations = []