
Backend should return results data in a task agnostic format for frontend consumption #678

Open · 1 task done
njbrake opened this issue Jan 17, 2025 · 2 comments
Assignees: njbrake
Labels: api, backend, enhancement, schemas, sdk


njbrake commented Jan 17, 2025

Motivation

I believe this is also relevant to the task that @HareeshBahuleyan is working on.

As I understand it, the metric names and settings are currently hardcoded into the frontend, e.g. https://github.com/mozilla-ai/lumigator/blob/main/lumigator/frontend/src/components/molecules/LExperimentResults.vue#L69

As we add new metrics and custom user metrics in the future, I'm thinking the frontend should be able to remain ignorant of what each specific metric is.

In order to support this, I think we need to restructure the file whose path we provide to the frontend, which relates to #670.

Currently, the file that the eval job creates and saves to S3 (for which the backend then creates a presigned URL so the frontend can download it) looks like this:

{
  "rouge": {
    "rouge1": [
      0.27272727272727276,
      0.27586206896551724
    ],
    "rougeLsum": [
      0.27272727272727276,
      0.20689655172413796
    ],
    "rouge1_mean": 0.27429467084639503,
    "rougeLsum_mean": 0.23981191222570536
  },
...
  "bertscore": {
    "precision": [
      0.8871338963508606,
      0.8618351817131042
    ],
    "recall": [
      0.8770406246185303,
      0.8888148069381714
    ],
...
    "hashcode": "hashymchashhash",
    "precision_mean": 0.8744845390319824,
    "recall_mean": 0.8829277157783508,
  },
  "predictions": [
    "a summary",
    "a summary2"
  ],
  "ground_truth": [
    "the real summary",
    "the real summary2"
  ]
} 

Can we reorganize this into something that looks more like:

{
  "metrics": {
    "rouge1": {
      "displayName": "ROUGE-1",
      "values": [
        0.27272727272727276,
        0.27586206896551724
      ],
      "mean": 0.27429467084639503
    },
    "rougeL": {
      "displayName": "ROUGE-L",
      "values": [
        0.27272727272727276,
        0.27586206896551724
      ],
      "mean": 0.27429467084639503
    },
    "bertscorePrecision": {
      "displayName": "BERTScore Precision",
      "values": [
        0.8871338963508606,
        0.8618351817131042
      ],
      "mean": 0.8744845390319824
    },
    "bertscoreRecall": {
      "displayName": "BERTScore Recall",
      "values": [
        0.8770406246185303,
        0.8888148069381714
      ],
      "mean": 0.8829277157783508
    }
  },
  "artifacts": {
    "predictions": [
      "a summary",
      "a summary2"
    ],
    "ground_truth": [
      "the real summary",
      "the real summary2"
    ]
  },
  "parameters": {
    "max_input_length": 512,
    "max_output_length": 150,
    "num_beams": 4,
    "length_penalty": 2.0,
    "early_stopping": true
  }
}
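To make the reshaping concrete, here is a rough sketch of how a job could map the current layout into the proposed one (all names here are hypothetical, not actual Lumigator code; the display-name table is just an illustration):

# Hypothetical sketch: reshape the current result.json layout into the
# proposed metrics/artifacts/parameters layout. Only the job knows which
# metrics it computed, so it supplies the display names.
DISPLAY_NAMES = {
    "rouge1": "ROUGE-1",
    "rougeLsum": "ROUGE-L",
    "bertscore_precision": "BERTScore Precision",
    "bertscore_recall": "BERTScore Recall",
}

def restructure_results(old: dict) -> dict:
    metrics = {}
    for group_name, group in old.items():
        if group_name in ("predictions", "ground_truth"):
            continue
        for key, value in group.items():
            # Skip per-metric means (attached below) and scalars like hashcode.
            if key.endswith("_mean") or not isinstance(value, list):
                continue
            name = key if key in DISPLAY_NAMES else f"{group_name}_{key}"
            metrics[name] = {
                "displayName": DISPLAY_NAMES.get(name, name),
                "values": value,
                "mean": group.get(f"{key}_mean"),
            }
    return {
        "metrics": metrics,
        "artifacts": {
            "predictions": old.get("predictions", []),
            "ground_truth": old.get("ground_truth", []),
        },
        "parameters": {},  # to be filled in from the job config
    }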

This re-org would be the first step towards a more structured output. As I understand it, the backend is agnostic about what's in the result.json a job produces, and the frontend relies on the job to create result.json in the right format. My thought is that it would help if the backend were responsible for ensuring that job output conforms to the standard the frontend requires.
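If the backend did own that contract, the check could be as small as this (a sketch, assuming Pydantic, which I believe the backend's schemas already use; class names are made up):

from pydantic import BaseModel

# Hypothetical schema for the proposed format; validating job output
# against it is what would let the backend guarantee the contract.
class MetricResult(BaseModel):
    displayName: str
    values: list[float]
    mean: float

class JobResults(BaseModel):
    metrics: dict[str, MetricResult]
    artifacts: dict[str, list[str]]
    parameters: dict = {}

# e.g. before handing the file to the frontend:
# JobResults.model_validate_json(raw_results_json)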

I think(?) this should also make our code more flexible in handling new tasks.

To keep this backwards compatible at first, we can update each job to save the existing result.json alongside a new file that conforms to the agreed-upon format, as sketched below. That way we don't need to coordinate backend and frontend changes as tightly.
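A sketch of that transition step (the file name and helper are hypothetical; restructure_results is the reshaping function sketched above):

import json

def save_results(old_format: dict, output_dir: str) -> None:
    # Legacy file, unchanged, so the current frontend keeps working.
    with open(f"{output_dir}/results.json", "w") as f:
        json.dump(old_format, f)
    # New task-agnostic file in the agreed-upon format.
    with open(f"{output_dir}/results_v2.json", "w") as f:
        json.dump(restructure_results(old_format), f)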

Alternatives

The proposal above is an incremental step. Alternatively, we could take a much bigger bite and completely overhaul how files are saved and provided to the frontend, for example saving and providing 3+ files instead of the single results.json.

Contribution

Happy to help work on this if this direction sounds good.

Have you searched for similar issues before submitting this one?

  • Yes, I have searched for similar issues
njbrake added the api, backend, enhancement, schemas, and sdk labels on Jan 17, 2025
njbrake commented Jan 17, 2025

I think @george-mzai is the right person to ask about what the new file needs to contain so he can access everything he needs to display the content. I'm thinking this restructure will make your life on the frontend much easier, but let me know if that's not the case 😆

I think @peteski22 may be the right person to pull in on whether this API makes sense from the backend perspective. Right now, as I understand it, the backend doesn't do any validation of the result.json artifact created by a job, and I'm not sure where that validation should go.

I think @ividal, @veekaybee, or @aittalam may be the ones to comment on whether it makes sense to alter the job output to conform to this format.

ividal changed the title [FEATURE]: Decouple metric names from the frontend to Decouple metric names from the frontend on Jan 22, 2025
njbrake changed the title Decouple metric names from the frontend to Backend return results data in a task agnostic format for frontend consumption on Jan 22, 2025
njbrake changed the title Backend return results data in a task agnostic format for frontend consumption to Backend should return results data in a task agnostic format for frontend consumption on Jan 22, 2025
njbrake self-assigned this on Jan 22, 2025
ividal commented Jan 24, 2025

My 2 cents: this makes perfect sense.

Two upcoming bigger bodies of work will be:

  • Supporting Translation - this can start with metrics we already support, but it will likely make sense to extend to COMET or others.
  • Supporting task-agnostic metrics (e.g. related to robustness, persona drift...). It's indeed not practical at all to have to change hardcoded values in the UI and BE for these, when the UI could just read whatever metrics were returned.

We just need to keep an eye on how we show experiments that contain different sets of metrics (e.g. once we have summarization, translation, etc.).
