
Backend should return results data in a task agnostic format for frontend consumption #678

Open · 1 task done
njbrake opened this issue Jan 17, 2025 · 2 comments
Assignees: njbrake
Labels: api, backend, enhancement, schemas, sdk


njbrake commented Jan 17, 2025

Motivation

I believe this is also relevant to the task that @HareeshBahuleyan is working on.

As I understand it, the metric names and settings are currently hardcoded into the frontend, e.g. https://github.com/mozilla-ai/lumigator/blob/main/lumigator/frontend/src/components/molecules/LExperimentResults.vue#L69

As we add new metrics and custom user metrics in the future, I'm thinking the frontend should be able to remain ignorant of what each specific metric is.

In order to support this, I think we need to restructure the file whose path we provide to the frontend, which relates to #670.

Currently, the file that the eval job creates and saves to S3 (for which the backend then creates a presigned URL so the frontend can download it) looks like this:

{
  "rouge": {
    "rouge1": [
      0.27272727272727276,
      0.27586206896551724
    ],
    "rougeLsum": [
      0.27272727272727276,
      0.20689655172413796
    ],
    "rouge1_mean": 0.27429467084639503,
    "rougeLsum_mean": 0.23981191222570536
  },
...
  "bertscore": {
    "precision": [
      0.8871338963508606,
      0.8618351817131042
    ],
    "recall": [
      0.8770406246185303,
      0.8888148069381714
    ],
...
    "hashcode": "hashymchashhash",
    "precision_mean": 0.8744845390319824,
    "recall_mean": 0.8829277157783508,
  },
  "predictions": [
    "a summary",
    "a summary2"
  ],
  "ground_truth": [
    "the real summary",
    "the real summary2"
  ]
} 

Can we reorganize this into something that looks more like:

{
  "metrics": {
    "rouge1": {
      "displayName": "ROUGE-1",
      "values": [
        0.27272727272727276,
        0.27586206896551724
      ],
      "mean": 0.27429467084639503
    },
    "rougeL": {
      "displayName": "ROUGE-L",
      "values": [
        0.27272727272727276,
        0.27586206896551724
      ],
      "mean": 0.27429467084639503
    },
    "bertscorePrecision": {
      "displayName": "BERTScore Precision",
      "values": [
        0.8871338963508606,
        0.8618351817131042
      ],
      "mean": 0.8744845390319824
    },
    "bertscoreRecall": {
      "displayName": "BERTScore Recall",
      "values": [
        0.8770406246185303,
        0.8888148069381714
      ],
      "mean": 0.8829277157783508
    }
  },
  "artifacts": {
    "predictions": [
      "a summary",
      "a summary2"
    ],
    "ground_truth": [
      "the real summary",
      "the real summary2"
    ]
  },
  "parameters": {
    "max_input_length": 512,
    "max_output_length": 150,
    "num_beams": 4,
    "length_penalty": 2.0,
    "early_stopping": true
  }
}
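To make the reshaping concrete, here is a rough sketch of how a job could map the current layout into the proposed one (all names here are hypothetical, not actual Lumigator code; the display-name table is just an illustration):

# Hypothetical sketch: reshape the current result.json layout into the
# proposed metrics/artifacts/parameters layout. Only the job knows which
# metrics it computed, so it supplies the display names.
DISPLAY_NAMES = {
    "rouge1": "ROUGE-1",
    "rougeLsum": "ROUGE-L",
    "bertscore_precision": "BERTScore Precision",
    "bertscore_recall": "BERTScore Recall",
}

def restructure_results(old: dict) -> dict:
    metrics = {}
    for group_name, group in old.items():
        if group_name in ("predictions", "ground_truth"):
            continue
        for key, value in group.items():
            # Skip per-metric means (attached below) and scalars like hashcode.
            if key.endswith("_mean") or not isinstance(value, list):
                continue
            name = key if key in DISPLAY_NAMES else f"{group_name}_{key}"
            metrics[name] = {
                "displayName": DISPLAY_NAMES.get(name, name),
                "values": value,
                "mean": group.get(f"{key}_mean"),
            }
    return {
        "metrics": metrics,
        "artifacts": {
            "predictions": old.get("predictions", []),
            "ground_truth": old.get("ground_truth", []),
        },
        "parameters": {},  # to be filled in from the job config
    }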

This re-org would be the first step towards a more structured output. As I understand it, the backend is agnostic about what's in the result.json a job produces, and the frontend relies on the job to create result.json in the right format. My thought is that it would help if the backend were responsible for ensuring that job output conforms to the standard the frontend requires.
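If the backend did own that contract, the check could be as small as this (a sketch, assuming Pydantic, which I believe the backend's schemas already use; class names are made up):

from pydantic import BaseModel

# Hypothetical schema for the proposed format; validating job output
# against it is what would let the backend guarantee the contract.
class MetricResult(BaseModel):
    displayName: str
    values: list[float]
    mean: float

class JobResults(BaseModel):
    metrics: dict[str, MetricResult]
    artifacts: dict[str, list[str]]
    parameters: dict = {}

# e.g. before handing the file to the frontend:
# JobResults.model_validate_json(raw_results_json)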

I think(?) this should also make our code more flexible in handling new tasks.

To keep this backwards compatible at first, we can update each job to save the existing result.json alongside a new file that conforms to the agreed-upon format, as sketched below. That way we don't need to coordinate backend and frontend changes as tightly.
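A sketch of that transition step (the file name and helper are hypothetical; restructure_results is the reshaping function sketched above):

import json

def save_results(old_format: dict, output_dir: str) -> None:
    # Legacy file, unchanged, so the current frontend keeps working.
    with open(f"{output_dir}/results.json", "w") as f:
        json.dump(old_format, f)
    # New task-agnostic file in the agreed-upon format.
    with open(f"{output_dir}/results_v2.json", "w") as f:
        json.dump(restructure_results(old_format), f)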

Alternatives

The proposal above is an incremental step. Alternatively, we could take a much bigger bite and completely overhaul how files are saved and provided to the frontend, for example saving and providing 3+ files instead of the single results.json.

Contribution

Happy to help work on this if this direction sounds good.

Have you searched for similar issues before submitting this one?

  • Yes, I have searched for similar issues
njbrake added the api, backend, enhancement, schemas, and sdk labels on Jan 17, 2025
njbrake commented Jan 17, 2025

I think @george-mzai is the right person to ask about what the new file needs to contain so he can access everything he needs to display the content. I'm thinking this restructure will make your life on the frontend much easier, but let me know if that's not the case 😆

I think @peteski22 may be the right person to pull in on whether this API makes sense from the backend perspective. Right now, as I understand it, the backend doesn't do any validation of the result.json artifact created by a job, and I'm not sure where that validation should go.

I think @ividal, @veekaybee, or @aittalam may be the ones to comment on whether it makes sense to alter the job output to conform to this format.

ividal changed the title [FEATURE]: Decouple metric names from the frontend to Decouple metric names from the frontend on Jan 22, 2025
njbrake changed the title Decouple metric names from the frontend to Backend return results data in a task agnostic format for frontend consumption on Jan 22, 2025
njbrake changed the title Backend return results data in a task agnostic format for frontend consumption to Backend should return results data in a task agnostic format for frontend consumption on Jan 22, 2025
njbrake self-assigned this on Jan 22, 2025
ividal commented Jan 24, 2025

My 2 cents: this makes perfect sense.

Two upcoming bigger bodies of work will be:

  • Supporting Translation - this can start with metrics we already support, but it will likely make sense to extend to COMET or others.
  • Supporting task-agnostic metrics (e.g. related to robustness, persona drift...). It's indeed not practical at all to have to change hardcoded values in the UI and BE for these, when the UI could just read whatever metrics were returned.

We just need to keep an eye on how we show experiments that contain different sets of metrics (e.g. once we have summarization, translation, etc.).
