Diachronic view #35
A few notes:
Feedback from Open Tech call on 2023-04-26:
To further refine the view, a few questions to assess what is really needed (and for me to understand):
In addition, a note from my side: axis labels and a legend or description (for the colour coding) need to be added.
Average (mean or median) plus min and max (or first and last quartile, or better decile or percentile) would be great, yes. But page-wise display is probably too much to ask for. As long as it's easily possible to navigate into the raw data by page to analyse regressions in detail (i.e. when some metric fell drastically, like a 20% worse minimum), that's enough IMO.

@mweidling and @paulpestov also discussed page-wise aggregation with me: currently the backend uses Dinglehopper, which reports page-wise. So naive aggregation is macro-averaged.

In brief: a micro-averaged aggregate for the average score, besides the per-page minimum score, should be sufficient for quick diagnostics; the rest can always be dug up by navigating into the raw data. (And of course later one might implement some way to navigate into the per-page reports by Dinglehopper.)
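To illustrate the micro- vs. macro-averaging distinction, here is a minimal sketch. The data shape (per-page error counts and character counts, as a page-wise tool like Dinglehopper would yield) and all numbers are hypothetical; the example uses CER, where the per-page *maximum* is the worst page (for a higher-is-better score it would be the minimum):

```python
# Hypothetical per-page results: (edit errors, character count) per page.
pages = [
    {"errors": 5,  "chars": 1000},   # CER 0.005
    {"errors": 40, "chars": 200},    # CER 0.200 (outlier page)
    {"errors": 10, "chars": 800},    # CER 0.0125
]

per_page_cer = [p["errors"] / p["chars"] for p in pages]

# Macro average: mean of per-page scores (every page counts equally,
# so short outlier pages dominate).
macro = sum(per_page_cer) / len(per_page_cer)

# Micro average: pool errors and characters first (long pages weigh more).
micro = sum(p["errors"] for p in pages) / sum(p["chars"] for p in pages)

# Worst page, for quick regression diagnostics.
worst = max(per_page_cer)

print(round(macro, 4), round(micro, 4), round(worst, 4))
```

Note how the short outlier page pulls the macro average (0.0725) far above the micro average (0.0275), which is why the comment above prefers the micro-averaged aggregate plus the per-page extreme.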
That's a difficult question. In terms of UI, IMO it would make sense to provide a slider to narrow it down dynamically. It should also be robust against gaps in the data (e.g. when some workflow or some metric was not available in earlier releases).
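The slider-plus-gaps idea can be sketched roughly as follows. All names and the data shape are assumptions for illustration, not the actual QuiVer data model:

```python
# Hypothetical metric series; one entry per release, with gaps where a
# workflow or metric was not available.
series = [
    {"release": "2023-01", "cer": 0.05},
    {"release": "2023-02"},              # gap: metric missing here
    {"release": "2023-03", "cer": 0.04},
    {"release": "2023-04", "cer": 0.03},
]

def visible_points(series, start, end):
    """Return (release, value) pairs inside the slider window [start, end],
    silently skipping entries that lack the metric."""
    return [
        (p["release"], p["cer"])
        for p in series
        if start <= p["release"] <= end and "cer" in p
    ]

print(visible_points(series, "2023-01", "2023-03"))
```

The point is simply that the chart should drop missing points rather than fail or plot zeros when earlier releases lack a metric.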
I don't understand. A graph will always show the concrete values, too. (Perhaps the exact numerical values can be highlighted via mouse-over?)
See above (no page-by-page display, but minimum/maximum across pages).
See above (browsing into the individual per-page reports would be super cool, but is a lot of effort to implement; as long as one can manually look these up it should suffice)
Yes, labelled ticks with release/date on the x-axis, score on the y-axis, and a colour legend where multiple scores are combined into one chart.
We had a talk with @bertsky, @paulpestov, and @mweidling about this. The next steps will be:
In addition, a sorting feature will be added to the table tab of the workflow dashboard.
(This is a copy of an internal issue for public purposes.)
Describe the feature you'd like
Currently, QuiVer displays only the latest information about a workflow: when we run a workflow A, the important metrics are saved, and they are overwritten when we run workflow A again.
In order to measure how changes in the OCR-D software impact the OCR quality as well as the hardware statistics, we should introduce diachronic information into QuiVer, e.g. via a time stamp.
User story
As a developer, I need an overview of how changes in the software affect the OCR quality and hardware metrics, in order to be certain that the newest contributions to OCR-D really improve the software's results.
Ideas we have discussed so far
How to display the information
For each available GT corpus there should be a line chart that depicts how a metric has changed over time. Each step in time (x-axis) represents an ocrd_all or an ocrd_core release (clarify!).
Users can choose between the different metrics and see whether a metric tends to improve or not.
Underlying data structure
When selecting a GT corpus the front end uses an ID map file that points it to the right collection of JSON objects. Each OCR-D workflow that is executed on a GT corpus has a separate file in which all the runs per release are present.
Given the GT workspace 16_ant_simple, we then have a file 16_ant_simple_minimal.json with all its benchmarking workflows, 16_ant_simple_selected_pages.json with all its benchmarking workflows, etc. Each executed workflow has a timestamp by which the front end can sort the individual executions and retrieve the relevant data.
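The per-workflow file and the timestamp-based sorting described above could look roughly like this. The field names (`timestamp`, `release`, `cer`) and values are hypothetical, assumed only for illustration:

```python
import json

# Hypothetical content of 16_ant_simple_minimal.json: one entry per
# executed run, each carrying a timestamp and that run's metrics.
runs_json = json.dumps([
    {"timestamp": "2023-05-02T10:00:00", "release": "v2023-05", "cer": 0.031},
    {"timestamp": "2023-03-01T10:00:00", "release": "v2023-03", "cer": 0.045},
])

runs = json.loads(runs_json)

# The front end sorts the runs chronologically before plotting;
# ISO 8601 timestamps sort correctly as plain strings.
runs.sort(key=lambda r: r["timestamp"])

print([r["release"] for r in runs])
```

Keeping all runs of one workflow in a single append-only file, sorted on read, keeps the backend simple while still allowing the front end to reconstruct the full time series.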
TODOs