Bypassing validation #341
Warning, possibly bad ideas to follow.
mir_eval is generally pretty defensive about its inputs, and (I think?) all of the evaluators include at least some minimal validation of inputs for proper formatting, semantics, data types, etc. This is a good thing in general, but it can be quite wasteful when validation checks are expensive and performed many times on the same inputs. In a current project of mine, I'm finding that more than half of the computation time is going into validation, and most of that work is wasted, since the comparisons pit different model outputs against the same reference data.
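To make the cost concrete, here's a rough sketch of the pattern (the function names are illustrative, not mir_eval's actual internals):

```python
def validate(reference, estimate):
    # format / dtype / semantics checks; potentially expensive
    ...


def metric_a(reference, estimate):
    validate(reference, estimate)  # validation pass #1
    ...


def metric_b(reference, estimate):
    validate(reference, estimate)  # validation pass #2, same inputs
    ...


def evaluate(reference, estimate):
    # A single call here validates the same (reference, estimate)
    # pair once per metric.
    return {'a': metric_a(reference, estimate),
            'b': metric_b(reference, estimate)}
```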
This makes me think that we could provide a mechanism to bypass input validation in situations where the user is certain that the data is well-formed. (Note: for now, I've simply commented out the validation lines in question, but that's obviously a bad solution in general.) We could control this with a context manager and a global state variable (yuck, I know), which would be pretty slick from the user's perspective:
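Something like this sketch (the names `no_validation` and `_VALIDATION_ENABLED` are hypothetical; nothing below exists in mir_eval today):

```python
import contextlib

# Hypothetical module-level flag controlling whether validators run.
_VALIDATION_ENABLED = True


@contextlib.contextmanager
def no_validation():
    """Skip input validation for everything inside the `with` block."""
    global _VALIDATION_ENABLED
    previous = _VALIDATION_ENABLED
    _VALIDATION_ENABLED = False
    try:
        yield
    finally:
        _VALIDATION_ENABLED = previous


def validate(reference, estimate):
    if not _VALIDATION_ENABLED:
        return  # the user has vouched for these inputs
    # ... the usual (potentially expensive) checks go here ...
```

From the user side it would look like:

```python
# The user is certain these inputs are well-formed, so skip all checks.
with no_validation():
    scores = [some_metric(reference, est) for est in estimates]
```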
Ugly as a global state variable is, I think I prefer it to threading flag parameters through every evaluator here, primarily because we already bury quite a few things in kwargs and it would be easy to lose track of one more. Forcing a context manager would at least make it explicit, and visually distinct from the user's perspective, whenever validation is being skipped.
Or it might just be a bad idea. 🤷 Thoughts?

I mean, there's a case to be made that we shouldn't be validating at the start of each metric function. As things stand, for example, calling a submodule's evaluate function re-validates the same inputs once per metric it invokes. Alternatively, we could consider adding caching to the validators that process either the estimates or the references alone; that way, if a validator gets called with the same reference multiple times, it will just use the cached result. Of course, this is only a good idea if the references are small in memory.

I don't think that would work -- most of the validation checks exist to prevent errors that would propagate silently. OTOH, there's a good case to be made for validating once in the submodule-level evaluate function and skipping the per-metric checks from there on.

I agree in principle, but for most of the cases I have in mind, the validation checks are really as simple as they can feasibly be.

Yes, though caching in Python is kind of a pain. The best option IMO is memoizing via joblib, but that doesn't provide any policies for cache eviction or size management.
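For what it's worth, a minimal sketch of the caching idea using functools.lru_cache rather than joblib (assumed, not actual mir_eval code; the hashability dance below is part of what makes caching in Python a pain for array inputs):

```python
import functools

import numpy as np


@functools.lru_cache(maxsize=32)
def _validate_reference_cached(ref_bytes, dtype, shape):
    # Rebuild the array from its hashable byte representation, then run
    # the checks. lru_cache memoizes the result per unique reference.
    reference = np.frombuffer(ref_bytes, dtype=dtype).reshape(shape)
    # ... actual validation checks on `reference` would go here ...


def validate_reference(reference):
    """Validate a reference, reusing the cached result for repeated inputs."""
    reference = np.asarray(reference)
    _validate_reference_cached(reference.tobytes(), reference.dtype.str,
                               reference.shape)
```

Unlike the joblib route mentioned above, lru_cache keeps everything in process memory, so as noted this only makes sense if the references are small.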