ci: Check commit sha for knowledge contributions #1188
Comments
We could even take this further and try to find questions and answers in the document, or just probe for specific things we know often get lost in translation, like numbers, symbols, currency, units of measurement, etc.
So there are several levels of checks that could be done:
(1) is pretty easy. (2) is a little harder. (3) is more complicated and could require the repo to be cloned into a tmp folder to look for matching md files. I think (1) is doable. (2) and (3) require opening up a URL, and I am concerned about random requests from the action runner, since a malicious qna.yaml could get our action to hit any provided URL. I don't think we should do (2) or (3). They will be implicitly handled by the SDG check since it must load the knowledge to generate.
pattern can be a glob like
Resolves instructlab#1188 Signed-off-by: Christian Kadner <[email protected]>
Hmm, the globbing eliminates using Python Requests. PyGithub could list all files in the repo for a given commit, maybe even with a glob filter? I agree, a local clone might be overkill. Even a shallow, partial clone could potentially be huge.
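A minimal sketch of the glob-filter idea, assuming the list of file paths for the commit has already been obtained some other way (for example from a repository listing API). The function names `match_patterns` and `missing_patterns` are hypothetical, and the standard-library `fnmatch` module stands in for whatever glob semantics the real check would use.

```python
from fnmatch import fnmatchcase


def match_patterns(paths, patterns):
    """Return the paths that match at least one glob pattern."""
    return [p for p in paths if any(fnmatchcase(p, pat) for pat in patterns)]


def missing_patterns(paths, patterns):
    """Return the patterns that match no path at all."""
    return [pat for pat in patterns if not any(fnmatchcase(p, pat) for p in paths)]


if __name__ == "__main__":
    # Hypothetical listing of files at a given commit.
    paths = ["BBC_studios.md", "README.txt", "docs/notes.md"]
    print(match_patterns(paths, ["*.md"]))
    print(missing_patterns(paths, ["*.md", "*.pdf"]))
```

Note that with `fnmatchcase` a `*` also matches `/`, so `*.md` matches files in subdirectories too, which differs from shell globbing.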
Right. But we could limit our check to GitHub URLs only.
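Limiting the check to GitHub URLs could be sketched as a host allowlist applied before any request is made. The name `is_allowed_repo_url` and the `ALLOWED_HOSTS` set are assumptions for illustration, not part of the existing action.

```python
from urllib.parse import urlsplit

# Assumed allowlist; only exact host matches are accepted.
ALLOWED_HOSTS = {"github.com"}


def is_allowed_repo_url(url: str) -> bool:
    """Accept only https URLs whose host is exactly an allowed host."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URLs (e.g. broken IPv6 brackets) are rejected.
        return False
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


if __name__ == "__main__":
    print(is_allowed_repo_url("https://github.com/juliadenham/Summit_knowledge"))
    print(is_allowed_repo_url("https://github.com.evil.example/a/b"))
```

Comparing the parsed hostname, rather than searching for a substring in the URL, avoids lookalike hosts and `user@host` tricks.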
One possibility is to do a simple HTTP HEAD request against the commit to check that at least the URL and commit seem to be valid.
I agree that going beyond and trying to process the glob might be too involved.
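The HEAD-request idea could look roughly like the sketch below. It assumes the GitHub web layout `<repo>/commit/<sha>` and uses the standard-library `urllib` rather than a third-party client. `commit_url` and `commit_seems_valid` are hypothetical names.

```python
import urllib.request


def commit_url(repo_url: str, commit: str) -> str:
    """Build the GitHub web URL of a commit (GitHub-specific layout)."""
    return f"{repo_url.rstrip('/')}/commit/{commit}"


def commit_seems_valid(repo_url: str, commit: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request and report whether the server answered 200."""
    req = urllib.request.Request(commit_url(repo_url, commit), method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers HTTP errors such as 404, DNS failures, and timeouts.
        return False


if __name__ == "__main__":
    print(commit_url("https://github.com/juliadenham/Summit_knowledge", "ad185ef"))
```

A HEAD request only confirms that the repo and commit appear to exist; it says nothing about whether the files named by the pattern are present.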
We could. But the yaml is allowed to use any git repo, not just those on GitHub.
This could work for GitHub. We would have to remove the trailing
To support any valid git URL, we would need to use a git agent to fetch the commit SHA from the repo.
could be used. The exit code will be non-zero if the commit does not exist. But we would still have to
I hear you, but practically speaking I don't know that anyone is going to use anything other than a GitHub repo for this.
This issue has been automatically marked as stale because it has not had activity within 60 days. It will be automatically closed if no further activity occurs within 31 days.
Closing in favor of instructlab/schema#30
@bjhargrave -- We may want to add a check (linter or separate) to verify knowledge documents are actually "available" with the specified commit sha.
So this part of the qna.yaml
...Would "translate" to a URL like this ...
And then we could use the GitHub Python API or even just Python requests to check for a 200 HTTP status, similar to this simple curl example:

    curl -o /dev/null -w "%{http_code}" \
      -s https://raw.githubusercontent.com/juliadenham/Summit_knowledge/ad185ef/BBC_studios.md
    200
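A rough Python counterpart to the curl check, as a sketch only. It uses the standard-library `urllib` in place of the GitHub Python API or Requests, and it assumes the repo URL, commit, and file path have already been read from the qna.yaml. `raw_document_url` and `http_status` are hypothetical helper names.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def raw_document_url(repo_url: str, commit: str, path: str) -> str:
    """Translate a GitHub repo URL, commit, and file path into a raw-content URL."""
    owner_repo = urlsplit(repo_url).path.strip("/")
    return f"https://raw.githubusercontent.com/{owner_repo}/{commit}/{path}"


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET request to the URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the code is still the answer we want.
        return err.code


if __name__ == "__main__":
    url = raw_document_url(
        "https://github.com/juliadenham/Summit_knowledge", "ad185ef", "BBC_studios.md"
    )
    print(url)
    # Requires network access; a 200 would mean the document is available.
    # print(http_status(url))
```

Building the URL is kept separate from the request so the translation step can be tested without network access.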