New Stat: Maintainers #23
Comments
Good stuff, thanks @gundalow. Jotting down some musing while I read it through:
Note that a given person can be multiple things - an employed developer in one repo, but contributing on their own time in another. This will likely need to be managed per repo, which is an overhead - or at least we should verify whether it's an issue.
This seems fair. IIRC one can get a public event stream for a user from the GH API, so we can figure out whether they have been active in this repo. We can probably also quantify whether there is work outstanding in a repo - an inactive maintainer on a repo with 50 open PRs is probably worse than one on a repo with 2 open PRs.
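A rough sketch of the event-stream idea, for discussion only. As far as I know the GitHub REST API exposes a user's public events at GET /users/{username}/events/public, and each event carries a repo name and a created_at timestamp. The helper below assumes the events have already been fetched and decoded from JSON; the function name and the sample data are made up for illustration, and I believe the endpoint only returns fairly recent events, so a missing result does not prove inactivity.

```python
from datetime import datetime, timezone


def last_activity_in_repo(events, repo_full_name):
    """Return the most recent event time for one repo, or None if there is none.

    `events` is assumed to be the decoded JSON list from
    GET /users/{username}/events/public (the fetching itself is left out).
    """
    times = [
        datetime.strptime(e["created_at"], "%Y-%m-%dT%H:%M:%SZ").replace(
            tzinfo=timezone.utc
        )
        for e in events
        if e.get("repo", {}).get("name") == repo_full_name
    ]
    return max(times, default=None)


# Made-up sample data in the shape the API is assumed to return.
sample_events = [
    {"type": "PushEvent",
     "repo": {"name": "ansible-collections/community.mysql"},
     "created_at": "2021-03-01T10:00:00Z"},
    {"type": "IssueCommentEvent",
     "repo": {"name": "ansible-collections/community.mysql"},
     "created_at": "2021-03-05T12:00:00Z"},
    {"type": "PullRequestEvent",
     "repo": {"name": "ansible-collections/community.docker"},
     "created_at": "2021-02-01T09:30:00Z"},
]

latest = last_activity_in_repo(sample_events, "ansible-collections/community.mysql")
```

The open-PR count for the repo could then be used to weight how much an inactive result matters, but that weighting is a separate decision.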
We could probably at least infer this from who is committing (IIRC GH logs the author and the committer separately). Possibly for a later phase though.
Shower thoughts
This looks like a solid starting point, although it will likely require a new crawler at the user level. Shouldn't be too hard to write.
That said, I'm slightly wary of creating metrics for people - they tend to lead to scoreboards, system gaming, and in rare cases, blame arguments. We run a risk of accidentally creating OKRs for the community when we're really trying to create them for our team (that is, we would like to know when a human needs to reach out with an offer of help, not create a table of names to guilt the lowest entries into trying harder).
Thus, what's the plan for what we do with the data? Is it just for our team? Are we planning to only show aggregate stats here? Or will it be public per repo? (The latter essentially de-anonymises the data, because any given repo will have just a few maintainers.) I'm not against any particular outcome here, I just want to be sure we understand how the end result can be used.
My 50 cents in addition:
Do you mean just the names, in a random order? That I could support, I think. Anything more detailed would be for the community team and the steering committee (because otherwise you again have a public scoreboard that can be gamed). Otherwise, good points I think. We'll need to resolve how we feel about BOTMETA but that's likely a second phase of work anyway.
Works for me :) We could also sort them by how many commits they have made (this is already visible in our repositories, e.g. at https://github.com/ansible-collections/community.mysql/graphs/contributors). But a random order is also fine, imo.
OK, cycling around to this. I already have a
This means the date isn't updated once they stop showing up in the list of maintainers, allowing us to see how the number of maintainers is evolving across the repo set. Once we have the data being collected by the crawler, we can figure out how to display it, but to comment on this one:
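To make the "date isn't updated" behaviour concrete, here is a minimal sketch. It is not the actual crawler code: the dictionary layout and the function names are hypothetical, and it only shows the idea that a crawl refreshes the date for people currently listed and leaves everyone else alone.

```python
from datetime import date


def update_last_seen(last_seen, current_maintainers, today):
    """Stamp today's date on everyone currently listed as a maintainer.

    Entries for people who are no longer listed are left untouched, so their
    date stays at the last crawl in which they appeared.
    """
    for login in current_maintainers:
        last_seen[login] = today
    return last_seen


def count_seen_on(last_seen, day):
    """Count how many people were listed as maintainers in the crawl on `day`."""
    return sum(1 for seen in last_seen.values() if seen == day)


# Two hypothetical crawls a week apart: "bob" drops off the list in the second.
records = {}
update_last_seen(records, {"alice", "bob"}, date(2021, 3, 1))
update_last_seen(records, {"alice"}, date(2021, 3, 8))
```

After the second crawl, the stale date on "bob" is what marks them as no longer a maintainer, and counting the entries that match the latest crawl date gives the current total.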
That's true, but that's a single repo. If we're going to make it significantly easier to consume that across many repos, then we have a duty to consider how that new format might be used. Just because data already exists doesn't mean you are absolved of responsibility when you process it.
FYI: When looking for new maintainers manually, I usually use the following metrics:
Maybe it will give some ideas for how to find potential maintainers, though not all the metrics can be used in scripts, mathematical models, etc. Also, if we see people who are active but have not been around long enough to be maintainers, we should support / encourage / mentor them as future candidates. So it would be nice to have a banner saying something like "This person has opened 5 PRs in c.docker during the last week" to draw attention to such a person.
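The banner idea could be sketched roughly like this. The record shape (a dict with "author" and "created_at" keys), the function name, and the defaults of 5 PRs in 7 days are assumptions taken from the example sentence above, not an agreed metric.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def recent_pr_openers(pull_requests, now, window=timedelta(days=7), threshold=5):
    """Return {author: count} for people who opened at least `threshold` PRs
    within `window` before `now`.

    `pull_requests` is assumed to be a list of dicts with an "author" login
    and a timezone-aware "created_at" datetime (a hypothetical shape).
    """
    cutoff = now - window
    counts = Counter(
        pr["author"] for pr in pull_requests if pr["created_at"] >= cutoff
    )
    return {author: n for author, n in counts.items() if n >= threshold}


# Made-up data: "alice" opened 5 PRs this week, "bob" opened 1,
# and "carol" opened 5 but more than a month ago.
now = datetime(2021, 3, 8, tzinfo=timezone.utc)
prs = (
    [{"author": "alice", "created_at": now - timedelta(days=d)} for d in range(1, 6)]
    + [{"author": "bob", "created_at": now - timedelta(days=2)}]
    + [{"author": "carol", "created_at": now - timedelta(days=40)} for _ in range(5)]
)
candidates = recent_pr_openers(prs, now)
```

Keeping the output limited to people above the threshold, and showing it only to the team, would fit the earlier concern about not building a public scoreboard.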
What
Maintainers are a key part of the Ansible community. They are the people that can merge code (either directly or via ansibullbot)
Definition of a maintainer
Someone with triage (or higher) permissions for a specific repo in GitHub.
Active maintainer
As a 2nd phase, we may wish to track how many of these maintainers are "active".
Someone interacting with the repository in any way should count as being an active maintainer, ie:
We would want to define some time limit, though given some repositories don't have much activity, maybe this limit should be fairly high, ie 6 months+?
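The two definitions above could be sketched as follows. This assumes collaborator records in the shape I believe GET /repos/{owner}/{repo}/collaborators returns, where each entry has a permissions object of booleans (pull, triage, push, maintain, admin). The function names and the 183-day default are placeholders for whatever limit we settle on.

```python
from datetime import datetime, timedelta, timezone

# Permission levels that count as "triage or higher" for this sketch.
MAINTAINER_LEVELS = ("triage", "push", "maintain", "admin")


def is_maintainer(collaborator):
    """True if the collaborator has triage or any higher permission."""
    perms = collaborator.get("permissions", {})
    return any(perms.get(level) for level in MAINTAINER_LEVELS)


def is_active(last_activity, now, limit=timedelta(days=183)):
    """True if the last known interaction falls within the time limit.

    `last_activity` is a timezone-aware datetime, or None if nothing is known.
    """
    return last_activity is not None and now - last_activity <= limit


# Made-up collaborator entries.
triager = {"login": "alice", "permissions": {
    "pull": True, "triage": True, "push": False, "maintain": False, "admin": False}}
reader = {"login": "bob", "permissions": {
    "pull": True, "triage": False, "push": False, "maintain": False, "admin": False}}
```

Making the limit a parameter means quiet repositories could use a longer window than busy ones without changing the definition.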
Which repos
There are many collections on Galaxy.
Some of those are under gh/ansible-collections
The ansible package contains some collections from gh/ansible-collections, as well as some collections hosted elsewhere.
If we need to extract a list of maintainers, then I believe that will limit us to collections under gh/ansible-collections.
We may wish to filter this to only the collections that we include in the ansible package.
Special case repos with .github/BOTMETA.yml
As well as maintainers being defined by having direct permissions via GitHub, we use ansibullbot to delegate permissions to certain people for a specific directory (example).
Given the vast number of BOTMETA maintainers, we may wish to track this separately to Collection owners.
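A minimal sketch of pulling per-path maintainers out of a BOTMETA file, so they can be counted separately from GitHub permission holders. It assumes the file has already been parsed from YAML into a dict with a top-level files mapping, where each entry is either a mapping with a maintainers key or a bare string of logins. That layout is my understanding of the format and should be checked against real files; team macros (names starting with $) are not expanded here.

```python
def botmeta_maintainers(botmeta):
    """Map each path in a parsed BOTMETA document to a set of maintainer names.

    `botmeta` is assumed to be the dict produced by loading
    .github/BOTMETA.yml with a YAML parser (the loading is left out here).
    """
    result = {}
    for path, entry in (botmeta.get("files") or {}).items():
        if isinstance(entry, dict):
            raw = entry.get("maintainers") or []
        else:
            raw = entry or []
        # Maintainers may be a space-separated string or a list.
        names = raw.split() if isinstance(raw, str) else list(raw)
        result[path] = set(names)
    return result


# Made-up example in the assumed layout.
sample_botmeta = {
    "files": {
        "$modules/mysql_db.py": {"maintainers": "alice bob"},
        "$modules/mysql_user.py": "carol",
        "docs/": {"labels": "docs"},
    }
}
by_path = botmeta_maintainers(sample_botmeta)
all_botmeta_maintainers = set().union(*by_path.values())
```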
"Active" status is harder, as it possible a repository may go many years before Ansibullbot needs to ping a specific maintainer, ie when a PR is raised against a specific module
What would cause an increase
When we add a new collection into the ansible package, it's likely that will have some new maintainers, ie ibm.ds8000 requests a new collection repo with 8 maintainers, which are all new to the ansible-collections GitHub Org
What would cause a decrease
Presentations/questions we will ask the data