Usage statistics #24

Open
abadger opened this issue Mar 12, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

@abadger

abadger commented Mar 12, 2021

We're making a lot of changes to how end users get ansible (collections, execution environments) and what ansible is compatible with (python version, collections not supporting all the same versions of python as the controller). It would be good to gather stats on end user usage over time to see if relative usage changes as we make these changes.

  • When we move to python3.8+ for ansible, will end users migrate to the new version of ansible or will they be stuck on the old versions?
  • When we moved to collections, was there a dropoff in the number of people using the ansible package? Did those people move to using ansible-base instead?
  • Is there a sweet spot for support in how long before users upgrade to a new version of ansible?

Some possible sources of information:

  1. PyPI stats
  2. GitHub Issues talking about ansible-version
  3. GitHub Issues listing Python Sitepackage dir which includes versions
  4. Web traffic stats for specific versions
  5. PPA stats
  6. GitHub issues that mention OS (as we know what default Python is on an OS)
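
For source 1, a rough sketch of how the download counts could be pulled, assuming the public PyPI download table on BigQuery (`bigquery-public-data.pypi.file_downloads`) and its `file.project`, `details.python` and `timestamp` columns. The helper name and the date range in the usage note are made up for illustration; the function only builds the SQL text, and running it needs a BigQuery client and a billing project.

```python
from datetime import date

# Assumption: the public PyPI download table on BigQuery.
PYPI_TABLE = "bigquery-public-data.pypi.file_downloads"


def build_monthly_downloads_query(projects, start, end):
    """Build SQL counting downloads per project, month and Python minor version.

    Hypothetical helper: `projects` must be trusted names, since they are
    interpolated directly into the query text.
    """
    project_list = ", ".join(f"'{p}'" for p in projects)
    return f"""
SELECT
  file.project AS project,
  DATE_TRUNC(DATE(timestamp), MONTH) AS month,
  REGEXP_EXTRACT(details.python, r'^\\d+\\.\\d+') AS python_minor,
  COUNT(*) AS downloads
FROM `{PYPI_TABLE}`
WHERE file.project IN ({project_list})
  AND DATE(timestamp) BETWEEN '{start.isoformat()}' AND '{end.isoformat()}'
GROUP BY project, month, python_minor
ORDER BY month, project, python_minor
""".strip()
```

Something like `build_monthly_downloads_query(["ansible", "ansible-base"], date(2020, 9, 1), date(2021, 3, 1))` would give one query covering both packages. Restricting the date range matters because the amount of data scanned grows with the range.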
@gundalow gundalow added the enhancement New feature or request label Mar 12, 2021
@GregSutcliffe
Contributor

GregSutcliffe commented Mar 15, 2021

Thanks @abadger. Some thoughts:

  • Most of these are going to require knowledge of when we made a change, so that we can analyse if there's a before/after difference. Are we capturing these already (to my knowledge, no, but I could be wrong), and if not, do you have thoughts on a good place/format to store them?
  1. I suspect forecasting if users will upgrade to a given version of Python is a tough problem to solve.
  2. (& 3) These other two seem tractable, at first glance
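
One possible shape for the before/after analysis, as a minimal sketch: keep a daily series per metric and compare the mean in a fixed window on either side of a recorded change date. The function name, the window length and the data layout are assumptions for illustration, not something we have today.

```python
from datetime import date, timedelta
from statistics import mean


def before_after_means(daily, change_day, window_days=28):
    """Return (mean before, mean after) around change_day, or None if a side is empty.

    `daily` maps a date to a count (e.g. downloads that day). The window
    covers `window_days` days before the change and `window_days` days
    starting on the change day itself.
    """
    window = timedelta(days=window_days)
    before = [v for d, v in daily.items() if change_day - window <= d < change_day]
    after = [v for d, v in daily.items() if change_day <= d < change_day + window]
    if not before or not after:
        return None
    return mean(before), mean(after)
```

This only shows a difference in means; it says nothing about whether the change caused it, so seasonal effects and release announcements would still need to be considered.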

On sources:

  1. PyPI Stats goes back 6 months, and getting everything we want from longer ago might require more than we can get from BigQuery (one query for daily data, ansible-base, for a range of 5-12 months ago, is 900 MB). Are some of our dates older than that?
  2. Does ansible-version imply a specific Python version? I would expect each release to support a range...
  3. That seems achievable with some text parsing of issue bodies. Is this over collections? core? other repos?
  4. I'll see what I can get, although last time I looked it was overwhelmingly "/latest/"
  5. Who has access to that? @dericcrago?
  6. Seems like a tricky parsing problem over a large number of issues. Definitely last on the list.
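
For sources 2 and 3, the text parsing could look roughly like the sketch below. It assumes issue bodies contain pasted `ansible --version` output with an `ansible X.Y.Z` line, a `python version = ...` line and a module location path under `site-packages`; the regexes and the function name are illustrative and would need checking against real issues.

```python
import re

# Assumed shapes of lines in pasted `ansible --version` output.
ANSIBLE_RE = re.compile(r"^\s*ansible(?:-\w+)?\s+(\d+\.\d+(?:\.\d+)?)", re.MULTILINE)
PYTHON_RE = re.compile(r"python version\s*=\s*(\d+\.\d+(?:\.\d+)?)")
SITE_RE = re.compile(r"python(\d+\.\d+)/(?:site|dist)-packages")


def extract_versions(body):
    """Return the ansible and Python versions found in an issue body, or None for each."""
    ansible = ANSIBLE_RE.search(body)
    # Prefer the explicit "python version =" line; fall back to the
    # interpreter version embedded in the site-packages path.
    python = PYTHON_RE.search(body) or SITE_RE.search(body)
    return {
        "ansible": ansible.group(1) if ansible else None,
        "python": python.group(1) if python else None,
    }
```

The fallback to the site-packages path only yields a minor version (for example `3.8`), so results from the two patterns are not directly comparable without truncating the full version.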

Most of this is reasonable. I think the real challenge here is defining how we combine all of these to get an estimate of the proportion a particular release (ansible or python) makes of the whole, and especially an estimate of how much of the wider community we think we're capturing (to assess validity). I guess we can get some initial data and start figuring out if/how they are compatible.
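
As a first pass at the compatibility question, each source could be reduced to per-release shares and the sources compared against each other. This is only a sketch under the assumption that every source can be turned into a mapping of release to count; the function names are hypothetical.

```python
def version_shares(counts):
    """Convert raw counts per release into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {version: n / total for version, n in counts.items()}


def max_share_gap(shares_a, shares_b):
    """Largest absolute difference in share for any release across two sources."""
    versions = set(shares_a) | set(shares_b)
    return max(
        (abs(shares_a.get(v, 0.0) - shares_b.get(v, 0.0)) for v in versions),
        default=0.0,
    )
```

A large gap between, say, PyPI downloads and issue reports would suggest the sources are sampling different populations, which is useful to know before trying to combine them.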

@gundalow
Contributor

gundalow commented Apr 9, 2021

If we need to pay for BigQuery, then we can do that. It's easier and cheaper than spending our time on workarounds.
