Skip to content
This repository has been archived by the owner on May 14, 2024. It is now read-only.

Smaller ansible packages (no tests/ dir?) #29

Closed
Daryes opened this issue Jun 26, 2021 · 13 comments
Closed

Smaller ansible packages (no tests/ dir?) #29

Daryes opened this issue Jun 26, 2021 · 13 comments

Comments

@Daryes
Copy link

Daryes commented Jun 26, 2021

Summary

SUMMARY

Hello,
As asked, reposting this from ansible-collections/overview#177

Not to be nitpicking, but when installing the metapackage ansible 3.4.0, I was surprised by the size the docker image I built took.

To be more specific, create a virtualenv and install ansible with pip3 install --no-cache-dir ansible==3.4.0
Then :

$ cd <venv>
$ du -sh 
484M    .

$ cd lib/python3.*/site-packages/ansible_collections
$ du -sh
400M    .

# list all "tests" directories under collections and return their disk size
$ find . -name "tests" -type d -print0 | xargs -0 du -sh --total
(...)
122M    total

That was at first glance.

While I understand the need of the unit and integration tests, they can become heavy in disk space consumption.
Maybe I missed something, but is there a requirement to have them shipped in the release builds ?

ISSUE TYPE
  • Enhancement

Additional Information

No response

@gundalow gundalow changed the title More lightweight releases Smaller ansible packages (not tests/?) Jul 7, 2021
@gundalow gundalow changed the title Smaller ansible packages (not tests/?) Smaller ansible packages (no tests/ dir?) Jul 7, 2021
@felixfontein
Copy link
Contributor

felixfontein commented Jul 7, 2021

We've discussed this a bit at today's meeting. The main problem is that we're still waiting for an answer from RH legal. We'll try to speed up that process.

A side-discussion started on how to make it easier to prepare more specialized (and thus smaller) environments. A related discussion for that is #31.

(https://meetbot.fedoraproject.org/ansible-community/2021-07-07/ansible_community_meeting.2021-07-07-18.00.html)

@dmsimard dmsimard added the next_meeting Topics that needs to be discussed in the next Community Meeting label Oct 5, 2021
@dmsimard
Copy link

dmsimard commented Oct 5, 2021

I'd like to revisit this at the next meeting if we have the time and I have submitted a PR to exclude files from future builds of community.general that I would hope to propose for every collection included in the package: ansible-collections/community.general#3517

My understanding of the potential legal liabilities is that we should avoid modifying or editing built collection tarballs that are published upstream. Therein lies not just potential legal or licensing issues but security ones as well: the checksums would no longer match and thus it would be harder to validate the trustworthiness of what we ship.

However, if the files in question are excluded from the build to begin with, they would not land in the tarball and this should not be a problem.

@dmsimard
Copy link

dmsimard commented Oct 5, 2021

I've linked an additional PR to community.network for the sake of discussion.

To put things into perspective, between just community.general and community.network, we're talking about excluding over 2k files which saves 10s off of the installation duration and reduces the uncompressed size on disk from 33MB to 15.6MB.

@dmsimard
Copy link

dmsimard commented Oct 6, 2021

We unfortunately didn't get time to discuss this during the meeting though it spawned a discussion right after, copied here for posterity.

In terms of take-aways:

  • It is still unclear whether or not we can exclude tests and other files from the source tarball due to licensing. dmsimard to renew discussions with legal
  • Felix suggests we could consider leveraging setup.py specifically for the ansible package to prevent the installation on-disk of unnecessary files (such as tests)
  • Tadeboro mentioned that we can't remove the entire tests directory completely since it contains ansible-test ignore files which are expected (and to an extent, necessary) when importing a collection to galaxy or automation hub.
15:06:43 <@abadger1999> dmsimard: I think that modifying them before they are sent to galaxy is much safer.  (community.general and community.network may still be concerning because the copyright there is owned by a lot of people rather than a single company, though)
15:07:35 <@abadger1999> I see that those two are where we achieve a significant savings so I'm not sure whether we want to evaluate just how concerning it would be.
15:08:07 <felixfontein> it would be a lot easier if galaxy would allow to publish both a source + non-source version of the galaxy at the same time, I think
15:08:13 <@dmsimard> abadger1999: my personal experience does not ring me any legal alarm bells about not shipping tests and git config files
15:08:22 <@abadger1999> felixfontein: +10000
15:08:40 <felixfontein> s/the galaxy/the collection/
15:09:09 <@dmsimard> abadger1999: case in point, if you install ansible-core you don't get all the unit and integration tests
15:09:32 <@dmsimard> (there is ansible-test but not the tests themselves)
15:09:43 <@abadger1999> dmsimard: but, if you download the tarball from pypi you do get them all.
15:10:40 <@abadger1999> So pypi is definitely distributing the corresponding  source.  What you get installed might also count as corresponding source for the gpl but it might not.
15:10:44 <felixfontein> https://files.pythonhosted.org/packages/19/c0/d7d23d4bfa04857cab82d2773d68290ce16941b7511163e611d196303bf9/ansible-core-2.11.5.tar.gz contains everything, yes
15:10:52 <@dmsimard> abadger1999: ah, I see what you mean
15:11:17 <felixfontein> it also has 12 MB docs directory and a 25 MB test directory
15:11:24 <@dmsimard> there is indeed a distinction between source and installation
15:12:02 <felixfontein> right now if you pip install ansible-core, it will download the source package and create a wheel out of it, and install the wheel
15:12:23 <@dmsimard> there's still some things that aren't included from the ansible git repo in that tarball (even if just .github, for example)
15:12:30 <felixfontein> you can also publish wheels, to make the installation step quicker, but for some specific reasons ansible-core does not do that
15:12:42 <@dmsimard> felixfontein: no wheel is a long story
15:12:57 <felixfontein> dmsimard: I know :) and it's not over yet...
15:13:41 <@dmsimard> ok /me sighs, I will try to get to the bottom of this with legal
15:13:47 <@abadger1999> There's various interpretations that you could make of "corresponding source" to either say it is okay or it is not okay to remove the tests. I think if upstream of the collection is doing it, then some of those interpretations of "not okay" do not apply so it's safer.  But I don't know if legal would be concerned about any of the remaining interpretations.
15:13:51 <felixfontein> dmsimard: "enjoy" :)
15:14:35 <tadeboro> dmsimard: We do not publish tests to PyPI most of the time, but this is a bit different from what we would like to do here because we are removing something that upstream does distribute.
15:14:38 <@abadger1999> (If ansible-core publishes a wheel and the sdist tarball, everything will still be GPL compliant.  If they published a wheel and not the tarball, then there would be questions of interpretation of the GPL there as well)
15:14:44 <felixfontein> actually, could we add some code to setup.py / ... in the generated package which removes tests? then the .tar.gz would still have them, but they would not end up in the wheel and installation
15:15:48 <@dmsimard> tadeboro: The idea is that we would not remove  or edit something that upstream distributes because it would not be distributed in the first place by being excluded from ansible-galaxy collection build
15:16:15 <@dmsimard> I don't want to remove the tests from the tarballs, I would rather they not be included to begin with
15:16:22 <felixfontein> dmsimard: also your upstream probably has a smaller amount of copyright holders than say community.general :)
15:17:29 <tadeboro> dmsimard: But that would only work for things "we" publish (where we is whoever we manage to convince to stop shipping tests). And getting people to change their build processes will probably be "a lot of fun" ;)
15:17:46 <felixfontein> abadger1999: do you think having tests/docs in the tarball but removing them from the wheel by setup.py would be something that's legally ok?
15:18:24 <@dmsimard> tadeboro: unless I am missing something obvious, we're only talking about a couple lines in galaxy.yml like in https://github.com/ansible-collections/community.general/pull/3517
15:18:25 <github-linkbot> https://github.com/ansible-collections/community.general/pull/3517 | open, created 2021-10-05T20:56:25Z by dmsimard: galaxy.yml: exclude non-collection files from collection build [bug,has_issue,WIP] 
15:18:46 <@dmsimard> I would like to think that if I try very hard I can manage to land a similar patches in a couple collections
15:18:54 <@abadger1999> felixfontein: For ansible-core?  yes, that's definitely okay as long as both the tarball and wheel are uploaded to pypi.
15:19:06 <felixfontein> abadger1999: no, I mean for ansible
15:19:17 <@abadger1999> felixfontein: ah... /me thinks....
15:19:24 <felixfontein> ansible-core already excludes that stuff from the wheel
15:19:37 <@abadger1999> felixfontein: yes... I think that works.
15:20:15 <tadeboro> dmsimard: excluding the whoe tests dir is not safe. AT least not for certified collections that do need the ignore files to pass the AH import.
15:20:49 <@abadger1999> felixfontein: I cannot see any additional GPL liability in doing things that way.
15:20:50 <felixfontein> tadeboro: dmsimard: yes, I would only exclude tests/unit/ and tests/integration/
15:21:27 <@dmsimard> good point, I will make note of that
15:21:57 <felixfontein> once three are files in tests/unit / tests/integration that are in the sanity ignore files you'll get new troubles though :)
15:22:01 <@dmsimard> does galaxy importer run unit and integration tests ?
15:22:12 <@dmsimard> I mean, other than lint and sanity
15:23:08 <tadeboro> I know AH runs sanity tests. Not sure how exactly they configure galaxy importer though.
15:24:43 <felixfontein> dmsimard: you could invest some time in teaching antsibull's generated setup.py to remove all tests/ directories, resp. exclude them from the wheel build
15:25:07 <felixfontein> dmsimard: that will probably speed up installation, and will reduce installed disk size a lot
15:25:38 <@abadger1999> and having antsibull-build create a wheel as well.
15:25:48 <@dmsimard> It would be nice to do it further up the stream but yes, it's something we can consider
15:26:24 <@dmsimard> It's something that is an issue outside of the community package as well -- in that the tarballs are "artificially" inflated and take longer to install via a regular "ansible-galaxy collection install"

@dmsimard
Copy link

I've been increasing the scope of testing for 5.0.0rc1 (including packaging in fedora) and I must re-iterate that I find this particularly problematic.

The Fedora RPM testing coverage highlighted thousands of errors and warnings of which we can ignore a sizeable chunk (ansible modules are not executable) but the remaining majority are related to tests and so, so many unnecessary files (files like .gitignore, .swp, .DS_Store, etc.).

I cannot in good conscience continue to knowingly and willingly ship all of these files, even if only considering how this reflects in terms of quality and professionalism to the community.

What might this look like in practice ? My current proposal in the fedora spec includes the following (and it's not even exhaustive):

for hidden in $(find ansible_collections | grep -E ".gitignore|.github|.azure-pipelines|.swp|.orig|.DS_Store|.keep|.gitkeep|.vscode|.pytest_cache|.yamllint|.pre-commit-config.yaml|.plugin-cache.yaml|.pytest_cache|.mypy_cache|.zuul.yaml|.flake8|.circleci|.gitattributes|.ansible-lint|.travis.yml|.gitlab-ci.yml|.idea|.settings"); do
  rm -rf $hidden
  count=$((count+1))
done
for tests in $(find ansible_collections | grep -E "tests/unit|tests/integration|tests/utils|tests/sanity|tests/runner|tests/regression"); do
  rm -rf $tests
  count=$((count+1))
done

# Additional, specific directories to remove
rm -rf ansible_collections/kubernetes/core/molecule

echo "Over ${count} files and directories removed."

That short snippet returns Over 24588 files and directories removed. (see build.log) and if we count the number of remaining files in the entire package, we're looking at 15526 files so these removed files make up for more than half of the package.

On another note, from a performance standpoint, it cuts the installation time in more than half on my (relatively fast) laptop:

# rc1
real	1m7.110s
user	0m59.975s
sys	0m9.460s

# a hypothetical rc2
real	0m30.916s
user	0m27.166s
sys	0m4.888s

I would like to strongly suggest that we include this cleanup as part of the build process and spin an ansible 5.0.0rc2 package.

@gundalow
Copy link
Contributor

Currently ansible-galaxy build ignores:

  1. https://github.com/ansible/ansible/blob/devel/lib/ansible/galaxy/collection/__init__.py#L793-L803
  2. Anything listed in a collection's galaxy.yml build_ignore example

@felixfontein
Copy link
Contributor

From today's discussion:

  1. It's too late to change it for 5.0.0.
  2. We can discuss whether such a change can make it into a 5.x.0 minor release, or has to wait for 6.0.0 (and its pre-releases).

@felixfontein
Copy link
Contributor

I will look at wheel building and making sure that the extra files aren't included in the wheel (and thus in the user's install), even if they are still in the tarball. I think this is the only thing we can really do without hearing from legal.

Also for Ansible 6, there will be a larger reduce of files in community.general and community.network due to them dropping the symlinks, which currently unfortunately get duplicated by setup.py. This is a tiny reduction compared to omitting tests etc., but better than nothing :)

@felixfontein
Copy link
Contributor

Here is a WIP PR for removing files from the bdist: ansible-community/antsibull-build#342

@felixfontein
Copy link
Contributor

In any case, if we want to do this already for 5.x.0, then we need some testing, and this should be announced in time (more than a couple of weeks in advance) so that collection authors/maintainers have some time to check whether this breaks something.

(While this shouldn't happen, you could have non-module plugins which depend on data, or even Python files, inside tests/, or other directories we are excluding. Or there could be other files in plugins/ we want to exclude for some reason, but which are actually needed by a plugin.)

@Andersson007
Copy link
Contributor

this PR should remind us when we review collections for inclusion to check for unnecessary files ansible-collections/overview#196. Please take a look

@felixfontein
Copy link
Contributor

Since this will happen for Ansible 6 (with the first 6.0.0 alpha targetted for next week) I'm going to close this.

Repository owner moved this from Fresh to Resolved in Community Topics TODO Apr 6, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants