Terraform sparse checkout module #19277
Comments
Hi @rverma-nikiai! Thanks for sharing this use case. The git module source actually already supports a syntax for selecting a sub-path from a repository, like this:
module "dynamo-auto" {
  source = "git::https://github.com/cloudposse/terraform-aws-dynamodb-autoscaler.git//modules/dynamo?ref=master"
}
The extra
That then just leaves the request for shallow cloning. The git handling is all done by a component which parses only the
module "dynamo-auto" {
  source = "git::https://github.com/cloudposse/terraform-aws-dynamodb-autoscaler.git//modules/dynamo?ref=master&depth=1"
}
However, I think Terraform's module installer does a full clone by default just because when it was written the "shallow clone" functionality was relatively new and limited, and we wanted to be sure of proper behavior on subsequent commands such as upgrading the module, which requires running operations like
I think we should investigate whether the improved shallow clone behavior added in Git 1.9 (now several years old) is featureful enough that we could enable shallow cloning by default in a future release, since the module installer's goal is always to install just the single version you requested, rather than to create a fully-fledged development environment for that repository.
Before making that decision, we'll need to prototype it to make sure the upgrading behavior is well-behaved after a shallow, single-branch clone.
We are in the early stages of planning some other changes to how Terraform manages configuration dependencies for a future release, so I'm going to label this one to remind us to consider this use case as part of that work. |
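To make the source syntax discussed above concrete, here is a minimal Python sketch of how such a string could be split into repository URL, sub-path, `ref` and `depth`. It is only an illustration: `split_module_source` is a hypothetical helper, not the code Terraform or go-getter actually uses, and it assumes a URL-style source with an optional `git::` prefix.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def split_module_source(source: str) -> dict:
    """Split a git module source string into its parts (illustrative only)."""
    # Drop the "git::" prefix if present (assumed format).
    if source.startswith("git::"):
        source = source[len("git::"):]

    parts = urlsplit(source)

    # Everything after the first "//" in the URL path selects a sub-directory.
    repo_path, _, subdir = parts.path.partition("//")

    query = parse_qs(parts.query)
    depth = query.get("depth", [None])[0]

    return {
        "repo": urlunsplit((parts.scheme, parts.netloc, repo_path, "", "")),
        "subdir": subdir,
        "ref": query.get("ref", [None])[0],
        "depth": int(depth) if depth is not None else None,
    }
```

Called on the `depth=1` example above, this would separate the repository URL from `modules/dynamo` and report `ref` as `master` and `depth` as `1`; a source without `depth` yields `None` for that field.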
@apparentlymart, though the Terraform module source supports a submodule path like
It cloned the whole repo and referenced the path, which is useful locally. It still defeats the purpose of sparse checkout, which provides various benefits. Just some thoughts as
Currently this will cause 3 complete clones of possibly in the init step we can prebuild the sparse-checkout info, resulting in 1 clone of just 3 repos only. Anyway, shallow cloning would be a huge improvement on its own. |
Hi again @rverma-nikiai! Thanks for the additional context. It looks like you are interested in several slightly different (but related) problems here:
The first of these has already been addressed in master and will be included in the forthcoming v0.12.0 release: Terraform will now detect that all of these are coming from the same repository and only run
Point 2 I think we can solve after we do some testing to make sure that
Point 3 here is intentional because in a multi-module repository the different modules will often refer to one another with references like
Point 4 is another one we can address eventually. For v0.12.0 we've switched to a directory naming scheme that reflects the module names in source code so that error messages (which now contain source location references) are more easily understandable. The new mechanism I mentioned for point 1 doesn't yet address this, since we wanted to keep things relatively simple for the first pass, but that mechanism could also potentially use additional techniques to share the files on disk between multiple copies of the same source. We intend to investigate that further in a later release.
In order to keep things focused, let's say that this issue is about the second point, since I think that's the one that is in most need of some further study/prototyping. I expect we will also make a separate issue for the 4th point at a later date, once we've got some experience with this new download optimization fix in v0.12 and can potentially address any other concerns related to it at the same time.
Point 1 here was originally discussed in #11435, which is now closed due to the fix being ready for release. |
@apparentlymart Now that hashicorp/go-getter#140 has been merged, any chance we can get terraform's vendoring updated to add support for shallow clones? I'm happy to open a PR including a docs update, I just need to know which target branch would be most appropriate at the moment. |
It looks like #20411 updated the go-getter version to include the shallow clone functionality. It is in 0.12 beta1. I'm looking forward to using this in 0.12, thanks! We can probably close this issue out. |
Since that other PR wasn't intentionally updating I'd also still like to investigate whether that option is necessary at all or if we can just make that behavior the default. Since we're not cloning the repository for development it seems unnecessary to produce a fully-functioning work tree by default, and in the rare case where someone does want to work directly with the cloned repository in |
Ah you're right I forgot about documentation, and I would fully support using depth=1 by default. I can't think of any reasonable situations where a user would need the full history in |
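For reference, a shallow clone at the level of the git CLI might look like the command built below. This is a sketch under assumptions: `shallow_clone_command` is a hypothetical helper, not what go-getter runs, and it assumes `ref` is a branch or tag name, since `git clone --branch` does not accept a raw commit hash.

```python
from typing import List, Optional


def shallow_clone_command(
    repo: str, dest: str, ref: Optional[str] = None, depth: int = 1
) -> List[str]:
    """Build the argv for a shallow, single-branch `git clone` (illustrative)."""
    if depth < 1:
        raise ValueError("depth must be a positive integer")

    # --depth truncates history to the given number of commits;
    # --single-branch (already implied by --depth) is spelled out for clarity.
    cmd = ["git", "clone", "--depth", str(depth), "--single-branch"]

    # Assumption: ref names a branch or tag, not a commit hash.
    if ref is not None:
        cmd += ["--branch", ref]

    return cmd + [repo, dest]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Whether later operations such as upgrading the module behave well against a clone made this way is exactly the open question in this thread.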
Surely the source can be cached and put in a lookup during execution. If the lookup contains the same URL key, just use the cached copy; if not, fetch it and add it to the cache. This surely can't be hard to do compared with downloading an identical copy from the internet over and over each time :/ |
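The lookup described above could be sketched roughly as follows. `SourceCache` and the `fetch` callable are hypothetical names for illustration; this is not how Terraform's module installer is implemented. The sketch uses the full source string (including any query such as `ref`) as the key and ignores cache invalidation.

```python
from typing import Callable, Dict


class SourceCache:
    """Remember where each source URL was downloaded to (illustrative only)."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        # fetch(url) downloads the source and returns its local path.
        self._fetch = fetch
        self._paths: Dict[str, str] = {}

    def get(self, url: str) -> str:
        # Download only on the first request for a given URL key;
        # later requests reuse the stored local path.
        if url not in self._paths:
            self._paths[url] = self._fetch(url)
        return self._paths[url]
```

With this shape, three modules that share one source string trigger a single download, while a different `ref` counts as a different key and is fetched separately.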
Is there a GitHub issue tracking the following point?
I'm using the terraform-google-modules/gcloud/google module which ends up downloading hundreds of megabytes of history since the github repo contains |
That is the way Git works: it is distributed, so it copies all history locally. There is no way around this the first time. |
It would be really nice to have shallow clone as default option for cloning Terraform modules. @apparentlymart can you tell whether there are any plans to implement it anytime soon? |
Nobody on the Terraform team at HashiCorp is currently working on this, because our attentions are currently elsewhere. As I mentioned before, the main trick here is making sure that shallow clone won't break the ability for
If the above is fruitful and it seems like making shallow clone the default would work, I expect the final change to
If someone is interested in working on this but needs some more guidance, please let me know what specific questions you have and I can try to answer them as best I can with what I know already. |
Since #11435 is closed, I'd like to go slightly off-topic here and share a small pre-Terraform utility that optimizes init (module download) for git modules: https://github.com/hayorov/terraform-init-booster |
Can we have some news? |
It seems that we have the beginnings of a fix here: #10703 |
About Point 3. With 100 modules per repo, it will replicate it 100 times. Why not, for example, use links instead of copying 100 modules 100 times? |
I wanted to reference modules living in a mono repo. But boy, this is no fun. |
If all of your source code is in the same repository then you can use relative paths (starting with either |
@apparentlymart But then we lose the possibility to version things, don't we? Or is there a trick? |
Indeed, when I hear "monorepo" I tend to understand that to mean "big bag of everything all versioned together as a single unit", so I was assuming that independent module versioning is not a requirement. If you do want to have separate versions for each of your modules then indeed it'd be best to use multiple smaller repositories to represent that, so that the boundaries between the packages are clear. |
You are absolutely right. Nevertheless, I was hoping to leverage versioning to do things like promoting things from testing, to staging, to prod.
Yeah, I guess we should simply move the TF code out and call it a day 👍 |
You can use git tags to pull off versioning inside a monorepo. So for each module in our modules folder we autogenerate a semver tag that looks like this:
It is not ideal though, because if you do this a lot in a monorepo you get hundreds of local copies of the modules, cloning the repo you are already working inside of over and over again, which is kind of what this issue was opened to fix. The better approach is to just not use a monorepo, but that's simply not allowed in the organization we are in. Compliance, security, and strange engineering leadership can all create bad and confusing rules, and we have to work around them. |
Gave this a try to improve Registry module fetching, but the current
Related: hashicorp/go-getter#510 |
Current Terraform Version
Use-cases
Sparse checkout of Terraform modules, with the ability to specify clone depth. While Terraform suggests one module per repo, some orgs prefer to manage multiple related modules together. This also gives a faster feedback cycle, keeps related pull requests in one repo, etc.
Attempted Solutions
Couldn't find anything relevant.
Proposal
Possibly we can evolve source, keeping backward compatibility, as
and also
which would allow a sparse checkout of /modules/dynamo as the relevant Terraform module.