git history really big, consider making new repos for each semester #2

Open
SamLau95 opened this issue Dec 19, 2016 · 9 comments

Comments

@SamLau95
Contributor

Looks like the .git folder is really big:

$ du -sh data8assets/.git
1.1G    data8assets/.git

This is probably because git keeps the history of adding / removing big datasets.
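One way to check this guess is to list the largest blobs still reachable from any ref. The sketch below is self-contained: it builds a throwaway repo, commits a hypothetical `big_dataset.csv`, deletes it in a second commit, and shows that the blob is still in history. The file name, size, and demo identity are assumptions for illustration, not taken from data8assets.

```shell
#!/bin/sh
set -eu

# Throwaway repo so the sketch runs anywhere (names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q .
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Commit a ~1 MB "dataset", then delete it in a later commit.
head -c 1048576 /dev/urandom > big_dataset.csv
g add big_dataset.csv
g commit -q -m "Add dataset"
g rm -q big_dataset.csv
g commit -q -m "Remove dataset"

# List the largest blobs reachable from any ref, biggest first
# (size in bytes, then the path the blob was committed under).
largest=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk '$1 == "blob" { print $2, $3 }' |
  sort -rn | head -n 5)
echo "$largest"
```

Running the same pipeline inside a real clone (without the scratch-repo setup) should point at the datasets responsible for most of the size.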

Students had trouble downloading their work because tar archived the .git folder too. For now, I've told them to run tar on the data8assets/materials folder instead (~150 MB).

Since git clone downloads the full history, next semester we'll have 500 students each downloading a >1 GB repo, which seems wasteful.
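For comparison, here is a minimal sketch of how much history a default clone pulls versus a shallow one. A shallow clone is not one of the options proposed in this issue; it only illustrates the "full download of the history" point. The scratch repo, file names, and demo identity are assumptions.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Stand-in source repo with three commits.
git init -q "$work/src"
for i in 1 2 3; do
  echo "revision $i" > "$work/src/data.txt"
  g -C "$work/src" add data.txt
  g -C "$work/src" commit -q -m "Revision $i"
done

# A plain clone copies every commit.
git clone -q "file://$work/src" "$work/full"
# --depth 1 fetches only the latest commit. It needs a URL here,
# because --depth is ignored when cloning from a plain local path.
git clone -q --depth 1 "file://$work/src" "$work/shallow"

full_commits=$(git -C "$work/full" rev-list --count HEAD)
shallow_commits=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full clone: $full_commits commits, shallow clone: $shallow_commits"
```

Against a hosted repo this would be `git clone --depth 1 <repo-url>`. Whether shallow clones fit the course setup is not something this thread settles.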

We can avoid this by making a new repo for each semester (materials_sp17, materials_fa17, etc.), or by squashing the history of this repo. I prefer the first option, since it avoids rewriting git history, which would make git pull more complicated for the staff.
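A minimal sketch of the first option, assuming the new semester repo should start from the current files with no history at all. The `old` repo and `lab*.txt` files are stand-ins; only the `materials_fa17` name comes from the proposal above.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Stand-in for the existing repo: three commits of history.
git init -q "$work/old"
for i in 1 2 3; do
  echo "lab $i" > "$work/old/lab$i.txt"
  g -C "$work/old" add .
  g -C "$work/old" commit -q -m "Add lab $i"
done

# New semester repo: export only the files at HEAD, leaving history behind.
mkdir "$work/materials_fa17"
git -C "$work/old" archive HEAD | tar -xf - -C "$work/materials_fa17"
git init -q "$work/materials_fa17"
g -C "$work/materials_fa17" add .
g -C "$work/materials_fa17" commit -q -m "Import materials for fa17"

old_commits=$(git -C "$work/old" rev-list --count HEAD)
new_commits=$(git -C "$work/materials_fa17" rev-list --count HEAD)
echo "old repo: $old_commits commits, new repo: $new_commits commit"
```

Because the new repo never contained the old objects, there is nothing to rewrite or force-push later.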

@yuvipanda

We totally should do this. We could archive this repo and then have a new repo here with just the squashed history, or have a separate repo per semester as @SamLau95 mentioned. I think having each student check out the full repo is going to cause problems with the fall infrastructure.

@SamLau95 @papajohn objections to doing this?

@SamLau95
Contributor Author

SamLau95 commented Aug 8, 2017

@yuvipanda Sounds like a great idea!

@papajohn
Contributor

papajohn commented Aug 8, 2017 via email

@vinitra-zz

On it! @papajohn

@papajohn
Contributor

papajohn commented Aug 14, 2017 via email

@yuvipanda

Thanks @vinitra. I see the new materials repo, and it's at 90M, which is much better! It looks like you deleted the old files in a commit: data-8/materials-fa17@ae3fcd3. Since git keeps all history, deleting files this way doesn't actually reduce the size of the repo! You should use something like https://help.github.com/articles/removing-sensitive-data-from-a-repository/. Let's do that before classes start. I'm happy to spend some time with you to get it done.
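To illustrate the difference between deleting a file in a commit and removing it from history, here is a sketch on a throwaway repo. It uses `git filter-branch` only because it ships with git; git's own documentation now recommends `git filter-repo` instead, and this is not necessarily what was run on materials-fa17. The file names and demo identity are assumptions.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Commit a small file plus a ~1 MB "dataset", then delete the dataset.
echo "notes" > notes.txt
head -c 1048576 /dev/urandom > big_dataset.csv
g add .
g commit -q -m "Add notes and dataset"
g rm -q big_dataset.csv
g commit -q -m "Delete dataset"

# Reports whether any ref still reaches a blob committed under that path.
in_history() {
  if git rev-list --objects --all | grep -q big_dataset.csv; then
    echo yes
  else
    echo no
  fi
}
before=$(in_history)   # the delete commit alone leaves the blob in history

# Rewrite every commit so the file was never added.
export FILTER_BRANCH_SQUELCH_WARNING=1
g filter-branch --prune-empty \
  --index-filter 'git rm -q --cached --ignore-unmatch big_dataset.csv' \
  -- --all >/dev/null

# Drop the backup refs and reflog entries that still point at the old
# commits, then repack so the unreachable blob is actually deleted.
git for-each-ref --format='%(refname)' refs/original/ |
  while read -r ref; do git update-ref -d "$ref"; done
git reflog expire --expire=now --all
git gc -q --prune=now

after=$(in_history)
echo "dataset in history before rewrite: $before, after: $after"
```

On a shared repo the rewritten branches then have to be force-pushed, which is why existing clones need attention afterwards.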

@vinitra-zz

@yuvipanda thanks for the insight! I think I fixed it... can you confirm?

@yuvipanda

Nope, it's still there: data-8/materials-fa17@51bbdfd

@yuvipanda

@vinitra I've force-pushed a cleaned-up history. People who had local clones might have to re-clone. Let me know if anyone runs into issues and I can help.
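Re-cloning is the simplest fix. For a clone with no local work worth keeping, an alternative is to fetch and hard-reset onto the rewritten branch. The sketch below simulates this with scratch repos; the `staff`, `student`, and `remote.git` names are hypothetical, and `reset --hard` discards any local commits and uncommitted changes.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Staff repo with one commit, published to a bare "remote".
git init -q "$work/staff"
echo "lab 1" > "$work/staff/lab1.txt"
g -C "$work/staff" add lab1.txt
g -C "$work/staff" commit -q -m "Add lab 1"
git clone -q --bare "$work/staff" "$work/remote.git"
git -C "$work/staff" remote add origin "$work/remote.git"

# Someone clones before the history is rewritten.
git clone -q "$work/remote.git" "$work/student"

# Staff rewrites history and force-pushes it.
g -C "$work/staff" commit -q --amend -m "Add lab 1 (history rewritten)"
git -C "$work/staff" push -q --force origin HEAD

# The old clone fetches and moves its branch onto the rewritten history.
branch=$(git -C "$work/student" symbolic-ref --short HEAD)
git -C "$work/student" fetch -q origin
git -C "$work/student" reset -q --hard "origin/$branch"

staff_head=$(git -C "$work/staff" rev-parse HEAD)
student_head=$(git -C "$work/student" rev-parse HEAD)
echo "staff: $staff_head"
echo "student: $student_head"
```

A plain `git pull` in the old clone would instead try to merge the old and new histories, which can bring the removed files back.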


4 participants