Support mirror servers: custom downloading upstream #136
Comments
I hadn't thought of this at all, but it sounds like a great feature and I'd love to see something like it here as well! If there is a way that we could port or learn from what you did in jill.py, that would of course be even better :) Right now the download URLs are hard-coded in our "versions db" (which is just a JSON file), so we would need to redesign that a bit, but I don't think that would be prohibitive at all. I'm in the process of redesigning some of the logic for checking for new versions at the moment, so this issue came just in time; I'll try to design this from the get-go so that we can add mirrors down the road. Are these mirrors "official" Julia language org mirrors? If not, would it make sense to make them that? Also CCing @staticfloat.
@johnnychen94 I had actually planned to reach out at some point to discuss the relationship between jill.py and juliaup, as they obviously have very similar (if not identical) goals. I'm not sure what your current thinking on all of this is. From my end this all started as a "get Julia in the Windows Store" project and then spiraled out of control. At this point I do like that Juliaup depends on essentially nothing (because it is written in Rust), but feature-wise it is of course missing a fair bit relative to jill.py. Would it make sense to consider joining forces at some point? |
In jill.py, supporting mirror servers involves two separate, non-trivial tasks:
With multiple sources, we can no longer assume that any predefined URL is available, so before every download it needs to check that the URL is valid (e.g., send an HTTP HEAD request and see if it returns 200). And because there are multiple sources, these queries need to run asynchronously, with a timeout, so that the check stays fast.
There are technical issues in bringing mirror support to nightly versions. Not all mirror sites are willing to provide nightly builds because only a few people want them, and even the sites that do provide them are not guaranteed to have the latest nightly build. (CRef #96 (comment), johnnychen94/jill.py#17)
This reminds me of the days when I wrote jill.py: @staticfloat provided a cache service on the Julia pkg server (https://github.com/JuliaPackaging/PkgServerS3Mirror). For instance:
- https://julialang-s3.julialang.org/bin/linux/x64/1.6/julia-1.6.2-linux-x86_64.tar.gz
+ https://mirror.us-east.pkg.julialang.org/julialang2/bin/linux/x64/1.6/julia-1.6.2-linux-x86_64.tar.gz
But it seems that it's not available anymore.
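To make the availability check above concrete, here is a minimal sketch in Python. It is not how jill.py actually does it; the function names `head_ok` and `first_available` are hypothetical, and it uses a thread pool rather than true async I/O. Each candidate URL is probed with an HTTP HEAD request, all probes run concurrently, and the first URL that answers 200 within the timeout wins.

```python
import concurrent.futures
import urllib.request
from typing import Callable, Iterable, Optional


def head_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP HEAD request to `url` answers with status 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except Exception:
        # Any failure (DNS error, timeout, 404, ...) counts as "not available".
        return False


def first_available(
    urls: Iterable[str],
    probe: Callable[[str], bool] = head_ok,
    overall_timeout: float = 5.0,
) -> Optional[str]:
    """Probe all URLs concurrently and return the first one that reports OK.

    "First" means first to finish, so a fast mirror beats a slow one.
    Requires Python 3.9+ because of `cancel_futures`.
    """
    candidates = list(urls)
    if not candidates:
        return None
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(candidates))
    try:
        futures = {pool.submit(probe, url): url for url in candidates}
        try:
            for future in concurrent.futures.as_completed(futures, timeout=overall_timeout):
                if future.exception() is None and future.result():
                    return futures[future]
        except concurrent.futures.TimeoutError:
            pass  # Nothing answered in time.
        return None
    finally:
        # Do not block on probes that are still in flight.
        pool.shutdown(wait=False, cancel_futures=True)
```

The `probe` parameter is there so the selection logic can be exercised without network access, e.g. `first_available(urls, probe=lambda u: "mirror" in u)`.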
This can be an advantage and also a disadvantage. The advantage is that it depends on nothing and can run on any target system as long as the compiled binary of I like the |
We could easily extend pkg.julialang.org servers to also serve Julia version tarballs. It seems silly to have a second set of infrastructure for distributing Julia versions separate from the infrastructure we already have for distributing artifacts and package versions. @staticfloat has already worked hard at making sure that these are highly available all around the world. It would likely be a pair of HTTP endpoints along the lines of:
Maybe also something for serving up channel info if that would be helpful. Note that if someone is already setting |
We in fact already have this! The whole https://pkg.julialang.org/mirror/julialang2/bin/linux/x64/1.6/julia-1.6.4-linux-x86_64.tar.gz If you want to programmatically get a list of available versions, you just download the https://pkg.julialang.org/mirror/julialang2/bin/versions.json |
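As a rough illustration of getting the version list programmatically, here is a small Python sketch that extracts archive URLs from a versions.json document. The layout it assumes (a top-level object keyed by version, each with a `files` list whose entries carry `os`, `arch`, `kind` and `url`) is an assumption that should be checked against the real file, and the embedded sample is illustrative data, not a copy of the real file. The helper name `archive_urls` is hypothetical.

```python
import json
from typing import Dict

# Illustrative sample only; the field names are assumptions about the layout.
SAMPLE_VERSIONS_JSON = """
{
  "1.6.2": {"stable": true, "files": [
    {"os": "linux", "arch": "x86_64", "kind": "archive",
     "url": "https://julialang-s3.julialang.org/bin/linux/x64/1.6/julia-1.6.2-linux-x86_64.tar.gz"},
    {"os": "winnt", "arch": "x86_64", "kind": "installer",
     "url": "https://example.invalid/julia-1.6.2-installer.exe"}
  ]},
  "1.6.4": {"stable": true, "files": [
    {"os": "linux", "arch": "x86_64", "kind": "archive",
     "url": "https://julialang-s3.julialang.org/bin/linux/x64/1.6/julia-1.6.4-linux-x86_64.tar.gz"}
  ]}
}
"""


def archive_urls(versions_doc: dict, os_name: str, arch: str) -> Dict[str, str]:
    """Map each version to the URL of its archive for the given OS and architecture."""
    result: Dict[str, str] = {}
    for version, info in versions_doc.items():
        for entry in info.get("files", []):
            if (
                entry.get("os") == os_name
                and entry.get("arch") == arch
                and entry.get("kind") == "archive"
            ):
                result[version] = entry["url"]
                break  # One archive per version is enough here.
    return result


if __name__ == "__main__":
    doc = json.loads(SAMPLE_VERSIONS_JSON)
    for version, url in sorted(archive_urls(doc, "linux", "x86_64").items()):
        print(version, url)
```

In real use the document would be downloaded from the versions.json URL above instead of being parsed from the embedded sample.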
It's nice that we can mirror from S3 but it feels a little haphazard (what if we decide to change the layout of that bucket or stop using S3?), so what about making it an official part of the Pkg server protocol so that we guarantee that there's a way to get this info and we'll keep it working? Of course, I'm long overdue on writing up the entire protocol as an official standard, but we largely stuck to the plan laid out in the original issue. |
Agreed, I think it would be nice if eventually this all worked via the package server protocol, in particular if it was simple to just redirect everything to some mirror via one central place. |
Actually, I think the triplet version |
@johnnychen94 do you know whether ipfs works in China? I've played around with that a bit lately, and it looks quite fantastic and could be a really efficient and simple solution for this problem? A very simple implementation could be that we add a configuration flag that makes |
I'm not sure. My understanding is that we should try to follow the Pkg protocol and keep using HTTP. The non-technical issue with IPFS in China is that an IPFS node can store anything, including sensitive content that the government doesn't want. Thus I think the traditional mirror sites at universities (e.g., TUNA) won't run an IPFS node. And using anonymous IPFS nodes might be unstable; I have never used them, so I'm really unsure. |
Ok, so I looked a bit more into this, and I think in the short term we can do the following: we just introduce a new env variable called So that would mean that if there is a mirror that simply reflects the same file structure that we have on S3, it should work, it should also work with the mirror service from the existing package server, and it actually can also work with ipfs, the URL then would just be There are a few, not very involved, things we need to do to make this work:
And I think that is pretty much it? We can then still think about some fancier story down the road along the lines of what @johnnychen94 has done with a hosted server database etc. |
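The short-term plan above, an environment variable that swaps the download base URL, could look roughly like the following Python sketch. The variable name `EXAMPLE_JULIA_MIRROR_BASE` and the function `resolve_download_url` are hypothetical placeholders, not the names proposed in this thread, and juliaup itself is written in Rust, so this only illustrates the idea. It assumes the mirror reflects the same file structure as the S3 bucket.

```python
import os
from typing import Mapping

# Default upstream, taken from the S3 URLs quoted earlier in this thread.
DEFAULT_BASE = "https://julialang-s3.julialang.org"

# Hypothetical variable name, used only for this illustration.
MIRROR_ENV_VAR = "EXAMPLE_JULIA_MIRROR_BASE"


def resolve_download_url(relative_path: str, environ: Mapping[str, str] = os.environ) -> str:
    """Join the configured base URL (mirror if set, default otherwise) with a relative path."""
    base = environ.get(MIRROR_ENV_VAR) or DEFAULT_BASE
    return base.rstrip("/") + "/" + relative_path.lstrip("/")


if __name__ == "__main__":
    print(resolve_download_url("bin/linux/x64/1.6/julia-1.6.2-linux-x86_64.tar.gz"))
```

Storing only relative paths such as `bin/versions.json` in the versions database, instead of absolute URLs, is what makes a single base-URL override sufficient in this sketch.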
Sounds like a good plan to me. |
That's a name, not a hash value. Specifically, when we identify content by hash, you can inherently check that you have the right version—just compute the tree hash of the content you got and it should be the same as the hash you requested. That obviously can't be done with that file name. |
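To illustrate the difference between a name and a content hash, here is a minimal Python sketch of the check described above. It uses a plain SHA-256 digest of the downloaded bytes as a stand-in for a tree hash, which is a simplification, and the function name `matches_requested_hash` is hypothetical.

```python
import hashlib


def matches_requested_hash(content: bytes, requested_sha256_hex: str) -> bool:
    """Return True if the SHA-256 of `content` equals the hash that was requested."""
    actual = hashlib.sha256(content).hexdigest()
    return actual == requested_sha256_hex.strip().lower()


if __name__ == "__main__":
    payload = b"pretend this is a downloaded tarball"
    requested = hashlib.sha256(payload).hexdigest()
    print(matches_requested_hash(payload, requested))              # True
    print(matches_requested_hash(payload + b"tamper", requested))  # False
```

With a file name alone there is nothing comparable to check against: any server could return any bytes under that name.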
One of the main reasons I built jill.py is that I live in China, where AWS S3 and GitHub are quite unstable and often unreachable. We have a few mirror sites in China, and downloading binaries from them is generally much faster and more stable for users here. jill.py also has a built-in mechanism to download binaries from the nearest server (measured by RTT), and it uses GPG verification to ensure the downloads can be trusted.
This can also speed up Julia setup in CI environments (e.g., self-hosted GitLab), where you can point your runners at a Julia binary mirror on the LAN instead of the Julia S3 storage.
Is there any plan to support this?