RFE: disable metadata check for a given mirror's IP block #406

Open
AdamWill opened this issue Oct 23, 2024 · 5 comments
Comments

@AdamWill

AdamWill commented Oct 23, 2024

I was talking to @thrix about mirroring issues for CI systems today. It seems that he's noticed a category of problems where CI systems hit issues with the metadata validity check system we have (the one where the metalink contains a list of 'valid' metadata checksums, and dnf hits mirrors until it finds metadata with a checksum in the list).
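The validity check described above can be modelled roughly as follows. This is a minimal sketch, not dnf's or librepo's actual code: the function name `first_valid_mirror`, the `fetch` callable, and the example URLs are all hypothetical, and SHA-256 is assumed as the checksum type.

```python
import hashlib

def first_valid_mirror(mirrors, valid_sha256, fetch):
    """Return (url, data) for the first mirror whose repomd.xml
    hashes to one of the checksums listed in the metalink."""
    for url in mirrors:
        try:
            data = fetch(url)
        except OSError:
            continue  # mirror unreachable, try the next one
        if hashlib.sha256(data).hexdigest() in valid_sha256:
            return url, data
    return None  # no mirror served metadata with a listed checksum

# Hypothetical example: one stale mirror, one current mirror.
current = b"<repomd>new</repomd>"
stale = b"<repomd>old</repomd>"
content = {
    "http://a.example/repomd.xml": stale,
    "http://b.example/repomd.xml": current,
}
valid = {hashlib.sha256(current).hexdigest()}
result = first_valid_mirror(list(content), valid, content.__getitem__)
print(result[0])
```

A CI system sitting behind a slow or stale mirror falls into the `continue`/skip path repeatedly, which is the category of problem being discussed.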

I suggested that maybe putting the CI systems in the IP block associated with a specific mirror that is close to them and known to be rapidly updated could help, but forgot that even if you're in the IP block for a mirror, dnf will still do the metadata validity check, and (IIRC) still use the "pick a mirror at random to do the metadata download" approach. So it doesn't really help with issues with the metadata validity check.

So I wondered - could we add a setting like 'hosts in the IP block use this mirror regardless', which would make that mirror the only one in the metalink response, and make the metalink response not contain the <verification> stuff, to disable the validity check (assuming dnf just skips the validity check if the metalink doesn't include that data)?

For instance I could then use this for openQA to ensure it only ever uses the internal infra mirror and skips the metadata checks. Other CI systems could use it similarly to use a known-good mirror that is close to them in network terms.
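As a rough illustration of the proposed setting, the lookup could be a table from netblock to a single mirror. This is only a sketch under assumptions: the `PINNED` table, the `pinned_mirror` function, the netblock, and the mirror URL are all made up for illustration and are not part of MirrorManager.

```python
import ipaddress

# Hypothetical table: netblock -> the only mirror to return for clients in it.
PINNED = {
    ipaddress.ip_network("10.3.0.0/16"): "http://infra-mirror.example/pub/fedora",
}

def pinned_mirror(client_ip, pinned=PINNED):
    """Return the pinned mirror URL for client_ip, or None if no block matches."""
    addr = ipaddress.ip_address(client_ip)
    for net, url in pinned.items():
        if addr in net:
            return url
    return None  # fall back to the normal mirror selection

print(pinned_mirror("10.3.4.5"))
print(pinned_mirror("192.0.2.1"))
```

When the lookup returns a URL, the metalink response would contain only that mirror and no verification data; otherwise the normal selection applies.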

@AdamWill
Author

I guess we wouldn't want to let arbitrary mirrors just configure this themselves, as a bad mirror owner could maliciously include an IP in its block and force that IP to use content from their mirror. But we could allow it to be set by MM admins for specific use cases.

@adrianreber
Member

> I suggested that maybe putting the CI systems in the IP block associated with a specific mirror that is close to them and known to be rapidly updated could help, but forgot that even if you're in the IP block for a mirror, dnf will still use a random mirror for the metadata download (IIRC).

Are you sure dnf will take a random mirror? The mirrorlist server creates a list of mirrors (metalink or mirrorlist) by first looking for private mirrors, then mirrors in the local netblock (set by the mirror admin), then same ASN, then same country, then same continent. If at this point no more than 5 mirrors (the default value) are on the result list, the mirrorlist server will add mirrors with the "always up to date" flag to the list. "Always up to date" is a flag that can only be set by a MirrorManager admin; the primary mirrors have that flag, as do (I think) the Cloudflare caching mirrors.

Each section gets a weighted shuffle by bandwidth to distribute load. We do not generate the same list twice (better: we shuffle again for each request). My expectation was that DNF takes the list and goes through it until it finds a mirror that matches the checksums from the metalink. So we definitely have different assumptions here about how DNF works. The whole private mirror setup kind of relies on the fact that DNF tries the first mirror in the list first.
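The ordering described above could be sketched like this. It is an illustration only, not the mirrorlist server's code: the section names, the `build_list` and `weighted_shuffle` functions, and the dict layout are hypothetical, and the particular weighted-shuffle method (random key raised to `1/weight`) is an assumption; the real implementation may weight differently.

```python
import random

# Sections in the order they are appended to the result list.
SECTIONS = ["private", "netblock", "asn", "country", "continent"]

def weighted_shuffle(mirrors, rng):
    """Shuffle so that higher-bandwidth mirrors tend to come first.
    Each mirror gets the key u ** (1 / bandwidth) with u uniform in [0, 1);
    sorting by key, descending, gives a weighted random order."""
    keyed = [(rng.random() ** (1.0 / m["bandwidth"]), m) for m in mirrors]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in keyed]

def build_list(candidates, always_up_to_date, min_mirrors=5, rng=random):
    result = []
    for section in SECTIONS:
        result += weighted_shuffle(candidates.get(section, []), rng)
    # Top up with admin-flagged mirrors if the list is still short.
    if len(result) <= min_mirrors:
        result += weighted_shuffle(always_up_to_date, rng)
    return result

candidates = {
    "private": [{"host": "internal", "bandwidth": 1000}],
    "country": [{"host": "c1", "bandwidth": 100}, {"host": "c2", "bandwidth": 10}],
}
fallback = [{"host": "primary", "bandwidth": 10000}]
hosts = [m["host"] for m in build_list(candidates, fallback, rng=random.Random(0))]
print(hosts)
```

Because sections are shuffled independently and then concatenated, a private mirror always sorts ahead of country-level mirrors, which only helps if the client really does try the first entry first.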

> So I wondered - could we add a setting like 'hosts in the IP block use this mirror regardless', which would make that mirror the only one in the metalink response,

That sounds doable. Overall it feels like an extension of the private mirror concept.

> and make the metalink response not contain the <verification> stuff, to disable the validity check (assuming dnf just skips the validity check if the metalink doesn't include that data)?

Not sure if a metalink works if the checksums are missing, like you said.

> I guess we wouldn't want to let arbitrary mirrors just configure this themselves, as a bad mirror owner could maliciously include an IP in its block and force that IP to use content from their mirror. But we could allow it to be set by MM admins for specific use cases.

That is no problem to do. We already have a couple of MirrorManager admin only options.

@AdamWill
Author

AdamWill commented Oct 23, 2024

My memory was that dnf always tries the first server in the list first for downloads beyond the initial repodata, but will do some kinda randomized round-robin thing for downloading the repodata. I might be wrong on that, though. I might have been remembering the randomization that applies to the creation of the list as you describe above.

I don't think it's actually terribly important to this issue, though - for the use case in question, what we really want is basically a way to make the metalink system always result in the use of a single specific mirror. If you have control of the SUT's repo config you can of course just overwrite it somehow, but often we cannot or do not want to rewrite the SUT's repo config.

edit: I'll try hand constructing a metalink of the format I'm envisioning and see if it actually works, that shouldn't be too hard.

@adrianreber
Member

From my point of view the main question is how DNF reacts if there are no checksums in the metalink. If that works, it should be doable to implement what you need.

@AdamWill
Author

AdamWill commented Nov 7, 2024

So in a quick test (sorry, I went on PTO...) this seems to work. I constructed a metalink file thus, with no timestamp, file size or verification block:

<?xml version="1.0" encoding="utf-8"?>
<metalink version="3.0" xmlns="http://www.metalinker.org/" type="dynamic" pubdate="Thu, 07 Nov 2024 22:12:57 GMT" generator="mirrormanager" xmlns:mm0="http://fedorahosted.org/mirrormanager">
 <files>
  <file name="repomd.xml">
   <resources maxconnections="1">
    <url protocol="http" type="http" location="CA" preference="100">http://fedora.mirror.iweb.com/linux/development/41/Everything/x86_64/os/repodata/repomd.xml</url>
   </resources>
  </file>
 </files>
</metalink>

and saved it as /tmp/test.metalink. I created a repo file thus:

[mltest]
name=mltest
metalink=file:///tmp/test.metalink
enabled=1
metadata_expire=1
repo_gpgcheck=0
type=rpm
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-$releasever-$basearch
skip_if_unavailable=False

Then I ran dnf --disablerepo=* --enablerepo=mltest repoquery --info gedit. It downloaded the metadata, and worked.
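For anyone wanting to repeat the test with a different mirror, a metalink of the same shape can be generated rather than hand-written. This is a minimal sketch using only the Python standard library; the function name `single_mirror_metalink` and the example URL are hypothetical, and the unused `xmlns:mm0` declaration from the hand-written file is omitted on the assumption that it does not matter when no `mm0` elements are present.

```python
import xml.etree.ElementTree as ET
from email.utils import formatdate

NS = "http://www.metalinker.org/"

def single_mirror_metalink(repomd_url, location="CA"):
    """Build a metalink with one mirror and no timestamp, size or verification block."""
    ET.register_namespace("", NS)  # serialize with a default namespace, no prefix
    root = ET.Element(f"{{{NS}}}metalink", {
        "version": "3.0",
        "type": "dynamic",
        "pubdate": formatdate(usegmt=True),
        "generator": "mirrormanager",
    })
    files = ET.SubElement(root, f"{{{NS}}}files")
    file_el = ET.SubElement(files, f"{{{NS}}}file", {"name": "repomd.xml"})
    resources = ET.SubElement(file_el, f"{{{NS}}}resources", {"maxconnections": "1"})
    url = ET.SubElement(resources, f"{{{NS}}}url", {
        "protocol": "http",
        "type": "http",
        "location": location,
        "preference": "100",
    })
    url.text = repomd_url
    return ET.tostring(root, encoding="utf-8", xml_declaration=True).decode("utf-8")

doc = single_mirror_metalink("http://mirror.example/repodata/repomd.xml")
print(doc)
```

The output can be written to a file such as /tmp/test.metalink and referenced from the repo file via `metalink=file:///...`, as in the test above.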
