Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional S3_REGION config #846

Closed
lognaturel opened this issue Dec 12, 2024 · 16 comments
Closed

Add optional S3_REGION config #846

lognaturel opened this issue Dec 12, 2024 · 16 comments
Assignees

Comments

@lognaturel
Copy link
Member

lognaturel commented Dec 12, 2024

We originally decided not to make region configurable because minio detects, caches and uses the region when making a request against a generic storage URL.

We've now discovered that for Amazon S3, regions introduced after 2019 like af-south-1 can't use the generic URL: https://forum.getodk.org/t/using-external-aws-s3-bucket-in-region-other-than-us-east-1/51392

@github-project-automation github-project-automation bot moved this to 🕒 backlog in ODK Central Dec 12, 2024
alxndrsn pushed a commit to alxndrsn/odk-central that referenced this issue Jan 13, 2025
@alxndrsn alxndrsn moved this from 🕒 backlog to ✏️ in progress in ODK Central Jan 13, 2025
@derekmorr
Copy link

While I think this is a good idea, you can also upgrade minio-js to the current version to get the new AWS regions.

I sent in a PR to them (minio/minio-js#1363) to add the extra AWS regions. That has been incorporated into their 8.0.3 release.

@lognaturel
Copy link
Member Author

Amazing, thanks @derekmorr! Does this mean I misunderstood the issue? I thought that the problem was that the initial request against the generic S3 URL did not result in usable region information.

From your comment and PR, it sounds like that initial exchange does provide region information, minio just didn't have a way to use it to switch to the appropriate region-specific URL for subsequent requests before your PR. Does this new understanding sound right?

@derekmorr
Copy link

derekmorr commented Jan 21, 2025

Not quite. Minio's Client() construction takes an optional region parameter. If it's not passed, it looks up the region in its internal hard-coded cache. This cache only has endpoints for AWS.

That cache was out-of-date, so I sent them a PR to update it.

So there are a few ways to fix the problem:

  1. ODK can upgrade to the new version of minio. That will update minio's cache. This will let people use regions other than us-east-1 for AWS. But if they're using another S3-compatible provider, I'm not sure it will work.

  2. ODK can make the S3_REGION variable mandatory and pass it into the Client() constructor. That's what this PR does. It's the option that's least likely to cause problems.

  3. ODK could adopt a hybrid -- update minio to get the new region cache, and make the S3_REGION variable optional. For people using AWS, this should just work, and ODK admins wouldn't have to add the S3_REGION variable. But if S3_REGION is set, it would be used.

Since the ODK docs mention using S3-compatible providers other than AWS, I think you'll need to at least make S3_REGION optional. The only other option is to patch minio to support every possible S3-compatible endpoint which would be a lot of work.

@lognaturel
Copy link
Member Author

lognaturel commented Jan 21, 2025

Really appreciate the follow-up!

If it's not passed, it looks up the region in its internal hard-coded cache

That makes sense and you're right that we probably do want 3 regardless at some point no matter what.

If it's not passed, it looks up the region in its internal hard-coded cache

Looks like it gets the region code from the response to an initial request to the generic URL at https://github.com/minio/minio-js/blob/a8e1abb5e2322b9d9bf938fb560e6e330d305ccc/src/internal/client.ts#L772 and presumably this works even for the newer regions (so not affected by https://repost.aws/questions/QUIZTCFPrzQXa2WJ2VaPC2lw/buckets-created-in-af-south-1-cannot-be-accessed-through-global-endpoint)

Thanks again!

@alxndrsn
Copy link
Contributor

2. ODK can make the S3_REGION variable mandatory and pass it into the Client() constructor. That's what this PR does. It's the option that's least likely to cause problems.

I don't think that's what's happening in the current PR. The following code should allow creating a minio client both with and without the S3_REGION env var:

https://github.com/getodk/central-backend/pull/1357/files#diff-fcd170eb9d2c1b7aa9c52bb9d938d8b5f5cce5d547ce68f459699a06cda7bf0aR33-R34

@alxndrsn
Copy link
Contributor

  1. This will let people use regions other than us-east-1 for AWS.

I agree that updating odk-central's minio dependency will increase the number of s3 regions that can be auto-mapped from the S3_SERVER env var. However, I think the current list is more extensive than just us-east-1

E.g. odk-central v2024.3.1 bundles minio v7.1.3. The list for minio v7.1.3 is:

  • us-east-1
  • us-east-2
  • us-west-1
  • us-west-2
  • ca-central-1
  • eu-west-1
  • eu-west-2
  • sa-east-1
  • eu-central-1
  • ap-south-1
  • ap-southeast-1
  • ap-southeast-2
  • ap-northeast-1
  • cn-north-1
  • ap-east-1
  • eu-north-1

@alxndrsn
Copy link
Contributor

While I think this is a good idea, you can also upgrade minio-js to the current version to get the new AWS regions.

I've added an issue for this at getodk/central-backend#1376

However, there are API changes between minio-js v7 and v8 which will need to be worked through.

In future, it would be convenient if new s3 regions could be configured for the minio client without requiring a library update. I've created an issue about this at minio/minio-js#1375

@alxndrsn
Copy link
Contributor

So there are a few ways to fix the problem:

  1. ODK can upgrade to the new version of minio. That will update minio's cache. This will let people use regions other than us-east-1 for AWS. But if they're using another S3-compatible provider, I'm not sure it will work.

  2. ODK can make the S3_REGION variable mandatory and pass it into the Client() constructor. That's what this PR does. It's the option that's least likely to cause problems.

  3. ODK could adopt a hybrid -- update minio to get the new region cache, and make the S3_REGION variable optional. For people using AWS, this should just work, and ODK admins wouldn't have to add the S3_REGION variable. But if S3_REGION is set, it would be used.

This is a great summary, thanks.

I think 3 is addressed by the PRs at #846 (comment), 1 is addressed by getodk/central-backend#1376, and minio/minio-js#1375 would speed up dealing with similar issues in future.

@alxndrsn

This comment has been minimized.

@lognaturel
Copy link
Member Author

✅ ✅ s3.amazonaws.com (undefined)

You are saying that with the af-south-1 bucket you found, the generic URL works? Then why doesn't it work for the user? This should be equivalent to what we document.

✅ ✅ s3.us-east-1.amazonaws.com (undefined)

This is extremely surprising -- I could not get region-specific URLs to work. Is that because of Central's usage?

@alxndrsn
Copy link
Contributor

alxndrsn commented Jan 23, 2025

This is extremely surprising -- I could not get region-specific URLs to work. Is that because of Central's usage?

Ah no, this was because of a mistake in the test code shared above. Thanks for catching that. I've hidden the misleading results, and will re-share below.

@alxndrsn
Copy link
Contributor

alxndrsn commented Jan 23, 2025

7.1.3 8.0.3 endPoint region
s3.amazonaws.com (undefined)
s3.amazonaws.com us-east-1
s3.amazonaws.com af-south-1
s3.af-south-1.amazonaws.com (undefined)
s3.af-south-1.amazonaws.com us-east-1
s3.af-south-1.amazonaws.com af-south-1
s3.us-east-1.amazonaws.com (undefined)
s3.us-east-1.amazonaws.com us-east-1
s3.us-east-1.amazonaws.com af-south-1

test code

const minio = require('minio');
const { Client } = minio;

const minioVersion = require('minio/package.json').version

printRow(code(minioVersion), code('endPoint'), code('region'));
console.log('---|---|---');

(async () => {
  await testConfig({ endPoint:'s3.amazonaws.com' });
  await testConfig({ endPoint:'s3.amazonaws.com', region:'us-east-1' });
  await testConfig({ endPoint:'s3.amazonaws.com', region:'af-south-1' });
  await testConfig({ endPoint:'s3.af-south-1.amazonaws.com' });
  await testConfig({ endPoint:'s3.af-south-1.amazonaws.com', region:'us-east-1' });
  await testConfig({ endPoint:'s3.af-south-1.amazonaws.com', region:'af-south-1' });
  await testConfig({ endPoint:'s3.us-east-1.amazonaws.com' });
  await testConfig({ endPoint:'s3.us-east-1.amazonaws.com', region:'us-east-1' });
  await testConfig({ endPoint:'s3.us-east-1.amazonaws.com', region:'af-south-1' });

  function testConfig(cfg) {
    return new Promise(resolve => {
      try {
        const client = new Client({ ...cfg });
        stream = client.listObjects('deafrica-input-datasets');
        stream.on('error', err => {
          logResult(cfg, false);
          resolve();
        });
        stream.on('end', () => { console.log('stream.end'); });
        stream.on('data', o => {
          logResult(cfg, true);
          stream.destroy();
          resolve();
        });
      } catch(err) {
        console.log('\nouter error:', '\n  config:', JSON.stringify(cfg), '\n  error:', err.code, err.message);
      };
    });
  }
})();

function code(s) {
  return `\`${s}\``;
}

function logResult(cfg, ok) {
  printRow(ok ? '✅' : '❌', code(cfg.endPoint), cfg.region ? code(cfg.region) : '(undefined)');
}

function printRow(...cols) {
  console.log(cols.join(' | '));
}

@alxndrsn
Copy link
Contributor

Upgrading the minio dependency to 8.0.3 seems to work OK: getodk/central-backend#1378

@lognaturel
Copy link
Member Author

lognaturel commented Jan 23, 2025

Amazing! Thanks so much for the tip, @derekmorr

Let's close this issue and related PRs once getodk/central-backend#1378 is merged. We can re-open if someone needs another region-aware provider. I'd rather do that than add it right away because it would be helpful to know about the use case and see whether a REGION configuration is really the best way to address it.

@lognaturel
Copy link
Member Author

lognaturel commented Jan 24, 2025

Short-term need addressed by getodk/central-backend#1378 for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ✅ done
Development

No branches or pull requests

3 participants