[Issue #3094] Use CDN URLs for attachments instead of S3 URLs #3659

mikehgrantsgov merged 10 commits into main from mikehgrantsgov/3094-use-cdn-urls-for-attachments on Feb 5, 2025

mikehgrantsgov
Collaborator

Summary

Fixes #3094

Time to review: 10 mins

Changes proposed

Branch the attachment URL logic on whether CDN_URL is set. If it is set, use CDN_URL to replace s3://<foo>/<path_to_file>

Context for reviewers

Instead of changing actual values in the DB, evaluate at run-time whether CDN_URL is set and, if so, use that value instead of presigning S3 URLs.
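
The run-time branch described here can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Attachment`, `set_download_paths`, the `download_path` field, the `bucket_prefix` argument, and the `presign` callable are all hypothetical names introduced for the example. The stored `file_location` is left untouched; only the value handed back to the caller changes.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class Attachment:
    # Stored value stays an s3:// URI; it is never rewritten in the database.
    file_location: str
    # Hypothetical field, filled in per request before the response is built.
    download_path: str | None = None


def set_download_paths(
    attachments: list[Attachment],
    cdn_url: str | None,
    bucket_prefix: str,
    presign: Callable[[str], str],
) -> None:
    """Choose CDN or presigned URLs at run time, depending on cdn_url."""
    for attachment in attachments:
        if cdn_url is not None:
            # CDN configured: swap the bucket prefix for the CDN base URL.
            attachment.download_path = attachment.file_location.replace(
                bucket_prefix, cdn_url, 1
            )
        else:
            # No CDN configured: fall back to presigning (stubbed by the caller).
            attachment.download_path = presign(attachment.file_location)
```

Passing `presign` in as a callable keeps the sketch free of any AWS dependency; in a real service it would wrap whatever presigning helper the project already uses.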

Additional information

One thing to figure out: AttachmentConfig relies on *.env files being set, but doesn't appear to work when setting ENV vars via Monkeypatch.

api/src/util/file_util.py (two outdated review threads, resolved)
def test_get_opportunity_returns_cdn_urls(
    client, api_auth_token, monkeypatch_session, enable_factory_create, db_session, mock_s3_bucket
):
    monkeypatch_session.setenv("CDN_URL", "https://cdn.example.com")
Collaborator

Could we set this in some autouse conftest value? Otherwise we'll need to set it for every API test with attachments. Fine to override it here.

Also we should set CDN_URL in local.env to be localhost:4566 and test that this works locally as well?

Collaborator Author

Tried this locally and received:
[Screenshot 2025-01-28 at 2:48:17 PM]

This would be the correct path if <s3_path> were replaced with <cdn_path>.

Collaborator

If you go to that path, does the file download work? I think for local, it also needs the bucket name (ie. CDN_URL should include the bucket for our local dev purposes).

Collaborator Author

I added the bucket name to CDN_URL in local.env and it works. Checking that in shortly.

@mikehgrantsgov marked this pull request as ready for review on January 28, 2025 19:54

class AttachmentConfig(PydanticBaseEnvConfig):
    # If the CDN URL is set, we'll use it instead of pre-signing the file locations
    cdn_url: str | None = None
Collaborator

This doesn't need to be in capslock?

Collaborator

Pydantic will look for CDN_URL as an env var. We sometimes also give values an alias if we want a cleaner python name, but doesn't matter too much here.

if not is_s3_path(file_path):
    raise ValueError(f"Expected s3:// path, got: {file_path}")

return file_path.replace(os.environ["PUBLIC_FILES_BUCKET"], cdn_url)
Collaborator

Have you tested that this URL is exactly right?

Collaborator Author

Yes, we added tests to cover a few scenarios: https://github.com/HHS/simpler-grants-gov/pull/3659/files#diff-e284f21c2216feb0f0612d0cbed5d770d98a4fccb1591714ce50bb9dc47910d0R214

Also, the S3 -> CDN replacement only happens for files in the public files bucket.
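
For readers following along, here is a rough sketch of the replacement behaviour discussed in this thread. It is an illustration under stated assumptions rather than the PR's code: `s3_to_cdn_url` and its parameters are hypothetical, `is_s3_path` is assumed to be a simple prefix check, and the explicit public-bucket guard is a stricter variant added for the example (the snippet under review relies on `str.replace` alone).

```python
def is_s3_path(path: str) -> bool:
    # Assumed implementation: the real helper is not shown in this thread.
    return path.startswith("s3://")


def s3_to_cdn_url(file_path: str, public_bucket: str, cdn_url: str) -> str:
    """Swap the public bucket prefix (e.g. "s3://my-bucket") for the CDN base URL."""
    if not is_s3_path(file_path):
        raise ValueError(f"Expected s3:// path, got: {file_path}")

    bucket_prefix = public_bucket.rstrip("/") + "/"
    if not file_path.startswith(bucket_prefix):
        # Hypothetical guard: only public-bucket files are eligible for CDN URLs.
        raise ValueError(f"Not in the public files bucket: {file_path}")

    # Everything after the bucket prefix is the object key.
    key = file_path[len(bucket_prefix):]
    return cdn_url.rstrip("/") + "/" + key
```

With a `cdn_url` of `https://cdn.example.com`, a path like `s3://public-bucket/a/b.pdf` maps to `https://cdn.example.com/a/b.pdf`. For local development, where the earlier thread settled on including the bucket name in `CDN_URL`, a value such as `http://localhost:4566/public-bucket` produces a URL that still contains the bucket. The bucket names here are made up for the example.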


Comment on lines +65 to +66
else:
    pre_sign_opportunity_file_location(opportunity.opportunity_attachments)
Collaborator

I'm not sure there will be a case where it's not set once we merge this. (NOTE - we should first verify with @coilysiren that the CDN is fully working before we merge this - I think it's still in progress)

if not is_s3_path(file_path):
    raise ValueError(f"Expected s3:// path, got: {file_path}")

return file_path.replace(os.environ["PUBLIC_FILES_BUCKET"], cdn_url)
Collaborator

Can you use the s3 config for this instead of the os module? Just so we only define the env vars in a single place
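
One way to read this suggestion, sketched with the standard library only: environment variables are read once into a config object, and the conversion takes that object instead of touching `os.environ` directly. `FileStorageConfig`, `from_env`, and `to_cdn_url` are hypothetical names; the project's actual S3 config class is not shown in this thread and may look quite different.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStorageConfig:
    # Hypothetical config holder; field names mirror the env vars in the thread.
    public_files_bucket: str
    cdn_url: str | None = None

    @classmethod
    def from_env(cls, env: Mapping[str, str] | None = None) -> FileStorageConfig:
        # The only place env vars are read; tests can pass a plain dict instead.
        source = os.environ if env is None else env
        return cls(
            public_files_bucket=source["PUBLIC_FILES_BUCKET"],
            cdn_url=source.get("CDN_URL"),
        )


def to_cdn_url(file_path: str, config: FileStorageConfig) -> str:
    """Replace the public bucket prefix using values from the config object."""
    if config.cdn_url is None:
        raise ValueError("CDN_URL is not configured")
    return file_path.replace(config.public_files_bucket, config.cdn_url, 1)
```

Keeping the lookup in one constructor means a renamed or missing variable surfaces in a single place, which is the maintenance benefit the comment is after.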

@chouinar
Collaborator

@mikehgrantsgov - FYI - the change looks good, but holding approval until we can confirm the CDN itself is actually working.

@mdragon changed the title from "[Issue #3094] Update database records to use CDN URLs for attachments instead of S3 URLs" to "[Issue #3094] Use CDN URLs for attachments instead of S3 URLs" on Feb 4, 2025
Collaborator

@chouinar left a comment

LGTM - I think we're safe to merge this now.

After this is deployed, can you check that dev is using the CDN URL instead of presigned URLs?

Here's an opportunity with two attachments already on it: http://frontend-dev-1739892538.us-east-1.elb.amazonaws.com/opportunity/70111

@mikehgrantsgov merged commit 971b38f into main on Feb 5, 2025
3 checks passed
@mikehgrantsgov deleted the mikehgrantsgov/3094-use-cdn-urls-for-attachments branch on February 5, 2025 21:49
DavidDudas-Intuitial pushed a commit that referenced this pull request Feb 7, 2025
Development: successfully merging this pull request may close the issue "Use CDN URLs for attachments instead of S3 URLs".
3 participants