-
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create S3 production/cloud processing pipeline and update dagster infrastructure #163
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! I left a few requests and questions for ya.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Just a few non blocking requests about comments.
src/usage_metrics/etl/__init__.py
Outdated
resources_by_env = { # STILL TO DO! | ||
"prod": {"database_manager": postgres_manager}, | ||
"local": {"database_manager": sqlite_manager}, | ||
} | ||
|
||
resources = resources_by_env[os.getenv("METRICS_PROD_ENV", "local")] | ||
|
||
defs: Definitions = Definitions( | ||
assets=default_assets, | ||
# asset_checks=default_asset_checks, | ||
resources={"database_manager": sqlite_manager}, # TODO: How to handle this? | ||
resources=resources, # TODO: How to handle this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these comments be removed?
src/usage_metrics/raw/s3.py
Outdated
@@ -73,4 +75,6 @@ def raw_s3_logs(context: AssetExecutionContext) -> pd.DataFrame: | |||
weekly_dfs.append(pd.read_csv(path, delimiter=" ", header=None)) | |||
except pd.errors.EmptyDataError: | |||
context.log.warnings(f"{path} is an empty file, couldn't read.") | |||
return pd.concat(weekly_dfs) | |||
if weekly_dfs: # If data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove this comment or make it more descriptive, please?
src/usage_metrics/etl/__init__.py
Outdated
@@ -12,7 +13,9 @@ | |||
AssetsDefinition, | |||
AssetSelection, | |||
Definitions, | |||
ExperimentalWarning, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like this is being used anywhere in the module.
Overview
Closes #147.
What problem does this address?
What did you change in this PR?
io_manager
decorator, and createSQLIOManager
class to hold methods common to both SQL database typespartition_key
toload_input
andhandle_output
functions for both IO managersparse_dates
handling to prevent dtypes from changing when reading in and out of SQL databasesREADME
all_metrics_etl
weekly and send update topudl-deployments
Testing
How did you make sure this worked? How can a reviewer verify this?
Run
all_etl
locally.To-do list
Tasks