Bryce/query stream docs #57

Merged · 6 commits · Jan 20, 2024

4 changes: 2 additions & 2 deletions .github/workflows/ci.yml
@@ -49,7 +49,7 @@ jobs:
DBT_PROFILES_DIR: . # Use integration_tests/profiles.yml
- name: dbt CI - snowflake - with stream
id: snowflake_ci_with_stream
if: false
if: github.repository == 'bcodell/dbt-aql'
run: |
localstack extensions install localstack-extension-snowflake
localstack start -d
@@ -62,7 +62,7 @@

- name: dbt CI - snowflake - skip stream
id: snowflake_ci_skip_stream
if: false
if: github.repository == 'bcodell/dbt-aql'
run: |
cd ./integration_tests
sed -i 's/skip_stream: false/skip_stream: true/' dbt_project.yml
50 changes: 48 additions & 2 deletions README.md
@@ -335,7 +335,7 @@ As a refresher, Activity Stream queries require the following inputs to be defined

All of these inputs are still needed, and they can be viewed in the following examples - via the `query_stream` macro and via an `aql` query:

# **Creating Datasets Option 1: The `query_stream` Macro**
## **Creating Datasets Option 1: The `query_stream` Macro**
For a macro-based approach to producing datasets, use the following syntax:
```sql
{{ dbt_aql.query_stream(
@@ -390,7 +390,53 @@ For a macro-based approach to producing datasets, use the following syntax:
included_columns=[] -- optional (empty list is default)
)}}
```
It's long, verbose, and not very readable, but this macro will produce a full sql query and return a dataset. Implementation specifics (including arguments requirements) coming soon.
It's long, verbose, and not very readable, but this macro will produce a full sql query and return a dataset. Relevant macros and associated inputs are as follows:

### Macro: `query_stream`
#### Description
Generate a dataset from an Activity Stream via a macro interface.
#### Args
* `stream (str)`: the stream being queried. Should be the name of one of the activity streams in the project.
* `primary_activity (primary_activity)`: the primary activity to use in the dataset. The activity should be defined using the `primary_activity` macro.
* `joined_activities (list[appended_activity, aggregated_activity], optional)`: a list of zero or more activities to join to the primary activity when building the dataset. Each list item should be an activity defined in either the `appended_activity` or `aggregated_activity` macro.
* `included_columns (list[str], optional)`: a list of predefined dataset columns to include in the dataset. Each item should correspond to the name of a model using a `dataset_column` materialization.
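
As a concrete illustration, a minimal invocation might look like the sketch below. The stream, activity, column, and alias names are hypothetical placeholders, and the `dbt_aql.` prefix on the nested helper macros is assumed; adjust both to match your project.
```sql
-- a minimal sketch; 'customer_stream', 'signed_up', and the column names are hypothetical
{{ dbt_aql.query_stream(
    stream='customer_stream',
    primary_activity=dbt_aql.primary_activity(
        activity='signed_up',
        columns=[dbt_aql.dc(column_name='ts', alias='signed_up_at')],
        relationship_selector='first'
    ),
    joined_activities=[],
    included_columns=[]
) }}
```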

### Macro: `primary_activity`
A wrapper for defining the primary activity to use when building a dataset.
#### Args
* `activity (str)`: the name of the activity to use. May exclude the model prefix for the stream.
* `columns (list[dc])`: a list of one or more columns to use, where each item is defined using the `dc` (dataset column) macro.
* `relationship_selector (str)`: the relationship selector to use. Valid options are `['first', 'nth', 'last', 'all']`.
* `nth (int, optional)`: the nth instance of the activity to use. Only valid when `relationship_selector='nth'`.
* `filters (list[str], optional)`: the set of filters to apply to subset the activity, where each item is a valid SQL snippet. Check out the advanced usage section for more details.
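
For example, a primary activity definition might be sketched as follows (the activity, column, and alias names are hypothetical):
```sql
{# hypothetical usage: take the first occurrence of a 'visited_page' activity #}
dbt_aql.primary_activity(
    activity='visited_page',
    columns=[dbt_aql.dc(column_name='ts', alias='first_visit_at')],
    relationship_selector='first'
)
```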

### Macro: `appended_activity`
A wrapper for defining activities to append-join to the primary activity when building a dataset.
#### Args
* `activity (str)`: the name of the activity to use. May exclude the model prefix for the stream.
* `columns (list[dc])`: a list of one or more columns to use, where each item is defined using the `dc` (dataset column) macro.
* `relationship_selector (str)`: the relationship selector to use. Valid options are `['first', 'nth', 'last']`.
* `nth (int, optional)`: the nth instance of the activity to use. Only valid when `relationship_selector='nth'`.
* `join_condition (str)`: the join condition used when appending the activity to the primary. Valid options are `['before', 'between', 'after', 'ever']`.
* `filters (list[str], optional)`: the set of filters to apply to subset the activity, where each item is a valid SQL snippet. Check out the advanced usage section for more details.
* `extra_joins (list[str], optional)`: the set of additional join criteria to apply to extend the logic for joining the appended activity to the primary. Check out the advanced usage section for more details.
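
For example, an append-joined activity might be sketched as follows (activity, column, and alias names are hypothetical; see the advanced usage section for `filters` and `extra_joins`):
```sql
{# hypothetical usage: append the first order placed after the primary activity #}
dbt_aql.appended_activity(
    activity='placed_order',
    columns=[dbt_aql.dc(column_name='ts', alias='first_order_at_after')],
    relationship_selector='first',
    join_condition='after'
)
```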

### Macro: `aggregated_activity`
A wrapper for defining activities to aggregate-join to the primary activity when building a dataset.
#### Args
* `activity (str)`: the name of the activity to use. May exclude the model prefix for the stream.
* `columns (list[dc])`: a list of one or more columns to use, where each item is defined using the `dc` (dataset column) macro.
* `join_condition (str)`: the join condition used when joining the activity to the primary. Valid options are `['before', 'between', 'after', 'all']`.
* `filters (list[str], optional)`: the set of filters to apply to subset the activity, where each item is a valid SQL snippet. Check out the advanced usage section for more details.
* `extra_joins (list[str], optional)`: the set of additional join criteria to apply to extend the logic for joining the aggregated activity to the primary. Check out the advanced usage section for more details.
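
For example, an aggregate-joined activity might be sketched as follows; the names and the `count` aggregation are hypothetical, and note that each `dc` here requires an `aggfunc`:
```sql
{# hypothetical usage: count orders placed between occurrences of the primary activity #}
dbt_aql.aggregated_activity(
    activity='placed_order',
    columns=[dbt_aql.dc(column_name='activity_id', alias='total_orders_between', aggfunc='count')],
    join_condition='between'
)
```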

### Macro: `dc` (dataset column)
A wrapper for defining the columns to select from the primary, appended, and aggregated activities in the dataset.
#### Args
* `column_name (str)`: the name of the column. Should be a valid key in the `feature_json` or a standard Activity Schema column.
* `alias (str)`: the column alias to apply when producing the final version of the dataset.
* `aggfunc (str)`: the name of the aggregation function to apply when transforming the column for the dataset. **Only required for columns declared in `aggregated_activity` macros.**
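
For example (hypothetical column names, aliases, and aggregation function), a plain column versus an aggregated column might look like:
```sql
{# plain column for a primary or appended activity #}
dbt_aql.dc(column_name='ts', alias='signed_up_at')

{# aggregated column for an aggregated_activity; the aggfunc name is illustrative #}
dbt_aql.dc(column_name='activity_id', alias='total_orders', aggfunc='count')
```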


## **Creating Datasets pt. 2: Querying The Activity Stream with `aql`**
Under the hood, this package will parse the `aql` query string into a json object, then use the object parameters to render the appropriate SQL statement:
2 changes: 1 addition & 1 deletion macros/activity_schema/dataset/dataset_activity.sql
@@ -27,7 +27,7 @@
the activity to use
columns: list[dc]
a list of columns to use, each passed as a dc macro object
relationship_selector: string (optional)
relationship_selector: string
the relationship selector to use.
valid options are ['first', 'nth', 'last', 'all']
nth: int (optional)
2 changes: 1 addition & 1 deletion macros/activity_schema/dataset/query_stream.sql
@@ -1,7 +1,7 @@
{% macro query_stream(stream, primary_activity, joined_activities=[], included_columns=[]) %}

{#
stream: ref
stream: string
the stream being queried
primary_activity: primary_activity
the primary activity in the dataset