Releases: dbt-labs/metricflow
v0.140.0
Highlights
We've added a number of new features, including:
- Derived metrics
- Support for joining against versioned dimensions (Slowly Changing Dimensions, or SCD)
- Percentile measures
- dbt Cloud support
Breaking Changes
- The result layout is changing from one row per metric/null-dimension-value pair to one row per null dimension value, regardless of the number of metrics in the query. This only affects queries for multiple metrics where the requested dimensions contain null values. See the description on the relevant PR for more detailed information and an example illustrating how the output will change.
- Updates to the required SqlClient protocol could cause typechecking failures for users injecting a custom SqlClient implementation into the MetricFlowClient.
- Minimum version changes for SQLAlchemy and snowflake-sqlalchemy could cause dependency conflicts when installed in Python environments with libraries requiring an older version of either of these dependencies.
New Feature Details
Derived Metrics (@WilliamDee)
MetricFlow now enables you to reference metrics in the definition of another metric - an expression of metrics. This further simplifies and DRYs out code by removing the need to create pre-aggregated subqueries in the data source definition or duplicated measure definitions. For example:
metric:
  name: net_sales_per_user
  owners:
    - [email protected]
  type: derived
  type_params:
    expr: gross_sales - cogs / active_users
    metrics:
      # These are all metrics (and can themselves be derived metrics, so derived metrics can be built from other derived metrics)
      - name: gross_sales
      - name: cogs
      - name: users
        constraint: is_active # Optional additional constraint
        alias: active_users # Optional alias to use in the expr
Versioned dimension (SCD Type II) join support (@tlento)
MetricFlow now supports joins against versioned dimensions, also known as Slowly Changing Dimension (SCD) Type II tables!
Given an SCD Type II table with an entity key, dimension values, and appropriate start and end timestamp columns, you can now fetch the slowly changing dimension values through extra configuration in your data source.
For specific details and examples, please see the documentation on slowly changing dimensions.
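To make the shape of that configuration concrete, here is a minimal sketch of a data source for an SCD Type II table. All names are illustrative and the exact schema may differ from this sketch, so treat the documentation as authoritative; the key idea is marking the start and end timestamp columns with validity_params:
data_source:
  name: user_tiers
  sql_table: demo.user_tiers
  identifiers:
    - name: user_id
      type: natural # SCD joins key off a natural key rather than a primary key
  dimensions:
    - name: tier
      type: categorical
    - name: tier_start
      type: time
      type_params:
        time_granularity: day
        validity_params:
          is_start: True # the row becomes valid at this timestamp
    - name: tier_end
      type: time
      type_params:
        time_granularity: day
        validity_params:
          is_end: True # the row stops being valid at this timestamp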
Percentile measures (@kyleli626)
MetricFlow now supports percentile calculations in measures! Simply specify percentile for the agg type in your data sources and input the desired percentile within agg_params, as seen in the documentation for configuring measures. This feature also provides a median aggregation type as a convenience around the appropriate percentile configuration. For example:
measures:
  - name: p99_transaction_value
    description: The 99th percentile transaction value
    expr: transaction_amount_usd
    agg: percentile
    agg_params:
      percentile: .99
      # False will calculate the discrete percentile and True will calculate the continuous percentile
      use_discrete_percentile: False
    create_metric: True
  - name: median_transaction_value
    description: The median transaction value
    expr: transaction_amount_usd
    agg: median
    create_metric: True
Note that MetricFlow allows choosing between continuous and discrete percentiles via the use_discrete_percentile parameter.
dbt Cloud support (@QMalcolm)
MetricFlow is now available to use with dbt Cloud. Instead of requiring additional MetricFlow config YAMLs to enable dbt in your MetricFlow model (as in the previous dbt metrics release), the MetricFlow CLI can work off of a semantic model built from dbt Cloud. To use it, simply follow these two steps:
- Install our dbt Cloud package (e.g., by running pip install "metricflow[dbt-cloud]")
- Add the following to .metricflow/config.yaml:
dbt_cloud_job_id: <job_id>
# The following service token MUST have dbt Metadata API access for the project containing the specified job
dbt_cloud_service_token: <dbt_service_token>
Other changes
Added
- Support for querying metrics without grouping by dimensions (@WilliamDee)
- A cancel_request API in the SQL client for canceling running queries, with the necessary support for SQL isolation levels and asynchronous query submission (@plypaul)
- Support for passing in query tags for Snowflake queries (@plypaul)
- DataFlowPlan optimization to reduce source table scans (@plypaul)
- Internal API to enable developers to fetch joinable data source targets from an input data source (@courtneyholcomb)
Updated
- Improved readability of validation error messages (@QMalcolm)
- Made Postgres engine tests merge-blocking in CI to reduce cycle time on detecting engine-specific errors (@tlento)
- Updated poetry and python versions in CI to align with our build process and verify all supported Python versions (@tlento)
- Eliminated data source level primary time dimension requirement in cases where all measures have an aggregation time dimension set (@QMalcolm)
- Extended support for typed values for bind parameters (@courtneyholcomb)
- Removed the optional Python levenshtein package from build dependencies in order to streamline package version requirements (@plypaul)
- Consolidated join validation logic to eliminate code duplication and speed development (@plypaul)
- Factored join building logic out of DataflowToSqlQueryPlanBuilder to streamline development (@tlento)
- Improved visibility on underlying errors thrown by sql client requests (@courtneyholcomb)
- Updated SQLAlchemy and snowflake-sqlalchemy minimum version requirements to resolve a version incompatibility introduced with SQLAlchemy 1.4.42 (@tlento)
- Added CI coverage for Databricks SQL Warehouse execution environments (@tlento)
Fixed
- Resolved error encountered in Databricks whenever table rename methods were invoked (@courtneyholcomb)
- Fixed bug with warehouse measure validation where an error would be inappropriately thrown when users with measure-specific agg_time_dimension configurations attempted to run the full validation suite (@WilliamDee)
- Issue with parsing explain output for Databricks SQL warehouse configurations (@courtneyholcomb)
- Floating point comparison errors in CI tests (@tlento)
- Issue with incomplete time range constraint validation that could result in invalid queries (@plypaul)
- Resolved GitHub upgrade warnings on use of deprecated APIs and node.js build versions (@tlento)
- Resolved python-levenshtein optimization warning on CLI startup (@jzhu13)
- Resolved SQLAlchemy warning about the impending deprecation of the engine.table_names method (@Jstein77)
- Improved error message for queries with time range constraints which were too narrow for the chosen time granularity (@kyleli626)
- Eliminated a SQL rendering error in BigQuery which would intermittently produce invalid GROUP BY specifications (@tlento)
v0.130.1
Highlights
We've improved safeguards for proper model development and added support for profile and targets overrides for dbt queries!
Added
- Support for overriding dbt profile and targets attributes when querying dbt models (@QMalcolm); see the config sketch after this list
- Validation to block use of the DISTINCT keyword in COUNT aggregation expressions, as this can lead to incorrect results if optimized queries relying on partial aggregation attempt to do something like SUM(counts) to retrieve a less granular total value. (@tlento)
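For illustration only, the override might live alongside the existing dbt settings in .metricflow/config.yaml. The dbt_profile and dbt_target key names below are assumptions rather than confirmed spellings, so check the relevant PR or documentation before relying on them:
model_path: /path/to/dbt/project/root
dbt_repo: true
dbt_profile: ci_profile # hypothetical key: override the dbt profile used for queries
dbt_target: dev # hypothetical key: override the dbt target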
Updated
- Made minor improvements to safeguards for internal development (@tlento)
v0.130.0
Highlights
Introducing query support for dbt metrics!
With this release you can now use MetricFlow to run queries against your dbt metrics config! If you wish to use the MetricFlow toolchain to query your dbt metrics, you can do so with a simple configuration change. To use it, reinstall MetricFlow with the appropriate dbt package (see below for supported installations) and make sure the following is in your .metricflow/config.yaml:
model_path: /path/to/dbt/project/root
dbt_repo: true
From there you can use all of MetricFlow's tools to query that model!
Our supported installations can be added as follows:
- BigQuery: pip install metricflow[dbt-bigquery]
- Postgres: pip install metricflow[dbt-postgres]
- Redshift: pip install metricflow[dbt-redshift]
- Snowflake: pip install metricflow[dbt-snowflake]
Packaging changes
- Use of the new dbt integration requires installation of extended package dependencies, which are not pulled in by default.
- Developers experiencing module not found errors with dbt models will need to run the expanded installation via poetry install -E dbt-<data_warehouse>, where <data_warehouse> is one of the supported extras noted above.
Updated
- Internal refactor to use the more appropriate MetricReferences as lookup keys in place of MetricSpec classes. (@WilliamDee)
v0.120.0
Highlights
We now support Databricks! If you use Databricks for your metrics data warehousing needs, give it a go! You'll need your server hostname, HTTP path, and access token (see the Databricks documentation on where to find these), and you'll need to use the keys dwh_host, dwh_http_path, and dwh_access_token for these properties in your data warehouse configuration.
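As a rough sketch, the relevant entries in .metricflow/config.yaml would look like the following. The values are placeholders, and the dwh_dialect line is an assumption based on how the other engines are configured, so verify it against the setup documentation:
dwh_host: <server hostname>
dwh_http_path: <http path>
dwh_access_token: <access token>
dwh_dialect: databricks # assumption: the dialect key mirrors the other engine configs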
We have also fixed some bugs and improved support for semi-additive measures, and added the ability to use measure-specific constraints inside of more complex metric definitions!
Breaking Changes
- A minor API change to the SqlClient protocol could break existing API users writing custom SqlClient classes. MyPy should detect this issue. See the relevant PR for details.
Added
- Support for Databricks! Now you can use MetricFlow with your Databricks-backed warehouse! (@courtneyholcomb)
- The ability to define constraints on input measures for ratio and expr metric types. This is a temporary quality of life improvement until full-featured derived metrics (i.e., metrics based on other metrics) are available. (@tlento)
- Support for different time granularities, improved group by expression handling, and corrected constraint handling on semi-additive measures (@WilliamDee)
- Support for count as a measure aggregation type. Note this is implemented as an alias around sum, so use of the DISTINCT keyword in expressions is not supported and will be blocked via validation in a separate update. Users wishing for a COUNT(DISTINCT thing) equivalent should continue to use the count_distinct aggregation type. (@WilliamDee)
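As a quick illustration of the distinction, here is a minimal measure sketch with hypothetical names, following the measure schema shown in the percentile example above:
measures:
  - name: num_orders
    expr: order_id
    agg: count # alias around sum, so the expression must not contain DISTINCT
    create_metric: True
  - name: num_unique_buyers
    expr: buyer_id
    agg: count_distinct # use this for COUNT(DISTINCT ...) semantics
    create_metric: True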
Fixed
- Resolved incorrect constraint handling with semi-additive measures (@WilliamDee)
- Eliminated Dataclass deserialization errors on default parameters (@tlento)
Updated
- Optimized multi-hop join candidate selection (@plypaul)
- Improved error handling and error messages on config validation (@QMalcolm, @tlento)
- Streamlined our project README (@nhandel)
- CLI now accepts metric-only queries in limited circumstances. See the assertion checks in the PR for details. We will announce this as a new feature when it is complete, but for the time being users may try it out to see if it meets their needs in its current form. (@WilliamDee)
v0.111.1
Highlights
We now have a basic extension for authoring configs available for Visual Studio Code, with simple support for auto-completion and inline schema validation, courtesy of @jack-transform!
This release also features several bugfixes and improvements, most notably a dramatic speedup (as much as 10x) in plan building for complex queries.
Changes:
Added
- Metric model template for Shopify metrics - if you use Shopify, this template can get you up and running for tracking your Shopify metrics! (@hesham-nawaz)
Fixed
- Fixed warehouse validation failures caused by interaction with measure-specific aggregation times (@QMalcolm)
- Resolved intermittent semantic validation failures due to unpicklable pydantic objects (@tlento)
- Fixed edge case with semi-additive measures failing on certain join dimension connections (@WilliamDee)
- Restructured semantic container dependencies to eliminate certain classes of circular import errors (@tlento)
Updated
- Replaced Pydantic objects with serializable Dataclasses to improve plan building speed. This can reduce time to generate dataflow plans by 10-20x for complex queries. (@plypaul)
- Refactored validation logic to make it easier to reason about (@QMalcolm)
- Modified inference detection rules so that low-cardinality string columns are more likely to be classified as categorical dimensions (@tlento)
v0.111.0
MetricFlow 0.111.0 was released today!
The big news is we have added data source inference as a Beta feature. It allows new users to quickly get started with MetricFlow by bootstrapping their data source configurations from a warehouse connection.
Please note that this is still a Beta feature. As such, it should not be expected to be free of bugs, and its CLI/Python interfaces might change without prior notice.
For full details see our release notes.
[0.111.0] - 2022-08-10
Added
- Early beta for data source inference - Snowflake users can now run a command to bootstrap data source definitions based on warehouse metadata for tables in a given schema. Check out mf infer --help for more details. Feedback welcome! (@serramatutu)
- Support for semi-additive measures - semi-additive measures are measures which can be aggregated across some dimensions, but not others. The most common use case for this is an end-of-period measure, such as a statement balance or end-of-day inventory hold. As of this release we support semi-additive restrictions bound by any time dimension; see the sketch after this list. See the description on PR #183 for more details on usage. (@WilliamDee)
- Expanded data warehouse validations - data warehouse validation runs will now check to ensure all measures, identifiers, and dimensions defined in data sources are queryable. (@QMalcolm)
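As a rough sketch of what a semi-additive measure can look like, with illustrative names; the non_additive_dimension block reflects the measure schema this feature introduces, but check PR #183 for the exact fields:
measures:
  - name: statement_balance
    expr: balance
    agg: sum
    non_additive_dimension:
      name: ds # the time dimension this measure is non-additive over
      window_choice: max # take the end-of-period row when aggregating over time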
Fixed
- Resolved serialization error on custom Pydantic objects in the model parser (@tlento)
- Cleaned up error message handling and missing test dependencies on model validation (@tlento)
- Eliminated a class of circular import errors by forking reference classes from the specs module (@WilliamDee)
- Fixed error state when specifying a where constraint with metric_time (@plypaul)
Updated
- Updated the tutorial to use metric_time instead of arbitrary time dimensions (@plypaul)
- Increased strictness of typechecking enforcement (@tlento)
- Removed unnecessary "all time" WHERE clause rendering for cases where time constraints were not provided (@WilliamDee)
- Updated Click version from ^8.1.3 to >=7.1.2 to temporarily resolve a dependency issue for downstream Flask 1.1.4 usage (@jack-transform)
- Updated Jinja2 version from 2.11.0 to >=2.11.0 to allow downstream Flask 1.1.4 users to update to 2.x.x (@jpreillymb, @tlento)
v0.110.1
Patch fixes and minor improvements on v0.110.0:
- Fixing a bug where columns would occasionally render in the wrong order (@WilliamDee)
- Package metadata cleanup to fix broken links on pypi (@tlento)
- Rendering BETWEEN expressions to streamline SQL time range comparisons (@WilliamDee)
- Consolidating error detection in validation to surface all errors in a single pass (@QMalcolm)
Get the latest here: https://pypi.org/project/metricflow/0.110.1/
v0.110.0
Metricflow 0.110.0 was released today!
The big news is we have added support for setting custom time dimensions on a measure-by-measure basis. This change also means we no longer require the primary time dimensions to have the same names across all data sources.
To run a query for a time series metric computation, simply request the metric_time dimension, and MetricFlow will use the aggregation time dimension associated with each measure in your query and line everything up on your time series chart for you. You don't even need to know the names of the time dimensions! Please note metric_time is now a reserved name.
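To make the measure-by-measure setup concrete, here is a minimal sketch with illustrative names; agg_time_dimension is the per-measure setting referenced in these notes:
measures:
  - name: orders_placed
    expr: "1"
    agg: sum
    agg_time_dimension: placed_at # aggregate this measure over its own time dimension
  - name: orders_delivered
    expr: "1"
    agg: sum
    agg_time_dimension: delivered_at # a different measure can use a different aggregation time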
Speaking of reserved names, we have also improved our validation against SQL reserved keywords, which should provide more rapid feedback in the metric config development workflow.
For full details see our release notes.
[0.110.0] - 2022-07-21
Breaking Changes
- Updated query inputs for time series dimensions to use metric_time instead of dimension names, since it is now possible for measures to have different time dimensions for time series aggregation. This also removes the restriction that all data sources have the same primary time dimension name. However, users issuing queries might experience exceptions if they are not using metric_time as their requested primary time dimension. (@plypaul)
- Added enforcement for the new reserved keyword metric_time (@tlento)
- Reordered column output to time dimension, identifiers, dimensions, metrics, which could break automation relying on order-dependent parsing of CLI output. We encourage affected users to switch to using the API and to access the resulting data frame with order-independent (i.e., by name) access to column values. (@WilliamDee)
- Removed support for SQLite - expected impact is minimal, as the repo has been cut over to DuckDB for in-memory testing (@plypaul)