
[AWS] [Billing] Duplicated data when having multiple tags #8942

Closed · Tracked by #8905
gpop63 opened this issue Jan 22, 2024 · 11 comments
gpop63 (Contributor) commented Jan 22, 2024

When using an AWS billing configuration that groups by a combination of tags and dimensions, such as SERVICE and multiple tags (for example, team, project, aws:createdBy), the same cost data ends up duplicated several times. This is due to a limitation of the GetCostAndUsage API, which only allows grouping by two group definitions at once.

In beats, we pair each tag with each dimension and initiate a request. The total number of Cost and Usage requests equals the number of tags multiplied by the number of dimensions.
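
As an illustration of the request fan-out described above, here is a minimal Go sketch (not the actual beats code; the dimension and tag values are example values taken from this thread):

package main

import "fmt"

func main() {
	// Example grouping configuration (assumed values from this thread).
	dimensions := []string{"AZ", "SERVICE"}
	tags := []string{"team", "project", "aws:createdBy"}

	// GetCostAndUsage accepts at most two GroupBy definitions per request,
	// so each (dimension, tag) pair gets its own request.
	requests := 0
	for _, d := range dimensions {
		for _, t := range tags {
			fmt.Printf("GetCostAndUsage GroupBy=[DIMENSION:%s, TAG:%s]\n", d, t)
			requests++
		}
	}

	// 2 dimensions x 3 tags = 6 requests, and each one reports the full
	// cost again, which is where the duplicated totals come from.
	fmt.Println("total requests:", requests)
}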

Possible solutions:

  • Allow users to add a filter
    • When grouping by multiple tags and dimensions, we make several GetCostAndUsage requests. We would need a way to know which filter to use for which request.
  • I'm exploring if the new Data Export feature could replace the Cost and Usage API to solve these issues (GCP billing works in a similar way).

@agithomas @lalit-satapathy

agithomas (Contributor) commented

@kaiyan-sheng, you developed the current AWS billing integration. What do you think of the new approach mentioned in the description?

gpop63 (Contributor, Author) commented Feb 9, 2024

Athena Exploration

Amazon Athena is a query service frequently used for log analysis and big data analytics. It can analyze logs from various AWS services such as CloudTrail and CloudFront, as well as application logs.

Prerequisites

  • Billing report through Data Exports feature (if querying billing data)
  • Athena setup (database and table)
  • S3 bucket located in the same region as Athena to store query results
    • Query results can be reused for a certain period of time to avoid costs

Implementation Capabilities

Athena Input Integration

As suggested by @agithomas, this could serve as an input package. It would allow running any SQL query against a table and selecting which fields to include in the ES documents. Users would be able to query any of their data from an S3 bucket as long as it's in one of the supported formats.
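
As a rough illustration of what such an input could do under the hood, here is a minimal sketch using the Athena client from the AWS SDK for Go v2. The query, database, and output bucket are placeholders taken from the example config below; this is not an existing beats implementation:

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/athena"
	"github.com/aws/aws-sdk-go-v2/service/athena/types"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	client := athena.NewFromConfig(cfg)

	// Submit the user-provided SQL query (placeholder query and names).
	start, err := client.StartQueryExecution(ctx, &athena.StartQueryExecutionInput{
		QueryString:           aws.String("SELECT product_servicecode, SUM(line_item_unblended_cost) AS cost FROM db1.t1 GROUP BY product_servicecode"),
		QueryExecutionContext: &types.QueryExecutionContext{Database: aws.String("db1")},
		ResultConfiguration:   &types.ResultConfiguration{OutputLocation: aws.String("s3://example/")},
	})
	if err != nil {
		log.Fatal(err)
	}

	// Poll until the query finishes; a real input would handle timeouts
	// and retries more carefully.
	for {
		out, err := client.GetQueryExecution(ctx, &athena.GetQueryExecutionInput{
			QueryExecutionId: start.QueryExecutionId,
		})
		if err != nil {
			log.Fatal(err)
		}
		state := out.QueryExecution.Status.State
		if state == types.QueryExecutionStateSucceeded {
			break
		}
		if state == types.QueryExecutionStateFailed || state == types.QueryExecutionStateCancelled {
			log.Fatalf("query ended in state %s", state)
		}
		time.Sleep(2 * time.Second)
	}

	// Read the result rows; the first row holds the column headers, and
	// each following row would become one ES document.
	res, err := client.GetQueryResults(ctx, &athena.GetQueryResultsInput{
		QueryExecutionId: start.QueryExecutionId,
	})
	if err != nil {
		log.Fatal(err)
	}
	for _, row := range res.ResultSet.Rows {
		for _, col := range row.Data {
			fmt.Print(aws.ToString(col.VarCharValue), "\t")
		}
		fmt.Println()
	}
}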

AWS Billing Integration

While the SQL query and fields remain customizable, the default config would prioritize key columns from the billing data report. Requesting additional fields would not require beats changes.

Example of billing config

- module: aws
  period: 1m
  access_key_id: <REDACTED>
  secret_access_key: <REDACTED>
  regions:
    - eu-west-1
  metricsets:
    - billingv2
  athena_config:
    table: t1
    database: db1
    query_results_s3_bucket: s3://example/
    sql_query: |
      SELECT
          CAST(SUM(t1.line_item_unblended_cost) AS DECIMAL(10, 2)) AS UnblendedCost,
          product_servicecode as ProductServiceCode,
          identity_time_interval as IdentityTimeInterval,
          resource_tags as ResourceTags
      FROM
          db1.t1
      GROUP BY
          product_servicecode,
          identity_time_interval,
          resource_tags
      HAVING
          CAST(SUM(t1.line_item_unblended_cost) AS DECIMAL(10, 2)) > 0.00;
    columns:
      - name: UnblendedCost
      - name: ProductServiceCode
        unique: true
      - name: IdentityTimeInterval
        unique: true
      - name: ResourceTags
        unique: true
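
One assumption behind the columns section above is that the fields marked unique: true jointly identify a row, so they could be hashed into a deterministic document ID to keep re-ingestion of the same report idempotent. A small illustrative Go sketch of that idea (hypothetical helper, not an existing implementation; the row values are made up):

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// docID builds a deterministic ID from the values of the columns marked
// unique: true, so re-ingesting the same report row would update the same
// Elasticsearch document instead of creating a duplicate.
func docID(row map[string]string, uniqueColumns []string) string {
	h := sha256.New()
	for _, name := range uniqueColumns {
		h.Write([]byte(name))
		h.Write([]byte{0}) // separator to avoid ambiguous concatenations
		h.Write([]byte(row[name]))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// Columns marked unique in the example config above.
	uniqueColumns := []string{"ProductServiceCode", "IdentityTimeInterval", "ResourceTags"}
	row := map[string]string{
		"ProductServiceCode":   "AmazonEC2",
		"IdentityTimeInterval": "2024-02-01T00:00:00Z/2024-02-01T01:00:00Z",
		"ResourceTags":         `{"team":"obs","project":"billing"}`,
	}
	fmt.Println(docID(row, uniqueColumns))
}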


Benefits and Drawbacks

Benefits

  • No need for beats changes when requesting additional fields
  • Applicable to use cases beyond billing data
  • Provides more granular access to AWS billing data, allowing for the selection of any desired fields from reports
  • Can query data in various formats
  • Data exports work with multiple accounts
  • The tag issue we have in the current integration would not be present

Drawbacks

  • Athena usage costs $5.00 per TB of data scanned
  • Setup of Athena, S3 bucket, and data exports report is necessary
  • Users would need SQL knowledge to customize the query
  • Data export reports in AWS can be refreshed multiple times throughout the day, and AWS may perform updates that affect these exports at any time — there's a risk of having outdated data. This issue isn't unique to data exports but also applies to the current implementation that utilizes the GetCostAndUsage API.

@agithomas @tommyers-elastic

tommyers-elastic (Contributor) commented

thanks for the detail @gpop63. i'm sceptical of the athena solution primarily because of the effort required to set it up.

could a short-term solution to this issue be to document the limitations of the current integration wrt tags, and make it very clear that incorrect (inflated) cost data will be reported if multiple tags/dimensions are present?

in terms of other things we could do that continue to utilize the existing cost API input, could we change how we do the groupings so that we get accurate cost data, perhaps with a limit on the number of supported dimensions? that way we would never report incorrect data, even if it means customers cannot query it in such a granular way.

agithomas (Contributor) commented

> thanks for the detail @gpop63. i'm sceptical of the athena solution primarily because of the effort required to set it up.

Would providing an AWS CloudFormation template or Terraform configuration, and including them as part of the README, simplify the setup process?

agithomas (Contributor) commented

Should we revisit the default configurations? Keep only the AZ and SERVICE?

In a large AWS setup, having aws:createdBy led to a large number of documents.


I think, apart from the README, we can consider adding a hint in the configuration to limit the number of dimensions to 2.

m-adams commented Feb 14, 2024

Can we come at this from a customer-zero perspective?
The people who will get the most value from this will be large orgs wanting to do some form of FinOps on the data.
To do that usefully, you need to pull data at a granular level and then let people analyse that data in the stack.
We need a solution that, at minimum, is useful internally, as our tagging scheme is not exactly that complex. The basic version being described, although easier to set up, doesn't seem to actually be useful when using user-defined cost allocation tags, which seems to be the direction people are pushed toward to track their costs.
Maybe there could be a basic and an advanced option if we need to maintain something that is very easy to set up.

tommyers-elastic (Contributor) commented

@vinaychandrasekhar @SubhrataK it sounds like we need some research on the best way forward here before we implement anything new.

@gpop63 @agithomas at a minimum right now we should document the issue and/or remove the ability to configure the existing integration in a way that causes incorrect billing data to be generated.

cc @lalit-satapathy

vinaychandrasekhar commented

@SubhrataK @lalit-satapathy - are we tracking this research effort? Do you need any input from me?

lalit-satapathy (Collaborator) commented

@gpop63,

Please update the issue with the final summary of the research work done so far and close it, as we are updating the docs here: #9290.

It would be nice to have a proposed architecture for future discussion.

m-adams commented Mar 21, 2024

If we close this issue, can we open a new one for making multi-tag analysis work, please?

gpop63 (Contributor, Author) commented Apr 1, 2024

Leaving the steps here as a reference in case it is decided to proceed with the implementation.

This should cover the steps required both in AWS and in the Agent (Metricbeat).

AWS:

  • Credentials
  • Standard Data Export
    • S3 bucket to store reports
    • Can be in CSV or Parquet format
  • Athena database & table from the data export report
    • This is easily done by creating a table from the S3 bucket data source (an example DDL sketch follows this list)
    • S3 bucket where query results will be stored (they can be reused)
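
For reference, the "create a table from S3 bucket data source" step can also be expressed as Athena DDL and submitted with StartQueryExecution as in the earlier sketch. The schema below is a reduced placeholder covering only the columns used in the example query; a real Data Exports table has many more columns, and resource_tags may be a map type depending on the export settings:

package main

// createTableDDL is an assumed example of the DDL behind the console's
// "create table from S3 data source" wizard. Database, table, and bucket
// names are placeholders.
const createTableDDL = `
CREATE EXTERNAL TABLE IF NOT EXISTS db1.t1 (
  line_item_unblended_cost double,
  product_servicecode      string,
  identity_time_interval   string,
  resource_tags            string
)
STORED AS PARQUET
LOCATION 's3://example-data-exports/reports/'
`

func main() {
	// Submit createTableDDL via athena.StartQueryExecution, exactly as in
	// the query sketch earlier in this thread (omitted here for brevity).
	_ = createTableDDL
}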

Agent (Metricbeat):

  • Use AWS credentials as usual
  • Configure the Athena-related settings:
    • Table, database, and S3 bucket for query results
    • Customize the SQL query and columns (if needed; otherwise the defaults can be used)
Metricbeat config example

- module: aws
  period: 1m
  access_key_id: <REDACTED>
  secret_access_key: <REDACTED>
  regions:
    - eu-west-1
  metricsets:
    - billingv2
  athena_config:
    table: t1
    database: db1
    query_results_s3_bucket: s3://example/
    sql_query: |
      SELECT
          CAST(SUM(t1.line_item_unblended_cost) AS DECIMAL(10, 2)) AS UnblendedCost,
          product_servicecode as ProductServiceCode,
          identity_time_interval as IdentityTimeInterval,
          resource_tags as ResourceTags
      FROM
          db1.t1
      GROUP BY
          product_servicecode,
          identity_time_interval,
          resource_tags
      HAVING
          CAST(SUM(t1.line_item_unblended_cost) AS DECIMAL(10, 2)) > 0.00;
    columns:
      - name: UnblendedCost
      - name: ProductServiceCode
        unique: true
      - name: IdentityTimeInterval
        unique: true
      - name: ResourceTags
        unique: true


For now, we have created PR #9290 to document the limitation of the API.

gpop63 closed this as completed on Apr 1, 2024