[AWS] [Billing] Duplicated data when having multiple tags #8942
Comments
@kaiyan-sheng, you developed the current AWS billing integration. What do you think of the new approach mentioned in the description?
Athena Exploration
Amazon Athena is a query service frequently used for log analysis and big data analytics. It can analyze logs from various AWS services such as CloudTrail and CloudFront, as well as application logs.
Prerequisites
Implementation Capabilities
Athena Input Integration
As suggested by @agithomas, this could serve as an input package. It would allow any SQL query to be run against a table and the desired fields to be selected for inclusion in the ES documents. Users could query any of their data in an S3 bucket as long as it is in one of the supported formats.
AWS Billing Integration
While the SQL query and fields remain customizable, the default config will prioritize key columns from the billing data reports. Requests for additional fields would not require Beats changes.
Example of billing config
- module: aws
  period: 1m
  access_key_id: <REDACTED>
  secret_access_key: <REDACTED>
  regions:
    - eu-west-1
  metricsets:
    - billingv2
  athena_config:
    table: t1
    database: db1
    query_results_s3_bucket: s3://example/
    sql_query: |
      SELECT
        CAST(SUM(t1.line_item_unblended_cost) AS DECIMAL(10, 2)) AS UnblendedCost,
        product_servicecode as ProductServiceCode,
        identity_time_interval as IdentityTimeInterval,
        resource_tags as ResourceTags
      FROM
        db1.t1
      GROUP BY
        product_servicecode,
        identity_time_interval,
        resource_tags
      HAVING
        CAST(SUM(t1.line_item_unblended_cost) AS DECIMAL(10, 2)) > 0.00;
    columns:
      - name: UnblendedCost
      - name: ProductServiceCode
        unique: true
      - name: IdentityTimeInterval
        unique: true
      - name: ResourceTags
        unique: true
Benefits and Drawbacks
Benefits
Drawbacks
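To illustrate why a single query grouped on all keys avoids the duplication, here is a minimal sketch that runs a query of the same shape against an in-memory SQLite table rather than Athena. The rows, the string encoding of resource_tags, and the use of ROUND in place of CAST(... AS DECIMAL(10, 2)) are assumptions made so the example runs locally; it is not the proposed billingv2 implementation.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the Athena table db1.t1.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t1 ("
    "line_item_unblended_cost REAL, product_servicecode TEXT, "
    "identity_time_interval TEXT, resource_tags TEXT)"
)

# Hypothetical billing line items; resource_tags is flattened to a string here.
rows = [
    (10.0, "AmazonEC2", "2024-01-01/2024-01-02", "team=obs,project=beats"),
    (2.5, "AmazonEC2", "2024-01-01/2024-01-02", "team=obs,project=beats"),
    (5.0, "AmazonS3", "2024-01-01/2024-01-02", "team=obs,project=docs"),
    (0.0, "AmazonSNS", "2024-01-01/2024-01-02", ""),
]
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)", rows)

# Same grouping as the config above: one row per (service, interval, tag set).
query = """
SELECT
  ROUND(SUM(line_item_unblended_cost), 2) AS UnblendedCost,
  product_servicecode AS ProductServiceCode,
  identity_time_interval AS IdentityTimeInterval,
  resource_tags AS ResourceTags
FROM t1
GROUP BY product_servicecode, identity_time_interval, resource_tags
HAVING ROUND(SUM(line_item_unblended_cost), 2) > 0.00
ORDER BY product_servicecode
"""
results = conn.execute(query).fetchall()

# Each line item lands in exactly one group, so the totals agree.
grouped_total = sum(r[0] for r in results)
raw_total = sum(r[0] for r in rows)
```

Because every line item belongs to exactly one (service, interval, tag set) group, summing the grouped rows gives the same total as summing the raw line items, and the zero-cost group is dropped by the HAVING clause.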
thanks for the detail @gpop63. i'm sceptical of the athena solution primarily because of the effort required to set it up. could a short term solution to this issue be to document the limitations of the current integration with respect to tags, and make it very clear that incorrect (inflated) cost data will be reported if multiple tags/dimensions are present? in terms of other things we could do that continue to utilize the existing cost API input, could we change how we do the groupings such that we get accurate cost data, but perhaps with a limit on the number of supported dimensions? that way we would never report incorrect data, even if it means customers cannot query it in such a granular way.
Would providing an AWS CloudFormation template or Terraform configuration, included as part of the README, simplify the setup process?
Can we come at this from a customer 0 perspective?
@vinaychandrasekhar @SubhrataK it sounds like we need some research on the best way forward here before we implement anything new. @gpop63 @agithomas at a minimum right now we should document the issue and/or remove the ability to configure the existing integration in a way that causes incorrect billing data to be generated.
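One way to picture "remove the ability to configure it incorrectly" is a config check that rejects more group-by keys than a single GetCostAndUsage request can take. This is only a sketch: `validate_group_by` and `MAX_GROUP_KEYS` are hypothetical names, not existing Beats code, and the limit of two comes from the API limitation stated in the issue description.

```python
# Assumption: the issue description states GetCostAndUsage groups by at most
# two keys per request, so anything beyond that cannot be fetched in one call.
MAX_GROUP_KEYS = 2


def validate_group_by(dimensions, tags):
    """Return the combined group-by keys, or raise if they exceed the limit.

    Hypothetical helper: rejecting the config up front means the integration
    never has to split keys across requests and re-count the same cost.
    """
    keys = list(dimensions) + list(tags)
    if len(keys) > MAX_GROUP_KEYS:
        raise ValueError(
            f"{len(keys)} group-by keys configured, but at most "
            f"{MAX_GROUP_KEYS} can be combined in a single request"
        )
    return keys


# A configuration that fits in a single request is accepted unchanged.
accepted = validate_group_by(["SERVICE"], ["team"])
```

With this kind of check, a config such as SERVICE plus two tags would fail at startup instead of silently producing inflated totals.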
@SubhrataK @lalit-satapathy - are we tracking this research effort? Do you need any input from me?
if we close this issue, can we open a new one for making multi-tag analysis work please
Leaving the steps here as a reference in case it is decided to proceed with the implementation. This should cover the steps required both in AWS and Agent (Metricbeat).
AWS:
Agent (Metricbeat):
Metricbeat config example
- module: aws
  period: 1m
  access_key_id: <REDACTED>
  secret_access_key: <REDACTED>
  regions:
    - eu-west-1
  metricsets:
    - billingv2
  athena_config:
    table: t1
    database: db1
    query_results_s3_bucket: s3://example/
    sql_query: |
      SELECT
        CAST(SUM(t1.line_item_unblended_cost) AS DECIMAL(10, 2)) AS UnblendedCost,
        product_servicecode as ProductServiceCode,
        identity_time_interval as IdentityTimeInterval,
        resource_tags as ResourceTags
      FROM
        db1.t1
      GROUP BY
        product_servicecode,
        identity_time_interval,
        resource_tags
      HAVING
        CAST(SUM(t1.line_item_unblended_cost) AS DECIMAL(10, 2)) > 0.00;
    columns:
      - name: UnblendedCost
      - name: ProductServiceCode
        unique: true
      - name: IdentityTimeInterval
        unique: true
      - name: ResourceTags
        unique: true
For now, we have created PR #9290 to document the limitation of the API.
When using an AWS billing configuration that groups by a combination of tags and dimensions, such as SERVICE and multiple tags (for example, team, project, aws:createdBy), we may end up with multiple duplicates of the same data. This is due to a limitation of the GetCostAndUsage API, which only allows grouping by two groups at once.
In Beats, we pair each tag with each dimension and issue one request per pair, so the total number of GetCostAndUsage requests equals the number of tags multiplied by the number of dimensions.
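A small sketch of the effect, assuming made-up line items and a local aggregation in place of real GetCostAndUsage calls (`group_costs` is a hypothetical stand-in, not Beats code): every (dimension, tag) pair covers the full cost once, so adding up the results of all pairs counts the same spend several times.

```python
from itertools import product

# Hypothetical configuration mirroring the example above.
dimensions = ["SERVICE"]
tags = ["team", "project", "aws:createdBy"]

# Hypothetical billing line items; values are illustrative only.
line_items = [
    {"SERVICE": "AmazonEC2", "team": "obs", "project": "beats",
     "aws:createdBy": "alice", "cost": 10.0},
    {"SERVICE": "AmazonS3", "team": "obs", "project": "docs",
     "aws:createdBy": "bob", "cost": 5.0},
]


def group_costs(items, dimension, tag):
    """Aggregate cost by (dimension value, tag value), like one two-key request."""
    totals = {}
    for item in items:
        key = (item[dimension], item[tag])
        totals[key] = totals.get(key, 0.0) + item["cost"]
    return totals


# One request per (dimension, tag) pair: len(dimensions) * len(tags) requests.
pairs = list(product(dimensions, tags))
request_count = len(pairs)

# Summing every group from every request counts each line item once per pair.
reported_total = sum(
    cost
    for dimension, tag in pairs
    for cost in group_costs(line_items, dimension, tag).values()
)
true_total = sum(item["cost"] for item in line_items)
```

Here one dimension and three tags produce three requests, and the summed result is three times the true total, which is the inflation described above.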
Possible solutions:
GetCostAndUsage requests. We would need a way to know which filter to use for which request.
@agithomas @lalit-satapathy