diff --git a/300_Aggregations/30_histogram.asciidoc b/300_Aggregations/30_histogram.asciidoc
index 266b0f481..206605761 100644
--- a/300_Aggregations/30_histogram.asciidoc
+++ b/300_Aggregations/30_histogram.asciidoc
@@ -12,9 +12,11 @@ undoubtedly had a few bar charts in it. The histogram works by specifying an int
 prices, you might specify an interval of 20,000. This would create a new bucket
 every $20,000. Documents are then sorted into buckets.
 
-For our dashboard, we want a bar chart of car sale prices, but we
-also want to know the top-selling make per price range. This is easily accomplished
-using a `terms` bucket ((("terms bucket", "nested in a histogram bucket")))((("buckets", "nested in other buckets", "terms bucket nested in histogram bucket")))nested inside the `histogram`:
+For our dashboard, we want to know how many cars sold in each price range. We
+would also like to know the total revenue generated by that price bracket. This is
+calculated by summing the price of each car sold in that interval.
+
+To do this, we use a `histogram` and a nested `sum` metric:
 
 [source,js]
 --------------------------------------------------
@@ -22,17 +24,16 @@ GET /cars/transactions/_search?search_type=count
 {
    "aggs":{
       "price":{
-         "histogram":{
-            "field":"price", <1>
-            "interval":20000 <1>
+         "histogram":{ <1>
+            "field": "price",
+            "interval": 20000
          },
          "aggs":{
-            "make":{
-               "terms":{
-                  "field":"make", <2>
-                  "size":1
+            "revenue": {
+               "sum": { <2>
+                 "field" : "price"
                }
-            }
+             }
          }
       }
    }
@@ -42,19 +43,20 @@ GET /cars/transactions/_search?search_type=count
 <1> The `histogram` bucket requires two parameters: a numeric field, and an interval
 that defines the bucket size.
 // Mention use of "size" to get back just the top result?
-<2> A `terms` bucket is nested inside each price range, which will show us the
-top make per price range.
+<2> A `sum` metric is nested inside each price range, which will show us the
+total revenue for that bracket.
 
 As you can see, our query is built around the `price` aggregation, which contains
 a `histogram` bucket. This bucket requires a numeric field to calculate
 buckets on, and an interval size. The interval defines how "wide" each bucket
 is. An interval of 20000 means we will have the ranges `[0-19999, 20000-39999, ...]`.
 
-Next, we define a nested bucket inside the histogram. This is a `terms` bucket
-over the `make` field. There is also a new `size` parameter, which defines the number of terms we want to generate. A `size` of `1` means we want only the top make
-for each price range (the make that has the highest doc count).
+Next, we define a nested metric inside the histogram. This is a `sum` metric, which
+will sum up the `price` field from each document landing in that price range.
+This gives us the revenue for each price range, so we can see if our business
+makes more money from commodity or luxury cars.
 
-And here is the response (truncated):
+And here is the response:
 
 [source,js]
 --------------------------------------------------
@@ -66,40 +68,49 @@ And here is the response (truncated):
             {
                "key": 0,
                "doc_count": 3,
-               "make": {
-                  "buckets": [
-                     {
-                        "key": "honda",
-                        "doc_count": 1
-                     }
-                  ]
+               "revenue": {
+                  "value": 37000
                }
             },
             {
                "key": 20000,
                "doc_count": 4,
-               "make": {
-                  "buckets": [
-                     {
-                        "key": "ford",
-                        "doc_count": 2
-                     }
-                  ]
+               "revenue": {
+                  "value": 95000
                }
             },
-...
+            {
+               "key": 80000,
+               "doc_count": 1,
+               "revenue": {
+                  "value": 80000
+               }
+            }
+         ]
+      }
+   }
 }
 --------------------------------------------------
 
 The response is fairly self-explanatory, but it should be noted that the
 histogram keys correspond to the lower boundary of the interval. The key `0`
-means `0-20,000`, the key `20000` means `20,000-40,000`, and so forth.
+means `0-19,999`, the key `20000` means `20,000-39,999`, and so forth.
+
+[NOTE]
+.Empty buckets are missing!
+=====================
+You'll notice that empty intervals, such as $40,000-60,000, are missing from the
+response. The `histogram` bucket omits these by default, since including them could lead
+to the unintended generation of potentially enormous output.
+
+We'll discuss how to include empty buckets in the next section, <<_returning_empty_buckets>>.
+=====================
 
 Graphically, you could represent the preceding data in the histogram shown in <<barcharts-histo1>>:
 
 [[barcharts-histo1]]
-.Top cars in each price range
-image::images/elas_28in01.png["Top cars in each price range"]
+.Sales and Revenue per price bracket
+image::images/elas_28in01.png["Sales and Revenue per price bracket"]
 
 Of course, you can build bar charts with any aggregation that emits categories and
 statistics, not just the `histogram` bucket. Let's build a bar chart of
diff --git a/images/elas_28in01.png b/images/elas_28in01.png
index 21a42b462..232dde3de 100644
Binary files a/images/elas_28in01.png and b/images/elas_28in01.png differ
diff --git a/snippets/300_Aggregations/30_histogram.json b/snippets/300_Aggregations/30_histogram.json
index 3962c3922..4d8d7b50f 100644
--- a/snippets/300_Aggregations/30_histogram.json
+++ b/snippets/300_Aggregations/30_histogram.json
@@ -9,12 +9,11 @@ GET /cars/transactions/_search?search_type=count
             "interval":20000
          },
          "aggs":{
-            "make":{
-               "terms":{
-                  "field":"make",
-                  "size":1
+            "revenue": {
+               "sum": {
+                 "field" : "price"
                }
-            }
+             }
          }
       }
    }
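
Review note: the revised text makes three claims that are easy to check by hand: bucket keys are the lower boundary of each interval, the nested `sum` adds up `price` per bucket, and empty intervals are left out. The sketch below is a plain-Python illustration of that bucketing logic, not Elasticsearch's implementation. The function name `histogram_with_sum` and the `prices` list are hypothetical; the prices were picked only so that the counts and totals line up with the response in the patch.

```python
from collections import defaultdict


def histogram_with_sum(prices, interval):
    """Group prices into fixed-width buckets and report count and sum per bucket."""
    buckets = defaultdict(lambda: {"doc_count": 0, "revenue": 0})
    for price in prices:
        # The bucket key is the lower boundary of the interval the price falls in.
        key = (price // interval) * interval
        buckets[key]["doc_count"] += 1
        buckets[key]["revenue"] += price
    # Only buckets that received at least one document exist, so empty
    # intervals are omitted, mirroring the default behaviour described above.
    return [{"key": key, **buckets[key]} for key in sorted(buckets)]


# Hypothetical sale prices, chosen only to be consistent with the response shown.
prices = [10000, 12000, 15000, 20000, 20000, 25000, 30000, 80000]

for bucket in histogram_with_sum(prices, 20000):
    print(bucket)
```

With these assumed prices the sketch yields buckets keyed `0`, `20000`, and `80000`, with nothing for the 40,000 and 60,000 intervals, which is the gap the new NOTE block calls out.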