Reconfigure histogram example to be simpler, more practical
polyfractal authored and clintongormley committed Jan 7, 2015
1 parent 03efb3e commit 4effb7d
Showing 3 changed files with 50 additions and 40 deletions.
81 changes: 46 additions & 35 deletions 300_Aggregations/30_histogram.asciidoc
@@ -12,27 +12,28 @@ undoubtedly had a few bar charts in it. The histogram works by specifying an int
prices, you might specify an interval of 20,000. This would create a new bucket
every $20,000. Documents are then sorted into buckets.

For our dashboard, we want a bar chart of car sale prices, but we
also want to know the top-selling make per price range. This is easily accomplished
using a `terms` bucket ((("terms bucket", "nested in a histogram bucket")))((("buckets", "nested in other buckets", "terms bucket nested in histogram bucket")))nested inside the `histogram`:
For our dashboard, we want to know how many cars were sold in each price range. We
would also like to know the total revenue generated by each price bracket. This is
calculated by summing the price of each car sold in that interval.

To do this, we use a `histogram` and a nested `sum` metric:

[source,js]
--------------------------------------------------
GET /cars/transactions/_search?search_type=count
{
"aggs":{
"price":{
"histogram":{
"field":"price", <1>
"interval":20000 <1>
"histogram":{ <1>
"field": "price",
"interval": 20000
},
"aggs":{
"make":{
"terms":{
"field":"make", <2>
"size":1
"revenue": {
"sum": { <2>
"field" : "price"
}
}
}
}
}
}
@@ -42,19 +43,20 @@ GET /cars/transactions/_search?search_type=count
<1> The `histogram` bucket requires two parameters: a numeric field, and an
interval that defines the bucket size.
// Mention use of "size" to get back just the top result?
<2> A `terms` bucket is nested inside each price range, which will show us the
top make per price range.
<2> A `sum` metric is nested inside each price range, which will show us the
total revenue for that bracket.

As you can see, our query is built around the `price` aggregation, which contains
a `histogram` bucket. This bucket requires a numeric field to calculate
buckets on, and an interval size. The interval defines how "wide" each bucket
is. An interval of 20000 means we will have the ranges `[0-19999, 20000-39999, ...]`.

Next, we define a nested bucket inside the histogram. This is a `terms` bucket
over the `make` field. There is also a new `size` parameter, which defines the number of terms we want to generate. A `size` of `1` means we want only the top make
for each price range (the make that has the highest doc count).
Next, we define a nested metric inside the histogram. This is a `sum` metric, which
will sum up the `price` field from each document landing in that price range.
This gives us the revenue for each price range, so we can see if our business
makes more money from commodity or luxury cars.

And here is the response (truncated):
And here is the response:

[source,js]
--------------------------------------------------
@@ -66,40 +68,49 @@ And here is the response (truncated):
{
"key": 0,
"doc_count": 3,
"make": {
"buckets": [
{
"key": "honda",
"doc_count": 1
}
]
"revenue": {
"value": 37000
}
},
{
"key": 20000,
"doc_count": 4,
"make": {
"buckets": [
{
"key": "ford",
"doc_count": 2
}
]
"revenue": {
"value": 95000
}
},
...
{
"key": 80000,
"doc_count": 1,
"revenue": {
"value": 80000
}
}
]
}
}
}
--------------------------------------------------

The response is fairly self-explanatory, but it should be noted that the
histogram keys correspond to the lower boundary of the interval. The key `0`
means `0-20,000`, the key `20000` means `20,000-40,000`, and so forth.
means `0-19,999`, the key `20000` means `20,000-39,999`, and so forth.
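
The bucketing and summing described above can be sketched on the client side. This is only an illustration of the arithmetic, not how Elasticsearch implements the `histogram` bucket; the function name `histogram_with_sum` and the sample prices are hypothetical, with the prices chosen to be consistent with the response shown earlier.

```python
from collections import defaultdict


def histogram_with_sum(prices, interval):
    """Group prices into fixed-width buckets and sum the prices in each bucket."""
    buckets = defaultdict(lambda: {"doc_count": 0, "revenue": 0})
    for price in prices:
        # The bucket key is the lower boundary of the interval the price falls in.
        key = (price // interval) * interval
        buckets[key]["doc_count"] += 1
        buckets[key]["revenue"] += price
    # Only buckets that received at least one price exist, so empty ones are absent.
    return [{"key": k, **buckets[k]} for k in sorted(buckets)]


# Hypothetical prices (an assumption), picked to match the response above.
sample_prices = [10000, 12000, 15000, 20000, 20000, 25000, 30000, 80000]
result = histogram_with_sum(sample_prices, 20000)

for bucket in result:
    print(bucket)
```

A price of exactly 20000 lands in the bucket with key `20000`, which is why the ranges are written as `0-19,999` and `20,000-39,999`.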

[NOTE]
.Empty buckets are missing!
=====================
You'll notice that empty intervals, such as $40,000-60,000, are missing from the
response. The `histogram` bucket omits these by default, since including them could
lead to the unintended generation of potentially enormous output.
We'll discuss how to include empty buckets in the next section, <<_returning_empty_buckets>>.
=====================
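
If a chart needs a bar for every interval, the gaps can also be filled after the response arrives. The following is a minimal client-side sketch under stated assumptions: `fill_empty_buckets` is a hypothetical helper, the input holds only the three buckets visible in the response above, and the zero-valued placeholder is an assumption about what an empty bucket should look like.

```python
def fill_empty_buckets(buckets, interval):
    """Return buckets for every interval between the lowest and highest key."""
    by_key = {bucket["key"]: bucket for bucket in buckets}
    if not by_key:
        return []
    lowest, highest = min(by_key), max(by_key)
    filled = []
    for key in range(lowest, highest + interval, interval):
        # Reuse the returned bucket if present, else insert a zero placeholder.
        placeholder = {"key": key, "doc_count": 0, "revenue": {"value": 0}}
        filled.append(by_key.get(key, placeholder))
    return filled


# Only the buckets visible in the response above (an assumption about its contents).
response_buckets = [
    {"key": 0, "doc_count": 3, "revenue": {"value": 37000}},
    {"key": 20000, "doc_count": 4, "revenue": {"value": 95000}},
    {"key": 80000, "doc_count": 1, "revenue": {"value": 80000}},
]
filled = fill_empty_buckets(response_buckets, 20000)

for bucket in filled:
    print(bucket["key"], bucket["doc_count"], bucket["revenue"]["value"])
```

This only fills gaps between the lowest and highest keys that were returned; it cannot know about empty intervals outside that span.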

Graphically, you could represent the preceding data in the histogram shown in <<barcharts-histo1>>:

[[barcharts-histo1]]
.Top cars in each price range
image::images/elas_28in01.png["Top cars in each price range"]
.Sales and Revenue per price bracket
image::images/elas_28in01.png["Sales and Revenue per price bracket"]

Of course, you can build bar charts with any aggregation that emits categories
and statistics, not just the `histogram` bucket. Let's build a bar chart of
Binary file modified images/elas_28in01.png
9 changes: 4 additions & 5 deletions snippets/300_Aggregations/30_histogram.json
@@ -9,12 +9,11 @@ GET /cars/transactions/_search?search_type=count
"interval":20000
},
"aggs":{
"make":{
"terms":{
"field":"make",
"size":1
"revenue": {
"sum": {
"field" : "price"
}
}
}
}
}
}