Reconfigure histogram example to be simpler, more practical

baakind · Jan 7, 2015 · 4effb7d
1 parent 03efb3e

Showing 3 changed files with 50 additions and 40 deletions.
diff --git a/300_Aggregations/30_histogram.asciidoc b/300_Aggregations/30_histogram.asciidoc
@@ -12,27 +12,28 @@ undoubtedly had a few bar charts in it. The histogram works by specifying an int
 prices, you might specify an interval of 20,000.  This would create a new bucket
 every $20,000.  Documents are then sorted into buckets.
 
-For our dashboard, we want a bar chart of car sale prices, but we
-also want to know the top-selling make per price range.  This is easily accomplished
-using a `terms` bucket ((("terms bucket", "nested in a histogram bucket")))((("buckets", "nested in other buckets", "terms bucket nested in histogram bucket")))nested inside the `histogram`:
+For our dashboard, we want to know how many cars sold in each price range.  We
+would also like to know the total revenue generated by each price bracket, which is
+calculated by summing the price of every car sold in that interval.
+
+To do this, we use a `histogram` and a nested `sum` metric:
 
 [source,js]
 --------------------------------------------------
 GET /cars/transactions/_search?search_type=count
 {
    "aggs":{
       "price":{
-         "histogram":{
-            "field":"price",    <1>
-            "interval":20000    <1>
+         "histogram":{ <1>
+            "field": "price",
+            "interval": 20000
          },
          "aggs":{
-            "make":{
-               "terms":{
-                  "field":"make",   <2>
-                  "size":1
+            "revenue": {
+               "sum": { <2>
+                  "field": "price"
                }
-            }
+            }
          }
       }
    }
@@ -42,19 +43,20 @@ GET /cars/transactions/_search?search_type=count
 <1> The `histogram` bucket requires two parameters: a numeric field, and an
 interval that defines the bucket size.
 // Mention use of "size" to get back just the top result?
-<2> A `terms` bucket is nested inside each price range, which will show us the
-top make per price range.
+<2> A `sum` metric is nested inside each price range, which will show us the
+total revenue for that bracket.
 
 As you can see, our query is built around the `price` aggregation, which contains
 a `histogram` bucket.  This bucket requires a numeric field to calculate
 buckets on, and an interval size.  The interval defines how "wide" each bucket
 is.  An interval of 20000 means we will have the ranges `[0-19999, 20000-39999, ...]`.
 
-Next, we define a nested bucket inside the histogram.  This is a `terms` bucket
-over the `make` field.  There is also a new `size` parameter, which defines the number of terms we want to generate.  A `size` of `1` means we want only the top make
-for each price range (the make that has the highest doc count).
+Next, we define a nested metric inside the histogram.  This is a `sum` metric, which
+adds up the `price` field of every document that lands in that price range.
+This gives us the revenue for each price range, so we can see whether our business
+makes more money from commodity or luxury cars.
 
-And here is the response (truncated):
+And here is the response:
 
 [source,js]
 --------------------------------------------------
@@ -66,40 +68,49 @@ And here is the response (truncated):
             {
                "key": 0,
                "doc_count": 3,
-               "make": {
-                  "buckets": [
-                     {
-                        "key": "honda",
-                        "doc_count": 1
-                     }
-                  ]
+               "revenue": {
+                  "value": 37000
                }
             },
             {
                "key": 20000,
                "doc_count": 4,
-               "make": {
-                  "buckets": [
-                     {
-                        "key": "ford",
-                        "doc_count": 2
-                     }
-                  ]
+               "revenue": {
+                  "value": 95000
                }
             },
-...
+            {
+               "key": 80000,
+               "doc_count": 1,
+               "revenue": {
+                  "value": 80000
+               }
+            }
+         ]
+      }
+   }
 }
 --------------------------------------------------
 
 The response is fairly self-explanatory, but it should be noted that the
 histogram keys correspond to the lower boundary of the interval.  The key `0`
-means `0-20,000`, the key `20000` means `20,000-40,000`, and so forth.
+means `0-19,999`, the key `20000` means `20,000-39,999`, and so forth.
+
+[NOTE]
+.Empty buckets are missing!
+=====================
+You'll notice that empty intervals, such as $40,000-59,999, are missing from the
+response.  The `histogram` bucket omits them by default, since including them
+could unintentionally produce an enormous response.
+
+We'll discuss how to include empty buckets in the next section, <<_returning_empty_buckets>>.
+=====================
 
 Graphically, you could represent the preceding data in the histogram shown in <<barcharts-histo1>>:
 
 [[barcharts-histo1]]
-.Top cars in each price range
-image::images/elas_28in01.png["Top cars in each price range"]
+.Sales and revenue per price bracket
+image::images/elas_28in01.png["Sales and revenue per price bracket"]
 
 Of course, you can build bar charts with any aggregation that emits categories
 and statistics, not just the `histogram` bucket.  Let's build a bar chart of

diff --git a/images/elas_28in01.png b/images/elas_28in01.png
diff --git a/snippets/300_Aggregations/30_histogram.json b/snippets/300_Aggregations/30_histogram.json
@@ -9,12 +9,11 @@ GET /cars/transactions/_search?search_type=count
             "interval":20000
          },
          "aggs":{
-            "make":{
-               "terms":{
-                  "field":"make",
-                  "size":1
+            "revenue": {
+               "sum": {
+                  "field": "price"
                }
-            }
+            }
          }
       }
    }
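
To make the bucketing rule in the new example concrete, here is a minimal Python sketch (not Elasticsearch code) of what the `histogram` plus nested `sum` request computes. It assumes each document contributes one numeric `price`, and that a value's bucket key is the lower boundary of its interval, `floor(price / interval) * interval`, as the text describes. The function name `histogram_with_sum` and the `prices` list are hypothetical; the prices were picked only so the output reproduces the doc counts and revenue values in the response shown in the diff.

```python
from collections import defaultdict


def histogram_with_sum(values, interval):
    """Group values into fixed-width buckets keyed by their lower bound
    (like a histogram aggregation) and sum the values in each bucket
    (like the nested `sum` metric named "revenue")."""
    buckets = defaultdict(lambda: {"doc_count": 0, "revenue": 0})
    for v in values:
        # Lower boundary of the interval containing v.
        key = (v // interval) * interval
        buckets[key]["doc_count"] += 1
        buckets[key]["revenue"] += v
    # Only intervals that received at least one value exist in the dict,
    # mirroring the default of omitting empty buckets. Sort by key.
    return [
        {"key": k, "doc_count": b["doc_count"], "revenue": {"value": b["revenue"]}}
        for k, b in sorted(buckets.items())
    ]


# Hypothetical sale prices, chosen only to match the example response.
prices = [10000, 20000, 30000, 15000, 12000, 20000, 80000, 25000]

for bucket in histogram_with_sum(prices, 20000):
    print(bucket)
```

Running it prints three buckets keyed `0`, `20000` and `80000`, with doc counts 3, 4 and 1 and revenue values 37000, 95000 and 80000. The `40000` and `60000` intervals never appear because no price falls in them, which is the same reason they are absent from the Elasticsearch response.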