Merge pull request #2 from private-attribution/histogram

Histogram explanation
patcg · Sep 12, 2024 · 7dc8bb0
2 parents 8adc624 + d6fa2df
commit 7dc8bb0
Showing 3 changed files with 233 additions and 21 deletions.
diff --git a/api.bs b/api.bs
@@ -25,6 +25,57 @@ that enables the collection of aggregated, differentially-private metrics.
 The primary goal of this API is to enable attribution for advertising.
 
 
+## Attribution ## {#s-attribution}
+
+<dfn lt=attribution|attributed>Attribution</dfn> is the process of identifying [=actions=]
+that precede an [=outcome=] of interest,
+and allocating value to those [=actions=].
+
+For advertising, <dfn>actions</dfn> that are of interest
+are primarily the showing of advertisements
+(also referred to as <dfn>impressions</dfn>).
+Other actions include ad clicks (or other interactions)
+and opportunities to show ads that were not taken.
+
+Desired <dfn>outcomes</dfn> for advertising are more diverse,
+as they include any result that an advertiser seeks to improve
+through the showing of ads.
+A desirable outcome might also be called a <dfn>conversion</dfn>,
+a term that refers to "converting" a potential customer
+into a customer.
+What counts as a conversion could include
+sales, subscriptions, page visits, and enquiries.
+
+For this API, [=actions=] and [=outcomes=] are both
+events: things that happen once.
+What is unique about attribution for advertising
+is that these events might not occur on the same [=site=].
+Advertisements are most often shown on sites
+other than the advertiser's site.
+
+The primary challenge with attribution is in maintaining privacy.
+Attribution involves connecting activity on different sites.
+The goal of attribution is to find an impression
+that was shown to the same person before the conversion occurred.
+
+If attribution information were directly revealed,
+it would enable unwanted
+[[PRIVACY-PRINCIPLES#dfn-cross-context-recognition|cross-context recognition]],
+thereby enabling [[UNSANCTIONED-TRACKING|tracking]].
+
+This document avoids cross-context recognition by ensuring that
+attribution information is aggregated using an [=aggregation service=].
+The aggregation service is trusted to compute an aggregate
+without revealing the values that each person contributes to that aggregate.
+
+Strict limits are placed on the amount of information that each browser instance
+contributes to the aggregates for a given site.
+Differential privacy is used to provide additional privacy protection for each contribution.
+
+Details of aggregation service operation are included in [[#aggregation]].
+The differential privacy design used is outlined in [[#dp]].
+
+
 ## Background ## {#background}
 
 From the early days of the Web,
@@ -35,7 +86,7 @@ was the ability to obtain information about the effectiveness of advertising cam
 
 Web advertisers were able to measure key metrics like reach (how many people saw an ad),
 frequency (how often each person saw an ad),
-and conversions (how many people saw the ad then later took the action that the ad was supposed to motivate).
+and [=conversions=] (how many people saw the ad then later took the action that the ad was supposed to motivate).
 In comparison, these measurements were far more timely and accurate than for any other medium.
 
 The cost of measurement performance was privacy.
@@ -96,7 +147,50 @@ New additions to the
 
 ## Attribution Using Histograms ## {#histograms}
 
-TODO explain why we use histograms
+[=Attribution=] attempts to measure correlation
+between one or more ad placements ([=impressions=])
+and the [=outcomes=] that an advertiser desires.
+
+When considered in the aggregate,
+information about individuals is not useful.
+Actions and outcomes need to be grouped.
+
+The simplest form of attribution splits impressions into a number of groupings
+according to the attributes of the advertisement
+and counts the number of conversions.
+Groupings might be formed from attributes such as
+where the ad is shown,
+what was shown (the "creative"),
+when the ad was shown,
+or to whom.
+
+These groupings
+and the tallies of conversions attributed to each
+form a histogram.
+Each bucket of the histogram counts the conversions
+for a group of ads.
+
+<figure>
+<pre class=include-raw>
+path:images/histogram.svg
+</pre>
+<figcaption>Sample histogram for conversion counts,
+  grouped by the site where the impressions were shown</figcaption>
+</figure>
+
+Different groupings might be used for different purposes.
+For instance, grouping by creative (the content of an ad)
+might be used to learn which creative works best.
+
+Adding a value greater than one at each conversion
+enables more than simple counts.
+Histograms can also aggregate values,
+which might be used to differentiate between different outcomes.
+A higher value might be used for larger purchases
+or any outcome that is more highly valued.
+A conversion value might also be divided between multiple impressions
+to share credit,
+though this capability is not presently supported in the API.
 
 * Compatibility with privacy-preserving aggregation systems
 * Flexibility to assign buckets
@@ -109,36 +203,49 @@ TODO explain why we use histograms
 The private attribution API provides aggregate information about the
 association between two classes of events: [=impressions=] and [=conversions=].
 
-An <dfn>impression</dfn> is the
-event to which [=conversion=]s are being attributed. Selection of impression
-events is left to the consumer of the API. Examples include:
+An [=impression=] is any action that an advertiser takes on any website.
+The API does not constrain what can be recorded as an impression.
+Typical actions that an advertiser might seek to measure include:
 
-*   Displaying an advertisement to a user.
-*   Viewing a particular web page.
+*   Displaying an advertisement.
+*   Having a user interact with an advertisement in some way.
+*   Not displaying an advertisement (especially for controlled experiments that seek to confirm whether an advertising campaign is effective).
 
-A <dfn>conversion</dfn> is the
-event being attributed to [=impression=]s. Selection of conversion events
-is again left to the consumer of the API. Examples include:
+For the API, a [=conversion=] is an [=outcome=] that is being measured.
+The API does not constrain what might be considered to be an outcome.
+Typical outcomes that advertisers might seek to measure include:
 
-*   Signing up for an account.
 *   Making a purchase.
+*   Signing up for an account.
 *   Visiting a webpage.
 
-When an [=impression=] occurs, information about the impression is saved by the
-browser. This includes an identifier for the impression
-and some metadata about the impression, such as whether the impression was an
-ad view or an ad click.
+When an [=impression=] occurs,
+the <a method for=PrivateAttribution>saveImpression()</a> method can be used
+to request that the browser save information.
+This includes an identifier for the impression
+and some additional information about the impression.
+For instance, advertisers might use additional information
+to record whether the impression was an ad view or an ad click.
+
+At [=conversion=] time, a [=conversion report=] is created.
+A <dfn>conversion report</dfn> is an encrypted histogram contribution
+that includes information from any [=impressions=] that the browser previously stored.
 
-At [=conversion=] time, information for aggregation is created based on the
-impressions that were previously stored.  A site can request that the browser
-select impressions based on a simple query.
+The <a method for=PrivateAttribution>measureConversion()</a> method accepts options
+that tell the browser how to construct a [=conversion report=].
+These include a simple query that selects from the [=impressions=]
+that the browser has stored,
+a value to attribute to the selected impression(s),
+and other information needed to construct the [=conversion report=].
 
-*   If there was no matching impression,
+The histogram in a [=conversion report=] is constructed as follows:
+
+*   If the query found no impressions,
     or the [=privacy budget=] for the site is exhausted,
     a histogram consisting entirely of zeros (0) is constructed.
 
 *   If a matching impression is found,
-    the specified value is added to a histogram
+    the provided value is added to a histogram
     at the bucket that was specified at the time of the impression.
     All other buckets are set to zero.
 
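The construction rules in the hunk above can be sketched in plain JavaScript. This is only an illustration under stated assumptions, not the browser's implementation: `buildHistogram`, `histogramSize`, `histogramIndex`, and `budgetExhausted` are hypothetical names chosen for this sketch, and the result is shown unencrypted, whereas the spec text describes the report as an encrypted contribution.

```javascript
// Hypothetical helper, not part of the API: models the histogram that a
// single conversion report would carry, following the two rules above.
function buildHistogram({ histogramSize, matchedImpression, value, budgetExhausted }) {
  // Every bucket starts at zero.
  const histogram = new Array(histogramSize).fill(0);

  // No matching impression, or privacy budget exhausted: all zeros.
  if (!matchedImpression || budgetExhausted) {
    return histogram;
  }

  // Add the provided value at the bucket chosen when the impression was saved.
  // `histogramIndex` is an assumed field name for that bucket.
  histogram[matchedImpression.histogramIndex] += value;
  return histogram;
}

// Usage: a four-bucket histogram with the value 3 placed in bucket 2.
const example = buildHistogram({
  histogramSize: 4,
  matchedImpression: { histogramIndex: 2 },
  value: 3,
  budgetExhausted: false,
});
console.log(example);
```

Because both the "no match" and "budget exhausted" cases yield the same all-zero histogram in this sketch, the output always has the same shape whether or not an impression was found.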
@@ -190,7 +297,9 @@ dictionary PrivateAttributionAggregationSystem {
 };
 </xmp>
 
-## SaveImpression API ## {#save-impression-api}
+## Saving Impressions ## {#save-impression-api}
+
+The <dfn method for=PrivateAttribution>saveImpression()</dfn> method requests that the browser save information about an [=impression=].
 
 <pre>
 navigator.privateAttribution.saveImpression({
@@ -231,9 +340,12 @@ Implicit saveImpression API inputs:
 
 ## MeasureConversion API ## {#measure-conversion-api}
 
+The <dfn method for=PrivateAttribution>measureConversion()</dfn> method requests that the browser create a [=conversion report=].
+
 TODO:
 * Change filter data
 
+<pre>
 navigator.privateAttribution.measureConversion({
   // name of the aggregation system
   aggregator: "aggregator.example",
@@ -251,6 +363,7 @@ navigator.privateAttribution.measureConversion({
   // a list of sites where impressions might have been registered
   source: ["publisher.example"]
 });
+</pre>
 
 // TODO clarify "Infinity"
 

diff --git a/images/histogram.svg b/images/histogram.svg
diff --git a/images/value.svg b/images/value.svg