DAP Integration: sensitivity and privacy budget expenditure #35

martinthomson · 2024-10-03T05:26:38Z

Our design requires that we communicate the sensitivity and budget expenditure that every contribution makes. The aggregation service needs to take that information and use it when applying DP noise. There are a few ways to slice this, but my suggestion would be:

1. Each task is configured with a preset sensitivity ($\Delta$) and budget ($\epsilon$).
2. Each submission is bound to these values. We can use the additional authenticated data (AAD) component of the AEAD we use for encryption, so that the values -- which are public -- don't need to be decrypted and therefore don't need to be resubmitted with every submission. That will save a lot of overhead for batched submissions.
3. A submission is invalid unless the sensitivity is less than or equal to the configured sensitivity of the query. Ideally, the values would be the same, which might simplify things. However, queries will likely need some way to vary metadata for individual submissions, so this might not result in a genuine simplification of the protocol. We should probably start with the simple approach, though there is no technical or privacy reason that a smaller sensitivity wouldn't work. Using a smaller sensitivity is going to result in greater noise relative to that contribution, but the smaller sensitivity will reduce the size of the submission, which could be well worth the modest increase in complexity.
4. A submission is invalid unless the privacy budget expended is greater than or equal to the privacy budget of the query. Again, it would be ideal if the values were the same, with a similar effect to that of having a lower sensitivity.
5. The aggregation service would apply noise based on the configured sensitivity and budget.
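The validity rules and the noise step above can be sketched as follows. This is a minimal illustration under assumptions that are not part of the proposal: the `TaskConfig`/`Submission` names and fields are hypothetical, and the noise is the Laplace mechanism with scale $\Delta/\epsilon$, which is one common choice, not necessarily what the aggregation service would use. `effective_epsilon` shows why a smaller sensitivity is safe: a contribution bounded by $\Delta_c \le \Delta$ incurs a privacy loss of only $\epsilon \cdot \Delta_c / \Delta$ under noise calibrated to $(\Delta, \epsilon)$, so debiting the full $\epsilon$ over-counts rather than under-counts.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskConfig:
    """Hypothetical per-task DP parameters, fixed when the task is configured."""
    sensitivity: float  # Δ: L1 sensitivity the noise is calibrated to
    epsilon: float      # ε: privacy budget spent by each query


@dataclass(frozen=True)
class Submission:
    """Hypothetical per-submission values bound to the report."""
    sensitivity: float  # Δ_c: bound on this contribution's L1 norm
    epsilon: float      # ε_c: budget the client debits for this contribution


def is_valid(sub: Submission, task: TaskConfig) -> bool:
    # Sensitivity may be smaller than the task's, never larger.
    # Budget expended may be larger than the query's, never smaller.
    return sub.sensitivity <= task.sensitivity and sub.epsilon >= task.epsilon


def laplace_scale(task: TaskConfig) -> float:
    # Laplace mechanism (an assumption here): scale b = Δ / ε.
    return task.sensitivity / task.epsilon


def effective_epsilon(sub: Submission, task: TaskConfig) -> float:
    # Privacy loss actually incurred by a contribution bounded by Δ_c
    # when the noise is calibrated to (Δ, ε): ε · Δ_c / Δ.
    return task.epsilon * sub.sensitivity / task.sensitivity


def add_laplace_noise(value: float, task: TaskConfig, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
    # Illustrative only: floating-point samplers like this one are not
    # considered safe for production DP.
    b = laplace_scale(task)
    return value + rng.expovariate(1 / b) - rng.expovariate(1 / b)


task = TaskConfig(sensitivity=2.0, epsilon=1.0)
small = Submission(sensitivity=1.0, epsilon=1.0)
print(is_valid(small, task), effective_epsilon(small, task))  # True 0.5
```

With $\Delta = 2$ and $\epsilon = 1$, a contribution bounded by $\Delta_c = 1$ is accepted and actually spends only $0.5$, which is the "greater noise relative to that contribution" trade-off described in the proposal.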

Extensions to the DAP protocol will be needed to communicate these values.
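A sketch of binding $(\Delta, \epsilon)$ through the AEAD's additional authenticated data. The encodings and function names are hypothetical, not what DAP defines: $(\Delta, \epsilon)$ is packed as fixed-width integers with $\epsilon$ as a rational, and the AAD is a length-prefixed concatenation. Because client and aggregator each rebuild the AAD from their own copy of the task configuration, the values are authenticated without being transmitted, and an aggregator holding different values derives different AAD, so decryption of the report fails.

```python
import struct


def encode_dp_params(sensitivity: int, epsilon_num: int, epsilon_den: int) -> bytes:
    # Hypothetical fixed-width encoding: Δ, then ε as num/den (uint32 each),
    # so both sides produce identical bytes without floating point.
    return struct.pack("!III", sensitivity, epsilon_num, epsilon_den)


def build_aad(task_id: bytes, report_metadata: bytes, dp_params: bytes) -> bytes:
    # Length-prefix every field so that distinct inputs cannot collide
    # by shifting bytes from one field into the next.
    aad = bytearray()
    for field in (task_id, report_metadata, dp_params):
        aad += struct.pack("!H", len(field)) + field
    return bytes(aad)


task_id = bytes(32)   # placeholder task ID
metadata = bytes(16)  # placeholder report metadata
client_aad = build_aad(task_id, metadata, encode_dp_params(2, 1, 1))
# An aggregator configured with Δ = 4 rebuilds different AAD, so the
# AEAD open would fail on this report.
aggregator_aad = build_aad(task_id, metadata, encode_dp_params(4, 1, 1))
print(client_aad == aggregator_aad)  # False
```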

cjpatton · 2024-12-09T22:21:26Z

> 2. Each submission is bound to these values. We can use the additional authenticated data (AAD) component of the AEAD we use for encryption, so that the values -- which are public -- don't need to be decrypted and therefore don't need to be resubmitted with every submission. That will save a lot of overhead for batched submissions.

There are two ways to accomplish this without protocol changes:

1. Define a DAP report extension that encodes $\Delta$ and $\epsilon$ and include it in the report metadata. This is straightforward but would waste some bandwidth, since each report in a batch would carry the same value.
2. Use the taskprov report extension and encode $\Delta$ and $\epsilon$ in the TaskConfig's extension field.
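For the first option, here is a hypothetical extension modeled on DAP's `Extension` structure (a 16-bit `extension_type` followed by `extension_data` with a 16-bit length prefix). The codepoint `0xFF00` and the payload layout are placeholders, not registered or specified values. The sketch makes the bandwidth cost concrete: 16 bytes per report, so a batch of one million reports carries 16,000,000 bytes of identical copies.

```python
import struct

# Placeholder codepoint; not a registered DAP ExtensionType.
DP_PARAMS_EXTENSION_TYPE = 0xFF00


def encode_dp_params_extension(sensitivity: int, epsilon_num: int, epsilon_den: int) -> bytes:
    # Hypothetical payload: Δ, then ε as num/den, each a big-endian uint32.
    body = struct.pack("!III", sensitivity, epsilon_num, epsilon_den)
    # extension_type (uint16) || length (uint16) || extension_data
    return struct.pack("!HH", DP_PARAMS_EXTENSION_TYPE, len(body)) + body


def decode_dp_params_extension(data: bytes) -> tuple[int, int, int]:
    ext_type, length = struct.unpack("!HH", data[:4])
    if ext_type != DP_PARAMS_EXTENSION_TYPE:
        raise ValueError("unexpected extension type")
    if length != 12 or len(data) != 4 + length:
        raise ValueError("malformed DP parameters extension")
    return struct.unpack("!III", data[4:])


ext = encode_dp_params_extension(2, 1, 1)
print(len(ext), len(ext) * 1_000_000)  # 16 16000000
```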

> 3. A submission is invalid unless the sensitivity is less than or equal to the configured sensitivity of the query. Ideally, the values would be the same, which might simplify things. However, queries will likely need some way to vary metadata for individual submissions, so this might not result in a genuine simplification of the protocol. We should probably start with the simple approach, though there is no technical or privacy reason that a smaller sensitivity wouldn't work. Using a smaller sensitivity is going to result in greater noise relative to that contribution, but the smaller sensitivity will reduce the size of the submission, which could be well worth the modest increase in complexity.

> 4. A submission is invalid unless the privacy budget expended is greater than or equal to the privacy budget of the query. Again, it would be ideal if the values were the same, with a similar effect to that of having a lower sensitivity.

What do you mean by "query" here?

martinthomson · 2024-12-09T22:47:40Z

We've done the first already, or at least we have a draft for it that the document references.

On that point, I hope that the sensitivity bound is effectively communicated via the L1 norm proof. Is there any risk that a client that is told the wrong sensitivity could exceed the bound? My understanding is that this would lead to an invalid submission.

> What do you mean by "query" here?

The aggregation that DAP performs. Each release of an aggregate is what we've been calling a "query", in line with DP terminology.

cjpatton · 2024-12-09T22:54:56Z

> On that point, I hope that the sensitivity bound is effectively communicated via the L1 norm proof. Is there any risk that a client that is told the wrong sensitivity could exceed the bound? My understanding is that this would lead to an invalid submission.

That's correct, insofar as the validity proof determines the sensitivity of the data. Encoding it separately would probably be redundant.
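The statement the L1 norm proof has to establish can be written in the clear as below. This is only an illustration with hypothetical names: in DAP the aggregators never see the contribution and instead verify a proof over secret shares. It does show why a client told the wrong sensitivity cannot slip through: the bound comes from the task, so a contribution whose L1 norm exceeds the task's $\Delta$ fails regardless of what the client believed.

```python
def l1_norm(contribution: list[int]) -> int:
    # Sum of absolute values of the contribution vector.
    return sum(abs(x) for x in contribution)


def satisfies_sensitivity(contribution: list[int], task_sensitivity: int) -> bool:
    # Plaintext form of what the validity proof attests to:
    # ||x||_1 <= Δ, with Δ taken from the task, not from the client.
    return l1_norm(contribution) <= task_sensitivity


print(satisfies_sensitivity([1, 0, 3], 4))  # True
print(satisfies_sensitivity([2, 0, 3], 4))  # False
```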

martinthomson moved this to Essential in Level 1 API Nov 25, 2024

martinthomson added this to Level 1 API Nov 25, 2024
