DAP Integration: sensitivity and privacy budget expenditure #35

Open
martinthomson opened this issue Oct 3, 2024 · 3 comments

Comments

@martinthomson
Collaborator

martinthomson commented Oct 3, 2024

Our design requires that we communicate the sensitivity and budget expenditure of every contribution. The aggregation service needs to take that information and use it when applying DP noise. There are a few ways to slice this, but my suggestion would be:

  1. Each task is configured with a preset sensitivity ($\Delta$) and budget ($\epsilon$).
  2. Each submission is bound to these values. We can use the additional authenticated data (AAD) component of the AEAD we use for encryption, so that the values -- which are public -- don't need to be decrypted and therefore don't need to be resubmitted with every submission. That will save a lot of overhead for batched submissions.
  3. A submission is invalid unless the sensitivity is less than or equal to the configured sensitivity of the query. Ideally, the values would be the same, which might simplify things. However, queries will likely need some way to vary metadata for individual submissions, so this might not result in a genuine simplification of the protocol. We should probably start with the simple approach, though there is no technical or privacy reason that a smaller sensitivity wouldn't work. Using a smaller sensitivity is going to result in greater noise relative to that contribution, but the smaller sensitivity will reduce the size of the submission, which could be well worth the modest increase in complexity.
  4. A submission is invalid unless the privacy budget expended is greater than or equal to the privacy budget of the query. Again, it would be ideal if the values were the same, with similar effect to that of having a lower sensitivity.
  5. The aggregation service would apply noise based on the configured sensitivity and budget.
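
As a minimal sketch of steps 3 to 5, the validity rule and the noise calibration could look like the following. Everything here is an assumption made for illustration: the names `TaskConfig`, `SubmissionParams`, `is_valid`, and `laplace_scale` are hypothetical, and the Laplace mechanism is only one possible choice, since the proposal above does not name a noise distribution. Floating-point sampling like this is also not suitable for a real DP deployment.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskConfig:
    """Hypothetical per-task configuration (step 1)."""
    sensitivity: float  # Delta configured for the query
    epsilon: float      # privacy budget configured for the query


@dataclass(frozen=True)
class SubmissionParams:
    """Hypothetical public values bound to a submission (step 2)."""
    sensitivity: float
    epsilon: float


def is_valid(task: TaskConfig, sub: SubmissionParams) -> bool:
    """Steps 3 and 4: sensitivity must not exceed the configured value,
    and the budget expended must be at least the configured budget."""
    return sub.sensitivity <= task.sensitivity and sub.epsilon >= task.epsilon


def laplace_scale(task: TaskConfig) -> float:
    """Step 5, assuming the Laplace mechanism: scale b = Delta / epsilon,
    derived from the configured values rather than per-submission ones."""
    return task.sensitivity / task.epsilon


def sample_laplace(scale: float, rng: random.Random) -> float:
    """The difference of two i.i.d. exponentials with mean `scale`
    is Laplace-distributed with location 0 and that scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


task = TaskConfig(sensitivity=1.0, epsilon=0.5)
noise = sample_laplace(laplace_scale(task), random.Random(0))
```

Because the scale depends only on the task configuration, a submission that declares a smaller sensitivity still receives noise sized for the configured one, which is the "greater noise relative to that contribution" effect noted in step 3.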

Extensions to the DAP protocol will be needed to communicate these values.

@martinthomson martinthomson moved this to Essential in Level 1 API Nov 25, 2024
@cjpatton

cjpatton commented Dec 9, 2024

> 2. Each submission is bound to these values. We can use the additional authenticated data (AAD) component of the AEAD we use for encryption, so that the values -- which are public -- don't need to be decrypted and therefore don't need to be resubmitted with every submission. That will save a lot of overhead for batched submissions.

There are two ways to accomplish this without protocol changes:

  1. Define a DAP report extension that encodes $\Delta$ and $\epsilon$ and include it in the report metadata. This is straightforward but would waste some bandwidth. (Each report in a batch would include the same value.)
  2. Use the taskprov report extension and encode $\Delta$ and $\epsilon$ in the TaskConfig's extension field.
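
To make the bandwidth trade-off between the two options concrete, here is a rough sketch. The wire format is entirely hypothetical: neither this thread nor DAP defines how $\Delta$ and $\epsilon$ would be encoded, so the 32-bit fixed-point layout and the names `encode_dp_params`, `decode_dp_params`, and `per_report_overhead` are assumptions, and the byte counts ignore any extension framing.

```python
import struct

# Hypothetical encoding: two unsigned 32-bit fixed-point values with
# 16 fractional bits, in network byte order. Not taken from any spec.
FRACTION_BITS = 16


def _to_fixed(value: float) -> int:
    fixed = round(value * (1 << FRACTION_BITS))
    if not 0 <= fixed < (1 << 32):
        raise ValueError("value out of range for the assumed encoding")
    return fixed


def encode_dp_params(sensitivity: float, epsilon: float) -> bytes:
    """Serialize (Delta, epsilon) into 8 bytes."""
    return struct.pack("!II", _to_fixed(sensitivity), _to_fixed(epsilon))


def decode_dp_params(data: bytes) -> tuple[float, float]:
    """Inverse of encode_dp_params."""
    delta, eps = struct.unpack("!II", data)
    return delta / (1 << FRACTION_BITS), eps / (1 << FRACTION_BITS)


def per_report_overhead(batch_size: int) -> int:
    """Bytes spent if every report repeats the values (option 1)."""
    return batch_size * len(encode_dp_params(1.0, 1.0))


payload = encode_dp_params(1.0, 0.5)
```

Under these assumptions option 1 repeats the same 8 bytes in every report, while option 2 carries them once in the task configuration.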

> 3. A submission is invalid unless the sensitivity is less than or equal to the configured sensitivity of the query. Ideally, the values would be the same, which might simplify things. However, queries will likely need some way to vary metadata for individual submissions, so this might not result in a genuine simplification of the protocol. We should probably start with the simple approach, though there is no technical or privacy reason that a smaller sensitivity wouldn't work. Using a smaller sensitivity is going to result in greater noise relative to that contribution, but the smaller sensitivity will reduce the size of the submission, which could be well worth the modest increase in complexity.

> 4. A submission is invalid unless the privacy budget expended is greater than or equal to the privacy budget of the query. Again, it would be ideal if the values were the same, with similar effect to that of having a lower sensitivity.

What do you mean by "query" here?

@martinthomson
Collaborator Author

The first we've done already. Or at least we've a draft for it that the document references.

On that point, I hope that the sensitivity bound is effectively communicated via the L1 norm proof. Is there any risk there that a client that is told the wrong sensitivity could exceed the bound? My understanding is that this would lead to an invalid submission.

> What do you mean by "query" here?

The aggregation that is done by DAP. Each release of an aggregate is what we've been calling a "query", in line with DP terms.

@cjpatton

cjpatton commented Dec 9, 2024

> On that point, I hope that the sensitivity bound is effectively communicated via the L1 norm proof. Is there any risk there that a client that is told the wrong sensitivity could exceed the bound? My understanding is that this would lead to an invalid submission.

That's correct, insofar as the validity proof determines the sensitivity of the data. Encoding it somewhere would probably be redundant.
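
To spell out why the proof already fixes the sensitivity, here is a short sketch under two assumptions that are not stated in this thread: neighboring batches differ by adding or removing one report, and the aggregate is a plain sum of measurement vectors. If the validity proof enforces $\|x_i\|_1 \le \Delta$ for every accepted report $x_i$, then for neighboring batches $B$ and $B' = B \cup \{x_j\}$:

$$
\left\| \sum_{x_i \in B'} x_i - \sum_{x_i \in B} x_i \right\|_1 = \|x_j\|_1 \le \Delta
$$

So the L1 sensitivity of the sum is at most $\Delta$ regardless of what value the client was told, and a report that exceeds the bound fails validation instead of widening the sensitivity. Under a replace-one notion of adjacency the same argument, via the triangle inequality, gives a bound of $2\Delta$.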
