Adding notion of a recovery owner for network recovery #6705

Open
wants to merge 28 commits into
base: main

Conversation

gaurav137

@gaurav137 gaurav137 commented Dec 13, 2024

This PR adds support for a member that acts as a recovery owner and is assigned the full recovery share. Such a member can recover the network single-handedly, without requiring any other recovery members to submit their recovery shares.

This capability is going to be used in the confidential clean room scenario, where a confidential recovery service is created and can recover the CCF network on its own using a recovery owner identity. Only if the confidential recovery service is unable to perform recovery do the recovery members come together and perform break-glass recovery by submitting the threshold number of recovery shares.

Changes done:

Added an optional recovery_role enum in MemberDetails to indicate whether a recovery member is a recovery owner or not.
Added support for setting this role both via the cchost config file at network creation and via the set_member proposal.
Recovery owners don't count towards the recovery threshold calculation, so the relevant code paths were updated accordingly.
Added test cases to recover a network using the recovery owner.

@gaurav137 gaurav137 requested a review from a team as a code owner December 13, 2024 08:11
@gaurav137 gaurav137 changed the title from "Adding notion of a recovery owner notion for network recovery" to "Adding notion of a recovery owner for network recovery" Dec 13, 2024
// owners as only these members will be handed a recovery share
// that accrues towards the recovery threshold.
// Note that it is acceptable to start a network without any member
// having a recovery share. The service will check that at least one
Member

I believe we support opening deliberately unrecoverable services, and although I am not aware of current use cases, they have come up as potential use cases in the past, so I think we want to leave that open as a possibility.

Author

@gaurav137 gaurav137 Dec 13, 2024

@achamayou I'm not sure which lines you wanted me to change here. I added (a) a check to ensure that if recovery_owner has a value then enc_pub_key must also have a value, otherwise it throws, and (b) a count of member_with_pubk_count that skips the recovery_owner members.
(a) is essentially a configuration validation, while (b) only ensures the correctness of the existing check that the count of recovery members and the supplied or calculated default recovery threshold are sane; otherwise the logic below already throws.
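
The two checks described above could look roughly like the following. This is a minimal sketch, not the code in this PR: `RecoveryRole`, `MemberConfig` and `validate_recovery_config` are hypothetical names, and defaulting an unset threshold to the participant count is an assumption made for illustration.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for the member configuration read at startup.
enum class RecoveryRole
{
  NonParticipant,
  Participant,
  Owner
};

struct MemberConfig
{
  std::optional<std::string> encryption_pub_key;
  std::optional<RecoveryRole> recovery_role;
};

// Returns the effective recovery threshold, or throws on an inconsistent
// configuration. Recovery owners are skipped when counting members.
std::size_t validate_recovery_config(
  const std::vector<MemberConfig>& members,
  std::optional<std::size_t> recovery_threshold)
{
  std::size_t members_with_pubk_count = 0;
  for (const auto& m : members)
  {
    const bool is_owner = (m.recovery_role == RecoveryRole::Owner);

    // (a) An owner without an encryption key cannot be handed a share.
    if (is_owner && !m.encryption_pub_key.has_value())
    {
      throw std::logic_error(
        "Recovery owner must have an encryption public key");
    }

    // (b) Only non-owner members with a key accrue towards the threshold.
    if (m.encryption_pub_key.has_value() && !is_owner)
    {
      ++members_with_pubk_count;
    }
  }

  // Assumption: an unset threshold defaults to the participant count.
  const std::size_t threshold =
    recovery_threshold.value_or(members_with_pubk_count);
  if (threshold > members_with_pubk_count)
  {
    throw std::logic_error(
      "Recovery threshold exceeds the number of recovery participants");
  }
  return threshold;
}
```

Note that this sketch deliberately does not require at least one participant; whether that should be enforced is the open question in this thread.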

Member

"The service will check that at least one..."
^ I don't believe this is true now, and I don't want it to become true, because it precludes creating un-recoverable systems, which we think may be desirable in some cases.

This is something that an operator can quite trivially preclude by modifying the transition_service_to_open() transition if they wish to do so. There is no reason to hardcode it outside the constitution.

Author

@gaurav137 gaurav137 Dec 18, 2024

@achamayou Regarding "The service will check that at least one...": this check, that there must be at least 1 recovery member (aka participant) and that the recovery threshold cannot exceed that number, existed before this PR and continues to work today. So as of now you cannot open a service that has 0 recovery participants. I preserved the check in main.cpp using members_with_pubk_count, and there are checks in internal table access::set_recovery_threshold, remove_member and open_service that continue to work as before. Having owners has not changed the checks around the recovery threshold and recovery members (participants).

std::copy_n(secret.begin(), secret.size(), data.begin());
OPENSSL_cleanse(secret.data(), secret.size());
// TBD (gsinha): How to set ccf::crypto::sharing::Share secret member
// variable here?
Member

It's data (the raw secret) prefixed with an index 0. If we do that, we might as well drop data altogether and avoid keeping the secret twice.

Author

I wasn't sure how to instantiate a Share instance from the constructor inputs. If we can do that, would the wrap/unwrap methods then also need to create a temporary secret.key(size) instance to replace their current use of data?

Member

There is no way to do that in the current API. The logic is in lines 153 to 162 of sharing.cpp, but it's deliberately not exposed, and the sampling is combined with the sharing to avoid sharing of unsafe keys/unsafe sizes.

I think that is fine, because re-constructed keys never need to compute a zero share, and so all objects created with this ctor should never need this member attribute. This is why I was suggesting splitting this into two distinct types: SharedLedgerSecretWrappingKey and ReconstructedLedgerSecretWrappingKey.
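
As a rough illustration of the split suggested above, the two types might be shaped as follows. This is only a structural sketch: `ShareSketch` is a simplified, hypothetical stand-in for ccf::crypto::sharing::Share, `key_material()` is a placeholder for the real key derivation, and no secret sharing is actually performed here.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Hypothetical stand-in for a share; index 0 denotes the secret itself.
struct ShareSketch
{
  uint32_t index;
  Bytes value;
};

// Produced when a fresh secret is sampled and split into shares.
// It keeps only the index-0 share, so the raw secret is not stored twice.
class SharedLedgerSecretWrappingKey
{
  ShareSketch secret;

public:
  explicit SharedLedgerSecretWrappingKey(ShareSketch s) :
    secret(std::move(s))
  {}

  // Placeholder: real code would derive the wrapping key from the share.
  const Bytes& key_material() const
  {
    return secret.value;
  }
};

// Rebuilt from submitted shares during recovery. It only unwraps ledger
// secrets and never issues shares, so it carries no share member at all.
class ReconstructedLedgerSecretWrappingKey
{
  Bytes data;

public:
  explicit ReconstructedLedgerSecretWrappingKey(Bytes d) :
    data(std::move(d))
  {}

  const Bytes& key_material() const
  {
    return data;
  }
};
```

With this shape, code that hands out shares can only accept the first type, and the open question of how to populate a share member on a reconstructed key does not arise.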

Author

Ok @achamayou, can we address this comment when we get to the SharedLedgerSecretWrappingKey/ReconstructedLedgerSecretWrappingKey refactoring work?

@achamayou
Member

@gaurav137 in member_frontend.h on line 602 there is a version number for the OpenAPI generation. If you bump it and rebuild, the next run of the schema test will fail but produce the updated schema. You can check that in, and the run after that should pass (it's just updating a golden file).

@@ -96,6 +110,11 @@ namespace ccf
return ret;
}

std::vector<uint8_t> get_full_share_serialised() const
Member

This is a problem because it will leave a vector with the full share on the stack. We need to call it with a member id instead, and encrypt (same as wrap()), so as to only return encrypted info.

I think it would be better to modify wrap() to return a WrappedShare, with a map of member_id to wrapped_shared, all encrypted.

Author

@achamayou I'm not clear on exactly how to make this change. Perhaps you could show me how?
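
One possible reading of the suggestion is sketched below, under stated assumptions: `WrappingKeySketch`, `WrappedShares` and `EncryptFn` are hypothetical names, the encryption step is a caller-supplied callback rather than any real CCF or OpenSSL API, and the shares are assumed to have been computed already. The point is only that the full share never leaves the object unencrypted.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Bytes = std::vector<uint8_t>;
using MemberId = std::string;

// Hypothetical hook: encrypts plaintext to the given member's public key.
using EncryptFn = std::function<Bytes(const MemberId&, const Bytes&)>;

// Everything returned to the caller is encrypted, keyed by member id.
struct WrappedShares
{
  std::map<MemberId, Bytes> encrypted_shares;
};

class WrappingKeySketch
{
  Bytes full_share; // never returned in the clear
  std::vector<Bytes> split_shares; // one per recovery participant

public:
  WrappingKeySketch(Bytes full, std::vector<Bytes> parts) :
    full_share(std::move(full)),
    split_shares(std::move(parts))
  {}

  // Participants each receive one encrypted split share; owners each
  // receive the encrypted full share.
  WrappedShares wrap(
    const std::vector<MemberId>& participants,
    const std::vector<MemberId>& owners,
    const EncryptFn& encrypt) const
  {
    if (participants.size() > split_shares.size())
    {
      throw std::logic_error("Not enough shares for all participants");
    }

    WrappedShares out;
    for (std::size_t i = 0; i < participants.size(); ++i)
    {
      out.encrypted_shares[participants[i]] =
        encrypt(participants[i], split_shares[i]);
    }
    for (const auto& owner : owners)
    {
      out.encrypted_shares[owner] = encrypt(owner, full_share);
    }
    return out;
  }
};
```

A get_full_share_serialised()-style accessor would then be unnecessary, since callers only ever see the encrypted map.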


3 participants