Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miner will sometimes propose a tenure extend before 70% of signers are ready to accept it #5757

Open
obycode opened this issue Jan 28, 2025 · 7 comments
Assignees

Comments

@obycode
Copy link
Contributor

obycode commented Jan 28, 2025

Currently the miner will attempt to extend the tenure as soon as it believes 70% of the signers will approve it. This is based on timestamps that it is getting from the signer. Due to clock skew between the signers and the miners, this could cause a signer that the miner expected to accept, to instead reject a tenure extend, because it is slightly too early. This problem is worsened by the fact that when a small number of signers reject the tenure extend, it sometimes results in a standstill, where <70% approve and <30% reject, causing a stall.

Some options to avoid this:

  1. Wait for a time where the miner expects a higher percentage of signers to approve (e.g. 75%)
  2. Once the miner believes it has reached a time at which 70% of signers will approve a tenure extend, wait an additional N% of the default tenure_idle_timeout_secs before issuing the tenure extend
  3. Do not try to issue the tenure extend until at least 50% of the budget has been used (since it is not necessary in this case). This would at least make this situation happen less often.
  4. Redesign the time-based tenure extend communication to use a relative time instead of an absolute time. This option was rejected in the original design, since the miner is unable to know how much time has passed between the signer setting that time and when the miner sees it. The positive side of this, is that the error in this case would be in the direction of submitting the tenure extend later than it could have, which would avoid the problem of signers rejecting it.
@github-project-automation github-project-automation bot moved this to Status: 🆕 New in Stacks Core Eng Jan 28, 2025
@obycode obycode self-assigned this Jan 28, 2025
@kantai
Copy link
Member

kantai commented Jan 28, 2025

I like options 2 and 3.

Option 3 reduces the frequency of the issue, which is helpful in general.

Option 2 seems to get at the issue more directly. I'm uncertain about the mechanics of the N%, though. Really, I think that a flat (configurable) N seconds may make more sense -- the clock skew itself doesn't depend on tenure_idle_timeout_secs so why should the wait time?

I'll also add an option 5, which is maybe more complex: the signers adjust the tenure extend timestamp that they return the miner based on their perceived clock skew.

The signer knows what time they think they received the proposal from the miner (BlockInfo.proposed_time), and the proposal itself contains the miner's timestamp. If the miner's block proposal has time M and the signer received the block at local time S, then the "network delay" is S - M. If all clocks are synchronized, this is a positive duration that is something like the stackerdb propagation delay. If the miner is running ahead of the signer, this number will be shorter than the propagation delay, and possibly negative.

Theoretically, when the signers return a block acceptance to the miner, they could do:

    pub fn create_block_acceptance(&self, block_info: &BlockInfo) -> BlockResponse {
        let block = &block_info.block;
        let tenure_extend_ts = self.signer_db.calculate_tenure_extend_timestamp(
                self.proposal_config.tenure_idle_timeout,
                block,
                true,
        );
        let adjusted_ts = i128::from(tenure_extend_ts) - block_info.proposed_time.into() + block.header.timestamp.into();
        let adjusted_ts = u64::try_from(adjusted).unwrap_or(tenure_extend_ts);
        let signature = self
            .private_key
            .sign(block.header.signer_signature_hash().bits())
            .expect("Failed to sign block");
        BlockResponse::accepted(
            block.header.signer_signature_hash(),
            signature,
            adjusted_ts,
        )
    }

This is definitely more complex! But it also raises another option for implementing (2), which is having the signers just do the adjustment themselves (i.e., add a configurable amount to the returned value in accepted()).

@obycode
Copy link
Contributor Author

obycode commented Jan 28, 2025

Option 5 is interesting. A problem though is that the block's timestamp is set at the beginning of the mining process, so the signer would be over-estimating the network delay since it is including the block build time. This would cause the adjusted time to be sooner than it should be, worsening the problem.

@obycode
Copy link
Contributor Author

obycode commented Jan 28, 2025

I agree that we could definitely do both 2 and 3 together, and your point about it not needing to be a percentage of the time but instead just a constant value definitely makes sense.

@jferrant
Copy link
Collaborator

I have no issues with doing 2 and 3 together. I would hesitate to do anything more complicated as a miner proposing a tenure extend too early really should not ultimately be a problem/shouldn't cause a stall (is more a problem with the miner's retry logic/network latency which Roberto is already partially addressing). Mitigating it with the least amount of lift is what I am gonna lean towards.

@obycode
Copy link
Contributor Author

obycode commented Jan 28, 2025

#5760 may inadvertently help a little here too, since it will spend a little time trying to mine other transactions before proposing the block.

@obycode
Copy link
Contributor Author

obycode commented Feb 6, 2025

#5787 implemented item 3, Do not try to issue the tenure extend until at least 50% of the budget has been used (since it is not necessary in this case). This would at least make this situation happen less often.

obycode added a commit that referenced this issue Feb 7, 2025
This allows the miner to specify a buffer (in seconds) that it should
wait after it has reached a time when it thinks 70% of signers should
approve a time-based tenure extend. The goal is to provide a little
buffer for clock skew, since we have seen, live on mainnet, cases where
<70% of signers approve a tenure-extend.

This is item 2 in #5757.
@obycode
Copy link
Contributor Author

obycode commented Feb 7, 2025

#5816 implements item 2. After that merges, I think we can close this issue.

@obycode obycode moved this from Status: 🆕 New to Status: 💻 In Progress in Stacks Core Eng Feb 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Status: 💻 In Progress
Development

No branches or pull requests

3 participants