consensus: bugfix in HFC.reconstructSummary #3750
@@ -170,56 +170,84 @@ reconstructSummary (History.Shape shape) transition (HardForkState st) =
go :: Exactly xs' EraParams
This PR is a draft because I think the new code below is correct, but I'm still in the process of convincing myself (and en passant writing missing documentation). In particular, I need to understand exactly when `TransitionKnown`, `TransitionUnknown`, and especially `TransitionImpossible` can and cannot arise.
This code dates back to Edsko's original PR #2034 two years ago, and unfortunately it didn't get any discussion that was archived on that PR (Thomas and Edsko would review on Meet and write down TODOs -- it's possible they talked about it without writing a comment on the PR).
The other aspect of due diligence is to catalog the downstream consequences of this change. Which is unfortunately kind of broad -- for those with access, you can see the use-def tree here https://input-output.atlassian.net/browse/CAD-4296?focusedCommentId=91616
go (ExactlyCons params ss) (TS (K Past{..}) t) =
NonEmptyCons (EraSummary pastStart (EraEnd pastEnd) params) $ go ss t
go (ExactlyCons params ExactlyNil) (TZ Current{..}) =
This `go (ExactlyCons params ExactlyNil) (TZ Current{..})` case is the buggy one. Just because it's the last era the code knows about doesn't mean there won't be another era. My intuition is that it should instead be treated just like the `go (ExactlyCons params (ExactlyCons nextParams _)) (TZ Current{..})` case below.
By removing this case, we will be changing many kinds of downstream behaviors: eg local clients can send the `GetInterpreter` query (aka `QueryEraHistory` in the `cardano-node` repo) in order to do their own slot<->time translations. Because of this bug, such requests while in the last known era currently have no time horizon (they'll happily translate slot<->time a million years in the future), but as of the bugfix the response to that query (and everything else downstream of `reconstructSummary`) will have the intended finite horizon (mostly "one stability window").
In particular, it's possible some SPO has written their own client and is unfortunately depending on this buggy unlimited translation range during Alonzo.
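To make the difference concrete, here is a minimal sketch of the two behaviors. It is a deliberately simplified model, not the real `ouroboros-consensus` code: the types, the fixed `safeZone` of 100 slots, and the names `summarizeBuggy`, `summarizeFixed`, and `canTranslate` are all assumptions made for illustration, and the real summary works with richer bounds than a bare slot number.

```haskell
import Data.Word (Word64)

-- Hypothetical, simplified stand-ins for the real HFC types.
type Slot = Word64

-- | End of an era in the summary: a known bound, or no bound at all.
data EraEnd = EraEnd Slot | EraUnbounded
  deriving (Eq, Show)

data EraSummary = EraSummary { eraStart :: Slot, eraEnd :: EraEnd }
  deriving (Eq, Show)

-- | What the ledger state says about the end of the current era.
data Transition
  = TransitionKnown Slot    -- ^ the era is known to end at this slot
  | TransitionUnknown Slot  -- ^ no transition known as of this ledger tip slot
  deriving (Eq, Show)

-- | Assumed stand-in for the "one stability window" safe zone.
safeZone :: Slot
safeZone = 100

-- | Fixed behavior: every current era gets a finite horizon.
summarizeFixed :: Slot -> Transition -> EraSummary
summarizeFixed start (TransitionKnown end)   = EraSummary start (EraEnd end)
summarizeFixed start (TransitionUnknown tip) =
  EraSummary start (EraEnd (max start tip + safeZone))

-- | Buggy behavior: the last era the code knows about is treated as eternal.
summarizeBuggy :: Bool -> Slot -> Transition -> EraSummary
summarizeBuggy isLastKnownEra start transition
  | isLastKnownEra = EraSummary start EraUnbounded
  | otherwise      = summarizeFixed start transition

-- | Whether a slot falls inside the range the summary is willing to translate.
canTranslate :: EraSummary -> Slot -> Bool
canTranslate (EraSummary s EraUnbounded) slot = slot >= s
canTranslate (EraSummary s (EraEnd e))   slot = slot >= s && slot < e
```

In this model, `summarizeBuggy True` accepts any slot at or after the era start, however far in the future, while `summarizeFixed` refuses anything beyond the ledger tip plus the safe zone, which is the change in behavior a client relying on the unlimited range would observe.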
This function summarizes a hard fork ledger state as a series of eras. It was always treating the last era the code was aware of as eternal, which is incorrect. On proper HFC chains, the final era is not eternal; we just, for most of the time, don't yet know when it will end. Also as part of this commit I've split TransitionImpossible into its two use cases. This helps with clarity in the new `reconstructSummary` function, since it's now doing a bit more pattern matching.
I tried to fix the branch name, and GitHub closed the PR. I'm restoring and reopening :(
63140a6 to 9df9d25
3cec75a to 5db1413
I have not found any types that determine this, so I think it's simply another required static input instead of a new type class method.
…ForkConsensusConfigExtensible
5db1413 to 8de871a
At this point, these four commits are my attempt to refactor/bugfix what was there, comprehensively. The last two commits are mainly driven by trying to preserve the ability to have an unbounded final era. That was a feature of the original design, and so I've tried here to retain it, but I'm not 100% sure we actually need it. So maybe we could replace those last two commits with something simpler. Even so, I think those last two commits demonstrate some key learnings. In particular, these two occurrences of I think there are two primary paths forward:
I chatted with Duncan on Slack, and he said something similar and a bit stronger:
I agree: if at all, it should be opt-in (via eg IntersectMBO/cardano-base#277).
Closed -- see PR #3754 instead.
3754: Bugfix in HFC: do not consider the last known era to be eternal r=nfrisby a=nfrisby

This supersedes PR #3750. And it unblocks the Vasil HF.

This PR fixes a bug in the Consensus Hard Fork Combinator (HFC). The bug is that certain parts of the HFC before this PR assume that the final era the code is aware of (ie the rightmost era in the `xs` argument to `HardForkBlock xs`) will never end.

At face value, this assumption seems very reasonable. If the final era could end, then that means we wrote the code that knows how to end the final era but didn't simultaneously add the code for the following era, which is pretty clearly a bad idea unless you indeed want your system-wide chain to stop growing. The patterns we have in `protocolInfoCardano` and in the related call in [input-output-hk/cardano-node](https://github.com/input-output-hk/cardano-node) ensure that mistake would be quite obvious in review of such a PR.

Despite that assumption seeming reasonable, merely adding Babbage in the recent PR #3595 revealed this assumption as a bug: the new code considered some Alonzo transactions on the historical chains to now be invalid. Together, this PR and PR IntersectMBO/cardano-ledger#2785 fix the bug and also allow those Alonzo transactions to remain valid.

The recent PR #3595 added the Babbage era, changing `type CardanoBlock = HardForkBlock [ByronBlock, ShelleyBlock, AllegraBlock, MaryBlock, AlonzoBlock]` to `type CardanoBlock = HardForkBlock [ByronBlock, ShelleyBlock, AllegraBlock, MaryBlock, AlonzoBlock, BabbageBlock]`. From that change alone, due to the bug, the code stopped considering the Alonzo era as eternal, since it was no longer the final era.
We now classify this assumption as a bug because it's clear that the (inevitable) addition of a new final era causes, via the bug, a non-monotonic change in behavior: for Alonzo (ie before we transition to Babbage), the Babbage-aware HFC now refuses to translate some slot<->times that it happily translated when Alonzo was the final era in the list.

The eras prior to Alonzo are unaffected because Alonzo introduced Plutus scripts and with them the requirement that the validity interval (specified as an interval between two slots) on an Alonzo transaction that contains Plutus scripts must be translatable to POSIX times, because the Plutus interpreter exposes the interval to the script as POSIX times, not as slots.

The translation between slots and times is the responsibility of the HFC, because it depends on the slot duration, which is allowed to change during era transitions (eg it changed from 20s to 1s when the chain transitioned from Byron to Shelley; it has not yet changed a second time). The HFC is very careful with that translation, as you can see in the Time chapter in the Hard Fork Combinator section of the report [The Cardano Consensus and Storage Layer](https://hydra.iohk.io/job/Cardano/ouroboros-network/native.consensus-docs.x86_64-linux/latest/download/1). In particular, that chapter explains that the HFC refuses to translate slots<->times unless the answer would always remain correct regardless of any possible rollbacks (ie would be the same for any extension of our immutable tip, which is `k` blocks back from the tip of our currently selected chain).

The assumption that the final era does not end is in direct violation of that rule: if we assume the final era won't end, then we might translate a slot/time that is (currently!) 1000 years into the future -- and it's obvious that future activity on this chain is likely to change the correspondence between slots and times at some point during the next 1000 years!
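As a rough illustration of why the translation belongs to the HFC and why it needs a horizon, the sketch below translates slots to times across two eras with different slot lengths and refuses anything beyond the last known bound. The 20s and 1s slot lengths come from the text above; the `Era` record, the `slotToTime` function, and the era boundary slots are assumptions chosen for illustration, not the real HFC API or the real chain parameters.

```haskell
import Data.Word (Word64)

type Slot    = Word64
type Seconds = Word64  -- seconds since the chain's system start

-- | Hypothetical per-era summary entry.
data Era = Era
  { eraStartSlot :: Slot
  , eraStartTime :: Seconds
  , eraEndSlot   :: Slot     -- ^ exclusive; for the last era this is the horizon
  , slotLength   :: Seconds
  }

-- | Made-up boundaries: 100 slots of 20s, then 1s slots up to a horizon at slot 400.
exampleSummary :: [Era]
exampleSummary =
  [ Era 0   0    100 20
  , Era 100 2000 400 1
  ]

-- | Translate a slot to a time, or refuse ('Nothing') when the slot lies
-- outside every era in the summary, ie past the horizon.
slotToTime :: [Era] -> Slot -> Maybe Seconds
slotToTime eras slot =
  case [ e | e <- eras, slot >= eraStartSlot e, slot < eraEndSlot e ] of
    (e : _) -> Just (eraStartTime e + (slot - eraStartSlot e) * slotLength e)
    []      -> Nothing
```

With an unbounded final era there would be no `Nothing` case for future slots, so a slot 1000 years ahead would be translated using the current 1s slot length even though a later era might change it.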
It wasn't until Alonzo's transaction validity interval check that this mattered, because that's the first (and so far only) slot<->time translation in the ledger rules that involves a user-defined slot (ie what they set as the transaction's validity interval bounds) -- all other translations are fixed by the ledger rules and are by design always within the range the HFC will translate (even after this PR's bugfix).

Thus, as a result of this PR, Babbage and subsequent eras will never be considered eternal, thereby satisfying the rule about all successful slot<->time translations being deterministic with respect to the selection's immutable tip. And PR IntersectMBO/cardano-ledger#2785 will intentionally violate that rule only during Alonzo, so that the historical transactions already on-chain remain valid. Because the consequences are currently limited to transaction validity intervals, there's no harm in that.

So-called "clients", such as `db-sync`, the wallet, the Cardano cli tools, etc may also exhibit a change in behavior due to this PR, but those at worst will be less convenient than they seemed before: any features of those tools that allowed the user to translate slot<->time well into the future will now refuse to do so. This PR changes no types, so downstream code already has code paths that handle the HFC's refusal to translate a slot/time; they merely weren't being exercised for as many arguments as they should have been.

Co-authored-by: Nicolas Frisby <nick.frisby@iohk.io>
One commit PR, see its message.