Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consensus: bugfix in HFC.reconstructSummary #3750

Closed
wants to merge 5 commits into from

Conversation

nfrisby
Copy link
Contributor

@nfrisby nfrisby commented May 17, 2022

One commit PR, see its message.

@nfrisby nfrisby added the consensus issues related to ouroboros-consensus label May 17, 2022
@nfrisby nfrisby requested review from dcoutts and nc6 May 17, 2022 13:45
@@ -170,56 +170,84 @@ reconstructSummary (History.Shape shape) transition (HardForkState st) =
go :: Exactly xs' EraParams
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is draft because I think the new code below is correct, but I'm in the process of convincing myself (and en passant writing missing documentation). In particular, I need to understand exactly when TransitionKnown, TransitionUnknown, and in particular TransitionImpossible both can and cannot arise.

This code dates back to Edsko's original PR #2034 two years ago, and this code unfortunately didn't get any discussion that archived on that PR (Thomas and Edsko would review on Meet and write-down TODOs -- it's possible they talked about it without writing a comment on the PR).

The other aspect of due diligence is to catalog the downstream consequences of this change. Which is unfortunately kind of broad -- for those with access, you can see the use-def tree here https://input-output.atlassian.net/browse/CAD-4296?focusedCommentId=91616

go (ExactlyCons params ss) (TS (K Past{..}) t) =
NonEmptyCons (EraSummary pastStart (EraEnd pastEnd) params) $ go ss t
go (ExactlyCons params ExactlyNil) (TZ Current{..}) =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This go (ExactlyCons params ExactlyNil) (TZ Current{..}) case is the buggy one. Just because it's the last era the code knows about doesn't mean there won't be another era. My intuition is that it should instead be treated just like the go (ExactlyCons params (ExactlyCons nextParams _)) (TZ Current{..}) case below.

By removing this case, we will be changing many kinds of downstream behaviors: eg local clients can send the GetInterpreter query (aka QueryEraHistory in cardano-node repo) in order to be able to do their own slot<->time translations. Because of this bug, such requests while in the last known era currently have no time horizon (they'll happily translate slot<->time a million years in the future), but as of the bugfix they response to that query (and everything else downstream of reconstructSummary) will have the intended finite horizon (mostly "one stability window").

In particular, it's possible some SPO has written their own client and is unfortunately depending on this buggy unlimited translation range during Alonzo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function summarizes a hard fork ledger state as a series of eras.

It was always treating the last era the code was aware of as eternal, which is
incorrect. On proper HFC chains, the final era is not eternal, we just, for
most of the time, don't yet know when it will end.

Also as part of this commit I've split TransitionImpossible into its two uses
cases. This helps with clarity in the new `reconstructSummary` function, since
it's now doing a bit more pattern matching.
@nfrisby nfrisby closed this May 17, 2022
@nfrisby nfrisby deleted the nfrisby-CAD-4264-Babbage-crash-2 branch May 17, 2022 19:48
@nfrisby nfrisby restored the nfrisby-CAD-4264-Babbage-crash-2 branch May 17, 2022 19:50
@nfrisby
Copy link
Contributor Author

nfrisby commented May 17, 2022

I tried to fix the branch name, and GitHub closed the PR. I'm restoring and reopening :(

@nfrisby nfrisby reopened this May 17, 2022
@nfrisby nfrisby force-pushed the nfrisby-CAD-4264-Babbage-crash-2 branch 4 times, most recently from 63140a6 to 9df9d25 Compare May 17, 2022 20:00
@nfrisby nfrisby force-pushed the nfrisby-CAD-4264-Babbage-crash-2 branch from 3cec75a to 5db1413 Compare May 17, 2022 21:30
I have not found any types that determine this, so I think its simply another
required static input instead of a new type class method.
@nfrisby nfrisby force-pushed the nfrisby-CAD-4264-Babbage-crash-2 branch from 5db1413 to 8de871a Compare May 18, 2022 02:39
@nfrisby
Copy link
Contributor Author

nfrisby commented May 18, 2022

At this point, these four commits are my attempt to refactor/bugfix what was there, comprehensively. The last two commits are mainly driven by trying to preserve the ability to have an unbounded final era. That was a feature of the original design, and so I've tried here to retain it, but I'm not 100% sure we actually need it. So maybe we could replace those last two commits with something simpler. Even so, I think those last two commits demonstrate some key learnings.

In particular, these two occurrences of TransitionImpossible are the only two that I don't know see a way to simply eliminate and both rely on the notion that a "degenerate, single era" use of the HFC is to be unbounded. I think that notion assumes that the single block type alone, would also be unbounded. Which clearly wasn't the case with Byron, but our "lesson learned" is to just use the HFC, even with a single block (but not in a degenerate way), even when you're chain is just starting and you only have the one era. In other words, make an open-world assumption that you'll have another era (and thus a bounded forecast range) even when you don't initially plan on having more eras. We had to do some hacky stuff to make up for the fact that Byron was not future-proof in that way.

https://github.com/input-output-hk/ouroboros-network/blob/0a3a61ae80244dde943843fd39ef17cd85260980/ouroboros-consensus/src/Ouroboros/Consensus/HardFork/Combinator/Embed/Unary.hs#L75-L81

https://github.com/input-output-hk/ouroboros-network/blob/0a3a61ae80244dde943843fd39ef17cd85260980/ouroboros-consensus/src/Ouroboros/Consensus/HardFork/Combinator/Embed/Unary.hs#L340-L353

https://github.com/input-output-hk/ouroboros-network/blob/0a3a61ae80244dde943843fd39ef17cd85260980/ouroboros-consensus/src/Ouroboros/Consensus/HardFork/Combinator/Embed/Unary.hs#L564-L570

I think there are two primary paths forward:

  • never have unbounded forecast ranges, even for a "degenerate" HFC application

  • add a new first parameter to the HardForkBlock that indicates whether the era list argument is open or closed (approx. final, a la Java's keyword) and probably add a matching related index to TransitionInfo etc

@nfrisby
Copy link
Contributor Author

nfrisby commented May 18, 2022

never have unbounded forecast ranges, even for a "degenerate" HFC application

I chatted with Duncan on Slack, and he said something similar and a bit stronger:

HardForkBlock [blk] being indistinguishable from blk was a useful property. Perhaps that property can be preserved by saying that even for a single non-HFC block type that there's a limited time horizon.
I think it's a much more reasonable assumption to make to assume that there could be change in future than to assume there will never be change.

And no, I also cannot think of when we'd need an unbounded epoch info.
Well, ok there are some client use cases where it's useful to be able to make that assumption locally (and understand and accept the consequences), but I can't think of places within the node we want that.

I agree, if it all, it should be opt-in (via eg IntersectMBO/cardano-base#277).

@nfrisby
Copy link
Contributor Author

nfrisby commented May 18, 2022

Closed -- see PR #3754 instead.

@nfrisby nfrisby closed this May 20, 2022
iohk-bors bot added a commit that referenced this pull request May 20, 2022
3754: Bugfix in HFC: do not consider the last known era to be eternal r=nfrisby a=nfrisby

This supersedes PR #3750. And it unblocks the Vasil HF.

This PR fixes a bug in the Consensus Hard Fork Combinator (HFC). The bug is that certain parts of the HFC before this PR assume that the final era the code is aware of (ie the rightmost era in the `xs` argument to `HardForkBlock xs`) will never end. At face value, this assumption seems very reasonable. If the final era could end, then that means we wrote the code that knows how to end the final era but didn't simultaneously add the code for the following era, which is pretty clearly a bad idea unless you indeed want your system-wide chain to stop growing. The patterns we have in `protocolInfoCardano` and in the related call in [input-output-hk/cardano-node](https://github.com/input-output-hk/cardano-node) ensure that mistake would be quite obvious in review of such a PR. Despite that assumption seeming reasonable, merely adding Babbage in the recent PR #3595 revealed this assumption as a bug: the new code considered some Alonzo transactions on the historical chains to now be invalid. Together, this PR and PR IntersectMBO/cardano-ledger#2785 fix the bug and also allows those Alonzo transactions to remain valid.

The recent PR #3595 added the Babbage era, changing `type CardanoBlock = HardForkBlock [ByronBlock, ShelleyBlock, AllegraBlock, MaryBlock, AlonzoBlock]` to `type CardanoBlock = HardForkBlock [ByronBlock, ShelleyBlock, AllegraBlock, MaryBlock, AlonzoBlock, BabbageBlock]`. From that change alone, due to the bug, the code stopped considering the Alonzo era as eternal, since it was no longer the final era. We now classify this assumption as a bug because it's clear that the (inevitable) addition of a new final era causes, via the bug, a non-mononotic change in behavior: for Alonzo (ie before we transition to Babbage), the Babbage-aware HFC now refuses to translate some slot<->times that it happily translated when Alonzo was the final era in the list.

The eras prior to Alonzo are unaffected because Alonzo introduced Plutus scripts and with them the requirement that the validity interval (specified as an interval between two slots) on an Alonzo transaction that contains Plutus scripts must be translatable to POSIX times, because the Plutus interpreter exposes the interval to the script as POSIX times, not as slots. The translation between slots and times is the responsibility of the HFC, because it depends on the slot duration, which is allowed to change during era transitions (eg it changed from 20s to 1s when the chain transitioned from Byron to Shelley; it has not yet changed a second time). The HFC is very careful with that translation, as you can see in the Time chapter in the Hard Fork Combinator section of the [The Cardano Consensus and Storage Layer](https://hydra.iohk.io/job/Cardano/ouroboros-network/native.consensus-docs.x86_64-linux/latest/download/1) report. In particular, that chapter explains that the HFC refuses to translate slots<->times unless the answer would always remain correct regardless of any possible rollbacks (ie would be the same for any extension of our immutable tip, which is `k` blocks back from the tip of our currently selected chain).

The assumption that the final era does not end is in direct violation of that rule: if we assume the final era won't end, then we might translate a slot/time that is (currently!) 1000 years into the future -- and it's obvious that future activity on this chain is likely to change the correspondence between slots and times at some point during the next 1000 years! It wasn't until Alonzo's transaction validity interval check that this mattered, because that's the first (and so far only) slot<->time translation in the ledger rules that involves a user-defined slot (ie what they set as the transaction's validity interval bounds) -- all other translations are fixed by the ledger rules and are by design always within the range the HFC will translate (even after this PR's bugfix).

Thus, as a result of this PR, Babbage and subsequent eras will never be considered eternal, thereby satisfying the rule about all successful slot<->time translations being deterministic with respect to the selection's immutable tip. And PR IntersectMBO/cardano-ledger#2785 will intentionally violate that rule only during Alonzo, so that the historical transactions already on-chain remain valid. Because the consequences are currently limited to transaction validity intervals, there's no harm in that.

So-called "clients", such as `db-sync`, the wallet, the Cardano cli tools, etc may also exhibit a change in behavior due to this PR, but those at worst will be less convenient than they seemed before: any features of those tools that allowed the user to translate slot<->time well into the future will now refuse to do so. This PR changes no types, so that downstream code already has code paths that handle the HFC's refusal to translate a slot/time; they merely weren't being exercised for as many arguments as they should have been.

Co-authored-by: Nicolas Frisby <nick.frisby@iohk.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
consensus issues related to ouroboros-consensus
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant