Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Engine API spec improvement proposal #321

Closed
mkalinin opened this issue Nov 8, 2022 · 14 comments
Closed

Engine API spec improvement proposal #321

mkalinin opened this issue Nov 8, 2022 · 14 comments

Comments

@mkalinin
Copy link
Collaborator

mkalinin commented Nov 8, 2022

Main ideas behind this proposal were brought up during the Engine API session at the Devcon R&D workshop. Thanks everyone attended for fruitful and engaged conversation. Special thanks to @protolambda for taking notes.

Upcoming HFs introduce changes to existing Engine API data structures and method semantics, and also may require new methods to be added. Outside of HFs scope, there are a number of proposals to add auxiliary methods for optimisation and usability purposes (#218, #318). In some cases auxiliary methods may be optional, i.e. EL may not support such a method but if supported it can be utilized by CL.

On the other side, we need a convenient way of deprecating unused or redundant methods (e.g. engine_exchangeTransitionConfigurationV1).

Capability of introducing changes to the Engine API without coordinated upgrades between the layers is an important property of proposed design.

Method and structure versioning

This proposal doesn't affect requirements described in the Versioning section of the original specification. Any changes to a method behaviour or its parameter set, likewise, changes to the fields of a data structure should be signified by bumping its version. A version is reflected in the name of each method and structure in a VX suffix, where X is a number of the version.

This approach makes specification clearer by shaping changes into separate self-contained definitions. CL and EL client implementations are free to maintain versioning of data structures according to their preferences and may utilize optionality of JSON fields whenever it is suitable.

Capabilities

New engine_getCaps method is proposed (accepts no parameters, returns array of strings). The new method returns a list of Engine API methods (capabilities) that are currently supported by the corresponding EL client. Every method must be represented by all supported versions, i.e. engine_newPayloadV1 and engine_newPayloadV2 must be in the return list if EL currently supports both.

The method may not return itself in the list, a version suffix for this method doesn't seem relevant too.

It is assumed that CL clients request EL capabilities at the beginning of the Websocket session, and do this on demand while communicating via HTTP (periodically or after EL gets back from outage).

Note: Deprecated methods may be listed in the response to engine_getCaps if EL client does still support them.

Note: Optional methods are not required to be supported by EL clients, thus, may not be listed in the response to engine_getCaps.

Method status

The following method statuses are proposed:

  • Experimental -- yet under development, semantics and parameter set changes are expected. Such a method may have no version suffix (name of a feature as a suffix does seem more convenient, e.g. engine_newPayloadWithdrwals, engine_getPayload4844 etc).
  • Final -- method specification is considered solid, changes may only be done in order to fix a bug.
  • Optional -- same as Final, but for optional methods.
  • Deprecated -- the method is deprecated and may no longer be supported by EL clients.

Note: These statuses are only used in the capabilities table described below, they don't explicitly appear in the client software implementations.

We propose deprecating a method as soon as it becomes irrelevant to the protocol as a general rule of thumb.

For instance, engine_exchangeTransitionConfigurationV1 has been used for the Merge transition and isn't required anymore, thus, should be deprecated. A post-Shanghai HF bumping some core methods to V3 would be a good moment to deprecate V1 versions of these methods introduced in Paris, or it even can be done some time after the Shanghai block event.

Table

Maintenance of the capabilities table is proposed to keep track of all Engine API methods and their statuses. A change of a status of any method version should be signified by the corresponding update in this table.

The table looks as follows:

Method Source Status
engine_newPayloadV1 Bellatrix Final
engine_forkchoiceUpdatedV1 Bellatrix Final
engine_getPayloadV1 Bellatrix Final
engine_getPayloadBodiesByRangeV1 Deduplication Optional
engine_getPayloadBodiesByHashV1 Deduplication Optional
engine_newPayloadV2 Capella Experimental
engine_forkchoiceUpdatedV2 Capella Experimental
engine_getPayloadV2 Capella Experimental
engine_getBlobsBundleV1 EIP-4844 Experimental
engine_exchangeTransitionConfigurationV1 Bellatrix Deprecated

Note: "Source" column is added for clarity and may be omitted.

Note: We may want to keep a lean version of the table in the README which doesn't include old deprecated and stale experimental methods, and track statuses of entire set of methods and versions in a separate markdown file.

Directory structure

The following directory structure is proposed:

execution-apis/src/engine
├─ schemas/           # Schema files
│  ├─ paris.yaml
│  ├─ shanghai.yaml
│  ├─ deduplication.yaml
├─ common.md         # Common definitions and specification
├─ paris.md          # Paris HF
├─ shanghai.md       # Shanghai HF
├─ eip4844.md        # Specification of a feature
                     # that may be included in an upcoming HF
                     
├─ deduplication.md  # Specification of a feature outside of any HF
├─ README.md         # Description of the process and lifecycle,
                     # summary of capabilities
├─ capabilities.md   # List of all capabilities

Method and spec document lifecycle

This section recommends a general procedure of working with spec documents and tracking Engine API methods/versions.

Every method and a new version of a method starts its way with Experimental status whether it is a new EIP that is planned for inclusion in one of the next HFs or simply a method providing an optimisation opportunity and tends to be deployed between HFs. This allows for a quick PR merge into Engine API specification and facilitates prototyping by adding experimental methods into the list returned by engine_getCaps call. Note that new methods should not be added to any document that is representing a HF that has already happened and which specification is considered as final.

Once the work on the HF gets more mature and the scope of changes to Engine API is more or less understood, it does make sense to move methods from multiple separate files into one hardfork.md file. When the work is done and HF specification is considered as final then statuses of involved methods should be changed to Final and the document should not have any further updates except for bug fixes.

For methods introduced outside of HFs the same process should be applied. When specification of optional method is finalized its status must be changed to the Optional instead of Final.

Whenever method specification is finalized its description should be added to the OpenRPC schema.

Deprecation of a method version should be reflected in the capabilities table by changing the status of the version to the Deprecated and removing it from the OpenRPC schema. Once this happens EL clients are free to get rid of the code of deprecated method version from their codebases, and remove it from the response to engine_getCaps method call. Note that documents containing specification of already deprecated methods must remain unchanged in the repository as other specification documents may refer to them.

Sometimes the deprecation flow may require more steps. For instance, let's see how deprecation of the engine_exchangeTransitionConfigurationV1 may work out:

  1. CL clients remove hard dependency on making this call. It implies that if the method is still in the capabilities list, CL must follow the previous convention and call it periodically to prevent an EL client from complaining.
  2. After that the status of this method is set to Deprecated and EL client devs remove its support.
  3. CL client code may be cleaned up when all EL clients have removed engine_exchangeTransitionConfigurationV1 from the list of supported methods.
@lightclient
Copy link
Member

Thanks a lot for writing this up! 👍 from me on the capabilities method. I think this is important information for the CL to ascertain.

I do wonder if we really want/need optional methods? My worry is about the extra degree of freedom this gives clients and could lead to a bit of frustration if CLs rely on the method and certain ELs don't provide it.

I'm not sure I follow what the content of the HF markdown files will be vs. the general spec. It might be useful to consider the two paths to upgrading the functionality of a method:

  • version bump -- client knows it is getting the new functionality because the method name is different
  • hardfork -- client knows (to the best of its ability) it is getting the new functionality because the chain has been upgraded

I think we've all agreed we'd like to bump the version always, regardless of HF or not. In that case, it feels we may be over indexing a bit on the HF side of things? Would it be better to lay out the spec more in terms of functionality (see current layout of src/eth) and make notes about the functions behavior given certain HFs?

Somewhat related, should we consider whether API changes are totally backwards compatible? Contrived example: we add support for withdrawals and one day decide to remove them, remove it from the engine API, and then it isn't possible to run a chain that supports withdrawals?

@mkalinin
Copy link
Collaborator Author

@lightclient thanks a lot for your feedback!

I do wonder if we really want/need optional methods? My worry is about the extra degree of freedom this gives clients and could lead to a bit of frustration if CLs rely on the method and certain ELs don't provide it.

Optional methods is an alternative to fallback on error approach, demand for optionality is only represented by this proposal #218. We may not care about optionality in the first iteration of setting up the process and debate on it later on in a separate iteration.

Would it be better to lay out the spec more in terms of functionality (see current layout of src/eth) and make notes about the functions behavior given certain HFs?

This is interesting idea, and it is more practical in some sense. Like we have a separate file for say newPayload which contains all versions of this function and call this file new-payload.md (I guess this is what you mean), it would be easier to spec them out this way and look for the history of changes. One of the downsides is that we have structures shared by different methods, e.g. ExecutionPayload shared by newPayload and getPayload but I think it's manageable. The other property of HF distinction is that we can discern yet experimental methods from solidified parts of the spec, but we may have experimental/ folder where files will be broken down not by functionality but rather by proposals that may affect all functionality at once, e.g. eip4844.md.

Somewhat related, should we consider whether API changes are totally backwards compatible?

I think we should follow the common sense, and make them backward compatible when it is reasonable, but do not make backward compatibility promise.

Contrived example: we add support for withdrawals and one day decide to remove them, remove it from the engine API, and then it isn't possible to run a chain that supports withdrawals?

This proposal suggests to have a reference table of all Engine API methods with their statuses, and instead of removing method from the spec we should mark it as Deprecated in that table and remain the spec for this method unchanged. So, one will be able to either dig out this method from client's code or just re-implement it.

@mkalinin
Copy link
Collaborator Author

Alternative structure with functional breakdown and statuses next to each version of a method (no need for a reference table):

├─ experimental/      # Experimental features
│  ├─ eip4844.md
│  │  ├─ Structures
│  │  │  ├─ BlobsBundleV1
│  │  │  ├─ PayloadAttributesV1
│  │  │  ├─ PayloadAttributesV2
│  │  ├─ Methods
│  │  │  ├─ getBlobsBundle
│  │  │  │  ├─ V1
│  ├─ feature-name.md
│  │  ├─ Structures
│  │  │  ├─ ExecutionPayloadFeatureName
│  │  ├─ Methods
│  │  │  ├─ newPayloadFeatureName
│  │  │  ├─ getPayloadFeatureName
│
├─ schemas/           # Schema files
│  ├─ blob.yaml
│  ├─ capability.yaml
│  ├─ configuration.yaml
│  ├─ forkchoice.yaml
│  ├─ payload.yaml
│  ├─ payload-body.yaml
│	
├─ common.md
│  ├─ Underlying protocol
│  ├─ Versioning
│  ├─ Message ordering
│  ├─ Load-balancing and advanced configurations
│  ├─ Errors
│  ├─ Timeouts
│  ├─ Capabilities
│  │  ├─ getCaps
│
├─ configuration.md
│  ├─ Structures
│  │  ├─ TransitionConfigurationV1
│  ├─ Methods
│  │  ├─ exchangeTransitionConfigurationV1
│  │  │  ├─ V1: Deprecated
│
├─ forkchoice.md
│  ├─ Structures
│  │  ├─ ForkchoiceStateV1
│  │  ├─ PayloadAttributesV1
│  │  ├─ PayloadAttributesV2
│  ├─ Routines
│  │  ├─ Payload building
│  ├─ Methods
│  │  ├─ forkchoiceUpdated
│  │  │  ├─ V1: Final
│  │  │  ├─ V2: Draft
│
├─ payload.md
│  ├─ Structures
│  │  ├─ ExecutionPayloadV1
│  │  ├─ ExecutionPayloadV2
│  │  ├─ PayloadStatusV1
│  │  ├─ WithdrawalV1
│  ├─ Routines
│  │  ├─ Payload validation
│  │  ├─ Sync
│  │  ├─ Payload building
│  ├─ Methods
│  │  ├─ newPayload
│  │  │  ├─ V1: Final
│  │  │  ├─ V2: Draft
│  │  ├─ getPayload
│  │  │  ├─ V1: Final
│  │  │  ├─ V2: Draft
│
├─ payload-body.md
│  ├─ Structures
│  │  ├─ ExecutionPayloadBodyV1
│  ├─ Methods
│  │  ├─ getPayloadBodiesByHash
│  │  │  ├─ V1: Final
│  │  ├─ getPayloadBodiesByRange
│  │  │  ├─ V1: Optional

cc @lightclient

@arnetheduck
Copy link
Contributor

arnetheduck commented Nov 24, 2022

New engine_getCaps method is proposed

While it's not wrong to have a capabilities function, reality is such that users upgrade their systems independently of each other and/or run round-robins on multiple versions of the same API - hence, when consuming the API we rarely assume that such support stays stable.

Instead, in the ideal case we would standardise:

  • an error code for when an API is missing (to cover "old clients" not yet supporting an API") (ie 404)
  • an error code for when an API has been removed permanently (ie 410)

Clients then operate the following way:

  • the try the newest API - if this gives a 404, they fall back on an earlier version, while at the same time notifying the user that they likely need to upgrade the API server
  • if the earlier version gives 410, they can notify their users they need to upgrade their API client

This optmistic strategy works well in practice, throughout the lifecycle of a feature: during testnets and development, it uses the latest version available without making any assumptions about support in the EL - then, when the feature is "released", it takes advantage of the new version where possible (without any messy assumptions about fork activation epochs etc) and gracefully falls back - finally, once the deprecation period for the old version has passed, old consumers can degrade gracefully as well (by showing an appropriate error).

This strategy also implies that when we add new API:

  • the new API should ideally cover the "previous" version (ie V-1 but not necessarily V-2)
  • ditto implementations should provide both API for a few versions

It is unlikely we would use a getCapabilities function with the above in place because that only introduces an extra call / complexity that still has to be handled at the per-call level - in other words: capabilities introduce an additional way interacting with the EL can fail but not much benefit, because the same information also has to be dealt with on a per-call basis.

@mkalinin
Copy link
Collaborator Author

run round-robins on multiple versions of the same API

I am not sure I understand this in the context of Engine API. If CL drives multiple ELs (not sure that this is what you meant by running round-robins) then there should be a multiplexer propagating calls to all of them, if Engine API versions of those ELs are different it can be an issue with whatever approach we take.

an error code for when an API is missing (to cover "old clients" not yet supporting an API") (ie 404)

We already have such error: -32601: Method not found.

if the earlier version gives 410, they can notify their users they need to upgrade their API client

Considering we agree that V-1 stays for quite long period of time, the case when 410 would be hit is pretty rare. I can imagine this happening when we're deprecating exchangeTransitionConfiguration as this method is going to be removed entirely. Based on 404 only CL may surface a complain to a user that it doesn't see a required Engine API method and either CL or EL should be upgraded. Supporting such status implies EL remembers all deprecated methods which doesn't seem to be worth it.

This optmistic strategy works well in practice

CL client will fallback to V-1 every method call until EL is upgraded, this is suboptimal. There can be more fancy logic when CL tries V+1 every Nth calls and if succeeded starts using the new version onwards. While getCapabilities allows for the following strategy:

  • CL knows the latest version of each method it is compatible with
  • CL calls getCapabilities every N minutes in parallel to the main communication flow, and if newer version of existing method or a new method is found, CL starts using it from now and onwards. This is why backwards compatibility at least up to V-1 is important.
  • This strategy may also utilise 404 to stop calling deprecated method if it is expected to disappear

IMO, both strategies may co-exist, if no CL team will use getCapabilities then we can avoid specifying and implementing it. Standardising 410 doesn't sound like a good idea to me because it requires a list of old methods kept by EL client software

@arnetheduck
Copy link
Contributor

CL client will fallback to V-1 every method call until EL is upgraded, this is suboptimal.

It's suboptimal but also temporary and cheap - further, it's those that haven't upgraded that pay the cost, which is appropriate: once everyone has upgraded (as happens eventually), the cost goes back to 0.

There can be more fancy logic

This fancy logic is surface area for bugs and breaks redundant round-robin setups - it's easier to not make this assumption thanks to how cheap it is to make a call, and if that fails, make another (ie this is really simple and cheap: you maintain a connection then try one after the other) - it's also foolproof in that the consumer doesn't have to maintain state beyond the "logical" request. Capabilities calls make simple scenarios more simple and optimal, but make more advanced scenarios harder or impossible.

IMO, both strategies may co-exist

True, it's not like it disturbs anyone, except those that have to implement the server and keep it up to date ;)

Standardising 410 doesn't sound like a good idea to me because it requires a list of old methods kept by EL client software

it's not required - it's more of a quality of implementation matter where during the transition period, we often get users that haven't upgraded part of their setup - 410 (or a reasonable -xxxxx code, whichever fits the current design better) give a way to give them a better error message - servers don't have to support this, but it can be used to good effect.

@mkalinin
Copy link
Collaborator Author

True, it's not like it disturbs anyone, except those that have to implement the server and keep it up to date ;)

Once getCapabilities is implemented there is no need to update the list, the list should be self-adjusting. I feel like implementation cost of this method is really low

@mkalinin
Copy link
Collaborator Author

@arnetheduck

it's those that haven't upgraded that pay the cost

This is not always true. If #218 makes into the spec, CL clients will have to use fallback approach or alternatively cache Engine API server configuration.

@arnetheduck
Copy link
Contributor

This is not always true.

While this is true in theory, I believe that if all CL:s start using that request (or any other similar "semi-optional" API), I'm pretty sure no EL will be want to be left behind as the slow API implementation and we'll see conformity within a few versions - fallback merely gives us a way to cover the interim period.

The EL-CL api is special in that there are only so many clients implementing it - it's not a generic thing where we expect a lot of "partial" implementations, so I'm inclined to believe that we'll mostly be using "full" implementations.

@lightclient
Copy link
Member

lightclient commented Nov 29, 2022

I'm 👍 on the functional structure of the spec. Minor nit: might be better to combine payload-body.md into payload.md. I don't feel strongly about it though.

And re: getCaps -- I'm mostly indifferent. I will wait to see what CL teams prefer.

@djrtwo
Copy link
Contributor

djrtwo commented Dec 8, 2022

Will just toss in that I like fork-diff format. It works well for CL specs. This path, imo, is optimized for spec writers and sophisticated readers (e.g. client teams). "What do I need to know/change in relation to what I already know".

Given this is not a user facing API, I bias toward optimizing for hte sophisticated user at the potential cost of slowing of onboarding of new users of the api

@mkalinin
Copy link
Collaborator Author

mkalinin commented Dec 9, 2022

I bias toward optimizing for hte sophisticated user at the potential cost of slowing of onboarding of new users of the api

OpenRPC schema is supposed to contain a reference of all stable and not yet deprecated Engine API methods, i.e. methods that EL clients must support. Therefore, it should alleviate the pain of non-experienced reader and reduce the entrance barrier.

Consideration of recent arguments makes me in favour of by-fork decomposition.

@ethDreamer
Copy link
Contributor

Strongly support engine_getCapabilities (provided it's called engine_getCapabilities and not engine_getCaps since none of the other method names are truncated).

In fact I've already stubbed out this functionality in lighthouse based on earlier talks that a function of this nature would be supported. Lighthouse essentially caches a data structure that enumerates the methods that the execution engine supports. It's trivial to implement functionality that would refresh this data structure (with a call to engine_getCapabilities) periodically or if lighthouse has detected the execution engine has gone offline (to update to a newer version for example).

reality is such that users upgrade their systems independently of each other and/or run round-robins on multiple versions of the same API - hence, when consuming the API we rarely assume that such support stays stable.

lighthouse docs explicitly state that we do not support 1:many or many:1 configurations between CL and EE. You would certainly require a multiplexer to support this. And if you've already written a custom multiplexer then it should be trivial to handle parsing the response from engine_getCapabilities and returning the highest version method supported by all execution engines.

@mkalinin
Copy link
Collaborator Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants