
CoreJam #31

Closed · wants to merge 50 commits

Conversation

gavofyork (Contributor):

This is a proposal to fundamentally alter the workload done on the Polkadot Relay-chain, both in terms of the work done "on-chain", i.e. by all Relay Chain Validators (Validators), and the work done "in-core", i.e. distributed among subsets of the Validators (Validator Groups). The target is to create a model which closely matches the underlying technical architecture and is both generic and permissionlessly extensible.

In the proposed model, code is stored on-chain with two entry-points. Workloads are collated and processed in-core (and thus parallelized) using one entry-point, whereas the refined outputs of this processing are gathered together and an on-chain state-machine progressed according to the other.

While somewhat reminiscent of the Map-Reduce paradigm, the analogy should not be taken too far: the in-core processing code does not transform a fixed set of inputs, but rather refines entirely arbitrary input data collected by some third party. Accordingly, we term the model Collect-Refine-Join-Accumulate.
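For orientation, here is a minimal sketch of the two entry-points described above; the names and signatures are assumptions for illustration, not the RFC's final definitions (those appear in the quoted excerpts further down):

```rust
// Hedged sketch of the two entry-points described above; names and
// signatures are illustrative assumptions, not the RFC's final definitions.
type WorkPayload = Vec<u8>; // arbitrary input data collated by a builder
type WorkOutput = Vec<u8>;  // refined output gathered for on-chain processing

// Executed in-core, in parallel, by a Validator Group.
fn refine(payload: WorkPayload) -> WorkOutput {
    // ...distill arbitrary third-party input into a compact output...
    payload
}

// Executed on-chain by all Validators; progresses the on-chain state machine.
fn accumulate(outputs: Vec<WorkOutput>) {
    // ...integrate the refined outputs into the service's on-chain state...
    let _ = outputs;
}
```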

gavofyork and others added 4 commits October 18, 2023 23:39
Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

### Notes for implementing the Actor Progression model

Actor code is stored in the Storage Pallet. Actor-specific data including code hash, VM memory hash and sequence number is stored in the Actor Work Class Trie under that Actor's identifier. The Work Package would include pre-transition VM memories of actors to be progressed whose hash matches the VM memory hash stored on-chain and any additional data required for execution by the actors (including, perhaps, swappable memory pages). The `refine` function would initiate the relevant VMs and make entries into those VMs in line with the Work Package's manifest. The Work Output would provide a vector of actor progressions made, including their identifier, pre- and post-VM memory hashes and sequence numbers. The `accumulate` function would identify and resolve any conflicting progressions and update the Actor Work Class Trie with the progressed actors' new states. More detailed information is given in the Coreplay RFC.
burdges (Oct 19, 2023):

Ain't necessarily optimal to store VM memory on-chain. We might instead try bespoke erasure code handling for them. Aka the memory lives continuously as erasure coded pages. This involves considerable off-chain logic not particularly relevant here, so maybe we should split this multi-block polkavm session into a separate RFC?
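For concreteness, a hedged sketch of the per-actor entry the quoted paragraph says the Work Output would carry; all names are assumptions, and the authoritative definitions live in the Coreplay RFC:

```rust
// Hypothetical shape of one actor-progression record in the Work Output, per
// the quoted paragraph: identifier, pre-/post-VM memory hashes and sequence
// numbers. All names are assumptions.
type Hash = [u8; 32];
type ActorId = u32;

struct ActorProgression {
    actor: ActorId,
    pre_memory_hash: Hash,  // must match the hash held in the Actor Work Class Trie
    post_memory_hash: Hash, // VM memory hash after in-core execution
    pre_sequence: u32,
    post_sequence: u32,
}

// `refine` would emit a vector of these; `accumulate` would resolve any
// conflicting progressions and write the winners back into the trie.
type ActorWorkOutput = Vec<ActorProgression>;
```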

```rust
}
type MaxWorkPackageSize = ConstU32<5 * 1024 * 1024>;
struct EncodedWorkPackage {
    version: u32,
```
Contributor:

any reason to use 4 bytes for version over a single byte (u8)?

gavofyork (author):

Future proofing.

| CoreJam model | Legacy model | Context |
| --- | --- | --- |
| *Integration* | Inclusion | Irreversible transition of state |
| *Builder* | Collator | Creator of data worthy of Attestation |

Additionally, the *Work Class Trie* has no immediate analogue, but may be considered as the Relay-chain state used to track the code and head data of the parachains.
Contributor:

Want to give a quick definition of Work Class here? Otherwise it is a bit hard to follow the rest of the document without much idea of what a Work Class is.

Contributor:

A Work Class would be, for example, CoreChains (the parachain implementation on top of CoreJam), or CorePlay.


(The number of prerequisites of a Work Package is limited to at most one. However, we cannot trivially control the number of dependents in the same way, nor would we necessarily wish to since it would open up a griefing vector for misbehaving Work Package Builders who interrupt a sequence by introducing their own Work Packages with a prerequisite which is within another's sequence.)

Work Items are a pair of class and payload, where the `class` identifies the Class of Work to be done in this item (*Work Class*).
Contributor:

An example would be great, as I still have no idea what a Work Class is.
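To illustrate the quoted definition, a minimal sketch of a Work Item as a (class, payload) pair; the type names and widths are assumptions, and CoreChains/CorePlay are the example Work Classes mentioned above:

```rust
// Hedged sketch of a Work Item as described in the quoted text: a pair of
// class and payload. Type names and widths are assumptions.
type WorkClass = u32;       // identifies the Class of Work (e.g. CoreChains, CorePlay)
type WorkPayload = Vec<u8>; // opaque data interpreted by that class's `refine`

struct WorkItem {
    class: WorkClass,
    payload: WorkPayload,
}
```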


| CoreJam model | Legacy model | Context |
| --- | --- | --- |
| *Work Package* | Proof-of-Validity | Untrusted data provided to RcVG |
Contributor:

RcVG is Relay-chain Validator Group, right? Better to introduce this abbreviation before using it.

burdges (Oct 23, 2023):

We always call this backing group, or backing checkers, when discussing the validation protocol. They back the candidate, and can be slashed, but their checks do not provide security. It's only the approval checkers who provide security. Approval checkers are out of scope here, but the distinction is critical to polkadot overall.

```rust
type Authorizers = StorageMap<AuthId, Authorizer>;
```

An *Authorization* is simply a blob which helps the Authorizer recognize a properly authorized Work Package. No constraints are placed on Authorizers over how they may interpret this blob. Expected authorization content includes signatures, Merkle-proofs and more exotic succinct zero-knowledge proofs.
Contributor:

This needs to be compute-time limited, and for determinism the execution needs to be metered?
Also, AuthParamSize is only 1 KB, and it most likely needs to include some Merkle proofs and a builder signature. I am a bit worried it may not be enough for some complicated cases. We really need a PoC to determine the max size.

gavofyork (author):

> This needs to be compute-time limited, and for determinism the execution needs to be metered?
>
> Also, AuthParamSize is only 1 KB, and it most likely needs to include some Merkle proofs and a builder signature. I am a bit worried it may not be enough for some complicated cases. We really need a PoC to determine the max size.

AuthParamSize only concerns the parameterisation of the Authorizer. Proofs would go in the Authorization which is (currently) of unlimited size.

`is_authorized` is indeed metered and weight-limited, and there is a system-wide limit for this.
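Pulling the quoted pieces together, a hedged sketch of how an Authorizer, its parameterisation and an Authorization relate; only `AuthParam`/`AuthParamSize`, `is_authorized` and the `code_hash` quoted later in this conversation come from the RFC text, the remaining names are assumptions:

```rust
// Hedged composite of the definitions quoted in this conversation; everything
// beyond `code_hash`, `AuthParam` and `is_authorized` is illustrative.
type AuthId = [u8; 32];
type CodeHash = [u8; 32];
type AuthParam = Vec<u8>;     // bounded by AuthParamSize (currently 1 KiB)
type Authorization = Vec<u8>; // carried in the Work Package; currently unbounded

struct Authorizer {
    code_hash: CodeHash, // code in the Storage pallet exposing `is_authorized`
    param: AuthParam,    // parameterisation of that code
}

// Entry-point of the Authorization Procedure (metered and weight-limited):
//   fn is_authorized(param: &AuthParam, package: &WorkPackage, core_index: CoreIndex) -> bool;
```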

```rust
fn apply_refine(item: WorkItem) -> WorkResult;
```

The amount of weight used in executing the `refine` function is noted in the `WorkResult` value, and this is used later in order to help apportion on-chain weight (for the Join-Accumulate process) to the Work Classes whose items appear in the Work Packages.
Contributor:

Can you define weight? Is it only compute time, or two-dimensional weight including proof size? In this case the proof size is completely irrelevant, right?

The Relay-chain shouldn't really care about proof size.

gavofyork (author):

Yes it's equivalent to Relay-chain weight, without the need to consider proof-sizes.


#### Reporting and Integration

There are two main phases of on-chain logic before a Work Package's ramifications are irreversibly assimilated into the state of the (current fork of the) Relay-chain. The first is where the Work Package is *Reported* on-chain. This is proposed through an extrinsic introduced by the RcBA and implies the successful outcome of some *Initial Validation* (described next). This kicks off an off-chain process of *Availability* which, if successful, culminates in a second extrinsic being introduced on-chain shortly afterwards specifying that the Availability requirements of the Work Report are met.
Contributor:

I couldn't figure out what RcBA is until a few paragraphs down.

Contributor:

Relay chain block author 🙈
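As a reading aid for the two phases described in the quoted paragraph, a hedged sketch; the enum and variant names are assumptions:

```rust
// Hypothetical on-chain lifecycle of a Work Report, per the quoted paragraph;
// enum and variant names are assumptions.
enum WorkReportPhase {
    // Introduced via an extrinsic by the Relay-chain Block Author (RcBA)
    // after successful Initial Validation.
    Reported,
    // A second extrinsic confirms the off-chain Availability requirements
    // are met; the results become eligible for Accumulation.
    Available,
}
```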


During this execution, all host functions above may be used except `checkpoint()`. The operation may result in an error, in which case all changes to state are reverted, including the balance transfer. (Weight is still used.)

Other host functions, including some to access Relay-chain hosted services such as the Balances and Storage Pallet, may also be provided, commensurate with this executing on-chain.
Contributor:

Note: so we will never remove the Balances pallet from the Relay-chain.

gavofyork (author):

Balances wouldn't be hosted by the balances pallet since most of its functionality (extrinsics, ED, holds, freezes) is unneeded.

gavofyork and others added 2 commits October 21, 2023 09:11
Co-authored-by: Xiliang Chen <xlchen1291@gmail.com>
Comment on lines 405 to 424:

```rust
fn get_work_storage(key: &[u8]) -> Option<Vec<u8>>;
fn get_work_storage_len(key: &[u8]) -> Option<u32>;
fn checkpoint() -> Weight;
fn weight_remaining() -> Weight;
fn set_work_storage(key: &[u8], value: &[u8]) -> Result<(), ()>;
fn remove_work_storage(key: &[u8]);
fn set_validators(validator_keys: &[ValidatorKey]) -> Result<(), ()>;
fn set_code(code: &[u8]) -> Result<(), ()>;
fn assign_core(
    core: CoreIndex,
    begin: BlockNumber,
    assignment: Vec<(CoreAssignment, PartsOf57600)>,
    end_hint: Option<BlockNumber>,
) -> Result<(), ()>;
fn transfer(
    destination: WorkClass,
    amount: u128,
    memo: &[u8],
    weight: Weight,
) -> Result<Vec<u8>, ()>;
```
tomaka (Contributor, Oct 23, 2023):

Some of these host functions are clashing with the idea in #4.

All the functions that return a Result should ideally return a number indicating success or failure. While this might seem like a no-brainer, if someone just copy-pastes these definitions in Substrate, the code generated by Substrate will instead use the allocator just to allocate one byte and write 0 or 1 in it.

I would suggest turning get_work_storage into read_work_storage, where a pointer to a buffer and a maximum length are passed as parameters.

As for transfer, I didn't figure out what is being returned, so I don't have an opinion.
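For illustration only, a caller-supplied-buffer variant of `get_work_storage` along the lines suggested above; the name, return convention and semantics are assumptions, not part of the RFC:

```rust
// Hedged sketch of the suggested allocator-free host function; names,
// return codes and semantics are assumptions.
/// Copies at most `out.len()` bytes of the value at `key` into `out`.
/// Returns the full value length, or -1 if the key is absent, so the caller
/// can detect truncation and retry with a larger buffer.
fn read_work_storage(key: &[u8], out: &mut [u8]) -> i64 {
    // Host-side pseudologic: look up `key`; if present, copy
    // min(value.len(), out.len()) bytes into `out` and return value.len().
    let _ = (key, out);
    -1
}
```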


The need of validators to be rewarded for doing work they might reasonably expect to be useful competes with that of the Coretime procurers to be certain to get work done which is useful to them. In Polkadot 1.0, validators only get rewarded for PoVs ("work packages") which do not panic or overrun. This ensures that validators are well-incentivized to ensure that their computation is useful for the assigned parachain. This incentive model works adequately where all PVF code is of high quality and collators are few and static.

However with this proposal (and even the advent of on-demand parachains), validators have little ability to identify a high-quality Work Package builder, and the permissionless design means a greater expectation of flawed code executing in-core. Because of this, we take a slightly modified approach: Work Packages must have a valid Authorization, i.e. the Coretime-assigned `is_authorized` returns `true` when provided with the Work Package. However, Validators get rewarded for *any* such authorized Work Package, even one which ultimately panics or overruns on its evaluation.
Comment:

In the document, it is mentioned that for any work package which overruns or panics, `is_authorized` will return false:

> If it overruns this limit or panics on some input, it is considered equivalent to returning false.

However, here it is mentioned that validators will be rewarded for any work package that is authorized, so isn't the following sentence contradictory?

> However, Validators get rewarded for *any* such authorized Work Package, even one which ultimately panics or overruns on its evaluation.

I would assume that a work package cannot be authorized if it panics or overruns. Or I may not correctly understand what is meant by evaluation here.

burdges (Nov 27, 2023):

This RFC is already too big to discuss rewards here.

It's one thing if the relay chain sells core time which never gets used, but we likely burn those funds. As for validators' era points…

We'll pay backers for inclusion/integration of candidates, not before. We cannot pay backers for candidates which they download, run, and then abandon. It's unimportant why they abandon the candidates, ala invalidity, load, etc, because if they can extract rewards without availability then under some bandwidth cost profiles they'll become profitable stalling cores.

Approval checker & availability provider rewards require subtle approximations to run off-chain, and without authentication, so none of the rewards statements here work in practice. A priori, approval checkers should take 80% of the era points, with babe/sassafras being like 15%, and backing being maybe 2%, except maybe availability providers should receive more than 3%.

It already became problematic that we check 30 VRFs and 30 signatures per candidate, which we're fixing. We definitely do not have enough on-chain CPU time to do lots of compute about who gets paid what for each candidate.

Comment:

My intention with my comment wasn't to further discuss rewards here. I just noticed two statements in the RFC that seemed contradictory to me. In one place, it is mentioned that a work package that overruns or panics cannot be authorized, and then later it is mentioned that validators will be rewarded for any authorized work packages even if they overrun or panic.

Comment:

Yeah, authorization cannot know if some panic occurs later. It's possible DOT holders wind up rewarded for failed work. We reward precisely during disputes too. We cannot reward backers for failed work, only successful work on a finalized fork of the relay chain. We only approximately reward approval checkers for successful work on a finalized fork of the relay chain.

gavofyork (author):

> My intention with my comment wasn't to further discuss rewards here. I just noticed two statements in the RFC that seemed contradictory to me. In one place, it is mentioned that a work package that overruns or panics cannot be authorized, and then later it is mentioned that validators will be rewarded for any authorized work packages even if they overrun or panic.

An authorized Work Package which overruns is still reported. If the authorization code itself (which is not part of the WP) overruns or panics, then it is not authorized. The WP itself never executes.

Comment:

We cannot pay backers until after inclusion/integration of candidates, including execution. We risk it being profitable to stall cores otherwise.

We cannot execute the overrunning work package either because this breaks other invariants, including some essential for wasm.

As noted above, backers represent only a relatively minor duty worth only like 2% of rewards. We should not break other components to reward backers more precisely. Instead we should solve builder/collator spam via softer means, like reputation, or just expect projects to solve it, given it'll usually be their buggy code.

eskimor left a comment:

Finally it makes sense 🥳. I think the quite abstract description in this RFC would benefit greatly from concrete examples of how the concepts introduced would look in the parachains or actor model.

Left a few remarks: some concerns with regard to implementation, and otherwise mostly testing my understanding.

If my understanding is correct, work packages lend themselves well to cores which occupy multiple CPU cores (à la core groups), as work items should be executable in parallel. It should also be possible to use resources more efficiently. E.g. we could have a service that mostly uses availability, and builders could use tasks of that service to fill up any unused space.


(The number of prerequisites of a Work Package is limited to at most one. However, we cannot trivially control the number of dependents in the same way, nor would we necessarily wish to since it would open up a griefing vector for misbehaving Work Package Builders who interrupt a sequence by introducing their own Work Packages with a prerequisite which is within another's sequence.)

Work Items are a pair where the first item, `service`, itself identifies a pairing of code and state known as a *Service*; and the second item, `payload`, is a block of data which, through the aforementioned code, mutates said state in some presumably useful way.
Comment:

First time reading this, I understood the service as being analogous to a PVF + state root, while in reality it is code that would trigger execution of something like a PVF (stored in the service trie) based on the logic of the service. A service therefore is a "kind" of execution, e.g. all of parachains.

The refine function of the parachains service would, based on state in the service trie and the data in the work item, fetch PVF code from the service trie for the parachain referenced in the work item and pass it to the host for execution.

For even more clarity: I would expect the refine function to look up a ParaId (or something like it) as referenced in something like the "header" of the work item, in order to look up the code which will accept the work item as its input (the PVF).

The `code_hash` of the Authorizer is assumed to be the hash of some code accessible in the Relay-chain's Storage pallet. The procedure itself is called the *Authorization Procedure* (`AuthProcedure`) and is expressed in this code (which must be capable of in-core VM execution). Its entry-point prototype is:

```rust
fn is_authorized(param: &AuthParam, package: &WorkPackage, core_index: CoreIndex) -> bool;
```
eskimor (Nov 27, 2023):

I was very happy to see the CoreIndex here, because I think it is indeed needed. Consider the case of elastic scaling/using multiple cores. By providing a CoreIndex we can ensure that packages are not conflicting, as we can make it so that it is authorized on one core, but not the other.

This works well with bulk, where core time is bought on a specific core. It does less so with on-demand (but no longer provided anyway), where currently the implementation will just pick any core. For bulk this should really work though: given that we have access to Relay-chain context, we could abstract away from the actual index and apply logic like "this is the third core index assigned to our task → good". So the exact core index would not matter (and the authorizer would not need to change); all that needs to stay stable is the ordering.
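A hedged sketch of the "third core assigned to our task" idea from the comment above; the helper name and the source of the assignment list are assumptions:

```rust
// Hypothetical authorizer check: accept the package only on the n-th core
// currently assigned to the task, so only the ordering of assigned cores
// matters, not their absolute indices. `assigned_cores` would come from
// Relay-chain context; all names are assumptions.
type CoreIndex = u32;

fn is_nth_assigned_core(assigned_cores: &[CoreIndex], n: usize, core_index: CoreIndex) -> bool {
    assigned_cores.get(n) == Some(&core_index)
}
```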

```rust
/// of its Work Items.
struct WorkReport {
    /// The specification of the underlying Work Package.
    package_id: WorkPackageId,
```
Comment:

WorkPackageSpec?

```rust
}
type WorkOutputLen = ConstU32<4_096>;
type WorkOutput = BoundedVec<u8, WorkOutputLen>;
fn refine(
```
Comment:

For things like CorePlay, is it assumed that WorkPayload itself might be split up even more - containing data of multiple actors which are co-scheduled? Or will it be one refine per actor?

If it is the former, then the whole purpose of classed work items is to be able to pack tasks of different classes into one WorkPackage, but it would not be needed for packing together multiple tasks of just one class. I will assume this is the case, because:

The latter would have been my intuition, but it would not be clear how synchronous composability would be achieved there.

This is also interesting, because we could define that work items are allowed to be executed/processed in parallel (on multiple CPU cores) - hence they must be independent (even if of the same class), while you will have a single-threaded environment within a single work item.

Comment:

Put differently: the ability at this level to put multiple work items of different or the same class into a work package is purely a matter of efficient resource usage - correct? At this level, "being co-scheduled" does not offer any advantages, nor is it conceptually any different for a task than getting scheduled/packed into different packages.

In any interpretation: a single prerequisite per package seems limiting, but given that class writers will have to limit at which Authorizer a given task may be executed (see the note on griefing attacks below*), we can actually make it a proper chain for a given Authorizer and thus for all the tasks associated with it. I would expect a prerequisite belonging to a different Authorizer to be a very rare occurrence.

Comment:

*) The griefing attack should actually not work for the case of multiple work items, so parachains should be fine. While a parachain could get a block of the same height into multiple builder networks and thus multiple work packages - forcing a prune later - it would only harm itself, assuming builders charge the parachain for their service. If we have multiple actors in a single work item, then an actor targeting multiple builder networks could actually take other actors with it, e.g. doing its thing on one network (not co-scheduled), while being co-scheduled on another network - which then gets pruned because of the conflict.

Again, we have to assume the actor gets billed for this, so the attack is not free, but if there are other gains (e.g. MEV, starving competition, ...) it might still be an interesting attack.

But then the single prerequisite still seems interesting: if dependencies of your work package could exist in packages of multiple builders, how do you pick a single correct prerequisite? Likely, you will just always use "your" previous work package.

Put differently, a builder network can guarantee no conflicts in its own dependency chain and will assume that there are no conflicts with others, which may or may not be enforced by the service.

Blind enforcing might do more harm than good though, as a builder network could then easily censor a task. 😞

Comment:

Limiting tasks to a particular builder network (at least temporarily) also makes sense for other practical reasons. If tasks were valid on all networks, where would users then send transactions for a particular task to? If they sent it to any builder network, then for processing those transactions you would force the builders to schedule that task, risking a conflict. (Not in the interest of the task) Or they just don't accept the transactions, but for this they would need to know that they are not responsible.

Long story short, I believe we need some form of scheduling/assignment here. If we assume builder networks themselves to be decentralized, then the risk of them censoring a particular task should be mitigated. Hence, fixed assignments might be sensible.


Being *on-chain* (rather than *in-core* as with Collect-Refine), information and computation done in the Join-Accumulate stage is carried out (initially) by the Block Author and the resultant block evaluated by all Validators and full-nodes. Because of this, and unlike in-core computation, it has full access to the Relay-chain's state.

The Join-Accumulate stage may be seen as a synchronized counterpart to the parallelised Collect-Refine stage. It may be used to integrate the work done from the context of an isolated VM into a self-consistent singleton world model. In concrete terms this means ensuring that the independent work components, which cannot have been aware of each other during the Collect-Refine stage, do not conflict in some way. Less dramatically, this stage may be used to enforce ordering or provide a synchronisation point (e.g. for combining entropy in a sharded RNG). Finally, this stage may be a sensible place to manage asynchronous interactions between subcomponents of a Service or even different Services and oversee message queue transitions.
Comment:

> which cannot have been aware of each other during the Collect-Refine stage

As argued above, if tasks are sticky to a particular builder network, then they actually can be aware of each other. I would expect this to be the common case as conflicts in accumulate are costly.

Doesn't change anything fundamentally here though, as we still need to handle conflicts if they occur.


There is an amount of weight which it is allowed to use before being forcibly terminated and any non-committed state changes lost. The lowest amount of weight provided to `accumulate` is defined as the number of `WorkResult` values passed in `results` to `accumulate` multiplied by the `accumulate` field of the Service's weight requirements.

However, the actual amount of weight may be substantially more. Each Work Package is allotted a specific amount of weight for all on-chain activity (`weight_per_package` above) and has a weight liability defined by the weight requirements of all Work Items it contains (`total_weight_requirement` above). Any weight remaining after the liability (i.e. `weight_per_package - total_weight_requirement`) may be apportioned to the Services of Items within the Report on a pro-rata basis according to the amount of weight they utilized during `refine`. Any weight unutilized by Classes within one Package may be carried over to the next Package and utilized there.
Comment:

So the more weight was consumed in-core, the more weight you are allowed to consume on-chain? If so, what is the reasoning here?

gavofyork (author):

It seemed a sensible path. It's not something I'm especially wedded to though.
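To make the quoted apportionment rule concrete, a small worked example with invented numbers; the helper and all values are assumptions, only the formula follows the quoted text:

```rust
// Worked example of the quoted rule: surplus = weight_per_package minus the
// total weight requirement, shared pro-rata by refine weight used. All
// numbers are invented for illustration.
fn surplus_share(refine_used: u64, total_refine_used: u64, surplus: u64) -> u64 {
    surplus * refine_used / total_refine_used
}

fn main() {
    let weight_per_package = 1_000u64;
    let total_weight_requirement = 600u64; // sum of the items' weight liabilities
    let surplus = weight_per_package - total_weight_requirement; // 400 left over

    // Service A's items used 300 refine weight, Service B's items used 100.
    let extra_a = surplus_share(300, 400, surplus); // 300
    let extra_b = surplus_share(100, 400, surplus); // 100
    assert_eq!(extra_a + extra_b, surplus);
    println!("extra on-chain weight: service A = {extra_a}, service B = {extra_b}");
}
```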

```rust
fn on_transfer(source: Service, amount: u128, memo: Vec<u8>, buffer_len: u32) -> Result<Vec<u8>, ()>;
```

During this execution, all host functions above may be used except `checkpoint()`. The operation may result in an error, in which case all changes to state are reverted, including the balance transfer. (Weight is still used.)
Comment:

This is so that a weight overrun is not fatal and the service's tasks can proceed, albeit at a lower rate? For the panic case, I assume the idea is that a code upgrade is provided, fixing the situation, and then we can proceed from the checkpoint. Although, is a checkpoint still valid once we have changed the code?

gavofyork (author, Dec 11, 2023):

This was changed in the latest prototype; `on_transfer` now executes at the end of the block.


However, there is a variable delay between a Work Report first being introduced on-chain in the Reporting and its eventual Integration into the Service's State due to the asynchronous Availability Protocol. This means that requiring the order at the point of Reporting is insufficient for guaranteeing that order at the time of Accumulation. Furthermore, the Availability Protocol may or may not actually complete for any Work Package.

Two alternatives present themselves: provide ordering only on a *best-effort* basis, whereby Work Reports respect the ordering requested in their Work Packages as much as possible, but it is not guaranteed. Work Reports may be Accumulated before, or even entirely without, their prerequisites. We refer to this as *Soft-Ordering*. The alternative is to provide a guarantee that the Results of Work Packages will always be Accumulated no earlier than the Result of any prerequisite Work Package. As we are unable to alter the Availability Protocol, this is achieved through on-chain queuing and deferred Accumulation.
Comment:

> Work Reports may be Accumulated before, or even entirely without, their prerequisites

How? If the dependency is real, then we are missing a state transition needed to make this sound.

gavofyork (author):

That would be the second option. In the first option, it would be up to `accumulate` to do any needed queuing to ensure soundness for itself.
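A hedged sketch of what the second alternative's on-chain queuing and deferred Accumulation could look like; the structure and names are assumptions, not the RFC's design:

```rust
// Hypothetical deferred-Accumulation queue for the strict-ordering
// alternative: a Work Result is only accumulated once its (single)
// prerequisite has been accumulated. All names are assumptions.
use std::collections::{HashSet, VecDeque};

type WorkPackageHash = [u8; 32];

struct PendingResult {
    package: WorkPackageHash,
    prerequisite: Option<WorkPackageHash>,
    // ...the Work Results to accumulate...
}

struct AccumulationQueue {
    accumulated: HashSet<WorkPackageHash>,
    pending: VecDeque<PendingResult>,
}

impl AccumulationQueue {
    /// Accumulate every queued result whose prerequisite (if any) has already
    /// been accumulated; anything else stays queued for a later block.
    fn drain_ready(&mut self, mut accumulate: impl FnMut(&PendingResult)) {
        let mut still_pending = VecDeque::new();
        while let Some(r) = self.pending.pop_front() {
            let ready = r.prerequisite.map_or(true, |p| self.accumulated.contains(&p));
            if ready {
                accumulate(&r);
                self.accumulated.insert(r.package);
            } else {
                still_pending.push_back(r);
            }
        }
        self.pending = still_pending;
    }
}
```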


In this alternative, actual ordering is only guaranteed going *into* the Availability Protocol, not at the point of Accumulation.

The (on-chain) repercussion of the Availability Protocol completing for the Work Package is that each Work Result becomes scheduled for Accumulation at the end of the Relay-chain Block Execution along with other Work Results from the same Service. The Ordering of Reporting is replicated here for all Work Results present. If the Availability Protocol delays the Accumulation of a prerequisite Work Result, then the dependent Work Result may be Accumulated in a block prior to that of its dependency. It is assumed that the *Accumulation* logic will be able to handle this gracefully.
Comment:

Ok, I think I am getting it: "gracefully" means dropping every work item that in fact is dependent on the prerequisite.


A *Work Package* is an *Authorization* together with a series of *Work Items* and a context, limited in plurality, versioned and with a maximum encoded size. The Context includes an optional reference to a Work Package (`WorkPackageHash`) which limits the relative order of the Work Package (see **Work Package Ordering**, later).

(The number of prerequisites of a Work Package is limited to at most one. However, we cannot trivially control the number of dependents in the same way, nor would we necessarily wish to since it would open up a griefing vector for misbehaving Work Package Builders who interrupt a sequence by introducing their own Work Packages with a prerequisite which is within another's sequence.)
Comment:

This is mostly an issue if we allowed prerequisites to point to work packages of other builder networks (using a different authorizer) - correct? I am not sure that having a prerequisite pointing to a work package of another authorizer gains anything; hence, if need be, we might be able to tighten things here a bit by restricting prerequisites, if we find that beneficial enough.
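Assembling the quoted definition into one place, a hedged sketch of a Work Package; apart from `version` and the 5 MiB bound quoted earlier, all field names are assumptions:

```rust
// Hedged sketch of the quoted definition: an Authorization, a context with at
// most one prerequisite, and a series of Work Items, versioned and bounded in
// encoded size. Field names other than `version` are assumptions.
type WorkPackageHash = [u8; 32];

struct WorkItem {
    class: u32,
    payload: Vec<u8>,
}

struct Context {
    prerequisite: Option<WorkPackageHash>, // at most one, per the quoted text
    // ...any further Relay-chain context the final spec requires...
}

struct WorkPackage {
    authorization: Vec<u8>, // the Authorization blob checked by `is_authorized`
    context: Context,
    items: Vec<WorkItem>,   // limited in plurality
}

struct EncodedWorkPackage {
    version: u32,     // 4 bytes for future proofing, per the thread above
    encoded: Vec<u8>, // bounded by MaxWorkPackageSize (5 * 1024 * 1024 bytes)
}
```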


An *Authorization* is simply a blob which helps the Authorizer recognize a properly authorized Work Package. No constraints are placed on Authorizers over how they may interpret this blob. Expected authorization content includes signatures, Merkle-proofs and more exotic succinct zero-knowledge proofs.

_(Note: depending on future Relay-chain Coretime scheduling implementation concerns, a window of Relay-chain blocks)._
Comment:

To what does this note refer?

It's anyway important that work packages cannot know in advance when they'll be backed, when the resulting candidate receipts get reported on-chain, when they'll be included/integrated, or which chain forks containing either of those events get approved and finalized.

We could set future cut-offs for reporting, but not too optimistically near-term. Async backing matters because latency arises from various sources.

Co-authored-by: ordian <write@reusable.software>
gavofyork (author):

As the prototype takes shape, a large number of the RFC's details have changed. I'll close this for now while I author a new spec, to be discussed once the prototype is completed.

gavofyork closed this on Dec 11, 2023.