# RFC-0047: Assignment of availability chunks to validators

| | |
| --------------- | ------------------------------------------------------------------------------------------- |
| **Start Date** | 03 November 2023 |
| **Description** | An evenly-distributing indirection layer between availability chunks and validators. |
| **Authors** | Alin Dima |

## Summary

Propose a way of permuting the availability chunk indices assigned to validators for a given core and relay
chain block, in the context of
[recovering available data from systematic chunks](https://github.com/paritytech/polkadot-sdk/issues/598), with the
purpose of fairly distributing network bandwidth usage.

## Motivation

Currently, the `ValidatorIndex` is always identical to the `ChunkIndex`. Since the validator array is only shuffled once
per session, naively using the `ValidatorIndex` as the `ChunkIndex` would place unreasonable stress on the first N/3
validators for an entire session, when favouring availability recovery from systematic chunks.

Therefore, the relay chain node needs a deterministic way of evenly distributing the first ~(N_VALIDATORS / 3)
systematic availability chunks to different validators, based on the relay chain block and core.
The main purpose is to ensure fair distribution of network bandwidth usage for availability recovery in general, and
for systematic chunk holders in particular.

## Stakeholders

Relay chain node core developers.

## Explanation

### Systematic erasure codes

An erasure coding algorithm is considered systematic if it preserves the original unencoded data as part of the
resulting code.
[The implementation of the erasure coding algorithm used for polkadot's availability data](https://github.com/paritytech/reed-solomon-novelpoly) is systematic.
Roughly speaking, the first N_VALIDATORS/3 chunks of data can be cheaply concatenated to retrieve the original data,
without running the resource-intensive and time-consuming reconstruction algorithm.

Here's the concatenation procedure of systematic chunks for polkadot's erasure coding algorithm
(minus error handling, for brevity):
```rust
pub fn reconstruct_from_systematic<T: Decode>(
    n_validators: usize,
    chunks: Vec<&[u8]>,
) -> T {
    let mut threshold = (n_validators - 1) / 3;
    if !is_power_of_two(threshold) {
        threshold = next_lower_power_of_2(threshold);
    }

    let shard_len = chunks.iter().next().unwrap().len();

    let mut systematic_bytes = Vec::with_capacity(shard_len * threshold);

    for i in (0..shard_len).step_by(2) {
        for chunk in chunks.iter().take(threshold) {
            systematic_bytes.push(chunk[i]);
            systematic_bytes.push(chunk[i + 1]);
        }
    }

    Decode::decode(&mut &systematic_bytes[..]).unwrap()
}
```

In a nutshell, it performs a column-wise concatenation with 2-byte chunks.
The output may be zero-padded at the end (the encoded data carries no record of the original input length), so scale
decoding must be aware of the expected length in bytes and ignore trailing zeros.
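
To make the interleaving concrete, here's a small, self-contained example (with made-up 4-byte chunks, not real
erasure-coded data) of the column-wise, 2-byte concatenation:

```rust
fn concat_systematic(chunks: &[&[u8]]) -> Vec<u8> {
    let shard_len = chunks[0].len();
    let mut out = Vec::with_capacity(shard_len * chunks.len());
    // Column-wise walk: take 2 bytes from each chunk in turn.
    for i in (0..shard_len).step_by(2) {
        for chunk in chunks {
            out.push(chunk[i]);
            out.push(chunk[i + 1]);
        }
    }
    out
}

fn main() {
    // Three toy "systematic chunks" of 4 bytes each.
    let chunks: Vec<&[u8]> = vec![&[1, 2, 7, 8], &[3, 4, 9, 10], &[5, 6, 11, 12]];
    // Reassembles the original byte order: 1 through 12.
    assert_eq!(concat_systematic(&chunks), (1..=12).collect::<Vec<u8>>());
}
```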

### Availability recovery at present

According to the [polkadot protocol spec](https://spec.polkadot.network/chapter-anv#sect-candidate-recovery):

> A validator should request chunks by picking peers randomly and must recover at least `f+1` chunks, where
> `n=3f+k` and `k in {1,2,3}`.

For parity's polkadot node implementation, the process has been further optimised. At the moment, it works differently
depending on the estimated size of the available data:

(a) for small PoVs (up to 128 KiB), sequentially try requesting the unencoded data from the backing group, in a random
order. If this fails, fall back to option (b).

(b) for large PoVs (over 128 KiB), launch N parallel requests for the erasure-coded chunks (currently, N has an upper
limit of 50), until enough chunks have been recovered. Validators are tried in a random order. Then, reconstruct the
original data.

Both options require that, after reconstruction, validators re-encode the data and re-create the erasure chunks trie
in order to check the erasure root.
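
A rough sketch of this strategy selection, in the same pseudocode style as the rest of this document (the helper
functions and types here are illustrative, not the actual subsystem API):

```rust
const SMALL_POV_LIMIT: usize = 128 * 1024; // 128 KiB
const MAX_PARALLEL_REQUESTS: usize = 50;

fn recover(pov_size_estimate: usize) -> Result<AvailableData, Error> {
    if pov_size_estimate <= SMALL_POV_LIMIT {
        // (a) Try the backers one by one, in random order.
        for backer in backing_group_shuffled() {
            if let Ok(data) = request_full_data(backer) {
                // Re-encode and check the erasure root before accepting.
                return verify_erasure_root(data);
            }
        }
    }
    // (b) Request chunks from up to 50 validators at a time, in random
    // order, until f+1 chunks are available, then reconstruct.
    let chunks = fetch_chunks_in_parallel(MAX_PARALLEL_REQUESTS, recovery_threshold())?;
    verify_erasure_root(reconstruct(chunks)?)
}
```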

### Availability recovery from systematic chunks

As part of the effort of
[increasing polkadot's resource efficiency, scalability and performance](https://github.com/paritytech/roadmap/issues/26),
work is under way to modify the Availability Recovery protocol by leveraging systematic chunks. See
[this comment](https://github.com/paritytech/polkadot-sdk/issues/598#issuecomment-1792007099) for preliminary
performance results.

In this scheme, the relay chain node will first attempt to retrieve the ~N/3 systematic chunks from the validators that
should hold them, before falling back to recovering from regular chunks, as before.

A re-encoding step is still needed for verifying the erasure root, so the erasure coding overhead cannot be
eliminated entirely.
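
In pseudocode (again with illustrative helpers and types), the new flow looks roughly like this:

```rust
fn recover_available_data(params: RecoveryParams) -> Result<AvailableData, Error> {
    // First, request each of the first ~N/3 chunk indices from the validator
    // assigned to it by the mapping described in the next section.
    match fetch_systematic_chunks(params.systematic_chunk_holders()) {
        Ok(chunks) => {
            let data = reconstruct_from_systematic(params.n_validators, chunks)?;
            // Re-encoding is still needed in order to verify the erasure root.
            verify_erasure_root(&params, data)
        }
        // Too many systematic chunks missing: fall back to regular recovery
        // from any f+1 chunks, as before.
        Err(_) => recover_from_regular_chunks(params),
    }
}
```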

### Chunk assignment function

#### Properties

The function that decides the chunk index for a validator should be parameterized by at least
`(validator_index, relay_parent, core_index)`
and have the following properties:
1. deterministic
1. relatively quick to compute and resource-efficient
1. when considering the other params besides `validator_index` as fixed, the function should describe a permutation
of the chunk indices
1. considering `relay_parent` as a fixed argument, the validators that map to the first N/3 chunk indices should
have as little overlap as possible for different paras scheduled on that relay parent.

In other words, we want a uniformly distributed, deterministic mapping from `ValidatorIndex` to `ChunkIndex` per block
per core.

It's desirable not to embed this function in the runtime, for performance and complexity reasons.
However, this means that the function needs to be kept very simple, with minimal or no external dependencies.
Any change to this function could result in parachains being stalled and needs to be coordinated via a runtime upgrade
or governance call.

#### Proposed function

Pseudocode:

```rust
pub fn get_chunk_index(
    n_validators: u32,
    validator_index: ValidatorIndex,
    block_number: BlockNumber,
    core_index: CoreIndex
) -> ChunkIndex {
    let threshold = systematic_threshold(n_validators); // Roughly n_validators/3
    let core_start_pos = abs(core_index - block_number) * threshold;

    (core_start_pos + validator_index) % n_validators
}
```
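
A quick way to sanity-check property 3 above (for fixed `n_validators`, `block_number` and `core_index`, the function
describes a permutation of the chunk indices) is to evaluate the function over all validator indices and verify there
are no duplicates. A minimal runnable sketch, with the index types reduced to `u32` and a simplified
`n_validators / 3` threshold:

```rust
use std::collections::HashSet;

// Simplified stand-in for the pseudocode above.
fn get_chunk_index(n_validators: u32, validator_index: u32, block_number: u32, core_index: u32) -> u32 {
    let threshold = n_validators / 3;
    let core_start_pos = (core_index as i64 - block_number as i64).unsigned_abs() as u32 * threshold;
    (core_start_pos + validator_index) % n_validators
}

fn main() {
    let (n_validators, block_number, core_index) = (300u32, 42u32, 7u32);
    let assigned: HashSet<u32> = (0..n_validators)
        .map(|v| get_chunk_index(n_validators, v, block_number, core_index))
        .collect();
    // Adding a fixed offset modulo n_validators is a bijection, so every
    // chunk index must appear exactly once.
    assert_eq!(assigned.len(), n_validators as usize);
}
```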

### Network protocol

The request-response `/polkadot/req_chunk` protocol will be bumped to a new version (from v1 to v2).
For v1, the request and response payloads are:
```rust
/// Request an availability chunk.
pub struct ChunkFetchingRequest {
    /// Hash of candidate we want a chunk for.
    pub candidate_hash: CandidateHash,
    /// The index of the chunk to fetch.
    pub index: ValidatorIndex,
}

/// Receive a requested erasure chunk.
pub enum ChunkFetchingResponse {
    /// The requested chunk data.
    Chunk(ChunkResponse),
    /// Node was not in possession of the requested chunk.
    NoSuchChunk,
}

/// This omits the chunk's index because it is already known by
/// the requester and by not transmitting it, we ensure the requester is going to use his index
/// value for validating the response, thus making sure he got what he requested.
pub struct ChunkResponse {
    /// The erasure-encoded chunk of data belonging to the candidate block.
    pub chunk: Vec<u8>,
    /// Proof for this chunk's branch in the Merkle tree.
    pub proof: Proof,
}
```

Version 2 will add an `index` field to `ChunkResponse`:

```rust
#[derive(Debug, Clone, Encode, Decode)]
pub struct ChunkResponse {
    /// The erasure-encoded chunk of data belonging to the candidate block.
    pub chunk: Vec<u8>,
    /// Proof for this chunk's branch in the Merkle tree.
    pub proof: Proof,
    /// Chunk index.
    pub index: ChunkIndex
}
```

An important thing to note is that, in version 1, the `ValidatorIndex` value is always equal to the `ChunkIndex`.
Until the feature is enabled, this will also hold for version 2. However, after the feature is enabled, this will
generally not be true.

The requester will send the request to the validator with index `V`. The responder will map the `V` validator index to
the `C` chunk index and respond with the `C`-th chunk.
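
On the responder side, the v2 handler would conceptually look like the following sketch. It assumes that the
availability store keeps the chunk index alongside each chunk, and that `av_store_fetch` and the contextual values
(`n_validators`, `block_number`, `core_index`) are illustrative names for what the handler has access to:

```rust
fn handle_chunk_request(req: ChunkFetchingRequest) -> ChunkFetchingResponse {
    // Map the requesting validator's index to the chunk index assigned to it
    // for this candidate's relay block and core.
    let chunk_index = get_chunk_index(n_validators, req.index, block_number, core_index);

    match av_store_fetch(req.candidate_hash, chunk_index) {
        Some(chunk) => ChunkFetchingResponse::Chunk(ChunkResponse {
            chunk: chunk.data,
            proof: chunk.proof,
            index: chunk_index, // v2: echo the chunk index back to the requester
        }),
        None => ChunkFetchingResponse::NoSuchChunk,
    }
}
```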

The protocol implementation MAY check the returned `ChunkIndex` against the expected mapping to ensure that
it received the right chunk.
In practice, this is desirable during availability-distribution and systematic chunk recovery. However, regular
recovery may not check this index, which is particularly useful when participating in disputes, where easy access to
the validator->chunk mapping is not available. See [Appendix A](#appendix-a) for more details.
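
For the paths that do know the mapping, the check is a thin layer on top of the existing proof verification. A sketch
(with an illustrative `verify_merkle_proof` helper and error variant):

```rust
fn validate_chunk_response(expected_index: ChunkIndex, resp: &ChunkResponse) -> Result<(), Error> {
    // Availability-distribution and systematic recovery know exactly which
    // chunk index they asked for; dispute-initiated recovery may skip this.
    if resp.index != expected_index {
        return Err(Error::UnexpectedChunkIndex);
    }
    verify_merkle_proof(&resp.chunk, &resp.proof, resp.index)
}
```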


### Upgrade path

#### Step 1: Enabling new network protocol
In the beginning, both `/polkadot/req_chunk/1` and `/polkadot/req_chunk/2` will be supported, until all validators and
collators have upgraded to use the new version. V1 will be considered deprecated. Note that parachains may take a long
time to upgrade their collators, so this transition period could be lengthy.

Once all nodes are upgraded, a new release will be cut that removes the v1 protocol. Only once all nodes have upgraded
to this version will step 2 commence.
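
During the transition window, a requester can prefer v2 and degrade gracefully. A sketch of the intended behaviour
(in practice the fallback mechanics would be handled by the request-response protocol negotiation; the helper and
error names here are illustrative):

```rust
fn request_chunk(peer: PeerId, req: ChunkFetchingRequest) -> Result<ChunkFetchingResponse, Error> {
    match send_request(peer, "/polkadot/req_chunk/2", &req) {
        // Peer hasn't upgraded yet: retry on the deprecated v1 protocol, where
        // the returned chunk's index is implicitly equal to the validator index.
        Err(Error::UnsupportedProtocol) => send_request(peer, "/polkadot/req_chunk/1", &req),
        res => res,
    }
}
```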

#### Step 2: Enabling the new validator->chunk mapping
Considering that the Validator->Chunk mapping is critical to para consensus, the change needs to be enacted atomically
via governance, only after all validators have upgraded the node to a version that is aware of this mapping.
It needs to be explicitly stated that after the runtime upgrade and governance enactment, validators that run older
client versions that don't support this mapping will not be able to participate in parachain consensus.

Additionally, an error will be logged when starting a validator with an older version, after the runtime was upgraded
and the feature enabled.

On the other hand, collators will not be required to upgrade for this step, as regular chunk recovery will work as
before, provided that they have already upgraded to version 2 of the networking protocol in step 1. However, they are
encouraged to upgrade in order to take advantage of the faster systematic recovery. Note that collators only need
availability recovery on the unhappy path (e.g. when a malicious collator withholds data), so it is not
performance-critical for them.

## Drawbacks

- Getting access to the `core_index` that used to be occupied by a candidate in some parts of the dispute protocol is
very complicated (see [Appendix A](#appendix-a)). This RFC assumes that availability-recovery processes initiated during
disputes will only use regular recovery, as before. This is acceptable since disputes are rare occurrences in practice,
and this is something that can be optimised later, if need be.
- It's a breaking change that requires all validators and collators to upgrade their node version.

## Testing, Security, and Privacy

Extensive testing will be conducted, both automated and manual.
This proposal doesn't affect security or privacy.

## Performance, Ergonomics, and Compatibility

### Performance

This is a necessary data availability optimisation, as reed-solomon erasure coding has proven to be a top consumer of
CPU time in polkadot as we scale up the parachain block size and the number of availability cores.

With this optimisation, preliminary performance results show that CPU time used for reed-solomon coding can be halved
and total PoV recovery time decreased by 80% for large PoVs. See more
[here](https://github.com/paritytech/polkadot-sdk/issues/598#issuecomment-1792007099).

### Ergonomics

Not applicable.

### Compatibility

This is a breaking change. See [upgrade path](#upgrade-path) section above.
All validators need to have upgraded their node versions before the feature will be enabled via a runtime upgrade and
governance call.

## Prior Art and References

See comments on the [tracking issue](https://github.com/paritytech/polkadot-sdk/issues/598) and the
[in-progress PR](https://github.com/paritytech/polkadot-sdk/pull/1644).

## Unresolved Questions

- Is there a better upgrade path that would preserve backwards compatibility?

## Future Directions and Related Material

This enables future optimisations for the performance of availability recovery, such as retrieving batched systematic
chunks from backers/approval-checkers.

## Appendix A

This appendix details the intricacies of getting access to the core index of a candidate in parity's polkadot node.

Here, `core_index` refers to the index of the core that a candidate was occupying while it was pending availability
(from backing to inclusion).

Availability-recovery can currently be triggered by the following phases in the polkadot protocol:
1. During the approval voting process.
1. By other collators of the same parachain.
1. During disputes.

Getting the right core index for a candidate can be troublesome. Here's a breakdown of how different parts of the
node implementation can get access to it:

1. The approval-voting process for a candidate begins after observing that the candidate was included. Therefore, the
node has easy access to the block where the candidate got included (and also the core that it occupied).
1. The `pov_recovery` task of the collators starts availability recovery in response to noticing a candidate getting
backed, which enables easy access to the core index the candidate started occupying.
1. Disputes may be initiated on a number of occasions:

3.a. is initiated by a validator as a result of finding an invalid candidate while participating in the
approval-voting protocol. In this case, availability-recovery is not needed, since the validator has already issued
their vote.

3.b. is initiated by a validator noticing dispute votes recorded on-chain. In this case, we can safely
assume that the backing event for that candidate has been recorded and kept in memory.

3.c. is initiated as a result of getting a dispute statement from another validator. It is possible that the dispute
is happening on a fork that was not yet imported by this validator, so the subsystem may not have seen this candidate
being backed.

A naive attempt at solving 3.c would be to add a new version of the disputes request-response networking protocol.
Blindly passing the core index in the network payload would not work, since there is no way of validating that
the reported `core_index` was indeed the one occupied by the candidate at the respective relay parent.

Another attempt could be to include in the message the relay block hash where the candidate was included.
This information would be used to query the runtime API and retrieve the core index that the candidate was
occupying. However, since that block is part of an unimported fork, the validator cannot call a runtime API on it.