
Optimize Approval-Distribution to more effectively use the grid topology #5162

Closed
rphmeier opened this issue Mar 19, 2022 · 1 comment · Fixed by #5164
rphmeier commented Mar 19, 2022

Note: we should do something similar in statement distribution and bitfield distribution.

Background: (#4911) We are experiencing issues in approval distribution due to excessive propagation of messages.

The goal of this issue is to propose a set of tweaks that is reasonably hard to attack in the mid-term but substantially improves bandwidth consumption and wire message volume. There are likely much better, more secure protocols we could implement, and we should do so eventually. This is a fairly simple, immediate solution to the problem at hand that doesn't require any changes to message format.

First, some statements about the expected parameters of approval distribution:

  • Every candidate made available in each relay-chain block is expected to have around 20 approval checkers, who are randomly and unpredictably selected.
  • Each checker issues first an assignment (around 100 bytes) and then an approval (around 64 bytes).
  • Each assignment must be received at a node before the corresponding approval message.
  • We need assignments to propagate fairly quickly, so an attacker can't silence all the checkers.
  • For each parachain, if we assume that a candidate is made available in every relay-chain block, we expect 20 * 2 messages per block. This doesn't account for forks, and assumes that relay-chain blocks happen once every 6 seconds. To account for forks, which happen fairly often with BABE, we can take the average message size to be 80 bytes and assume relay-chain blocks come once every 4 seconds. This gives us (20 * 2) / 4 = 10 unique messages per parachain per second, i.e. 800 bytes per second (6.4 kbit/s) of unique traffic per parachain. With 100 parachains, this is 640 kbit/s of unique traffic (see the back-of-the-envelope calculation below).
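For reference, a back-of-the-envelope check of those numbers; the constants are just the assumptions stated above, not protocol parameters:

```rust
// Illustrative only: constants are the assumptions stated in this issue.
fn main() {
    let checkers_per_candidate = 20u32;
    let messages_per_checker = 2u32; // one assignment + one approval
    let avg_message_bytes = 80u32;   // rough average of ~100 B assignments and ~64 B approvals
    let seconds_per_block = 4u32;    // pessimistic block interval to account for forks
    let parachains = 100u32;

    let msgs_per_para_per_sec = checkers_per_candidate * messages_per_checker / seconds_per_block;
    let bytes_per_para_per_sec = msgs_per_para_per_sec * avg_message_bytes;
    let total_kbit_per_sec = bytes_per_para_per_sec * parachains * 8 / 1000;

    println!("{msgs_per_para_per_sec} unique msgs/s per parachain"); // 10
    println!("{bytes_per_para_per_sec} B/s per parachain");          // 800 B/s = 6.4 kbit/s
    println!("{total_kbit_per_sec} kbit/s unique traffic in total"); // 640 kbit/s
}
```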

We currently have a grid topology, which is being misused. This takes the overall set of "gossip authorities" in a session and formats them into (roughly) a 2-dimensional matrix, where each node is intended to communicate with other nodes it shares either a row or a column with.
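For concreteness, a minimal sketch of that layout, assuming a roughly square matrix indexed by validator index; the function name and construction here are illustrative, not the actual implementation:

```rust
/// Place the n gossip authorities into a roughly square matrix; a node's grid
/// neighbours are all validators sharing its row or its column.
fn grid_neighbors(our_index: usize, n_validators: usize) -> Vec<usize> {
    let width = (n_validators as f64).sqrt().ceil() as usize;
    let (our_row, our_col) = (our_index / width, our_index % width);
    (0..n_validators)
        .filter(|&i| i != our_index)
        .filter(|&i| i / width == our_row || i % width == our_col)
        .collect()
}
```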

What is currently happening is that a node creates a message, sends it to the nodes in its row and column, and then those nodes send it to the nodes in their own rows and columns, and so on. The only limit on propagation is that a node shouldn't send a message to a peer it has already sent that message to, or to a peer it has received the same message from.

This leads to excessive repropagation, because it vastly underestimates the likelihood that a node has already seen, or will see, a message from another source. Our measurements show that this is causing around 97% waste with 1000 validators. This means that instead of 10 unique messages per parachain per second, nodes are receiving hundreds or thousands of (mostly duplicate) messages per parachain per second.

In a trustless and adversarial gossip environment, it's not sufficient to simply tag each message with a TTL, which is the number of propagations remaining on the message. TTLs can be set arbitrarily by adversarial peers, wasting bandwidth.

What we should do is trust our grid topology more and compute an implicit TTL based on the message and the source. Validators who produce assignments and approvals should send them to all peers in their row & column, as they currently do.

When we receive a message, we know the public key of the validator which produced the message. We should have a rule that (a sketch follows after this list):
- If we receive a validator's message from a peer ID associated with that validator AND
- We are in a row or column with that validator (our 'shared dimension')
- THEN we repropagate to other peers in the dimension of the matrix that we do NOT share with the validator. e.g. if we're in the same row as the validator, we share the message with nodes in our column.
- OTHERWISE, we do not repropagate.
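A minimal sketch of this rule, assuming we can map the originating validator and ourselves to (row, column) positions in the topology; names are illustrative:

```rust
#[derive(Clone, Copy)]
struct GridPos { row: usize, col: usize }

enum Dimension { Row, Column }

/// Decide whether to repropagate a message produced by `originator`.
/// `received_from_originator` is true when the sending peer ID is associated
/// with the originating validator.
fn repropagation_dimension(
    us: GridPos,
    originator: GridPos,
    received_from_originator: bool,
) -> Option<Dimension> {
    if !received_from_originator {
        return None; // not a first hop along the grid: do not repropagate
    }
    if us.row == originator.row {
        Some(Dimension::Column) // shared row: fill in the peers in our column
    } else if us.col == originator.col {
        Some(Dimension::Row) // shared column: fill in the peers in our row
    } else {
        None // no shared dimension: do not repropagate
    }
}
```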

We should also have a rule that when we first receive a valid message, we randomly propagate it to some small number of other peers (e.g. 4), regardless of where we received it from. This creates a random propagation effect to counterbalance any peers which are missing from the grid topology. When we choose a peer to receive an assignment, we are also choosing to send it the corresponding approval once we have it; assignments and approvals come in pairs, so it only makes sense to bundle them.
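A sketch of that random fanout with a fanout of 4 as in the example; `PeerId` is a placeholder type and the `rand` crate usage is assumed:

```rust
use rand::seq::IteratorRandom;
use std::collections::HashSet;

/// Placeholder peer identifier for illustration.
type PeerId = u64;

const RANDOM_FANOUT: usize = 4;

/// On first receipt of a valid assignment, pick up to RANDOM_FANOUT extra peers
/// at random (outside the grid routing); the same peers later receive the
/// corresponding approval, since assignments and approvals travel in pairs.
fn choose_random_recipients(
    all_peers: &HashSet<PeerId>,
    grid_peers: &HashSet<PeerId>,
    rng: &mut impl rand::Rng,
) -> Vec<PeerId> {
    all_peers
        .iter()
        .filter(|p| !grid_peers.contains(*p))
        .copied()
        .choose_multiple(rng, RANDOM_FANOUT)
}
```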

And if a block takes 'too long' to be finalized, we will begin propagating the messages to more and more peers in both our row and column until we're sure they all have them. We should have an escape hatch in the case of disputes, because those can cause blocks to remain temporarily unfinalized even though all nodes have seen all approval-checking messages.
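One way this escalation could look; the thresholds and the dispute escape hatch below are assumptions for illustration, not proposed constants:

```rust
/// How much of our row and column to resend to, as a fraction, based on how long
/// the block has gone unfinalized. Thresholds are illustrative assumptions.
fn resend_fraction(blocks_unfinalized: u32, under_active_dispute: bool) -> f64 {
    if under_active_dispute {
        // Disputes legitimately stall finality even when everyone has all the
        // approval-checking messages, so don't flood the network in that case.
        return 0.0;
    }
    match blocks_unfinalized {
        0..=4 => 0.0,  // normal operation: rely on the grid plus random fanout
        5..=16 => 0.5, // finality is lagging: resend to half of our row and column
        _ => 1.0,      // very late: make sure every row/column peer has everything
    }
}
```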

With this usage of the grid topology, each node will have 2 canonical pathways to each other node. Furthermore, there will be a sparse propagation that will likely span the entire peer-set in a few more hops. This should meet the goals of reducing bandwidth while ensuring that messages reach their destinations and assignments make their way out into the broader network quickly.

rphmeier added the I9-footprint and I10-optimisation labels Mar 19, 2022
rphmeier self-assigned this Mar 19, 2022
burdges commented Mar 19, 2022

I believe this grid topology makes sense for some messages but we've never analyzed the grid topology for assignments. I've no idea if an adversary could really exploit non-uniformity in the grid, but analyzing this looks tricky so I've always just assumed we wanted more randomness in assignments.

We could discuss several techniques:

(0) We could adopt roughly the erlay / minisketch set reconciliation protocol by Pieter Wuille et al. for Bitcoin's mempool, à la https://hackmd.io/@rgbPIkIdTwSICPuAq67Jbw/HJ5rpldsB and https://github.com/w3f/research-internal/issues/141, but interact with nodes in only a sequential way, so on average each forwarding node sends out each censored message to two nodes before being censored. We should also ask if being sequential in set reconciliation slows down message propagation too much, which imho sounds likely.

(1) We aggressively forward messages to far fewer random nodes, and maybe give each message a TTL, after which we no longer aggressively forward. Yes, we're worried an adversary could spam us by sending out each message with a high TTL or whatever, but this is not a problem if the initial TTL is only two; if each node sends to enough people, then an initial TTL of two may be fine.

(2) We cryptographically enforce the TTL by pre-batching VRFs, so if the original message is a VRF sigma with output x then the forwarder with keypair (pk,sk) computes their own VRFs sigma_k with input y_k = x++k and output z_k for k=1..4, determines the next gossip recipient from z_k, and pre-batches together sigma and sigma_k. In this way, we enforce that each recipient sees how many times each original VRF sigma was forwarded before reaching them, and make the forwarding completely deterministic.

A non-grid approach might look like:

Initially, we adopt set reconciliation now-ish, exposing it for use both within the grid and in a non-sequential randomized topology. We initially do some non-sequential randomized set reconciliation for assignments, which avoids slowing down gossip much, but we later replace this with (1) if a TTL of 2 or 3 looks good in analysis, or possibly with (2) otherwise.

As all this requires considerable new code, an easier grid-like approach might look like:

We randomly select k initial hops from the global pool, not grid constrained. If we simply send to these and then use the grid, then everyone receives the message k times, which sucks without set reconciliation. Instead, we divide the rows and columns among these initial hops randomly and include two bitfields (rows, cols) representing this selection in the payload signed by the VRF. We then have an honest hop at (r, c) redistribute only to the validators in (r, c') with c' in cols and in (r', c) with r' in rows. In other words, we divide the grid into random sub-grids per message. We could even permit some overlap in rows and cols for redundancy, assuming only one validator at each grid position.
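A rough sketch of that partitioning, with illustrative names and the `rand` crate assumed; each hop would then redistribute only within its assigned rows and columns:

```rust
use rand::Rng;

/// Per-hop (rows, cols) bitfields that would be signed alongside the VRF.
struct SubGrid { rows: Vec<bool>, cols: Vec<bool> }

/// Randomly divide every row and every column of the grid among the k initial hops,
/// so each hop covers a random sub-grid of the full topology.
fn partition_grid(n_rows: usize, n_cols: usize, k: usize, rng: &mut impl Rng) -> Vec<SubGrid> {
    let mut grids: Vec<SubGrid> = (0..k)
        .map(|_| SubGrid { rows: vec![false; n_rows], cols: vec![false; n_cols] })
        .collect();
    for r in 0..n_rows {
        grids[rng.gen_range(0..k)].rows[r] = true;
    }
    for c in 0..n_cols {
        grids[rng.gen_range(0..k)].cols[c] = true;
    }
    grids
}
```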

I'll remark that a grid with only one validator at each grid position has poor delivery properties, so we'll need something vaguely like this for other messages too. We maybe have log num_validators at each position or something now?


We could unify each approval checker's RelayVRFModulo assignments into one gossip message using vrfs_merge. In fact, we could have only one VRF for all the checkers' RelayVRFModulo assignments, which achieves the same without the vrfs_merge call, but then nodes should really do all their assignments, not withhold some because they're slow or overworked. In fact, we could use only one VRF but let them say how many of those assignments they really wish to do, which sounds fairly optimal, although they then cannot later announce that they wish to do more.

ordian added the T5-parachains_protocol label Aug 16, 2022