Investigate grandpa resource usage #830
It's a bit hard to predict how much impact this change will have on node resources when deployed on a different network, but since the worst that can happen is slightly increased finality times, I think it would be worthwhile to test it on Kusama. Since Kusama has 1000 validators, the impact might be slightly bigger than what was observed on Versi.
Out of curiosity, what is the heaviest part of GRANDPA? Can we optimize GRANDPA at a lower level to achieve similar results without increasing the finality time?
I think it boils down to the amount of data we need to spread every round with the current settings. One thing that we could do in the future is changing GRANDPA to only start a new round when there's something new to vote on (maybe with a timeout to start a new round anyway even if there's nothing new). This would make it so that GRANDPA is "synced" with approval voting, avoiding rounds where we finalize the same thing. #551 is also useful, since the API we're currently using doesn't handle backpressure.
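The "only start a round when there's something new" idea above can be sketched as a small driver. This is a hypothetical illustration, not the substrate GRANDPA voter API; `RoundDriver`, its fields, and `should_start_round` are all invented names:

```rust
use std::time::{Duration, Instant};

/// Hypothetical round driver: begin a new GRANDPA round only when the best
/// voteable block has advanced past what we voted on last round, or when an
/// idle timeout forces a round anyway. Illustrative only.
struct RoundDriver {
    last_voted_target: u64,  // block number voted on in the previous round
    last_round_start: Instant,
    max_idle: Duration,      // timeout: start a round even with nothing new
}

impl RoundDriver {
    fn should_start_round(&self, best_voteable: u64, now: Instant) -> bool {
        // Either there is new material to vote on, or we have been idle
        // long enough that we start a round regardless.
        best_voteable > self.last_voted_target
            || now.duration_since(self.last_round_start) >= self.max_idle
    }
}
```

With a generous `max_idle`, rounds that would merely re-finalize the same head are skipped, which is what would keep GRANDPA roughly in step with approval voting.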
What about GRANDPA's gossip topology? Four random peers do not sound wonderful, either for latency, deduplication, etc. What if we used the grid from approvals?
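The grid idea can be sketched as follows. This is a hypothetical helper, not Polkadot's actual approval-distribution topology code: validators are laid out in a near-square grid and gossip only to peers sharing a row or a column, so any message reaches everyone in two hops:

```rust
/// Illustrative grid-topology neighbour computation: place `n_validators`
/// in a grid of width ceil(sqrt(n)) and return the peers sharing this
/// validator's row or column.
fn grid_neighbours(index: usize, n_validators: usize) -> Vec<usize> {
    let width = (n_validators as f64).sqrt().ceil() as usize;
    let (row, col) = (index / width, index % width);
    (0..n_validators)
        .filter(|&p| p != index && (p / width == row || p % width == col))
        .collect()
}
```

Each validator talks to roughly 2(√n − 1) peers (about 60 at 1000 validators) instead of 4 random ones, trading more direct connections for bounded propagation depth and easier deduplication.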
The finality speed affects downstream parachains. Phala runs external offchain workers that follow the finality of parachain blocks, which in turn follows the upstream relay chain. It has a built-in GRANDPA light client: it verifies relay chain finality, and then checks that the parachain header matches the header in the relay chain. So it always has the latency of relay chain finality plus parachain finality. If the relay chain has one more block of delay, our offchain worker will move slower. If the original delay is around 2 blocks, then 1 extra block is 50% slower, which is non-trivial. It's nice to reduce the burden of running a relay chain node, but not ideal at the cost of increasing downstream application latency. It would be better if we could optimize GRANDPA itself, or only make adjustments on non-validator full nodes (not sure if that's relevant to this issue though).
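The 50% figure above is just the ratio of the extra delay to the baseline; a one-line sketch of that arithmetic (function name is illustrative):

```rust
/// Back-of-the-envelope slowdown seen by a downstream light client whose
/// end-to-end delay is relay finality plus parachain finality: one extra
/// relay block matters in proportion to the baseline delay.
fn relative_slowdown(base_relay_blocks: u32, extra_blocks: u32) -> f64 {
    extra_blocks as f64 / base_relay_blocks as f64
}
```

With a 2-block baseline, 1 extra block gives 0.5, i.e. the 50% slowdown quoted above; on a longer baseline the same extra block hurts proportionally less.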
An 8% reduction in CPU usage sounds worth doing; it'll likely mean more parachain cores and thus more parachains. Did we see much finality lag on Versi @eskimor? Do we know how much grandpa time is signature verification vs signature creation (minimal) vs networking vs business logic? We have partial work towards Rabin-Williams signatures, which theoretically give 10x faster verification with only mild key rotation, but also 256 bytes vs 64 bytes for Ed25519. We could explore single-layer XMSS signatures, which are blazingly fast, but require aggressive key rotation and 2 KB of bandwidth. We would not, on the other hand, benefit from these if we have networking limitations, which afaik should not be the case, but libp2p may be slow.
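To make the bandwidth trade-off above concrete, a rough per-round tally of signature bytes alone, using the sizes quoted (Ed25519 64 B, Rabin-Williams 256 B, single-layer XMSS ~2 KB) and ignoring vote metadata and gossip fan-out; the function name is illustrative:

```rust
/// Rough signature payload per GRANDPA round: each validator sends one
/// prevote and one precommit, each carrying one signature. Fan-out and
/// message framing are ignored.
fn sig_bytes_per_round(n_validators: u64, sig_size: u64) -> u64 {
    2 * n_validators * sig_size
}
```

At 1000 validators that is 128 KB per round for Ed25519 versus roughly 4 MB for 2 KB XMSS signatures, so the faster verification would be bought with a ~32x larger signature payload.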
Thanks @h4x3rotab! This is really useful feedback! @burdges yes, but we have some bigger fish to fry for now :-)
As can be seen here (data from Kusama), grandpa is, apart from the node and networking tasks, the most CPU-intensive of all tasks:
Talking to @andresilva, we figured that we might be able to significantly reduce this by accepting slightly slower finality, simply by doubling the `gossip_duration` configuration parameter. That's a very easy change, and the smaller nodes on Kusama used to have quite high CPU usage, so it seemed worth trying out. We doubled that value from 1000 to 2000 ms and deployed it to all Versi nodes:
https://github.com/paritytech/polkadot/pull/5448/files
https://gitlab.parity.io/parity/infrastructure/parity-testnet/-/merge_requests/313/diffs
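The change itself is tiny. A self-contained sketch of what the diff amounts to, using a minimal stand-in struct rather than the real `sc_finality_grandpa::Config` (which has more fields); only the gossip interval is doubled:

```rust
use std::time::Duration;

/// Minimal stand-in for the relevant part of the GRANDPA service
/// configuration, so the change can be shown self-contained.
struct GrandpaConfig {
    gossip_duration: Duration,
}

/// The Versi experiment: double the previous 1000 ms gossip interval.
fn experimental_config() -> GrandpaConfig {
    GrandpaConfig {
        // Was Duration::from_millis(1000).
        gossip_duration: Duration::from_millis(2000),
    }
}
```

A longer `gossip_duration` means GRANDPA rounds progress more slowly but re-gossip less often, which is where the CPU savings come from.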
This was tried on May 4th at around 7:30 PM UTC+2; another upgrade (relaxing escalation of approval distribution) got applied at around May 5th 0:00 UTC+2. On May 5th at around 10:00 UTC+2, master got deployed again.
What we found: grandpa poll times dropped by around 37%:
Interestingly, a few other subsystems also had reduced poll times with this change:
Parachain block times did not really seem to be affected (or rather fluctuate too much to see a clear pattern):
As expected finality got a bit slower:
Number of active heads stayed quite stable:
Network-traffic-wise, no clear overall pattern was visible.
Conclusion and path forward
By doubling the `gossip_duration`, some load can indeed be reduced, and we were not able to find any negative effects apart from slightly slower finality. If 1-block-slower finality is acceptable, this is an easy way to reduce load a bit. It is likely that the savings will be larger on larger networks; Versi had around 560 nodes at the time of the test. So trying this out on a larger test network might be a worthwhile experiment.