Investigate grandpa resource usage #830
It's a bit hard to predict how much impact this change will have on node resources when deployed on a different network, but since the worst that can happen is slightly increased finality times, I think it would be worthwhile to test it on Kusama. Since Kusama has 1000 validators, the impact might be slightly bigger than what was observed on Versi.
Out of curiosity, what is the heaviest part of GRANDPA? Can we optimize GRANDPA at a lower level to achieve similar results without increasing the finality time?
I think it boils down to the amount of data we need to spread every round with the current settings. One thing that we could do in the future is changing GRANDPA to only start a new round when there's something new to vote on (maybe with a timeout to start a new round anyway even if there's nothing new). This would make it so that GRANDPA is "synced" with approval voting, avoiding rounds where we finalize the same thing. #551 is also useful, since the API we're currently using doesn't handle backpressure.
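The "only start a round when there's something new" idea above can be sketched as a small driver. This is a hypothetical illustration, not the substrate GRANDPA voter API; `RoundDriver`, its fields, and `should_start_round` are all invented names:

```rust
use std::time::{Duration, Instant};

/// Hypothetical round driver: begin a new GRANDPA round only when the best
/// voteable block has advanced past what we voted on last round, or when an
/// idle timeout forces a round anyway. Illustrative only.
struct RoundDriver {
    last_voted_target: u64,  // block number voted on in the previous round
    last_round_start: Instant,
    max_idle: Duration,      // timeout: start a round even with nothing new
}

impl RoundDriver {
    fn should_start_round(&self, best_voteable: u64, now: Instant) -> bool {
        // Either there is new material to vote on, or we have been idle
        // long enough that we start a round regardless.
        best_voteable > self.last_voted_target
            || now.duration_since(self.last_round_start) >= self.max_idle
    }
}
```

With a generous `max_idle`, rounds that would merely re-finalize the same head are skipped, which is what would keep GRANDPA roughly in step with approval voting.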
What about GRANDPA's gossip topology? Four random peers do not sound wonderful, either for latency, deduplication, etc. What if we used the grid from approvals?
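The grid idea can be sketched as follows. This is a hypothetical helper, not Polkadot's actual approval-distribution topology code: validators are laid out in a near-square grid and gossip only to peers sharing a row or a column, so any message reaches everyone in two hops:

```rust
/// Illustrative grid-topology neighbour computation: place `n_validators`
/// in a grid of width ceil(sqrt(n)) and return the peers sharing this
/// validator's row or column.
fn grid_neighbours(index: usize, n_validators: usize) -> Vec<usize> {
    let width = (n_validators as f64).sqrt().ceil() as usize;
    let (row, col) = (index / width, index % width);
    (0..n_validators)
        .filter(|&p| p != index && (p / width == row || p % width == col))
        .collect()
}
```

Each validator talks to roughly 2(√n − 1) peers (about 60 at 1000 validators) instead of 4 random ones, trading more direct connections for bounded propagation depth and easier deduplication.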
The finality speed affects downstream parachains. Phala runs external offchain workers that follow the finality of parachain blocks, which in turn follows the upstream relay chain. It has a built-in GRANDPA light client: it verifies relay chain finality, and then checks that the parachain header matches the header in the relay chain. So it always has the latency of relay chain finality plus parachain finality. If the relay chain has one more block of delay, our offchain worker will move slower. If the original delay is around 2 blocks, then 1 extra block is 50% slower, which is non-trivial. It's nice to reduce the burden of running a relay chain node, but not ideal at the cost of increasing downstream application latency. It would be better if we could optimize GRANDPA itself, or only make adjustments on non-validator full nodes (not sure if that's relevant to this issue though).
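The 50% figure above is just the ratio of the extra delay to the baseline; a one-line sketch of that arithmetic (function name is illustrative):

```rust
/// Back-of-the-envelope slowdown seen by a downstream light client whose
/// end-to-end delay is relay finality plus parachain finality: one extra
/// relay block matters in proportion to the baseline delay.
fn relative_slowdown(base_relay_blocks: u32, extra_blocks: u32) -> f64 {
    extra_blocks as f64 / base_relay_blocks as f64
}
```

With a 2-block baseline, 1 extra block gives 0.5, i.e. the 50% slowdown quoted above; on a longer baseline the same extra block hurts proportionally less.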
An 8% reduction in CPU usage sounds worth doing; it'll likely mean more parachain cores and thus more parachains. Did we see much finality lag on Versi @eskimor? Do we know how much grandpa time is signature verification vs signature creation (minimal) vs networking vs business logic? We have partial work towards Rabin-Williams signatures, which theoretically give 10x faster verification with only mild key rotation, but also 256 bytes vs 64 bytes for Ed25519. We could explore single-layer XMSS signatures, which are blazingly fast, but require aggressive key rotation and 2 KB of bandwidth. We would not, on the other hand, benefit from these if we have networking limitations, which afaik should not be the case, but libp2p may be slow.
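To make the bandwidth trade-off above concrete, a rough per-round tally of signature bytes alone, using the sizes quoted (Ed25519 64 B, Rabin-Williams 256 B, single-layer XMSS ~2 KB) and ignoring vote metadata and gossip fan-out; the function name is illustrative:

```rust
/// Rough signature payload per GRANDPA round: each validator sends one
/// prevote and one precommit, each carrying one signature. Fan-out and
/// message framing are ignored.
fn sig_bytes_per_round(n_validators: u64, sig_size: u64) -> u64 {
    2 * n_validators * sig_size
}
```

At 1000 validators that is 128 KB per round for Ed25519 versus roughly 4 MB for 2 KB XMSS signatures, so the faster verification would be bought with a ~32x larger signature payload.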
Thanks @h4x3rotab! This is really useful feedback! @burdges yes, but we have some bigger fish to fry for now :-)
As can be seen here (data from Kusama), grandpa is, apart from the node and networking tasks, the most CPU-intensive of all tasks:
Talking to @andresilva, we figured that we might be able to significantly reduce this by accepting slightly slower finality, simply by doubling the `gossip_duration` configuration parameter. That's a very easy change, and the smaller nodes on Kusama used to have quite high CPU usage, so it seemed worth trying out. We doubled that value from 1000 to 2000 ms and deployed it to all Versi nodes:
https://github.com/paritytech/polkadot/pull/5448/files
https://gitlab.parity.io/parity/infrastructure/parity-testnet/-/merge_requests/313/diffs
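The change itself is tiny. A self-contained sketch of what the diff amounts to, using a minimal stand-in struct rather than the real `sc_finality_grandpa::Config` (which has more fields); only the gossip interval is doubled:

```rust
use std::time::Duration;

/// Minimal stand-in for the relevant part of the GRANDPA service
/// configuration, so the change can be shown self-contained.
struct GrandpaConfig {
    gossip_duration: Duration,
}

/// The Versi experiment: double the previous 1000 ms gossip interval.
fn experimental_config() -> GrandpaConfig {
    GrandpaConfig {
        // Was Duration::from_millis(1000).
        gossip_duration: Duration::from_millis(2000),
    }
}
```

A longer `gossip_duration` means GRANDPA rounds progress more slowly but re-gossip less often, which is where the CPU savings come from.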
This was tried on May 4th at around 7:30 PM UTC+2; another upgrade (relaxing escalation of approval distribution) got applied at around May 5th 0:00 UTC+2. On May 5th at around 10:00 UTC+2, master got deployed again.
What we found: grandpa poll times dropped by around 37%:
Interestingly, a few other subsystems also had reduced poll times with this change:
Parachain block times did not really seem to be affected (or rather fluctuate too much to see a clear pattern):
As expected finality got a bit slower:
Number of active heads stayed quite stable:
Network-traffic-wise, no clear overall pattern was visible.
Conclusion and path forward
By doubling the `gossip_duration`, some load can indeed be reduced, and we were not able to find any negative effects apart from slightly slower finality. If 1-block-slower finality is acceptable, this is an easy way to reduce load a bit. It is likely that the savings will be larger on larger networks; Versi had around 560 nodes at the time of the test. So trying this out on a larger test network might be a worthwhile experiment.