
runtime: remove ttl #5461

Open · wants to merge 176 commits into master
Conversation

@alindima (Contributor) commented Aug 26, 2024

Resolves #4776

This will enable proper core-sharing between paras, even if one of them is not producing blocks.

TODO:

  • duplicate first entry in the claim queue if the queue used to be empty
  • don't back anything if at the end of the block there'll be a session change
  • write migration for removing the availability core storage
  • update and write unit tests
  • prdoc
  • add zombienet test for synchronous backing
  • add zombienet test for core-sharing paras where one of them is not producing any blocks

Important note:
The ttl and max_availability_timeouts fields of the HostConfiguration are not removed in this PR, due to #64.
Adding the storage-version-check workaround for every use of the active HostConfiguration would be insane, as it is read in almost every runtime API.

So even though the ttl and max_availability_timeouts fields will now be unused, they will remain part of the host configuration.

These will be removed in a separate PR once #64 is fixed.

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
@command-bot (bot) commented Oct 2, 2024

@alindima Command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=rococo --target_dir=polkadot --pallet=polkadot_runtime_parachains::paras_inherent has finished. Result: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/7484427. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/7484427/artifacts/download.

Base automatically changed from sandreim/runtime_v2_descriptor_support to master October 7, 2024 07:22
@alindima (Contributor, Author) commented Oct 7, 2024

Oh, this got pretty messed up since it was based on #5423, which was merged. I'll try to fix it, but I will probably need to force-push.

Later edit: managed to fix it

@eskimor (Member) left a comment

Left a few nits. Let's strive for well-documented and encapsulated pallets with a nice API that serves a purpose and sets clear, easy-to-understand boundaries and expectations in both directions.

Other than that, I think I found one issue with regard to virtual cores.

fn report_processed(para_id: ParaId, core_index: CoreIndex) {
	// Reporting processed assignments is only important for on-demand.
	// Doing the call below is a no-op if the assignment was a `Bulk` one.
	on_demand::Pallet::<T>::report_processed(para_id, core_index);

@eskimor (Member): Ok, sorry. Now I have looked at the code as well. Indeed this is a hack, of course. The assignment provider should know about on-demand/bulk. Calling into on-demand for non-on-demand assignments is not nice, but I see you added the no-op note to the docs. Better yet would be to state this as a proper invariant, ideally with an accompanying test checking that this is indeed a no-op.
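For illustration only, a minimal sketch of the kind of invariant test suggested above. The test-helper names (`Test`, `new_test_ext`, `MockGenesisConfig`) are placeholders for whatever the crate's actual test mock provides; only the `report_processed` call itself is taken from the quoted code. The no-op property is checked by comparing storage roots:

	// Hypothetical test: `report_processed` for a para assigned via bulk coretime
	// should leave storage untouched, since there is no on-demand order to credit.
	#[test]
	fn report_processed_is_noop_for_bulk_assignments() {
		new_test_ext(MockGenesisConfig::default()).execute_with(|| {
			let para_id = ParaId::from(1000);
			let core_index = CoreIndex(0);

			// Snapshot the full storage root before the call.
			let before = sp_io::storage::root(sp_runtime::StateVersion::V1);

			// `para_id` has no on-demand order queued, so this must change nothing.
			on_demand::Pallet::<Test>::report_processed(para_id, core_index);

			let after = sp_io::storage::root(sp_runtime::StateVersion::V1);
			assert_eq!(before, after, "report_processed must be a no-op for bulk assignments");
		});
	}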

@eskimor (Member): On the other hand, this is only an intermediary hack, right? With the proper fix coming, we should no longer need this hack anyway.

Resolved (outdated) review threads on: polkadot/runtime/parachains/src/scheduler.rs (x2), polkadot/runtime/parachains/src/session_info.rs
@eskimor (Member): Reading the code, I don't get why we need this hack. Anyhow, I will not block on this as it is only interim.

Resolved (outdated) review thread on: polkadot/runtime/parachains/src/paras_inherent/mod.rs
// In `Enter` context (invoked during execution) no more candidates should be
// filtered, because they have already been filtered during `ProvideInherent`
// context. Abort in such cases.
if context == ProcessInherentDataContext::Enter {

@eskimor (Member): These checks should be top-level; they should be a wrapper around process_inherent. For debugging we can leave logs for when something was filtered and for what reason.

@eskimor (Member): The reasoning is simple: if this is handled once at the top level, you cannot mess it up or forget it in some place.

@alindima (Contributor, Author): Done. Have a look.

@eskimor (Member) left a comment

Thanks Alin! Good work. I am having trouble getting confident that this is working as expected (hence all my comments about architecture; it is quite hard to reason about this code). Other than that, are we lacking tests? E.g. a test that core sharing now works as expected even if one chain is not producing blocks, or that the runtime API now always returns the claim queue as expected?

@@ -686,18 +680,7 @@ pub mod pallet {
		Self::set_coretime_cores_unchecked(new)
	}

	/// Set the max number of times a claim may timeout on a core before it is abandoned
	#[pallet::call_index(7)]

@eskimor (Member): Shall we also already mark the fields as deprecated, pointing to the removal ticket? E.g. here?
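For illustration, one way such a deprecation marker could look. This is only a sketch with a stand-in struct, not the actual HostConfiguration definition; the two field names are taken from this PR, everything else is placeholder wording:

	// Sketch only: the real `HostConfiguration` has many more fields and derives.
	pub struct UnusedTtlFields<BlockNumber> {
		/// Deprecated: no longer read by the runtime since the claim-queue rework.
		/// Kept to avoid a storage format change; to be removed in a follow-up PR.
		#[deprecated = "unused; scheduled for removal in a follow-up"]
		pub ttl: BlockNumber,
		/// Deprecated: see `ttl`.
		#[deprecated = "unused; scheduled for removal in a follow-up"]
		pub max_availability_timeouts: u32,
	}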

@alindima (Contributor, Author): Done.


Self::process_inherent_data(data, ProcessInherentDataContext::Enter)
	.map(|(_processed, post_info)| post_info)
Self::process_inherent_data(data, ProcessInherentDataContext::Enter).and_then(

@eskimor (Member): That's not entirely what I meant. I mean we should have a generic test here: essentially, with Enter the processed data should be the same as the incoming one. This concerns not only backed_candidates.

@eskimor (Member): The point being: we should be able to have a top-level invariant that the data passed in for Enter is already good, hence it should not be changed by calling process_inherent_data. It should not even be necessary to pass in the ProcessInherentDataContext. Instead we just let it filter, but if it actually filters anything on Enter, we error out.
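A self-contained toy sketch of the invariant described above, with simplified stand-in types rather than the real paras_inherent ones: the sanitizer always runs, and the execution entry point errors out if it changed anything.

	#[derive(Clone, PartialEq, Debug)]
	struct InherentData {
		backed_candidates: Vec<u32>,
		bitfields: Vec<bool>,
	}

	#[derive(Debug, PartialEq)]
	enum Error {
		InherentDataFilteredDuringExecution,
	}

	// Stand-in sanitizer: drop "invalid" candidates (here: odd ids).
	fn process_inherent_data(mut data: InherentData) -> InherentData {
		data.backed_candidates.retain(|c| c % 2 == 0);
		data
	}

	// Block-execution entry point: filtering is only legitimate while building
	// the inherent, so any change observed here means the inherent was bad.
	fn enter(data: InherentData) -> Result<InherentData, Error> {
		let processed = process_inherent_data(data.clone());
		if processed != data {
			return Err(Error::InherentDataFilteredDuringExecution);
		}
		Ok(processed)
	}

	fn main() {
		let good = InherentData { backed_candidates: vec![2, 4], bitfields: vec![true] };
		assert!(enter(good).is_ok());

		let bad = InherentData { backed_candidates: vec![1, 2], bitfields: vec![true] };
		assert_eq!(enter(bad), Err(Error::InherentDataFilteredDuringExecution));
	}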

@alindima (Contributor, Author): Good point. Done!

@@ -345,7 +372,7 @@ impl<T: Config> Pallet<T> {
	log::debug!(target: LOG_TARGET, "Time weight before filter: {}, candidates + bitfields: {}, disputes: {}", weight_before_filtering.ref_time(), candidates_weight.ref_time() + bitfields_weight.ref_time(), disputes_weight.ref_time());

	let current_session = shared::CurrentSessionIndex::<T>::get();
	let expected_bits = scheduler::AvailabilityCores::<T>::get().len();
	let expected_bits = scheduler::Pallet::<T>::num_validator_groups();

@eskimor (Member): Bitfields are for cores, so what we are actually querying here is the number of cores, not backing groups. That some of these cores will never see an assignment is something only the scheduler needs to know.

@alindima (Contributor, Author): Renamed this.

let mut eligible: BTreeMap<ParaId, BTreeSet<CoreIndex>> = BTreeMap::new();
let mut total_eligible_cores = 0;

for (core_idx, para_id) in Self::eligible_paras(&occupied_cores) {

@eskimor (Member): Why not make this a function of inclusion? It already knows the occupied cores, and I think it also just makes sense.

@alindima (Contributor, Author) commented Oct 11, 2024: We also need the occupied cores here when calling advance_claim_queue; this is why I chose to move it here.

.enumerate()
.map(|(i, core)| match core {
	CoreOccupied::Paras(entry) => {
(0..n_cores)

@eskimor (Member): Introducing something similar to ClaimQueueIterator in inclusion would make this code a bit more straightforward. (Provide an iterator that iterates over all cores, including unoccupied ones.)

@eskimor (Member): (Could also help avoid looking up PendingAvailability twice.)

@alindima (Contributor, Author): I reworked this to not query PendingAvailability twice.
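For illustration, a self-contained sketch of the "iterate over all cores, occupied or not" idea suggested above. The types are simplified stand-ins (the real code would use CoreIndex, ParaId and the inclusion pallet's pending-availability map), so this only shows the shape of such an iterator, not the actual implementation.

	use std::collections::BTreeMap;

	#[derive(Debug, PartialEq)]
	enum CoreState {
		Occupied(u32), // para id currently pending availability on this core
		Free,
	}

	// Yields every core index from 0 to `n_cores` together with its state, so
	// callers never have to special-case cores without a pending candidate.
	fn all_cores(n_cores: u32, pending: &BTreeMap<u32, u32>) -> impl Iterator<Item = (u32, CoreState)> + '_ {
		(0..n_cores).map(move |core| {
			let state = pending.get(&core).copied().map(CoreState::Occupied).unwrap_or(CoreState::Free);
			(core, state)
		})
	}

	fn main() {
		let mut pending = BTreeMap::new();
		pending.insert(1u32, 2001u32); // core 1 is occupied by para 2001
		let states: Vec<_> = all_cores(3, &pending).collect();
		assert_eq!(states[0], (0, CoreState::Free));
		assert_eq!(states[1], (1, CoreState::Occupied(2001)));
		assert_eq!(states[2], (2, CoreState::Free));
		println!("{states:?}");
	}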

Scheduler::free_cores_and_fill_claim_queue(BTreeMap::new(), 2);
{
	let mut claim_queue = scheduler::ClaimQueue::<Test>::get();
	assert_eq!(Scheduler::claim_queue_len(), 3);

@eskimor (Member): This does not look like an exhaustive test if we are not testing the newly added core?

@alindima (Contributor, Author): We are. The core count used to be 2; we increased it to 4, but out of the two new cores we only have an assignment for one of them. So we check that the claim queue contains the assignment for the third core (index 2).

@@ -135,8 +135,8 @@ impl<T: Config> Pallet<T> {
	let assignment_keys = AssignmentKeysUnsafe::<T>::get();
	let active_set = shared::ActiveValidatorIndices::<T>::get();

	let validator_groups = scheduler::ValidatorGroups::<T>::get().into();
	let n_cores = scheduler::AvailabilityCores::<T>::get().len() as u32;
	let validator_groups = scheduler::ValidatorGroups::<T>::get();

@eskimor (Member): Note for a future refactor: there is very little reason left for ValidatorGroups to remain in the scheduler. Given that we are already storing them in SessionInfo, I wonder whether we need that storage entry at all.

A potentially improved architecture would probably have both ValidatorGroups and the core count managed and exposed via the session_info crate.

The distinction between the configured number of cores and the actual number of cores is still annoying. In practice, not even the scheduler should care: the scheduler code should work equally well if we used the actual number of cores as exposed by session_info. Then the configured number of coretime_cores would only be relevant within the coretime assignment provider and nicely encapsulated where it belongs. 🤔

@alindima (Contributor, Author): Noted it on #5529.

// Measured: `42760`
// Estimated: `46225`
// Minimum execution time: 342_453_000 picoseconds.
Weight::from_parts(360_067_000, 0)

@eskimor (Member): Do we understand where this significant increase is coming from?

// Measured: `76903`
// Estimated: `82843`
// Minimum execution time: 1_927_115_000 picoseconds.
Weight::from_parts(1_974_210_767, 0)

@eskimor (Member): This also got significantly heavier. Not necessarily a problem, but it is worth understanding where it is coming from.

@alindima (Contributor, Author): I think it's a matter of the rococo weights not being up to date even before this PR. Westend weights were updated in #5270, but rococo's were not.

If you compare the new rococo weights to the new westend weights, they're pretty similar, so I don't think there's any issue.

@alindima alindima requested a review from a team as a code owner October 15, 2024 13:03
Labels: T8-polkadot (This PR/Issue is related to/affects the Polkadot network.)
Projects: None yet
Development: Successfully merging this pull request may close: Claim queue: Remove TTL
5 participants