runtime: remove ttl #5461
Conversation
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
oh this got pretty messed up since it was based on #5423, which was merged. I'll try to fix it but will probably need to force push it. Later edit: managed to fix it.
Left a few nits. Let's strive for well-documented and encapsulated pallets with a nice API that serves a purpose and sets clear-to-understand boundaries and expectations in both directions.
Other than that, I think I found one issue with regards to virtual cores.
fn report_processed(para_id: ParaId, core_index: CoreIndex) {
// Reporting processed assignments is only important for on-demand.
// Doing the call below is a no-op if the assignment was a `Bulk` one.
on_demand::Pallet::<T>::report_processed(para_id, core_index);
Ok sorry, now I looked at the code as well. Indeed this is a hack, of course. The assignment provider should know about on-demand/bulk. Calling into on-demand for non-on-demand assignments is not nice indeed, but I see you added the no-op behaviour to the docs. Better yet would be to state this as a proper invariant, ideally even with an accompanying test checking that this is indeed a no-op.
On the other hand this is only an intermediary hack right? With the proper fix coming, we should no longer need this hack anyway.
Reading the code I don't get why we need this hack. Anyhow, will not block on this as it is only interim.
// In `Enter` context (invoked during execution) no more candidates should be
// filtered, because they have already been filtered during `ProvideInherent`
// context. Abort in such cases.
if context == ProcessInherentDataContext::Enter {
These checks should be top-level. They should be a wrapper around `process_inherent`. For debugging we can leave logs on when something was filtered and for what reason.
Reasoning is simple: if this is handled once at the top level, you cannot mess it up or forget it in some place.
done. have a look
Thanks Alin! Good work. I am having trouble getting confident that this is working as expected (hence all my comments about architecture; it is quite hard to reason about this code). But other than that, are we lacking tests? E.g. a test that core sharing now works as expected even if one chain is not producing blocks, or that the runtime API now always returns the claim queue as expected?
@@ -686,18 +680,7 @@ pub mod pallet {
Self::set_coretime_cores_unchecked(new)
}

/// Set the max number of times a claim may timeout on a core before it is abandoned
#[pallet::call_index(7)]
Shall we also already mark the fields as deprecated, pointing to the removal ticket, e.g. here?
done
Self::process_inherent_data(data, ProcessInherentDataContext::Enter)
.map(|(_processed, post_info)| post_info)
Self::process_inherent_data(data, ProcessInherentDataContext::Enter).and_then(
That's not entirely what I meant. I mean we should have a generic test here. Essentially with `Enter` the processed data should be the same as the incoming one. This not only concerns `backed_candidates`.
Point being: we should be able to have this top-level invariant that the data passed in for `Enter` is already good, hence it should not be changed by calling `process_inherent_data`. It should not even be necessary to pass in the `ProcessInherentDataContext`. Instead we just let it filter, but if it actually filters something on `Enter`, we error out.
good point. done!
@@ -345,7 +372,7 @@ impl<T: Config> Pallet<T> {
log::debug!(target: LOG_TARGET, "Time weight before filter: {}, candidates + bitfields: {}, disputes: {}", weight_before_filtering.ref_time(), candidates_weight.ref_time() + bitfields_weight.ref_time(), disputes_weight.ref_time());

let current_session = shared::CurrentSessionIndex::<T>::get();
let expected_bits = scheduler::AvailabilityCores::<T>::get().len();
let expected_bits = scheduler::Pallet::<T>::num_validator_groups();
Bitfields are for cores. So what we are actually querying here is the number of cores, not backing groups. That some of these cores will never ever see an assignment is something only the scheduler needs to know.
renamed this
let mut eligible: BTreeMap<ParaId, BTreeSet<CoreIndex>> = BTreeMap::new();
let mut total_eligible_cores = 0;

for (core_idx, para_id) in Self::eligible_paras(&occupied_cores) {
Why not make this a function of `inclusion`? It already knows the occupied cores, and I think it also just makes sense.
We also need the occupied cores here when calling `advance_claim_queue`. This is why I chose to move it here.
.enumerate()
.map(|(i, core)| match core {
CoreOccupied::Paras(entry) => {
(0..n_cores)
Introducing something similar to ClaimQueueIterator in inclusion would make this code a bit more straightforward. (Provide an iterator that iterates over all cores, including unoccupied ones.)
(Could also help avoid looking up `PendingAvailability` twice.)
I reworked this to not query `PendingAvailability` twice.
Scheduler::free_cores_and_fill_claim_queue(BTreeMap::new(), 2);
{
let mut claim_queue = scheduler::ClaimQueue::<Test>::get();
assert_eq!(Scheduler::claim_queue_len(), 3);
This does not look like an exhaustive test if we are not testing the newly added core?
We are. The core count used to be 2; we increased it to 4, but of the two new cores we only have an assignment for one of them. So we check that the claim queue contains the assignment for the third core (index 2).
@@ -135,8 +135,8 @@ impl<T: Config> Pallet<T> {
let assignment_keys = AssignmentKeysUnsafe::<T>::get();
let active_set = shared::ActiveValidatorIndices::<T>::get();

let validator_groups = scheduler::ValidatorGroups::<T>::get().into();
let n_cores = scheduler::AvailabilityCores::<T>::get().len() as u32;
let validator_groups = scheduler::ValidatorGroups::<T>::get();
Note for future refactor: there is very little reason left for `ValidatorGroups` to remain in the scheduler. Given that we are already storing them in `SessionInfo`, I wonder whether we need that storage entry at all. A potentially improved architecture would probably have both `ValidatorGroups` and the core count managed and exposed via the `session_info` crate.
As for the distinction between the configured number of cores and the actual number of cores: this is still annoying. In practice not even the scheduler should care; the scheduler code should work equally well if we used the actual number of cores as exposed by `session_info`. Then the configured number of `coretime_cores` is only relevant within the coretime assignment provider and nicely encapsulated where it belongs. 🤔
noted it on #5529
// Measured: `42760`
// Estimated: `46225`
// Minimum execution time: 342_453_000 picoseconds.
Weight::from_parts(360_067_000, 0)
Do we understand where this significant increase is coming from?
// Measured: `76903`
// Estimated: `82843`
// Minimum execution time: 1_927_115_000 picoseconds.
Weight::from_parts(1_974_210_767, 0)
This also got significantly heavier. Not necessarily a problem, but worth understanding where it is coming from.
I think it's a matter of the rococo weights not being up to date even before this PR. Westend weights were updated in #5270, but rococo's were not. If you compare the new rococo weights to the new westend weights, they're pretty similar, so I don't think there's any issue.
Force-pushed from a42f640 to 6e8f479
Resolves #4776
This will enable proper core-sharing between paras, even if one of them is not producing blocks.
TODO:
Important note:
The `ttl` and `max_availability_timeouts` fields of the HostConfiguration are not removed in this PR, due to #64. Adding the workaround with the storage version check for every use of the active HostConfiguration would be insane, as it's used in almost all runtime APIs.
So even though the `ttl` and `max_availability_timeouts` fields will now be unused, they will remain part of the host configuration. These will be removed in a separate PR once #64 is fixed.