Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Allow transaction for offchain indexing #7290

Merged
30 commits merged into from
Jan 22, 2021
Merged

Conversation

cheme
Copy link
Contributor

@cheme cheme commented Oct 9, 2020

This PR enables state-machine transactional layer for the offchain indexing.

  • Move offchain storage change into the general storage change (first commit), this removes the offchain storage change parameter
    from some crates api and is a breaking change.
    Users no longer instantiate a offchain layer change with method 'enabled' but use 'enable_storage_change' method on existing OverlayStorageChange if needed.
  • Make use of OverlayedChangeSet with offchain indexing key and value types.

I wonder if some function could be removed, but I tried to keep changes from touching code logic.

@cheme cheme added A0-please_review Pull request needs code review. D2-breaksapi C1-low PR touches the given topic and has a low impact on builders. labels Oct 9, 2020
@cheme cheme requested a review from tomusdrw as a code owner October 9, 2020 14:03

Ok(StorageChanges {
main_storage_changes: main_storage_changes.collect(),
child_storage_changes: child_storage_changes.map(|(sk, it)| (sk, it.0.collect())).collect(),
#[cfg(feature = "std")]
offchain_storage_changes: Default::default(),
offchain_storage_changes,
Copy link
Contributor Author

@cheme cheme Oct 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering how the changes were communicated from state-machine to client-db, with this value set to default?
Edit: probably the change delta was not extracted but use with the change struct as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @bkchr , I am not hundred percent sure things were not passing through the &mut parameter (something I did remove in this PR (see macro change)), but yes client db probably don't read from the right data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which macro changes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wasn't it reading from the correct data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the part you also change in #7940 (just terrible way of saying it :(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TLDR, everything is okay here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect :D Then merge it :D

Copy link
Contributor

@tomusdrw tomusdrw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I wonder what are the consequences now though. Would the offchain changes be reverted in case the block is found non-canon? If yes, then we should rather put the indexing storage changes to LOCAL database not PERSISTENT one, cause the latter have different promises.

Comment on lines 662 to 664
/// Activate offchain indexing.
/// Note that this function must only be call before any transactional
/// changes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Activate offchain indexing.
/// Note that this function must only be call before any transactional
/// changes.
/// Activate offchain indexing.
///
/// Note that this function must only be called before any transactional changes.

TBH I'd rather prefer OverlayedChanges::with_offchain_indexing() to make sure it can't be abused (i.e. altered in the middle of execution).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeap agree with this @tomaka

primitives/core/src/offchain/mod.rs Show resolved Hide resolved
@@ -746,6 +746,14 @@ impl TransactionPoolExt {
}
}

/// Change to be applied to the offchain worker db in regards to a key.
#[derive(Debug,Clone,Hash,Eq,PartialEq)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[derive(Debug,Clone,Hash,Eq,PartialEq)]
#[derive(Debug, Clone, Hash, Eq, PartialEq)]

}

/// Drain all elements of offchain changeset.
pub fn drain_offchain(&mut self) -> OffchainOverlayedChangesIntoIter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only see this method being used in tests, where do we actually apply these changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// Stores the changes that this overlay constitutes.
changes: BTreeMap<StorageKey, OverlayedValue>,
pub(crate) changes: BTreeMap<K, OverlayedEntry<V>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the pub(crate) stuff for tests? I think it would be better to get an accessor method that is only compiled when #[cfg(test)]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is to access the field in primitives/state-machine/src/overlayed_changes/offchain.rs.
I can move the struct declaration in primitives/state-machine/src/overlayed_changes/mod.rs or use 'pub(super)'. I think I prefer the second option.

@cheme
Copy link
Contributor Author

cheme commented Oct 12, 2020

I wonder what are the consequences now though. Would the offchain changes be reverted in case the block is found non-canon?

No, offchain changes are only reverted when transaction are used during block processing (which happens probably only for contract at this point), so we still got every non-canonical changes commited.
I think being able to only write indexing changes on canonicalization would be a good thing.
It is certainly doable (from client-db write changes into journal and only actually write at canonicalization dropping non canonical journals), it will be very similar code to the non-archive node code.
Though I wonder if it should be call 'local' ('canonical' would make more sense or changing 'local' semantic with a new comment could also do the trick).
Or we want to be able to query non canonical block content to and can use similar code to state-db one (just not on trie node but on values (which means that there will be a little bit of new code logic involve)).
That would be probably a more straightforward way to implement 'LOCAL' than #7291 (in which we want to be able to query any chain state which is that much needed).

So, what is the more suitable?

  • changes only written when canonical (indexing of non canonical is not doable).
  • changes only written in indexing when canonical, and in memory state-db to be able to query non-canonical.

@tomusdrw
Copy link
Contributor

No, offchain changes are only reverted when transaction are used during block processing (which happens probably only for contract at this point), so we still got every non-canonical changes commited.

That's okay then.

The Indexing API was designed for PERSISTENT storage (dont-care-if-canonical one) in mind, and imho for backward compatibility it should stay the same. We can add a parameter to write to (fork-)LOCAL one too if needed, but only as an option.

Or we want to be able to query non canonical block content to and can use similar code to state-db one (just not on trie node but on values (which means that there will be a little bit of new code logic involve)).
That would be probably a more straightforward way to implement 'LOCAL' than #7291 (in which we want to be able to query any chain state which is that much needed).

I'm not sure I fully understand this option here though. My idea of LOCAL was that it always reflects the state of a particular fork - i.e. only stuff that was actually written to the storage by ancestors of the current block (no fork-garbage). If you are running in the context of a block that is on canonical chain, you will indeed see the "canonical" storage, but subsequent blocks imported to the canonical chain should not really affect what the OCW sees (hence "locality").
If that's not feasible to implement I'm okay with some other ideas (which I presume #7291 implements), but we need to carefully document the behavior of such storage.

@cheme
Copy link
Contributor Author

cheme commented Oct 12, 2020

I think #7291 is a bit overcomplicated for the use case. (if we only need to be able to access a single state).

If you are running in the context of a block that is on canonical chain, you will indeed see the "canonical" storage, but subsequent blocks imported to the canonical chain should not really affect what the OCW sees (hence "locality").

IIUC this implies option 1 is enough (put changes in journals and only write them to db on canonicalization, so when you query content it is only canonical content), which is probably the better choice (simpler and OW probably only care with finalized content most of the time).

@cheme
Copy link
Contributor Author

cheme commented Oct 14, 2020

I got the option 1 (journal change and only write them on block finalization) implemented in cheme/substrate@offchain_indexing_tx...cheme:offchain_indexing_tx_2 (did not update most doc and lacks testing).
I think it is fine, it allows writing offchain local storage through the indexing api only, and then querying this will report state of finalized block.
I was also thinking a bit about allowing update through non indexing api, but that would probably need storing pending change differently (having a single encoded journal would be bad, and individual items should probably be use), seems rather doable.
In this case the last modification done to a value would apply on finalization (we cannot use compare_and_set), and may be rather random, not sure it is a good idea to implement this way.

@tomusdrw
Copy link
Contributor

tomusdrw commented Oct 19, 2020

I think it is fine, it allows writing offchain local storage through the indexing api only, and then querying this will report state of finalized block.

Wait, that completely changes semantics and is going to be really cumbersome (close to impossible) to use. OCWs don't have any idea about finality and the purpose of Indexing API is to pass data directly to OCWs. If the changes are only applied on finality it means OCWs would need to access the state of the finalized block somehow and it's not currently possible (they only run right after the latest new block is imported and can only access state of that block).

This solution could be acceptable (although imho quite limiting) if we re-triggered OCWs for finalized blocks passing this information somehow, but currently it's not how it works at all.

@cheme
Copy link
Contributor Author

cheme commented Oct 19, 2020

I guess I misinterpreted you previous reply, it sounds like what is targetted by #7291 or option 2 (query does report non canonical change for a given fork).
Option 1 which is rather straight forward implementation is indeed different semantic, I am not sure if it can be useful, the fact that the ocw can trigger with a different code than the one of the finalized bloc is indeed problematic: would require to trigger ocw on finalization explicitely (with the code on chain at the time).
To avoid the massive code of #7291, option 1 could be overloaded with a changeset layer that reapply updates on reorg (a non negligeable cost but should be rather ok for our chain), probably not super hard to implement. I will only allows to query state of the tip of the current fork when #7291 aims to allow queries at any state.

@cheme
Copy link
Contributor Author

cheme commented Oct 20, 2020

Maybe this discussion (local ocw) should move to #7291, since it is a bit out of the scope of this PR (just have offchain indexing conform to state-machine transactional layer).

About idea to use some changeset layer from previous comment, I think it can only lead to concurrency issue, and approach of #7291 (explicitely allow change for a given block hash) is correct.
Still we could have a changeset managed for every blocks, but this is generally the same as #7291 implemented in a different way, I will probably forget this idea.

Copy link
Contributor

@tomusdrw tomusdrw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! Couple of typedefs could be introduced to make the iterator types more readable.

Comment on lines 123 to 125
btree_map::Iter<'i, (Vec<u8>,Vec<u8>), OverlayedEntry<OffchainOverlayedChange>>,
fn((&'i (Vec<u8>, Vec<u8>), &'i OverlayedEntry<OffchainOverlayedChange>))
-> (&'i (Vec<u8>, Vec<u8>), &'i OffchainOverlayedChange),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add some typedefs here? This is really hard to read :)


/// Iterate by reference over the prepared offchain worker storage changes.
pub struct OffchainOverlayedChangesIter<'i> {
inner: Option<sp_std::iter::Map<
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you use something like impl Iterator<...>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish I could. In fact if I remove all OffchainIterator struct and just have a function return impl Iterator<...> it will work nice and easy.
I wanted to keep those iterator types to avoid breaking too much the api, but I guess I will try to remove them, I am not sure they are usefull here.

Comment on lines 157 to 159
btree_map::IntoIter<(Vec<u8>,Vec<u8>), OverlayedEntry<OffchainOverlayedChange>>,
fn(((Vec<u8>, Vec<u8>), OverlayedEntry<OffchainOverlayedChange>))
-> ((Vec<u8>, Vec<u8>), OffchainOverlayedChange),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, deserves a bunch of typedefs. Especially the (Vec<u8>, Vec<u8>) stuff across the entire file.

@@ -101,23 +95,21 @@ impl<H: Hasher, N: ChangesTrieBlockNumber> TestExternalities<H, N>
let changes_trie_config = storage.top.get(CHANGES_TRIE_CONFIG)
.and_then(|v| Decode::decode(&mut &v[..]).ok());
overlay.set_collect_extrinsics(changes_trie_config.is_some());
overlay.enable_offchain_indexing();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's not possible to move that to the constructor function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC that is totally doable (I did consider it), will try.

@tomusdrw tomusdrw added A6-mustntgrumble and removed A0-please_review Pull request needs code review. labels Oct 20, 2020
@tomusdrw
Copy link
Contributor

Maybe this discussion (local ocw) should move to #7291, since it is a bit out of the scope of this PR (just have offchain indexing conform to state-machine transactional layer).

💯

@cheme cheme added the A0-please_review Pull request needs code review. label Dec 8, 2020
@cheme cheme removed A3-needsresolving A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. labels Jan 20, 2021
Copy link
Member

@athei athei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bkchr
Copy link
Member

bkchr commented Jan 20, 2021

@cheme I think tests are missing?

@cheme
Copy link
Contributor Author

cheme commented Jan 20, 2021

yes will complete that at state-machine level (about the overlay and transaction), do you expect other tests?
(I am thinking about the '/' case and test touching more client but I am not sure if you will handle that in another pr)

@bkchr
Copy link
Member

bkchr commented Jan 20, 2021

I think statemachine level should be enough, as this shouldn't propagate to upper layers ;)

@cheme
Copy link
Contributor Author

cheme commented Jan 21, 2021

I did merge master post #7940 .

I choose as @bkchr to remove the optimization that avoid writing data in state machine (mainly because with test executor it is a bit awkward to use).

Thinking twice I could also have change the default to active and unactivate it in client/service/src/client/call_executor.rs .

Please tell me what you think @todr , @ateissen , @bkchr

Will call 'bœt mërge' tomorrow evening if no disagreement until then.

@athei
Copy link
Member

athei commented Jan 22, 2021

No objections. I like the simplification resulting in expressing a None as an empty map. Collections already have a natural None which makes it a good idea to use it when possible.

@@ -388,6 +390,7 @@ impl OverlayedChanges {
.expect("Top and children changesets are started in lockstep; qed");
!changeset.is_empty()
});
self.offchain.overlay_mut().rollback_transaction()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably use an expect like for children above?

@@ -401,6 +404,7 @@ impl OverlayedChanges {
changeset.commit_transaction()
.expect("Top and children changesets are started in lockstep; qed");
}
self.offchain.overlay_mut().commit_transaction()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

@@ -414,6 +418,7 @@ impl OverlayedChanges {
changeset.enter_runtime()
.expect("Top and children changesets are entering runtime in lockstep; qed")
}
self.offchain.overlay_mut().enter_runtime()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

@@ -427,6 +432,7 @@ impl OverlayedChanges {
changeset.exit_runtime()
.expect("Top and children changesets are entering runtime in lockstep; qed");
}
self.offchain.overlay_mut().exit_runtime()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same


Ok(StorageChanges {
main_storage_changes: main_storage_changes.collect(),
child_storage_changes: child_storage_changes.map(|(sk, it)| (sk, it.0.collect())).collect(),
#[cfg(feature = "std")]
offchain_storage_changes: Default::default(),
offchain_storage_changes,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which macro changes?


Ok(StorageChanges {
main_storage_changes: main_storage_changes.collect(),
child_storage_changes: child_storage_changes.map(|(sk, it)| (sk, it.0.collect())).collect(),
#[cfg(feature = "std")]
offchain_storage_changes: Default::default(),
offchain_storage_changes,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wasn't it reading from the correct data?

@cheme
Copy link
Contributor Author

cheme commented Jan 22, 2021

bot merge

@ghost
Copy link

ghost commented Jan 22, 2021

Trying merge.

@ghost ghost merged commit 947a6bc into paritytech:master Jan 22, 2021
This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. C1-low PR touches the given topic and has a low impact on builders.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants