
Motoko Data Inspection #4705

Open. Wants to merge 20 commits into base: master.
Conversation

luc-blaeser (Contributor)

Motoko Data Inspection

Note: This is still a prototype, not yet ready for merging.

Generic data inspection of Motoko canisters by authorized users.

Your Motoko Program is a Database!

This is only a small first step towards the bigger vision of providing data tooling for Motoko, similar to a database management system. This would support data inspection, data queries, possibly even data modification, data backup/restore, complex data migration, and/or administrative functionality.

Frontend Prototype

A simple frontend is available to test the data inspection, see https://github.com/luc-blaeser/data-inspector (limited access).

Backend Design

The Motoko runtime system is extended to stream the heap state to the frontend canister for displaying the data to authorized users.

Currently, the following design aspects apply to the data inspection in the Motoko runtime system:

  • Only a controller of the canister can inspect the heap state through this functionality.
  • The inspection is based on enhanced orthogonal persistence to benefit from its precise object tagging.
  • The data inspection returns the set of live objects (stable and flexible) reachable from the main actor.
  • The format is mostly a one-to-one binary copy of the heap object payload, to minimize processing in the backend.
  • For simplicity, only full-heap inspection is currently supported. Better scalability can be implemented later, see thoughts in .
  • Currently, field names are not yet shown; this can be supported later.

Binary format (EBNF)

Format = Version Root Heap.
Version = `1: usize`.
Root = `object_id: usize`.
Heap = `object_id: usize` `object_tag: usize` `object_payload`.

The object payload is organized as follows:

  • The regular RTS object payload, with pointers replaced by object ids.
  • The payload is always a multiple of the word size.
  • For Object, the object size is prepended because the hash blob cannot directly be
    located in the stream.

An `object_id` is a potentially synthetic identifier of an object. The ids are skewed
to distinguish them from scalars. Currently, the object ids are heap pointers, but
this would change with incremental inspection.

`usize` is 64-bit little endian.
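The stream header described by the grammar above can be decoded straightforwardly. The following is a minimal sketch, assuming 8-byte little-endian `usize` fields and the usual Motoko RTS convention of skewing pointers by subtracting 1; the exact skew encoding and the helper names are assumptions for illustration, not the actual RTS code.

```rust
/// Read one 64-bit little-endian `usize` field, advancing the stream slice.
fn read_usize(stream: &mut &[u8]) -> Option<u64> {
    if stream.len() < 8 {
        return None;
    }
    let (head, tail) = stream.split_at(8);
    *stream = tail;
    Some(u64::from_le_bytes(head.try_into().ok()?))
}

/// Undo the skew to recover the raw object id
/// (assumed convention: skewed = raw - 1).
fn unskew(id: u64) -> u64 {
    id.wrapping_add(1)
}

/// Decode `Version Root` from the start of the inspection stream.
fn read_header(stream: &mut &[u8]) -> Option<(u64, u64)> {
    let version = read_usize(stream)?;
    assert_eq!(version, 1, "unsupported inspection format version");
    let root = unskew(read_usize(stream)?);
    Some((version, root))
}

fn main() {
    // Version 1, followed by a skewed root id (raw id 0x1000, skewed 0x0fff).
    let mut data: Vec<u8> = Vec::new();
    data.extend_from_slice(&1u64.to_le_bytes());
    data.extend_from_slice(&0x0fffu64.to_le_bytes());
    let mut stream = data.as_slice();
    let (version, root) = read_header(&mut stream).unwrap();
    println!("version={version} root={root:#x}");
}
```

After the header, the same `read_usize` pattern would apply to each `object_id`/`object_tag` pair in the heap section.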

Implementation

  • Currently, a separate mark bitmap is used for heap traversal during inspection. This
    bitmap is independent of any other bitmaps potentially used during the incremental GC.
  • For arrays, the tag cannot be copied one-to-one from the heap object, as it may
    temporarily encode slicing information during the incremental GC.
  • As usual, forwarding pointers of the incremental GC need to be resolved during heap
    inspection.
  • A separate mark stack is needed during heap inspection. This stack additionally
    stores the array slicing information of the heap inspection, independent of the
    incremental GC.
  • A simple stream buffer is used to serialize the binary result of the heap inspection.
    The buffer is represented as a linked list of blobs that is finally copied to a combined
    single blob. This is because the size of the live set is not known in advance.
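The stream buffer idea above, chunks collected while the live-set size is still unknown and then copied into one contiguous blob, can be sketched as follows. This is an illustrative model, not the RTS implementation; the chunk size and type names are assumptions, and a `Vec` of chunks stands in for the linked list of blobs.

```rust
const CHUNK_SIZE: usize = 4096; // illustrative chunk size

struct StreamBuffer {
    // Stands in for the RTS linked list of blobs.
    chunks: Vec<Vec<u8>>,
}

impl StreamBuffer {
    fn new() -> Self {
        StreamBuffer {
            chunks: vec![Vec::with_capacity(CHUNK_SIZE)],
        }
    }

    /// Append bytes, starting a new chunk whenever the current one is full.
    fn write(&mut self, mut bytes: &[u8]) {
        while !bytes.is_empty() {
            let current = self.chunks.last_mut().unwrap();
            let free = CHUNK_SIZE - current.len();
            if free == 0 {
                self.chunks.push(Vec::with_capacity(CHUNK_SIZE));
                continue;
            }
            let n = free.min(bytes.len());
            current.extend_from_slice(&bytes[..n]);
            bytes = &bytes[n..];
        }
    }

    /// Copy all chunks into a single contiguous blob, as described above,
    /// once the total size is finally known.
    fn finalize(self) -> Vec<u8> {
        self.chunks.concat()
    }
}

fn main() {
    let mut buf = StreamBuffer::new();
    buf.write(&vec![0xABu8; 10_000]); // spans multiple chunks
    let blob = buf.finalize();
    println!("total bytes: {}", blob.len());
}
```

The point of the design is that no chunk ever needs to be reallocated or moved while the heap traversal is still producing output; only the final concatenation copies data.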

Future: Incremental Inspection

Incremental inspection can be realized in the future for scalability to larger heaps:

  • It enables chunked data downloads in multiple messages without blocking other user
    messages. This is particularly important because the message response size is limited.
  • It establishes a logical session where the client receives incremental heap changes
    without needing to refetch the full heap.

Possible implementation:

  • Synthetic object ids need to be used that are independent of the address of the object.
    This is because objects are moved by the GC.
  • A hash map can be used to map heap pointers to synthetic object ids. This map also serves
    for marking during heap traversal, so that a separate mark bitmap would no longer be needed.
    The pointers in the map are weak pointers: they are updated by the GC when objects move and
    removed from the map if the object is collected.
  • On chunked data downloads, an object's state can only be sent once all of its contained
    pointers have been traversed. Otherwise, its state needs to be transmitted in a
    subsequent download message.
  • A pending list records the objects whose state is ready to be sent in the next download.
    The pointers in the pending list are also weak.
  • Write barriers need to be extended to catch all mutator writes to pointers and scalars
    during a heap inspection session. The pointers of modified objects are recorded in a
    hash set, similar to the remembered set of the generational GCs. Again, these pointers are
    treated as weak pointers.
  • On incremental inspection, the runtime system resends the state of modified objects of
    the hash set in addition to a potentially next heap chunk of the pending list.
    The hash set is eventually cleared and the sent objects are removed from the pending list.
  • On the client side, the object graph is updated for each resent object, while new objects
    are added.
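The core of the proposal, a hash map from heap addresses to synthetic ids that doubles as the mark set and survives object moves, can be sketched as follows. All names here are hypothetical, and the weak-pointer behavior is modeled by explicit `relocate`/`collect` calls that a real GC would perform internally.

```rust
use std::collections::HashMap;

struct ObjectIdTable {
    ids: HashMap<usize, u64>, // heap address -> synthetic object id
    next_id: u64,
}

impl ObjectIdTable {
    fn new() -> Self {
        ObjectIdTable { ids: HashMap::new(), next_id: 0 }
    }

    /// Returns (id, first_visit). A `false` second component means the
    /// object was already marked, so the traversal can skip it.
    fn visit(&mut self, address: usize) -> (u64, bool) {
        match self.ids.get(&address) {
            Some(&id) => (id, false),
            None => {
                let id = self.next_id;
                self.next_id += 1;
                self.ids.insert(address, id);
                (id, true)
            }
        }
    }

    /// Called when the moving GC relocates a surviving object: the entry
    /// follows the object, so its synthetic id stays stable.
    fn relocate(&mut self, old: usize, new: usize) {
        if let Some(id) = self.ids.remove(&old) {
            self.ids.insert(new, id);
        }
    }

    /// Called when the GC collects an object (weak-pointer behavior).
    fn collect(&mut self, address: usize) {
        self.ids.remove(&address);
    }
}

fn main() {
    let mut table = ObjectIdTable::new();
    let (id, fresh) = table.visit(0x1000);
    assert!(fresh);
    table.relocate(0x1000, 0x2000); // GC moves the object
    let (same_id, fresh) = table.visit(0x2000);
    assert!(!fresh); // already marked: id survives the move
    assert_eq!(id, same_id);
    println!("stable id: {id}");
}
```

Because the id is decoupled from the address, a client holding ids from an earlier chunk can still correlate them with objects resent in later download messages.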

@luc-blaeser luc-blaeser marked this pull request as draft September 20, 2024 12:06
@luc-blaeser luc-blaeser self-assigned this Sep 20, 2024
@luc-blaeser luc-blaeser added DO-NOT-MERGE feature New feature or request labels Sep 20, 2024

Comparing from 247aa05 to 53bcc75:
In terms of gas, 3 tests regressed and the mean change is +0.0%.
In terms of size, 5 tests regressed and the mean change is +0.4%.

crusso (Contributor) commented Sep 23, 2024

Very cool!

crusso (Contributor) commented Sep 23, 2024

I wonder if leb encodings would compress stuff quite a bit...

@luc-blaeser luc-blaeser marked this pull request as ready for review September 23, 2024 11:58