
Motoko Data Inspection #4705

Open. Wants to merge 20 commits into base: master.
Conversation

luc-blaeser (Contributor)

Motoko Data Inspection

Note: This is still a prototype, not yet ready for merging.

Generic data inspection of Motoko canisters by authorized users.

Your Motoko Program is a Database!

This is only a small first step towards the bigger vision of providing data tooling for Motoko, similar to a database management system. This would support data inspection, data queries, possibly even data modification, data backup/restore, complex data migration, and/or administrative functionality.

Frontend Prototype

A simple frontend is available to test the data inspection, see https://github.com/luc-blaeser/data-inspector (limited access).

Backend Design

The Motoko runtime system is extended to stream the heap state to the frontend canister for displaying the data to authorized users.

Currently, the following design aspects apply to the data inspection in the Motoko runtime system:

  • Only a controller of the canister can inspect the heap state through this functionality.
  • The inspection is based on enhanced orthogonal persistence to benefit from its precise object tagging.
  • The data inspection returns the set of live objects (stable and flexible) reachable from the main actor.
  • The format is mostly a one-to-one binary copy of the heap object payload, to minimize processing in the backend.
  • For simplicity, only full-heap inspection is currently supported. Better scalability can be implemented later, see thoughts in .
  • Currently, field names are not yet shown; this can be supported later.

Binary format (EBNF)

Format = Version Root Heap.
Version = `1: usize`.
Root = `object_id: usize`.
Heap = `object_id: usize` `object_tag: usize` `object_payload`.

The object payload is organized as follows:

  • The regular RTS object payload, with pointers replaced by object ids.
  • The payload is always a multiple of the word size.
  • For Object, the object size is prepended because the hash blob cannot directly be
    located in the stream.

An `object_id` is a potentially synthetic identifier of an object. The ids are skewed
to distinguish them from scalars. Currently, the object ids are heap pointers, but
this would change with incremental inspection.

`usize` is 64-bit little endian.
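The stream header described by the grammar above can be decoded straightforwardly. The following is a minimal sketch, assuming 8-byte little-endian `usize` fields and the usual Motoko RTS convention of skewing pointers by subtracting 1; the exact skew encoding and the helper names are assumptions for illustration, not the actual RTS code.

```rust
/// Read one 64-bit little-endian `usize` field, advancing the stream slice.
fn read_usize(stream: &mut &[u8]) -> Option<u64> {
    if stream.len() < 8 {
        return None;
    }
    let (head, tail) = stream.split_at(8);
    *stream = tail;
    Some(u64::from_le_bytes(head.try_into().ok()?))
}

/// Undo the skew to recover the raw object id
/// (assumed convention: skewed = raw - 1).
fn unskew(id: u64) -> u64 {
    id.wrapping_add(1)
}

/// Decode `Version Root` from the start of the inspection stream.
fn read_header(stream: &mut &[u8]) -> Option<(u64, u64)> {
    let version = read_usize(stream)?;
    assert_eq!(version, 1, "unsupported inspection format version");
    let root = unskew(read_usize(stream)?);
    Some((version, root))
}

fn main() {
    // Version 1, followed by a skewed root id (raw id 0x1000, skewed 0x0fff).
    let mut data: Vec<u8> = Vec::new();
    data.extend_from_slice(&1u64.to_le_bytes());
    data.extend_from_slice(&0x0fffu64.to_le_bytes());
    let mut stream = data.as_slice();
    let (version, root) = read_header(&mut stream).unwrap();
    println!("version={version} root={root:#x}");
}
```

After the header, the same `read_usize` pattern would apply to each `object_id`/`object_tag` pair in the heap section.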

Implementation

  • Currently, a separate mark bitmap is used for heap traversal during inspection. This
    bitmap is independent of any other bitmaps potentially used during the incremental GC.
  • For arrays, the tag cannot be copied one-to-one from the heap object, as it may
    temporarily encode slicing information during the incremental GC.
  • As usual, forwarding pointers of the incremental GC need to be resolved during heap
    inspection.
  • A separate mark stack is needed during heap inspection. This stack additionally
    stores the array slicing information of the heap inspection, independent of the
    incremental GC.
  • A simple stream buffer is used to serialize the binary result of the heap inspection.
    The buffer is represented as a linked list of blobs that is finally copied to a combined
    single blob. This is because the size of the live set is not known in advance.
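The stream buffer idea above, chunks collected while the live-set size is still unknown and then copied into one contiguous blob, can be sketched as follows. This is an illustrative model, not the RTS implementation; the chunk size and type names are assumptions, and a `Vec` of chunks stands in for the linked list of blobs.

```rust
const CHUNK_SIZE: usize = 4096; // illustrative chunk size

struct StreamBuffer {
    // Stands in for the RTS linked list of blobs.
    chunks: Vec<Vec<u8>>,
}

impl StreamBuffer {
    fn new() -> Self {
        StreamBuffer {
            chunks: vec![Vec::with_capacity(CHUNK_SIZE)],
        }
    }

    /// Append bytes, starting a new chunk whenever the current one is full.
    fn write(&mut self, mut bytes: &[u8]) {
        while !bytes.is_empty() {
            let current = self.chunks.last_mut().unwrap();
            let free = CHUNK_SIZE - current.len();
            if free == 0 {
                self.chunks.push(Vec::with_capacity(CHUNK_SIZE));
                continue;
            }
            let n = free.min(bytes.len());
            current.extend_from_slice(&bytes[..n]);
            bytes = &bytes[n..];
        }
    }

    /// Copy all chunks into a single contiguous blob, as described above,
    /// once the total size is finally known.
    fn finalize(self) -> Vec<u8> {
        self.chunks.concat()
    }
}

fn main() {
    let mut buf = StreamBuffer::new();
    buf.write(&vec![0xABu8; 10_000]); // spans multiple chunks
    let blob = buf.finalize();
    println!("total bytes: {}", blob.len());
}
```

The point of the design is that no chunk ever needs to be reallocated or moved while the heap traversal is still producing output; only the final concatenation copies data.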

Future: Incremental Inspection

Incremental inspection can be realized in the future for scalability to larger heaps:

  • It enables chunked data downloads in multiple messages without blocking other user
    messages. This is particularly important because the message response size is limited.
  • It establishes a logical session where the client receives incremental heap changes
    without needing to refetch the full heap.

Possible implementation:

  • Synthetic object ids need to be used that are independent of the address of the object.
    This is because objects are moved by the GC.
  • A hash map can be used to map heap pointers to synthetic object ids. This map also serves
    for marking during heap traversal, so that a separate mark bitmap would no longer be needed.
    The pointers in the map are weak pointers: they are updated by the GC when objects move and
    removed from the map if the object is collected.
  • On chunked data downloads, an object's state can only be sent once all of its contained
    pointers have been traversed. Otherwise, its state needs to be transmitted in a
    subsequent download message.
  • A pending list records the objects whose state is ready to be sent in the next download.
    The pointers in the pending list are also weak.
  • Write barriers need to be extended to catch all mutator writes to pointers and scalars
    during a heap inspection session. The pointers of modified objects are recorded in a
    hash set, similar to the remembered set of the generational GCs. Again, these pointers are
    treated as weak pointers.
  • On incremental inspection, the runtime system resends the state of modified objects of
    the hash set in addition to a potentially next heap chunk of the pending list.
    The hash set is eventually cleared and the sent objects are removed from the pending list.
  • On the client side, the object graph is updated for each resent object, while new objects
    are added.
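The core of the proposal, a hash map from heap addresses to synthetic ids that doubles as the mark set and survives object moves, can be sketched as follows. All names here are hypothetical, and the weak-pointer behavior is modeled by explicit `relocate`/`collect` calls that a real GC would perform internally.

```rust
use std::collections::HashMap;

struct ObjectIdTable {
    ids: HashMap<usize, u64>, // heap address -> synthetic object id
    next_id: u64,
}

impl ObjectIdTable {
    fn new() -> Self {
        ObjectIdTable { ids: HashMap::new(), next_id: 0 }
    }

    /// Returns (id, first_visit). A `false` second component means the
    /// object was already marked, so the traversal can skip it.
    fn visit(&mut self, address: usize) -> (u64, bool) {
        match self.ids.get(&address) {
            Some(&id) => (id, false),
            None => {
                let id = self.next_id;
                self.next_id += 1;
                self.ids.insert(address, id);
                (id, true)
            }
        }
    }

    /// Called when the moving GC relocates a surviving object: the entry
    /// follows the object, so its synthetic id stays stable.
    fn relocate(&mut self, old: usize, new: usize) {
        if let Some(id) = self.ids.remove(&old) {
            self.ids.insert(new, id);
        }
    }

    /// Called when the GC collects an object (weak-pointer behavior).
    fn collect(&mut self, address: usize) {
        self.ids.remove(&address);
    }
}

fn main() {
    let mut table = ObjectIdTable::new();
    let (id, fresh) = table.visit(0x1000);
    assert!(fresh);
    table.relocate(0x1000, 0x2000); // GC moves the object
    let (same_id, fresh) = table.visit(0x2000);
    assert!(!fresh); // already marked: id survives the move
    assert_eq!(id, same_id);
    println!("stable id: {id}");
}
```

Because the id is decoupled from the address, a client holding ids from an earlier chunk can still correlate them with objects resent in later download messages.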

@luc-blaeser luc-blaeser marked this pull request as draft September 20, 2024 12:06
@luc-blaeser luc-blaeser self-assigned this Sep 20, 2024
@luc-blaeser luc-blaeser added DO-NOT-MERGE feature New feature or request labels Sep 20, 2024

Comparing from 247aa05 to 53bcc75:
In terms of gas, 3 tests regressed and the mean change is +0.0%.
In terms of size, 5 tests regressed and the mean change is +0.4%.

crusso (Contributor) commented Sep 23, 2024

Very cool!

crusso (Contributor) commented Sep 23, 2024

I wonder if leb encodings would compress stuff quite a bit...

@luc-blaeser luc-blaeser marked this pull request as ready for review September 23, 2024 11:58