
Accessor methods for skiplist nodes #385

Open · wants to merge 5 commits into base: master
Conversation

@holiman commented Dec 11, 2021
This is, ostensibly, a large PR. However, this PR contains no functional changes.

Background

When using go-leveldb in go-ethereum, we've often seen memdb.findGE pop up as a large node on CPU profiles. Expanding that node further makes it look like the native bytes.Compare -> cmpbody is slow. However, that is not the case.

The slow part is not the comparison itself, but rather the memory accesses being made. A lookup for an item in the memory database can easily do ~30 comparisons for a dataset of a few hundred MB, and around 40 for gigabytes of data (see #384 for some charts about this).
These memory accesses are spread over two large structures: the skiplist metadata (nodeData) and the key/value data (kvData).

In go-ethereum, here are example stats from one of the memory DBs:

Memory db stats

  • kvData size: 405.84 MB
  • nodeData size: 172.75 MB
  • item count: 4245225
  • data/metadata ratio: 2.349334
  • average kv item size: 100.242383

The actual data size is 406 MB. The skiplist metadata is 173 MB, almost 43% of the data size.
When performing a lookup, the skiplist traversal takes place over this 173 MB memory structure,
and the comparisons load keys from the 406 MB kvData slice.

So a Put or Get operation doing 30 comparisons will perform

  • 30+ loads, spread across the 173 MB skiplist metadata (nodeData), and
  • 30 loads, spread across the 406 MB kvData structure, to load the keys.
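To make the two kinds of loads concrete, here is a simplified sketch of such a findGE-style descent, assuming the current layout where a node stores its kvData offset, key length, value/height fields, and per-level next-pointers (illustrative only, not the exact goleveldb code):

```go
import "bytes"

// Simplified sketch of a findGE-style skiplist descent. nodeData[n] is
// assumed to hold the node's kvData offset, nodeData[n+1] its key length,
// and nodeData[n+4:] its next-pointers (one per level).
func findGE(nodeData []int, kvData []byte, maxHeight int, key []byte) int {
	node := 0
	h := maxHeight - 1
	for {
		next := nodeData[node+4+h] // load a next-pointer from the skiplist metadata
		cmp := 1
		if next != 0 {
			o := nodeData[next]                      // kvData offset of next's key
			kl := nodeData[next+1]                   // key length
			cmp = bytes.Compare(kvData[o:o+kl], key) // load the key from kvData
		}
		if cmp < 0 {
			node = next // advance to the right on this level
		} else if h == 0 {
			return next // first node >= key (0 means "none")
		} else {
			h-- // descend one level
		}
	}
}
```

Every iteration touches both slices: one access into nodeData to find the next candidate, and one into kvData to fetch its key for the comparison.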

I looked at some other data engines, namely the pebble skiplist and the badger skiplist.
Those engines also use a skiplist, but have spent a lot of effort minimizing these structures. They use an arena, which is one big byte slice, onto which they
use the unsafe package to cast slices into object form. That is a bit extreme, and not what is done in this PR.
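For illustration, the arena trick amounts to something like the following. This is a minimal sketch, not pebble's or badger's actual code; the struct and its fields are made up, and real implementations also take care of alignment and bounds:

```go
import "unsafe"

// node is the fixed-size header that an arena-based skiplist overlays
// onto its byte slice; the fields here are invented for illustration.
type node struct {
	keyOffset uint32
	keySize   uint32
	valueSize uint32
	height    uint16
}

// nodeAt reinterprets a region of the arena as a *node without copying.
// Real implementations ensure off is suitably aligned before doing this.
func nodeAt(arena []byte, off uint32) *node {
	return (*node)(unsafe.Pointer(&arena[off]))
}
```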

Reason for this PR

This PR doesn't actually change anything in the underlying model, but it does introduce accessors to manipulate nodeData.
The idea is that if the code goes through accessor methods, it becomes easier to experiment with two things:

Field packing

The current implementation of nodeData is a slice of int, which is 64 bits wide on a 64-bit platform. Thus, every single field in a node takes up 8 bytes.
If this is changed to uint32, nodeData shrinks by 50%. Furthermore, the height field is limited to 12, and could be packed as a uint8 into e.g. the keyLength field.
In general, this PR enables experimentation with different ways to pack the fields.

With this PR, converting the int to uint32 is as simple as redefining nodeInt as uint32.
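A rough sketch of what the accessors enable, assuming the current field layout (names are illustrative; the actual methods in the diff may differ):

```go
// nodeInt is the backing type of nodeData; switching the skiplist to
// 32-bit metadata would be a one-line change to `type nodeInt = uint32`.
type nodeInt = int

// Field offsets within a node, following the current nodeData layout.
const (
	nKV     = 0 // offset of the key in kvData
	nKey    = 1 // key length
	nVal    = 2 // value length
	nHeight = 3 // node height
	nNext   = 4 // first next-pointer; one per level follows
)

// DB stands in for the memdb type; only the relevant fields are shown.
type DB struct {
	kvData   []byte
	nodeData []nodeInt
}

// With accessors like these, callers never index nodeData directly,
// so the backing type and field packing can change behind them.
func (p *DB) nodeKVOffset(node int) int    { return int(p.nodeData[node+nKV]) }
func (p *DB) nodeKeyLen(node int) int      { return int(p.nodeData[node+nKey]) }
func (p *DB) nodeValLen(node int) int      { return int(p.nodeData[node+nVal]) }
func (p *DB) nodeHeight(node int) int      { return int(p.nodeData[node+nHeight]) }
func (p *DB) nodeNext(node, level int) int { return int(p.nodeData[node+nNext+level]) }

func (p *DB) setNodeNext(node, level, next int) {
	p.nodeData[node+nNext+level] = nodeInt(next)
}
```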

None of the changes described below are part of this PR -- they just become very simple to experiment with.


I tested this and got the following charts. The datapoints are 400 samples taken during the insertion of 4194304 items, each with a 32-byte key and a 32-byte value. The Y-axis shows the time (ms) that inserting each batch of (around 10K) items took.

int (uint64) as backing type

[chart: memdb-12-4-4194304-8]

Stats:
  • keyvalue size: 268435456
  • metadata size: 178964832
  • item count: 4194304
  • data/metadata ratio: 1.50
  • average kv item size: 64.00

uint32 as backing type

[chart: memdb-12-4-4194304-4]

Stats:
  • keyvalue size: 268435456
  • metadata size: 89482416
  • item count: 4194304
  • data/metadata ratio: 3.00
  • average kv item size: 64.00

The charts show a slight speed improvement, and a sizeable reduction in memory usage.

uint32 as backing type + pack height into keyLen

If we use uint32, store keySize in 24 bits, and pack height as 8 bits into the same field, metadata goes from 89M to 72M (a sketch of this packing follows the chart below).

Stats:
  • keyvalue size: 268435456
  • metadata size: 72705196
  • item count: 4194304
  • data/metadata ratio: 3.69
  • average kv item size: 64.00

In this run, however, the speed degraded a bit.

[chart: memdb-12-4-4194304-4]
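For reference, the packing used in that experiment can be expressed behind the accessors roughly like this (an illustrative sketch; the exact bit layout is an open question, and this packing is not part of the PR):

```go
// With uint32 metadata, the key length and the height can share one field:
// the low 24 bits hold the key length, the high 8 bits the height.
const (
	keyLenBits = 24
	keyLenMask = 1<<keyLenBits - 1 // 0x00ffffff
)

// packKeyLenHeight combines a key length (<= 16 MB) and a height (<= 255)
// into a single uint32 field.
func packKeyLenHeight(keyLen, height uint32) uint32 {
	return keyLen&keyLenMask | height<<keyLenBits
}

func unpackKeyLen(field uint32) uint32 { return field & keyLenMask }

func unpackHeight(field uint32) uint32 { return field >> keyLenBits }
```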

KV separation

A separate track to improve memdb lookup speed would be to split up the keys and values, which currently both reside in kvData. This PR makes such experimentation somewhat simpler.
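One hypothetical shape for that split, not implemented here: keep keys in their own, denser slice so that the comparison-heavy findGE path only touches keyData, and values are only read once a key has matched. For example:

```go
// Hypothetical layout with keys and values in separate slices; the per-node
// offsets would then point into keyData for lookups and valueData for reads.
type splitDB struct {
	keyData   []byte // keys only: all findGE comparisons stay within this slice
	valueData []byte // values only: touched when a Get actually returns data
}

func (db *splitDB) key(off, n int) []byte   { return db.keyData[off : off+n] }
func (db *splitDB) value(off, n int) []byte { return db.valueData[off : off+n] }
```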
