
Accessor methods for skiplist nodes #385

Open · wants to merge 5 commits into base: master
Conversation

@holiman commented Dec 11, 2021
This is, ostensibly, a large PR. However, this PR contains no functional changes.

Background

When using go-leveldb in go-ethereum, we've often seen memdb.findGE pop up as a large node on CPU profiles. Expanding that node further makes it look like the native bytes.Compare -> cmpbody is slow. However, that is not the case.

The slow part is not the comparison itself, but rather the memory accesses being made. A lookup for an item in the memory database can easily do ~30 comparisons for a dataset of a few hundred MB, and around 40 for gigabytes of data (see #384 for some charts about this).
These memory accesses are spread over two large structures: the skiplist metadata (nodeData) and the key/value data (kvData).

In go-ethereum, here are example stats from one of the memory DBs:

Memory db stats

  • kvData size: 405.84 MB
  • nodeData size: 172.75 MB
  • item count: 4245225
  • data/metadata ratio: 2.349334
  • average kv item size: 100.242383

The actual data size is 406 MB. The skiplist metadata is 173 MB, almost 43% of the data size.
When performing a lookup, the skiplist traversal takes place over this 173 MB memory structure,
and the comparisons load keys from the 406 MB kvData slice.

So a Put or Get operation doing 30 comparisons will perform

  • 30+ loads, spread across the 173 MB skiplist metadata (nodeData), and
  • 30 loads, spread across the 406 MB kvData structure, to load the keys.
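To make the two kinds of loads concrete, here is a simplified sketch of such a findGE-style descent, assuming the current layout where a node stores its kvData offset, key length, value/height fields, and per-level next-pointers (illustrative only, not the exact goleveldb code):

```go
import "bytes"

// Simplified sketch of a findGE-style skiplist descent. nodeData[n] is
// assumed to hold the node's kvData offset, nodeData[n+1] its key length,
// and nodeData[n+4:] its next-pointers (one per level).
func findGE(nodeData []int, kvData []byte, maxHeight int, key []byte) int {
	node := 0
	h := maxHeight - 1
	for {
		next := nodeData[node+4+h] // load a next-pointer from the skiplist metadata
		cmp := 1
		if next != 0 {
			o := nodeData[next]                      // kvData offset of next's key
			kl := nodeData[next+1]                   // key length
			cmp = bytes.Compare(kvData[o:o+kl], key) // load the key from kvData
		}
		if cmp < 0 {
			node = next // advance to the right on this level
		} else if h == 0 {
			return next // first node >= key (0 means "none")
		} else {
			h-- // descend one level
		}
	}
}
```

Every iteration touches both slices: one access into nodeData to find the next candidate, and one into kvData to fetch its key for the comparison.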

I looked at some other data engines, namely the pebble skiplist and the badger skiplist.
Those engines also use a skiplist, but have spent a lot of effort minimizing these structures. They use an arena, which is one big byte slice, onto which they
use the unsafe package to cast slices into object form. That is a bit extreme, and not what is done in this PR.
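For illustration, the arena trick amounts to something like the following. This is a minimal sketch, not pebble's or badger's actual code; the struct and its fields are made up, and real implementations also take care of alignment and bounds:

```go
import "unsafe"

// node is the fixed-size header that an arena-based skiplist overlays
// onto its byte slice; the fields here are invented for illustration.
type node struct {
	keyOffset uint32
	keySize   uint32
	valueSize uint32
	height    uint16
}

// nodeAt reinterprets a region of the arena as a *node without copying.
// Real implementations ensure off is suitably aligned before doing this.
func nodeAt(arena []byte, off uint32) *node {
	return (*node)(unsafe.Pointer(&arena[off]))
}
```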

Reason for this PR

This PR doesn't actually change anything in the underlying model, but it does introduce accessors to manipulate nodeData.
The idea is that if the code goes through accessor methods, it becomes easier to experiment with two things:

Field packing

The current implementation of nodeData is a slice of int, which is 64 bits wide on a 64-bit platform. Thus, every single field in a node takes up 8 bytes.
If this is changed to uint32, nodeData shrinks by 50%. Furthermore, the height field is limited to 12, and could be packed as a uint8 into e.g. the keyLength field.
In general, this PR enables experimentation with different ways to pack the fields.

With this PR, converting the int to uint32 is as simple as redefining nodeInt as uint32.
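A rough sketch of what the accessors enable, assuming the current field layout (names are illustrative; the actual methods in the diff may differ):

```go
// nodeInt is the backing type of nodeData; switching the skiplist to
// 32-bit metadata would be a one-line change to `type nodeInt = uint32`.
type nodeInt = int

// Field offsets within a node, following the current nodeData layout.
const (
	nKV     = 0 // offset of the key in kvData
	nKey    = 1 // key length
	nVal    = 2 // value length
	nHeight = 3 // node height
	nNext   = 4 // first next-pointer; one per level follows
)

// DB stands in for the memdb type; only the relevant fields are shown.
type DB struct {
	kvData   []byte
	nodeData []nodeInt
}

// With accessors like these, callers never index nodeData directly,
// so the backing type and field packing can change behind them.
func (p *DB) nodeKVOffset(node int) int    { return int(p.nodeData[node+nKV]) }
func (p *DB) nodeKeyLen(node int) int      { return int(p.nodeData[node+nKey]) }
func (p *DB) nodeValLen(node int) int      { return int(p.nodeData[node+nVal]) }
func (p *DB) nodeHeight(node int) int      { return int(p.nodeData[node+nHeight]) }
func (p *DB) nodeNext(node, level int) int { return int(p.nodeData[node+nNext+level]) }

func (p *DB) setNodeNext(node, level, next int) {
	p.nodeData[node+nNext+level] = nodeInt(next)
}
```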

None of the changes described below are part of this PR -- they just become very simple to experiment with.


I tested this and got the following charts. The datapoints are 400 samples taken during the insertion of 4194304 items, each with a 32-byte key and a 32-byte value. The Y-axis shows the time (ms) that inserting each batch of (around 10K) items took.

int (uint64) as backing type

[chart: memdb-12-4-4194304-8]

Stats:
  • keyvalue size: 268435456
  • metadata size: 178964832
  • item count: 4194304
  • data/metadata ratio: 1.50
  • average kv item size: 64.00

uint32 as backing type

[chart: memdb-12-4-4194304-4]

Stats:
  • keyvalue size: 268435456
  • metadata size: 89482416
  • item count: 4194304
  • data/metadata ratio: 3.00
  • average kv item size: 64.00

The charts show a slight speed improvement, and a sizeable reduction in memory usage.

uint32 as backing type + pack height into keyLen

If we use uint32, store keySize in 24 bits, and pack height as 8 bits into the same field, metadata goes from 89M to 72M (a sketch of this packing follows the chart below).

Stats:
  • keyvalue size: 268435456
  • metadata size: 72705196
  • item count: 4194304
  • data/metadata ratio: 3.69
  • average kv item size: 64.00

In this run, however, the speed degraded a bit.

[chart: memdb-12-4-4194304-4]
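For reference, the packing used in that experiment can be expressed behind the accessors roughly like this (an illustrative sketch; the exact bit layout is an open question, and this packing is not part of the PR):

```go
// With uint32 metadata, the key length and the height can share one field:
// the low 24 bits hold the key length, the high 8 bits the height.
const (
	keyLenBits = 24
	keyLenMask = 1<<keyLenBits - 1 // 0x00ffffff
)

// packKeyLenHeight combines a key length (<= 16 MB) and a height (<= 255)
// into a single uint32 field.
func packKeyLenHeight(keyLen, height uint32) uint32 {
	return keyLen&keyLenMask | height<<keyLenBits
}

func unpackKeyLen(field uint32) uint32 { return field & keyLenMask }

func unpackHeight(field uint32) uint32 { return field >> keyLenBits }
```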

KV separation

A separate track to improve memdb lookup speed would be to split up the keys and values, which currently both reside in kvData. This PR makes such experimentation somewhat simpler.
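One hypothetical shape for that split, not implemented here: keep keys in their own, denser slice so that the comparison-heavy findGE path only touches keyData, and values are only read once a key has matched. For example:

```go
// Hypothetical layout with keys and values in separate slices; the per-node
// offsets would then point into keyData for lookups and valueData for reads.
type splitDB struct {
	keyData   []byte // keys only: all findGE comparisons stay within this slice
	valueData []byte // values only: touched when a Get actually returns data
}

func (db *splitDB) key(off, n int) []byte   { return db.keyData[off : off+n] }
func (db *splitDB) value(off, n int) []byte { return db.valueData[off : off+n] }
```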
