-
Notifications
You must be signed in to change notification settings - Fork 3
Effect of tagged node pointers
Laurynas Biveinis edited this page Jul 20, 2021
·
10 revisions
micro_benchmark_key_prefix
: 2% slowdown (unpredictable_get_shared_length
) to 15% speedup (unpredictable_leaf_key_prefix_split
)
unodb::db
:
-
micro_benchmark_n4
: 23% slowdown (shrink_node16_to_n4_randomly<unodb::db>/16383
) to 30% speedup (full_n4_sequential_insert<unodb::db>/65535
) -
micro_benchmark_n16
: 335% slowdown (full_n16_tree_random_delete<unodb::db>/246000
) to 15% speedup (n16_random_add<unodb::db>/10
) -
micro_benchmark_n48
: 115% slowdown (full_n48_tree_random_delete<unodb::db>/196608
) to 24% speedup (full_n48_tree_random_delete<unodb::db>/512
) -
micro_benchmark_n256
: 102% slowdown (full_n256_tree_sequential_delete<unodb::db>/196608
) to 25% speedup (full_n256_tree_random_delete<unodb::db>/512
)
unodb::olc_db
:
-
micro_benchmark_n4
: 23% slowdown (n4_full_scan<unodb::olc_db>/512
) to 18% speedup (minimal_n4_random_insert<unodb::db>/64
) -
micro_benchmark_n16
: 22% slowdown (full_n16_tree_full_scan<unodb::olc_db>/512
) to 13% speedup (grow_n4_to_n16_randomly<unodb::olc_db>/64
) -
micro_benchmark_n48
: 11% slowdown (grow_n16_to_n48_sequentially<unodb::olc_db>/4096
) to 4% speedup (full_n48_tree_sequential_delete<unodb::olc_db>/196608
) -
micro_benchmark_n256
: 4% slowdown (minimal_n256_tree_full_scan<unodb::olc_db>/4096
) to 4% speedup (grow_n48_to_n256_randomly<unodb::olc_db>/512
)
perf report
on the biggest regression (335% slowdown on full_n16_tree_random_delete<unodb::db>/246000
) shows significant differences in glibc allocation code paths. As the branches are not orthogonal to the allocator due to tagged pointer nodes being slightly smaller, the allocator starts to matter. Let's try with LD_PRELOAD
-ed JEMalloc, same code
micro_benchmark_key_prefix
: 3% slowdown (unpredictable_get_shared_length
) to 15% speedup (unpredictable_cut_key_prefix
)
unodb::db
:
-
micro_benchmark_n4
: 4% slowdown (n4_full_scan<unodb::db>/4096
) to 23% speedup (full_n4_to_minimal_sequential_delete<unodb::db>/100
) -
micro_benchmark_n16
: 3% slowdown (full_n16_tree_full_scan<unodb::db>/512
) to 14% speedup (n16_random_add<unodb::db>/16383
) -
micro_benchmark_n48
: 4% slowdown (grow_n16_to_n48_sequentially<unodb::db>/512
) to 27% speedup (full_n48_tree_random_delete<unodb::db>/192
) -
micro_benchmark_n256
: 6% slowdown (minimal_n256_tree_full_scan<unodb::db>/512
) to 32% speedup (full_n256_tree_random_delete<unodb::db>/192
)
unodb::olc_db
:
-
micro_benchmark_n4
: 17% slowdown (minimal_n4_random_insert<unodb::olc_db>/16
) to 9% speedup (shrink_node16_to_n4_randomly<unodb::olc_db>/64
) -
micro_benchmark_n16
: 2% slowdown (n16_sequential_add<unodb::olc_db>/4096
) to 9% speedup (full_n16_tree_sequential_delete<unodb::olc_db>/64
) -
micro_benchmark_n48
: 3% slowdown (n48_random_add<unodb::olc_db>/2
) to 8% speedup (shrink_n256_to_n48_randomly<unodb::olc_db>/64
) -
micro_benchmark_n256
: 8% slowdown (grow_n48_to_n256_sequentially<unodb::olc_db>/8
) to 3% speedup (grow_n48_to_n256_sequentially<unodb::olc_db>/64
)
- N4
unodb::db
: 8.06248M to 7.99998M - N4
unodb::olc_db
: 9.33331M to 8.66665M - N16
unodb::db
: 37.6469M to 37.347M - N16
unodb::olc_db
: 42.5216M to 40.1217M - N48
unodb::db
: 25.3126M to 25.1141M - N48
unodb::olc_db
: 28.3604M to 26.7724M - N256
unodb::db
: 30.9042M to 30.6493M - N256
unodb::olc_db
: 34.7363M to 32.6969M