new metering & wasm-opt O3 #83

Closed
wants to merge 6 commits into from

@crusso crusso commented Sep 15, 2023

base: new metering; no wasm-opt
this: wasm-opt O3

@crusso crusso added the build_base Build base instead of fetching from gh-pages. Note that the build tool runs in the same version label Sep 15, 2023
@chenyan-dfinity chenyan-dfinity changed the base branch from main to no-wasm-opt September 15, 2023 23:54

github-actions bot commented Sep 16, 2023

Note
Diffing the performance result against the published result from main branch.
Unchanged benchmarks are omitted.

Map

| | binary_size | generate 1m | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|--:|--:|--:|--:|--:|--:|
| hashmap | 138_225 ($\textcolor{green}{-13.07\%}$) | 8_520_780_136 ($\textcolor{green}{-10.28\%}$) | 61_987_732 | 348_752 ($\textcolor{green}{-11.23\%}$) | 6_627_741_092 ($\textcolor{green}{-9.23\%}$) | 376_656 ($\textcolor{green}{-11.01\%}$) |
| triemap | 139_617 ($\textcolor{green}{-13.65\%}$) | 15_166_031_798 ($\textcolor{green}{-12.34\%}$) | 74_216_052 | 300_512 ($\textcolor{green}{-13.15\%}$) | 731_870 ($\textcolor{green}{-12.99\%}$) | 718_098 ($\textcolor{green}{-12.96\%}$) |
| rbtree | 140_301 ($\textcolor{green}{-13.40\%}$) | 7_622_391_973 ($\textcolor{green}{-9.93\%}$) | 57_995_940 | 114_430 ($\textcolor{green}{-27.86\%}$) | 343_988 ($\textcolor{green}{-10.67\%}$) | 367_925 ($\textcolor{green}{-13.91\%}$) |
| splay | 136_157 ($\textcolor{green}{-13.49\%}$) | 15_359_123_123 ($\textcolor{green}{-11.84\%}$) | 53_995_876 | 736_710 ($\textcolor{green}{-12.41\%}$) | 775_847 ($\textcolor{green}{-12.30\%}$) | 1_074_979 ($\textcolor{green}{-12.94\%}$) |
| btree | 181_211 ($\textcolor{green}{-15.27\%}$) | 11_014_903_625 ($\textcolor{green}{-16.99\%}$) | 31_103_892 | 375_051 ($\textcolor{green}{-18.68\%}$) | 518_243 ($\textcolor{green}{-17.64\%}$) | 576_083 ($\textcolor{green}{-18.47\%}$) |
| zhenya_hashmap | 146_506 ($\textcolor{green}{-13.02\%}$) | 3_325_766_902 ($\textcolor{green}{-14.20\%}$) | 65_987_480 | 84_640 ($\textcolor{green}{-20.62\%}$) | 104_252 ($\textcolor{green}{-20.59\%}$) | 123_377 ($\textcolor{green}{-20.95\%}$) |
| btreemap_rs | 446_267 | 1_797_752_179 | 13_762_560 | 74_544 | 126_136 | 92_839 |
| imrc_hashmap_rs | 446_166 | 2_571_892_333 | 122_454_016 | 38_956 | 179_095 | 115_561 |
| hashmap_rs | 439_346 | 447_664_894 | 36_536_320 | 22_228 | 27_664 | 25_290 |

Priority queue

| | binary_size | heapify 1m | max mem | pop_min 50 | put 50 |
|---|--:|--:|--:|--:|--:|
| heap | 132_174 ($\textcolor{green}{-13.23\%}$) | 6_380_397_060 ($\textcolor{green}{-12.55\%}$) | 29_995_836 | 700_346 ($\textcolor{green}{-13.88\%}$) | 258_199 ($\textcolor{green}{-13.23\%}$) |
| heap_rs | 437_278 | 142_914_793 | 9_109_504 | 59_850 | 23_726 |

Growable array

| | binary_size | generate 5k | max mem | batch_get 500 | batch_put 500 | batch_remove 500 |
|---|--:|--:|--:|--:|--:|--:|
| buffer | 139_917 ($\textcolor{green}{-13.33\%}$) | 2_801_661 ($\textcolor{green}{-14.14\%}$) | 65_508 | 108_523 ($\textcolor{green}{-13.41\%}$) | 883_989 ($\textcolor{green}{-15.21\%}$) | 177_023 ($\textcolor{green}{-15.84\%}$) |
| vector | 138_378 ($\textcolor{green}{-13.93\%}$) | 2_434_733 ($\textcolor{green}{-11.96\%}$) | 24_764 | 170_710 ($\textcolor{green}{-13.58\%}$) | 232_943 ($\textcolor{green}{-11.86\%}$) | 223_461 ($\textcolor{green}{-16.37\%}$) |
| vec_rs | 435_834 | 290_143 | 655_360 | 17_605 | 31_014 | 25_400 |

Statistics

  • binary_size: -13.60% [-14.02%, -13.17%]
  • max_mem: no change
  • cycles: -14.44% [-15.54%, -13.33%]

SHA-2

| | binary_size | SHA-256 | SHA-512 | account_id | neuron_id |
|---|--:|--:|--:|--:|--:|
| Motoko | 172_874 ($\textcolor{green}{-11.88\%}$) | 295_693_065 ($\textcolor{green}{-16.20\%}$) | 279_705_297 ($\textcolor{green}{-17.51\%}$) | 37_597 ($\textcolor{green}{-16.11\%}$) | 27_061 ($\textcolor{green}{-15.21\%}$) |
| Rust | 528_234 | 82_789_387 | 56_794_263 | 50_651 | 53_532 |

Certified map

| | binary_size | generate 10k | max mem | inc | witness |
|---|--:|--:|--:|--:|--:|
| Motoko | 176_461 ($\textcolor{green}{-13.95\%}$) | 5_230_705_397 ($\textcolor{green}{-16.35\%}$) | 3_429_924 | 620_784 ($\textcolor{green}{-16.39\%}$) | 434_543 ($\textcolor{green}{-14.20\%}$) |
| Rust | 469_955 | 6_359_442_714 | 1_081_344 | 1_012_174 | 305_119 |

Statistics

  • binary_size: -12.92% [-19.47%, -6.36%]
  • max_mem: no change
  • cycles: -15.99% [-16.76%, -15.23%]

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|--:|--:|--:|--:|--:|
| Motoko | 230_245 ($\textcolor{green}{-17.00\%}$) | 47_892 ($\textcolor{green}{-6.54\%}$) | 22_919 ($\textcolor{green}{-9.45\%}$) | 19_044 ($\textcolor{green}{-8.77\%}$) | 20_223 ($\textcolor{green}{-9.73\%}$) |
| Rust | 763_017 | 552_075 | 105_203 | 128_753 | 139_539 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|--:|--:|--:|--:|
| Motoko | 188_522 ($\textcolor{green}{-18.04\%}$) | 17_786 ($\textcolor{green}{-7.69\%}$) | 29_865 ($\textcolor{green}{-7.53\%}$) | 8_891 ($\textcolor{green}{-8.59\%}$) |
| Rust | 828_238 | 146_257 | 380_260 | 93_763 |

Statistics

  • binary_size: -17.52% [-20.82%, -14.22%]
  • max_mem: no change
  • cycles: -8.33% [-9.16%, -7.50%]

Heartbeat

| | binary_size | heartbeat |
|---|--:|--:|
| Motoko | 123_352 ($\textcolor{green}{-13.20\%}$) | 23_302 ($\textcolor{red}{15.80\%}$) |
| Rust | 25_650 | 1_179 |

Timer

| | binary_size | setTimer | cancelTimer |
|---|--:|--:|--:|
| Motoko | 129_617 ($\textcolor{green}{-13.15\%}$) | 52_132 ($\textcolor{green}{-4.36\%}$) | 4_649 ($\textcolor{green}{-6.61\%}$) |
| Rust | 470_693 | 69_727 | 11_405 |

Statistics

  • binary_size: -13.15%
  • max_mem: no change
  • cycles: -5.48% [-12.59%, 1.63%]

Garbage Collection

Note
Same as main branch, skipping.

Actor class

| | binary size | put new bucket | put existing bucket | get |
|---|--:|--:|--:|--:|
| Map | 261_394 ($\textcolor{green}{-12.20\%}$) | 715_724 ($\textcolor{green}{-8.66\%}$) | 16_284 ($\textcolor{green}{-4.47\%}$) | 16_794 ($\textcolor{green}{-4.20\%}$) |

Statistics

  • binary_size: no change
  • max_mem: no change
  • cycles: -7.38% [-11.86%, -2.91%]

Publisher & Subscriber

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|--:|--:|--:|--:|--:|--:|
| Motoko | 144_405 ($\textcolor{green}{-13.42\%}$) | 131_311 ($\textcolor{green}{-13.55\%}$) | 28_775 ($\textcolor{green}{-3.94\%}$) | 11_973 ($\textcolor{green}{-4.64\%}$) | 23_083 ($\textcolor{green}{-4.06\%}$) | 6_436 ($\textcolor{green}{-6.34\%}$) |
| Rust | 511_870 | 565_407 | 71_728 | 44_318 | 95_767 | 53_941 |

Statistics

  • binary_size: -13.48% [-13.90%, -13.06%]
  • max_mem: no change
  • cycles: -4.75% [-6.05%, -3.44%]

Overall Statistics

  • binary_size: -13.96% [-14.64%, -13.28%]
  • max_mem: no change
  • cycles: -12.46% [-13.51%, -11.41%]

github-actions bot commented Sep 16, 2023

Note
The flamegraph link only works after you merge.
Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust.
Libraries whose names end in _rs are written in Rust; the rest are written in Motoko.

We use the same random number generator with fixed seed to ensure that all collections contain
the same elements, and the queries are exactly the same. Below we explain the measurements of each column in the table:

  • generate 1m. Insert 1m Nat64 integers into the collection. For Motoko collections, this usually triggers the GC; the remaining columns are unlikely to trigger GC.
  • max mem. For Motoko, it reports rts_max_heap_size after the generate call; for Rust, it reports the Wasm's memory page * 32Kb.
  • batch_get 50. Find 50 elements from the collection.
  • batch_put 50. Insert 50 elements to the collection.
  • batch_remove 50. Remove 50 elements from the collection.
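
The fixed-seed setup described above can be sketched as follows. This is a minimal illustration in Python, not the benchmark's actual code: the seed value, the helper name `make_keys`, and the use of Python's `random.Random` are all assumptions made for the example.

```python
import random

SEED = 42  # hypothetical seed; the benchmark's real seed is not stated here


def make_keys(count: int, seed: int = SEED) -> list[int]:
    """Return `count` pseudo-random 64-bit unsigned integers for a given seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.getrandbits(64) for _ in range(count)]


# Two different collections receive exactly the same insertion sequence,
# so any difference in measured cost comes from the data structure itself.
keys_for_dict = make_keys(1_000)
keys_for_sorted_list = make_keys(1_000)

hash_based = {k: k for k in keys_for_dict}
tree_like = sorted(keys_for_sorted_list)
```

Because each call builds its own generator from the same seed, the query workload can be regenerated the same way for every collection under test.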

💎 Takeaways

  • The platform only charges for instruction count, so data structures that exploit caching and locality gain no cost advantage.
  • We have a limit on the maximal cycles per round. This means asymptotic behavior doesn't matter much; we care more about the performance up to a fixed N. In extreme cases, you may see an $O(10000 n\log n)$ algorithm hitting the limit, while an $O(n^2)$ algorithm runs just fine.
  • Amortized algorithms/GC may need to be more eager to avoid hitting the cycle limit on a particular round.
  • Rust costs more cycles to process complicated Candid data, but it is more efficient in performing core computations.
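
To make the point about constant factors under a per-round limit concrete, here is a small worked calculation. The budget of 10 billion instructions and the two cost models are assumptions chosen for illustration; they are not the platform's real limit or measured costs.

```python
import math

BUDGET = 10_000_000_000  # hypothetical per-round instruction budget


def cost_nlogn(n: int) -> float:
    """Cost model with a large constant factor: 10000 * n * log2(n)."""
    return 10_000 * n * math.log2(n)


def cost_quadratic(n: int) -> float:
    """Cost model with constant factor 1: n^2."""
    return float(n * n)


def max_n_within(cost, budget: int) -> int:
    """Largest n >= 2 with cost(n) <= budget, found by binary search."""
    lo, hi = 2, 2
    while cost(hi) <= budget:  # grow the upper bound until it exceeds the budget
        hi *= 2
    while lo + 1 < hi:  # invariant: cost(lo) <= budget < cost(hi)
        mid = (lo + hi) // 2
        if cost(mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


limit_nlogn = max_n_within(cost_nlogn, BUDGET)
limit_quadratic = max_n_within(cost_quadratic, BUDGET)
```

Under these assumptions the quadratic model fits inputs up to n = 100,000 within the budget, while the n log n model with the large constant stops somewhat short of 63,000. The n log n model only becomes the cheaper one beyond roughly n = 174,000, which neither algorithm can reach within this budget.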

Note

  • The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.
  • Due to the instrumentation overhead and cycle limit, we cannot profile computations with large collections. Hopefully, when deterministic time slicing is ready, we can measure the performance on larger memory footprint.
  • hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, thus the cost of batch_put 50 is much higher than for other data structures.
  • btree comes from mops.one/stableheapbtreemap.
  • zhenya_hashmap comes from mops.one/map.
  • vector comes from mops.one/vector. Compared with buffer, put has better worst-case time and space complexity ($O(\sqrt{n})$ vs $O(n)$); get has a slightly larger constant overhead.
  • hashmap_rs uses the fxhash crate, which is the same as std::collections::HashMap, but with a deterministic hasher. This ensures reproducible results.
  • imrc_hashmap_rs uses the im-rc crate, which provides an immutable hashmap in Rust.
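
The note on hashmap's amortized behavior can be illustrated with a toy table that counts element copies per insertion. This is a generic sketch, not the Motoko hashmap implementation: the class name `CountingTable`, the starting capacity, and the doubling policy are assumptions.

```python
class CountingTable:
    """Toy growable table that counts how many element copies each put costs."""

    def __init__(self, capacity: int = 8) -> None:
        self.capacity = capacity
        self.items: list[int] = []

    def put(self, value: int) -> int:
        """Insert a value and return the number of element copies performed."""
        copies = 0
        if len(self.items) == self.capacity:
            # Table is full: double the capacity and copy every element over.
            self.capacity *= 2
            self.items = list(self.items)
            copies = len(self.items)
        self.items.append(value)
        return copies


table = CountingTable(capacity=8)
costs = [table.put(i) for i in range(10)]
```

Most insertions cost no copies, but the one that crosses the capacity boundary pays for the whole table at once. A small batch that happens to include that insertion therefore looks far more expensive than its size suggests.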

Map

| | binary_size | generate 1m | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|--:|--:|--:|--:|--:|--:|
| hashmap | 138_225 | 8_520_780_136 | 61_987_732 | 348_752 | 6_627_741_092 | 376_656 |
| triemap | 139_617 | 15_166_031_798 | 74_216_052 | 300_512 | 731_870 | 718_098 |
| rbtree | 140_301 | 7_622_391_973 | 57_995_940 | 114_430 | 343_988 | 367_925 |
| splay | 136_157 | 15_359_123_123 | 53_995_876 | 736_710 | 775_847 | 1_074_979 |
| btree | 181_211 | 11_014_903_625 | 31_103_892 | 375_051 | 518_243 | 576_083 |
| zhenya_hashmap | 146_506 | 3_325_766_902 | 65_987_480 | 84_640 | 104_252 | 123_377 |
| btreemap_rs | 446_267 | 1_797_752_179 | 13_762_560 | 74_544 | 126_136 | 92_839 |
| imrc_hashmap_rs | 446_166 | 2_571_892_333 | 122_454_016 | 38_956 | 179_095 | 115_561 |
| hashmap_rs | 439_346 | 447_664_894 | 36_536_320 | 22_228 | 27_664 | 25_290 |

Priority queue

| | binary_size | heapify 1m | max mem | pop_min 50 | put 50 |
|---|--:|--:|--:|--:|--:|
| heap | 132_174 | 6_380_397_060 | 29_995_836 | 700_346 | 258_199 |
| heap_rs | 437_278 | 142_914_793 | 9_109_504 | 59_850 | 23_726 |

Growable array

| | binary_size | generate 5k | max mem | batch_get 500 | batch_put 500 | batch_remove 500 |
|---|--:|--:|--:|--:|--:|--:|
| buffer | 139_917 | 2_801_661 | 65_508 | 108_523 | 883_989 | 177_023 |
| vector | 138_378 | 2_434_733 | 24_764 | 170_710 | 232_943 | 223_461 |
| vec_rs | 435_834 | 290_143 | 655_360 | 17_605 | 31_014 | 25_400 |

Cryptographic libraries

Measure different cryptographic libraries written in both Motoko and Rust.

  • SHA-2 benchmarks
    • SHA-256/SHA-512. Compute the hash of a 1M Wasm binary.
    • account_id. Compute the ledger account id from principal, based on SHA-224.
    • neuron_id. Compute the NNS neuron id from principal, based on SHA-256.
  • Certified map. A Merkle tree that stores key-value pairs and generates witnesses according to the IC Interface Specification.
    • generate 10k. Insert 10k 7-character words as both key and value into the certified map.
    • max mem. For Motoko, it reports rts_max_heap_size after the generate call; for Rust, it reports the Wasm's memory page * 32Kb.
    • inc. Increment a counter and insert the counter value into the map.
    • witness. Generate the root hash and a witness for the counter.
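
For readers unfamiliar with what the witness column measures, the idea of a root hash plus a witness can be sketched with a plain binary Merkle tree. This is a simplified illustration only: the IC Interface Specification defines its own tree encoding, which this sketch does not reproduce, and all function names and the one-byte domain prefixes are assumptions.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(key: bytes, value: bytes) -> bytes:
    # Prefix byte 0x00 keeps leaf hashes distinct from inner-node hashes.
    return h(b"\x00" + h(key) + h(value))


def node_hash(left: bytes, right: bytes) -> bytes:
    return h(b"\x01" + left + right)


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return all tree levels, from the leaves up to the single root."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur, nxt = levels[-1], []
        for i in range(0, len(cur), 2):
            if i + 1 < len(cur):
                nxt.append(node_hash(cur[i], cur[i + 1]))
            else:
                nxt.append(cur[i])  # unpaired last node is promoted unchanged
        levels.append(nxt)
    return levels


def witness(levels: list[list[bytes]], index: int) -> list[tuple[bool, bytes]]:
    """Sibling hashes on the path from leaf `index` to the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            # Record whether the sibling sits to the left of the current node.
            path.append((sibling < index, level[sibling]))
        index //= 2
    return path


def verify(leaf: bytes, path: list[tuple[bool, bytes]], root: bytes) -> bool:
    """Recompute the root from a leaf hash and its witness."""
    acc = leaf
    for sibling_is_left, sibling in path:
        acc = node_hash(sibling, acc) if sibling_is_left else node_hash(acc, sibling)
    return acc == root


entries = sorted((f"key{i}".encode(), f"val{i}".encode()) for i in range(5))
leaves = [leaf_hash(k, v) for k, v in entries]
levels = build_levels(leaves)
root = levels[-1][0]
proof = witness(levels, 4)
```

A verifier that trusts only `root` can call `verify(leaves[4], proof, root)` to check one entry without seeing the rest of the map, which is why generating a witness touches only one path of the tree.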

SHA-2

| | binary_size | SHA-256 | SHA-512 | account_id | neuron_id |
|---|--:|--:|--:|--:|--:|
| Motoko | 172_874 | 295_693_065 | 279_705_297 | 37_597 | 27_061 |
| Rust | 528_234 | 82_789_387 | 56_794_263 | 50_651 | 53_532 |

Certified map

| | binary_size | generate 10k | max mem | inc | witness |
|---|--:|--:|--:|--:|--:|
| Motoko | 176_461 | 5_230_705_397 | 3_429_924 | 620_784 | 434_543 |
| Rust | 469_955 | 6_359_442_714 | 1_081_344 | 1_012_174 | 305_119 |

Sample Dapps

Measure the performance of some typical dapps:

  • Basic DAO, with heartbeat disabled to make profiling easier. We have a separate benchmark to measure heartbeat performance.
  • DIP721 NFT

Note

  • The cost difference is mainly due to the Candid serialization cost.
  • Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.
  • We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by serde.
  • For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|--:|--:|--:|--:|--:|
| Motoko | 230_245 | 47_892 | 22_919 | 19_044 | 20_223 |
| Rust | 763_017 | 552_075 | 105_203 | 128_753 | 139_539 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|--:|--:|--:|--:|
| Motoko | 188_522 | 17_786 | 29_865 | 8_891 |
| Rust | 828_238 | 146_257 | 380_260 | 93_763 |

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

  • setTimer measures both the setTimer(0) method and the execution of an empty job.
  • It is not easy to reliably capture both events in one flamegraph, as the implementation details of the replica can affect how we measure this. Typically, a correct flamegraph contains both the setTimer and canister_global_timer functions. If either is missing, we may need to adjust the script.

Heartbeat

| | binary_size | heartbeat |
|---|--:|--:|
| Motoko | 123_352 | 23_302 |
| Rust | 25_650 | 1_179 |

Timer

| | binary_size | setTimer | cancelTimer |
|---|--:|--:|--:|
| Motoko | 129_617 | 52_132 | 4_649 |
| Rust | 470_693 | 69_727 | 11_405 |

Motoko Specific Benchmarks

Measure various features only available in Motoko.

  • Garbage Collection. Measure Motoko garbage collection cost using the Triemap benchmark. The max mem column reports rts_max_heap_size after generate call. The cycle cost numbers reported here are garbage collection cost only. Some flamegraphs are truncated due to the 2M log size limit. The dfx/ic-wasm optimizer is disabled for the garbage collection test cases due to how the optimizer affects function names, making profiling trickier.

    • default. Compile with the default GC option. With the current GC scheduler, generate will trigger the copying GC. The rest of the methods will not trigger GC.
    • copying. Compile with --force-gc --copying-gc.
    • compacting. Compile with --force-gc --compacting-gc.
    • generational. Compile with --force-gc --generational-gc.
    • incremental. Compile with --force-gc --incremental-gc.
  • Actor class. Measure the cost of spawning an actor class, using the Actor classes example.

Garbage Collection

| | generate 800k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|--:|--:|--:|--:|--:|
| default | 1_338_231_405 | 59_396_776 | 118 | 118 | 118 |
| copying | 1_338_231_287 | 59_396_776 | 1_337_913_569 | 1_338_002_371 | 1_337_919_144 |
| compacting | 1_911_420_608 | 59_396_776 | 1_473_824_186 | 1_756_485_066 | 1_787_369_954 |
| generational | 2_891_818_643 | 59_405_240 | 1_141_865_993 | 1_217_376 | 1_117_840 |
| incremental | 33_436_719 | 1_136_155_048 | 333_734_166 | 336_829_512 | 336_860_690 |

Actor class

| | binary size | put new bucket | put existing bucket | get |
|---|--:|--:|--:|--:|
| Map | 261_394 | 715_724 | 16_284 | 16_794 |

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|--:|--:|--:|--:|--:|--:|
| Motoko | 144_405 | 131_311 | 28_775 | 11_973 | 23_083 | 6_436 |
| Rust | 511_870 | 565_407 | 71_728 | 44_318 | 95_767 | 53_941 |

@crusso crusso changed the title from "selecting inlining & wasm-opt3" to "selecting inlining & wasm-opt O3" Sep 16, 2023
@crusso crusso changed the title from "selecting inlining & wasm-opt O3" to "new metering & wasm-opt O3" Sep 17, 2023
mergify bot pushed a commit to dfinity/motoko that referenced this pull request Sep 19, 2023
To mitigate cycle perf regression of new cost model, selectively inline `share_code` helpers in the backend using an additional argument `Never | Always` (i.e. always inline vs never inline). Also, add compiler flags to explicitly opt-in or disable the inlining optimization.

NB: some recursive share_code cannot be unshared/inlined (e.g.  recursive serialization code and code that explicitly returns rather than returning control flow). 

Similar to #4207, but also inlines all heap object allocation and adds compiler flags to enable (default)/disable the optimization.
Note users may want to disable the optimization if they can't accept the increase in code size.

# Profiling data 

## new metering, sans wasm-opt
dfinity/canister-profiling#85

Overall Statistics
binary_size: 10.68% [8.58%, 12.79%]
max_mem: no change
cycles: -8.32% [-9.67%, -6.97%]

## new metering with wasm-opt O3

dfinity/canister-profiling#86

Overall Statistics
binary_size: -6.28% [-7.53%, -5.03%]
max_mem: no change
cycles: -18.21% [-20.07%, -16.36%]

## new metering, master (no-inlining) and wasm-opt O3

dfinity/canister-profiling#83

Overall Statistics
binary_size: -13.96% [-14.64%, -13.28%]
max_mem: no change
cycles: -12.46% [-13.51%, -11.41%]

(UPDATE: revised stats after @chenyan-dfinity updates to PRs)
@chenyan-dfinity chenyan-dfinity deleted the claudio/mild-wasm-opt branch November 27, 2023 20:09