Spatter 2.0 Validation

Connor Radelja edited this page Sep 28, 2024 · 7 revisions

This page details some of the tests performed to validate the Spatter 2.0 kernels against the Spatter 1.1 kernels.

Both versions of Spatter were compiled with GCC using the OpenMP backend and the Release build type. For v1.1, the -g flag was removed from line 247 of CMakeLists.txt so that both versions were compiled with similar flags. Additionally, the v1.1 source code was changed to increase the number of warmup runs from 1 to 10, matching the number of warmup runs used in v2.

Spatter v1.1 build command:

cmake -DCMAKE_BUILD_TYPE=Release -DBACKEND=openmp -DCOMPILER=gnu -B build_openmp -S . && make -j$(nproc) -C build_openmp

Spatter v2 build command:

cmake -DUSE_OPENMP=1 -B build_openmp -S . && make -j$(nproc) -C build_openmp

Using objdump to validate kernels

The assembly code generated by GCC was disassembled from the v1.1 and v2 Spatter binaries using objdump. The kernels can then be validated by comparing the instructions in their loop regions between v1.1 and v2. These loop regions can be identified using VTune or by stepping through the code with gdb.

To get the assembly code for a specific kernel, the following workflow was used:

  1. Run objdump -t spatter | grep <kernel_name> to get a list of matching symbols

  2. Run objdump --disassemble=<kernel_symbol> spatter to disassemble the kernel
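The two steps above can be chained in a small shell helper. The following is a minimal sketch rather than part of Spatter: the function name kernel_symbols is hypothetical, and it assumes the objdump -t column layout shown in the example below (an F flag, the .text section, and the symbol name in the last column).

```shell
# Hypothetical helper (not part of Spatter): read `objdump -t` output on stdin
# and print the names of function symbols in .text whose name contains $1.
kernel_symbols() {
    awk -v pat="$1" '
        # Keep only lines flagged as functions ("F") in the .text section;
        # the symbol name is assumed to be the last whitespace-separated field.
        / F \.text/ && index($NF, pat) > 0 { print $NF }
    '
}

# Usage sketch (assumes a binary named "spatter" in the current directory
# and Binutils >= 2.32 for --disassemble=<symbol>):
#   objdump -t spatter | kernel_symbols scatter |
#   while read -r sym; do
#       objdump --disassemble="$sym" spatter > "$sym.s"
#   done
```

Writing one file per symbol keeps each kernel's listing separate, which makes a later comparison between v1.1 and v2 easier.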

Example with Scatter Kernel:

Output from objdump -t spatter | grep scatter
0000000000411cb0 l     F .text	00000000000000f5              _ZN7Spatter13ConfigurationINS_6OpenMPEE7scatterEbm._omp_fn.1
0000000000411db0 l     F .text	00000000000000e0              _ZN7Spatter13ConfigurationINS_6OpenMPEE14scatter_gatherEbm._omp_fn.2
0000000000411f90 l     F .text	0000000000000101              _ZN7Spatter13ConfigurationINS_6OpenMPEE13multi_scatterEbm._omp_fn.4
00000000004122c0 g     F .text	00000000000000a4              _ZN7Spatter13ConfigurationINS_6OpenMPEE7scatterEbm
00000000004117c0 g     F .text	000000000000013e              _ZN7Spatter13ConfigurationINS_6SerialEE14scatter_gatherEbm
0000000000411680 g     F .text	0000000000000136              _ZN7Spatter13ConfigurationINS_6SerialEE7scatterEbm
0000000000411a50 g     F .text	000000000000015b              _ZN7Spatter13ConfigurationINS_6SerialEE13multi_scatterEbm
0000000000412370 g     F .text	00000000000000ac              _ZN7Spatter13ConfigurationINS_6OpenMPEE14scatter_gatherEbm
00000000004120a0 g     F .text	00000000000000ac              _ZN7Spatter13ConfigurationINS_6OpenMPEE13multi_scatterEbm
Output from objdump --disassemble=_ZN7Spatter13ConfigurationINS_6OpenMPEE7scatterEbm._omp_fn.1 spatter

spatter-openmp: file format elf64-x86-64

Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

0000000000411cb0 <_ZN7Spatter13ConfigurationINS_6OpenMPEE7scatterEbm._omp_fn.1>:
  411cb0:  41 55  push %r13
  411cb2:  41 54  push %r12
  411cb4:  55  push %rbp
  411cb5:  48 89 fd  mov %rdi,%rbp
  411cb8:  53  push %rbx
  411cb9:  48 83 ec 08  sub $0x8,%rsp
  411cbd:  4c 8b 2f  mov (%rdi),%r13
  411cc0:  e8 cb 1f ff ff  call 403c90 <omp_get_thread_num@plt>
  411cc5:  49 8b 9d 28 01 00 00  mov 0x128(%r13),%rbx
  411ccc:  48 85 db  test %rbx,%rbx
  411ccf:  0f 84 ba 00 00 00  je 411d8f <_ZN7Spatter13ConfigurationINS_6OpenMPEE7scatterEbm._omp_fn.1+0xdf>
  411cd5:  41 89 c4  mov %eax,%r12d
  411cd8:  e8 c3 20 ff ff  call 403da0 <omp_get_num_threads@plt>
  411cdd:  31 d2  xor %edx,%edx
  411cdf:  49 63 cc  movslq %r12d,%rcx
  411ce2:  48 63 f0  movslq %eax,%rsi
  411ce5:  48 89 d8  mov %rbx,%rax
  411ce8:  48 f7 f6  div %rsi
  411ceb:  48 39 d1  cmp %rdx,%rcx
  411cee:  0f 82 a6 00 00 00  jb 411d9a <_ZN7Spatter13ConfigurationINS_6OpenMPEE7scatterEbm._omp_fn.1+0xea>
  411cf4:  49 89 c2  mov %rax,%r10
  411cf7:  4c 0f af d1  imul %rcx,%r10
  411cfb:  49 01 d2  add %rdx,%r10
  411cfe:  4e 8d 1c 10  lea (%rax,%r10,1),%r11
  411d02:  4d 39 da  cmp %r11,%r10
  411d05:  0f 83 84 00 00 00  jae 411d8f <_ZN7Spatter13ConfigurationINS_6OpenMPEE7scatterEbm._omp_fn.1+0xdf>
  411d0b:  49 8b 85 98 00 00 00  mov 0x98(%r13),%rax
  411d12:  49 8b 95 e8 00 00 00  mov 0xe8(%r13),%rdx
  411d19:  48 8b 75 08  mov 0x8(%rbp),%rsi
  411d1d:  49 8b 9d 20 01 00 00  mov 0x120(%r13),%rbx
  411d24:  4c 8b 00  mov (%rax),%r8
  411d27:  48 8d 04 49  lea (%rcx,%rcx,2),%rax
  411d2b:  48 8b 0a  mov (%rdx),%rcx
  411d2e:  49 8b ad 00 01 00 00  mov 0x100(%r13),%rbp
  411d35:  48 8d 04 c1  lea (%rcx,%rax,8),%rax
  411d39:  4c 8b 20  mov (%rax),%r12
  411d3c:  48 85 f6  test %rsi,%rsi
  411d3f:  74 4e  je 411d8f <_ZN7Spatter13ConfigurationINS_6OpenMPEE7scatterEbm._omp_fn.1+0xdf>
  411d41:  4c 89 d1  mov %r10,%rcx
  411d44:  4d 8b 4d 50  mov 0x50(%r13),%r9
  411d48:  48 0f af cd  imul %rbp,%rcx
  411d4c:  0f 1f 40 00  nopl 0x0(%rax)
  411d50:  4c 89 d0  mov %r10,%rax
  411d53:  31 d2  xor %edx,%edx
  411d55:  48 f7 f3  div %rbx
  411d58:  31 c0  xor %eax,%eax
  411d5a:  48 0f af d6  imul %rsi,%rdx
  411d5e:  49 8d 3c d4  lea (%r12,%rdx,8),%rdi
  411d62:  66 0f 1f 44 00 00  nopw 0x0(%rax,%rax,1)
  411d68:  49 8b 14 c1  mov (%r9,%rax,8),%rdx
  411d6c:  f2 0f 10 04 c7  movsd (%rdi,%rax,8),%xmm0
  411d71:  48 83 c0 01  add $0x1,%rax
  411d75:  48 01 ca  add %rcx,%rdx
  411d78:  f2 41 0f 11 04 d0  movsd %xmm0,(%r8,%rdx,8)
  411d7e:  48 39 c6  cmp %rax,%rsi
  411d81:  75 e5  jne 411d68 <_ZN7Spatter13ConfigurationINS_6OpenMPEE7scatterEbm._omp_fn.1+0xb8>
  411d83:  49 83 c2 01  add $0x1,%r10
  411d87:  48 01 e9  add %rbp,%rcx
  411d8a:  4d 39 d3  cmp %r10,%r11
  411d8d:  75 c1  jne 411d50 <_ZN7Spatter13ConfigurationINS_6OpenMPEE7scatterEbm._omp_fn.1+0xa0>
  411d8f:  48 83 c4 08  add $0x8,%rsp
  411d93:  5b  pop %rbx
  411d94:  5d  pop %rbp
  411d95:  41 5c  pop %r12
  411d97:  41 5d  pop %r13
  411d99:  c3  ret
  411d9a:  48 83 c0 01  add $0x1,%rax
  411d9e:  31 d2  xor %edx,%edx
  411da0:  e9 4f ff ff ff  jmp 411cf4 <_ZN7Spatter13ConfigurationINS_6OpenMPEE7scatterEbm._omp_fn.1+0x44>

Disassembly of section .fini:


Note: Passing a parameter to the --disassemble flag of objdump was added in version 2.32 of GNU Binutils.
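With an older Binutils, one possible workaround is to disassemble by address range instead, using the --start-address and --stop-address options of objdump. The sketch below derives the range from a symbol-table line (start address plus size). symbol_range is a hypothetical helper name, and the field positions are assumed to match the objdump -t output shown above.

```shell
# Hypothetical helper (not part of Spatter): turn one `objdump -t` line into
# --start-address/--stop-address options that cover the symbol.
# Assumes the address is the first field and the size the second-to-last
# field, both hexadecimal without a 0x prefix.
symbol_range() {
    line=$1
    addr=${line%% *}                                         # first field
    size=$(printf '%s\n' "$line" | awk '{ print $(NF-1) }')  # size field
    start=$((0x$addr))
    end=$((start + 0x$size))       # the stop address itself is excluded
    printf '%s=0x%x %s=0x%x\n' --start-address "$start" --stop-address "$end"
}

# Usage sketch (the binary name "spatter" is an assumption):
#   line=$(objdump -t spatter | grep '_omp_fn.1$')
#   objdump -d $(symbol_range "$line") spatter
```

For the scatter example above (address 0x411cb0, size 0xf5) this yields a stop address of 0x411da5, which is one byte past the final jmp in the listing.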

Gather Kernel

Version 1.1 (OpenMP Backend)

spatter: file format elf64-x86-64

Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

000000000041e880 <gather_smallbuf._omp_fn.7>:
  41e880:  41 54  push %r12
  41e882:  49 89 fc  mov %rdi,%r12
  41e885:  55  push %rbp
  41e886:  53  push %rbx
  41e887:  48 8b 5f 28  mov 0x28(%rdi),%rbx
  41e88b:  e8 50 2d fe ff  call 4015e0 <omp_get_thread_num@plt>
  41e890:  48 85 db  test %rbx,%rbx
  41e893:  0f 84 a6 00 00 00  je 41e93f <gather_smallbuf._omp_fn.7+0xbf>
  41e899:  89 c5  mov %eax,%ebp
  41e89b:  e8 70 2e fe ff  call 401710 <omp_get_num_threads@plt>
  41e8a0:  31 d2  xor %edx,%edx
  41e8a2:  48 63 cd  movslq %ebp,%rcx
  41e8a5:  48 63 f0  movslq %eax,%rsi
  41e8a8:  48 89 d8  mov %rbx,%rax
  41e8ab:  48 f7 f6  div %rsi
  41e8ae:  48 39 d1  cmp %rdx,%rcx
  41e8b1:  0f 82 8d 00 00 00  jb 41e944 <gather_smallbuf._omp_fn.7+0xc4>
  41e8b7:  49 89 c2  mov %rax,%r10
  41e8ba:  4c 0f af d1  imul %rcx,%r10
  41e8be:  49 01 d2  add %rdx,%r10
  41e8c1:  4e 8d 1c 10  lea (%rax,%r10,1),%r11
  41e8c5:  4d 39 da  cmp %r11,%r10
  41e8c8:  73 75  jae 41e93f <gather_smallbuf._omp_fn.7+0xbf>
  41e8ca:  49 8b 74 24 18  mov 0x18(%r12),%rsi
  41e8cf:  49 8b 04 24  mov (%r12),%rax
  41e8d3:  49 8b 5c 24 30  mov 0x30(%r12),%rbx
  41e8d8:  49 8b 6c 24 20  mov 0x20(%r12),%rbp
  41e8dd:  4d 8b 44 24 10  mov 0x10(%r12),%r8
  41e8e2:  4d 8b 4c 24 08  mov 0x8(%r12),%r9
  41e8e7:  4c 8b 24 c8  mov (%rax,%rcx,8),%r12
  41e8eb:  48 85 f6  test %rsi,%rsi
  41e8ee:  74 4f  je 41e93f <gather_smallbuf._omp_fn.7+0xbf>
  41e8f0:  48 89 e9  mov %rbp,%rcx
  41e8f3:  49 0f af ca  imul %r10,%rcx
  41e8f7:  66 0f 1f 84 00 00 00  nopw 0x0(%rax,%rax,1)
  41e8fe:  00 00
  41e900:  4c 89 d0  mov %r10,%rax
  41e903:  31 d2  xor %edx,%edx
  41e905:  48 f7 f3  div %rbx
  41e908:  31 c0  xor %eax,%eax
  41e90a:  48 0f af d6  imul %rsi,%rdx
  41e90e:  49 8d 3c d4  lea (%r12,%rdx,8),%rdi
  41e912:  66 0f 1f 44 00 00  nopw 0x0(%rax,%rax,1)
  41e918:  49 8b 14 c0  mov (%r8,%rax,8),%rdx
  41e91c:  48 01 ca  add %rcx,%rdx
  41e91f:  f2 41 0f 10 04 d1  movsd (%r9,%rdx,8),%xmm0
  41e925:  f2 0f 11 04 c7  movsd %xmm0,(%rdi,%rax,8)
  41e92a:  48 83 c0 01  add $0x1,%rax
  41e92e:  48 39 c6  cmp %rax,%rsi
  41e931:  75 e5  jne 41e918 <gather_smallbuf._omp_fn.7+0x98>
  41e933:  49 83 c2 01  add $0x1,%r10
  41e937:  48 01 e9  add %rbp,%rcx
  41e93a:  4d 39 d3  cmp %r10,%r11
  41e93d:  75 c1  jne 41e900 <gather_smallbuf._omp_fn.7+0x80>
  41e93f:  5b  pop %rbx
  41e940:  5d  pop %rbp
  41e941:  41 5c  pop %r12
  41e943:  c3  ret
  41e944:  48 83 c0 01  add $0x1,%rax
  41e948:  31 d2  xor %edx,%edx
  41e94a:  e9 68 ff ff ff  jmp 41e8b7 <gather_smallbuf._omp_fn.7+0x37>

Disassembly of section .fini:
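
As a rough alternative to VTune or gdb, loop candidates can also be spotted directly in the disassembly by looking for conditional jumps whose target lies at or before the jump itself. This is a heuristic and not the method used for this page. backward_jumps is a hypothetical helper, and it expects raw objdump -d output with tab-separated columns.

```shell
# Hypothetical helper (not part of Spatter): read raw `objdump -d` output on
# stdin and report conditional jumps that branch backwards, which usually
# close a loop. Unconditional jmp instructions are skipped on purpose.
backward_jumps() {
    awk -F'\t' '
        NF >= 3 {
            addr = $1
            gsub(/[ :]/, "", addr)          # "  41e931:" becomes "41e931"
            split($3, w, /[ ]+/)            # w[1] = mnemonic, w[2] = target
            if (w[1] ~ /^j/ && w[1] !~ /^jmp/ && w[2] ~ /^[0-9a-f]+$/)
                print addr, w[1], w[2]
        }' |
    while read -r addr op target; do
        # Compare numerically; both values are hexadecimal.
        if [ $((0x$target)) -le $((0x$addr)) ]; then
            printf 'loop 0x%s..0x%s (%s)\n' "$target" "$addr" "$op"
        fi
    done
}

# Usage sketch (the binary name "spatter" is an assumption):
#   objdump --disassemble=gather_smallbuf._omp_fn.7 spatter | backward_jumps
```

On the gather_smallbuf._omp_fn.7 listing above this would report two candidates: 0x41e918..0x41e931 (the inner loop) and 0x41e900..0x41e93d (the enclosing loop). Backward jumps that merely re-enter a loop from another path are reported too, so the output should be treated as a list of candidates to inspect.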

Version 2 (OpenMP Backend)

spatter: file format elf64-x86-64

Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

0000000000411bb0 <_ZN7Spatter13ConfigurationINS_6OpenMPEE6gatherEbm._omp_fn.0>:
  411bb0:  41 55  push %r13
  411bb2:  41 54  push %r12
  411bb4:  55  push %rbp
  411bb5:  48 89 fd  mov %rdi,%rbp
  411bb8:  53  push %rbx
  411bb9:  48 83 ec 08  sub $0x8,%rsp
  411bbd:  4c 8b 2f  mov (%rdi),%r13
  411bc0:  e8 cb 20 ff ff  call 403c90 <omp_get_thread_num@plt>
  411bc5:  49 8b 9d 28 01 00 00  mov 0x128(%r13),%rbx
  411bcc:  48 85 db  test %rbx,%rbx
  411bcf:  0f 84 ba 00 00 00  je 411c8f <_ZN7Spatter13ConfigurationINS_6OpenMPEE6gatherEbm._omp_fn.0+0xdf>
  411bd5:  41 89 c4  mov %eax,%r12d
  411bd8:  e8 c3 21 ff ff  call 403da0 <omp_get_num_threads@plt>
  411bdd:  31 d2  xor %edx,%edx
  411bdf:  49 63 cc  movslq %r12d,%rcx
  411be2:  48 63 f0  movslq %eax,%rsi
  411be5:  48 89 d8  mov %rbx,%rax
  411be8:  48 f7 f6  div %rsi
  411beb:  48 39 d1  cmp %rdx,%rcx
  411bee:  0f 82 a6 00 00 00  jb 411c9a <_ZN7Spatter13ConfigurationINS_6OpenMPEE6gatherEbm._omp_fn.0+0xea>
  411bf4:  49 89 c2  mov %rax,%r10
  411bf7:  4c 0f af d1  imul %rcx,%r10
  411bfb:  49 01 d2  add %rdx,%r10
  411bfe:  4e 8d 1c 10  lea (%rax,%r10,1),%r11
  411c02:  4d 39 da  cmp %r11,%r10
  411c05:  0f 83 84 00 00 00  jae 411c8f <_ZN7Spatter13ConfigurationINS_6OpenMPEE6gatherEbm._omp_fn.0+0xdf>
  411c0b:  49 8b 85 98 00 00 00  mov 0x98(%r13),%rax
  411c12:  49 8b 95 e8 00 00 00  mov 0xe8(%r13),%rdx
  411c19:  48 8b 75 08  mov 0x8(%rbp),%rsi
  411c1d:  49 8b 9d 20 01 00 00  mov 0x120(%r13),%rbx
  411c24:  4c 8b 00  mov (%rax),%r8
  411c27:  48 8d 04 49  lea (%rcx,%rcx,2),%rax
  411c2b:  48 8b 0a  mov (%rdx),%rcx
  411c2e:  49 8b ad 00 01 00 00  mov 0x100(%r13),%rbp
  411c35:  48 8d 04 c1  lea (%rcx,%rax,8),%rax
  411c39:  4c 8b 20  mov (%rax),%r12
  411c3c:  48 85 f6  test %rsi,%rsi
  411c3f:  74 4e  je 411c8f <_ZN7Spatter13ConfigurationINS_6OpenMPEE6gatherEbm._omp_fn.0+0xdf>
  411c41:  4c 89 d1  mov %r10,%rcx
  411c44:  4d 8b 4d 50  mov 0x50(%r13),%r9
  411c48:  48 0f af cd  imul %rbp,%rcx
  411c4c:  0f 1f 40 00  nopl 0x0(%rax)
  411c50:  4c 89 d0  mov %r10,%rax
  411c53:  31 d2  xor %edx,%edx
  411c55:  48 f7 f3  div %rbx
  411c58:  31 c0  xor %eax,%eax
  411c5a:  48 0f af d6  imul %rsi,%rdx
  411c5e:  49 8d 3c d4  lea (%r12,%rdx,8),%rdi
  411c62:  66 0f 1f 44 00 00  nopw 0x0(%rax,%rax,1)
  411c68:  49 8b 14 c1  mov (%r9,%rax,8),%rdx
  411c6c:  48 01 ca  add %rcx,%rdx
  411c6f:  f2 41 0f 10 04 d0  movsd (%r8,%rdx,8),%xmm0
  411c75:  f2 0f 11 04 c7  movsd %xmm0,(%rdi,%rax,8)
  411c7a:  48 83 c0 01  add $0x1,%rax
  411c7e:  48 39 c6  cmp %rax,%rsi
  411c81:  75 e5  jne 411c68 <_ZN7Spatter13ConfigurationINS_6OpenMPEE6gatherEbm._omp_fn.0+0xb8>
  411c83:  49 83 c2 01  add $0x1,%r10
  411c87:  48 01 e9  add %rbp,%rcx
  411c8a:  4d 39 d3  cmp %r10,%r11
  411c8d:  75 c1  jne 411c50 <_ZN7Spatter13ConfigurationINS_6OpenMPEE6gatherEbm._omp_fn.0+0xa0>
  411c8f:  48 83 c4 08  add $0x8,%rsp
  411c93:  5b  pop %rbx
  411c94:  5d  pop %rbp
  411c95:  41 5c  pop %r12
  411c97:  41 5d  pop %r13
  411c99:  c3  ret
  411c9a:  48 83 c0 01  add $0x1,%rax
  411c9e:  31 d2  xor %edx,%edx
  411ca0:  e9 4f ff ff ff  jmp 411bf4 <_ZN7Spatter13ConfigurationINS_6OpenMPEE6gatherEbm._omp_fn.0+0x44>

Disassembly of section .fini:
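
One way to compare the two OpenMP listings without reading them side by side is to strip the address and byte columns and diff what remains. The sketch below is illustrative only: mnemonics is a hypothetical helper, the binary names spatter-v1.1 and spatter-v2 are placeholders, and the address ranges are the inner-loop bounds read off the listings above.

```shell
# Hypothetical helper (not part of Spatter): read raw, tab-separated
# `objdump -d` output on stdin and print one mnemonic per instruction.
# Header lines and byte-continuation lines have fewer than three
# tab-separated fields and are dropped.
mnemonics() {
    awk -F'\t' 'NF >= 3 { split($3, w, /[ ]+/); print w[1] }'
}

# Usage sketch (binary names are placeholders; --stop-address is exclusive):
#   objdump -d --start-address=0x41e918 --stop-address=0x41e933 spatter-v1.1 |
#       mnemonics > gather_v1.txt
#   objdump -d --start-address=0x411c68 --stop-address=0x411c83 spatter-v2 |
#       mnemonics > gather_v2.txt
#   diff gather_v1.txt gather_v2.txt && echo "same instruction sequence"
```

For the two gather listings above, both inner loops reduce to the same sequence: mov, add, movsd, movsd, add, cmp, jne.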


Version 1.1 (Serial Backend)

spatter-serial: file format elf64-x86-64

Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

000000000041dbe0 <gather_smallbuf_serial>:
  41dbe0:  41 54  push %r12
  41dbe2:  55  push %rbp
  41dbe3:  53  push %rbx
  41dbe4:  48 8b 6c 24 20  mov 0x20(%rsp),%rbp
  41dbe9:  4d 85 c9  test %r9,%r9
  41dbec:  74 51  je 41dc3f <gather_smallbuf_serial+0x5f>
  41dbee:  4c 8b 27  mov (%rdi),%r12
  41dbf1:  48 85 c9  test %rcx,%rcx
  41dbf4:  74 49  je 41dc3f <gather_smallbuf_serial+0x5f>
  41dbf6:  49 89 d3  mov %rdx,%r11
  41dbf9:  31 ff  xor %edi,%edi
  41dbfb:  31 db  xor %ebx,%ebx
  41dbfd:  0f 1f 00  nopl (%rax)
  41dc00:  48 89 d8  mov %rbx,%rax
  41dc03:  31 d2  xor %edx,%edx
  41dc05:  48 f7 f5  div %rbp
  41dc08:  31 c0  xor %eax,%eax
  41dc0a:  48 0f af d1  imul %rcx,%rdx
  41dc0e:  4d 8d 14 d4  lea (%r12,%rdx,8),%r10
  41dc12:  66 0f 1f 44 00 00  nopw 0x0(%rax,%rax,1)
  41dc18:  49 8b 14 c3  mov (%r11,%rax,8),%rdx
  41dc1c:  48 01 fa  add %rdi,%rdx
  41dc1f:  f2 0f 10 04 d6  movsd (%rsi,%rdx,8),%xmm0
  41dc24:  f2 41 0f 11 04 c2  movsd %xmm0,(%r10,%rax,8)
  41dc2a:  48 83 c0 01  add $0x1,%rax
  41dc2e:  48 39 c1  cmp %rax,%rcx
  41dc31:  75 e5  jne 41dc18 <gather_smallbuf_serial+0x38>
  41dc33:  48 83 c3 01  add $0x1,%rbx
  41dc37:  4c 01 c7  add %r8,%rdi
  41dc3a:  49 39 d9  cmp %rbx,%r9
  41dc3d:  75 c1  jne 41dc00 <gather_smallbuf_serial+0x20>
  41dc3f:  5b  pop %rbx
  41dc40:  5d  pop %rbp
  41dc41:  41 5c  pop %r12
  41dc43:  c3  ret

Disassembly of section .fini:

Version 2 (Serial Backend)

spatter-serial: file format elf64-x86-64

Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

00000000004110b0 <_ZN7Spatter13ConfigurationINS_6SerialEE6gatherEbm>:
  4110b0:  41 57  push %r15
  4110b2:  41 56  push %r14
  4110b4:  41 55  push %r13
  4110b6:  41 54  push %r12
  4110b8:  49 89 fc  mov %rdi,%r12
  4110bb:  55  push %rbp
  4110bc:  53  push %rbx
  4110bd:  48 83 ec 18  sub $0x18,%rsp
  4110c1:  48 8b 6f 58  mov 0x58(%rdi),%rbp
  4110c5:  48 2b 6f 50  sub 0x50(%rdi),%rbp
  4110c9:  48 89 eb  mov %rbp,%rbx
  4110cc:  48 89 14 24  mov %rdx,(%rsp)
  4110d0:  48 c1 fb 03  sar $0x3,%rbx
  4110d4:  40 84 f6  test %sil,%sil
  4110d7:  0f 85 9b 00 00 00  jne 411178 <_ZN7Spatter13ConfigurationINS_6SerialEE6gatherEbm+0xc8>
  4110dd:  4c 8b 9f 28 01 00 00  mov 0x128(%rdi),%r11
  4110e4:  4d 85 db  test %r11,%r11
  4110e7:  0f 84 7c 00 00 00  je 411169 <_ZN7Spatter13ConfigurationINS_6SerialEE6gatherEbm+0xb9>
  4110ed:  48 85 db  test %rbx,%rbx
  4110f0:  74 77  je 411169 <_ZN7Spatter13ConfigurationINS_6SerialEE6gatherEbm+0xb9>
  4110f2:  49 8b 84 24 98 00 00  mov 0x98(%r12),%rax
  4110f9:  00
  4110fa:  4d 8b 44 24 50  mov 0x50(%r12),%r8
  4110ff:  45 31 c9  xor %r9d,%r9d
  411102:  4d 8b b4 24 00 01 00  mov 0x100(%r12),%r14
  411109:  00
  41110a:  4d 8b 94 24 20 01 00  mov 0x120(%r12),%r10
  411111:  00
  411112:  48 8b 38  mov (%rax),%rdi
  411115:  49 8b 84 24 e0 00 00  mov 0xe0(%r12),%rax
  41111c:  00
  41111d:  4c 8b 28  mov (%rax),%r13
  411120:  4c 89 c8  mov %r9,%rax
  411123:  31 d2  xor %edx,%edx
  411125:  4c 89 f1  mov %r14,%rcx
  411128:  49 f7 f2  div %r10
  41112b:  49 0f af c9  imul %r9,%rcx
  41112f:  31 c0  xor %eax,%eax
  411131:  48 0f af d5  imul %rbp,%rdx
  411135:  4d 8d 7c 15 00  lea 0x0(%r13,%rdx,1),%r15
  41113a:  66 0f 1f 44 00 00  nopw 0x0(%rax,%rax,1)
  411140:  49 8b 14 c0  mov (%r8,%rax,8),%rdx
  411144:  48 01 ca  add %rcx,%rdx
  411147:  f2 0f 10 04 d7  movsd (%rdi,%rdx,8),%xmm0
  41114c:  f2 41 0f 11 04 c7  movsd %xmm0,(%r15,%rax,8)
  411152:  48 83 c0 01  add $0x1,%rax
  411156:  48 39 c3  cmp %rax,%rbx
  411159:  75 e5  jne 411140 <_ZN7Spatter13ConfigurationINS_6SerialEE6gatherEbm+0x90>
  41115b:  49 83 c1 01  add $0x1,%r9
  41115f:  4d 39 d9  cmp %r11,%r9
  411162:  72 bc  jb 411120 <_ZN7Spatter13ConfigurationINS_6SerialEE6gatherEbm+0x70>
  411164:  40 84 f6  test %sil,%sil
  411167:  75 73  jne 4111dc <_ZN7Spatter13ConfigurationINS_6SerialEE6gatherEbm+0x12c>
  411169:  48 83 c4 18  add $0x18,%rsp
  41116d:  5b  pop %rbx
  41116e:  5d  pop %rbp
  41116f:  41 5c  pop %r12
  411171:  41 5d  pop %r13
  411173:  41 5e  pop %r14
  411175:  41 5f  pop %r15
  411177:  c3  ret
  411178:  4c 8d af 60 01 00 00  lea 0x160(%rdi),%r13
  41117f:  89 74 24 0c  mov %esi,0xc(%rsp)
  411183:  4c 89 ef  mov %r13,%rdi
  411186:  e8 55 41 02 00  call 4352e0 <_ZN7Spatter5Timer5startEv>
  41118b:  4d 8b 9c 24 28 01 00  mov 0x128(%r12),%r11
  411192:  00
  411193:  8b 74 24 0c  mov 0xc(%rsp),%esi
  411197:  4d 85 db  test %r11,%r11
  41119a:  74 09  je 4111a5 <_ZN7Spatter13ConfigurationINS_6SerialEE6gatherEbm+0xf5>
  41119c:  48 85 db  test %rbx,%rbx
  41119f:  0f 85 4d ff ff ff  jne 4110f2 <_ZN7Spatter13ConfigurationINS_6SerialEE6gatherEbm+0x42>
  4111a5:  4c 89 ef  mov %r13,%rdi
  4111a8:  e8 53 41 02 00  call 435300 <_ZN7Spatter5Timer4stopEv>
  4111ad:  4c 89 ef  mov %r13,%rdi
  4111b0:  e8 7b 41 02 00  call 435330 <_ZNK7Spatter5Timer7secondsEv>
  4111b5:  49 8b 84 24 78 01 00  mov 0x178(%r12),%rax
  4111bc:  00
  4111bd:  48 8b 34 24  mov (%rsp),%rsi
  4111c1:  4c 89 ef  mov %r13,%rdi
  4111c4:  f2 0f 11 04 f0  movsd %xmm0,(%rax,%rsi,8)
  4111c9:  48 83 c4 18  add $0x18,%rsp
  4111cd:  5b  pop %rbx
  4111ce:  5d  pop %rbp
  4111cf:  41 5c  pop %r12
  4111d1:  41 5d  pop %r13
  4111d3:  41 5e  pop %r14
  4111d5:  41 5f  pop %r15
  4111d7:  e9 64 41 02 00  jmp 435340 <_ZN7Spatter5Timer5clearEv>
  4111dc:  4d 8d ac 24 60 01 00  lea 0x160(%r12),%r13
  4111e3:  00
  4111e4:  eb bf  jmp 4111a5 <_ZN7Spatter13ConfigurationINS_6SerialEE6gatherEbm+0xf5>

Disassembly of section .fini:
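
The two serial builds allocate registers differently (for example, the first load of the inner loop uses (%r11,%rax,8) in v1.1 and (%r8,%rax,8) in v2), so a plain diff of the instruction text would flag nearly every line. A possible way around this is to replace register names and jump targets with placeholders before comparing. The helper below, loop_shape, is a hypothetical sketch that expects raw, tab-separated objdump -d output. It is not something the validation on this page relied on.

```shell
# Hypothetical helper (not part of Spatter): read raw `objdump -d` output on
# stdin and print each instruction with registers and direct jump targets
# replaced by placeholders, so that only the instruction "shape" remains.
loop_shape() {
    awk -F'\t' 'NF >= 3 { print $3 }' |
    LC_ALL=C sed \
        -e 's/%[xyz]mm[0-9]*/%VEC/g' \
        -e 's/%[a-z][a-z0-9]*/%REG/g' \
        -e 's/^\(j[a-z]*\)[ ]*[0-9a-f][0-9a-f]*.*$/\1 <target>/' \
        -e 's/[ ][ ]*/ /g' \
        -e 's/ $//'
}

# Usage sketch (binary names are placeholders; ranges are the inner loops
# of the two serial gather listings above):
#   objdump -d --start-address=0x41dc18 --stop-address=0x41dc33 spatter-v1.1 |
#       loop_shape > serial_v1.txt
#   objdump -d --start-address=0x411140 --stop-address=0x41115b spatter-v2 |
#       loop_shape > serial_v2.txt
#   diff serial_v1.txt serial_v2.txt
```

Vector registers get their own placeholder so that a scalar move and a floating-point move are not confused. Call targets are left untouched, so differences outside the loop, such as the Timer calls in the v2 listing, stay visible.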

Scatter Kernel

Version 1.1 (OpenMP Backend)

spatter: file format elf64-x86-64

Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

000000000041eb20 <scatter_smallbuf._omp_fn.10>: 41eb20: 41 54 push %r12 41eb22: 49 89 fc mov %rdi,%r12 41eb25: 55 push %rbp 41eb26: 53 push %rbx 41eb27: 48 8b 5f 28 mov 0x28(%rdi),%rbx 41eb2b: e8 b0 2a fe ff call 4015e0 omp_get_thread_num@plt 41eb30: 48 85 db test %rbx,%rbx 41eb33: 0f 84 a6 00 00 00 je 41ebdf <scatter_smallbuf._omp_fn.10+0xbf> 41eb39: 89 c5 mov %eax,%ebp 41eb3b: e8 d0 2b fe ff call 401710 omp_get_num_threads@plt 41eb40: 31 d2 xor %edx,%edx 41eb42: 48 63 cd movslq %ebp,%rcx 41eb45: 48 63 f0 movslq %eax,%rsi 41eb48: 48 89 d8 mov %rbx,%rax 41eb4b: 48 f7 f6 div %rsi 41eb4e: 48 39 d1 cmp %rdx,%rcx 41eb51: 0f 82 8d 00 00 00 jb 41ebe4 <scatter_smallbuf._omp_fn.10+0xc4> 41eb57: 49 89 c2 mov %rax,%r10 41eb5a: 4c 0f af d1 imul %rcx,%r10 41eb5e: 49 01 d2 add %rdx,%r10 41eb61: 4e 8d 1c 10 lea (%rax,%r10,1),%r11 41eb65: 4d 39 da cmp %r11,%r10 41eb68: 73 75 jae 41ebdf <scatter_smallbuf._omp_fn.10+0xbf> 41eb6a: 49 8b 74 24 18 mov 0x18(%r12),%rsi 41eb6f: 49 8b 44 24 08 mov 0x8(%r12),%rax 41eb74: 49 8b 5c 24 30 mov 0x30(%r12),%rbx 41eb79: 49 8b 6c 24 20 mov 0x20(%r12),%rbp 41eb7e: 4d 8b 44 24 10 mov 0x10(%r12),%r8 41eb83: 4d 8b 0c 24 mov (%r12),%r9 41eb87: 4c 8b 24 c8 mov (%rax,%rcx,8),%r12 41eb8b: 48 85 f6 test %rsi,%rsi 41eb8e: 74 4f je 41ebdf <scatter_smallbuf._omp_fn.10+0xbf> 41eb90: 48 89 e9 mov %rbp,%rcx 41eb93: 49 0f af ca imul %r10,%rcx 41eb97: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1) 41eb9e: 00 00 41eba0: 4c 89 d0 mov %r10,%rax 41eba3: 31 d2 xor %edx,%edx 41eba5: 48 f7 f3 div %rbx 41eba8: 31 c0 xor %eax,%eax 41ebaa: 48 0f af d6 imul %rsi,%rdx 41ebae: 49 8d 3c d4 lea (%r12,%rdx,8),%rdi 41ebb2: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) 41ebb8: 49 8b 14 c0 mov (%r8,%rax,8),%rdx 41ebbc: f2 0f 10 04 c7 movsd (%rdi,%rax,8),%xmm0 41ebc1: 48 83 c0 01 add $0x1,%rax 41ebc5: 48 01 ca add %rcx,%rdx 41ebc8: f2 41 0f 11 04 d1 movsd %xmm0,(%r9,%rdx,8) 41ebce: 48 39 c6 cmp %rax,%rsi 41ebd1: 75 e5 jne 41ebb8 <scatter_smallbuf._omp_fn.10+0x98> 41ebd3: 49 83 c2 01 add 
$0x1,%r10 41ebd7: 48 01 e9 add %rbp,%rcx 41ebda: 4d 39 d3 cmp %r10,%r11 41ebdd: 75 c1 jne 41eba0 <scatter_smallbuf._omp_fn.10+0x80> 41ebdf: 5b pop %rbx 41ebe0: 5d pop %rbp 41ebe1: 41 5c pop %r12 41ebe3: c3 ret
41ebe4: 48 83 c0 01 add $0x1,%rax 41ebe8: 31 d2 xor %edx,%edx 41ebea: e9 68 ff ff ff jmp 41eb57 <scatter_smallbuf._omp_fn.10+0x37>

Disassembly of section .fini:

Version 2 (OpenMP Backend)

spatter: file format elf64-x86-64

Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

0000000000411cb0 <_ZN7Spatter13ConfigurationINS_6OpenMPEE7scatterEbm._omp_fn.1>: 411cb0: 41 55 push %r13 411cb2: 41 54 push %r12 411cb4: 55 push %rbp 411cb5: 48 89 fd mov %rdi,%rbp 411cb8: 53 push %rbx 411cb9: 48 83 ec 08 sub $0x8,%rsp 411cbd: 4c 8b 2f mov (%rdi),%r13 411cc0: e8 cb 1f ff ff call 403c90 omp_get_thread_num@plt 411cc5: 49 8b 9d 28 01 00 00 mov 0x128(%r13),%rbx 411ccc: 48 85 db test %rbx,%rbx 411ccf: 0f 84 ba 00 00 00 je 411d8f <_ZN7Spatter13ConfigurationINS_6OpenMPEE7scatterEbm._omp_fn.1+0xdf> 411cd5: 41 89 c4 mov %eax,%r12d 411cd8: e8 c3 20 ff ff call 403da0 omp_get_num_threads@plt 411cdd: 31 d2 xor %edx,%edx 411cdf: 49 63 cc movslq %r12d,%rcx 411ce2: 48 63 f0 movslq %eax,%rsi 411ce5: 48 89 d8 mov %rbx,%rax 411ce8: 48 f7 f6 div %rsi 411ceb: 48 39 d1 cmp %rdx,%rcx 411cee: 0f 82 a6 00 00 00 jb 411d9a <_ZN7Spatter13ConfigurationINS_6OpenMPEE7scatterEbm._omp_fn.1+0xea> 411cf4: 49 89 c2 mov %rax,%r10 411cf7: 4c 0f af d1 imul %rcx,%r10 411cfb: 49 01 d2 add %rdx,%r10 411cfe: 4e 8d 1c 10 lea (%rax,%r10,1),%r11 411d02: 4d 39 da cmp %r11,%r10 411d05: 0f 83 84 00 00 00 jae 411d8f <_ZN7Spatter13ConfigurationINS_6OpenMPEE7scatterEbm._omp_fn.1+0xdf> 411d0b: 49 8b 85 98 00 00 00 mov 0x98(%r13),%rax 411d12: 49 8b 95 e8 00 00 00 mov 0xe8(%r13),%rdx 411d19: 48 8b 75 08 mov 0x8(%rbp),%rsi 411d1d: 49 8b 9d 20 01 00 00 mov 0x120(%r13),%rbx 411d24: 4c 8b 00 mov (%rax),%r8 411d27: 48 8d 04 49 lea (%rcx,%rcx,2),%rax 411d2b: 48 8b 0a mov (%rdx),%rcx 411d2e: 49 8b ad 00 01 00 00 mov 0x100(%r13),%rbp 411d35: 48 8d 04 c1 lea (%rcx,%rax,8),%rax 411d39: 4c 8b 20 mov (%rax),%r12 411d3c: 48 85 f6 test %rsi,%rsi 411d3f: 74 4e je 411d8f <_ZN7Spatter13ConfigurationINS_6OpenMPEE7scatterEbm._omp_fn.1+0xdf> 411d41: 4c 89 d1 mov %r10,%rcx 411d44: 4d 8b 4d 50 mov 0x50(%r13),%r9 411d48: 48 0f af cd imul %rbp,%rcx 411d4c: 0f 1f 40 00 nopl 0x0(%rax) 411d50: 4c 89 d0 mov %r10,%rax 411d53: 31 d2 xor %edx,%edx 411d55: 48 f7 f3 div %rbx 411d58: 31 c0 xor %eax,%eax 411d5a: 48 0f af d6 imul 
%rsi,%rdx 411d5e: 49 8d 3c d4 lea (%r12,%rdx,8),%rdi 411d62: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) 411d68: 49 8b 14 c1 mov (%r9,%rax,8),%rdx 411d6c: f2 0f 10 04 c7 movsd (%rdi,%rax,8),%xmm0 411d71: 48 83 c0 01 add $0x1,%rax 411d75: 48 01 ca add %rcx,%rdx 411d78: f2 41 0f 11 04 d0 movsd %xmm0,(%r8,%rdx,8) 411d7e: 48 39 c6 cmp %rax,%rsi 411d81: 75 e5 jne 411d68 <_ZN7Spatter13ConfigurationINS_6OpenMPEE7scatterEbm._omp_fn.1+0xb8> 411d83: 49 83 c2 01 add $0x1,%r10 411d87: 48 01 e9 add %rbp,%rcx 411d8a: 4d 39 d3 cmp %r10,%r11 411d8d: 75 c1 jne 411d50 <_ZN7Spatter13ConfigurationINS_6OpenMPEE7scatterEbm._omp_fn.1+0xa0> 411d8f: 48 83 c4 08 add $0x8,%rsp 411d93: 5b pop %rbx 411d94: 5d pop %rbp 411d95: 41 5c pop %r12 411d97: 41 5d pop %r13 411d99: c3 ret
411d9a: 48 83 c0 01 add $0x1,%rax 411d9e: 31 d2 xor %edx,%edx 411da0: e9 4f ff ff ff jmp 411cf4 <_ZN7Spatter13ConfigurationINS_6OpenMPEE7scatterEbm._omp_fn.1+0x44>

Disassembly of section .fini:


Version 1.1 (Serial Backend)

spatter-serial: file format elf64-x86-64

Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

000000000041dc50 <scatter_smallbuf_serial>: 41dc50: 41 54 push %r12 41dc52: 55 push %rbp 41dc53: 53 push %rbx 41dc54: 48 8b 6c 24 20 mov 0x20(%rsp),%rbp 41dc59: 4d 85 c9 test %r9,%r9 41dc5c: 74 51 je 41dcaf <scatter_smallbuf_serial+0x5f> 41dc5e: 4c 8b 26 mov (%rsi),%r12 41dc61: 48 85 c9 test %rcx,%rcx 41dc64: 74 49 je 41dcaf <scatter_smallbuf_serial+0x5f> 41dc66: 49 89 d3 mov %rdx,%r11 41dc69: 31 f6 xor %esi,%esi 41dc6b: 31 db xor %ebx,%ebx 41dc6d: 0f 1f 00 nopl (%rax) 41dc70: 48 89 d8 mov %rbx,%rax 41dc73: 31 d2 xor %edx,%edx 41dc75: 48 f7 f5 div %rbp 41dc78: 31 c0 xor %eax,%eax 41dc7a: 48 0f af d1 imul %rcx,%rdx 41dc7e: 4d 8d 14 d4 lea (%r12,%rdx,8),%r10 41dc82: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) 41dc88: 49 8b 14 c3 mov (%r11,%rax,8),%rdx 41dc8c: f2 41 0f 10 04 c2 movsd (%r10,%rax,8),%xmm0 41dc92: 48 83 c0 01 add $0x1,%rax 41dc96: 48 01 f2 add %rsi,%rdx 41dc99: f2 0f 11 04 d7 movsd %xmm0,(%rdi,%rdx,8) 41dc9e: 48 39 c1 cmp %rax,%rcx 41dca1: 75 e5 jne 41dc88 <scatter_smallbuf_serial+0x38> 41dca3: 48 83 c3 01 add $0x1,%rbx 41dca7: 4c 01 c6 add %r8,%rsi 41dcaa: 49 39 d9 cmp %rbx,%r9 41dcad: 75 c1 jne 41dc70 <scatter_smallbuf_serial+0x20> 41dcaf: 5b pop %rbx 41dcb0: 5d pop %rbp 41dcb1: 41 5c pop %r12 41dcb3: c3 ret

Disassembly of section .fini:

Version 2 (Serial Backend)

spatter-serial: file format elf64-x86-64

Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

00000000004111f0 <_ZN7Spatter13ConfigurationINS_6SerialEE7scatterEbm>: 4111f0: 41 57 push %r15 4111f2: 41 56 push %r14 4111f4: 41 55 push %r13 4111f6: 41 54 push %r12 4111f8: 49 89 fc mov %rdi,%r12 4111fb: 55 push %rbp 4111fc: 53 push %rbx 4111fd: 48 83 ec 18 sub $0x18,%rsp 411201: 48 8b 6f 58 mov 0x58(%rdi),%rbp 411205: 48 2b 6f 50 sub 0x50(%rdi),%rbp 411209: 48 89 eb mov %rbp,%rbx 41120c: 48 89 14 24 mov %rdx,(%rsp) 411210: 48 c1 fb 03 sar $0x3,%rbx 411214: 40 84 f6 test %sil,%sil 411217: 0f 85 9b 00 00 00 jne 4112b8 <_ZN7Spatter13ConfigurationINS_6SerialEE7scatterEbm+0xc8> 41121d: 4c 8b 9f 28 01 00 00 mov 0x128(%rdi),%r11 411224: 4d 85 db test %r11,%r11 411227: 0f 84 7c 00 00 00 je 4112a9 <_ZN7Spatter13ConfigurationINS_6SerialEE7scatterEbm+0xb9> 41122d: 48 85 db test %rbx,%rbx 411230: 74 77 je 4112a9 <_ZN7Spatter13ConfigurationINS_6SerialEE7scatterEbm+0xb9> 411232: 49 8b 84 24 e0 00 00 mov 0xe0(%r12),%rax 411239: 00 41123a: 4d 8b 44 24 50 mov 0x50(%r12),%r8 41123f: 45 31 c9 xor %r9d,%r9d 411242: 4d 8b 94 24 20 01 00 mov 0x120(%r12),%r10 411249: 00 41124a: 4d 8b ac 24 00 01 00 mov 0x100(%r12),%r13 411251: 00 411252: 4c 8b 30 mov (%rax),%r14 411255: 49 8b 84 24 98 00 00 mov 0x98(%r12),%rax 41125c: 00 41125d: 48 8b 38 mov (%rax),%rdi 411260: 4c 89 c8 mov %r9,%rax 411263: 31 d2 xor %edx,%edx 411265: 4c 89 e9 mov %r13,%rcx 411268: 49 f7 f2 div %r10 41126b: 49 0f af c9 imul %r9,%rcx 41126f: 31 c0 xor %eax,%eax 411271: 48 0f af d5 imul %rbp,%rdx 411275: 4d 8d 3c 16 lea (%r14,%rdx,1),%r15 411279: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) 411280: 49 8b 14 c0 mov (%r8,%rax,8),%rdx 411284: f2 41 0f 10 04 c7 movsd (%r15,%rax,8),%xmm0 41128a: 48 83 c0 01 add $0x1,%rax 41128e: 48 01 ca add %rcx,%rdx 411291: f2 0f 11 04 d7 movsd %xmm0,(%rdi,%rdx,8) 411296: 48 39 c3 cmp %rax,%rbx 411299: 75 e5 jne 411280 <_ZN7Spatter13ConfigurationINS_6SerialEE7scatterEbm+0x90> 41129b: 49 83 c1 01 add $0x1,%r9 41129f: 4d 39 d9 cmp %r11,%r9 4112a2: 72 bc jb 411260 
<_ZN7Spatter13ConfigurationINS_6SerialEE7scatterEbm+0x70> 4112a4: 40 84 f6 test %sil,%sil 4112a7: 75 73 jne 41131c <_ZN7Spatter13ConfigurationINS_6SerialEE7scatterEbm+0x12c> 4112a9: 48 83 c4 18 add $0x18,%rsp 4112ad: 5b pop %rbx 4112ae: 5d pop %rbp 4112af: 41 5c pop %r12 4112b1: 41 5d pop %r13 4112b3: 41 5e pop %r14 4112b5: 41 5f pop %r15 4112b7: c3 ret
4112b8: 4c 8d af 60 01 00 00 lea 0x160(%rdi),%r13 4112bf: 89 74 24 0c mov %esi,0xc(%rsp) 4112c3: 4c 89 ef mov %r13,%rdi 4112c6: e8 15 40 02 00 call 4352e0 <_ZN7Spatter5Timer5startEv> 4112cb: 4d 8b 9c 24 28 01 00 mov 0x128(%r12),%r11 4112d2: 00 4112d3: 8b 74 24 0c mov 0xc(%rsp),%esi 4112d7: 4d 85 db test %r11,%r11 4112da: 74 09 je 4112e5 <_ZN7Spatter13ConfigurationINS_6SerialEE7scatterEbm+0xf5> 4112dc: 48 85 db test %rbx,%rbx 4112df: 0f 85 4d ff ff ff jne 411232 <_ZN7Spatter13ConfigurationINS_6SerialEE7scatterEbm+0x42> 4112e5: 4c 89 ef mov %r13,%rdi 4112e8: e8 13 40 02 00 call 435300 <_ZN7Spatter5Timer4stopEv> 4112ed: 4c 89 ef mov %r13,%rdi 4112f0: e8 3b 40 02 00 call 435330 <_ZNK7Spatter5Timer7secondsEv> 4112f5: 49 8b 84 24 78 01 00 mov 0x178(%r12),%rax 4112fc: 00 4112fd: 48 8b 34 24 mov (%rsp),%rsi 411301: 4c 89 ef mov %r13,%rdi 411304: f2 0f 11 04 f0 movsd %xmm0,(%rax,%rsi,8) 411309: 48 83 c4 18 add $0x18,%rsp 41130d: 5b pop %rbx 41130e: 5d pop %rbp 41130f: 41 5c pop %r12 411311: 41 5d pop %r13 411313: 41 5e pop %r14 411315: 41 5f pop %r15 411317: e9 24 40 02 00 jmp 435340 <_ZN7Spatter5Timer5clearEv> 41131c: 4d 8d ac 24 60 01 00 lea 0x160(%r12),%r13 411323: 00 411324: eb bf jmp 4112e5 <_ZN7Spatter13ConfigurationINS_6SerialEE7scatterEbm+0xf5>

Disassembly of section .fini:

MultiGather Kernel

Version 1.1 (OpenMP Backend)

spatter: file format elf64-x86-64

Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

000000000041e170 <multigather_smallbuf._omp_fn.0>: 41e170: 41 56 push %r14 41e172: 41 55 push %r13 41e174: 49 89 fd mov %rdi,%r13 41e177: 41 54 push %r12 41e179: 55 push %rbp 41e17a: 53 push %rbx 41e17b: 48 8b 5f 30 mov 0x30(%rdi),%rbx 41e17f: e8 5c 34 fe ff call 4015e0 omp_get_thread_num@plt 41e184: 48 85 db test %rbx,%rbx 41e187: 0f 84 a8 00 00 00 je 41e235 <multigather_smallbuf._omp_fn.0+0xc5> 41e18d: 89 c5 mov %eax,%ebp 41e18f: e8 7c 35 fe ff call 401710 omp_get_num_threads@plt 41e194: 31 d2 xor %edx,%edx 41e196: 48 63 cd movslq %ebp,%rcx 41e199: 48 63 f0 movslq %eax,%rsi 41e19c: 48 89 d8 mov %rbx,%rax 41e19f: 48 f7 f6 div %rsi 41e1a2: 48 89 c3 mov %rax,%rbx 41e1a5: 48 39 d1 cmp %rdx,%rcx 41e1a8: 0f 82 90 00 00 00 jb 41e23e <multigather_smallbuf._omp_fn.0+0xce> 41e1ae: 49 89 db mov %rbx,%r11 41e1b1: 4c 0f af d9 imul %rcx,%r11 41e1b5: 49 01 d3 add %rdx,%r11 41e1b8: 4c 01 db add %r11,%rbx 41e1bb: 49 39 db cmp %rbx,%r11 41e1be: 73 75 jae 41e235 <multigather_smallbuf._omp_fn.0+0xc5> 41e1c0: 49 8b 45 00 mov 0x0(%r13),%rax 41e1c4: 49 8b 75 20 mov 0x20(%r13),%rsi 41e1c8: 49 8b 6d 38 mov 0x38(%r13),%rbp 41e1cc: 4d 8b 65 28 mov 0x28(%r13),%r12 41e1d0: 4d 8b 45 18 mov 0x18(%r13),%r8 41e1d4: 4d 8b 4d 10 mov 0x10(%r13),%r9 41e1d8: 4d 8b 55 08 mov 0x8(%r13),%r10 41e1dc: 48 8b 3c c8 mov (%rax,%rcx,8),%rdi 41e1e0: 48 85 f6 test %rsi,%rsi 41e1e3: 74 50 je 41e235 <multigather_smallbuf._omp_fn.0+0xc5> 41e1e5: 4c 89 e1 mov %r12,%rcx 41e1e8: 49 0f af cb imul %r11,%rcx 41e1ec: 0f 1f 40 00 nopl 0x0(%rax) 41e1f0: 4c 89 d8 mov %r11,%rax 41e1f3: 31 d2 xor %edx,%edx 41e1f5: 48 f7 f5 div %rbp 41e1f8: 31 c0 xor %eax,%eax 41e1fa: 48 0f af d6 imul %rsi,%rdx 41e1fe: 4c 8d 2c d7 lea (%rdi,%rdx,8),%r13 41e202: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) 41e208: 49 8b 14 c0 mov (%r8,%rax,8),%rdx 41e20c: 4d 8b 34 d1 mov (%r9,%rdx,8),%r14 41e210: 49 01 ce add %rcx,%r14 41e213: f2 43 0f 10 04 f2 movsd (%r10,%r14,8),%xmm0 41e219: f2 41 0f 11 44 c5 00 movsd %xmm0,0x0(%r13,%rax,8) 41e220: 48 83 c0 01 add 
$0x1,%rax 41e224: 48 39 c6 cmp %rax,%rsi 41e227: 75 df jne 41e208 <multigather_smallbuf._omp_fn.0+0x98> 41e229: 49 83 c3 01 add $0x1,%r11 41e22d: 4c 01 e1 add %r12,%rcx 41e230: 4c 39 db cmp %r11,%rbx 41e233: 75 bb jne 41e1f0 <multigather_smallbuf._omp_fn.0+0x80> 41e235: 5b pop %rbx 41e236: 5d pop %rbp 41e237: 41 5c pop %r12 41e239: 41 5d pop %r13 41e23b: 41 5e pop %r14 41e23d: c3 ret
41e23e: 48 83 c3 01 add $0x1,%rbx 41e242: 31 d2 xor %edx,%edx 41e244: e9 65 ff ff ff jmp 41e1ae <multigather_smallbuf._omp_fn.0+0x3e>

Disassembly of section .fini:

Version 2 (OpenMP Backend)

spatter: file format elf64-x86-64

Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

0000000000411e90 <_ZN7Spatter13ConfigurationINS_6OpenMPEE12multi_gatherEbm._omp_fn.3>: 411e90: 41 56 push %r14 411e92: 41 55 push %r13 411e94: 41 54 push %r12 411e96: 55 push %rbp 411e97: 48 89 fd mov %rdi,%rbp 411e9a: 53 push %rbx 411e9b: 4c 8b 2f mov (%rdi),%r13 411e9e: e8 ed 1d ff ff call 403c90 omp_get_thread_num@plt 411ea3: 49 8b 9d 28 01 00 00 mov 0x128(%r13),%rbx 411eaa: 48 85 db test %rbx,%rbx 411ead: 0f 84 c2 00 00 00 je 411f75 <_ZN7Spatter13ConfigurationINS_6OpenMPEE12multi_gatherEbm._omp_fn.3+0xe5> 411eb3: 41 89 c4 mov %eax,%r12d 411eb6: e8 e5 1e ff ff call 403da0 omp_get_num_threads@plt 411ebb: 31 d2 xor %edx,%edx 411ebd: 49 63 cc movslq %r12d,%rcx 411ec0: 48 63 f0 movslq %eax,%rsi 411ec3: 48 89 d8 mov %rbx,%rax 411ec6: 48 f7 f6 div %rsi 411ec9: 48 39 d1 cmp %rdx,%rcx 411ecc: 0f 82 ac 00 00 00 jb 411f7e <_ZN7Spatter13ConfigurationINS_6OpenMPEE12multi_gatherEbm._omp_fn.3+0xee> 411ed2: 49 89 c2 mov %rax,%r10 411ed5: 4c 0f af d1 imul %rcx,%r10 411ed9: 49 01 d2 add %rdx,%r10 411edc: 4e 8d 1c 10 lea (%rax,%r10,1),%r11 411ee0: 4d 39 da cmp %r11,%r10 411ee3: 0f 83 8c 00 00 00 jae 411f75 <_ZN7Spatter13ConfigurationINS_6OpenMPEE12multi_gatherEbm._omp_fn.3+0xe5> 411ee9: 49 8b 85 98 00 00 00 mov 0x98(%r13),%rax 411ef0: 49 8b 95 e8 00 00 00 mov 0xe8(%r13),%rdx 411ef7: 48 8b 75 08 mov 0x8(%rbp),%rsi 411efb: 49 8b 9d 20 01 00 00 mov 0x120(%r13),%rbx 411f02: 48 8b 38 mov (%rax),%rdi 411f05: 48 8d 04 49 lea (%rcx,%rcx,2),%rax 411f09: 48 8b 0a mov (%rdx),%rcx 411f0c: 49 8b ad 00 01 00 00 mov 0x100(%r13),%rbp 411f13: 48 8d 04 c1 lea (%rcx,%rax,8),%rax 411f17: 4c 8b 20 mov (%rax),%r12 411f1a: 48 85 f6 test %rsi,%rsi 411f1d: 74 56 je 411f75 <_ZN7Spatter13ConfigurationINS_6OpenMPEE12multi_gatherEbm._omp_fn.3+0xe5> 411f1f: 4c 89 d1 mov %r10,%rcx 411f22: 4d 8b 4d 68 mov 0x68(%r13),%r9 411f26: 4d 8b 45 50 mov 0x50(%r13),%r8 411f2a: 48 0f af cd imul %rbp,%rcx 411f2e: 66 90 xchg %ax,%ax 411f30: 4c 89 d0 mov %r10,%rax 411f33: 31 d2 xor %edx,%edx 411f35: 48 f7 f3 div %rbx 411f38: 
31 c0 xor %eax,%eax 411f3a: 48 0f af d6 imul %rsi,%rdx 411f3e: 4d 8d 2c d4 lea (%r12,%rdx,8),%r13 411f42: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) 411f48: 49 8b 14 c1 mov (%r9,%rax,8),%rdx 411f4c: 4d 8b 34 d0 mov (%r8,%rdx,8),%r14 411f50: 49 01 ce add %rcx,%r14 411f53: f2 42 0f 10 04 f7 movsd (%rdi,%r14,8),%xmm0 411f59: f2 41 0f 11 44 c5 00 movsd %xmm0,0x0(%r13,%rax,8) 411f60: 48 83 c0 01 add $0x1,%rax 411f64: 48 39 c6 cmp %rax,%rsi 411f67: 75 df jne 411f48 <_ZN7Spatter13ConfigurationINS_6OpenMPEE12multi_gatherEbm._omp_fn.3+0xb8> 411f69: 49 83 c2 01 add $0x1,%r10 411f6d: 48 01 e9 add %rbp,%rcx 411f70: 4d 39 d3 cmp %r10,%r11 411f73: 75 bb jne 411f30 <_ZN7Spatter13ConfigurationINS_6OpenMPEE12multi_gatherEbm._omp_fn.3+0xa0> 411f75: 5b pop %rbx 411f76: 5d pop %rbp 411f77: 41 5c pop %r12 411f79: 41 5d pop %r13 411f7b: 41 5e pop %r14 411f7d: c3 ret
411f7e: 48 83 c0 01 add $0x1,%rax 411f82: 31 d2 xor %edx,%edx 411f84: e9 49 ff ff ff jmp 411ed2 <_ZN7Spatter13ConfigurationINS_6OpenMPEE12multi_gatherEbm._omp_fn.3+0x42>

Disassembly of section .fini:


Version 1.1 (Serial Backend)

spatter-serial: file format elf64-x86-64

Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

000000000041dae0 <multigather_smallbuf_serial>: 41dae0: 41 56 push %r14 41dae2: 41 55 push %r13 41dae4: 41 54 push %r12 41dae6: 55 push %rbp 41dae7: 53 push %rbx 41dae8: 48 8b 6c 24 30 mov 0x30(%rsp),%rbp 41daed: 48 8b 5c 24 38 mov 0x38(%rsp),%rbx 41daf2: 48 85 ed test %rbp,%rbp 41daf5: 74 5e je 41db55 <multigather_smallbuf_serial+0x75> 41daf7: 4c 8b 27 mov (%rdi),%r12 41dafa: 4d 85 c0 test %r8,%r8 41dafd: 74 56 je 41db55 <multigather_smallbuf_serial+0x75> 41daff: 49 89 d2 mov %rdx,%r10 41db02: 31 ff xor %edi,%edi 41db04: 45 31 db xor %r11d,%r11d 41db07: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1) 41db0e: 00 00 41db10: 4c 89 d8 mov %r11,%rax 41db13: 31 d2 xor %edx,%edx 41db15: 48 f7 f3 div %rbx 41db18: 31 c0 xor %eax,%eax 41db1a: 49 0f af d0 imul %r8,%rdx 41db1e: 4d 8d 2c d4 lea (%r12,%rdx,8),%r13 41db22: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) 41db28: 48 8b 14 c1 mov (%rcx,%rax,8),%rdx 41db2c: 4d 8b 34 d2 mov (%r10,%rdx,8),%r14 41db30: 49 01 fe add %rdi,%r14 41db33: f2 42 0f 10 04 f6 movsd (%rsi,%r14,8),%xmm0 41db39: f2 41 0f 11 44 c5 00 movsd %xmm0,0x0(%r13,%rax,8) 41db40: 48 83 c0 01 add $0x1,%rax 41db44: 49 39 c0 cmp %rax,%r8 41db47: 75 df jne 41db28 <multigather_smallbuf_serial+0x48> 41db49: 49 83 c3 01 add $0x1,%r11 41db4d: 4c 01 cf add %r9,%rdi 41db50: 4c 39 dd cmp %r11,%rbp 41db53: 75 bb jne 41db10 <multigather_smallbuf_serial+0x30> 41db55: 5b pop %rbx 41db56: 5d pop %rbp 41db57: 41 5c pop %r12 41db59: 41 5d pop %r13 41db5b: 41 5e pop %r14 41db5d: c3 ret

Disassembly of section .fini:

Version 2 (Serial Backend)

spatter-serial: file format elf64-x86-64

Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

0000000000411470 <_ZN7Spatter13ConfigurationINS_6SerialEE12multi_gatherEbm>: 411470: 41 57 push %r15 411472: 41 56 push %r14 411474: 41 55 push %r13 411476: 41 54 push %r12 411478: 49 89 fc mov %rdi,%r12 41147b: 55 push %rbp 41147c: 53 push %rbx 41147d: 48 83 ec 28 sub $0x28,%rsp 411481: 48 8b 6f 70 mov 0x70(%rdi),%rbp 411485: 48 2b 6f 68 sub 0x68(%rdi),%rbp 411489: 48 89 eb mov %rbp,%rbx 41148c: 89 74 24 14 mov %esi,0x14(%rsp) 411490: 48 89 54 24 18 mov %rdx,0x18(%rsp) 411495: 48 c1 fb 03 sar $0x3,%rbx 411499: 40 84 f6 test %sil,%sil 41149c: 0f 85 b2 00 00 00 jne 411554 <_ZN7Spatter13ConfigurationINS_6SerialEE12multi_gatherEbm+0xe4> 4114a2: 4c 8b 9f 28 01 00 00 mov 0x128(%rdi),%r11 4114a9: 4d 85 db test %r11,%r11 4114ac: 0f 84 93 00 00 00 je 411545 <_ZN7Spatter13ConfigurationINS_6SerialEE12multi_gatherEbm+0xd5> 4114b2: 48 85 db test %rbx,%rbx 4114b5: 0f 84 8a 00 00 00 je 411545 <_ZN7Spatter13ConfigurationINS_6SerialEE12multi_gatherEbm+0xd5> 4114bb: 49 8b 84 24 98 00 00 mov 0x98(%r12),%rax 4114c2: 00 4114c3: 4d 8b 44 24 68 mov 0x68(%r12),%r8 4114c8: 45 31 c9 xor %r9d,%r9d 4114cb: 49 8b 7c 24 50 mov 0x50(%r12),%rdi 4114d0: 4d 8b b4 24 00 01 00 mov 0x100(%r12),%r14 4114d7: 00 4114d8: 48 8b 30 mov (%rax),%rsi 4114db: 49 8b 84 24 e0 00 00 mov 0xe0(%r12),%rax 4114e2: 00 4114e3: 4d 8b 94 24 20 01 00 mov 0x120(%r12),%r10 4114ea: 00 4114eb: 4c 8b 28 mov (%rax),%r13 4114ee: 66 90 xchg %ax,%ax 4114f0: 4c 89 c8 mov %r9,%rax 4114f3: 31 d2 xor %edx,%edx 4114f5: 4c 89 f1 mov %r14,%rcx 4114f8: 4c 89 4c 24 08 mov %r9,0x8(%rsp) 4114fd: 49 f7 f2 div %r10 411500: 49 0f af c9 imul %r9,%rcx 411504: 31 c0 xor %eax,%eax 411506: 48 0f af d5 imul %rbp,%rdx 41150a: 4d 8d 7c 15 00 lea 0x0(%r13,%rdx,1),%r15 41150f: 90 nop 411510: 49 8b 14 c0 mov (%r8,%rax,8),%rdx 411514: 4c 8b 0c d7 mov (%rdi,%rdx,8),%r9 411518: 49 01 c9 add %rcx,%r9 41151b: f2 42 0f 10 04 ce movsd (%rsi,%r9,8),%xmm0 411521: f2 41 0f 11 04 c7 movsd %xmm0,(%r15,%rax,8) 411527: 48 83 c0 01 add $0x1,%rax 41152b: 48 39 c3 cmp 
%rax,%rbx 41152e: 75 e0 jne 411510 <_ZN7Spatter13ConfigurationINS_6SerialEE12multi_gatherEbm+0xa0> 411530: 4c 8b 4c 24 08 mov 0x8(%rsp),%r9 411535: 49 83 c1 01 add $0x1,%r9 411539: 4d 39 d9 cmp %r11,%r9 41153c: 72 b2 jb 4114f0 <_ZN7Spatter13ConfigurationINS_6SerialEE12multi_gatherEbm+0x80> 41153e: 80 7c 24 14 00 cmpb $0x0,0x14(%rsp) 411543: 75 6c jne 4115b1 <_ZN7Spatter13ConfigurationINS_6SerialEE12multi_gatherEbm+0x141> 411545: 48 83 c4 28 add $0x28,%rsp 411549: 5b pop %rbx 41154a: 5d pop %rbp 41154b: 41 5c pop %r12 41154d: 41 5d pop %r13 41154f: 41 5e pop %r14 411551: 41 5f pop %r15 411553: c3 ret
411554: 4c 8d af 60 01 00 00 lea 0x160(%rdi),%r13 41155b: 4c 89 ef mov %r13,%rdi 41155e: e8 7d 3d 02 00 call 4352e0 <_ZN7Spatter5Timer5startEv> 411563: 4d 8b 9c 24 28 01 00 mov 0x128(%r12),%r11 41156a: 00 41156b: 4d 85 db test %r11,%r11 41156e: 74 09 je 411579 <_ZN7Spatter13ConfigurationINS_6SerialEE12multi_gatherEbm+0x109> 411570: 48 85 db test %rbx,%rbx 411573: 0f 85 42 ff ff ff jne 4114bb <_ZN7Spatter13ConfigurationINS_6SerialEE12multi_gatherEbm+0x4b> 411579: 4c 89 ef mov %r13,%rdi 41157c: e8 7f 3d 02 00 call 435300 <_ZN7Spatter5Timer4stopEv> 411581: 4c 89 ef mov %r13,%rdi 411584: e8 a7 3d 02 00 call 435330 <_ZNK7Spatter5Timer7secondsEv> 411589: 49 8b 84 24 78 01 00 mov 0x178(%r12),%rax 411590: 00 411591: 48 8b 7c 24 18 mov 0x18(%rsp),%rdi 411596: f2 0f 11 04 f8 movsd %xmm0,(%rax,%rdi,8) 41159b: 48 83 c4 28 add $0x28,%rsp 41159f: 4c 89 ef mov %r13,%rdi 4115a2: 5b pop %rbx 4115a3: 5d pop %rbp 4115a4: 41 5c pop %r12 4115a6: 41 5d pop %r13 4115a8: 41 5e pop %r14 4115aa: 41 5f pop %r15 4115ac: e9 8f 3d 02 00 jmp 435340 <_ZN7Spatter5Timer5clearEv> 4115b1: 4d 8d ac 24 60 01 00 lea 0x160(%r12),%r13 4115b8: 00 4115b9: eb be jmp 411579 <_ZN7Spatter13ConfigurationINS_6SerialEE12multi_gatherEbm+0x109>

Disassembly of section .fini:

MultiScatter Kernel

Version 1.1 (OpenMP Backend)

spatter: file format elf64-x86-64

Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

000000000041e340 <multiscatter_smallbuf._omp_fn.2>: 41e340: 41 56 push %r14 41e342: 41 55 push %r13 41e344: 49 89 fd mov %rdi,%r13 41e347: 41 54 push %r12 41e349: 55 push %rbp 41e34a: 53 push %rbx 41e34b: 48 8b 5f 30 mov 0x30(%rdi),%rbx 41e34f: e8 8c 32 fe ff call 4015e0 omp_get_thread_num@plt 41e354: 48 85 db test %rbx,%rbx 41e357: 0f 84 a8 00 00 00 je 41e405 <multiscatter_smallbuf._omp_fn.2+0xc5> 41e35d: 89 c5 mov %eax,%ebp 41e35f: e8 ac 33 fe ff call 401710 omp_get_num_threads@plt 41e364: 31 d2 xor %edx,%edx 41e366: 48 63 cd movslq %ebp,%rcx 41e369: 48 63 f0 movslq %eax,%rsi 41e36c: 48 89 d8 mov %rbx,%rax 41e36f: 48 f7 f6 div %rsi 41e372: 48 89 c3 mov %rax,%rbx 41e375: 48 39 d1 cmp %rdx,%rcx 41e378: 0f 82 90 00 00 00 jb 41e40e <multiscatter_smallbuf._omp_fn.2+0xce> 41e37e: 49 89 db mov %rbx,%r11 41e381: 4c 0f af d9 imul %rcx,%r11 41e385: 49 01 d3 add %rdx,%r11 41e388: 4c 01 db add %r11,%rbx 41e38b: 49 39 db cmp %rbx,%r11 41e38e: 73 75 jae 41e405 <multiscatter_smallbuf._omp_fn.2+0xc5> 41e390: 49 8b 45 08 mov 0x8(%r13),%rax 41e394: 49 8b 75 20 mov 0x20(%r13),%rsi 41e398: 49 8b 6d 38 mov 0x38(%r13),%rbp 41e39c: 4d 8b 65 28 mov 0x28(%r13),%r12 41e3a0: 4d 8b 45 18 mov 0x18(%r13),%r8 41e3a4: 4d 8b 4d 10 mov 0x10(%r13),%r9 41e3a8: 4d 8b 55 00 mov 0x0(%r13),%r10 41e3ac: 48 8b 3c c8 mov (%rax,%rcx,8),%rdi 41e3b0: 48 85 f6 test %rsi,%rsi 41e3b3: 74 50 je 41e405 <multiscatter_smallbuf._omp_fn.2+0xc5> 41e3b5: 4c 89 e1 mov %r12,%rcx 41e3b8: 49 0f af cb imul %r11,%rcx 41e3bc: 0f 1f 40 00 nopl 0x0(%rax) 41e3c0: 4c 89 d8 mov %r11,%rax 41e3c3: 31 d2 xor %edx,%edx 41e3c5: 48 f7 f5 div %rbp 41e3c8: 31 c0 xor %eax,%eax 41e3ca: 48 0f af d6 imul %rsi,%rdx 41e3ce: 4c 8d 2c d7 lea (%rdi,%rdx,8),%r13 41e3d2: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) 41e3d8: 49 8b 14 c0 mov (%r8,%rax,8),%rdx 41e3dc: f2 41 0f 10 44 c5 00 movsd 0x0(%r13,%rax,8),%xmm0 41e3e3: 48 83 c0 01 add $0x1,%rax 41e3e7: 4d 8b 34 d1 mov (%r9,%rdx,8),%r14 41e3eb: 49 01 ce add %rcx,%r14 41e3ee: f2 43 0f 11 04 f2 movsd 
%xmm0,(%r10,%r14,8) 41e3f4: 48 39 c6 cmp %rax,%rsi 41e3f7: 75 df jne 41e3d8 <multiscatter_smallbuf._omp_fn.2+0x98> 41e3f9: 49 83 c3 01 add $0x1,%r11 41e3fd: 4c 01 e1 add %r12,%rcx 41e400: 4c 39 db cmp %r11,%rbx 41e403: 75 bb jne 41e3c0 <multiscatter_smallbuf._omp_fn.2+0x80> 41e405: 5b pop %rbx 41e406: 5d pop %rbp 41e407: 41 5c pop %r12 41e409: 41 5d pop %r13 41e40b: 41 5e pop %r14 41e40d: c3 ret
41e40e: 48 83 c3 01 add $0x1,%rbx 41e412: 31 d2 xor %edx,%edx 41e414: e9 65 ff ff ff jmp 41e37e <multiscatter_smallbuf._omp_fn.2+0x3e>

Disassembly of section .fini:

Version 2 (OpenMP Backend)

spatter: file format elf64-x86-64

Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

0000000000411f90 <_ZN7Spatter13ConfigurationINS_6OpenMPEE13multi_scatterEbm._omp_fn.4>: 411f90: 41 56 push %r14 411f92: 41 55 push %r13 411f94: 41 54 push %r12 411f96: 55 push %rbp 411f97: 48 89 fd mov %rdi,%rbp 411f9a: 53 push %rbx 411f9b: 4c 8b 2f mov (%rdi),%r13 411f9e: e8 ed 1c ff ff call 403c90 omp_get_thread_num@plt 411fa3: 49 8b 9d 28 01 00 00 mov 0x128(%r13),%rbx 411faa: 48 85 db test %rbx,%rbx 411fad: 0f 84 ca 00 00 00 je 41207d <_ZN7Spatter13ConfigurationINS_6OpenMPEE13multi_scatterEbm._omp_fn.4+0xed> 411fb3: 41 89 c4 mov %eax,%r12d 411fb6: e8 e5 1d ff ff call 403da0 omp_get_num_threads@plt 411fbb: 31 d2 xor %edx,%edx 411fbd: 49 63 cc movslq %r12d,%rcx 411fc0: 48 63 f0 movslq %eax,%rsi 411fc3: 48 89 d8 mov %rbx,%rax 411fc6: 48 f7 f6 div %rsi 411fc9: 48 39 d1 cmp %rdx,%rcx 411fcc: 0f 82 b4 00 00 00 jb 412086 <_ZN7Spatter13ConfigurationINS_6OpenMPEE13multi_scatterEbm._omp_fn.4+0xf6> 411fd2: 49 89 c2 mov %rax,%r10 411fd5: 4c 0f af d1 imul %rcx,%r10 411fd9: 49 01 d2 add %rdx,%r10 411fdc: 4e 8d 1c 10 lea (%rax,%r10,1),%r11 411fe0: 4d 39 da cmp %r11,%r10 411fe3: 0f 83 94 00 00 00 jae 41207d <_ZN7Spatter13ConfigurationINS_6OpenMPEE13multi_scatterEbm._omp_fn.4+0xed> 411fe9: 49 8b 85 98 00 00 00 mov 0x98(%r13),%rax 411ff0: 49 8b 95 e8 00 00 00 mov 0xe8(%r13),%rdx 411ff7: 48 8b 75 08 mov 0x8(%rbp),%rsi 411ffb: 49 8b 9d 20 01 00 00 mov 0x120(%r13),%rbx 412002: 48 8b 38 mov (%rax),%rdi 412005: 48 8d 04 49 lea (%rcx,%rcx,2),%rax 412009: 48 8b 0a mov (%rdx),%rcx 41200c: 49 8b ad 00 01 00 00 mov 0x100(%r13),%rbp 412013: 48 8d 04 c1 lea (%rcx,%rax,8),%rax 412017: 4c 8b 20 mov (%rax),%r12 41201a: 48 85 f6 test %rsi,%rsi 41201d: 74 5e je 41207d <_ZN7Spatter13ConfigurationINS_6OpenMPEE13multi_scatterEbm._omp_fn.4+0xed> 41201f: 4c 89 d1 mov %r10,%rcx 412022: 4d 8b 8d 80 00 00 00 mov 0x80(%r13),%r9 412029: 4d 8b 45 50 mov 0x50(%r13),%r8 41202d: 48 0f af cd imul %rbp,%rcx 412031: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) 412038: 4c 89 d0 mov %r10,%rax 41203b: 31 d2 xor %edx,%edx 
41203d: 48 f7 f3 div %rbx 412040: 31 c0 xor %eax,%eax 412042: 48 0f af d6 imul %rsi,%rdx 412046: 4d 8d 2c d4 lea (%r12,%rdx,8),%r13 41204a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) 412050: 49 8b 14 c1 mov (%r9,%rax,8),%rdx 412054: f2 41 0f 10 44 c5 00 movsd 0x0(%r13,%rax,8),%xmm0 41205b: 48 83 c0 01 add $0x1,%rax 41205f: 4d 8b 34 d0 mov (%r8,%rdx,8),%r14 412063: 49 01 ce add %rcx,%r14 412066: f2 42 0f 11 04 f7 movsd %xmm0,(%rdi,%r14,8) 41206c: 48 39 c6 cmp %rax,%rsi 41206f: 75 df jne 412050 <_ZN7Spatter13ConfigurationINS_6OpenMPEE13multi_scatterEbm._omp_fn.4+0xc0> 412071: 49 83 c2 01 add $0x1,%r10 412075: 48 01 e9 add %rbp,%rcx 412078: 4d 39 d3 cmp %r10,%r11 41207b: 75 bb jne 412038 <_ZN7Spatter13ConfigurationINS_6OpenMPEE13multi_scatterEbm._omp_fn.4+0xa8> 41207d: 5b pop %rbx 41207e: 5d pop %rbp 41207f: 41 5c pop %r12 412081: 41 5d pop %r13 412083: 41 5e pop %r14 412085: c3 ret
412086: 48 83 c0 01 add $0x1,%rax 41208a: 31 d2 xor %edx,%edx 41208c: e9 41 ff ff ff jmp 411fd2 <_ZN7Spatter13ConfigurationINS_6OpenMPEE13multi_scatterEbm._omp_fn.4+0x42>

Disassembly of section .fini:


Version 1.1 (Serial Backend)

spatter-serial: file format elf64-x86-64

Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

000000000041db60 <multiscatter_smallbuf_serial>: 41db60: 41 56 push %r14 41db62: 41 55 push %r13 41db64: 41 54 push %r12 41db66: 55 push %rbp 41db67: 53 push %rbx 41db68: 48 8b 6c 24 30 mov 0x30(%rsp),%rbp 41db6d: 48 8b 5c 24 38 mov 0x38(%rsp),%rbx 41db72: 48 85 ed test %rbp,%rbp 41db75: 74 5e je 41dbd5 <multiscatter_smallbuf_serial+0x75> 41db77: 4c 8b 26 mov (%rsi),%r12 41db7a: 4d 85 c0 test %r8,%r8 41db7d: 74 56 je 41dbd5 <multiscatter_smallbuf_serial+0x75> 41db7f: 49 89 d2 mov %rdx,%r10 41db82: 31 f6 xor %esi,%esi 41db84: 45 31 db xor %r11d,%r11d 41db87: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1) 41db8e: 00 00 41db90: 4c 89 d8 mov %r11,%rax 41db93: 31 d2 xor %edx,%edx 41db95: 48 f7 f3 div %rbx 41db98: 31 c0 xor %eax,%eax 41db9a: 49 0f af d0 imul %r8,%rdx 41db9e: 4d 8d 2c d4 lea (%r12,%rdx,8),%r13 41dba2: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) 41dba8: 48 8b 14 c1 mov (%rcx,%rax,8),%rdx 41dbac: f2 41 0f 10 44 c5 00 movsd 0x0(%r13,%rax,8),%xmm0 41dbb3: 48 83 c0 01 add $0x1,%rax 41dbb7: 4d 8b 34 d2 mov (%r10,%rdx,8),%r14 41dbbb: 49 01 f6 add %rsi,%r14 41dbbe: f2 42 0f 11 04 f7 movsd %xmm0,(%rdi,%r14,8) 41dbc4: 49 39 c0 cmp %rax,%r8 41dbc7: 75 df jne 41dba8 <multiscatter_smallbuf_serial+0x48> 41dbc9: 49 83 c3 01 add $0x1,%r11 41dbcd: 4c 01 ce add %r9,%rsi 41dbd0: 4c 39 dd cmp %r11,%rbp 41dbd3: 75 bb jne 41db90 <multiscatter_smallbuf_serial+0x30> 41dbd5: 5b pop %rbx 41dbd6: 5d pop %rbp 41dbd7: 41 5c pop %r12 41dbd9: 41 5d pop %r13 41dbdb: 41 5e pop %r14 41dbdd: c3 ret

Disassembly of section .fini:

Version 2 (Serial Backend)

spatter-serial: file format elf64-x86-64

Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

00000000004115c0 <_ZN7Spatter13ConfigurationINS_6SerialEE13multi_scatterEbm>: 4115c0: 41 57 push %r15 4115c2: 41 56 push %r14 4115c4: 41 55 push %r13 4115c6: 41 54 push %r12 4115c8: 49 89 fc mov %rdi,%r12 4115cb: 55 push %rbp 4115cc: 53 push %rbx 4115cd: 48 83 ec 28 sub $0x28,%rsp 4115d1: 48 8b af 88 00 00 00 mov 0x88(%rdi),%rbp 4115d8: 48 2b af 80 00 00 00 sub 0x80(%rdi),%rbp 4115df: 48 89 eb mov %rbp,%rbx 4115e2: 89 74 24 14 mov %esi,0x14(%rsp) 4115e6: 48 89 54 24 18 mov %rdx,0x18(%rsp) 4115eb: 48 c1 fb 03 sar $0x3,%rbx 4115ef: 40 84 f6 test %sil,%sil 4115f2: 0f 85 bc 00 00 00 jne 4116b4 <_ZN7Spatter13ConfigurationINS_6SerialEE13multi_scatterEbm+0xf4> 4115f8: 4c 8b 9f 28 01 00 00 mov 0x128(%rdi),%r11 4115ff: 4d 85 db test %r11,%r11 411602: 0f 84 9d 00 00 00 je 4116a5 <_ZN7Spatter13ConfigurationINS_6SerialEE13multi_scatterEbm+0xe5> 411608: 48 85 db test %rbx,%rbx 41160b: 0f 84 94 00 00 00 je 4116a5 <_ZN7Spatter13ConfigurationINS_6SerialEE13multi_scatterEbm+0xe5> 411611: 49 8b 84 24 e0 00 00 mov 0xe0(%r12),%rax 411618: 00 411619: 4d 8b 94 24 20 01 00 mov 0x120(%r12),%r10 411620: 00 411621: 45 31 c9 xor %r9d,%r9d 411624: 4d 8b 84 24 80 00 00 mov 0x80(%r12),%r8 41162b: 00 41162c: 49 8b 7c 24 50 mov 0x50(%r12),%rdi 411631: 4c 8b 30 mov (%rax),%r14 411634: 49 8b 84 24 98 00 00 mov 0x98(%r12),%rax 41163b: 00 41163c: 4d 8b ac 24 00 01 00 mov 0x100(%r12),%r13 411643: 00 411644: 48 8b 30 mov (%rax),%rsi 411647: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1) 41164e: 00 00 411650: 4c 89 c8 mov %r9,%rax 411653: 31 d2 xor %edx,%edx 411655: 4c 89 e9 mov %r13,%rcx 411658: 4c 89 4c 24 08 mov %r9,0x8(%rsp) 41165d: 49 f7 f2 div %r10 411660: 49 0f af c9 imul %r9,%rcx 411664: 31 c0 xor %eax,%eax 411666: 48 0f af d5 imul %rbp,%rdx 41166a: 4d 8d 3c 16 lea (%r14,%rdx,1),%r15 41166e: 66 90 xchg %ax,%ax 411670: 49 8b 14 c0 mov (%r8,%rax,8),%rdx 411674: f2 41 0f 10 04 c7 movsd (%r15,%rax,8),%xmm0 41167a: 48 83 c0 01 add $0x1,%rax 41167e: 4c 8b 0c d7 mov (%rdi,%rdx,8),%r9 411682: 49 01 c9 add 
%rcx,%r9 411685: f2 42 0f 11 04 ce movsd %xmm0,(%rsi,%r9,8) 41168b: 48 39 c3 cmp %rax,%rbx 41168e: 75 e0 jne 411670 <_ZN7Spatter13ConfigurationINS_6SerialEE13multi_scatterEbm+0xb0> 411690: 4c 8b 4c 24 08 mov 0x8(%rsp),%r9 411695: 49 83 c1 01 add $0x1,%r9 411699: 4d 39 d9 cmp %r11,%r9 41169c: 72 b2 jb 411650 <_ZN7Spatter13ConfigurationINS_6SerialEE13multi_scatterEbm+0x90> 41169e: 80 7c 24 14 00 cmpb $0x0,0x14(%rsp) 4116a3: 75 6c jne 411711 <_ZN7Spatter13ConfigurationINS_6SerialEE13multi_scatterEbm+0x151> 4116a5: 48 83 c4 28 add $0x28,%rsp 4116a9: 5b pop %rbx 4116aa: 5d pop %rbp 4116ab: 41 5c pop %r12 4116ad: 41 5d pop %r13 4116af: 41 5e pop %r14 4116b1: 41 5f pop %r15 4116b3: c3 ret
4116b4: 4c 8d af 60 01 00 00 lea 0x160(%rdi),%r13 4116bb: 4c 89 ef mov %r13,%rdi 4116be: e8 1d 3c 02 00 call 4352e0 <_ZN7Spatter5Timer5startEv> 4116c3: 4d 8b 9c 24 28 01 00 mov 0x128(%r12),%r11 4116ca: 00 4116cb: 4d 85 db test %r11,%r11 4116ce: 74 09 je 4116d9 <_ZN7Spatter13ConfigurationINS_6SerialEE13multi_scatterEbm+0x119> 4116d0: 48 85 db test %rbx,%rbx 4116d3: 0f 85 38 ff ff ff jne 411611 <_ZN7Spatter13ConfigurationINS_6SerialEE13multi_scatterEbm+0x51> 4116d9: 4c 89 ef mov %r13,%rdi 4116dc: e8 1f 3c 02 00 call 435300 <_ZN7Spatter5Timer4stopEv> 4116e1: 4c 89 ef mov %r13,%rdi 4116e4: e8 47 3c 02 00 call 435330 <_ZNK7Spatter5Timer7secondsEv> 4116e9: 49 8b 84 24 78 01 00 mov 0x178(%r12),%rax 4116f0: 00 4116f1: 48 8b 7c 24 18 mov 0x18(%rsp),%rdi 4116f6: f2 0f 11 04 f8 movsd %xmm0,(%rax,%rdi,8) 4116fb: 48 83 c4 28 add $0x28,%rsp 4116ff: 4c 89 ef mov %r13,%rdi 411702: 5b pop %rbx 411703: 5d pop %rbp 411704: 41 5c pop %r12 411706: 41 5d pop %r13 411708: 41 5e pop %r14 41170a: 41 5f pop %r15 41170c: e9 2f 3c 02 00 jmp 435340 <_ZN7Spatter5Timer5clearEv> 411711: 4d 8d ac 24 60 01 00 lea 0x160(%r12),%r13 411718: 00 411719: eb be jmp 4116d9 <_ZN7Spatter13ConfigurationINS_6SerialEE13multi_scatterEbm+0x119>

Disassembly of section .fini:

GatherScatter Kernel

Version 1.1 (OpenMP Backend)

spatter: file format elf64-x86-64

Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

000000000041e7b0 <sg_smallbuf._omp_fn.6>:
41e7b0: 41 56 push %r14
41e7b2: 41 55 push %r13
41e7b4: 41 54 push %r12
41e7b6: 49 89 fc mov %rdi,%r12
41e7b9: 55 push %rbp
41e7ba: 53 push %rbx
41e7bb: 48 8b 6f 38 mov 0x38(%rdi),%rbp
41e7bf: e8 1c 2e fe ff call 4015e0 omp_get_thread_num@plt
41e7c4: 48 85 ed test %rbp,%rbp
41e7c7: 0f 84 9c 00 00 00 je 41e869 <sg_smallbuf._omp_fn.6+0xb9>
41e7cd: 48 63 d8 movslq %eax,%rbx
41e7d0: e8 3b 2f fe ff call 401710 omp_get_num_threads@plt
41e7d5: 31 d2 xor %edx,%edx
41e7d7: 48 63 c8 movslq %eax,%rcx
41e7da: 48 89 e8 mov %rbp,%rax
41e7dd: 48 f7 f1 div %rcx
41e7e0: 48 89 c5 mov %rax,%rbp
41e7e3: 48 39 d3 cmp %rdx,%rbx
41e7e6: 0f 82 86 00 00 00 jb 41e872 <sg_smallbuf._omp_fn.6+0xc2>
41e7ec: 48 0f af dd imul %rbp,%rbx
41e7f0: 48 01 d3 add %rdx,%rbx
41e7f3: 48 01 dd add %rbx,%rbp
41e7f6: 48 39 eb cmp %rbp,%rbx
41e7f9: 73 6e jae 41e869 <sg_smallbuf._omp_fn.6+0xb9>
41e7fb: 4d 8b 44 24 20 mov 0x20(%r12),%r8
41e800: 4d 8b 6c 24 30 mov 0x30(%r12),%r13
41e805: 4d 8b 74 24 28 mov 0x28(%r12),%r14
41e80a: 4d 8b 4c 24 18 mov 0x18(%r12),%r9
41e80f: 4d 8b 54 24 10 mov 0x10(%r12),%r10
41e814: 4d 8b 5c 24 08 mov 0x8(%r12),%r11
41e819: 49 8b 3c 24 mov (%r12),%rdi
41e81d: 4d 85 c0 test %r8,%r8
41e820: 74 47 je 41e869 <sg_smallbuf._omp_fn.6+0xb9>
41e822: 4c 89 f6 mov %r14,%rsi
41e825: 4c 89 e9 mov %r13,%rcx
41e828: 48 0f af f3 imul %rbx,%rsi
41e82c: 48 0f af cb imul %rbx,%rcx
41e830: 31 c0 xor %eax,%eax
41e832: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
41e838: 49 8b 14 c2 mov (%r10,%rax,8),%rdx
41e83c: 48 01 f2 add %rsi,%rdx
41e83f: f2 0f 10 04 d7 movsd (%rdi,%rdx,8),%xmm0
41e844: 49 8b 14 c1 mov (%r9,%rax,8),%rdx
41e848: 48 83 c0 01 add $0x1,%rax
41e84c: 48 01 ca add %rcx,%rdx
41e84f: f2 41 0f 11 04 d3 movsd %xmm0,(%r11,%rdx,8)
41e855: 49 39 c0 cmp %rax,%r8
41e858: 75 de jne 41e838 <sg_smallbuf._omp_fn.6+0x88>
41e85a: 48 83 c3 01 add $0x1,%rbx
41e85e: 4c 01 f6 add %r14,%rsi
41e861: 4c 01 e9 add %r13,%rcx
41e864: 48 39 dd cmp %rbx,%rbp
41e867: 75 c7 jne 41e830 <sg_smallbuf._omp_fn.6+0x80>
41e869: 5b pop %rbx
41e86a: 5d pop %rbp
41e86b: 41 5c pop %r12
41e86d: 41 5d pop %r13
41e86f: 41 5e pop %r14
41e871: c3 ret
41e872: 48 83 c5 01 add $0x1,%rbp
41e876: 31 d2 xor %edx,%edx
41e878: e9 6f ff ff ff jmp 41e7ec <sg_smallbuf._omp_fn.6+0x3c>

Disassembly of section .fini:

Version 2 (OpenMP Backend)

spatter: file format elf64-x86-64

Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

0000000000411db0 <_ZN7Spatter13ConfigurationINS_6OpenMPEE14scatter_gatherEbm._omp_fn.2>:
411db0: 41 56 push %r14
411db2: 41 55 push %r13
411db4: 41 54 push %r12
411db6: 55 push %rbp
411db7: 53 push %rbx
411db8: 4c 8b 2f mov (%rdi),%r13
411dbb: 49 8b ad 28 01 00 00 mov 0x128(%r13),%rbp
411dc2: 48 85 ed test %rbp,%rbp
411dc5: 0f 84 b1 00 00 00 je 411e7c <_ZN7Spatter13ConfigurationINS_6OpenMPEE14scatter_gatherEbm._omp_fn.2+0xcc>
411dcb: 49 89 fc mov %rdi,%r12
411dce: e8 cd 1f ff ff call 403da0 omp_get_num_threads@plt
411dd3: 41 89 c6 mov %eax,%r14d
411dd6: e8 b5 1e ff ff call 403c90 omp_get_thread_num@plt
411ddb: 49 63 ce movslq %r14d,%rcx
411dde: 31 d2 xor %edx,%edx
411de0: 48 63 d8 movslq %eax,%rbx
411de3: 48 89 e8 mov %rbp,%rax
411de6: 48 f7 f1 div %rcx
411de9: 48 89 c5 mov %rax,%rbp
411dec: 48 39 d3 cmp %rdx,%rbx
411def: 0f 82 90 00 00 00 jb 411e85 <_ZN7Spatter13ConfigurationINS_6OpenMPEE14scatter_gatherEbm._omp_fn.2+0xd5>
411df5: 48 0f af dd imul %rbp,%rbx
411df9: 48 01 d3 add %rdx,%rbx
411dfc: 48 01 dd add %rbx,%rbp
411dff: 48 39 eb cmp %rbp,%rbx
411e02: 73 78 jae 411e7c <_ZN7Spatter13ConfigurationINS_6OpenMPEE14scatter_gatherEbm._omp_fn.2+0xcc>
411e04: 49 8b 7c 24 08 mov 0x8(%r12),%rdi
411e09: 48 85 ff test %rdi,%rdi
411e0c: 74 6e je 411e7c <_ZN7Spatter13ConfigurationINS_6OpenMPEE14scatter_gatherEbm._omp_fn.2+0xcc>
411e0e: 49 8b 85 b0 00 00 00 mov 0xb0(%r13),%rax
411e15: 4d 8b 5d 68 mov 0x68(%r13),%r11
411e19: 4d 8b b5 08 01 00 00 mov 0x108(%r13),%r14
411e20: 4d 8b 8d 80 00 00 00 mov 0x80(%r13),%r9
411e27: 4c 8b 10 mov (%rax),%r10
411e2a: 49 8b 85 c8 00 00 00 mov 0xc8(%r13),%rax
411e31: 4d 8b a5 10 01 00 00 mov 0x110(%r13),%r12
411e38: 4c 8b 00 mov (%rax),%r8
411e3b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
411e40: 48 89 de mov %rbx,%rsi
411e43: 48 89 d9 mov %rbx,%rcx
411e46: 31 c0 xor %eax,%eax
411e48: 49 0f af f6 imul %r14,%rsi
411e4c: 49 0f af cc imul %r12,%rcx
411e50: 49 8b 14 c3 mov (%r11,%rax,8),%rdx
411e54: 48 01 f2 add %rsi,%rdx
411e57: f2 41 0f 10 04 d2 movsd (%r10,%rdx,8),%xmm0
411e5d: 49 8b 14 c1 mov (%r9,%rax,8),%rdx
411e61: 48 83 c0 01 add $0x1,%rax
411e65: 48 01 ca add %rcx,%rdx
411e68: f2 41 0f 11 04 d0 movsd %xmm0,(%r8,%rdx,8)
411e6e: 48 39 c7 cmp %rax,%rdi
411e71: 75 dd jne 411e50 <_ZN7Spatter13ConfigurationINS_6OpenMPEE14scatter_gatherEbm._omp_fn.2+0xa0>
411e73: 48 83 c3 01 add $0x1,%rbx
411e77: 48 39 dd cmp %rbx,%rbp
411e7a: 75 c4 jne 411e40 <_ZN7Spatter13ConfigurationINS_6OpenMPEE14scatter_gatherEbm._omp_fn.2+0x90>
411e7c: 5b pop %rbx
411e7d: 5d pop %rbp
411e7e: 41 5c pop %r12
411e80: 41 5d pop %r13
411e82: 41 5e pop %r14
411e84: c3 ret
411e85: 48 83 c5 01 add $0x1,%rbp
411e89: 31 d2 xor %edx,%edx
411e8b: e9 65 ff ff ff jmp 411df5 <_ZN7Spatter13ConfigurationINS_6OpenMPEE14scatter_gatherEbm._omp_fn.2+0x45>

Disassembly of section .fini:


Version 1.1 (Serial Backend)

spatter-serial: file format elf64-x86-64

Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

000000000041dcc0 <sg_smallbuf_serial>: 41dcc0: 41 56 push %r14 41dcc2: 41 55 push %r13 41dcc4: 41 54 push %r12 41dcc6: 55 push %rbp 41dcc7: 53 push %rbx 41dcc8: 4c 8b 6c 24 38 mov 0x38(%rsp),%r13 41dccd: 4c 8b 74 24 30 mov 0x30(%rsp),%r14 41dcd2: 4d 85 ed test %r13,%r13 41dcd5: 74 4b je 41dd22 <sg_smallbuf_serial+0x62> 41dcd7: 4d 85 c0 test %r8,%r8 41dcda: 74 46 je 41dd22 <sg_smallbuf_serial+0x62> 41dcdc: 31 ed xor %ebp,%ebp 41dcde: 31 db xor %ebx,%ebx 41dce0: 45 31 e4 xor %r12d,%r12d 41dce3: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 41dce8: 31 c0 xor %eax,%eax 41dcea: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) 41dcf0: 4c 8b 1c c2 mov (%rdx,%rax,8),%r11 41dcf4: 4c 8b 14 c1 mov (%rcx,%rax,8),%r10 41dcf8: 48 83 c0 01 add $0x1,%rax 41dcfc: 49 01 db add %rbx,%r11 41dcff: 49 01 ea add %rbp,%r10 41dd02: f2 42 0f 10 04 df movsd (%rdi,%r11,8),%xmm0 41dd08: f2 42 0f 11 04 d6 movsd %xmm0,(%rsi,%r10,8) 41dd0e: 49 39 c0 cmp %rax,%r8 41dd11: 75 dd jne 41dcf0 <sg_smallbuf_serial+0x30> 41dd13: 49 83 c4 01 add $0x1,%r12 41dd17: 4c 01 cb add %r9,%rbx 41dd1a: 4c 01 f5 add %r14,%rbp 41dd1d: 4d 39 e5 cmp %r12,%r13 41dd20: 75 c6 jne 41dce8 <sg_smallbuf_serial+0x28> 41dd22: 5b pop %rbx 41dd23: 5d pop %rbp 41dd24: 41 5c pop %r12 41dd26: 41 5d pop %r13 41dd28: 41 5e pop %r14 41dd2a: c3 ret

Disassembly of section .fini:

Version 2 (Serial Backend)

spatter-serial: file format elf64-x86-64

Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

0000000000411330 <_ZN7Spatter13ConfigurationINS_6SerialEE14scatter_gatherEbm>: 411330: 41 57 push %r15 411332: 41 56 push %r14 411334: 41 55 push %r13 411336: 41 54 push %r12 411338: 49 89 fc mov %rdi,%r12 41133b: 55 push %rbp 41133c: 53 push %rbx 41133d: 48 83 ec 18 sub $0x18,%rsp 411341: 48 8b 9f 88 00 00 00 mov 0x88(%rdi),%rbx 411348: 48 2b 9f 80 00 00 00 sub 0x80(%rdi),%rbx 41134f: 48 89 14 24 mov %rdx,(%rsp) 411353: 48 c1 fb 03 sar $0x3,%rbx 411357: 40 84 f6 test %sil,%sil 41135a: 0f 85 a0 00 00 00 jne 411400 <_ZN7Spatter13ConfigurationINS_6SerialEE14scatter_gatherEbm+0xd0> 411360: 4c 8b af 28 01 00 00 mov 0x128(%rdi),%r13 411367: 4d 85 ed test %r13,%r13 41136a: 0f 84 81 00 00 00 je 4113f1 <_ZN7Spatter13ConfigurationINS_6SerialEE14scatter_gatherEbm+0xc1> 411370: 48 85 db test %rbx,%rbx 411373: 74 7c je 4113f1 <_ZN7Spatter13ConfigurationINS_6SerialEE14scatter_gatherEbm+0xc1> 411375: 49 8b 84 24 b0 00 00 mov 0xb0(%r12),%rax 41137c: 00 41137d: 4d 8b 5c 24 68 mov 0x68(%r12),%r11 411382: 31 ed xor %ebp,%ebp 411384: 4d 8b bc 24 08 01 00 mov 0x108(%r12),%r15 41138b: 00 41138c: 4d 8b 8c 24 80 00 00 mov 0x80(%r12),%r9 411393: 00 411394: 4c 8b 10 mov (%rax),%r10 411397: 49 8b 84 24 c8 00 00 mov 0xc8(%r12),%rax 41139e: 00 41139f: 4d 8b b4 24 10 01 00 mov 0x110(%r12),%r14 4113a6: 00 4113a7: 4c 8b 00 mov (%rax),%r8 4113aa: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) 4113b0: 4c 89 ff mov %r15,%rdi 4113b3: 4c 89 f1 mov %r14,%rcx 4113b6: 31 c0 xor %eax,%eax 4113b8: 48 0f af fd imul %rbp,%rdi 4113bc: 48 0f af cd imul %rbp,%rcx 4113c0: 49 8b 14 c3 mov (%r11,%rax,8),%rdx 4113c4: 48 01 fa add %rdi,%rdx 4113c7: f2 41 0f 10 04 d2 movsd (%r10,%rdx,8),%xmm0 4113cd: 49 8b 14 c1 mov (%r9,%rax,8),%rdx 4113d1: 48 83 c0 01 add $0x1,%rax 4113d5: 48 01 ca add %rcx,%rdx 4113d8: f2 41 0f 11 04 d0 movsd %xmm0,(%r8,%rdx,8) 4113de: 48 39 c3 cmp %rax,%rbx 4113e1: 75 dd jne 4113c0 <_ZN7Spatter13ConfigurationINS_6SerialEE14scatter_gatherEbm+0x90> 4113e3: 48 83 c5 01 add $0x1,%rbp 4113e7: 4c 39 ed cmp 
%r13,%rbp 4113ea: 72 c4 jb 4113b0 <_ZN7Spatter13ConfigurationINS_6SerialEE14scatter_gatherEbm+0x80> 4113ec: 40 84 f6 test %sil,%sil 4113ef: 75 73 jne 411464 <_ZN7Spatter13ConfigurationINS_6SerialEE14scatter_gatherEbm+0x134> 4113f1: 48 83 c4 18 add $0x18,%rsp 4113f5: 5b pop %rbx 4113f6: 5d pop %rbp 4113f7: 41 5c pop %r12 4113f9: 41 5d pop %r13 4113fb: 41 5e pop %r14 4113fd: 41 5f pop %r15 4113ff: c3 ret
411400: 48 8d af 60 01 00 00 lea 0x160(%rdi),%rbp 411407: 89 74 24 0c mov %esi,0xc(%rsp) 41140b: 48 89 ef mov %rbp,%rdi 41140e: e8 cd 3e 02 00 call 4352e0 <_ZN7Spatter5Timer5startEv> 411413: 4d 8b ac 24 28 01 00 mov 0x128(%r12),%r13 41141a: 00 41141b: 8b 74 24 0c mov 0xc(%rsp),%esi 41141f: 4d 85 ed test %r13,%r13 411422: 74 09 je 41142d <_ZN7Spatter13ConfigurationINS_6SerialEE14scatter_gatherEbm+0xfd> 411424: 48 85 db test %rbx,%rbx 411427: 0f 85 48 ff ff ff jne 411375 <_ZN7Spatter13ConfigurationINS_6SerialEE14scatter_gatherEbm+0x45> 41142d: 48 89 ef mov %rbp,%rdi 411430: e8 cb 3e 02 00 call 435300 <_ZN7Spatter5Timer4stopEv> 411435: 48 89 ef mov %rbp,%rdi 411438: e8 f3 3e 02 00 call 435330 <_ZNK7Spatter5Timer7secondsEv> 41143d: 49 8b 84 24 78 01 00 mov 0x178(%r12),%rax 411444: 00 411445: 48 8b 34 24 mov (%rsp),%rsi 411449: 48 89 ef mov %rbp,%rdi 41144c: f2 0f 11 04 f0 movsd %xmm0,(%rax,%rsi,8) 411451: 48 83 c4 18 add $0x18,%rsp 411455: 5b pop %rbx 411456: 5d pop %rbp 411457: 41 5c pop %r12 411459: 41 5d pop %r13 41145b: 41 5e pop %r14 41145d: 41 5f pop %r15 41145f: e9 dc 3e 02 00 jmp 435340 <_ZN7Spatter5Timer5clearEv> 411464: 49 8d ac 24 60 01 00 lea 0x160(%r12),%rbp 41146b: 00 41146c: eb bf jmp 41142d <_ZN7Spatter13ConfigurationINS_6SerialEE14scatter_gatherEbm+0xfd>

Disassembly of section .fini:
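
One way to compare loop regions such as those listed above is to reduce each instruction to its mnemonic and check that both versions execute the same sequence, since register allocation and addresses are expected to differ between builds. The following is a minimal sketch of that idea, not the procedure used to produce this page. It assumes the tab-separated line layout that `objdump -d` normally emits; the helper name `mnemonics` is hypothetical. The sample lines are the inner loops of the GatherScatter OpenMP kernels (v1.1 at 41e838 to 41e858, v2 at 411e50 to 411e71).

```python
import re

# Assumed layout of one `objdump -d` instruction line:
#   "  <addr>:\t<hex bytes> \t<mnemonic> <operands>"
INSN = re.compile(r"^\s*([0-9a-f]+):\t[0-9a-f ]+\t(\S+)")

def mnemonics(disassembly, start, end):
    """Return mnemonics of instructions whose address lies in [start, end]."""
    found = []
    for line in disassembly.splitlines():
        match = INSN.match(line)
        if match and start <= int(match.group(1), 16) <= end:
            found.append(match.group(2))
    return found

# Inner loop of sg_smallbuf._omp_fn.6 (Spatter v1.1, OpenMP backend).
V1_LOOP = (
    "  41e838:\t49 8b 14 c2 \tmov    (%r10,%rax,8),%rdx\n"
    "  41e83c:\t48 01 f2 \tadd    %rsi,%rdx\n"
    "  41e83f:\tf2 0f 10 04 d7 \tmovsd  (%rdi,%rdx,8),%xmm0\n"
    "  41e844:\t49 8b 14 c1 \tmov    (%r9,%rax,8),%rdx\n"
    "  41e848:\t48 83 c0 01 \tadd    $0x1,%rax\n"
    "  41e84c:\t48 01 ca \tadd    %rcx,%rdx\n"
    "  41e84f:\tf2 41 0f 11 04 d3 \tmovsd  %xmm0,(%r11,%rdx,8)\n"
    "  41e855:\t49 39 c0 \tcmp    %rax,%r8\n"
    "  41e858:\t75 de \tjne    41e838\n"
)

# Inner loop of scatter_gather ... _omp_fn.2 (Spatter v2, OpenMP backend).
V2_LOOP = (
    "  411e50:\t49 8b 14 c3 \tmov    (%r11,%rax,8),%rdx\n"
    "  411e54:\t48 01 f2 \tadd    %rsi,%rdx\n"
    "  411e57:\tf2 41 0f 10 04 d2 \tmovsd  (%r10,%rdx,8),%xmm0\n"
    "  411e5d:\t49 8b 14 c1 \tmov    (%r9,%rax,8),%rdx\n"
    "  411e61:\t48 83 c0 01 \tadd    $0x1,%rax\n"
    "  411e65:\t48 01 ca \tadd    %rcx,%rdx\n"
    "  411e68:\tf2 41 0f 11 04 d0 \tmovsd  %xmm0,(%r8,%rdx,8)\n"
    "  411e6e:\t48 39 c7 \tcmp    %rax,%rdi\n"
    "  411e71:\t75 dd \tjne    411e50\n"
)

v1_mnemonics = mnemonics(V1_LOOP, 0x41e838, 0x41e858)
v2_mnemonics = mnemonics(V2_LOOP, 0x411e50, 0x411e71)
print("inner loops match:", v1_mnemonics == v2_mnemonics)
```

A mnemonic-only comparison is deliberately coarse: it ignores operands, so it can confirm that two loops have the same shape but not that they touch the same data. In the listings above the inner loops agree, while the outer loops differ: v1.1 computes its two `imul` products once before the outer loop and then advances them with `add`, whereas v2 executes both `imul` instructions on every outer iteration. Whether that difference contributes to any measured performance gap is not established here.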

Using VTune to validate kernel functionality

Spatter was profiled with VTune to record the number of loads, stores, and instructions retired in the kernels. These metrics were used to investigate the performance gap observed on different platforms for the scatter, multiscatter, and gather-scatter kernels. VTune's memory access analysis was used to associate the number of loads and stores with each disassembled instruction from the kernel. Similarly, VTune's microarchitecture exploration analysis was used to record the number of instructions retired in the kernels.
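
As a rough cross-check on such per-instruction counts, the loads and stores issued by one iteration of an inner loop can be estimated directly from the disassembly. The sketch below is a heuristic under stated assumptions, not part of the VTune workflow: it only looks at `mov`-family instructions in AT&T syntax and treats a parenthesised source operand as a load and a parenthesised destination operand as a store. The function names are hypothetical. The instruction tuples are copied from the inner loop of the Version 2 OpenMP MultiGather listing (addresses 411f48 to 411f67).

```python
import re

def split_operands(operands):
    """Split AT&T operands on commas that are not inside parentheses."""
    return [op for op in re.split(r",(?![^()]*\))", operands) if op]

def count_memory_ops(instructions):
    """Estimate (loads, stores) for mov-family instructions only.

    Assumption: in AT&T syntax the last operand is the destination, and an
    operand containing "(" is a memory reference.
    """
    loads = stores = 0
    for mnemonic, operands in instructions:
        if not mnemonic.startswith("mov"):
            continue  # add/cmp/jne here use registers and immediates only
        *sources, dest = split_operands(operands)
        loads += sum("(" in src for src in sources)
        stores += "(" in dest
    return loads, stores

# Inner loop of the v2 OpenMP multi_gather kernel, one iteration.
MULTIGATHER_INNER_LOOP = [
    ("mov", "(%r9,%rax,8),%rdx"),
    ("mov", "(%r8,%rdx,8),%r14"),
    ("add", "%rcx,%r14"),
    ("movsd", "(%rdi,%r14,8),%xmm0"),
    ("movsd", "%xmm0,0x0(%r13,%rax,8)"),
    ("add", "$0x1,%rax"),
    ("cmp", "%rax,%rsi"),
    ("jne", "411f48"),
]

loads, stores = count_memory_ops(MULTIGATHER_INNER_LOOP)
print(f"per iteration: {loads} loads, {stores} stores, "
      f"{len(MULTIGATHER_INNER_LOOP)} instructions")
```

Counts obtained this way are static: they describe what one iteration issues, whereas the profiler reports sampled hardware events, so the two should be compared as ratios rather than as exact totals.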

The same application parameters were provided to VTune for both the memory access analysis and the microarchitecture exploration analysis; only the kernel (-k flag) varied with the kernel being tested. The memory access and microarchitecture exploration configurations were likewise identical for all of the kernels tested.

Application Parameters

-pUNIFORM:8:1 -kscatter -l16777216 -v2 -t12
Configuration for Memory Access Analysis
Configuration for Microarchitecture Exploration Analysis
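
For readers who prefer to script these runs, the application parameters above could be passed to VTune's command-line driver roughly as follows. This is a sketch under assumptions: it uses the `vtune -collect <analysis> -result-dir <dir> -- <app> <args>` form with the `memory-access` and `uarch-exploration` analysis types, while the binary path and result directory names are hypothetical. The exact analysis settings used for this page are not reproduced here, so no additional knobs are set.

```python
import shlex

def spatter_args(kernel):
    """Application parameters from this page; only the kernel (-k) varies."""
    return ["-pUNIFORM:8:1", f"-k{kernel}", "-l16777216", "-v2", "-t12"]

def vtune_command(analysis, kernel, binary="./build_openmp/spatter"):
    """Build a VTune command line for one analysis type and one kernel.

    Assumptions: `binary` is a hypothetical path to the Spatter executable,
    and the result directory naming scheme is purely illustrative.
    """
    result_dir = f"vtune_{analysis}_{kernel}"
    return ["vtune", "-collect", analysis, "-result-dir", result_dir,
            "--", binary, *spatter_args(kernel)]

# Print one command per analysis for the scatter kernel shown above.
for analysis in ("memory-access", "uarch-exploration"):
    print(shlex.join(vtune_command(analysis, "scatter")))
```

Building the argument list in one place keeps the application parameters identical across both analyses, which is the property the comparison relies on.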

Gather Kernel

Memory Access

Version 1.1 (OpenMP Backend)
Version 2 (OpenMP Backend)

Microarchitecture Exploration

Version 1.1 (OpenMP Backend)
Version 2 (OpenMP Backend)

Scatter Kernel

Memory Access

Version 1.1 (OpenMP Backend)
Version 2 (OpenMP Backend)

Microarchitecture Exploration

Version 1.1 (OpenMP Backend)
Version 2 (OpenMP Backend)

MultiGather Kernel

Memory Access

Version 1.1 (OpenMP Backend)
Version 2 (OpenMP Backend)

Microarchitecture Exploration

Version 1.1 (OpenMP Backend)
Version 2 (OpenMP Backend)

MultiScatter Kernel

Memory Access

Version 1.1 (OpenMP Backend)
Version 2 (OpenMP Backend)

Microarchitecture Exploration

Version 1.1 (OpenMP Backend)
Version 2 (OpenMP Backend)

GatherScatter Kernel

Memory Access

Version 1.1 (OpenMP Backend)
Version 2 (OpenMP Backend)

Microarchitecture Exploration

Version 1.1 (OpenMP Backend)
Version 2 (OpenMP Backend)

Current Limitations

There is a performance gap between the 1.1 and 2.0 gather-scatter OpenMP kernels, as shown in the output below, which was generated on an Intel Xeon Gold 6226R (Cascade Lake) platform: version 1.1 reports about 103,339 MB/s, while version 2.0 reports about 54,906 MB/s. Note: the program arguments passed to Spatter differ slightly because of changes in version 2.0.

Spatter 1.1

Running Spatter version 1.0
Compiler: GNU ver. 8.5.0
Compiler Location: /usr/bin/gcc
Backend: OPENMP
Aggregate Results? YES

Run Configurations
[ {'name':'', 'kernel':'GS', 'pattern':[], 'pattern_gather':[0,1,2,3,4,5,6,7], 'pattern_scatter':[0,1,2,3,4,5,6,7], 'deltas':[], 'delta_gather':8, 'delta_scatter':8, 'length':16777216, 'agg':10, 'wrap':1, 'threads':12} ]

config  bytes        time(s)      bw(MB/s)    
0       2147483648   0.02078      103338.951358

Min         25%          Med          75%          Max         
103339       103339       103339       103339       103339      
H.Mean       H.StdErr    
103339       0
Program Arguments
-gUNIFORM:8:1 -hUNIFORM:8:1 -kgs -l16777216 -v2 -t12

Spatter 2.0

Running Spatter version 1.1
Compiler: GNU ver. 8.5.0
Backend: OpenMP
Aggregate Results? NO

Run Configurations
[ {'id': 0, 'kernel': 'sg', 'pattern': [], 'pattern-gather': [0, 1, 2, 3, 4, 5, 6, 7], 'pattern-scatter': [0, 1, 2, 3, 4, 5, 6, 7], 'delta': 8, 'delta-gather': 8, 'delta-scatter': 8, 'count': 16777216, 'wrap': 1, 'threads': 12} ]

config         bytes          time(s)        bw(MB/s)       
0              2147483648     0.0391121      54905.9
Program Arguments
-gUNIFORM:8:1 -uUNIFORM:8:1 -ksg -l16777216 -v2 -t12
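
The size of the gap can be read off the two outputs above with a little arithmetic. The sketch below recomputes bandwidth as bytes divided by time, taking 1 MB as 10^6 bytes, which is an assumption that happens to reproduce the reported figures. The byte count is also consistent with count × pattern length × 8 bytes × 2 (one gather plus one scatter), although that reading of how the total is derived is an assumption as well.

```python
def bandwidth_mb_per_s(num_bytes, seconds):
    """Bandwidth in MB/s, assuming 1 MB = 10**6 bytes."""
    return num_bytes / seconds / 1e6

NUM_BYTES = 2147483648   # "bytes" column, identical in both runs
TIME_V1 = 0.02078        # Spatter 1.1, time(s)
TIME_V2 = 0.0391121      # Spatter 2.0, time(s)

# Assumed derivation of the byte count: 16777216 iterations, 8 indices per
# pattern, 8-byte elements, read once (gather) and written once (scatter).
assert 16777216 * 8 * 8 * 2 == NUM_BYTES

bw_v1 = bandwidth_mb_per_s(NUM_BYTES, TIME_V1)
bw_v2 = bandwidth_mb_per_s(NUM_BYTES, TIME_V2)

print(f"v1.1: {bw_v1:.1f} MB/s, v2.0: {bw_v2:.1f} MB/s")
print(f"v2.0 bandwidth relative to v1.1: {bw_v2 / bw_v1:.2f}")
print(f"v2.0 runtime relative to v1.1:   {TIME_V2 / TIME_V1:.2f}")
```

The recomputed v1.1 value differs slightly from the reported 103338.951358 MB/s, presumably because the printed time is rounded. On this platform and configuration, version 2.0 delivers a little over half the bandwidth of version 1.1.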

Summary

This validation process showed that the two codebases perform equivalent operations, specifically for the gather and scatter kernels. However, because of the large number of changes (in particular the migration from C to C++ data structures), exact parity between the kernels of the two versions is not feasible. We therefore strongly recommend using a single tagged version of the codebase for benchmarking and for any comparison between architectures.
