Merge pull request #204 from jyoung3131/documentation-updates

Updated README for refactor. Added small change to gitignore
hpcgarage · Aug 5, 2024 · b94bb32
2 parents 4a78a54 + 65f1041
commit b94bb32
Showing 2 changed files with 45 additions and 96 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,12 +1,11 @@
 *.swp
 **/*.swp
-sgbench
 docs/*
 build/*
 build_*
 modulefiles/
-/configure_*
 src/python/env
 .vscode/
 *.pyc
 spatter-cuda-test.out
+json.tar.xz
diff --git a/README.md b/README.md
@@ -33,33 +33,33 @@ MultiGather:
 This diagram depicts a combined Gather/Scatter. Gather performs on the top half of this diagram and Scatter the second half.
 
 ## Building
-CMake is required to build Spatter
+CMake is required to build Spatter. Currently we require CMake 3.25 or newer.
 
 To build with CMake from the main source directory, use the following command structure:
 ```
-cmake -DCMAKE_BUILD_TYPE=<BUILD_TYPE> -DBACKEND=<BACKEND> -DCOMPILER=<COMPILER> -D<OPTIONAL> -B build_<BACKEND>_<COMPILER>_<OPTIONAL> -S .
-cd build_<BACKEND>_<COMPILER>_<OPTIONAL>
+cmake -DCMAKE_BUILD_TYPE=<BUILD_TYPE> -DUSE_<OPENMP/CUDA/MPI>=1 -B build_<BACKEND> -S .
+cd build_<BACKEND>
 make
 ```
-For example, to do a debug build with the serial backend and the GNU compiler:
+For example, to do a debug build with the serial backend:
 ```
-cmake -D CMAKE_BUILD_TYPE=Debug -DBACKEND=serial -DCOMPILER=gnu -B build_serial_gnu -S .
-cd build_serial_gnu
+cmake -DCMAKE_BUILD_TYPE=Debug -B build_serial -S .
+cd build_serial
 make
 ```
-To do an MPI build with GNU, note that `Release` build is the default choice when not specified:
+To do an OpenMP and MPI build:
 ```
-cmake -DBACKEND=openmp -DCOMPILER=gnu -DUSE_MPI=1 -B build_openmp_gnu_mpi -S .
+cmake -DUSE_OPENMP=1 -DUSE_MPI=1 -B build_openmp_mpi -S .
 ```
 
-To do an CUDA build, specify nvcc as the compiler. Note we usually use NVHPC to build Spatter:
+For CUDA builds, we normally load CUDA 11/12 using NVHPC:
 ```
-cmake -DBACKEND=cuda -DCOMPILER=nvcc -B build_cuda -S .
+cmake -DUSE_CUDA=1 -B build_cuda -S .
 ```
 For a complete list of build options, see [Build.md](Build.md)
 
 ## Running Spatter
-Spatter is highly configurable, but a basic run is rather simple. You must at least specify a pattern with `-p` and you should probably speficy a length with `-l`. Spatter will print out the time it took to perform the number of gathers you requested with `-l` and it will print out a bandwwidth. As a sanity check, the following run should give you a number close to your STREAM bandwith, although we note that this is a one-sided operation - it only performs gathers (reads).
+Spatter is highly configurable, but a basic run is rather simple. You must at least specify a pattern with `-p` and you should specify a length with `-l`. Spatter will print out the time it took to perform the number of gathers you requested with `-l` and it will print out a bandwidth. As a sanity check, the following run should give you a number close to your STREAM bandwidth, although we note that this is a one-sided operation: it only performs gathers (reads).
 ```
 ./spatter -pUNIFORM:8:1 -l$((2**24))
 ```
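The configure commands in the Building section of this hunk share one shape: a build type, one `-DUSE_<BACKEND>=1` flag per enabled backend, and a build directory named after those backends. The following is a minimal sketch of that convention, not part of Spatter itself. The helper name `spatter_configure_cmd` is hypothetical, and it only prints the command rather than running CMake.

```shell
# Hypothetical helper: print the CMake configure command for a build type
# and zero or more backends (openmp, cuda, mpi), following the README's
# naming convention for build directories.
spatter_configure_cmd() {
    build_type=$1
    shift
    flags=""
    suffix=""
    for backend in "$@"; do
        # openmp -> OPENMP, so the flag becomes -DUSE_OPENMP=1
        upper=$(printf '%s' "$backend" | tr '[:lower:]' '[:upper:]')
        flags="$flags -DUSE_${upper}=1"
        suffix="${suffix}_${backend}"
    done
    # With no backend flags, the README example builds the serial backend.
    if [ -z "$suffix" ]; then
        suffix="_serial"
    fi
    echo "cmake -DCMAKE_BUILD_TYPE=${build_type}${flags} -B build${suffix} -S ."
}

spatter_configure_cmd Debug
spatter_configure_cmd Release openmp mpi
```

The two calls print the serial debug command and an OpenMP plus MPI command in the same form as the README examples, with the build type written out explicitly.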
@@ -77,92 +77,44 @@ Spatter has a large number of arguments, broken up into two types. Backend confi
 Backend configuration arguments determine which language and device will be used. Spatter can be compiled with support for multiple backends, so it is possible to choose between backends and devices at runtime. Spatter will attempt to intelligently pick a backend for you, so you may not need to worry about these arguments at all! It is only necessary to specify which `--backend` you want if you have compiled with support for more than one, and it is only necessary to specify which `--device` you want if there would be ambiguity (for instance, if you have more than one GPU available). If you want to see what Spatter has chosen for you, you can run with `--verbose`.
 
 ```
-./spatter --help
-Usage:
- [-qiac] [--help] [--verbose] [--validate] -p <pattern> [-k <kernel>] [-o <s>] [-d <delta[,delta,...]>] [-l <n>] [-w <n>] [-R <n>] [-t <n>] [-v <n>] [-z <n>] [-m <n>] [-n <name>] [-s [<n>]] [-b <backend>] [--cl-platform=<platform>] [--cl-device=<device>] [-f <FILE>] [--morton=<n>] [--hilbert=<n>] [--roblock=<n>] [--stride=<n>] [--papi=<s>]
- --help                       Displays info about commands and then exits.
- --verbose                    Print info about default arguments that you have not overridden.
- -q, --no-print-header        Do not print header information.
- -i, --interactive            Pick the platform and the device interactively.
- --validate                   TODO
---atomic-writes=<n>           Enable atomic writes for CUDA backend [Default 0/off] (TODO: OpenMP atomics)  
- -a, --aggregate              Report a minimum time for all runs of a given configuration for 2 or more runs. [Default 1] (Do not use with PAPI)
- -c, --compress               TODO
- -p, --pattern=<pattern>      Specify either a built-in pattern (i.e. UNIFORM), a custom pattern (i.e. 1,2,3,4), or a path to a json file with a run-configuration.
- -g, --pattern-gather=<pattern> Valid wtih [kernel-name: GS, MultiGather]. Specify either a built-in pattern (i.e. UNIFORM), a custom pattern (i.e. 1,2,3,4), or a path to a json file with a run-configuration.
- -h, --pattern-scatter=<pattern> Valid with [kernel-name: GS, MultiScatter]. Specify either a built-in pattern (i.e. UNIFORM), a custom pattern (i.e. 1,2,3,4), or a path to a json file with a run-configuration.
- -k, --kernel-name=<kernel>   Specify the kernel you want to run. [Default: Gather, Options: Gather, Scatter, GS, MultiGather, MultiScatter]
- -o, --op=<s>                 TODO
- -d, --delta=<delta[,delta,...]> Specify one or more deltas. [Default: 8]
- -x, --delta-gather=<delta[,delta,...]> Specify one or more deltas. [Default: 8]
- -y, --delta-scatter=<delta[,delta,...]> Specify one or more deltas. [Default: 8] 
- -e, --boundary=<n>           Specify the boundary to mod pattern indices with to limit data array size.
- -j, --pattern-size=<n>       Valid with [kernel-name: Gather, Scatter] and custom patterns (i.e. not UNIFORM, MS1, LAPLACIAN, etc.). Size of Gather/Scatter pattern. Pattern will be truncated to size if used.
- -u, --strong-scale=<0,1>     Enable Strong Scaling (Will Split Pattern Evenly Amongst Ranks). [Default: Off]
- -l, --count=<n>              Number of Gathers or Scatters to perform.
- -w, --wrap=<n>               Number of independent slots in the small buffer (source buffer if Scatter, Target buffer if Gather. [Default: 1]
- -R, --runs=<n>               Number of times to repeat execution of the kernel. [Default: 10]
- -t, --omp-threads=<n>        Number of OpenMP threads. [Default: OMP_MAX_THREADS]
- -v, --vector-len=<n>         TODO
- -z, --local-work-size=<n>    Number of Gathers or Scatters performed by each thread on a GPU. [Default: 1024]
- -m, --shared-memory=<n>      Amount of dummy shared memory to allocate on GPUs (used for occupancy control).
- -n, --name=<name>            Specify and name this configuration in the output.
- -s, --random=[<n>]           Sets the seed, or uses a random one if no seed is specified.
- -b, --backend=<backend>      Specify a backend: OpenCL, OpenMP, CUDA, or Serial.
- --cl-platform=<platform>     Specify platform if using OpenCL (case-insensitive, fuzzy matching).
- --cl-device=<device>         Specify device if using OpenCL (case-insensitive, fuzzy matching).
- -f, --kernel-file=<FILE>     Specify the location of an OpenCL kernel file.
-```
-        
-        
-
-#### Benchmark Configuration
-The second set of arguments are benchmark  configuration arguments, and these define how the benchmark is run, including the pattern used and the amount of data that is moved. These arguments are special because you can supply multiple sets of benchmark configurations to spatter so that many runs can be performed at once. This way, memory is allocated only once which greatly reduces the amount of time needed to collect a large amount of data.
-
-```
-./spatter <arguments>
-    -p, --pattern=<Built-in pattern>
-    -p, --pattern=FILE=<config file>
-        See the section on Patterns. 
-    -g, --pattern-gather=<Built-in pattern>
-    -g, --pattern-gather=FILE=<config file>
-        See the section on Patterns. (Used with kernel=GS and MultiGather)
-    -h, --pattern-scatter=<Built-in pattern>
-    -h, --pattern-scatter=FILE=<config file>
-        See the section on Patterns. (Used with kernel=GS and MultiScatter)
-    -k, --kernel-name=<kernel>
-        Specify the kernel you want to run [Default: Gather]
-    -d, --delta=<delta[,delta,...]>
-        Specify one or more deltas [Default: 8]
-    -x, --delta-gather=<delta[,delta,...]>
-        Specify one or more deltas [Default: 8] (Used with kernel=GS)
-    -y --delta-scatter=<delta[,delta,...]>
-        Specify one or more deltas [Default: 8] (Used with kernel=GS)
-    -l, --count=<N>
-        Number of Gathers or Scatters to do
-    -w, --wrap=<N>
-        Number of independent slots in the "small" buffer (Source buffer if Scatter, Target buffer if Gather) [Default: 1]
-    -R, --runs=<N>
-        Number of times to repeat execution of the kernel. [Default: 10]
-    -t, --omp-thread=<N>
-        Number of OpenMP threads [Default: OMP_MAX_THREADS]
-    -z, --local-work-size=<N>
-        Number of Gathers or Scatters performed by each thread on a GPU
-    -s, --shared-memory=<N>
-        Amount of dummy shared memory to allocate on GPUs (used for occupancy control)
-    -n, --name=<NAME>
-        Specify and name used to identify this configuration in the output
-    
-```
+$> ./spatter --help
+
+Usage: ./spatter
+-a (--aggregate) Aggregate (default off)
+   (--atomic-writes) Enable atomic writes for CUDA backend (default 0/off)
+-b (--backend) Backend (default serial)
+-c (--compress) Enable compression of pattern indices
+-d (--delta) Delta (default 8)
+-e (--boundary) Set Boundary (limits max value of pattern using modulo)
+-f (--file) Input File
+-g (--pattern-gather) Set Inner Gather Pattern (Valid with kernel-name: sg, multigather)
+-h (--help) Print Help Message
+-j (--pattern-size) Set Pattern Size (truncates pattern to pattern-size)
+-k (--kernel) Kernel (default gather)
+-l (--count) Set Number of Gathers or Scatters to Perform (default 1024)
+-m (--shared-memory) Set Amount of Dummy Shared Memory to Allocate on GPUs
+-n (--name) Specify the Configuration Name
+-p (--pattern) Set Pattern
+-r (--runs) Set Number of Runs (default 10)
+-s (--random) Set Random Seed (default random)
+-t (--omp-threads) Set Number of Threads (default 1 if !USE_OPENMP or backend != openmp or OMP_MAX_THREADS if USE_OPENMP)
+-u (--pattern-scatter) Set Inner Scatter Pattern (Valid with kernel-name: sg, multiscatter)
+-v (--verbosity) Set Verbosity Level (default 1)
+-w (--wrap) Set Wrap (default 1)
+-x (--delta-gather) Delta (default 8)
+-y (--delta-scatter) Delta (default 8)
+-z (--local-work-size) Set Local Work Size (default 1024)
+```
 
 #### Pattern
-Spatter supports two built-in pattners, uniform stride and mostly stride-1. 
+Spatter supports a few built-in patterns, such as uniform stride, mostly stride-1, and Laplacian. 
 
 ```
 Uniform:
     -pUNIFORM:<length>:<gap>
         Length is the length of the pattern, and gap is the size of each jump. 
         E.g. UNIFORM:8:4 -> [0,4,8,12,16,20,24,28]
+
 Mostly Stride-1
     -pMS1:<length>:<gap_locations>:<gap(s)>
         Length is the length of the pattern, gap_locations are the places within the pattern
@@ -182,7 +134,6 @@ Laplacian:
              LAPLACIAN:3:1:100 -> [0,9900,9999,10000,10001,10100,20000] // 7-point stencil (3D)
 
         The default delta is 1 for Laplacian patterns
-
 ```
 
 You can also simply specify your own pattern, of any length.
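
To make the `UNIFORM:<length>:<gap>` notation concrete, here is a small sketch that expands it into the index list it stands for. It is an illustration of the rule described above (index `i` maps to `i * gap`), not Spatter's own parser, and the function name `uniform_pattern` is hypothetical.

```shell
# Hypothetical helper: expand UNIFORM:<length>:<gap> into its index list.
uniform_pattern() {
    length=$1
    gap=$2
    out=""
    i=0
    while [ "$i" -lt "$length" ]; do
        idx=$((i * gap))   # each entry jumps by <gap> from the previous one
        if [ -z "$out" ]; then
            out="$idx"
        else
            out="$out,$idx"
        fi
        i=$((i + 1))
    done
    printf '[%s]\n' "$out"
}

uniform_pattern 8 4   # prints [0,4,8,12,16,20,24,28]
```

The printed list matches the `UNIFORM:8:4` example in the pattern reference, and the same list could be passed to `-p` as a custom pattern without the brackets.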
@@ -264,11 +215,10 @@ Lavin, P., Young, J., Vuduc, R., Riedy, J., Vose, A. and Ernst, D., Evaluating G
 
 #### Dependencies: 
 
-* CMake 3.18+ 
-* A supported C/C++ 11 compiler 
+* CMake 3.25+ 
+* A supported C++ 17 compiler 
   * GCC 
   * Clang 
 * If using CUDA, CUDA 11.0+ 
 * If using OpenMP, OpenMP 3.0+
-  * Note: Issues have been reported in Mac systems with OpenMP. If you encounter issues finding OpenMP, please use Spatter in a Linux container. 
-* Spatter can also run serially
+  * Note: Issues have been reported with OpenMP on Mac systems. If you encounter issues finding OpenMP when building on macOS, please try to build and run Spatter in a Linux container.
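
Since this change raises the minimum CMake version from 3.18 to 3.25, a quick pre-build check can save a failed configure step. The sketch below is one possible way to do that and is not part of the Spatter repository. The helper name `version_ge` is hypothetical, and it assumes a `sort` that supports `-V` (GNU coreutils and recent BSD/macOS versions do).

```shell
# Hypothetical helper: succeed if version $1 is >= version $2.
version_ge() {
    # sort -V orders version strings; if the smaller of the two is $2,
    # then $1 is at least $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="3.25"
if command -v cmake >/dev/null 2>&1; then
    # The first line of `cmake --version` looks like "cmake version 3.25.1".
    found=$(cmake --version | head -n 1 | awk '{print $3}')
    if version_ge "$found" "$required"; then
        echo "CMake $found satisfies the $required+ requirement"
    else
        echo "CMake $found is older than $required"
    fi
else
    echo "cmake not found on PATH"
fi
```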