
Tensile Tuning Guide

likelovewant edited this page Aug 8, 2024 · 1 revision

First, set up the development environment as described in the Tensile wiki. That wiki is somewhat out of date, so some steps need updating before they work.

Set up Ubuntu 22.04.3. (Tuning cannot run on Windows because the required environment is not available there, so we tune the GPU on a Linux host machine and then use the final files to build on both Windows and Linux.)

1. Set up ROCm

sudo apt update && sudo apt install -y curl git vim ffmpeg gfortran libstdc++-12-dev cockpit openssh-server
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
sudo usermod -a -G render,video $LOGNAME # Adding current user to Video, Render groups. See prerequisites.
wget https://repo.radeon.com/amdgpu-install/6.1.1/ubuntu/jammy/amdgpu-install_6.1.60101-1_all.deb
sudo apt install ./amdgpu-install_6.1.60101-1_all.deb
sudo apt update
sudo apt install amdgpu-dkms
sudo apt install rocm

sudo reboot 

Then check that ROCm installed correctly:

rocminfo
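The rocminfo output is long, and later steps only need the gfx architecture name from it. A minimal sketch for pulling that out, assuming the architecture appears in the output as a gfx-prefixed token such as gfx1030 (the helper name gfx_targets is made up for this example):

```shell
# Print the unique gfx architecture names found in rocminfo-style text on stdin.
gfx_targets() {
    grep -oE 'gfx[0-9a-f]+' | sort -u
}

# Usage on a machine with ROCm installed:
#   rocminfo | gfx_targets
```

The Device ID still has to be read from the full rocminfo output by hand.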

2. Use the rocBLAS install script to set up the remaining dependencies:

git clone -b release/rocm-rel-6.1.1 https://github.com/ROCm/rocBLAS.git

cd rocBLAS

Run:

./install.sh -d

Once this finishes, the basic environment is set up. Now we move on to the dependencies from the wiki guide.

Ubuntu: sudo apt install python3 python3-yaml libomp-dev libboost-program-options-dev libboost-filesystem-dev libtinfo-dev

  • joblib library: pip install joblib (or possibly pip3 install joblib)

Optional Dependency

  • Ubuntu: sudo apt install libmsgpack-dev if you're using the msgpack backend
  • Ubuntu: sudo apt install libtinfo-dev if you're using the YAML backend

It is better to install both. This may not change anything, since ./install.sh -d has likely installed them already.

sudo apt install cmake

Then install LLVM. We install llvm-17, which has been compatible with ROCm so far.


wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
sudo add-apt-repository "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-17 main"
sudo apt-get update

sudo apt-get install -y clang-17 lldb-17 lld-17

If an error reports a missing link for msgpack, run:

pip install msgpack 

sudo apt-get install libzstd-dev

Now the environment is set up.
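Before moving on, it can help to confirm that the tools installed so far are actually on PATH. The following is only a sketch: the helper name missing_tools is made up here, and the list of tools in the usage line is an assumption based on the packages installed above.

```shell
# Print each named command that is not found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage, with tools this guide installs:
#   missing_tools rocminfo python3 cmake clang-17 git
```

Empty output means every listed command was found.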

3. Test the environment and get the library files

git clone https://github.com/ROCmSoftwarePlatform/Tensile

cd Tensile

If your GPU is not officially supported, you can try editing the architecture definitions in Tensile, for example in VS Code:

  1. Locate Supported Architectures: Look for any official supported architectures listed (e.g., "gfx1102").
  2. Add Your GPU Information: Beside each existing architecture, add your GPU's architecture information using a similar format.
  3. Find 'ISA' Lines: Search for lines containing 11, 0, 2 (or similar). This indicates the supported instruction set architecture.
  4. Modify 'ISA' Code: Add your GPU's ISA code alongside the existing one (e.g., ISA: 11, 0, 2, 11, 0, 3). Remember to replace 11, 0, 3 with the correct ISA code for your specific GPU.
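The ISA code in steps 3 and 4 can be derived from the gfx name. The sketch below assumes the usual naming convention, in which the last two characters of the name are the minor version and stepping (in hexadecimal) and the leading digits are the major version. That matches the gfx1102 and 11, 0, 2 pairing above, but verify the result against your Tensile checkout. The helper name gfx_to_isa is made up for this example.

```shell
# Convert a gfx architecture name to an "major, minor, stepping" ISA triple.
# Assumption: last char = stepping (hex), second-to-last = minor (hex),
# everything before that = major (decimal).
gfx_to_isa() {
    name="${1#gfx}"
    stepping="${name#"${name%?}"}"   # last character
    rest="${name%?}"
    minor="${rest#"${rest%?}"}"      # second-to-last character
    major="${rest%?}"                # remaining leading digits
    printf '%s, %d, %d\n' "$major" "$((0x$minor))" "$((0x$stepping))"
}

# Usage:
#   gfx_to_isa gfx1103
```

The printed triple is the text to search for, or to add next to an existing entry.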

A sample tuning file can be found in ./Tensile/Configs/rocblas_sgemm_example.yaml.

This file generates a library for gfx1030. If you are running on a different architecture, first edit the line that says ArchitectureName: "gfx1030" at the bottom of the file, and change the Device ID to your GPU's Device ID (get it from rocminfo). Then you can run the benchmark as follows:

mkdir build

cd build

../Tensile/bin/Tensile ../Tensile/Configs/rocblas_sgemm_example.yaml ./
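The ArchitectureName edit described above can also be scripted before running the benchmark. This is a sketch only: set_arch is a made-up helper name, it assumes the value is written in double quotes on the ArchitectureName line as quoted above, and it does not touch the Device ID, which still needs editing by hand.

```shell
# Rewrite the ArchitectureName value in a Tensile config file.
# $1 = path to the YAML file, $2 = new architecture name (e.g. gfx1103).
set_arch() {
    tmp="$1.tmp.$$"
    # Replace whatever quoted value follows "ArchitectureName:" with $2.
    sed -E "s/(ArchitectureName:[[:space:]]*)\"[^\"]*\"/\1\"$2\"/" "$1" > "$tmp" &&
        mv "$tmp" "$1"
}

# Usage from the Tensile checkout (gfx1103 is just an example value):
#   cp Tensile/Configs/rocblas_sgemm_example.yaml Tensile/Configs/rocblas_sgemm_example.yaml.bak
#   set_arch Tensile/Configs/rocblas_sgemm_example.yaml gfx1103
```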

After about 1 minute of benchmarking, Tensile will output a YAML file with the winning kernels in the 3_LibraryLogic directory. Spreadsheets and YAML files with the benchmark data for all kernels are available in the 2_BenchmarkData directory.

If there is an error, try to fix it by searching for the message online or asking an AI assistant for help.

Rename the navi21 directory under Tensile/Configs to navi34 (or any name you like, or simply keep it), then edit the files in it: change gfx1030 to your GPU's architecture and the Device ID to your GPU's Device ID (from rocminfo).
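A minimal sketch of this step, copying the directory instead of renaming it so the original navi21 files stay available. The helper name clone_config_dir is made up, and the script only swaps the architecture string; the Device ID lines still need to be edited by hand.

```shell
# Copy a config directory and replace the architecture name in every file.
# $1 = source dir, $2 = destination dir (must not exist yet),
# $3 = old arch, $4 = new arch.
clone_config_dir() {
    [ -e "$2" ] && return 1
    cp -r "$1" "$2" || return 1
    # Rewrite only the files that actually mention the old architecture.
    grep -rl "$3" "$2" | while IFS= read -r file; do
        sed "s/$3/$4/g" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    done
}

# Usage from the Tensile checkout (gfx1103 is just an example value):
#   clone_config_dir Tensile/Configs/navi21 Tensile/Configs/navi34 gfx1030 gfx1103
```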

Then run

cd build

../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hgemm_gb_nn_asm_full.yaml ./

This takes longer, possibly hours. It is best not to use your PC while the benchmark is running. Then run the remaining configs one by one to get the rest of the results. After each benchmark finishes, back up your build folder under a name such as build1, build2, build3, and so on. We may only need the files in the 3_LibraryLogic directory.
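The backup after each run can be sketched as below. This version copies only the 3_LibraryLogic directory, on the assumption that it is the part you need; copy the whole build folder instead if you also want the benchmark data. The helper name backup_logic and the backups location are made up for this example.

```shell
# Copy <build>/3_LibraryLogic into the first unused <backups>/buildN directory
# and print the path that was created.
# $1 = build directory, $2 = directory that holds the backups.
backup_logic() {
    n=1
    while [ -e "$2/build$n" ]; do
        n=$((n + 1))
    done
    mkdir -p "$2/build$n" &&
        cp -r "$1/3_LibraryLogic" "$2/build$n/" &&
        printf '%s\n' "$2/build$n"
}

# Usage from inside the build directory, after a benchmark finishes:
#   backup_logic . ../backups
```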

Run the commands below one by one. Make sure to change navi34 in each command to the name you chose for your GPU library. Alternatively, keep the navi21 name, replace the information in the navi21 files with yours, and run the commands as in the example above.

Here is the list I am using, for your reference only. Run each line, one by one:


../Tensile/bin/Tensile ../Tensile/Configs/rocblas_sgemm_example.yaml ./

../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hgemm_gb_nn_asm_full.yaml ./


../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hgemm_gb_nt_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hgemm_gb_tn_asm_full.yaml ./

../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hgemm_gb_tt_asm_full.yaml ./

These will generate 4 different files with names like navi34_Cijk_Alik_Bjlk_HB_GB.yaml


../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hgemm_sb_nn_asm_full.yaml ./

../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hgemm_sb_nt_asm_full.yaml ./

../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hgemm_sb_tn_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hgemm_sb_tt_asm_full.yaml ./

These will generate 4 different files with names like navi34_Cijk_Alik_Bjlk_HB.yaml



../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hpa_hgemm_gb_nn_asm_full.yaml ./

../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hpa_hgemm_gb_nt_asm_full.yaml ./


../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hpa_hgemm_gb_tn_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hpa_hgemm_gb_tt_asm_full.yaml ./

These will generate 4 different files with names like navi34_Cijk_Alik_Bjlk_HHS_BH_GB.yaml

../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hpa_hgemm_sb_nn_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hpa_hgemm_sb_nt_asm_full.yaml ./

../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hpa_hgemm_sb_tn_asm_full.yaml ./

../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_hpa_hgemm_sb_tt_asm_full.yaml ./

These will generate 4 different files with names like navi34_Cijk_Alik_Bjlk_HHS_BH.yaml


../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_sgemm_gb_nn_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_sgemm_gb_nt_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_sgemm_gb_tn_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_sgemm_gb_tt_asm_full.yaml ./

These will generate 4 different files with names like navi34_cijk_Bljk_SB_GB.yaml (in my runs, these four configs failed to generate logic, except the tt file)

../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_sgemm_sb_nn_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_sgemm_sb_nt_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_sgemm_sb_tn_asm_full.yaml ./
../Tensile/bin/Tensile ../Tensile/Configs/navi34/rocblas_sgemm_sb_tt_asm_full.yaml ./

These will generate 4 different files with names like navi34_cijk_Bljk_SB.yaml (in my runs, these four configs failed to generate logic, except the tt file)
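The 24 navi34 commands above follow one naming pattern, so they can be generated with a loop. This is a sketch under that assumption; the helper names list_configs and run_all_configs are made up, the sample rocblas_sgemm_example.yaml run is not included, and no backup is taken between runs.

```shell
# Print the config paths from the list above, in the same order.
# $1 = name of the config directory under Tensile/Configs (e.g. navi34).
list_configs() {
    for kind in hgemm hpa_hgemm sgemm; do
        for batch in gb sb; do
            for trans in nn nt tn tt; do
                printf '../Tensile/Configs/%s/rocblas_%s_%s_%s_asm_full.yaml\n' \
                    "$1" "$kind" "$batch" "$trans"
            done
        done
    done
}

# Run every config in turn from the build directory; stop at the first failure.
run_all_configs() {
    for cfg in $(list_configs "$1"); do
        ../Tensile/bin/Tensile "$cfg" ./ || return 1
    done
}

# Usage from inside the build directory:
#   list_configs navi34        # preview the paths first
#   run_all_configs navi34
```

Running the configs by hand, one at a time, remains the safer choice if you want to back up the build folder after each benchmark.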

Some config files do not exist in navi21. You can create them by editing files like the ones below (with names ending in asm_full.yaml) and replacing the information with yours, or ignore them.

Missing files:


dgemm rocblas_dgemm_..._asm_full.yaml

rocblas_hpa_bfloat16_..._asm_full.yaml (will generate  navi34_Cijk_Alik_Bjlk_I8II_BH.yaml)

rocblas_igemm_asm_full_nn.yaml (will generate 4xi8 files; if your machine doesn't support 4xi8 initially, make sure the corresponding value in the asm caps is set to true, in common.py or AsmCaps.py)

mfma_igemm_nn_asm_full.yaml ( will generate navi34_Cijk_Ailk_Bljk_BBS_BH.yaml)

"Important: The default Tensile configuration file might not include support for all GPUs. If you encounter errors when running Tensile, carefully review the terminal output. It will often provide clues about missing or incorrect architecture information.

Troubleshooting Steps:

  1. Identify Missing Information: The error messages should point to specific architectures that are not properly defined in the config file.
  2. Edit the Config File: Add the necessary information for your GPU, following the format used for existing entries (refer to previous instructions on adding architecture details and ISA codes).

Note:

  • Editing configuration files can be risky, so proceed with caution! Back up your original file before making any changes.
  • For newer architectures like RDNA3, there may be a lack of pre-defined configurations. You might need to consult official documentation, explore community forums, or experiment carefully to determine the correct settings.

Future Updates: Official updates to Tensile are often necessary to provide complete support for new GPUs and architectures."

More information on tuning is available here, along with the Tensile tuning .tex file and a PDF version.

To build rocBLAS on Linux, follow this guide.