Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MIOpenTensile Part2 : Support low precision types #556

Merged
merged 166 commits into from
Apr 18, 2021
Merged
Show file tree
Hide file tree
Changes from 165 commits
Commits
Show all changes
166 commits
Select commit Hold shift + click to select a range
3201498
add miopentensile path
ce1adon Apr 7, 2020
46daceb
use miotensile in hip backend
ce1adon Apr 9, 2020
bb28471
hip tidy
ce1adon Apr 10, 2020
3c6d5bd
update miotensile
ce1adon Apr 11, 2020
a0d8cbc
generalize miotensile stride
ce1adon Apr 14, 2020
93e51f2
disable miopentensile in ocl backend
ce1adon Apr 14, 2020
0aa1a64
remove ocl backend consecutive identical branches
ce1adon Apr 15, 2020
e00580c
disable use of miotensile in hip-clang
ce1adon Apr 16, 2020
583dd50
Merge branch 'develop' into miotensile
ce1adon Apr 17, 2020
736940c
update miopentensile version
ce1adon Apr 27, 2020
5e29d30
Merge branch 'develop' into miotensile
ce1adon May 7, 2020
33b0fbd
Merge branch 'develop' into miotensile
ce1adon May 19, 2020
0998746
Merge branch 'develop' into miotensile
ce1adon Jun 21, 2020
818544f
use const arg in gemm
ce1adon Jun 23, 2020
c6e5611
miot introduce transpose feature
ce1adon Aug 15, 2020
7bfe9f7
update miot version
ce1adon Aug 15, 2020
1c02111
Merge branch 'develop' into miotensile
ce1adon Aug 15, 2020
2a73f16
disable miotensile in hip-clang in cmakelists
ce1adon Aug 17, 2020
5d0639b
disable miotensile except in hcc in cmakelists
ce1adon Aug 18, 2020
17f5a59
disable ocl wrw2 solvers
ce1adon Aug 24, 2020
836fbd4
Merge branch 'develop' into miotensile
ce1adon Aug 24, 2020
1beaf51
Merge branch 'disable-oclwrw' into miotensile
ce1adon Aug 24, 2020
bd706c5
clang tidy
ce1adon Aug 25, 2020
cecd3e9
Merge branch 'disable-oclwrw' into miotensile
ce1adon Aug 25, 2020
d691097
enable miopentensile built with hipclang
ce1adon Aug 27, 2020
7d2c634
add miotensile in hip-clang docker
ce1adon Aug 28, 2020
2bd15dc
add python3-venv in hip-clang docker
ce1adon Aug 28, 2020
9580b0a
hip-clang docker update
ce1adon Aug 31, 2020
3460907
update hip-clang docker compiler
Aug 31, 2020
93148be
update miopentensile info
Aug 31, 2020
3c04ea2
Merge branch 'develop' into miotensile
ce1adon Sep 1, 2020
a744b47
disable miotensile with hcc
ce1adon Sep 2, 2020
b6ddabe
set miopentensile off & add dedicated test stage
ce1adon Sep 4, 2020
e8f2a9d
Revert "clang tidy"
ce1adon Sep 4, 2020
c5c18a8
Revert "disable ocl wrw2 solvers"
ce1adon Sep 4, 2020
785fda2
workaround for issue 2534
ce1adon Sep 4, 2020
048799a
Adjust test config
ce1adon Sep 5, 2020
653e6df
Adjust gemm env_var
ce1adon Sep 5, 2020
6ca9951
use miotensile in winograd3x3multipass solver
ce1adon Sep 8, 2020
c6d3d7c
revert gemm enforce backend values
ce1adon Sep 8, 2020
3692844
skip igemm tests in test stage iv (miotensile tests)
ce1adon Sep 8, 2020
658f441
skip igemm tests in test stage iv
ce1adon Sep 9, 2020
f2e3bf5
[skip ci] add comment for cmake's skip rule
ce1adon Sep 9, 2020
8678cea
use parallel jobs for tensile
Sep 10, 2020
215bdff
specify gpu architecture(s) in miotensile build process
Sep 11, 2020
1780dbb
fix jenkins docker arg
Sep 11, 2020
716b09f
fix jenkins docker arg string
Sep 11, 2020
a15cde6
support low-precision datatypes
Sep 12, 2020
e17e05e
enable bfloat16 miopentensile path
Sep 12, 2020
fc738db
Merge branch 'develop' into miotensile
ce1adon Sep 14, 2020
fb452bf
rename gpu target param for miotensile
Sep 15, 2020
555aa2c
Merge branch 'miotensile' of https://github.com/ROCmSoftwarePlatform/…
Sep 15, 2020
c089443
specify gpu architecture(s) in Jenkinsfile buildJob
Sep 15, 2020
83c8082
relocate miotensile gpu targets arg
Sep 16, 2020
fd94b50
add symlink to /opt/rocm in Dockerfile
Sep 17, 2020
9a7ba7f
revise Dockerfile to pass GPU architecture flag for dependencies
Sep 18, 2020
10fa0fd
Merge branch 'develop' into miotensile
ce1adon Sep 20, 2020
67a06c7
add jenkins tests for last/latest version of miopentensile
Sep 26, 2020
569bd7b
remove workaround for issue #2534
Sep 26, 2020
6b28286
winograd workaround condition update: current miopentensile doesn't s…
Sep 26, 2020
8d50d0e
clang format
Sep 26, 2020
20b6e44
Merge branch 'develop' into miotensile
ce1adon Sep 26, 2020
29c0524
fix jenkinsfile typo & move GPU architecture flag to cget init in hip…
ce1adon Sep 29, 2020
402b02e
revise conv_multipass_wino3x3WrW GEMM workaround for miopentensile
ce1adon Sep 29, 2020
99581d7
clang format
ce1adon Sep 29, 2020
0e2dd07
miopentensile stage rename
ce1adon Sep 29, 2020
b49fee4
reset miopentensile version in hcc
ce1adon Sep 30, 2020
4d7cc64
revert wino3x3WrW datatype constraint in miopentensile
ce1adon Oct 5, 2020
ee6b154
Merge branch 'develop' into miotensile
Oct 5, 2020
3402052
remove miopentensile from dependency list & update version
ce1adon Oct 10, 2020
cc0b6e7
Merge branch 'miotensile' into miotensile-lowprec
Oct 14, 2020
c7c17ea
remove datatype limit in multipass wino3x3wrw
Oct 14, 2020
bae6b2f
add low precision tests for miopentensile in Jenkins
Oct 14, 2020
7678578
update tensile version for int8
ce1adon Oct 20, 2020
dcdec86
miopentensile to 3-10-x
ce1adon Oct 20, 2020
f6bc06a
revert miopentensile latest version
ce1adon Oct 20, 2020
b76f85f
revise int8 conv test validation check based on latest conv host
ce1adon Oct 20, 2020
f858aa1
revise format
ce1adon Oct 20, 2020
62156d6
fix bug & clang format
ce1adon Oct 20, 2020
98b2acd
fix merge conflicts
ce1adon Oct 21, 2020
0449ee0
skip not applicable tests
ce1adon Oct 26, 2020
4a151de
skip redundant tests for low precision
ce1adon Oct 29, 2020
c6f86ea
Merge branch 'develop' into miotensile-lowprec
ce1adon Oct 30, 2020
4e2f8ec
Merge branch 'develop' into miotensile-lowprec
ce1adon Nov 5, 2020
964b839
add symlink & update miot version
ce1adon Nov 6, 2020
47dbb6c
[skip ci] include miopentensile in MP BD winograd path
ce1adon Nov 14, 2020
d62f2de
update target id feature in docker and jenkins files
ce1adon Nov 15, 2020
2460147
Merge branch 'develop' into miotensile-lowprec
ce1adon Nov 15, 2020
0fd4160
skip miopentensile install when target id env not present
ce1adon Nov 17, 2020
65477cf
Merge branch 'develop' into miotensile-lowprec
ce1adon Nov 24, 2020
f1cd1aa
targetid-fixes(01) Simplify & improve logging of build commands. Some…
atamazov Dec 16, 2020
4977bde
targetid-fixes(02) Refactor: device_name.hpp -> handle.cpp
atamazov Dec 17, 2020
79a8a11
targetid-fixes(03) Fix OCL build error. Remove dead code: dumpKernel()
atamazov Dec 17, 2020
5e672e0
targetid-fixes(04) Move HIP_PACKAGE_VERSION_FLAT to config.h
atamazov Dec 21, 2020
74a4fc5
targetid-fixes(05) Refactor GetDeviceName()
atamazov Dec 22, 2020
df4bd5a
targetid-fixes(06) Rework workaround for issue 1711
atamazov Dec 22, 2020
125010e
targetid-fixes(07) Fix CompileTime logging for generated sources (MIO…
atamazov Dec 22, 2020
f6c6442
targetid-fixes(08) Store target properties in the handle (HIP)
atamazov Dec 23, 2020
30b9a12
targetid-fixes(09) Store target properties in the handle (OCL)
atamazov Dec 23, 2020
ec11cec
targetid-fixes(10) Fix TargetProperties::Init()
atamazov Dec 23, 2020
ed74896
targetid-fixes(11) Simplifications & formatting
atamazov Dec 23, 2020
3280330
targetid-fixes(12) Clang-tidy fix
atamazov Dec 23, 2020
fe31c30
targetid-fixes(13) [nfc] Target features: init to defaults, engage in…
atamazov Dec 24, 2020
5edcc27
Merge branch 'develop' into miotensile-lowprec
ce1adon Dec 25, 2020
ab4106e
update GEMM macro
ce1adon Dec 25, 2020
18a6581
Merge branch 'miotensile-lowprec' of https://github.com/ROCmSoftwareP…
ce1adon Dec 25, 2020
165189f
targetid-fixes(14) [nfc] Fix: Add missing Handle::GetTargetProperties…
atamazov Dec 25, 2020
c8b3557
targetid-fixes(15) [nfc] Refactor & Pass TargetProperties to HipBuild…
atamazov Dec 25, 2020
56a1857
targetid-fixes(16) Clang-tidy fix
atamazov Dec 25, 2020
5b232eb
targetid-fixes(17) Use target features to in perf- and find-databases
atamazov Dec 25, 2020
24e7c79
targetid-fixes(18) Fix bug from PR 511 (comparison of external_tool_v…
atamazov Dec 29, 2020
5d58fcf
targetid-fixes(19) Use xnack feature in HIP builds
atamazov Dec 29, 2020
d8b8b8f
targetid-fixes(20) HIP builds: Use --cuda-gpu-arch instead of --offlo…
atamazov Dec 29, 2020
f3b88f8
targetid-fixes(21) OCL builds (HIP be): Update CO version selection o…
atamazov Dec 29, 2020
875a38f
targetid-fixes(22) Clang-tidy fix
atamazov Jan 9, 2021
ee53f31
targetid-fixes(23) OCL builds (OCL be): Enable logging of build options
atamazov Jan 9, 2021
c6904be
targetid-fixes(24) OCL builds (OCL be): Avoid duplication of error me…
atamazov Jan 9, 2021
b7b8d99
targetid-fixes(25) Workaround for cppcheck
atamazov Jan 11, 2021
085245c
targetid-fixes(26) Fix COMGR build errors & some 'unused macro' warnings
atamazov Jan 12, 2021
ad8d5e7
targetid-fixes(27) [tests] Enable W/A for missing macros in PCH for a…
atamazov Jan 13, 2021
2717fcc
re-group miopentensile tests & rename targetID docker
ce1adon Jan 14, 2021
e52e9d7
Merge remote-tracking branch 'origin/develop' into miotensile-lowprec
ce1adon Jan 14, 2021
abeac2b
Merge remote-tracking branch 'origin/targetid-fixes' into miotensile-…
ce1adon Jan 14, 2021
cc03d67
disable extra implicit gemm test in miopentensile tests
ce1adon Jan 15, 2021
845253a
Merge remote-tracking branch 'origin/develop' into miotensile-lowprec
ce1adon Jan 15, 2021
86e7f7a
rename targetID dockers
ce1adon Jan 19, 2021
09b04fa
Merge remote-tracking branch 'origin/develop' into miotensile-lowprec
ce1adon Jan 20, 2021
d248bff
Merge branch 'develop' into miotensile-lowprec
ce1adon Feb 10, 2021
8d4540a
update MIOpenTensile and docker image
ce1adon Feb 10, 2021
76a124f
update Tensile arch string
ce1adon Feb 10, 2021
fb7b62a
workaround for hipcc
ce1adon Feb 10, 2021
691113b
restrict hipcc workaround and env var export in targetID env
ce1adon Feb 12, 2021
361249f
Merge branch 'develop' into miotensile-lowprec
ce1adon Mar 9, 2021
83caf61
fix missing property target_id in JenkinsFile
ce1adon Mar 10, 2021
35fc6d9
test miopentensile on latest CI framework
ce1adon Mar 10, 2021
cb14a16
Add MIOPEN_DEBUG_HIP_KERNELS=0 to reduce testing time
ce1adon Mar 11, 2021
439042c
rearrange miopentensile tests
ce1adon Mar 11, 2021
2e0346d
regulate stage names
ce1adon Mar 11, 2021
fb54873
Revert "test miopentensile on latest CI framework"
ce1adon Mar 11, 2021
301608a
merge develop
ce1adon Apr 7, 2021
60c8e5c
fix conflicts
ce1adon Apr 7, 2021
002e51b
clang format
ce1adon Apr 7, 2021
93a1041
merge tensile stages
ce1adon Apr 7, 2021
fe6e75d
replace tensile stages with function
ce1adon Apr 7, 2021
2bb47ef
test miopentensile stages
ce1adon Apr 7, 2021
cb4d064
remove image param
ce1adon Apr 7, 2021
ca67bcc
accept custom docker image
ce1adon Apr 7, 2021
1153d5d
Revert "test miopentensile stages"
ce1adon Apr 7, 2021
a9ac78c
remove MIOPEN_DEBUG_IMPLICIT_GEMM_NON_XDLOPS_INLINE_ASM in jenkins en…
ce1adon Apr 7, 2021
89f94c3
build docker with parallel-jobs=4
ce1adon Apr 7, 2021
23134d9
revert comment
ce1adon Apr 8, 2021
499cb91
update miopentensile version
ce1adon Apr 8, 2021
69d736a
revert build docker with parallel-jobs=4
ce1adon Apr 8, 2021
7f7cda0
update mainline build versoin
ce1adon Apr 9, 2021
a688a71
unify image name
ce1adon Apr 9, 2021
eea15d8
revert mainline build version
ce1adon Apr 13, 2021
a36086d
add comment to distinguish GEMM logic
ce1adon Apr 13, 2021
86629f2
disable MIOpenTensile stage
ce1adon Apr 15, 2021
360fa6b
Merge branch 'develop' into miotensile-lowprec
atamazov Apr 16, 2021
0fc0fd1
Refactor test/CmakeLists.txt. Separate MIOpenTensile related code fro…
atamazov Apr 16, 2021
6ef88d7
Merge branch 'develop' into miotensile-lowprec
atamazov Apr 16, 2021
255122b
Fix stage names. Add MIOPENTENSILE_LATEST parameter.
atamazov Apr 16, 2021
f71a778
MIOpenTensile tests OFF by default.
atamazov Apr 16, 2021
0eefeb4
Fix syntax error
atamazov Apr 16, 2021
aa49456
Enable testing of LATEST by default
atamazov Apr 16, 2021
ef3c289
Fix syntax error
atamazov Apr 17, 2021
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 8 additions & 6 deletions Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -3,12 +3,13 @@ FROM ubuntu:18.04
ARG PREFIX=/usr/local
ARG GPU_ARCH=";"
ARG MIOTENSILE_VER="default"
ARG USE_TARGETID="OFF"

# Support multiarch
RUN dpkg --add-architecture i386

# Add rocm repository
RUN sh -c 'echo deb [arch=amd64 trusted=yes] http://repo.radeon.com/rocm/apt/.apt_3.7/ xenial main > /etc/apt/sources.list.d/rocm.list'
RUN if [ "$USE_TARGETID" = "ON" ] ; then sh -c 'echo deb [arch=amd64 trusted=yes] http://compute-artifactory.amd.com/artifactory/list/rocm-osdb-deb/ compute-rocm-dkms-no-npi-hipclang 6416 > /etc/apt/sources.list.d/rocm.list'; else sh -c 'echo deb [arch=amd64 trusted=yes] http://repo.radeon.com/rocm/apt/.apt_3.7/ xenial main > /etc/apt/sources.list.d/rocm.list'; fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Q] Why 6416? Shall we promote this number in the future (and when)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember how long this PR has been staying here? :)
Let me know which mainline# is used currently

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... Do we want this to advance automatically, to the latest mainline build?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the latest mainline build number?

Copy link
Contributor

@atamazov atamazov Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can find it here, but I do not know how good or bad is it.

RUN sh -c "echo deb http://mirrors.kernel.org/ubuntu xenial main universe | tee -a /etc/apt/sources.list"

# Install dependencies
Expand Down Expand Up @@ -84,15 +85,16 @@ ADD min-requirements.txt /min-requirements.txt
RUN CXXFLAGS='-isystem $PREFIX/include' cget -p $PREFIX install -f /min-requirements.txt
RUN cget -p $PREFIX install danmar/cppcheck@dd05839a7e63ef04afd34711cb3e1e0ef742882f

RUN export HIPCC_LINK_FLAGS_APPEND='-O3 -parallel-jobs=4'
RUN export HIPCC_COMPILE_FLAGS_APPEND='-O3 -Wno-format-nonliteral -parallel-jobs=4'

# Install doc requirements
ADD doc/requirements.txt /doc-requirements.txt
RUN pip install -r /doc-requirements.txt

# install last released miopentensile in default, install latest commits when MIOTENSILE_VER="latest"
RUN if [ "$MIOTENSILE_VER" = "latest" ] ; then cget -p $PREFIX install ROCmSoftwarePlatform/MIOpenTensile@7568654c938d42e9a91c6b18fb382f5b978d12fd; else cget -p $PREFIX install ROCmSoftwarePlatform/MIOpenTensile@5fe0bf4a8dc59f3ab62df929297280915372ce16; fi
# Use parallel job to accelerate tensile build
# Workaround for Tensile with TargetID feature
RUN if [ "$USE_TARGETID" = "ON" ] ; then export HIPCC_LINK_FLAGS_APPEND='-O3 -parallel-jobs=4' && export HIPCC_COMPILE_FLAGS_APPEND='-O3 -Wno-format-nonliteral -parallel-jobs=4' && rm /usr/bin/hipcc; fi
atamazov marked this conversation as resolved.
Show resolved Hide resolved

# install last released miopentensile in default (master), install latest commits when MIOTENSILE_VER="latest" (develop)
RUN if [ "$USE_TARGETID" = "OFF" ] ; then echo "MIOpenTensile is not installed."; elif [ "$MIOTENSILE_VER" = "latest" ] ; then cget -p $PREFIX install ROCmSoftwarePlatform/MIOpenTensile@be26d30d3d7509a414134a45f4a6d49e5da250b8; else cget -p $PREFIX install ROCmSoftwarePlatform/MIOpenTensile@4bfe00a8de61d12862d9fa803b8ea9a981a50f97; fi

RUN cd ~ && \
export MLIR_COMMIT=31d92f4c64ae6fa6b7c5d543f68b69300b4513ce && \
Expand Down
305 changes: 260 additions & 45 deletions Jenkinsfile

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -3,4 +3,4 @@ RadeonOpenCompute/rocm-cmake@cdd0f632b3a65bd4411593bb827eb664e25c80bc --build
RadeonOpenCompute/clang-ocl@930015924b012c332d373535ff31a663b6ad2c64
ROCmSoftwarePlatform/MIOpenGEMM@0eb1257cfaef83ea155aabd67af4437c0028db48
ROCmSoftwarePlatform/rocBLAS@9790a8658341bc665c11c311129ad0dfc533d5c4
# ROCmSoftwarePlatform/MIOpenTensile@5fe0bf4a8dc59f3ab62df929297280915372ce16
# ROCmSoftwarePlatform/MIOpenTensile@master
92 changes: 57 additions & 35 deletions src/gemm_v2.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -149,20 +149,13 @@ static GemmBackend_t enforce_gemm_backend(miopenDataType_t data_type,

// make sure backend chosen based on env variable is suppported
#if MIOPEN_USE_MIOPENTENSILE
(void)data_type;
switch(gemm_backend_env)
{
case GemmBackend_t::nogemmbackend: gemm_backend_enforced = GemmBackend_t::nogemmbackend; break;
case GemmBackend_t::rocblas:
case GemmBackend_t::miopengemm:
case GemmBackend_t::miopentensile:
gemm_backend_enforced = (data_type == miopenFloat) ? GemmBackend_t::miopentensile :
#if MIOPEN_USE_ROCBLAS
GemmBackend_t::rocblas
#else
GemmBackend_t::nogemmbackend
#endif
;
break;
case GemmBackend_t::miopentensile: gemm_backend_enforced = GemmBackend_t::miopentensile; break;
}
#elif MIOPEN_USE_ROCBLAS and MIOPEN_USE_MIOPENGEMM
switch(gemm_backend_env)
Expand Down Expand Up @@ -270,8 +263,49 @@ miopenStatus_t CallGemmMIOpenTensile(const Handle& handle,
{
MIOPEN_LOG_FUNCTION("MIOpenTensile");

if(gemm_desc.dataType != miopenFloat)
return miopenStatusNotImplemented;
miopen_tensile_type miotsl_in_dtype, miotsl_out_dtype;
Data_t ptrA, ptrB, ptrC;
switch(gemm_desc.dataType)
{
case miopenFloat:
miotsl_in_dtype = miopen_tensile_type_float;
ptrA = Data_t(reinterpret_cast<const float*>(A) + a_offset);
ptrB = Data_t(reinterpret_cast<const float*>(B) + b_offset);
ptrC = Data_t(reinterpret_cast<float*>(C) + c_offset);
break;
case miopenHalf:
miotsl_in_dtype = miopen_tensile_type_half;
ptrA = Data_t(reinterpret_cast<const half_float::half*>(A) + a_offset);
ptrB = Data_t(reinterpret_cast<const half_float::half*>(B) + b_offset);
ptrC = Data_t(reinterpret_cast<half_float::half*>(C) + c_offset);
break;
case miopenBFloat16:
miotsl_in_dtype = miopen_tensile_type_bfloat16;
ptrA = Data_t(reinterpret_cast<const unsigned short*>(A) + a_offset);
ptrB = Data_t(reinterpret_cast<const unsigned short*>(B) + b_offset);
ptrC = Data_t(reinterpret_cast<unsigned short*>(C) + c_offset);
break;
case miopenInt32:
miotsl_in_dtype = miopen_tensile_type_int32;
ptrA = Data_t(reinterpret_cast<const int32_t*>(A) + a_offset);
ptrB = Data_t(reinterpret_cast<const int32_t*>(B) + b_offset);
ptrC = Data_t(reinterpret_cast<int32_t*>(C) + c_offset);
break;
case miopenInt8:
case miopenInt8x4:
miotsl_in_dtype = miopen_tensile_type_int8x4;
ptrA = Data_t(reinterpret_cast<const int8_t*>(A) + a_offset);
ptrB = Data_t(reinterpret_cast<const int8_t*>(B) + b_offset);
ptrC = Data_t(reinterpret_cast<int32_t*>(C) + c_offset);
}
if(gemm_desc.dataType == miopenInt8 || gemm_desc.dataType == miopenInt8x4)
{
miotsl_out_dtype = miopen_tensile_type_int32;
}
else
{
miotsl_out_dtype = miotsl_in_dtype;
}

#if MIOPEN_BACKEND_HIP
HipEventPtr start = nullptr;
Expand Down Expand Up @@ -302,21 +336,21 @@ miopenStatus_t CallGemmMIOpenTensile(const Handle& handle,
miopen_tensile_matrix mtA{{mtA_len0, mtA_len1},
{mtA_str0, mtA_str1},
{mtA_b_n, mtA_b_str},
miopen_tensile_type_float,
miotsl_in_dtype,
gemm_desc.transA,
Data_t(reinterpret_cast<const float*>(A) + a_offset)};
ptrA};
miopen_tensile_matrix mtB{{mtB_len0, mtB_len1},
{mtB_str0, mtB_str1},
{mtB_b_n, mtB_b_str},
miopen_tensile_type_float,
miotsl_in_dtype,
gemm_desc.transB,
Data_t(reinterpret_cast<const float*>(B) + b_offset)};
ptrB};
miopen_tensile_matrix mtC{{mtC_len0, mtC_len1},
{mtC_str0, mtC_str1},
{mtC_b_n, mtC_b_str},
miopen_tensile_type_float,
miotsl_out_dtype,
false,
Data_t(reinterpret_cast<float*>(C) + c_offset)};
ptrC};

miopen_tensile_status mt_status = miopen_tensile_status_no_solution;
#if MIOPEN_BACKEND_HIP
Expand Down Expand Up @@ -358,13 +392,9 @@ miopenStatus_t CallGemm(const Handle& handle,
gemm_backend = enforce_gemm_backend(gemm_desc.dataType, gemm_backend);

// do row-to-column major conversion here
// add macro to distinguish MIOpenTensile and rocBlas logic
#if MIOPEN_USE_MIOPENTENSILE
if((gemm_desc.isColMajor && gemm_desc.dataType == miopenFloat)
#if MIOPEN_USE_ROCBLAS
||
(!gemm_desc.isColMajor && gemm_desc.dataType != miopenFloat)
#endif
)
if(gemm_desc.isColMajor)
asroy marked this conversation as resolved.
Show resolved Hide resolved
#else
if(!gemm_desc.isColMajor)
#endif
Expand Down Expand Up @@ -658,13 +688,9 @@ miopenStatus_t CallGemmStridedBatched(const Handle& handle,
gemm_backend = enforce_gemm_backend(gemm_desc.dataType, gemm_backend);

// do row-to-column major conversion here
// add macro to distinguish MIOpenTensile and rocBlas logic
#if MIOPEN_USE_MIOPENTENSILE
if((gemm_desc.isColMajor && gemm_desc.dataType == miopenFloat)
#if MIOPEN_USE_ROCBLAS
||
(!gemm_desc.isColMajor && gemm_desc.dataType != miopenFloat)
#endif
)
if(gemm_desc.isColMajor)
#else
if(!gemm_desc.isColMajor)
#endif
Expand Down Expand Up @@ -907,13 +933,9 @@ miopenStatus_t CallGemmStridedBatchedSequential(const Handle& handle,
gemm_backend = enforce_gemm_backend(gemm_desc.dataType, gemm_backend);

// do row-to-column major conversion here
// add macro to distinguish MIOpenTensile and rocBlas logic
#if MIOPEN_USE_MIOPENTENSILE
if((gemm_desc.isColMajor && gemm_desc.dataType == miopenFloat)
#if MIOPEN_USE_ROCBLAS
||
(!gemm_desc.isColMajor && gemm_desc.dataType != miopenFloat)
#endif
)
if(gemm_desc.isColMajor)
#else
if(!gemm_desc.isColMajor)
#endif
Expand Down
10 changes: 5 additions & 5 deletions src/ocl/convolutionocl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -74,9 +74,9 @@ MIOPEN_DECLARE_ENV_VAR(MIOPEN_DEBUG_COMPILE_ONLY)
#if MIOPEN_USE_GEMM
#ifdef CPPCHECK
// Keep the value unknown in cppcheck since this can differ between opencl and hip
static bool IsUseRocBlas;
static bool IsBF16PathValid;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@atamazov
Please notice here that bf16 is considered valid as long as gemm is used. Under this condition we should test bf16 gemm path, including conv3d.

#else
static const bool IsUseRocBlas = (MIOPEN_USE_ROCBLAS == 1);
static const bool IsBF16PathValid = (MIOPEN_USE_ROCBLAS == 1 || MIOPEN_USE_MIOPENTENSILE == 1);
#endif

static inline bool IsAnyBufferBF16(const TensorDescriptor& xDesc,
Expand Down Expand Up @@ -669,7 +669,7 @@ bool ConvolutionDescriptor::IsGemmApplicableWrw(const TensorDescriptor& dyDesc,
{
#if MIOPEN_USE_GEMM
if(!miopen::IsDisabled(MIOPEN_DEBUG_CONV_GEMM{}) &&
!(IsAnyBufferBF16(xDesc, dyDesc, dwDesc) && !IsUseRocBlas))
!(IsAnyBufferBF16(xDesc, dyDesc, dwDesc) && !IsBF16PathValid))
{
const std::size_t spatial_dim = GetSpatialDimension();
const auto wei_spatial = boost::adaptors::slice(dwDesc.GetLengths(), 2, 2 + spatial_dim);
Expand Down Expand Up @@ -1644,7 +1644,7 @@ void ConvolutionDescriptor::FindConvBwdWeightsAlgorithm(Handle& handle,
perf_db = UserFindDbRecord::TryLoad(handle, problem, [&](DbRecord& record) {
#if MIOPEN_USE_GEMM
if(!miopen::IsDisabled(MIOPEN_DEBUG_CONV_GEMM{}) &&
!(IsAnyBufferBF16(xDesc, dyDesc, dwDesc) && !IsUseRocBlas))
!(IsAnyBufferBF16(xDesc, dyDesc, dwDesc) && !IsBF16PathValid))
{
const bool time_precision = (!IsDisabled(MIOPEN_CONV_PRECISE_ROCBLAS_TIMING{}));

Expand Down Expand Up @@ -1934,7 +1934,7 @@ void ConvolutionDescriptor::BackwardWeightsGemm(Handle& handle,
{
MIOPEN_THROW("GEMM convolution is disabled");
}
if(IsAnyBufferBF16(tensors.xDesc, tensors.dyDesc, tensors.dwDesc) && !IsUseRocBlas)
if(IsAnyBufferBF16(tensors.xDesc, tensors.dyDesc, tensors.dwDesc) && !IsBF16PathValid)
{
MIOPEN_THROW("GEMM convolution is unsupported");
}
Expand Down
6 changes: 3 additions & 3 deletions src/solver/conv_MP_bidirectional_winograd.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -151,7 +151,7 @@ GetWinoBuffer(const ConvolutionContext& params,
template <int WinoDataH, int WinoFilterH, int WinoDataW, int WinoFilterW>
inline bool IsApplicableGEMM(const ConvolutionContext& params)
{
#if(MIOPEN_BACKEND_HIP && MIOPEN_USE_ROCBLAS)
#if(MIOPEN_BACKEND_HIP && (MIOPEN_USE_ROCBLAS || MIOPEN_USE_MIOPENTENSILE))

const miopenDataType_t transform_data_type =
miopen::IsEnabled(MIOPEN_DEBUG_AMD_MP_BD_WINOGRAD_EXPEREMENTAL_FP16_TRANSFORM{})
Expand Down Expand Up @@ -429,7 +429,7 @@ InvokerFactory MakeWinogradInvokerFactory(const ConvolutionContext& params,
}
else
{
#if MIOPEN_USE_ROCBLAS
#if MIOPEN_USE_ROCBLAS || MIOPEN_USE_MIOPENTENSILE
// GEMM
gemm_conv_kernel_name = "WRW_WINO_GEMM: ";

Expand All @@ -453,7 +453,7 @@ InvokerFactory MakeWinogradInvokerFactory(const ConvolutionContext& params,
gemm_conv_factory = [=](const std::vector<Kernel>&) {

return [=](const Handle& handle, const AnyInvokeParams& ctx) {
#if MIOPEN_USE_ROCBLAS
#if MIOPEN_USE_ROCBLAS || MIOPEN_USE_MIOPENTENSILE
const auto& data_ctx = ctx.CastTo<conv::DataInvokeParams>();
Data_t workSpace = data_ctx.workSpace;
CallGemmStridedBatched(
Expand Down
4 changes: 0 additions & 4 deletions src/solver/conv_multipass_wino3x3WrW.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -364,10 +364,6 @@ bool ConvWinograd3x3MultipassWrW<WinoDataH, WinoFilterH, WinoDataW, WinoFilterW>
// ROCBLAS for GEMM step

#if(MIOPEN_BACKEND_HIP && (MIOPEN_USE_ROCBLAS || MIOPEN_USE_MIOPENTENSILE))
#if(!MIOPEN_USE_ROCBLAS)
if(!params.IsFp32())
return false;
#endif
static const int wino_data_tile = std::max(WinoDataH, WinoDataW);
static const int wino_filter_tile = std::max(WinoFilterH, WinoFilterW);

Expand Down
9 changes: 5 additions & 4 deletions src/solver/gemm.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -53,9 +53,10 @@ namespace solver {
#if MIOPEN_USE_GEMM
#ifdef CPPCHECK
// Keep the value unknown in cppcheck since this can differ between opencl and hip
static bool IsUseRocBlas;
static bool IsBF16PathValid;
#else
static constexpr const bool IsUseRocBlas = (MIOPEN_USE_ROCBLAS == 1);
static constexpr const bool IsBF16PathValid =
(MIOPEN_USE_ROCBLAS == 1 || MIOPEN_USE_MIOPENTENSILE == 1);
#endif

static inline bool IsAnyBufferBF16(const TensorDescriptor& xDesc,
Expand All @@ -75,9 +76,9 @@ bool GemmFwdBase::IsApplicable(const ExecutionContext&,
const auto& wDesc = problem.GetWeights();
const auto& yDesc = problem.GetOut();
return problem.GetDirection() == conv::Direction::Forward && problem.IsLayoutDefault() &&
!(IsAnyBufferBF16(xDesc, yDesc, wDesc) && !IsUseRocBlas);
!(IsAnyBufferBF16(xDesc, yDesc, wDesc) && !IsBF16PathValid);
#else
std::ignore = problem;
std::ignore = problem;
return false;
#endif
};
Expand Down
9 changes: 5 additions & 4 deletions src/solver/gemm_bwd.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -50,9 +50,10 @@ MIOPEN_DECLARE_ENV_VAR(MIOPEN_CONV_PRECISE_ROCBLAS_TIMING)
#if MIOPEN_USE_GEMM
#ifdef CPPCHECK
// Keep the value unknown in cppcheck since this can differ between opencl and hip
static bool IsUseRocBlas;
static bool IsBF16PathValid;
#else
static constexpr const bool IsUseRocBlas = (MIOPEN_USE_ROCBLAS == 1);
static constexpr const bool IsBF16PathValid =
(MIOPEN_USE_ROCBLAS == 1 || MIOPEN_USE_MIOPENTENSILE == 1);
#endif

namespace miopen {
Expand Down Expand Up @@ -89,9 +90,9 @@ bool GemmBwdBase::IsApplicable(const ExecutionContext&,
const auto& wDesc = problem.GetWeights();
const auto& dxDesc = problem.GetOut();
return problem.GetDirection() == conv::Direction::BackwardData && problem.IsLayoutDefault() &&
!(IsAnyBufferBF16(dxDesc, dyDesc, wDesc) && !IsUseRocBlas);
!(IsAnyBufferBF16(dxDesc, dyDesc, wDesc) && !IsBF16PathValid);
#else
std::ignore = problem;
std::ignore = problem;
return false;
#endif
}
Expand Down
Loading