Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MIOpenTensile Part1 : Add MIOpenTensile in GEMM path #139

Merged
merged 69 commits into from
Oct 30, 2020
Merged

Conversation

ce1adon
Copy link
Contributor

@ce1adon ce1adon commented Apr 8, 2020

The purpose of this PR is to add MIOpenTensile path in GEMM algorithm for fp32 on hip backend. It replaces the default GEMM path through rocblas in HIP backend for fp32 type.

@ghost
Copy link

ghost commented Apr 8, 2020

Congratulations 🎉. DeepCode analyzed your code in 2.794 seconds and we found no issues. Enjoy a moment of no bugs ☀️.

👉 View analysis in DeepCode’s Dashboard | Configure the bot

@atamazov
Copy link
Contributor

atamazov commented Apr 8, 2020

The Invokers work is complicated by frequent changes in the baseline code. For example, recent additions of ImplicitGemm solvers have created substantial additional work for @DrizztDoUrden, and, thus have delayed the overall Invoker's ETA.

@DrizztDoUrden is also reworking Gemm code to Solvers and Invokers. This work is currently paused because Vasya is doing something more important (ImplicitGemm Invokers), but I heard it is almost ready. If this PR will be merged first, then its code will need to be reworked once again almost immediately after merging, and that is we all would like to avoid, I guess.

I am not aware of the priorities of this MIOpenTensile thing. @daniellowell @ce1adon @DrizztDoUrden could you please communicate and decide the optimal order?

@DrizztDoUrden
Copy link
Contributor

Doesn't matter much unless it's conv...ocl.cpp changes, which are not here. I don't change gemm internals there.

@atamazov
Copy link
Contributor

atamazov commented Apr 9, 2020

Okay, then. @ce1adon Could you please set priority label.

@atamazov
Copy link
Contributor

@ce1adon Please describe how the PR affects the library's behavior. Or this is NFC?

@ce1adon
Copy link
Contributor Author

ce1adon commented Apr 29, 2020

@ce1adon Please describe how the PR affects the library's behavior. Or this is NFC?

It's not a new feature. The influence is updated in top comment.

@ce1adon
Copy link
Contributor Author

ce1adon commented May 19, 2020

Jira ticket related to the blocker of this PR
http://ontrack-internal.amd.com/browse/SWDEV-233998

@ce1adon ce1adon removed the ON_HOLD label Sep 25, 2020
@atamazov
Copy link
Contributor

@ce1adon Up to you. However, AFAIU on_hold does not prevent anything. It simply says that PR should not be merged.

src/solver/conv_multipass_wino3x3WrW.cpp Show resolved Hide resolved
src/solver/conv_multipass_wino3x3WrW.cpp Outdated Show resolved Hide resolved
src/solver/conv_multipass_wino3x3WrW.cpp Outdated Show resolved Hide resolved
hip-clang.docker Outdated
RUN export HIPCC_LINK_FLAGS_APPEND='-O3 -parallel-jobs=4'
RUN export HIPCC_COMPILE_FLAGS_APPEND='-O3 -Wno-format-nonliteral -parallel-jobs=4'
# install last released miopentensile in default, install latest commits when MIOTENSILE_VER="latest"
RUN if [ "$MIOTENSILE_VER" = "latest" ] ; then cget -p $PREFIX install -DAMDGPU_TARGETS=${GPU_ARCH} ROCmSoftwarePlatform/MIOpenTensile@e258bbaed0bd9de546ea38c9a5c42f71fa41d9a0; else cget -p $PREFIX install -DAMDGPU_TARGETS=${GPU_ARCH} ROCmSoftwarePlatform/MIOpenTensile@2e3e792b2674bf8cdf2620749a298ed5313351bb; fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-DAMDGPU_TARGETS=${GPU_ARCH} should be passed cget init. Also this should install tensile from a file so that hcc and hip-clang install the same version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will move gpu_arch to init.
Currently both hcc and hip-clang are using the same version. The version divergence here is both for hip-clang. It intends to test both a stable version and a latest version of yaml+tensile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved gpu_arch flag to cget init

@ce1adon
Copy link
Contributor Author

ce1adon commented Oct 9, 2020

The last test passed but is marked as failed.

@ce1adon
Copy link
Contributor Author

ce1adon commented Oct 26, 2020

NOTE: d0969a0 passed at # 38

Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ce1adon ce1adon changed the title Add MIOpenTensile in GEMM path MIOpenTensile Part1 : Add MIOpenTensile in GEMM path Oct 29, 2020
@daniellowell daniellowell merged commit 6e91fe5 into develop Oct 30, 2020
@ce1adon ce1adon deleted the miotensile branch November 14, 2020 23:02
JehandadKhan pushed a commit that referenced this pull request Nov 20, 2020
* add miopentensile path
* use miotensile in hip backend
* update miotensile
* generalize miotensile stride
* disable miopentensile in ocl backend
* remove ocl backend consecutive identical branches
* disable use of miotensile in hip-clang
* update miopentensile version
* use const arg in gemm
* miot introduce transpose feature
* update miot version
* disable miotensile in hip-clang in cmakelists
* disable miotensile except in hcc in cmakelists
* disable ocl wrw2 solvers
* enable miopentensile built with hipclang
* add miotensile in hip-clang docker
* add python3-venv in hip-clang docker
* hip-clang docker update
* update hip-clang docker compiler
* update miopentensile info
* disable miotensile with hcc
* set miopentensile off & add dedicated test stage
* Revert "clang tidy"
This reverts commit bd706c5.
* Revert "disable ocl wrw2 solvers"
This reverts commit 17f5a59.
* workaround for issue 2534
* Adjust test config
* Adjust gemm env_var
* use miotensile in winograd3x3multipass solver
* revert gemm enforce backend values
* skip igemm tests in test stage iv (miotensile tests)
* skip igemm tests in test stage iv
* use parallel jobs for tensile
* specify gpu architecture(s) in miotensile build process
* fix jenkins docker arg
* fix jenkins docker arg string
* rename gpu target param for miotensile
* specify gpu architecture(s) in Jenkinsfile buildJob
* relocate miotensile gpu targets arg
* add symlink to /opt/rocm in Dockerfile
* revise Dockerfile to pass GPU architecture flag for dependencies
* add jenkins tests for last/latest version of miopentensile
* remove workaround for issue #2534
* winograd workaround condition update: current miopentensile doesn't support bf16/fp16
* fix jenkinsfile typo & move GPU architecture flag to cget init in hip-clang dockerfile
* revise conv_multipass_wino3x3WrW GEMM workaround for miopentensile
* miopentensile stage rename
* reset miopentensile version in hcc
* revert wino3x3WrW datatype constraint in miopentensile
* remove miopentensile from dependency list & update version
Co-authored-by: Jing Zhou <zhou@ixt-rack-95.local.lan>
OR (MIOPEN_TEST_BFLOAT16 AND ${PARSE_ALLOW_BFLOAT16})
OR (MIOPEN_TEST_HALF AND ${PARSE_ALLOW_HALF})
OR (MIOPEN_TEST_INT8 AND ${PARSE_ALLOW_INT8}))
OR (MIOPEN_TEST_INT8 AND ${PARSE_ALLOW_INT8})
OR (NOT (MIOPEN_TEST_MIOTENSILE AND (NAME IN_LIST SKIP_TESTS))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ce1adon It should be

OR (MIOPEN_TEST_MIOTENSILE AND NOT (NAME IN_LIST SKIP_TESTS)))

Current code enables ALL tests when MIOPEN_TEST_MIOTENSILE=off, thus affecting non-miopentensile tests (half etc). I will fix that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is under the condition of (NOT (MIOPEN_TEST_INT8 OR MIOPEN_TEST_BFLOAT16 OR MIOPEN_TEST_HALF OR MIOPEN_TEST_MIOTENSILE)), which means this only applies to fp32 cases. Do you mean fp32 doesn't need to test all the cases?

Copy link
Contributor

@atamazov atamazov Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is not "under" that confition. It is just one of ORed conditions. Alternate formatting:

if(
    (NOT (MIOPEN_TEST_INT8 OR MIOPEN_TEST_BFLOAT16 OR MIOPEN_TEST_HALF OR MIOPEN_TEST_MIOTENSILE)) // (1)
    OR (MIOPEN_TEST_BFLOAT16 AND ${PARSE_ALLOW_BFLOAT16}) // (2)
    OR ( MIOPEN_TEST_HALF AND ${PARSE_ALLOW_HALF}) // (3)
    OR (MIOPEN_TEST_INT8 AND ${PARSE_ALLOW_INT8}) // (4)
    OR (NOT (MIOPEN_TEST_MIOTENSILE AND (NAME IN_LIST SKIP_TESTS))) // (5)
) ...

When MIOPEN_TEST_MIOTENSILE=Off, condition (5) is always true. This works properly for MIOpenTensile, that's why you've missed that, I guess.

Copy link
Contributor

@atamazov atamazov Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 318f480 (not yet merged to develop)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #750

@atamazov
Copy link
Contributor

atamazov commented Nov 6, 2021

@ce1adon

It replaces the default GEMM path through rocblas in HIP backend for fp32 type.

If MIOpenTensile is the default for FP32, then shall we remove FP32 testing from "MIOpenTensile" stages in the Jenkinsfile?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants