[OCL][MI100][MI200] GEMM convolution failures with OCL backend (Staging 95b58f72f) #1315

Closed
atamazov opened this issue Dec 1, 2021 · 2 comments

@atamazov
Contributor

atamazov commented Dec 1, 2021

make check fails in the convolution tests. Preconditions:

  • Mainline build 9148
  • OpenCL backend
  • MI200 or MI100 target

Failures: make-check.9148-opencl.float.gfx90a.failures.txt.

As you can see, all failures occur with the GemmBwd1x1_stride1 and GemmBwdRest solvers.

The exact technical reason for the failures is unknown. However, we know that MIOpen uses the MIOpenGEMM library for GEMM when the backend is OpenCL. MIOpenGEMM was never tested on MI100 and MI200 because it is deprecated and no longer supported, so the most likely cause is an issue in MIOpenGEMM.

The solution is to disable MIOpenGEMM in convolutions. The side effect is some performance drop. However, assuming that MI100 and MI200 are not designed to provide the best possible performance with an OpenCL backend, this performance drop can be ignored.

If performance becomes important, we can consider re-enabling the GEMM solvers other than GemmBwd1x1_stride1 and GemmBwdRest.
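
To make the two options above concrete, here is a rough sketch of the selection rule. It is not the actual MIOpen code: the `Backend` enum and all function names are hypothetical and exist only for this illustration. The only facts taken as given are that MI100 reports `gfx908`, MI200 reports `gfx90a`, and the two failing solvers are the ones named in this report.

```cpp
#include <string>

// Hypothetical types and names for illustration only; not MIOpen's real API.
enum class Backend { OpenCL, HIP };

// MI100 reports gfx908 and MI200 reports gfx90a (the xDLOPs GPUs in this report).
inline bool IsAffectedTarget(const std::string& arch)
{
    return arch == "gfx908" || arch == "gfx90a";
}

// The two solvers that account for all failures listed in this issue.
inline bool IsKnownFailingGemmSolver(const std::string& solver)
{
    return solver == "GemmBwd1x1_stride1" || solver == "GemmBwdRest";
}

// Returns true if the given GEMM solver may be used.
// keep_passing_solvers == false: disable every GEMM solver on the affected
//                                setup (the proposed solution).
// keep_passing_solvers == true:  disable only the two failing solvers
//                                (the option to consider if performance matters).
inline bool IsGemmSolverEnabled(const std::string& solver,
                                const std::string& arch,
                                Backend backend,
                                bool keep_passing_solvers = false)
{
    const bool affected = backend == Backend::OpenCL && IsAffectedTarget(arch);
    if(!affected)
        return true; // Other backends and targets are left untouched.
    if(!keep_passing_solvers)
        return false;
    return !IsKnownFailingGemmSolver(solver);
}
```

With the default argument, every GEMM solver is rejected for OpenCL on MI100/MI200, which matches the proposed solution. Passing `true` as the last argument keeps all GEMM solvers except the two failing ones.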

@atamazov atamazov added this to the ROCm 5.0 milestone Dec 1, 2021
@atamazov atamazov changed the title from "[OCL][MI100][MI200] Multiple failures with OCL backend (Staging 95b58f72f)" to "[OCL][MI100][MI200] GEMM convolution failures with OCL backend (Staging 95b58f72f)" Dec 2, 2021
@atamazov atamazov self-assigned this Dec 2, 2021
@atamazov
Contributor Author

atamazov commented Dec 4, 2021

Full description provided.

@atamazov
Contributor Author

atamazov commented Dec 6, 2021

#1321 is merged, so this specific issue can now be closed. We already have a goal to replace MIOpenGEMM with MIOpenTensile (including for the OCL BE).

@atamazov atamazov closed this as completed Dec 6, 2021
junliume pushed a commit that referenced this issue Dec 6, 2021
…for #1315). Disable iGemm ASM GTC XDLOPS NCHW convolutions (W/A for #1317) (#1321)

* W/A for #1315. Disable MIOpenGEMM convolutions for xDLOPs GPUs (MI100/MI200) && OpenCL BE
* W/A for #1317. Disable iGemm ASM GTC XDLOPS convolutions for NCHW configs && OCL BE (keep them enabled for NHWC)
* [Jenkins] Add Fp32 Full tests stages for Opencl BE && MI100/MI200
* [NFC] Fix comments related to WORKAROUND_MIOPENGEMM_SINCE_ROCM41