As you can see, all failures happen with GemmBwd1x1_stride1 and GemmBwdRest.
The exact technical reason for the failures is unknown. However, we know that MIOpen uses the MIOpenGEMM library for GEMM when the backend is OpenCL. MIOpenGEMM was never tested on MI100 and MI200 because it is deprecated and no longer supported, so the most likely cause is an issue in MIOpenGEMM.
The solution is to disable MIOpenGEMM in convolutions. The side effect is some performance drop. However, assuming that MI100 and MI200 are not designed to provide the best possible performance with an OpenCL backend, this performance drop can be ignored.
If performance becomes important, we can consider re-enabling the GEMM solvers other than GemmBwd1x1_stride1 and GemmBwdRest.
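The decision described above can be modeled as a small applicability check. This is only an illustrative sketch in Python, not MIOpen code: `Context`, `gemm_solver_allowed`, the `selective` flag, and the two sets are hypothetical names introduced here. The only identifiers taken from this issue are the two failing solver names. It assumes MI100 and MI200 are identified by the architectures gfx908 and gfx90a.

```python
from dataclasses import dataclass

# Hypothetical model of the workaround; these names are not MIOpen identifiers.
XDLOPS_GPUS = {"gfx908", "gfx90a"}  # assumed arch names for MI100 and MI200
FAILING_SOLVERS = {"GemmBwd1x1_stride1", "GemmBwdRest"}  # solvers named in this issue


@dataclass(frozen=True)
class Context:
    backend: str   # "OpenCL" or "HIP"
    gpu_arch: str  # e.g. "gfx90a"


def gemm_solver_allowed(ctx: Context, solver: str, selective: bool = False) -> bool:
    """Return True if a GEMM convolution solver may be used in this context.

    selective=False models the proposed workaround: every GEMM solver is
    disabled on xDLOPs GPUs with the OpenCL backend.
    selective=True models the possible follow-up: only the two solvers
    observed to fail stay disabled.
    """
    affected = ctx.backend == "OpenCL" and ctx.gpu_arch in XDLOPS_GPUS
    if not affected:
        return True
    if selective:
        return solver not in FAILING_SOLVERS
    return False
```

With `selective=False`, every GEMM solver is rejected for OpenCL on gfx908/gfx90a and nothing changes elsewhere. Switching to `selective=True` would restore the remaining GEMM solvers while keeping the two failing ones off, which is the trade-off to revisit if performance becomes important.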
atamazov changed the title from "[OCL][MI100][MI200] Multiple failures with OCL backend (Staging 95b58f72f)" to "[OCL][MI100][MI200] GEMM convolution failures with OCL backend (Staging 95b58f72f)" on Dec 2, 2021
…for #1315). Disable iGemm ASM GTC XDLOPS NCHW convolutions (W/A for #1317) (#1321)
* W/A for #1315. Disable MIOpenGEMM convolutions for xDLOPs GPUs (MI100/MI200) && OpenCL BE
* W/A for #1317. Disable iGemm ASM GTC XDLOPS convolutions for NCHW configs && OCL BE (keep them enabled for NHWC)
* [Jenkins] Add Fp32 Full tests stages for Opencl BE && MI100/MI200
* [NFC] Fix comments related to WORKAROUND_MIOPENGEMM_SINCE_ROCM41
make check convolution failures.
Preconditions:
Failures: make-check.9148-opencl.float.gfx90a.failures.txt.