Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WORKAROUND][OCL][MI100][MI200] Disable MIOpenGEMM convolutions (W/A for #1315). Disable iGemm ASM GTC XDLOPS NCHW convolutions (W/A for #1317) #1321

Merged
merged 12 commits into from
Dec 6, 2021

Conversation

atamazov
Copy link
Contributor

@atamazov atamazov commented Dec 2, 2021

@atamazov
Copy link
Contributor Author

atamazov commented Dec 2, 2021

Needs to double-check if this really resolves #1315 and #1317. I will do that tomorrow.

@atamazov atamazov added this to the ROCm 5.0 milestone Dec 2, 2021
@codecov

This comment has been minimized.

@carlushuang
Copy link
Contributor

carlushuang commented Dec 3, 2021

#1324 -> found the root cause for SWDEV-312112 and have PR to fix it.
#1325 -> found an issue for SWDEV-313696 (maybe this is the root cause, but can't tell for now if there still some issue, since if ignore the pooing solver, we can't run the model.)

cc @atamazov @junliume

@atamazov atamazov changed the title [WORKAROUND][MI100][MI200] Disable MIOpenGEMM convolutions for OCL BE (W/A for #1315). Disable iGemm ASM GTC XDLOPS convolutions for NCHW (W/A for #1317, SWDEV-312112, SWDEV-313696) [WORKAROUND][OCL][MI100][MI200] Disable MIOpenGEMM convolutions (W/A for #1315). Disable iGemm ASM GTC XDLOPS NCHW convolutions (W/A for #1317) Dec 4, 2021
@atamazov atamazov removed the ON_HOLD label Dec 4, 2021
@atamazov
Copy link
Contributor Author

atamazov commented Dec 4, 2021

This unblocking PR is ready to go, please review!

@atamazov
Copy link
Contributor Author

atamazov commented Dec 4, 2021

Manually tested on MI200 node with OCL BE, all tests pass.

@@ -792,6 +792,11 @@ ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC::GetWorkspaceSize(const ConvolutionCo

bool ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC::IsApplicable(const ConvolutionContext& ctx) const
{
#if WORKAROUND_ISSUE_1317
if(ctx.IsLayoutDefault())
Copy link
Contributor

@carlushuang carlushuang Dec 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since #1324 resolve this issue, this logic can be removed?
sorry I saw #1324 (comment), seems not resolved
cc @shaojiewang

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@atamazov are there other missing cases where we still get the same error? #1324 indeed has fixed the issue raised in the JIRA ticket I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junliume @shaojiewang

are there other missing cases where we still get the same error?

Please see #1317. I am suspecting this is related to n_groups > 1 && OCL BE.

shurale-nkn
shurale-nkn previously approved these changes Dec 6, 2021
Copy link
Contributor

@shurale-nkn shurale-nkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@atamazov atamazov merged this pull request into develop Dec 6, 2021
junliume pushed a commit that referenced this pull request Dec 6, 2021
…for #1315). Disable iGemm ASM GTC XDLOPS NCHW convolutions (W/A for #1317) (#1321)

* W/A for #1315. Disable MIOpenGEMM convolutions for xDLOPs GPUs (MI100/MI200) && OpenCL BE
* W/A for #1317. Disable iGemm ASM GTC XDLOPS convolutions for NCHW configs && OCL BE (keep them enabled for NHWC)
* [Jenkins] Add Fp32 Full tests stages for Opencl BE && MI100/MI200
* [NFC] Fix comments related to WORKAROUND_MIOPENGEMM_SINCE_ROCM41
@atamazov
Copy link
Contributor Author

atamazov commented Dec 18, 2021

WORKAROUND_ISSUE_1317 must also disable test_regression_opencl_float_mi100 which is intended to test some NCHW configs on OCL BE with ConvAsmImplicitGemmGTCDynamicWrwXdlops (see #1012). The reason is that this PR totally disables ConvAsmImplicitGemmGTCDynamic*Xdlops for NCHW && OCL.

@junliume junliume deleted the wa-issue-1315-1317-312112-313696 branch January 7, 2022 18:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants