1x1 kernel hangs indefinitely in miopenFindConvolutionBackwardWeightsAlgorithm #13

Closed
patflick opened this issue Jul 14, 2017 · 5 comments

@patflick
Contributor

The following configuration with a 1x1 kernel fails the forward verification and then seems to hang indefinitely inside the miopenFindConvolutionBackwardWeightsAlgorithm function.

$ ./MIOpenDriver conv -H 14 -W 14 -P 1 -k 512 -c 256 -n 128 -p 0 -q 0 -u 2 -v 2 -x 1 -y 1 -t 1
MIOpenDriver: conv -H 14 -W 14 -P 1 -k 512 -c 256 -n 128 -p 0 -q 0 -u 2 -v 2 -x 1 -y 1 -t 1
MIOpen Forward Conv. Algorithm: 1
GPU Kernel Time Forward Conv. Elapsed: 3.433880 ms
Forward Convolution Verifies on CPU and GPU
MIOpen Backward Data Conv. Algorithm: 0
GPU Kernel Time Backward Data Conv. Elapsed: 104.385124 ms
^C

I waited a good 10 minutes before interrupting. Breaking into the process with gdb shows that it is stuck inside the miopenFindConvolutionBackwardWeightsAlgorithm function.
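
For anyone reproducing this, a hang like the one above can be detected automatically instead of waiting by hand. The following is a minimal Python sketch, not part of MIOpen: the helper name `run_with_timeout` is hypothetical, and the demo uses a sleeping Python child as a stand-in for the driver so the example runs anywhere.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and report whether it finished within timeout_s seconds.

    Returns ("finished", returncode) or ("timeout", None).
    Hypothetical helper for illustration; not part of MIOpen.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing is left behind.
        return ("timeout", None)
    return ("finished", proc.returncode)


if __name__ == "__main__":
    # Stand-in for a hanging process: sleeps far longer than the timeout.
    hanging = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(hanging, timeout_s=0.5))
```

To apply it to this report, the command list would be the driver invocation shown above, split into arguments, with a timeout comfortably larger than the expected run time.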

@patflick
Contributor Author

Sorry, it fails the forward verification only without -t 1:

$ ./MIOpenDriver conv -H 14 -W 14 -P 1 -k 512 -c 256 -n 128 -p 0 -q 0 -u 2 -v 2 -x 1 -y 1     
MIOpenDriver: conv -H 14 -W 14 -P 1 -k 512 -c 256 -n 128 -p 0 -q 0 -u 2 -v 2 -x 1 -y 1
Forward Convolution Failed: 0.206861
^C
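
As a side note on the configuration itself, the output shape can be worked out with the standard convolution size formula. This is a sketch under assumptions: it reads `-H`/`-W` as input height/width, `-x`/`-y` as filter width/height, `-p`/`-q` as padding, and `-u`/`-v` as strides, and it assumes a dilation of 1. The function name `conv_out_size` is made up for illustration.

```python
def conv_out_size(in_size, kernel, pad, stride, dilation=1):
    """Standard convolution output size along one spatial dimension."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (in_size + 2 * pad - effective_kernel) // stride + 1


# Values taken from the driver command line (flag meanings are assumed, see above).
out_h = conv_out_size(14, kernel=1, pad=0, stride=2)
out_w = conv_out_size(14, kernel=1, pad=0, stride=2)
print(out_h, out_w)  # 7 7
```

Under those assumptions the 14x14 input maps to a 7x7 output, so a 1x1 filter with stride 2 touches only every other row and column of the input.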

@dagamayank
Contributor

Closing this for now, since the underlying problem is missing synchronization when the -t 1 flag is not set. This is the same issue as #12.

@patflick
Contributor Author

Even with the -t 1 flag, it still hangs indefinitely. The failed verification was just an additional thing.

@dagamayank dagamayank reopened this Jul 14, 2017
@dagamayank
Contributor

dagamayank commented Jul 14, 2017

@patflick I narrowed it down to a bug in the Direct algorithm. If you disable the Direct algorithm with this environment variable, the test passes:
MIOPEN_DEBUG_CONV_DIRECT=0

MIOPEN_DEBUG_CONV_DIRECT=0 ./bin/MIOpenDriver conv -H 14 -W 14 -P 1 -k 512 -c 256 -n 128 -p 0 -q 0 -u 2 -v 2 -x 1 -y 1 
MIOpenDriver: conv -H 14 -W 14 -P 1 -k 512 -c 256 -n 128 -p 0 -q 0 -u 2 -v 2 -x 1 -y 1
Forward Convolution Verifies on CPU and GPU
Backward Convolution Data Verifies on CPU and GPU
Backward Convolution Weights Verifies on CPU and GPU

The downside is that the Direct algorithm is also disabled for the forward pass, which results in slower forward times.

Let us look into fixing it. Thanks for reporting.
cc @alyashev
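
When the workaround above has to be applied from a script rather than a shell, the variable can be set on the child process only. This is a minimal Python sketch; `env_with_direct_disabled` is a hypothetical helper name, and only the variable `MIOPEN_DEBUG_CONV_DIRECT=0` comes from this thread.

```python
import os


def env_with_direct_disabled(base_env=None):
    """Return a copy of the environment with the Direct algorithm disabled.

    The input mapping (or os.environ) is copied, never modified in place.
    Hypothetical helper for illustration; not part of MIOpen.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MIOPEN_DEBUG_CONV_DIRECT"] = "0"
    return env


if __name__ == "__main__":
    env = env_with_direct_disabled({"PATH": "/usr/bin"})
    print(env["MIOPEN_DEBUG_CONV_DIRECT"])  # 0
```

The returned mapping can be passed as `env=` to `subprocess.run`, which keeps the setting scoped to that one driver run instead of the whole session.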

@dagamayank dagamayank added the bug label Jul 14, 2017
@dagamayank
Contributor

Disabling the Direct algorithm for all stride=2 cases fixes this issue for now (756f73f). We will update and optimize the Direct algorithm in subsequent releases.

ltqin pushed a commit that referenced this issue Oct 28, 2021
646fcc268 Merge pull request #47 from ROCmSoftwarePlatform/develop
6014185ac [Bug Fix] GridwiseGemm_bk0mk1_bk0nk1_mn_xdlops_v2r4 loop issue (#44)
3e9113707 Merge pull request #46 from ROCmSoftwarePlatform/miopen_downstream_all
211dae822 Merge branch 'develop' into miopen_downstream_all
5890e3007 [Composable Kernel] update develop branch code to ck_upstream
d5297abae fix bug in gridwise gemm xdlops v2r3 (#45)
38a90b6ed Merge pull request #43 from ROCmSoftwarePlatform/develop
c3018794b bug fix (#39)
fd49ff808 add nchw atomic , nhwc and nhwc atomic method   for backward weight (#30)
b2dc55f82 [MIOpen Downstream] Fix Reduction Kernel (#34)
b3e8d57d5 Tweak GEMM kernel (#38)
846f462bd Add VectorType support into StaticBuffer (#27)
dfb80c4e3 [Enhancements] Several bugfixes and refactoring of dynamic generic reduction  (#1156)
8557901d0 Merge pull request #1165 from ROCmSoftwarePlatform/develop
f305bebdc Merge pull request #31 from ROCmSoftwarePlatform/miopen_downstream-dynamic_reduction_pr
b725e3fc8 Merge remote-tracking branch 'origin/develop' into miopen_downstream-dynamic_reduction_pr
88833bd9a Merge pull request #32 from ROCmSoftwarePlatform/develop
df0d68106 :Merge remote-tracking branch 'origin/develop' into CK_upstream
f3acd2510 Add  a version of Merge transform that use integerdivision and mod (#25)
19613902b GEMM driver and kernel (#29)
627d8ef35 Backward weight v4r4r2 with xdlops (#18)
10bb81106 Misc fixes (#24)
9e80cdceb [SWDEV-281541][MSRCHA-100] Implementation of Dynamic Generic Reduction  (#1108)
a7a758d8c GlobalAtomicAdd for fp32/int32 (#23)
9d3f634a3 Xdlops refactor fix (#22)
c6f26bb48 magic division use __umulhi() (#19)
6fe3627a9 Composable kernel init integration v3 (#1097)
a2ad6d353 refactor dynamic xdlops iGemm (#13)
ba6f79a75 Added host_conv_wrw for verification (#15)

git-subtree-dir: src/composable_kernel
git-subtree-split: 646fcc268ede841a16cdaafb68aa64803d8390e1