Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ConvAsmImplicitGemmGTCDynamicWrwXdlops: "FAILED ... oclkernel.hpp:109: Error setting argument #19 to kernel" #1012

Closed
atamazov opened this issue Jun 25, 2021 · 7 comments · Fixed by #1013
Assignees
Milestone

Comments

@atamazov
Copy link
Contributor

atamazov commented Jun 25, 2021

Found during triaging of https://ontrack-internal.amd.com/browse/SWDEV-292684.

  • Current develop, Staging, Mainline
  • OpenCL backend only (HIP seems Ok)
  • MI100 only
  • Environment
    • Default Find mode (dynamic_hybrid) AND removed User-find-db
    • Or, MIOPEN_DEBUG_IMPLICIT_GEMM_FIND_ALL_SOLUTIONS=1 MIOPEN_FIND_MODE=normal

Symptom: exception is thrown

FAILED: /dockerx/github/miopenx01/src/include/miopen/oclkernel.hpp:109: Error setting argument #19 to kernel (size = 4):  Invalid Arg Index

Some known failing configs

./bin/test_conv2d --float --cmode conv --pmode default --group-count 1 --input 128, 832, 7, 7 --weights 32, 832, 1, 1 --pads_strides_dilations 0 0 1 1 1 1
./bin/test_conv2d --float --cmode conv --pmode default --group-count 1 --input 64, 192, 28, 28 --weights 64, 192, 1, 1 --pads_strides_dilations 0 0 1 1 1 1
./bin/test_conv2d --float --cmode conv --pmode default --group-count 1 --input 64, 256, 28, 28 --weights 128, 256, 1, 1 --pads_strides_dilations 0 0 1 1 1 1
./bin/test_conv2d --float --cmode conv --pmode default --group-count 1 --input 64, 480, 14, 14 --weights 64, 480, 1, 1 --pads_strides_dilations 0 0 1 1 1 1
./bin/test_conv2d --float --cmode conv --pmode default --group-count 1 --input 64, 512, 14, 14 --weights 128, 512, 1, 1 --pads_strides_dilations 0 0 1 1 1 1
./bin/test_conv2d --float --cmode conv --pmode default --group-count 1 --input 64, 512, 28, 28 --weights 128, 512, 1, 1 --pads_strides_dilations 0 0 1 1 1 1
./bin/test_conv2d --float --cmode conv --pmode default --group-count 1 --input 64, 64, 56, 56 --weights 256, 64, 1, 1 --pads_strides_dilations 0 0 1 1 1 1
@atamazov
Copy link
Contributor Author

Example log at level 6
root@miopen908-1:/dockerx/github/miopenx01/build/42-16.release.opencl# MIOPEN_LOG_LEVEL=6 ./bin/test_conv2d --float --cmode conv --pmode default --group-count 1 --input 64, 64, 56, 56 --weights 256, 64, 1, 1 --pads_strides_dilations 0 0 1 1 1 1 ${MIO_TFLAGS}
MIOpen(OpenCL): Info [GetFindModeValueImpl] MIOPEN_FIND_MODE = DYNAMIC_HYBRID(5)
MIOpen(OpenCL): Info [get_device_name] Raw device name: gfx908:sramecc+:xnack-
MIOpen(OpenCL): Info [Handle] stream: 0x55cc93cb4540, device_id: 0x55cc93e74cb0
MIOpen(OpenCL): Info [BackwardDataGetWorkSpaceSize]
MIOpen(OpenCL): Info [AmdRocmMetadataVersionDetect] ROCm MD version AMDHSA_COv3, MIOpen version 2.13.0.120289fcb
MIOpen(OpenCL): Info2 [ValidateGcnAssemblerImpl] Running: '/opt/rocm/llvm/bin/clang --version'
MIOpen(OpenCL): Info2 [ValidateGcnAssemblerImpl] clang version 12.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-4.2.0 21155 fd12620eb54b192daf0231b6950421a85cda1feb)
MIOpen(OpenCL): Info2 [ValidateGcnAssemblerImpl] Target: x86_64-unknown-linux-gnu
MIOpen(OpenCL): Info2 [ValidateGcnAssemblerImpl] Thread model: posix
MIOpen(OpenCL): Info2 [ValidateGcnAssemblerImpl] InstalledDir: /opt/rocm/llvm/bin
MIOpen(OpenCL): Info2 [ValidateGcnAssemblerImpl]
MIOpen(OpenCL): Info [GetBackwardSolutions]
MIOpen(OpenCL): Info [Measure] Db::Prefetch time: 89.4054 ms
MIOpen(OpenCL): Info2 [FindRecordUnsafe] Looking for key 256-56-56-1x1-64-56-56-64-0x0-1x1-1x1-0-NCHW-FP32-B in file /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.2_12_0.ufdb.txt
MIOpen(OpenCL): Info2 [FindRecordUnsafe] File is unreadable: /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.2_12_0.ufdb.txt
MIOpen(OpenCL): Info2 [FindRecord] Looking for key 256-56-56-1x1-64-56-56-64-0x0-1x1-1x1-0-NCHW-FP32-B in file /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.fdb.txt
MIOpen(OpenCL): Info2 [FindRecord] Key match: 256-56-56-1x1-64-56-56-64-0x0-1x1-1x1-0-NCHW-FP32-B
MIOpen(OpenCL): Info2 [FindRecord] Contents found: miopenConvolutionBwdDataAlgoGEMM:GemmBwd1x1_stride1,0.307359,0,rocBlas,<unused>;miopenConvolutionBwdDataAlgoImplicitGEMM:ConvAsmImplicitGemmGTCDynamicBwdXdlops,0.338399,0,miopenConvolutionBwdDataAlgoImplicitGEMM,<unused>;miopenConvolutionBwdDataAlgoDirect:ConvAsm1x1U,0.954719,0,miopenConvolutionBwdDataAlgoDirect,<unused>;miopenConvolutionBwdDataAlgoWinograd:ConvBinWinogradRxSf3x2,1.26384,0,miopenConvolutionBwdDataAlgoWinograd,<unused>
MIOpen(OpenCL): Info2 [Measure] Db::FindRecord time: 0.150166 ms
MIOpen(OpenCL): Info2 [BackwardDataGetWorkSpaceSize] 0
MIOpen(OpenCL): Info [ForwardGetWorkSpaceSize]
MIOpen(OpenCL): Info [GetForwardSolutions]
MIOpen(OpenCL): Info2 [FindRecordUnsafe] Looking for key 64-56-56-1x1-256-56-56-64-0x0-1x1-1x1-0-NCHW-FP32-F in file /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.2_12_0.ufdb.txt
MIOpen(OpenCL): Info2 [FindRecordUnsafe] File is unreadable: /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.2_12_0.ufdb.txt
MIOpen(OpenCL): Info2 [FindRecord] Looking for key 64-56-56-1x1-256-56-56-64-0x0-1x1-1x1-0-NCHW-FP32-F in file /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.fdb.txt
MIOpen(OpenCL): Info2 [FindRecord] Key match: 64-56-56-1x1-256-56-56-64-0x0-1x1-1x1-0-NCHW-FP32-F
MIOpen(OpenCL): Info2 [FindRecord] Contents found: miopenConvolutionFwdAlgoImplicitGEMM:ConvAsmImplicitGemmGTCDynamicFwdXdlops,0.34768,0,miopenConvolutionFwdAlgoImplicitGEMM,<unused>;miopenConvolutionFwdAlgoGEMM:GemmFwd1x1_0_1,0.370079,0,rocBlas,<unused>;miopenConvolutionFwdAlgoDirect:ConvAsm1x1U,1.34944,0,miopenConvolutionFwdAlgoDirect,<unused>;miopenConvolutionFwdAlgoWinograd:ConvBinWinogradRxSf3x2,1.62528,0,miopenConvolutionFwdAlgoWinograd,<unused>
MIOpen(OpenCL): Info2 [Measure] Db::FindRecord time: 0.061431 ms
MIOpen(OpenCL): Info2 [ForwardGetWorkSpaceSize] 0
MIOpen(OpenCL): Info [BackwardWeightsGetWorkSpaceSize]
MIOpen(OpenCL): Info [GetWrwSolutions]
MIOpen(OpenCL): Info2 [FindRecordUnsafe] Looking for key 256-56-56-1x1-64-56-56-64-0x0-1x1-1x1-0-NCHW-FP32-W in file /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.2_12_0.ufdb.txt
MIOpen(OpenCL): Info2 [FindRecordUnsafe] File is unreadable: /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.2_12_0.ufdb.txt
MIOpen(OpenCL): Info2 [FindRecord] Looking for key 256-56-56-1x1-64-56-56-64-0x0-1x1-1x1-0-NCHW-FP32-W in file /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.fdb.txt
MIOpen(OpenCL): Info2 [FindRecord] Key match: 256-56-56-1x1-64-56-56-64-0x0-1x1-1x1-0-NCHW-FP32-W
MIOpen(OpenCL): Info2 [FindRecord] Contents found: miopenConvolutionBwdWeightsAlgoImplicitGEMM:ConvAsmImplicitGemmGTCDynamicWrwXdlops,0.943359,0,miopenConvolutionBwdWeightsAlgoImplicitGEMM,<unused>;miopenConvolutionBwdWeightsAlgoDirect:ConvAsmBwdWrW1x1,1.80912,0,miopenConvolutionBwdWeightsAlgoDirect,<unused>;miopenConvolutionBwdWeightsAlgoGEMM:GemmWrw1x1_stride1,1.98672,0,rocBlas,<unused>;miopenConvolutionBwdWeightsAlgoWinograd:ConvBinWinogradRxSf2x3g1,9.23919,0,miopenConvolutionBwdWeightsAlgoWinograd,<unused>
MIOpen(OpenCL): Info2 [Measure] Db::FindRecord time: 0.05952 ms
MIOpen(OpenCL): Info2 [BackwardWeightsGetWorkSpaceSize] 0./bin/test_conv2d --float --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-data --input 64, 64, 56, 56 --weights 256, 64, 1, 1 --pads_strides_dilations 0 0 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW
MIOpen(OpenCL): Info [BackwardWeightsGetWorkSpaceSize]
MIOpen(OpenCL): Info [GetWrwSolutions]
MIOpen(OpenCL): Info2 [FindRecordUnsafe] Looking for key 256-56-56-1x1-64-56-56-64-0x0-1x1-1x1-0-NCHW-FP32-W in file /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.2_12_0.ufdb.txt
MIOpen(OpenCL): Info2 [FindRecordUnsafe] File is unreadable: /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.2_12_0.ufdb.txt
MIOpen(OpenCL): Info2 [FindRecord] Looking for key 256-56-56-1x1-64-56-56-64-0x0-1x1-1x1-0-NCHW-FP32-W in file /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.fdb.txt
MIOpen(OpenCL): Info2 [FindRecord] Key match: 256-56-56-1x1-64-56-56-64-0x0-1x1-1x1-0-NCHW-FP32-W
MIOpen(OpenCL): Info2 [FindRecord] Contents found: miopenConvolutionBwdWeightsAlgoImplicitGEMM:ConvAsmImplicitGemmGTCDynamicWrwXdlops,0.943359,0,miopenConvolutionBwdWeightsAlgoImplicitGEMM,<unused>;miopenConvolutionBwdWeightsAlgoDirect:ConvAsmBwdWrW1x1,1.80912,0,miopenConvolutionBwdWeightsAlgoDirect,<unused>;miopenConvolutionBwdWeightsAlgoGEMM:GemmWrw1x1_stride1,1.98672,0,rocBlas,<unused>;miopenConvolutionBwdWeightsAlgoWinograd:ConvBinWinogradRxSf2x3g1,9.23919,0,miopenConvolutionBwdWeightsAlgoWinograd,<unused>
MIOpen(OpenCL): Info2 [Measure] Db::FindRecord time: 0.197889 ms
MIOpen(OpenCL): Info2 [BackwardWeightsGetWorkSpaceSize] 0
MIOpen(OpenCL): Info [FindConvBwdWeightsAlgorithm] requestAlgoCount = 1, workspace = 0
MIOpen(OpenCL): Info [GetWrwSolutions]
MIOpen(OpenCL): Info2 [FindRecordUnsafe] Looking for key 256-56-56-1x1-64-56-56-64-0x0-1x1-1x1-0-NCHW-FP32-W in file /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.2_12_0.ufdb.txt
MIOpen(OpenCL): Info2 [FindRecordUnsafe] File is unreadable: /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.2_12_0.ufdb.txt
MIOpen(OpenCL): Info2 [FindRecord] Looking for key 256-56-56-1x1-64-56-56-64-0x0-1x1-1x1-0-NCHW-FP32-W in file /dockerx/github/miopenx01/src/kernels/gfx90878.OpenCL.fdb.txt
MIOpen(OpenCL): Info2 [FindRecord] Key match: 256-56-56-1x1-64-56-56-64-0x0-1x1-1x1-0-NCHW-FP32-W
MIOpen(OpenCL): Info2 [FindRecord] Contents found: miopenConvolutionBwdWeightsAlgoImplicitGEMM:ConvAsmImplicitGemmGTCDynamicWrwXdlops,0.943359,0,miopenConvolutionBwdWeightsAlgoImplicitGEMM,<unused>;miopenConvolutionBwdWeightsAlgoDirect:ConvAsmBwdWrW1x1,1.80912,0,miopenConvolutionBwdWeightsAlgoDirect,<unused>;miopenConvolutionBwdWeightsAlgoGEMM:GemmWrw1x1_stride1,1.98672,0,rocBlas,<unused>;miopenConvolutionBwdWeightsAlgoWinograd:ConvBinWinogradRxSf2x3g1,9.23919,0,miopenConvolutionBwdWeightsAlgoWinograd,<unused>
MIOpen(OpenCL): Info2 [Measure] Db::FindRecord time: 0.111151 ms
MIOpen(OpenCL): Info [CompileWrwSolution] solver_id = ConvAsmImplicitGemmGTCDynamicWrwXdlops
MIOpen(OpenCL): Info2 [GetInvoker] Returning an invoker for problem 256x56x56x1x1x64x56x56x64xNCHWxFP32x0x0x1x1x1x1x1xW and solver ConvAsmImplicitGemmGTCDynamicWrwXdlops
MIOpen(OpenCL): Info2 [SQLiteBase] Initializing system database file /dockerx/github/miopenx01/src/kernels/miopen.db
MIOpen(OpenCL): Info2 [SQLiteBase] Initializing user database file /dockerx/github/miopenx01/src/kernels/miopen_1.0.0.udb
MIOpen(OpenCL): Info [FindSolutionImpl] ConvAsmImplicitGemmGTCDynamicWrwXdlops (not searchable)
MIOpen(OpenCL): Info2 [GetSolution] igemm_wrw_gtc_gfx908.s:igemm_wrw_gtcx_nchw_fp32_bx4_ex0_bt128x64x16_wt32x8x1_ws1x2_wr2x2_ta1x4x2x1_1x4x1x64_tb1x4x1x1_1x4x1x64_gkgs
MIOpen(OpenCL): Info2 [PrepareInvoker] Preparing kernel: igemm_wrw_gtcx_nchw_fp32_bx4_ex0_bt128x64x16_wt32x8x1_ws1x2_wr2x2_ta1x4x2x1_1x4x1x64_tb1x4x1x1_1x4x1x64_gkgs
MIOpen(OpenCL): Info2 [HipCompilerVersionImpl] Read version information from HIP package...
MIOpen(OpenCL): Info [HipCompilerVersionImpl] 4.2.21155
MIOpen(OpenCL): Info2 [AmdgcnAssembleQuiet] /opt/rocm/llvm/bin/clang  -x assembler -target amdgcn--amdhsa -mcpu=gfx900 /tmp/d944-6e2d-64b1-27a0 -o /dev/null 2>&1
MIOpen(OpenCL): Info2 [AmdgcnAssemble] ' -x assembler -target amdgcn--amdhsa -mno-xnack -mcpu=gfx908:sramecc+:xnack-  -Wa,-defsym,ROCM_METADATA_VERSION=5 - -o /tmp/miopen-tmp-b0bf-52eb-5697-7bb0/amdgcn-asm-out-XXXXXX'
MIOpen(OpenCL): Info2 [Log] Kernel igemm_wrw_gtc_gfx908.s Compile Time, ms: 1376.57
MIOpen(OpenCL): Info2 [Register] Invoker registered for algorithm 256x56x56x1x1x64x56x56x64xNCHWxFP32x0x0x1x1x1x1x1xW and solver ConvAsmImplicitGemmGTCDynamicWrwXdlops
MIOpen(OpenCL): Info2 [SetAsFound1_0] Solver ConvAsmImplicitGemmGTCDynamicWrwXdlops registered as find 1.0 best for miopenConvolutionBwdWeightsAlgoImplicitGEMM in 256x56x56x1x1x64x56x56x64xNCHWxFP32x0x0x1x1x1x1x1xW
MIOpen(OpenCL): Info [FindConvBwdWeightsAlgorithm] miopenConvolutionBwdWeightsAlgoImplicitGEMM  0.943359        0
MIOpen(OpenCL): Info [FindConvBwdWeightsAlgorithm] BWrW Chosen Algorithm: ConvAsmImplicitGemmGTCDynamicWrwXdlops , 0, 0.943359
MIOpen(OpenCL): Info [ConvolutionBackwardWeights] algo = 5, workspace = 0
MIOpen(OpenCL): Info2 [GetInvoker] Returning an invoker for problem 256x56x56x1x1x64x56x56x64xNCHWxFP32x0x0x1x1x1x1x1xW and algorithm miopenConvolutionBwdWeightsAlgoImplicitGEMM
MIOpen(OpenCL): Info2 [GetKernels] 0 kernels for key: SubTensorOpWithScalar1d "set 1 16384"
MIOpen(OpenCL): Info2 [AddKernel] Key: SubTensorOpWithScalar1d "set 1 16384"
MIOpen(OpenCL): Info2 [LoadProgram] Building OpenCL program: 'MIOpenSubTensorOpWithScalarKernel.cl', options: '-DSUBTENSOR_OP_WITH_SCALAR=SUBTENSOR_OP_WITH_SCALAR_SET -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_INT8=0 -DMIOPEN_USE_INT8x4=0 -DMIOPEN_USE_BFP16=0 -DMIOPEN_USE_INT32=0 -DMIOPEN_USE_RNE_BFLOAT16=1 -DWORK_LENGTH_0=16384 -Werror -Wf,-Weverything -Wf,-Wno-cast-align -Wf,-Wno-cast-qual -Wf,-Wno-conversion -Wf,-Wno-double-promotion -Wf,-Wno-float-equal -Wf,-Wno-missing-prototypes -Wf,-Wno-pass-failed -Wf,-Wno-pedantic-core-features -Wf,-Wno-reserved-id-macro -Wf,-Wno-shorten-64-to-32 -Wf,-Wno-sign-compare -Wf,-Wno-sign-conversion -Wf,-Wno-unused-function -Wf,-Wno-unused-macros -cl-std=CL1.2
MIOpen(OpenCL): Info2 [Log] Kernel MIOpenSubTensorOpWithScalarKernel.cl Compile Time, ms: 188.954
./bin/test_conv2d --float --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-data --input 64, 64, 56, 56 --weights 256, 64, 1, 1 --pads_strides_dilations 0 0 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW
FAILED: /dockerx/github/miopenx01/src/include/miopen/oclkernel.hpp:109: Error setting argument #19 to kernel (size = 4):  Invalid Arg Index
Backward weights convolution:
Input tensor: 64, 64, 56, 56
Weights tensor: 256, 64, 1, 1
Output tensor: 64, 256, 56, 56
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {1, 1}, {1, 1},

@atamazov atamazov changed the title [OCL] ConvAsmImplicitGemmGTCDynamicWrwXdlops: fails to SetTensor (FAILED to launch SubTensorOpWithScalar1d) [OCL] ConvAsmImplicitGemmGTCDynamicWrwXdlops: "FAILED ... oclkernel.hpp:109: Error setting argument #19 to kernel" Jun 26, 2021
@atamazov
Copy link
Contributor Author

Most likely, the reason is OCL-incompatible metadata in igemm_wrw_gtc_gfx908.s.

@atamazov
Copy link
Contributor Author

atamazov commented Jun 26, 2021

Well, not only this. The library passes more arguments (19) than kernels want to read (18).

@carlushuang @shaojiewang You can take https://github.com/ROCmSoftwarePlatform/MIOpen/tree/fix-issue-1012-atamazov as a basis (it fixes constness of buffers) and add fix for argument 19. I think that the simplest solution could be fixing the Invoker.

@atamazov
Copy link
Contributor Author

Since we are passing more arguments than necessary, we run into UB (undefined behavior) with HIP runtime. Which means that even if it works right now, it still may fail at any given moment.

@atamazov atamazov added this to the ROCm 4.4 milestone Jun 26, 2021
@atamazov
Copy link
Contributor Author

/cc @junliume

@carlushuang
Copy link
Contributor

@atamazov yes you are right. we need make sure number of kernel arg feeding into solver, and declared in metadata are the same. PR created #1013

@atamazov atamazov changed the title [OCL] ConvAsmImplicitGemmGTCDynamicWrwXdlops: "FAILED ... oclkernel.hpp:109: Error setting argument #19 to kernel" ConvAsmImplicitGemmGTCDynamicWrwXdlops: "FAILED ... oclkernel.hpp:109: Error setting argument #19 to kernel" Oct 27, 2023
@atamazov
Copy link
Contributor Author

This issue is actually not OpenCL backend-specific (see #1012 (comment)). But only OpenCL backend is able to catch this BTW.

/fyi @junliume @JehandadKhan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants