Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory access fault in miopenGemm #8

Closed
patflick opened this issue Jul 10, 2017 · 7 comments
Closed

Memory access fault in miopenGemm #8

patflick opened this issue Jul 10, 2017 · 7 comments
Assignees
Labels

Comments

@patflick
Copy link
Contributor

There's a bug inside the miopenGemm function call resulting in a memory access fault.
I've tried debugging it a bit. So far I found that the kernel accesses the A matrix one element past its size, ie, it accesses A[a_data_size].

The issue can be reproduced with the MIOpenDriver, which rules out the cause in my code:

$ ./MIOpenDriver gemm -m 128 -k 9216 -n 4096 -v 0
MIOpenDriver: gemm -m 128 -k 9216 -n 4096 -v 0
Memory access fault by GPU node-1 on address 0x1101808000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)

This may be an issue inside MIOpenGemm.

@dagamayank
Copy link
Contributor

/cc @newling

@patflick I can reproduce this bug. Let us try to debug. Thanks for reporting.

@newling
Copy link
Contributor

newling commented Jul 10, 2017

@dagamayank
It's failing through the driver for certain not (m = n = k) cases. I'm investigating.

@newling
Copy link
Contributor

newling commented Jul 10, 2017

The problem is related to not switching pointers a <-> b when you switch rowMajor <-> colMajor. Pointers must switch (along with lda <-> ldb etc).

@newling
Copy link
Contributor

newling commented Jul 10, 2017

@dagamayank
A solution might be to swap DataCast(A) and DataCast(B) on line 62 of gemm_api.cpp when rowMaj but this is probably not the neatest solution as MIOpenGEMM already deals with rowMaj and colMaj. I'll need to think more.

@dagamayank
Copy link
Contributor

Thanks @newling and @patflick We have a fix for this bug and will push it out with the next release of MIOpen.

@dagamayank dagamayank added the bug label Jul 12, 2017
@dagamayank dagamayank self-assigned this Jul 12, 2017
@patflick
Copy link
Contributor Author

Is it possible to get the fix beforehand? I'm already compiling MiOpen myself. Ie, could you push the fix to some branch?

@dagamayank
Copy link
Contributor

Fixed by 78955d5 and 64e7553

ce1adon pushed a commit that referenced this issue May 11, 2021
Merge official develop to local develop
asroy pushed a commit that referenced this issue Aug 21, 2021
31b403526 Merge pull request #16 from ROCmSoftwarePlatform/develop
b62bf8c3f Merge pull request #14 from ROCmSoftwarePlatform/miopen_downstream_init_integration
ccc4a1d36 Merge pull request #8 from ROCmSoftwarePlatform/miopen_downstream_init_integration
67ad47e7c refactor
16effa767 refactor
a91b68dfc DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element
2cbabbba5 use int instead of index_t in kernel wrapper
0834bc763 compiler parameter use stream
f2ac7832c make innner product compatiable on gfx900
4e57b30a6 rename
c03045ce2 rename
b2589957f update CK build script
2c48039d0 fix kernel filename
d626dccc9 fix enum issue
643ebd4f3 tidy
ddd49ec9e fix clang warning suppression
4f566c622 vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast
172036d72 add c-style pointer cast
76f313193 tidy
d18428901 tidy
f885c131d tidy
80120f0a0 tidy
c3efeb5e2 tidy
56fc0842b tidy
54fba515b tidy
e62bae7a4 tidy
24c872894 add tidy
61487e0a0 fix
ae98b52ad remove online compilation from CK
cb9542131 refactor
73ca97015 Merge commit '437cc595c6e206dfebb118985b5171bbc1e29eab' into composable_kernel_init_integration_v3
3b8664611 Merge pull request #7 from ROCmSoftwarePlatform/master
d09ea4f4e Update develop (#5)
3d32ae940 add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files

git-subtree-dir: src/composable_kernel
git-subtree-split: 31b403526ec54abf13c4bb58dfb6635b4d2aa619
junliume pushed a commit that referenced this issue Sep 29, 2021
…duction (#1156)

* Squashed 'src/composable_kernel/' content from commit f6edda611

git-subtree-dir: src/composable_kernel
git-subtree-split: f6edda6119ebbb237dfa6270797b34f960d7b190

* add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files

* Squashed 'src/composable_kernel/' changes from f6edda611..5781adf5c

5781adf5c Update develop (#5) (#6)
97e6d514f Merge pull request #4 from ROCmSoftwarePlatform/separate_online_compile
7b1ec41e5 refactor
49c33aaea refactor
54b3e73d1 rename

git-subtree-dir: src/composable_kernel
git-subtree-split: 5781adf5cf4ac753e2e36da7385791775b744bf7

* fix

* refactor

* remove online compilation from CK

* refactor

* fix

* add ctest

* tidy

* add tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* add c-style pointer cast

* vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast

* fix clang warning suppression

* tidy

* suppress cppcheck

* fix enum issue

* revert chagnes to hip build

* fix kernel filename

* update CK build script

* rename

* rename

* make innner product compatiable on gfx900

* Update src/include/miopen/solver/ck_utility_common.hpp

Co-authored-by: JD <Jehandad.Khan@amd.com>

* compiler parameter use stream

* use int instead of index_t in kernel wrapper

* DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element

* refactor

* refactor

* change cmakelist

* change ck common utility

* fix

* Squashed 'src/composable_kernel/' changes from 5781adf5c..31b403526

31b403526 Merge pull request #16 from ROCmSoftwarePlatform/develop
b62bf8c3f Merge pull request #14 from ROCmSoftwarePlatform/miopen_downstream_init_integration
ccc4a1d36 Merge pull request #8 from ROCmSoftwarePlatform/miopen_downstream_init_integration
67ad47e7c refactor
16effa767 refactor
a91b68dfc DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element
2cbabbba5 use int instead of index_t in kernel wrapper
0834bc763 compiler parameter use stream
f2ac7832c make innner product compatiable on gfx900
4e57b30a6 rename
c03045ce2 rename
b2589957f update CK build script
2c48039d0 fix kernel filename
d626dccc9 fix enum issue
643ebd4f3 tidy
ddd49ec9e fix clang warning suppression
4f566c622 vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast
172036d72 add c-style pointer cast
76f313193 tidy
d18428901 tidy
f885c131d tidy
80120f0a0 tidy
c3efeb5e2 tidy
56fc0842b tidy
54fba515b tidy
e62bae7a4 tidy
24c872894 add tidy
61487e0a0 fix
ae98b52ad remove online compilation from CK
cb9542131 refactor
73ca97015 Merge commit '437cc595c6e206dfebb118985b5171bbc1e29eab' into composable_kernel_init_integration_v3
3b8664611 Merge pull request #7 from ROCmSoftwarePlatform/master
d09ea4f4e Update develop (#5)
3d32ae940 add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files

git-subtree-dir: src/composable_kernel
git-subtree-split: 31b403526ec54abf13c4bb58dfb6635b4d2aa619

* Tiny fix in using data type template parameters in blockwise and direct_threadwise kernel

* Fix with regard to implementing GetZeroVal() in both kernel and host

* Avoid convert to compType from dstDataType before writting the output value

* Add half_t support to NumericLimits and make constexpr GetZeroVal() of binary operator

* Add CONSTANT decorator for descriptor read buffer

* Use get_thread_local_1d_id() for thread local Id

* Rename GetZeroVal() to GetReductionZeroVal() in the kernels

* Remove constexpr from initialized zeroVal and tiny fix in reduction_operator.hpp

* Occasional tiny simplification and update in the kernel files

* Update in src/reducetensor.cpp for consistent IDs passing to the kernel

* Update to re-order tensor dimensions on the host, split second_call kernel wrapper files and simplify reduce_all kernel wrappers

* Update to remove OpenCL tidy checking failures

* Small updates in src/reducetensor.cpp

* Update for better readability

* Remove unused codes and not-needed template parameters in the kernel wrappers

Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: JD <Jehandad.Khan@amd.com>
junliume pushed a commit that referenced this issue Jan 18, 2022
…duction (#1156)

* Squashed 'src/composable_kernel/' content from commit f6edda611

git-subtree-dir: src/composable_kernel
git-subtree-split: f6edda6119ebbb237dfa6270797b34f960d7b190

* add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files

* Squashed 'src/composable_kernel/' changes from f6edda611..5781adf5c

5781adf5c Update develop (#5) (#6)
97e6d514f Merge pull request #4 from ROCmSoftwarePlatform/separate_online_compile
7b1ec41e5 refactor
49c33aaea refactor
54b3e73d1 rename

git-subtree-dir: src/composable_kernel
git-subtree-split: 5781adf5cf4ac753e2e36da7385791775b744bf7

* fix

* refactor

* remove online compilation from CK

* refactor

* fix

* add ctest

* tidy

* add tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* add c-style pointer cast

* vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast

* fix clang warning suppression

* tidy

* suppress cppcheck

* fix enum issue

* revert chagnes to hip build

* fix kernel filename

* update CK build script

* rename

* rename

* make innner product compatiable on gfx900

* Update src/include/miopen/solver/ck_utility_common.hpp

Co-authored-by: JD <Jehandad.Khan@amd.com>

* compiler parameter use stream

* use int instead of index_t in kernel wrapper

* DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element

* refactor

* refactor

* change cmakelist

* change ck common utility

* fix

* Squashed 'src/composable_kernel/' changes from 5781adf5c..31b403526

31b403526 Merge pull request #16 from ROCmSoftwarePlatform/develop
b62bf8c3f Merge pull request #14 from ROCmSoftwarePlatform/miopen_downstream_init_integration
ccc4a1d36 Merge pull request #8 from ROCmSoftwarePlatform/miopen_downstream_init_integration
67ad47e7c refactor
16effa767 refactor
a91b68dfc DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element
2cbabbba5 use int instead of index_t in kernel wrapper
0834bc763 compiler parameter use stream
f2ac7832c make innner product compatiable on gfx900
4e57b30a6 rename
c03045ce2 rename
b2589957f update CK build script
2c48039d0 fix kernel filename
d626dccc9 fix enum issue
643ebd4f3 tidy
ddd49ec9e fix clang warning suppression
4f566c622 vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast
172036d72 add c-style pointer cast
76f313193 tidy
d18428901 tidy
f885c131d tidy
80120f0a0 tidy
c3efeb5e2 tidy
56fc0842b tidy
54fba515b tidy
e62bae7a4 tidy
24c872894 add tidy
61487e0a0 fix
ae98b52ad remove online compilation from CK
cb9542131 refactor
73ca97015 Merge commit '437cc595c6e206dfebb118985b5171bbc1e29eab' into composable_kernel_init_integration_v3
3b8664611 Merge pull request #7 from ROCmSoftwarePlatform/master
d09ea4f4e Update develop (#5)
3d32ae940 add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files

git-subtree-dir: src/composable_kernel
git-subtree-split: 31b403526ec54abf13c4bb58dfb6635b4d2aa619

* Tiny fix in using data type template parameters in blockwise and direct_threadwise kernel

* Fix with regard to implementing GetZeroVal() in both kernel and host

* Avoid convert to compType from dstDataType before writting the output value

* Add half_t support to NumericLimits and make constexpr GetZeroVal() of binary operator

* Add CONSTANT decorator for descriptor read buffer

* Use get_thread_local_1d_id() for thread local Id

* Rename GetZeroVal() to GetReductionZeroVal() in the kernels

* Remove constexpr from initialized zeroVal and tiny fix in reduction_operator.hpp

* Occasional tiny simplification and update in the kernel files

* Update in src/reducetensor.cpp for consistent IDs passing to the kernel

* Update to re-order tensor dimensions on the host, split second_call kernel wrapper files and simplify reduce_all kernel wrappers

* Update to remove OpenCL tidy checking failures

* Small updates in src/reducetensor.cpp

* Update for better readability

* Remove unused codes and not-needed template parameters in the kernel wrappers

Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: JD <Jehandad.Khan@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants