MIOpen Error: /data/repo/MIOpen/src/hip/handlehip.cpp:70: Memory not available to allocate buffer: 655360000 #47

greatken999 · 2018-07-17T01:10:06Z

2018-07-17 09:05:55.622488: I tensorflow/core/kernels/conv_ops.cc:670] running auto-tune for Convolve
MIOpen Error: /data/repo/MIOpen/src/hip/handlehip.cpp:70: Memory not available to allocate buffer: 655360000
2018-07-17 09:05:55.622826: F tensorflow/stream_executor/rocm/rocm_dnn.cc:1603] Check failed: status == miopenStatusSuccess (7 vs. 0)Unable to find a suitable algorithm for doing forward convolution

daniellowell · 2018-07-17T15:44:01Z

It looks like it is trying to allocate 655 MB of memory, which is not available. Can you run this test with the following environment variable set:

MIOPEN_LOG_LEVEL=6

It will help us see what the configuration looks like. The message above is also not enough for us to understand what is going on. What is your system environment, and what are the total allocations for the model you're running?

greatken999 · 2018-07-18T06:45:43Z

export MIOPEN_LOG_LEVEL=6
(asrtspeechenv) ken@ken-B250M-D3H:/media/ken/3b9999c7-6235-4b04-b006-0ca0b26ded281/data1/ai/ASRT_SpeechRecognition$ python3 train_mspeech.py
Using TensorFlow backend.
2018-07-18 14:38:49.807364: W tensorflow/stream_executor/rocm/rocm_driver.cc:405] creating context when one is currently active; existing: 0x7ff672f144f0
2018-07-18 14:38:49.807438: I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] Found device 0 with properties:
name: Device 687f
AMDGPU ISA: gfx900
memoryClockRate (GHz) 1.63
pciBusID 0000:03:00.0
Total memory: 7.98GiB
Free memory: 7.73GiB
2018-07-18 14:38:49.807450: I tensorflow/core/common_runtime/gpu/gpu_device.cc:928] DMA: 0
2018-07-18 14:38:49.807455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] 0: Y
2018-07-18 14:38:49.807460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:996] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Device 687f, pci bus id: 0000:03:00.0)
[*Info] Model created successfully, model compiled successfully
[running] train epoch 0 .
[message] epoch 0 . Have train datas 0+
Epoch 1/1
2018-07-18 14:38:52.752720: I tensorflow/core/kernels/conv_ops.cc:670] running auto-tune for Convolve
MIOpen Error: /data/repo/MIOpen/src/hip/handlehip.cpp:70: Memory not available to allocate buffer: 655360000
2018-07-18 14:38:52.753049: F tensorflow/stream_executor/rocm/rocm_dnn.cc:1603] Check failed: status == miopenStatusSuccess (7 vs. 0)Unable to find a suitable algorithm for doing forward convolution
Aborted (core dumped)

greatken999 · 2018-07-18T06:49:51Z

It looks like there are no more details with MIOPEN_LOG_LEVEL=6 exported.
hipconfig info:
HIP version : 1.5.18151

== hipconfig
HIP_PATH : /opt/rocm/hip
HIP_PLATFORM : hcc
CPP_CONFIG : -D__HIP_PLATFORM_HCC__= -I/opt/rocm/hip/include -I/opt/rocm/hcc/include

== hcc
HSA_PATH : /opt/rocm/hsa
HCC_HOME : /opt/rocm/hcc
HCC clang version 7.0.0 (ssh://gerritgit/compute/ec/hcc-tot/clang 86791fc4961dc8ffde77bde20d7dfa5e5cbeff5e) (ssh://gerritgit/compute/ec/hcc-tot/llvm 0ccef158132e1222d549edf2da33d4bc0be6c2d1) (based on HCC 1.2.18184-74f5fa9-86791fc-0ccef15 )
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/hcc/bin
LLVM (http://llvm.org/):
LLVM version 7.0.0svn
Optimized build.
Default target: x86_64-unknown-linux-gnu
Host CPU: skylake

Registered Targets:
amdgcn - AMD GCN GPUs
r600 - AMD GPUs HD2XXX-HD6XXX
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
HCC-cxxflags : -hc -std=c++amp -I/opt/rocm/hcc/include
HCC-ldflags : -hc -std=c++amp -L/opt/rocm/hcc/lib -Wl,--rpath=/opt/rocm/hcc/lib -ldl -lm -lpthread -lhc_am -Wl,--whole-archive -lmcwamp -Wl,--no-whole-archive

=== Environment Variables
PATH=/media/ken/3b9999c7-6235-4b04-b006-0ca0b26ded281/data1/asrtspeechenv/bin:/opt/rocm/hcc/bin:/opt/rocm/hip/bin:/home/ken/bin:/home/ken/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/rocm/bin/
LD_LIBRARY_PATH=/opt/rocm/lib/
HIP_PATH=/opt/rocm/hip
HCC_HOME=/opt/rocm/hcc

== Linux Kernel
Hostname : ken-B250M-D3H
Linux ken-B250M-D3H 4.13.0-45-generic #50~16.04.1-Ubuntu SMP Wed May 30 11:18:27 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.4 LTS
Release: 16.04
Codename: xenial

greatken999 · 2018-07-18T10:21:57Z

rocminfo

HSA System Attributes

Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (number of timestamp)
Machine Model: LARGE
System Endianness: LITTLE

==========
HSA Agents

Agent 1

Name: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0
Queue Min Size: 0
Queue Max Size: 0
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768KB
Chip ID: 0
Cacheline Size: 64
Max Clock Frequency (MHz):3800
BDFID: 0
Compute Unit: 4
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 32899292KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 32899292KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: TRUE
ISA Info:
N/A

Agent 2

Name: gfx900
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128
Queue Min Size: 4096
Queue Max Size: 131072
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16KB
Chip ID: 26751
Cacheline Size: 64
Max Clock Frequency (MHz):1630
BDFID: 768
Compute Unit: 64
Features: KERNEL_DISPATCH
Fast F16 Operation: FALSE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size Per Dimension:
Dim[0]: 67109888
Dim[1]: 50332672
Dim[2]: 604110848
Grid Max Size: 4294967295
Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size per Dimension:
Dim[0]: 4294967295
Dim[1]: 4294967295
Dim[2]: 4294967295
Max number Of fbarriers Per Workgroup:32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 8372224KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Acessible by all: FALSE
ISA Info:
ISA 1
Name: AMD:AMDGPU:9:0:0
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Dimension:
Dim[0]: 67109888
Dim[1]: 1024
Dim[2]: 16777217
Workgroup Max Size: 1024
Grid Max Dimension:
x 4294967295
y 4294967295
z 4294967295
Grid Max Size: 4294967295
FBarrier Max Size: 32
*** Done ***
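
As a rough cross-check of the numbers reported so far, the GPU pool size from rocminfo can be converted to GiB and compared with the failed allocation. This is only a sketch: it assumes that rocminfo's "KB" means 1024 bytes, and the helper name `kb_to_gib` is made up for illustration.

```python
# Unit-conversion sketch for the figures quoted in this thread.
# Assumption: rocminfo's "KB" is 1024 bytes.
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3

GPU_POOL_KB = 8372224           # Agent 2, Pool 1 "Size" from rocminfo
FAILED_ALLOC_BYTES = 655360000  # size named in the MIOpen error


def kb_to_gib(size_kb):
    """Convert a size in KB (taken as 1024 bytes each) to GiB."""
    return size_kb * KIB / GIB


pool_gib = kb_to_gib(GPU_POOL_KB)
failed_mib = FAILED_ALLOC_BYTES / MIB
share = FAILED_ALLOC_BYTES / (GPU_POOL_KB * KIB)

print(f"GPU pool:          {pool_gib:.2f} GiB")
print(f"Failed allocation: {failed_mib:.1f} MiB")
print(f"Share of pool:     {share:.1%}")
```

Under that assumption the pool works out to about 7.98 GiB, which agrees with the "Total memory" line TensorFlow prints, and the failed request is a small fraction of it. The device size alone therefore does not explain the failure; what matters is how much memory is still unallocated at the moment MIOpen asks for the buffer.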

greatken999 · 2018-07-18T10:24:21Z

rocm_bandwidth_test
......
....

      RocmBandwidthTest Version: 1.0.0

      Device: 0,  Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
      Device: 1,  Device 687f

      Device Access

      D/D       0         1         

      0         1         1         

      1         1         1         


      Device Numa Distance

      D/D       0         1         

      0         0         N/A       

      1         0         0         


      Unidirectional peak bandwidth GB/s

      D/D       0           1           

      0         N/A         13.915766   

      1         14.088893   394.403061  


      Bdirectional peak bandwidth GB/s

      D/D       0           1           

      0         N/A         15.290195   

      1         15.624503   N/A

daniellowell · 2018-07-18T20:57:22Z

Epoch 1/1
2018-07-18 14:38:52.752720: I tensorflow/core/kernels/conv_ops.cc:670] running auto-tune for Convolve
MIOpen Error: /data/repo/MIOpen/src/hip/handlehip.cpp:70: Memory not available to allocate buffer: 655360000
2018-07-18 14:38:52.753049: F tensorflow/stream_executor/rocm/rocm_dnn.cc:1603] Check failed: status == miopenStatusSuccess (7 vs. 0)Unable to find a suitable algorithm for doing forward convolution

It seems like you're simply running out of memory. However, let's try one more thing: can you rerun it with this environment variable set?
MIOPEN_ENABLE_LOGGING=1
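
A minimal sketch of one way to rerun with both logging variables mentioned in this thread, written in Python since the failing script is launched with python3. The helper names `miopen_debug_env` and `run_with_miopen_logging` and the log file name are hypothetical; only the variable names and values come from the comments above.

```python
import os
import subprocess
import sys


def miopen_debug_env(base_env=None, log_level=6, enable_logging=True):
    """Return a copy of the environment with the MIOpen logging variables set.

    MIOPEN_LOG_LEVEL and MIOPEN_ENABLE_LOGGING are the two variables
    requested in this thread; everything else is passed through unchanged.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MIOPEN_LOG_LEVEL"] = str(log_level)
    if enable_logging:
        env["MIOPEN_ENABLE_LOGGING"] = "1"
    return env


def run_with_miopen_logging(script, log_path="miopen_debug.log"):
    """Run `script` in a child interpreter and capture stdout and stderr."""
    with open(log_path, "w") as log_file:
        result = subprocess.run(
            [sys.executable, script],
            env=miopen_debug_env(),
            stdout=log_file,
            stderr=subprocess.STDOUT,
        )
    # A crash in the child shows up as a nonzero return code.
    return result.returncode
```

Calling `run_with_miopen_logging("train_mspeech.py")` would write the combined output to miopen_debug.log, which is easier to attach to an issue than terminal scrollback. Setting the variables on the child process rather than in the current shell also keeps them from leaking into later runs.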

greatken999 · 2018-07-19T06:55:06Z

Thanks for your help, @daniellowell!
export MIOPEN_ENABLE_LOGGING=1
(asrtspeechenv) ken@ken-B250M-D3H:/media/ken/3b9999c7-6235-4b04-b006-0ca0b26ded281/data1/ai/ASRT_SpeechRecognition$ python3 train_mspeech.py
Using TensorFlow backend.
2018-07-19 14:48:25.069862: W tensorflow/stream_executor/rocm/rocm_driver.cc:405] creating context when one is currently active; existing: 0x7f12e54dfa70
2018-07-19 14:48:25.069964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] Found device 0 with properties:
name: Device 687f
AMDGPU ISA: gfx900
memoryClockRate (GHz) 1.63
pciBusID 0000:03:00.0
Total memory: 7.98GiB
Free memory: 7.73GiB
2018-07-19 14:48:25.069976: I tensorflow/core/common_runtime/gpu/gpu_device.cc:928] DMA: 0
2018-07-19 14:48:25.069981: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] 0: Y
2018-07-19 14:48:25.069987: I tensorflow/core/common_runtime/gpu/gpu_device.cc:996] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Device 687f, pci bus id: 0000:03:00.0)
[*Info] Model created successfully, model compiled successfully
[running] train epoch 0 .
[message] epoch 0 . Have train datas 0+
Epoch 1/1
2018-07-19 14:48:27.635339: I tensorflow/core/kernels/conv_ops.cc:670] running auto-tune for Convolve
MIOpen(HIP): miopenStatus_t miopenCreateTensorDescriptor(miopenTensorDescriptor_t *){
tensorDesc = 0
}
MIOpen(HIP): miopenStatus_t miopenSet4dTensorDescriptor(miopenTensorDescriptor_t, miopenDataType_t, int, int, int, int){
tensorDesc =
dataType = 1
n = 16
c = 1
h = 1600
w = 200
}
MIOpen(HIP): miopenStatus_t miopenCreateTensorDescriptor(miopenTensorDescriptor_t *){
tensorDesc = 0
}
MIOpen(HIP): miopenStatus_t miopenSet4dTensorDescriptor(miopenTensorDescriptor_t, miopenDataType_t, int, int, int, int){
tensorDesc =
dataType = 1
n = 16
c = 32
h = 1600
w = 200
}
MIOpen(HIP): miopenStatus_t miopenCreateTensorDescriptor(miopenTensorDescriptor_t *){
tensorDesc = 0
}
MIOpen(HIP): miopenStatus_t miopenSet4dTensorDescriptor(miopenTensorDescriptor_t, miopenDataType_t, int, int, int, int){
tensorDesc =
dataType = 1
n = 32
c = 1
h = 3
w = 3
}
MIOpen(HIP): miopenStatus_t miopenCreateConvolutionDescriptor(miopenConvolutionDescriptor_t *){
convDesc = 0
}
MIOpen(HIP): miopenStatus_t miopenInitConvolutionDescriptor(miopenConvolutionDescriptor_t, miopenConvolutionMode_t, int, int, int, int, int, int){
convDesc = 0, 0, 1, 1, 1, 1,
c_mode = 0
pad_h = 1
pad_w = 1
u = 1
v = 1
dilation_h = 1
dilation_w = 1
}
MIOpen(HIP): miopenStatus_t miopenConvolutionForwardGetWorkSpaceSize(miopenHandle_t, const miopenTensorDescriptor_t, const miopenTensorDescriptor_t, const miopenConvolutionDescriptor_t, const miopenTensorDescriptor_t, size_t *){
wDesc = 32, 1, 3, 3
yDesc = 16, 32, 1600, 200
convDesc = 1, 1, 1, 1, 1, 1,
workSpaceSize = 14471916849344069120
}
MIOpen(HIP): miopenStatus_t miopenFindConvolutionForwardAlgorithm(miopenHandle_t, const miopenTensorDescriptor_t, const void *, const miopenTensorDescriptor_t, const void *, const miopenConvolutionDescriptor_t, const miopenTensorDescriptor_t, void *, const int, int *, miopenConvAlgoPerf_t *, void *, size_t, bool){
xDesc = 16, 1, 1600, 200
x = 0x909575200
wDesc = 32, 1, 3, 3
w = 0x908573600
convDesc = 1, 1, 1, 1, 1, 1,
yDesc = 16, 32, 1600, 200
y = 0x932542600
requestAlgoCount = 1
returnedAlgoCount = -4176939
perfResults =
workSpace = 0x959642600
workSpaceSize = 11520000
exhaustiveSearch = 0
}
MIOpen Error: /data/repo/MIOpen/src/hip/handlehip.cpp:70: Memory not available to allocate buffer: 655360000
2018-07-19 14:48:27.636525: F tensorflow/stream_executor/rocm/rocm_dnn.cc:1603] Check failed: status == miopenStatusSuccess (7 vs. 0)Unable to find a suitable algorithm for doing forward convolution
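
The sizes in this log can be cross-checked with a little arithmetic. The sketch below assumes that dataType = 1 denotes 32-bit float (4 bytes per element); that is an assumption about the MIOpen enum and is not stated in the thread. The function name `tensor_bytes` is made up for illustration.

```python
from math import prod

# Assumption: dataType = 1 in the log means 32-bit float, 4 bytes per element.
BYTES_PER_ELEMENT = 4


def tensor_bytes(dims, bytes_per_element=BYTES_PER_ELEMENT):
    """Size in bytes of a densely packed tensor with the given dimensions."""
    return prod(dims) * bytes_per_element


x_dims = (16, 1, 1600, 200)   # xDesc: input  (n, c, h, w)
w_dims = (32, 1, 3, 3)        # wDesc: filter (k, c, kh, kw)
y_dims = (16, 32, 1600, 200)  # yDesc: output (n, k, h, w)

print("input  bytes:", tensor_bytes(x_dims))
print("filter bytes:", tensor_bytes(w_dims))
print("output bytes:", tensor_bytes(y_dims))

# One image's worth of unrolled 3x3 patches: c * kh * kw * h * w elements.
patch_buffer_dims = (w_dims[1], w_dims[2], w_dims[3], y_dims[2], y_dims[3])
print("patch buffer bytes:", tensor_bytes(patch_buffer_dims))
```

Under that assumption the output tensor comes to 655360000 bytes, exactly the size named in the error, so the failing request looks like a second buffer the size of the full output rather than the workspace. The patch buffer comes to 11520000 bytes, which matches the workSpaceSize passed to miopenFindConvolutionForwardAlgorithm. Both matches are observations from the numbers, not a confirmed description of what MIOpen does internally.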

daniellowell · 2019-05-16T21:09:25Z

@greatken999 Can you try this on the current software stack?

greatken999 · 2019-05-23T10:10:09Z

@daniellowell, sorry, my Vega 64 has developed a hang problem now.

daniellowell closed this as completed May 16, 2019

daniellowell reopened this May 16, 2019

daniellowell closed this as completed Apr 6, 2020

alexandraBara mentioned this issue Sep 11, 2020

Solver generic_search fail: ConvHipImplicitGemmBwdDataV1R1Xdlops and ConvHipImplicitGemmForwardV4R4Xdlops #427

Closed

cderb added a commit that referenced this issue Jun 28, 2024

Fast forward to the public MIOpen develop (#47)

ce70c8b
