MIOpen Error: /data/repo/MIOpen/src/hip/handlehip.cpp:70: Memory not available to allocate buffer: 655360000 #47

Closed
greatken999 opened this issue Jul 17, 2018 · 9 comments

Comments

@greatken999

2018-07-17 09:05:55.622488: I tensorflow/core/kernels/conv_ops.cc:670] running auto-tune for Convolve
MIOpen Error: /data/repo/MIOpen/src/hip/handlehip.cpp:70: Memory not available to allocate buffer: 655360000
2018-07-17 09:05:55.622826: F tensorflow/stream_executor/rocm/rocm_dnn.cc:1603] Check failed: status == miopenStatusSuccess (7 vs. 0)Unable to find a suitable algorithm for doing forward convolution

@daniellowell
Contributor

It looks like it is trying to allocate 655 MB of memory, which is not available. Can you run this test with the following environment variable set:

MIOPEN_LOG_LEVEL=6

It will help us see what the configuration looks like. Also, the above message is not enough for us to understand what is going on. What are your system environment and the total allocations for the model you're running?

@greatken999
Author

export MIOPEN_LOG_LEVEL=6
(asrtspeechenv) ken@ken-B250M-D3H:/media/ken/3b9999c7-6235-4b04-b006-0ca0b26ded281/data1/ai/ASRT_SpeechRecognition$ python3 train_mspeech.py
Using TensorFlow backend.
2018-07-18 14:38:49.807364: W tensorflow/stream_executor/rocm/rocm_driver.cc:405] creating context when one is currently active; existing: 0x7ff672f144f0
2018-07-18 14:38:49.807438: I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] Found device 0 with properties:
name: Device 687f
AMDGPU ISA: gfx900
memoryClockRate (GHz) 1.63
pciBusID 0000:03:00.0
Total memory: 7.98GiB
Free memory: 7.73GiB
2018-07-18 14:38:49.807450: I tensorflow/core/common_runtime/gpu/gpu_device.cc:928] DMA: 0
2018-07-18 14:38:49.807455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] 0: Y
2018-07-18 14:38:49.807460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:996] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Device 687f, pci bus id: 0000:03:00.0)
[*Note] Model created successfully, model compiled successfully
[running] train epoch 0 .
[message] epoch 0 . Have train datas 0+
Epoch 1/1
2018-07-18 14:38:52.752720: I tensorflow/core/kernels/conv_ops.cc:670] running auto-tune for Convolve
MIOpen Error: /data/repo/MIOpen/src/hip/handlehip.cpp:70: Memory not available to allocate buffer: 655360000
2018-07-18 14:38:52.753049: F tensorflow/stream_executor/rocm/rocm_dnn.cc:1603] Check failed: status == miopenStatusSuccess (7 vs. 0)Unable to find a suitable algorithm for doing forward convolution
Aborted (core dumped)

@greatken999
Author

It looks like there are no more details with MIOPEN_LOG_LEVEL=6 exported.
hipconfig info:
HIP version : 1.5.18151

== hipconfig
HIP_PATH : /opt/rocm/hip
HIP_PLATFORM : hcc
CPP_CONFIG : -D__HIP_PLATFORM_HCC__= -I/opt/rocm/hip/include -I/opt/rocm/hcc/include

== hcc
HSA_PATH : /opt/rocm/hsa
HCC_HOME : /opt/rocm/hcc
HCC clang version 7.0.0 (ssh://gerritgit/compute/ec/hcc-tot/clang 86791fc4961dc8ffde77bde20d7dfa5e5cbeff5e) (ssh://gerritgit/compute/ec/hcc-tot/llvm 0ccef158132e1222d549edf2da33d4bc0be6c2d1) (based on HCC 1.2.18184-74f5fa9-86791fc-0ccef15 )
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/hcc/bin
LLVM (http://llvm.org/):
LLVM version 7.0.0svn
Optimized build.
Default target: x86_64-unknown-linux-gnu
Host CPU: skylake

Registered Targets:
amdgcn - AMD GCN GPUs
r600 - AMD GPUs HD2XXX-HD6XXX
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
HCC-cxxflags : -hc -std=c++amp -I/opt/rocm/hcc/include
HCC-ldflags : -hc -std=c++amp -L/opt/rocm/hcc/lib -Wl,--rpath=/opt/rocm/hcc/lib -ldl -lm -lpthread -lhc_am -Wl,--whole-archive -lmcwamp -Wl,--no-whole-archive

=== Environment Variables
PATH=/media/ken/3b9999c7-6235-4b04-b006-0ca0b26ded281/data1/asrtspeechenv/bin:/opt/rocm/hcc/bin:/opt/rocm/hip/bin:/home/ken/bin:/home/ken/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/rocm/bin/
LD_LIBRARY_PATH=/opt/rocm/lib/
HIP_PATH=/opt/rocm/hip
HCC_HOME=/opt/rocm/hcc

== Linux Kernel
Hostname : ken-B250M-D3H
Linux ken-B250M-D3H 4.13.0-45-generic #50~16.04.1-Ubuntu SMP Wed May 30 11:18:27 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.4 LTS
Release: 16.04
Codename: xenial

@greatken999
Author

rocminfo

HSA System Attributes

Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (number of timestamp)
Machine Model: LARGE
System Endianness: LITTLE

==========
HSA Agents


Agent 1


Name: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0
Queue Min Size: 0
Queue Max Size: 0
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768KB
Chip ID: 0
Cacheline Size: 64
Max Clock Frequency (MHz):3800
BDFID: 0
Compute Unit: 4
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 32899292KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 32899292KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: TRUE
ISA Info:
N/A


Agent 2


Name: gfx900
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128
Queue Min Size: 4096
Queue Max Size: 131072
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16KB
Chip ID: 26751
Cacheline Size: 64
Max Clock Frequency (MHz):1630
BDFID: 768
Compute Unit: 64
Features: KERNEL_DISPATCH
Fast F16 Operation: FALSE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size Per Dimension:
Dim[0]: 67109888
Dim[1]: 50332672
Dim[2]: 604110848
Grid Max Size: 4294967295
Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size per Dimension:
Dim[0]: 4294967295
Dim[1]: 4294967295
Dim[2]: 4294967295
Max number Of fbarriers Per Workgroup:32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 8372224KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Acessible by all: FALSE
ISA Info:
ISA 1
Name: AMD:AMDGPU:9:0:0
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Dimension:
Dim[0]: 67109888
Dim[1]: 1024
Dim[2]: 16777217
Workgroup Max Size: 1024
Grid Max Dimension:
x 4294967295
y 4294967295
z 4294967295
Grid Max Size: 4294967295
FBarrier Max Size: 32
*** Done ***

@greatken999
Author

rocm_bandwidth_test
......
....

      RocmBandwidthTest Version: 1.0.0

      Device: 0,  Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
      Device: 1,  Device 687f

      Device Access

      D/D       0         1         

      0         1         1         

      1         1         1         


      Device Numa Distance

      D/D       0         1         

      0         0         N/A       

      1         0         0         


      Unidirectional peak bandwidth GB/s

      D/D       0           1           

      0         N/A         13.915766   

      1         14.088893   394.403061  


      Bdirectional peak bandwidth GB/s

      D/D       0           1           

      0         N/A         15.290195   

      1         15.624503   N/A         

@daniellowell
Contributor

Epoch 1/1
2018-07-18 14:38:52.752720: I tensorflow/core/kernels/conv_ops.cc:670] running auto-tune for Convolve
MIOpen Error: /data/repo/MIOpen/src/hip/handlehip.cpp:70: Memory not available to allocate buffer: 655360000
2018-07-18 14:38:52.753049: F tensorflow/stream_executor/rocm/rocm_dnn.cc:1603] Check failed: status == miopenStatusSuccess (7 vs. 0)Unable to find a suitable algorithm for doing forward convolution

It seems like you're simply running out of memory. However, let's try one more thing: can you rerun it with this environment variable set?
MIOPEN_ENABLE_LOGGING=1
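
For anyone reproducing this, here is a minimal sketch of launching a training script with both logging variables mentioned in this thread set, capturing everything to a file so the MIOpen call trace can be attached to the issue. The helper names `miopen_debug_env` and `run_with_logging` and the log file name are hypothetical, chosen only for illustration; only the two variable names and their values come from this thread.

```python
import os
import subprocess
import sys


def miopen_debug_env(base=None):
    """Return a copy of the environment with the MIOpen logging variables set.

    Only the two variables named in this thread are added; nothing else changes.
    """
    env = dict(os.environ if base is None else base)
    env["MIOPEN_ENABLE_LOGGING"] = "1"  # requested in this thread for the API call trace
    env["MIOPEN_LOG_LEVEL"] = "6"       # requested earlier in this thread
    return env


def run_with_logging(script, log_path="miopen_debug.log"):
    """Run `script` with the current Python interpreter and save stdout+stderr.

    `log_path` is an arbitrary, hypothetical file name.
    """
    with open(log_path, "w") as log:
        result = subprocess.run(
            [sys.executable, script],
            env=miopen_debug_env(),
            stdout=log,
            stderr=subprocess.STDOUT,
        )
    return result.returncode


# Example (not executed here):
# run_with_logging("train_mspeech.py")
```

Setting the variables in a copied environment keeps them scoped to the one child process, which is equivalent to `export`ing them in the shell just for that run.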

@greatken999
Author

Thanks for your help, @daniellowell!
export MIOPEN_ENABLE_LOGGING=1
(asrtspeechenv) ken@ken-B250M-D3H:/media/ken/3b9999c7-6235-4b04-b006-0ca0b26ded281/data1/ai/ASRT_SpeechRecognition$ python3 train_mspeech.py
Using TensorFlow backend.
2018-07-19 14:48:25.069862: W tensorflow/stream_executor/rocm/rocm_driver.cc:405] creating context when one is currently active; existing: 0x7f12e54dfa70
2018-07-19 14:48:25.069964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] Found device 0 with properties:
name: Device 687f
AMDGPU ISA: gfx900
memoryClockRate (GHz) 1.63
pciBusID 0000:03:00.0
Total memory: 7.98GiB
Free memory: 7.73GiB
2018-07-19 14:48:25.069976: I tensorflow/core/common_runtime/gpu/gpu_device.cc:928] DMA: 0
2018-07-19 14:48:25.069981: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] 0: Y
2018-07-19 14:48:25.069987: I tensorflow/core/common_runtime/gpu/gpu_device.cc:996] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Device 687f, pci bus id: 0000:03:00.0)
[*Note] Model created successfully, model compiled successfully
[running] train epoch 0 .
[message] epoch 0 . Have train datas 0+
Epoch 1/1
2018-07-19 14:48:27.635339: I tensorflow/core/kernels/conv_ops.cc:670] running auto-tune for Convolve
MIOpen(HIP): miopenStatus_t miopenCreateTensorDescriptor(miopenTensorDescriptor_t *){
tensorDesc = 0
}
MIOpen(HIP): miopenStatus_t miopenSet4dTensorDescriptor(miopenTensorDescriptor_t, miopenDataType_t, int, int, int, int){
tensorDesc =
dataType = 1
n = 16
c = 1
h = 1600
w = 200
}
MIOpen(HIP): miopenStatus_t miopenCreateTensorDescriptor(miopenTensorDescriptor_t *){
tensorDesc = 0
}
MIOpen(HIP): miopenStatus_t miopenSet4dTensorDescriptor(miopenTensorDescriptor_t, miopenDataType_t, int, int, int, int){
tensorDesc =
dataType = 1
n = 16
c = 32
h = 1600
w = 200
}
MIOpen(HIP): miopenStatus_t miopenCreateTensorDescriptor(miopenTensorDescriptor_t *){
tensorDesc = 0
}
MIOpen(HIP): miopenStatus_t miopenSet4dTensorDescriptor(miopenTensorDescriptor_t, miopenDataType_t, int, int, int, int){
tensorDesc =
dataType = 1
n = 32
c = 1
h = 3
w = 3
}
MIOpen(HIP): miopenStatus_t miopenCreateConvolutionDescriptor(miopenConvolutionDescriptor_t *){
convDesc = 0
}
MIOpen(HIP): miopenStatus_t miopenInitConvolutionDescriptor(miopenConvolutionDescriptor_t, miopenConvolutionMode_t, int, int, int, int, int, int){
convDesc = 0, 0, 1, 1, 1, 1,
c_mode = 0
pad_h = 1
pad_w = 1
u = 1
v = 1
dilation_h = 1
dilation_w = 1
}
MIOpen(HIP): miopenStatus_t miopenConvolutionForwardGetWorkSpaceSize(miopenHandle_t, const miopenTensorDescriptor_t, const miopenTensorDescriptor_t, const miopenConvolutionDescriptor_t, const miopenTensorDescriptor_t, size_t *){
wDesc = 32, 1, 3, 3
yDesc = 16, 32, 1600, 200
convDesc = 1, 1, 1, 1, 1, 1,
workSpaceSize = 14471916849344069120
}
MIOpen(HIP): miopenStatus_t miopenFindConvolutionForwardAlgorithm(miopenHandle_t, const miopenTensorDescriptor_t, const void *, const miopenTensorDescriptor_t, const void *, const miopenConvolutionDescriptor_t, const miopenTensorDescriptor_t, void *, const int, int *, miopenConvAlgoPerf_t *, void *, size_t, bool){
xDesc = 16, 1, 1600, 200
x = 0x909575200
wDesc = 32, 1, 3, 3
w = 0x908573600
convDesc = 1, 1, 1, 1, 1, 1,
yDesc = 16, 32, 1600, 200
y = 0x932542600
requestAlgoCount = 1
returnedAlgoCount = -4176939
perfResults =
workSpace = 0x959642600
workSpaceSize = 11520000
exhaustiveSearch = 0
}
MIOpen Error: /data/repo/MIOpen/src/hip/handlehip.cpp:70: Memory not available to allocate buffer: 655360000
2018-07-19 14:48:27.636525: F tensorflow/stream_executor/rocm/rocm_dnn.cc:1603] Check failed: status == miopenStatusSuccess (7 vs. 0)Unable to find a suitable algorithm for doing forward convolution
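
The sizes implied by the descriptors logged above can be checked with a little arithmetic. This is a minimal sketch, assuming that `dataType = 1` denotes 32-bit float (4 bytes per element) and that the tensors are densely packed NCHW; the helper name `tensor_bytes` is made up for illustration.

```python
from math import prod

BYTES_PER_ELEMENT = 4  # assumption: dataType = 1 is 32-bit float


def tensor_bytes(n, c, h, w, elem_size=BYTES_PER_ELEMENT):
    """Size in bytes of a densely packed 4-D (NCHW) tensor."""
    return prod((n, c, h, w)) * elem_size


# Dimensions copied from the MIOpen log in this comment.
x_bytes = tensor_bytes(16, 1, 1600, 200)    # xDesc (input)
w_bytes = tensor_bytes(32, 1, 3, 3)         # wDesc (filters)
y_bytes = tensor_bytes(16, 32, 1600, 200)   # yDesc (output)

failed_alloc = 655360000        # bytes, from the "Memory not available" error
logged_workspace = 11520000     # bytes, workSpaceSize passed to the Find call

# C * kh * kw * H * W elements for one image; whether MIOpen uses this as an
# im2col-style buffer is a guess, the log only shows that the numbers agree.
per_image_unfolded = tensor_bytes(1, 1 * 3 * 3, 1600, 200)

print("input  bytes:", x_bytes)
print("filter bytes:", w_bytes)
print("output bytes:", y_bytes)
print("failed allocation equals output size:", y_bytes == failed_alloc)
print("failed allocation in MiB:", failed_alloc / 2**20)
print("workspace equals C*kh*kw*H*W*4:", per_image_unfolded == logged_workspace)
```

Under those assumptions, the failing 655360000-byte request (625 MiB) is exactly the size of the output tensor `yDesc`, which is small compared with the 7.73 GiB reported as free. That would suggest the failure is not a plain capacity problem, although the log alone does not establish the cause.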

@daniellowell
Contributor

@greatken999 Can you try this on the current software stack?

@greatken999
Author

@daniellowell, sorry, my Vega 64 has a hang problem now.
