
[NHWC] add gpu reference kernel for nhwc #728

Merged

merged 8 commits into develop on Mar 18, 2021
Conversation

carlushuang
Contributor

@carlushuang carlushuang commented Feb 4, 2021

  • Support NHWC/NDHWC, fwd/bwd/wrw, fp32/fp16/bf16, and group conv for the reference kernel.
  • Fix MIOpenDriver so the CPU NHWC convolution produces correct results.
  • Refactor LayoutToStrides() into test/tensor_layout.hpp so that both the driver and test code can call this function.
  • Dump layout info when MIOPEN_ENABLE_LOGGING_CMD=1 if the current layout is not the default NCHW layout.

Testing

Essentially all tests are in test/gpu_reference_kernel.cpp, which covers combinations of these conv problems (NCHW/NHWC, 2D/3D, fp32/fp16/bf16 cases) with a little randomness to speed up the ctest process.


driver/conv_driver.hpp (resolved)
@JehandadKhan
Collaborator

@asroy Can I bother you to review the kernel?

src/problem_description.cpp (resolved)
warmup_wei = tensor<warmup_Tgpu>(miopen::deref(warmupWeightTensor).GetLengths());
warmup_out = tensor<warmup_Tgpu>(miopen::deref(warmupOutputTensor).GetLengths());
warmup_in  = tensor<warmup_Tgpu>(miopen::deref(warmupInputTensor).GetLengths(),
                                 miopen::deref(warmupInputTensor).GetStrides());
Contributor

Optimization for speed?

Contributor Author

Nope... this is indeed the way we pass layout information into the tensor type: by supplying the stride information when calling the constructor.

Contributor

Ah, I see.

[Recommendation] Revert changes for warmup_* tensors. These are always NCHW, and strides can be initialized implicitly.

@jerryyin
Member

jerryyin commented Feb 9, 2021

I did a test run for this branch:

MIOPEN_FIND_MODE=1 MIOPEN_ENABLE_LOGGING=1 MIOPEN_LOG_LEVEL=6 ./bin/MIOpenDriver convfp16 -n 256 -c 128 -H 28 -W 28 -k 512 -y 1 -x 1 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 --in_layout NHWC --out_layout NHWC --fil_layout NHWC -m conv -g 1 -F 4 -t 1

The result is:

MIOpen Backward Weights Conv. Algorithm: 0, Solution: 33/gemm
GPU Kernel Time Backward Weights Conv. Elapsed: 10.103986 ms (average)
stats: name, n, c, ho, wo, x, y, k, flopCnt, bytesRead, bytesWritten, GFLOPs, GB/s, timeMs
stats: bwdw-conv1x1u1, 256, 128, 28, 28, 1, 1, 512, 26306674688, 0, 0, 2604, 0, 10.103986
Backward Convolution Weights Verifies OK on CPU reference (0.00277143)

Only when I add MIOPEN_DEBUG_CONV_GEMM=0 do I see Solution: 87/ConvDirectNaiveConvWrw.

My question is: why is gemm still selected when I didn't disable it explicitly? I'd assume that in either case only the naive convolution kernel would be picked.

@carlushuang
Contributor Author

> My question is: why is gemm still selected when I didn't disable it explicitly? I'd assume that in either case only the naive convolution kernel would be picked.

I guess gemm may already support NHWC, via some sort of NHWC-to-NCHW transformation? Anyway, this naive implementation is not performance-optimized, so it may be very slow.

@jerryyin
Member

jerryyin commented Feb 9, 2021

I guess gemm may already support NHWC, via some sort of NHWC-to-NCHW transformation?

I don't think so. As far as I remember, gemm is hard-coded to support NCHW. What this means is that MIOpen profiles the available algorithms, finds that gemm is faster (naturally), and therefore uses gemm instead of the naive direct convolution. In this situation, execution will give wrong results.

I think this implies that you will need to disable gemm in NHWC mode in order not to return wrong results. I will leave it to the MIOpen developers to decide whether this needs to be addressed in this PR, or whether to file an issue and address it later (in case it gets forgotten). For now, a short-term workaround is to use the environment variable to disable gemm.

@carlushuang
Contributor Author

Let's first open an issue for gemm NHWC #742

@atamazov
Contributor

@jerryyin @carlushuang

I don't think so. As far as I remember, gemm is hard-coded to support NCHW.

But how did it pass validation then?

@carlushuang
Contributor Author

@atamazov I think it is due to computation error: I tested a fwd case and its computation error is larger than the default nrms threshold, while @jerryyin's case may happen to fall within the default nrms.

 ./bin/MIOpenDriver  conv -n 256 -c 128 -H 28 -W 28 -k 512 -y 1 -x 1 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 --in_layout NHWC --out_layout NHWC --fil_layout NHWC -m conv -g 1 -F 1 -t 1
MIOpenDriver conv -n 256 -c 128 -H 28 -W 28 -k 512 -y 1 -x 1 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 --in_layout NHWC --out_layout NHWC --fil_layout NHWC -m conv -g 1 -F 1 -t 1
MIOpen Forward Conv. Algorithm: 0, Solution: 33/gemm
GPU Kernel Time Forward Conv. Elapsed: 1.145352 ms (average)
stats: name, n, c, ho, wo, x, y, k, flopCnt, bytesRead, bytesWritten, GFLOPs, GB/s, timeMs
stats: fwd-conv1x1u1, 256, 128, 28, 28, 1, 1, 512,  26306674688, 103022592, 411041792, 22968, 449, 1.145352
Forward Convolution Failed: 0.28159 > 1.5e-06

@asroy
Contributor

asroy commented Feb 12, 2021

@carlushuang Could you elaborate on what tests you have done for these reference kernels?

@carlushuang
Contributor Author

carlushuang commented Feb 12, 2021

@asroy Essentially all tests are in test/gpu_reference_kernel.cpp, which covers combinations of these conv problems (NCHW/NHWC, 2D/3D, fp32/fp16/bf16 cases) with a little randomness to speed up the ctest process.

@atamazov left a comment (Contributor)

Some minor comments. Otherwise LGTM.

kernel.kernel_name = ConvDirectNaiveConvKernelName(ctx);
kernel.g_wk.clear();

kernel.g_wk.push_back(grid_size * block_size);
Contributor

[Notice] If there is a possibility that grid_size * block_size exceeds INT_MAX, please consider using size_t. g_wk and l_wk in KernelInfo are vectors of size_t.

Contributor Author

OK, let me change this

Contributor

?

src/solver/conv_direct_naive_conv_bwd.cpp (outdated, resolved)
src/solver/conv_direct_naive_conv_fwd.cpp (outdated, resolved)
src/solver/conv_direct_naive_conv_wrw.cpp (outdated, resolved)
@atamazov
Contributor

PR description updated with testing info from #728 (comment)

- int block_size = 256;
- int grid_size = k;
+ size_t block_size = 256;
+ size_t grid_size = static_cast<size_t>(k);
Contributor

[Note] FYI, the static cast is not needed here. No need to fix.

@atamazov left a comment (Contributor)

LGTM

@atamazov atamazov marked this pull request as draft March 3, 2021 23:20
@atamazov
Contributor

atamazov commented Mar 3, 2021

Converted to draft to avoid spoiling the docker image cache on CI. Please change this back to a "normal" PR only after the CI moratorium ends (see mail). Then first merge from develop. Push the [Ready for review] button only after that. Thanks.

@carlushuang carlushuang marked this pull request as ready for review March 15, 2021 23:55
@atamazov
Contributor

@carlushuang Please first merge from develop, then re-run CI; otherwise your job will never pass. CI job stopped.

@atamazov
Contributor

atamazov commented Mar 16, 2021

@jerryyin Is this PR ready to be merged? There is a blocking review from you.

@carlushuang
Contributor Author

@atamazov OK let me merge develop

@atamazov
Contributor

Gagarin: Off we go! 🚀

@atamazov atamazov merged commit 1c72099 into develop Mar 18, 2021
@carlushuang carlushuang deleted the reference_kernel_nhwc branch April 9, 2021 00:21
@atamazov
Contributor

@carlushuang Please look at #1532 (comment), thanks!

Contributor

🐛 This PR creates a copy of tensor_layout.hpp from ./src/include/miopen, which is wrong: it creates an implicit dependence and is error-prone. The immediate bug is that this header uses THE SAME guard as the original: GUARD_TENSOR_LAYOUT_HPP.

5 participants