
[great confusion and developer time sink]: Convolution backward passes receive tensor pointers and descriptors in the wrong order #2738

Open
amberhassaan opened this issue Feb 8, 2024 · 3 comments

@amberhassaan
Contributor

Let me preface this by saying that this has been a source of great confusion and has cost months of developer time in debugging (speaking from personal and team experience). Not only that, several of us keep bumping into this proverbial stump again and again, so it needs to be fixed ASAP.

The issue: For backward data and backward weights convolutions, we create InvokeParams and ProblemDescriptions such that the x (aka input) and y (aka output) tensor descriptors and pointers are swapped (see src/ocl/convolutionocl.cpp). This creates two kinds of problems:

  1. When invoking CK (composable_kernel) instances, this requires swapping tensor descriptors and pointers. This is because CK (sensibly so) expects input to always be x and output to always be y.
  2. It creates subtle knots and quirks in the design that lead to repeated code, ugly hacks, and workarounds. See, for example, implicitgemm_ck_utils.hpp in PR #2429 (Update CK-based 2d/3d convolution solvers to support nchw/ncdhw layout). The programmer has to remember that x is actually y and y is actually x, and then take the proper mending actions to fix things. Also see test/gtest/group_conv.hpp. In summary, this stands in the way of good design of future MIOpen code.
@amberhassaan amberhassaan self-assigned this Feb 8, 2024
@amberhassaan
Contributor Author

@atamazov , @DrizztDoUrden , @CAHEK7 , @iq136boy , @averinevg : I'd like to open it up for discussion as to how we should go about fixing it. Ideally @JehandadKhan wants it to be staged if possible so that it's not one big PR that touches everything.

@atamazov
Contributor

atamazov commented Feb 8, 2024

Yes, the legacy approach to naming tensors in convolutions is confusing: for the forward direction, the x tensor is treated as the read-only "input" and y as the writable "output"; for the backward direction, dy is treated as the read-only "input" and dx as the writable "output".

  • While it may seem logical to some programmers (those who invented this stuff), it confuses most others ;) And it does not work for WrW convolutions anyway!
  • [Informative] In some old solvers "bottom" and "top" are used (for forward, x is "bottom", y is "top"; for backward, dy is "bottom", dx is "top")
  • New programmers are usually unaware of the legacy naming approach.
  • It is a large amount of refactoring work for a seemingly small result (which is why this rework has been constantly postponed)

Proposal: let's use "x", "y" and "w" (and their derivatives, e.g. "dxDesc" etc) everywhere.
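A hypothetical illustration of the proposal above: tensor fields are always named after the mathematical tensors x, w, and y, for every direction, and only the direction determines which one is written. These types are a sketch, not MIOpen's actual API:

```cpp
#include <cassert>
#include <string>

// Convolution pass direction.
enum class Direction { Forward, BackwardData, BackwardWeights };

// Fields always denote the same mathematical tensor, never a
// direction-dependent "input"/"output" slot.
struct ConvTensorDescs {
    std::string xDesc; // always the image-shaped tensor (x or dx)
    std::string wDesc; // always the weights-shaped tensor (w or dw)
    std::string yDesc; // always the output-shaped tensor (y or dy)
};

// The direction alone determines which tensor is written; no field is
// ever reinterpreted as "the other" tensor.
std::string written_tensor(const ConvTensorDescs& t, Direction d) {
    switch (d) {
    case Direction::Forward:         return t.yDesc; // y = conv(x, w)
    case Direction::BackwardData:    return t.xDesc; // dx from dy and w
    case Direction::BackwardWeights: return t.wDesc; // dw from x and dy
    }
    return {};
}
```

Under this scheme the WrW case, which the input/output naming cannot express cleanly, falls out naturally as "the w-shaped tensor is written".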


Originated from: #2560 (comment)

@DrizztDoUrden
Contributor

DrizztDoUrden commented Feb 13, 2024

I do not understand several things.

Convolution backward passes receive tensor pointers and descriptors in the wrong order

What do you mean? dy is an operand (input) of the backward operation, and dx is the result (output). Are they passed the other way around?

When invoking CK (composable_kernel) instances, this requires swapping tensor descriptors and pointers. This is because CK (sensibly so) expects input to always be x and output to always be y.

I am not sure I understood you correctly, but I don't think naming dx (which is the result) as the input and dy (which is the operand) as the output is sensible.

This creates subtle knots and quirks in the design that lead to repeating code, adding ugly hacks and fixes.

I don't understand how a naming scheme can cause repeated code, ugly hacks, and fixes.

Proposal: let's use "x", "y" and "w" (and their derivatives, e.g. "dxDesc" etc) everywhere.

Currently DataInvokeParams is universal, i.e., it can be used for both forward and backward operations. What naming scheme would you suggest there? Or would you have two separate structs with the names being the only difference?

Besides that, I agree that it is somewhat strange in general that we treat weights differently in many ways. It makes some sense, though, because the weights tensor differs from the other operands of both fwd and bwd. Honestly, I don't think we should treat forward and backward solvers and kernels differently, because they are interchangeable anyway.

And besides all that, I am not explicitly arguing against any naming changes, as that is not what I really care about, provided the change is documented accordingly.

Simply "swapping" input and output field values would be a massive change due to how much solvers use that and how difficult would be to track down every single occurrence and ensure no case is bugged after that.
