
[great confusion and developer time sink]: Convolution backward passes receive tensor pointers and descriptors in the wrong order #2738

Open
amberhassaan opened this issue Feb 8, 2024 · 3 comments

@amberhassaan
Contributor

Let me preface this by saying that this has been a source of great confusion and has cost months of developer time in debugging (speaking from personal and team experience). Not only that, several of us keep bumping into this proverbial stump again and again, so it needs to be fixed ASAP.

The issue: For backward data and backward weights convolutions, we create InvokeParams and ProblemDescriptions such that the x (aka input) and y (aka output) tensor descriptors and pointers are swapped (see src/ocl/convolutionocl.cpp). This creates two kinds of problems:

  1. When invoking CK (composable_kernel) instances, this requires swapping tensor descriptors and pointers. This is because CK (sensibly so) expects input to always be x and output to always be y.
  2. It creates subtle knots and quirks in the design that lead to repeated code, ugly hacks, and workarounds. See, for example, implicitgemm_ck_utils.hpp in PR #2429 (Update CK-based 2d/3d convolution solvers to support nchw/ncdhw layout). The programmer has to remember that x is actually y and y is actually x, and then take the proper mending actions to fix things. Also see test/gtest/group_conv.hpp. In summary, this stands in the way of good design of future MIOpen code.
@amberhassaan amberhassaan self-assigned this Feb 8, 2024
@amberhassaan
Contributor Author

@atamazov , @DrizztDoUrden , @CAHEK7 , @iq136boy , @averinevg : I'd like to open it up for discussion as to how we should go about fixing it. Ideally @JehandadKhan wants it to be staged if possible so that it's not one big PR that touches everything.

@atamazov
Contributor

atamazov commented Feb 8, 2024

Yes, the legacy approach to naming tensors in convolutions is confusing: for the forward direction, the x tensor is treated as the read-only "input" and y as the writable "output"; for the backward direction, dy is treated as the read-only "input" and dx as the writable "output".

  • While it may seem logical to some programmers (those who invented this stuff), it confuses most others ;) And it does not work for WrW convolutions anyway!
  • [Informative] In some old solvers "bottom" and "top" are used (for forward, x is "bottom", y is "top"; for backward, dy is "bottom", dx is "top")
  • New programmers are usually unaware of the legacy naming approach.
  • It is a large amount of refactoring work for a seemingly small result (which is why this rework has been constantly postponed)

Proposal: let's use "x", "y" and "w" (and their derivatives, e.g. "dxDesc" etc) everywhere.
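A hypothetical illustration of the proposal above: tensor fields are always named after the mathematical tensors x, w, and y, for every direction, and only the direction determines which one is written. These types are a sketch, not MIOpen's actual API:

```cpp
#include <cassert>
#include <string>

// Convolution pass direction.
enum class Direction { Forward, BackwardData, BackwardWeights };

// Fields always denote the same mathematical tensor, never a
// direction-dependent "input"/"output" slot.
struct ConvTensorDescs {
    std::string xDesc; // always the image-shaped tensor (x or dx)
    std::string wDesc; // always the weights-shaped tensor (w or dw)
    std::string yDesc; // always the output-shaped tensor (y or dy)
};

// The direction alone determines which tensor is written; no field is
// ever reinterpreted as "the other" tensor.
std::string written_tensor(const ConvTensorDescs& t, Direction d) {
    switch (d) {
    case Direction::Forward:         return t.yDesc; // y = conv(x, w)
    case Direction::BackwardData:    return t.xDesc; // dx from dy and w
    case Direction::BackwardWeights: return t.wDesc; // dw from x and dy
    }
    return {};
}
```

Under this scheme the WrW case, which the input/output naming cannot express cleanly, falls out naturally as "the w-shaped tensor is written".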


Originated from: #2560 (comment)

@DrizztDoUrden
Contributor

DrizztDoUrden commented Feb 13, 2024

I do not understand several things.

Convolution backward passes receive tensor pointers and descriptors in the wrong order

What do you mean? dy is an operand (input) of the backward operation, and dx is the result (output). Are they passed the other way around?

When invoking CK (composable_kernel) instances, this requires swapping tensor descriptors and pointers. This is because CK (sensibly so) expects input to always be x and output to always be y.

I am not sure I understood you correctly, but I don't think naming dx (which is the result) as the input and dy (which is the operand) as the output is sensible.

This creates subtle knots and quirks in the design that lead to repeating code, adding ugly hacks and fixes.

I don't understand how a naming scheme can cause repeated code, ugly hacks, and fixes.

Proposal: let's use "x", "y" and "w" (and their derivatives, e.g. "dxDesc" etc) everywhere.

Currently DataInvokeParams is universal, i.e., it can be used for both forward and backward operations. What naming scheme would you suggest there? Or would you have two separate structs with the names being the only difference?

Besides that, I agree that it is somewhat strange in general that we treat weights differently in many ways. It makes some sense, though, because the weights tensor differs from the other operands of both fwd and bwd. Honestly, I don't think we should treat forward and backward solvers and kernels differently, because they are interchangeable anyway.

And besides all that, I am not explicitly arguing against any naming changes, as that is not what I really care about, provided the change is documented accordingly.

Simply "swapping" input and output field values would be a massive change due to how much solvers use that and how difficult would be to track down every single occurrence and ensure no case is bugged after that.
