[great confusion and developer time sink]: Convolution backward passes receive tensor pointers and descriptors in the wrong order #2738
@atamazov, @DrizztDoUrden, @CAHEK7, @iq136boy, @averinevg: I'd like to open this up for discussion on how we should go about fixing it. Ideally, @JehandadKhan wants the fix staged if possible, so that it's not one big PR that touches everything.
Yes, the legacy approach to naming tensors in convolutions is confusing: in the forward direction, the x tensor is treated as the read-only "input" and y as the writable "output"; in the backward direction, dy is assumed to be the read-only "input" and x the writable "output".
Proposal: let's use "x", "y" and "w" (and their derivatives, e.g. "dxDesc" etc.) everywhere.

Originated from: #2560 (comment)
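The proposal above can be sketched in a few lines. This is a hypothetical illustration (the enum and function names are invented for this example, not MIOpen's actual API): with uniform x/y/w naming, the direction alone tells you which tensor a convolution writes, so no mental swapping is needed.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: direction-agnostic tensor naming.
// Names below are illustrative, not MIOpen's real identifiers.
enum class Direction { Forward, BackwardData, BackwardWeights };

// With uniform naming, the written tensor is determined by the direction:
//   Forward:         y  = conv(x, w)
//   BackwardData:    dx = conv_bwd_data(dy, w)
//   BackwardWeights: dw = conv_bwd_weights(x, dy)
std::string writtenTensor(Direction dir)
{
    switch (dir) {
    case Direction::Forward:         return "y";
    case Direction::BackwardData:    return "dx";
    case Direction::BackwardWeights: return "dw";
    }
    return ""; // unreachable; keeps compilers quiet
}
```

Under this scheme, no caller ever has to remember that "input" sometimes means dy and "output" sometimes means dx.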
I do not understand several things.
What do you mean? dy is an operand (input) of the backward operation, and dx is the result (output). Are they passed the other way around?
I am not sure I understood you correctly, but I don't think naming dx (which is the result) as input and dy (which is the operand) as output is sensible.
I don't understand how a naming scheme can cause code repetition or require ugly hacks and fixes.
Currently DataInvokeParams is universal, i.e. it can be used for both forward and backward operations. What naming scheme would you suggest there? Or two separate structs, with the names being the only difference?

Besides that, I agree it is somewhat strange in general, because we treat weights differently in many ways. It makes some sense, though, since the weights tensor differs from the other operand of both forward and backward. Honestly, I don't think we should treat forward and backward solvers and kernels differently, because they are interchangeable anyway.

All that said, I am not explicitly arguing against any naming changes, as that is not what I really care about, provided it is documented accordingly. Simply "swapping" the input and output field values would be a massive change, given how many solvers use those fields and how difficult it would be to track down every single occurrence and ensure no case is bugged afterwards.
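The trade-off described above can be sketched as follows. This is a hypothetical example (the struct, enum, and function names are invented here, not MIOpen's real DataInvokeParams): keep one universal struct, but name its slots after the problem tensors x/y/w instead of "in"/"out", and record the direction-to-output mapping in exactly one place.

```cpp
#include <cassert>

// Hypothetical sketch of a universal invoke-params struct whose fields are
// named by problem role (x, y, w), not by data-flow direction (in, out).
enum class Dir { Forward, BackwardData };

struct ConvInvokeParams {
    const void* xBuf = nullptr; // holds x (fwd) or dx (bwd-data)
    const void* wBuf = nullptr; // weights
    const void* yBuf = nullptr; // holds y (fwd) or dy (bwd-data)
};

// The direction decides which slot the kernel writes -- documented once,
// here, instead of being implied by swapped field names at every call site.
const void* outputSlot(const ConvInvokeParams& p, Dir d)
{
    return d == Dir::Forward ? p.yBuf : p.xBuf;
}
```

One struct still serves both directions, but a reader of any call site can tell from the field names which problem tensor a pointer refers to.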
Let me preface this by saying that this has been a source of great confusion and has wasted months of developer time in debugging (speaking from personal and team experience). Not only that: several of us keep bumping into this proverbial stump again and again, so it needs to be fixed ASAP.
The issue: for backward-data and backward-weights convolutions, we create `InvokeParams` and `ProblemDescriptions` such that the `x` (aka `input`) and `y` (aka `output`) tensor descriptors and pointers are swapped (see `src/ocl/convolutionocl.cpp`). This creates two kinds of problems:

- Generic code cannot rely on `input` always being `x` and output always being `y`.
- It forces fixes such as `implicitgemm_ck_utils.hpp` in "Update CK-based 2d/3d convolution solvers to support nchw/ncdhw layout" #2429, for example. It requires the programmer to remember that `x` is `y` and `y` is `x`, and then take proper mending actions to fix things. Also see `test/gtest/group_conv.hpp`.

So in summary, this stands in the way of good design of future code for MIOpen.