Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Tensor reorder] Universal tensor transform feature, a fallback of batched transpose kernel #1419

Merged
merged 75 commits into from
Mar 21, 2022

Conversation

aska-0096
Copy link
Collaborator

This kernel support (4-dimensional FP32, FP16, INT8) data layout transform to arbitrary one.

The input tensor is seen as default order with [0, 1, 2, 3], and could be reordered into 23 orders:
[0, 1, 3, 2], [0, 2, 1, 3], [0, 2, 3, 1], [0, 3, 1, 2], [0, 3, 2, 1],
[1, 0, 2, 3], [1, 0, 3, 2], [1, 2, 0, 3], [1, 2, 3, 0], [1, 3, 0, 2], [1, 3, 2, 0],
[2, 0, 1, 3], [2, 0, 3, 1], [2, 1, 0, 3], [2, 1, 3, 0], [2, 3, 0, 1], [2, 3, 1, 0],
[3, 0, 1, 2], [3, 0, 2, 1], [3, 1, 0, 2], [3, 1, 2, 0], [3, 2, 0, 1], [3, 2, 1, 0]

INPUT & OUTPUT tensor reorder commonly used:
NCHW to NCHWc = [0, 2, 3, 1], NHWC to NCHW = [0, 3, 1, 2], NCHW to NHWC =[0, 2, 3, 1], NCHW to NCHWc =[0, 2, 3, 1]
FILTER tensor reorder may used:
kyxc to cyxkc = [2, 1, 0, 3], kcyx to cyxkc = [1, 3, 0, 2]

Among these orders,
[0, 1, 3, 2], [0, 2, 3, 1], [0, 3, 1, 2], [2, 3, 0, 1], [3, 0, 1, 2]
are using batched transpose kernel to achieve higher performance.

Copy link
Contributor

@shaojiewang shaojiewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Do we need add some ctest cases for tensor reorder? @carlushuang
  2. Could you please attach some perf data?

src/hip/general_tensor_reorder_sol.cpp Outdated Show resolved Hide resolved
src/hip/general_tensor_reorder_sol.cpp Outdated Show resolved Hide resolved
src/include/miopen/tensor_reorder_util.hpp Show resolved Hide resolved
@aska-0096
Copy link
Collaborator Author

[tensor_reorder]FP32&FP16_perf_data_gfx908.txt
Here is the FP32&FP16 performance data.

@aska-0096
Copy link
Collaborator Author

@atamazov gentle ping for request review on this PR :)

@junliume
Copy link
Collaborator

Ping reviewers: @shaojiewang @carlushuang @atamazov :)

@junliume
Copy link
Collaborator

@aska-0096 please ping reviewers to push this PR through. Thanks!

@aska-0096
Copy link
Collaborator Author

@shaojiewang @carlushuang @atamazov ping for requesting review

@junliume junliume requested review from shaojiewang and removed request for atamazov March 14, 2022 23:08
@junliume
Copy link
Collaborator

@aska-0096 sorry for the late review, can we add some tests to the functionality?

@aska-0096
Copy link
Collaborator Author

Functional test has been added in test/tensor_reorder.cpp.

@junliume junliume merged commit 3efba70 into develop Mar 21, 2022
Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for delaying the review.

[Future] @DrizztDoUrden Can we align this tensor transform with the Solver/Solution architecture and then "fuse" solutions/invokers, when do you think?

test/tensor_reorder.cpp Show resolved Hide resolved
Comment on lines +524 to +526
run_test<reorder_test<float, miopen::TensorReorderSolution>>();
run_test<reorder_test<uint16_t, miopen::TensorReorderSolution>>();
run_test<reorder_test<uint8_t, miopen::TensorReorderSolution>>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test should not cover all data types at once. It should support the following options: --float (the default), --half, --double, -int8 etc. Please look how test_conv2d is designed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not resolved yet. #1481 says:

Modify CTest design, now only --all flag will conduct all the data type test while the default one is --float

This does not match the design of other tests. Datatype should be controlled in different way, as described in the comment above.

The --all option controls how many test configs should be tested. With it, a reasonably big set of configs is to be tested. Without it, only one. The test should provide a set of options that allows the user to specify a particular config for testing. This is how test_conv*d is designed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can run test like

./bin/test_tensor_reorder --all 
./bin/test_tensor_reorder --double
./bin/test_tensor_reorder --float
./bin/test_tensor_reorder --half
./bin/test_tensor_reorder --int8

To specify the data type we want to test.
In PR #1515

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good, but this won't fully resolve this review comment. Please ask questions if you need more info.

Luckily, this is not very important right now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, conv2d call different data type test driver via get the arguments from command line and the command line arguments are defined in test/CMakeLists.txt using add_custom_test function.
Do you mean we should add this ctest to test/CMakeLists.txt using add_custom_test feature and control the datatype via MIOPEN_TEST_FLOAT_ARG ?

Copy link
Contributor

@atamazov atamazov Apr 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aska-0096 My comment is about the test executable itself.

[Notice] WRT adding custom tests. Please look into test/CMakeLists.txt. You can see there:

file(GLOB TESTS *.cpp)
...
foreach(TEST ${TESTS})
    get_filename_component(BASE_NAME ${TEST} NAME_WE)
    add_test_executable(test_${BASE_NAME} ${TEST})
endforeach()

The above instructs CMake to build all .cpp files into executables and then add the executables to the testing. Options like "--all", "--half" etc are added to the command line of the executable in accordance to the MIOPEN_TEST_ALL, MIOPEN_TEST_HALF and similar CMake variables.

So you do not need to specifically add your new test executable to test/CMakeLists.txt, but you have to ensure that executable accepts and properly handles options like "--all", "--float" etc. For example, "--all" should not affect the data type. Please revise the first two review comments in this thread. Please also look into the source of add_test_executable() cmake function for some details.

We use custom tests for adding specific test cases that not covered by "--all", which can be handy for regression testing. That is why I've asked for options that allows to specify a single test case at #1419 (comment).

}
};

int main()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test should support running a single test case, so we need to add relevant options.

All test cases should be run only when --all is specified.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not resolved in #1481. Please see #1419 (comment)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in PR #1515

Copy link
Contributor

@atamazov atamazov Apr 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's assume this is resolved; #1419 (comment) can be used as a place for further discussion.

src/hip/general_tensor_reorder_sol.cpp Show resolved Hide resolved
src/hip/general_tensor_reorder_sol.cpp Show resolved Hide resolved
src/hip/general_tensor_reorder_sol.cpp Show resolved Hide resolved
src/include/miopen/tensor_reorder_util.hpp Show resolved Hide resolved
@aska-0096
Copy link
Collaborator Author

Sorry for delaying the review.

Thanks for reviewing. I'll solve them as soon as possible.

@atamazov
Copy link
Contributor

@aska-0096 The comments marked with 🔴 are the most important ones.

@atamazov
Copy link
Contributor

atamazov commented Apr 8, 2022

@junliume @aska-0096 Please mark all 🟢 review comments as Resolved (I can't do that), thanks.

@atamazov
Copy link
Contributor

@junliume @aska-0096 Please mark all 🟢 review comments as Resolved (I can't do that), thanks.

Let's do it once more; thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants