Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support collect_set on rolling window #7881

Merged
merged 14 commits into from
May 26, 2021

Conversation

sperlingxx
Copy link
Contributor

This pull request is to support collect_set on rolling window, which is required in #7809.

@sperlingxx sperlingxx requested a review from a team as a code owner April 7, 2021 10:13
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Apr 7, 2021
@sperlingxx sperlingxx added 3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change and removed libcudf Affects libcudf (C++/CUDA) code. labels Apr 7, 2021
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Apr 7, 2021
cpp/src/rolling/rolling_detail.cuh Outdated Show resolved Hide resolved
@mythrocks
Copy link
Contributor

mythrocks commented Apr 7, 2021

Hey, there, @sperlingxx. This PR should be against branch-0.20, not branch-0.19. You have the option of picking these commits into a branch-0.20 checkout, and overwriting this branch.

Also, pardon me for pointing this out: The reason we postponed making this change was that we are piling on if-else checks in the SFINAE code. @nvdbaranec is in the middle of sorting out that refactor.

@sperlingxx sperlingxx requested a review from a team as a code owner April 19, 2021 10:07
@github-actions github-actions bot added the CMake CMake build issue label Apr 19, 2021
@sperlingxx sperlingxx changed the base branch from branch-0.19 to branch-0.20 April 19, 2021 10:08
@codecov
Copy link

codecov bot commented Apr 19, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.06@dd5eecd). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head e146913 differs from pull request most recent head dca2bb6. Consider uploading reports for the commit dca2bb6 to get more accurate results
Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.06    #7881   +/-   ##
===============================================
  Coverage                ?   82.89%           
===============================================
  Files                   ?      105           
  Lines                   ?    17875           
  Branches                ?        0           
===============================================
  Hits                    ?    14817           
  Misses                  ?     3058           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dd5eecd...dca2bb6. Read the comment docs.

@sperlingxx
Copy link
Contributor Author

sperlingxx commented May 10, 2021

Hi @ttnghia @mythrocks, I apologize for triggering re-requesting review by mistake. I didn't realize the refactor involving multiple PRs until I found #8158.

@mythrocks
Copy link
Contributor

No worries, sir. We will prioritize this as soon as #8158 is in.

Signed-off-by: sperlingxx <lovedreamf@gmail.com>
Signed-off-by: sperlingxx <lovedreamf@gmail.com>
Signed-off-by: sperlingxx <lovedreamf@gmail.com>
Signed-off-by: sperlingxx <lovedreamf@gmail.com>
@ttnghia
Copy link
Contributor

ttnghia commented May 25, 2021

Rerun tests.

min_periods,
agg._null_handling,
stream,
mr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collect_list will not be returned, thus we will not use mr. Instead, please use rmm::mr::get_current_device_resource().

Copy link
Contributor Author

@sperlingxx sperlingxx May 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, my bad! I always forget that. Shall we keep the stream, or replacing it with rmm::cuda_stream_default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All func calls should use the same stream so we will not change that.

@@ -581,7 +581,7 @@ class collect_list_aggregation final : public rolling_aggregation {
/**
* @brief Derived aggregation class for specifying COLLECT_SET aggregation
*/
class collect_set_aggregation final : public aggregation {
class collect_set_aggregation final : public rolling_aggregation {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we inherit collect set from rolling_aggregation instead of aggregation? We need it for both rolling window and groupby, don't we?

Copy link
Contributor Author

@sperlingxx sperlingxx May 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because rolling_aggregation is virtually inherited from aggregation. I just followed corresponding codes for collect_list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe both are wrong, as collect_set_aggregation and collect_set_aggregation are used not only in rolling window but in groupby. Can you change to:

class collect_list_aggregation final : public aggregation 
...
class collect_set_aggregation final : public aggregation 

and test if they can compile and unit tests all pass please?

Copy link
Contributor Author

@sperlingxx sperlingxx May 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ttnghia, I tried on replacing rolling_aggregation with aggregation. And I got compiling error on src/aggregation/aggregation.cpp:454:

/home/alfredxu/workspace/codes/cudf/cpp/src/aggregation/aggregation.cpp:454:74: error: could not convert ‘std::make_unique(_Args&& ...) [with _Tp = cudf::detail::collect_list_aggregation; _Args = {cudf::null_policy&}; typename std::_MakeUniq<_Tp>::__single_object = std::unique_ptr<cudf::detail::collect_list_aggregation, std::default_delete<cudf::detail::collect_list_aggregation> >]()’ from ‘unique_ptr<cudf::detail::collect_list_aggregation,default_delete<cudf::detail::collect_list_aggregation>>’ to ‘unique_ptr<cudf::rolling_aggregation,default_delete<cudf::rolling_aggregation>>’

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// Factory to create a COLLECT_LIST aggregation
template <typename Base = aggregation>
std::unique_ptr<Base> make_collect_list_aggregation(null_policy null_handling)
{
  return std::make_unique<detail::collect_list_aggregation>(null_handling);
}
template std::unique_ptr<aggregation> make_collect_list_aggregation<aggregation>(
  null_policy null_handling);
template std::unique_ptr<rolling_aggregation> make_collect_list_aggregation<rolling_aggregation>(
  null_policy null_handling);

I think it is because we can not return std::unique_ptr<rolling_aggregation> after we made the replacement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thanks 😄

Signed-off-by: sperlingxx <lovedreamf@gmail.com>
Copy link
Contributor

@ttnghia ttnghia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Now we have other blocker PRs (#8342 and rapidsai/dask-cuda#623). Wait for them merged first before we can rerun tests.

@sperlingxx
Copy link
Contributor Author

rerun tests

@ttnghia
Copy link
Contributor

ttnghia commented May 26, 2021

Rerun tests.

@ttnghia
Copy link
Contributor

ttnghia commented May 26, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit be05a00 into rapidsai:branch-21.06 May 26, 2021
@sperlingxx sperlingxx deleted the collect_set_rolling branch May 26, 2021 22:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team CMake CMake build issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants