Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support exclude null_policy for collect list/set in groupby #8044

Merged
merged 9 commits into from
May 13, 2021

Conversation

sperlingxx
Copy link
Contributor

This pull request is to support null_policy::EXCLUDE for collect_list/collect_set in groupBy context, which is requested in #7777.

@sperlingxx sperlingxx requested a review from a team as a code owner April 23, 2021 08:01
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Apr 23, 2021
@sperlingxx sperlingxx added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 23, 2021
@codecov
Copy link

codecov bot commented Apr 23, 2021

Codecov Report

Merging #8044 (ba83f2d) into branch-0.20 (51336df) will decrease coverage by 0.00%.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.20    #8044      +/-   ##
===============================================
- Coverage        82.88%   82.88%   -0.01%     
===============================================
  Files              103      104       +1     
  Lines            17668    17907     +239     
===============================================
+ Hits             14645    14843     +198     
- Misses            3023     3064      +41     
Impacted Files Coverage Δ
python/cudf/cudf/core/tools/datetimes.py 80.42% <0.00%> (-4.11%) ⬇️
python/cudf/cudf/core/column/decimal.py 90.20% <0.00%> (-2.72%) ⬇️
python/cudf/cudf/core/column/datetime.py 88.03% <0.00%> (-1.88%) ⬇️
python/cudf/cudf/core/column/struct.py 94.73% <0.00%> (-1.56%) ⬇️
python/cudf/cudf/utils/dtypes.py 82.20% <0.00%> (-1.24%) ⬇️
python/dask_cudf/dask_cudf/groupby.py 91.28% <0.00%> (-0.88%) ⬇️
python/cudf/cudf/core/series.py 91.17% <0.00%> (-0.56%) ⬇️
python/cudf/cudf/core/index.py 92.52% <0.00%> (-0.55%) ⬇️
python/cudf/cudf/core/column/column.py 88.20% <0.00%> (-0.44%) ⬇️
python/cudf/cudf/core/column/lists.py 86.98% <0.00%> (-0.43%) ⬇️
... and 28 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 188b630...ba83f2d. Read the comment docs.

cpp/src/groupby/sort/group_collect.cu Outdated Show resolved Hide resolved
cpp/src/groupby/sort/group_collect.cu Show resolved Hide resolved
cpp/tests/groupby/collect_set_test.cpp Outdated Show resolved Hide resolved
Signed-off-by: sperlingxx <lovedreamf@gmail.com>
@sperlingxx
Copy link
Contributor Author

rebuild

@sperlingxx
Copy link
Contributor Author

@gpucibot build

@sperlingxx
Copy link
Contributor Author

build

cpp/src/groupby/sort/group_collect.cu Outdated Show resolved Hide resolved
Signed-off-by: sperlingxx <lovedreamf@gmail.com>
cpp/src/groupby/sort/group_collect.cu Outdated Show resolved Hide resolved
@sameerz sameerz linked an issue Apr 28, 2021 that may be closed by this pull request
Signed-off-by: sperlingxx <lovedreamf@gmail.com>
@mythrocks
Copy link
Contributor

@sperlingxx, this is looking good. Could you please resolve the conflicts in collect_set_test and group_collect_test?

Copy link
Contributor

@mythrocks mythrocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm +1 on the changes so far. We just need to resolve conflicts in a couple of files.

@sperlingxx
Copy link
Contributor Author

I'm +1 on the changes so far. We just need to resolve conflicts in a couple of files.

Resolved.

@mythrocks
Copy link
Contributor

Sorry, @sperlingxx. There's another conflict with aggregate.cpp. Likely a result of the aggregation operator refactor.

@sperlingxx
Copy link
Contributor Author

Sorry, @sperlingxx. There's another conflict with aggregate.cpp. Likely a result of the aggregation operator refactor.

Also fixed.

@sperlingxx sperlingxx requested a review from mythrocks May 11, 2021 01:38
cpp/src/groupby/sort/group_collect.cu Outdated Show resolved Hide resolved
cpp/src/groupby/sort/group_reductions.hpp Outdated Show resolved Hide resolved
cpp/src/groupby/sort/group_collect.cu Outdated Show resolved Hide resolved
cpp/tests/groupby/collect_list_tests.cpp Outdated Show resolved Hide resolved
cpp/tests/groupby/collect_list_tests.cpp Outdated Show resolved Hide resolved
cpp/src/groupby/sort/group_collect.cu Outdated Show resolved Hide resolved
cpp/src/groupby/sort/group_collect.cu Outdated Show resolved Hide resolved
Co-authored-by: David Wendt <45795991+davidwendt@users.noreply.github.com>
Copy link
Contributor

@mythrocks mythrocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work, @sperlingxx. Thanks for incorporating the review comments.

@mythrocks
Copy link
Contributor

@gpucibot merge

@rapids-bot rapids-bot bot merged commit f8d7de4 into rapidsai:branch-0.20 May 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] support null-excluded policy for collect aggregations in groupby
4 participants