Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add node2vec wrapper to cugraph #2093

Merged

Conversation

betochimas
Copy link
Contributor

Helps close #2058, when combined with #2085. Requires #2085 to be merged before merging, as the cugraph implementation requires the pylibcugraph node2vec wrapper

@betochimas betochimas requested a review from a team as a code owner February 23, 2022 18:24
@betochimas betochimas added the feature request New feature or request label Feb 23, 2022
@betochimas betochimas added this to the 22.04 milestone Feb 23, 2022
@betochimas betochimas added the non-breaking Non-breaking change label Feb 23, 2022
@codecov-commenter
Copy link

codecov-commenter commented Feb 23, 2022

Codecov Report

Merging #2093 (dacbba8) into branch-22.04 (ab72ed5) will increase coverage by 0.29%.
The diff coverage is 94.35%.

Impacted file tree graph

@@               Coverage Diff                @@
##           branch-22.04    #2093      +/-   ##
================================================
+ Coverage         73.70%   73.99%   +0.29%     
================================================
  Files               155      157       +2     
  Lines             10373    10496     +123     
================================================
+ Hits               7645     7767     +122     
- Misses             2728     2729       +1     
Impacted Files Coverage Δ
python/cugraph/cugraph/sampling/random_walks.py 96.55% <ø> (ø)
python/cugraph/cugraph/sampling/node2vec.py 91.11% <91.11%> (ø)
python/cugraph/cugraph/tests/test_node2vec.py 96.10% <96.10%> (ø)
python/cugraph/cugraph/__init__.py 100.00% <100.00%> (ø)
python/cugraph/cugraph/sampling/__init__.py 100.00% <100.00%> (ø)
python/cugraph/cugraph/structure/graph_classes.py 78.53% <0.00%> (+0.52%) ⬆️
.../cugraph/tests/test_edge_betweenness_centrality.py 84.93% <0.00%> (+0.60%) ⬆️
...graph/cugraph/tests/test_betweenness_centrality.py 83.01% <0.00%> (+0.62%) ⬆️
python/cugraph/cugraph/tests/test_graph_store.py 100.00% <0.00%> (+1.56%) ⬆️
python/cugraph/cugraph/tests/test_utils.py 75.55% <0.00%> (+4.44%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ab72ed5...dacbba8. Read the comment docs.

Fetching from 22.04 after random walk bug was merged
Fetching from 22.04 after random walk bug was merged
@betochimas
Copy link
Contributor Author

rerun tests

Automatically clone raft when the raft pinned tag changes (rapidsai#2087)
@cjnolet
Copy link
Member

cjnolet commented Feb 25, 2022

rerun tests

# are contained in Simple_1
_test_data = {"karate.csv": {
"seeds": cp.asarray([0, 0], dtype=np.int32),
"paths": cp.asarray([0, 8, 33, 29, 26, 0, 1, 3, 13, 33],
Copy link
Contributor

@jnke2016 jnke2016 Mar 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good for now but I am a little concern with this approach of testing because of the randomness of the results.

  1. did you make sure there are only 2 paths with depth 5 starting from seed 0?
  2. Even if it is the case, the testing seems hardcoded and only a few seeds are tested.

An alternative approach could be to pick random seeds and verify the accuracy of the paths by checking the predecessors. To illustrate this, let's take the first path
[0, 8, 33, 29, 26]. you verify that 0 is a predecessor of 8, then 8 being a predecessor of 33 and so on... but I am not sure how to easily do this without involving cugraph.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach of testing was also in the cugraph level, but yes when this was done I did not check that the walks taken were in fact valid, just putting placeholder values and focusing on making sure the start of each path was each of the seeds. I've since made the testing more robust on the cugraph side, but for pylibcugraph it's more complicated, I'm still figuring that out right now but I can include a FIXME or add an issue to revisit later

@betochimas
Copy link
Contributor Author

rerun tests

Copy link
Contributor

@acostadon acostadon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job on this. Testing especially.

@betochimas
Copy link
Contributor Author

rerun tests

Copy link
Contributor

@rlratzel rlratzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good, thanks. I have a few comments below based on what I found so far. Aside from that, I'd also like to look into the renumbering comments Joseph posted so I may have a follow-up review.

python/cugraph/cugraph/sampling/node2vec.py Outdated Show resolved Hide resolved
python/cugraph/cugraph/sampling/node2vec.py Outdated Show resolved Hide resolved
python/cugraph/cugraph/sampling/node2vec.py Outdated Show resolved Hide resolved
python/cugraph/cugraph/tests/test_node2vec.py Outdated Show resolved Hide resolved
python/pylibcugraph/pylibcugraph/node2vec.pyx Outdated Show resolved Hide resolved
@rlratzel rlratzel mentioned this pull request Mar 10, 2022
7 tasks
Copy link
Contributor

@rlratzel rlratzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed one other minor change to make, plus verifying the renumbering questions, then it looks ready to approve.

python/cugraph/cugraph/tests/test_node2vec.py Outdated Show resolved Hide resolved
@betochimas betochimas requested a review from a team as a code owner March 11, 2022 17:52
Copy link
Contributor

@jnke2016 jnke2016 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@BradReesWork
Copy link
Member

@gpucibot merge

@rapids-bot rapids-bot bot merged commit fab8f18 into rapidsai:branch-22.04 Mar 11, 2022
rapids-bot bot pushed a commit that referenced this pull request Mar 17, 2022
Merged PR #2093 requires `pylibcugraph`. This PR adds `pylibcugraph` as a run dependency to the `cugraph` conda package.

Authors:
  - Rick Ratzel (https://github.com/rlratzel)

Approvers:
  - Jordan Jacobelli (https://github.com/Ethyling)

URL: #2121
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENH] Add node2vec API to cugraph
8 participants