Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Vertex betweenness centrality #3160

Conversation

ChuckHastings
Copy link
Collaborator

Implements Vertex Betweenness, part of #2647. The core of this algorithm will also address Edge Betweenness, but the return values are different. Edge betweenness will be finished in a separate PR.

We use a parallel variation of the Brandes Algorithm (2001) to compute exact or approximate betweenness, implemented using the graph primitives.

@ChuckHastings ChuckHastings requested review from a team as code owners January 19, 2023 21:06
@ChuckHastings ChuckHastings self-assigned this Jan 19, 2023
@ChuckHastings ChuckHastings added 3 - Ready for Review improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jan 19, 2023
@ChuckHastings ChuckHastings added this to the 23.02 milestone Jan 19, 2023
@codecov-commenter
Copy link

codecov-commenter commented Jan 20, 2023

Codecov Report

❗ No coverage uploaded for pull request base (branch-23.04@6f47f45). Click here to learn what that means.
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff               @@
##             branch-23.04    #3160   +/-   ##
===============================================
  Coverage                ?   56.26%           
===============================================
  Files                   ?      153           
  Lines                   ?     9658           
  Branches                ?        0           
===============================================
  Hits                    ?     5434           
  Misses                  ?     4224           
  Partials                ?        0           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@seunghwak seunghwak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review part 1 (I haven't reviewed the actual betweenness centrality implementation yet).

raft::random::RngState& rng_state,
vertex_t num_vertices,
vertex_t select_count,
bool with_replacement);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is assuming that vertex IDs are consecutive integers starting from 0. This won't work once we start to support vertex masking. To be future proof, should we better take a graph_view object and call number_of_vertices() inside? Then, once we start to support vertex masking, we just need to update the function internals, and users don't need to make change in their side.

Copy link
Collaborator Author

@ChuckHastings ChuckHastings Jan 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll work on this. I hadn't thought about that particular use case, I had been thinking it might be better to pass in the graph, but didn't really have a good use case to motivate it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The legacy implementation of betweenness has a similar python based function which I am obviously going to substitute with the CAPI one from sampling_algorithms.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in next push

* @param [in] vertex_list Optionally specify a device array containing a list of vertices
* to use as seeds for BFS
* to use as seeds for betweenness approximation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better be betweenness centrality approximation or centrality approximation? I am not sure what 'betweenness` means on its own.

* @param [in] vertex_list Optionally specify a device array containing a list of vertices
* to use as seeds for BFS
* to use as seeds for traversals
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better match the documentation in vertex betweenness centrality?

In the above, we're saying to use as seeds for betweenness approximation.

Users may be confused that vertex_list may have different roles in vertex & edge betweenness centrality.


if constexpr (multi_gpu) {
local_vertices = cugraph::detail::shuffle_ext_vertices_to_local_gpu_by_vertex_partitioning(
handle_, std::move(local_vertices));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to double check, aren't the vertex IDs in local_vertices still internal at this point?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I copied this from test code where the vertices generated were external. I will consider the best way to fix this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in next push. Vertices are shuffled as internal vertices... and then unrenumbered to external vertices before being returned.

rmm::device_uvector<vertex_t> result(select_count, handle.get_stream());

if (select_count == num_vertices) {
detail::sequence_fill(handle.get_stream(), result.data(), result.size(), vertex_t{0});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is valid only when with_replacement = false.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The only current use is with_replacement == false, I added the parameter for completeness. I'll rework this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update implementation in next push to support with_replacement == true also.

#include <cugraph-ops/graph/sampling.hpp>
#endif

namespace cugraph {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May better rename this file to select_random_vertices.cpp unless you have a plan to add other non-random vertex selection functions?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed in next push. With the API change (adding the graph_view), added the _impl.hpp, _sg.cpp and _mg.cpp files instead of a single file.


template <typename vertex_t>
struct brandes_e_op_t {
const vertex_t max_vertex_t{std::numeric_limits<vertex_t>::max()};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to add _t at the end? First seeing max_vertex_t in the code below, I mistakenly thought it is a type name. Better rename this variable to avoid confusion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually invalid_distance, right? Better name this as invalid_distance?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change this. the _t was a typo, but I think invalid_distance is better.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in next push

detail::scalar_fill(handle, centralities.data(), centralities.size(), weight_t{0});

size_t num_sources = thrust::distance(vertices_begin, vertices_end);
std::vector<size_t> source_starts{{0, num_sources}};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't source_starts here a misnomer? I guess source_starts.back() isn't a starting displacement but marks the end.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to source_offsets. This is the offsets array of a CSR-like structure identifying the sources by GPU.

At this point we either have a list of seed vertices that we want to use for computing approximate betweenness, or we have a list of all vertices for computing exact betweenness. The frontier bucket needs to be populated only on the GPU where the vertex exists, this provides us a mechanism - on each iteration of the outer loop - to populate the frontier properly.

// a tagged frontier (tagging each source with itself). Then we can
// expand from multiple sources concurrently. The challenge is managing
// the memory explosion.
//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it might not be feasible to take the entire set of sources at once especially for power-law graphs. We may take N vertices in each function call where N might be a small number for small diameter graphs and larger number for large diameter graphs (how to cheaply find a graph diameter is another question). Maybe a topic for future research.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely an area for future research. I had some ideas on optimizations (algorithmic changes, not tuning primitives), but definitely more research is required.


template <typename value_t, typename ignore_t>
__device__ thrust::optional<value_t> operator()(
vertex_t, vertex_t, value_t src_property, vertex_t dst_property, ignore_t) const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we better rename src_property to src_sigma and dst_property to dst_distance?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in next push

constexpr int bucket_idx_next{1};

rmm::device_uvector<vertex_t> sigma(graph_view.local_vertex_partition_range_size(),
handle.get_stream());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a guarantee that vertex_t won't overflow here? sigma here is a total number of shortest paths that go through a vertex v, and this shouldn't overflow if there is only one shortest path from s to t, but if there are multiple shortest paths between vertex pairs, this can theoretically overflow, right? (the odd that this will happen in reality sounds pretty low, but we cannot make trade-offs on correctness).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will need to spend some time in the code to ponder the rest of your comments in this file. This might need to be an edge_t, I'll work through the code and verify.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that vertex_t here is appropriate.

The brandes_bfs function will be computing distance and sigma from a single source vertex. In this case, sigma will accumulate how many times a vertex v appears on an s-t path. Since s is fixed for each call to brandes_bfs, there will be at most number_of_vertices() paths that will be explored in a single call to brandes_bfs. The value is consumed in the accumulate_vertex_results function (converted into a partial score) and reinitialized when we start from a new source.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there will be at most number_of_vertices() paths=>Is this true with multi-edges as well?

What happens if there are two vertices 0 and 1 and trillions of multi-edges between vertex 0 & 1?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, a multi-graph changes that dynamic. I'll change these to edge_t.

dst_properties.view(),
cugraph::edge_dummy_property_t{}.view(),
[d] __device__(auto, auto, auto src_props, auto dst_props, auto) {
if ((thrust::get<0>(dst_props) == d) && (thrust::get<0>(src_props) == (d - 1))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we can make this much more efficient if per_v_transform_reduce_outgoing_e takes a vertex list and we provide a vertex list with the distance == (d-1), right?

Then, we should be able to finish accumulate_vertex_results in O(E) instead of O(E) * diameter. This should be much easier (and I also have a plan to update multiple primitives to take a vertex list, this is necessary for Leiden as well) than implementing ``efficient transform_reduce_v_frontier_incoming_e_by_src on an untransposed graph, which does not currently exist''.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

For extra O(V) memory, we could keep the frontiers for each distance as we go through the loop, and then use them as input here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, then could you update the FIXME comment above accordingly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated FIXME

vertex_t, vertex_t, value_t src_property, vertex_t dst_property, ignore_t) const
{
thrust::optional<value_t> result{thrust::nullopt};
return (dst_property < max_vertex_t) ? result : thrust::make_optional(src_property);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess return (dst_distance == invalid_distance)? thrust::make_optional(src_sigma) : thrust::nullopt will be more readable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in next push

[hop] __device__(auto v, auto old_values, auto v_sigma) {
vertex_t next_distance = thrust::get<0>(old_values) == invalid_distance
? (hop + 1)
: thrust::get<0>(old_values);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this check necessary? We're accumulating dst only when dst_distance == invalid_distance. Did you find any case thrust::get<0>(old_values) != invalid_distance?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in next push. Not necessary.

diameter = host_scalar_allreduce(
handle.get_comms(), diameter, raft::comms::op_t::MAX, handle.get_stream());
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use transform_reduce_v to merge the two (assuming that distance.size() == graph_view.local_vertex_partition_range_size()).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in next push.

if constexpr (multi_gpu) {
count = host_scalar_allreduce(
handle.get_comms(), count, raft::comms::op_t::SUM, handle.get_stream());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use count_if_v.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in next push

* random seeds to select, or a device_span identifying a list of pre-selected vertices
* to use as seeds for the traversals for approximating betweenness.
* @param vertices Optional, if specified this provides a device_span identifying a list of
* pre-selected vertices to use as seeds for the traversals for approximating betweenness.
* @param normalized A flag indicating results should be normalized
Copy link
Contributor

@jnke2016 jnke2016 Jan 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this parameter is not working. I ran the CAPI test with normalized=True and it is still passing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added implementation of normalization in latest push

seunghwak
seunghwak previously approved these changes Jan 24, 2023
Copy link
Contributor

@seunghwak seunghwak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few suggestions to consider, otherwise LGTM.

* vertex_list as NULL. This will compute betweenness centrality by doing a
* traversal from every vertex and counting the frequency that a vertex appears on
* a shortest path.
* Betweenness can be computed exactly by specifying This will compute betweenness centrality by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specifying This=>It seems like some words are missing between specifying and This.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaned up documentation

constexpr int bucket_idx_next{1};

rmm::device_uvector<vertex_t> sigma(graph_view.local_vertex_partition_range_size(),
handle.get_stream());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there will be at most number_of_vertices() paths=>Is this true with multi-edges as well?

What happens if there are two vertices 0 and 1 and trillions of multi-edges between vertex 0 & 1?

dst_properties.view(),
cugraph::edge_dummy_property_t{}.view(),
[d] __device__(auto, auto, auto src_props, auto dst_props, auto) {
if ((thrust::get<0>(dst_props) == d) && (thrust::get<0>(src_props) == (d - 1))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, then could you update the FIXME comment above accordingly?

@ChuckHastings ChuckHastings changed the base branch from branch-23.02 to branch-23.04 January 30, 2023 22:04
@ChuckHastings ChuckHastings dismissed seunghwak’s stale review January 30, 2023 22:04

The base branch was changed.

@ChuckHastings ChuckHastings modified the milestones: 23.02, 23.04 Jan 30, 2023
rapids-bot bot pushed a commit that referenced this pull request Feb 9, 2023
We introduced a change to pin RAPIDS wheel dependencies to the same release version. However, branch 23.04 was created before that last PR was merged, so as of now cugraph's 23.4 wheels are installing 23.2 RAPIDS dependencies. This PR updates those pins to the current release.

This should solve the error in #3160 (where the old pylibcugraph-cu11==23.2 is being pulled in the test step as opposed to the desired 23.4 version).

Authors:
  - Sevag H (https://github.com/sevagh)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)

URL: #3253
1,
select_count,
with_replacement,
handle.get_stream());
Copy link
Contributor

@seunghwak seunghwak Feb 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for a future reference, this will be executed using just a single warp, can be pretty slow if we're dealing with large graphs/large select_count values. Once we fix this problem, we can use this for seed generation in the sampling primitive & uniform sampling tests.

@ChuckHastings
Copy link
Collaborator Author

/merge

@rapids-bot rapids-bot bot merged commit 0a5d811 into rapidsai:branch-23.04 Feb 9, 2023
@ChuckHastings ChuckHastings deleted the implement_betweeenness_centrality branch September 27, 2023 21:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants