Implement Vertex betweenness centrality #3160

ChuckHastings · 2023-01-19T21:06:32Z

Implements Vertex Betweenness, part of #2647. The core of this algorithm will also address Edge Betweenness, but the return values are different. Edge betweenness will be finished in a separate PR.

We use a parallel variation of the Brandes Algorithm (2001) to compute exact or approximate betweenness, implemented using the graph primitives.

codecov-commenter · 2023-01-20T05:01:10Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-23.04@6f47f45). Click here to learn what that means.
Patch has no changes to coverable lines.

Additional details and impacted files

@@               Coverage Diff               @@
##             branch-23.04    #3160   +/-   ##
===============================================
  Coverage                ?   56.26%           
===============================================
  Files                   ?      153           
  Lines                   ?     9658           
  Branches                ?        0           
===============================================
  Hits                    ?     5434           
  Misses                  ?     4224           
  Partials                ?        0

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

… selected vertex list

seunghwak

Review part 1 (I haven't reviewed the actual betweenness centrality implementation yet).

seunghwak · 2023-01-20T21:28:23Z

cpp/include/cugraph/graph_functions.hpp

+                                                     raft::random::RngState& rng_state,
+                                                     vertex_t num_vertices,
+                                                     vertex_t select_count,
+                                                     bool with_replacement);


This is assuming that vertex IDs are consecutive integers starting from 0. This won't work once we start to support vertex masking. To be future proof, should we better take a graph_view object and call number_of_vertices() inside? Then, once we start to support vertex masking, we just need to update the function internals, and users don't need to make change in their side.

I'll work on this. I hadn't thought about that particular use case, I had been thinking it might be better to pass in the graph, but didn't really have a good use case to motivate it.

The legacy implementation of betweenness has a similar python based function which I am obviously going to substitute with the CAPI one from sampling_algorithms.

Updated in next push

seunghwak · 2023-01-20T21:30:05Z

cpp/include/cugraph_c/centrality_algorithms.h

 * @param [in]  vertex_list        Optionally specify a device array containing a list of vertices
- *                                 to use as seeds for BFS
+ *                                 to use as seeds for betweenness approximation


Better be betweenness centrality approximation or centrality approximation? I am not sure what 'betweenness` means on its own.

seunghwak · 2023-01-20T21:32:11Z

cpp/include/cugraph_c/centrality_algorithms.h

 * @param [in]  vertex_list        Optionally specify a device array containing a list of vertices
- *                                 to use as seeds for BFS
+ *                                 to use as seeds for traversals


Better match the documentation in vertex betweenness centrality?

In the above, we're saying to use as seeds for betweenness approximation.

Users may be confused that vertex_list may have different roles in vertex & edge betweenness centrality.

seunghwak · 2023-01-20T21:36:59Z

cpp/src/c_api/random.cpp

+
+      if constexpr (multi_gpu) {
+        local_vertices = cugraph::detail::shuffle_ext_vertices_to_local_gpu_by_vertex_partitioning(
+          handle_, std::move(local_vertices));


Just to double check, aren't the vertex IDs in local_vertices still internal at this point?

Yes. I copied this from test code where the vertices generated were external. I will consider the best way to fix this.

Updated in next push. Vertices are shuffled as internal vertices... and then unrenumbered to external vertices before being returned.

seunghwak · 2023-01-20T21:39:11Z

cpp/src/structure/select_vertices.cpp

+  rmm::device_uvector<vertex_t> result(select_count, handle.get_stream());
+
+  if (select_count == num_vertices) {
+    detail::sequence_fill(handle.get_stream(), result.data(), result.size(), vertex_t{0});


I guess this is valid only when with_replacement = false.

Good point. The only current use is with_replacement == false, I added the parameter for completeness. I'll rework this.

Update implementation in next push to support with_replacement == true also.

seunghwak · 2023-01-20T21:40:25Z

cpp/src/structure/select_vertices.cpp

+#include <cugraph-ops/graph/sampling.hpp>
+#endif
+
+namespace cugraph {


May better rename this file to select_random_vertices.cpp unless you have a plan to add other non-random vertex selection functions?

Renamed in next push. With the API change (adding the graph_view), added the _impl.hpp, _sg.cpp and _mg.cpp files instead of a single file.

seunghwak · 2023-01-20T22:19:27Z

cpp/src/centrality/betweenness_centrality_impl.cuh

+
+template <typename vertex_t>
+struct brandes_e_op_t {
+  const vertex_t max_vertex_t{std::numeric_limits<vertex_t>::max()};


Any reason to add _t at the end? First seeing max_vertex_t in the code below, I mistakenly thought it is a type name. Better rename this variable to avoid confusion?

This is actually invalid_distance, right? Better name this as invalid_distance?

I will change this. the _t was a typo, but I think invalid_distance is better.

Fixed in next push

seunghwak · 2023-01-21T14:07:22Z

cpp/src/centrality/betweenness_centrality_impl.cuh

+  detail::scalar_fill(handle, centralities.data(), centralities.size(), weight_t{0});
+
+  size_t num_sources = thrust::distance(vertices_begin, vertices_end);
+  std::vector<size_t> source_starts{{0, num_sources}};


Isn't source_starts here a misnomer? I guess source_starts.back() isn't a starting displacement but marks the end.

Renamed to source_offsets. This is the offsets array of a CSR-like structure identifying the sources by GPU.

At this point we either have a list of seed vertices that we want to use for computing approximate betweenness, or we have a list of all vertices for computing exact betweenness. The frontier bucket needs to be populated only on the GPU where the vertex exists, this provides us a mechanism - on each iteration of the outer loop - to populate the frontier properly.

seunghwak · 2023-01-21T14:13:23Z

cpp/src/centrality/betweenness_centrality_impl.cuh

+  // a tagged frontier (tagging each source with itself).  Then we can
+  // expand from multiple sources concurrently. The challenge is managing
+  // the memory explosion.
+  //


Yes, it might not be feasible to take the entire set of sources at once especially for power-law graphs. We may take N vertices in each function call where N might be a small number for small diameter graphs and larger number for large diameter graphs (how to cheaply find a graph diameter is another question). Maybe a topic for future research.

Definitely an area for future research. I had some ideas on optimizations (algorithmic changes, not tuning primitives), but definitely more research is required.

seunghwak · 2023-01-21T14:31:26Z

cpp/src/centrality/betweenness_centrality_impl.cuh

+
+  template <typename value_t, typename ignore_t>
+  __device__ thrust::optional<value_t> operator()(
+    vertex_t, vertex_t, value_t src_property, vertex_t dst_property, ignore_t) const


Should we better rename src_property to src_sigma and dst_property to dst_distance?

Fixed in next push

seunghwak · 2023-01-21T14:36:46Z

cpp/src/centrality/betweenness_centrality_impl.cuh

+  constexpr int bucket_idx_next{1};
+
+  rmm::device_uvector<vertex_t> sigma(graph_view.local_vertex_partition_range_size(),
+                                      handle.get_stream());


Is there a guarantee that vertex_t won't overflow here? sigma here is a total number of shortest paths that go through a vertex v, and this shouldn't overflow if there is only one shortest path from s to t, but if there are multiple shortest paths between vertex pairs, this can theoretically overflow, right? (the odd that this will happen in reality sounds pretty low, but we cannot make trade-offs on correctness).

I will need to spend some time in the code to ponder the rest of your comments in this file. This might need to be an edge_t, I'll work through the code and verify.

I believe that vertex_t here is appropriate.

The brandes_bfs function will be computing distance and sigma from a single source vertex. In this case, sigma will accumulate how many times a vertex v appears on an s-t path. Since s is fixed for each call to brandes_bfs, there will be at most number_of_vertices() paths that will be explored in a single call to brandes_bfs. The value is consumed in the accumulate_vertex_results function (converted into a partial score) and reinitialized when we start from a new source.

there will be at most number_of_vertices() paths=>Is this true with multi-edges as well?

What happens if there are two vertices 0 and 1 and trillions of multi-edges between vertex 0 & 1?

Of course, a multi-graph changes that dynamic. I'll change these to edge_t.

seunghwak · 2023-01-21T14:57:15Z

cpp/src/centrality/betweenness_centrality_impl.cuh

+      dst_properties.view(),
+      cugraph::edge_dummy_property_t{}.view(),
+      [d] __device__(auto, auto, auto src_props, auto dst_props, auto) {
+        if ((thrust::get<0>(dst_props) == d) && (thrust::get<0>(src_props) == (d - 1))) {


So, we can make this much more efficient if per_v_transform_reduce_outgoing_e takes a vertex list and we provide a vertex list with the distance == (d-1), right?

Then, we should be able to finish accumulate_vertex_results in O(E) instead of O(E) * diameter. This should be much easier (and I also have a plan to update multiple primitives to take a vertex list, this is necessary for Leiden as well) than implementing ``efficient transform_reduce_v_frontier_incoming_e_by_src on an untransposed graph, which does not currently exist''.

Yes.

For extra O(V) memory, we could keep the frontiers for each distance as we go through the loop, and then use them as input here.

Great, then could you update the FIXME comment above accordingly?

Updated FIXME

seunghwak · 2023-01-21T15:38:53Z

cpp/src/centrality/betweenness_centrality_impl.cuh

+    vertex_t, vertex_t, value_t src_property, vertex_t dst_property, ignore_t) const
+  {
+    thrust::optional<value_t> result{thrust::nullopt};
+    return (dst_property < max_vertex_t) ? result : thrust::make_optional(src_property);


I guess return (dst_distance == invalid_distance)? thrust::make_optional(src_sigma) : thrust::nullopt will be more readable.

Fixed in next push

seunghwak · 2023-01-21T15:41:00Z

cpp/src/centrality/betweenness_centrality_impl.cuh

+                      [hop] __device__(auto v, auto old_values, auto v_sigma) {
+                        vertex_t next_distance = thrust::get<0>(old_values) == invalid_distance
+                                                   ? (hop + 1)
+                                                   : thrust::get<0>(old_values);


Is this check necessary? We're accumulating dst only when dst_distance == invalid_distance. Did you find any case thrust::get<0>(old_values) != invalid_distance?

Fixed in next push. Not necessary.

seunghwak · 2023-01-21T15:43:20Z

cpp/src/centrality/betweenness_centrality_impl.cuh

+    diameter = host_scalar_allreduce(
+      handle.get_comms(), diameter, raft::comms::op_t::MAX, handle.get_stream());
+  }
+


You can use transform_reduce_v to merge the two (assuming that distance.size() == graph_view.local_vertex_partition_range_size()).

Fixed in next push.

seunghwak · 2023-01-21T15:44:43Z

cpp/src/centrality/betweenness_centrality_impl.cuh

+    if constexpr (multi_gpu) {
+      count = host_scalar_allreduce(
+        handle.get_comms(), count, raft::comms::op_t::SUM, handle.get_stream());
+    }


You can use count_if_v.

Fixed in next push

jnke2016 · 2023-01-22T13:29:17Z

cpp/include/cugraph/algorithms.hpp

- *         random seeds to select, or a device_span identifying a list of pre-selected vertices
- *         to use as seeds for the traversals for approximating betweenness.
+ * @param vertices Optional, if specified this provides a device_span identifying a list of
+ * pre-selected vertices to use as seeds for the traversals for approximating betweenness.
 * @param normalized         A flag indicating results should be normalized


It looks like this parameter is not working. I ran the CAPI test with normalized=True and it is still passing

Added implementation of normalization in latest push

seunghwak

Few suggestions to consider, otherwise LGTM.

seunghwak · 2023-01-24T18:17:27Z

cpp/include/cugraph_c/centrality_algorithms.h

- * vertex_list as NULL.  This will compute betweenness centrality by doing a
- * traversal from every vertex and counting the frequency that a vertex appears on
- * a shortest path.
+ * Betweenness can be computed exactly by specifying This will compute betweenness centrality by


specifying This=>It seems like some words are missing between specifying and This.

Cleaned up documentation

seunghwak · 2023-01-24T18:22:21Z

cpp/src/centrality/betweenness_centrality_impl.cuh

+  constexpr int bucket_idx_next{1};
+
+  rmm::device_uvector<vertex_t> sigma(graph_view.local_vertex_partition_range_size(),
+                                      handle.get_stream());


there will be at most number_of_vertices() paths=>Is this true with multi-edges as well?

What happens if there are two vertices 0 and 1 and trillions of multi-edges between vertex 0 & 1?

seunghwak · 2023-01-24T18:24:08Z

cpp/src/centrality/betweenness_centrality_impl.cuh

+      dst_properties.view(),
+      cugraph::edge_dummy_property_t{}.view(),
+      [d] __device__(auto, auto, auto src_props, auto dst_props, auto) {
+        if ((thrust::get<0>(dst_props) == d) && (thrust::get<0>(src_props) == (d - 1))) {


Great, then could you update the FIXME comment above accordingly?

The base branch was changed.

We introduced a change to pin RAPIDS wheel dependencies to the same release version. However, branch 23.04 was created before that last PR was merged, so as of now cugraph's 23.4 wheels are installing 23.2 RAPIDS dependencies. This PR updates those pins to the current release. This should solve the error in #3160 (where the old pylibcugraph-cu11==23.2 is being pulled in the test step as opposed to the desired 23.4 version). Authors: - Sevag H (https://github.com/sevagh) Approvers: - Rick Ratzel (https://github.com/rlratzel) URL: #3253

seunghwak · 2023-02-09T16:22:11Z

cpp/src/structure/select_random_vertices_impl.hpp

+                                                   1,
+                                                   select_count,
+                                                   with_replacement,
+                                                   handle.get_stream());


Just for a future reference, this will be executed using just a single warp, can be pretty slow if we're dealing with large graphs/large select_count values. Once we fix this problem, we can use this for seed generation in the sampling primitive & uniform sampling tests.

ChuckHastings · 2023-02-09T20:36:11Z

/merge

ChuckHastings added 6 commits October 28, 2022 15:42

early work

98fc3dc

Merge branch 'branch-22.12' into implement_betweeenness_centrality

228b464

commit intermediate changes so I can merge to update with 23.02

d991d15

Merge branch 'branch-23.02' into implement_betweeenness_centrality

6c930b2

finish debugging/cleanup of vertex betweennness

c85a3c9

clean up a few things

75ec784

ChuckHastings requested review from a team as code owners January 19, 2023 21:06

ChuckHastings self-assigned this Jan 19, 2023

ChuckHastings added 3 - Ready for Review improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jan 19, 2023

ChuckHastings added this to the 23.02 milestone Jan 19, 2023

ChuckHastings added 3 commits January 19, 2023 13:08

Merge branch 'branch-23.02' into implement_betweeenness_centrality

d1681b3

fix format issues

86f9770

fix format issues

203a83b

make vertex list required, expose C API function to generate randomly…

b32c5e4

… selected vertex list

seunghwak reviewed Jan 20, 2023

View reviewed changes

seunghwak reviewed Jan 21, 2023

View reviewed changes

jnke2016 reviewed Jan 22, 2023

View reviewed changes

ChuckHastings added 4 commits January 23, 2023 15:03

address PR comments

d161eeb

missed some function instantiations, add normalization

1715e2c

Fix pre-commit issues

83f9bac

Merge branch 'branch-23.02' into implement_betweeenness_centrality

caf0369

seunghwak previously approved these changes Jan 24, 2023

View reviewed changes

ChuckHastings added 2 commits January 24, 2023 13:17

a few final changes based on PR comments

5bc2d9d

Merge branch 'branch-23.02' into implement_betweeenness_centrality

36fb795

alexbarghi-nv mentioned this pull request Jan 26, 2023

Refactor betweenness centrality #2971

Merged

Missing a factor of 2 in scaling results in an undirected graph

42dc3fa

ChuckHastings added 3 commits January 26, 2023 15:05

one more normalization adjustment

0842aec

remove commented out lines from testing

d325aef

n is larger f you include endpoints in the normalization computation

65a21b9

ChuckHastings changed the base branch from branch-23.02 to branch-23.04 January 30, 2023 22:04

ChuckHastings modified the milestones: 23.02, 23.04 Jan 30, 2023

ChuckHastings added 6 commits February 6, 2023 12:30

Add some tests for Joseph

7b02655

Merge branch 'branch-23.04' into implement_betweeenness_centrality

f047c88

add normalization to reference implementation

824b29f

fix formatting issues

36f427b

Merge branch 'branch-23.04' into implement_betweeenness_centrality

0466c8b

Merge branch 'branch-23.04' into implement_betweeenness_centrality

213e400

sevagh mentioned this pull request Feb 8, 2023

Bump pinned pip wheel deps to 23.4 #3253

Merged

Merge branch 'branch-23.04' into implement_betweeenness_centrality

fa60831

seunghwak approved these changes Feb 9, 2023

View reviewed changes

kingmesal removed the 3 - Ready for Review label Feb 9, 2023

jnke2016 approved these changes Feb 9, 2023

View reviewed changes

rapids-bot bot merged commit 0a5d811 into rapidsai:branch-23.04 Feb 9, 2023

ChuckHastings deleted the implement_betweeenness_centrality branch September 27, 2023 21:39

Implement Vertex betweenness centrality #3160

Implement Vertex betweenness centrality #3160

Conversation

ChuckHastings commented Jan 19, 2023

codecov-commenter commented Jan 20, 2023 • edited Loading

Codecov Report

seunghwak left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ChuckHastings Jan 21, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jnke2016 Jan 22, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seunghwak left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seunghwak Feb 9, 2023 • edited Loading

Choose a reason for hiding this comment

ChuckHastings commented Feb 9, 2023

codecov-commenter commented Jan 20, 2023 •

edited

Loading

ChuckHastings Jan 21, 2023 •

edited

Loading

jnke2016 Jan 22, 2023 •

edited

Loading

seunghwak Feb 9, 2023 •

edited

Loading