Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Heal command #468

Merged
merged 46 commits into from
Jun 18, 2024
Merged
Show file tree
Hide file tree
Changes from 32 commits
Commits
Show all changes
46 commits
Select commit Hold shift + click to select a range
cacf9d7
Heal all subnets
pietrodimarco-dfinity Jun 1, 2024
6e49b04
Merge branch 'main' into heal-all-subnets
pietrodimarco-dfinity Jun 4, 2024
10d4e2a
Fix bazel
pietrodimarco-dfinity Jun 4, 2024
a018d38
Add heal comand and endpoint
pietrodimarco-dfinity Jun 7, 2024
7ccc138
Merge branch 'main' into heal-all-subnets
pietrodimarco-dfinity Jun 7, 2024
0a36de1
Fix Clippy
pietrodimarco-dfinity Jun 7, 2024
a2411ac
Fix fmt
pietrodimarco-dfinity Jun 7, 2024
4964b09
Compute local optimum based on max-replacable-nodes
pietrodimarco-dfinity Jun 9, 2024
c3457c3
Test multi concurrent proposals
pietrodimarco-dfinity Jun 10, 2024
f68dd23
Merge branch 'main' into heal-all-subnets
pietrodimarco-dfinity Jun 10, 2024
d1066b9
Run rustftm
pietrodimarco-dfinity Jun 10, 2024
4b02f29
Add subnet name
pietrodimarco-dfinity Jun 10, 2024
bf8d0af
Merge branch 'main' into heal-all-subnets
pietrodimarco-dfinity Jun 10, 2024
05e94ea
Move unhealthy in separate function
pietrodimarco-dfinity Jun 10, 2024
a172508
Remove optimize_with_limit
pietrodimarco-dfinity Jun 11, 2024
dd8a787
Merge branch 'main' into heal-all-subnets
pietrodimarco-dfinity Jun 11, 2024
6b79b8a
Run rustfmt
pietrodimarco-dfinity Jun 11, 2024
8c8f6d0
Fixed clippy
pietrodimarco-dfinity Jun 11, 2024
173e4b8
Fixing clippy
pietrodimarco-dfinity Jun 11, 2024
923e24b
Refactor heal method
pietrodimarco-dfinity Jun 11, 2024
b29684e
Fix rustfmt
pietrodimarco-dfinity Jun 11, 2024
21a517b
Add NetworkHealRequest
pietrodimarco-dfinity Jun 11, 2024
0ee7cd5
Add removed anyhow
pietrodimarco-dfinity Jun 12, 2024
d56031b
Add ordering of subnets
pietrodimarco-dfinity Jun 17, 2024
5cb3e45
Add tests
pietrodimarco-dfinity Jun 17, 2024
9ccf3ef
Run rustfmt
pietrodimarco-dfinity Jun 17, 2024
19de17e
Merge branch 'main' into heal-all-subnets
pietrodimarco-dfinity Jun 17, 2024
cae5322
Add test
pietrodimarco-dfinity Jun 17, 2024
ab9a676
Merge branch 'main' into heal-all-subnets
pietrodimarco-dfinity Jun 17, 2024
7676b5b
Fix clippy
pietrodimarco-dfinity Jun 17, 2024
28b4d57
Fix clippy
pietrodimarco-dfinity Jun 17, 2024
64b039f
Removed unecessary derive
pietrodimarco-dfinity Jun 17, 2024
3535c43
Update rs/cli/src/main.rs
pietrodimarco-dfinity Jun 17, 2024
c145360
Update rs/cli/src/main.rs
pietrodimarco-dfinity Jun 17, 2024
0326934
Update rs/decentralization/src/network.rs
pietrodimarco-dfinity Jun 17, 2024
188e4f7
Update rs/cli/src/cli.rs
pietrodimarco-dfinity Jun 17, 2024
6069436
Update rs/decentralization/src/network.rs
pietrodimarco-dfinity Jun 17, 2024
b779c5c
Update rs/ic-management-types/src/requests.rs
pietrodimarco-dfinity Jun 17, 2024
96f5ad8
Update rs/ic-management-backend/src/endpoints/network.rs
pietrodimarco-dfinity Jun 17, 2024
39eab14
Update rs/decentralization/src/network.rs
pietrodimarco-dfinity Jun 17, 2024
5a9a5d4
Update rs/decentralization/src/network.rs
pietrodimarco-dfinity Jun 17, 2024
1aa10db
Add take max replacement unhealthy
pietrodimarco-dfinity Jun 17, 2024
2ac4ea5
Merge branch 'main' into heal-all-subnets
pietrodimarco-dfinity Jun 17, 2024
9b3947d
Merge branch 'main' into heal-all-subnets
pietrodimarco-dfinity Jun 18, 2024
ce209c9
Apply suggestions from code review
pietrodimarco-dfinity Jun 18, 2024
96ce52e
Run rustfmt
pietrodimarco-dfinity Jun 18, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions rs/cli/src/cli.rs
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,15 @@ pub enum Commands {
/// Path to the DER file
path: String,
},

Heal {
/// Max number of nodes to be replaced per subnet.
/// Optimization will be performed automatically maximizing the decentralization
/// and minimizing the number of replaced nodes per subnet
#[clap(short, long)]
max_replacable_nodes_per_sub: Option<usize>,
pietrodimarco-dfinity marked this conversation as resolved.
Show resolved Hide resolved
},

/// Manage an existing subnet
Subnet(subnet::Cmd),
/// Get a value using ic-admin CLI
Expand Down
11 changes: 10 additions & 1 deletion rs/cli/src/clients.rs
Original file line number Diff line number Diff line change
@@ -1,8 +1,9 @@
use async_trait::async_trait;
use decentralization::HealResponse;
use decentralization::SubnetChangeResponse;
use ic_base_types::PrincipalId;
use ic_management_types::{
requests::{MembershipReplaceRequest, NodesRemoveRequest, NodesRemoveResponse, SubnetCreateRequest, SubnetResizeRequest},
requests::{HealRequest, MembershipReplaceRequest, NodesRemoveRequest, NodesRemoveResponse, SubnetCreateRequest, SubnetResizeRequest},
Artifact, Network, NetworkError, Release, TopologyChangeProposal,
};
use log::error;
Expand Down Expand Up @@ -97,6 +98,14 @@ impl DashboardBackendClient {
.rest_send()
.await
}

pub(crate) async fn network_heal(&self, request: HealRequest) -> anyhow::Result<HealResponse> {
reqwest::Client::new()
.post(self.url.join("network/heal").map_err(|e| anyhow::anyhow!(e))?)
.json(&request)
.rest_send()
.await
}
}

#[async_trait]
Expand Down
14 changes: 14 additions & 0 deletions rs/cli/src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,20 @@ async fn async_main() -> Result<(), anyhow::Error> {
Ok(())
}

cli::Commands::Heal {
max_replacable_nodes_per_sub,
pietrodimarco-dfinity marked this conversation as resolved.
Show resolved Hide resolved
} => {
runner_instance
.network_heal(
ic_management_types::requests::HealRequest {
max_replacable_nodes_per_sub: *max_replacable_nodes_per_sub,
pietrodimarco-dfinity marked this conversation as resolved.
Show resolved Hide resolved
},
cli_opts.verbose,
simulate,
)
.await
}

cli::Commands::Subnet(subnet) => {
// Check if required arguments are provided
match &subnet.subcommand {
Expand Down
27 changes: 27 additions & 0 deletions rs/cli/src/runner.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ use crate::operations::hostos_rollout::{HostosRollout, HostosRolloutResponse, No
use crate::ops_subnet_node_replace;
use crate::{ic_admin, local_unused_port};
use decentralization::SubnetChangeResponse;
use futures::future::join_all;
use ic_base_types::PrincipalId;
use ic_management_backend::proposal::ProposalAgent;
use ic_management_backend::public_dashboard::query_ic_dashboard_list;
Expand Down Expand Up @@ -444,4 +445,30 @@ impl Runner {
.await?;
Ok(())
}

pub async fn network_heal(
&self,
request: ic_management_types::requests::HealRequest,
_verbose: bool,
pietrodimarco-dfinity marked this conversation as resolved.
Show resolved Hide resolved
simulate: bool,
) -> Result<(), anyhow::Error> {
let change = self.dashboard_backend_client.network_heal(request).await?;
println!("{}", change);

join_all(change.subnets_change_response.iter().map(|subnet_change_response| async move {
pietrodimarco-dfinity marked this conversation as resolved.
Show resolved Hide resolved
self.run_membership_change(
subnet_change_response.clone(),
ops_subnet_node_replace::replace_proposal_options(subnet_change_response)?,
simulate,
)
.await
.map_err(|e| {
println!("{}", e);
e
})
}))
.await;
pietrodimarco-dfinity marked this conversation as resolved.
Show resolved Hide resolved

Ok(())
}
}
16 changes: 15 additions & 1 deletion rs/decentralization/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,7 @@ impl From<&network::SubnetChange> for SubnetChangeResponse {

impl Display for SubnetChangeResponse {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
writeln!(f, "Decentralization score changes:\n")?;
writeln!(f, "Decentralization score changes for subnet {}:\n", self.subnet_id.unwrap_or_default())?;
let before_individual = self.score_before.scores_individual();
let after_individual = self.score_after.scores_individual();
self.score_before
Expand Down Expand Up @@ -183,3 +183,17 @@ impl Display for SubnetChangeResponse {
Ok(())
}
}

#[derive(Clone, Debug, Deserialize, Serialize)]
pub struct HealResponse {
pub subnets_change_response: Vec<SubnetChangeResponse>,
}

impl Display for HealResponse {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
for change in &self.subnets_change_response {
writeln!(f, "{}", change)?;
}
Ok(())
}
}
71 changes: 70 additions & 1 deletion rs/decentralization/src/nakamoto/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -425,7 +425,7 @@ impl Display for NakamotoScore {
mod tests {
use std::str::FromStr;

use crate::network::{DecentralizedSubnet, SubnetChangeRequest};
use crate::network::{DecentralizedSubnet, NetworkHealRequest, NetworkHealSubnets, SubnetChangeRequest};
use ic_base_types::PrincipalId;
use itertools::Itertools;
use regex::Regex;
Expand Down Expand Up @@ -871,4 +871,73 @@ mod tests {
(5000, vec!["European subnet has 5 non-European node(s)".to_string()])
);
}

#[test]
fn test_network_heal_subnets_ord() {
let not_important_small = new_test_subnet(0, 13, 0)
.with_subnet_id(PrincipalId::from_str("k44fs-gm4pv-afozh-rs7zw-cg32n-u7xov-xqyx3-2pw5q-eucnu-cosd4-uqe").unwrap());
let not_important_small = NetworkHealSubnets {
name: String::from("App 20"),
decentralized_subnet: not_important_small,
unhealthy_nodes: vec![],
};

let not_important_large = new_test_subnet(0, 28, 0)
.with_subnet_id(PrincipalId::from_str("bkfrj-6k62g-dycql-7h53p-atvkj-zg4to-gaogh-netha-ptybj-ntsgw-rqe").unwrap());
let not_important_large = NetworkHealSubnets {
name: String::from("European"),
decentralized_subnet: not_important_large,
unhealthy_nodes: vec![],
};

let important =
serde_json::from_str::<ic_management_types::Subnet>(include_str!("../../test_data/subnet-uzr34.json")).expect("failed to read test data");
let important = NetworkHealSubnets {
name: important.metadata.name.clone(),
decentralized_subnet: DecentralizedSubnet::from(important),
unhealthy_nodes: vec![],
};

let unordered = vec![not_important_small.clone(), important.clone(), not_important_large.clone()];
let healing_order = unordered.clone().into_iter().sorted_by(|a, b| a.cmp(b).reverse()).collect_vec();

assert_eq!(vec![important, not_important_large, not_important_small], healing_order);
}

#[test]
fn test_network_heal() {
let nodes_available = new_test_nodes("spare", 10, 2);
let nodes_available_principals = nodes_available.iter().map(|n| n.id).collect_vec();

let important =
serde_json::from_str::<ic_management_types::Subnet>(include_str!("../../test_data/subnet-uzr34.json")).expect("failed to read test data");
let important_decentralized = DecentralizedSubnet::from(important.clone());
let important_unhealthy_principals = vec![
PrincipalId::from_str("e4ysi-xp4fs-5ckcv-7e76q-edydw-ak6le-2acyt-k7udb-lj2vo-fqhhx-vqe").unwrap(),
PrincipalId::from_str("aefqq-d7ldg-ljk5s-cmnxk-qqu7c-tw52l-74g3m-xxl5d-ag4ia-dxubz-wae").unwrap(),
];
let unhealthy_nodes = important_decentralized
.nodes
.clone()
.into_iter()
.filter(|n| important_unhealthy_principals.contains(&n.id))
.collect_vec();
let important = NetworkHealSubnets {
name: important.metadata.name.clone(),
decentralized_subnet: important_decentralized,
unhealthy_nodes: unhealthy_nodes.clone(),
};

let network_heal_response = NetworkHealRequest::new(vec![important])
.heal_and_optimize(nodes_available.clone(), None)
.unwrap();

let result = network_heal_response.first().unwrap().clone();

assert_eq!(important_unhealthy_principals, result.removed.clone());

assert_eq!(important_unhealthy_principals.len(), result.added.len());

result.added.iter().for_each(|n| assert!(nodes_available_principals.contains(n)));
}
}
79 changes: 77 additions & 2 deletions rs/decentralization/src/network.rs
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ use async_trait::async_trait;
use ic_base_types::PrincipalId;
use ic_management_types::{MinNakamotoCoefficients, NetworkError, NodeFeature};
use itertools::Itertools;
use log::{debug, info};
use log::{debug, info, warn};
use rand::{seq::SliceRandom, SeedableRng};
use serde::{Deserialize, Serialize};
use std::cmp::Ordering;
Expand Down Expand Up @@ -122,7 +122,7 @@ impl From<&ic_management_types::Node> for Node {
}
}

#[derive(Clone, Default, Debug, Serialize, Deserialize)]
#[derive(Clone, Default, Debug, Serialize, Deserialize, PartialEq, Eq)]
pub struct DecentralizedSubnet {
pub id: PrincipalId,
pub nodes: Vec<Node>,
Expand Down Expand Up @@ -985,3 +985,78 @@ impl Display for SubnetChange {
write!(f, "{}", SubnetChangeResponse::from(self))
}
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NetworkHealSubnets {
pub name: String,
pub decentralized_subnet: DecentralizedSubnet,
pub unhealthy_nodes: Vec<Node>,
}

impl NetworkHealSubnets {
const IMPORTANT_SUBNETS: &'static [&'static str] = &["NNS", "SNS", "Bitcoin", "Internet Identity", "tECDSA signing"];
}

impl PartialOrd for NetworkHealSubnets {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
Some(self.cmp(other))
}
}

impl Ord for NetworkHealSubnets {
fn cmp(&self, other: &Self) -> Ordering {
let self_is_important = NetworkHealSubnets::IMPORTANT_SUBNETS.contains(&self.name.as_str());
let other_is_important = NetworkHealSubnets::IMPORTANT_SUBNETS.contains(&other.name.as_str());

match (self_is_important, other_is_important) {
(true, false) => Ordering::Greater,
(false, true) => Ordering::Less,
_ => self.decentralized_subnet.nodes.len().cmp(&other.decentralized_subnet.nodes.len()),
}
}
}

pub struct NetworkHealRequest {
pub subnets: Vec<NetworkHealSubnets>,
}

impl NetworkHealRequest {
pub fn new(subnets: Vec<NetworkHealSubnets>) -> Self {
Self { subnets }
}

pub fn heal_and_optimize(
&self,
mut available_nodes: Vec<Node>,
max_replacable_nodes: Option<usize>,
pietrodimarco-dfinity marked this conversation as resolved.
Show resolved Hide resolved
) -> Result<Vec<SubnetChangeResponse>, NetworkError> {
let mut subnets_changed = Vec::new();
let subnets_to_heal = self.subnets.iter().sorted_by(|a, b| a.cmp(b).reverse()).collect_vec();

for subnet in subnets_to_heal {
let unhealthy_nodes_len = subnet.unhealthy_nodes.len();
if let Some(max_replacable_nodes) = max_replacable_nodes {
pietrodimarco-dfinity marked this conversation as resolved.
Show resolved Hide resolved
if unhealthy_nodes_len > max_replacable_nodes {
pietrodimarco-dfinity marked this conversation as resolved.
Show resolved Hide resolved
warn!(
"Subnet {} has {} unhealthy nodes\nMax replacable nodes is {} skipping...",
pietrodimarco-dfinity marked this conversation as resolved.
Show resolved Hide resolved
subnet.decentralized_subnet.id, unhealthy_nodes_len, max_replacable_nodes
pietrodimarco-dfinity marked this conversation as resolved.
Show resolved Hide resolved
);
continue;
}
}
let optimize_limit = max_replacable_nodes.unwrap_or(unhealthy_nodes_len) - unhealthy_nodes_len;
pietrodimarco-dfinity marked this conversation as resolved.
Show resolved Hide resolved

let change = SubnetChangeRequest {
subnet: subnet.decentralized_subnet.clone(),
available_nodes: available_nodes.clone(),
..Default::default()
};
let change = change.optimize(optimize_limit, &subnet.unhealthy_nodes)?;

available_nodes.retain(|node| !change.added().contains(node));
subnets_changed.push(SubnetChangeResponse::from(&change));
}

Ok(subnets_changed)
}
}
2 changes: 2 additions & 0 deletions rs/ic-management-backend/src/endpoints/mod.rs
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
pub mod governance_canister;
pub mod network;
pub mod nodes_ops;
pub mod query_decentralization;
pub mod release;
Expand Down Expand Up @@ -86,6 +87,7 @@ pub async fn run_backend(
.service(self::subnet::create_subnet)
.service(self::subnet::resize)
.service(self::subnet::change_preview)
.service(self::network::heal)
.service(self::nodes_ops::remove)
.service(self::query_decentralization::decentralization_subnet_query)
.service(self::query_decentralization::decentralization_whatif_query)
Expand Down
36 changes: 36 additions & 0 deletions rs/ic-management-backend/src/endpoints/network.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
use super::*;
use crate::subnets;
use decentralization::network::{DecentralizedSubnet, NetworkHealRequest, NetworkHealSubnets, Node};
use ic_management_types::{requests::HealRequest, NetworkError};
use itertools::Itertools;

#[post("/network/heal")]
async fn heal(request: web::Json<HealRequest>, registry: web::Data<Arc<RwLock<RegistryState>>>) -> Result<HttpResponse, Error> {
let registry = registry.read().await;
let health_client = health::HealthClient::new(registry.network());
let nodes_health = health_client
.nodes()
.await
.map_err(|_| actix_web::error::ErrorInternalServerError("failed to fetch subnet health".to_string()))?;
let subnets: BTreeMap<PrincipalId, ic_management_types::Subnet> = registry.subnets();
let unhealthy_subnets: BTreeMap<PrincipalId, Vec<ic_management_types::Node>> = subnets::unhealthy_with_nodes(&subnets, nodes_health).await;

let subnets_to_heal = unhealthy_subnets
.iter()
.flat_map(|(id, unhealthy_nodes)| {
let unhealthy_nodes = unhealthy_nodes.iter().map(Node::from).collect::<Vec<_>>();
let unhealthy_subnet = subnets.get(id).ok_or(NetworkError::SubnetNotFound(*id))?;

Ok::<NetworkHealSubnets, NetworkError>(NetworkHealSubnets {
name: unhealthy_subnet.metadata.name.clone(),
decentralized_subnet: DecentralizedSubnet::from(unhealthy_subnet),
unhealthy_nodes,
})
})
.collect_vec();

let subnets_change_response =
NetworkHealRequest::new(subnets_to_heal).heal_and_optimize(registry.available_nodes().await?, request.max_replacable_nodes_per_sub)?;
pietrodimarco-dfinity marked this conversation as resolved.
Show resolved Hide resolved

Ok(HttpResponse::Ok().json(decentralization::HealResponse { subnets_change_response }))
}
Loading
Loading