feat: Support for static AzureCNI without overlay networking via generating additional ip configurations #365

Open
wants to merge 62 commits into base: main

Conversation

Bryce-Soghigian
Collaborator

@Bryce-Soghigian Bryce-Soghigian commented May 21, 2024

Fixes #367

Description
This PR adds support for Azure CNI without overlay, and introduces Makefile targets for creating clusters with other CNI configurations.

Why Do We Need Secondary IP Configs For AZ CNI Without Overlay?

When a pod is created, the Azure CNI plugin allocates an IP address from the pool of secondary IP addresses configured on the NIC of the node where the pod is scheduled. The Azure CNI plugin manages the allocation and de-allocation of these IP addresses through the IP Address Manager (IPAM), ensuring each pod receives a unique IP address and tracking the usage of these addresses.

In this setup, pods are assigned IP addresses from the node's subnet, allowing for direct IP connectivity. This enables pods within the same virtual network to communicate without the need for Network Address Translation (NAT). The node's NIC routes traffic to the appropriate pod based on the assigned IP.
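The allocation and release behavior described above can be sketched as a small in-memory pool. This is only an illustration of the idea, not the Azure CNI IPAM implementation: the `ipPool` type, its methods, and the sample addresses are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// ipPool is a hypothetical stand-in for the IPAM state of one node:
// the secondary IPs configured on its NIC and which pod holds each one.
type ipPool struct {
	free     []string          // secondary IPs not yet assigned to a pod
	assigned map[string]string // pod name -> IP
}

var errPoolExhausted = errors.New("no free secondary IPs on this NIC")

func newIPPool(secondaryIPs []string) *ipPool {
	free := make([]string, len(secondaryIPs))
	copy(free, secondaryIPs) // do not alias the caller's slice
	return &ipPool{free: free, assigned: map[string]string{}}
}

// allocate gives the pod the next free IP, or returns the IP it already holds.
func (p *ipPool) allocate(pod string) (string, error) {
	if ip, ok := p.assigned[pod]; ok {
		return ip, nil
	}
	if len(p.free) == 0 {
		return "", errPoolExhausted
	}
	ip := p.free[0]
	p.free = p.free[1:]
	p.assigned[pod] = ip
	return ip, nil
}

// release returns the pod's IP to the free list, e.g. when the pod is deleted.
func (p *ipPool) release(pod string) {
	ip, ok := p.assigned[pod]
	if !ok {
		return
	}
	delete(p.assigned, pod)
	p.free = append(p.free, ip)
}

func main() {
	pool := newIPPool([]string{"10.224.0.5", "10.224.0.6"})
	a, _ := pool.allocate("pod-a")
	b, _ := pool.allocate("pod-b")
	_, err := pool.allocate("pod-c") // pool is exhausted at this point
	fmt.Println(a, b, err)

	pool.release("pod-a")
	c, _ := pool.allocate("pod-c") // reuses the address pod-a gave back
	fmt.Println(c)
}
```

The sketch also shows why the number of secondary IPs on the NIC bounds the number of pods a node can host: once the pool is exhausted, further allocations fail until an address is released.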

Flow

  1. Node NIC: A primary IP and multiple secondary IP addresses are assigned to the NIC on node creation, and the NIC is attached to the node.
  2. Pod Create: The pod requests an IP address on initialization.
  3. Azure CNI Plugin: Assigns a secondary IP from the node's NIC to the pod.
  4. Network Interface within Node: We use transparent mode, which doesn't change any properties of the eth0 interface on the NIC. Azure CNI creates a veth pair for each pod and adds the host-side interface to the host network.
  5. Pod to Pod Communication: Pods communicate directly using their assigned IPs within the virtual network. Pod-to-pod communication is over layer 3, and L3 routing rules route the pod traffic.
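Step 1 of the flow, one primary IP configuration plus a set of secondary ones generated at NIC creation, can be sketched roughly as follows. This is a minimal illustration under assumptions: the `ipConfig` type, the `buildIPConfigs` helper, and the `ipconfig%d` naming scheme are hypothetical and are not taken from this PR or from the Azure SDK.

```go
package main

import "fmt"

// ipConfig is a hypothetical, simplified stand-in for a NIC IP configuration.
type ipConfig struct {
	Name    string
	Primary bool
}

// buildIPConfigs returns one primary IP configuration for the node itself,
// followed by one secondary configuration per pod the node may host.
func buildIPConfigs(primaryName string, maxPods int) []ipConfig {
	if maxPods < 0 {
		maxPods = 0
	}
	configs := make([]ipConfig, 0, maxPods+1)
	configs = append(configs, ipConfig{Name: primaryName, Primary: true})
	for i := 1; i <= maxPods; i++ {
		// The naming scheme is an assumption made for this example.
		configs = append(configs, ipConfig{Name: fmt.Sprintf("ipconfig%d", i)})
	}
	return configs
}

func main() {
	configs := buildIPConfigs("nic-primary", 3)
	for _, c := range configs {
		fmt.Printf("%s primary=%t\n", c.Name, c.Primary)
	}
}
```

Generating the secondary configurations up front is what makes this a static setup: the addresses exist on the NIC before any pod asks for one.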

Learn more about specifics here

How was this change tested?

What this PR does not include

  • E2E suite for our upstream E2Es here, testing other network connectivity (ingress + egress)
  • Port of our E2Es to run fully each time on Azure CNI without overlay

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

Release Note

support added for static AzureCNI without overlay networking

Bryce-Soghigian and others added 25 commits April 28, 2024 01:59
…on of propagating kubelet configuration this way
Collaborator Author

@Bryce-Soghigian Bryce-Soghigian left a comment

/test

Comment on lines 264 to 270
type createNICOptions struct {
	NICName           string
	BackendPools      *loadbalancer.BackendAddressPools
	InstanceType      *corecloudprovider.InstanceType
	LaunchTemplate    *launchtemplate.Template
	NetworkPluginMode string
}
Collaborator Author

There is probably a better pattern for this, one that separates populating the creation options from reading them. It's a bit messy here because we set some values, such as BackendPools, after the fact rather than initially.

Creation options as a pattern could also be leveraged more widely, for VM creation as well. This came from earlier feedback to split up some options rather than using context as a global retrieval object.

But restructuring the entire data flow seems like a lot to tackle within the scope of this PR. It deserves its own proper refactoring PR.
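One possible shape for that separation, sketched under assumptions: a constructor populates and validates every field in one place, and the creation code only reads the finished value. The `nicOptions` type, `newNICOptions`, `describeNIC`, and the `lookupPools` callback are hypothetical simplifications and do not reflect the actual refactoring.

```go
package main

import (
	"errors"
	"fmt"
)

// nicOptions is a simplified, hypothetical version of createNICOptions.
type nicOptions struct {
	NICName           string
	BackendPoolIDs    []string
	NetworkPluginMode string
}

// newNICOptions populates and validates every field up front, so nothing
// (such as the backend pools) has to be filled in after construction.
func newNICOptions(name string, lookupPools func() ([]string, error), pluginMode string) (nicOptions, error) {
	if name == "" {
		return nicOptions{}, errors.New("NIC name is required")
	}
	pools, err := lookupPools()
	if err != nil {
		return nicOptions{}, fmt.Errorf("resolving backend pools: %w", err)
	}
	return nicOptions{NICName: name, BackendPoolIDs: pools, NetworkPluginMode: pluginMode}, nil
}

// describeNIC stands in for the creation step: it only reads the options.
func describeNIC(opts nicOptions) string {
	return fmt.Sprintf("%s pools=%d mode=%q", opts.NICName, len(opts.BackendPoolIDs), opts.NetworkPluginMode)
}

func main() {
	opts, err := newNICOptions("aks-nic-1", func() ([]string, error) {
		return []string{"pool-a", "pool-b"}, nil
	}, "")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(describeNIC(opts))
}
```

Passing the options by value makes it harder for later code to mutate them after the fact, which is the messiness the comment points at.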

Collaborator Author

@Bryce-Soghigian Bryce-Soghigian left a comment

/test

Collaborator Author

@Bryce-Soghigian Bryce-Soghigian left a comment

/test

…etworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
@Bryce-Soghigian
Copy link
Collaborator Author

From what I can tell, in terms of the contract there are two places where NETWORK_PLUGIN is used

https://github.com/Azure/AgentBaker/blob/1c4f87e96b26cd8933a16cc31eaa9d18025932f1/self-contained/bootstrap_install.sh#L62

installNetworkPlugin() {
    if [[ "${NETWORK_PLUGIN}" = "azure" ]]; then
        installAzureCNI
    fi
    installCNI  #reference plugins. Mostly for kubenet but loop back used by contaierd until containerd 2
    rm -rf $CNI_DOWNLOADS_DIR &
}

https://github.com/Azure/AgentBaker/blob/1c4f87e96b26cd8933a16cc31eaa9d18025932f1/self-contained/bootstrap_config.sh#L283

configureCNIIPTables() {
    if [[ "${NETWORK_PLUGIN}" = "azure" ]]; then
        mv $CNI_BIN_DIR/10-azure.conflist $CNI_CONFIG_DIR/
        chmod 600 $CNI_CONFIG_DIR/10-azure.conflist
        if [[ "${NETWORK_POLICY}" == "calico" ]]; then
          sed -i 's#"mode":"bridge"#"mode":"transparent"#g' $CNI_CONFIG_DIR/10-azure.conflist
        elif [[ "${NETWORK_POLICY}" == "" || "${NETWORK_POLICY}" == "none" ]] && [[ "${NETWORK_MODE}" == "transparent" ]]; then
          sed -i 's#"mode":"bridge"#"mode":"transparent"#g' $CNI_CONFIG_DIR/10-azure.conflist
        fi
        /sbin/ebtables -t nat --list
    fi
}

Based on these two references in AgentBaker, it seems we need it in order to support BYO CNI.
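The branching in configureCNIIPTables can be restated as a small pure function, which makes the dependency on NETWORK_PLUGIN easier to see. This is a sketch only: the `cniMode` function is hypothetical, and the `bridge` default is an assumption inferred from the sed pattern, which rewrites `"mode":"bridge"`.

```go
package main

import "fmt"

// cniMode mirrors the branching of the quoted configureCNIIPTables script.
// The second return value reports whether the Azure conflist is set up at
// all: when NETWORK_PLUGIN is not "azure" the script body is skipped.
func cniMode(networkPlugin, networkPolicy, networkMode string) (string, bool) {
	if networkPlugin != "azure" {
		return "", false
	}
	switch {
	case networkPolicy == "calico":
		return "transparent", true
	case (networkPolicy == "" || networkPolicy == "none") && networkMode == "transparent":
		return "transparent", true
	default:
		// Assumption: the shipped conflist keeps "bridge" unless rewritten.
		return "bridge", true
	}
}

func main() {
	inputs := [][3]string{
		{"azure", "", "transparent"},
		{"azure", "calico", ""},
		{"azure", "none", ""},
		{"none", "", "transparent"},
	}
	for _, in := range inputs {
		mode, configured := cniMode(in[0], in[1], in[2])
		fmt.Printf("plugin=%q policy=%q mode=%q -> %q configured=%t\n",
			in[0], in[1], in[2], mode, configured)
	}
}
```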

…reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"

This reverts commit 5d88685.
@Bryce-Soghigian
Collaborator Author

@tangyouzzz

After adding the NETWORK_POLICY variable, Karpenter should be usable with underlay networking and Azure CNI on AKS.

if kubeletConfig == nil {
	kubeletConfig = &corev1beta1.KubeletConfiguration{}
}
kubeletConfig.MaxPods = lo.ToPtr[int32](consts.DefaultKubernetesMaxPodsAzure)


`consts.DefaultKubernetesMaxPodsAzure` is a fixed value, and some nodes may not need that many IPs allocated. Could we first read the value from the maxPods field in the NodePool, and fall back to the default only if it is not defined?
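The fallback being suggested could look roughly like this. It is a sketch under assumptions: `kubeletConfiguration` is a local stand-in for the real type, `resolveMaxPods` is a hypothetical helper, and `defaultMaxPods = 30` is a placeholder because the actual value of `consts.DefaultKubernetesMaxPodsAzure` is not shown in this thread.

```go
package main

import "fmt"

// kubeletConfiguration is a local stand-in for the real kubelet config type.
type kubeletConfiguration struct {
	MaxPods *int32
}

// defaultMaxPods is a placeholder for consts.DefaultKubernetesMaxPodsAzure;
// the real value is not shown here, so 30 is an assumption.
const defaultMaxPods int32 = 30

// resolveMaxPods prefers a user-supplied positive MaxPods and falls back to
// the default when the config or the field is unset.
func resolveMaxPods(cfg *kubeletConfiguration) int32 {
	if cfg != nil && cfg.MaxPods != nil && *cfg.MaxPods > 0 {
		return *cfg.MaxPods
	}
	return defaultMaxPods
}

func main() {
	custom := int32(10)
	fmt.Println(resolveMaxPods(nil))
	fmt.Println(resolveMaxPods(&kubeletConfiguration{}))
	fmt.Println(resolveMaxPods(&kubeletConfiguration{MaxPods: &custom}))
}
```

With a fallback like this, the resolved value would also determine how many secondary IP configurations are generated for the NIC.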

Collaborator Author

Kubelet configuration isn't something we are supporting for GA.

Okay, thank you for your answer

Labels
area/networking Issues or PRs related to networking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: Add support for Azure CNI (w/o overlay)
6 participants