Calico Cloud documentation

Configure BGP peering with nested clusters running on KubeVirt VMs

Big picture

Configure BGP peering between your Kubernetes cluster and nested Calico Cloud clusters whose nodes run in KubeVirt VMs on that cluster.

Value

Run nested Kubernetes clusters in VMs on your parent Kubernetes cluster. Avoid network overlay overheads; extend layer 3 network reachability all the way from pods running in the nested cluster to your physical network infrastructure.

Concepts

This guide assumes you have a good understanding of BGP and Calico Cloud's BGP peering options.

The guide makes use of several Calico Cloud resources, including:

  • BGPPeer - Used to configure peerings both with external BGP routers and with BGP-enabled workloads.
  • BGPFilter - Defines a BGP filter to control which routes are imported or exported between BGP peers.
  • BGPConfiguration - Configures global BGP settings for the cluster.

See Configure BGP peering for a general overview of these concepts.

Supported BGP topologies

Both parent and nested clusters must be configured to use non-overlay BGP networking.

eBGP must be used both for peering between parent cluster and top-of-rack router and for peering between parent cluster and nested workload cluster. This means that the AS numbers of the parent cluster, the nested cluster, and the top-of-rack router must all be different.

iBGP peerings to workloads or ToR routers are not supported:

  • iBGP to the ToR router is not supported because there is currently no way to configure "next hop self" behaviour on that peering (so the ToR would receive the nested cluster's routes with the workload's IP as next hop, rather than the parent cluster node's IP).

  • iBGP to the nested cluster has not been tested. It may work with appropriate BGPFilter resources to ensure the correct routes are re-advertised.

Peering with workloads and using a BGP mesh within the cluster is not supported; this combination has not been tested and it is likely to require additional BGPFilter resources to ensure routes are correctly re-advertised.

Supported KubeVirt networking modes

Only KubeVirt's "bridged" networking mode is supported. The peering from the parent cluster to the workload uses the workload's "pod IP", and this must match the IP address that the workload itself uses as the source of its traffic.
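As a rough illustration of what "bridged" mode looks like on the KubeVirt side, the sketch below shows the interface and network portion of a VirtualMachineInstance. The VM name and memory value are hypothetical, and disks, volumes and other settings a real VM needs are omitted. Treat it as a minimal fragment rather than a complete manifest.

```yaml
# Sketch only: disks, volumes and other required VM settings are omitted.
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: nested-node-1        # hypothetical name
spec:
  domain:
    devices:
      interfaces:
        - name: default
          bridge: {}         # bridged binding, rather than masquerade: {}
    resources:
      requests:
        memory: 4Gi          # illustrative value
  networks:
    - name: default
      pod: {}                # attach the interface to the pod network
```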

How to

Configure global BGP settings

To simplify configuration of the BGPPeer resources in the nested clusters, the workload BGP peering feature uses a single virtual IP (per IP version) that is provisioned on every node in the parent cluster. The workloads are configured to peer with this virtual IP rather than needing to know the specific IP of their parent node.

To avoid conflicts with your network, the addresses are configurable through the BGPConfiguration resource, and there are no defaults. The feature is only enabled once they are configured.

For IPv4, we recommend using a link-local IP address that is not otherwise in use on the parent cluster's network. Note that Calico Cloud uses 169.254.0.1 and 169.254.0.2 internally so these must not be used. You may also wish to avoid cloud provider metadata addresses, such as 169.254.169.254 (as used by AWS). In this guide we'll use 169.254.0.179; you can use the same address unless it conflicts with some other use on your nodes.

If using IPv6, you must not use a link-local address (because IPv6 requires link-local addresses to be qualified with a scope tag, and Calico Cloud currently has no way to provision the correct scopes). Instead, we recommend using a ULA address that is not otherwise in use in your network. In this guide we'll use fdc9:9723:09bc::1 (which was chosen randomly according to RFC 4193 for this guide, so it may be suitable for your use as well).

Once you have chosen the IP address(es), configure the BGPConfiguration resource. (You can omit the IPv6 address if not using IPv6.)

apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  ...
  localWorkloadPeeringIPV4: "169.254.0.179"
  localWorkloadPeeringIPV6: "fdc9:9723:09bc::1"

Create BGPFilter resources

Before creating the BGPPeer resource to peer with the workloads, we create BGPFilter resources to control which routes are imported and exported.

  • Because the workloads are running on the Calico Cloud cluster, we can assume that they already have a valid default route via their host node. Hence, we don't need to export any routes to the workloads. Create the following BGPFilter resource, which can be used to block export of any routes.

    apiVersion: projectcalico.org/v3
    kind: BGPFilter
    metadata:
      name: no-export
    spec:
      exportV4:
        - action: Reject
      exportV6:
        - action: Reject

  • For added security, it is also wise to limit what routes we'll accept from the workloads. Create a BGPFilter resource that only accepts routes that are within the IP pool of the nested cluster. For example, if the nested cluster's IP pools are 10.123.0.0/16 and ca11:c0::/48:

    apiVersion: projectcalico.org/v3
    kind: BGPFilter
    metadata:
      name: accept-nested-ip-pools
    spec:
      importV4:
        - action: Accept
          matchOperator: In
          cidr: 10.123.0.0/16 # IP pool CIDR of nested cluster
      importV6:
        - action: Accept
          matchOperator: In
          cidr: ca11:c0::/48 # IP pool CIDR of nested cluster

In addition, we need to create a BGPFilter to tell Calico Cloud to re-advertise routes learned from the workloads to the ToR. Create the following BGPFilter resource, adjusting the CIDRs as appropriate for your nested cluster:

apiVersion: projectcalico.org/v3
kind: BGPFilter
metadata:
  name: export-to-tor
spec:
  exportV4:
    - action: Accept
      matchOperator: In
      cidr: 10.123.0.0/16
      source: RemotePeers
  exportV6:
    - action: Accept
      matchOperator: In
      cidr: ca11:c0::/48
      source: RemotePeers

Configure parent cluster BGP topology

Configure the parent cluster to use BGP networking and to peer with its top-of-rack router(s) over eBGP, for example by using the downward default model.

Include the export-to-tor BGPFilter resource defined above in the BGPPeer resource used to peer with the ToR:

apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: node-tor-peer
spec:
  peerIP: <ToR IP>
  asNumber: <ToR AS number>
  filters:
    - export-to-tor

Configure BGP peering with workloads

In the parent cluster, create a BGPPeer resource to peer with the workloads.

  • The localWorkloadSelector field is used to select which workloads to peer with. In this example, we select workloads with the label color=red.

  • Include the BGPFilter resources defined above in the BGPPeer resource to control which routes are imported and exported.

apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: peer-to-workloads
spec:
  localWorkloadSelector: color == 'red'
  asNumber: <workload cluster's AS number>
  filters:
    - no-export
    - accept-nested-ip-pools

Both per-node and global BGPPeer resources are supported. To limit the peering to certain nodes, use the nodeSelector field.
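For instance, a peering limited to a subset of parent nodes might look like the sketch below. The node label rack == 'rack-1' and the resource name are hypothetical, chosen only for illustration. The filters and workload selector are the ones used elsewhere in this guide.

```yaml
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: peer-to-workloads-rack-1        # hypothetical name
spec:
  nodeSelector: rack == 'rack-1'        # hypothetical node label; only matching nodes peer
  localWorkloadSelector: color == 'red'
  asNumber: <workload cluster's AS number>
  filters:
    - no-export
    - accept-nested-ip-pools
```

With this variant, only workloads hosted on nodes matching the nodeSelector would be peered with, so make sure the VMs of the nested cluster are scheduled onto those nodes.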

note

localWorkloadSelector is scoped to all workloads and it does not currently have a corresponding "namespace selector". To match on namespace labels, you can prefix the label name with "pcns.".
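As a sketch of the "pcns." prefix in use, the selector below combines the workload label from this guide with a namespace label. The namespace label tenant=team-a and the resource name are assumptions made up for this example.

```yaml
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: peer-to-team-a-workloads        # hypothetical name
spec:
  # color is a label on the workload itself;
  # pcns.tenant refers to a (hypothetical) "tenant" label on the workload's namespace.
  localWorkloadSelector: color == 'red' && pcns.tenant == 'team-a'
  asNumber: <workload cluster's AS number>
  filters:
    - no-export
    - accept-nested-ip-pools
```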

In the nested (child) cluster:

  • Ensure that the node-to-node mesh is disabled in the BGPConfiguration resource and that the workload cluster's AS number is set:

    apiVersion: projectcalico.org/v3
    kind: BGPConfiguration
    metadata:
      name: default
    spec:
      nodeToNodeMeshEnabled: false
      asNumber: <workload cluster's AS number>

  • Create a BGPPeer resource (per desired IP version) that peers with the parent cluster's virtual IP:

    apiVersion: projectcalico.org/v3
    kind: BGPPeer
    metadata:
      name: peer-to-parent-cluster
    spec:
      peerIP: 169.254.0.179
      asNumber: <parent cluster's AS number>
    ---
    apiVersion: projectcalico.org/v3
    kind: BGPPeer
    metadata:
      name: peer-to-parent-cluster-v6
    spec:
      peerIP: "fdc9:9723:09bc::1"
      asNumber: <parent cluster's AS number>

Configure more than one nested cluster

You can configure more than one nested cluster by:

  • Choosing a unique non-overlapping IP pool for each nested cluster.
  • In the parent cluster: creating a new import BGPFilter resource for each cluster to import its routes.
  • In the parent cluster: creating a new, uniquely-named BGPPeer resource for each nested cluster, to peer with that cluster's workloads. You can re-use the shared "no-export" BGPFilter resource with every nested cluster peering.
  • In the parent cluster: updating the export-to-tor BGPFilter resource to include the new cluster's CIDRs.
  • In the nested cluster: adding a copy of the peer-to-parent-cluster BGPPeer(s). These should be the same in all nested clusters since a parent cluster always uses the same virtual IP for all nested clusters.

note

If you plan to use many nested clusters, consider allocating their IP pools from a single larger CIDR. Then the export-to-tor filter can be simplified to match on the larger CIDR only.
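A simplified export-to-tor filter along these lines might look like the following sketch. The supernets 10.120.0.0/13 and ca11:c0::/44 are assumptions chosen for illustration so that they contain this guide's example pools (10.123.0.0/16 and ca11:c0::/48); substitute the range you actually reserve for nested clusters.

```yaml
apiVersion: projectcalico.org/v3
kind: BGPFilter
metadata:
  name: export-to-tor
spec:
  exportV4:
    - action: Accept
      matchOperator: In
      cidr: 10.120.0.0/13   # assumed supernet; covers 10.120.0.0 to 10.127.255.255
      source: RemotePeers
  exportV6:
    - action: Accept
      matchOperator: In
      cidr: ca11:c0::/44    # assumed supernet; covers ca11:c0::/48 to ca11:c0:f::/48
      source: RemotePeers
```

With a filter like this, adding another nested cluster whose pools fall inside the supernets should not require a further change to export-to-tor.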

Similarly, if you trust the nested clusters, you could re-use a single accept-nested-ip-pools filter for all nested clusters.
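To make the per-cluster steps in this section more concrete, here is a sketch of the parent-cluster resources for a second nested cluster when separate import filters are used. The resource names, the workload label color == 'blue', and the pool 10.124.0.0/16 are all hypothetical. Only IPv4 is shown; an IPv6 rule would follow the same pattern as in accept-nested-ip-pools.

```yaml
# Import filter for the second nested cluster (hypothetical pool).
apiVersion: projectcalico.org/v3
kind: BGPFilter
metadata:
  name: accept-nested-b-ip-pools        # hypothetical name
spec:
  importV4:
    - action: Accept
      matchOperator: In
      cidr: 10.124.0.0/16               # IP pool CIDR of the second nested cluster
---
# Peering with the second nested cluster's workloads.
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: peer-to-workloads-nested-b      # hypothetical name
spec:
  localWorkloadSelector: color == 'blue'   # hypothetical label on the second cluster's VMs
  asNumber: <second nested cluster's AS number>
  filters:
    - no-export                         # shared export filter, re-used as-is
    - accept-nested-b-ip-pools
```

The export-to-tor filter would also need a rule for 10.124.0.0/16, unless it already matches a larger CIDR that contains it.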

Additional resources