Configure BGP peering with nested clusters running on KubeVirt VMs
Big picture
Configure BGP peering with Calico Cloud nodes running on KubeVirt VMs that are running on your Kubernetes cluster.
Value
Run nested Kubernetes clusters in VMs on your parent Kubernetes cluster. Avoid network overlay overheads; extend layer 3 network reachability all the way from pods running in the nested cluster to your physical network infrastructure.
Concepts
This guide assumes you have a good understanding of BGP and Calico Cloud's BGP peering options.
The guide makes use of several Calico Cloud resources, including:
- BGPPeer - Used to configure peerings both with external BGP routers and with BGP-enabled workloads.
- BGPFilter - Defines a BGP filter to control which routes are imported or exported between BGP peers.
- BGPConfiguration - Configures global BGP settings for the cluster.
See Configure BGP peering for a general overview of these concepts.
Supported BGP topologies
Both parent and nested clusters must be configured to use non-overlay BGP networking.
eBGP must be used both for peering between parent cluster and top-of-rack router and for peering between parent cluster and nested workload cluster. This means that the AS numbers of the parent cluster, the nested cluster, and the top-of-rack router must all be different.
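For illustration, a minimal AS plan might draw three distinct numbers from the private AS range (64512–65534); the values below are placeholders, not recommendations:

```yaml
# Illustrative AS assignments (all three must differ for eBGP):
#   ToR router:     65100
#   Parent cluster: 65101
#   Nested cluster: 65102
# The parent cluster's BGPConfiguration would then include:
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  asNumber: 65101
```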
iBGP peerings to workloads or ToR routers are not supported:
- iBGP to the ToR router is not supported because there is currently no way to configure "next hop self" behaviour on that peering (so the ToR would receive the nested cluster's routes with the workload's IP as next hop, rather than the parent cluster node's IP).
- iBGP to the nested cluster has not been tested. It may work with appropriate BGPFilter resources to ensure the correct routes are re-advertised.
Peering with workloads and using a BGP mesh within the cluster is not supported; this combination has not been tested and is likely to require additional BGPFilter resources to ensure routes are correctly re-advertised.
Supported KubeVirt networking modes
Only KubeVirt's "bridged" networking mode is supported. This is because the peering from parent cluster to the workload is done using the "pod IP" of the workload. This must agree with the IP that the workload itself uses to source traffic.
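For reference, a VirtualMachineInstance using bridged networking on the pod network looks like the following sketch; the VM name is a placeholder and only the networking-related fields are shown:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: nested-node-0 # placeholder name
spec:
  domain:
    devices:
      interfaces:
        - name: default
          bridge: {} # "bridged" mode: the VM uses the pod's IP directly
  networks:
    - name: default
      pod: {} # attach the interface to the pod network
```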
How to
- Configure global BGP settings
- Create BGPFilter resources
- Configure parent cluster BGP topology
- Configure BGP peering with workloads
- Configure more than one nested cluster
Configure global BGP settings
To ease configuration of the BGPPeer resources in the nested clusters, the workload BGP peering feature uses a single virtual IP (per IP version) that is provisioned on every node in the parent cluster. The workloads are configured to peer with this virtual IP rather than needing to know the specific IP of their parent node.
To avoid conflicts with your network, the addresses are configurable through the BGPConfiguration resource, and there are no defaults. They must be configured to enable the feature:
For IPv4, we recommend using a link-local IP address that is not otherwise in use on the parent cluster's network. Note that Calico Cloud uses 169.254.0.1 and 169.254.0.2 internally so these must not be used. You may also wish to avoid cloud provider metadata addresses, such as 169.254.169.254 (as used by AWS). In this guide we'll use 169.254.0.179; you can use the same address unless it conflicts with some other use on your nodes.
If using IPv6, you must not use a link-local address (IPv6 requires link-local addresses to be qualified with a scope tag, and Calico Cloud currently has no way to provision the correct scopes). Instead, we recommend using a ULA address that is not otherwise in use in your network. In this guide we'll use fdc9:9723:09bc::1 (which was chosen randomly according to RFC 4193 for this guide, so it may be suitable for your use as well).
Once you have chosen the IP address(es), configure the BGPConfiguration resource. (You can omit the IPv6 address if not using IPv6.)

apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  ...
  localWorkloadPeeringIPV4: "169.254.0.179"
  localWorkloadPeeringIPV6: "fdc9:9723:09bc::1"
Create BGPFilter resources
Before creating the BGPPeer resource to peer with the workloads, we create BGPFilter resources to control which routes are imported and exported.
- Because the workloads are running on the Calico Cloud cluster, we can assume that they already have a valid default route via their host node. Hence, we don't need to export any routes to the workloads. Create the following BGPFilter resource, which can be used to block export of any routes.

  apiVersion: projectcalico.org/v3
  kind: BGPFilter
  metadata:
    name: no-export
  spec:
    exportV4:
      - action: Reject
    exportV6:
      - action: Reject
- For added security, it is also wise to limit which routes we'll accept from the workloads. Create a BGPFilter resource that only accepts routes that are within the IP pool of the nested cluster. For example, if the nested cluster's IP pools are 10.123.0.0/16 and ca11:c0::/48:

  apiVersion: projectcalico.org/v3
  kind: BGPFilter
  metadata:
    name: accept-nested-ip-pools
  spec:
    importV4:
      - action: Accept
        matchOperator: In
        cidr: 10.123.0.0/16 # IP pool CIDR of nested cluster
    importV6:
      - action: Accept
        matchOperator: In
        cidr: ca11:c0::/48 # IP pool CIDR of nested cluster
In addition, we need to create a BGPFilter to tell Calico Cloud to re-advertise routes learned from the workloads to the ToR. Create the following BGPFilter resource, adjusting the CIDRs as appropriate for your nested cluster:

apiVersion: projectcalico.org/v3
kind: BGPFilter
metadata:
  name: export-to-tor
spec:
  exportV4:
    - action: Accept
      matchOperator: In
      cidr: 10.123.0.0/16
      source: RemotePeers
  exportV6:
    - action: Accept
      matchOperator: In
      cidr: ca11:c0::/48
      source: RemotePeers
Configure parent cluster BGP topology
Configure the parent cluster to use BGP networking and to peer with its top-of-rack router(s) over eBGP, for example by using the downward default model.
Include the export-to-tor BGPFilter resource defined above in the BGPPeer resource used to peer with the ToR:

apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: node-tor-peer
spec:
  peerIP: <ToR IP>
  asNumber: <ToR AS number>
  filters:
    - export-to-tor
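If your nodes span multiple racks, each ToR peering can be scoped with the BGPPeer resource's nodeSelector field. The rack label below is illustrative:

```yaml
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: rack1-tor-peer
spec:
  nodeSelector: rack == 'rack-1' # illustrative node label
  peerIP: <rack 1 ToR IP>
  asNumber: <ToR AS number>
  filters:
    - export-to-tor
```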
Configure BGP peering with workloads
In the parent cluster, create a BGPPeer resource to peer with the workloads.
- The localWorkloadSelector field is used to select which workloads to peer with. In this example, we select workloads with the label color=red.
- Include the BGPFilter resources defined above in the BGPPeer resource to control which routes are imported and exported.

apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: peer-to-workloads
spec:
  localWorkloadSelector: color == 'red'
  asNumber: <workload cluster's AS number>
  filters:
    - no-export
    - accept-nested-ip-pools
Both per-node and global BGPPeer resources are supported. To limit the peering to certain nodes, use the nodeSelector field.
localWorkloadSelector is scoped to all workloads and does not currently have a corresponding "namespace selector". To match on namespace labels, you can prefix the label name with "pcns.".
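For example, assuming the nested cluster's VM pods run in namespaces labelled tenant=blue (an illustrative label), the selector could combine a namespace label with a pod label:

```yaml
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: peer-to-blue-workloads
spec:
  # "pcns." prefixes a namespace label; "color" is a pod label
  localWorkloadSelector: pcns.tenant == 'blue' && color == 'red'
  asNumber: <workload cluster's AS number>
  filters:
    - no-export
    - accept-nested-ip-pools
```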
In the child cluster:
- Ensure that the node-to-node mesh is disabled in the BGPConfiguration resource and that the workload cluster's AS number is set:

  apiVersion: projectcalico.org/v3
  kind: BGPConfiguration
  metadata:
    name: default
  spec:
    nodeToNodeMeshEnabled: false
    asNumber: <workload cluster's AS number>

- Create a BGPPeer resource (per desired IP version) that peers with the parent cluster's virtual IP:

  apiVersion: projectcalico.org/v3
  kind: BGPPeer
  metadata:
    name: peer-to-parent-cluster
  spec:
    peerIP: 169.254.0.179
    asNumber: <parent cluster's AS number>
  ---
  apiVersion: projectcalico.org/v3
  kind: BGPPeer
  metadata:
    name: peer-to-parent-cluster-v6
  spec:
    peerIP: "fdc9:9723:09bc::1"
    asNumber: <parent cluster's AS number>
Configure more than one nested cluster
You can configure more than one nested cluster by:
- Choosing a unique non-overlapping IP pool for each nested cluster.
- In the parent cluster: creating a new import BGPFilter resource for each cluster to import its routes.
- In the parent cluster: creating a new, uniquely-named BGPPeer resource for each cluster to peer with the parent cluster. You can re-use the shared "no-export" BGPFilter resource with every nested cluster peering.
- In the parent cluster: updating the export-to-tor BGPFilter resource to include the new cluster's CIDRs.
- In the nested cluster: adding a copy of the peer-to-parent-cluster BGPPeer resource(s). These should be the same in all nested clusters since a parent cluster always uses the same virtual IP for all nested clusters.
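As a worked example, suppose a second nested cluster uses IP pools 10.124.0.0/16 and ca11:c1::/48 (chosen here purely for illustration). In the parent cluster you would then add a second import filter and extend export-to-tor to cover both clusters:

```yaml
apiVersion: projectcalico.org/v3
kind: BGPFilter
metadata:
  name: accept-nested-ip-pools-2
spec:
  importV4:
    - action: Accept
      matchOperator: In
      cidr: 10.124.0.0/16 # IP pool CIDR of second nested cluster
  importV6:
    - action: Accept
      matchOperator: In
      cidr: ca11:c1::/48 # IP pool CIDR of second nested cluster
---
apiVersion: projectcalico.org/v3
kind: BGPFilter
metadata:
  name: export-to-tor
spec:
  exportV4:
    - action: Accept
      matchOperator: In
      cidr: 10.123.0.0/16 # first nested cluster
      source: RemotePeers
    - action: Accept
      matchOperator: In
      cidr: 10.124.0.0/16 # second nested cluster
      source: RemotePeers
  exportV6:
    - action: Accept
      matchOperator: In
      cidr: ca11:c0::/48 # first nested cluster
      source: RemotePeers
    - action: Accept
      matchOperator: In
      cidr: ca11:c1::/48 # second nested cluster
      source: RemotePeers
```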
If you plan to use many nested clusters, consider allocating their IP pools from a single larger CIDR. Then the export-to-tor filter can be simplified to match on the larger CIDR only. Similarly, if you trust the nested clusters, you could re-use a single accept-nested-ip-pools filter for all child clusters.
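For instance, assuming each nested cluster's IPv4 pool is a /16 carved from 10.120.0.0/13 and each IPv6 pool a /48 carved from ca11:c0::/44 (both ranges are illustrative), the export filter reduces to:

```yaml
apiVersion: projectcalico.org/v3
kind: BGPFilter
metadata:
  name: export-to-tor
spec:
  exportV4:
    - action: Accept
      matchOperator: In
      cidr: 10.120.0.0/13 # covers every nested cluster's /16 pool
      source: RemotePeers
  exportV6:
    - action: Accept
      matchOperator: In
      cidr: ca11:c0::/44 # covers every nested cluster's /48 pool
      source: RemotePeers
```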