In virtual machine usage scenarios, live migration allows a virtual machine to be moved from one node to another for operations such as node maintenance, upgrades, and failover.
However, KubeVirt faces the following challenges during live migration:
Kube-OVN specifically addresses the above issues during the virtual machine migration process, allowing users to perform network-transparent live migrations. Our tests show that the network interruption can be kept under 0.5 seconds and that TCP connections remain uninterrupted.
Users only need to add the annotation kubevirt.io/allow-pod-bridge-network-live-migration: "true" to the VM spec (under spec.template.metadata.annotations). Kube-OVN will automatically handle network migration during the process.
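For a VM that already exists, the same annotation can be added with a merge patch instead of editing the manifest. The following is a minimal sketch, not an official procedure: the helper name annotation_patch is hypothetical, and the usage line assumes a VM named testvm. Because the annotation lives in the VM template, it is assumed here to apply only to instances started after the patch.

```shell
#!/bin/sh
# Hypothetical helper: print a JSON merge patch that sets the live-migration
# annotation under spec.template.metadata.annotations.
annotation_patch() {
  cat <<'JSON'
{"spec":{"template":{"metadata":{"annotations":{"kubevirt.io/allow-pod-bridge-network-live-migration":"true"}}}}}
JSON
}

# Usage (assumes a VM named testvm in the current namespace):
#   kubectl patch vm testvm --type merge -p "$(annotation_patch)"
```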
1. Create VM
kubectl apply -f - <<EOF
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: testvm
spec:
  runStrategy: Always
  template:
    metadata:
      labels:
        kubevirt.io/size: small
        kubevirt.io/domain: testvm
      annotations:
        kubevirt.io/allow-pod-bridge-network-live-migration: "true"
    spec:
      domain:
        devices:
          disks:
            - name: containerdisk
              disk:
                bus: virtio
            - name: cloudinitdisk
              disk:
                bus: virtio
          interfaces:
            - name: default
              bridge: {}
        resources:
          requests:
            memory: 64M
      networks:
        - name: default
          pod: {}
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/kubevirt/cirros-container-disk-demo
        - name: cloudinitdisk
          cloudInitNoCloud:
            userDataBase64: SGkuXG4=
EOF
2. SSH into the Virtual Machine and Test Network Connectivity
# password: gocubsgo
virtctl ssh cirros@testvm
ping 8.8.8.8
3. Perform Migration in Another Terminal and Observe Virtual Machine Network Connectivity
virtctl migrate testvm
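To confirm that the VM actually moved, the node hosting the VirtualMachineInstance can be compared before and after the migration. This is a small sketch under assumptions: the helper name vmi_node is hypothetical, and it reads the VMI's status.nodeName field through kubectl.

```shell
#!/bin/sh
# Hypothetical helper: print the node currently hosting the given VMI.
vmi_node() {
  kubectl get vmi "$1" -o jsonpath='{.status.nodeName}'
}

# Usage:
#   before=$(vmi_node testvm)
#   virtctl migrate testvm
#   kubectl get vmim            # list migration objects and their phase
#   after=$(vmi_node testvm)    # differs from $before once migration succeeds
```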
During the VM live migration, the SSH connection remains uninterrupted and ping loses only a few packets.
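One way to quantify the packet loss is to save the ping output inside the VM and look for the longest run of missing sequence numbers. The helper below is an illustrative sketch, not part of Kube-OVN; the name max_ping_gap is hypothetical. It accepts both the seq=N form printed by BusyBox ping and the icmp_seq=N form printed by iputils, and it ignores sequence-number wrap-around.

```shell
#!/bin/sh
# Hypothetical helper: read ping output on stdin and print the largest number
# of consecutive missing sequence numbers between two received replies.
max_ping_gap() {
  awk '
    match($0, /seq=[0-9]+/) {
      # Strip the leading "seq=" (4 characters) to get the number.
      seq = substr($0, RSTART + 4, RLENGTH - 4) + 0
      if (seen && seq - prev - 1 > gap) gap = seq - prev - 1
      prev = seq
      seen = 1
    }
    END { print gap + 0 }
  '
}

# Usage (inside the VM, while the migration runs in another terminal):
#   ping 8.8.8.8 | tee ping.log     # stop with Ctrl-C after the migration
#   max_ping_gap < ping.log
```

With ping's default one-second interval, the printed value is only a coarse upper bound on the interruption in seconds; a sub-second interruption would typically show up as 0 or 1 lost packets.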
During the live migration process, Kube-OVN implements techniques inspired by the Red Hat team's "Live migration - Reducing downtime with multi-chassis port bindings".
To ensure network consistency between the source and target virtual machines during migration, the same IP address exists on the network for both the source and target VMs. This requires handling network conflicts and traffic confusion. The specific steps are as follows:
In this process, the network interruption mainly occurs between steps 5 and 6, and its length depends primarily on how long libvirt takes to send the RARP. Tests show that the interruption can be kept under 0.5 seconds, and TCP connections are not broken because TCP's retransmission mechanism recovers the packets lost in that window.