VMware Internals: Understanding How vMotion Works


This is my first article detailing VMware internals. In it, we discuss how VMware vMotion works and what happens in the background during a vMotion.

About vMotion

vMotion is one of the key features of VMware technology. It is the live migration of running virtual machines (VMs) between physical hosts with zero downtime and continuous service availability. It is transparent to the VM's guest OS and applications as well as to the end user. The primary use case is manual load balancing of your ESX servers: when a host requires hardware maintenance or is heavily overloaded, you can move the running VMs off that host using vMotion. This avoids server downtime, lets administrators troubleshoot physical host issues, and provides the flexibility to balance VMs across hosts. Many people mistake vMotion for HA (High Availability), but it is essentially a tool that lets IT admins load-balance running VMs with zero downtime. It is also the key feature on which various other capabilities are built, including DRS (Distributed Resource Scheduler), DPM (Distributed Power Management), and FT (Fault Tolerance).

Notes: Migration is the process of moving a virtual machine from one host or storage location to another. Copying a virtual machine creates a new virtual machine. It is not a form of migration. You cannot use vMotion to move virtual machines from one datacentre to another. Migration with Storage vMotion allows you to move a virtual machine’s storage without any interruption in the availability of the virtual machine.

Below is the architecture diagram of vMotion functionality:

Figure 5. Example vMotion network configuration

 

Configuration Requirements for vMotion

  • Host Level:
    1. Both hosts must be correctly licensed for vMotion.
    2. Both hosts must have access to the VM’s shared storage.
    3. Both hosts must have access to the VM’s shared network.
  • VM Level:
    1. VMs that use raw disks for clustering purposes can’t be migrated.
    2. VMs connected to virtual devices that are attached to the client computer can’t be migrated.
    3. VMs connected to virtual devices that are NOT accessible by the destination host can’t be migrated.
  • Network Level:
    1. 10 GbE network connectivity between the physical hosts is preferable, so that the transfer completes faster.
    2. Transfer over multiple NICs is supported; configure them all on one vSwitch.
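
The host-level and VM-level requirements above amount to a checklist that can be evaluated before a migration is attempted. The following is a minimal sketch of such a pre-check, not VMware's own compatibility logic: the `Host` and `VM` classes, their fields, and the function `vmotion_blockers` are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Host:
    """Hypothetical model of an ESX host (illustration only)."""
    name: str
    vmotion_licensed: bool
    datastores: Set[str] = field(default_factory=set)
    networks: Set[str] = field(default_factory=set)
    devices: Set[str] = field(default_factory=set)


@dataclass
class VM:
    """Hypothetical model of a virtual machine (illustration only)."""
    name: str
    datastores: Set[str]
    networks: Set[str]
    uses_clustered_raw_disk: bool = False
    client_attached_devices: Set[str] = field(default_factory=set)
    host_devices: Set[str] = field(default_factory=set)


def vmotion_blockers(vm: VM, source: Host, dest: Host) -> List[str]:
    """Return the reasons (if any) why this VM cannot be migrated."""
    blockers: List[str] = []

    # Host level: licensing, shared storage, shared network on BOTH hosts.
    for host in (source, dest):
        if not host.vmotion_licensed:
            blockers.append(f"{host.name}: not licensed for vMotion")
        missing_ds = vm.datastores - host.datastores
        if missing_ds:
            blockers.append(f"{host.name}: no access to storage {sorted(missing_ds)}")
        missing_net = vm.networks - host.networks
        if missing_net:
            blockers.append(f"{host.name}: no access to network {sorted(missing_net)}")

    # VM level: raw disks used for clustering, client-attached devices,
    # and devices the destination host cannot reach.
    if vm.uses_clustered_raw_disk:
        blockers.append(f"{vm.name}: uses raw disks for clustering")
    if vm.client_attached_devices:
        blockers.append(f"{vm.name}: devices attached to client computer")
    unreachable = vm.host_devices - dest.devices
    if unreachable:
        blockers.append(f"{vm.name}: devices not accessible on {dest.name}: {sorted(unreachable)}")

    return blockers
```

An empty list means none of the listed requirements is violated; each string otherwise names one failed requirement, so an administrator can see every problem at once rather than only the first.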
What Needs to Be Migrated?

We have seen that vMotion moves a running VM between physical hosts; let's go one step deeper to understand what actually gets migrated. vMotion leverages the vSphere checkpoint state serialization infrastructure (referred to as the checkpoint infrastructure) to perform the migration. The following components are involved:

  1. Processor and device state (CPU, network, SVGA, etc.)
  2. Disk (uses shared storage between the source and destination hosts)
  3. Memory (pre-copied while the VM is running)
  4. Network (a reverse ARP is sent to the router to notify it of the host change)
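
To make the reverse ARP notification in item 4 more concrete, here is a rough sketch of what such a frame looks like on the wire, following the RARP packet layout from RFC 903 (EtherType 0x8035, opcode 3 for a reverse request). It assumes the frame is sent as an Ethernet broadcast carrying the VM's MAC address; the helper names and the example MAC are hypothetical, and the sketch only builds the bytes, it does not send anything.

```python
import struct

RARP_ETHERTYPE = 0x8035   # EtherType assigned to RARP
RARP_REQUEST = 3          # opcode: "request reverse"
BROADCAST_MAC = b"\xff" * 6


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    raw = bytes(int(part, 16) for part in mac.split(":"))
    if len(raw) != 6:
        raise ValueError(f"not a 6-byte MAC address: {mac!r}")
    return raw


def build_rarp_frame(vm_mac: str) -> bytes:
    """Build a broadcast RARP request announcing the VM's MAC address."""
    mac = mac_to_bytes(vm_mac)

    # Ethernet header: destination, source, EtherType.
    ethernet = BROADCAST_MAC + mac + struct.pack("!H", RARP_ETHERTYPE)

    # RARP body: hardware type 1 (Ethernet), protocol type 0x0800 (IPv4),
    # hardware length 6, protocol length 4, opcode.
    body = struct.pack("!HHBBH", 1, 0x0800, 6, 4, RARP_REQUEST)
    # Sender and target hardware address are both the VM's MAC;
    # the IP fields are left as 0.0.0.0.
    body += mac + b"\x00" * 4 + mac + b"\x00" * 4

    return ethernet + body


if __name__ == "__main__":
    frame = build_rarp_frame("00:50:56:aa:bb:cc")  # hypothetical VM MAC
    print(len(frame), frame.hex())
```

Because the source MAC of this frame is the VM's own address, network devices that see it arrive from the destination host's port can update where they forward traffic for that VM.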

Process of vMotion

To migrate the components listed above, vMotion goes through the following steps:

  1. Pre-copy memory from the source host to the destination host. This is achieved through iterative memory pre-copy, which includes the following stages:
    • First phase (trace phase / heat phase)
      1. Send the VM’s ‘cold’ pages (the least frequently changing memory contents) from source to destination.
      2. Trace all of the VM’s memory.
      3. Performance impact: a noticeable, brief drop in throughput due to trace installation, generally proportional to memory size.
    • Subsequent phases
      1. Pass over memory again, sending the pages modified since the previous phase.
      2. Trace each page as it is transmitted.
      3. Performance impact: usually minimal on guest performance.
    • Switch-over phase
      1. If pre-copy has converged, very few dirty pages remain.
      2. The VM is momentarily quiesced on the source and resumed on the destination.
      3. Performance impact: increased latency while the guest is stopped; the duration is less than a second.
    [Image: pictorial representation of the iterative memory pre-copy process]
  2. Quiesce the VM on the source host.
  3. Transfer device state from source to destination.
  4. Copy the remainder of memory from source to destination.
    • Stun During Page Send (SDPS) is used to ensure that more active, large-memory virtual machines are successfully vMotioned from one host to another.
    • SDPS intentionally slows down the VM’s vCPUs to keep the VM’s memory modification rate from being faster than the rate at which pages can be transferred. This guarantees that the vMotion operation ultimately succeeds, at the cost of reduced VM performance while vMotion is in progress.
  5. Resume the VM on the destination.
  6. Free the VM’s resources on the source host.
  7. The VM sends a reverse ARP to the router to notify it of the host change.
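
The convergence behaviour of iterative pre-copy, and the reason SDPS is needed, can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: the guest dirties memory at a constant rate, the link has a constant throughput, and SDPS is approximated by capping the dirty rate at a fraction of the link speed. The function name, parameters, and numbers are hypothetical and do not reflect VMware's actual implementation or thresholds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreCopyResult:
    passes_mb: List[float]   # MB sent in each pre-copy pass
    final_copy_mb: float     # MB left to copy while the VM is quiesced
    downtime_s: float        # time needed to send that remainder
    converged: bool          # True if the remainder fits the downtime budget
    sdps_used: bool          # True if the guest had to be slowed down


def simulate_precopy(memory_mb: float, dirty_mb_s: float, link_mb_s: float,
                     max_downtime_s: float = 0.5, max_passes: int = 30,
                     sdps: bool = True, sdps_throttle: float = 0.5) -> PreCopyResult:
    # Switch over once the remaining dirty memory can be sent within budget.
    threshold_mb = link_mb_s * max_downtime_s
    to_send = float(memory_mb)   # first pass sends all memory
    passes_mb: List[float] = []
    sdps_used = False

    for _ in range(max_passes):
        if to_send <= threshold_mb:
            break
        passes_mb.append(to_send)
        elapsed_s = to_send / link_mb_s
        # Memory modified by the guest while this pass was in flight.
        dirtied = min(float(memory_mb), dirty_mb_s * elapsed_s)
        if dirtied >= to_send:
            # Not converging: the guest dirties pages as fast as we send them.
            if not sdps:
                to_send = dirtied
                break
            # Crude stand-in for SDPS: slow the guest below the link rate.
            dirty_mb_s = sdps_throttle * link_mb_s
            sdps_used = True
        to_send = dirtied

    return PreCopyResult(passes_mb, to_send, to_send / link_mb_s,
                         to_send <= threshold_mb, sdps_used)


if __name__ == "__main__":
    # Hypothetical 8 GB VM on a link moving roughly 1000 MB/s.
    print(simulate_precopy(8192, dirty_mb_s=100, link_mb_s=1000))
    print(simulate_precopy(8192, dirty_mb_s=1500, link_mb_s=1000))
    print(simulate_precopy(8192, dirty_mb_s=1500, link_mb_s=1000, sdps=False))
```

In this model each pass shrinks the data left to send by the ratio of dirty rate to link rate, so pre-copy converges quickly when the guest dirties memory slower than the link can carry it. When the ratio is 1 or more, the remainder never shrinks, which is the situation SDPS addresses by slowing the guest.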
