Scalable Multiprocessor Virtual Machines
We support virtual machines with multiple virtual processors, and which flexibly schedule the virtual processors on physical processors. For example, where a VM has four virtual processors, potentially only one is actively using a physical processor and the remaining virtual processors are preempted. Or, one of the virtual processors runs at 2 GHz, and the remaining have limited access to physical processors and thus seem to run at slower speeds, such as 1 GHz.
Free scheduling introduces the problem that virtual processors could try to acquire a kernel lock held by a preempted virtual processor. Spin locks are designed for short spin times and statistical fairness; in Linux 2.4, 95% of all lock hierarchies are released before 20 microseconds. A preempted lock holder will likely increase lock hold time to the order of multiple time slices, in the millisecond range, and thus over 1000 times longer than expected by the original design. When other virtual processors want to acquire the preempted lock, they'll wait multiple time slices, rather than several microseconds.
When running an Apache 2 workload, we found that it was subject to a 39.2% probability of kernel lock holder preemption. Apache 2 uses the sendfile()
Linux system call, which offloads file sending to the Linux kernel.
Although there is a 39.2% chance of preempting a kernel lock holder in the Apache 2 workload, we must ask whether other virtual processors actually try to acquire the preempted locks. We found that with more than two processors, 20% of the execution time was spent on extended lock waiting. Extended lock waiting is where a virtual processor spent more than a millisecond waiting for a (preempted) lock.
We developed two solutions for avoiding lock holder preemption. One solution is intrusive and applicable to paravirtualization. The other solution is non-intrusive, and thus applicable to a fully virtualized environment. We compared our solutions to the original Linux spin locks, and to the obvious yet naive solution for handling lock holder preemption, which is to yield the time slice. Our solutions nearly eliminate all extended lock waiting time.
Our solution for avoiding lock holder preemption improved the Apache 2 throughput by 28% in some cases.
Besides the problem of lock holder preemption, free scheduling introduces variable speed virtual processors, where the virtual processors can adjust speed independently. Commodity operating systems expect all processors to operate at the same speed, and thus their load balancers become confused in a freely scheduled virtual machine environment. We have a solution called time ballooning, which helps the native scheduler to make appropriate scheduling decisions in a VM environment.
For more information about our solutions for multiprocessor virtual machine scalability, see our publication Towards Scalable Multiprocessor Virtual Machines.