Summary: Transparent Process Migration: Design Alternatives and the Sprite Implementation

Transparent Process Migration: Design Alternatives and the Sprite Implementation

by Fred Douglis and John Ousterhout
summarized by Adam M. Costello

Overview

Process migration means the ability to move a running process from a source machine to a target machine of the same architecture on the local network. Sprite is an operating system for a local collection of workstations and file servers. It provides consistent access to files and devices from anywhere in the network. Process migration was designed to allow idle machines to be exploited, but not at the expense of their returning users.

Sprite process migration is highly transparent. Each process has a home machine, and in most cases, neither the process nor the users notice whether the process is executing at home or somewhere else. A migrated process has residual dependencies on its home machine, but never on any other machine, no matter how many times it migrates.

The overall problem is transfering state associated with the process, like virtual memory, open files, and the miscellaneous kernel state normally found in a process control block (e.g. current working directory, signal masks, etc). When a process migrates, each piece of state must be transfered, or may be left behind and arrangements made to allow remote access to it, or it may be ignored, sacrificing transparency.

Mechanisms

Virtual memory could be transfered in a number of ways:

The LOCUS and Charlotte systems freeze the process on the source machine, copy the entire virtual memory, then start the process on the target machine. The freeze time can be unacceptable for some applications.

The V system pre-copies the virtual memory while the process is still running on the source, then freezes the process and re-copies dirty pages to the target, then starts the process on the target. This allows for a much shorter freeze time, at the expense of slightly more total work.

The Accent system freezes the process on the source and immediately starts it on the target, transfering pages on demand. This allows for significantly less total work, but residual dependencies on the source machine can persist for the life of the process.

Sprite freezes the process on the source, flushes its dirty pages to disk, then starts the process on the target, which pages from the disk. The requires slightly more total work than Accent, because a few pages may make two hops, but there are no residual dependencies on the source after the process has started on the target. This scheme depends on the fact that normal network files are used as backing store for virtual memory.

In order to transfer open files, it is not sufficient to close the file on the source machine and reopen it on the target, because the file might not still exist by then. In Sprite, it was undesirable to open the file on the target before closing it on the source because this could cause caching to be disabled unnecessarily (because it looks like the file is being shared). Special file server code allows the file reference to be moved atomically. As part of the move, the source flushes its cached file blocks back to the server.

Sometimes, after calls to fork, processes may share the access location (seek point) of an open file. If the processes do not all reside on the same machine, the access location is managed by the file server, and caching is disabled for the file. Another approach, used by LOCUS, is to pass a token among the processes which grants exclusive access to the file, but that is more complex, and Sprite already used cache-disabling for cache consistency.

Transfering the process control block (PCB) is easy. Both the home machine and the current machine have a PCB; when the process is away from home, most fields of the home PCB are ignored, except for a pointer to the current machine.

To achieve transparency, kernel calls must work the same on the current machine as they would on the home machine. Some calls are location independent, like open (because all machines share a file namespace). Other calls, like read and getuid, work because state has been transfered. Still others, like gettimeofday and getpgrp, are forwarded back to the home machine via RPC. Forwarding is also used in the reverse direction to deliver signals. Some ad hoc techniques are used for fork, exit, and wait, which require cooperation between the home and current machines.

The residual dependencies are few enough that they do not greatly impact the performance of the home machine, and there are no dependencies on vacated machines, which is good for users reclaiming their formerly idle machines. It would be desirable to allow a process to break all ties with its home machine, to survive a failure of the home machine, but that presents challenges to transparency.

Policies

A fully automatic policy, like that of MOSIX, would view the network as a pool of processors. Users would have no control over where processes execute, and processes would be moved dynamically to balance load. A fully manual policy would require users to select machines and request migration explicitly.

Sprite automatically maintains a central database of idle machines, updated by daemons on each machine that watch for keyboard and mouse input, and load average. Applications like pmake (parallel make) can request a number of idle machines from the database, and use the returned list (which might be shorter than requested) to migrate subprocesses. Hosts that have been idle the longest are returned first.

When the daemon detects a returning user, foreign processes are evicted. The central database server can also evict processes when it receives a request that cannot be granted, and other applications instances are using more than their fair share of machines.

Performance

Migration was used often by Sprite users, mostly with pmake and simulations. Compilations achieved speedups of 3 to 4 times using 4 to 6 remote machines. Additional machines were less helpful as the home machine and file server became the bottleneck. Starting a new remote process typically took a third of a second.

Prepared by Adam M. Costello
Last modified: 1997-Sep-26-Fri 08:12:59 GMT