by Thomas E. Anderson, Henry M. Levy, Brian N. Bershad,
and Edward D. Lazowska
summarized by Adam M. Costello
Microprocessors have become much faster at running integer and floating-point application code, but the speed of primitive OS operations has not kept pace. Meanwhile, operating systems have been optimized for earlier hardware, not for the newest architectures. The costs of a null system call, a trap, a page table entry change, and a context switch were all measured and found to have increased, relative to application performance, between a recent CISC processor and several newer RISC processors.
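The kind of measurement described above can be sketched as a small microbenchmark. This is only an illustration, not the authors' methodology: the names `time_per_call`, `noop`, `null_syscall`, and `relative_speed` are hypothetical, `os.getppid()` is assumed to be a reasonable stand-in for a null system call, and interpreter overhead will inflate both timings.

```python
import os
import time


def time_per_call(fn, iterations=200_000):
    """Return the mean wall-clock cost of fn() in microseconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / iterations * 1e6


def noop():
    """User-level baseline: a call that never enters the kernel."""
    return None


def null_syscall():
    """Assumed stand-in for a null system call: getppid does almost no work."""
    return os.getppid()


def relative_speed(old_time, new_time, app_speedup):
    """Speedup of an OS primitive, normalized by the application speedup.

    A result below 1.0 means the primitive improved less than
    application code did between the two machines.
    """
    return (old_time / new_time) / app_speedup


if __name__ == "__main__":
    print(f"user-level call: {time_per_call(noop):.3f} us")
    print(f"null system call: {time_per_call(null_syscall):.3f} us")
```

Given a primitive's time on an older and a newer machine plus the application speedup between them, `relative_speed` expresses the summary's point as one number: the primitive has lost ground whenever the result is below 1.0.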
Operating systems have been evolving from monolithic systems to modular extensible systems, with many independent address spaces communicating through message passing, but newer architectures have increased relative communication costs. Cross-machine communication is often done via RPC, whose costs are found to be dominated by system calls, traps, and memory operations, not by network latency and computation. Local RPC across address spaces involves system calls and context switching. Newer processors allow faster transitions between user space and kernel space, but require software management of new processor features such as register windows (SPARC) and exposed pipelines (88000). The R2000 and i860 have moved exception dispatching from hardware to software. RPC involves data copying, but memory speeds have not kept up with processor speeds.
Operating systems have been making more use of virtual memory hardware, e.g., for copy-on-write in Mach, distributed shared memory, garbage collection, checkpointing, recoverable virtual memory, and transaction locking. But memory management has become more difficult in newer architectures. The exposed pipelines of the 88000, which lead to imprecise interrupts, greatly complicate fault handling, as do the floating-point pipelines of the 88000 and i860. Another problem is that processors now provide less helpful information about the cause of a fault. Virtually addressed caches, like pipelines, improve performance for sequential application code, but slow down context switches because of the required cache flush, and slow down page table changes because of the cache invalidation they require.
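To see why a virtually addressed cache forces a flush on a context switch, consider a toy model. It assumes a cache indexed only by virtual address with no address-space tag; the class and function names are hypothetical and the model ignores everything else about real cache hardware.

```python
class VirtuallyAddressedCache:
    """Toy cache indexed by virtual address only (no address-space tag)."""

    def __init__(self):
        self.lines = {}
        self.misses = 0

    def read(self, vaddr, address_space):
        # On a miss, fill the line from the running process's own memory.
        if vaddr not in self.lines:
            self.misses += 1
            self.lines[vaddr] = address_space[vaddr]
        return self.lines[vaddr]

    def flush(self):
        self.lines.clear()


def run(flush_on_switch):
    """Process A reads an address, then process B reads the same address."""
    space_a = {0x1000: "A's data"}
    space_b = {0x1000: "B's data"}
    cache = VirtuallyAddressedCache()

    cache.read(0x1000, space_a)         # process A runs
    if flush_on_switch:
        cache.flush()                   # context switch from A to B
    value = cache.read(0x1000, space_b)  # process B runs
    return value, cache.misses


if __name__ == "__main__":
    print(run(flush_on_switch=False))  # B sees A's stale line
    print(run(flush_on_switch=True))   # correct data, one extra miss
```

Without the flush, process B reads a line that belongs to process A. With the flush, B gets the right data but starts with a cold cache, which is the context-switch cost the summary refers to.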
Operating systems are providing more support for threads, which require context switches within a single address space, preferably at the user level. Newer processors have more registers to help sequential programs reduce memory and parameter-passing costs, but these increase context-switch costs for multithreaded programs. The SPARC requires a kernel trap to change the register window pointer. Threaded programming makes heavy use of locking, but the R2000 provides no semaphore instruction. The i860 and 88000 require the OS to handle exceptions in the middle of atomic instructions or critical sections.
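Without a hardware semaphore or test-and-set instruction, mutual exclusion has to be built some other way. One classic option that uses only ordinary loads and stores is Peterson's algorithm for two threads, sketched below. It is a swapped-in illustration, not a technique taken from the paper, and it assumes loads and stores take effect in program order (as they effectively do under CPython's global interpreter lock); hardware with weaker memory ordering would also need fences. The names `PetersonLock` and `run_counter_demo` are hypothetical.

```python
import threading
import time


class PetersonLock:
    """Two-thread mutual exclusion using only plain loads and stores."""

    def __init__(self):
        self.flag = [False, False]  # flag[i]: thread i wants to enter
        self.turn = 0               # which thread must wait on a tie

    def acquire(self, me):
        other = 1 - me
        self.flag[me] = True
        self.turn = other           # give priority to the other thread
        while self.flag[other] and self.turn == other:
            time.sleep(0)           # yield so the other thread can run

    def release(self, me):
        self.flag[me] = False


def run_counter_demo(iterations=1000):
    """Two threads increment a shared counter inside the critical section."""
    lock = PetersonLock()
    counter = [0]

    def worker(me):
        for _ in range(iterations):
            lock.acquire(me)
            value = counter[0]       # non-atomic read ...
            counter[0] = value + 1   # ... followed by a separate write
            lock.release(me)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


if __name__ == "__main__":
    print(run_counter_demo())
```

Each lock acquisition costs several memory operations plus a possible spin, which suggests why software-only locking is a poor fit for thread packages that lock heavily.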