Adam M. Costello and George Varghese
Computer and Communications Research Center
Washington University in St. Louis
Current BSD kernels (4.4BSD-Lite and derivatives) take
time proportional to the number of outstanding timers
(“callouts”) to set or cancel timers. Our implementation
takes constant time to start, stop, and maintain timers; this leads
to a highly scalable design that can support thousands of outstanding
timers without much overhead. Unlike the existing implementation, our
routines are guaranteed to lock out interrupts only for a small, bounded
amount of time. We also extend the
to allow a process to have multiple outstanding timers, thereby reducing
the need for users to maintain their own timer packages. The changes
to the kernel are small (548 lines of code added, 80 removed), and are
available below. See also:
Patches for NetBSD:
To activate the new facilities, you need additional configuration options.
For convenience, patches.tar.gz contains all of the
patches, and the options.
Warning: They will all untar into the current directory.
Fixed bug: The hash function used when the performance results in the paper were generated accidentally used bitwise-or instead of bitwise-xor. Somehow, it didn't seem to adversely affect the performance, but it's been fixed in the patch above. Here is the original flawed version of kern_clock.c.patch.
timeout_guts() inserts the
callout structure at the head of the list, rather than
the tail. This causes two problems. First, callouts scheduled for the
same time are not executed in causal order (the order in which they were
scheduled), but rather in the reverse order, which differs from the
existing implementation. Second, callouts that happen to get inserted
into the list currently being traversed by
don't get seen until
callwheelsize ticks in the future,
although they ought to get processed immediately. (Thanks to Glenn
Waters for noticing this.)
The solution to both problems is to have
insert at the end of the list. This could be achieved by making the
list fully circular, with each
callwheel element being a
dummy list item with both a forward and backward pointer. All code
that operates on these lists would have to be modified.
Another way would be to leave the
as-is, use the
c_back member of the first callout as
a tail-pointer to the last callout, leave the
member of the last callout pointing to
callout structure with a new pointer to the
callwheel element at the head of the list (needed for
updating the tail-pointer when the last callout is canceled). This
would probably be easier to code, but...
Justin Gibbs has further improved on that idea. Store the absolute expiration time in the callout instead of the number of callwheel cycles until the expiration. This allows you to deduce which callwheel entry heads the list containing the callout, eliminating the need for the new pointer.
No matter how we choose to append callouts to the end of
timeout_guts() will need to change
nextsoftcheck to point to the new callout if
+ to_ticks - softticks) & callwheelbits is 0 (i.e. if
softclock() is just about to fall off the end of the list
Aesthetic nitpick: The code uses “mrtimer” and “mrt” for some names, and “mritimer” and “mrit” for others. Eventually, all the names should be made consistent (preferring the latter convention).
I am no longer set up to compile and test the code, so I haven't tried implementing the changes outlined above. However, Justin Gibbs is continuing the development in the FreeBSD environment (at least, the callout portion, maybe not the mritimer stuff).