Adam M. Costello and George Varghese
Computer and Communications
Research Center
Washington University in St. Louis
Current BSD kernels (4.4BSD-Lite and derivatives) take
time proportional to the number of outstanding timers
(“callouts”) to set or cancel timers. Our implementation
takes constant time to start, stop, and maintain timers; this leads
to a highly scalable design that can support thousands of outstanding
timers without much overhead. Unlike the existing implementation, our
routines are guaranteed to lock out interrupts only for a small, bounded
amount of time. We also extend the setitimer()
interface
to allow a process to have multiple outstanding timers, thereby reducing
the need for users to maintain their own timer packages. The changes
to the kernel are small (548 lines of code added, 80 removed), and are
available below. See also:
Patches for NetBSD:
To activate the new facilities, you need additional configuration options.
For convenience, patches.tar.gz contains all of the
patches, and the options.
Warning: They will all untar into the current
directory.
Fixed bug: The hash function used when the performance results in the paper were generated accidentally used bitwise-or instead of bitwise-xor. Somehow, it didn't seem to adversely affect the performance, but it's been fixed in the patch above. Here is the original flawed version of kern_clock.c.patch.
Known bug: timeout_guts()
inserts the
new callout
structure at the head of the list, rather than
the tail. This causes two problems. First, callouts scheduled for the
same time are not executed in causal order (the order in which they were
scheduled), but rather in the reverse order, which differs from the
existing implementation. Second, callouts that happen to get inserted
into the list currently being traversed by softclock()
don't get seen until callwheelsize
ticks in the future,
although they ought to get processed immediately. (Thanks to Glenn
Waters for noticing this.)
The solution to both problems is to have timeout_guts()
insert at the end of the list. This could be achieved by making the
list fully circular, with each callwheel
element being a
dummy list item with both a forward and backward pointer. All code
that operates on these lists would have to be modified.
Another way would be to leave the callwheel
array
as-is, use the c_back
member of the first callout as
a tail-pointer to the last callout, leave the c_next
member of the last callout pointing to NULL
, and
enlarge the callout
structure with a new pointer to the
callwheel
element at the head of the list (needed for
updating the tail-pointer when the last callout is canceled). This
would probably be easier to code, but...
Justin Gibbs has further improved on that idea. Store the absolute expiration time in the callout instead of the number of callwheel cycles until the expiration. This allows you to deduce which callwheel entry heads the list containing the callout, eliminating the need for the new pointer.
No matter how we choose to append callouts to the end of
the list, timeout_guts()
will need to change
nextsoftcheck
to point to the new callout if
nextsoftcheck
is NULL
and (ticks
+ to_ticks - softticks) & callwheelbits
is 0 (i.e. if
softclock()
is just about to fall off the end of the list
being appended).
Aesthetic nitpick: The code uses “mrtimer” and “mrt” for some names, and “mritimer” and “mrit” for others. Eventually, all the names should be made consistent (preferring the latter convention).
I am no longer set up to compile and test the code, so I haven't tried implementing the changes outlined above. However, Justin Gibbs is continuing the development in the FreeBSD environment (at least, the callout portion, maybe not the mritimer stuff).
|