by Adam M. Costello
A job is something that runs on a server and consumes CPU cycles (and other resources, but for now we will restrict our attention to this one resource). A server may be a single computer with one processor, or a multiprocessor, or a cluster, or anything else capable of running jobs. A client is something that submits jobs to servers.
Clients pay for their resource usage with compucredit. Each administrative domain has its own currency, which is accepted by servers within that domain. Clients can obtain the needed currency by receiving an allowance from the administration, or by purchasing it with real money, or perhaps by trading the currencies of other domains. The workings of a currency exchange market depend on whether payments resemble cash or checks, and are beyond the scope of this proposal.
In order for a job to be run, the client and server must negotiate a contract. We do not assume that the job's run time or CPU consumption can be predicted in advance; therefore, the contract does not specify a price. Rather, it specifies a time limit and a function mapping actual resource usage to price.
The client makes an offer (a contract proposal) to the server, who either accepts or rejects it. A rejection may be accompanied by counter-offers (offers that the server would accept if the client made them immediately). When the server accepts an offer, it begins running the job.
While the job is running, the client can instruct the server to suspend or kill the job. The server may suspend or kill the job only after the time limit (which is wall-clock time) is reached, or if the job violates some promise (such promises are beyond the scope of this proposal). Of course, the job can also terminate normally or crash.
Whenever a job stops running, the server calculates the price of the CPU cycles used by the job according to the terms of the contract, and collects the payment from the client. That contract is then discarded. If the job was not terminated, but only suspended, then the client and server may negotiate a new contract and the job may be resumed.
The server might require the client to prepay when the job starts running, rather than collect the payment when the job stops running (fearing that the client will be unable to pay). In this case, the server refunds the unused portion of the payment when the job stops running. Also, the server may suspend the job before the time limit is reached if the prepayment is exhausted.
If the user writes the job to be checkpointable, he has a more flexible bargaining position with respect to the time limit. Otherwise, he must make sure the time limit exceeds the maximum run time of the job, or risk having the job killed prematurely.
In order to allow negotiations to be automated, we must restrict the expressiveness of the price functions used in contracts. For CPU-bound jobs, the server must measure two things: the number of CPU cycles used (U), and the elapsed wall-clock time (T) between the start (or resumption) and end (or suspension) of the job. The amount due is U * P(U/T), where P is the price per cycle as a function of the rate of consumption.
Any sensible P function is non-decreasing, because a user will always be willing to pay more (or at least the same amount) to get the job done sooner. A sensible function will be zero below some threshhold rate (rmin), because at a low enough rate of consuption the job will not finish soon enough, and will be worthless. A sensible function will reach a maximum (Pmax), because beyond some threshhold rate (rmax), it's just not useful to get the job done any sooner. The simplest class of functions with this general shape grows linearly between rmin and rmax, so let's see if that's expressive enough. This allows P to be specified with three parameters.
If we wish to accomodate interactive (non-CPU-bound) jobs, we must have the server measure a third quantity: The minimum CPU share (S, in cycles per second) available to the job while it was running. Why? Because an interactive job never knows when it will need the CPU, but its user is willing to pay to make cycles available, even if they often go unused. A server with a simple fair time-sharing scheduler can keep track of the maximum number of processes running (not blocked) at any moment while the job was running, and divide that into the total cycles per second available (after subtracting the OS overhead). A server with a more sophisticated real-time scheduler could choose S and guarantee it, rather than measure it.
If the server knows S, it can support a second price function Q(S). The total payment due is then U * P(U/T) + T * Q(S). Like P, Q is zero below some threshhold (Smin), non-decreasing, and reaches a maximum (Qmax) after some threshhold (Smax), so let's try using the same simple class of functions that we're using for P.
Note that when a server accepts a contract without a Q function (Qmax = 0), it is guaranteeing that if the job is “always” ready to run (at some granularity), then U(t)/t will always be at least rmin, where t is the elapsed wall-clock time since the job started (or resumed). If servers did not make this guarantee, they might as well accept all offers and then not run the jobs until it becomes profitable.
Similarly, when a server accepts a contract with Qmax > 0, it is guaranteeing that cycles will be available to the job continuously at a rate of at least Smin. In this case the rmin guarantee does not apply.
An administration can create compucredit at will, and receives all the compucredit collected by its servers. The servers should try to maximize their revenue, though how to do this is left as a research question.
One possible policy for distributing compucredit in a commercial environment is to sell it for real money at a fixed exchange rate. In a non-commercial environment, like a university, where the goal is to provide “fair” access to shared resources and encourage conservative use, a better policy might be to give users allowances.
If users are given simple constant incomes, they can hoard compucredit, and there is the danger of one long-dormant user waking up and monopolizing the entire cluster for too long. A possible remedy is to make a user's income vary with her balance: dx/dt = a - bx, where x is the balance, a is a per-user parameter, and b is a global parameter that determines how quickly every user approaches her equilibrium balance of a/b. Note that if b were allowed to vary per user, then users could shuffle their compucredit to the user with the minimum b in order to increase the total income in the system.
A user could still monopolize the system by paying real money for other users' compucredit, but this could be forbidden, just as using university equipment for unauthorized commercial purposes is forbidden.