Curiosity is bliss    Archive    Feed    About    Search

Julien Couvreur's programming blog and more

Scheduling Your Network Connections (SYNC)

 

SYNC (Scheduling Your Network Connections) is a very interesting project that attempts to improve the responsitivity (mean response time) of a web server by applying a better scheduling strategy for the bottleneck resource (the uplink bandwidth).

SYNC description:
The default policy in current servers, called PS (Processor Sharing), is that all requests get an equal access to the response queue, taking turns to send packets. In contrast, the SRPT (shortest-remaining-processing-time-first) algorithm allows the responses that are closer to completion to send their packets first.

SRPT was prooved to be the optimal scheduling policy in terms of mean response time, but the intuition is that it hurts large jobs (unfair, but unavoidable because there is "no free lunch"). The SYNC papers actually find the counter-intuitive result that SRPT actually is almost always fair (depending on job distribution and load): SRPT not only helps small jobs, it also helps large ones most of the time ("all can win" theorem).

They also provide experimental measurements in the HTTP server world, by extending the Linux kernel to support priorities in the socket/network layers and modifying the Apache web server to use this new socket option. Since the only requests considered here are for static files, the web server easily can estimate the remaining size of each of the responses.

There remains a fairness problem, since in some cases large jobs perform less well under SRPT than they would under the strictly equitable PS policy. The question is whether a scheduling strategy can be found that keeps the expected response time lower than PS, while staying in the "always fair" category (see Classifying scheduling policies wrt unfairness and performance).

File size as job size estimate:
One of the assumptions of the SYNC project is that the file size can be used as a prediction for the remaining time for a job. But from analyzing log files from a public cache proxy, the authors of "How Well Does File Size Predict Wide-Area Transfer Time?" find no correlation between the file size and the transfer time, for small files (under 30ko) and only a limited correlation for larger files.
On the other hand, the SRPT analysis done around SYNC uses a heavy-tailed (HT = "largest 1% of jobs comprise half the load") workload model, where most of the load comes from a very small portion of the request (the largest ones). This choice was supported by their analysis of traffic logs of some large web sites. Maybe this property is what helped SRPT perform well in the measurements against PS.

Conclusion:
It is hard to doubt the results of SYNC given the described experimental setup (slide 38). I don't think they made their implementation available (I couldn't find it), but I look forward to having this improvement made widely available for some more real world testing.

Some of the diagrams showing the three fairness categories mention a "FSP" policy (Fair Sojourn Policy). I haven't had time to read the "Bounds on a Fair Policy with near Optimal Performance" paper yet, but the abstract presents FSP as having both fairness and good response times properties.


Links:
More details on SYNC.
An Analysis of SRPT scheduling: investigating unfairness.
Paper on Connection Scheduling in Web Servers.

A research project that uses the same type of OS+server modifications as SYNC to test another scheduling policy called SWIFT.

comments powered by Disqus