Project Summary

The increasing complexity, scale, and dynamics of networked computing systems make it hard for users and system administrators to understand and control these systems. Recent studies indicate that a significant fraction of user time is wasted because of unexpected system slowdowns, crashes, and application errors. Business-critical systems often have hundreds of components---e.g., applications, databases, servers, routers---whose performance depends on thousands of intricate and time-varying dependencies and parameters. The Ques project aims to arrest and reverse the dangerous spiral towards unwieldy systems, high administrative costs, and frustrated users.

Ques is generously supported by NSF CAREER Award Number 0644106, startup funds from Duke, three faculty awards from IBM, and an equipment grant from IBM (held jointly with three other Duke faculty members).

Project Details

Ques tackles system management through innovative data management. Ques treats a computing system as a rich source of data about system configuration and activity, available typically as continuous, rapid, and time-varying data streams. The system data---e.g., multidimensional time-series of performance and utilization metrics, control and data-flow paths of requests, and error messages---is collected in an efficient and controlled fashion. Ques gives users and administrators the ability to pose a broad range of system-management queries over this data:

Ques-Querying addresses challenges in developing simple and intuitive ways to express such queries---e.g., using a visual interface, declarative query language, or keyword search---and in processing the queries automatically and efficiently using execution plans. These plans use statistical (e.g., neural network) and performance (e.g., queuing network) models learned from system data, as well as operators for data transformation (e.g., feature selection) and inference. We have developed algorithms that quickly navigate the huge plan space of models, model parameters, and transformations. They rely on techniques such as cost estimation (estimating plan accuracy and execution time using statistics) and active learning (executing sample plans for learning purposes).
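To make the idea of cost-based plan selection concrete, here is a minimal sketch in Python. It is not the Ques implementation: the Plan class, the choose_plan function, the time budget, and all the numbers are hypothetical and chosen purely for illustration. It assumes each candidate plan already carries an estimated accuracy and an estimated execution time, and picks the most accurate plan that fits within the budget.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Plan:
    """Hypothetical candidate plan: a model paired with a data transformation."""
    model: str
    transform: str
    est_accuracy: float  # estimated accuracy in [0, 1] (illustrative value)
    est_seconds: float   # estimated execution time in seconds (illustrative value)


def choose_plan(plans: List[Plan], time_budget: float) -> Optional[Plan]:
    """Return the most accurate plan whose estimated time fits the budget.

    Ties on accuracy are broken in favor of the faster plan.
    Returns None when no plan fits the budget.
    """
    feasible = [p for p in plans if p.est_seconds <= time_budget]
    if not feasible:
        return None
    return max(feasible, key=lambda p: (p.est_accuracy, -p.est_seconds))


# Made-up estimates for three candidate plans.
candidates = [
    Plan("neural_network", "feature_selection", 0.92, 40.0),
    Plan("neural_network", "none", 0.94, 120.0),
    Plan("queuing_network", "none", 0.85, 5.0),
]

best = choose_plan(candidates, time_budget=60.0)
print(best)
```

With a 60-second budget the sketch selects the neural-network plan with feature selection: the slightly more accurate plan without feature selection is estimated to take too long. A real optimizer would also have to produce the estimates themselves, which is where statistics and sample plan executions come in.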

Ques-Control is an ambitious next step beyond Ques-Querying: it aims to enable automated control of complex computing systems under changing conditions, based on policies specified by system administrators. Like Ques-Querying, Ques-Control learns models of system behavior from data collected passively or through active perturbation. Given a set of system policies P, Ques-Control derives a controller---an execution plan based on sensing, actuation, and feedback---that enforces P at all times. Ques-Control poses interesting challenges in policy-interface design, in quickly acquiring the right training data to model specific system behavior, in robustness to bursty workloads, and in proactive system tuning.
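The sensing, actuation, and feedback loop can be illustrated with a toy controller in Python. Everything here is an assumption made for illustration: the policy (keep response time at or below a target), the system model (response time equals load divided by the number of servers), and the function names. Note that this sketch is purely reactive: it restores the policy after a violation is observed, whereas the goal stated above is to enforce policies at all times, which calls for learned models and proactive tuning.

```python
import math


def simulated_response_time(load: float, servers: int) -> float:
    """Toy system model (an assumption): latency is load divided by capacity."""
    return load / servers


def control_step(servers: int, observed: float, target: float,
                 max_servers: int = 64) -> int:
    """Decide the next server count from one observation."""
    if observed > target:
        # Policy violated: scale up in proportion to the violation.
        needed = math.ceil(servers * observed / target)
        return min(needed, max_servers)
    if observed < 0.5 * target and servers > 1:
        # Well under target: release capacity one server at a time.
        return servers - 1
    return servers


def run_controller(loads, target, servers=1):
    """Run the sense / decide / actuate loop over a sequence of loads."""
    history = []
    for load in loads:
        observed = simulated_response_time(load, servers)  # sense
        servers = control_step(servers, observed, target)  # decide and actuate
        history.append((load, observed, servers))
    return history


trace = run_controller(loads=[100, 100, 400, 400, 50, 50], target=20.0)
for load, observed, servers in trace:
    print(f"load={load} observed={observed:.2f} servers={servers}")
```

The asymmetric response (fast scale-up, slow scale-down) is one simple way to cope with bursty workloads; it is a design choice of this sketch, not something taken from Ques.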

Ques seeks to advance the state of the art in our ability to understand and control computing systems in a number of ways:

We are committed to building a fully functional prototype of Ques and deploying it in real-world settings. With each novel component of Ques, we will: (i) perform the research and evaluation using a prototype in a testbed setting, with both synthetic and real applications and data, (ii) demonstrate the prototype at a leading conference, (iii) make the demonstration available publicly on the Internet, (iv) do a real-world deployment and user studies if there is sufficient interest, and (v) release the source code publicly. The effectiveness of Ques will be tested by deploying it to manage workloads on a virtualized, service-oriented, and on-demand computing platform on our departmental research-computing cluster. We have also had encouraging preliminary discussions with the administrators of a university-wide production cluster used heavily for computational-science applications. We have established industrial collaborations (IBM) with the eventual goal of transferring technology from Ques to industrial-strength system-management products.

Project Members