Flex: A Platform for Experiment-Driven System Management
Project Summary
Despite a number of recent efforts, current solutions for system-administration tasks like benchmarking, tuning, troubleshooting, and
capacity-planning remain far from satisfactory.
Consider an example scenario where a database administrator (DBA) notices a slowdown of the production database due to some unknown cause. The DBA
may collect some monitoring data on the production database in an attempt to diagnose the problem. However, data collection can increase the load on
an already underperforming database, forcing the DBA to shift to the test database. The DBA's usual course of action would be:
- Create a replica of the production environment
on the test database.
- Get more insight into system behavior by performing runs of
the production workload on the test database, and collecting
instrumentation data. Multiple runs may be required because of system
variability.
- Form hypotheses regarding potential causes of the
performance problem. Do further runs under different system
configurations to refine or confirm these hypotheses. For example,
new indexes, statistics about the data, or resources may be added;
hints may be given to the database query optimizer to force it to
choose specific query execution plans; database configuration
parameter settings may be changed; and so on.
- When a fix is found, possibly after much trial and error, a
careful validation is done to ensure that the fix will work on the
production system. Validation may require multiple runs
to test correctness and stability.
Note that the above process required the DBA to do a number of
experiments. Each experiment involved setting up the system in a
desired configuration, running a specific workload, and collecting
instrumentation data for analysis. Experiments were used (i) to better
understand the problem, (ii) during the search process for finding the fix,
and (iii) for validating that an accurate and stable fix has been found.
We call the overall process an instance of experiment-driven
management.
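The experiment loop described above can be sketched in a few lines of Python. This is an illustration only, not Flex's implementation: the names `Experiment`, `search_for_fix`, `validate`, and `make_toy_workload`, the `buffer_mb` parameter, and the simulated workload are all hypothetical stand-ins. Each experiment fixes a configuration, runs the workload several times to absorb system variability, and records the measurements. The search picks the candidate with the lowest mean running time, and validation re-runs the chosen configuration to check that it beats the baseline consistently.

```python
import random
import statistics
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Config = Dict[str, int]                 # hypothetical: parameter name -> setting
Workload = Callable[[Config], float]    # runs the workload, returns elapsed time


@dataclass
class Experiment:
    """One experiment: a configuration, repeated runs, and collected timings."""
    config: Config
    runs: int = 3
    timings: List[float] = field(default_factory=list)

    def execute(self, run_workload: Workload) -> "Experiment":
        # Repeat the run because a single measurement may be noisy.
        for _ in range(self.runs):
            self.timings.append(run_workload(self.config))
        return self

    def mean(self) -> float:
        return statistics.mean(self.timings)


def search_for_fix(candidates: List[Config], run_workload: Workload,
                   runs: int = 3) -> Experiment:
    """Run one experiment per candidate and return the one with the lowest mean time."""
    results = [Experiment(c, runs).execute(run_workload) for c in candidates]
    return min(results, key=lambda e: e.mean())


def validate(config: Config, run_workload: Workload,
             baseline: float, runs: int = 5) -> bool:
    """Re-run the chosen configuration; accept it only if every run beats the baseline."""
    check = Experiment(config, runs).execute(run_workload)
    return max(check.timings) < baseline


def make_toy_workload(seed: int = 0) -> Workload:
    """Simulated workload (an assumption): time falls as buffer_mb grows, plus noise."""
    rng = random.Random(seed)

    def toy_workload(config: Config) -> float:
        return 1000.0 / config["buffer_mb"] + rng.uniform(0.0, 0.5)

    return toy_workload


workload = make_toy_workload()
baseline = Experiment({"buffer_mb": 64}).execute(workload).mean()
best = search_for_fix(
    [{"buffer_mb": 64}, {"buffer_mb": 128}, {"buffer_mb": 256}], workload
)
print(best.config, validate(best.config, workload, baseline))
```

In a real setting, `run_workload` would replay the production workload against the test database and return measured instrumentation data rather than a simulated number. Exhaustively trying every candidate is the simplest possible search strategy and is used here only to keep the sketch short.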
Experiment-driven management is an important piece of the system administration puzzle
that researchers have largely left untouched until now. The Flex project is our attempt to automate experiment-driven management and bring its
benefits to several long-standing problems in databases as well as other systems. More details of Flex's vision can be found in
this overview talk or our
HotOS 2009 paper.
Flex is supported generously by NSF, startup
funds from Duke, and three faculty awards from IBM.
Current Project Members
- Shivnath Babu, Associate Professor, Duke Computer Science
- Nedyalko Borisov, Ph.D. Candidate, Duke Computer Science
- Herodotos Herodotou, Ph.D. Candidate, Duke Computer Science
- Vamsidhar Thummala, Ph.D. Candidate, Duke Computer Science
Collaborators
- Prof. Ashraf Aboulnaga, University of Waterloo
- Mumtaz Ahmad, Ph.D. Candidate, University of Waterloo
- Prof. Kamesh Munagala, Duke University
- Prof. Jeff Chase, Duke University
Alumni
Publications
Overall Vision
On Parameter Tuning
- H. Herodotou and S. Babu.
Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs
In Proc. of the 2011 Intl. Conference on Very Large Data Bases (VLDB), August 2011 (to appear)
- S. Babu.
Towards Automatic Optimization of MapReduce Programs
In Proc. of the ACM Symposium on Cloud Computing 2010 (ACM SOCC 2010), June 2010
- S. Duan, V. Thummala, and S. Babu.
Tuning Database Configuration Parameters with iTuned
In Proc. of the International Conference on Very Large Data Bases (VLDB), August 2009
- R. Thonangi, V. Thummala, and S. Babu.
Finding Good Configurations in High-Dimensional Spaces: Doing More with Less
In Proc. of the IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), September 2008
On SQL Tuning
On Query Interactions
- M. Ahmad, S. Duan, A. Aboulnaga, and S. Babu.
Predicting Completion Times of Batch Query Workloads using Interaction-aware Models and Simulation
In Proc. of the Intl. Conference on Extending Database Technology (EDBT), March 2011
- M. Ahmad, A. Aboulnaga, S. Babu, and K. Munagala.
Interaction-aware Scheduling of Report Generation Workloads
In the VLDB Journal, 2011
- M. Ahmad, A. Aboulnaga, and S. Babu.
Query Interactions in Database Workloads
In Proc. of the Second Workshop on Testing Database Systems (DBTest), June 2009
- M. Ahmad, A. Aboulnaga, S. Babu, and K. Munagala.
Modeling and Exploiting Query Interactions in Database Systems
In Proc. of ACM International Conference on Information and Knowledge Management (CIKM), October 2008
On Diagnosis
On Benchmarking and Modeling
- P. Shivam, V. Marupadi, J. Chase, and S. Babu.
Cutting Corners: Workbench Automation for Server Benchmarking
In Proc. of the 2008 USENIX Annual Technical Conference, June 2008
- P. Shivam, S. Babu, and J. Chase.
Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications
In Proc. of the International Conference on Very Large Data Bases (VLDB), September 2006
- P. Shivam, S. Babu, and J. Chase.
Active Sampling for Accelerated Learning of Performance Models
In Proc. of the First Workshop on Tackling Computer Systems Problems with Machine Learning Techniques (SysML), June 2006
On Control- and System-level Issues
- A. Demberel, J. Chase, and S. Babu.
Reflective Control for an Elastic Cloud Application: An Automated Experiment Workbench
In Proc. of the First Workshop on Hot Topics in Cloud Computing (HotCloud), in conjunction with USENIX Annual Technical Conference, June 2009
- A. Yumerefendi, P. Shivam, D. Irwin, P. Gunda, L. Grit, A. Demberel, J. Chase, and S. Babu.
Towards an Autonomic Computing Testbed
In Workshop on Hot Topics in Autonomic Computing (HotAC), June 2007
Demonstrations
- V. Thummala and S. Babu.
A Tool for Configuring and Visualizing Database Parameters
Winner of the SIGMOD'10 Best-demo Award Competition!
Demonstrated at the 2010 ACM Intl. Conf. on Management of Data (SIGMOD 2010), June 2010
- S. Duan, P. Franklin, V. Thummala, D. Zhao, and S. Babu.
Shaman: A Self-Healing Database System
Demonstrated at the 2009 IEEE International Conference on Data Engineering (ICDE), April 2009
- P. Shivam, A. Demberel, P. Gunda, D. Irwin, L. Grit, A. Yumerefendi, S. Babu, and J. Chase.
Automated and On-Demand Provisioning of Virtual Machines for Database Applications
Demonstrated at the 2007 ACM Intl. Conf. on Management of Data (SIGMOD 2007), June 2007