Shivnath Babu
D338 Levine Science Research Center
Department of Computer Science
Duke University
Durham, NC 27708
myfirstname@cs.duke.edu
Phone: 919-660-6579
Fax: 919-660-6519
I am an Associate Professor in the Department of Computer Science at Duke University.
My primary research interest is in making data-intensive computing systems easier to manage.
Recent work from my research group has focused on the Hadoop MapReduce system.
Check out our Starfish project.
I am very interested in using cloud platforms for online experimentation to aid
system tuning and testing. The vision of our Flex
project is to enable users, whether they are
end users, developers, or system administrators, to
programmatically collect the information needed
for system testing and tuning through planned experiments on the cloud.
The DIADS project tackles
integrated problem diagnosis for database systems running
on networked storage, as well as automated detection of and recovery
from data corruption caused by hardware faults,
software bugs, or human mistakes.
Our work is supported by startup funds from Duke, grants from the US National Science Foundation,
faculty awards from IBM, an equipment grant from IBM, and resource usage
grants from Amazon Web Services.
Recent Updates
-
I gave a talk at the Spark Summit 2015 on Simplifying Spark Application Development.
See slides and video.
-
I gave a talk at the Hadoop Summit 2015 on tuning Spark applications.
See slides and video.
Research Interests
-
Data management for new application domains, e.g., elastic cloud computing
and large-scale analytics with the Hadoop ecosystem
-
Architectures and algorithms for self-managing database systems
Projects
-
Starfish: A
Self-tuning System for Big Data Analytics (new)
-
Flex: A Platform for Experiment-Driven System Management (ongoing)
-
Ques: Querying and controlling systems (ongoing).
DIADS is a subproject of Ques.
-
STREAM: Stanford Stream Data
Manager (completed)
Teaching
-
COMPSCI 290.1:
Data Engineering, Fall 2015
-
CPS 516:
Data-Intensive Computing Systems, Fall 2012, Spring 2015
-
CPS 182s:
Technical and Social Foundations of the Internet, Spring 2011, 2012
-
CPS 216:
Data-Intensive Computing Systems, Fall 2009, 2010, 2011
-
CPS 196.03:
Information Management and Mining, Spring 2009
-
CPS 216:
Advanced Database Systems, Fall 2006, 2007, 2008
-
CPS 49S:
Google: The Computer Science Within and its Impact on Society,
Spring 2007 and 2008. This class received some press coverage, e.g., the Duke News
article and the Duke Magazine article.
-
CPS 296.2: Self-Managing Systems, Spring 2006
Awards
-
IBM Faculty Award, 2008
-
NSF CAREER, 2007-2011
-
IBM Faculty Award, 2007
-
IBM Faculty Award, 2006
Recent Publications
- H. Lim, H. Herodotou, and S. Babu.
Stubby: A Transformation-based Optimizer for MapReduce
Workflows
In Proc. of the 2012 Intl. Conference on
Very Large Data Bases (VLDB), August 2012 (To appear)
-
H. Herodotou, F. Dong, and S. Babu.
No One (Cluster) Size Fits All: Automatic Cluster Sizing for Data-intensive Analytics
In Proc. of the ACM Symposium on Cloud Computing 2011 (ACM SOCC 2011), October 2011
- H. Herodotou and S. Babu.
Profiling, What-if Analysis, and Cost-based Optimization
of MapReduce Programs
In Proc. of the 2011 Intl. Conference on
Very Large Data Bases (VLDB), August 2011
-
N. Borisov, S. Babu, N. Mandagere, and S. Uttamchandani.
Warding off the Dangers of Data Corruption with Amulet
In Proc. of the
2011 ACM Intl. Conf. on Management of Data (SIGMOD), June 2011
- H. Herodotou, N. Borisov, and S. Babu.
Query Optimization Techniques for Partitioned Tables
In Proc. of the
2011 ACM Intl. Conf. on Management of Data (SIGMOD), June 2011
-
N. Borisov, S. Babu, N. Mandagere, and S. Uttamchandani.
Dealing Proactively with Data Corruption: Challenges and Opportunities
In Proc. of the Sixth Intl. Workshop on Self-Managing Database Systems (SMDB), April 2011
-
M. Ahmad, S. Duan, A. Aboulnaga, and S. Babu.
Predicting Completion Times of Batch Query Workloads using Interaction-aware Models and Simulation
In Proc. of the
Intl. Conference on Extending Database Technology
(EDBT), March 2011
-
M. Ahmad, A. Aboulnaga, S. Babu, and K. Munagala.
Interaction-aware Scheduling of Report Generation Workloads
In the VLDB Journal, 2011
-
H. Herodotou, H. Lim, G. Luo, N. Borisov, L. Dong, F. B. Cetin, and S. Babu.
Starfish: A Self-tuning System for Big Data Analytics
In Proc. of the Fifth Biennial Conference on Innovative Data Systems Research (CIDR), January 2011
- H. Herodotou and S. Babu.
Xplus: A SQL-Tuning-Aware Query Optimizer
In Proc. of PVLDB Volume 3 (the International Conference on Very Large Databases (VLDB)), September 2010
- S. Babu.
Towards Automatic Optimization of MapReduce Programs
In Proc. of the ACM Symposium on Cloud Computing 2010 (ACM SOCC 2010), June 2010
- H. Lim, S. Babu, and J. Chase.
Automated Control for Elastic Storage
In Proc. of the Intl. Conference on Autonomic Computing (ICAC 2010), June 2010
- S. Duan, V. Thummala, and S. Babu.
Tuning Database Configuration Parameters with iTuned
In Proc. of the International Conference on Very Large Databases (VLDB), August 2009
-
H. Herodotou and S. Babu.
Automated SQL Tuning through Trial and (Sometimes) Error
In Proc. of the Second Workshop on Testing Database Systems (DBTest), June 2009
-
M. Ahmad, A. Aboulnaga, and S. Babu.
Query Interactions in Database Workloads
In Proc. of the Second Workshop on Testing Database Systems (DBTest), June 2009
- A. Demberel, J. Chase, and S. Babu.
Reflective Control for an Elastic Cloud Application: An Automated Experiment Workbench
In Proc. of the First Workshop on
Hot Topics in Cloud Computing (HotCloud), in conjunction with USENIX Annual Technical Conference, June 2009
- H. Lim, S. Babu, J. Chase, and S. Parekh.
Automated Control in Cloud Computing: Challenges and Opportunities
In Proc. of the First Workshop on Automated Control
for Datacenters and Clouds, June 2009
- S. Babu, N. Borisov, S. Duan, H. Herodotou, and V. Thummala.
Automated Experiment-Driven Management of (Database) Systems
In Proc. of
the 12th Workshop on
Hot Topics in Operating Systems (HotOS), May 2009
- S. Duan, S. Babu, and K. Munagala.
Fa: A System for Automating Failure Diagnosis
In Proc. of
2009 IEEE International Conference on Data Engineering (ICDE), April 2009
- S. Babu, N. Borisov, S. Uttamchandani, R. Routray, and A. Singh.
DIADS: Addressing the "My-Problem-or-Yours" Syndrome with
Integrated SAN and Database Diagnosis
In Proc. of
the USENIX Conference on File and Storage Technologies (FAST), February 2009
See the full list of my publications and demonstrations.
Recent Demonstrations
- N. Borisov and S. Babu.
Proactive Detection and Repair of Data Corruption:
Towards a Hassle-free Declarative Approach with Amulet
Demonstrated at the 2011 International Conference on
Very Large Data Bases (VLDB), August 2011
- H. Herodotou, F. Dong, and S. Babu.
MapReduce Programming and Cost-based Optimization?
Crossing this Chasm with Starfish.
Demonstrated at the 2011 International Conference on
Very Large Data Bases (VLDB), August 2011
- V. Thummala and S. Babu.
A Tool for Configuring and Visualizing Database Parameters
Winner of the SIGMOD'10 Best-demo Award Competition!
Demonstrated at the
2010 ACM Intl. Conf. on Management of Data (SIGMOD 2010), June 2010
- N. Borisov, S. Babu, S. Uttamchandani, R. Routray, and A. Singh.
DIADS: A Problem Diagnosis Tool for Databases
and Storage Area Networks
Demonstrated at the
2009 International Conference on Very Large Databases (VLDB), August 2009
- S. Duan, P. Franklin, V. Thummala, D. Zhao, and S. Babu.
Shaman: A Self-Healing Database System
Demonstrated at the
2009 IEEE International Conference on Data Engineering (ICDE), April 2009
- S. Duan and S. Babu.
Automated Diagnosis of System Failures with Fa
Demonstrated at the
2009 IEEE International Conference on Data Engineering (ICDE), April 2009
Software Releases
- Starfish Self-tuning Analytics System
-
Source code is available
here
- STREAM Data Stream Management System
-
Source code is available
here