Self-Managing Systems

Course 296.2: Self-Managing Systems, Spring 2006
Meeting Time: TuTh 2:50-4:05
Meeting Place: D240 LSRC
Office Hours: By appointment
Course Instructor: Shivnath Babu




Overview

The management of personal and networked computing systems has become cumbersome because of their scale and complexity. The cost of high-end computing systems is now dominated by labor costs, including the rising salaries of the system gurus needed to keep these systems running smoothly. In the crisp words of Robert Morris, director of IBM's Almaden Research Center: "There is no less than a crisis today in three areas (of systems): cost, availability, and user experience."

This "crisis of complexity" appears to be driving a push across all of Computer Science towards building self-managing systems. IBM has coined a new term for this field: Autonomic Computing.

In this new course, we will study the emerging field of self-managing systems. Specifically:

  1. We will discuss research papers from a wide range of disciplines. Their unifying theme is that each makes progress towards creating self-managing systems.
  2. We will hear guest lectures from researchers in academia and industry.
  3. We will try to give structure to this field, e.g., by concretely defining problems that arise in this setting, by identifying core algorithmic techniques useful in this domain, and by proposing guidelines for designers of future systems and software.
  4. Students will complete a semester-long course project on a problem that falls under the umbrella of self-managing systems. Students are welcome to draw this problem from their own area of interest.

Reading List


Week   Date   Author, Title, Venue, and Contents   Presenter
1 Thu, Jan. 12 Introduction and logistics Shivnath
[slides]
2 Tue, Jan. 17 A. Ganek and T. Corbi
The dawning of the autonomic computing era
In IBM Systems Journal, Volume 42, Number 1, 2003
[pdf] [html]
Shivnath
[slides] [notes]
Thu, Jan. 19 Class projects Shivnath
[slides]
3 Tue, Jan. 24 Class projects Shivnath
[slides]
Thu, Jan. 26 D. Patterson, A. Brown, and others
Recovery-Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies
UC Berkeley Computer Science Technical Report UCB//CSD-02-1175
[pdf] [doc]
Shivnath
[slides]
4 Tue, Jan. 31 (1) V. Markl, G. Lohman, and V. Raman
LEO: An autonomic query optimizer for DB2
In IBM Systems Journal, Volume 42, Number 1, 2003
[pdf] [html]

(2) K. Dias, M. Ramacher, U. Shaft, V. Venkataramani, and G. Wood
Automatic Performance Diagnosis and Tuning in Oracle
In Second Biennial Conference on Innovative Data Systems Research (CIDR), 2005
[pdf]
Shivnath
[slides] [notes]
Thu, Feb. 2 Invited speaker: Brent Miller from IBM Research Triangle Park (Autonomic Computing group)
5 Tue, Feb. 7 Presentation of project proposals
Thu, Feb. 9 I. Cohen, M. Goldszmidt, T. Kelly, J. Symons, and J. Chase
Correlating Instrumentation Data to System States: A Building Block for Automated Diagnosis and Control
In Operating Systems Design and Implementation (OSDI), 2004
[pdf] [html]
Shivnath
[slides]
6 Tue, Feb. 14 (1) K. Appleby, S. Fakhouri, L. Fong, and others
Oceano - SLA Based Management of a Computing Utility
In 7th IFIP/IEEE International Symposium on Integrated Network Management (IM), 2001
[pdf]

(2) V. Matossian, V. Bhat, M. Parashar, M. Peszynska, M. Sen, P. Stoffa, and M. Wheeler
Autonomic Oil Reservoir Optimization on the Grid
In Concurrency and Computation: Practice and Experience, John Wiley and Sons, Vol. 17, Issue 1, 2005
[pdf] [Related slides 1] [Related slides 2]
Shivnath
[slides]
Thu, Feb. 16 N. Gandhi, J. Hellerstein, S. Parekh, and D. Tilbury
Managing the Performance of Lotus Notes: A Control Theoretic Approach
Proceedings of the Computer Measurement Group, 2001
[pdf]
Shivnath
[slides]
7 Tue, Feb. 21 S. Zhang, I. Cohen, M. Goldszmidt, J. Symons, and A. Fox
Ensembles of Models for Automated Diagnosis of System Performance Problems
Proceedings of the Intl. Conf. on Dependable Systems and Networks, 2005
[pdf]
Shivnath
[slides]
Thu, Feb. 23 Invited speaker: Wayne Clark from Cisco
8 Tue, Feb. 28 Invited speaker: Mike Lake from IBM Research Triangle Park (Tivoli group)
Thu, March 2 A. Aboulnaga, P. Haas, S. Lightstone, G. Lohman, V. Markl, I. Popivanov, and V. Raman
Automated Statistics Collection in DB2 UDB
Proceedings of the Intl. Conf. on Very Large Data Bases (VLDB), 2004
[pdf]
Ranjith
[slides]
[notes]
9 Tue, March 7 Mid-course project presentation (15 minutes per group)
Thu, March 9 Class cancelled
10 Tue, March 21 Invited speaker: Mike Lake from IBM Research Triangle Park (Tivoli group)
Thu, March 23 I. Cohen, S. Zhang, M. Goldszmidt, J. Symons, T. Kelly, and A. Fox
Capturing, Indexing, Clustering, and Retrieving System History
Proceedings of the Symposium on Operating Systems Principles (SOSP), 2005
[html]
Songyun
11 Tue, March 28 (1) 3-minute project updates

(2) K. Droegemeier, D. Gannon, D. Reed, and others
Service-oriented Environments in Research and Education for dynamically interacting with Mesoscale Weather
IEEE Computing in Science and Engineering, Nov-Dec 2005
[pdf]
Emma
[slides]
Thu, March 30 Mark Brodie, Sheng Ma, Guy Lohman, Tanveer Syeda-Mahmood, Laurent Mignet, Natwar Modani, Mark Wilding, Jon Champlin, and Peter Sohn
An Architecture for Quickly Detecting Known Software Problems
IEEE International Conference on Autonomic Computing (ICAC), 2005
[pdf] (access from within Duke or via a VPN)
Mason
12 Tue, April 4 Project presentation (15 minutes per group)
Thu, April 6 George Candea, Shinichi Kawamoto, Yuichi Fujiki, Greg Friedman, and Armando Fox
Microreboot - A Technique for Cheap Recovery
6th Symposium on Operating Systems Design and Implementation (OSDI), San Francisco, CA, December 2004
[html]
Brian
13 Tue, April 11 Eric Anderson, Michael Hobbs, Kimberly Keeton, Susan Spence, Mustafa Uysal, and Alistair Veitch
Hippodrome: Running Circles Around Storage Administration
USENIX Conference on File and Storage Technologies (FAST), 2002
[html]
Amber
Thu, April 13 Paul Barham, Austin Donnelly, Rebecca Isaacs, and Richard Mortier
Using Magpie for request extraction and workload modelling
6th Symposium on Operating Systems Design and Implementation (OSDI'04), December 2004
[pdf]
Seda
14 Tue, April 18 Patrick Reynolds, Charles Killian, Janet L. Wiener, Jeffrey C. Mogul, Mehul A. Shah, and Amin Vahdat
Pip: Detecting the Unexpected in Distributed Systems
Proceedings of NSDI, San Jose, CA, May 2006 (to appear)
[pdf]
Pradeep
Thu, April 20 Final project presentations
15 Tue, April 25 Final report and demo due