Mike Clancy, statement CS2 workshop


OOPSLA "The Future of CS 2 and Data Structures" workshop
Mike Clancy, University of California, Berkeley
clancy@cs.berkeley.edu

I first taught the data structures course in 1979. The data structures we covered then--arrays, linked lists, trees, and hash tables--are those we cover now. Programming environments are much different now, though; most come with extensive class libraries that provide implementations of the standard data structures. A big question I see for CS 2 is, to what extent should we continue to focus on implementations at the expense of various other topics (described below)?

Another big change in the last twenty years is our perception of the computing environment. I suspect that many computer scientists these days understand the organization of a modern operating system and GUI only at a relatively high level. This factor may be relevant in a trend away from study of implementations.

Course activities have also changed in a variety of ways. Computing resources have improved dramatically; in particular, it takes a much larger problem or data set for a "good" data structure to make a difference, and I have attempted to adjust my assignments appropriately. Activities also include more experimentation, analysis, and use of code we supply. We move through the semester from studying and analyzing the standard data structures--implementations are provided--to using them in large (1500-2000 lines) projects. Where possible, we use case studies to model problem solving and complexity management.

Two recent textbooks aimed at the CS 2 market offer interesting contrasts. Data Abstraction and Problem Solving with C++: Walls and Mirrors, by Frank M. Carrano (Benjamin/Cummings, 1995), is a more traditional treatment. It has chapters titled "Linked Lists", "Stacks", "Queues", "Trees", "Tables and Priority Queues", and "Advanced Implementations of Tables". Its coverage lumps interface, implementation, and application together; there is little if any treatment of applications where more than one data structure might be appropriately chosen. Algorithms, Data Structures, and Problem Solving with C++, by Mark A. Weiss (Addison-Wesley, 1996), has a chapter called "Data Structures" in which interfaces to classes for stacks, queues, linked lists, trees, hash tables, and priority queues are presented in 25 pages. This is followed by five chapters of "Applications" that describe the design of programs that use the classes. Only later, in several "Implementations" sections, does Weiss deal with the details of how the classes do their things.

My view of the modern CS 2 leans more toward the Weiss model. The important points of a data structures course should not be the details of individual algorithms or representations. Students should also learn to identify efficiency constraints on problem solutions, to understand the tradeoffs between storing and recomputing information and the tradeoffs of using linked vs. contiguous storage to store it, and to identify appropriate data types whose implementations satisfy the problem requirements. Applications where students must wrestle with these issues are a necessary part of such a course.

Our CS 1 courses at Berkeley have for the past several years been organized along the following lines. First, students work in closed lab sections on exercises that cover low-level details of new primitive operations or techniques. Later, they see the primitives assembled or techniques illustrated in the context of a case study. Finally, they apply the techniques in project assignments. These three steps are repeated through the course, and culminate in a project program that pulls everything together. This course organization also seems appropriate for a modern CS 2. Lab activities would involve the use and analysis of container classes and comparison of their behavior with representations covered earlier. Some example exercises would be the following:

Appropriate applications for treatment in case studies might include the design of an inventory manager, a game player, or the boundary tags technique for storage management. Examples of case studies for object-based Pascal programming appear in Designing Pascal Solutions: Case Studies using Data Structures, by me and Marcia Linn (W.H. Freeman, 1996).

What's the downside of such a course organization? With lab exercises that focus on experimentation and comparison of data structures, plus homework activities that involve choice among a number of implementation alternatives, students don't have a lot of time for broad "coverage" of less central topics. For instance, my course treats techniques for maintaining balance in a binary search tree and for managing a hash table using open addressing only in passing. Basically, depth of coverage wins out over breadth.

Another goal of our CS 2 is to help students become better programmers through practice of basic principles of software engineering. For example, the introduction of study and use of software design patterns has been suggested (see articles by Astrachan and Wallingford in the proceedings of the last three SIGCSE conferences). But time is limited, as noted above. If patterns go in, what goes out?

Dan McCracken and Dennis Frailey had an interesting "conversation" in the June 1998 SIGCSE Bulletin about how much to incorporate the C++ Standard Template Library into a CS 2 course. I hope that this workshop will build on this conversation, to give the participants a better basis to decide how to balance coverage between implementations, applications, and good software practice.