Dialogue Theory for Virtual Environments

Table of Contents:

  1. Principal Investigator.
  2. Productivity Measures.
  3. Summary of Objectives and Approach.
  4. Detailed Summary of Technical Progress.
  5. Transitions and DOD Interactions.
  6. Software and Hardware Prototypes.
  7. List of Publications.
  8. Invited and Contributed Presentations.
  9. Honors, Prizes or Awards Received.
  10. Project Personnel Promotions.
  11. Project Staff.
  12. Multimedia URL.
  13. Keywords.
  14. Business Office.
  15. Expenditures.
  16. Students.
  17. Book Plans.
  18. Sabbatical Plans.
  19. Related Research.
  20. History.

Principal Investigator.

Productivity Measures.

Summary of Objectives and Approach.

  1. Our project has developed a theory of dialogue that enables a machine to cooperate with a human in the solution of a problem. Specifically, the machine proceeds to prove the top level goal that represents the solution to the problem. If it finds subgoals in its proof that it cannot solve, it resorts to an interaction with the user to attempt to obtain the needed information to finish the proof. Carried to its natural conclusion, this strategy becomes a theory of dialogue; all interactions are initiated to fill in gaps in uncompleted proofs.

    This theory was implemented in a circuit repair system that could help a user diagnose and repair a failure in an electric circuit. The system interacted with the user by voice and was refined and tested at length.

    The current project aims to extend this mechanism to handle multimedia interactions with the user. Specifically, we have been implementing a multimedia grammatical system that can handle a variety of modes such as voice, graphical entities, displayed text, artificial sounds, and haptic devices. With this system, our dialogue mechanisms can proceed as before, but the interactions can utilize all of these communicative modes.

    The approaches to the research involve both developing a theoretical model and studying its properties and implementing the ideas in a voice-graphics interactive dialogue machine. The particular system currently being prototyped is a tutor for teaching computer programming.
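
    The missing-axiom strategy described above can be sketched in a few lines. The following is an illustrative Python sketch, not the project's Prolog implementation; the rule base, goal names, and scripted answers are all hypothetical.

```python
# A tiny backward-chaining prover that falls back to asking the user
# when no stored fact or rule covers a subgoal -- the point at which
# the dialogue theory says an interaction should be initiated.
# All rule and goal names here are hypothetical.

RULES = {
    "circuit_works": ["power_on", "bulb_ok"],  # goal -> list of subgoals
    "power_on": ["switch_closed"],
}
FACTS = {"bulb_ok"}

def prove(goal, ask):
    """Try to prove `goal`; call `ask(goal)` when no axiom covers it."""
    if goal in FACTS:
        return True
    if goal in RULES:
        return all(prove(sub, ask) for sub in RULES[goal])
    # Missing axiom: the proof cannot continue without the user's help.
    return ask(goal)

# A scripted "user" stands in for the spoken interaction.
answers = {"switch_closed": True}
result = prove("circuit_works", lambda goal: answers.get(goal, False))
```

    In a real session the `ask` callback would generate a question to the user ("Is the switch closed?") and parse the reply; here it simply looks up a canned answer.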

Detailed Summary of Technical Progress.

  1. This project has developed a model for multimedia grammars which translate between voice, text, graphic, or haptic entities and machine-internal forms expressed in a logic-oriented language. In the generation mode, such grammars begin with a predicate calculus statement of a message to be conveyed to a user and search the set of grammar operators to find a way to express the message in terms of the available modes. Thus the output might be a voice statement, a text message, a graphic entity such as an arrow to a displayed item, or a combination of these. In the parsing mode, such grammars receive a multimodal input, find its significant syntactic features, and translate them into meaning structures.
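
    The generation direction can be illustrated with a minimal Python sketch (the predicate, modes, and realizations below are invented for illustration; the actual system uses a much richer operator set): grammar operators map a logical message to realizations in particular modes, and the generator searches them for one matching a preferred mode.

```python
# Hypothetical sketch of generation: each "operator" pairs a logical
# predicate with a mode and a realization function. The generator scans
# the preferred modes in order and returns the first realization found.

message = ("locate", "resistor_3")   # predicate-calculus style message

OPERATORS = [
    # (predicate, mode, realization function)
    ("locate", "voice",
     lambda arg: f"The {arg} is the third item in the row."),
    ("locate", "graphics",
     lambda arg: ("arrow", arg)),          # draw an arrow to the item
    ("locate", "text",
     lambda arg: f"See {arg} (highlighted)."),
]

def generate(msg, preferred_modes):
    """Realize `msg` in the first preferred mode an operator supports."""
    pred, arg = msg
    for mode in preferred_modes:
        for p, m, realize in OPERATORS:
            if p == pred and m == mode:
                return m, realize(arg)
    return None

mode, output = generate(message, ["graphics", "voice"])
```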

    An important feature of our multimedia grammars is a complexity measuring scheme that evaluates each structure during generation. This feature provides the system with a way to select a preferred form of expression when many versions of a communication could be used. For example, the system might be able to reference an item as either "the seventy-sixth item in a row", "the third green object", or "that object" (with an accompanying graphic arrow). In each case, the system needs to attach a measure of desirability to each syntactic form so that it can choose the one to be used.

    The complexity computation needs to be flexible and dynamic. For example, if the user is distracted visually, then voice messages might be preferred. The complexity measuring system should be able to instantaneously modify its behavior to accommodate the situation. If the environment is momentarily swept with loud noises, the outputs might drop the use of voice in favor of presented text and graphic messages. If the user is inexperienced, the system might select versions of the message that have high redundancy and extra emphasis on clarity. Our project has developed a mechanism for representing and using such complexity functions, but we need much more experience and experimentation to learn how to optimize such a system.
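
    The complexity-driven choice can be sketched as follows (the weights and word counts below are invented for illustration): each candidate realization receives a score from its mode and its length, and shifting the mode weights, say, penalizing voice in a noisy room, redirects the choice without changing the candidates themselves.

```python
# Hypothetical complexity scoring: lower cost is preferred. The mode
# weight captures the current situation (noise, user distraction); the
# length term penalizes long-winded realizations.

def cost(candidate, mode_weight):
    mode, words = candidate
    return mode_weight[mode] + 0.5 * words

CANDIDATES = [
    ("voice", 7),      # "the seventy-sixth item in the row"
    ("voice", 4),      # "the third green object"
    ("graphics", 2),   # "that object" plus a graphic arrow
]

# Situation-dependent weights (illustrative values only).
quiet = {"voice": 0.5, "graphics": 2.0}
noisy = {"voice": 10.0, "graphics": 2.0}   # loud room: voice penalized

best_quiet = min(CANDIDATES, key=lambda c: cost(c, quiet))
best_noisy = min(CANDIDATES, key=lambda c: cost(c, noisy))
```

    Under the quiet weights the short voice phrase wins; under the noisy weights the same candidate set yields the graphic reference instead.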

    In order to gain experience with our grammatical mechanisms, we have coded a version for a voice interactive programming tutor system. This system has a large amount of programming knowledge in the form of Prolog rules and is designed to help Duke University students learn a programming language. The system, in prototype form, is now running and was used on an experimental basis for tutoring students in our Computer Science 1 class. We found that students could use it quite successfully in the process of debugging one simple program, and we are currently studying the details of the voice interactions to better understand what happened. A video demonstration of our voice interactive programming tutor can be viewed on the World Wide Web as referenced by our home page.

Transitions and DOD Interactions.

  1. Our group has negotiated with the Army Research Office regarding these technologies, and those negotiations led to the initiation of an industrial project to create voice and graphics applications for the U. S. Army. This project has been undertaken by the Research Triangle Institute in Research Triangle Park, North Carolina, with the first task being the creation of a voice-virtual environment tutor and trainer for tank repair and maintenance. The work began in February of 1995, funded at the level of approximately $600,000 for the first year. It resulted in a prototype system which has been demonstrated to the sponsor. (Reference: Dr. James N. Brown, Research Triangle Institute, Research Triangle Park, North Carolina)

Software and Hardware Prototypes.

  1. Prototype Name: Duke Programming Tutor

List of Publications.

  1. R.D. Barve and P.M. Long. On the Complexity of Learning from Drifting Distributions. Proceedings of the 1996 Conference on Computational Learning Theory.
  2. A.W. Biermann and P.M. Long. The Composition of Messages in Speech-Graphics Interactive Systems. Proceedings of the 1996 International Symposium on Spoken Dialogue.
  3. A.W. Biermann, C.I. Guinn, M. Fulkerson, G. Keim, Z. Liang, D. Melamed, and K. Rajagopalan. Goal-Oriented Multimedia Dialogue with Variable Initiative. Submitted for journal publication, 1996.
  4. N. Cesa-Bianchi, P.M. Long and M.K. Warmuth. Worst-case Quadratic Loss Bounds for Prediction using Linear Functions and Gradient Descent. IEEE Transactions on Neural Networks, 7(3):604-619, 1996.
  5. C.I. Guinn. The Role of Computer-Computer Dialogues in Human-Computer Dialogue System Development. Empirical Methods in Discourse Interpretation and Generation, Proceedings of the AAAI 1995 Spring Symposium.
  6. C.I. Guinn. Mechanisms for Mixed-Initiative Human-Computer Collaborative Discourse. Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, 1996.
  7. C.I. Guinn. Mechanisms for Dynamically Changing Initiative in Human--Computer Collaborative Discourse. Submitted for publication, 1996.
  8. P.M. Long. Improved Bounds about On-line Learning of Smooth Functions of a Single Variable. Proceedings of the 1996 Workshop on Algorithmic Learning Theory.
  9. P.M. Long and L. Tan. PAC Learning Axis-Aligned Rectangles with Respect to Product Distributions from Multiple-instance Examples. Proceedings of the 1996 Conference on Computational Learning Theory. Invited to the special issue of Machine Learning for COLT'96.

Invited and Contributed Presentations.

  1. Alan W. Biermann, "Dialogue Theory for Virtual Environments", Office of Naval Research Virtual Environment Workshop, Arlington, Virginia, 21-24 March, 1995.
  2. Alan W. Biermann, "Spoken Language Dialogue", Invited lecture, Spoken Human-Machine Dialogue Workshop, U. S. Army Training and Doctrine Command, Research Triangle Park, North Carolina, 30 May-1 June 1995.

Honors, Prizes or Awards Received.

  1. Alan W. Biermann was made a Fellow of the AAAI in August, 1994.

Project Personnel Promotions.

Project Staff.

  1. Name: Dr. Alan W. Biermann
  2. Name: Dr. Curry Guinn
  3. Name: Dr. Philip Long

Multimedia URL.

  1. EOYL FY95
  2. QUAD FY95
  3. EOYL FY94
  4. Dialogue Theory for Virtual Environments
  5. Quicktime Movie of a Real Debugging Session

Keywords.

  1. Voice Interactive Systems
  2. Multimedia Systems
  3. Human-Machine Interface
  4. Virtual Environments

Business Office.

Expenditures.

  1. FY95: 22%
  2. FY94: 100%

Students.

Book Plans.

Sabbatical Plans.

Related Research.

  1. James F. Allen, University of Rochester
  2. BBN, The HARK recognizer
  3. Barbara Grosz, Harvard
  4. Kathleen McKeown, Columbia University Natural Language Group
  5. Steven K. Feiner, Computer Graphics and User Interfaces Lab, Columbia University


  1. comp.ai.nat-lang
  2. comp.human-factors
  3. comp.ai.nlang-know-rep
  4. comp.multimedia
  5. comp.speech
  6. comp.ai.edu
  7. comp.ai.fuzzy

History.

  1. As noted above, our project has passed its results (including some actual code) to the Research Triangle Institute, Research Triangle Park, North Carolina. This has resulted in the ARO sponsored project described there.

    As a second example, our project created a voice interactive word processing system in the mid 1980s called VIPX. This system has been undertaken as the prototype for a Kurzweil AI, Inc. product development project in Waltham, Massachusetts. It is funded by NIST. The development is going forward at this time and has already been demonstrated in a prototype form.

  2. In 1979, our project ran an experiment to test a typed natural language programming system. Students were asked to create programs both using a special subset of English and using a traditional programming language, and their performances with the two approaches were compared. One of the most noticeable findings of the study was that merely typing in the program was a major impediment to using the natural language system. This led to the addition of a voice interface to that system and, in later years, to a series of voice interactive systems.

    In the mid 1980s, dialogue theory, as described by James Allen, Barbara Grosz, Candy Sidner, our project, and several others, came into existence. This theory emphasized the idea of subdialogues as a major construct of dialogues and proposed ways to decompose interactions into such subunits. Our Circuit Fixit Shop project, between 1988 and 1991, was an implementation that tested many of these ideas. We chose Prolog for our representation of knowledge and created our missing axiom theory for driving the interaction. In later work, we have been trying to formalize this theory to the point that its properties can be investigated in more systematic ways.

    Our next step was to turn to another idea that had been set aside for a long time. We had, for the most part, ignored interaction modalities other than typed and spoken natural language. We decided to create a multimedia grammar to do the translation between the internal logical language and the various external modes available for communication. This model includes a complexity feature that measures the desirability of the external form when it is generated and helps the system prefer attractive and efficient forms of expression.