This dissertation investigates the use of hierarchy and problem decomposition as a means of solving large, stochastic, sequential decision problems. These problems are framed as Markov decision problems (MDPs). The new technical content of this dissertation begins with a discussion of the concept of temporal abstraction. Temporal abstraction is shown to be equivalent to the transformation of a policy defined over a region of an MDP to an action in a semi-Markov decision problem (SMDP). Several algorithms are presented for performing this transformation efficiently.
This dissertation introduces the HAM method for generating hierarchical, temporally abstract actions. This method permits the partial specification of abstract actions in a way that corresponds to an abstract plan or strategy. Abstract actions specified as HAMs can be optimally refined for new tasks by solving a reduced SMDP. The formal results show that traditional MDP algorithms can be used to optimally refine HAMs for new tasks. This can be achieved in much less time than it would take to learn a new policy for the task from scratch.
HAMs complement some novel decomposition algorithms that are presented in this dissertation. These algorithms work by constructing a cache of policies for different regions of the MDP and then optimally combining the cached solution to produce a global solution that is within provable bounds of the optimal solution.
Together, the methods developed in this dissertation provide important tools for producing good policies for large MDPs. Unlike some ad-hoc methods, these methods provide strong formal guarantees. They use prior knowledge in a principled way, and they reduce larger MDPs into smaller ones while maintaining a well-defined relationship between the smaller problem and the larger problem.
Back to my home page.