CPS296.04 Sequential Decision Theory: Algorithms, Policies, and Games

Instructor: Kamesh Munagala

M-W 1:15-2:30, North N225, Fall 2009


Lecture Notes: References are omitted from the notes, but they are easy to deduce from the links in the course schedule.

Course Outline:
In several modern applications such as internet auctions, robot navigation, wireless communication, social networks, and database query processing, inputs are known only with associated uncertainty. The goal of an optimization procedure therefore becomes two-fold: Learn as much as possible about the uncertain input, and simultaneously optimize the desired objective using prior probabilistic models (that get refined as more information is available about the input). In different application scenarios, the specific problems that arise are different; however, the models, algorithmic paradigms, and solution techniques needed to address them are often similar. In this course, we will develop general algorithmic tools for sequential decision making under uncertainty in such diverse contexts. Though the course is theoretical in nature, almost all problems considered have significant practical motivation which will also be highlighted.
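As a small illustration of a prior probabilistic model being refined as information arrives, the sketch below updates a Beta prior over an unknown success probability after each observed 0/1 outcome. The function names and the sample observations are illustrative assumptions, not part of the course materials.

```python
from fractions import Fraction


def update_beta(alpha, beta, outcome):
    """Return the Beta posterior parameters after one 0/1 observation."""
    if outcome not in (0, 1):
        raise ValueError("outcome must be 0 or 1")
    # A success increments alpha; a failure increments beta.
    return alpha + outcome, beta + (1 - outcome)


def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, as an exact fraction."""
    return Fraction(alpha, alpha + beta)


# Hypothetical data: start from a uniform Beta(1, 1) prior and
# observe three successes and one failure.
alpha, beta = 1, 1
observations = [1, 0, 1, 1]
for outcome in observations:
    alpha, beta = update_beta(alpha, beta, outcome)

print(alpha, beta, posterior_mean(alpha, beta))
```

The posterior mean moves from the prior guess toward the observed success rate as observations accumulate, which is the sense in which the model "gets refined."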

Topics Covered: Basic non-adaptive and adaptive schemes, Submodularity, Greedy algorithms and adaptivity gaps, Optimal stopping, Multi-armed bandit problems and the Gittins index, Regret measures and adversarial bandit algorithms, Online algorithms and gains using probabilistic models, Bayesian mechanism design.

Grading: Homeworks (30%), Scribing lecture notes (20%), Project (50%). There is no final exam.

Project: Project suggestions are available via Blackboard. The project can address either a theoretical problem or an application to a systems or economics problem. In either case, I would prefer that the project develop at least some theoretical understanding. You can also pick a collection of papers to read and summarize. I would prefer groups of no more than two people. The output should be a report and a presentation.

Scribing: Each of you will have to scribe three lectures. The templates for scribing are on Blackboard. Please download these into one directory, then edit, rename, and compile lecture1.tex. Please read the first 6 pages of Mathematical Writing before you scribe.

Reading Material:
No textbook is required. I will mainly use recent papers and surveys, and lecture notes that you will help scribe.
Check out related courses offered at Stanford and UPenn.
Also check out Vincent Conitzer's course on Computational Microeconomics. It is feasible to take both courses.

Lecture Schedule: Here is a tentative schedule of topics. Note that there will be almost no coverage of queueing theory, reinforcement learning, and financial models, all of which are important in their own right. Furthermore, I will choose to present simple and intuitive algorithms and proofs over optimal but more complicated algorithms.

Lecture Number | Date | Topic | Readings | Additional Readings
1 | Aug. 24 | Course organization and logistics; Applications of stochastic optimization; Warmup: Sensor placement and submodularity | |

Known Distributions: Stochastic Optimization and Adaptivity
2 | Aug. 26 | Submodular function maximization: Greedy algorithm; Graphical models, inference, and belief propagation | [KG], [KMN] | [FMV]: Approximating non-monotone submodular functions; [Svi]: 1-1/e approximation with general costs, similar to [KMN]
3 | Aug. 31 | Influence in social networks | [KKT] | [MR]: Shows influence is submodular in general
4 | Sep. 2 | Continuous Query Processing: Pipelined filters: Greedy non-adaptive algorithm; Correlations and the conditional greedy algorithm | [MBMW] | [FLT], [CFK]: Also show 4-approximation for min-sum set cover; [DGV]: Again note the use of a fixed ordering schedule; [MSU]: Stochastic scheduling with precedence constraints
5 | Sep. 7 | Shared conjunctive queries with expensive predicates: Adaptivity gaps, LP bounds, and semi-adaptive algorithms | [MSW] | Note the large gap between adaptive and fixed ordering schemes; [EHJK]: A simpler version of the same problem
6 | Sep. 9 | Greedy set cover and Harmonic hypergraph cover; The adaptive greedy and harmonic algorithms; Accounting via dual pricing and conditioning | [LPRY] | [Das]: Adaptive greedy algorithm for active learning; [AH], [CPRS]: Adaptive greedy decision tree construction; [KKM]: Evaluating monotone CNF/DNF formulae

Learning Distributions: Multi-armed Bandit Problems
7 | Sep. 14 | Keyword auctions and the multi-armed bandit problem; Beta priors, Bayesian updates, Explore-exploit tradeoffs | [PO], [GM09] | [GP], [BSS]: Truthful mechanisms for keyword auctions; [Times]: The human mind also trades off exploration and exploitation!
8 | Sep. 16 | MDP formulation, decision policies, and dynamic programming | [SB, Ch. 2, 3, 4] |
9 | Sep. 21 | Uniform discounting: The Gittins index policy | [Tsi] | [FW]: Four different proofs of the Gittins index theorem
10 | Sep. 23 | Finite horizon problems, relaxations, and duality | [GM09b], [GKN] | [GMS]: "Restless bandit" problem with similar solution idea; [SU]: Brownian restless bandits with a regret measure of performance
11 | Sep. 28 | Removing the prior: The stochastic multi-armed bandit problem; Regret measure and the UCB1 policy | |
12 | Sep. 30 | Tight lower bound for regret | [ACFS] |
13 | Oct. 7 | Robust regret and the adversarial bandit problem; Online prediction: Weighted majority algorithm | |
14 | Oct. 12 | Relation between bandit problems and online prediction | [AGT], Chapter 4 |
15 | Oct. 14 | Online Prediction: Follow the Perturbed Leader | [KV] | Essentially Hannan's 1954 algorithm.
16 | Oct. 19 | Prediction over convex spaces: Gradient descent | [Zin] | Can you use [Zin] to solve the problem considered in [KV]?
17 | Oct. 21 | Low Variance Unbiased Estimators; Barycentric spanners, Gradient estimators | [FKM], [AK] | [AHR]: The introduction unifies FPL and gradient descent as regularization, and also gives a nice description of why variance plays a role in regret.

Distribution-oblivious: Online Algorithms
18 | Oct. 26 | List access and Paging | [Karp1] |
19 | Oct. 28 | Randomized algorithms for paging; Lower bounds for randomized paging | |
20 | Nov. 2 | Online Steiner trees | [IW] | Note the nice lower bound construction in [IW]
21 | Nov. 4 | Sampling from the future: Online Steiner trees with prior information | [GGLS] | [CCP], [GPRS]: Multi-stage stochastic optimization; [FMMM]: Online adword matching
22 | Nov. 9 | Yao's Theorem and the MinMax Theorem | |
23 | Nov. 11 | Primal-dual algorithm: Ski Rental, Budgeted allocations | [BN] | [BLMNO]: Application to wireless power allocation
24 | Nov. 16 | Online Set Cover | |
25 | Nov. 18 | Online Auctions and Secretary Problems: Random order model and aspiration strategies | [Sec], [K] | [BIKK]: Survey of known results for the secretary problem; [KP]: Secretary problem for graphs
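
To give a flavor of the simple algorithms the schedule emphasizes, here is a minimal sketch of the greedy algorithm for submodular maximization (Lecture 2), specialized to maximum coverage: pick k sets so as to cover as many elements as possible. The function name, the sensor names, and the coverage sets are hypothetical, and this is not taken from the course notes. For monotone submodular objectives under a cardinality constraint, this greedy rule is known to give a 1-1/e approximation.

```python
def greedy_max_coverage(sets, k):
    """Greedily pick up to k sets, each time taking the largest marginal gain.

    sets: dict mapping a name to the set of elements it covers.
    Returns (chosen names in pick order, set of covered elements).
    """
    covered = set()
    chosen = []
    for _ in range(min(k, len(sets))):
        best_name, best_gain = None, 0
        # Iterate in sorted order so that ties are broken deterministically.
        for name in sorted(sets):
            if name in chosen:
                continue
            gain = len(sets[name] - covered)  # marginal coverage of this set
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:
            break  # no remaining set adds anything new
        chosen.append(best_name)
        covered |= sets[best_name]
    return chosen, covered


# Hypothetical sensor placement instance: each sensor covers some locations.
sensors = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6, 7},
    "d": {1, 7},
}
chosen, covered = greedy_max_coverage(sensors, 2)
print(chosen, sorted(covered))
```

Recomputing every marginal gain in each round keeps the sketch short; it costs one pass over all sets per pick.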