,

CPS290 Algorithmic Foundations of Data Science

Instructor: Kamesh Munagala

    Tue/Thu 1:25 - 2:40 @ West Duke 105

Spring 2017



Course Outline:
This course will provide basic algorithmic tools that are not covered in CompSci 330, yet form the bread and butter of modern algorithmic thinking. At a high level, we will cover three topics that have wide applications in large-scale information processing and data mining, social and information networks, and machine learning.

Topics: The course will be organized into 3 logical parts.
  1. From Data to Information: Given large data, how can we succinctly represent it with the goal of discovering, maintaining, and retreiving salient features? 
  2. Continuous and Algebraic Methods: How can tools developed in continuous math and algebra help us solve large-scale optimization and perform compression?
  3. Reasoning about Graphs: How can we reason about and search large graphs that arise in information and social networks? 
Prerequisites: CompSci 330 is a hard prerequisite! Some familiarity with Calculus, Linear Algebra, and Probability is required. If you have never seen a gradient, an expectation, or an eigenvalue before, you will be in deep trouble.

Office Hours:
Kamesh Munagala:         Wed 11-11:45am,  LSRC D205
Akbota Anuarbek (TA): Thu 11 am - noon, LSRC D215

Grading:  Five homeworks (50%), Mid-term (20%), Final (30%).

Some of the homework will involve programming assignments. I will choose the best 5 out of 6 homeworks for computing the grade.


Reading Material:  I will mainly use material from these sources. Please do not buy any of these books for now. I have put pointers to easy-to-digest notes.

[MU]   Probability and Computing by M. Mitzenmacher and E. Upfal. Chapters 4, 7, 9, and 10.
[RV]    CS 168: The Modern Algorithmic Toolbox by Tim Roughgarden and Greg Valiant.

Lecture Schedule:

Lecture NumberDateTopicNotes
Readings
Data Compression and Modern Hashing
1
Jan. 12 Course organization and logistics
Data Compression and Huffman Coding


Lecture notes

Chapter 9 in [MU]
Guy Blelloch's notes from CMU

2Jan. 17
Entropy and Kraft's inequality
Move to Front Coding
3
Jan. 19
Universal Hashing
Miltersen's Notes

4

Jan. 24
Compression via Hashing
Bloom filter
Survey article

Chapter 5.5.3 in [MU]
5
Jan. 31
Count Min Sketch
Lecture notes from [RV]

6
Feb. 2
Nearest Neighbors (NN) Problem
Similarity measures, K-d Trees


Lecture notes
Chapter 5.2 in this book
7
Feb. 5 Curse of dimensionality
Projections onto random coordinates
Chapter 4 in [MU]
Wikipedia article defining CLT
8
Feb. 7 Approximate NN:
Projections via random vectors

Lecture notes
Lecture notes from [RV]

9
Feb. 9 Central Limit Theorem
Dimensionality reduction for Euclidean Spaces
Tutorial with simplest proof of JL Lemma
10
Feb. 14
Approximate NN: Locality Sensitive Hashing
Estimators for Hamming, Jaccard, Cosine measures
Lecture notes Paper1 Paper2
CACM Survey Article
Algebraic Methods
11
Feb. 16 Eigenvalues and Eigenvectors
Lecture notes

12 Feb. 21 Low Rank Approximations



Lecture Notes
Lecture notes from [RV]
Shalizi's chapter
13 Feb. 23 Principal Components Analysis (PCA)
Lecture notes from [RV]
Tomasi's notes

14 Feb. 28 Singular Value Decomposition (SVD)

15 Mar. 2 Applications of Linear Algebra
Least Squares via SVD
Graph Laplacian


Lecture Notes

Tutorial
Lecture notes from [RV]
16Mar. 7Spectral Partitioning
17
Mar. 9
In-class Midterm  (till Lecture 14)


SPRING BREAK
18
Mar. 21
Random walks and Graph centrality Lecture Notes Chapter 2 in Matt Jackson's book.
A simple description of PageRank
19 Mar. 23 Other Linear Bases
Fourier Transform

Slides

Lecture notes from [RV]
20
Mar. 28 FFT Algorithm
Linear Programming and Network Flows
21
Mar. 30
Linear Programming
Modeling problems as linear programs




Chapter 7 in this book



Michel Goemans' Lecture notes
22
Apr. 4
CLASS CANCELED

23
Apr. 6
Duality: Taking the dual


24
Apr .11
CLASS CANCELED
25
Apr. 13
Zero-sum games
Simplex Algorithm
26
Apr. 18
Maximum Flow Problem
Augmenting paths

Slides


27
Apr. 20
Maxflow-mincut theorem
28Apr. 25
Applications of max flow and min cut
Slides


May 2
FINAL EXAM: 7 - 10PM