Welcome to MAT1841: Mathematics of Massive Data Analysis: Fundamentals and Applications - Fall 2021!

Course Description

This course will focus on understanding the structure of high-dimensional data and the mathematical tools we can use to characterize and reshape it for computational analysis. Several major threads will be woven throughout the course.

This is a survey course, and we intend to cover a range of techniques and applications cursorily. In no particular order, some of the specific topics we may cover include:

  1. High-dimensional space and concentration inequalities
  2. Johnson-Lindenstrauss Lemma and random projections
  3. Markov chains
  4. Hashing and (pseudo)randomness
  5. Probabilistic streaming and sketching algorithms
  6. Random graph theory and applications to percolation theory
  7. Wavelet bases
  8. Complexity and entropy
  9. Clustering and classification of data
  10. Nonlinear dimensionality reduction
  11. Computational topology

This course will assume background in linear algebra, probability, and algorithms. A few topics may make use of elementary results from group theory and algebraic topology.

Schedule

Wednesdays, Thursdays, and Fridays 3-4pm.
Wednesday lecture will be in Bahen BAB024.
Thursday and Friday lectures will be in Koffler House KP113.
The first lecture will be on Thursday, September 9.

Office Hours and Contact Info

Yun William Yu, Assistant Professor of Mathematics
Office Hours:
By appointment only via Zoom.
ywyu@math.toronto.edu

Quercus: https://q.utoronto.ca/courses/225117

Piazza: https://piazza.com/utoronto.ca/fall2021/mat1841

 

Syllabus

MAT1841_Syllabus.pdf
 

Reference texts and notes

The primary reference text is Foundations of Data Science by Blum, Hopcroft, and Kannan (2020), Cambridge University Press. Note that the full text is available from the University of Toronto libraries as an online downloadable resource. Partial earlier drafts of the textbook also exist elsewhere on the internet, but I recommend you use the published version from the library.

Lecture notes

Homework

There will be 10 homework assignments, due roughly weekly, which will be assigned and collected online. There will be no extensions to posted homework due dates. However, the lowest homework mark will be dropped.

The homework problems will be a mix of theory and implementation. I recommend using Python for implementation, but will accept R, Julia, or C/C++. If you wish to do an implementation in any other language or framework, please clear it with me beforehand. Notably, I will not be accepting MATLAB implementations in this course, unless you present a very compelling reason.

Note that solutions should be as short, clear, and concise as possible. I will be taking marks off for long, meandering solutions to otherwise short problems, even if all of the reasoning is technically correct. Brevity is the soul of wit.

Final Project



Final project deadline: Wednesday, Dec 8, 2021.

There will be a final project. You will either (A) design and implement a data analysis method for a problem of your choosing, (B) prove a new result about one of the methods we covered, or (C) perform a survey of a group of relevant academic articles.

You will submit a written report in the format of a conference proceedings or journal article and deliver a short presentation to your peers. Also, you may work with a partner on this project.

Icons made by xnimrodx from www.flaticon.com.