High Dimensional Statistical Learning (HDL)
Description
This module provides a detailed overview of the mathematical foundations of modern statistical learning, describing the theoretical basis and the conceptual tools needed to analyze and justify learning algorithms. The emphasis is on problems involving large volumes of high-dimensional data, and on the dimension-reduction techniques that make it possible to tackle them. The course includes detailed proofs of the main results, together with associated exercises.
Keywords
PAC (probably approximately correct), random projection, PCA (principal component analysis), concentration inequalities, measures of statistical complexity
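To give a flavor of how these keywords fit together, here is a standard textbook statement (general background, not wording taken from this course): Hoeffding's concentration inequality for i.i.d. random variables X_1, ..., X_m with values in [0,1],

    \Pr\left[ \left| \frac{1}{m}\sum_{i=1}^{m} X_i - \mathbb{E}[X_1] \right| \ge \epsilon \right] \le 2 e^{-2 m \epsilon^2},

directly yields the classical agnostic PAC sample-complexity bound for a finite hypothesis class \mathcal{H} (see the Shalev-Shwartz & Ben-David book referenced below):

    m_{\mathcal{H}}(\epsilon, \delta) \le \left\lceil \frac{2 \ln\left( 2 |\mathcal{H}| / \delta \right)}{\epsilon^2} \right\rceil .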
Prerequisites
The prerequisites for this course include previous coursework in linear algebra, multivariate calculus, basic probability (continuous and discrete), and statistics.
Previous coursework in convex analysis, information theory, and optimization theory would be helpful but is not required. Students are expected to be able to follow a rigorous proof.
Content
- The PAC framework (probably approximately correct) for statistical learning
- Measuring the complexity of a statistical learning problem
- Dimension reduction (a minimal illustration follows this list)
- Sparsity and convex optimization for large-scale learning (time permitting)
- Notion of algorithmic stability (time permitting)
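As a minimal illustration of the dimension-reduction item above, the sketch below contrasts PCA with a Gaussian random projection in plain NumPy. The data, the target dimension k, and all names are illustrative assumptions, not material from the course.

    # Minimal sketch: PCA vs. Gaussian random projection (NumPy only).
    # Data, dimensions, and names are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 1000))   # 500 points in dimension d = 1000
    k = 20                             # target dimension (assumed)

    # PCA: project the centered data onto its top-k right singular vectors.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    X_pca = Xc @ Vt[:k].T

    # Random projection (Johnson-Lindenstrauss flavor): a Gaussian matrix
    # scaled by 1/sqrt(k) approximately preserves pairwise distances.
    P = rng.normal(size=(1000, k)) / np.sqrt(k)
    X_rp = X @ P

    # Sanity check on one pair of points: distances before/after projection.
    d_orig = np.linalg.norm(X[0] - X[1])
    d_proj = np.linalg.norm(X_rp[0] - X_rp[1])
    print(f"original distance {d_orig:.2f}, projected distance {d_proj:.2f}")

PCA picks directions adapted to the data, while the random projection is data-oblivious; concentration inequalities are what make the latter's distance preservation provable.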
Acquired skills
- Understanding the links between complexity and overfitting
- Knowing the mathematical tools to measure learning complexity
- Understanding the statistical and algorithmic challenges of large-scale learning
- Understanding dimension reduction tools for learning
Teachers
Rémi Gribonval (in charge until 2019), Aline Roumy (currently in charge; see the new web page of the course)
Course schedule (2018-2019): see detailed times and rooms on ENT (click ISTIC>M2 SIF)
- 20-21/11, 28/11, 4/12, (5/12), 11-12/12: Rémi Gribonval
- 19/12: Oral exam (chapter presentation)
- 8-9/01, 15-16/01: Aline Roumy
- 22/01: Written exam
Evaluation modalities (details to come)
- Chapter presentation: oral evaluation on 19/12/2018. Each group of students will present the content of one chapter from the book by Shai Shalev-Shwartz & Shai Ben-David linked below:
- chapter 19 Nearest Neighbor; chapter 14 Stochastic Gradient Descent; chapter 20 Neural Networks; chapter 11 Model Selection and Validation; chapter 6 The VC-Dimension;
- or a more advanced chapter: chapter 26 Rademacher Complexities; chapter 29 Multiclass Learnability.
- Written exam on 22/01/2019
Some references
- A chapter of the forthcoming book by Martin Wainwright (on concentration inequalities)
- Book by Shai Shalev-Shwartz & Shai Ben-David, Understanding Machine Learning
- Book by Roman Vershynin, High-Dimensional Probability – An Introduction with Applications in Data Science