ADM - Analyse de Données et Modélisation
probabiliste
M. Sc. course - Data analysis and probabilistic
modeling
Making data speak: Advanced
probabilistic data analysis and modeling
Data, whatever their nature, are of very limited value unless one
can extract information from them to better synthesize,
understand, and predict. Statistical methods for data analysis and
probabilistic models for statistical machine learning are commonly
used to do so. This course teaches the basic techniques
for data analysis (exploratory statistics) and probabilistic
modeling (inferential statistics) and studies their application
to different types of data (symbolic data, language, numerical
data, signals, images, etc.). The lectures are organized
around the two major steps of any modeling process: understand
your data, then design an adequate model.
Keywords: data analysis, factor analysis, variance
analysis, clustering, hypothesis testing, decision theory,
estimation theory, Gaussian mixture models, EM algorithm, Markov
chains, Markov fields, hidden Markov chains, Viterbi algorithm,
Bayesian networks, token passing algorithm
Lectures, with the 2020-2021 dates
- Thu. Sep. 17 14h45. A gentle
reminder of the basics of probability: Kolmogorov,
random variables, moments, classical laws
- Fri. Sep. 18 11h. Exploratory
statistics: visualization, summaries, correlation,
factor analysis, PCA/LDA
- Thu. Sep. 24 14h45. Cluster
analysis: k-means, agglomerative/divisive clustering,
spectral clustering and other weird things
- Thu. Oct. 1 14h45. Fundamentals
of statistical machine learning and estimation theory: cost
function, decision theory, empirical estimation, estimation
theory, practical estimation techniques
- Fri. Oct. 2 11h. Mixture models:
mixture models, hidden variables, expectation-maximization
(EM) algorithm
- Thu. Oct. 8 14h45. Observable
and hidden Markov models: Markov property,
Markov chain, hidden Markov chain, Viterbi algorithm
- Fri. Oct. 9 11h. Hidden
Markov model (continued): Baum-Welch algorithm,
practical examples
- Thu. Oct. 15 11h. Entropy
and conditional random fields: maximum entropy
principle, maxent model, logistic regression, log-linear
sequence models, parameter estimation
- Thu. Oct. 22 14h45. Graphical
models and Bayesian networks: directed/undirected
graphical models, Bayesian networks, inference and reasoning,
moralization, variable elimination, junction tree algorithm
- Fri. Oct. 23 11h. Hypothesis
testing: typology, likelihood ratio test, classical
mean value tests, comparison and statistical significance,
variance analysis
- Fri. Nov. 13 14h45. Final exam - the exact
form of the exam isn't fully defined yet because
of the need to account for potential contact cases ("cas contacts")
who could not make it to the exam room.
Evaluation / exams
- Homework. Write a short comment on one of the
articles below. Maximum length is 1000 words, ca. 1.5-2 pages,
single column, 11 point font (English or French, as you wish).
Your report shall identify the techniques seen in the classroom,
explain why they are appropriate in the context of the paper
and what efforts the authors have made to cast their work into a
probabilistic framework, explain how the techniques were adapted
and/or extended, and discuss the limits you foresee (whether
mentioned in the paper or not). Deadline for mailing your comment:
before Nov. 2, 2020, 08:00 CET
- Final exam. Standard 2h written exam. You can check the
text of past exams below.