You are reading the homepage of this course:
http://www.math.leidenuniv.nl/~avdvaart/statlearn/index.html.
Consult the study guide for a general course description.
This year's course was adapted from earlier courses by
Prof. Dr. Peter Grünwald, whose permission to use his materials
we gratefully acknowledge.
|Lecturer: || Aad van der Vaart, Leiden University, Mathematical Institute|
|Contact: ||Come see me in office R 2.23, or for simple questions, send email.|
|Course load: ||4 ECTS.|
|Dates: ||Lectures take place on the dates indicated below.|
|Hours:|| First session from 10.00 to 16.30; later sessions from 11.15 to 15.30. |
|Location:|| Room 409 of the Snellius Building, Niels Bohrweg 1, Leiden. |
|Examination:|| To pass the course you must obtain a sufficient grade (6 or higher) on
both of the following:
- The written open-book examination on January 8, 14-17 hours, or the resit on February 14.
- The homework projects: we hand out two homework assignments, possibly in parts (see below),
and the final homework grade is the average of the grades for the two assignments.
The final grade will be determined as the average of the homework grade and the grade
for the open-book examination. |
Literature
The Elements of Statistical Learning, 2nd edition, by Trevor Hastie,
Robert Tibshirani and Jerome Friedman, Springer-Verlag, 2009. The book
can be downloaded for free at the above link. Alternatively, an electronic
version is available as a pdf through the university library, where it is also possible
to order a paper copy through SpringerLink for 25 euros.
Homework
Both homework assignments involve setting up some experiments in R,
experimenting, and writing a short report about the results. Discussing
the problems in the group is encouraged, but every participant must do
his or her own experiments and write his or her own report.
Hand in your solution electronically as a pdf file. Make it a self-contained,
readable report, which includes pictures, tables, computer output, explanations,
interpretations, and answers in the text.
- The first homework assignment on regression can be found on blackboard from November 18 and
is due December 8. Email the solution as a single .pdf file to the lecturer (or use blackboard).
- The second homework assignment about classification can be found on blackboard
from December 9 and is due January 1.
Preliminary Course Schedule
The following preliminary schedule will not necessarily be kept up to date;
instead we shall use the blackboard environment for late changes.
Make sure you are enrolled and check the blackboard pages!
If a section is listed without mention of its subsections
(e.g. Section 2.1), all subsections of that section are
covered (i.e. you should also be familiar with
Sections 2.1.1, 2.1.2, etc.).
- November 4: Introduction, Regression Part I
- General introduction: statistical learning, supervised learning,
regression and classification, overfitting, linear classifiers,
nearest-neighbor classification, expected prediction error and
Bayes-optimal prediction rule.
- Literature: Chapter 1 and Sections 2.1-2.5, 2.6.1, 2.6.2, 2.9
of "The Elements of Statistical Learning".
- November 11: no lecture; self-study of Sections 2.6-2.8 and linear algebra (see blackboard).
- November 18: Regression, Part II
- Linear regression:
basics, incorporating nonlinearities by extending the features; least
squares, interpretation of least squares as orthogonal projection,
maximum likelihood. Bias-variance decomposition for squared error
loss. Three possible goals in learning; two views on the notions of
"model" and "fitting".
- Model selection and overfitting: subset selection, cross-validation,
shrinkage methods (ridge regression and the lasso).
- Literature: Sections 3.1-3.5 (except 3.2.4 and 3.4.4), 5.1.
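The shrinkage idea above can be made concrete in a few lines. Below is a minimal Python sketch (the homework uses R; the data are simulated for illustration) of the ridge estimate (X'X + lambda*I)^{-1} X'y, where lambda = 0 recovers ordinary least squares:

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate (X'X + lam*I)^{-1} X'y; lam=0 gives least squares.
       The intercept is omitted for brevity."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Simulated data with known coefficients
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta = np.array([2.0, -1.0, 0.5])
y = X @ beta + 0.1 * rng.normal(size=50)

b_ls = ridge(X, y, 0.0)     # ordinary least squares
b_r = ridge(X, y, 10.0)     # shrunk towards zero
print(np.linalg.norm(b_r) < np.linalg.norm(b_ls))  # ridge shrinks: True
```

Increasing lambda trades a little bias for a reduction in variance, which is exactly the bias-variance decomposition at work.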
- November 25: Regression Part III and Classification Part I
- Regression: Bayes MAP interpretation of Ridge Regression and Lasso, LARS.
- Classification: Bayes classifier.
(problems with least squares for classification), Linear Discriminant Analysis (LDA)
- Literature: Sections (3.4.4), 4.1, (4.2), 4.3, 4.4 (4.4.1 until (4.23) and without 4.4.3),
4.5 (without 4.5.1 and everything past (4.46)).
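LDA (Section 4.3) reduces to computing class means, class priors, and a pooled within-class covariance, then comparing linear discriminant scores. The following is an illustrative Python sketch on simulated two-class data (the homework uses R); it is not course code:

```python
import numpy as np

def lda_fit(X, y):
    """LDA: class means, priors, and pooled within-class covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([(y == c).mean() for c in classes])
    n = len(y)
    S = sum((X[y == c] - m).T @ (X[y == c] - m)
            for c, m in zip(classes, means)) / (n - len(classes))
    return classes, means, priors, np.linalg.inv(S)

def lda_predict(model, x):
    """Pick the class with the largest linear discriminant score."""
    classes, means, priors, Sinv = model
    scores = [x @ Sinv @ m - 0.5 * m @ Sinv @ m + np.log(pi)
              for m, pi in zip(means, priors)]
    return classes[int(np.argmax(scores))]

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=[0, 0], size=(30, 2)),
               rng.normal(loc=[3, 3], size=(30, 2))])
y = np.array([0] * 30 + [1] * 30)
model = lda_fit(X, y)
print(lda_predict(model, np.array([0.1, -0.2])))  # near class 0 mean -> 0
```

Because both classes share the same covariance, the discriminant scores are linear in x, giving a linear decision boundary.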
- December 2: Classification Part II
- Naive Bayes classifier; Naive Bayes and Logistic Regression;
Optimal Separating Hyperplanes; Support Vector Machines; the Kernel Trick;
SVM learning as regularized hinge loss fitting.
- Literature: Sections 4.5.2, 6.6.3, 12.2, 12.3.1, 12.3.2.
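The view of SVM learning as regularized hinge-loss fitting (Section 12.3.2) suggests a direct implementation by subgradient descent. The sketch below is in Python for illustration (the homework uses R, and the toy data are made up); it minimizes the average hinge loss plus a ridge penalty on w:

```python
import numpy as np

def svm_sgd(X, y, lam=0.01, epochs=200, lr=0.1):
    """Linear SVM by subgradient descent on the regularized hinge loss
       (1/n) * sum max(0, 1 - y_i*(w.x_i + b)) + (lam/2)*||w||^2.
       Labels y must be in {-1, +1}."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1                    # points inside the margin
        grad_w = lam * w - (y[mask, None] * X[mask]).sum(axis=0) / n
        grad_b = -y[mask].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Linearly separable toy data
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, -1, -1])
w, b = svm_sgd(X, y)
print(np.sign(X @ w + b))  # should recover the labels [1, 1, -1, -1]
```

Only points with margin below 1 contribute to the gradient, mirroring the fact that the SVM solution depends only on the support vectors; the kernel trick replaces the inner products x_i.x_j in the dual with kernel evaluations.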
- December 9: Classification Part III
- Classification and regression trees (CART), bagging, random forests, (MARS),
boosting (AdaBoost), boosting as forward stagewise additive modeling.
- Literature: Sections 9.2, 8.4, 15.1-15.2, (9.4), 10.1-10.5, 10.6
(only the part about classification).
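AdaBoost (Algorithm 10.1 in the book) is short enough to write out with decision stumps as base learners. The following hypothetical Python example (the homework uses R) fits a 1-D label pattern that no single stump can classify correctly:

```python
import numpy as np

def adaboost_stumps(x, y, rounds=10):
    """AdaBoost with 1-D decision stumps; labels in {-1, +1}.
       Returns a list of (threshold, sign, alpha) weak learners."""
    n = len(x)
    w = np.full(n, 1.0 / n)
    model = []
    thresholds = np.concatenate([x - 0.5, x + 0.5])
    for _ in range(rounds):
        best = None
        for t in thresholds:
            for s in (1, -1):                  # predict s where x > t
                pred = np.where(x > t, s, -s)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, t, s, pred)
        err, t, s, pred = best
        err = max(err, 1e-12)                  # guard against log(1/0)
        alpha = np.log((1 - err) / err)        # weight of this weak learner
        model.append((t, s, alpha))
        w *= np.exp(alpha * (pred != y))       # up-weight the mistakes
        w /= w.sum()
    return model

def predict(model, x):
    """Sign of the weighted vote of all stumps."""
    agg = sum(alpha * np.where(x > t, s, -s) for t, s, alpha in model)
    return np.sign(agg)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([-1, -1, 1, 1, -1, -1])           # no single stump fits this
model = adaboost_stumps(x, y)
print((predict(model, x) == y).all())
```

Reweighting forces successive stumps to concentrate on the points the current committee misclassifies; read as forward stagewise additive modeling, each round adds the stump that most reduces exponential loss.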
- December 16: Model Assessment and selection (lecturer: Johannes Schmidt-Hieber)
- Bias, variance and model complexity, AIC, training and prediction error, Bayesian approach,
Bayes factors and BIC, Model Averaging.
- Literature: Sections 7.1-7.4, 7.5, 7.7.
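For a Gaussian regression model, AIC and BIC differ only in the complexity penalty: 2d versus d*log(n) for d parameters. A small illustrative Python computation (simulated data; for simplicity d counts only the regression coefficients):

```python
import numpy as np

def gaussian_ic(y, yhat, d):
    """AIC and BIC with -2*loglik = n*log(RSS/n), up to an additive constant."""
    n = len(y)
    m2ll = n * np.log(np.sum((y - yhat) ** 2) / n)
    return m2ll + 2 * d, m2ll + np.log(n) * d   # (AIC, BIC)

rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 100)
y = 1 + 2 * x ** 2 + 0.3 * rng.normal(size=100)  # true curve is quadratic

results = {}
for deg in (1, 2, 8):                            # under-fit, correct, over-fit
    coef = np.polyfit(x, y, deg)
    results[deg] = gaussian_ic(y, np.polyval(coef, x), deg + 1)
    print(deg, results[deg])
```

With n = 100, BIC's penalty per parameter is log(100) which is about 4.6, more than AIC's 2, so BIC punishes the spurious coefficients of the degree-8 fit more heavily and favors the quadratic.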
- December 23: Unsupervised Learning
- Clustering: K-means, EM with Gaussian Mixtures.
- Literature: Sections 14.1, 14.3 before 14.3.1, 14.3.6, 14.3.7, 8.5.1.
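K-means alternates two steps: assign each point to its nearest center, then move each center to the mean of its assigned points. A minimal Python sketch with toy data and a deterministic farthest-point initialization (an assumption for reproducibility; the homework itself uses R):

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Plain K-means (Lloyd's algorithm) with greedy farthest-point init."""
    centers = [X[0]]
    for _ in range(k - 1):                      # pick points far from chosen centers
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        # Assignment step: nearest center for every point
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center becomes the mean of its cluster
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(loc=[0, 0], scale=0.3, size=(40, 2)),
               rng.normal(loc=[4, 4], scale=0.3, size=(40, 2))])
centers, labels = kmeans(X, 2)
# The two recovered centers should sit near (0, 0) and (4, 4)
print(np.sort(centers[:, 0]).round(1))
```

Each iteration can only decrease the within-cluster sum of squares, which is why Lloyd's algorithm converges; EM with Gaussian mixtures replaces the hard assignment step with soft responsibilities.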
Here you can find an example of examination questions.