## Statistical Learning

#### Leiden University, Autumn 2013, 2nd year of the Master in Statistical Science

You are reading the homepage http://www.math.leidenuniv.nl/~avdvaart/statlearn/index.html of this course.
This year's course was adapted from earlier courses by Prof. Dr. Peter Grünwald, whose permission to use his materials we gratefully acknowledge.

#### Homework

Both homework assignments involve setting up some experiments in R, experimenting, and writing a short report on the results. Discussing the problems within the group is encouraged, but every participant must run their own experiments and write their own report. Hand in your solution electronically as a pdf file. It should be a self-contained, readable report that includes pictures, tables, computer output, explanations, interpretations, and answers in the text.
1. The first homework assignment, on regression, can be found on Blackboard from November 18 and is due December 8. Email the solution as a single .pdf file to the lecturer (or use Blackboard).
2. The second homework assignment, on classification, can be found on Blackboard from December 9 and is due January 1.

#### Preliminary Course Schedule

The following schedule is preliminary and may not be kept up to date here; late changes will be announced in the Blackboard environment instead. Make sure you are enrolled and check the Blackboard pages!

If a section is included without mention of subsections (e.g. Section 2.1), this means that all subsections of that section are covered (i.e. you should also be familiar with Sections 2.1.1, 2.1.2, etc.).

1. November 4: Introduction, Regression Part I
• General introduction: statistical learning, supervised learning, regression and classification, overfitting, linear classifiers, nearest-neighbor classification, expected prediction error and Bayes-optimal prediction rule.
• Literature: Chapter 1 and Sections 2.1-2.5, 2.6.1, 2.6.2, 2.9 of "The Elements of Statistical Learning".
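The nearest-neighbor classifier from this first lecture can be sketched in a few lines. The homework itself uses R; the following is a minimal Python/NumPy illustration of 1-nearest-neighbor prediction (the data and function names are made up for this sketch, not taken from the course materials):

```python
import numpy as np

def one_nn_predict(X_train, y_train, X_new):
    """Predict the label of each row of X_new by copying the label
    of its nearest training point (Euclidean distance)."""
    # pairwise squared distances between new points and training points
    d = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    return y_train[d.argmin(axis=1)]

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
y_train = np.array([0, 0, 1])
# a point near the origin gets label 0; a point near (5, 5) gets label 1
preds = one_nn_predict(X_train, y_train, np.array([[0.2, 0.1], [4.5, 5.5]]))
```

Replacing `argmin` by the majority vote over the k smallest distances gives the general k-nearest-neighbor rule of Section 2.3.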
2. November 11: no lecture; self-study of Sections 2.6-2.8 and linear algebra (see Blackboard); read Sections 3.1-3.2.
3. November 18: Regression, Part II
• Linear regression: basics, incorporating nonlinearities by extending the features; least squares, the interpretation of least squares as an orthogonal projection, maximum likelihood. Bias-variance decomposition for squared error loss. Three possible goals in learning; two views on the notions of "model" and "fitting".
• Model selection and overfitting: subset selection, cross-validation, shrinkage methods (ridge regression and lasso).
• Literature: Sections 3.1-3.5 (except 3.2.4 and 3.4.4), 5.1.
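Ridge regression, one of the shrinkage methods in this lecture, has a closed form that is easy to experiment with. A minimal NumPy sketch (again, the homework uses R; the simulated data here is purely illustrative):

```python
import numpy as np

def ridge(X, y, lam):
    # closed-form ridge estimator: (X'X + lam * I)^{-1} X'y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=50)

beta_ols = ridge(X, y, 0.0)     # lam = 0 recovers ordinary least squares
beta_ridge = ridge(X, y, 10.0)  # larger lam shrinks coefficients toward 0
```

Computing the estimator on a grid of `lam` values and plotting the coefficients reproduces the shrinkage paths discussed in Section 3.4.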
4. November 25: Regression Part III and Classification Part I
• Regression: Bayes MAP interpretation of Ridge Regression and Lasso, LARS.
• Classification: Bayes classifier, (problems with least squares for classification), Linear Discriminant Analysis (LDA).
• Literature: Sections (3.4.4), 4.1, (4.2), 4.3, 4.4 (4.4.1 until (4.23) and without 4.4.3), 4.5 (without 4.5.1 and everything past (4.46)).
5. December 2: Classification Part II
• Naive Bayes classifier; Naive Bayes and Logistic Regression; Optimal Separating Hyperplanes; Support Vector Machines; the Kernel Trick; SVM learning as regularized hinge loss fitting.
• Literature: Sections 4.5.2, 6.6.3, 12.2, 12.3.1, 12.3.2.
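The view of SVM learning as regularized hinge-loss fitting can be made concrete with plain (sub)gradient descent. A sketch under simplifying assumptions (linear kernel, full-batch updates, made-up step size and regularization constant):

```python
import numpy as np

def svm_hinge(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y*(Xw + b))) for y in {-1, +1}."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(epochs):
        margin = y * (X @ w + b)
        active = margin < 1  # points on the wrong side of the margin
        # subgradient of the regularized hinge loss
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

X = np.array([[2.0], [3.0], [-2.0], [-3.0]])
y = np.array([1, 1, -1, -1])
w, b = svm_hinge(X, y)  # a separable toy problem in one dimension
```

Replacing the inner products by a kernel function is exactly the kernel trick of Chapter 12, but the primal form above is the easiest one to experiment with.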
6. December 9: Classification Part III
• Classification and regression trees (CART), bagging, random forests, (MARS), boosting (AdaBoost), boosting as forward stagewise additive modeling.
• Literature: Sections 9.2, 8.4, 15.1-15.2, (9.4), 10.1-10.5, 10.6 (only the part about classification).
7. December 16: Model Assessment and selection (lecturer: Johannes Schmidt-Hieber)
• Bias, variance and model complexity, AIC, training and prediction error, Bayesian approach, Bayes factors and BIC, Model Averaging.
• Literature: Sections 7.1-7.4, 7.5, 7.7.
8. December 23: Unsupervised Learning
• Clustering: K-means, EM with Gaussian Mixtures.
• Literature: Sections 14.1, 14.3 before 14.3.1, 14.3.6, 14.3.7, 8.5.1.
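K-means from the last lecture alternates between an assignment step and an update step, which a few lines of NumPy make explicit. A hedged sketch with random initialization (a real experiment would use R and multiple restarts):

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assignment step: each point goes to its nearest center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        # update step: each center becomes the mean of its cluster
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
labels, centers = kmeans(X, 2)  # two well-separated blobs
```

Replacing the hard assignments by posterior cluster probabilities turns this loop into the EM algorithm for Gaussian mixtures covered in Section 14.3.7.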

Here you can find an example of examination questions.