Syllabus.

Instructor: Haiming Zhou.

Email: zhouh@niu.edu.

Office Hours: MW 2-3pm, and by appointment.

Office: 359E DuSable Hall, (331) 256-7793.

Class Meetings: MW 12:30pm - 1:45pm, DuSable Hall 274.

Textbook: *An Introduction to Statistical Learning with Applications in R*, by James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013).

- Chapter 2: Introduction to Statistical Learning. (*R code*)
- Chapter 3: Linear Regression. (*R code*). A simulation example for checking the effect of collinearity on prediction.
- Chapter 4: Logistic Regression | LDA | KNN. (*R code*)
- Chapter 5: Resampling Methods. (*R code*)
- Chapter 6: Linear Model Selection and Regularization. (*R code*). Another good set of lecture slides by Professor Rob Tibshirani. A very good book on the Lasso and its generalizations.
- Chapter 7: Splines | Kernel Smoothing | Generalized Additive Models. (*R code*). A good note on splines.
- Chapter 8: Tree-Based Methods | Random Forest | Boosting. (*R code*)
- Chapter 9: Support Vector Machines. (*R code*). An example of how to calculate the distance from a point to a plane. A review of vector projection.
- Chapter 10: Unsupervised Learning | PCA | Clustering.
- Chapter 11: Neural Networks. (*R code*). Slides and Python code from Akhil Pandey.

- Homework 1: Due on September 16. *Solution*.
- Homework 2: Chapter 4: 3, 7, 10; Chapter 5: 2, 8, 9; and the additional problem. Due on October 7. A very helpful website: R Code for Comparing Decision Boundaries of Different Classifiers. *Solution*.
- Homework 3: Due on October 28. *Solution*.
- Homework 4: Problems 2, 5, 7, 8, 10 in Chapter 8. Due on November 11.
- Homework 5: Problems 1, 2, 3, 7 in Chapter 9. Due on December 9.

- Exam I will be given on Wednesday (10/10/2018) in class. It covers Chapters 2-4. Books, notes, and devices with internet capability are not allowed. A calculator is allowed, but all memory in the calculator must be cleared before the exam. A formula sheet and scratch paper will be provided during the exam. Here is a set of Practice Problems. Solution.
- Exam II will be given on Wednesday (11/14/2018) in class. It covers Chapters 5-8. Books, notes, and devices with internet capability are not allowed. A calculator is allowed, but all memory in the calculator must be cleared before the exam. A formula sheet and scratch paper will be provided during the exam. Here is a set of Practice Problems.

- Applied Data Mining and Statistical Learning: Very good online lecture notes from Pennsylvania State University.
- caret: An R Package for Classification and Regression Training
- scikit-learn: Machine Learning in Python
- ISLR-python: This repository contains Python code for a selection of tables, figures and LAB sections from the textbook
- What is the Difference Between Test and Validation Datasets? (link)
- Big Idea To Avoid Overfitting: Reusable Holdout to Preserve Validity in Adaptive Data Analysis. (link)
- 21 Must-Know Data Science Interview Questions and Answers: part 1 and part 2
- Evaluating Logistic Regression Models. (link)
- More About Ridge Regression. link 1, link 2
- Regularization and Variable Selection via the Elastic Net. (link)
- Should we use lasso estimates or OLS estimates on the Lasso-identified subset of variables? A discussion.