STAT 681: Introduction to Statistical Learning, Fall 2018
Instructor: Haiming Zhou.
Office Hours: MW 2-3pm, and by appointment.
Office: 359E DuSable Hall, (331) 256-7793.
Class Meetings: MW 12:30pm - 1:45pm, DuSable Hall 274.
Textbook: An Introduction to Statistical Learning with Applications in R, by James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013).
(Note: Please submit your work as a single PDF file on Blackboard.)
- Homework 1: Due on September 16. Solution.
- Homework 2: Chapter 4: 3, 7, 10; Chapter 5: 2, 8, 9; and the additional problem. Due on October 7. A very helpful website: R Code for Comparing Decision Boundaries of Different Classifiers. Solution.
- Homework 3: Due on October 28. Solution.
- Homework 4: Problems 2, 5, 7, 8, 10 in Chapter 8. Due on November 11.
- Homework 5: Problems 1, 2, 3, 7 in Chapter 9. Due on December 9.
- Exam I will be given in class on Wednesday (10/10/2018). It covers Chapters 2-4. Books, notes, and devices with internet capability are not allowed. A calculator is allowed, but all of its memory must be cleared before the exam. A formula sheet and scratch paper will be provided during the exam. Here is a set of Practice Problems. Solution.
- Exam II will be given in class on Wednesday (11/14/2018). It covers Chapters 5-8. Books, notes, and devices with internet capability are not allowed. A calculator is allowed, but all of its memory must be cleared before the exam. A formula sheet and scratch paper will be provided during the exam. Here is a set of Practice Problems.
- Applied Data Mining and Statistical Learning: Very good online lecture notes from Pennsylvania State University.
- caret: An R Package for Classification and Regression Training
- scikit-learn: Machine Learning in Python
- ISLR-python: This repository contains Python code for a selection of tables, figures and LAB sections from the textbook
- What is the Difference Between Test and Validation Datasets? (link)
- Big Idea To Avoid Overfitting: Reusable Holdout to Preserve Validity in Adaptive Data Analysis. (link)
- 21 Must-Know Data Science Interview Questions and Answers: part 1 and part 2
- Evaluating Logistic Regression Models. (link)
- More About Ridge Regression. (link 1, link 2)
- Regularization and Variable Selection via the Elastic Net. (link)
- Should we use lasso estimates or OLS estimates on the lasso-identified subset of variables? A discussion.