Office Hours
Schedule
Note: The following schedule is a general outline that we plan to follow. Depending on the pace of the course, some topics may be explored in greater detail, while others might be adjusted or omitted. Assignments are currently planned to be released on Thursdays of the corresponding week, though this is subject to change.
-
Week 1 - 8/26: Introduction
Auxiliary: L.A. Refresher P.S. Refresher
Notebooks: Intro to R Intro to Python
-
Week 2 - 9/02: Statistical Learning
Lectures: Statistical Learning 2 Linear Regression
-
Week 3 - 9/09: Linear Regression
-
Week 4 - 9/16: Classification
Lectures: Classification Classification
-
Week 5 - 9/23: Classification & Homework 1
Lectures: Classification Review session, section 3.2 in ESL and section 4.5.1 in ISLR
-
Week 6 - 9/30: Resampling & Midterm 1
Lectures: Resampling
Midterm: Midterm 1 Solutions
-
Week 7 - 10/07: Resampling Methods & Model Selection
Lectures: Resampling Model Selection
-
Week 8 - 10/14: Linear Model Selection and Regularization (LMSR)
-
Week 9 - 10/21: LMSR
Lectures: Model Selection
Review: Review Session
-
Week 10 - 10/28: LMSR & Midterm 2
Lectures: Model Selection
Midterm: Midterm 2 Solutions
-
Week 11 - 11/04: Nonlinear Models & Tree-Based Methods
Lectures: Nonlinear Models Tree-Based Methods
-
Week 12 - 11/11: Tree-Based Methods & Homework 3
-
Week 13 - 11/18: Unsupervised Learning & Midterm 3
Lectures: Unsupervised Learning
Midterm: Midterm 3 Solutions
-
Week 14 - 11/25: SVM & Thanksgiving
Lectures: Support Vector Machine
Homework: Homework 4 (Due: 12/15)
-
Week 15 - 12/02: Deep Learning
Recommended Books & Resources
These recommendations are for students who want to dig deeper into the theoretical justification of the concepts presented in my course. I highly encourage you to have a look at them.
Theoretical Statistics
-
Topics for a Core Course by Robert W. Keener
I recommend chapter 14.
High Dimensional Statistics
-
HDS Lecture Notes by Philippe Rigollet and Jan-Christian Hütter
I recommend sections 2.3 and 2.4.
-
Introduction to HDS by Christophe Giraud
I recommend section 5.2.
-
Mathematical Statistics: A Non-Asymptotic Approach by Alexander Rakhlin
I recommend chapter 6.
Convex Optimization
-
Convex Optimization by Stephen Boyd and Lieven Vandenberghe
I recommend chapters 1, 2, 3, 4 and 5.
Description
This course provides an introduction to the fundamental concepts and techniques in statistical learning and machine learning, with a focus on understanding the theoretical underpinnings of various machine learning algorithms and their implementation in R (and tentatively in Python).
Objectives
By the end of this course, students will be able to:
- Explain the concepts of regression, classification, and clustering, and apply them to real-world problems.
- Implement machine learning algorithms in R (and tentatively in Python).
- Evaluate and compare the performance of different machine learning models.
- Understand the trade-offs involved in model selection and regularization.
Lectures
- The lectures for this course will be held on Tuesdays and Thursdays from 11:40am to 12:55pm in Phillips Hall, room 101.
Prerequisites
- Prerequisites: CS 1112, MATH 2220, STSCI 3200, STSCI 3080 or MATH 4710 or equivalents. Students must have a good command of basic statistics, probability, linear algebra and calculus. Proficiency in R or Python programming or willingness to learn is required.
- Required textbook (R): An Introduction to Statistical Learning with Applications in R, 2nd Edition, Springer, 2021, by G. James, D. Witten, T. Hastie, and R. Tibshirani (ISLR). Download
- Required textbook (Python): An Introduction to Statistical Learning with Applications in Python, Springer, 2023, by G. James, D. Witten, T. Hastie, R. Tibshirani, and J. Taylor (ISLP). Download
- Supplementary textbook: The Elements of Statistical Learning, 2nd Edition, Springer, 2017, by T. Hastie, R. Tibshirani, and J. Friedman (ESL). Download
Materials
The materials for this class will be uploaded on this page. It is entirely your responsibility to download them as needed. A brief description of these materials follows.
- The syllabus should be used as a reference throughout the year for important dates, including exams, and for course policies.
- Lecture notes will be posted on this website, in the Schedule section, as the semester progresses. While these notes form the foundation of my lectures, additional insights and details will be provided during class.
- The lecture notes are designed to complement, not replace, the textbook. Their purpose is to guide you through new material more easily. It is your responsibility to thoroughly read both the notes and the corresponding textbook chapters. Ensure you identify the relevant sections and subsections in the textbook that align with the lecture notes. This constitutes your reading assignment for the semester.
Grading Policy
Your grade in this class will be based on homeworks, midterms, and a final project, as detailed below.
-
Homeworks (20%)
You will receive four assignments counting towards 20% of the grade. The lowest homework score will be dropped, with the remaining three assignments weighted equally.
Students in 5740 may encounter one or two additional questions per assignment. These questions are required for 5740 and optional for 3740, where they count as bonus points.
Late homework submissions will incur a 20% penalty if submitted within 24 hours past the deadline; submissions beyond that will not be accepted. Solutions will be posted on the course website two days after the submission window closes.
-
Midterms (50%)
There will be three in-class tests, each during regular lecture times, collectively accounting for 50% of your final grade. The lowest midterm score will be dropped. For 5740, the two remaining midterms will carry equal weight. For 3740, the lower of the two remaining scores will be weighted 1/3 and the higher will be weighted 2/3.
Each test will cover the material discussed in class up to the exam date, including problems solved in lectures and all the homeworks due before the exam. I will provide an overview of the exam in class and post a detailed outline of the required materials before each test.
All exams are closed-book, and the use of any electronic devices is strictly prohibited. This includes computers, calculators, cellphones, and other electronic gadgets.
Students with approved extended time: please see the section on accommodations below.
-
Final Project (30%)
The final project for this course will be a take-home data analysis assignment, designed to be completed at the end of the semester. The project will require students to work in groups, and the datasets along with specific questions for analysis will be distributed around October 20th. Students are expected to form groups of 3 to 4 members. These groups should be finalized and approved by the instructor no later than November 1st. Any student who has not joined a group by this deadline will be assigned to a group by the instructor.
The final report, which documents the results of your analysis, must be submitted as a PDF file by December 23rd. If the report is submitted late, a 20% penalty will be applied if it is received within 24 hours after the deadline; reports submitted after this period will not be accepted. The report should be no longer than 8 pages, formatted in a standard style with a font size of 12.
All data analysis must be conducted using R or Python, and the scripts used in your analysis must be submitted alongside the report, although these scripts will not count towards the 8-page limit. Your project will be graded based on the effective application of the appropriate methods, the clarity and organization of the report, the accuracy of the interpretations, and the reproducibility of your analysis using the provided scripts.
There is no curving of grades in this class. Your final grade will be based entirely on your performance.
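As an illustration only, the weighting described above can be sketched in Python. This is not an official grade calculator: the function names are hypothetical, and the sketch assumes every score is on a 0-100 scale and already reflects any late penalty or bonus points.

```python
from typing import List, Sequence


def drop_lowest(scores: Sequence[float]) -> List[float]:
    """Return the scores in ascending order with the single lowest removed."""
    return sorted(scores)[1:]


def homework_component(homeworks: Sequence[float]) -> float:
    """Equal-weight average of the three best of four homework scores."""
    if len(homeworks) != 4:
        raise ValueError("expected four homework scores")
    kept = drop_lowest(homeworks)
    return sum(kept) / len(kept)


def midterm_component(midterms: Sequence[float], course: str) -> float:
    """Combine the two best of three midterm scores.

    5740: the two remaining scores are weighted equally.
    3740: the lower remaining score counts 1/3, the higher counts 2/3.
    """
    if len(midterms) != 3:
        raise ValueError("expected three midterm scores")
    low, high = drop_lowest(midterms)
    if course == "5740":
        return (low + high) / 2
    if course == "3740":
        return low / 3 + 2 * high / 3
    raise ValueError("course must be '3740' or '5740'")


def final_grade(
    homeworks: Sequence[float],
    midterms: Sequence[float],
    project: float,
    course: str,
) -> float:
    """Weighted total: homeworks 20%, midterms 50%, final project 30%."""
    return (
        0.20 * homework_component(homeworks)
        + 0.50 * midterm_component(midterms, course)
        + 0.30 * project
    )


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    print(final_grade([80, 90, 100, 60], [70, 85, 100], 90, course="3740"))
```

With the same scores, a 3740 student scores at least as high as a 5740 student on the midterm component, because the better of the two remaining midterms carries more weight.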
Students with Disabilities
- Students with disabilities are encouraged to engage fully in this course, and your access needs are a priority. To ensure that your approved accommodations are arranged in a timely manner, you must request your accommodation letter via the SDS Student Portal by August 31st.
- For students who are already registered with the Student Disability Services (SDS), please note that once you request your accommodation letter, it may take up to 48 hours for the letter to be processed and sent to me. If you are not yet registered with SDS, be aware that the process to register and receive new accommodations can take up to three weeks. Once approved, you will be able to request your accommodation letter for this course.
- If you are approved for accommodations later in the semester, it is important that you request your accommodation letter as soon as possible to avoid any delays in receiving the necessary support.
Students with Exam Accommodations
- Regarding exam accommodations, this course is participating in the Alternative Testing Program (ATP). All exams will be centrally managed by the ATP, and relevant information will be communicated through SDS-testing@cornell.edu and your SDS Student Portal. It is important to stay informed by reading these communications and visiting sds.cornell.edu/atp for additional details about the ATP process.
- Starting in Fall 2023, students no longer need to request each individual exam. However, if you have an academic conflict with a scheduled exam time, you must submit an "exam request form" in the SDS Student Portal. All requests for conflict exams must be submitted no later than 10 business days prior to the exam date, and conflict exams will be scheduled at standard times.
- For all relevant information and to manage your accommodations, please visit the SDS Student Portal at sds.cornell.edu.
Academic Integrity
- Course materials provided in this class are the intellectual property of the instructor. Students are strictly prohibited from buying, selling, or distributing any course materials without the express permission of the instructor. Engaging in such unauthorized activities is considered academic misconduct and will be treated accordingly.
- Every student in this course is expected to adhere to the Cornell University Code of Academic Integrity. All work submitted for academic credit must be the student's own original work. The use of AI resources, including tools like ChatGPT, is strictly prohibited in this class.
Wellness Resources
The material provided below has been thoughtfully compiled by students from the Body Positive Cornell organization. It offers a well-researched and comprehensive list of well-being resources available on campus. For detailed information and guidance, please refer to the following resources: