Instructors
Teaching Assistants
- Rishi Jha: Office hours Tuesday, 2-3pm, Bloomberg 338. Zoom link on Canvas (only open if announced on Slack).
- Tingwei Zhang: Office hours Monday, 1:45-2:45pm, Bloomberg 338. Zoom link on Canvas (only open if announced on Slack).
Please message us on Slack with course-related questions.
Course Schedule
| Date | Topic | Readings | Resources |
| --- | --- | --- | --- |
| Aug 25 | Course logistics. AI pipelines and threats. | - | Slides |
| Aug 27 | Survey: AI in 2025. | - | Slides, Puzzle 🧩 |
| Sep 3 | Definitions: AI security, safety, privacy, and trustworthiness. | Trustworthy AI (Wing, 2021); Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems (Dalrymple et al., 2024) | Slides |
| Sep 8 | Adversarial examples and adversarial robustness | Intriguing properties of neural networks (Szegedy et al., 2013) | Slides, Puzzle 🧩 |
| Sep 10 | Data poisoning | Required: Manipulating Machine Learning: Poisoning Attacks and Countermeasures for Regression Learning (Jagielski et al., 2018); Poisoning the Unlabeled Dataset of Semi-Supervised Learning (Carlini, 2021). Recommended: Poisoning Attacks against Support Vector Machines (Biggio et al., 2012); Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks (Shafahi et al., 2018); Poisoning Web-Scale Training Datasets is Practical (Carlini et al., 2023) | Slides |
| Sep 15 | Backdoor attacks | Required: BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain (Gu et al., 2017); You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion (Schuster et al., 2021); Poisoning Language Models During Instruction Tuning (Wan et al., 2023). Recommended: Black-Box Adversarial Attacks on LLM-Based Code Completion (Jenko et al., 2024) | - |
| Sep 17 | Membership inference | Required: Membership Inference Attacks against Machine Learning Models (Shokri et al., 2017); Membership Inference Attacks From First Principles (Carlini et al., 2021). Recommended: Do Membership Inference Attacks Work on Large Language Models? (Duan et al., 2024) | Slides |
| Sep 22 | Model stealing | Required: Stealing Machine Learning Models via Prediction APIs (Tramer et al., 2016); Imitation Attacks and Defenses for Black-box Machine Translation Systems (Wallace et al., 2020); Stealing Part of a Production Language Model (Carlini et al., 2024). Recommended: Adversarial Learning (Lowd and Meek, 2005); High Accuracy and High Fidelity Extraction of Neural Networks (Jagielski et al., 2019) | Slides, Puzzle 🧩 |
| Sep 24 | Model inversion | Required: Model Inversion Attacks That Exploit Confidence Information and Basic Countermeasures (Fredrikson et al., 2015); Text Embeddings Reveal (Almost) As Much As Text (Morris et al.). Recommended: Deep Leakage from Gradients | Slides |
| Sep 29 | Memorization 1 | The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks (Carlini et al., 2018); Quantifying Memorization Across Neural Language Models (Carlini et al., 2022) | Slides |
| Oct 1 | Explainability and interpretability (guest: Chandan Singh) | - | - |
| Oct 6 | Watermarking data and models | - | - |
| Oct 8 | Fairness and bias in AI (guest: Angelina Wang) | - | - |
| Oct 15 | Memorization 2 | - | - |
| Oct 20 | Security of AI agents (guest: Polina Kirichenko) | - | - |
| Oct 22 | Midterm | - | - |
| Oct 27 | Differentially private machine learning | - | - |
Course Overview and Learning Outcomes
This course covers the safety, security, privacy, alignment, and adversarial robustness of modern AI and ML technologies. Topics include threats and risks specific to these technologies, vulnerabilities and state-of-the-art defenses, and how to build and use trustworthy AI/ML systems.
Learning Outcomes
- Understand what it means for an AI/ML system to be safe, secure, and privacy-preserving
- Be able to describe the threats and risks faced by AI/ML systems, and technologies that are available to defend against these threats
- Be able to build adversarially robust AI/ML systems
Prerequisites
- Fluency in Python. Experience managing Python code across multiple files of a large codebase
- Background in machine learning. Experience training neural networks on simple tasks such as classification and regression
- Basic data visualization in Python, including data analysis with pandas and plotting with matplotlib (at roughly the level of the short sketch below)
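As a rough self-check for the last prerequisite, here is a minimal sketch of the kind of pandas/matplotlib workflow we assume you are comfortable with; the file name and column names are hypothetical:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load a (hypothetical) results file with one row per epoch of a training run.
df = pd.read_csv("results.csv")  # assumed columns: "epoch", "train_loss", "test_loss"

# Quick inspection and a simple aggregate.
print(df.head())
print(df.groupby("epoch")["test_loss"].mean())

# Plot training vs. test loss over epochs and save the figure.
fig, ax = plt.subplots()
ax.plot(df["epoch"], df["train_loss"], label="train loss")
ax.plot(df["epoch"], df["test_loss"], label="test loss")
ax.set_xlabel("epoch")
ax.set_ylabel("loss")
ax.legend()
fig.savefig("loss_curves.png")
```

If reading, modifying, and debugging code like this feels routine, you meet the data-handling prerequisite.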
Course Materials
Lecture notes and occasional course readings will be available through links on the course schedule. Lectures will cover some material that is not in the notes or readings; attendance is mandatory, and exams will include this material.
Assignments and Grading Criteria
| Assignment | Weight | Due Date |
| --- | --- | --- |
| Assignment 1 | 15% | 9/22 |
| Assignment 2 | 15% | 10/20 |
| In-class midterm exam | 15% | 10/22 |
| Assignment 3 | 15% | 11/17 |
| Assignment 4 | 15% | 12/8 |
| In-class final exam | 15% | 12/12 |
| Attendance and participation | 10% | - |
Note: Due dates are subject to change; please check back frequently.
Policies
Collaboration Policy
Assignments can be done in teams of 2. All exams are in-class and strictly individual.
Policy on Late Submissions
You have 3 late days for the entire semester to use however you want (e.g., submit one assignment 3 days late, or three assignments 1 day late each). Partial days are rounded up to the next full day.
After you use up your late days, you get 0 points for each late assignment.
Policy on LLMs and Other Generative AI Tools and Technologies
We discourage the use of LLMs and similar AI tools. If you nevertheless use AI on an assignment, you must disclose what you used, how you used it, and your specific prompts in a dedicated file called AI.txt. Failure to disclose AI use is a serious violation of academic integrity and will be treated as such.
You are responsible for completely understanding all code you submit. We will be performing random checks to test your understanding. TAs will not help debug LLM-generated code. When asking TAs for help, you must disclose all uses of LLMs and be able to explain how every part of the code is intended to work.
The use of LLMs is strictly prohibited for the in-class exams.
Academic Integrity
We expect you to abide by Cornell's Code of Academic Integrity at all times. Please note that the Code specifically states that a "Cornell student's submission of work for academic credit indicates that the work is the student's own. All outside assistance should be acknowledged, and the student's academic position truthfully reported at all times." Please contact us if you have any questions or concerns about appropriately acknowledging others' work in your submitted assignments. You should expect that we will rigorously enforce the Code.