Trustworthy AI

Fall 2025: MW 2:55-4:10p

Link to Course Slack


Instructors

Teaching Assistants

Please message us via Slack with course-related questions.

Course Schedule

Aug 25: Course logistics. AI pipelines and threats.
- Resources: Slides

Aug 27: Survey: AI in 2025.
- Resources: Slides, Puzzle 🧩

Sep 3: Definitions: AI security, safety, privacy, and trustworthiness.
- Readings:
  - Trustworthy AI (Wing, 2021)
  - Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems (Dalrymple et al., 2024)
- Resources: Slides

Sep 8: Adversarial examples and adversarial robustness
- Readings:
  - Intriguing properties of neural networks (Szegedy et al., 2013)
- Resources: Slides, Puzzle 🧩

Sep 10: Data poisoning
- Required readings:
  - Manipulating Machine Learning: Poisoning Attacks and Countermeasures for Regression Learning (Jagielski et al., 2018)
  - Poisoning the Unlabeled Dataset of Semi-Supervised Learning (Carlini, 2021)
- Recommended readings:
  - Poisoning Attacks against Support Vector Machines (Biggio et al., 2012)
  - Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks (Shafahi et al., 2018)
  - Poisoning Web-Scale Training Datasets is Practical (Carlini et al., 2023)
- Resources: Slides

Sep 15: Backdoor attacks
- Required readings:
  - BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain (Gu et al., 2017)
  - You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion (Schuster et al., 2021)
  - Poisoning Language Models During Instruction Tuning (Wan et al., 2023)
- Recommended readings:
  - Black-Box Adversarial Attacks on LLM-Based Code Completion (Jenko et al., 2024)

Sep 17: Membership inference
- Required readings:
  - Membership Inference Attacks against Machine Learning Models (Shokri et al., 2017)
  - Membership Inference Attacks From First Principles (Carlini et al., 2021)
- Recommended readings:
  - Do Membership Inference Attacks Work on Large Language Models? (Duan et al., 2024)
- Resources: Slides

Sep 22: Model stealing
- Required readings:
  - Stealing Machine Learning Models via Prediction APIs (Tramer et al., 2016)
  - Imitation Attacks and Defenses for Black-box Machine Translation Systems (Wallace et al., 2020)
  - Stealing Part of a Production Language Model (Carlini et al., 2024)
- Recommended readings:
  - Adversarial Learning (Lowd and Meek, 2005)
  - High Accuracy and High Fidelity Extraction of Neural Networks (Jagielski et al., 2019)
- Resources: Slides, Puzzle 🧩

Sep 24: Model inversion
- Required readings:
  - Model Inversion Attacks That Exploit Confidence Information and Basic Countermeasures (Fredrikson et al., 2015)
  - Text Embeddings Reveal (Almost) As Much As Text (Morris et al., 2023)
- Recommended readings:
  - Deep Leakage from Gradients (Zhu et al., 2019)
- Resources: Slides

Sep 29: Memorization 1
- Readings:
  - The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks (Carlini et al., 2018)
  - Quantifying Memorization Across Neural Language Models (Carlini et al., 2022)
- Resources: Slides

Oct 1: Explainability and interpretability (guest: Chandan Singh)

Oct 6: Watermarking data and models

Oct 8: Fairness and bias in AI (guest: Angelina Wang)

Oct 15: Memorization 2

Oct 20: Security of AI agents (guest: Polina Kirichenko)

Oct 22: Midterm

Oct 27: Differentially private machine learning

Course Overview and Learning Outcomes

This course covers the safety, security, privacy, alignment, and adversarial robustness of modern AI and ML technologies. Topics include threats and risks specific to these technologies, vulnerabilities and state-of-the-art defenses, and how to build and use trustworthy AI/ML systems.

Learning Outcomes

Prerequisites

Course Materials

Lecture notes and occasional course readings will be available through links on the course schedule. Lectures will cover some material that is not in the notes or readings. Attendance is mandatory, and exams will include this material.

Assignments and Grading Criteria

Assignment                    Weight   Due Date
Assignment 1                  15%      9/22
Assignment 2                  15%      10/20
In-class midterm exam         15%      10/22
Assignment 3                  15%      11/17
Assignment 4                  15%      12/8
In-class final exam           15%      12/12
Attendance and participation  10%      -

Note: Due dates are subject to change; please check back frequently.
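For concreteness, here is a minimal sketch of how the weights above combine into a final score. It assumes every component is graded on a 0-100 scale and uses made-up component names; it illustrates the weighting only and is not the official grading script.

    # Illustrative only: weighted final score under the table above,
    # assuming each component is graded on a 0-100 scale.
    WEIGHTS = {
        "assignment_1": 0.15,
        "assignment_2": 0.15,
        "midterm": 0.15,
        "assignment_3": 0.15,
        "assignment_4": 0.15,
        "final": 0.15,
        "attendance_participation": 0.10,
    }

    def final_score(scores: dict[str, float]) -> float:
        """Weighted average of component scores (each on a 0-100 scale)."""
        assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights total 100%
        return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

    # Example: 90s on assignments, 80/75 on exams, full participation -> 88.0
    example = {
        "assignment_1": 95, "assignment_2": 90, "midterm": 80,
        "assignment_3": 92, "assignment_4": 88, "final": 75,
        "attendance_participation": 100,
    }
    print(round(final_score(example), 2))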

Policies

Collaboration Policy

Assignments can be done in teams of 2. All exams are in-class and strictly individual.

Policy on Late Submissions

You have 3 late days for the entire semester to use in any way you want (e.g., submit one assignment 3 days late, or 3 assignments 1 day late each). Partial days are rounded up to the next full day.

After you use up your late days, you get 0 points for each late assignment.
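To make the accounting concrete, here is a minimal sketch of one way to compute late-day charges under this policy. The function names and the hours-based interface are hypothetical, not an official tool; the sketch only encodes the rules above: partial days round up, and a submission that needs more late days than remain receives 0 points.

    import math

    TOTAL_LATE_DAYS = 3  # semester-wide budget

    def late_days_charged(hours_late: float) -> int:
        """Partial days are rounded up to the next full day."""
        return math.ceil(hours_late / 24) if hours_late > 0 else 0

    def apply_late_policy(raw_score: float, hours_late: float, late_days_left: int):
        """Return (score, remaining late days) after charging late days.

        If the submission needs more late days than remain, it gets 0 points.
        """
        needed = late_days_charged(hours_late)
        if needed <= late_days_left:
            return raw_score, late_days_left - needed
        return 0.0, late_days_left

    # Example: 30 hours late counts as 2 late days, leaving 1 of the original 3.
    score, remaining = apply_late_policy(raw_score=90, hours_late=30,
                                         late_days_left=TOTAL_LATE_DAYS)
    print(score, remaining)  # 90 1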

Policy on LLMs and Other Generative AI Tools and Technologies

We discourage the use of LLMs and similar AI tools. If you use AI on an assignment despite this discouragement, you must disclose which tools you used, how you used them, and your specific prompts in a dedicated document called AI.txt. Failure to disclose AI use is a serious violation of academic integrity and will be treated as such.

You are responsible for completely understanding all code you submit. We will perform random checks to test your understanding. TAs will not help debug LLM-generated code. When asking TAs for help, you must disclose all uses of LLMs and be able to explain how every part of your code is intended to work.

The use of LLMs is strictly prohibited for the in-class exams.

Academic Integrity

We expect you to abide by Cornell's Code of Academic Integrity at all times. Please note that the Code specifically states that a "Cornell student's submission of work for academic credit indicates that the work is the student's own. All outside assistance should be acknowledged, and the student's academic position truthfully reported at all times." Please contact us if you have any questions or concerns about appropriately acknowledging others' work in your submitted assignments. You should expect that we will rigorously enforce the Code.