| Date | Topic | Readings | Resources |
| --- | --- | --- | --- |
| Aug 25 | Course logistics. AI pipelines and threats. | - | Slides |
| Aug 27 | Survey: AI in 2025. | - | Slides, Puzzle 🧩 |
| Sep 3 | Definitions: AI security, safety, privacy, and trustworthiness. | Trustworthy AI (Wing, 2021)<br>Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems (Dalrymple et al., 2024) | Slides |
| Sep 8 | Adversarial examples and adversarial robustness | Intriguing properties of neural networks (Szegedy et al., 2013) | Slides, Puzzle 🧩 |
| Sep 10 | Data poisoning | Required:<br>Manipulating Machine Learning: Poisoning Attacks and Countermeasures for Regression Learning (Jagielski et al., 2018)<br>Poisoning the Unlabeled Dataset of Semi-Supervised Learning (Carlini, 2021)<br>Recommended:<br>Poisoning Attacks against Support Vector Machines (Biggio et al., 2012)<br>Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks (Shafahi et al., 2018)<br>Poisoning Web-Scale Training Datasets is Practical (Carlini et al., 2023) | Slides |
| Sep 15 | Backdoor attacks | Required:<br>BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain (Gu et al., 2017)<br>You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion (Schuster et al., 2021)<br>Poisoning Language Models During Instruction Tuning (Wan et al., 2023)<br>Recommended:<br>Black-Box Adversarial Attacks on LLM-Based Code Completion (Jenko et al., 2024) | |
| Sep 17 | Membership inference | Required:<br>Membership Inference Attacks against Machine Learning Models (Shokri et al., 2017)<br>Membership Inference Attacks From First Principles (Carlini et al., 2021)<br>Recommended:<br>Do Membership Inference Attacks Work on Large Language Models? (Duan et al., 2024) | Slides |
| Sep 22 | Model stealing | Required:<br>Stealing Machine Learning Models via Prediction APIs (Tramer et al., 2016)<br>Imitation Attacks and Defenses for Black-box Machine Translation Systems (Wallace et al., 2020)<br>Stealing Part of a Production Language Model (Carlini et al., 2024)<br>Recommended:<br>Adversarial Learning (Lowd and Meek, 2005)<br>High Accuracy and High Fidelity Extraction of Neural Networks (Jagielski et al., 2019) | Slides, Puzzle 🧩 |
| Sep 24 | Model inversion | Required:<br>Model Inversion Attacks That Exploit Confidence Information and Basic Countermeasures (Fredrikson et al., 2015)<br>Text Embeddings Reveal (Almost) As Much As Text (Morris et al., 2023)<br>Recommended:<br>Deep Leakage from Gradients (Zhu et al., 2019) | Slides |
| Sep 29 | Memorization | The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks (Carlini et al., 2018)<br>Quantifying Memorization Across Neural Language Models (Carlini et al., 2022) | Slides |
| Oct 1 | Explainability and interpretability (guest: Chandan Singh) | | |
| Oct 6 | Watermarking data and models | Required:<br>A Watermark for Large Language Models (Kirchenbauer et al., 2023)<br>Radioactive data: tracing through training (Sablayrolles et al., 2020)<br>Recommended:<br>Scalable watermarking for identifying large language model outputs (Dathathri et al., 2024) | |
| Oct 8 | Fairness and bias in AI (guest: Angelina Wang) | Required:<br>Gender Shades (Buolamwini and Gebru, 2018)<br>Data Feminism for AI (Klein and D'Ignazio, 2024) | |
| Oct 15 | Indirect prompt injection and defenses (guest: Sizhe Chen) | Required:<br>Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection (Greshake et al., 2023)<br>SecAlign: Defending Against Prompt Injection with Preference Optimization (Chen et al., 2024) | Slides |
| Oct 20 | Hallucinations and uncertainty in LLMs (guest: Polina Kirichenko) | Required:<br>AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions (Kirichenko et al., 2025) | Slides |
| Oct 22 | Midterm | | |
| Oct 27 | AI and copyright (guest: James Grimmelmann) | Required:<br>Talkin’ ‘Bout AI Generation: Copyright and the Generative-AI Supply Chain (Lee et al., 2024) | |
| Oct 29 | Alignment | Required:<br>Constitutional AI: Harmlessness from AI Feedback (Bai et al., 2022) | Slides |
| Nov 3 | LLM safety alignment and jailbreaking | Required:<br>Jailbroken: How Does LLM Safety Training Fail? (Wei et al., 2023)<br>Universal and Transferable Adversarial Attacks on Aligned Language Models (Zou et al., 2023)<br>Recommended:<br>Are aligned neural networks adversarially aligned? (Carlini et al., 2023) | |
| Nov 5 | Contextual integrity (guest: Helen Nissenbaum) | Required:<br>A Contextual Approach to Privacy Online (Nissenbaum, 2011)<br>Recommended:<br>Contextual Integrity Up and Down the Data Food Chain (Nissenbaum, 2019)<br>No Cookies For You!: Evaluating The Promises Of Big Tech’s ‘Privacy-Enhancing’ Techniques (Martin et al., 2025) | |
| Nov 10 | Hacking AI agents (guest: Rishi Jha) | | |
| Nov 12 | Unlearning (and why it's hard) | Required:<br>Machine Unlearning in 2024 (Liu, 2024)<br>Recommended:<br>Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice (Cooper et al., 2024) | |
| Nov 17 | Training data extraction | | |
| Nov 19 | Differentially private machine learning | Required:<br>Deep Learning with Differential Privacy (Abadi et al., 2016)<br>VaultGemma (Google Research, 2025)<br>Recommended:<br>Evaluating Differentially Private Machine Learning in Practice (Jayaraman and Evans, 2019) | |
| Nov 24 | Fine-tuning risks | | |
| Dec 1 | Reasoning models. Reward hacking and deception. | | |
| Dec 3 | Deepfakes and other abuses of AI (guest: Alexios Mantzarlis) | | |
| Dec 8 | AGI and ASI. The existential risks debate. Governance of AI. | | |