* indicates equal contribution

On the Hardness of Faithful Chain-of-Thought Reasoning in Large Language Models

D. Ley, S.H. Tanneru, C. Agarwal, H. Lakkaraju: arXiv, 2024

Paper

Effects of Fine-Tuning on Chain-of-Thought

Coming soon!

Paper

Quantifying Uncertainty in Natural Language Explanations of Large Language Models

S. H. Tanneru, C. Agarwal, H. Lakkaraju: AISTATS, 2024

Spotlight at the NeurIPS R0-FoMo Workshop, 2023

Paper | Code

Towards Safe Large Language Models for Medicine

T. Han, A. Kumar, C. Agarwal, and H. Lakkaraju: NeurIPS, 2024

Paper

Understanding the Effects of Iterative Prompting on Truthfulness

S. Krishna, C. Agarwal, and H. Lakkaraju: ICML, 2024

Paper

Certifying LLM Safety against Adversarial Prompting

A. Kumar, C. Agarwal, S. Srinivas, A. Li, S. Feizi, H. Lakkaraju: COLM, 2024

Paper | Code | Science News | Harvard D3 Institute

Active Transferability Estimation

T. Menta, S. Jandial, A. Patil, S. Bachu, V. K B, V. Balasubramanian, B. Krishnamurthy, M. Sarkar, C. Agarwal: Workshop on Learning with Limited Labelled Data for Image and Video Understanding, CVPR 2024

Paper

Are Large Language Models Post Hoc Explainers?

N. Kroeger*, D. Ley*, S. Krishna, C. Agarwal, and H. Lakkaraju

R0-FoMo Workshop, NeurIPS 2023

Paper | Code

Towards Fair Knowledge Distillation using Student Feedback

A. Java*, S. Jandial*, and C. Agarwal

Efficient Systems for Foundation Models Workshop, ICML 2023

Paper | Code

Explain like I am BM25: Interpreting a Dense Model's Ranked-List with a Sparse Approximation

M. Llordes, D. Ganguly, S. Bhatia, C. Agarwal: SIGIR, 2023

Paper | Code

Explaining RL decisions with trajectories

S. Deshmukh*, A. Dasgupta*, B. Krishnamurthy, N. Jiang, C. Agarwal, G. Theocharous, J. Subramanian: ICLR, 2023

Paper | Code

GNNDelete: A general strategy for unlearning in graph neural networks

J. Cheng, G. Dasoulas*, H. He*, C. Agarwal, M. Zitnik: ICLR, 2023

Paper | Code

DeAR: Debiasing vision-language models with additive residuals

A. Seth, M. Hemani, C. Agarwal: CVPR, 2023

Paper | Protected-Attribute Tag Association dataset

OpenXAI: Towards a Transparent Evaluation of Post hoc Model Explanations

C. Agarwal, S. Krishna, E. Saxena, M. Pawelczyk, N. Johnson, I. Puri, M. Zitnik, H. Lakkaraju: NeurIPS Datasets and Benchmarks Track, 2023

Paper | Code | Harvard Dataverse

Evaluating explainability for graph neural networks

C. Agarwal, O. Queen, H. Lakkaraju, and M. Zitnik: Nature Scientific Data, 2023

Paper | Code | Harvard Dataverse

Estimating example difficulty using variance of gradients

C. Agarwal, D. D'souza, and S. Hooker: CVPR, 2022

Paper | Code | Project Website

Probing GNN explainers: A rigorous theoretical and empirical analysis of GNN explanation methods

C. Agarwal, M. Zitnik*, H. Lakkaraju*: AISTATS, 2022

Paper

Exploring counterfactual explanations through the lens of adversarial examples: A theoretical and empirical analysis

M. Pawelczyk, C. Agarwal, S. Joshi, S. Upadhyay, H. Lakkaraju: AISTATS, 2022

Paper

Towards Training GNNs using Explanation Directed Message Passing

V. Giunchiglia*, C. V. Shukla*, G. Gonzalez, C. Agarwal: LOG, 2022

Paper | Code

Towards a unified framework for fair and stable graph representation learning

C. Agarwal, H. Lakkaraju, and M. Zitnik: UAI, 2021

Paper | Code