PUBLICATIONS
On the Hardness of Faithful Chain-of-Thought Reasoning in Large Language Models
D. Ley, S. H. Tanneru, C. Agarwal, H. Lakkaraju: arXiv, 2024
Paper
Effects of Fine-Tuning on Chain-of-Thought
Coming soon!
Quantifying Uncertainty in Natural Language Explanations of Large Language Models
S. H. Tanneru, C. Agarwal, H. Lakkaraju: AISTATS, 2024
Spotlight at the NeurIPS R0-FoMo Workshop, 2023
Paper | Code
Towards Safe Large Language Models for Medicine
T. Han, A. Kumar, C. Agarwal, H. Lakkaraju: NeurIPS, 2024
Understanding the Effects of Iterative Prompting on Truthfulness
S. Krishna, C. Agarwal, H. Lakkaraju: ICML, 2024
Certifying LLM Safety Against Adversarial Prompting
A. Kumar, C. Agarwal, S. Srinivas, A. Li, S. Feizi, H. Lakkaraju: COLM, 2024
Paper | Code | Science News | Harvard D3 Institute
Active Transferability Estimation
T. Menta, S. Jandial, A. Patil, S. Bachu, V. KB, V. Balasubramanian, B. Krishnamurthy, M. Sarkar, C. Agarwal: Workshop on Learning with Limited Labelled Data for Image and Video Understanding, CVPR 2024
Are Large Language Models Post Hoc Explainers?
N. Kroeger*, D. Ley*, S. Krishna, C. Agarwal, H. Lakkaraju: R0-FoMo Workshop, NeurIPS 2023
Towards Fair Knowledge Distillation using Student Feedback
A. Java*, S. Jandial*, C. Agarwal: Efficient Systems for Foundation Models Workshop, ICML 2023
Explain like I am BM25: Interpreting a Dense Model's Ranked-List with a Sparse Approximation
M. Llordes, D. Ganguly, S. Bhatia, C. Agarwal: SIGIR, 2023
Explaining RL Decisions with Trajectories
S. Deshmukh*, A. Dasgupta*, B. Krishnamurthy, N. Jiang, C. Agarwal, G. Theocharous, J. Subramanian: ICLR, 2023
GNNDelete: A General Strategy for Unlearning in Graph Neural Networks
J. Cheng, G. Dasoulas*, H. He*, C. Agarwal, M. Zitnik: ICLR, 2023
DeAR: Debiasing Vision-Language Models with Additive Residuals
A. Seth, M. Hemani, C. Agarwal: CVPR, 2023
Paper | Protected-Attribute Tag Association Dataset
OpenXAI: Towards a Transparent Evaluation of Post Hoc Model Explanations
C. Agarwal, S. Krishna, E. Saxena, M. Pawelczyk, N. Johnson, I. Puri, M. Zitnik, H. Lakkaraju: NeurIPS Datasets and Benchmarks Track, 2023
Paper | Code | Harvard Dataverse | 218 GitHub ★
Evaluating Explainability for Graph Neural Networks
C. Agarwal, O. Queen, H. Lakkaraju, M. Zitnik: Nature Scientific Data, 2023
Paper | Code | Harvard Dataverse | 142 GitHub ★
Estimating Example Difficulty Using Variance of Gradients
C. Agarwal, D. D'souza, S. Hooker: CVPR, 2022
Paper | Code | Project Website | 58 GitHub ★
Probing GNN Explainers: A Rigorous Theoretical and Empirical Analysis of GNN Explanation Methods
C. Agarwal, M. Zitnik*, H. Lakkaraju*: AISTATS, 2022
Exploring Counterfactual Explanations Through the Lens of Adversarial Examples: A Theoretical and Empirical Analysis
M. Pawelczyk, C. Agarwal, S. Joshi, S. Upadhyay, H. Lakkaraju: AISTATS, 2022
Towards Training GNNs using Explanation Directed Message Passing
V. Giunchiglia*, C. V. Shukla*, G. Gonzalez, C. Agarwal: LoG, 2022
Towards a Unified Framework for Fair and Stable Graph Representation Learning
C. Agarwal, H. Lakkaraju, M. Zitnik: UAI, 2021
Paper | Code | 37 GitHub ★