Trustworthy AI · Aikyam Lab

Publications

2026
arXiv'26
Interpretability · Agents · LLM Reasoning
STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks
E. Lobo, X. Chen, J. Meng, N. Xi, Y. Jiao, C. Agarwal, Y. Zick, Y. Gao
arXiv 2026
arXiv'26
Explainability · GNNs · OOD
Quantifying Explanation Quality in Graph Neural Networks using Out-of-Distribution Generalization
D. Zhang, S. Betala, C. Agarwal
arXiv 2026
arXiv'26
Interpretability · Unlearning · LLMs
Towards Understanding Unlearning Difficulty: A Mechanistic Perspective and Circuit-Guided Difficulty Metric
J. Cheng, Z. Chen, C. Agarwal, H. Amiri
arXiv 2026
IUI'26
XAI · LLM Reasoning · Interactive
Improving Human Verification of LLM Reasoning through Interactive Explanation Interfaces
R. Zhou, G. Nguyen, N. Kharya, A. T. Nguyen, C. Agarwal
IUI 2026
AAAI'26 Oral 🏆
Alignment · Probing · LLMs
Polarity-Aware Probing for Quantifying Latent Alignment in Language Models
S. Sadiekh, E. Ericheva, C. Agarwal
AAAI 2026 · Oral Presentation
arXiv'26
Multilingual · LLM Reasoning
CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning
E. Onyame, A. Ghosh, S. Baidya, S. Saha, X. Chen, C. Agarwal
arXiv 2026
2025
arXiv
Multimodal Explainability · Interpretability · Multimodal AI
Rethinking Explainability in the Era of Multimodal AI
C. Agarwal
arXiv 2025
arXiv
Benchmark · GNNs · Multimodal LLMs
A Graph Talks, But Who’s Listening? Rethinking Evaluations for Graph-Language Models
S. Petkar, H. Aakash K, A. Vempati, A. Sinha, P. Kumaraguru, C. Agarwal
arXiv 2025
arXiv
Multilingual · Healthcare · LLMs
CLINIC: Evaluating Multilingual Trustworthiness in Language Models for Healthcare
A. Ghosh, S. Sridhar, R.K. Ravi, M. Muhsin, S. Saha, C. Agarwal
arXiv 2025
EMNLP'25
Multilingual · Reasoning · Survey
The Multilingual Mind: A Survey of Multilingual Reasoning in Language Models
A. Ghosh, D. Datta, S. Saha, C. Agarwal
EMNLP 2025
EMNLP'25
Hallucination · Video · Multimodal
EGOILLUSION: Benchmarking Hallucinations in Egocentric Video Understanding
A. Seth, U. Tyagi, R. Selvakumar, N. Anand, S. Kumar, S. Ghosh, R. Duraiswami, C. Agarwal, D. Manocha
EMNLP 2025 · Main Conference
EMNLP'25
Hallucination · Vision-Language · VQA
HALLUCINOGEN: Benchmarking Hallucination in Implicit Reasoning within Large Vision Language Models
A. Seth, D. Manocha, C. Agarwal
UncertaiNLP Workshop @ EMNLP 2025
NAACL'25 Oral
Memorization · Attribution · LLMs
Analyzing Memorization in Large Language Models through the Lens of Model Attribution
T. R. Menta, S. Agrawal, C. Agarwal
NAACL 2025 · Oral 🏆
NAACL'25
Privacy · NLP · Unlearnable
Towards Operationalizing Right to Data Protection
A. Java, S. Shahid, C. Agarwal
NAACL 2025
NAACL'25
Chain-of-Thought · Fine-tuning
On the Impact of Fine-tuning on Chain-of-Thought Reasoning
E. Lobo, C. Agarwal, H. Lakkaraju
NAACL 2025
2024
AISTATS'24 · NeurIPS Spotlight
Uncertainty · NL Explanations
Quantifying Uncertainty in Natural Language Explanations of Large Language Models
S. H. Tanneru, C. Agarwal, H. Lakkaraju
AISTATS 2024
NeurIPS'24
Safety · Medicine · LLMs
Towards Safe Large Language Models for Medicine
T. Han, A. Kumar, C. Agarwal, H. Lakkaraju
NeurIPS 2024
ICML'24
Truthfulness · Prompting
Understanding the Effects of Iterative Prompting on Truthfulness
S. Krishna, C. Agarwal, H. Lakkaraju
ICML 2024
COLM'24
Robustness · Certified Safety
Certifying LLM Safety Against Adversarial Prompting
A. Kumar, C. Agarwal, S. Srinivas, A. Li, S. Feizi, H. Lakkaraju
COLM 2024
2022 – 2023 · Selected
NeurIPS'22
XAI · Benchmark
OpenXAI: Towards a Transparent Evaluation of Post hoc Model Explanations
C. Agarwal, S. Krishna, E. Saxena, M. Pawelczyk, et al.
NeurIPS Datasets & Benchmarks 2022 · 218 ★
CVPR'22
XAI · Training Dynamics
Estimating Example Difficulty Using Variance of Gradients
C. Agarwal, D. D'souza, S. Hooker
CVPR 2022 · 58 ★
Nature'23
GNN · XAI
Evaluating Explainability for Graph Neural Networks
C. Agarwal, O. Queen, H. Lakkaraju, M. Zitnik
Nature Scientific Data 2023 · 142 ★
CVPR'23
Debiasing · Vision-Language
DeAR: Debiasing Vision-Language Models with Additive Residuals
A. Seth, M. Hemani, C. Agarwal
CVPR 2023
ICLR'23
Unlearning · GNN
GNNDelete: A General Strategy for Unlearning in Graph Neural Networks
J. Cheng, G. Dasoulas*, H. He*, C. Agarwal, M. Zitnik
ICLR 2023
ICLR'23
RL · Explanations
Explaining RL Decisions with Trajectories
S. Deshmukh*, A. Dasgupta*, B. Krishnamurthy, N. Jiang, C. Agarwal, et al.
ICLR 2023
SIGIR'23
IR · Interpretability
Explain like I am BM25: Interpreting a Dense Model's Ranked-List with a Sparse Approximation
M. Llordes, D. Ganguly, S. Bhatia, C. Agarwal
SIGIR 2023
View All Papers on Google Scholar →