Workshop
The 2nd Workshop on Reliable and Responsible Foundation Models
Mohit Bansal · Xinyu Yang · Kate Donahue · Giulia Fanti · David Madras · Han Shao · Hongyi Wang · Steven Wu · Huaxiu Yao
West Ballroom C
Sat 19 Jul, 8:50 a.m. PDT
Foundation models (FMs), with their emergent and reasoning abilities, are reshaping the future of scientific research and broader human society. However, as their intelligence approaches or surpasses that of humans, concerns arise regarding their responsible use in real-world applications, including reliability, safety, transparency, and ethics. The workshop on reliable and responsible FMs delves into the urgent need to ensure that such models align with human values. The significance of this topic cannot be overstated: the real-world implications of these models affect everything from daily information access to critical decision-making in fields like medicine and finance, especially for embodied FMs that directly interact with the physical world. Stakeholders, including developers, practitioners, and policymakers, care deeply about this because the reliable and responsible design, deployment, and oversight of these models dictate not only the success of AI solutions but also the preservation of societal norms, order, equity, and fairness. Some of the fundamental questions that this workshop aims to address are:

* Diagnosis: How can we identify and characterize unreliable and irresponsible behaviors in FMs? Topics include prompt sensitivity, lack of self-consistency, and hallucinations in generation.
* Evaluation: How should we assess the harmful capabilities of FMs and quantify their societal impact?
* Sources: How can we pinpoint and understand the known or emerging sources of FM unreliability? This involves examining training data, optimization objectives, and architectural design.
* Generalization: How can reliable and responsible properties be effectively extended to increasingly advanced FMs, particularly as they incorporate new features such as additional modalities or long chain-of-thought (CoT) reasoning?
* Governance: What principles or guidelines should inform the next generation of FMs to ensure they are reliable and responsible? How can real-time monitoring of these FMs be enabled?
* Guarantee: Can we establish theoretical frameworks for provably reliable and responsible FMs?
* Practice: How can we leverage domain-specific knowledge to guide FMs toward improved reliability and responsibility across diverse areas, such as drug discovery, education, or clinical health?
Schedule
Sat 8:50 a.m. - 9:00 a.m. | Introduction and opening remarks (Intro) | Xinyu Yang
Sat 9:00 a.m. - 9:40 a.m. | Invited Talk 1 (Invited Talk)
Sat 9:40 a.m. - 10:00 a.m. | Oral Talk 1 (Oral Talk)
Sat 10:00 a.m. - 10:40 a.m. | Invited Talk 2 (Invited Talk)
Sat 10:40 a.m. - 11:00 a.m. | Oral Talk 2 (Oral Talk)
Sat 11:00 a.m. - 11:40 a.m. | Invited Talk 3 (Invited Talk)
Sat 11:40 a.m. - 12:00 p.m. | Oral Talk 3 (Oral Talk)
Sat 12:00 p.m. - 1:00 p.m. | Poster Session 1 and Lunch Break (Poster Session)
Sat 1:00 p.m. - 1:40 p.m. | Invited Talk 4 (Invited Talk)
Sat 1:40 p.m. - 2:00 p.m. | Oral Talk 4 (Oral Talk)
Sat 2:00 p.m. - 2:40 p.m. | Invited Talk 5 (Invited Talk)
Sat 2:40 p.m. - 3:00 p.m. | Oral Talk 5 (Oral Talk)
Sat 3:00 p.m. - 3:40 p.m. | Invited Talk 6 (Invited Talk)
Sat 3:40 p.m. - 4:00 p.m. | Oral Talk 6 (Oral Talk)
Sat 4:00 p.m. - 5:00 p.m. | Poster Session 2 and Coffee Break (Poster Session)
Sat 5:00 p.m. - 5:05 p.m. | Closing Remarks (Close)
- | Learning Robust 3D Representation from CLIP via Dual Denoising (Poster) | Shuqing Luo · Bowen Qu · Wei Gao
- | Conformal Risk Minimization with Variance Reduction (Poster) | Sima Noorani · Orlando Romero · Nicolo Dal Fabbro · Hamed Hassani · George Pappas
- | Enhancing Clinical Multiple-Choice Questions Benchmarks with Knowledge Graph Guided Distractor Generation (Poster) | Running Yang · Wenlong Deng · Minghui Chen · Yuyin Zhou · Xiaoxiao Li
- | Can We Infer Confidential Properties of Training Data from LLMs? (Poster) | Pengrun Huang · Chhavi Yadav · Ruihan Wu · Kamalika Chaudhuri
- | MixAT: Combining Continuous and Discrete Adversarial Training for LLMs (Poster) | Csaba Dékány · Stefan Balauca · Robin Staab · Dimitar I. Dimitrov · Martin Vechev
- | Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models (Poster) | Sima Noorani · Shayan Kiyani · George Pappas · Hamed Hassani
- | Angular Steering: Behavior Control via Rotation in Activation Space (Poster) | Hieu M. Vu · Tan Nguyen
- | Teaching Models to Verbalize Reward Hacking in Chain-of-Thought Reasoning (Poster) | Miles Turpin · Andy Arditi · Marvin Li · Joe Benton · Julian Michael
- | A Thousand Words or An Image: Studying the Influence of Persona Modality in Multimodal LLMs (Poster) | Julius Broomfield · Kartik Sharma · Srijan Kumar
- | LoRA Merging with SVD: Understanding Interference and Preserving Performance (Poster) | Dennis Tang · Prateek Yadav · Yi-Lin Sung · Jaehong Yoon · Mohit Bansal
- | Dataset Protection via Watermarked Canaries in Retrieval-Augmented LLMs (Poster) | Liu · Xuandong Zhao · Dawn Song · Yuheng Bu
- | Beyond Multiple Choice: Evaluating Steering Vectors for Adaptive Free-Form Summarization (Poster) | Joschka Braun · Carsten Eickhoff · Seyed Ali Bahrainian
- | Lookahead Bias in Pretrained Language Models (Poster) | Suproteem Sarkar · Keyon Vafa
- | Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval (Poster) | Taiye Chen · Zeming Wei · Ang Li · Yisen Wang
- | Do Sparse Autoencoders Generalize? A Case Study of Answerability (Poster) | Lovis Heindrich · Phil Torr · Fazl Barez · Veronika Thost
- | GLSim: Detecting Object Hallucinations in LVLMs via Global-Local Similarity (Poster) | Seongheon Park · Sharon Li
- | Evaluating Adversarial Protections for Diffusion Personalization: A Comprehensive Study (Poster) | Kai Ye · Tianyi Chen · Zhen Wang
- | Circuit Discovery Helps To Detect LLM Jailbreaking (Poster) | Paria Mehrbod · Boris Knyazev · Eugene Belilovsky · Guy Wolf · Geraldin Nanfack
- | The Geometry of Forgetting: Analyzing Machine Unlearning through Local Learning Coefficients (Poster) | Aashiq Muhamed · Virginia Smith
- | Thought calibration: Efficient and confident test-time scaling (Poster) | Menghua Wu · Cai Zhou · Stephen Bates · Tommi Jaakkola
- | Reliable Statistical Inference with Synthetic Data from Large Language Models (Poster) | Yewon Byun · Shantanu Gupta · Zachary Lipton · Rachel Childers · Bryan Wilder
- | On Characterizations for Language Generation: Interplay of Hallucinations, Breadth, and Stability (Poster) | Alkis Kalavasis · Anay Mehrotra · Grigorios Velegkas
- | Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs (Poster) | Kejia Zhang · Keda TAO · Jiasheng Tang · Huan Wang
- | In Agents We Trust, but Who Do Agents Trust? Latent Source Preferences Steer LLM Generations (Poster) | Mohammad Aflah Khan · Mahsa Amani · Soumi Das · Bishwamittra Ghosh · Qinyuan Wu · Krishna Gummadi · Manish Gupta · Abhilasha Ravichander
- | The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1 (Poster) | Kaiwen Zhou · Chengzhi Liu · Xuandong Zhao · Shreedhar Jangam · Jayanth Srinivasa · Gaowen Liu · Dawn Song · Xin Wang
- | Simple Mechanistic Explanations for Out-Of-Context Reasoning (Poster) | Atticus Wang · Josh Engels · Oliver Clive-Griffin
- | SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning (Poster) | Kaiwen Zhou · Xuandong Zhao · Gaowen Liu · Jayanth Srinivasa · Aosong Feng · Dawn Song · Xin Wang
- | Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models (Poster) | Bartłomiej Marek · Lorenzo Rossi · Vincent Hanke · Xun Wang · Michael Backes · Franziska Boenisch · Adam Dziedzic
- | Persuade Me If You Can: Evaluating AI Agent Influence on Safety Monitors (Poster) | Jennifer Za · Julija Bainiaksina · Tanush Chopra · Nikita Ostrovsky · Victoria Krakovna
- | SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge (Poster) | Fengqing Jiang · Fengbo Ma · Zhangchen Xu · Yuetai Li · Bhaskar Ramasubramanian · Luyao Niu · Bo Li · Xianyan Chen · Zhen Xiang · Radha Poovendran
- | Alignment of Large Language Models with Constrained Learning (Poster) | Botong Zhang · Shuo Li · Ignacio Hounie · Osbert Bastani · Dongsheng Ding · Alejandro Ribeiro
- | Copilot Arena: A Platform for Code LLM Evaluation in the Wild (Oral) | Wayne Chi · Valerie Chen · Anastasios Angelopoulos · Wei-Lin Chiang · Aditya Mittal · Naman Jain · Tianjun Zhang · Ion Stoica · Chris Donahue · Ameet Talwalkar
- | AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions (Poster) | Polina Kirichenko · Mark Ibrahim · Kamalika Chaudhuri · Samuel Bell
- | Model Organisms for Emergent Misalignment (Poster) | Edward Turner · Anna Soligo · Mia Taylor · Senthooran Rajamanoharan · Neel Nanda
- | ALIGN: Prompt-based Attribute Alignment for Reliable, Responsible, and Personalized LLM-based Decision-Making (Poster) | Bharadwaj Ravichandran · David Joy · Paul Elliott · Brian Hu · Jadie Adams · Christopher Funk · Emily Veenhuis · Anthony Hoogs · Arslan Basharat
- | Doubly Robust Alignment for Large Language Models (Poster) | Erhan Xu · Kai Ye · Hongyi Zhou · Luhan Zhu · Francesco Quinzan · Chengchun Shi
- | Does Representation Intervention Really Identify Desired Concepts and Elicit Alignment? (Poster) | Hongzheng Yang · Yongqiang Chen · Zeyu Qin · Tongliang Liu · Chaowei Xiao · Kun Zhang · Bo Han
- | Accountability Attribution: Tracing Model Behavior to Training Processes (Poster) | Shichang Zhang · Hongzhe Du · Karim Saraipour · Jiaqi Ma · Himabindu Lakkaraju
- | Escaping the SpuriVerse: Can Large Vision-Language Models Generalize Beyond Seen Spurious Correlations? (Poster) | Yiwei Yang · Chung Peng Lee · Shangbin Feng · Dora Zhao · Bingbing Wen · Anthony Liu · Yulia Tsvetkov · Bill Howe
- | Capability-Based Scaling Laws for LLM Red-Teaming (Poster) | Alexander Panfilov · Paul Kassianik · Maksym Andriushchenko · Jonas Geiping
- | When Meaning Doesn’t Matter: Exposing Guard Model Fragility via Paraphrasing (Poster) | Cristina Pinneri · Christos Louizos
- | From Tasks to Teams: A Risk-First Evaluation Framework for Multi-Agent LLM Systems in Finance (Oral) | Zichen Chen · Jianda Chen · Jiaao Chen · Misha Sra
- | GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning (Poster) | Yue Liu · Shengfang Zhai · Mingzhe Du · Yulin Chen · Tri Cao · Hongcheng Gao · Cheng Wang · Xinfeng Li · Kun Wang · Junfeng Fang · Jiaheng Zhang · Bryan Hooi
- | Evaluating Large Language Models' Capability to Launch Fully Automated Spear Phishing Campaigns (Poster) | Fred Heiding · Simon Lermen · Andrew Kao · Bruce Schneier · Arun Vishwanath
- | DINGO: Constrained Inference for Diffusion LLMs (Poster) | Tarun Suresh · Debangshu Banerjee · Shubham Ugare · Sasa Misailovic · Gagandeep Singh
- | Consistency in Language Models: Current Landscape, Challenges, and Future Directions (Poster) | Jekaterina Novikova · Carol Anderson · Borhane Blili-Hamelin · Domenic Rosati · Subhabrata Majumdar
- | Empirical Comparison of Membership Inference Attacks in Deep Transfer Learning (Poster) | Yuxuan Bai · Gauri Pradhan · Marlon Tobaben · Antti Honkela
- | Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency (Poster) | Yukun Jiang · Mingjie Li · Michael Backes · Yang Zhang
- | Think with Moderation: Reasoning Models and Confidence Calibration in the Climate Domain (Poster) | Romain Lacombe · Kerrie Wu · Eddie Dilworth
- | Learning on LLM Output Signatures for Gray-Box Behavior Analysis (Poster) | Guy Bar Shalom · Fabrizio Frasca · Derek Lim · Yoav Gelberg · Yftah Ziser · Ran El-Yaniv · Gal Chechik · Haggai Maron
- | RealSafe-R1: Safety-Aligned DeepSeek-R1 without Compromising Reasoning Capability (Poster) | Yichi Zhang · Zihao Zeng · Dongbai Li · Yao Huang · Zhijie Deng · Yinpeng Dong
- | Dynamic Risk Assessments for Offensive Cybersecurity Agents (Poster) | Boyi Wei · Benedikt Stroebl · Jiacen Xu · Joie Zhang · Zhou Li · Peter Henderson
- | Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks (Poster) | Hanjiang Hu · Alexander Robey · Changliu Liu
- | Improving Commonsense Reasoning and Reliability in LLMs Through Cognitive-Inspired Prompting Frameworks (Poster) | Tanvi Ganapathy · Ishita Mathur · Anna Szczuka
- | Semi-Nonnegative GPT: Towards Monosemantic Representations (Poster) | Junyi Li · Jinqi Liu · Qi Zhang · Yisen Wang
- | UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models (Poster) | Sejoon Oh · Yiqiao Jin · Megha Sharma · Donghyun Kim · Eric Ma · Gaurav Verma · Srijan Kumar
- | Beautiful Images, Toxic Words: Understanding and Addressing Offensive Text in Generated Images (Poster) | Aditya Kumar · Tom Blanchard · Adam Dziedzic · Franziska Boenisch
- | Watermarking Autoregressive Image Generation (Poster) | Nikola Jovanović · Ismail Labiad · Tomas Soucek · Martin Vechev · Pierre Fernandez
- | Multi-Modal Medical Image Augmentation for Controlled Heterogeneity and Fair Outcomes (Poster) | Soo Yong Kim · Seunghyeok Hong
- | Foundational Models Must Be Designed To Yield Safer Loss Landscapes That Resist Harmful Fine-Tuning (Poster) | Karan Uppal · Pavan Kalyan Tankala
- | Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models (Poster) | Lillian Sun · Martin Pawelczyk · Zhenting Qi · Aounon Kumar · Himabindu Lakkaraju
- | Distilling Safe LLM Systems via Soft Prompts (Poster) | Motasem Alfarra · Dana Kianfar · Cristina Pinneri · Christos Louizos
- | Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective (Poster) | Jianyu Wang · Zhiqiang Hu · Lidong Bing
- | Investigating Tool-Memory Conflicts in Tool-Augmented LLMs (Poster) | Jiali Cheng · Rui Pan · Hadi Amiri
- | Finetuning-Activated Backdoors in LLMs (Poster) | Thibaud Gloaguen · Mark Vero · Robin Staab · Martin Vechev
- | Defending Against Prompt Injection with a Few DefensiveTokens (Poster) | Sizhe Chen · Yizhu Wang · Nicholas Carlini · Chawin Sitawarin · David Wagner
- | Verbalized Confidence Triggers Self-Verification: Emergent Behavior Without Explicit Reasoning Supervision (Poster) | Chaeyun Jang · Moonseok Choi · Yegon Kim · Hyungi Lee · Juho Lee
- | Steering LLM Reasoning Through Bias-Only Adaptation (Poster) | Viacheslav Sinii · Alexey Gorbatovski · Artem Cherepanov · Boris Shaposhnikov · Nikita Balagansky · Daniil Gavrilov
- | SimBA: Simplifying Benchmark Analysis (Poster) | Nishant Subramani · Alfredo Gomez · Mona Diab
- | Aligned Textual Scoring Rule (Poster) | Yuxuan Lu · Yifan Wu · Jason Hartline · Michael Curry
- | Visual Language Models as Zero-Shot Deepfake Detectors (Poster) | Viacheslav Pirogov
- | Adversarial Manipulation of Reasoning Models using Internal Representations (Poster) | Kureha Yamaguchi · Benjamin Etheridge · Andy Arditi
- | On Learning Verifiers for Chain-of-Thought Reasoning (Poster) | Nina Balcan · Avrim Blum · Zhiyuan Li · Dravyansh Sharma
- | RoMa: A Robust Model Watermarking Scheme for Protecting IP in Diffusion Models (Poster) | Yingsha Xie · Rui Min · Zeyu Qin · Fei Ma · Li Shen · Fei Yu · Xiaochun Cao
- | Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors (Oral) | Fan Nie · Lan Feng · Haotian Ye · Weixin Liang · Pan Lu · Huaxiu Yao · Alexandre Alahi · James Zou
- | Extracting memorized pieces of (copyrighted) books from open-weight language models (Oral) | A. Feder Cooper · Aaron Gokaslan · Ahmed Ahmed · Amy Cyphert · Chris De Sa · Mark Lemley · Daniel Ho · Percy Liang
- | Efficient and Privacy-Preserving Soft Prompt Transfer for LLMs (Poster) | Xun Wang · Jing Xu · Franziska Boenisch · Michael Backes · Christopher A. Choquette Choo · Adam Dziedzic
- | A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1 (Poster) | Zhaoyi Li · Xiaohan Zhao · Dong-Dong Wu · Jiacheng Cui · Zhiqiang Shen
- | ASNO: An Interpretable Attention-Based Spatio-Temporal Neural Operator for Robust Scientific Machine Learning (Poster) | Vispi Karkaria · Doksoo Lee · Yi-Ping Chen · Yue Yu · Wei Chen
- | Position: Agent-Specific Trustworthiness Risk as a Research Priority (Poster) | Zeming Wei · Tianlin Li · Xiaojun Jia · Yihao Zhang · Yang Liu · Meng Sun
- | Focus on This, Not That! Steering LLMs with Adaptive Feature Specification (Poster) | Tom A. Lamb · Adam Davies · Alasdair J Paren · Phil Torr · Francesco Pinto
- | Robust and Interpretable Relational Reasoning with Large Language Models and Symbolic Solvers (Poster) | Ge Zhang · Mohammad Alomrani · Hongjian Gu · Jiaming Zhou · Yaochen Hu · Bin Wang · Qun Liu · Mark Coates · Yingxue Zhang · Jianye Hao
- | Towards Secure Model Sharing with Approximate Fingerprints (Poster) | Anshul Nasery · Sewoong Oh
- | Steering Language Model Refusal with Sparse Autoencoders (Poster) | Kyle O'Brien · David Majercak · Xavier Fernandes · Richard Edgar · Blake Bullwinkel · Jingya Chen · Harsha Nori · Dean Carignan · Eric Horvitz · Forough Poursabzi-Sangdeh
- | Predicting the Performance of Black-box Language Models with Follow-up Queries (Poster) | Dylan Sam · Marc Finzi · Zico Kolter
- | Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models (Poster) | Zhanke Zhou · Zhaocheng Zhu · Xuan Li · Mikhail Galkin · Xiao Feng · Sanmi Koyejo · Jian Tang · Bo Han
- | What do Geometric Hallucination Detection Metrics Actually Measure? (Poster) | Eric Yeats · John Buckheit · Sarah Scullen · Brendan Kennedy · Loc Truong · Davis Brown · William Kay · Cliff Joslyn · Tegan Emerson · Michael Henry · John Emanuello · Henry Kvinge
- | A Statistical Physics of Language Model Reasoning (Poster) | Jack Carson
- | Sample-Specific Noise Injection For Diffusion-Based Adversarial Purification (Poster) | Yuhao Sun · Jiacheng Zhang · Zesheng Ye · Chaowei Xiao · Feng Liu
- | The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets (Poster) | Shenzhe Zhu · Jiao Sun · Yi Nian · Tobin South · Alex Pentland · Jiaxin Pei
- | Do Students Debias Like Teachers? On the Distillability of Bias Mitigation Methods (Poster) | Jiali Cheng · Hadi Amiri
- | BiasGUARRD: Enhancing Fairness and Reliability in LLM Conflict Resolution Through Agentic Debiasing (Poster) | Erica Wang · Shrujana S Kunnam · Sreeyutha Ratala
- | In-Context Watermarks for Large Language Models (Poster) | Liu · Xuandong Zhao · Christopher Kruegel · Dawn Song · Yuheng Bu
- | Transferable Visual Adversarial Attacks for Proprietary Multimodal Large Language Models (Poster) | Kai Hu · Weichen Yu · Alexander Robey · Li Zhang · Andy Zou · Haoqi Hu · Chengming Xu · Matt Fredrikson
- | Position: Membership Inference Attack Should Move On to Distributional Statistics for Distilled Generative Models (Poster) | Muxing Li · Zesheng Ye · Sharon Li · Andy Song · Guangquan Zhang · Feng Liu
- | Prune 'n Predict: Optimizing LLM Decision-making with Conformal Prediction (Poster) | Harit Vishwakarma · Alan Mishler · Thomas Cook · Niccolo Dalmasso · Natraj Raman · Sumitra Ganesh
- | The Geometries of Truth Are Orthogonal Across Tasks (Poster) | Waïss Azizian · Michael Kirchhof · Eugene Ndiaye · Louis Béthune · Michal Klein · Pierre Ablin · Marco Cuturi
- | Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems (Poster) | Parand A. Alamdari · Toryn Q. Klassen · Sheila McIlraith
- | Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems through Benign Queries (Poster) | Yuhao Wang · Wenjie Qu · Yanze Jiang · Lichen Liu · Yue Liu · Shengfang Zhai · Yinpeng Dong · Jiaheng Zhang
- | MARVEL: Modular Abstention for Reliable and Versatile Expert LLMs (Poster) | Bingbing Wen · Faeze Brahman · Zhan Su · Shangbin Feng · Yulia Tsvetkov · Lucy Lu Wang · Bill Howe
- | Bidding for Influence: Auction-Driven Diffusion Image Generation (Poster) | Lillian Sun · Henry Huang · Fucheng Zhu · Giannis Daras · Constantinos Daskalakis
- | Advancing LLM Safe Alignment with Safety Representation Ranking (Poster) | Tianqi Du · Zeming Wei · Quan Chen · Chenheng Zhang · Yisen Wang
- | On the Scoring Functions for RAG-based Conformal Factuality (Poster) | Yi Chen · Caitlyn Yin · Sukrut Chikodikar · Ramya Vinayak
- | GPT, But Backwards: Exactly Inverting Language Model Outputs (Poster) | Adrians Skapars · Edoardo Manino · Youcheng Sun · Lucas Cordeiro
- | (Im)possibility of Automated Hallucination Detection in Large Language Models (Poster) | Amin Karbasi · Omar Montasser · John Sous · Grigorios Velegkas
- | Visual Instruction Bottleneck Tuning (Oral) | Changdae Oh · Jiatong Li · Shawn Im · Sharon Li
- | Lifelong Safety Alignment for Language Models (Poster) | Haoyu Wang · Zeyu Qin · Yifei Zhao · Chao Du · Min Lin · Xueqian Wang · Tianyu Pang
- | One Stone, Two Birds: Enhancing Adversarial Defense Through the Lens of Distributional Discrepancy (Poster) | Jiacheng Zhang · Benjamin Rubinstein · Jingfeng Zhang · Feng Liu
- | TRoVe: Discovering Error-Inducing Static Feature Biases in Temporal Vision-Language Models (Poster) | Maya Varma · Jean-Benoit Delbrouck · Sophie Ostmeier · Akshay Chaudhari · Curtis Langlotz
- | DP-AdamW: Investigating Decoupled Weight Decay and Bias Correction in Private Deep Learning (Poster) | Lillian Sun · Kevin Cong · Jay Chooi · Russell Li
- | Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts (Poster) | Kartik Sharma · Yiqiao Jin · Vineeth Rakesh · Yingtong Dou · Menghai Pan · Mahashweta Das · Srijan Kumar
- | The Necessity for Intervention Fidelity: Unintended Side Effects When Steering LLMs (Poster) | Jonas Raedler · Weiyue Li · Alyssa Taliotis · Manasvi Goyal · Siddharth Swaroop · Weiwei Pan
- | SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs (Poster) | Aashiq Muhamed · Jacopo Bonato · Mona Diab · Virginia Smith
- | Robust LLM Fingerprinting via Domain-Specific Watermarks (Poster) | Thibaud Gloaguen · Robin Staab · Nikola Jovanović · Martin Vechev
- | Position: Reasoning LLMs are Wandering Solution Explorers (Poster) | Jiahao Lu · Ziwei Xu · Mohan Kankanhalli
- | GenAI Copyright Evidence with Operational Meaning (Poster) | Eli Chien · Amit Saha · Yinan Huang · Pan Li
- | Reward Shaping to Mitigate Reward Hacking in RLHF (Poster) | Jiayi Fu · Xuandong Zhao · Chengyuan Yao · Heng Wang · Qi Han · Yanghua Xiao
- | Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution? (Poster) | Michael Kirchhof · Luca Füger · Adam Golinski · Eeshan Gunesh Dhekane · Arno Blaas · Sinead Williamson
- | State Space Models: A Naturally Robust Alternative to Transformers in Computer Vision (Poster) | Chengbin Du · Yanxi Li · Chang Xu
- | Transformers Don't In-Context Learn Least Squares Regression (Oral) | Joshua Hill · Benjamin Eyre · Elliot Creager
- | Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems? (Poster) | Kai Yan · Yufei Xu · Zhengyin Du · Xuesong Yao · Zheyu Wang · Xiaowen Guo · Jiecao Chen
- | Uncertainty Quantification for MLLMs (Poster) | Gregory Kang Ruey Lau · Hieu Dao · Nicole Hui Lin Kan · Bryan Kian Hsiang Low
- | Data Shifts Hurt CoT: A Theoretical Study (Poster) | Lang Yin · Debangshu Banerjee · Gagandeep Singh