Workshop
The 2nd Workshop on Reliable and Responsible Foundation Models
Mohit Bansal · Xinyu Yang · Kate Donahue · Giulia Fanti · David Madras · Han Shao · Hongyi Wang · Steven Wu · Huaxiu Yao
West Ballroom C
Sat 19 Jul, 8:50 a.m. PDT
Foundation models (FMs), with their emergent and reasoning abilities, are reshaping the future of scientific research and broader human society. However, as their intelligence approaches or surpasses that of humans, concerns arise regarding their responsible use in real-world applications, including reliability, safety, transparency, and ethics. The workshop on reliable and responsible FMs delves into the urgent need to ensure that such models align with human values. The significance of this topic cannot be overstated: the real-world implications of these models affect everything from daily information access to critical decision-making in fields like medicine and finance, especially for embodied FMs that directly interact with the physical world. Stakeholders, including developers, practitioners, and policymakers, care deeply about this because the reliable and responsible design, deployment, and oversight of these models dictate not only the success of AI solutions but also the preservation of societal norms, order, equity, and fairness. Some of the fundamental questions that this workshop aims to address are:

* Diagnosis: How can we identify and characterize unreliable and irresponsible behaviors in FMs? Topics include prompt sensitivity, lack of self-consistency, and hallucinations in generation.
* Evaluation: How should we assess the harmful capabilities of FMs and quantify their societal impact?
* Sources: How can we pinpoint and understand the known or emerging sources of FM unreliability? This involves examining training data, optimization objectives, and architectural design.
* Generalization: How can reliable and responsible properties be effectively extended to increasingly advanced FMs, particularly as they incorporate new features such as additional modalities or long chain-of-thought (CoT) reasoning?
* Governance: What principles or guidelines should inform the next generation of FMs to ensure they are reliable and responsible? How can real-time monitoring of these FMs be enabled?
* Guarantee: Can we establish theoretical frameworks for provably reliable and responsible FMs?
* Practice: How can we leverage domain-specific knowledge to guide FMs toward improved reliability and responsibility across diverse areas, such as drug discovery, education, or clinical health?
Schedule
Sat 8:50 a.m. - 9:00 a.m. | Introduction and opening remarks (Intro) | Xinyu Yang
Sat 9:00 a.m. - 9:40 a.m. | Invited Talk 1 (Invited Talk)
Sat 9:40 a.m. - 10:00 a.m. | Oral Talk 1 (Oral Talk)
Sat 10:00 a.m. - 10:40 a.m. | Invited Talk 2 (Invited Talk)
Sat 10:40 a.m. - 11:00 a.m. | Oral Talk 2 (Oral Talk)
Sat 11:00 a.m. - 11:40 a.m. | Invited Talk 3 (Invited Talk)
Sat 11:40 a.m. - 12:00 p.m. | Oral Talk 3 (Oral Talk)
Sat 12:00 p.m. - 1:00 p.m. | Poster Session 1 and Lunch Break (Poster Session)
Sat 1:00 p.m. - 1:40 p.m. | Invited Talk 4 (Invited Talk)
Sat 1:40 p.m. - 2:00 p.m. | Oral Talk 4 (Oral Talk)
Sat 2:00 p.m. - 2:40 p.m. | Invited Talk 5 (Invited Talk)
Sat 2:40 p.m. - 3:00 p.m. | Oral Talk 5 (Oral Talk)
Sat 3:00 p.m. - 3:40 p.m. | Invited Talk 6 (Invited Talk)
Sat 3:40 p.m. - 4:00 p.m. | Oral Talk 6 (Oral Talk)
Sat 4:00 p.m. - 5:00 p.m. | Poster Session 2 and Coffee Break (Poster Session)
Sat 5:00 p.m. - 5:05 p.m. | Closing Remarks (Close)
- | Learning Robust 3D Representation from CLIP via Dual Denoising (Poster) | Shuqing Luo · Bowen Qu · Wei Gao
- | Conformal Risk Minimization with Variance Reduction (Poster) | Sima Noorani · Orlando Romero · Nicolo Dal Fabbro · Hamed Hassani · George Pappas
- | Enhancing Clinical Multiple-Choice Questions Benchmarks with Knowledge Graph Guided Distractor Generation (Poster) | Running Yang · Wenlong Deng · Minghui Chen · Yuyin Zhou · Xiaoxiao Li
- | Can We Infer Confidential Properties of Training Data from LLMs? (Poster) | Pengrun Huang · Chhavi Yadav · Ruihan Wu · Kamalika Chaudhuri
- | MixAT: Combining Continuous and Discrete Adversarial Training for LLMs (Poster) | Csaba Dékány · Stefan Balauca · Robin Staab · Dimitar I. Dimitrov · Martin Vechev
- | Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models (Poster) | Sima Noorani · Shayan Kiyani · George Pappas · Hamed Hassani
- | Angular Steering: Behavior Control via Rotation in Activation Space (Poster) | Hieu M. Vu · Tan Nguyen
- | Teaching Models to Verbalize Reward Hacking in Chain-of-Thought Reasoning (Poster) | Miles Turpin · Andy Arditi · Marvin Li · Joe Benton · Julian Michael
- | A Thousand Words or An Image: Studying the Influence of Persona Modality in Multimodal LLMs (Poster) | Julius Broomfield · Kartik Sharma · Srijan Kumar
- | LoRA Merging with SVD: Understanding Interference and Preserving Performance (Poster) | Dennis Tang · Prateek Yadav · Yi-Lin Sung · Jaehong Yoon · Mohit Bansal
- | Dataset Protection via Watermarked Canaries in Retrieval-Augmented LLMs (Poster) | Liu · Xuandong Zhao · Dawn Song · Yuheng Bu
- | Beyond Multiple Choice: Evaluating Steering Vectors for Adaptive Free-Form Summarization (Poster) | Joschka Braun · Carsten Eickhoff · Seyed Ali Bahrainian
- | Lookahead Bias in Pretrained Language Models (Poster) | Suproteem Sarkar · Keyon Vafa
- | Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval (Poster) | Taiye Chen · Zeming Wei · Ang Li · Yisen Wang
- | Do Sparse Autoencoders Generalize? A Case Study of Answerability (Poster) | Lovis Heindrich · Phil Torr · Fazl Barez · Veronika Thost
- | GLSim: Detecting Object Hallucinations in LVLMs via Global-Local Similarity (Poster) | Seongheon Park · Sharon Li
- | Evaluating Adversarial Protections for Diffusion Personalization: A Comprehensive Study (Poster) | Kai Ye · Tianyi Chen · Zhen Wang
- | Circuit Discovery Helps To Detect LLM Jailbreaking (Poster) | Paria Mehrbod · Boris Knyazev · Eugene Belilovsky · Guy Wolf · Geraldin Nanfack
- | The Geometry of Forgetting: Analyzing Machine Unlearning through Local Learning Coefficients (Poster) | Aashiq Muhamed · Virginia Smith
- | Thought calibration: Efficient and confident test-time scaling (Poster) | Menghua Wu · Cai Zhou · Stephen Bates · Tommi Jaakkola
- | Reliable Statistical Inference with Synthetic Data from Large Language Models (Poster) | Yewon Byun · Shantanu Gupta · Zachary Lipton · Rachel Childers · Bryan Wilder
- | On Characterizations for Language Generation: Interplay of Hallucinations, Breadth, and Stability (Poster) | Alkis Kalavasis · Anay Mehrotra · Grigorios Velegkas
- | Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs (Poster) | Kejia Zhang · Keda TAO · Jiasheng Tang · Huan Wang
- | In Agents We Trust, but Who Do Agents Trust? Latent Source Preferences Steer LLM Generations (Poster) | Mohammad Aflah Khan · Mahsa Amani · Soumi Das · Bishwamittra Ghosh · Qinyuan Wu · Krishna Gummadi · Manish Gupta · Abhilasha Ravichander
- | The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1 (Poster) | Kaiwen Zhou · Chengzhi Liu · Xuandong Zhao · Shreedhar Jangam · Jayanth Srinivasa · Gaowen Liu · Dawn Song · Xin Wang
- | Simple Mechanistic Explanations for Out-Of-Context Reasoning (Poster) | Atticus Wang · Josh Engels · Oliver Clive-Griffin
- | SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning (Poster) | Kaiwen Zhou · Xuandong Zhao · Gaowen Liu · Jayanth Srinivasa · Aosong Feng · Dawn Song · Xin Wang
- | Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models (Poster) | Bartłomiej Marek · Lorenzo Rossi · Vincent Hanke · Xun Wang · Michael Backes · Franziska Boenisch · Adam Dziedzic
- | Persuade Me If You Can: Evaluating AI Agent Influence on Safety Monitors (Poster) | Jennifer Za · Julija Bainiaksina · Tanush Chopra · Nikita Ostrovsky · Victoria Krakovna
- | SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge (Poster) | Fengqing Jiang · Fengbo Ma · Zhangchen Xu · Yuetai Li · Bhaskar Ramasubramanian · Luyao Niu · Bo Li · Xianyan Chen · Zhen Xiang · Radha Poovendran
- | Alignment of Large Language Models with Constrained Learning (Poster) | Botong Zhang · Shuo Li · Ignacio Hounie · Osbert Bastani · Dongsheng Ding · Alejandro Ribeiro
- | Copilot Arena: A Platform for Code LLM Evaluation in the Wild (Oral) | Wayne Chi · Valerie Chen · Anastasios Angelopoulos · Wei-Lin Chiang · Aditya Mittal · Naman Jain · Tianjun Zhang · Ion Stoica · Chris Donahue · Ameet Talwalkar
- | AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions (Poster) | Polina Kirichenko · Mark Ibrahim · Kamalika Chaudhuri · Samuel Bell
- | Model Organisms for Emergent Misalignment (Poster) | Edward Turner · Anna Soligo · Mia Taylor · Senthooran Rajamanoharan · Neel Nanda
- | ALIGN: Prompt-based Attribute Alignment for Reliable, Responsible, and Personalized LLM-based Decision-Making (Poster) | Bharadwaj Ravichandran · David Joy · Paul Elliott · Brian Hu · Jadie Adams · Christopher Funk · Emily Veenhuis · Anthony Hoogs · Arslan Basharat
- | Doubly Robust Alignment for Large Language Models (Poster) | Erhan Xu · Kai Ye · Hongyi Zhou · Luhan Zhu · Francesco Quinzan · Chengchun Shi
- | Does Representation Intervention Really Identify Desired Concepts and Elicit Alignment? (Poster) | Hongzheng Yang · Yongqiang Chen · Zeyu Qin · Tongliang Liu · Chaowei Xiao · Kun Zhang · Bo Han
- | Accountability Attribution: Tracing Model Behavior to Training Processes (Poster) | Shichang Zhang · Hongzhe Du · Karim Saraipour · Jiaqi Ma · Himabindu Lakkaraju
- | Escaping the SpuriVerse: Can Large Vision-Language Models Generalize Beyond Seen Spurious Correlations? (Poster) | Yiwei Yang · Chung Peng Lee · Shangbin Feng · Dora Zhao · Bingbing Wen · Anthony Liu · Yulia Tsvetkov · Bill Howe
- | Capability-Based Scaling Laws for LLM Red-Teaming (Poster) | Alexander Panfilov · Paul Kassianik · Maksym Andriushchenko · Jonas Geiping
- | When Meaning Doesn’t Matter: Exposing Guard Model Fragility via Paraphrasing (Poster) | Cristina Pinneri · Christos Louizos
- | From Tasks to Teams: A Risk-First Evaluation Framework for Multi-Agent LLM Systems in Finance (Oral) | Zichen Chen · Jianda Chen · Jiaao Chen · Misha Sra
- | GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning (Poster) | Yue Liu · Shengfang Zhai · Mingzhe Du · Yulin Chen · Tri Cao · Hongcheng Gao · Cheng Wang · Xinfeng Li · Kun Wang · Junfeng Fang · Jiaheng Zhang · Bryan Hooi
- | Evaluating Large Language Models' Capability to Launch Fully Automated Spear Phishing Campaigns (Poster) | Fred Heiding · Simon Lermen · Andrew Kao · Bruce Schneier · Arun Vishwanath
- | DINGO: Constrained Inference for Diffusion LLMs (Poster) | Tarun Suresh · Debangshu Banerjee · Shubham Ugare · Sasa Misailovic · Gagandeep Singh
- | Consistency in Language Models: Current Landscape, Challenges, and Future Directions (Poster) | Jekaterina Novikova · Carol Anderson · Borhane Blili-Hamelin · Domenic Rosati · Subhabrata Majumdar
- | Empirical Comparison of Membership Inference Attacks in Deep Transfer Learning (Poster) | Yuxuan Bai · Gauri Pradhan · Marlon Tobaben · Antti Honkela
- | Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency (Poster) | Yukun Jiang · Mingjie Li · Michael Backes · Yang Zhang
- | Think with Moderation: Reasoning Models and Confidence Calibration in the Climate Domain (Poster) | Romain Lacombe · Kerrie Wu · Eddie Dilworth
- | Learning on LLM Output Signatures for Gray-Box Behavior Analysis (Poster) | Guy Bar Shalom · Fabrizio Frasca · Derek Lim · Yoav Gelberg · Yftah Ziser · Ran El-Yaniv · Gal Chechik · Haggai Maron
- | RealSafe-R1: Safety-Aligned DeepSeek-R1 without Compromising Reasoning Capability (Poster) | Yichi Zhang · Zihao Zeng · Dongbai Li · Yao Huang · Zhijie Deng · Yinpeng Dong
- | Dynamic Risk Assessments for Offensive Cybersecurity Agents (Poster) | Boyi Wei · Benedikt Stroebl · Jiacen Xu · Joie Zhang · Zhou Li · Peter Henderson
- | Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks (Poster) | Hanjiang Hu · Alexander Robey · Changliu Liu
- | Improving Commonsense Reasoning and Reliability in LLMs Through Cognitive-Inspired Prompting Frameworks (Poster) | Tanvi Ganapathy · Ishita Mathur · Anna Szczuka
- | Semi-Nonnegative GPT: Towards Monosemantic Representations (Poster) | Junyi Li · Jinqi Liu · Qi Zhang · Yisen Wang
- | UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models (Poster) | Sejoon Oh · Yiqiao Jin · Megha Sharma · Donghyun Kim · Eric Ma · Gaurav Verma · Srijan Kumar
- | Beautiful Images, Toxic Words: Understanding and Addressing Offensive Text in Generated Images (Poster) | Aditya Kumar · Tom Blanchard · Adam Dziedzic · Franziska Boenisch
- | Watermarking Autoregressive Image Generation (Poster) | Nikola Jovanović · Ismail Labiad · Tomas Soucek · Martin Vechev · Pierre Fernandez
- | Multi-Modal Medical Image Augmentation for Controlled Heterogeneity and Fair Outcomes (Poster) | Soo Yong Kim · Seunghyeok Hong
- | Foundational Models Must Be Designed To Yield Safer Loss Landscapes That Resist Harmful Fine-Tuning (Poster) | Karan Uppal · Pavan Kalyan Tankala
- | Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models (Poster) | Lillian Sun · Martin Pawelczyk · Zhenting Qi · Aounon Kumar · Himabindu Lakkaraju
- | Distilling Safe LLM Systems via Soft Prompts (Poster) | Motasem Alfarra · Dana Kianfar · Cristina Pinneri · Christos Louizos
- | Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective (Poster) | Jianyu Wang · Zhiqiang Hu · Lidong Bing
- | Investigating Tool-Memory Conflicts in Tool-Augmented LLMs (Poster) | Jiali Cheng · Rui Pan · Hadi Amiri
- | Finetuning-Activated Backdoors in LLMs (Poster) | Thibaud Gloaguen · Mark Vero · Robin Staab · Martin Vechev
- | Defending Against Prompt Injection with a Few DefensiveTokens (Poster) | Sizhe Chen · Yizhu Wang · Nicholas Carlini · Chawin Sitawarin · David Wagner
- | Verbalized Confidence Triggers Self-Verification: Emergent Behavior Without Explicit Reasoning Supervision (Poster) | Chaeyun Jang · Moonseok Choi · Yegon Kim · Hyungi Lee · Juho Lee
- | Steering LLM Reasoning Through Bias-Only Adaptation (Poster) | Viacheslav Sinii · Alexey Gorbatovski · Artem Cherepanov · Boris Shaposhnikov · Nikita Balagansky · Daniil Gavrilov
- | SimBA: Simplifying Benchmark Analysis (Poster) | Nishant Subramani · Alfredo Gomez · Mona Diab
- | Aligned Textual Scoring Rule (Poster) | Yuxuan Lu · Yifan Wu · Jason Hartline · Michael Curry
- | Visual Language Models as Zero-Shot Deepfake Detectors (Poster) | Viacheslav Pirogov
- | Adversarial Manipulation of Reasoning Models using Internal Representations (Poster) | Kureha Yamaguchi · Benjamin Etheridge · Andy Arditi
- | On Learning Verifiers for Chain-of-Thought Reasoning (Poster) | Nina Balcan · Avrim Blum · Zhiyuan Li · Dravyansh Sharma
- | RoMa: A Robust Model Watermarking Scheme for Protecting IP in Diffusion Models (Poster) | Yingsha Xie · Rui Min · Zeyu Qin · Fei Ma · Li Shen · Fei Yu · Xiaochun Cao
- | Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors (Oral) | Fan Nie · Lan Feng · Haotian Ye · Weixin Liang · Pan Lu · Huaxiu Yao · Alexandre Alahi · James Zou
- | Extracting memorized pieces of (copyrighted) books from open-weight language models (Oral) | A. Feder Cooper · Aaron Gokaslan · Ahmed Ahmed · Amy Cyphert · Chris De Sa · Mark Lemley · Daniel Ho · Percy Liang
- | Efficient and Privacy-Preserving Soft Prompt Transfer for LLMs (Poster) | Xun Wang · Jing Xu · Franziska Boenisch · Michael Backes · Christopher A. Choquette Choo · Adam Dziedzic
- | A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1 (Poster) | Zhaoyi Li · Xiaohan Zhao · Dong-Dong Wu · Jiacheng Cui · Zhiqiang Shen
- | ASNO: An Interpretable Attention-Based Spatio-Temporal Neural Operator for Robust Scientific Machine Learning (Poster) | Vispi Karkaria · Doksoo Lee · Yi-Ping Chen · Yue Yu · Wei Chen
- | Position: Agent-Specific Trustworthiness Risk as a Research Priority (Poster) | Zeming Wei · Tianlin Li · Xiaojun Jia · Yihao Zhang · Yang Liu · Meng Sun
- | Focus on This, Not That! Steering LLMs with Adaptive Feature Specification (Poster) | Tom A. Lamb · Adam Davies · Alasdair J Paren · Phil Torr · Francesco Pinto
- | Robust and Interpretable Relational Reasoning with Large Language Models and Symbolic Solvers (Poster) | Ge Zhang · Mohammad Alomrani · Hongjian Gu · Jiaming Zhou · Yaochen Hu · Bin Wang · Qun Liu · Mark Coates · Yingxue Zhang · Jianye Hao
- | Towards Secure Model Sharing with Approximate Fingerprints (Poster) | Anshul Nasery · Sewoong Oh
- | Steering Language Model Refusal with Sparse Autoencoders (Poster) | Kyle O'Brien · David Majercak · Xavier Fernandes · Richard Edgar · Blake Bullwinkel · Jingya Chen · Harsha Nori · Dean Carignan · Eric Horvitz · Forough Poursabzi-Sangdeh
- | Predicting the Performance of Black-box Language Models with Follow-up Queries (Poster) | Dylan Sam · Marc Finzi · Zico Kolter
- | Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models (Poster) | Zhanke Zhou · Zhaocheng Zhu · Xuan Li · Mikhail Galkin · Xiao Feng · Sanmi Koyejo · Jian Tang · Bo Han
- | What do Geometric Hallucination Detection Metrics Actually Measure? (Poster) | Eric Yeats · John Buckheit · Sarah Scullen · Brendan Kennedy · Loc Truong · Davis Brown · William Kay · Cliff Joslyn · Tegan Emerson · Michael Henry · John Emanuello · Henry Kvinge
- | A Statistical Physics of Language Model Reasoning (Poster) | Jack Carson
- | Sample-Specific Noise Injection For Diffusion-Based Adversarial Purification (Poster) | Yuhao Sun · Jiacheng Zhang · Zesheng Ye · Chaowei Xiao · Feng Liu
- | The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets (Poster) | Shenzhe Zhu · Jiao Sun · Yi Nian · Tobin South · Alex Pentland · Jiaxin Pei
- | Do Students Debias Like Teachers? On the Distillability of Bias Mitigation Methods (Poster) | Jiali Cheng · Hadi Amiri
- | BiasGUARRD: Enhancing Fairness and Reliability in LLM Conflict Resolution Through Agentic Debiasing (Poster) | Erica Wang · Shrujana S Kunnam · Sreeyutha Ratala
- | In-Context Watermarks for Large Language Models (Poster) | Liu · Xuandong Zhao · Christopher Kruegel · Dawn Song · Yuheng Bu
- | Transferable Visual Adversarial Attacks for Proprietary Multimodal Large Language Models (Poster) | Kai Hu · Weichen Yu · Alexander Robey · Li Zhang · Andy Zou · Haoqi Hu · Chengming Xu · Matt Fredrikson
- | Position: Membership Inference Attack Should Move On to Distributional Statistics for Distilled Generative Models (Poster) | Muxing Li · Zesheng Ye · Sharon Li · Andy Song · Guangquan Zhang · Feng Liu
- | Prune 'n Predict: Optimizing LLM Decision-making with Conformal Prediction (Poster) | Harit Vishwakarma · Alan Mishler · Thomas Cook · Niccolo Dalmasso · Natraj Raman · Sumitra Ganesh
- | The Geometries of Truth Are Orthogonal Across Tasks (Poster) | Waïss Azizian · Michael Kirchhof · Eugene Ndiaye · Louis Béthune · Michal Klein · Pierre Ablin · Marco Cuturi
- | Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems (Poster) | Parand A. Alamdari · Toryn Q. Klassen · Sheila McIlraith
- | Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems through Benign Queries (Poster) | Yuhao Wang · Wenjie Qu · Yanze Jiang · Lichen Liu · Yue Liu · Shengfang Zhai · Yinpeng Dong · Jiaheng Zhang
- | MARVEL: Modular Abstention for Reliable and Versatile Expert LLMs (Poster) | Bingbing Wen · Faeze Brahman · Zhan Su · Shangbin Feng · Yulia Tsvetkov · Lucy Lu Wang · Bill Howe
- | Bidding for Influence: Auction-Driven Diffusion Image Generation (Poster) | Lillian Sun · Henry Huang · Fucheng Zhu · Giannis Daras · Constantinos Daskalakis
- | Advancing LLM Safe Alignment with Safety Representation Ranking (Poster) | Tianqi Du · Zeming Wei · Quan Chen · Chenheng Zhang · Yisen Wang
- | On the Scoring Functions for RAG-based Conformal Factuality (Poster) | Yi Chen · Caitlyn Yin · Sukrut Chikodikar · Ramya Vinayak
- | GPT, But Backwards: Exactly Inverting Language Model Outputs (Poster) | Adrians Skapars · Edoardo Manino · Youcheng Sun · Lucas Cordeiro
- | (Im)possibility of Automated Hallucination Detection in Large Language Models (Poster) | Amin Karbasi · Omar Montasser · John Sous · Grigorios Velegkas
- | Visual Instruction Bottleneck Tuning (Oral) | Changdae Oh · Jiatong Li · Shawn Im · Sharon Li
- | Lifelong Safety Alignment for Language Models (Poster) | Haoyu Wang · Zeyu Qin · Yifei Zhao · Chao Du · Min Lin · Xueqian Wang · Tianyu Pang
- | One Stone, Two Birds: Enhancing Adversarial Defense Through the Lens of Distributional Discrepancy (Poster) | Jiacheng Zhang · Benjamin Rubinstein · Jingfeng Zhang · Feng Liu
- | TRoVe: Discovering Error-Inducing Static Feature Biases in Temporal Vision-Language Models (Poster) | Maya Varma · Jean-Benoit Delbrouck · Sophie Ostmeier · Akshay Chaudhari · Curtis Langlotz
- | DP-AdamW: Investigating Decoupled Weight Decay and Bias Correction in Private Deep Learning (Poster) | Lillian Sun · Kevin Cong · Jay Chooi · Russell Li
- | Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts (Poster) | Kartik Sharma · Yiqiao Jin · Vineeth Rakesh · Yingtong Dou · Menghai Pan · Mahashweta Das · Srijan Kumar
- | The Necessity for Intervention Fidelity: Unintended Side Effects When Steering LLMs (Poster) | Jonas Raedler · Weiyue Li · Alyssa Taliotis · Manasvi Goyal · Siddharth Swaroop · Weiwei Pan
- | SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs (Poster) | Aashiq Muhamed · Jacopo Bonato · Mona Diab · Virginia Smith
- | Robust LLM Fingerprinting via Domain-Specific Watermarks (Poster) | Thibaud Gloaguen · Robin Staab · Nikola Jovanović · Martin Vechev
- | Position: Reasoning LLMs are Wandering Solution Explorers (Poster) | Jiahao Lu · Ziwei Xu · Mohan Kankanhalli
- | GenAI Copyright Evidence with Operational Meaning (Poster) | Eli Chien · Amit Saha · Yinan Huang · Pan Li
- | Reward Shaping to Mitigate Reward Hacking in RLHF (Poster) | Jiayi Fu · Xuandong Zhao · Chengyuan Yao · Heng Wang · Qi Han · Yanghua Xiao
- | Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution? (Poster) | Michael Kirchhof · Luca Füger · Adam Golinski · Eeshan Gunesh Dhekane · Arno Blaas · Sinead Williamson
- | State Space Models: A Naturally Robust Alternative to Transformers in Computer Vision (Poster) | Chengbin Du · Yanxi Li · Chang Xu
- | Transformers Don't In-Context Learn Least Squares Regression (Oral) | Joshua Hill · Benjamin Eyre · Elliot Creager
- | Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems? (Poster) | Kai Yan · Yufei Xu · Zhengyin Du · Xuesong Yao · Zheyu Wang · Xiaowen Guo · Jiecao Chen
- | Uncertainty Quantification for MLLMs (Poster) | Gregory Kang Ruey Lau · Hieu Dao · Nicole Hui Lin Kan · Bryan Kian Hsiang Low
- | Data Shifts Hurt CoT: A Theoretical Study (Poster) | Lang Yin · Debangshu Banerjee · Gagandeep Singh