Counterfactual Reasoning for Steerable Pluralistic Value Alignment of Large Language Models

Published in NeurIPS 2025 (CCF-A)

COUPLE uses structural causal models and counterfactual reasoning to support fine-grained control over pluralistic human values in large language model alignment. The framework constructs interpretable reasoning trajectories and distills them into smaller models for personalized alignment tasks.

Recommended citation: Hanze Guo, Jing Yao, Xiao Zhou, Xiaoyuan Yi, and Xing Xie. "Counterfactual Reasoning for Steerable Pluralistic Value Alignment of Large Language Models." NeurIPS, 2025.
