Counterfactual Reasoning for Steerable Pluralistic Value Alignment of Large Language Models

Published in NeurIPS 2025 (CCF-A)

COUPLE uses structural causal models and counterfactual reasoning to support fine-grained control over pluralistic human values in large language model alignment. The framework constructs interpretable reasoning trajectories and distills them into smaller models for personalized alignment tasks.

Recommended citation: Hanze Guo, Jing Yao, Xiao Zhou, Xiaoyuan Yi, and Xing Xie. "Counterfactual Reasoning for Steerable Pluralistic Value Alignment of Large Language Models." NeurIPS, 2025.