"Pull or Not to Pull?'': Investigating Moral Biases in Leading Large Language Models Across Ethical Dilemmas

Ding, Junchen; Jiang, Penghao; Xu, Zihao; Ding, Ziqi; Zhu, Yichen; Jiang, Jiaojiao; Li, Yuekang


arXiv:2508.07284 (cs)

[Submitted on 10 Aug 2025]

Abstract: As large language models (LLMs) increasingly mediate ethically sensitive decisions, understanding their moral reasoning processes becomes imperative. This study presents a comprehensive empirical evaluation of 14 leading LLMs, both reasoning-enabled and general-purpose, across 27 diverse trolley problem scenarios, framed by ten moral philosophies, including utilitarianism, deontology, and altruism. Using a factorial prompting protocol, we elicited 3,780 binary decisions and natural language justifications, enabling analysis along axes of decisional assertiveness, explanation-answer consistency, public moral alignment, and sensitivity to ethically irrelevant cues. Our findings reveal significant variability across ethical frames and model types: reasoning-enhanced models demonstrate greater decisiveness and more structured justifications, yet do not always align better with human consensus. Notably, "sweet zones" emerge in altruistic, fairness, and virtue ethics framings, where models achieve a balance of high intervention rates, low explanation conflict, and minimal divergence from aggregated human judgments. However, models diverge under frames emphasizing kinship, legality, or self-interest, often producing ethically controversial outcomes. These patterns suggest that moral prompting is not only a behavioral modifier but also a diagnostic tool for uncovering latent alignment philosophies across providers. We advocate for moral reasoning to become a primary axis in LLM alignment, calling for standardized benchmarks that evaluate not just what LLMs decide, but how and why.
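The reported total of 3,780 decisions is consistent with a full factorial crossing of the three counts given in the abstract (14 models, 27 scenarios, ten framings). The sketch below checks that arithmetic under the assumption of exactly one decision per (model, scenario, framing) cell; the abstract does not state the number of samples per cell, and the labels used here are hypothetical placeholders rather than the paper's actual model, scenario, or framing names.

```python
from itertools import product

# Counts taken from the abstract.
N_MODELS = 14
N_SCENARIOS = 27
N_FRAMES = 10

# Placeholder labels (hypothetical; the abstract does not list the real names).
models = [f"model_{i}" for i in range(N_MODELS)]
scenarios = [f"scenario_{i}" for i in range(N_SCENARIOS)]
frames = [f"frame_{i}" for i in range(N_FRAMES)]

# Full factorial grid: every model sees every scenario under every framing.
# Assumption: one binary decision is collected per cell.
grid = list(product(models, scenarios, frames))
total_decisions = len(grid)

print(total_decisions)
```

Under this assumption the grid size equals the 3,780 decisions reported, which suggests (but does not confirm) a single response per cell.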

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Cite as:	arXiv:2508.07284 [cs.CL]
	(or arXiv:2508.07284v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2508.07284

Submission history

From: Junchen Ding
[v1] Sun, 10 Aug 2025 10:45:16 UTC (1,009 KB)
