Ethics2vec: aligning automatic agents and human preferences

Bontempi, Gianluca

arXiv:2508.07673 (cs)

[Submitted on 11 Aug 2025]

Authors: Gianluca Bontempi

Abstract: Though intelligent agents are supposed to improve human experience (or make it more efficient), it is hard from a human perspective to grasp the ethical values that are explicitly or implicitly embedded in an agent's behaviour. This is the well-known alignment problem, which refers to the challenge of designing AI systems that align with human values, goals and preferences. The problem is particularly challenging because most human ethical considerations involve incommensurable (i.e. non-measurable and/or incomparable) values and criteria. Consider, for instance, a medical agent prescribing a treatment to a cancer patient. How could it take into account (and/or weigh) incommensurable aspects like the value of a human life and the cost of the treatment? Alignment between human and artificial values is possible only if we define a common space where a metric can be defined and used. This paper proposes to extend to ethics the conventional Anything2vec approach, which has been successful in many similarly hard-to-quantify domains (ranging from natural language processing to recommendation systems and graph analysis). It proposes a way to map an automatic agent's decision-making (or control law) strategy to a multivariate vector representation, which can be used to compare and assess its alignment with human values. The Ethics2Vec method is first introduced for an automatic agent performing binary decision-making. Then, the vectorisation of an automatic control law (as in the case of a self-driving car) is discussed to show how the approach extends to automatic control settings.

Subjects:	Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2508.07673 [cs.AI]
	(or arXiv:2508.07673v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2508.07673

Submission history

From: Gianluca Bontempi
[v1] Mon, 11 Aug 2025 06:52:46 UTC (487 KB)
