计算机科学 > 人工智能
[提交于 2026年1月8日
]
标题: MineNPC-Task:面向记忆感知Minecraft代理的任务套件
标题: MineNPC-Task: Task Suite for Memory-Aware Minecraft Agents
摘要: We present \textsc{MineNPC任务}, a user-authored benchmark and evaluation harness for testing memory-aware, mixed-initiative LLM agents in open-world \emph{我的世界}. Rather than relying on synthetic prompts, tasks are elicited from formative and summative co-play with expert players, normalized into parametric templates with explicit preconditions and dependency structure, and paired with machine-checkable validators under a bounded-knowledge policy that forbids out-of-world shortcuts. The harness captures plan/act/memory events-including plan previews, targeted clarifications, memory reads and writes, precondition checks, and repair attempts and reports outcomes relative to the total number of attempted subtasks, derived from in-world evidence. As an initial snapshot, we instantiate the framework with GPT-4o and evaluate \textbf{216} subtasks across \textbf{8} experienced players. We observe recurring breakdown patterns in code execution, inventory/tool handling, referencing, and navigation, alongside recoveries supported by mixed-initiative clarifications and lightweight memory. Participants rated interaction quality and interface usability positively, while highlighting the need for stronger memory persistence across tasks. We release the complete task suite, validators, logs, and harness to support transparent, reproducible evaluation of future memory-aware embodied agents.
文献和引用工具
与本文相关的代码,数据和媒体
alphaXiv (什么是 alphaXiv?)
CatalyzeX 代码查找器 (什么是 CatalyzeX?)
DagsHub (什么是 DagsHub?)
Gotit.pub (什么是 GotitPub?)
Hugging Face (什么是 Huggingface?)
带有代码的论文 (什么是带有代码的论文?)
ScienceCast (什么是 ScienceCast?)
演示
推荐器和搜索工具
arXivLabs:与社区合作伙伴的实验项目
arXivLabs 是一个框架,允许合作伙伴直接在我们的网站上开发和分享新的 arXiv 特性。
与 arXivLabs 合作的个人和组织都接受了我们的价值观,即开放、社区、卓越和用户数据隐私。arXiv 承诺这些价值观,并且只与遵守这些价值观的合作伙伴合作。
有一个为 arXiv 社区增加价值的项目想法吗? 了解更多关于 arXivLabs 的信息.