K-Dense Analyst: Towards Fully Automated Scientific Analysis

Li, Orion; Agarwal, Vinayak; Zhou, Summer; Gopinath, Ashwin; Kassis, Timothy

arXiv:2508.07043 (cs)

[Submitted on 9 Aug 2025]

Title: K-Dense Analyst: Towards Fully Automated Scientific Analysis

Authors: Orion Li, Vinayak Agarwal, Summer Zhou, Ashwin Gopinath, Timothy Kassis

Abstract: The complexity of modern bioinformatics analysis has created a critical gap between data generation and the development of scientific insights. While large language models (LLMs) have shown promise in scientific reasoning, they remain fundamentally limited when dealing with real-world analytical workflows that demand iterative computation, tool integration, and rigorous validation. We introduce K-Dense Analyst, a hierarchical multi-agent system that achieves autonomous bioinformatics analysis through a dual-loop architecture. K-Dense Analyst, part of the broader K-Dense platform, couples planning with validated execution, using specialized agents to decompose complex objectives into executable, verifiable tasks within secure computational environments. On BixBench, a comprehensive benchmark for open-ended biological analysis, K-Dense Analyst achieves 29.2% accuracy, surpassing the best-performing language model (GPT-5) by 6.3 percentage points, a relative improvement of nearly 27% over what is widely considered the most powerful LLM available. Remarkably, K-Dense Analyst achieves this performance using Gemini 2.5 Pro, which attains only 18.3% accuracy when used directly, demonstrating that our architectural innovations unlock capabilities far beyond the underlying model's baseline performance. Our insights demonstrate that autonomous scientific reasoning requires more than enhanced language models; it demands purpose-built systems that can bridge the gap between high-level scientific objectives and low-level computational execution. These results represent a significant advance toward fully autonomous computational biologists capable of accelerating discovery across the life sciences.
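The abstract mixes percentage-point and relative figures, so a short worked calculation may help when comparing them. The sketch below uses only the numbers quoted in the abstract. The GPT-5 accuracy is not stated there: it is derived here as 29.2 minus 6.3, and the variable and function names are illustrative. Under that assumption the relative gain over GPT-5 comes out at about 27.5%, in line with the roughly 27% the abstract reports.

```python
# Figures quoted in the abstract (BixBench accuracy, in percent).
k_dense = 29.2         # K-Dense Analyst
margin_pp = 6.3        # lead over GPT-5, in percentage points
gemini_direct = 18.3   # Gemini 2.5 Pro used directly

# Assumption: GPT-5 accuracy is not stated in the abstract; it is
# inferred from the reported margin.
gpt5_implied = k_dense - margin_pp


def relative_gain(new: float, base: float) -> float:
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


gain_vs_gpt5 = relative_gain(k_dense, gpt5_implied)
gain_vs_gemini = relative_gain(k_dense, gemini_direct)
gap_vs_gemini_pp = k_dense - gemini_direct

print(f"Implied GPT-5 accuracy:          {gpt5_implied:.1f}%")
print(f"Relative gain over GPT-5:        {gain_vs_gpt5:.1f}%")
print(f"Gap over direct Gemini 2.5 Pro:  {gap_vs_gemini_pp:.1f} points")
print(f"Relative gain over direct use:   {gain_vs_gemini:.1f}%")
```

The distinction matters when reading the headline claim: 6.3 is an absolute difference in accuracy, while the figure near 27% is that difference divided by the implied GPT-5 baseline.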

Subjects:	Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Genomics (q-bio.GN); Quantitative Methods (q-bio.QM)
Cite as:	arXiv:2508.07043 [cs.AI]
	(or arXiv:2508.07043v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2508.07043

Submission history

From: Ashwin Gopinath
[v1] Sat, 9 Aug 2025 16:59:55 UTC (2,571 KB)
