Towards Theoretical Understanding of Transformer Test-Time Computing: Investigation on In-Context Linear Regression

Chen, Xingwu; Lu, Miao; Wu, Beining; Zou, Difan

Computer Science > Machine Learning

arXiv:2508.07571 (cs)

[Submitted on 11 Aug 2025]

Title: Towards Theoretical Understanding of Transformer Test-Time Computing: Investigation on In-Context Linear Regression

Authors: Xingwu Chen, Miao Lu, Beining Wu, Difan Zou

Abstract: Using more test-time computation during language model inference, such as generating more intermediate thoughts or sampling multiple candidate answers, has proven effective in significantly improving model performance. This paper takes an initial step toward bridging the gap between practical language model inference and theoretical transformer analysis by incorporating randomness and sampling. We focus on in-context linear regression with continuous/binary coefficients, where our framework simulates language model decoding through noise injection and binary coefficient sampling. Through this framework, we provide detailed analyses of widely adopted inference techniques. Supported by empirical results, our theoretical framework and analysis demonstrate the potential for offering new insights into understanding inference behaviors in real-world language models.
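As a rough illustration of the setting the abstract describes, the following sketch builds an in-context linear regression task, perturbs the prediction with injected Gaussian noise as a stand-in for sampling randomness, and averages several sampled candidates. It is not the authors' framework: the least-squares predictor, the output-level noise model, and every name and parameter (`sample_task`, `averaged_prediction`, `sigma`, `k`) are assumptions made only for this example.

```python
import numpy as np


def sample_task(rng, d, n):
    """Draw one regression task: hidden coefficients, context pairs, and a query."""
    w = rng.standard_normal(d)          # hidden coefficients (continuous case)
    X = rng.standard_normal((n, d))     # context inputs
    y = X @ w                           # noiseless context labels
    x_query = rng.standard_normal(d)    # query input
    return X, y, x_query, w


def averaged_prediction(rng, X, y, x_query, sigma, k):
    """Average k noisy candidate predictions for the query.

    Assumption: the predictor is ordinary least squares on the context, and
    sampling randomness is modeled as Gaussian noise added to its output.
    """
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    base = x_query @ w_hat
    candidates = base + sigma * rng.standard_normal(k)
    return candidates.mean()


def mean_squared_error(k, trials=2000, d=5, n=20, sigma=1.0, seed=0):
    """Estimate the squared error of the k-sample average over random tasks."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(trials):
        X, y, x_query, w = sample_task(rng, d, n)
        pred = averaged_prediction(rng, X, y, x_query, sigma, k)
        errors.append((pred - x_query @ w) ** 2)
    return float(np.mean(errors))


if __name__ == "__main__":
    for k in (1, 4, 16):
        print(k, mean_squared_error(k))
```

In this toy model the context determines the coefficients exactly, so the only error comes from the injected noise, and averaging `k` candidates shrinks its variance by roughly a factor of `k`. The paper's analysis covers richer settings, including binary coefficient sampling, which this sketch does not attempt.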

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2508.07571 [cs.LG]
	(or arXiv:2508.07571v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2508.07571

Submission history

From: Xingwu Chen
[v1] Mon, 11 Aug 2025 03:05:36 UTC (1,117 KB)
