An optimal two-step estimation approach for two-phase studies

Zhou, Qingning; Wong, Kin Yau

Statistics > Methodology

arXiv:2510.11587v1 (stat)

[Submitted on 13 Oct 2025]

Title: An optimal two-step estimation approach for two-phase studies

Authors: Qingning Zhou, Kin Yau Wong

Abstract: Two-phase sampling is commonly adopted for reducing cost and improving estimation efficiency. In many two-phase studies, the outcome and some cheap covariates are observed for a large sample in Phase I, and expensive covariates are obtained for a selected subset of the sample in Phase II. As a result, the analysis of the association between the outcome and covariates faces a missing data problem. Complete-case analysis, which relies solely on the Phase II sample, is generally inefficient. In this paper, we study a two-step estimation approach, which first obtains an estimator using the complete data, and then updates it using an asymptotically mean-zero estimator obtained from a working model between the outcome and cheap covariates using the full data. This two-step estimator is asymptotically at least as efficient as the complete-data estimator and is robust to misspecification of the working model. We propose a kernel-based method to construct a two-step estimator that achieves optimal efficiency. Additionally, we develop a simple joint update approach based on multiple working models to approximate the optimal estimator when a fully nonparametric kernel approach is infeasible. We illustrate the proposed methods with various outcome models. We demonstrate their advantages over existing approaches through simulation studies and provide an application to a major cancer genomics study.
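The two-step idea described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' kernel-based or joint-update method; it is a minimal version under assumptions chosen here for illustration: a linear outcome model, a single linear working model of the outcome on the cheap covariate, and Phase II selected by simple random sampling. Under those assumptions the update takes the form beta_upd = beta_cc - C V^{-1} (gamma_cc - gamma_full), where gamma_cc and gamma_full are the working-model fits on the Phase II sample and on the full sample, and C and V are estimated from influence functions. All function and variable names (`ols_with_influence`, `two_step_estimate`, `simulate_once`) and the data-generating values are hypothetical.

```python
import numpy as np


def ols_with_influence(design, y):
    """Fit OLS; return coefficients and one influence-function row per observation."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    bread_inv = np.linalg.inv(design.T @ design / len(y))
    # Row i equals bread_inv @ design[i] * resid[i] (bread_inv is symmetric).
    influence = (design * resid[:, None]) @ bread_inv
    return coef, influence


def two_step_estimate(y, z, x, in_phase2):
    """Return (complete-case estimate, updated estimate) of (intercept, x, z) effects.

    Assumes Phase II is a simple random subsample; x may be NaN outside Phase II.
    """
    ones = np.ones(len(y))
    outcome_design = np.column_stack([ones, x, z])  # needs the expensive covariate
    working_design = np.column_stack([ones, z])     # cheap covariate only

    # Step 1: complete-case fit of the outcome model on Phase II.
    beta_cc, phi = ols_with_influence(outcome_design[in_phase2], y[in_phase2])

    # Working model fitted twice: on Phase II and on the full sample.
    gamma_cc, psi = ols_with_influence(working_design[in_phase2], y[in_phase2])
    gamma_full, _ = ols_with_influence(working_design, y)
    diff = gamma_cc - gamma_full  # asymptotically mean zero

    # Step 2: subtract the projection of beta_cc onto the mean-zero difference.
    n2 = in_phase2.sum()
    cov_phi_psi = phi.T @ psi / n2
    var_psi = psi.T @ psi / n2
    beta_upd = beta_cc - cov_phi_psi @ np.linalg.solve(var_psi, diff)
    return beta_cc, beta_upd


def simulate_once(rng, n=3000, n2=300):
    """Generate one two-phase data set (illustrative values) and estimate."""
    z = rng.normal(size=n)
    x = 0.5 * z + rng.normal(size=n)
    y = 1.0 + 0.5 * x + 1.0 * z + rng.normal(size=n)
    in_phase2 = np.zeros(n, dtype=bool)
    in_phase2[rng.choice(n, size=n2, replace=False)] = True
    x_obs = np.where(in_phase2, x, np.nan)  # x is unobserved outside Phase II
    return two_step_estimate(y, z, x_obs, in_phase2)


rng = np.random.default_rng(0)
results = [simulate_once(rng) for _ in range(200)]
est_cc = np.array([r[0] for r in results])
est_upd = np.array([r[1] for r in results])
var_cc = est_cc.var(axis=0)
var_upd = est_upd.var(axis=0)
print("empirical variance, complete-case:", var_cc)
print("empirical variance, two-step     :", var_upd)
```

In this particular homoscedastic linear setup, the variance reduction should show up mainly for the intercept and the coefficient of the cheap covariate, while the coefficient of the expensive covariate is expected to change little. A single working model of this kind need not attain the optimal efficiency that the abstract attributes to the kernel-based construction.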

Subjects:	Methodology (stat.ME)
Cite as:	arXiv:2510.11587 [stat.ME]
	(or arXiv:2510.11587v1 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2510.11587

Submission history

From: Kin Yau Wong
[v1] Mon, 13 Oct 2025 16:29:19 UTC (59 KB)
