flowengineR: A Modular and Extensible Framework for Fair and Reproducible Workflow Design in R

Willer, Maximilian; Ruckdeschel, Peter

arXiv:2511.00079v1 (cs)

[Submitted on 29 Oct 2025]

Authors: Maximilian Willer, Peter Ruckdeschel

Abstract: flowengineR is an R package that provides a modular and extensible framework for building reproducible algorithmic workflows in general-purpose machine learning pipelines. It is motivated by the rapidly evolving field of algorithmic fairness, where new metrics, mitigation strategies, and machine learning methods continuously emerge. A central challenge in fairness, and far beyond it, is that existing toolkits either focus narrowly on single interventions or treat reproducibility and extensibility as secondary considerations rather than core design principles. flowengineR addresses this by introducing a unified architecture of standardized engines for data splitting, execution, preprocessing, training, inprocessing, postprocessing, evaluation, and reporting. Each engine encapsulates one methodological task yet communicates via a lightweight interface, ensuring workflows remain transparent, auditable, and easily extensible. Although implemented in R, flowengineR builds on ideas from workflow languages (CWL, YAWL), graph-oriented visual programming languages (KNIME), and R frameworks (BatchJobs, batchtools). Its emphasis, however, is less on orchestrating engines for resilient parallel execution than on the straightforward setup and management of distinct engines as data structures. This orthogonalization enables distributed responsibilities, independent development, and streamlined integration. In the fairness context, structuring fairness methods as interchangeable engines lets researchers integrate, compare, and evaluate interventions across the modeling pipeline. At the same time, the architecture generalizes to explainability, robustness, and compliance metrics without core modifications. While motivated by fairness, flowengineR ultimately provides a general infrastructure for any workflow context where reproducibility, transparency, and extensibility are essential.
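
The idea of engines as interchangeable data structures that communicate through a lightweight interface can be illustrated with a minimal base-R sketch. This is not flowengineR's API: every name below (`split_engine`, `train_engine`, `post_identity`, `post_clip`, `eval_engine`, `run_workflow`, and the `state` list) is a hypothetical placeholder. The assumption is only that each engine is a function that takes a state list and returns an updated one, so that swapping one engine leaves the rest of the workflow untouched.

```r
# Hypothetical sketch; none of these names are taken from flowengineR.
# Interface assumption: every engine is function(state) -> state.

split_engine <- function(state) {
  set.seed(state$seed)                       # fixed seed makes the split reproducible
  n <- nrow(state$data)
  test_idx <- sample(n, size = floor(0.3 * n))
  state$train <- state$data[-test_idx, , drop = FALSE]
  state$test  <- state$data[test_idx, , drop = FALSE]
  state
}

train_engine <- function(state) {
  state$model <- lm(y ~ x, data = state$train)
  state
}

# Two postprocessing engines sharing the same interface.
post_identity <- function(state) {
  state$pred <- predict(state$model, newdata = state$test)
  state
}

post_clip <- function(state) {
  # Placeholder intervention: clip predictions to the training range of y.
  rng  <- range(state$train$y)
  pred <- predict(state$model, newdata = state$test)
  state$pred <- pmin(pmax(pred, rng[1]), rng[2])
  state
}

eval_engine <- function(state) {
  state$mse <- mean((state$test$y - state$pred)^2)
  state
}

# A workflow is a named list of engines, applied left to right.
run_workflow <- function(engines, data, seed = 1) {
  init <- list(data = data, seed = seed)
  Reduce(function(state, engine) engine(state), engines, init)
}

set.seed(42)
toy <- data.frame(x = runif(100))
toy$y <- 2 * toy$x + rnorm(100, sd = 0.1)

base <- list(split = split_engine, train = train_engine,
             post = post_identity, eval = eval_engine)
variant <- base
variant$post <- post_clip                    # swap a single engine

res_a <- run_workflow(base, toy)
res_b <- run_workflow(variant, toy)
```

Because both runs use the same seed and the same splitting engine, `res_a` and `res_b` are evaluated on identical test rows, so any difference in `mse` comes from the swapped postprocessing step alone. Under these assumptions, that is the kind of controlled comparison the abstract describes.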

Comments:	27 pages, 7 figures, 1 table
Subjects:	Machine Learning (cs.LG); Computers and Society (cs.CY); Methodology (stat.ME)
MSC classes:	62-04, 62-07
ACM classes:	D.2.11; G.3; I.2.6
Cite as:	arXiv:2511.00079 [cs.LG]
	(or arXiv:2511.00079v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2511.00079

Submission history

From: Maximilian Willer
[v1] Wed, 29 Oct 2025 17:59:19 UTC (522 KB)
