Distributed, Parallel, and Cluster Computing

  • New submissions
  • Cross-lists
  • Replacements

Showing new listings for Friday, 19 September 2025

Total of 10 entries

New submissions (showing 3 of 3 entries)

[1] arXiv:2509.14920 [pdf, html, other]
Title: Cost-Performance Analysis: A Comparative Study of CPU-Based Serverless and GPU-Based Training Architectures
Amine Barrak, Fabio Petrillo, Fehmi Jaafar
Journal-ref: The 26th International Conference on Parallel and Distributed Computing, Applications and Technologies, 2025
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)


The field of distributed machine learning (ML) faces increasing demands for scalable and cost-effective training solutions, particularly in the context of large, complex models. Serverless computing has emerged as a promising paradigm to address these challenges by offering dynamic scalability and resource-efficient execution. Building upon our previous work, which introduced the Serverless Peer Integrated for Robust Training (SPIRT) architecture, this paper presents a comparative analysis of several serverless distributed ML architectures. We examine SPIRT alongside established architectures like ScatterReduce, AllReduce, and MLLess, focusing on key metrics such as training time efficiency, cost-effectiveness, communication overhead, and fault tolerance capabilities. Our findings reveal that SPIRT provides significant improvements in reducing training times and communication overhead through strategies such as parallel batch processing and in-database operations facilitated by RedisAI. However, traditional architectures exhibit scalability challenges and varying degrees of vulnerability to faults and adversarial attacks. The cost analysis underscores the long-term economic benefits of SPIRT despite its higher initial setup costs. This study not only highlights the strengths and limitations of current serverless ML architectures but also sets the stage for future research aimed at developing new models that combine the most effective features of existing systems.
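The ring-based AllReduce pattern that SPIRT is compared against can be illustrated with a minimal single-process simulation. This is a sketch for intuition only: the list-of-lists "workers", chunking, and function name are illustrative assumptions, not taken from any of the compared systems.

```python
# Single-process simulation of ring all-reduce, the gradient-exchange
# pattern behind the AllReduce architecture compared above.

def ring_allreduce(grads):
    """Average one gradient vector per worker; len(grads[0]) must be
    divisible by the number of workers n in this simplified version."""
    n = len(grads)
    k = len(grads[0]) // n
    # each worker splits its gradient into n contiguous chunks
    chunks = [[g[i * k:(i + 1) * k] for i in range(n)] for g in grads]

    # reduce-scatter: n-1 rounds; in round t, worker w sends chunk (w - t) % n
    # to worker (w + 1) % n, which adds it to its own copy of that chunk
    for t in range(n - 1):
        sends = [(w, (w - t) % n, list(chunks[w][(w - t) % n])) for w in range(n)]
        for w, c, payload in sends:
            dst = (w + 1) % n
            chunks[dst][c] = [a + b for a, b in zip(chunks[dst][c], payload)]

    # all-gather: n-1 rounds; fully reduced chunks circulate around the ring
    for t in range(n - 1):
        sends = [(w, (w + 1 - t) % n, list(chunks[w][(w + 1 - t) % n])) for w in range(n)]
        for w, c, payload in sends:
            chunks[(w + 1) % n][c] = payload

    # every worker now holds the summed gradient; return per-worker averages
    return [[x / n for chunk in worker for x in chunk] for worker in chunks]
```

Each worker transmits roughly 2(n-1)/n of the gradient size in total, independent of worker count, which is why the ring pattern is bandwidth-efficient; the trade-offs the paper examines are the latency, cost, and fault tolerance of such tightly coupled exchanges in a serverless setting.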

[2] arXiv:2509.15182 [pdf, html, other]
Title: Conditional Prior-based Non-stationary Channel Estimation Using Accelerated Diffusion Models
Muhammad Ahmed Mohsin, Ahsan Bilal, Muhammad Umer, Asad Aali, Muhammad Ali Jamshed, Dean F. Hougen, John M. Cioffi
Comments: ICASSP 2026
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)


Wireless channels in motion-rich urban microcell (UMi) settings are non-stationary; mobility and scatterer dynamics shift the distribution over time, degrading classical and deep estimators. This work proposes conditional prior diffusion for channel estimation, which learns a history-conditioned score to denoise noisy channel snapshots. A temporal encoder with cross-time attention compresses a short observation window into a context vector, which captures the channel's instantaneous coherence and steers the denoiser via feature-wise modulation. In inference, an SNR-matched initialization selects the diffusion step whose marginal aligns with the measured input SNR, and the process follows a shortened, geometrically spaced schedule, preserving the signal-to-noise trajectory with far fewer iterations. Temporal self-conditioning with the previous channel estimate and a training-only smoothness penalty further stabilizes evolution without biasing the test-time estimator. Evaluations on a 3GPP benchmark show lower NMSE across all SNRs than LMMSE, GMM, LSTM, and LDAMP baselines, demonstrating stable performance and strong high SNR fidelity.
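The SNR-matched initialization and shortened geometric schedule described above can be sketched as follows. The DDPM-style cumulative schedule `alpha_bar` and both helper functions are assumptions for illustration; the abstract does not spell out the exact parameterization.

```python
# Sketch: pick the diffusion step whose marginal SNR matches the measured
# input SNR, then denoise along a shortened, geometrically spaced schedule.

def snr_matched_start(alpha_bar, measured_snr_db):
    """Return the step t whose marginal SNR, alpha_bar[t] / (1 - alpha_bar[t]),
    is closest to the measured input SNR (given in dB)."""
    target = 10 ** (measured_snr_db / 10)  # dB -> linear
    return min(range(len(alpha_bar)),
               key=lambda t: abs(alpha_bar[t] / (1 - alpha_bar[t]) - target))

def geometric_schedule(t_start, num_steps):
    """Geometrically spaced step indices from t_start down to 1 (num_steps >= 2),
    so denoising starts at the matched step but takes far fewer iterations."""
    ratio = t_start ** (1 / (num_steps - 1))
    return sorted({max(1, round(ratio ** k)) for k in range(num_steps)}, reverse=True)
```

With `alpha_bar` decreasing in t, a noisier measurement (lower SNR) maps to a later start step, so the sampler spends iterations only where the measurement actually needs them.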

[3] arXiv:2509.15192 [pdf, html, other]
Title: Channel Prediction under Network Distribution Shift Using Continual Learning-based Loss Regularization
Muhammad Ahmed Mohsin, Muhammad Umer, Ahsan Bilal, Muhammad Ibtsaam Qadir, Muhammad Ali Jamshed, Dean F. Hougen, John M. Cioffi
Comments: ICASSP 2026
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)


Modern wireless networks face critical challenges when mobile users traverse heterogeneous network configurations with varying antenna layouts, carrier frequencies, and scattering statistics. Traditional predictors degrade under distribution shift, with NMSE rising by 37.5% during cross-configuration handovers. This work addresses catastrophic forgetting in channel prediction by proposing a continual learning framework based on loss regularization. The approach augments standard training objectives with penalty terms that selectively preserve network parameters essential for previous configurations while enabling adaptation to new environments. Two prominent regularization strategies are investigated: Elastic Weight Consolidation (EWC) and Synaptic Intelligence (SI). Across 3GPP scenarios and multiple architectures, SI lowers the high-SNR NMSE floor by up to 1.8 dB (≈32-34%), while EWC achieves up to 1.4 dB (≈17-28%). Notably, standard EWC incurs O(MK) complexity (storing M Fisher diagonal entries and corresponding parameter snapshots across K tasks) unless consolidated, whereas SI maintains O(M) memory complexity (storing M model parameters), independent of task sequence length, making it suitable for resource-constrained wireless infrastructure.
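The EWC penalty investigated here has the standard quadratic form L = L_task + (λ/2) Σ_i F_i (θ_i − θ*_i)², where F_i is the Fisher-diagonal importance of parameter i and θ* the snapshot from a previous configuration. The O(MK)-vs-O(M) point in the abstract corresponds to EWC storing a (Fisher, snapshot) pair per past task, while SI keeps one running importance per parameter. A plain-list sketch (names illustrative; real models would use framework tensors):

```python
# Standard EWC regularized loss: anchor parameters that were important
# (high Fisher value) for previously seen network configurations.

def ewc_loss(task_loss, params, fisher_diag, anchor_params, lam):
    """task_loss plus (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    penalty = sum(f * (p - a) ** 2
                  for f, p, a in zip(fisher_diag, params, anchor_params))
    return task_loss + 0.5 * lam * penalty
```

A parameter with zero Fisher value is free to move for the new configuration, while a high-Fisher parameter is pulled back toward its anchored value, which is the mechanism that trades plasticity against forgetting.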

Cross submissions (showing 2 of 2 entries)

[4] arXiv:2509.14292 (cross-list from cs.OS) [pdf, html, other]
Title: Taming Serverless Cold Starts Through OS Co-Design
Ben Holmes, Baltasar Dinis, Lana Honcharuk, Joshua Fried, Adam Belay
Subjects: Operating Systems (cs.OS); Distributed, Parallel, and Cluster Computing (cs.DC)


Serverless computing promises fine-grained elasticity and operational simplicity, fueling widespread interest from both industry and academia. Yet this promise is undercut by the cold start problem, where invoking a function after a period of inactivity triggers costly initialization before any work can begin. Even with today's high-speed storage, the prevailing view is that achieving sub-millisecond cold starts requires keeping state resident in memory. This paper challenges that assumption. Our analysis of existing snapshot/restore mechanisms shows that OS-level limitations, not storage speed, are the real barrier to ultra-fast restores from disk. These limitations force current systems to either restore state piecemeal in a costly manner or capture too much state, leading to longer restore times and unpredictable performance. Furthermore, current memory primitives exposed by the OS make it difficult to reliably fetch data into memory and avoid costly runtime page faults. To overcome these barriers, we present Spice, an execution engine purpose-built for serverless snapshot/restore. Spice integrates directly with the OS to restore kernel state without costly replay and introduces dedicated primitives for restoring memory mappings efficiently and reliably. As a result, Spice delivers near-warm performance on cold restores from disk, reducing latency by up to 14.9x over state-of-the-art process-based systems and 10.6x over VM-based systems. This proves that high performance and memory elasticity no longer need to be a trade-off in serverless computing.

[5] arXiv:2509.14470 (cross-list from quant-ph) [pdf, html, other]
Title: Scaling Hybrid Quantum-HPC Applications with the Quantum Framework
Srikar Chundury, Amir Shehata, Seongmin Kim, Muralikrishnan Gopalakrishnan Meena, Chao Lu, Kalyana Gottiparthi, Eduardo Antonio Coello Perez, Frank Mueller, In-Saeng Suh
Comments: 9 pages, 5 figures
Subjects: Quantum Physics (quant-ph); Distributed, Parallel, and Cluster Computing (cs.DC)


Hybrid quantum-high performance computing (Q-HPC) workflows are emerging as a key strategy for running quantum applications at scale in current noisy intermediate-scale quantum (NISQ) devices. These workflows must operate seamlessly across diverse simulators and hardware backends since no single simulator offers the best performance for every circuit type. Simulation efficiency depends strongly on circuit structure, entanglement, and depth, making a flexible and backend-agnostic execution model essential for fair benchmarking, informed platform selection, and ultimately the identification of quantum advantage opportunities. In this work, we extend the Quantum Framework (QFw), a modular and HPC-aware orchestration layer, to integrate multiple local backends (Qiskit Aer, NWQ-Sim, QTensor, and TN-QVM) and a cloud-based quantum backend (IonQ) under a unified interface. Using this integration, we execute a number of non-variational as well as variational workloads. The results highlight workload-specific backend advantages: while Qiskit Aer's matrix product state excels for large Ising models, NWQ-Sim not only leads on large-scale entanglement and Hamiltonian but also shows the benefits of concurrent subproblem execution in a distributed manner for optimization problems. These findings demonstrate that simulator-agnostic, HPC-aware orchestration is a practical path toward scalable, reproducible, and portable Q-HPC ecosystems, thereby accelerating progress toward demonstrating quantum advantage.

Replacement submissions (showing 5 of 5 entries)

[6] arXiv:2502.09922 (replaced) [pdf, html, other]
Title: λScale: Enabling Fast Scaling for Serverless Large Language Model Inference
Minchen Yu, Rui Yang, Chaobo Jia, Zhaoyuan Su, Sheng Yao, Tingfeng Lan, Yuchen Yang, Yue Cheng, Wei Wang, Ao Wang, Ruichuan Chen
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)


Serverless computing has emerged as a compelling solution for cloud-based model inference. However, as modern large language models (LLMs) continue to grow in size, existing serverless platforms often face substantial model startup overhead. This poses a significant challenge in efficiently scaling model instances to accommodate dynamic, bursty workloads commonly observed in real-world inference services. In this paper, we introduce λScale, an efficient serverless inference system to achieve fast model scaling. The key idea behind λScale is to leverage high-speed RDMA networks between GPU nodes for fast model multicast, while enabling distributed inference execution during model transmission -- referred to as "execute-while-load". λScale proposes an efficient model scaling scheme, λPipe, which supports adaptive model multicast and dynamically constructs execution pipelines across receiving nodes for collaborative, distributed inference. Additionally, λScale supports efficient model management across GPU and host memory, allowing fast scaling for models across different storage tiers. Evaluation results show that λScale enables fast model scaling and effectively handles load spikes, achieving up to 5x tail-latency improvement and 31.3% cost reduction compared to state-of-the-art solutions on real-world LLM inference traces.

[7] arXiv:2505.07452 (replaced) [pdf, html, other]
Title: SwarmSearch: Decentralized Search Engine with Self-Funding Economy
Marcel Gregoriadis, Rowdy Chotkan, Petru Neague, Johan Pouwelse
Comments: Submitted for possible publication
Journal-ref: M. Gregoriadis, R. M. Chotkan, P. Neague and J. Pouwelse, "SwarmSearch: Decentralized Search Engine with Self-Funding Economy," 2025 IEEE 50th Conference on Local Computer Networks (LCN), Sydney, Australia, 2025, pp. 1-10.
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)


Centralized search engines control what we see, read, believe, and how we vote. Consequently, they raise concerns over information control, censorship, and bias. Decentralized search engines offer a remedy to this problem, but their adoption has been hindered by their inferior quality and lack of a self-sustaining economic framework. We present SwarmSearch, a fully decentralized, AI-powered search engine with a self-funding architecture. Our system is designed for deployment within the decentralized file-sharing software Tribler. SwarmSearch integrates volunteer-based with profit-driven mechanisms to foster an implicit marketplace for resources. Employing state-of-the-art AI-based retrieval and relevance ranking, we also aim to close the quality gap between decentralized search and centralized alternatives. Our system demonstrates high retrieval accuracy and remains robust in the presence of 50% adversarial nodes.

[8] arXiv:2508.06972 (replaced) [pdf, html, other]
Title: DSperse: A Framework for Targeted Verification in Zero-Knowledge Machine Learning
Dan Ivanov, Tristan Freiberg, Shirin Shahabi, Jonathan Gold, Haruna Isah
Comments: 12 pages, 8 figures, and 10 tables
Subjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)


DSperse is a modular framework for distributed machine learning inference with strategic cryptographic verification. Operating within the emerging paradigm of distributed zero-knowledge machine learning, DSperse avoids the high cost and rigidity of full-model circuitization by enabling targeted verification of strategically chosen subcomputations. These verifiable segments, or "slices", may cover part or all of the inference pipeline, with global consistency enforced through audit, replication, or economic incentives. This architecture supports a pragmatic form of trust minimization, localizing zero-knowledge proofs to the components where they provide the greatest value. We evaluate DSperse using multiple proving systems and report empirical results on memory usage, runtime, and circuit behavior under sliced and unsliced configurations. By allowing proof boundaries to align flexibly with the model's logical structure, DSperse supports scalable, targeted verification strategies suited to diverse deployment needs.

[9] arXiv:2508.08552 (replaced) [pdf, html, other]
Title: Resource-Aware Aggregation and Sparsification in Heterogeneous Ensemble Federated Learning
Keumseo Ryum, Jinu Gong, Joonhyuk Kang
Comments: 4 pages
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)


Federated learning (FL) enables distributed training with private client data, but its convergence is hindered by system heterogeneity under realistic communication scenarios. Most FL schemes addressing system heterogeneity utilize global pruning or ensemble distillation, yet often overlook the constraints required for communication efficiency. Meanwhile, deep ensembles can aggregate predictions from individually trained models to improve performance, but current ensemble-based FL methods fall short of fully capturing the diversity of model predictions. In this work, we propose SHEFL, a global ensemble-based FL framework suited for clients with diverse computational capacities. We allocate different numbers of global models to clients based on their available resources. We introduce a novel aggregation scheme that mitigates the training bias between clients and dynamically adjusts the sparsification ratio across clients to reduce the computational burden of training deep ensembles. Extensive experiments demonstrate that our method effectively addresses computational heterogeneity, significantly improving accuracy and stability compared to existing approaches.

[10] arXiv:2509.14098 (replaced) [pdf, html, other]
Title: A Closeness Centrality-based Circuit Partitioner for Quantum Simulations
Doru Thom Popovici, Harlin Lee, Mauro Del Ben, Naoki Yoshioka, Nobuyasu Ito, Katherine Klymko, Daan Camps, Anastasiia Butko
Comments: 14 pages, 10 figures
Subjects: Quantum Physics (quant-ph); Distributed, Parallel, and Cluster Computing (cs.DC)


Simulating quantum circuits (QC) on high-performance computing (HPC) systems has become an essential method to benchmark algorithms and probe the potential of large-scale quantum computation despite the limitations of current quantum hardware. However, these simulations often require large amounts of resources, necessitating the use of large clusters with thousands of compute nodes and large memory footprints. In this work, we introduce an end-to-end framework that provides an efficient partitioning scheme for large-scale QCs alongside a flexible code generator to offer a portable solution that minimizes data movement between compute nodes. By formulating the distribution of quantum states and circuits as a graph problem, we apply closeness centrality to assess gate importance and design a fast, scalable partitioning method. The resulting partitions are compiled into highly optimized codes that run seamlessly on a wide range of supercomputers, providing critical insights into the performance and scalability of quantum algorithm simulations.
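Closeness centrality, the metric the partitioner uses to rank gate importance, is computable by breadth-first search on an unweighted graph: a node's score is (n − 1) divided by the sum of its shortest-path distances to every other node. The adjacency-dict representation below is an illustrative assumption; the paper derives its graph from the circuit's qubit and gate structure.

```python
from collections import deque

# Closeness centrality for an unweighted, connected graph given as an
# adjacency dict {node: [neighbors]}.

def closeness_centrality(adj):
    n = len(adj)
    scores = {}
    for src in adj:
        dist = {src: 0}
        q = deque([src])
        while q:  # breadth-first search from src
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total = sum(dist.values())
        scores[src] = (n - 1) / total if total else 0.0
    return scores
```

On a path graph a-b-c the middle node scores 1.0 and the endpoints 2/3, matching the intuition that central nodes sit close to everything else and are therefore the costly ones to cut across partitions.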
