gpuRDF2vec -- Scalable GPU-based RDF2vec

Böckling, Martin; Paulheim, Heiko

Computer Science > Artificial Intelligence

arXiv:2508.01073 (cs)

[Submitted on 1 Aug 2025]

Title: gpuRDF2vec -- Scalable GPU-based RDF2vec

Authors: Martin Böckling, Heiko Paulheim

Abstract: Generating Knowledge Graph (KG) embeddings at web scale remains challenging. Among existing techniques, RDF2vec combines effectiveness with strong scalability. We present gpuRDF2vec, an open-source library that harnesses modern GPUs and supports multi-node execution to accelerate every stage of the RDF2vec pipeline. Extensive experiments on both synthetically generated graphs and real-world benchmarks show that gpuRDF2vec achieves a substantial speedup over the currently fastest alternative, jRDF2vec. In a single-node setup, our walk-extraction phase alone outperforms pyRDF2vec, SparkKGML, and jRDF2vec by a substantial margin when using random walks on large or dense graphs, and it scales very well to longer walks, which typically lead to better-quality embeddings. Our implementation of gpuRDF2vec enables practitioners and researchers to train high-quality KG embeddings on large-scale graphs within practical time budgets, and it builds on PyTorch Lightning for the scalable word2vec implementation.
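
To make the walk-extraction stage mentioned in the abstract concrete, here is a minimal CPU-only sketch of random-walk extraction over RDF triples. It is not gpuRDF2vec's API or the authors' implementation: the function names `build_adjacency` and `random_walks`, their parameters, and the toy `ex:` graph are all assumptions made for illustration. In RDF2vec, walks of this kind are then treated as sentences and passed to a word2vec model.

```python
import random
from collections import defaultdict


def build_adjacency(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj))
    return adjacency


def random_walks(triples, start, num_walks, depth, seed=0):
    """Return num_walks random walks of at most `depth` hops from `start`.

    Hypothetical helper for illustration only, not part of any library.
    """
    rng = random.Random(seed)
    adjacency = build_adjacency(triples)
    walks = []
    for _ in range(num_walks):
        walk = [start]
        node = start
        for _ in range(depth):
            edges = adjacency.get(node)
            if not edges:  # dead end: stop this walk early
                break
            predicate, node = rng.choice(edges)
            walk.extend([predicate, node])
        walks.append(walk)
    return walks


# Toy graph; entity and predicate names are made up for illustration.
triples = [
    ("ex:Mannheim", "ex:locatedIn", "ex:Germany"),
    ("ex:Germany", "ex:partOf", "ex:Europe"),
    ("ex:Mannheim", "ex:hasUniversity", "ex:UniMannheim"),
]
walks = random_walks(triples, "ex:Mannheim", num_walks=4, depth=2)
```

Each walk alternates entities and predicates, so a walk of `depth` hops has at most `2 * depth + 1` tokens. Longer walks therefore produce longer sentences for the word2vec stage, which is why walk length affects extraction cost.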

Comments:	18 pages, ISWC 2025
Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2508.01073 [cs.AI]
	(or arXiv:2508.01073v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2508.01073

Submission history

From: Martin Böckling
[v1] Fri, 1 Aug 2025 21:07:31 UTC (243 KB)
