Training-free Neural Architecture Search for RNNs and Transformers

Serianni, Aaron; Kalita, Jugal


arXiv:2306.00288v1 (cs)

[Submitted on 1 Jun 2023]

Title: Training-free Neural Architecture Search for RNNs and Transformers

Authors: Aaron Serianni (Princeton University), Jugal Kalita (University of Colorado at Colorado Springs)

Abstract: Neural architecture search (NAS) has allowed for the automatic creation of new and effective neural network architectures, offering an alternative to the laborious process of manually designing complex architectures. However, traditional NAS algorithms are slow and require immense amounts of computing power. Recent research has investigated training-free NAS metrics for image classification architectures, drastically speeding up search algorithms. In this paper, we investigate training-free NAS metrics for recurrent neural network (RNN) and BERT-based transformer architectures, targeted towards language modeling tasks. First, we develop a new training-free metric, named hidden covariance, that predicts the trained performance of an RNN architecture and significantly outperforms existing training-free metrics. We experimentally evaluate the effectiveness of the hidden covariance metric on the NAS-Bench-NLP benchmark. Second, we find that the current search space paradigm for transformer architectures is not optimized for training-free neural architecture search. Instead, a simple qualitative analysis can effectively shrink the search space to the best-performing architectures. This conclusion is based on our investigation of existing training-free metrics and new metrics developed from recent transformer pruning literature, evaluated on our own benchmark of trained BERT architectures. Ultimately, our analysis shows that the architecture search space and the training-free metric must be developed together in order to achieve effective results.

Comments:	Code is available at https://github.com/aaronserianni/training-free-nas
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2306.00288 [cs.LG]
	(or arXiv:2306.00288v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2306.00288

Submission history

From: Aaron Serianni
[v1] Thu, 1 Jun 2023 02:06:13 UTC (6,937 KB)
