EFTNAS: Searching for Efficient Language Models in First-Order Weight-Reordered Super-Networks

Juan Pablo Munoz, Yi Zheng, Nilesh Jain


Abstract
Transformer-based models have demonstrated outstanding performance in natural language processing (NLP) tasks and many other domains, e.g., computer vision. However, these models have grown exponentially in size in the past few years, which can prevent machine learning practitioners from deploying them in resource-constrained environments. This paper discusses the compression of transformer-based models for multiple resource budgets. Integrating neural architecture search (NAS) and network pruning techniques, we effectively generate and train weight-sharing super-networks that contain efficient, high-performing, and compressed transformer-based models. A common challenge in NAS is the design of the search space, for which we propose a method to automatically obtain the boundaries of the search space and then derive the intermediate candidate architectures using a first-order weight importance technique. The proposed end-to-end NAS solution, EFTNAS, discovers efficient subnetworks that have been compressed and fine-tuned for downstream NLP tasks. We demonstrate EFTNAS on the General Language Understanding Evaluation (GLUE) benchmark and the Stanford Question Answering Dataset (SQuAD), obtaining high-performing models that are more than 5x smaller with little or no degradation in performance.
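As a rough illustration of the first-order weight-importance idea mentioned in the abstract, the sketch below scores the output units of a linear layer with a Taylor-style criterion (the absolute product of a weight and its gradient, summed per row) and permutes the rows so that truncating the layer to its first k units keeps the highest-scoring ones, as a weight-sharing sub-network would. The function names, the per-row aggregation, and the toy calibration loss are assumptions made for illustration, not the paper's exact procedure.

    import torch

    def first_order_importance(weight, grad):
        # Taylor-style first-order score per output unit (row):
        # sum over input dimensions of |w * dL/dw|.
        return (weight * grad).abs().sum(dim=1)

    def reorder_rows_by_importance(linear, grad):
        # Permute the rows of a linear layer so the most important output
        # units come first; slicing the layer to its first k rows then
        # retains the highest-scoring units.
        scores = first_order_importance(linear.weight, grad)
        order = torch.argsort(scores, descending=True)
        with torch.no_grad():
            linear.weight.copy_(linear.weight[order])
            if linear.bias is not None:
                linear.bias.copy_(linear.bias[order])
        return order

    # Toy usage on a calibration batch (shapes and loss are illustrative).
    layer = torch.nn.Linear(16, 8)
    x = torch.randn(4, 16)
    loss = layer(x).pow(2).mean()
    loss.backward()
    reorder_rows_by_importance(layer, layer.weight.grad)

In a transformer super-network, a reordering of this kind would typically be applied consistently to paired weight matrices (e.g., attention heads or the two linear layers of a feed-forward block) so that elastic widths remain functionally equivalent.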
Anthology ID:
2024.lrec-main.497
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
5596–5608
URL:
https://aclanthology.org/2024.lrec-main.497
Cite (ACL):
Juan Pablo Munoz, Yi Zheng, and Nilesh Jain. 2024. EFTNAS: Searching for Efficient Language Models in First-Order Weight-Reordered Super-Networks. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 5596–5608, Torino, Italia. ELRA and ICCL.
Cite (Informal):
EFTNAS: Searching for Efficient Language Models in First-Order Weight-Reordered Super-Networks (Munoz et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.497.pdf