Layer-wise Regularized Dropout for Neural Language Models

Shiwen Ni, Min Yang, Ruifeng Xu, Chengming Li, Xiping Hu


Abstract
Dropout has become an indispensable regularization technique for today’s popular pre-trained neural language models. To resolve the inconsistency between training and inference caused by the randomness of dropout, some studies apply consistency training to regularize dropout at the output layer. In this paper, we propose Layer-wise Regularized Dropout (LR-Drop), a novel regularization method designed specifically for Transformer-based language models. LR-Drop regularizes each Transformer layer with a consistency training strategy: each training sample passes through two siamese sub-models sampled by dropout, and LR-Drop forces the hidden states, multi-head attention matrices, and output distributions of the two sub-models to be consistent. LR-Drop can be regarded as a “self-distillation” framework in which each sub-model generated by dropout serves as both the “teacher” and the “student” of the other. Extensive experiments on 8 natural language understanding datasets, 6 neural machine translation datasets, and 1 abstractive summarization dataset (15 datasets in total) show that LR-Drop achieves superior performance, including state-of-the-art results.
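The consistency objective described in the abstract can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration, assuming a Hugging Face-style classification model that returns logits, hidden_states, and attentions when asked; the loss weights alpha/beta/gamma, the use of mean-squared error for the layer-wise terms, and the symmetric KL on the output distributions are illustrative assumptions, not necessarily the paper’s exact formulation.

```python
import torch.nn.functional as F

def symmetric_kl(p_logits, q_logits):
    """Symmetric KL divergence between two output distributions."""
    p_log = F.log_softmax(p_logits, dim=-1)
    q_log = F.log_softmax(q_logits, dim=-1)
    return 0.5 * (F.kl_div(q_log, p_log.exp(), reduction="batchmean")
                  + F.kl_div(p_log, q_log.exp(), reduction="batchmean"))

def lr_drop_loss(model, batch, labels, alpha=1.0, beta=1.0, gamma=1.0):
    # Two forward passes of the same batch: dropout samples two siamese sub-models.
    out1 = model(**batch, output_hidden_states=True, output_attentions=True)
    out2 = model(**batch, output_hidden_states=True, output_attentions=True)

    # Task loss averaged over the two sub-models.
    task_loss = 0.5 * (F.cross_entropy(out1.logits, labels)
                       + F.cross_entropy(out2.logits, labels))

    # Layer-wise consistency on hidden states and multi-head attention matrices.
    hidden_loss = sum(F.mse_loss(h1, h2)
                      for h1, h2 in zip(out1.hidden_states, out2.hidden_states))
    attn_loss = sum(F.mse_loss(a1, a2)
                    for a1, a2 in zip(out1.attentions, out2.attentions))

    # Output-distribution consistency between the two sub-models.
    output_loss = symmetric_kl(out1.logits, out2.logits)

    return task_loss + alpha * hidden_loss + beta * attn_loss + gamma * output_loss
```

As with other two-pass consistency methods, the second forward pass roughly doubles the compute per training step; the regularization terms themselves add only element-wise comparisons of tensors the model already produces.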
Anthology ID:
2024.lrec-main.891
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
10208–10218
URL:
https://aclanthology.org/2024.lrec-main.891
Cite (ACL):
Shiwen Ni, Min Yang, Ruifeng Xu, Chengming Li, and Xiping Hu. 2024. Layer-wise Regularized Dropout for Neural Language Models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 10208–10218, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Layer-wise Regularized Dropout for Neural Language Models (Ni et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.891.pdf