Dan Qu

2024

Meta-Adapter for Self-Supervised Speech Models: A Solution to Low-Resource Speech Recognition Challenges
Yaqi Chen | Hao Zhang | Xukui Yang | Wenlin Zhang | Dan Qu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Self-supervised models have demonstrated remarkable performance in speech processing by learning latent representations from large amounts of unlabeled data. Although these models yield promising results on low-resource languages, the computational cost of fine-tuning all model parameters is prohibitively high. Adapters offer a solution by inserting lightweight bottleneck structures into pre-trained models, enabling parameter-efficient adaptation to downstream tasks. However, randomly initialized adapters often underperform in low-resource scenarios, which limits their applicability to low-resource languages. To address this issue, we develop the Meta-Adapter for self-supervised models, which obtains meta-initialized parameters that enable quick adaptation to low-resource languages. Extensive experiments on the Common Voice and FLEURS datasets demonstrate the superior performance of Meta-Adapters on 12 low-resource languages spanning four language families. Moreover, Meta-Adapters show better generalization and extensibility than traditional pretraining methods.
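The abstract above describes adapters as lightweight bottleneck structures inserted into a pre-trained model. The following is a minimal sketch of one such module in pure Python, assuming the common residual design h + W_up·ReLU(W_down·h) with a zero-initialized up-projection. The class name `BottleneckAdapter`, the ReLU activation, and the sizes are illustrative assumptions, not the Meta-Adapter paper's implementation. The random initialization of the down-projection is the part that a meta-learned initialization would replace.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector, b: Vector) -> Vector:
    """Compute W @ x + b for a row-major weight matrix W."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


class BottleneckAdapter:
    """Hypothetical residual bottleneck adapter: h + W_up relu(W_down h)."""

    def __init__(self, d_model: int, bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / d_model ** 0.5
        # Down-projection d_model -> bottleneck. Randomly initialized here;
        # a meta-learned initialization would supply these values instead.
        self.w_down = [[rng.gauss(0.0, scale) for _ in range(d_model)]
                       for _ in range(bottleneck)]
        self.b_down = [0.0] * bottleneck
        # Up-projection bottleneck -> d_model, zero-initialized so the
        # adapter starts out as the identity map on the frozen features.
        self.w_up = [[0.0] * bottleneck for _ in range(d_model)]
        self.b_up = [0.0] * d_model

    def __call__(self, h: Vector) -> Vector:
        z = relu(matvec(self.w_down, h, self.b_down))
        delta = matvec(self.w_up, z, self.b_up)
        # Residual connection: only the small delta is learned.
        return [hi + di for hi, di in zip(h, delta)]

    def num_params(self) -> int:
        """Trainable parameters: two d x r matrices plus both bias vectors."""
        d_model = len(self.w_up)
        r = len(self.w_down)
        return 2 * d_model * r + r + d_model
```

For example, with a hidden size of 768 and a bottleneck of 64, `BottleneckAdapter(768, 64).num_params()` gives 99,136 trainable parameters per adapter, far fewer than the frozen self-supervised encoder they sit in.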

2022

It is notoriously difficult to implement an end-to-end speech translation (E2E-ST) model because of the complexity of the task and the scarcity of data. Existing techniques often attempt implicit knowledge transfer from machine translation (MT) to the ST model by imposing various constraints. However, a significant problem in this transfer scenario is that MT performance drops significantly, which also restricts the final transfer effect. In this article, we propose Fine and Coarse Granularity Contrastive Learning (FCGCL), which conducts explicit knowledge transfer from MT to ST. Specifically, multi-granularity contrastive learning ensures that inputs with similar semantics in different modalities are encoded close together in the shared semantic space, while inputs with different semantics are kept apart. Experiments on all 8 languages of the MuST-C dataset, together with further analysis, show that our method effectively improves E2E-ST performance, achieving an average BLEU score of 29.0.
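The contrastive objective described above pulls semantically matching inputs from the two modalities together and pushes mismatched ones apart. The following is a minimal sketch of one common way to do this: an InfoNCE-style loss over a batch of paired speech and text embeddings, using cosine similarity and in-batch negatives. The function names, the temperature `tau`, and the choice of in-batch negatives are assumptions; the abstract does not specify how FCGCL defines its fine- and coarse-grained units, so this is not the paper's loss. The same function could, for instance, be applied once to sentence-level (coarse) embeddings and again to finer-grained ones.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def info_nce(speech: List[List[float]], text: List[List[float]],
             tau: float = 0.1) -> float:
    """Mean InfoNCE loss over a batch.

    Pair i (speech[i], text[i]) is the positive; every other text in the
    batch serves as a negative for speech[i].
    """
    n = len(speech)
    total = 0.0
    for i in range(n):
        logits = [cosine(speech[i], text[j]) / tau for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # -log softmax probability of the positive pair
        total += log_denom - logits[i]
    return total / n
```

With `speech = [[1, 0], [0, 1]]`, aligned texts `[[1, 0], [0, 1]]` give a loss close to zero, while swapped texts `[[0, 1], [1, 0]]` give a loss of roughly 10, so minimizing the loss drives matching pairs together in the shared space.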

Co-authors

Zhen Li 1

Tong Niu 1