Roman Vashurin


2023

Efficient Out-of-Domain Detection for Sequence to Sequence Models
Artem Vazhentsev | Akim Tsvigun | Roman Vashurin | Sergey Petrakov | Daniil Vasilev | Maxim Panov | Alexander Panchenko | Artem Shelmanov
Findings of the Association for Computational Linguistics: ACL 2023

Sequence-to-sequence (seq2seq) models based on the Transformer architecture have become a ubiquitous tool applicable not only to classical text generation tasks such as machine translation and summarization but also to any other task where an answer can be represented in the form of a finite text fragment (e.g., question answering). However, when deploying a model in practice, we need not only high performance but also the ability to determine cases where the model is not applicable. Uncertainty estimation (UE) techniques provide a tool for identifying out-of-domain (OOD) input where the model is susceptible to errors. State-of-the-art UE methods for seq2seq models rely on computationally heavyweight and impractical deep ensembles. In this work, we perform an empirical investigation of various novel UE methods for large pre-trained seq2seq models T5 and BART on three tasks: machine translation, text summarization, and question answering. We apply computationally lightweight density-based UE methods to seq2seq models and show that they often outperform heavyweight deep ensembles on the task of OOD detection.
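For intuition, here is a minimal sketch of one density-based UE score of the kind the paper studies: the squared Mahalanobis distance of an input's encoder embedding to a Gaussian fit on in-domain training embeddings. The model name, placeholder corpus, and regularization constant below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Illustrative model choice; the paper experiments with T5 and BART.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small").eval()

def encoder_embedding(text: str) -> np.ndarray:
    """Mean-pool the encoder's last hidden states into a single vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model.get_encoder()(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0).numpy()

# Fit a single Gaussian to embeddings of in-domain training inputs
# (placeholder sentences; a real run would use the task's training set).
train_texts = [
    "translate English to German: The weather is nice today.",
    "translate English to German: Where is the train station?",
    "translate English to German: I would like a cup of coffee.",
]
X = np.stack([encoder_embedding(t) for t in train_texts])
mu = X.mean(axis=0)
cov = np.cov(X, rowvar=False) + 1e-3 * np.eye(X.shape[1])  # regularize for invertibility

def ood_score(text: str) -> float:
    """Squared Mahalanobis distance to the training density; higher = more likely OOD."""
    d = encoder_embedding(text) - mu
    return float(d @ np.linalg.solve(cov, d))
```

At deployment, inputs whose score exceeds a threshold calibrated on held-out in-domain data would be flagged as OOD rather than answered; unlike deep ensembles, this requires only a single forward pass through the encoder.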

LM-Polygraph: Uncertainty Estimation for Language Models
Ekaterina Fadeeva | Roman Vashurin | Akim Tsvigun | Artem Vazhentsev | Sergey Petrakov | Kirill Fedyanin | Daniil Vasilev | Elizaveta Goncharova | Alexander Panchenko | Maxim Panov | Timothy Baldwin | Artem Shelmanov
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Recent advancements in the capabilities of large language models (LLMs) have paved the way for a myriad of groundbreaking applications in various fields. However, a significant challenge arises as these models often “hallucinate”, i.e., fabricate facts without providing users with an apparent means to discern the veracity of their statements. Uncertainty estimation (UE) methods are one path to safer, more responsible, and more effective use of LLMs. However, to date, research on UE methods for LLMs has focused primarily on theoretical rather than engineering contributions. In this work, we tackle this issue by introducing LM-Polygraph, a framework with implementations of a battery of state-of-the-art UE methods for LLMs in text generation tasks, with unified program interfaces in Python. Additionally, it introduces an extendable benchmark for consistent evaluation of UE techniques by researchers, and a demo web application that enriches the standard chat dialog with confidence scores, empowering end-users to discern unreliable responses. LM-Polygraph is compatible with the most recent LLMs, including BLOOMz, LLaMA-2, ChatGPT, and GPT-4, and is designed to support future releases of similarly styled LLMs.
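To illustrate the kind of white-box confidence score such a framework computes, here is a hedged sketch of mean token entropy over a generated sequence, implemented with plain HuggingFace transformers. This is not LM-Polygraph's actual interface, and the model name is an arbitrary stand-in; it only shows the class of per-generation uncertainty scores the abstract refers to.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Arbitrary small model for illustration, not one evaluated in the paper.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def mean_token_entropy(prompt: str, max_new_tokens: int = 30) -> float:
    """Average entropy (nats) of next-token distributions; higher = less confident."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,                 # greedy decoding
            output_scores=True,              # keep per-step logits
            return_dict_in_generate=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    entropies = []
    for scores in out.scores:  # one (1, vocab_size) logit tensor per generated token
        log_probs = torch.log_softmax(scores, dim=-1)
        entropies.append(-(log_probs.exp() * log_probs).sum().item())
    return sum(entropies) / len(entropies)

print(mean_token_entropy("The capital of France is"))
```

A chat front-end like the demo described above could attach such a score to each response, surfacing low-confidence (potentially hallucinated) answers to the user.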