Yixuan Li


2024

Joint Annotation of Morphology and Syntax in Dependency Treebanks
Bruno Guillaume | Kim Gerdes | Kirian Guiller | Sylvain Kahane | Yixuan Li
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this paper, we compare different ways to annotate both syntactic and morphological relations in a dependency treebank, and we propose new formats, which we call mSUD and mUD, compatible with the Universal Dependencies (UD) schema for syntactic treebanks. We emphasize mSUD rather than mUD, the former being based on distributional criteria for the choice of the head of any combination, which allow us to clearly encode the internal structure of a word, that is, the derivational path. We investigate different problems posed by a morph-based annotation, concerning tokenization, the choice of the head of a morph combination, relations between morphs, and the additional features needed, such as the token type differentiating roots from derivational and inflectional affixes. We show how our annotation schema can be applied to different languages, from polysynthetic languages such as Yupik to isolating languages such as Chinese.

2023

A Critical Analysis of Document Out-of-Distribution Detection
Jiuxiang Gu | Yifei Ming | Yi Zhou | Jason Kuen | Vlad Morariu | Handong Zhao | Ruiyi Zhang | Nikolaos Barmpalios | Anqi Liu | Yixuan Li | Tong Sun | Ani Nenkova
Findings of the Association for Computational Linguistics: EMNLP 2023

Large-scale pre-training is widely used in recent document understanding tasks. During deployment, one may expect that models should trigger a conservative fallback policy when encountering out-of-distribution (OOD) samples, which highlights the importance of OOD detection. However, most existing OOD detection methods focus on single-modal inputs such as images or texts. While documents are multi-modal in nature, it remains underexplored whether and how multi-modal information in documents can be exploited for OOD detection. In this work, we first provide a systematic and in-depth analysis of OOD detection for document understanding models. We study the effects of model modality, pre-training, and fine-tuning across various types of OOD inputs. In particular, we find that spatial information is critical for document OOD detection. To better exploit spatial information, we propose a spatial-aware adapter, which serves as a parameter-efficient add-on module to adapt transformer-based language models to the document domain. Extensive experiments show that adding the spatial-aware adapter significantly improves the OOD detection performance compared to directly using the language model and achieves superior performance compared to competitive baselines.

Is Fine-tuning Needed? Pre-trained Language Models Are Near Perfect for Out-of-Domain Detection
Rheeya Uppaal | Junjie Hu | Yixuan Li
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Out-of-distribution (OOD) detection is a critical task for reliable predictions over text. Fine-tuning with pre-trained language models has been a de facto procedure to derive OOD detectors with respect to in-distribution (ID) data. Despite its common use, the understanding of the role of fine-tuning and its necessity for OOD detection is largely unexplored. In this paper, we raise the question: is fine-tuning necessary for OOD detection? We present a study investigating the efficacy of directly leveraging pre-trained language models for OOD detection, without any model fine-tuning on the ID data. We compare the approach with several competitive fine-tuning objectives, and offer new insights under various types of distributional shifts. Extensive experiments demonstrate near-perfect OOD detection performance (with 0% FPR95 in many cases), strongly outperforming the fine-tuned counterpart.

2019

Character-level Annotation for Chinese Surface-Syntactic Universal Dependencies
Yixuan Li | Kim Gerdes | Dong Chuanming
Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019)