Philip Blair


2024

JRC-Names-Retrieval: A Standardized Benchmark for Name Search
Philip Blair | Kfir Bar
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Many systems rely on the ability to effectively search through databases of personal and organization entity names in multiple writing scripts. Despite this, there is a relative lack of research studying this problem in isolation. In this work, we discuss this problem in detail and support future research by publishing what we believe is the first comprehensive dataset designed for this task. Additionally, we present a number of baselines against which future work can be compared. Among these, we describe a neural solution based on ByT5 (Xue et al., 2022) that demonstrates up to a 12% performance gain over preexisting baselines, indicating that there remains much room for improvement in this space.

2022

Improving Few-Shot Domain Transfer for Named Entity Disambiguation with Pattern Exploitation
Philip Blair | Kfir Bar
Findings of the Association for Computational Linguistics: EMNLP 2022

Named entity disambiguation (NED) is a critical subtask of entity linking, which seeks to connect knowledge base entities with textual mentions of those entities. Naturally, the performance of a model depends on the domain it was trained on; thus, reducing the amount of data required to train models is advantageous. In this work, we leverage recent research on pattern exploitation for NED and explore whether it can reduce the amount of data required for domain adaptation by reformulating the disambiguation task as a masked language modeling problem. Using ADAPET (Tam et al., 2021), a recent approach to few-shot learning with fine-tuned transformer-based language models, we produce an NED model that yields, without any sacrifice of in-domain accuracy, a 7% improvement in zero-shot cross-domain performance as evaluated on NEDMed, a new NED dataset of mental health news which we release with this work.