Theodoros Rekatsinas


2024

pdf bib
Construction of Paired Knowledge Graph - Text Datasets Informed by Cyclic Evaluation
Ali Mousavi | Xin Zhan | He Bai | Peng Shi | Theodoros Rekatsinas | Benjamin Han | Yunyao Li | Jeffrey Pound | Joshua M. Susskind | Natalie Schluter | Ihab F. Ilyas | Navdeep Jaitly
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Datasets that pair Knowledge Graphs (KG) and text together (KG-T) can be used to train forward and reverse neural models that generate text from KG and vice versa. However models trained on datasets where KG and text pairs are not equivalent can suffer from more hallucination and poorer recall. In this paper, we verify this empirically by generating datasets with different levels of noise and find that noisier datasets do indeed lead to more hallucination. We argue that the ability of forward and reverse models trained on a dataset to cyclically regenerate source KG or text is a proxy for the equivalence between the KG and the text in the dataset. Using cyclic evaluation we find that manually created WebNLG is much better than automatically created TeKGen and T-REx. Informed by these observations, we construct a new, improved dataset called LAGRANGE using heuristics meant to improve equivalence between KG and text and show the impact of each of the heuristics on cyclic evaluation. We also construct two synthetic datasets using large language models (LLMs), and observe that these are conducive to models that perform significantly well on cyclic generation of text, but less so on cyclic generation of KGs, probably because of a lack of a consistent underlying ontology.

2020

pdf bib
Unsupervised Relation Extraction from Language Models using Constrained Cloze Completion
Ankur Goswami | Akshata Bhat | Hadar Ohana | Theodoros Rekatsinas
Findings of the Association for Computational Linguistics: EMNLP 2020

We show that state-of-the-art self-supervised language models can be readily used to extract relations from a corpus without the need to train a fine-tuned extractive head. We introduce RE-Flex, a simple framework that performs constrained cloze completion over pretrained language models to perform unsupervised relation extraction. RE-Flex uses contextual matching to ensure that language model predictions matches supporting evidence from the input corpus that is relevant to a target relation. We perform an extensive experimental study over multiple relation extraction benchmarks and demonstrate that RE-Flex outperforms competing unsupervised relation extraction methods based on pretrained language models by up to 27.8 F1 points compared to the next-best method. Our results show that constrained inference queries against a language model can enable accurate unsupervised relation extraction.