Sebastian Reimann

2025

Using Large Language Models to Perform MIPVU-Inspired Automatic Metaphor Detection
Sebastian Reimann | Tatjana Scheffler
Proceedings of the 2nd Workshop on Analogical Abstraction in Cognition, Perception, and Language (Analogy-Angle II)

Automatic metaphor detection has often been inspired by linguistic procedures for manual metaphor identification. In this work, we test how closely the steps required by the Metaphor Identification Procedure VU Amsterdam (MIPVU) can be translated into prompts for generative Large Language Models (LLMs) and how well three commonly used LLMs are able to perform these steps. We find that while the procedure itself can be modeled with only a few compromises, none of the language models is able to match the performance of supervised, fine-tuned methods for metaphor detection. All models failed to sufficiently filter out literal examples, where no contrast between the contextual and a more basic or concrete meaning was present. Both versions of LLaMa, however, showed promising potential for detecting similarities between literal and metaphoric meanings, which may be exploited in future work.

2024

Applying Transfer Learning to German Metaphor Prediction
Maria Berger | Nieke Kiwitt | Sebastian Reimann
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper presents results on transfer learning for metaphor recognition in German. Starting from an English-language corpus annotated for metaphor at the sentence level, and its machine translation into German, we annotate 1000 sentences of the German part to use as a gold standard for two different metaphor prediction setups: i) a sequence labeling setup (at the token level), and ii) a classification setup (at the sentence level). We test two transfer learning approaches: i) a group of transformer models, and ii) a technique that uses bilingual embeddings together with an RNN classifier. We find that the transformer models perform moderately well in a zero-shot scenario (up to 61% F1 for classification) and the embedding-based approaches do not even beat the guessing baseline (36% F1 for classification). We use our gold data to fine-tune for the classification task on target-language data, achieving up to 90% F1 with both multilingual BERT and the bilingual embeddings. We also publish the annotated bilingual corpus.

When is a Metaphor Actually Novel? Annotating Metaphor Novelty in the Context of Automatic Metaphor Detection
Sebastian Reimann | Tatjana Scheffler
Proceedings of the 18th Linguistic Annotation Workshop (LAW-XVIII)

We present an in-depth analysis of metaphor novelty, a relatively overlooked phenomenon in NLP. In NLP, novel metaphors have been analyzed via scores derived from crowdsourcing, while in theoretical work they are often defined by comparison to senses in dictionary entries. We reannotate metaphorically used words in the large VU Amsterdam Metaphor Corpus based on whether their metaphoric meaning is present in the dictionary. Based on this, we find that perceived metaphor novelty often clashes with the dictionary-based definition. We use the new labels to evaluate the performance of state-of-the-art language models for automatic metaphor detection and notice that novel metaphors according to our dictionary-based definition are easier to identify than novel metaphors according to crowdsourced novelty scores. In a subsequent analysis, we study the correlation between high novelty scores and word frequencies in the pretraining and fine-tuning corpora, as well as potential problems with rare words for pre-trained language models. In line with previous work, we find a negative correlation between word frequency in the training data and novelty scores, and we link these aspects to problems with the tokenization of BERT and RoBERTa.

Metaphors in Online Religious Communication: A Detailed Dataset and Cross-Genre Metaphor Detection
Sebastian Reimann | Tatjana Scheffler
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We present the first dataset of fine-grained metaphor annotations for texts from online religious communication, where figurative language plays a particularly important role. In addition to binary labels, metaphors are annotated for deliberateness, that is, whether they are communicated explicitly as metaphors, and we provide indicators for such deliberate use. We further show that cross-genre transfer in metaphor detection (from the widely used VUA corpus to our Reddit data) leads to a drop in performance due to the shift in topic and to metaphors from source domains that did not occur in the training data. We address this issue by adding a small amount of in-genre data in fine-tuning, leading to notable performance increases of more than 5 points in F1. Moreover, religious communication tends toward extended metaphorical comparisons, which are problematic for current metaphor detection systems. Adding in-genre data had slightly positive effects, but we argue that solving this problem requires architectures that consider larger spans of context.

2022

Cause and Effect in Governmental Reports: Two Data Sets for Causality Detection in Swedish
Luise Dürlich | Sebastian Reimann | Gustav Finnveden | Joakim Nivre | Sara Stymne
Proceedings of the LREC 2022 workshop on Natural Language Processing for Political Sciences

Causality detection is the task of extracting information about causal relations from text. It is an important task for different types of document analysis, including political impact assessment. We present two new data sets for causality detection in Swedish. The first data set is annotated with binary relevance judgments, indicating whether a sentence contains causality information or not. In the second data set, sentence pairs are ranked for relevance with respect to a causality query, containing a specific hypothesized cause and/or effect. Both data sets are carefully curated and mainly intended for use as test data. We describe the data sets and their annotation, including detailed annotation guidelines. In addition, we present pilot experiments on cross-lingual zero-shot and few-shot causality detection, using training data from English and German.