Jonathan Kamp


2024

The Role of Syntactic Span Preferences in Post-Hoc Explanation Disagreement
Jonathan Kamp | Lisa Beinborn | Antske Fokkens
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Post-hoc explanation methods for transformer models tend to disagree with one another. Agreement is generally measured over a small subset of the most important tokens, yet the presence of disagreement is often overlooked and its causes insufficiently examined, so these methods end up being utilised without adequate care. In this work, we explain disagreement from a linguistic perspective. We find that different methods systematically select different token types, and that similar methods display similar linguistic preferences, which in turn affects agreement. By estimating the subset of the *k* most important tokens dynamically per sentence, we find that methods agree better at the syntactic span level. The methods that agree least with other methods benefit most from this dynamic subset estimation. We methodically explore the different settings of the dynamic *k* approach: we observe that combining it with spans yields favourable results in capturing important signals in the sentence, and we propose an improved setting of global token importance.
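To make the token-level versus span-level agreement idea concrete, here is a minimal sketch assuming per-token attribution scores from two methods and a token-to-span mapping. The toy scores, the span assignment, and the Jaccard overlap used as the agreement measure are illustrative stand-ins, not the paper's exact setup:

```python
# Minimal sketch: agreement between two attribution methods, measured
# on the top-k tokens directly vs. on the syntactic spans containing them.
# All values below are toy data; the paper's span definition and
# dynamic-k estimator may differ.
from typing import Dict, List, Set


def top_k(scores: List[float], k: int) -> Set[int]:
    """Indices of the k highest-scoring tokens."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard overlap, a common top-k agreement metric."""
    return len(a & b) / len(a | b) if a | b else 1.0


def to_spans(tokens: Set[int], span_of: Dict[int, int]) -> Set[int]:
    """Map selected token indices to the spans that contain them."""
    return {span_of[i] for i in tokens}


scores_a = [0.9, 0.1, 0.7, 0.05, 0.6]     # method A, one sentence
scores_b = [0.2, 0.8, 0.1, 0.05, 0.7]     # method B, same sentence
span_of = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}  # token index -> span id

k = 2
a, b = top_k(scores_a, k), top_k(scores_b, k)
print("token-level agreement:", jaccard(a, b))                            # 0.0
print("span-level agreement:", jaccard(to_spans(a, span_of),
                                       to_spans(b, span_of)))             # 0.33
```

In this toy case the two methods pick disjoint tokens, yet partially agree once selections are lifted to spans, which mirrors the abstract's finding that agreement improves at the syntactic span level.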

2023

Dynamic Top-k Estimation Consolidates Disagreement between Feature Attribution Methods
Jonathan Kamp | Lisa Beinborn | Antske Fokkens
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Feature attribution scores explain a text classifier's prediction to users by highlighting the *k* most important tokens. In this work, we propose a way to determine the optimal number *k* of tokens to display from sequential properties of the attribution scores. Our approach is dynamic across sentences, method-agnostic, and deals with sentence length bias. We compare agreement between multiple methods and humans on an NLI task, using fixed *k* and dynamic *k*. We find that perturbation-based methods and Vanilla Gradient exhibit the highest agreement on most method–method and method–human agreement metrics with a static *k*. Their advantage over other methods disappears with dynamic *k*, which mainly improves Integrated Gradients and GradientXInput. To our knowledge, this is the first evidence that sequential properties of attribution scores are informative for consolidating attribution signals for human interpretation.
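As an illustration of what a dynamic *k* could look like, the sketch below picks *k* at the largest gap in the sorted attribution scores. This gap rule and the toy scores are assumptions chosen for clarity; the paper derives *k* from sequential properties of the scores, and its exact estimator may differ:

```python
# Minimal sketch of a dynamic-k heuristic: choose k where the drop
# between consecutive sorted attribution scores is largest. Illustrative
# only; the paper's estimator may differ.
from typing import List


def dynamic_k(scores: List[float]) -> int:
    """Return k such that the drop from the k-th to the (k+1)-th
    highest score is the largest."""
    s = sorted(scores, reverse=True)
    gaps = [s[i] - s[i + 1] for i in range(len(s) - 1)]
    return gaps.index(max(gaps)) + 1


attributions = [0.85, 0.80, 0.15, 0.10, 0.05]  # toy scores, one sentence
print(dynamic_k(attributions))  # 2: the biggest drop comes after the 2nd score
```

Because *k* is recomputed per sentence from the scores themselves, such a rule is method-agnostic and adapts to sentence length rather than fixing one cutoff for the whole dataset.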

2022

Perturbations and Subpopulations for Testing Robustness in Token-Based Argument Unit Recognition
Jonathan Kamp | Lisa Beinborn | Antske Fokkens
Proceedings of the 9th Workshop on Argument Mining

Argument Unit Recognition and Classification aims to identify argument units in text and classify them as pro or against. One of the design choices that needs to be made when developing systems for this task is the unit of classification: segments of tokens or full sentences. Previous research suggests that fine-tuning language models at the token level yields more robust results for classifying sentences than training on sentences directly. We reproduce the study that originally made this claim and further investigate what exactly token-based systems learn better than sentence-based ones. We develop systematic tests for analysing the behavioural differences between the token-based and the sentence-based system. Our results show that token-based models are generally more robust than sentence-based models, both on manually perturbed examples and on specific subpopulations of the data.
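A minimal sketch of the kind of behavioural robustness check described above, assuming a hypothetical `predict` function and a toy label-preserving typo perturbation; the paper's perturbations and subpopulation tests are more systematic:

```python
# Minimal sketch: measure how often a classifier's label survives a
# minimal, label-preserving perturbation. `toy_predict` and the typo
# pairs are placeholders for a fine-tuned token- or sentence-based model.
from typing import Callable, List, Tuple


def robustness_rate(predict: Callable[[str], str],
                    pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (original, perturbed) pairs with an unchanged label."""
    stable = sum(predict(orig) == predict(pert) for orig, pert in pairs)
    return stable / len(pairs)


def toy_predict(sentence: str) -> str:
    """Stand-in for a fine-tuned argument unit classifier."""
    return "pro"


pairs = [
    ("Nuclear energy is a safe option.", "Nuclaer energy is a safe option."),
    ("School uniforms restrict expression.", "School unifroms restrict expression."),
]

print(robustness_rate(toy_predict, pairs))  # 1.0 for this constant model
```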