Arianna Muti

2025

Probing Feminist Representations: A Study of Bias in LLMs and Word Embeddings
Arianna Muti | Elisa Bassignana | Emanuele Moscato | Debora Nozza
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

pdf bib abs

MilaNLP@Multilingual Counterspeech Generation: Evaluating Translation and Background Knowledge Filtering
Emanuele Moscato | Arianna Muti | Debora Nozza
Proceedings of the First Workshop on Multilingual Counterspeech Generation

We describe our participation in the Multilingual Counterspeech Generation shared task, which aims to generate a counternarrative to counteract hate speech, given a hateful sentence and relevant background knowledge. Our team tested two different aspects: translating outputs from English vs generating outputs in the original languages and filtering pieces of the background knowledge provided vs including all the background knowledge. Our experiments show that filtering the background knowledge in the same prompt and leaving data in the original languages leads to more adherent counternarrative generations, except for Basque, where translating the output from English and filtering the background knowledge in a separate prompt yields better results. Our system ranked first in English, Italian, and Spanish and fourth in Basque.

pdf bib abs

The “r” in “woman” stands for rights. Auditing LLMs in Uncovering Social Dynamics in Implicit Misogyny
Arianna Muti | Chris Emmery | Debora Nozza | Alberto Barrón-Cedeño | Tommaso Caselli
Findings of the Association for Computational Linguistics: EMNLP 2025

Persistent societal biases like misogyny express themselves more often implicitly than through openly hostile language.However, previous misogyny studies have focused primarily on explicit language, overlooking these more subtle forms. We bridge this gap by examining implicit misogynistic expressions in English and Italian. First, we develop a taxonomy of social dynamics, i.e., the underlying communicative intent behind misogynistic statements in social media data. Then, we test the ability of nine LLMs to identify the social dynamics as a multi-label classification and text span selection: first LLMs must choose social dynamics given a prefixed list, then they have to explicitly identify the text spans that triggered their decisions. We also investigate the extent of using different learning settings: zero and few-shot, and prescriptive. Our analysis suggests that LLMs struggle to follow instructions and reason in all settings, mostly relying on semantic associations, recasting claims of emergent abilities.

pdf bib abs

Untangling Hate Speech Definitions: A Semantic Componential Analysis Across Cultures and Domains
Katerina Korre | Arianna Muti | Federico Ruggeri | Alberto Barrón-Cedeño
Findings of the Association for Computational Linguistics: NAACL 2025

Hate speech relies heavily on cultural influences, leading to varying individual interpretations. For that reason, we propose a Semantic Componential Analysis (SCA) framework for a cross-cultural and cross-domain analysis of hate speech definitions. We create the first dataset of hate speech definitions encompassing 493 definitions from more than 100 cultures, drawn from five key domains: online dictionaries, academic research, Wikipedia, legal texts, and online platforms. By decomposing these definitions into semantic components,our analysis reveals significant variation across definitions, yet many domains borrow definitions from one another without taking into account the target culture. We conduct zero-shot model experiments using our proposed dataset, employing three popular open-sourced LLMs to understand the impact of different definitions on hate speech detection. Our findings indicate that LLMs are sensitive to definitions: responses for hate speech detection change according to the complexity of definitions used in the prompt.

pdf bib abs

Detoxify-IT: An Italian Parallel Dataset for Text Detoxification
Viola De Ruvo | Arianna Muti | Daryna Dementieva | Debora Nozza
Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH)

Toxic language online poses growing challenges for content moderation. Detoxification, which rewrites toxic content into neutral form, offers a promising alternative but remains underexplored beyond English. We present Detoxify-IT, the first Italian dataset for this task, featuring toxic comments and their human-written neutral rewrites. Our experiments show that even limited fine-tuning on Italian data leads to notable improvements in content preservation and fluency compared to both multilingual models and LLMs used in zero-shot settings, underlining the need for language-specific resources. This work enables detoxification research in Italian and supports broader efforts toward safer, more inclusive online communication.

pdf bib abs

Blue-haired, misandriche, rabiata: Tracing the Connotation of ‘Feminist(s)’ Across Time, Languages and Domains
Arianna Muti | Sara Gemelli | Emanuele Moscato | Emilie Francis | Amanda Cercas Curry | Flor Miriam Plaza-del-Arco | Debora Nozza
Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH)

Understanding how words shift in meaning is crucial for analyzing societal attitudes.In this study, we investigate the contextual variations of the terms feminist, feminists along three axes: time, language, and domain.To this aim, we collect and release FEMME, a dataset comprising the occurrences of such terms from 2014 to 2023 in English, Italian and Swedish in Twitter, Reddit and Incel domains.Our methodology leverages frame analysis, as well as fine-tuning and LLMs. We find that the connotation of the plural form feminists is consistently more negative than feminist, indicating more hostility towards feminists as a collective, which often triggers greater societal pushback, reflecting broader patterns of group-based hostility and stigma. Across languages, we observe similar stereotypes towards feminists that often include body shaming, as well as accusations of hypocrisy and irrational behavior. In terms of time, we identify events that trigger a peak in terms of negative or positive connotation.As expected, the Incel spheres show predominantly negative connotations, while the general domains show mixed connotations.

2024

pdf bib abs

A Corpus for Sentence-Level Subjectivity Detection on English News Articles
Francesco Antici | Federico Ruggeri | Andrea Galassi | Katerina Korre | Arianna Muti | Alessandra Bardi | Alice Fedotova | Alberto Barrón-Cedeño
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We develop novel annotation guidelines for sentence-level subjectivity detection, which are not limited to language-specific cues. We use our guidelines to collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics. Our corpus paves the way for subjectivity detection in English and across other languages without relying on language-specific tools, such as lexicons or machine translation. We evaluate state-of-the-art multilingual transformer-based models on the task in mono-, multi-, and cross-language settings. For this purpose, we re-annotate an existing Italian corpus. We observe that models trained in the multilingual setting achieve the best performance on the task.

pdf bib abs

In this paper we report the development of our annotation methodology for the shared task FIGNEWS 2024. The objective of the shared task is to look into the layers of bias in how the war on Gaza is represented in media narrative. Our methodology follows the prescriptive paradigm, in which guidelines are detailed and refined through an iterative process in which edge cases are discussed and converged. Our IAA score (Krippendorff’s 𝛼) is 0.420, highlighting the challenging and subjective nature of the task. Our results show that 52% of posts were unbiased, 42% biased against Palestine, 5% biased against Israel, and 3% biased against both. 16% were unclear or not applicable.

pdf bib abs

The Challenges of Creating a Parallel Multilingual Hate Speech Corpus: An Exploration
Katerina Korre | Arianna Muti | Alberto Barrón-Cedeño
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Hate speech is infamously one of the most demanding topics in Natural Language Processing, as its multifacetedness is accompanied by a handful of challenges, such as multilinguality and cross-linguality. Hate speech has a subjective aspect that intensifies when referring to different cultures and different languages. In this respect, we design a pipeline that will help us explore the possibility of the creation of a parallel multilingual hate speech dataset, using machine translation. In this paper, we evaluate how/whether this is feasible by assessing the quality of the translations, calculating the toxicity levels of original and target texts, and calculating correlations between the newly obtained scores. Finally, we perform a qualitative analysis to gain further semantic and grammatical insights. With this pipeline we aim at exploring ways of filtering hate speech texts in order to parallelize sentences in multiple languages, examining the challenges of the task.

pdf bib abs

Language is Scary when Over-Analyzed: Unpacking Implied Misogynistic Reasoning with Argumentation Theory-Driven Prompts
Arianna Muti | Federico Ruggeri | Khalid Al Khatib | Alberto Barrón-Cedeño | Tommaso Caselli
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

We propose misogyny detection as an Argumentative Reasoning task and we investigate the capacity of large language models (LLMs) to understand the implicit reasoning used to convey misogyny in both Italian and English. The central aim is to generate the missing reasoning link between a message and the implied meanings encoding the misogyny. Our study uses argumentation theory as a foundation to form a collection of prompts in both zero-shot and few-shot settings. These prompts integrate different techniques, including chain-of-thought reasoning and augmented knowledge. Our findings show that LLMs fall short on reasoning capabilities about misogynistic comments and that they mostly rely on their implicit knowledge derived from internalized common stereotypes about women to generate implied assumptions, rather than on inductive reasoning.

pdf bib abs

PejorativITy - In-Context Pejorative Language Disambiguation: A CALAMITA Challenge
Arianna Muti
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Misogyny is often expressed through figurative language. Some neutral words can assume a negative connotation when functioning as pejorative epithets, and they can be used to express misogyny. Disambiguating the meaning of such terms might help the detection of misogyny. This challenge addresses a) the disambiguation of specific ambiguous words in a given context; b) the detection of misogyny in instances that contain such polysemic words. In particular, framed as a binary classification, our task is divided into two parts. In Task A, the model is asked to define if, given a tweet, the target word is used in pejorative or non-pejorative way. In Task B, the model is asked whether the whole sentence is misogynous or not.

pdf bib abs

PejorativITy: Disambiguating Pejorative Epithets to Improve Misogyny Detection in Italian Tweets
Arianna Muti | Federico Ruggeri | Cagri Toraman | Alberto Barrón-Cedeño | Samuel Algherini | Lorenzo Musetti | Silvia Ronchi | Gianmarco Saretto | Caterina Zapparoli
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Misogyny is often expressed through figurative language. Some neutral words can assume a negative connotation when functioning as pejorative epithets. Disambiguating the meaning of such terms might help the detection of misogyny. In order to address such task, we present PejorativITy, a novel corpus of 1,200 manually annotated Italian tweets for pejorative language at the word level and misogyny at the sentence level. We evaluate the impact of injecting information about disambiguated words into a model targeting misogyny detection. In particular, we explore two different approaches for injection: concatenation of pejorative information and substitution of ambiguous words with univocal terms. Our experimental results, both on our corpus and on two popular benchmarks on Italian tweets, show that both approaches lead to a major classification improvement, indicating that word sense disambiguation is a promising preliminary step for misogyny detection. Furthermore, we investigate LLMs’ understanding of pejorative epithets by means of contextual word embeddings analysis and prompting.

2023

pdf bib abs

UniBoe’s at SemEval-2023 Task 10: Model-Agnostic Strategies for the Improvement of Hate-Tuned and Generative Models in the Classification of Sexist Posts
Arianna Muti | Francesco Fernicola | Alberto Barrón-Cedeño
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

We present our submission to SemEval-2023 Task 10: Explainable Detection of Online Sexism (EDOS). We address all three tasks: Task A consists of identifying whether a post is sexist. If so, Task B attempts to assign it one of four categories: threats, derogation, animosity, and prejudiced discussions. Task C aims for an even more fine-grained classification, divided among 11 classes. Our team UniBoe’s experiments with fine-tuning of hate-tuned Transformer-based models and priming for generative models. In addition, we explore model-agnostic strategies, such as data augmentation techniques combined with active learning, as well as obfuscation of identity terms. Our official submissions obtain an F1_score of 0.83 for Task A, 0.58 for Task B and 0.32 for Task C.

pdf bib abs

On the Identification and Forecasting of Hate Speech in Inceldom
Paolo Gajo | Arianna Muti | Katerina Korre | Silvia Bernardini | Alberto Barrón-Cedeño
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Spotting hate speech in social media posts is crucial to increase the civility of the Web and has been thoroughly explored in the NLP community. For the first time, we introduce a multilingual corpus for the analysis and identification of hate speech in the domain of inceldom, built from incel Web forums in English and Italian, including expert annotation at the post level for two kinds of hate speech: misogyny and racism. This resource paves the way for the development of mono- and cross-lingual models for (a) the identification of hateful (misogynous and racist) posts and (b) the forecasting of the amount of hateful responses that a post is likely to trigger. Our experiments aim at improving the performance of Transformer-based models using masked language modeling pre-training and dataset merging. The results show that these strategies boost the models’ performance in all settings (binary classification, multi-label classification and forecasting), especially in the cross-lingual scenarios.

2022

pdf bib abs

LeaningTower@LT-EDI-ACL2022: When Hope and Hate Collide
Arianna Muti | Marta Marchiori Manerba | Katerina Korre | Alberto Barrón-Cedeño
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

The 2022 edition of LT-EDI proposed two tasks in various languages. Task Hope Speech Detection required models for the automatic identification of hopeful comments for equality, diversity, and inclusion. Task Homophobia/Transphobia Detection focused on the identification of homophobic and transphobic comments. We targeted both tasks in English by using reinforced BERT-based approaches. Our core strategy aimed at exploiting the data available for each given task to augment the amount of supervised instances in the other. On the basis of an active learning process, we trained a model on the dataset for Task i and applied it to the dataset for Task j to iteratively integrate new silver data for Task i. Our official submissions to the shared task obtained a macro-averaged F₁ score of 0.53 for Hope Speech and 0.46 for Homo/Transphobia, placing our team in the third and fourth positions out of 11 and 12 participating teams respectively.

pdf bib abs

A Checkpoint on Multilingual Misogyny Identification
Arianna Muti | Alberto Barrón-Cedeño
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

We address the problem of identifying misogyny in tweets in mono and multilingual settings in three languages: English, Italian, and Spanish. We explore model variations considering single and multiple languages both in the pre-training of the transformer and in the training of the downstream taskto explore the feasibility of detecting misogyny through a transfer learning approach across multiple languages. That is, we train monolingual transformers with monolingual data, and multilingual transformers with both monolingual and multilingual data. Our models reach state-of-the-art performance on all three languages. The single-language BERT models perform the best, closely followed by different configurations of multilingual BERT models. The performance drops in zero-shot classification across languages. Our error analysis shows that multilingual and monolingual models tend to make the same mistakes.

pdf bib abs

UniBO at SemEval-2022 Task 5: A Multimodal bi-Transformer Approach to the Binary and Fine-grained Identification of Misogyny in Memes
Arianna Muti | Katerina Korre | Alberto Barrón-Cedeño
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

We present our submission to SemEval 2022 Task 5 on Multimedia Automatic Misogyny Identification. We address the two tasks: Task A consists of identifying whether a meme is misogynous. If so, Task B attempts to identify its kind among shaming, stereotyping, objectification, and violence. Our approach combines a BERT Transformer with CLIP for the textual and visual representations. Both textual and visual encoders are fused in an early-fusion fashion through a Multimodal Bidirectional Transformer with unimodally pretrained components. Our official submissions obtain macro-averaged F₁=0.727 in Task A (4th position out of 69 participants)and weighted F₁=0.710 in Task B (4th position out of 42 participants).

pdf bib abs

Misogyny and Aggressiveness Tend to Come Together and Together We Address Them
Arianna Muti | Francesco Fernicola | Alberto Barrón-Cedeño
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We target the complementary binary tasks of identifying whether a tweet is misogynous and, if that is the case, whether it is also aggressive. We compare two ways to address these problems: one multi-class model that discriminates between all the classes at once: not misogynous, non aggressive-misogynous and aggressive-misogynous; as well as a cascaded approach where the binary classification is carried out separately (misogynous vs non-misogynous and aggressive vs non-aggressive) and then joined together. For the latter, two training and three testing scenarios are considered. Our models are built on top of AlBERTo and are evaluated on the framework of Evalita’s 2020 shared task on automatic misogyny and aggressiveness identification in Italian tweets. Our cascaded models —including the strong naïve baseline— outperform significantly the top submissions to Evalita, reaching state-of-the-art performance without relying on any external information.

Arianna Muti

2025

2024

2023

2022

Co-authors

Venues