María-Teresa Martín-Valdivia
2026
Nuanced Toxicity Detection in Spanish: A New Corpus and Benchmark Study
Alba María Mármol-Romero | Robiert Sepúlveda-Torres | Estela Saquete | María-Teresa Martín-Valdivia | L. Alfonso Ureña
Findings of the Association for Computational Linguistics: EACL 2026
The rise of toxic content on digital platforms has intensified the demand for automatic moderation tools. While English has benefited from large-scale annotated corpora, Spanish remains under-resourced, particularly for nuanced cases of toxicity such as irony, sarcasm, or indirect aggression. We present an extended version of the NECOS-TOX corpus, comprising 4,011 Spanish comments collected from 16 major news outlets. Each comment is annotated across three levels of toxicity (Non-Toxic, Slightly Toxic, and Toxic), following an iterative annotation protocol that achieved substantial inter-annotator agreement (κ = 0.74). To reduce annotation costs while maintaining quality, we employed a human-in-the-loop active learning strategy, with manual correction of model pre-labels. We benchmarked the dataset with traditional machine learning (ML) methods, domain-specific transformers, and instruction-tuned large language models (LLMs). Results show that compact encoder models (e.g., RoBERTa-base-bne, 125M parameters) perform on par with much larger models (e.g., LLaMA-3.1-8B), underscoring the value of in-domain adaptation over raw scale. Our error analysis highlights persistent challenges in distinguishing subtle forms of toxicity, especially sarcasm and implicit insults, and reveals entity-related biases that motivate anonymization strategies. The dataset and trained models are released publicly.
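For readers unfamiliar with the agreement figure quoted in the abstract above, the following is a minimal sketch of how a kappa statistic can be computed for two annotators. It assumes Cohen's kappa; the abstract does not say which kappa variant the authors used. The function name `cohen_kappa` and the toy label lists are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: two annotators only; the paper may use another variant.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy annotations using the three levels named in the abstract.
ann_1 = ["Non-Toxic", "Toxic", "Slightly Toxic", "Non-Toxic", "Toxic", "Non-Toxic"]
ann_2 = ["Non-Toxic", "Toxic", "Non-Toxic", "Non-Toxic", "Slightly Toxic", "Non-Toxic"]
print(round(cohen_kappa(ann_1, ann_2), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one class (here, plausibly Non-Toxic) dominates the label distribution.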
2025
La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America
María Grandury | Javier Aula-Blasco | Júlia Falcão | Clémentine Fourrier | Miguel González Saiz | Gonzalo Martínez | Gonzalo Santamaria Gomez | Rodrigo Agerri | Nuria Aldama García | Luis Chiruzzo | Javier Conde | Helena Gomez Adorno | Marta Guerrero Nieto | Guido Ivetta | Natàlia López Fuertes | Flor Miriam Plaza-del-Arco | María-Teresa Martín-Valdivia | Helena Montoro Zamorano | Carmen Muñoz Sanz | Pedro Reviriego | Leire Rosado Plaza | Alejandro Vaca Serrano | Estrella Vallecillo-Rodríguez | Jorge Vallego | Irune Zubiaga
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Leaderboards showcase the current capabilities and limitations of Large Language Models (LLMs). To motivate the development of LLMs that represent the linguistic and cultural diversity of the Spanish-speaking community, we present La Leaderboard, the first open-source leaderboard to evaluate generative LLMs in languages and language varieties of Spain and Latin America. La Leaderboard is a community-driven project that aims to establish an evaluation standard for everyone interested in developing LLMs for the Spanish-speaking community. This initial version combines 66 datasets in Catalan, Basque, Galician, and different Spanish varieties, showcasing the evaluation results of 50 models. To encourage community-driven development of leaderboards in other languages, we explain our methodology, including guidance on selecting the most suitable evaluation setup for each downstream task. In particular, we provide a rationale for using fewer few-shot examples than typically found in the literature, aiming to reduce environmental impact and facilitate access to reproducible results for a broader research community.
2022
Empathy and Distress Prediction using Transformer Multi-output Regression and Emotion Analysis with an Ensemble of Supervised and Zero-Shot Learning Models
Flor Miriam Del Arco | Jaime Collado-Montañez | L. Alfonso Ureña | María-Teresa Martín-Valdivia
Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis
This paper describes the participation of the SINAI research group at WASSA 2022 (Empathy and Personality Detection and Emotion Classification). Specifically, we participate in Track 1 (Empathy and Distress prediction) and Track 2 (Emotion classification). We conducted extensive experiments developing different machine learning solutions in line with the state of the art in Natural Language Processing. For Track 1, a Transformer multi-output regression model is proposed. For Track 2, we explore recent techniques based on Zero-Shot Learning models, including a Natural Language Inference model and GPT-3, combining them in an ensemble with a fine-tuned RoBERTa model. Our team ranked 2nd in the first track and 3rd in the second track.
Co-authors
- Flor Miriam Plaza-del-Arco 2
- L. Alfonso Ureña 2
- Rodrigo Agerri 1
- Javier Aula-Blasco 1
- Luis Chiruzzo 1
- Jaime Collado-Montañez 1
- Javier Conde 1
- Júlia Falcão 1
- Clémentine Fourrier 1
- Natàlia López Fuertes 1
- Nuria Aldama García 1
- Gonzalo Santamaria Gomez 1
- Helena Gomez Adorno 1
- María Grandury 1
- Guido Ivetta 1
- Gonzalo Martínez 1
- Alba María Mármol-Romero 1
- Marta Guerrero Nieto 1
- Leire Rosado Plaza 1
- Pedro Reviriego 1
- Miguel González Saiz 1
- Carmen Muñoz Sanz 1
- Estela Saquete 1
- Robiert Sepúlveda-Torres 1
- Alejandro Vaca Serrano 1
- Estrella Vallecillo-Rodríguez 1
- Jorge Vallego 1
- Helena Montoro Zamorano 1
- Irune Zubiaga 1