Viktoria Ondrejova


2024

SlovakSum: A Large Scale Slovak Summarization Dataset
Viktoria Ondrejova | Marek Suppa
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The ability to automatically summarize news articles has become increasingly important due to the vast amount of information available online. Together with the rise of chatbots, Natural Language Processing (NLP) has recently undergone rapid development. Despite these advancements, the majority of research focuses on established, well-resourced languages such as English. To contribute to the development of NLP for the low-resource Slovak language, we introduce SlovakSum, a Slovak news summarization dataset consisting of over 200 thousand news articles with titles and short abstracts obtained from multiple Slovak newspapers. We evaluate several abstractive baselines, including mBART and mT5 models. The code for reproducing our dataset and experiments can be found at https://github.com/NaiveNeuron/slovaksum