Ladislav Lenc


2024

COMICORDA: Dialogue Act Recognition in Comic Books
Jiří Martínek | Pavel Král | Ladislav Lenc | Josef Baloun
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Dialogue act (DA) recognition is usually performed on a speech signal that is transcribed and segmented into text. However, little work exists on DA recognition from images. Therefore, this paper concentrates on this modality and presents a novel DA recognition approach for image documents, namely comic books. To the best of our knowledge, this is the first study investigating dialogue acts in comic books, and it represents a first step toward building a model for comic book understanding. The proposed method is composed of the following steps: speech balloon segmentation, optical character recognition (OCR), and DA recognition itself. We use YOLOv8 for balloon segmentation, Google Vision for OCR, and Transformer-based models for DA classification. The experiments are performed on a newly created dataset comprising 1,438 annotated comic panels. It contains bounding boxes, transcriptions, and dialogue act annotations. We achieve nearly 98% average precision for speech balloon segmentation and exceed 70% accuracy on the DA recognition task. We also present an analysis of dialogue structure in the comics domain and compare it with the standard DA datasets, which is another contribution of this paper.
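The three steps named in the abstract can be read as a simple composition of stages. The following is only a minimal sketch of that control flow, not the authors' implementation: the function `recognize_dialogue_acts`, the `BalloonResult` type, and the three toy stand-ins are hypothetical names introduced here, and the toy stages do not call YOLOv8, Google Vision, or any Transformer model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), an assumed convention


@dataclass
class BalloonResult:
    """One speech balloon after all three stages (hypothetical container)."""
    box: Box
    text: str
    dialogue_act: str


def recognize_dialogue_acts(
    panel: object,
    segment: Callable[[object], List[Box]],
    ocr: Callable[[object, Box], str],
    classify: Callable[[str], str],
) -> List[BalloonResult]:
    """Chain segmentation, OCR, and DA classification for one panel."""
    results = []
    for box in segment(panel):      # step 1: speech balloon segmentation
        text = ocr(panel, box)      # step 2: text recognition inside the balloon
        label = classify(text)      # step 3: dialogue act label for the text
        results.append(BalloonResult(box, text, label))
    return results


# Toy stand-ins so the sketch runs without any model. Here a "panel" is just
# a dict mapping each balloon box to its text, which is NOT a real image.
def toy_segment(panel: Dict[Box, str]) -> List[Box]:
    return sorted(panel)


def toy_ocr(panel: Dict[Box, str], box: Box) -> str:
    return panel[box]


def toy_classify(text: str) -> str:
    # Illustrative two-label rule; the paper's label set is not reproduced here.
    return "question" if text.rstrip().endswith("?") else "statement"
```

Passing the stages as callables keeps the sketch independent of any particular detector, OCR service, or classifier; real components would replace the toy functions.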

2018

Czech Text Document Corpus v 2.0
Pavel Král | Ladislav Lenc
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

UWB at SemEval-2018 Task 1: Emotion Intensity Detection in Tweets
Pavel Přibáň | Tomáš Hercig | Ladislav Lenc
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes our system created for the SemEval-2018 Task 1: Affect in Tweets (AIT-2018). We participated in both the regression and the ordinal classification subtasks for emotion intensity detection in English, Arabic, and Spanish. For the regression subtask, we use the AffectiveTweets system with added features based on various word embeddings, lexicons, and LDA. For the ordinal classification subtask, we additionally use our Brainy system with features based on parse trees, POS tags, and morphology. The most beneficial features, apart from word and character n-grams, include word embeddings, POS counts, and morphological features.

2017

The Impact of Figurative Language on Sentiment Analysis
Tomáš Hercig | Ladislav Lenc
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Figurative language such as irony, sarcasm, and metaphor is considered a significant challenge in sentiment analysis. These figurative devices can sculpt the affect of an utterance and test the limits of sentiment analysis of supposedly literal texts. We explore the effect of figurative language on sentiment analysis. We incorporate figurative language indicators into the sentiment analysis process and compare the results with and without this additional information. We evaluate on the SemEval-2015 Task 11 data. With our convolutional neural network model and additional training data, we outperform the first-place team in terms of mean squared error and follow closely behind first place in terms of cosine similarity.
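The two evaluation measures mentioned in the abstract, mean squared error and cosine similarity, can be computed over paired gold and predicted sentiment scores. The sketch below uses the standard textbook definitions; the function names are introduced here for illustration, and it is an assumption that the task's official scorer matches these definitions in every detail (for example, in how missing predictions are handled).

```python
import math
from typing import Sequence


def mean_squared_error(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Average squared difference between gold and predicted scores (lower is better)."""
    if len(gold) != len(pred) or len(gold) == 0:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold)


def cosine_similarity(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Cosine of the angle between the gold and predicted score vectors (higher is better)."""
    if len(gold) != len(pred) or len(gold) == 0:
        raise ValueError("gold and pred must be non-empty and of equal length")
    dot = sum(g * p for g, p in zip(gold, pred))
    norm = math.sqrt(sum(g * g for g in gold)) * math.sqrt(sum(p * p for p in pred))
    # Assumed convention: return 0.0 when either vector is all zeros.
    return dot / norm if norm > 0 else 0.0


# Made-up scores for three texts, used only to exercise the functions.
gold_scores = [-2.0, 1.0, 3.0]
predicted_scores = [-1.0, 1.0, 2.0]
print(mean_squared_error(gold_scores, predicted_scores))
print(cosine_similarity(gold_scores, predicted_scores))
```

The two measures reward different things: mean squared error penalizes the magnitude of each error, while cosine similarity depends only on the direction of the score vector, so a system can rank differently under each.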

Word Embeddings for Multi-label Document Classification
Ladislav Lenc | Pavel Král
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

In this paper, we analyze and evaluate word embeddings for representing longer texts in the multi-label classification scenario. The embeddings are used in three convolutional neural network topologies. The experiments are conducted on the standard Czech ČTK and English Reuters-21578 corpora. We compare the results of static and trainable word2vec embeddings with those of randomly initialized word vectors. We conclude that initialization does not play an important role in classification. However, learning the word vectors is crucial for obtaining good results.

2016

UWB at SemEval-2016 Task 7: Novel Method for Automatic Sentiment Intensity Determination
Ladislav Lenc | Pavel Král | Václav Rajtmajer
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)