Harm Lameris


2024

pdf bib
The Role of Creaky Voice in Turn Taking and the Perception of Speaker Stance: Experiments Using Controllable TTS
Harm Lameris | Eva Szekely | Joakim Gustafson
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Recent advancements in spontaneous text-to-speech (TTS) have enabled the realistic synthesis of creaky voice, a voice quality known for its diverse pragmatic and paralinguistic functions. In this study, we used synthesized creaky voice in perceptual tests, to explore how listeners without formal training perceive two distinct types of creaky voice. We annotated a spontaneous speech corpus using creaky voice detection tools and modified a neural TTS engine with a creaky phonation embedding to control the presence of creaky phonation in the synthesized speech. We performed an objective analysis using a creak detection tool which revealed significant differences in creaky phonation levels between the two creaky voice types and modal voice. Two subjective listening experiments were performed to investigate the effect of creaky voice on perceived certainty, valence, sarcasm, and turn finality. Participants rated non-positional creak as less certain, less positive, and more indicative of turn finality, while positional creak was rated significantly more turn final compared to modal phonation.

2022

pdf bib
Speech Data Augmentation for Improving Phoneme Transcriptions of Aphasic Speech Using Wav2Vec 2.0 for the PSST Challenge
Birger Moell | Jim O’Regan | Shivam Mehta | Ambika Kirkland | Harm Lameris | Joakim Gustafson | Jonas Beskow
Proceedings of the RaPID Workshop - Resources and ProcessIng of linguistic, para-linguistic and extra-linguistic Data from people with various forms of cognitive/psychiatric/developmental impairments - within the 13th Language Resources and Evaluation Conference

As part of the PSST challenge, we explore how data augmentations, data sources, and model size affect phoneme transcription accuracy on speech produced by individuals with aphasia. We evaluate model performance in terms of feature error rate (FER) and phoneme error rate (PER). We find that data augmentations techniques, such as pitch shift, improve model performance. Additionally, increasing the size of the model decreases FER and PER. Our experiments also show that adding manually-transcribed speech from non-aphasic speakers (TIMIT) improves performance when Room Impulse Response is used to augment the data. The best performing model combines aphasic and non-aphasic data and has a 21.0% PER and a 9.2% FER, a relative improvement of 9.8% compared to the baseline model on the primary outcome measurement. We show that data augmentation, larger model size, and additional non-aphasic data sources can be helpful in improving automatic phoneme recognition models for people with aphasia.

2021

pdf bib
Whit’s the Richt Pairt o Speech: PoS tagging for Scots
Harm Lameris | Sara Stymne
Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects

In this paper we explore PoS tagging for the Scots language. Scots is spoken in Scotland and Northern Ireland, and is closely related to English. As no linguistically annotated Scots data were available, we manually PoS tagged a small set that is used for evaluation and training. We use English as a transfer language to examine zero-shot transfer and transfer learning methods. We find that training on a very small amount of Scots data was superior to zero-shot transfer from English. Combining the Scots and English data led to further improvements, with a concatenation method giving the best results. We also compared the use of two different English treebanks and found that a treebank containing web data was superior in the zero-shot setting, while it was outperformed by a treebank containing a mix of genres when combined with Scots data.