Andreas Blaette


2024

The dbpedia R Package: An Integrated Workflow for Entity Linking (for ParlaMint Corpora)
Christoph Leonhardt | Andreas Blaette
Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024

Entity Linking is a powerful approach for linking textual data to established structured data such as survey data or administrative data. However, in the realm of social science, the approach is not widely adopted. We argue that this is, at least in part, due to specific setup requirements which constitute high barriers for usage and workflows which are not well integrated into analytical scenarios commonly deployed in social science research. We introduce the dbpedia R package to make the approach more accessible. It has a focus on functionality that is easily adaptable to the needs of social scientists working with textual data, including the support of different input formats, limited setup costs and various output formats. Using a ParlaMint corpus, we show the applicability and flexibility of the approach for parliamentary debates.

2022

How GermaParl Evolves: Improving Data Quality by Reproducible Corpus Preparation and User Involvement
Andreas Blaette | Julia Rakers | Christoph Leonhardt
Proceedings of the Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference

The development and curation of large-scale corpora of plenary debates requires not only care and attention to detail when the data is created but also effective means of sustainable quality control. This paper makes two contributions: Firstly, it presents an updated version of the GermaParl corpus of parliamentary debates in the German *Bundestag*. Secondly, it shows how the corpus preparation pipeline is designed to serve the quality of the resource by facilitating effective community involvement. Centered around a workflow which combines reproducibility, transparency and version control, the pipeline allows for continuous improvements to the corpus.

2020

The Europeanization of Parliamentary Debates on Migration in Austria, France, Germany, and the Netherlands
Andreas Blaette | Simon Gehlhar | Christoph Leonhardt
Proceedings of the Second ParlaCLARIN Workshop

Corpora of plenary debates in national parliaments are available for many European states. For comparative research on political discourse, a persisting problem is that the periods covered by corpora differ and that a lack of standardization of data formats inhibits the integration of corpora into a single analytical framework. The solution we pursue is a ‘Framework for Parsing Plenary Protocols’ (frappp), which has been used to prepare corpora of the Assemblée Nationale (“ParisParl”), the German Bundestag (“GermaParl”), the Tweede Kamer of the Netherlands (“TweedeTwee”), and the Austrian Nationalrat (“AustroParl”) for the first two decades of the 21st century (2000-2019). To demonstrate the usefulness of the data gained, we investigate the Europeanization of migration debates in these Western European countries of immigration, i.e. references to a European dimension of policy-making in speeches on migration and integration. Based on a segmentation of the corpora into speeches, the method we use is topic modeling, and the analysis of joint occurrences of topics indicating migration and European affairs, respectively. A major finding is that after 2015, we see an increasing Europeanization of migration debates in the small EU member states in our sample (Austria and the Netherlands), and a regression of respective Europeanization in France and – more notably – in Germany.