Which digitalization topics are addressed in politics and society? Results of an automated text analysis

To identify potentially important topics in digitalization research, CAIS conducted an automated content analysis of key text documents. An overview of the process and the findings.


Anticipating social developments and reacting to them appropriately are key tasks for political and social actors. But which digitalization topics does the political domain address in the first place? And which of these does it address in the short or medium term? How do funding bodies (e.g. the German Research Foundation and the Federal Ministry of Education and Research) act in this field? Do they have different focal points? And which topics related to the digital transformation do other institutions address?

To answer these questions, we conducted an automated text analysis of key documents from these actors in fall 2020. Together with the findings of the online real-time Delphi study of fall 2019 and the expert discussions with digitalization researchers conducted a year later, this analysis is a further component by which CAIS identifies research topics of future significance.

Automated text analysis of 471 documents

The corpus of 471 documents included important texts such as the digitalization strategies of federal states, calls for research projects issued by the Federal Ministry of Education and Research, and self-descriptions of existing research contexts with a digital focus. Data collection took place between 20 August and 7 September 2020. Using the bag-of-words approach, we gained exploratory insight into the basic structures and themes of the texts.

First insight: all texts are strongly marked by a research- and business-related vocabulary

Due to the selection of sources, words such as “digitalization” and “digital” naturally take center stage. The focus of the texts on a business-related, practice-oriented vocabulary indicates that the discussion in the texts is strongly related to the application of digital technologies (see also Figure 2). Besides a focus on policy- and business-related content, a topic-modeling approach (LDA) also reveals a high proportion of research-related vocabulary.

Bag-of-words approach

The bag-of-words approach breaks texts down into individual units, typically single words or word sequences of a fixed length, and thereby dissolves the context in which words, word groups, and sentences appear. Figuratively speaking, a bag contains all the words of the original texts together with how often they occur, but without their original order or relationships.
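The idea can be illustrated with a minimal sketch in Python. It is not the pipeline used for the study: the two sample sentences, the small stopword list, and the helper names `tokenize` and `bag_of_words` are assumptions made purely for illustration.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption, not the one used in the study).
STOPWORDS = frozenset({"the", "and", "of", "in", "for", "on", "a"})

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-zäöüß]+", text.lower())

def bag_of_words(text, stopwords=STOPWORDS):
    """Count how often each non-stopword occurs; word order is discarded."""
    return Counter(tok for tok in tokenize(text) if tok not in stopwords)

# Hypothetical sample documents standing in for the real corpus.
documents = [
    "The digital strategy promotes digital skills in schools.",
    "Funding for research on digital security and data protection.",
]

# Merge the per-document bags into one corpus-wide frequency table.
corpus_counts = sum((bag_of_words(doc) for doc in documents), Counter())
print(corpus_counts.most_common(3))
```

Calling `corpus_counts.most_common(70)` on a full corpus would yield the kind of frequency overview shown in Figure 2.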

Topic-modeling approach

Topic modeling is a family of procedures for the automated processing of texts. A topic model is a statistical model for discovering the topics, or latent semantic structures, that occur in a collection of documents. LDA (Latent Dirichlet Allocation) is a widely used topic-modeling method.
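To give a rough sense of how LDA assigns words to topics, the following toy sketch implements a collapsed Gibbs sampler in plain Python. It is an illustration under stated assumptions, not the implementation used for the study: the function names `fit_lda` and `top_words`, the hyperparameters `alpha` and `beta`, the number of iterations, and the four tiny sample documents are all hypothetical. In practice one would normally rely on an established library.

```python
import random

def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    vocab_size = len(vocab)
    topics = list(range(num_topics))

    doc_topic = [[0] * num_topics for _ in docs]     # tokens per topic, per document
    topic_word = [[0] * vocab_size for _ in topics]  # word counts per topic
    topic_total = [0] * num_topics                   # total tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        assignments.append([])
        for word in doc:
            k = rng.choice(topics)
            assignments[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[word]] += 1
            topic_total[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                w = word_id[word]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Weight each topic by how well it fits this document and word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return vocab, doc_topic, topic_word

def top_words(vocab, topic_word, n=3):
    """Return the n most frequent words assigned to each topic."""
    return [
        [vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
        for row in topic_word
    ]

# Hypothetical mini-corpus, already tokenized.
docs = [
    "security data encryption security network".split(),
    "network security data attack".split(),
    "school teaching literacy skills school".split(),
    "literacy skills teaching students".split(),
]
vocab, doc_topic, topic_word = fit_lda(docs, num_topics=2)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(f"topic {k}:", ", ".join(words))
```

Because the sampler is random, the topics it finds depend on the seed and on the corpus. On a corpus this small the result is only illustrative.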

Second insight: results congruent with other methods for identifying topics

Analyzing the overlap between topics from the real-time Delphi study and from the expert discussions on the one hand, and topics from the automated text analysis on the other, points to a focus on the topic of IT/cybersecurity. There are also overlaps, for example, with regard to the topics of digital literacy, surveillance, and environmental protection.
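One simple way to quantify such an overlap is to compare the topic lists as sets. The sketch below is only an assumption about how this could be done, since the text does not state how the overlap was measured. The Jaccard index, the function name `topic_overlap`, and the placeholder topics are illustrative; only the four shared topics are taken from the text.

```python
def topic_overlap(first, second):
    """Return the shared topics and the Jaccard index of two topic sets."""
    shared = first & second
    union = first | second
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard

# The four shared topics come from the text; the placeholders are invented.
delphi_and_experts = {
    "it/cybersecurity", "digital literacy", "surveillance",
    "environmental protection", "placeholder topic a",
}
text_analysis = {
    "it/cybersecurity", "digital literacy", "surveillance",
    "environmental protection", "placeholder topic b", "placeholder topic c",
}

shared, jaccard = topic_overlap(delphi_and_experts, text_analysis)
print(sorted(shared), round(jaccard, 2))
```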

Third insight: valuable initial indications and potential for development

Since the text corpus in its current form is too small and too heterogeneous, the methods used have not yet been able to produce sufficiently precise results. Nevertheless, they give us valuable indications of the focal points that different social and political actors emphasize with regard to digitalization. Expanding the data basis and sharpening the analytical tools will help improve the process. Even at this stage, because it is continuously aligned with the other components, the text analysis already fits into the overall structure for identifying topics for the CAIS research programs on digitalization research.

The whole process of identifying topics can be seen in the video.

Figure 2. Overview of the 70 most frequent terms