Archaeoinformatics - Data Science

Open Topics

This is a list of open thesis topics. For further information, or if you wish to suggest your own topic, contact the responsible supervisor(s).

BA/MA: Data Science Applications in Marine Sciences

There are multiple open topics available that target data science applications in marine science. If you are interested in one of the topics or have an idea about a related topic, please contact Carola Trahms, M.Sc.
You can find more suggested topics below.

Fish Larvae Trajectories in the Mediterranean Sea

Example data that can be encountered in marine data science.


BA/MA: Efficient spatio-temporal indexing for HINs: Getting from measurement tables to HINs

Example of a spatial HIN for marine data science

Much of the data and measurements obtained in marine science are of a spatial and/or temporal nature, i.e. they are associated with geo-coordinates or time stamps.
These spatio-temporal properties can be leveraged to obtain new and deeper insights into the data. However, managing spatial and temporal data often requires careful indexing in order to remain efficient. In this thesis you will study efficient methods to index spatio-temporal data for Heterogeneous Information Networks (HINs), which are large graphs in which different types of nodes and relationships are modelled. This topic offers the opportunity to hone skills and techniques learned in lectures like Information Systems, Geo-Information Systems, and Methods of Efficient Similarity Search in Large Databases (although the latter two are not prerequisites).
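
To illustrate the step from a measurement table to a HIN, the following minimal sketch buckets measurements into a uniform grid over space and time and proposes edges between measurements that share a bucket. The sample rows, the cell size `CELL_DEG`, the bucket length `BUCKET_SEC`, and all function names are assumptions made for this example; a thesis would likely study more refined index structures than a fixed grid.

```python
from collections import defaultdict
from math import floor

# Hypothetical measurement rows: (node_id, latitude, longitude, time in seconds)
measurements = [
    ("m1", 35.10, 18.20, 0),
    ("m2", 35.40, 18.70, 1_800),
    ("m3", 35.20, 18.30, 90_000),
    ("m4", 41.00, 5.00, 600),
]

CELL_DEG = 1.0       # assumed spatial cell size in degrees
BUCKET_SEC = 86_400  # assumed temporal bucket size (one day)


def key(lat, lon, t):
    """Map a coordinate and time stamp to a (lat cell, lon cell, time bucket) key."""
    return (floor(lat / CELL_DEG), floor(lon / CELL_DEG), floor(t / BUCKET_SEC))


def build_index(rows):
    """Group node ids by their spatio-temporal grid key."""
    index = defaultdict(list)
    for node_id, lat, lon, t in rows:
        index[key(lat, lon, t)].append(node_id)
    return index


def co_located_edges(index):
    """Return pairs of node ids that share a cell and time bucket."""
    edges = []
    for ids in index.values():
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                edges.append((ids[i], ids[j]))
    return edges


index = build_index(measurements)
edges = co_located_edges(index)
print(edges)
```

With this index, candidate "measured nearby at a similar time" relationships are found by comparing only the measurements within one bucket rather than all pairs in the table. A fixed grid misses neighbours that fall just across a cell boundary, which is one of the trade-offs such a thesis could examine.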

BA/MA: (Linear) combinations of MetaStructures for Clustering or Community Detection in (schema-rich) HINs

Heterogeneous Information Networks (HINs) are graphs in which nodes have different types and edges form different relationships between the nodes (a homogeneous information network would just be a plain graph). In all graphs, but especially in HINs, it is of great interest to find groups or communities that exhibit similar behavior or are more closely related to one another. Meta structures are complex relationships in HINs that can be used to express the 'relatedness' of nodes within the graph, and thus provide a powerful tool for clustering or community detection. In this thesis you will study efficient (fast, scalable) and effective (meaningful) clustering and community detection algorithms using (linear) combinations of meta structures. Participation in a lecture such as KDDM (or similar) is required.
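
As a rough illustration of a linear combination, the sketch below uses meta-paths, the simplest linear case of a meta structure, rather than general meta structures. It counts path instances between sample nodes along two hypothetical meta-paths (sample-species-sample and sample-station-sample) and blends the two count matrices with fixed weights. The node types, the toy adjacency matrices, and the weights are assumptions chosen for the example.

```python
def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def path_counts(adj):
    """Count instances of the symmetric meta-path X-Y-X from an X-by-Y adjacency matrix."""
    return matmul(adj, transpose(adj))


def combine(matrices, weights):
    """Weighted sum of equally sized square matrices."""
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy HIN: rows are samples s0..s2
sample_species = [[1, 0], [1, 1], [0, 1]]  # columns: species p0, p1
sample_station = [[1, 0], [1, 0], [0, 1]]  # columns: stations t0, t1

sps = path_counts(sample_species)  # sample-species-sample
sts = path_counts(sample_station)  # sample-station-sample
combined = combine([sps, sts], [0.5, 0.5])  # assumed equal weights
print(combined)
```

The combined matrix can serve as a pairwise relatedness input for a standard clustering or community detection method. How to choose or learn the weights, and how to extend the counting to non-linear meta structures, is left open here.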


BA/MA: Text Mining and Knowledge Extraction in Marine Sciences

There are multiple open topics available that target text mining and knowledge extraction from text with applications in marine science. If you are interested in one of the topics, or have an idea about a related topic, please contact Asif Suryani, M.Sc.
You can find more suggested topics below.

BA/MA: Study and Evaluation: From NER to Network Representation of Scientific Text

Named-entity recognition example

Named entity recognition (NER) is the task of finding and extracting relevant entities from text. Scientific measurements and their associated values are of particular interest in this scenario, as is the automatic recognition of locations, institutions, persons, etc. Once these entities are extracted from a text, the task is to construct a network representation (e.g. a heterogeneous information network) of the text document at hand, where the challenge lies in predicting the appropriate relationships between the extracted entities (e.g. linking an extracted quantity 'mass' to its respective measurement '42' and unit 'kilograms'). The target of this thesis is to develop and study novel techniques for NER and to link the extracted entities into a network representation of the document.
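
The linking step can be pictured with a small rule-based sketch. It is only a stand-in for a learned NER model: a regular expression finds quantity, value, and unit mentions in a sentence and emits typed edges of a tiny network. The pattern, the vocabulary of quantities and units, the edge labels `has_value` and `has_unit`, and the sample sentence are all assumptions made for illustration.

```python
import re

# Assumed toy vocabulary; a real system would learn entities instead of listing them.
PATTERN = re.compile(
    r"\b(?P<quantity>mass|length)\s+of\s+"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>kilograms|kg|meters|m)\b"
)


def extract_edges(text):
    """Return typed edges (source node, relation, target node) found in the text."""
    edges = []
    for match in PATTERN.finditer(text):
        quantity = ("Quantity", match["quantity"])
        value = ("Value", match["value"])
        unit = ("Unit", match["unit"])
        edges.append((quantity, "has_value", value))
        edges.append((value, "has_unit", unit))
    return edges


text = "The specimen had a mass of 42 kilograms and a length of 1.5 meters."
edges = extract_edges(text)
for edge in edges:
    print(edge)
```

Each node carries its type alongside its surface form, which is what makes the resulting graph heterogeneous. Sentences in which quantity, value, and unit are far apart or only implied are the cases where such a fixed pattern fails and where relationship prediction becomes the actual research problem.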

BA: Scientific Text Parser: An Interactive and Intelligent Approach

In this bachelor's thesis the task is to develop a framework for a parsing toolkit that reads and summarizes scientific text documents, tailored to the needs of the user. The target domain for these studies will be scientific texts from marine science.

MA: Pre-trained Language Models for Domain-driven Q/A

In this master's thesis the objective is to leverage novel, state-of-the-art pre-trained language models (such as BERT) to facilitate automated question answering (Q/A). The target domain for these studies will be scientific texts from marine science.

MA: Evaluating Protein Networks – Puzzling the Biological Secret Behind the Data

Contact: Christian Beth, M.Sc.

Example Protein Network

Protein interaction and function overview (figure taken from [1])

The development of novel metallic biomaterials as implants for medical applications requires a careful elucidation of impaired or improved physiological processes and responses. Therefore, we establish in vitro models to analyse cellular reactions on different biological levels. Fundamental information can be obtained from analysing the variations in protein synthesis. Functional interactions, relationships, and correlations between the tremendously diverse proteins lead to multifaceted and complex networks. Structured data science will help to elucidate these networks and contribute to defining sensitive cellular processes and regularities. The main task will be the classification of big data sets in order to read out principles and patterns. This work will be based on the creative embedding of biological data into mathematical models.

[1] Understanding Protein Networks Using Vester's Sensitivity Model, Moreno, LA et al. in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 17, no. 4, pp. 1440-1450, (2020)

MA: Identification and Synchronization of Events in Data Series from Lake Sediments

Contact: Steffen Strohm, M.Sc.

The preliminary title of this master's thesis points towards a set of possible subproblems in the context of mining time series data derived from lake sediments. The project focusses on analysing these time series to identify patterns that represent events, environmental conditions, or human impact. Identifying these patterns and comparing the time series with other, independent climate data series will help to understand (multifactorial) transformation processes, their role, and their temporal dynamics. This work is a collaborative project with scientists from subproject F2 (Geoarchaeology) within the CRC 1266 "Scales of Transformation" at Kiel University.

Project Background

The Collaborative Research Centre 1266 "Scales of Transformation" focuses on the investigation of human-environment interactions. Within it, project F2 collects data through pollen-analytical and geochemical analyses of annually laminated lake sediments from northern Germany. These data form time series that reflect changes within the lakes and in the surrounding landscape. They record both climatic and human influences. A comparison with independent climate data series is intended to help better understand the often multifactorial changes as well as the role and temporal dynamics (e.g. the question of synchronicity versus asynchronicity) of individual processes.