Archaeoinformatics - Data Science

Open Topics

This is a list of open thesis topics. For further information, or if you wish to suggest your own topic, contact the responsible supervisor(s).

BA/MA: Data Science Applications in Marine Sciences

There are multiple open topics available that target data science applications in marine science. If you are interested in one of the topics or have an idea about a related topic, please contact Carola Trahms, M.Sc.
You can find more suggested topics below.

Fish Larvae Trajectories in the Mediterranean Sea

Example data that can be encountered in marine data science.


BA/MA: Efficient spatio-temporal indexing for HINs: Getting from measurement tables to HINs

Example of a spatial HIN for marine data science

Much of the data and measurements obtained in marine science are of a spatial and/or temporal nature, i.e. they are associated with geo-coordinates or time stamps.
These spatio-temporal properties can be leveraged to obtain new and deeper insights into the data. However, managing spatial and temporal data often requires careful indexing in order to remain efficient. In this thesis you will study efficient methods to index spatio-temporal data for Heterogeneous Information Networks (HINs), which are large graphs in which different types of nodes and relationships are modelled. This topic offers the opportunity to hone skills and techniques learned in lectures like Information Systems, Geo-Information Systems, and Methods of Efficient Similarity Search in Large Databases (although the latter two are not prerequisites).
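
To make the indexing idea concrete, here is a minimal sketch of a uniform grid index, one of the simplest spatio-temporal schemes. It is not part of the topic description: the class name `GridIndex`, the cell size, and the time-slot length are hypothetical choices made for illustration only.

```python
from collections import defaultdict


class GridIndex:
    """Toy spatio-temporal index (illustrative assumption, not a reference design).

    Measurements are bucketed by (latitude cell, longitude cell, time slot).
    """

    def __init__(self, cell_deg=1.0, slot_seconds=86400):
        self.cell_deg = cell_deg          # spatial cell size in degrees
        self.slot_seconds = slot_seconds  # temporal slot length in seconds
        self.buckets = defaultdict(list)

    def _key(self, lat, lon, t):
        # Floor division also maps negative coordinates to the correct cell.
        return (
            int(lat // self.cell_deg),
            int(lon // self.cell_deg),
            int(t // self.slot_seconds),
        )

    def insert(self, lat, lon, t, payload):
        self.buckets[self._key(lat, lon, t)].append((lat, lon, t, payload))

    def query(self, lat_min, lat_max, lon_min, lon_max, t_min, t_max):
        """Return payloads inside the closed space-time box."""
        lo = self._key(lat_min, lon_min, t_min)
        hi = self._key(lat_max, lon_max, t_max)
        hits = []
        # Visit only the buckets that overlap the query box.
        for i in range(lo[0], hi[0] + 1):
            for j in range(lo[1], hi[1] + 1):
                for k in range(lo[2], hi[2] + 1):
                    for lat, lon, t, payload in self.buckets.get((i, j, k), []):
                        # Exact check, since boundary buckets are only partly covered.
                        if (lat_min <= lat <= lat_max
                                and lon_min <= lon <= lon_max
                                and t_min <= t <= t_max):
                            hits.append(payload)
        return hits
```

A range query then touches only the buckets that overlap the query box instead of scanning every measurement. How such buckets are best attached to the nodes of a HIN is left open here.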

BA/MA: (Linear) combinations of MetaStructures for Clustering or Community Detection in (schema-rich) HINs

Heterogeneous Information Networks (HINs) are graphs in which nodes have different types and edges form different relationships between the nodes (a homogeneous information network would just be a plain graph). In all graphs, but especially in HINs, it is of great interest to find groups or communities that exhibit similar behavior or are more closely related to one another. Meta structures are complex relationships in HINs that can be used to express the 'relatedness' of nodes within the graph, and thus provide a powerful tool for clustering or community detection. In this thesis you will study efficient (fast, scalable) and effective (meaningful) clustering and community detection algorithms using (linear) combinations of meta structures. Participation in a lecture such as KDDM (or similar) is required.
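
As a rough illustration of what a linear combination of relatedness measures could look like, the sketch below counts instances of two simple chain-shaped relationships (author-paper-author and author-venue-author) and mixes them with fixed weights. The toy network, the function names, and the weights 0.7 and 0.3 are all assumptions made for this example. General meta structures are richer than these two-hop chains.

```python
from collections import defaultdict


def invert(relation):
    """Turn a source -> targets mapping into a target -> sources mapping."""
    inverted = defaultdict(list)
    for source, targets in relation.items():
        for target in targets:
            inverted[target].append(source)
    return dict(inverted)


def path_counts(first_hop, second_hop):
    """Count two-hop path instances x -> mid -> y for every (x, y) pair."""
    counts = defaultdict(int)
    for x, mids in first_hop.items():
        for mid in mids:
            for y in second_hop.get(mid, ()):
                counts[(x, y)] += 1
    return dict(counts)


def combine(count_maps, weights):
    """Weighted sum (linear combination) of several relatedness maps."""
    combined = defaultdict(float)
    for counts, weight in zip(count_maps, weights):
        for pair, count in counts.items():
            combined[pair] += weight * count
    return dict(combined)


# Hypothetical toy HIN: authors linked to papers and to venues.
author_paper = {"ana": ["p1", "p2"], "ben": ["p1"], "cem": ["p3"]}
author_venue = {"ana": ["v1"], "ben": ["v2"], "cem": ["v1"]}

apa = path_counts(author_paper, invert(author_paper))  # author-paper-author
ava = path_counts(author_venue, invert(author_venue))  # author-venue-author
relatedness = combine([apa, ava], weights=[0.7, 0.3])  # weights are arbitrary
```

The resulting pairwise scores could serve as input to a clustering or community detection algorithm. Choosing or learning the weights is one of the questions the topic leaves open.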


BA/MA: Text Mining and Knowledge Extraction in Marine Sciences

There are multiple open topics available that target text mining and knowledge extraction from text with applications in marine science. If you are interested in one of the topics, or have an idea about a related topic, please contact Asif Suryani, M.Sc.
You can find more suggested topics below.

BA/MA: Study and Evaluation: From NER to Network Representation of Scientific Text

Named entity recognition (NER) is the task of finding and extracting relevant entities from text. Scientific measurements and their associated values are of particular interest in this scenario, as is the automatic recognition of locations, institutions, persons, etc. Once these entities are extracted from a text, the task is to construct a network representation (e.g. a heterogeneous information network) of the text document at hand, where the challenge lies in predicting the appropriate relationships between the extracted entities (e.g. linking an extracted quantity 'mass' to its respective measurement '42' and unit 'kilograms'). The target of this thesis is to develop and study novel techniques for NER and to link the extracted entities for a network representation of the document.
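
The linking step from the example above (quantity 'mass', measurement '42', unit 'kilograms') can be sketched with a deliberately naive rule-based baseline. The regular expression, the list of quantity words, and the edge labels `has_value` and `has_unit` are hypothetical. A thesis would replace this with learned NER and relation prediction.

```python
import re

# Naive pattern for phrases like "mass of 42 kilograms" (illustrative only).
PATTERN = re.compile(
    r"(?P<quantity>mass|depth|temperature)\s+of\s+"
    r"(?P<value>\d+(?:\.\d+)?)\s+"
    r"(?P<unit>[A-Za-z]+)"
)


def extract_edges(text):
    """Return typed edges (source node, relation, target node) found in text.

    Each node is a (type, label) pair, mimicking typed nodes in a HIN.
    """
    edges = []
    for match in PATTERN.finditer(text):
        quantity = ("quantity", match.group("quantity"))
        value = ("value", match.group("value"))
        unit = ("unit", match.group("unit"))
        edges.append((quantity, "has_value", value))
        edges.append((value, "has_unit", unit))
    return edges


sentence = "The sample had a mass of 42 kilograms."
edges = extract_edges(sentence)
```

For the sentence shown, the function yields one quantity-to-value edge and one value-to-unit edge, which together form a tiny typed graph for the document.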

BA: Scientific Text Parser: An Interactive and Intelligent Approach

In this bachelor's thesis, the task is to develop a framework for a parsing toolkit that reads and summarizes scientific text documents, tailored to the needs of the user. The target domain for these studies will be scientific texts from marine science.

MA: Pre-trained Language Models for Domain-driven Q/A

In this master's thesis, the objective is to leverage novel, state-of-the-art pre-trained language models (such as BERT) to facilitate automated question answering (Q/A). The target domain for these studies will be scientific texts from marine science.

BA/MA: Scalable Co-Location Mining in Large Protein Databases

Contact: Steffen Strohm, M.Sc., Christian Beth, M.Sc.

Given is a large set of genomes covering a set of genes, where each gene can belong to a specific family (according to its function). The question at issue is which genes significantly co-occur (i.e. appear together) in genomes. This question relates to comparing genes among species with similar or different ecological or physiological properties. In this context, the aim of this thesis is to develop algorithms and methods that efficiently support the identification of co-occurrence patterns in gene/genome datasets.
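
A brute-force baseline for the co-occurrence question might look as follows. It computes only the raw support of each pair of gene families (the fraction of genomes containing both). It performs no significance test and is quadratic per genome, which is exactly what a scalable method would need to improve on. The genome and family names and the threshold are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_support(genomes, min_support=0.5):
    """Return {(family_a, family_b): support} for pairs meeting min_support.

    genomes maps a genome id to the collection of gene families it contains.
    Support is the fraction of genomes in which both families appear.
    """
    n_genomes = len(genomes)
    pair_counts = Counter()
    for families in genomes.values():
        # Deduplicate and sort so each unordered pair has one canonical key.
        for pair in combinations(sorted(set(families)), 2):
            pair_counts[pair] += 1
    return {
        pair: count / n_genomes
        for pair, count in pair_counts.items()
        if count / n_genomes >= min_support
    }


# Hypothetical toy data set.
genomes = {
    "g1": {"famA", "famB", "famC"},
    "g2": {"famA", "famB"},
    "g3": {"famA", "famC"},
    "g4": {"famB"},
}
frequent_pairs = cooccurrence_support(genomes, min_support=0.5)
```

In this toy data set, the pairs (famA, famB) and (famA, famC) each appear in two of four genomes and pass the threshold, while (famB, famC) appears in only one and is dropped.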

MA: Evaluating Protein Networks – Puzzling the Biological Secret Behind the Data

Contact: Christian Beth, M.Sc.

Example Protein Network

Protein interaction and function overview (figure taken from [1])

The development of novel metallic biomaterials as implants for medical applications requires a careful elucidation of impaired or improved physiological processes and responses. Therefore, we establish in vitro models to analyse cellular reactions on different biological levels. Fundamental information can be obtained from analysing the variations in protein synthesis. Functional interactions, relationships, and correlations between the tremendously diverse proteins lead to multifaceted and complex networks. Structured data science will help to elucidate these networks and contribute to defining sensitive cellular processes and regularities. The main task will be the classification of big data sets in order to read out principles and patterns. This work will be based on the creative embedding of biological data into mathematical models.

[1] Moreno, LA et al., Understanding Protein Networks Using Vester's Sensitivity Model, IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 17, no. 4, pp. 1440-1450 (2020)

MA: Extending Expressiveness of Meta-Structures in Heterogeneous Information Networks

Contact: Christian Beth, M.Sc.

Heterogeneous information networks (HINs) are graphs in which nodes have different types and edges form different relations between the nodes. They thus allow semantically rich modelling of virtually any kind of data and information, ranging from protein-protein interaction networks to bibliographic networks. The meta-path is a composite relationship between nodes in an HIN and an integral part of state-of-the-art similarity/relevance measures in HINs, which in turn underpin downstream data mining tasks such as clustering, classification, or link prediction. To allow for more powerful, complex, and expressive relationships, the meta-structure was developed. It flexibly combines meta-path relations with the 'and'-linkage, which allows the user to specify more precisely what she is looking for. But why stop at the 'and'-linkage? Why not consider 'or', 'not', or other logical constraints? In this thesis you will design and study effective (meaningful) and efficient (fast, scalable) relevance measures based on more expressive meta-structures.
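
One possible, strongly simplified reading of these linkages treats each meta-path as a Boolean relation (a set of node pairs) and combines relations with set operations. The sketch below is an assumption made for illustration, not the definition used in the literature: it ignores instance counts, and the toy paper-venue and paper-topic data are invented.

```python
def inverse(relation):
    """Reverse every (source, target) pair."""
    return {(target, source) for source, target in relation}


def compose(first, second):
    """Pairs (x, z) such that (x, y) is in first and (y, z) is in second."""
    successors = {}
    for y, z in second:
        successors.setdefault(y, set()).add(z)
    return {(x, z) for x, y in first for z in successors.get(y, ())}


# Hypothetical toy HIN: papers linked to venues and to topics.
paper_venue = {("p1", "v1"), ("p2", "v1"), ("p3", "v2"), ("p4", "v1")}
paper_topic = {("p1", "t1"), ("p2", "t2"), ("p3", "t1"), ("p4", "t1")}

# Two meta-paths as Boolean relations between papers.
same_venue = compose(paper_venue, inverse(paper_venue))  # paper-venue-paper
same_topic = compose(paper_topic, inverse(paper_topic))  # paper-topic-paper

# Logical linkages expressed as set operations (one possible interpretation).
and_link = same_venue & same_topic      # share a venue AND a topic
or_link = same_venue | same_topic       # share a venue OR a topic
and_not_link = same_venue - same_topic  # share a venue but NOT a topic
```

Here p1 and p4 are related under the 'and'-linkage, p1 and p3 only under 'or', and p1 and p2 under 'venue but not topic'. A relevance measure would additionally have to weight or count such matches, which this Boolean view leaves out.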