Archaeoinformatics - Data Science

BA/MA: Scalable Co-Location Mining in Large Protein Databases

Contact: Steffen Strohm, M.Sc., Christian Beth, M.Sc.

The setting is a large set of genomes, each containing a set of genes, where every gene can be assigned to a family according to its function. The question at issue is which genes significantly co-occur (i.e. appear together) in genomes. Such questions arise when comparing genes among species with similar or different ecological or physiological properties. In this context, the aim of the thesis is to develop algorithms and methods that efficiently support the identification of co-occurrence patterns in gene/genome datasets.
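
To make the problem statement concrete, the following is a minimal sketch of a naive baseline, not the method to be developed in the thesis. It assumes each genome is represented as a set of gene-family identifiers, counts in how many genomes each pair of families appears together, and scores each pair with a one-sided hypergeometric test. The function name `cooccurrence_pvalues`, the data layout, and the choice of test are illustrative assumptions; multiple-testing correction is omitted.

```python
from itertools import combinations
from math import comb


def cooccurrence_pvalues(genomes):
    """Score pairwise co-occurrence of gene families (hypothetical helper).

    genomes: dict mapping a genome id to the set of gene-family ids it contains.
    Returns a dict mapping (family_a, family_b) -> (joint_count, p_value),
    restricted to pairs that occur together in at least one genome.
    """
    n = len(genomes)

    # Number of genomes in which each family occurs.
    support = {}
    for families in genomes.values():
        for fam in families:
            support[fam] = support.get(fam, 0) + 1

    # Number of genomes in which both families of a pair occur.
    joint = {}
    for families in genomes.values():
        for pair in combinations(sorted(families), 2):
            joint[pair] = joint.get(pair, 0) + 1

    results = {}
    for (a, b), k in joint.items():
        ka, kb = support[a], support[b]
        # One-sided hypergeometric tail: probability of observing at least k
        # shared genomes if the two families were distributed independently.
        tail = sum(
            comb(ka, i) * comb(n - ka, kb - i)
            for i in range(k, min(ka, kb) + 1)
        )
        results[(a, b)] = (k, tail / comb(n, kb))
    return results


# Toy input (invented for illustration only).
example_genomes = {
    "g1": {"famA", "famB"},
    "g2": {"famA", "famB"},
    "g3": {"famA", "famB", "famC"},
    "g4": {"famC"},
    "g5": {"famC"},
    "g6": {"famC"},
}
pvalues = cooccurrence_pvalues(example_genomes)
```

Enumerating all family pairs per genome grows quadratically with the number of families in a genome, which is exactly the kind of cost that becomes prohibitive on large databases and motivates the search for more scalable algorithms.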