Archaeoinformatics - Data Science

BA: Colored Motif Search In Heterogeneous Information Networks

Supervisors:

Prof. Dr. Matthias Renz

Christian Beth, M.Sc.

Examples of homogeneous and heterogeneous motifs.

Most graphs only contain nodes and edges of the same type, such graphs are also called homogeneous information networks. Heterogeneous information networks can be seen as an extension of homogeneous information networks. Heterogeneous Information Networks are graphs that consist of different typed nodes and edges and can thus store additional information compared to homogeneous information networks. They carry richer semantic information than homogeneous information
networks do because of the types that can be assigned to nodes and edges. A network motif is a fundamental building block of a graph, it thus is a subgraph that plays an important role in the network structure of the graph. The gain of information in heterogeneous information networks can be used for the problem of network motif discovery as well, as heterogeneous network motifs not only contain structural information. They also contain semantic information about the node and edge types that appear in these network motifs. The problem of network motif discovery in heterogeneous information networks was covered by Rossi et al. in their work "Heterogeneous Network Motifs" [1] in a very efficient way. This work extends the approach by Rossi et al., so that the input networks can have directed and typed edges. These extensions allow the network motifs to contain richer information based on the edge types that had not been part of the network motifs in the approach shown in [1]. The extended approach is created step by step and the performance of the final approach as well as the performance of the intermediate approaches is evaluated on the DBLP dataset. This evaluation shows that directed edges do not have a big influence on the performance, but the typed edges do: The average runtime compared to the approach by Rossi et al. [1] has increased by the factor 2593. Nevertheless the extensions made in this work are having a big influence on the amount of information that a network motif can contain, so that the increase of the runtime is tolerable if it is seen in relation to the information gain about the network structure based on the network motifs with directed, typed edges.

For more information please contact Christian Beth, M.Sc.

[1] Heterogeneous Network Motifs, Rossi, Ryan A., et al. ". arXiv preprint (2019)