Archaeoinformatics - Data Science

BA: Spatial Semantics Expansion for computing relevance on Heterogeneous Information Networks

Author: Jerome Spindler

Supervisors:

Prof. Dr. Matthias Renz

Christian Beth, M.Sc.

Excerpt of the PANGAEA dataset.

Sample map region of the North Atlantic Ocean from the PANGAEA database.

Abstract:

In recent years, with the rise of Big Data in both the scientific community and commercial sectors, the demand for solutions for storing and computing on heterogeneous data has steadily increased. In response, the heterogeneous information network (HIN) model was conceived as a visually intuitive and semantically connected model for interpreting heterogeneous data. These networks, represented by directed graphs, led to the development of relation models such as Meta Paths and their generalization, Meta Structures, which represent subgraph patterns on the network by which relations between objects are determined. Building on these, measures such as Path Count, Struct Count and Structure Constrained Subgraph Expansion (SCSE) were developed to compute relevance between objects on the network, based on all occurrences of the pattern described by a Meta Path or Meta Structure starting from one designated source object. This work expands the Meta Structure model to allow for edges that are not manifested on the network, yet represent a non-trivial relation that many objects may have with each other. The focus is on spatial features including, but not limited to, the distance between objects, the inclusion of one in another, and their overlap. These edges are intended to be definable by users and parameterizable per query. Additionally, they may influence the results of relevance computations, allowing increased expressiveness in weighting properties of objects, such as "the closer the better". Example computation results on excerpt data taken from the PANGAEA database, using Struct Count and SCSE, are provided to demonstrate the efficacy and impact of this paper's proposed expansion, along with an experimental comparison of implementation approaches for these ephemeral edges with regard to data page access.
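To make the idea of an ephemeral spatial edge more concrete, the following is a minimal sketch, not the implementation evaluated in this work. It counts instances of a hypothetical Meta Path Dataset -> Site ~near~ Site -> Dataset on a toy network, where "near" is not stored as an edge but evaluated per query from coordinates. The node types, the toy data, the names near and path_count, and the weighting 1 / (1 + distance) used to express "the closer the better" are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy HIN: node id -> (type, optional (x, y) position).
nodes = {
    "d1": ("Dataset", None),
    "d2": ("Dataset", None),
    "d3": ("Dataset", None),
    "s1": ("Site", (0.0, 0.0)),
    "s2": ("Site", (3.0, 4.0)),    # distance 5 from s1
    "s3": ("Site", (30.0, 40.0)),  # distance 50 from s1
}

# Materialized directed edges (source, target) stored on the network.
edges = [("d1", "s1"), ("s2", "d2"), ("s3", "d3")]


def near(a, b, max_dist):
    """Ephemeral edge predicate: two distinct positioned nodes within max_dist."""
    pa, pb = nodes[a][1], nodes[b][1]
    if a == b or pa is None or pb is None:
        return False
    return math.dist(pa, pb) <= max_dist


def path_count(source, max_dist, weighted=False):
    """Count instances of Dataset -> Site ~near~ Site -> Dataset from source.

    With weighted=True each instance contributes 1 / (1 + distance), an
    assumed weighting that favours closer sites.
    """
    out = defaultdict(list)
    for u, v in edges:
        out[u].append(v)

    scores = defaultdict(float)
    for site in out[source]:
        if nodes[site][0] != "Site":
            continue
        for other, (node_type, pos) in nodes.items():
            if node_type != "Site" or not near(site, other, max_dist):
                continue
            # The ephemeral edge is evaluated here, at query time.
            dist = math.dist(nodes[site][1], pos)
            weight = 1.0 / (1.0 + dist) if weighted else 1.0
            for target in out[other]:
                if nodes[target][0] == "Dataset":
                    scores[target] += weight
    return dict(scores)


print(path_count("d1", max_dist=10.0))
print(path_count("d1", max_dist=100.0, weighted=True))
```

In this sketch the distance threshold is a per-query parameter, so the same Meta Path yields different result sets without any change to the stored network. Scanning all nodes for each "near" test is the naive approach; a spatial index would be one alternative.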