Color the LOD Cloud

Master Thesis

The Linked Open Data (LOD) cloud is a set of interlinked, publicly available RDF datasets. From time to time, a diagram of the cloud is published in which the datasets may be colored according to their content. However, both the categories and the assignment of datasets to categories are created manually by people who read the dataset descriptions or inspect the data. The goal of this thesis is to develop an approach that does this automatically and in an unsupervised way, giving people easier access to the LOD cloud. This comprises several steps:

  1. Extract data (or use metadata) from the datasets.
  2. Calculate a topic model (e.g., by using Latent Dirichlet Allocation) and assign topic probabilities to each dataset.
  3. Filter the topics based on their quality (e.g., as calculated by Palmetto).
  4. Generate a short label for each topic.
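
The steps above can be illustrated with a minimal, self-contained sketch. It is not the approach the thesis is meant to produce, only one possible reading under several assumptions: each dataset has already been reduced to a list of tokens (step 1), a small hand-written collapsed Gibbs sampler stands in for a library LDA implementation (step 2), a UMass-style coherence score computed on the same toy corpus stands in for Palmetto (step 3), and a label is simply the top three words of a topic (step 4). All function names, dataset names, tokens, and the coherence threshold are hypothetical.

```python
import math
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (doc_topic, topic_word)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_word_counts = [Counter() for _ in range(num_topics)]
    topic_totals = [0] * num_topics
    assignments = []
    for d, doc in enumerate(docs):  # random initial topic for every token
        row = []
        for w in doc:
            k = rng.randrange(num_topics)
            row.append(k)
            doc_topic_counts[d][k] += 1
            topic_word_counts[k][w] += 1
            topic_totals[k] += 1
        assignments.append(row)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]  # remove the token's current assignment
                doc_topic_counts[d][k] -= 1
                topic_word_counts[k][w] -= 1
                topic_totals[k] -= 1
                weights = [
                    (doc_topic_counts[d][t] + alpha)
                    * (topic_word_counts[t][w] + beta)
                    / (topic_totals[t] + len(vocab) * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k  # record the resampled topic
                doc_topic_counts[d][k] += 1
                topic_word_counts[k][w] += 1
                topic_totals[k] += 1
    doc_topic = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic_counts)
    ]
    topic_word = [
        {w: (topic_word_counts[t][w] + beta) / (topic_totals[t] + len(vocab) * beta)
         for w in vocab}
        for t in range(num_topics)
    ]
    return doc_topic, topic_word


def top_words(word_probs, n=3):
    """The n most probable words of a topic (ties broken alphabetically)."""
    return sorted(word_probs, key=lambda w: (-word_probs[w], w))[:n]


def umass_coherence(words, docs):
    """UMass-style coherence from document co-occurrence counts."""
    doc_sets = [set(doc) for doc in docs]
    score = 0.0
    for m in range(1, len(words)):
        for l in range(m):
            both = sum(1 for s in doc_sets if words[m] in s and words[l] in s)
            single = sum(1 for s in doc_sets if words[l] in s)
            score += math.log((both + 1) / single)
    return score


def color_datasets(datasets, num_topics=2, min_coherence=-3.0):
    """Map each dataset name to the label of its most probable kept topic."""
    names = list(datasets)
    docs = [datasets[name] for name in names]
    doc_topic, topic_word = fit_lda(docs, num_topics)
    labels = {}
    for t, word_probs in enumerate(topic_word):
        words = top_words(word_probs)
        if umass_coherence(words, docs) >= min_coherence:  # quality filter
            labels[t] = "/".join(words)
    result = {}
    for name, probs in zip(names, doc_topic):
        best = max(labels, key=lambda t: probs[t]) if labels else None
        result[name] = labels.get(best)
    return result, doc_topic


# Made-up token lists standing in for text extracted from four datasets.
toy_datasets = {
    "dataset_a": ["gene", "protein", "disease", "gene", "drug"],
    "dataset_b": ["protein", "drug", "disease", "gene"],
    "dataset_c": ["city", "country", "river", "city", "population"],
    "dataset_d": ["country", "population", "river", "city"],
}
```

Calling `color_datasets(toy_datasets)` returns a mapping from dataset name to a topic label (or `None` if no topic passes the filter) together with the per-dataset topic probabilities, which could then drive the coloring. On a corpus this small the topics are not meaningful; the sketch only shows how the four steps connect.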