A recent IBM survey estimates that more than 3.7 billion people use the internet every day and produce nearly 2.5 quintillion bytes of data on the Web each day. The availability of such large amounts of data is commonly regarded as one of the drivers of the current leaps in the development of AI-powered solutions. Knowledge Graphs (KGs) are one of the most important types of data sources found on the Web. They implement a broad spectrum of formalisms to represent and query entities and their interrelations as interconnected semantic graphs. While the term "knowledge graph" was popularized by Google in 2012, it is now used by a plethora of companies to describe a multitude of datasets that use formalisms of varying expressiveness and semantics and are applied to real-world problems.
Recently, researchers have used knowledge graphs to tackle a variety of open research problems. One trending topic is the use of RDF Knowledge Graphs in Natural Language Generation (NLG) to create natural language text. An RDF KG commonly stores knowledge as triples. Each triple consists of (i) a subject, often an entity, (ii) a relation, often called a property, and (iii) an object, which is an entity or a literal (a string or a value with a unit). Relying on NLG algorithms, it is thus possible to verbalize a triple as a sentence such as "Edmund Hillary was born in Auckland." A promising idea is to use automatically generated texts in downstream NLP tasks such as Named Entity Recognition and Linking. However, generating text from RDF triples requires many preliminary steps, such as Graph Summarization.
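To make the triple structure concrete, the following minimal sketch represents one triple and verbalizes it with a hand-written sentence template. Everything in it is an illustrative assumption: the `ex:` prefix, the identifiers `ex:Edmund_Hillary`, `ex:birthPlace` and `ex:Auckland`, and the helper names `Triple`, `TEMPLATES`, `label` and `verbalize` do not come from any particular knowledge graph or library, and the template lookup is only a toy stand-in for an NLG algorithm. It is not meant to reflect the approach taken by LD2NL.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, relation (property), object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical sentence templates, keyed by the local name of the property.
TEMPLATES = {
    "birthPlace": "{s} was born in {o}.",
}


def label(term: str) -> str:
    """Turn a prefixed name such as 'ex:Edmund_Hillary' into 'Edmund Hillary'."""
    local_name = term.split(":", 1)[-1]
    return local_name.replace("_", " ")


def verbalize(triple: Triple) -> str:
    """Render a triple as an English sentence using a template, if one exists."""
    prop = triple.predicate.split(":", 1)[-1]
    template = TEMPLATES.get(prop)
    if template is None:
        # Fallback for unknown properties: subject, property name, object.
        return f"{label(triple.subject)} {prop} {label(triple.obj)}."
    return template.format(s=label(triple.subject), o=label(triple.obj))


# Assumed example triple matching the sentence used in the text.
triple = Triple("ex:Edmund_Hillary", "ex:birthPlace", "ex:Auckland")
print(verbalize(triple))  # prints "Edmund Hillary was born in Auckland."
```

A template per property does not scale to large graphs and cannot combine several triples into one fluent sentence, which is one reason why preliminary steps such as summarization matter for RDF-to-Text generation.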
In this project group, students are expected to deepen their knowledge of all steps of the RDF-to-Text generation task. To this end, the LD2NL project developed by the DICE group will be used as a scientific framework. A beneficial side effect will be gaining teamwork experience in a scientific project.
The course consists of weekly meetings which will monitor the progress of students in the assigned tasks. The students will be divided into sub-groups for the following respective areas:
The goal of a student joining the project group should be:
Be a valuable member of your team!
This mainly includes (but might not be limited to) the following aspects which are considered for evaluating the students’ performance:
Performance/Code Quality: The students will choose different tasks that they tackle within their respective subgroup. The performance of the students regarding these tasks is measured. Typically, the tasks result in a piece of code that is written. However, the tasks are not limited to that and other results (e.g., diagrams, concepts, etc.) are taken into account as well. These results and especially the code of each student will be reviewed according to a) its functionality, b) its quality and c) its documentation.
Communication/Presentation: Communication within the group is of central importance, since successful teamwork is not possible without it. Therefore, the students' communication with their team members and with their supervisor (during regular meetings, via mail, etc.) will be evaluated. Additionally, once a semester the subgroups will present their work on a presentation day with allocated time slots. The presentations will be evaluated, and attendance at the presentation day is mandatory.
Management/Report: Although the supervisor will give hints regarding the students' work, the students are mainly responsible for managing the project. Therefore, the students will select a project leader (or group speaker) who will be the main contact point for the supervisor. This leader should make sure that the group follows an agile management approach (note that responsibility does not mean that this student has to do everything alone; tasks can be delegated). At the end of each semester, each student will submit a short report on the current state of their work (~4 pages).