Name of the participant: Isaiah Mulang’ Onando
Description of the IT research project: Research on and application of knowledge graphs (KGs) have grown rapidly in recent years, driven in particular by their use at large IT companies such as Google, Facebook and Microsoft. Knowledge graphs offer a simple way to represent knowledge. In the simplest form of modelling, each fact is expressed as a triple (subject, predicate, object). Thanks to this simple representation and to advances in Semantic Web technologies, KGs are applicable in almost all domains. The development of a software solution for creating domain-specific knowledge graphs is shaped by three factors: the exponentially growing amount of data, the accuracy of machine learning applications, and the usability of the software for the end user. Above all, this means that the currently common practice of having humans classify unstructured data is no longer a viable solution. Assistive systems combine the computational power of intelligent, automatic structuring with human interpretability, and thereby achieve more efficient and effective results.
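The triple form described above can be illustrated with a minimal Python sketch. The `Triple` type, the `build_index` and `objects_of` helpers, and the example facts are illustrative assumptions, not part of the project's software.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Triple(NamedTuple):
    """One fact in the graph: (subject, predicate, object)."""
    subject: str
    predicate: str
    obj: str


# Toy facts chosen for illustration only (assumption, not project data).
TRIPLES = [
    Triple("Berlin", "capitalOf", "Germany"),
    Triple("Germany", "memberOf", "European_Union"),
    Triple("Berlin", "locatedIn", "Germany"),
]


def build_index(triples: List[Triple]) -> Dict[Tuple[str, str], List[str]]:
    """Index objects by (subject, predicate) for simple lookups."""
    index: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for t in triples:
        index[(t.subject, t.predicate)].append(t.obj)
    return index


def objects_of(index: Dict[Tuple[str, str], List[str]],
               subject: str, predicate: str) -> List[str]:
    """Return all objects linked to `subject` via `predicate`."""
    return list(index.get((subject, predicate), []))


INDEX = build_index(TRIPLES)
print(objects_of(INDEX, "Berlin", "capitalOf"))
```

A lookup by subject and predicate is the smallest building block of the kind of query a question answering system would later issue against such a graph.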
The information extraction mechanisms of existing systems are not generally applicable because they focus on e-commerce and the analysis of forum posts. The first goal of this project is therefore to enable the construction of domain-specific knowledge graphs in a general-purpose system. The system will extract relevant data from unstructured text using Natural Language Processing (NLP). From this data, the knowledge graph will then be constructed using hierarchical classification algorithms. Subsequently, a question answering (QA) system that applies these domain-specific KGs will be built. This requires methods for Named Entity Disambiguation (NED), relation linking (RL) and semantic parsing. A dynamic pipeline of context-sensitive components is proposed, in which each component offers several approaches for its specific task, such as NED or RL. The approach selected depends on the type of question being asked. The project pursues two main objectives: a functioning QA-over-Domain-KG system and an extension of the state of the art in the form of scientific publications.
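The idea of a dynamic pipeline, in which each component holds several approaches and the question type decides which one runs, might be sketched as follows. Everything here is a hypothetical placeholder: the question-type classifier is a toy first-word rule, and the approach functions only return labels instead of performing real NED or RL.

```python
from typing import Callable, Dict, List

Approach = Callable[[str], str]


def classify_question(question: str) -> str:
    """Toy question-type classifier (assumption): inspects the first word only."""
    words = question.lower().split()
    first = words[0] if words else ""
    if first in {"who", "what", "where", "when", "which"}:
        return "factoid"
    if first in {"is", "are", "does", "do", "was", "were"}:
        return "boolean"
    return "other"


# Placeholder approaches: each returns a label instead of doing real work.
def ned_dictionary(question: str) -> str:
    return "NED:dictionary-lookup"


def ned_context(question: str) -> str:
    return "NED:context-based"


def rl_keyword(question: str) -> str:
    return "RL:keyword-match"


def rl_embedding(question: str) -> str:
    return "RL:embedding-similarity"


# Each component (task) maps a question type to one of its approaches.
COMPONENTS: Dict[str, Dict[str, Approach]] = {
    "NED": {"factoid": ned_dictionary, "boolean": ned_context, "other": ned_context},
    "RL": {"factoid": rl_keyword, "boolean": rl_embedding, "other": rl_embedding},
}


def run_pipeline(question: str) -> List[str]:
    """Run NED, then RL, picking each approach by question type."""
    qtype = classify_question(question)
    return [COMPONENTS[task][qtype](question) for task in ("NED", "RL")]


print(run_pipeline("Who founded DATEV?"))
print(run_pipeline("Is Berlin in Germany?"))
```

Keeping the selection in a lookup table makes it possible to add or swap approaches per component without changing the pipeline logic itself.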
Software Campus partner: Fraunhofer IAIS, DATEV
Implementation period: 1 February 2020 to 30 November 2021