QUOCA – Quality-driven data integration in Semantic Data Lakes

Name of the participant: Lars Hering

Description of the IT research project: In the course of digitization and Industry 4.0, large amounts of data play a central role in companies and are increasingly treated as part of corporate strategy. Progress in data science, and in machine learning in particular, has been enormous in recent years. As a result, data is becoming more important in corporate decision-making and is also used to develop new data-driven business models. The data sources are usually diverse, ranging from sensor data in factories to product lifecycle management software and smartphone apps. However, according to a study by McKinsey, only a few companies currently exploit the full potential of the data available to them. Data quality plays an important role here: successful data-driven business models and corporate strategies based on insights from large amounts of data are strongly influenced by the underlying data and, in particular, by its quality. At the same time, a study by the Harvard Business Review found that only 3% of data in companies meets basic data quality requirements. New approaches are therefore needed to quantify data quality and to take it into account in the integration and analysis process.

The goal of the project “QUOCA – Quality-Driven Data Integration in Semantic Data Lakes” is to determine the quality of data from a variety of heterogeneous data sources in the context of Big Data. This knowledge about quality is then used in the data integration process to support downstream processes such as data analytics and machine learning and to improve the insights they generate. The approach builds on Semantic Data Lakes as a technology for integrating large amounts of data from many data sources via a uniform interface.

Software Campus partner: KIT, DATEV eG
Implementation period: 1 April 2020 to 31 March 2022