INTEXPLORE – Interactive Structured Text Exploration

Name of the participant: Benjamin Hättasch

Description of the IT research project: Enormous amounts of information exist only as written text and not in structured form. How can people find relevant information in large amounts of text in a short time? How can they grasp the central messages and the overall content as well as important details? These questions matter to journalists, lawyers, financial analysts, physicians, and almost all scientists, who regularly have to process large amounts of text.

The vast majority of these people lack the technical knowledge to write extraction rules, regular expressions, or even code to assist them in this task. The INTEXPLORE project is therefore researching text exploration tools that are easy to use and do not depend on the exact task at hand.

This benefits, for example, a journalist who wants to write an article about aircraft safety based on some recent incidents. To answer questions such as “which types of accidents are the most common?” or “which airlines are involved in the most incidents?”, she would currently have to read a large volume of accident reports from government regulators, extract the necessary information, and file it in a structured way (for example, in a database or spreadsheet). Only then could she answer these questions. Depending on the follow-up questions, this process has to be repeated several times, because it is not possible to assess in advance which information might become relevant later.
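
To make the manual workflow in the journalist example concrete, the following is a minimal Python sketch of its final step: answering the two example questions once the information has been filed in a structured way. The record layout, the field names (airline, accident_type), the helper most_common, and all values are hypothetical and invented for illustration; the sketch does not show how INTEXPLORE itself works.

```python
from collections import Counter

# Hypothetical, hand-filled records: in the manual workflow, each one would be
# entered after reading a single accident report. All values are invented.
incidents = [
    {"airline": "Airline A", "accident_type": "runway excursion"},
    {"airline": "Airline B", "accident_type": "bird strike"},
    {"airline": "Airline A", "accident_type": "bird strike"},
    {"airline": "Airline C", "accident_type": "runway excursion"},
    {"airline": "Airline A", "accident_type": "runway excursion"},
]


def most_common(records, field):
    """Return the most frequent value of `field` together with its count."""
    counts = Counter(record[field] for record in records)
    return counts.most_common(1)[0]


# "Which types of accidents are the most common?"
print(most_common(incidents, "accident_type"))
# "Which airlines are involved in the most incidents?"
print(most_common(incidents, "airline"))
```

Once the table exists, the query itself is trivial; the cost lies in filling the table. A follow-up question about a field that was never recorded (for example, the aircraft model) cannot be answered from these records, so the reports would have to be read again.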

The research in this project will lay a foundation for automatically checking or answering such queries and hypotheses: the relevant information is identified in a text collection and prepared for an approximate answer. This can save hours of manual work per question.

Software Campus partners: TU Darmstadt, Holtzbrinck Publishing Group
Implementation period: 01.03.2021 – 28.02.2023