Name of the participant: Yevgeniy Puzikov
Description of the IT-research project: The research project was concerned with the development of Natural Language Generation (NLG) techniques for eCommerce applications. The two use cases addressed were:
– Regenerating texts with required stylistic features while preserving the original content (changing the tone of voice of a document).
– Generating textual descriptions of life science products from sets of key-value attribute pairs.
As a result, an NLG framework called “Sanity Polygon” was developed. It offers a sensible methodology covering several aspects of the NLG process that must align for the successful delivery of an NLG product. During the development of the framework, algorithms for text generation were designed and implemented.
It was shown that, depending on the task at hand, simple template-based approaches can solve the problem successfully, without resorting to complex statistical models.
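To illustrate what a simple template-based approach to the second use case might look like, here is a minimal sketch in Python. It is not the project's implementation: the template wording, the attribute names (`name`, `form`, `purity`, `packaging`) and the function `describe` are all hypothetical and chosen only for illustration.

```python
from string import Template
from typing import Mapping

# Hypothetical template; the wording and placeholders are assumptions,
# not taken from the project.
PRODUCT_TEMPLATE = Template(
    "$name is a $form with a purity of $purity, supplied in $packaging."
)


def describe(attributes: Mapping[str, str]) -> str:
    """Fill the template from key-value attribute pairs.

    Raises KeyError if an attribute required by the template is missing,
    so the output never contains content absent from the input.
    """
    return PRODUCT_TEMPLATE.substitute(attributes)


# Hypothetical product record.
example = {
    "name": "Sodium chloride",
    "form": "crystalline powder",
    "purity": "99.5%",
    "packaging": "a 500 g bottle",
}

print(describe(example))
```

Because every word of the output comes either from the fixed template or from the input attributes, a generator of this kind cannot introduce unsupported content, at the cost of less varied text.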
Furthermore, a robust automatic system for generating high-quality natural language statements from structured content representations was developed. It was shown that a human-designed algorithm keeps purely statistical systems from straying too far from the underlying meaning during text generation.
In addition, an improvement strategy for common approaches to text generation from data tables was proposed, and it was shown that annotation issues that commonly cause data-driven models to hallucinate content can be largely mitigated by constraining the generation process with templates.
Finally, modern techniques for controlling the generation process were empirically evaluated. Specifically, various approaches to textual style transfer were compared.
Software Campus partners: TU Darmstadt, Merck
Implementation period: 01.11.2018 to 01.11.2020