Toward framework for development of spreadsheet data extraction systems

Authors: Shigarov A.O., Khristyuk V.V., Paramonov V.V., Yurin A.Yu., Dorodnykh N.O.

Journal: Proc. 1st Scientific-Practical Workshop on Information Technologies: Algorithms, Models, Systems (ITAMS 2018)

Year: 2018

URL: http://ceur-ws.org/Vol-2221/paper14.pdf

Abstract: The paper presents a problem formulation for the development of a theoretical and software framework for creating systems that extract data from arbitrary spreadsheet tables. The problem covers the tasks of automatically recovering the semantic markup of tables, conceptualizing their natural-language content, data cleaning and lineage, generating relational and linked data, and synthesizing tabular data transformation systems based on table analysis and interpretation rules. We consider the state-of-the-art methods and discuss some promising techniques for developing a consistent solution to this problem.

