Publication page

Towards End-to-End Transformation of Arbitrary Tables from Untagged Portable Documents (PDF) to Linked Data

Authors: Shigarov A., Cherepanov I., Cherkashin E., Dorodnykh N., Khristyuk V., Mikhailov A., Paramonov V., Rozhkow E., Yurin A.

Journal: CEUR Workshop Proceedings: Proc. of the 2nd Scientific-Practical Workshop "Information Technologies: Algorithms, Models, Systems" (ITAMS'2019)

Volume: 2463

Year: 2019

Reporting year: 2019

Abstract: The paper is devoted to the problem of end-to-end transformation of tables from untagged portable documents (PDF) to linked data. It covers table extraction from documents, reconstruction of the logical table structure, conceptualization of the tables' natural-language content, and linking of the extracted data with external vocabularies. We consider several promising approaches: deep-learning-based table detection, heuristic-based table structure recognition, rule-based table analysis, and knowledge-based table interpretation. Together they can serve as a basis for a consistent solution to this problem. Our application experience confirms that such solutions are in demand for populating databases and generating ontologies from tabular data extracted from weakly structured and semi-structured documents.

Indexed in WoS: 0

Indexed in Scopus: 1

Indexed in RSCI (RINC): 0

In press: 0