Страница публикации

Table extraction, analysis, and interpretation: The current state of the TabbyDOC project

Авторы: Shigarov A., Dorodnykh N., Mikhailov A., Paramonov V., Yurin A.

Журнал: CEUR Workshop Proceedings: 4th Scientific-Practical Workshop Information Technologies: Algorithms, Models, Systems (ITAMS 2021, Irkutsk, 14 September 2021)

Том: 2984


Год: 2021

Отчётный год: 2021


Местоположение издательства:


Аннотация: The freely available tabular data represented in various digital formats, such as print-oriented documents, spreadsheets, and web pages, are a valuable source to populate knowledge graphs. However, difficulties that inevitably arise with the extraction and integration of the tabular data often hinder their intensive use in practice. TabbyDOC project aims at elaborating a theoretical basis and developing open software for data extraction from arbitrary tables. Previously, it was devoted to the following issues: (i) table extraction tables from print-oriented documents, (ii) data transformation from spreadsheet tables to relational and linked data. This paper summarizes the project’s results that are intended for the following tasks: (i) automation of fine-tuning artificial neural networks for table detection in document images, (ii) a synthesis of programs for spreadsheet data transformation driven by user-defined rules of table analysis and interpretation, and (iii) generating RDF-triples from entities extracted from relational tables.

Индексируется WOS: 0

Индексируется Scopus: 1

Индексируется РИНЦ: 1

Публикация в печати: 0

Добавил в систему: