Publication page

On Graph-Based Verification for PDF Table Detection

Authors: Mikhailov A., Shigarov A., Cherepanov I.

Journal: Proceedings of Ivannikov ISPRAS Open Conference (ISPRAS 2020, Moscow, December 10–11, 2020)



Year: 2020

Reporting year: 2020

Publisher: Institute of Electrical and Electronics Engineers Inc.

Publisher location: Moscow


Abstract: Many non-editable documents are shared in PDF (Portable Document Format). They are typically not accompanied by tags that annotate the page layout, including table positions. Table detection is therefore one of the important challenges in analyzing and understanding such documents. This paper outlines a novel two-phase approach to table detection in untagged PDF documents. The first phase uses deep neural networks (DNN) to predict table candidates. The second phase selects probable tables from the candidates by verifying their graph representation. We build a weighted directed graph from the text blocks inside the predicted area of a table. A set of such graphs produced from the 'ICDAR 2013 Table Competition' dataset allowed us to train a verification model based on the Random Forest technique. The empirical results on the competition dataset demonstrated high performance of our implementation of this approach. We showed that the additional verification reduces errors and improves the results of PDF table detection.
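The second phase can be illustrated with a minimal sketch. The abstract does not specify how the graph is built or which features the verification model uses, so everything below is an assumption made for illustration: the `Block` type, a top-left coordinate origin with y growing downward, edges from each block to its nearest right and nearest lower neighbour, edge weights equal to the gap between boxes, and the three summary features in `graph_features`. Such a feature vector could then be passed to a Random Forest classifier, which is not shown here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    """Bounding box of one text block (hypothetical; y grows downward)."""
    x0: float
    y0: float
    x1: float
    y1: float


def overlaps(a0: float, a1: float, b0: float, b1: float) -> bool:
    """True if the intervals [a0, a1] and [b0, b1] share a positive length."""
    return min(a1, b1) - max(a0, b0) > 0


def build_graph(blocks: List[Block]) -> Dict[Tuple[int, int], float]:
    """Build a weighted directed graph over block indices.

    Assumed rule: each block points to its nearest neighbour to the right
    (sharing a vertical range) and its nearest neighbour below (sharing a
    horizontal range). The edge weight is the gap between the two boxes.
    """
    edges: Dict[Tuple[int, int], float] = {}
    for i, a in enumerate(blocks):
        right: Optional[Tuple[int, float]] = None
        below: Optional[Tuple[int, float]] = None
        for j, b in enumerate(blocks):
            if i == j:
                continue
            if b.x0 >= a.x1 and overlaps(a.y0, a.y1, b.y0, b.y1):
                gap = b.x0 - a.x1
                if right is None or gap < right[1]:
                    right = (j, gap)
            if b.y0 >= a.y1 and overlaps(a.x0, a.x1, b.x0, b.x1):
                gap = b.y0 - a.y1
                if below is None or gap < below[1]:
                    below = (j, gap)
        for hit in (right, below):
            if hit is not None:
                edges[(i, hit[0])] = hit[1]
    return edges


def graph_features(blocks: List[Block],
                   edges: Dict[Tuple[int, int], float]) -> List[float]:
    """Summarise a graph as [node count, edge count, mean edge weight]."""
    weights = list(edges.values())
    mean_weight = sum(weights) / len(weights) if weights else 0.0
    return [float(len(blocks)), float(len(edges)), mean_weight]


# A 2x2 grid of text blocks, as might appear inside a table candidate.
grid = [
    Block(0, 0, 10, 10), Block(20, 0, 30, 10),
    Block(0, 15, 10, 25), Block(20, 15, 30, 25),
]
grid_edges = build_graph(grid)
grid_features = graph_features(grid, grid_edges)
```

In this sketch a regular grid yields a dense, evenly weighted graph, while a false candidate such as a paragraph of running text would tend to yield a sparse or irregular one. That difference is the kind of signal a trained verifier could exploit.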

Indexed in WoS: 1

Indexed in Scopus: 1

Indexed in RSCI (РИНЦ): 1

In press: 0

Added to the system by: