Publication page

TabbyXL: Rule-Based Spreadsheet Data Extraction and Transformation

Authors: Shigarov A., Khristyuk V., Mikhailov A., Paramonov V.

Journal: Communications in Computer and Information Science: 25th International Conference on Information and Software Technologies (ICIST 2019; Vilnius; Lithuania; 10-12 October 2019)

Volume: 1078


Year: 2019

Reporting year: 2019


Publisher location:


Abstract: This paper presents an approach to rule-based spreadsheet data extraction and transformation. We determine a table object model and domain-specific language of table analysis and interpretation rules. In contrast to the existing data transformation languages, we draw up this process as consecutive steps: role analysis, structural analysis, and interpretation. To the best of our knowledge, there are no languages for expressing rules for transforming tabular data into the relational form in terms of the table understanding. We also consider a tool for transforming spreadsheet data from arbitrary to relational tables. The performance evaluation has been done automatically for both (role and structural) stages of table analysis with the prepared ground-truth data. It shows high F-score from 95.82% to 99.04% for different recovered items in the existing dataset of 200 arbitrary tables of the same genre (government statistics).

Indexed in WoS: 0

Indexed in Scopus: 1

Indexed in RINC (Russian Science Citation Index): 0

In press: 0

Added to the system by: