Publication page

Some approaches for improving quality of tabular data

Authors: Paramonov V.V., Lomaeva E.A.

Journal: CEUR Workshop Proceedings: 3rd Scientific-Practical Workshop Information Technologies: Algorithms, Models, Systems (ITAMS 2020; Irkutsk, 3 September 2020)

Volume: 2677

Issue:

Year: 2020

Reporting year: 2020

Publisher:

Publisher location:

URL:

Abstract: A spreadsheet is one of the popular forms for presenting and transferring data of the same type, and documents of this kind are used very widely. Extracting tables from spreadsheets and understanding them are significant tasks that yield useful information for further use, for example when integrating data obtained from various sources. As a rule, tables in spreadsheets are created by humans for human use. As a result, tables may contain messy data such as misprints, calculation errors, and incorrect structure, which complicates automated table processing and understanding. This paper discusses some approaches to data cleansing that improve the quality of tabular data. The approaches consist of checking and correcting cell calculations and spelling errors. We use phonetic word similarity to correct spelling mistakes in words and heuristic algorithms to detect calculated values in cells.
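
The phonetic spelling correction mentioned in the abstract could look roughly like the sketch below. The abstract does not say which phonetic algorithm the authors use; American Soundex is assumed here purely for illustration, and the names `soundex`, `correct`, and the sample vocabulary are hypothetical. Only ASCII letters are coded.

```python
from difflib import SequenceMatcher

# American Soundex digit groups (assumption: the paper's actual
# phonetic algorithm is not named in the abstract).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of a word ('' if no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # 'h' and 'w' do not separate equal codes
        code = CODES.get(ch, "")  # vowels and other letters get no code
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]


def correct(word, vocabulary):
    """Replace a word with the most similar vocabulary entry that
    shares its Soundex code; return the word unchanged if none does."""
    target = soundex(word)
    candidates = [v for v in vocabulary if soundex(v) == target]
    if not candidates:
        return word
    return max(
        candidates,
        key=lambda v: SequenceMatcher(None, word.lower(), v.lower()).ratio(),
    )


# Hypothetical vocabulary of expected table header words.
vocabulary = ["population", "region", "total"]
print(correct("populaton", vocabulary))
```

Here the phonetic code only narrows the candidate set, and a plain string-similarity ratio picks among the survivors; a real pipeline would need a vocabulary suited to the tables being cleaned.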

Indexed in WOS: 0

Indexed in Scopus: 1

Indexed in RSCI (РИНЦ): 1

In press: 0

Added to the system by: