Intelligent approach to automated star-schema construction using a knowledge base

2021
Most data-warehouse construction processes are performed manually by experts, which is laborious, time-consuming, and error-prone. Furthermore, special knowledge is required to design complex multidimensional models, such as a star schema. This predicament has motivated computer scientists to propose automation techniques to generate such models. For this reason, we present a new strategy that incorporates knowledge-based models into a framework, named the Semantic-based Star-schema Designer, that assists the automation of star schema construction. Our models provide the reasoning capabilities needed for star schema design, including the ability to disambiguate heterogeneous terms, detect appropriate data types and attribute sizes, and organize data hierarchies to support online analytical processing. We also propose strategies to overcome the uncertainty that arises when attribute names are not available in the data source: the names of unknown attributes are predicted using an arithmetic coding technique. Our system also generates star schemas from semi-structured data (e.g., comma-separated-value files and spreadsheets), which do not provide primary keys, foreign keys, or relationship cardinalities between tables. Our framework facilitates the construction of star schemas and their relationship information without human intervention, using homegrown algorithms. Experiments demonstrate that our technique predicts column names and data types better than baseline approaches, enabling the effective generation of star schemas.
EXPERT SYSTEMS WITH APPLICATIONS
Volume: 182
ISSN: 0957-4174
Indexed in
SSCI
Publication date
2021
Subject area
Evidence-based management
Country
Thailand
Language
English
DOI
10.1016/j.eswa.2021.115226
Other keywords
DATA WAREHOUSES; DESIGN; FRAMEWORK
EISSN
1873-6793
Funding agencies
Computer Science and Information Technology Department, Science Faculty, Naresuan University [R2564E059, R2564E060]; Health Systems Research Institute [63-017]; Program Management Unit for Human Resources & Institutional Development, Research, and Innovation [B16F630071]; Thailand Science Research Innovation (TSRI) [CU_FRB640001_01_30_1]
Funding information
This research was supported by the Computer Science and Information Technology Department, Science Faculty, Naresuan University (Grant no: R2564E059, R2564E060), Health Systems Research Institute (Grant no: 63-017), Program Management Unit for Human Resources & Institutional Development, Research, and Innovation (Grant no: B16F630071), and Thailand Science Research Innovation (TSRI Grant no: CU_FRB640001_01_30_1). The funder had no role in the study design, data collection, analysis, decision to publish, or preparation of the manuscript. All authors have read and approved the final manuscript and declare that no competing interests exist.
Times cited (WOS)
0
Citation count updated
2022-01
Source institutions
Naresuan University; King Mongkut's Institute of Technology Ladkrabang
Keywords
Data warehouse; Intelligent system; Multidimensional model; Ontology; Semantic approach; Star schema