HDSAnalytics: A Data Analytics Framework for Heterogeneous Data Sources

Cited by: 4
Authors
Jaybal, Yogalakshmi [1]
Ramanathan, Chandrashekar [1]
Rajagopalan, S. [1]
Affiliation
[1] Int Inst Informat Technol, Bangalore, Karnataka, India
Keywords
heterogeneous data sources; analytics; query processing; integration
DOI
10.1145/3152494.3152516
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
This paper presents HDSAnalytics, a data analytics framework for heterogeneous data sources. The framework uses data from a variety of sources that differ in format and volume and that may hold structured, semi-structured, or unstructured data. Integrating data from these sources into a single unified data source may lose information because of semantic, syntactic, and schematic differences among the sources. Semantic heterogeneity arises when similar data appears in different forms in different data sources. Schematic heterogeneity arises from differences in the formats or schemas in which the data is stored, and syntactic heterogeneity from differences in the way the data is accessed and retrieved. Hence, accessing, retrieving, and using information from different data sources poses challenges such as: (1) mapping similar attributes among different data sources; (2) retrieving the specific attributes from different data sources that are relevant to a user's query; and (3) retrieving data from different data sources in the different formats requested by different components of the system. The proposed HDSAnalytics framework design helps analytic models use heterogeneous data sources "as is," without integrating them into a single data source, thereby overcoming all of the above-mentioned challenges. Our prototype of the framework, evaluated on data from the Bangalore Metropolitan Transport Corporation (BMTC), India, demonstrates how bus fleet operations can be analyzed, diagnosed, and explored to improve bus fleet schedules and reduce operating costs. It provides detailed insight into bus fleet operations. The prototype scales and works efficiently with an increasing number of heterogeneous data sources.
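The first two challenges named in the abstract (mapping similar attributes across sources, and retrieving only the attributes relevant to a query) can be illustrated with a minimal Python sketch. This is not the HDSAnalytics implementation, which the abstract does not describe. The source names, attribute names, the `ATTRIBUTE_MAP` table, and the `query` function are all hypothetical, and the sample values are made up purely for illustration. The sketch assumes one CSV source and one JSON source that are read "as is" and never merged into a single store.

```python
import csv
import io
import json

# Hypothetical toy sources; all values are invented for illustration only.
CSV_SOURCE = "route_no,trip_km\nR1,24.5\nR2,41.0\n"
JSON_SOURCE = '[{"routeId": "R1", "fuelLitres": 9.8}, {"routeId": "R2", "fuelLitres": 16.4}]'

# Assumed mapping: canonical attribute -> name used by each source.
ATTRIBUTE_MAP = {
    "route": {"trips_csv": "route_no", "fuel_json": "routeId"},
    "distance_km": {"trips_csv": "trip_km"},
    "fuel_l": {"fuel_json": "fuelLitres"},
}


def read_csv(text):
    """Parse CSV text into a list of dicts (all values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json(text):
    """Parse JSON text into a list of dicts."""
    return json.loads(text)


# Each source keeps its own format and its own reader.
SOURCES = {
    "trips_csv": lambda: read_csv(CSV_SOURCE),
    "fuel_json": lambda: read_json(JSON_SOURCE),
}


def query(attributes):
    """Return, per source, only the requested canonical attributes it holds."""
    result = {}
    for source_name, load in SOURCES.items():
        # Translate canonical names to this source's local names.
        wanted = {
            canon: ATTRIBUTE_MAP[canon][source_name]
            for canon in attributes
            if source_name in ATTRIBUTE_MAP.get(canon, {})
        }
        if not wanted:
            continue  # this source has nothing relevant to the query
        result[source_name] = [
            {canon: row[local] for canon, local in wanted.items()}
            for row in load()
        ]
    return result


if __name__ == "__main__":
    print(query(["route", "fuel_l"]))
```

Here the mapping table is the only place that knows two differently named columns mean the same thing, so a query can be phrased once in canonical names while each source is still read in its native format.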
Pages: 11-19
Page count: 9