Associative Feature Information Extraction Using Text Mining from Health Big Data

Times cited: 37
Authors
Kim, Joo-Chang [1]
Chung, Kyungyong [2]
Affiliations
[1] Kyonggi University, Department of Computer Science, Data Mining Lab, 154-42 Gwanggyosan-ro, Suwon 16227, Gyeonggi-do, South Korea
[2] Kyonggi University, Division of Computer Science and Engineering, 154-42 Gwanggyosan-ro, Suwon 16227, Gyeonggi-do, South Korea
Keywords
Information extraction; Text mining; Health big data; TF-IDF; Data mining; Service
DOI
10.1007/s11277-018-5722-5
Chinese Library Classification (CLC) number
TN (Electronic technology, communication technology)
Discipline classification code
0809
Abstract
With the development of big data computing technology, most documents in areas such as politics, economics, society, culture, daily life, and public health have been digitized. The structure of conventional documents differs depending on the author or the organization that produced them, so policies and studies have emerged on how to digitize and use them efficiently. Text mining is the technology used to classify, cluster, extract, search, and analyze data in order to find patterns or features in a set of unstructured or structured documents written in natural language. This paper proposes a method for extracting associative feature information from health big data using text mining. Health documents collected from the Web serve as the raw data from which health big data are built, and the useful information these documents contain is extracted through text mining. The raw health documents are collected through Web scraping and saved on a file server. Because the collected documents consist of sentences, morphological analysis is applied to build a corpus. In a preprocessing step, the file server performs stop-word removal, tagging, and polysemy analysis to create a candidate corpus. TF-C-IDF is then applied to the candidate corpus to evaluate the importance of each word within the document set. Words that TF-C-IDF rates as highly important form a keyword set, and a transaction of keywords is created for each document. The Apriori algorithm is applied to these transactions to mine association rules among keywords and generate associative keywords. The TF-C-IDF weights and the associative keywords together constitute the associative features extracted from health big data. The proposed method is a base technology for creating added value in the healthcare industry in the era of the Fourth Industrial Revolution. An evaluation in terms of F-measure and efficiency showed high performance. The method is expected to contribute to healthcare big data management and information search.
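The pipeline summarized in the abstract (preprocessing, importance weighting, one keyword transaction per document, Apriori rule mining) can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not define TF-C-IDF, so plain TF-IDF (tf = count / document length, idf = ln(N / df)) is swapped in. A regex tokenizer and a tiny stop-word list stand in for Korean morphological analysis, tagging, and polysemy handling. The toy documents, `STOP_WORDS`, the 0.2 weight threshold, and the support and confidence thresholds are all hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy "health documents" (the paper scrapes real ones from the Web).
DOCS = [
    "Diabetes, insulin and glucose with diet and exercise",
    "Diabetes and glucose, insulin for the kidney",
    "Hypertension and salt in the diet with exercise",
    "Diabetes with diet, exercise and weight",
    "Hypertension of the kidney and salt",
]
# Hypothetical stop-word list standing in for the paper's preprocessing.
STOP_WORDS = {"and", "with", "for", "the", "in", "of"}


def preprocess(text):
    """Lowercase, split into alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tf_idf(docs):
    """Plain TF-IDF per document (NOT TF-C-IDF): tf = count / length, idf = ln(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset (classic Apriori)."""
    n = len(transactions)
    frequent = {}
    candidates = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join: merge frequent itemsets into size-k candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) triples above the threshold."""
    found = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                lhs = frozenset(combo)
                # Every subset of a frequent itemset is frequent, so lhs is present.
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, confidence))
    return found


tokens = [preprocess(d) for d in DOCS]
weights = tf_idf(tokens)
# Keyword set: terms whose weight reaches a hypothetical threshold in any document.
keywords = {t for w in weights for t, v in w.items() if v >= 0.2}
# One transaction per document, containing only that document's keywords.
transactions = [set(doc) & keywords for doc in tokens]
frequent = apriori(transactions, min_support=0.4)
rules = association_rules(frequent, min_confidence=0.6)
for lhs, rhs, conf in sorted(rules, key=lambda r: (sorted(r[0]), sorted(r[1]))):
    print(sorted(lhs), "->", sorted(rhs), f"confidence={conf:.2f}")
```

With these toy inputs, words that occur in many documents ("diabetes", "diet", "exercise") fall below the weight threshold, and the script prints four rules with confidence 1.00, for example `['glucose'] -> ['insulin'] confidence=1.00`. Building one global keyword set and intersecting it with each document mirrors the abstract's description of creating per-document transactions from high-importance words. A real TF-C-IDF weighting would replace `tf_idf` without changing the rest of the pipeline.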
Pages: 691-707
Page count: 17
Published in: Wireless Personal Communications, vol. 105 (2019)