Mining Unstructured Data via Computational Intelligence

Times cited: 0
Author
Kuri-Morales, Angel [1]
Affiliation
[1] Inst Tecnol Autonomo Mexico, Mexico City 01000, DF, Mexico
Keywords
Data bases; Neural networks; Genetic algorithms; Categorical encoding
DOI
10.1007/978-3-319-27060-9_43
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
At present, very large volumes of information are regularly produced worldwide. Most of this information is unstructured: it lacks the properties usually expected of, for instance, relational databases. One of the more interesting questions in computer science is how, if at all, data mining can be performed on such unstructured data. Intuitively, its analysis has been attempted by devising schemes that identify patterns and trends through means such as statistical pattern learning. The basic problem with this approach is that the user must decide, a priori, the model of the patterns and, furthermore, the way in which they are to be found in the data. This holds regardless of the kind of data, be it textual, musical, financial or otherwise. In this paper we explore an alternative paradigm in which raw data is categorized by analyzing a large corpus, from which a set of categories and the instances in each category are determined, resulting in a structured database. Each instance is then mapped to a numerical value that preserves the underlying patterns. This is done using a genetic algorithm and a set of multi-layer perceptron networks. Every categorical instance is then replaced by its corresponding numerical code. The resulting numerical database may be tackled with the usual clustering algorithms. We hypothesize that any unstructured data set may be approached in this fashion. In this work we illustrate the method with a textual database: we apply it to characterize texts by different authors and present experimental evidence that the resulting databases yield clustering results which permit authorship identification from raw textual data.
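The abstract describes a pipeline: turn categorical instances into numeric codes that preserve the patterns among attributes, substitute the codes into the database, then cluster. The sketch below illustrates that pipeline shape under loudly stated assumptions; it is not the author's method. The paper judges candidate codes with a set of multi-layer perceptrons, whereas here the fitness is a hypothetical stand-in (the average squared Pearson correlation between every pair of encoded columns), the genetic algorithm is a plain truncation-selection GA, and clustering is ordinary k-means. All function names, parameters and the toy data are illustrative.

```python
import random
from statistics import mean, pstdev

def encode(rows, codes):
    """Replace each categorical instance by its numeric code, column by column."""
    return [[codes[j][v] for j, v in enumerate(row)] for row in rows]

def r2(xs, ys):
    """Squared Pearson correlation; 0 when either column is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return (cov / (sx * sy)) ** 2

def fitness(rows, codes):
    """Hypothetical proxy for 'codes preserve the patterns' (the paper uses MLPs):
    average r^2 over all pairs of encoded columns. Needs at least two columns."""
    cols = list(zip(*encode(rows, codes)))
    pairs = [(a, b) for a in range(len(cols)) for b in range(a + 1, len(cols))]
    return mean(r2(cols[a], cols[b]) for a, b in pairs)

def random_codes(domains, rng):
    """One individual: a code in [0, 1) for every instance of every column."""
    return [{v: rng.random() for v in dom} for dom in domains]

def crossover(p1, p2, rng):
    """Uniform crossover: each instance's code comes from one parent at random."""
    return [{v: (c1[v] if rng.random() < 0.5 else c2[v]) for v in c1}
            for c1, c2 in zip(p1, p2)]

def mutate(codes, rng, rate=0.1, sigma=0.1):
    """Gaussian perturbation of some codes, clipped to [0, 1]."""
    return [{v: min(1.0, max(0.0, x + rng.gauss(0, sigma))) if rng.random() < rate else x
             for v, x in col.items()} for col in codes]

def evolve_codes(rows, pop_size=30, generations=60, seed=0):
    """Truncation-selection GA with elitism: the best fitness never decreases."""
    rng = random.Random(seed)
    domains = [sorted(set(col)) for col in zip(*rows)]
    pop = [random_codes(domains, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(rows, c), reverse=True)
        elite = pop[: pop_size // 2]
        children = [mutate(crossover(*rng.sample(elite, 2), rng), rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=lambda c: fitness(rows, c))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on the numeric database; returns one cluster label per row."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centers[c])))
                  for pt in points]
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [mean(col) for col in zip(*members)]
    return labels

if __name__ == "__main__":
    # Hypothetical toy corpus: each row describes a text by three categorical
    # attributes; two invented "authors" favor different instances.
    author_a = [("short", "formal", "archaic")] * 10 + [("short", "formal", "modern")] * 5
    author_b = [("long", "colloquial", "modern")] * 10 + [("long", "formal", "modern")] * 5
    rows = author_a + author_b
    codes = evolve_codes(rows)
    print(kmeans(encode(rows, codes), k=2))
```

Replacing the correlation proxy with a learned predictor (for example, a network trained to predict one column from the others) would bring the sketch closer to what the abstract describes, at a considerably higher computational cost.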
Pages: 518-529
Number of pages: 12