Mining Unstructured Data via Computational Intelligence

Author
Kuri-Morales, Angel [1]
Affiliation
[1] Inst Tecnol Autonomo Mexico, Mexico City 01000, DF, Mexico
Keywords
Data bases; Neural networks; Genetic algorithms; Categorical encoding
DOI
10.1007/978-3-319-27060-9_43
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
At present, very large volumes of information are regularly produced worldwide. Most of this information is unstructured, lacking the properties usually expected from, for instance, relational databases. One of the more interesting issues in computer science is whether, and how, data mining can be achieved on such unstructured data. Intuitively, its analysis has been attempted by devising schemes that identify patterns and trends through means such as statistical pattern learning. The basic problem with this approach is that the user has to decide, a priori, the model of the patterns and, furthermore, the way in which they are to be found in the data. This is true regardless of the kind of data, be it textual, musical, financial or otherwise. In this paper we explore an alternative paradigm in which raw data is categorized by analyzing a large corpus, from which a set of categories and the different instances in each category are determined, resulting in a structured database. Each instance is then mapped into a numerical value that preserves the underlying patterns. This is done using a genetic algorithm and a set of multi-layer perceptron networks. Every categorical instance is then replaced by the appropriate numerical code. The resulting numerical database may be tackled with the usual clustering algorithms. We hypothesize that any unstructured data set may be approached in this fashion. In this work we use a textual database as an example: we apply our method to characterize texts by different authors and present experimental evidence that the resulting databases yield clustering results which permit authorship identification from raw textual data.
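The last two steps of the abstract (replacing every categorical instance by a numerical code, then clustering the resulting numerical database) can be illustrated with a minimal Python sketch. It is not the paper's procedure: the codes here are random placeholders standing in for the codes that the paper obtains with a genetic algorithm and multi-layer perceptrons, the clustering is a plain k-means chosen only as one example of a "usual clustering algorithm", and the function names (`random_codes`, `encode_categories`, `kmeans`) and the toy rows are hypothetical.

```python
import random


def random_codes(rows, rng):
    """Give each distinct instance in each column a placeholder code in [0, 1).

    Assumption: the paper searches for pattern-preserving codes with a
    genetic algorithm and MLPs; random values only stand in for them here.
    """
    codes = []
    for col in range(len(rows[0])):
        instances = sorted({row[col] for row in rows})
        codes.append({inst: rng.random() for inst in instances})
    return codes


def encode_categories(rows, codes):
    """Replace every categorical instance by its numerical code."""
    return [[codes[col][value] for col, value in enumerate(row)] for row in rows]


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def kmeans(points, k, rng, iterations=20):
    """Plain k-means; returns one cluster label per point."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            [sum(dim) / len(c) for dim in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return [nearest(p, centroids) for p in points]


# Toy structured database: each column is a category, each cell an instance.
rows = [
    ["the", "whale", "sea"],
    ["the", "ship", "sea"],
    ["a", "garden", "house"],
    ["a", "ball", "house"],
]
rng = random.Random(0)
codes = random_codes(rows, rng)
encoded = encode_categories(rows, codes)
labels = kmeans(encoded, k=2, rng=rng)
print(encoded)
print(labels)
```

With random codes the clusters carry no meaning; the sketch only shows the data flow from a categorical table to a numerical table that a clustering algorithm can consume.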
Pages: 518 - 529
Number of pages: 12