Mining Unstructured Data via Computational Intelligence

Cited by: 0
Authors
Kuri-Morales, Angel [1]
Affiliations
[1] Inst Tecnol Autonomo Mexico, Mexico City 01000, DF, Mexico
Keywords
Data bases; Neural networks; Genetic algorithms; Categorical encoding
DOI
10.1007/978-3-319-27060-9_43
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
At present, very large volumes of information are regularly produced worldwide. Most of this information is unstructured, lacking the properties usually expected from, for instance, relational databases. One of the more interesting issues in computer science is whether, and how, data mining can be achieved on such unstructured data. Intuitively, its analysis has been attempted by devising schemes to identify patterns and trends through means such as statistical pattern learning. The basic problem with this approach is that the user has to decide, a priori, the model of the patterns and, furthermore, the way in which they are to be found in the data. This is true regardless of the kind of data, be it textual, musical, financial or otherwise. In this paper we explore an alternative paradigm in which raw data is categorized by analyzing a large corpus, from which a set of categories and the different instances in each category are determined, resulting in a structured database. Each of the instances is then mapped into a numerical value which preserves the underlying patterns. This is done using a genetic algorithm and a set of multi-layer perceptron networks. Every categorical instance is then replaced by the appropriate numerical code. The resulting numerical database may be tackled with the usual clustering algorithms. We hypothesize that any unstructured data set may be approached in this fashion. In this work we exemplify the approach with a textual database: we apply our method to characterize texts by different authors and present experimental evidence that the resulting databases yield clustering results which permit authorship identification from raw textual data.
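The encoding step described in the abstract can be illustrated with a small sketch. It is not the paper's implementation: a simple elitist, mutation-only evolutionary search stands in for the genetic algorithm, a least-squares linear fit stands in for the multi-layer perceptrons, and all names (`encode`, `fitness`, `search_codes`, the toy `table`) are hypothetical. The idea shown is only that codes are searched so that each column of the encoded table is as predictable as possible from the others.

```python
import numpy as np

def encode(table, codes):
    """Replace every categorical instance by its numerical code."""
    return np.array([[codes[j][v] for j, v in enumerate(row)] for row in table],
                    dtype=float)

def fitness(numeric):
    """Worst unexplained-variance ratio when predicting one column from the rest.
    Assumption: a linear least-squares fit replaces the paper's MLPs."""
    n_rows, n_cols = numeric.shape
    worst = 0.0
    for j in range(n_cols):
        X = np.column_stack([np.delete(numeric, j, axis=1), np.ones(n_rows)])
        y = numeric[:, j]
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = X @ coef - y
        worst = max(worst, float(resid.var() / (y.var() + 1e-12)))
    return worst

def search_codes(table, generations=200, pop_size=20, seed=0):
    """Elitist evolutionary search over codes in [0, 1) (stand-in for the GA)."""
    rng = np.random.default_rng(seed)
    categories = [sorted({row[j] for row in table}) for j in range(len(table[0]))]

    def random_codes():
        return [{c: rng.random() for c in cats} for cats in categories]

    def mutate(codes):
        # Copy the parent and resample the code of one randomly chosen instance.
        child = [dict(col) for col in codes]
        j = int(rng.integers(len(child)))
        key = categories[j][int(rng.integers(len(categories[j])))]
        child[j][key] = rng.random()
        return child

    def score(codes):
        return fitness(encode(table, codes))

    population = [random_codes() for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(population, key=score)[: pop_size // 2]
        children = [mutate(elite[int(rng.integers(len(elite)))])
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    return min(population, key=score)

# Toy categorical table (hypothetical data): three attributes, eight rows.
table = [
    ("red", "small", "a"), ("red", "small", "a"), ("blue", "large", "b"),
    ("blue", "large", "b"), ("red", "large", "a"), ("blue", "small", "b"),
    ("red", "small", "b"), ("blue", "large", "a"),
]
codes = search_codes(table, generations=50)
numeric = encode(table, codes)  # purely numerical version of the table
```

The resulting `numeric` array can then be passed to any standard clustering routine, which corresponds to the final step the abstract mentions.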
Pages: 518-529
Number of pages: 12