Mining Unstructured Data via Computational Intelligence

Cited by: 0
Author
Kuri-Morales, Angel [1]
Affiliation
[1] Inst Tecnol Autonomo Mexico, Mexico City 01000, DF, Mexico
Keywords
Data bases; Neural networks; Genetic algorithms; Categorical encoding;
DOI
10.1007/978-3-319-27060-9_43
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
At present, very large volumes of information are regularly produced worldwide. Most of this information is unstructured and lacks the properties usually expected of, for instance, relational databases. One of the more interesting issues in computer science is whether, and how, such unstructured data can be mined. Intuitively, its analysis has been attempted by devising schemes that identify patterns and trends through means such as statistical pattern learning. The basic problem with this approach is that the user must decide, a priori, on the model of the patterns and, furthermore, on the way in which they are to be found in the data. This holds regardless of the kind of data, be it textual, musical, financial or otherwise. In this paper we explore an alternative paradigm in which raw data is categorized by analyzing a large corpus, from which a set of categories and the instances of each category are determined, yielding a structured database. Each instance is then mapped to a numerical value that preserves the underlying patterns; this mapping is obtained with a genetic algorithm and a set of multi-layer perceptron networks. Every categorical instance is then replaced by its corresponding numerical code, and the resulting numerical database can be handled with the usual clustering algorithms. We hypothesize that any unstructured data set may be approached in this fashion. In this work we illustrate the method with a textual database: we apply it to characterize texts by different authors and present experimental evidence that the resulting databases yield clustering results that permit authorship identification from raw textual data.
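The encoding step described in the abstract can be illustrated with a small toy sketch. It is not the paper's implementation: the abstract gives no algorithmic details, so the example assumes a mutation-only evolutionary search as a simplified stand-in for the genetic algorithm, and a linear fit between two encoded columns (scored as 1 - r^2) as a stand-in for the multi-layer perceptron error. The table `ROWS` and the names `encode`, `fit_error`, and `evolve_codes` are hypothetical.

```python
import random

def encode(rows, codes):
    """Replace every categorical instance with its numerical code."""
    return [[codes[col][value] for col, value in enumerate(row)] for row in rows]

def fit_error(rows, codes):
    """1 - r^2 between the two encoded columns (assumed stand-in for MLP error)."""
    data = encode(rows, codes)
    xs = [r[0] for r in data]
    ys = [r[1] for r in data]
    n = len(data)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 1.0  # a constant column carries no pattern to preserve
    return 1.0 - (sxy * sxy) / (sxx * syy)

def random_codes(rows, rng):
    """One candidate: a code in [0, 1) for each distinct value of each column."""
    n_cols = len(rows[0])
    return [{v: rng.random() for v in sorted({r[c] for r in rows})}
            for c in range(n_cols)]

def mutate(codes, rng, rate=0.3):
    """Redraw each code with probability `rate`; keep the rest unchanged."""
    return [{v: (rng.random() if rng.random() < rate else x)
             for v, x in col.items()} for col in codes]

def evolve_codes(rows, generations=200, pop_size=20, seed=0):
    """Elitist, mutation-only search for codes with low fit error."""
    rng = random.Random(seed)
    population = [random_codes(rows, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: fit_error(rows, c))
        survivors = population[: pop_size // 2]  # keep the better half
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return min(population, key=lambda c: fit_error(rows, c))

# Hypothetical categorical table: (sentence length class, register)
ROWS = [("short", "formal"), ("short", "formal"), ("long", "casual"),
        ("long", "casual"), ("medium", "formal"), ("medium", "casual")]

codes = evolve_codes(ROWS)
numeric_rows = encode(ROWS, codes)  # purely numerical table, ready for clustering
```

Under these assumptions, `numeric_rows` is a numerical database of the kind the abstract says can be passed to the usual clustering algorithms; identical categorical rows always receive identical numerical rows.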
Pages: 518-529
Page count: 12