Clustering documents in evolving languages by image texture analysis

Authors
Darko Brodić
Alessia Amelio
Zoran N. Milivojević
Affiliations
[1] University of Belgrade, Technical Faculty in Bor
[2] DIMES, University of Calabria
[3] College of Applied Technical Sciences
Source
Applied Intelligence | 2017 / Vol. 46
Keywords
Coding; Clustering; Document analysis; Evolving languages; Image processing; Italian language; Language recognition; Statistical analysis
Abstract
This paper introduces a new method for clustering documents written in a language that has evolved across different historical periods, using Italian as an example. In the first phase, the text is transformed into a string drawn from four numerical codes, derived from the energy profile of each letter, which reflects the letter's height and its position in the text line. Each code represents a gray level, so the text is encoded as a 1-D image. In the second phase, texture features are extracted from this image to create document feature vectors. A new clustering algorithm is then applied to the feature vectors to discriminate documents from different historical periods of the language. Experiments are performed on a database of documents written in Italian Vulgar and modern Italian. The results show that the proposed method perfectly identifies the historical period of each document's language, outperforming both well-known clustering algorithms commonly used for document categorization and state-of-the-art text-based language models.
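The two phases described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's code table, its texture features, or its clustering algorithm, so everything below is an assumption made for illustration: the letter classes (`ASCENDERS`, `DESCENDERS`, `FULL`), the code values 0 to 3, and the use of a distance-1 co-occurrence matrix with energy, contrast, and homogeneity as the texture features. The clustering step is left out because the abstract does not describe it.

```python
from collections import Counter

# Hypothetical letter classes (an assumption, not the paper's table).
ASCENDERS = set("bdfhklt")   # letters reaching into the upper zone
DESCENDERS = set("gpqy")     # letters reaching into the lower zone
FULL = set("j")              # letters spanning both zones in many typefaces


def encode(text):
    """Map each letter to one of four gray-level codes (0..3)."""
    codes = []
    for ch in text:
        if not ch.isalpha():
            continue              # skip spaces, digits, punctuation
        if ch.isupper() or ch in ASCENDERS:
            codes.append(1)       # ascender-type letter
        elif ch in DESCENDERS:
            codes.append(2)       # descender-type letter
        elif ch in FULL:
            codes.append(3)       # full-height letter
        else:
            codes.append(0)       # base letter
    return codes


def cooccurrence(codes, levels=4):
    """Normalized co-occurrence matrix of adjacent code pairs."""
    counts = Counter(zip(codes, codes[1:]))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least two encoded letters")
    return [[counts[(i, j)] / total for j in range(levels)]
            for i in range(levels)]


def texture_features(p):
    """Energy, contrast, and homogeneity of a co-occurrence matrix."""
    n = range(len(p))
    energy = sum(p[i][j] ** 2 for i in n for j in n)
    contrast = sum((i - j) ** 2 * p[i][j] for i in n for j in n)
    homogeneity = sum(p[i][j] / (1 + abs(i - j)) for i in n for j in n)
    return [energy, contrast, homogeneity]


def document_vector(text):
    """Feature vector for one document: encode, then extract features."""
    return texture_features(cooccurrence(encode(text)))
```

Calling `document_vector` on each document gives one fixed-length vector per document, which could then be passed to any clustering routine. Under these assumptions the vector depends only on letter shapes, not on vocabulary.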
Pages: 916-933
Page count: 17