Centroid-based language identification using letter feature set

Authors
Takci, H [1]
Sogukpinar, I [1]
Affiliation
[1] Gebze Inst Technol, TR-41400 Gebze, Turkey
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In recent years, the volume of text documents on the internet, on intranets, in digital libraries, and in newsgroups has grown unexpectedly fast. To obtain useful information and meaningful patterns from these documents, a great deal of research, known as "text mining", has been carried out. One area of this research is text categorization, which addresses the problem of classifying documents according to their similarity. One technique applied in this area is the centroid-based document classification method. All research on text categorization uses the notion of frequency in one way or another. In this study, letter frequencies (LF) are used for text categorization: centroid-based document classification is carried out using letter-frequency information. An experiment on language detection for text documents was performed. Its results suggest that letter-based text categorization should be done prior to term-based text categorization.
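The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 26-letter Latin alphabet, the use of cosine similarity, the function names, and the toy training sentences are all assumptions made for illustration. Each document becomes a vector of relative letter frequencies, each language is represented by the mean (centroid) of its training vectors, and a new text is assigned to the language whose centroid is most similar.

```python
from collections import Counter
import math
import string

# Assumption: only the 26 basic Latin letters are used as features.
ALPHABET = string.ascii_lowercase


def letter_frequencies(text):
    """Return the relative frequency of each ALPHABET letter in text."""
    counts = Counter(ch for ch in text.lower() if ch in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(ALPHABET)
    return [counts[ch] / total for ch in ALPHABET]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def train(labelled_docs):
    """Map each language label to the centroid of its training documents."""
    return {
        lang: centroid([letter_frequencies(doc) for doc in docs])
        for lang, docs in labelled_docs.items()
    }


def identify(text, centroids):
    """Return the label whose centroid is most similar to the text."""
    vec = letter_frequencies(text)
    return max(centroids, key=lambda lang: cosine(vec, centroids[lang]))


# Toy training data (hypothetical, far too small for real use).
centroids = train({
    "en": ["the quick brown fox jumps over the lazy dog",
           "text mining extracts patterns from documents"],
    "tr": ["metin madenciligi belgelerden bilgi cikarir",
           "bu calismada harf frekanslari kullanilmistir"],
})
print(identify("documents are classified by their similarity", centroids))
```

A realistic setup would train on much larger corpora and would probably extend the feature set to language-specific letters, which this sketch ignores.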
Pages: 640-648
Page count: 9