Short Text Clustering using Numerical data based on N-gram

Authors
Kumar, Rajiv [1]
Mathur, Robin Prakash [1]
Affiliation
[1] Lovely Profess Univ, Dept Comp Sci Engn, Phagwara, Punjab, India
Keywords
N-gram; Clustering; VSM;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Short text messages, especially mobile SMS messages, contain not only pure text strings but also numeric values. Existing systems filter out and discard these numeric values. This research develops a new approach that uses the numeric values for feature extraction during clustering. The proposed algorithm uses an n-gram approach to retrieve the pre-strings and post-strings of each numeric value, and then calculates the similarity between documents. Documents are partitioned into two types: pure textual documents and mixed documents. Text messaging is gaining popularity as a way to push short, informative notifications to users at any time. Using numeric values through n-grams plays an important role in clustering text messages efficiently.
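The abstract does not give the algorithm's details, so the following Python sketch is only one possible reading of it, not the authors' implementation. It assumes word-level n-grams, a context width of `n = 2`, and cosine similarity over feature counts (suggested by the VSM keyword but not stated in the abstract). All function names (`tokenize`, `is_mixed`, `partition`, `numeric_context_features`, `cosine`) are hypothetical.

```python
import re
from collections import Counter
from math import sqrt

# A token counts as numeric if it is digits with optional . or , groups (assumption).
NUMERIC = re.compile(r"^\d+(?:[.,]\d+)*$")


def tokenize(text):
    """Split a message into lowercase word tokens and numeric tokens."""
    return re.findall(r"\d+(?:[.,]\d+)*|[a-z]+", text.lower())


def is_mixed(text):
    """True if the message contains at least one numeric token."""
    return any(NUMERIC.match(tok) for tok in tokenize(text))


def partition(messages):
    """Separate pure textual messages from mixed (text plus numbers) ones."""
    pure = [m for m in messages if not is_mixed(m)]
    mixed = [m for m in messages if is_mixed(m)]
    return pure, mixed


def numeric_context_features(text, n=2):
    """Count the n-token pre-string and post-string around each numeric token."""
    tokens = tokenize(text)
    feats = Counter()
    for i, tok in enumerate(tokens):
        if NUMERIC.match(tok):
            pre = tokens[max(0, i - n):i]
            post = tokens[i + 1:i + 1 + n]
            if pre:
                feats["pre:" + " ".join(pre)] += 1
            if post:
                feats["post:" + " ".join(post)] += 1
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


messages = [
    "Your balance is 500 rupees only",
    "Your balance is 1200 rupees only",
    "Meet me at the cafe",
]
pure, mixed = partition(messages)
sim = cosine(numeric_context_features(mixed[0]),
             numeric_context_features(mixed[1]))
```

In this sketch the two balance messages share the same pre-string ("balance is") and post-string ("rupees only") around different numbers, so their similarity is close to 1 even though the numeric values differ, while the third message falls into the pure textual partition. The resulting similarities could then be passed to any standard clustering routine.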
Pages: 274-276
Number of pages: 3