Text categorization: An experiment using phrases

Authors
Kongovi, M [1]
Guzman, JC [1]
Dasigi, V [1]
Affiliation
[1] So Polytechn State Univ, Marietta, GA 30060 USA
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Typical text classifiers learn from examples, namely training documents that have been manually categorized. In this research, we experimented with classifying newswire articles using category profiles, which we built by selecting feature words and phrases from the training documents. We used the Reuters-21578 text corpus for our experiments and measured the effectiveness of our classifier with precision and recall. Although our experiments with words yielded good results, we found instances where the phrase-based approach was more effective. One possible reason is that a word considered together with its adjoining word (a phrase) can be a good discriminator when building a category profile. This tight packaging of word pairs could bring in some semantic value. It also filters out words that occur frequently in isolation and carry little weight in characterizing the category.
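The approach outlined in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: phrases are approximated as adjacent word pairs, a category profile is the set of pairs that occur at least `min_count` times in that category's training documents, a document is assigned to the category whose profile it overlaps most, and the toy documents are invented rather than drawn from Reuters-21578.

```python
from collections import Counter


def bigrams(text):
    """Return adjacent word pairs, a simple stand-in for 'phrases'."""
    words = text.lower().split()
    return [" ".join(pair) for pair in zip(words, words[1:])]


def build_profile(docs, min_count=2):
    """Hypothetical profile: word pairs seen at least min_count times."""
    counts = Counter()
    for doc in docs:
        counts.update(bigrams(doc))
    return {phrase for phrase, n in counts.items() if n >= min_count}


def classify(text, profiles):
    """Pick the category whose profile shares the most pairs with the text."""
    features = set(bigrams(text))
    return max(profiles, key=lambda cat: len(profiles[cat] & features))


def precision_recall(predicted, actual, category):
    """Precision and recall for one category over paired label lists."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == category and a == category for p, a in pairs)
    fp = sum(p == category and a != category for p, a in pairs)
    fn = sum(p != category and a == category for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy data (not from Reuters-21578).
training = {
    "grain": ["wheat exports rose sharply", "wheat exports fell this year"],
    "money": ["interest rates rose sharply", "interest rates fell again"],
}
profiles = {cat: build_profile(docs) for cat, docs in training.items()}

test_docs = ["wheat exports increased", "interest rates unchanged"]
actual = ["grain", "money"]
predicted = [classify(doc, profiles) for doc in test_docs]
```

In this toy setup the pair "rose sharply" appears in both categories but only once in each, so it falls below the threshold and stays out of both profiles. That loosely mirrors the abstract's point that word pairs can filter out terms that do little to characterize a category.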
Pages: 213-228
Page count: 16
Related papers
50 in total
  • [1] EXPERIMENT ON METHODS FOR CLUSTERING AND CATEGORIZATION OF POLISH TEXT
    Wielgosz, Maciej
    Fraczek, Rafal
    Russek, Pawel
    Pietron, Marcin
    Dabrowska-Boruch, Agnieszka
    Jamro, Ernest
    Wiatr, Kazimierz
    [J]. COMPUTING AND INFORMATICS, 2017, 36 (01) : 186 - 204
  • [2] Using wavelet analysis for text categorization in digital libraries: a first experiment with Strathprints
    Daranyi, Sandor
    Wittek, Peter
    Dobreva, Milena
    [J]. INTERNATIONAL JOURNAL ON DIGITAL LIBRARIES, 2012, 12 (01) : 3 - 12
  • [3] Using wavelet analysis for text categorization in digital libraries: a first experiment with Strathprints
    Sándor Darányi
    Peter Wittek
    Milena Dobreva
    [J]. International Journal on Digital Libraries, 2012, 12 (1) : 3 - 12
  • [4] Using WordNet for text categorization
    Elberrichi, Zakaria
    Rahmoun, Abdelattif
    Bentaalah, Mohamed Amine
    [J]. INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2008, 5 (01) : 16 - 24
  • [5] Using SVMs for text categorization
    Dumais, S
    [J]. IEEE INTELLIGENT SYSTEMS & THEIR APPLICATIONS, 1998, 13 (04) : 21 - 23
  • [6] Automatic Text Categorization using NTC
    Jo, Taeho
    [J]. NDT: 2009 FIRST INTERNATIONAL CONFERENCE ON NETWORKED DIGITAL TECHNOLOGIES, 2009 : 26 - 31
  • [7] Biomedical text categorization using UMLS
    Perea Ortega, Jose Manuel
    Martin Valdivia, Maria Teresa
    Montejo Raez, Arturo
    Diaz Galiano, Manuel Carlos
    [J]. PROCESAMIENTO DEL LENGUAJE NATURAL, 2008, (40) : 121 - 127
  • [8] Using KNN Algorithm for Text Categorization
    Wajeed, M. A.
    Adilakshmi, T.
    [J]. COMPUTATIONAL INTELLIGENCE AND INFORMATION TECHNOLOGY, 2011, 250 : 796 - +
  • [9] On using partial supervision for text categorization
    Aggarwal, CC
    Gates, SC
    Yu, PS
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2004, 16 (02) : 245 - 255
  • [10] Phrases in Text Types
    Dziurewicz, Elzbieta
    Wozniak, Joanna
    Zenderowska-Korpus, Grazyna
    [J]. MODERNA SPRAK, 2023, 117 (01) : 157 - 162