A comparative study of two automatic document classification methods in a library setting

Cited by: 17
Authors
Pong, Joanna Yi-Hang [2]
Kwok, Ron Chi-Wai [1]
Lau, Raymond Yiu-Keung [1]
Hao, Jin-Xing [1]
Wong, Percy Ching-Chi [1]
Affiliations
[1] City Univ Hong Kong, Dept Informat Syst, Kowloon, Hong Kong, Peoples R China
[2] City Univ Hong Kong, Run Run Shaw Library, Kowloon, Hong Kong, Peoples R China
Keywords
automatic document classification; text categorization; machine learning; k-nearest neighbours classifier; naive Bayes classifier; library practice
DOI
10.1177/0165551507082592
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In current library practice, trained human experts usually catalogue and index documents manually. With the explosive growth in the number of electronic documents available on the Internet and in digital libraries, it is increasingly difficult for library practitioners to categorize both electronic documents and traditional library materials by manual means alone. To improve the effectiveness and efficiency of document categorization in the library setting, more in-depth studies of automatic document classification methods for library items are required. Machine learning research has advanced rapidly in recent years; however, applying machine learning techniques to improve library practice is still a relatively unexplored area. This paper describes the design and development of a machine-learning-based automatic document classification system intended to alleviate the manual categorization problem encountered in the library setting. Two supervised machine learning algorithms were tested: the k-nearest neighbours (KNN) classifier and the naive Bayes classifier. Our empirical tests show that supervised machine learning algorithms in general, and the KNN algorithm in particular, can be used to develop an effective document classification system to enhance current library practice. We also make concrete recommendations on how to apply the KNN algorithm in practice to develop automatic document classification in a library setting. To the best of our knowledge, this is the first in-depth study of applying the KNN algorithm to automatic document classification based on the widely used Library of Congress Classification (LCC) scheme adopted by many large libraries.
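As a rough illustration of the KNN idea named in the abstract, the sketch below assigns a class label to a title by majority vote among its most similar labelled titles. It is not the authors' system: the bag-of-words representation, the cosine similarity measure, the value k=3, the function names, and the toy titles are all assumptions made for this example. The labels "QA" and "K" are used only as stand-ins for LCC-style class codes.

```python
import math
from collections import Counter


def vectorize(text):
    """Lower-case the text, split on whitespace, and count terms."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(text, training, k=3):
    """Return the majority label among the k most similar training items."""
    query = vectorize(text)
    ranked = sorted(
        training,
        key=lambda item: cosine(query, vectorize(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set: (title, class label). Invented for
# illustration only; not data from the study.
TRAINING = [
    ("introduction to linear algebra and matrices", "QA"),
    ("calculus and differential equations", "QA"),
    ("algorithms and data structures", "QA"),
    ("contract law and civil procedure", "K"),
    ("constitutional law cases", "K"),
    ("criminal law and evidence", "K"),
]

if __name__ == "__main__":
    print(knn_classify("criminal procedure and contract law", TRAINING))
    print(knn_classify("linear algebra and calculus", TRAINING))
```

A realistic classifier would likely need a larger labelled collection, stop-word handling, and term weighting; those choices are outside what the abstract specifies.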
Pages: 213-230
Page count: 18