Automated Subject Classification of Textual Documents in the Context of Web-Based Hierarchical Browsing

Times cited: 2
Author
Golub, Koraljka [1]
Affiliation
[1] Univ Bath, UKOLN, Bath BA2 7AY, Avon, England
Source
KNOWLEDGE ORGANIZATION | 2011 / Vol. 38 / No. 3
DOI
10.5771/0943-7444-2011-3-230
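Chinese Library Classification (CLC) number
G25 [Library science, library work]; G35 [Information science, information work]
Subject classification code
1205; 120501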
Abstract
While automated methods for information organization have existed for several decades, the exponential growth of the World Wide Web has brought them to the forefront of research in several communities, within which three approaches can be identified: 1) machine learning (algorithms that allow computers to improve their performance by learning from pre-existing data); 2) document clustering (algorithms for unsupervised document organization and automated topic extraction); and 3) string matching (algorithms that find given strings within larger text). The aim here was to automatically organize textual documents into hierarchical structures for subject browsing. The string-matching approach was tested using a controlled vocabulary (one containing pre-selected, pre-defined authorized terms, each corresponding to exactly one concept). The results suggest that an appropriate controlled vocabulary, with a sufficient number of entry terms designating classes, could in itself be a solution for automated classification. If the same controlled vocabulary also had an appropriate hierarchical structure, it would at the same time provide a good browsing structure for the collection of automatically classified documents.
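The string-matching idea in the abstract can be illustrated with a small sketch. It is not the author's implementation: the vocabulary, the class codes (`C.1`, `B.2` and so on), the parent map, the function names and the simple hit-count scoring are all hypothetical assumptions chosen for illustration. Entry terms from a controlled vocabulary are matched against the document text, each match votes for the single class its term designates, and each document is then filed under those classes and all of their ancestors, so the same hierarchy doubles as the browsing structure.

```python
import re
from collections import defaultdict

# HYPOTHETICAL controlled vocabulary: each entry term designates exactly one class.
VOCABULARY = {
    "machine learning": "C.1",
    "document clustering": "C.2",
    "string matching": "C.3",
    "thesaurus": "B.1",
    "classification scheme": "B.2",
}

# HYPOTHETICAL hierarchy: class -> parent class (None marks a top-level class).
PARENT = {
    "C": None, "C.1": "C", "C.2": "C", "C.3": "C",
    "B": None, "B.1": "B", "B.2": "B",
}


def classify(text, vocabulary=VOCABULARY, min_hits=1):
    """Return {class: hit count} for classes whose entry terms occur in text."""
    lowered = text.lower()
    scores = defaultdict(int)
    for term, cls in vocabulary.items():
        # Word boundaries stop a term from matching inside a longer word.
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = len(re.findall(pattern, lowered))
        if hits:
            scores[cls] += hits
    # Assumed cutoff: keep only classes with at least min_hits matches.
    return {cls: n for cls, n in scores.items() if n >= min_hits}


def ancestors(cls, parent=PARENT):
    """Yield cls itself and every class above it in the hierarchy."""
    while cls is not None:
        yield cls
        cls = parent.get(cls)


def build_browse_tree(docs, vocabulary=VOCABULARY, parent=PARENT):
    """Map each class to the ids of documents filed under it or its subclasses."""
    tree = defaultdict(set)
    for doc_id, text in docs.items():
        for cls in classify(text, vocabulary):
            for node in ancestors(cls, parent):
                tree[node].add(doc_id)
    return dict(tree)


if __name__ == "__main__":
    docs = {
        "d1": "String matching against a thesaurus.",
        "d2": "Document clustering and machine learning.",
    }
    for cls, ids in sorted(build_browse_tree(docs).items()):
        print(cls, sorted(ids))
```

With these toy inputs, `d1` is filed under `C.3`, `C`, `B.1` and `B`, while `d2` appears under `C.1`, `C.2` and `C`. A realistic system would need term weighting and a stricter cutoff than the hypothetical `min_hits` parameter, so that a document is not filed under every class it mentions only in passing.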
Pages: 230-244
Number of pages: 15
Related papers (50 in total)
  • [31] Web-based system for Japanese local political documents
    Ototake, Hokuto
    Sakaji, Hiroki
    Takamaru, Keiichi
    Kobayashi, Akio
    Uchida, Yuzu
    Kimura, Yasutomo
    [J]. INTERNATIONAL JOURNAL OF WEB INFORMATION SYSTEMS, 2018, 14 (03) : 357 - 371
  • [32] XWebMapper: A Web-based tool for transforming XML documents
    Llavador, Manel
    Canos, Jose H.
    [J]. RESEARCH AND ADVANCED TECHNOLOGY FOR DIGITAL LIBRARIES, 2006, 4172 : 563 - 566
  • [33] Structural abstractions of hypertext documents for Web-based retrieval
    Deogun, JS
    Sever, H
    Raghavan, VV
    [J]. NINTH INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 1998 : 385 - 390
  • [34] A Web-Based Tool for Analysing Normative Documents in English
    Camilleri, John J.
    Haghshenas, Mohammad Reza
    Schneider, Gerardo
    [J]. 33RD ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, 2018 : 1865 - 1872
  • [35] Signing Documents in Web-Based Document Management Systems
    Sladic, Goran
    Milosavljevic, Branko
    [J]. IPSI BGD TRANSACTIONS ON INTERNET RESEARCH, 2013, 9 (01) : 26 - 31
  • [36] Enabling multimodal interaction in web-based personal digital photo browsing
    Ismail, N. A.
    O'Brien, E. A.
    [J]. 2008 INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION ENGINEERING, VOLS 1-3, 2008 : 907+
  • [37] Automated multiple hierarchical classification of web news of unexpected events
    Cai, Hua-Li
    Liu, Lu
    Wang, Li
    [J]. Beijing Gongye Daxue Xuebao/Journal of Beijing University of Technology, 2011, 37 (06) : 947 - 954
  • [38] Improving web browsing on small devices based on table classification
    Wang, C
    Xie, X
    Wang, WY
    Ma, WY
    [J]. ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2004, PT 2, PROCEEDINGS, 2004, 3332 : 88 - 95
  • [39] Web-Based Intelligent Photograph Management System Enhancing Browsing Experience
    Orii, Yuki
    Nozawa, Takayuki
    Kondo, Toshiyuki
    [J]. JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2010, 14 (04) : 390 - 395
  • [40] CoolTeD: A Web-based Collaborative Labeling Tool for the Textual Dataset
    Wang, Chong
    Jiang, Jingwen
    Daneva, Maya
    Van Sinderen, Marten
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING (SANER 2022), 2022 : 613 - 617