Automated Subject Classification of Textual Documents in the Context of Web-Based Hierarchical Browsing

Cited by: 2
Authors
Golub, Koraljka [1]
Affiliation
[1] Univ Bath, UKOLN, Bath BA2 7AY, Avon, England
Source
KNOWLEDGE ORGANIZATION | 2011 / Vol. 38 / No. 03
DOI
10.5771/0943-7444-2011-3-230
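Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work];
Discipline classification code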
1205 ; 120501 ;
Abstract
While automated methods for information organization have been around for several decades now, exponential growth of the World Wide Web has put them into the forefront of research in different communities, within which several approaches can be identified: 1) machine learning (algorithms that allow computers to improve their performance based on learning from pre-existing data); 2) document clustering (algorithms for unsupervised document organization and automated topic extraction); and 3) string matching (algorithms that match given strings within larger text). Here the aim was to automatically organize textual documents into hierarchical structures for subject browsing. The string-matching approach was tested using a controlled vocabulary (containing pre-selected and pre-defined authorized terms, each corresponding to only one concept). The results imply that an appropriate controlled vocabulary, with a sufficient number of entry terms designating classes, could in itself be a solution for automated classification. Then, if the same controlled vocabulary had an appropriate hierarchical structure, it would at the same time provide a good browsing structure for the collection of automatically classified documents.
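The string-matching approach described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the vocabulary, the class notations, and the function names `classify` and `browse_path` are all hypothetical, and the scoring (a plain count of whole-word matches per class) is an assumption chosen for simplicity.

```python
import re
from collections import Counter

# Hypothetical toy controlled vocabulary: each entry term designates one class.
# Class notations use "/" to encode the hierarchy (an assumption for this sketch).
VOCABULARY = {
    "bridge": "civil/structures",
    "suspension bridge": "civil/structures/bridges",
    "concrete": "civil/materials",
    "transistor": "electrical/electronics",
}


def classify(text, vocabulary=VOCABULARY):
    """Score classes by counting whole-word, case-insensitive term matches."""
    scores = Counter()
    lowered = text.lower()
    for term, cls in vocabulary.items():
        # \b anchors keep "bridge" from matching inside "bridges" or "abridged".
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        hits = len(re.findall(pattern, lowered))
        if hits:
            scores[cls] += hits
    return scores


def browse_path(cls):
    """Split a class notation into the levels a user would browse through."""
    return cls.split("/")


if __name__ == "__main__":
    doc = "A suspension bridge uses steel cables; the bridge deck is concrete."
    for cls, score in classify(doc).most_common():
        print(score, " > ".join(browse_path(cls)))
```

Because the class notation carries the hierarchy, the same vocabulary that assigns a document to a class also gives the path under which the document would appear when browsing, which is the point the abstract makes about a vocabulary with an appropriate hierarchical structure.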
Pages: 230-244
Page count: 15