Document Indexing with a Concept Hierarchy

Authors
Gelbukh, Alexander [1]
Sidorov, Grigori [1]
Guzman-Arenas, Adolfo [1]
Affiliation
[1] Natl Polytech Inst IPN, Ctr Comp Res CIC, Nat Language Proc Lab, Av Juan Dios Batiz S-N, Mexico City 07738, DF, Mexico
Source
COMPUTACION Y SISTEMAS | 2005 / Vol. 8 / No. 04
Keywords
Document Characterization; Document Comparison; Ontology; Statistical Methods;
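DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract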
Given a large hierarchical concept dictionary (a thesaurus or ontology), the task of selecting the concepts that describe the contents of a given document is considered. A statistical method of document indexing driven by such a dictionary is proposed. The method is insensitive to inaccuracies in the dictionary, which allows for semi-automatic translation of the hierarchy into different languages. The problem of handling non-terminal and especially top-level nodes in the hierarchy is discussed. Common-sense-compliant methods of automatically assigning weights to the nodes and links in the hierarchy are presented. The application of the method in the Classifier system is discussed.
Pages: 281-292
Number of pages: 12
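
The abstract describes hierarchy-driven indexing only in general terms, so the following is a minimal illustrative sketch, not the authors' algorithm. It assumes a toy two-level hierarchy, hand-picked keyword sets, and a fixed link weight of 0.5; all names (`PARENT`, `KEYWORDS`, `concept_scores`) and all values are hypothetical. It shows one way keyword counts at leaf concepts could be propagated to non-terminal nodes through weighted links, so that broader nodes accumulate damped evidence.

```python
import re
from collections import Counter

# Hypothetical toy hierarchy (not from the paper):
# child concept -> (parent concept, weight of the link to the parent).
PARENT = {
    "physics": ("science", 0.5),
    "biology": ("science", 0.5),
    "football": ("sport", 0.5),
}

# Hypothetical keyword lists attached to the leaf concepts.
KEYWORDS = {
    "physics": {"quantum", "particle", "energy"},
    "biology": {"cell", "gene", "protein"},
    "football": {"goal", "match", "referee"},
}


def concept_scores(text):
    """Score concepts by keyword frequency, propagated upward with link weights."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = Counter()
    for leaf, keys in KEYWORDS.items():
        # Evidence for a leaf is the total count of its keywords in the text.
        contribution = float(sum(words[k] for k in keys))
        node = leaf
        while node is not None and contribution > 0:
            scores[node] += contribution
            # Top-level nodes have no entry in PARENT, which ends the walk.
            parent, link_weight = PARENT.get(node, (None, 0.0))
            contribution *= link_weight
            node = parent
    return scores


if __name__ == "__main__":
    doc = "The quantum particle carries energy; the cell holds a gene."
    for concept, score in concept_scores(doc).most_common():
        print(concept, score)
```

Damping the contribution at each link is one simple way to keep top-level nodes from dominating every document; the weighting scheme actually used in the paper may differ.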