New classification quality estimators for analysis of documentary information: Application to patent analysis and web mapping

Citations: 28
Authors
Lamirel, JC
Francois, C
AL Shehabi, S
Hoffmann, M
Affiliations
[1] LORIA, F-54506 Vandoeuvre Les Nancy, France
[2] URI INIST, CNRS, Vandoeuvre Les Nancy, France
Keywords
DOI
10.1023/B:SCIE.0000034386.05278.e8
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The information analysis process includes a cluster analysis or classification step associated with an expert validation of the results. In this paper, we propose new Recall/Precision measures for estimating the quality of cluster analysis. These measures derive both from Galois lattice theory and from the Information Retrieval (IR) domain. Unlike classical inertia measures, they have the advantage of being independent both of the classification method and of the difference between the intrinsic dimension of the data and that of the clusters. We present two experiments that use the MultiSOM model, an extension of Kohonen's SOM model, as the cluster analysis method. Our first experiment, on patent data, shows how our measures can be used to compare viewpoint-oriented classification methods, such as MultiSOM, with global cluster analysis methods, such as WebSOM. Our second experiment, which is part of the EICSTES EEC project, is an original Webometrics experiment that combines content and link classification, starting from a large non-homogeneous set of web pages. This experiment shows that break-even points between our different Recall/Precision measures can be used to determine an optimal number of clusters for web data classification. The contents of the clusters obtained with different break-even points are compared to determine the quality of the resulting maps.
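The abstract does not give the formulas for the proposed measures, so the following is only a minimal sketch of what a property-based Recall/Precision pair for clusters could look like. Everything in it is an assumption made for illustration: the function name `cluster_recall_precision`, the representation of documents as sets of properties (for example index terms), and the rule that a property is "peculiar" to the cluster in which the largest share of documents carries it. It is not the authors' definition.

```python
from collections import defaultdict


def cluster_recall_precision(docs, labels):
    """Return (recall, precision), macro-averaged over clusters.

    docs   -- list of sets; docs[i] holds the properties of document i
    labels -- list of cluster ids; labels[i] is the cluster of document i
    """
    members = defaultdict(set)   # cluster id -> indices of its documents
    carriers = defaultdict(set)  # property -> indices of documents carrying it
    for i, (props, lab) in enumerate(zip(docs, labels)):
        members[lab].add(i)
        for p in props:
            carriers[p].add(i)

    # Assumption: a property is "peculiar" to the cluster in which the largest
    # share of documents carries it (ties go to the smallest cluster id).
    peculiar = defaultdict(list)
    for p, docs_with_p in carriers.items():
        best = max(
            sorted(members),
            key=lambda c: len(docs_with_p & members[c]) / len(members[c]),
        )
        peculiar[best].append(p)

    if not peculiar:  # no document carries any property
        return 0.0, 0.0

    recalls, precisions = [], []
    for c, props in peculiar.items():
        # Number of documents of cluster c carrying each peculiar property.
        in_cluster = [len(carriers[p] & members[c]) for p in props]
        # Recall: share of all carriers of a property that fall inside c.
        recalls.append(
            sum(n / len(carriers[p]) for n, p in zip(in_cluster, props)) / len(props)
        )
        # Precision: share of the documents of c that carry the property.
        precisions.append(sum(n / len(members[c]) for n in in_cluster) / len(props))
    return sum(recalls) / len(recalls), sum(precisions) / len(precisions)
```

With `docs = [{"a", "b"}, {"a"}, {"c"}, {"c"}]` and `labels = [0, 0, 1, 1]`, this sketch returns `(1.0, 0.875)`. Under the same assumptions, one could recompute both values for several numbers of clusters and look for the point where the two curves cross, which is the break-even idea the abstract mentions.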
Pages: 445-462
Number of pages: 18
Related papers
50 in total
  • [1] New classification quality estimators for analysis of documentary information: Application to patent analysis and web mapping
    Jean-Charles Lamirel
    Claire Francois
    Shadi Al Shehabi
    Martial Hoffmann
    Scientometrics, 2004, 60 : 445 - 462
  • [2] APPLICATION OF A STATISTICAL METHOD TO DOCUMENTARY INFORMATION-FLOW ANALYSIS
    DUBOVIKOV, MS
    MARKOVA, LI
    SOLTS, NA
    NAUCHNO-TEKHNICHESKAYA INFORMATSIYA SERIYA 2-INFORMATSIONNYE PROTSESSY I SISTEMY, 1976, (01) : 22 - 25
  • [3] Research on the Application of Patent Information Analysis in Enterprise Strategy
    Liu, Haiyan
    2016 ISSGBM INTERNATIONAL CONFERENCE ON INFORMATION, COMMUNICATION AND SOCIAL SCIENCES (ISSGBM-ICS 2016), PT 3, 2016, 68 : 471 - 474
  • [4] The new sources of information: information and documentary search in the context of web 2.0
    Rodriguez Conde, Maria Jose
    EDUCATION IN THE KNOWLEDGE SOCIETY, 2011, 12 (02) : 321 - 322
  • [5] A bibliometric mapping analysis of the literature on patent analysis
    Karatas, Ali Rauf
    Kazak, Hasan
    Akcan, Ahmet Tayfur
    Akkas, Erhan
    Arik, Muserref
    WORLD PATENT INFORMATION, 2024, 77
  • [7] A taxonomical classification of business models on mobile business: Patent analysis and SOM mapping
    Kim, Chulhyun
    Lee, Hakyeon
    Park, Yongtae
    2006 IEEE INTERNATIONAL CONFERENCE ON MANAGEMENT OF INNOVATION AND TECHNOLOGY, VOLS 1 AND 2, PROCEEDINGS, 2006, : 478 - +
  • [8] Morphological Patent Analysis, Recycling Web Patent Databases
    Diaz Prado, Jose Aldo
    Lopez Pineda, Arturo
    Cruz Ramos, Marco Polo
    KNOWLEDGE MANAGEMENT AND INNOVATION IN ADVANCING ECONOMIES-ANALYSES & SOLUTIONS, VOLS 1-3, 2009, : 911 - +
  • [9] Recent Application Technology Trends Analysis of Zinc Sulfide: Based on Patent Information Analysis
    Lee, Do-Yeon
    Kang, Hyun-Moo
    Yoon, Jongman
    Lee, Jeong-Gu
    KOREAN JOURNAL OF MATERIALS RESEARCH, 2016, 26 (02) : 100 - 108
  • [10] A Hybrid Model Combining SOMs with SVRs for Patent Quality Analysis and Classification
    Chang, Pei-Chann
    Wu, Jheng-Long
    Tsao, Cheng-Chin
    Fan, Chin-Yuan
    DATA MINING AND BIG DATA, DMBD 2016, 2016, 9714 : 262 - 269