New classification quality estimators for analysis of documentary information: Application to patent analysis and web mapping

Cited by: 28
Authors
Lamirel, JC
Francois, C
AL Shehabi, S
Hoffmann, M
Affiliations
[1] LORIA, F-54506 Vandoeuvre Les Nancy, France
[2] URI INIST, CNRS, Vandoeuvre Les Nancy, France
Keywords
DOI
10.1023/B:SCIE.0000034386.05278.e8
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
The information analysis process includes a cluster analysis or classification step associated with an expert validation of the results. In this paper, we propose new measures of Recall/Precision for estimating the quality of cluster analysis. These measures derive both from Galois lattice theory and from the Information Retrieval (IR) domain. As opposed to classical measures of inertia, their main advantage is that they are independent both of the classification method and of the difference between the intrinsic dimension of the data and that of the clusters. We present two experiments based on the MultiSOM model, an extension of Kohonen's SOM model, as the cluster analysis method. Our first experiment, on patent data, shows how our measures can be used to compare viewpoint-oriented classification methods, such as MultiSOM, with global cluster analysis methods, such as WebSOM. Our second experiment, which is part of the EICSTES EEC project, is an original Webometrics experiment that combines content and link classification starting from a large non-homogeneous set of web pages. This experiment highlights the fact that break-even points between our different measures of Recall/Precision can be used to determine an optimal number of clusters for web data classification. The contents of the clusters obtained with different break-even points are compared to determine the quality of the resulting maps.
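The break-even idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a recall value and a precision value have already been computed for each candidate number of clusters; how the paper defines those measures is not given in this record, so the function name `break_even_k` and all numbers below are hypothetical and for illustration only.

```python
def break_even_k(recall_by_k, precision_by_k):
    """Return the cluster count whose recall and precision are closest.

    Both arguments map a candidate number of clusters to a score in [0, 1].
    This is a generic break-even search, not the paper's exact procedure.
    """
    shared = sorted(set(recall_by_k) & set(precision_by_k))
    if not shared:
        raise ValueError("no cluster count has both a recall and a precision value")
    # Ties are resolved in favour of the smallest cluster count.
    return min(shared, key=lambda k: abs(recall_by_k[k] - precision_by_k[k]))


# Made-up illustrative scores (assumption), not results from the paper.
recall = {4: 0.90, 9: 0.75, 16: 0.60, 25: 0.40}
precision = {4: 0.30, 9: 0.55, 16: 0.62, 25: 0.80}

best_k = break_even_k(recall, precision)
print(best_k)
```

With these invented values, recall falls and precision rises as the number of clusters grows, and the sketch picks the cluster count where the two curves come closest to crossing.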
Pages: 445-462
Page count: 18
Related papers
50 items in total
  • [41] Analysis of Networking and Application Layer Derived Metrics for Web Quality of Experience
    Le Thu Nguyen
    Harris, Richard
    Jusak, Jusak
    2012 IEEE CONSUMER COMMUNICATIONS AND NETWORKING CONFERENCE (CCNC), 2012, : 321 - 325
  • [42] Identifying opportunities for sustainable business models in manufacturing: Application of patent analysis and generative topographic mapping
    Feng, Jian
    Liu, Zhenfeng
    Feng, Lijie
    SUSTAINABLE PRODUCTION AND CONSUMPTION, 2021, 27 : 509 - 522
  • [43] Image descriptors analysis supported by Information Visualization with application in automatic classification
    Mendes, Gilson
    Paiva, Jose Gustavo S.
    PROCEEDINGS OF THE 14TH BRAZILIAN SYMPOSIUM ON INFORMATION SYSTEMS (SBSI2018), 2018, : 301 - 308
  • [44] Application of Bayes analysis with part pre-information in pattern classification
    Wan Junli
    Zhang Guohua
    Zhang Xuejiao
    Zhao Zhuo
    PROCEEDINGS OF THE FIRST INTERNATIONAL SYMPOSIUM ON TEST AUTOMATION & INSTRUMENTATION, VOLS 1 - 3, 2006, : 1815 - 1818
  • [45] Classification of Web History Tools Through Web Analysis
    Goncalves Evangelista, Joao Rafael
    de Oliveira Gatto, Dacyr Dante
    Sassi, Renato Jose
    HCI FOR CYBERSECURITY, PRIVACY AND TRUST, 2019, 11594 : 266 - 276
  • [46] INFORMATION QUALITY ANALYSIS
    VACCA, J
    INFOSYSTEMS, 1985, 32 (12): : 60 - 61
  • [47] Mapping Technological Trajectory as Patent Analysis and Delphi Investigation
    Lee, P. C.
    Su, H. N.
    2008 IEEE INTERNATIONAL CONFERENCE ON MANAGEMENT OF INNOVATION AND TECHNOLOGY, VOLS 1-3, 2008, : 23 - 28
  • [48] Worldwide patent analysis and mapping of combine harvester innovation
    Wang, Xiuhong
    AFRICAN JOURNAL OF AGRICULTURAL RESEARCH, 2010, 5 (24): : 3493 - 3499
  • [49] Mapping Technological Profile of Collaborative Robots by Patent Analysis
    Borregan-Alvarado, Jon
    Alvarez-Meaza, Izaskun
    Cilleruelo-Carrasco, Ernesto
    Garechana-Anacabe, Gaizka
    INTERNATIONAL JOURNAL OF HUMAN-COMPUTER INTERACTION, 2023, 39 (20) : 3920 - 3935
  • [50] Osteoporotic Vertebral Fractures: An Analysis of Readability and Quality of Web-Based information
    Hidayat, Yasir
    Rajkoomar, Ashley Ghanshyam
    Qadeer, Muhammad Abrar
    D'Souza, Lester G.
    CUREUS JOURNAL OF MEDICAL SCIENCE, 2022, 14 (06)