An application of the nearest correlation matrix on web document classification

Cited by: 6
Authors
Qi, Houduo [1]
Xia, Zhonghang
Xing, Guangming
Affiliations
[1] Univ Southampton, Sch Math, Southampton SO17 1BJ, Hants, England
[2] Western Kentucky Univ, Dept Comp Sci, Bowling Green, KY 42101 USA
Keywords
support vector machines; classification; kernel matrix; semidefinite programming
DOI
10.3934/jimo.2007.3.701
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
A Web document organizes a set of textual data according to a predefined logical structure. It has been shown that grouping Web documents with similar structures can improve query efficiency. An XML document, however, has no vectorial representation, which most existing classification algorithms require. The kernel method has been applied to represent structural data through pairwise similarity, so that a set of Web data can be fed into classification algorithms in the form of a kernel matrix. However, since the distance between a pair of Web documents is usually obtained only approximately, the derived distance matrix is not a kernel matrix. In this paper, we propose to use the nearest correlation matrix (of the estimated distance matrix) as the kernel matrix, which can be computed quickly by a Newton-type method. Experimental studies show that the classification accuracy can be significantly improved.
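To make the idea of a nearest correlation matrix concrete, here is a minimal sketch in Python with NumPy. It does not reproduce the Newton-type method mentioned in the abstract. Instead it uses a simpler, slower substitute: alternating projections with Dykstra's correction between the positive semidefinite cone and the set of symmetric matrices with unit diagonal. The function names, the tolerance, and the small input matrix `g` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def project_psd(a):
    """Project a symmetric matrix onto the positive semidefinite cone."""
    w, v = np.linalg.eigh(a)
    # Clip negative eigenvalues to zero and rebuild the matrix.
    return (v * np.maximum(w, 0.0)) @ v.T


def nearest_correlation(g, tol=1e-8, max_iter=500):
    """Approximate the nearest correlation matrix to g in the Frobenius norm.

    Alternating projections with Dykstra's correction (an assumption made
    for this sketch; the paper itself refers to a Newton-type method).
    """
    y = (g + g.T) / 2.0          # work with the symmetric part of the input
    ds = np.zeros_like(y)        # Dykstra correction term
    for _ in range(max_iter):
        r = y - ds
        x = project_psd(r)       # step 1: enforce positive semidefiniteness
        ds = x - r
        y_new = x.copy()
        np.fill_diagonal(y_new, 1.0)  # step 2: enforce unit diagonal
        change = np.linalg.norm(y_new - y, "fro")
        y = y_new
        if change <= tol * max(1.0, np.linalg.norm(y, "fro")):
            break
    return y


# Hypothetical estimated similarity matrix: symmetric with unit diagonal,
# but indefinite, so it is not a valid kernel matrix.
g = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])

k = nearest_correlation(g)
print("smallest eigenvalue of input: ", np.linalg.eigvalsh(g).min())
print("smallest eigenvalue of result:", np.linalg.eigvalsh(k).min())
```

The result `k` is symmetric, has a unit diagonal, and is positive semidefinite up to numerical tolerance, so it could in principle be passed to a classifier that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`. How the estimated distances between documents are turned into the input matrix is left open here, since the abstract does not specify it.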
Pages: 701-713
Number of pages: 13