An application of the nearest correlation matrix on web document classification

Cited by: 6
Authors
Qi, Houduo [1]
Xia, Zhonghang
Xing, Guangming
Affiliations
[1] Univ Southampton, Sch Math, Southampton SO17 1BJ, Hants, England
[2] Western Kentucky Univ, Dept Comp Sci, Bowling Green, KY 42101 USA
Keywords
support vector machines; classification; kernel matrix; semidefinite programming
DOI
10.3934/jimo.2007.3.701
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
A Web document organizes a set of textual data according to a predefined logical structure. It has been shown that collecting Web documents with similar structures can improve query efficiency. An XML document has no vectorial representation, which most existing classification algorithms require. The kernel method has been applied to represent structural data through pairwise similarity. In this case, a set of Web data can be fed into classification algorithms in the form of a kernel matrix. However, since the distance between a pair of Web documents is usually obtained only approximately, the derived distance matrix is not a kernel matrix. In this paper, we propose to use the nearest correlation matrix (of the estimated distance matrix) as the kernel matrix, which can be computed quickly by a Newton-type method. Experimental studies show that the classification accuracy can be significantly improved.
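The central step in the abstract, replacing an estimated pairwise matrix that is not a valid kernel with its nearest correlation matrix, can be illustrated with a minimal sketch. This is not the Newton-type method the paper refers to; it uses the simpler alternating-projections scheme with Dykstra's correction (due to Higham), which targets the same nearest correlation matrix but converges more slowly. The function names, the tolerance, and the 3x3 matrix `S` are assumptions made for illustration only, and the input is assumed to be a symmetric similarity-like matrix with a unit diagonal.

```python
import numpy as np


def project_psd(M):
    """Project a symmetric matrix onto the PSD cone by clipping negative eigenvalues."""
    w, V = np.linalg.eigh((M + M.T) / 2)
    return (V * np.clip(w, 0.0, None)) @ V.T


def nearest_correlation(A, tol=1e-10, max_iter=5000):
    """Approximate the nearest correlation matrix to A in the Frobenius norm.

    Alternating projections with Dykstra's correction (illustrative stand-in
    for the paper's Newton-type method).
    """
    A = np.asarray(A, dtype=float)
    Y = (A + A.T) / 2            # work with the symmetric part
    dS = np.zeros_like(Y)        # Dykstra correction term
    for _ in range(max_iter):
        R = Y - dS
        X = project_psd(R)       # projection onto the PSD cone
        dS = X - R
        Y_new = X.copy()
        np.fill_diagonal(Y_new, 1.0)   # projection onto unit-diagonal matrices
        change = np.linalg.norm(Y_new - Y, "fro")
        Y = Y_new
        if change <= tol * max(1.0, np.linalg.norm(Y, "fro")):
            break
    return Y


# Hypothetical estimated similarity matrix: symmetric, unit diagonal, but indefinite.
S = np.array([[1.0, 0.9, -0.9],
              [0.9, 1.0, 0.9],
              [-0.9, 0.9, 1.0]])
K = nearest_correlation(S)
```

The resulting `K` is symmetric with a unit diagonal and is positive semidefinite up to numerical tolerance, so it can be used wherever a kernel (Gram) matrix is expected, for example by a support vector machine that accepts a precomputed kernel.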
Pages: 701-713
Page count: 13
Related papers
50 items in total
  • [31] Web Document Classification by Keywords Using Random Forests
    Klassen, Myungsook
    Paturi, Nikhila
    NETWORKED DIGITAL TECHNOLOGIES, PT 2, 2010, 88 : 256 - 261
  • [32] A PSO-based web document classification algorithm
    Ziqiang Wang
    Qingzhou Zhang
    Dexian Zhang
    SNPD 2007: EIGHTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING, AND PARALLEL/DISTRIBUTED COMPUTING, VOL 3, PROCEEDINGS, 2007, : 659 - +
  • [33] PCCS: a fast clustering and classification method for Web document
    Wang, A.H.
    Zhang, M.
    Yang, D.Q.
    Tang, S.W.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2001, 38 (04):
  • [34] Web Document Classification using Support Vector Machine
    Shinde, Sharmila
    Joeg, Prasanna
    Vanjale, Sandeep
    2017 INTERNATIONAL CONFERENCE ON CURRENT TRENDS IN COMPUTER, ELECTRICAL, ELECTRONICS AND COMMUNICATION (CTCEEC), 2017, : 688 - 691
  • [35] Improving SVM on Web Content Classification by Document Formulation
    Xia, Tian
    Chai, Yanmei
    Wang, Tong
    PROCEEDINGS OF 2012 7TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION, VOLS I-VI, 2012, : 110 - 113
  • [36] Annotation based classification of the PDF document for semantic web
    Shukla, Archana
    ICECT 2011 - 2011 3rd International Conference on Electronics Computer Technology, 2011, 1 : 370 - 376
  • [37] A quadratically convergent Newton method for computing the nearest correlation matrix
    Qi, Houduo
    Sun, Defeng
    SIAM JOURNAL ON MATRIX ANALYSIS AND APPLICATIONS, 2006, 28 (02) : 360 - 385
  • [38] LOW RANK METHODS FOR SOLVING THE NEAREST CORRELATION MATRIX PROBLEM
    Al-Homidan, Suliman
    JOURNAL OF NONLINEAR AND CONVEX ANALYSIS, 2018, 19 (06) : 881 - 892
  • [39] A MA-Based Web Document Classification Algorithm
    Sun, Xia
    Wang, Ziqiang
    Zhang, Dexian
    2008 IEEE INTERNATIONAL SYMPOSIUM ON IT IN MEDICINE AND EDUCATION, VOLS 1 AND 2, PROCEEDINGS, 2008, : 952 - 955
  • [40] Web document classification based on extended rough set
    Yi, GX
    Hu, HP
    Lu, ZD
    PDCAT 2005: SIXTH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES, PROCEEDINGS, 2005, : 916 - 918