An application of the nearest correlation matrix on web document classification

Cited by: 6
Authors
Qi, Houduo [1]
Xia, Zhonghang
Xing, Guangming
Affiliations
[1] Univ Southampton, Sch Math, Southampton SO17 1BJ, Hants, England
[2] Western Kentucky Univ, Dept Comp Sci, Bowling Green, KY 42101 USA
Keywords
support vector machines; classification; kernel matrix; semidefinite programming
DOI
10.3934/jimo.2007.3.701
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
A Web document organizes a set of textual data according to a predefined logical structure. It has been shown that grouping Web documents with similar structures can improve query efficiency. An XML document has no vectorial representation, which most existing classification algorithms require. The kernel method has been applied to represent structural data through pairwise similarity, so that a set of Web data can be fed into classification algorithms in the form of a kernel matrix. However, since the distance between a pair of Web documents is usually obtained only approximately, the derived distance matrix is not a kernel matrix. In this paper, we propose to use the nearest correlation matrix (of the estimated distance matrix) as the kernel matrix, which can be computed quickly by a Newton-type method. Experimental studies show that the classification accuracy can be significantly improved.
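The idea of replacing an estimated matrix by its nearest correlation matrix can be illustrated with a minimal sketch. Note that this is not the Newton-type method the abstract refers to: it swaps in the simpler alternating-projections scheme with Dykstra's correction (in the style of Higham's 2002 algorithm), which targets the same problem in the Frobenius norm but typically converges more slowly. The function names, the tolerance, and the 3 x 3 input matrix `G` are all assumptions made for illustration.

```python
import numpy as np

def project_psd(M):
    """Project a symmetric matrix onto the positive semidefinite cone."""
    w, V = np.linalg.eigh(M)
    # Zero out negative eigenvalues and rebuild the matrix.
    return (V * np.clip(w, 0.0, None)) @ V.T

def nearest_correlation(G, tol=1e-8, max_iter=5000):
    """Approximate the nearest correlation matrix to G (Frobenius norm)
    by alternating projections with Dykstra's correction.
    Illustrative stand-in, NOT the paper's Newton-type method."""
    G = np.asarray(G, dtype=float)
    Y = (G + G.T) / 2.0          # work with the symmetric part
    dS = np.zeros_like(Y)        # Dykstra correction term
    for _ in range(max_iter):
        R = Y - dS
        X = project_psd(R)       # projection onto the PSD cone
        dS = X - R
        Y = X.copy()
        np.fill_diagonal(Y, 1.0) # projection onto unit-diagonal matrices
        if np.linalg.norm(Y - X, "fro") <= tol * max(1.0, np.linalg.norm(Y, "fro")):
            break
    return Y

# Hypothetical estimated similarity matrix: symmetric with unit diagonal,
# but indefinite, so it is not a valid kernel matrix.
G = np.array([[ 1.0, 0.9, -0.9],
              [ 0.9, 1.0,  0.9],
              [-0.9, 0.9,  1.0]])

K = nearest_correlation(G)
print("smallest eigenvalue of G:", np.linalg.eigvalsh(G).min())
print("smallest eigenvalue of K:", np.linalg.eigvalsh(K).min())
```

The resulting `K` is symmetric, has unit diagonal, and is positive semidefinite up to the stopping tolerance, so it could in principle be handed to a classifier that accepts a precomputed kernel matrix.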
Pages: 701-713
Number of pages: 13