Clustering suggestion for Chinese news web pages from multi-media sources

Cited by: 0
Authors
Chiu, Deng-Yiv [1]
Pan, Ya-Chen [1]
Affiliation
[1] Chung Hua Univ, Dept Informat Management, Hsinchu, Taiwan
Keywords
Chinese web pages clustering; multi-class SVM; fuzzy c-means algorithm; genetic algorithm
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
On Chinese web portals, some news items are clearly classified into incorrect categories. The likely reasons are that Chinese news is difficult to classify automatically and that the news appearing on a portal is retrieved from many media sources. In this study, we integrate a genetic algorithm with a multi-class support vector machine (SVM) classifier to construct a Chinese news classification method. We also find that some similar documents are scattered across different categories, probably because the categories used by the original media sources differ from those of the news portal. Such similar news items should be collected to form a new category. We combine a genetic algorithm with the fuzzy c-means algorithm to propose a new approach that offers clustering suggestions for news web pages that are scattered across different categories and come from multiple media sources.
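To make the clustering step concrete, here is a minimal sketch of plain fuzzy c-means, in which each document receives a degree of membership in every cluster instead of a single label. It is not the authors' method: the genetic algorithm component is omitted because this record does not describe how it is combined with fuzzy c-means. The function name `fuzzy_c_means`, its parameters, and the toy two-dimensional "document vectors" are all assumptions made for illustration.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means on a list of equal-length numeric vectors.

    Returns (centers, memberships). memberships[i][j] is the degree to which
    point i belongs to cluster j, and each row sums to 1.
    """
    rng = random.Random(seed)
    dim = len(points[0])

    # Start from random memberships, normalised so each row sums to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([value / total for value in row])

    centers = []
    for _ in range(n_iter):
        # Centre update: mean of the points weighted by membership ** m.
        centers = []
        for j in range(n_clusters):
            weights = [u[i][j] ** m for i in range(len(points))]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])

        # Membership update: closer centres receive higher membership.
        new_u = []
        for p in points:
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            new_u.append([value / total for value in inv])

        # Stop once no membership changes by more than tol.
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break

    return centers, u


# Hypothetical toy vectors standing in for document feature vectors.
docs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
centers, memberships = fuzzy_c_means(docs, n_clusters=2)
for doc, row in zip(docs, memberships):
    print(doc, [round(value, 3) for value in row])
```

The soft memberships are what make this kind of algorithm suitable for suggesting new categories: a document with substantial membership in a cluster that cuts across the portal's existing categories is a candidate for regrouping.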
Pages: 183-187
Page count: 5