Organizing WWW images based on the analysis of page layout and web link structure

Authors
Cai, D [1]
He, XF [1]
Ma, WY [1]
Wen, JR [1]
Zhang, HJ [1]
Affiliation
[1] Microsoft Research Asia, Beijing 100080, People's Republic of China
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Theory of artificial intelligence];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Due to the rapid growth in the number of digital images on the Web, there is an increasing demand for effective and efficient methods for organizing and retrieving the available images. This paper describes a method for clustering and embedding WWW images. Using a vision-based page segmentation algorithm, a web page is partitioned into blocks, and the textual and link information of an image can be accurately extracted from the block containing that image. By extracting the page-to-block, block-to-image, and block-to-page relationships through link structure and page layout analysis, we construct an image graph. With the image graph model, we use techniques from spectral graph theory for image clustering and embedding. Some experimental results are given in the paper.
Pages: 113-116
Page count: 4
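
The abstract gives no formulas, so the following is only a minimal sketch of how an image graph might be composed from the three relationships it names and then embedded with a standard spectral technique. The matrix names, the way they are multiplied together, the symmetrization step, and the toy values are all assumptions made for illustration, not the authors' published formulation.

```python
import numpy as np


def build_image_graph(page_to_block, block_to_page, block_to_image):
    """Compose an image-by-image weight matrix (assumed composition).

    page_to_block:  pages x blocks, how much of each page a block covers
    block_to_page:  blocks x pages, link weight from a block to a page
    block_to_image: blocks x images, which block contains which image
    """
    # Block i reaches block j by linking to the page that contains j.
    block_graph = block_to_page @ page_to_block
    # Images inherit the connectivity of the blocks that contain them.
    image_graph = block_to_image.T @ block_graph @ block_to_image
    # Symmetrize so that a symmetric eigensolver can be used.
    return (image_graph + image_graph.T) / 2.0


def spectral_embedding(weights, dim=1):
    """Embed nodes using eigenvectors of the normalized graph Laplacian."""
    degree = weights.sum(axis=1)
    safe_degree = np.where(degree > 0, degree, 1.0)
    inv_sqrt = np.where(degree > 0, 1.0 / np.sqrt(safe_degree), 0.0)
    normalized = inv_sqrt[:, None] * weights * inv_sqrt[None, :]
    laplacian = np.eye(len(weights)) - normalized
    # eigh returns eigenvalues in ascending order; skip the trivial first one.
    _, eigenvectors = np.linalg.eigh(laplacian)
    return eigenvectors[:, 1:dim + 1] * inv_sqrt[:, None]


# Toy data (hypothetical): 2 pages, 4 blocks, one image per block.
page_to_block = np.array([[0.5, 0.5, 0.0, 0.0],
                          [0.0, 0.0, 0.5, 0.5]])
block_to_page = np.array([[0.9, 0.1],
                          [1.0, 0.0],
                          [0.0, 1.0],
                          [0.1, 0.9]])
block_to_image = np.eye(4)

weights = build_image_graph(page_to_block, block_to_page, block_to_image)
embedding = spectral_embedding(weights, dim=1)
# Two-way clustering by the sign of the first non-trivial eigenvector.
labels = (embedding[:, 0] > 0).astype(int)
print(labels)
```

In this toy setup the blocks of each page link mostly to their own page, so the sign split groups the two images of the first page together and the two images of the second page together. For more than two clusters, a common choice is to keep several eigenvectors (`dim > 1`) and run k-means on the rows of the embedding.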