Query-Sets++: A Scalable Approach for Modeling Web Sites

Cited by: 0
Authors
Poblete, Barbara [1, 2]
Spiliopoulou, Myra [3]
Mendoza, Marcelo [4]
Affiliations
[1] Univ Chile, Dept Comp Sci DCC, Santiago, Chile
[2] Yahoo Res Latin Amer, Latin America, Chile
[3] Otto Von Guericke Univ, Magdeburg, Germany
[4] Univ Tecn Federico Santa Maria, Santa Maria, Chile
Source
STRING PROCESSING AND INFORMATION RETRIEVAL | 2011 / Vol. 7024
Keywords
Web Sites; Query Mining; Classification;
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
We explore an effective approach for modeling and classifying Web sites in the World Wide Web. The aim of this work is to classify Web sites using features which are independent of size, structure and vocabulary. We establish Web site similarity based on search engine query hits, which convey document relevance and utility in direct relation to users' needs and interests. To achieve this, we use a generic Web site representation scheme over different feature spaces, built upon query traffic to the site's documents. For this task we extend, in a non-trivial way, our prior work using query-sets for single document representation. We discuss why this previous methodology is not scalable for a large set of heterogeneous Web sites. We show that our models achieve very compact Web site representations. Furthermore, our experiments on site classification show excellent performance and quality/dimensionality trade-off. In particular, we sustain a reduction in the feature space to 5% of the size of the bag-of-words representation, while achieving 99% precision in our classification experiments on DMOZ.
Pages: 129 / +
Page count: 2
Related papers
50 in total
  • [1] Semplore: An IR approach to scalable hybrid query of semantic web data
    Zhang, Lei
    Liu, QiaoLing
    Zhang, Jie
    Wang, HaoFen
    Pan, Yue
    Yu, Yong
    SEMANTIC WEB, PROCEEDINGS, 2007, 4825 : 652 - +
  • [2] Building a scalable web query system
    Hsu, Meichun
    Xiong, Yuhong
    DATABASES IN NETWORKED INFORMATION SYSTEMS, PROCEEDINGS, 2007, 4777 : 322 - +
  • [3] A Novel Approach Based on Fuzzy Rough Sets for Web Query System
    Han, Jinghua
    Liu, Gang
    FUZZY INFORMATION AND ENGINEERING 2010, VOL 1, 2010, 78 : 667 - 673
  • [4] Scalable Query Result Caching for Web Applications
    Garrod, Charles
    Manjhi, Amit
    Ailamaki, Anastasia
    Maggs, Bruce
    Mowry, Todd
    Olston, Christopher
    Tomasic, Anthony
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2008, 1 (01): : 550 - 561
  • [5] Efficient Query Processing for Scalable Web Search
    Tonellotto, Nicola
    Macdonald, Craig
    Ounis, Iadh
    FOUNDATIONS AND TRENDS IN INFORMATION RETRIEVAL, 2018, 12 (4-5): : 319 - 500
  • [6] A scalable update management mechanism for query result caching systems at database-driven web sites
    Choi, SL
    Huh, S
    Kim, SM
    Song, JH
    Lee, YJ
    FRONTIERS OF WWW RESEARCH AND DEVELOPMENT - APWEB 2006, PROCEEDINGS, 2006, 3841 : 850 - 855
  • [7] Query aspects approach to web search
    Crabtree, Daniel
    Gao, Xiaoying
    Andreae, Peter
    WEB INTELLIGENCE, 2016, 14 (03) : 173 - 197
  • [8] An approach for the ranking of query results in the semantic web
    Stojanovic, N
    Studer, R
    Stojanovic, L
    SEMANTIC WEB - ISWC 2003, 2003, 2870 : 500 - 516
  • [9] Web Modeling Language (WebML): a modeling language for designing Web sites
    Ceri, S
    Fraternali, P
    Bongio, A
    COMPUTER NETWORKS-THE INTERNATIONAL JOURNAL OF COMPUTER AND TELECOMMUNICATIONS NETWORKING, 2000, 33 (1-6): : 137 - 157
  • [10] A Query Rewriting Approach for Web Service Composition
    Barhamgi, Mahmoud
    Benslimane, Djamal
    Medjahed, Brahim
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2010, 3 (03) : 206 - 222