Search and Browse Log Mining for Web Information Retrieval: Challenges, Methods, and Applications

Cited by: 0
Authors
Jiang, Daxin [1]
Pei, Jian
Li, Hang [1]
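Affiliations
[1] Microsoft Research Asia, Beijing, People's Republic of China
Keywords
Search and browse logs; log data mining
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract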
Search engines have accumulated huge amounts of search log data. A commercial search engine currently receives billions of queries and collects terabytes of log data every day. In addition to search logs, browse logs can be collected by client-side browser plug-ins, which record browsing information when users grant permission. On the one hand, such massive amounts of search and browse log data provide great opportunities to mine the wisdom of crowds and improve search results as well as online advertising. On the other hand, designing effective and efficient methods to clean, model, and process large-scale log data presents great challenges. In this tutorial, we focus on mining search and browse log data for Web information retrieval. We consider a Web information retrieval system consisting of four components: query understanding, document understanding, query-document matching, and user understanding. Accordingly, we organize the tutorial materials along these four aspects. For each aspect, we survey the major tasks, challenges, fundamental principles, and state-of-the-art methods. The goal of this tutorial is to provide the IR community with a systematic survey of large-scale search and browse log mining. It will help IR researchers become familiar with the core challenges and promising directions in log mining. This tutorial may also serve developers of Web information retrieval systems as a comprehensive, in-depth reference on advanced log mining techniques.
Pages: 912-912
Number of pages: 1