Multi-Style Language Model for Web Scale Information Retrieval

Cited by: 0
Authors
Wang, Kuansan [1 ]
Li, Xiaolong
Gao, Jianfeng [1 ]
Affiliations
[1] Microsoft Res, Redmond, WA 98052 USA
Keywords
Information Retrieval; Mixture Language Models; Smoothing; Parameter Estimation; Probabilistic Relevance Model;
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Web documents are typically associated with many text streams, including the body, the title and the URL that are determined by the authors, and the anchor text or search queries used by others to refer to the documents. Through a systematic large-scale analysis of their cross entropy, we show that these text streams appear to be composed in different language styles, and hence warrant separate language models to properly describe their properties. We propose a language modeling approach to Web document retrieval in which each document is characterized by a mixture model with components corresponding to the various text streams associated with the document. Immediate issues for such a mixture model arise because not all text streams are present for every document, and the streams do not share the same lexicon, making it challenging to properly combine the statistics from the mixture components. To address these issues, we introduce an "open-vocabulary" smoothing technique so that all the component language models have the same cardinality and their scores can simply be linearly combined. To ensure that the approach can cope with Web-scale applications, the model training algorithm is designed to require no labeled data and can be fully automated with few heuristics and no empirical parameter tuning. The evaluation on Web document ranking tasks shows that the component language models indeed have varying degrees of capability as predicted by the cross-entropy analysis, and the combined mixture model outperforms the state-of-the-art BM25F based system.
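To make the abstract's idea concrete, below is a minimal sketch of one plausible reading of the approach: each text stream of a document (body, title, URL, anchor text, past queries) gets its own unigram language model, every component is smoothed against a shared background distribution so all streams cover the same open vocabulary, and the per-stream query log-likelihoods are combined linearly. The Dirichlet-style smoothing, the stream weights, and the background model here are illustrative assumptions, not the paper's trained parameters or its actual smoothing technique.

```python
import math
from collections import Counter

def build_stream_lm(text):
    """Unigram counts and length for one text stream of a document."""
    tokens = text.lower().split()
    return Counter(tokens), len(tokens)

def smoothed_log_prob(term, counts, length, background, mu=2000.0):
    """Smooth each component LM against the same background distribution so
    all streams share one open vocabulary and assign non-zero probability
    to every term (illustrative Dirichlet-prior smoothing)."""
    p_bg = background.get(term, 1e-9)  # assumed shared background probability
    return math.log((counts.get(term, 0) + mu * p_bg) / (length + mu))

def mixture_score(query, streams, weights, background):
    """Linearly combine per-stream query log-likelihoods into one score."""
    score = 0.0
    for term in query.lower().split():
        for name, (counts, length) in streams.items():
            score += weights[name] * smoothed_log_prob(term, counts, length, background)
    return score

# Toy usage with two streams and hand-picked weights (hypothetical values).
background = {"language": 0.001, "model": 0.002, "retrieval": 0.0005, "web": 0.003}
streams = {
    "title": build_stream_lm("Multi style language model for web retrieval"),
    "body": build_stream_lm("web documents carry several text streams"),
}
weights = {"title": 0.6, "body": 0.4}
print(mixture_score("language model retrieval", streams, weights, background))
```

In this reading, a stream that is missing for a document simply contributes its smoothed background estimate, which is one way the shared open vocabulary keeps the linear combination well defined; the paper's own parameter estimation is fully automated rather than hand-weighted as above.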
Pages: 467-474
Page count: 8
Related Papers
50 records in total
  • [1] Large-scale interactive retrieval in art collections using multi-style feature aggregation
    Ufer, Nikolai
    Simon, Max
    Lang, Sabine
    Ommer, Bjoern
    [J]. PLOS ONE, 2021, 16 (11):
  • [2] MCLGAN: a multi-style cartoonization method based on style condition information
    Li, Canlin
    Wang, Xinyue
    Yi, Ran
    Zhang, Wenjiao
    Bi, Lihua
    Ma, Lizhuang
    [J]. VISUAL COMPUTER, 2024,
  • [3] Hyper-Textual Language Model for Web Information Retrieval
    Xie, Ying
    Raghavan, Vijay V.
    Young, Andrew
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING, VOLS 1 AND 2, 2008, : 68 - +
  • [4] Attention-Guide Walk Model in Heterogeneous Information Network for Multi-Style Recommendation Explanation
    Wang, Xin
    Wang, Ying
    Ling, Yunzhi
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 6275 - 6282
  • [5] Multi-Style Migration QR Code
    You, Fucheng
    Lai, Shuren
    Gong, Hechen
    Zhao, Yangze
    [J]. 3RD ANNUAL INTERNATIONAL CONFERENCE ON INFORMATION SYSTEM AND ARTIFICIAL INTELLIGENCE (ISAI2018), 2018, 1069
  • [6] Fast Video Multi-Style Transfer
    Gao, Wei
    Li, Yijun
    Yin, Yihang
    Yang, Ming-Hsuan
    [J]. 2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 3211 - 3219
  • [7] Interactive Artistic Multi-style Transfer
    Wang, Xiaohui
    Lyu, Yiran
    Huang, Junfeng
    Wang, Ziying
    Qin, Jingyan
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2021, 14 (01)
  • [8] Interactive Artistic Multi-style Transfer
    Xiaohui Wang
    Yiran Lyu
    Junfeng Huang
    Ziying Wang
    Jingyan Qin
    [J]. International Journal of Computational Intelligence Systems, 14
  • [9] The Communication Value of Multi-style Subtitles
    Zeng, Guangyu
    [J]. PROCEEDINGS OF THE 2017 2ND INTERNATIONAL CONFERENCE ON EDUCATION, SPORTS, ARTS AND MANAGEMENT ENGINEERING (ICESAME 2017), 2017, 123 : 685 - 690
  • [10] Multi-Style Generative Reading Comprehension
    Nishida, Kyosuke
    Saito, Itsumi
    Nishida, Kosuke
    Shinoda, Kazutoshi
    Otsuka, Atsushi
    Asano, Hisako
    Tomita, Junji
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 2273 - 2284