Chinese New Word Detection from Query Logs

Cited by: 0
Authors
Zhang, Yan [1 ]
Sun, Maosong [1 ]
Zhang, Yang [2 ]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst Technol, Beijing 100084, Peoples R China
[2] Sohu Inc R&D Ctr, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
new word detection; dynamic time warping; query logs; search;
DOI
10.1115/PVP2010-25531
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most existing work detects new words from web pages or other author-centric resources, which requires highly complex text processing. This paper instead exploits visitor-centric resources, specifically query logs from a commercial search engine. Because query logs are generated by search engine users and are naturally segmented, the complex text processing can be avoided. Using dynamic time warping, a new word detection algorithm based on trajectory similarity is proposed to distinguish new words in the query logs. Experiments on real-world data sets show the effectiveness and efficiency of the proposed algorithm.
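The abstract does not say how trajectories are built or compared, so the following is only a minimal sketch of the standard dynamic time warping distance, not the paper's algorithm. It assumes each candidate term is represented by a sequence of per-day query counts; the function name dtw_distance and the sample sequences are hypothetical and used purely for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    Uses absolute difference as the local cost and allows the usual three
    moves (match, insertion, deletion). Illustrative only: the paper's own
    trajectory definition and similarity measure are not given in the abstract.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] aligned with an already used b element
                cost[i][j - 1],      # b[j-1] aligned with an already used a element
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# Hypothetical daily query counts for two terms (made-up numbers).
steady_term = [5, 5, 6, 5, 5, 6]
bursty_term = [0, 0, 1, 20, 45, 30]
print(dtw_distance(steady_term, bursty_term))
```

Because the warping path may stretch or compress the time axis, two trajectories with the same shape but shifted or slowed in time receive a small distance, which is the property that makes DTW a common choice for comparing frequency curves.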
Pages: 233-243
Number of pages: 11