Chinese New Word Detection from Query Logs

Citations: 0
Authors
Zhang, Yan [1 ]
Sun, Maosong [1 ]
Zhang, Yang [2 ]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst Technol, Beijing 100084, Peoples R China
[2] Sohu Inc R&D Ctr, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
new word detection; dynamic time warping; query logs; search;
DOI
10.1115/PVP2010-25531
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most existing work relies on web pages or other author-centric resources to detect new words, which requires highly complex text processing. This paper instead exploits visitor-centric resources, specifically query logs from a commercial search engine, to detect new words. Because query logs are generated by search engine users and are naturally segmented, the complex text processing can be avoided. Using dynamic time warping, a new word detection algorithm based on trajectory similarity is proposed to distinguish new words in the query logs. Experiments on real-world data sets show the effectiveness and efficiency of the proposed algorithm.
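The abstract names dynamic time warping (DTW) as the basis for comparing trajectories but does not give the algorithm itself. The following is a minimal sketch of the textbook DTW distance only, not the authors' method. It assumes, purely for illustration, that a "trajectory" is a query's per-day frequency series; the function names `dtw_distance` and `normalize` and the sample counts are hypothetical.

```python
from typing import List, Sequence


def normalize(counts: Sequence[float]) -> List[float]:
    """Scale a count series so it sums to 1 (assumed preprocessing step)."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0 for _ in counts]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Textbook dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    # Hypothetical daily query counts: a sudden burst versus a flat series.
    burst = normalize([0, 0, 1, 8, 20, 12, 5])
    shifted_burst = normalize([0, 1, 8, 20, 12, 5, 0])
    flat = normalize([6, 7, 6, 7, 6, 7, 6])
    print(dtw_distance(burst, shifted_burst))
    print(dtw_distance(burst, flat))
```

Because DTW allows one series to be stretched or shifted in time against the other, two bursts that peak on different days can still score as similar, whereas a burst and a flat series stay far apart. How the paper turns such distances into a new-word decision is not stated in the abstract.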
Pages: 233-243
Number of pages: 11