Cross-lingual analysis of English and Chinese web search

Times cited: 0
Authors
Lin, Peiguang [1]
Zhang, Tong [2]
Xia, Menglong [3]
Zhou, Jin [4]
Nie, Peiyao [1]
Affiliations
[1] Shandong Univ Finance & Econ, Sch Comp Sci & Technol, Jinan 250001, Shandong, Peoples R China
[2] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510000, Guangdong, Peoples R China
[3] Macau Univ Sci & Technol, Fac Hospitality & Tourism Management, Ave Wai Long, Taipa 999078, Macau, Peoples R China
[4] Univ Jinan, Shandong Prov Key Lab Network Based Intelligent C, Jinan 250001, Shandong, Peoples R China
Keywords
cross-lingual analysis; web search analysis; search query; POS distribution; search session; session entropy; query reformulation; click graph analysis; query features; web search burstiness; ENGINE; ALGORITHM; BEHAVIOR;
DOI
10.1504/IJWGS.2018.095663
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The non-English Web has grown considerably in recent years, so language-dependent and user-based search paradigms are becoming increasingly important for search engines. Unfortunately, most available work on web search analysis is still English-based. To understand the behavioural commonalities and distinctions of non-English users, we propose a framework for analysing the web search behaviours of users in a cross-lingual context. The framework comprises 10 factors, which can be applied at the query level, session level and corpus level respectively. Used together, these factors help characterise the web search behaviour of users, even across different languages, from both statistical and semantic perspectives. The framework is effective both in revealing the commonalities and distinctions of web search and in informing the design of search paradigms in a cross-lingual scenario.
Pages: 376-399
Page count: 24