Multilingual sentence categorization and novelty mining

Cited by: 10
Authors
Zhang, Yi [1]
Tsai, Flora S. [1]
Kwee, Agus Trisnajaya [1]
Affiliation
[1] Nanyang Technological University, School of Electrical and Electronic Engineering, Singapore 639798, Singapore
Keywords
Multilingual categorization; Sentence retrieval; Novelty mining; Malay; Chinese
DOI
10.1016/j.ipm.2010.02.003
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A challenge for sentence categorization and novelty mining is to detect not only when text is relevant to the user's information need, but also when it contains something new which the user has not seen before. It involves two tasks that need to be solved. The first is identifying relevant sentences (categorization) and the second is identifying new information from those relevant sentences (novelty mining). Many previous studies of relevant sentence retrieval and novelty mining have been conducted on the English language, but few papers have addressed the problem of multilingual sentence categorization and novelty mining. This is an important issue in global business environments, where mining knowledge from text in a single language is not sufficient. In this paper, we perform the first task by categorizing Malay and Chinese sentences, then comparing their performances with that of English. Thereafter, we conduct novelty mining to identify the sentences with new information. Experimental results on TREC 2004 Novelty Track data show similar categorization performance on Malay and English sentences, which greatly outperform Chinese. In the second task, it is observed that we can achieve similar novelty mining results for all three languages, which indicates that our algorithm is suitable for novelty mining of multilingual sentences. In addition, after benchmarking our results with novelty mining without categorization, it is learnt that categorization is necessary for the successful performance of novelty mining. (C) 2010 Elsevier Ltd. All rights reserved.
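The two-stage flow described in the abstract (categorize first, then mine novelty among the relevant sentences) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the features, similarity measure, or thresholds, so the whitespace tokenizer, the term-frequency cosine similarity, the function names, and both threshold values below are assumptions chosen only for illustration. In particular, whitespace tokenization would not work for Chinese text, which needs word segmentation.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Assumption: lowercase whitespace tokenization (not suitable for Chinese).
    return sentence.lower().split()


def cosine(a, b):
    # Cosine similarity between two term-frequency vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(sentences, topic, relevance_threshold=0.2):
    # Task 1 (hypothetical rule): keep sentences similar enough to the topic.
    topic_vec = Counter(tokenize(topic))
    return [
        s for s in sentences
        if cosine(Counter(tokenize(s)), topic_vec) >= relevance_threshold
    ]


def mine_novelty(relevant, novelty_threshold=0.6):
    # Task 2 (hypothetical rule): a sentence is novel if it is not too
    # similar to any relevant sentence seen earlier in the stream.
    seen = []
    novel = []
    for s in relevant:
        vec = Counter(tokenize(s))
        max_sim = max((cosine(vec, h) for h in seen), default=0.0)
        if max_sim < novelty_threshold:
            novel.append(s)
        seen.append(vec)
    return novel


sentences = [
    "the storm hit the coast on monday",
    "the storm hit the coast on monday morning",
    "stock prices rose sharply",
    "rescue teams reached the flooded coast villages",
]
relevant = categorize(sentences, "storm coast flooded")
novel = mine_novelty(relevant)
print(novel)
```

In this toy run the off-topic sentence is dropped in the first stage, and the near-duplicate second sentence is dropped in the second stage. Skipping the first stage would let irrelevant sentences count as "novel", which is consistent with the abstract's finding that categorization is necessary for novelty mining to perform well.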
Pages: 667-675
Number of pages: 9
Related papers
50 in total
  • [1] Chinese Categorization and Novelty Mining
    Tsai, Flora S.
    Zhang, Yi
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT II: 15TH PACIFIC-ASIA CONFERENCE, PAKDD 2011, 2011, 6635 : 284 - 295
  • [2] Evaluation of novelty metrics for sentence-level novelty mining
    Tsai, Flora S.
    Tang, Wenyin
    Chan, Kap Luk
    [J]. INFORMATION SCIENCES, 2010, 180 (12) : 2359 - 2374
  • [3] MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases
    Martin, Louis
    Fan, Angela
    de la Clergerie, Eric
    Bordes, Antoine
    Sagot, Benoit
    [J]. LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022 : 1651 - 1664
  • [4] Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining
    Kvapilikova, Ivana
    Artetxe, Mikel
    Labaka, Gorka
    Agirre, Eneko
    Bojar, Ondrej
    [J]. 58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020): STUDENT RESEARCH WORKSHOP, 2020 : 255 - 262
  • [5] Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings
    Artetxe, Mikel
    Schwenk, Holger
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019 : 3197 - 3203
  • [6] Multilingual novelty detection
    Tsai, Flora S.
    Zhang, Yi
    Kwee, Agus T.
    Tang, Wenyin
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (01) : 652 - 658
  • [7] Novelty Categorization Theory
    Forster, Jens
    Marguc, Janina
    Gillebaart, Marleen
    [J]. SOCIAL AND PERSONALITY PSYCHOLOGY COMPASS, 2010, 4 (09) : 736 - 755
  • [8] Multilingual sentence hunter
    Liu, JYC
    Lin, JL
    [J]. WEB INFORMATION SYSTEMS ENGINEERING - WISE 2005 WORKSHOPS, PROCEEDINGS, 2005, 3807 : 84 - 93
  • [9] Categorization in multilingual storytelling Introduction
    Prior, Matthew T.
    Talmy, Steven
    [J]. PRAGMATICS AND SOCIETY, 2019, 10 (03) : 329 - 336
  • [10] Novelty and conflict in the categorization of complex stimuli
    Folstein, Jonathan R.
    Van Petten, Cyma
    Rose, Scott A.
    [J]. PSYCHOPHYSIOLOGY, 2008, 45 (03) : 467 - 479