Building parallel corpora by automatic title alignment

Authors
Yang, CC [1 ]
Li, KW [1 ]
Affiliation
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Sha Tin 100083, Peoples R China
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Cross-lingual semantic interoperability has recently drawn significant research attention as the number of digital libraries in non-English languages has grown exponentially. Cross-lingual information retrieval (CLIR) among European languages such as English, Spanish and French has been widely explored, but CLIR between European and Oriental languages is still at an early stage. To cross the language boundary, a corpus-based approach shows promise for overcoming the limitations of knowledge-based and controlled-vocabulary approaches. However, collecting parallel corpora between European and Oriental languages is not an easy task. Length-based and text-based approaches are the two major approaches to aligning parallel documents. In this paper, we investigate several techniques based on these approaches and compare their performance in aligning the English and Chinese titles of parallel documents available on the Web.
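The abstract names the two families of alignment techniques without detailing them. As a minimal sketch under stated assumptions (not the authors' method), the block below scores an English/Chinese title pair in two ways: a length-based score modeled on Gale and Church's (1993) sentence-length model, and a text-based score that counts English words whose translation from a toy bilingual dictionary appears in the Chinese title. The constants `C` and `S2`, the dictionary `DICT`, and the helpers `length_score`, `text_score` and `align` are all hypothetical.

```python
import math

# Hypothetical placeholders: expected Chinese characters per English character (C)
# and variance per English character (S2). Gale & Church (1993) used c = 1 and
# s2 = 6.8 for European language pairs; English-Chinese values would have to be
# estimated from aligned data.
C = 0.3
S2 = 6.8

# Hypothetical toy English-Chinese lexicon; a real system would use a full dictionary.
DICT = {
    "digital": ["数字"],
    "library": ["图书馆"],
    "information": ["信息"],
    "retrieval": ["检索"],
}


def length_score(en, zh, c=C, s2=S2):
    """Gale-Church style probability that the character lengths of en and zh match."""
    l1, l2 = len(en), len(zh)
    if l1 == 0:
        return 0.0
    delta = (l2 - l1 * c) / math.sqrt(l1 * s2)
    # Two-tailed tail probability under a standard normal: 2 * (1 - Phi(|delta|)).
    phi = 0.5 * (1.0 + math.erf(abs(delta) / math.sqrt(2.0)))
    return 2.0 * (1.0 - phi)


def text_score(en, zh, lexicon=DICT):
    """Fraction of dictionary-covered English words whose translation occurs in zh."""
    words = [w.strip(".,:;").lower() for w in en.split()]
    covered = [w for w in words if w in lexicon]
    if not covered:
        return 0.0
    hits = sum(any(t in zh for t in lexicon[w]) for w in covered)
    return hits / len(covered)


def align(en_titles, zh_titles, scorer):
    """Pair each English title with its highest-scoring Chinese title (first wins ties)."""
    return {en: max(zh_titles, key=lambda zh: scorer(en, zh)) for en in en_titles}


if __name__ == "__main__":
    en = ["Digital Library", "Information Retrieval"]
    zh = ["信息检索", "数字图书馆"]
    print(align(en, zh, text_score))
```

In this toy example, "数字图书馆" and "信息检索" differ from the expected length of "Digital Library" by the same amount, so `length_score` gives them identical scores and cannot separate them, whereas `text_score` pairs each title correctly. A practical aligner would typically combine both kinds of evidence and enforce one-to-one matching rather than the independent argmax used here.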
Pages: 328–339
Page count: 12