Identifying translationese at the word and sub-word level

Cited by: 13
Authors
Avner, Ehud Alexander [1]
Ordan, Noam [2]
Wintner, Shuly [3]
Affiliations
[1] Univ Potsdam, Potsdam, Germany
[2] Univ Saarland, Saarbrucken, Germany
[3] Univ Haifa, IL-31999 Haifa, Israel
DOI
10.1093/llc/fqu047
Chinese Library Classification (CLC) number
C [Social Sciences, General]
Discipline classification code
03; 0303
Abstract
We use text classification to distinguish automatically between original and translated texts in Hebrew, a morphologically complex language. To this end, we design several linguistically informed feature sets that capture word-level and sub-word-level (in particular, morphological) properties of Hebrew. Such features are abstract enough to allow for the development of accurate, robust classifiers, and they also lend themselves to linguistic interpretation. Careful evaluation shows that some of the classifiers we define are, indeed, highly accurate, and scale up nicely to domains that they were not trained on. In addition, analysis of the best features provides insight into the morphological properties of translated texts.
Pages: 30-54
Number of pages: 25