Identifying translationese at the word and sub-word level

Cited by: 13
Authors
Avner, Ehud Alexander [1]
Ordan, Noam [2]
Wintner, Shuly [3]
Affiliations
[1] Univ Potsdam, Potsdam, Germany
[2] Univ Saarland, Saarbrucken, Germany
[3] Univ Haifa, IL-31999 Haifa, Israel
DOI
10.1093/llc/fqu047
Chinese Library Classification (CLC) number
C [Social sciences, general]
Subject classification code
03; 0303
Abstract
We use text classification to distinguish automatically between original and translated texts in Hebrew, a morphologically complex language. To this end, we design several linguistically informed feature sets that capture word-level and sub-word-level (in particular, morphological) properties of Hebrew. Such features are abstract enough to allow for the development of accurate, robust classifiers, and they also lend themselves to linguistic interpretation. Careful evaluation shows that some of the classifiers we define are, indeed, highly accurate, and scale up nicely to domains that they were not trained on. In addition, analysis of the best features provides insight into the morphological properties of translated texts.
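The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of classifying texts as original or translated from sub-word features. It makes several assumptions that are not in the source: character trigrams stand in as a crude proxy for morphological features, the classifier is a multinomial Naive Bayes with add-one smoothing, and the English toy sentences and the names `char_ngrams` and `NaiveBayes` are invented for illustration. None of it reflects the authors' feature sets, data, or results.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Character n-grams per token, with '<' and '>' marking word boundaries.

    Assumption: n-grams are a rough stand-in for sub-word (morphological)
    features; the paper's own features are not described in the abstract.
    """
    grams = []
    for token in text.lower().split():
        padded = "<" + token + ">"
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


class NaiveBayes:
    """Multinomial Naive Bayes over character n-grams (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                  # additive smoothing constant
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.doc_counts = Counter()         # label -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            total = sum(self.counts[label].values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            for gram in char_ngrams(text):
                numerator = self.counts[label][gram] + self.alpha
                score += math.log(numerator / (total + self.alpha * vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, invented for this sketch (not from the paper).
train_texts = [
    "the cat sat on the mat",
    "a dog ran in the park",
    "moreover the feline was seated",
    "furthermore the canine was running",
]
train_labels = ["original", "original", "translated", "translated"]

clf = NaiveBayes().fit(train_texts, train_labels)
print(clf.predict("moreover the canine was seated"))
print(clf.predict("the cat ran in the park"))
```

With real data, the per-label n-gram counts could be inspected to see which sub-word units most separate the two classes, which is the kind of feature analysis the abstract refers to.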
Pages: 30-54
Number of pages: 25