Identifying translationese at the word and sub-word level

Cited by: 13
Authors
Avner, Ehud Alexander [1]
Ordan, Noam [2]
Wintner, Shuly [3]
Affiliations
[1] Univ Potsdam, Potsdam, Germany
[2] Univ Saarland, Saarbrücken, Germany
[3] Univ Haifa, IL-31999 Haifa, Israel
DOI
10.1093/llc/fqu047
Chinese Library Classification (CLC) number
C [Social Sciences in General]
Discipline classification codes
03; 0303
Abstract
We use text classification to distinguish automatically between original and translated texts in Hebrew, a morphologically complex language. To this end, we design several linguistically informed feature sets that capture word-level and sub-word-level (in particular, morphological) properties of Hebrew. Such features are abstract enough to allow for the development of accurate, robust classifiers, and they also lend themselves to linguistic interpretation. Careful evaluation shows that some of the classifiers we define are, indeed, highly accurate, and scale up nicely to domains that they were not trained on. In addition, analysis of the best features provides insight into the morphological properties of translated texts.
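The abstract describes a general recipe: extract word-level and sub-word-level features, then train a classifier to separate original from translated text. A minimal sketch of that recipe follows, under loudly stated assumptions. It uses raw within-word character trigrams as a crude stand-in for the authors' morphological features, pools them with word unigrams, and swaps in a from-scratch multinomial Naive Bayes classifier. It is not the authors' feature set or learner. The function and class names, the toy corpus, and the labels `original`/`translated` are hypothetical, invented only to exercise the code.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Within-word character n-grams, with '#' marking word boundaries.

    Hypothetical crude proxy for sub-word (morphological) features; it does
    not perform any real morphological analysis.
    """
    grams = []
    for word in text.split():
        padded = f"#{word}#"
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def features(text):
    """Pool word unigrams and character trigrams; prefixes keep the two
    feature families from colliding."""
    words = ["w:" + w for w in text.split()]
    chars = ["c:" + g for g in char_ngrams(text)]
    return words + chars


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing.

    Swapped-in learner for illustration only.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per class, for priors
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(features(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, doc):
        """Unnormalized log posterior for each class."""
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)
            denom = self.totals[c] + self.alpha * v
            for f in features(doc):
                if f in self.vocab:  # ignore features never seen in training
                    score += math.log((self.counts[c][f] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Hypothetical toy corpus: placeholder strings with no linguistic claim,
# used only to show the API end to end.
train_docs = ["aaa bbb aaa", "bbb aaa ccc", "xxx yyy xxx", "yyy zzz xxx"]
train_labels = ["original", "original", "translated", "translated"]

clf = NaiveBayes().fit(train_docs, train_labels)
print(clf.predict("aaa ccc"))  # → original
```

A real experiment would replace the toy strings with a corpus of original and translated Hebrew texts and evaluate on held-out data, including domains not seen in training, as the abstract emphasizes. Character trigrams only approximate morphology; capturing morphological properties directly would typically call for a morphological analyzer, which this sketch does not include.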
Pages: 30-54
Number of pages: 25
Related papers
50 in total
  • [41] Optimal Matrix Computing Using Vector Division with Sub-word Parallel
    Gan, Xin-Biao
    Dai, Kui
    Shen, Li
    Wang, Zhi-Ying
    INTERNATIONAL SYMPOSIUM ON UBIQUITOUS MULTIMEDIA COMPUTING, PROCEEDINGS, 2008, pp. 3-6
  • [42] Natural Sounding Sub-word Units Concatenation in Malay Speech Synthesis
    Tiun, Sabrina
    Abdullah, Rosni
    Kong, Tang Enya
    PROCEEDINGS OF THE 2009 INTERNATIONAL CONFERENCE ON SIGNAL ACQUISITION AND PROCESSING, 2009, pp. 77ff.
  • [43] Sub-word information in pre-trained biomedical word representations: evaluation and hyper-parameter optimization
    Galea, Dieter
    Laponogov, Ivan
    Veselkov, Kirill
    SIGBIOMED WORKSHOP ON BIOMEDICAL NATURAL LANGUAGE PROCESSING (BIONLP 2018), 2018, pp. 56-66
  • [44] Seam carving-based Arabic handwritten sub-word segmentation
    Berriche, Lamia
    Al-Mutairy, Abeer
    COGENT ENGINEERING, 2020, 7(1)
  • [45] Experiments for the selection of sub-word units in the Basque context for semantic tasks
    Barroso, Nora
    de Ipina, Karmele Lopez
    Hernandez, Carmen
    Ezeiza, Aitzol
    Grana, Manuel
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2012, 15(1), pp. 49-56
  • [46] Evaluating the Impact of Sub-word Information and Cross-lingual Word Embeddings on Mi'kmaq Language Modelling
    Boudreau, Jeremie
    Patra, Akankshya
    Suvarna, Ashima
    Cook, Paul
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, pp. 2736-2745
  • [47] SubTST: a consolidation of sub-word latent topics and sentence transformer in semantic representation
    Binh Dang
    Tung Le
    Le-Minh Nguyen
    APPLIED INTELLIGENCE, 2023, 53(11), pp. 13470-13487
  • [48] Reducing Morpho-Phonetic Confusion in Sub-word Based Uyghur ASR
    Ablimit, Mijit
    Hamdulla, Askar
    Pattar, Akbar
    2015 IEEE CHINA SUMMIT & INTERNATIONAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING, 2015, pp. 348-352
  • [49] Evaluating Modeling Units and Sub-word Features in Language Models for Turkish ASR
    Liu, Chang
    Zhang, Yike
    Zhang, Pengyuan
    Wang, Yaofeng
    2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, pp. 414-418
  • [50] Tri-Factorization Learning of Sub-word Units with Application to Vocabulary Acquisition
    Sun, Meng
    Van Hamme, Hugo
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, pp. 5177-5180