Natural language processing for similar languages, varieties, and dialects: A survey

Cited by: 16
Authors
Zampieri, Marcos [1]
Nakov, Preslav [2]
Scherrer, Yves [3]
Affiliations
[1] Rochester Inst Technol, Rochester, NY 14623 USA
[2] HBKU, Qatar Comp Res Inst, Doha, Qatar
[3] Univ Helsinki, Helsinki, Finland
Keywords
Dialects; similar languages; language varieties; language identification; machine translation; parsing; MACHINE TRANSLATION; IDENTIFICATION; ADAPTATION
DOI
10.1017/S1351324920000492
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
There has been a lot of recent interest in the natural language processing (NLP) community in the computational processing of language varieties and dialects, with the aim to improve the performance of applications such as machine translation, speech recognition, and dialogue systems. Here, we attempt to survey this growing field of research, with focus on computational methods for processing similar languages, varieties, and dialects. In particular, we discuss the most important challenges when dealing with diatopic language variation, and we present some of the available datasets, the process of data collection, and the most common data collection strategies used to compile datasets for similar languages, varieties, and dialects. We further present a number of studies on computational methods developed and/or adapted for preprocessing, normalization, part-of-speech tagging, and parsing similar languages, language varieties, and dialects. Finally, we discuss relevant applications such as language and dialect identification and machine translation for closely related languages, language varieties, and dialects.
Pages: 595-612
Number of pages: 18