A Hindi-English Code-Switching Corpus

Authors
Dey, Anik [1]
Fung, Pascale [1]
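Affiliation
[1] Human Language Technology Center, Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology (HKUST), Hong Kong, People's Republic of China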
Keywords
code-switch; mixed language; Hindi-English
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
This paper investigates the rules and constraints of code-switching (CS) in Hindi-English mixed-language data. We describe how the mixed-language corpus was collected; it consists primarily of speech from student interviews. The speech was manually transcribed, and the transcriptions were verified by bilingual speakers of Hindi and English. We then discuss the code-switching instances found in the corpus and explain why speakers switch between the two languages.
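The record does not say how code-switching instances were annotated. As a minimal sketch only, assuming each transcribed token carries a word-level language tag (the labels "hi", "en", and "univ" for language-neutral tokens are hypothetical, not taken from the paper), the switch points in an utterance could be located like this:

```python
from typing import List, Optional, Tuple

# Hypothetical tag set (an assumption, not the paper's scheme):
# "hi" = Hindi, "en" = English; any other tag (e.g. "univ" for punctuation)
# is treated as language-neutral and ignored.
LANG_TAGS = {"hi", "en"}


def switch_points(tagged: List[Tuple[str, str]]) -> List[int]:
    """Return the indices of tokens whose language differs from the
    previous language-tagged token, i.e. the code-switch points."""
    points: List[int] = []
    prev_lang: Optional[str] = None
    for i, (_token, tag) in enumerate(tagged):
        if tag not in LANG_TAGS:
            continue  # neutral tokens neither start nor break a language run
        if prev_lang is not None and tag != prev_lang:
            points.append(i)
        prev_lang = tag
    return points


if __name__ == "__main__":
    # Invented, hand-tagged romanized utterance, for illustration only.
    utterance = [
        ("mera", "hi"), ("interview", "en"), ("kal", "hi"), ("hai", "hi"),
        (",", "univ"), ("so", "en"), ("I", "en"), ("am", "en"),
        ("nervous", "en"),
    ]
    print(switch_points(utterance))  # prints [1, 2, 5]
```

Skipping neutral tokens keeps punctuation from hiding or inventing a switch: the comma at index 4 sits between a Hindi run and an English run, and the switch is reported at the first English word (index 5).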
Pages: 2410-2413
Page count: 4