Culturomics on a Bengali Newspaper Corpus

Cited by: 2
Authors
Phani, Shanta [1]
Lahiri, Shibamouli [2]
Biswas, Arindam [1]
Affiliations
[1] BESU, Dept IT, Howrah 711103, W Bengal, India
[2] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA
DOI
10.1109/IALP.2012.68
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We introduce culturomic studies on the corpus of a leading Bengali newspaper, Ananda Bazar Patrika, in the same spirit as [15]. Based on 11 years' worth of Bengali newswire text, we extract trajectories of salient words that are of importance in contemporary West Bengal. To the best of our knowledge, this is the first culturomic trend analysis performed on an Indic language. Our analysis yields insights into word usage and cultural shift in contemporary West Bengal. Moreover, we model culturomic trajectories using ARIMA and obtain word usage predictions that closely follow actual usage patterns.
Pages: 237-240
Page count: 4