Building comparable corpora from social networks

Authors
Trabelsi, Maroua [1]
Hajjem, Malek [2]
Latiri, Chiraz [1]
Affiliations
[1] LIPAH, Tunis, Tunisia
[2] LIPAH, INSAT Ctr Urbain Nord Tunis, Tunis, Tunisia
Keywords
Social networks; Text mining; Comparable corpora; Comparability metrics
DOI
Not available
Chinese Library Classification number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
Working with comparable corpora has become an attractive alternative to scarce parallel corpora in a range of natural language processing tasks. Many researchers have therefore stressed the need for large quantities of such corpora and for further work on their construction. In this paper, we highlight the interest and usefulness of textual data mining in social networks. We propose extracting tweets from the microblogging platform Twitter in order to construct a comparable corpus. This work aims to develop a new method for constructing a comparable corpus from Twitter that could be used in future work to build a bilingual dictionary using a text mining approach.
Pages: 6