Co-occurrence based word representation for extracting named entities in Tamil tweets

Cited by: 1
Authors
Devi, G. Remmiya [1]
Kumar, M. Anand [1]
Soman, K. P. [1]
Affiliation
[1] Amrita Vishwa Vidyapeetham, Ctr Computat Engn & Networking CEN, Amrita Sch Engn, Coimbatore, Tamil Nadu, India
Keywords
Support Vector Machine; Word2vec; glove embedding; N-gram embedding; structured skip gram; RECOGNITION;
DOI
10.3233/JIFS-169439
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Social media is a vibrant space where millions of individuals interact and share their views. Processing social media text in Indian languages is challenging because these languages are morphologically rich. Once such unstructured text has been converted into a consistent format, features are extracted from it. In large corpora, information units, i.e. entities, carry the core idea of the content. The main aim of the system is to recognise and extract named entities in Twitter text. The proposed system relies on co-occurrence based word embedding models to extract features for the words in the dataset, and it uses Twitter text data in the Tamil language. To enhance performance, tri-gram features are extracted from the word embedding vectors, and the systems are then trained on these N-gram embedding features together with the named entity tags. The system is implemented with a machine learning classifier, the Support Vector Machine (SVM). Comparing the proposed systems shows that GloVe embedding gives better results, with an accuracy of 96.93%, whereas word2vec embedding reaches 84.53%. The higher accuracy of the GloVe-based system may be due to the role that GloVe's co-occurrence information plays in recognising entities.
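The abstract does not spell out how the tri-gram features are built from the word vectors. One plausible reading, shown here only as a minimal sketch, is that each token is represented by the concatenation of the embedding vectors of the previous, current and next tokens. The function name `trigram_features`, the zero-vector padding for sentence boundaries and unknown words, and the toy two-dimensional vectors are all assumptions for illustration and are not taken from the paper.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def trigram_features(
    tokens: Sequence[str],
    embeddings: Dict[str, Vector],
    dim: int,
) -> List[Vector]:
    """Concatenate the vectors of the previous, current and next token.

    Assumption: positions outside the sentence and out-of-vocabulary
    words are represented by a zero vector of length `dim`.
    """
    zero = [0.0] * dim

    def lookup(index: int) -> Vector:
        if index < 0 or index >= len(tokens):
            return zero
        return embeddings.get(tokens[index], zero)

    # List concatenation yields one feature vector of length 3 * dim per token.
    return [lookup(i - 1) + lookup(i) + lookup(i + 1) for i in range(len(tokens))]


# Hypothetical toy vectors; real ones would come from word2vec or GloVe.
toy_embeddings = {
    "chennai": [0.9, 0.1],
    "is": [0.0, 0.2],
    "hot": [0.3, 0.7],
}
features = trigram_features(["chennai", "is", "hot"], toy_embeddings, dim=2)
```

Under this reading, each resulting vector would be paired with the named entity tag of its centre token and passed to an SVM classifier, for example scikit-learn's `sklearn.svm.SVC`, which is named here as one possible choice and not as the authors' toolkit.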
Pages: 1435-1442
Page count: 8
Related papers
50 in total
  • [1] An Empirical Study of the Occurrence and Co-Occurrence of Named Entities in Natural Language Corpora
    Saravanan, K.
    Choudhury, Monojit
    Udupa, Raghavendra
    Kumaran, A.
    [J]. LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 3118 - 3125
  • [2] Probabilistic Named Entity Recognition for non-standard format entities using co-occurrence word embeddings
    Al-Ani, Jabir Alshehabi
    Fasli, Maria
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 2077 - 2086
  • [3] Extracting Tweets related to Disaster Information by using Multiple Co-occurrence Relation of Words
    Yuzawa, Akio
    Ichikawa, Hiroyoshi
    Kobayashi, Aki
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON SMART COMPUTING (SMARTCOMP 2018), 2018, : 321 - 326
  • [4] Co-occurrence and ranking of entities based on semantic annotation
    Popov, Borislav
    Kiryakov, Atanas
    Kitchukov, Ilian
    Angelov, Krasimir
    Kozhuharov, Danail
    [J]. International Journal of Metadata, Semantics and Ontologies, 2008, 3 (01) : 21 - 36
  • [5] Extracting semantic representations from word co-occurrence statistics: A computational study
    John A. Bullinaria
    Joseph P. Levy
    [J]. Behavior Research Methods, 2007, 39 : 510 - 526
  • [6] Extracting semantic representations from word co-occurrence statistics: A computational study
    Bullinaria, John A.
    Levy, Joseph P.
    [J]. BEHAVIOR RESEARCH METHODS, 2007, 39 (03) : 510 - 526
  • [7] Extracting Co-Occurrence Relations from ZDDs
    Toda, Takahisa
    [J]. ALGORITHMS, 2012, 5 (04) : 654 - 667
  • [8] Combining word based and word co-occurrence based sequence analysis for text categorization
    Luo, X
    Zincir-Heywood, AN
    [J]. PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2004, : 1580 - 1585
  • [9] An Automatic Image Tagging Based on Word Co-Occurrence Analysis
    Abdulraheem, Ali
    Zakaria, Lailatul Qadri
    [J]. 2018 FOURTH INTERNATIONAL CONFERENCE ON INFORMATION RETRIEVAL AND KNOWLEDGE MANAGEMENT (CAMP), 2018, : 49 - 53
  • [10] A word co-occurrence matrix based method for relevance feedback
    Chen, Zilong
    Lu, Yang
    [J]. Journal of Computational Information Systems, 2011, 7 (01) : 17 - 24