An Unsupervised Approach for Precise Context Identification from Unstructured Text Documents

Authors
Mallek, Maha [1, 2]
Fournier, Sebastien [1]
Guetari, Ramzi [3]
Espinasse, Bernard [1]
Chaari, Wided Lejouad [2]
Affiliations
[1] Aix Marseille Univ, LIS UMR CNRS 7020, Marseille, France
[2] Univ Manouba, ENSI, LARIA, Manouba, Tunisia
[3] Univ Tunis El Manar, LIMTIC Lab, ISI, Ariana, Tunisia
Keywords
accurate context extraction; unstructured textual document; text mining; semantic analysis; extraction
DOI
10.1109/ICTAI50040.2020.00130
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The majority of documents produced and exchanged through the media and social networks are unstructured. Given the volume of such documents on the Web, exploiting them is a tedious, or even impossible, task for humans without the assistance of dedicated algorithms and specialized computer systems for document classification or information extraction. To be efficient and relevant, such systems must understand the content of these unstructured documents. The context (or topic) of a document is one of the basic pieces of information essential to understanding its content: the more precise the context of a document, the more relevant its understanding will be. This paper presents a precise context identification approach that is evaluated quantitatively and qualitatively on several reference corpora and compared with other context identification systems. The contexts identified by our model are much more precise than those identified by these other systems.
Pages: 821-826
Page count: 6