An Unsupervised Approach for Precise Context Identification from Unstructured Text Documents

Times cited: 0
Authors
Mallek, Maha [1, 2]
Fournier, Sebastien [1]
Guetari, Ramzi [3]
Espinasse, Bernard [1]
Chaari, Wided Lejouad [2]
Affiliations
[1] Aix Marseille Univ, LIS UMR CNRS 7020, Marseille, France
[2] Univ Manouba, ENSI, LARIA, Manouba, Tunisia
[3] Univ Tunis El Manar, LIMTIC Lab, ISI, Ariana, Tunisia
Keywords
accurate context extraction; unstructured textual document; text mining; semantic analysis; EXTRACTION;
DOI
10.1109/ICTAI50040.2020.00130
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The majority of the documents produced and exchanged through media and social networks are unstructured. Given the volume of these unstructured documents on the Web, exploiting them is a tedious or even impossible task for humans without the assistance of dedicated algorithms and specialized computer systems for document classification or information extraction. To be efficient and relevant, such systems have to understand the content of these unstructured documents. The context (or topic) of a document is one of the basic pieces of information essential to understanding its content, and the more precise the context of a document, the more relevant its understanding will be. This paper presents a precise context identification approach that is evaluated quantitatively and qualitatively on several reference corpora and compared to other context identification systems. The contexts identified by our model are much more precise than those identified by these other systems.
Pages: 821-826
Number of pages: 6