An efficient framework of utilizing the latent semantic analysis in text extraction

Cited by: 0
Authors
Ahmad Hussein Ababneh
Joan Lu
Qiang Xu
Affiliation
[1] University of Huddersfield, School of Computing and Engineering
Keywords
Automatic text extraction; Multi-layer similarity; Latent semantic analysis; Vector space model
DOI
Not available
Abstract
Using latent semantic analysis (LSA) in text mining is costly in both space and time. This paper proposes a new text extraction method that sets out a framework for employing statistical semantic analysis efficiently in text extraction. The method uses the centrality feature and omits text segments that have a high verbatim, statistical, or semantic similarity to previously processed segments. Similarity is identified by a new multi-layer similarity method that computes similarity in three statistical layers: it uses the Jaccard similarity in the first layer, the vector space model in the second, and LSA in the third. The multi-layer method restricts the third layer to segments whose similarity the first and second layers failed to estimate. The ROUGE tool is used in the evaluation, but because ROUGE does not consider the extract's size, we supplemented it with a new evaluation strategy based on the compression rate and the ratio of sentence intersections between the automatic and the reference extracts. Comparisons with classical LSA and traditional statistical extraction showed that we reduced the use of the LSA procedure by 52% and obtained a 65% reduction in the original matrix dimensions, while also achieving remarkable accuracy results. We conclude that employing the centrality feature with the proposed multi-layer framework yields a significant solution in terms of efficiency and accuracy in the field of text extraction.
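The layered similarity check described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names, the two thresholds, and the reading of "failed to estimate" as "score below a threshold" are all hypothetical, and the LSA layer is passed in as a caller-supplied function rather than implemented here.

```python
import math
from collections import Counter
from typing import Callable, Sequence, Tuple

Tokens = Sequence[str]


def jaccard(a: Tokens, b: Tokens) -> float:
    """Layer 1: verbatim overlap between the two token sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cosine_tf(a: Tokens, b: Tokens) -> float:
    """Layer 2: cosine similarity of raw term-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def multi_layer_similarity(
    a: Tokens,
    b: Tokens,
    lsa_similarity: Callable[[Tokens, Tokens], float],
    jaccard_threshold: float = 0.8,  # assumed value, not from the paper
    cosine_threshold: float = 0.6,   # assumed value, not from the paper
) -> Tuple[float, int]:
    """Return (score, layer), where layer is the layer that decided.

    The costly LSA layer runs only when both cheaper layers score
    below their thresholds (one possible reading of the abstract).
    """
    j = jaccard(a, b)
    if j >= jaccard_threshold:
        return j, 1
    c = cosine_tf(a, b)
    if c >= cosine_threshold:
        return c, 2
    return lsa_similarity(a, b), 3
```

With this arrangement, identical or near-identical segments are settled by the set comparison alone, and only pairs with little lexical overlap reach the LSA function, which is the kind of saving the abstract reports.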
Pages: 785-815
Page count: 30
Related papers
50 in total
  • [21] Latent semantic analysis for text categorization using neural network
    Yu, Bo
    Xu, Zong-ben
    Li, Cheng-hua
    [J]. KNOWLEDGE-BASED SYSTEMS, 2008, 21 (08) : 900 - 904
  • [22] Web Text Classification Based on Improved Latent Semantic Analysis
    Wang, Lan
    Wan, Yuan
    [J]. 2011 SECOND ETP/IITA CONFERENCE ON TELECOMMUNICATION AND INFORMATION (TEIN 2011), VOL 1, 2011, : 176 - 179
  • [23] NLP Based Latent Semantic Analysis for Legal Text Summarization
    Merchant, Kaiz
    Pande, Yash
    [J]. 2018 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2018, : 1803 - 1807
  • [24] Robust discriminant analysis of latent semantic feature for text categorization
    Hu, Jiani
    Deng, Weihong
    Guo, Jun
    [J]. FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2006, 4223 : 400 - 409
  • [25] Text Clustering Based on Domain Ontology and Latent Semantic Analysis
    Li Yaxiong
    Pan Deng
    [J]. MECHATRONICS ENGINEERING, COMPUTING AND INFORMATION TECHNOLOGY, 2014, 556-562 : 3536 - +
  • [26] A Comprehensive Method for Text Summarization Based on Latent Semantic Analysis
    Wang, Yingjie
    Ma, Jun
    [J]. NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2013, 2013, 400 : 394 - 401
  • [27] Text summarization using a trainable summarizer and latent semantic analysis
    Yeh, JY
    Ke, HR
    Yang, WP
    Meng, IH
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2005, 41 (01) : 75 - 95
  • [28] EXTRACTION OF MANUFACTURING RULES FROM UNSTRUCTURED TEXT USING A SEMANTIC FRAMEWORK
    Kang, SungKu
    Patil, Lalit
    Rangarajan, Arvind
    Moitra, Abha
    Jia, Tao
    Robinson, Dean
    Dutta, Debasish
    [J]. INTERNATIONAL DESIGN ENGINEERING TECHNICAL CONFERENCES AND COMPUTERS AND INFORMATION IN ENGINEERING CONFERENCE, 2015, VOL 1B, 2016,
  • [29] Text segmentation by latent semantic indexing
    Ishioka, T
    [J]. NEW DEVELOPMENTS IN PSYCHOMETRICS, 2003, : 689 - 696
  • [30] Efficient Probabilistic Latent Semantic Analysis through Parallelization
    Wan, Raymond
    Anh, Vo Ngoc
    Mamitsuka, Hiroshi
    [J]. INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS, 2009, 5839 : 432 - +