Automatic Text Extraction from Arabic Newspapers

Cited by: 2
Authors
Vasilopoulos, Nikos [1]
Wasfi, Yazan [2]
Kavallieratou, Ergina [1]
Affiliations
[1] Univ Aegean, Karlovassi 83200, Samos, Greece
[2] Media Observer, Caracas Complex, Yajouz St 8, Amman, Jordan
Keywords
Layout analysis; Page segmentation; Text localization; Text extraction
DOI
10.1007/978-3-319-93000-8_57
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
A system for extracting textual information from document images with complex layouts is presented. It combines layout analysis with text localization. Layout analysis is first applied to segment the page into text and non-text blocks; text localization is then used to detect text that may be embedded inside images, charts, diagrams, tables, etc. Detailed experiments on scanned Arabic newspapers showed that combining layout analysis and text localization methods can lead to improved page segmentation and text extraction results.
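The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only shows one way the stages might be chained. `segment` and `localize` are hypothetical callables standing in for whatever layout-analysis and text-localization methods are actually used, and the page model, `Box`, `Block`, `crop`, and the toy stand-ins are all assumptions made for illustration. The one concrete detail it captures is that a localizer run on a cropped non-text block returns coordinates relative to that crop, which must be shifted back to page coordinates.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical page model: a 2-D grid of pixel values (0 = background).
Page = Sequence[Sequence[int]]


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle; x1 and y1 are exclusive."""
    x0: int
    y0: int
    x1: int
    y1: int


@dataclass(frozen=True)
class Block:
    """A layout-analysis result: a region plus its text/non-text label."""
    box: Box
    label: str  # "text" or "non-text" (assumed labels)


def crop(page: Page, box: Box) -> List[List[int]]:
    """Return the pixels inside `box` as a new grid."""
    return [list(row[box.x0:box.x1]) for row in page[box.y0:box.y1]]


def extract_text_regions(
    page: Page,
    segment: Callable[[Page], List[Block]],
    localize: Callable[[Page], List[Box]],
) -> List[Box]:
    """Stage 1: layout analysis splits the page into labelled blocks.
    Stage 2: text localization searches each non-text block for embedded text."""
    text_boxes: List[Box] = []
    for block in segment(page):
        if block.label == "text":
            text_boxes.append(block.box)  # already classified as text
            continue
        # The localizer sees only the cropped block, so its boxes are in
        # crop coordinates; shift them back into page coordinates.
        ox, oy = block.box.x0, block.box.y0
        for b in localize(crop(page, block.box)):
            text_boxes.append(Box(b.x0 + ox, b.y0 + oy, b.x1 + ox, b.y1 + oy))
    return text_boxes


# --- Toy stand-ins for illustration only (not real segmentation/localization) ---
def toy_segment(page: Page) -> List[Block]:
    # Pretend layout analysis found a text block on top and a figure below.
    return [Block(Box(0, 0, 10, 3), "text"), Block(Box(2, 4, 8, 10), "non-text")]


def toy_localize(region: Page) -> List[Box]:
    # Pretend localizer: bounding box of all non-zero pixels, if any.
    pts = [(x, y) for y, row in enumerate(region) for x, v in enumerate(row) if v]
    if not pts:
        return []
    xs, ys = zip(*pts)
    return [Box(min(xs), min(ys), max(xs) + 1, max(ys) + 1)]


page = [[0] * 10 for _ in range(10)]
page[6][3] = page[6][5] = 1  # "embedded text" inside the figure block
regions = extract_text_regions(page, toy_segment, toy_localize)
print(regions)
```

With these toy stand-ins, the text block passes through unchanged and the box found inside the figure is shifted by the block's origin (2, 4), so the script prints `[Box(x0=0, y0=0, x1=10, y1=3), Box(x0=3, y0=6, x1=6, y1=7)]`. Keeping the two stages as separate callables makes it straightforward to swap in real methods for either stage.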
Pages: 505–510
Page count: 6