Extracting information from newspaper archives in Africa

Cited by: 1
Authors
Zeni, M. [1]
Weldemariam, K. [2]
Affiliations
[1] Univ Trento, I-38123 Trento, TN, Italy
[2] IBM Res, Nairobi, Kenya
Keywords
Alternative source - Digital archives - Digital sources - Extracting information - Proof of concept - Public services - Research problems - Sub-Saharan Africa
DOI
10.1147/JRD.2017.2742706
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In sub-Saharan Africa, the lack of useful information for the public good is one obstacle to the development of public services (public safety, education, healthcare, etc.). This makes the extraction of data from digital archives (e.g., analog sources such as printed newspaper archives and born-digital sources such as native PDF) an interesting alternative way to increase the amount and diversity of potentially useful information. Printed newspapers use various multiarticle page layouts designed to let readers define their own reading order. The title of an article, its introductory text, and related images are mostly grouped together. However, subsequent paragraphs and images are spread across various pages of the newspaper in a somewhat unpredictable manner. This, together with the poor quality of existing archives, makes extracting data from archived newspapers a daunting research problem. To address these challenges, we present a system that extracts, detects, and clusters articles in newspapers from digital archives (mainly scanned newspaper archives). Finally, we describe our proof-of-concept service, which uses the extracted data.
Pages: 12