VizioMetrix: A Platform for Analyzing the Visual Information in Big Scholarly Data

Cited by: 7
Authors
Lee, Po-Shen [1]
West, Jevin D. [2]
Howe, Bill [1]
Affiliations
[1] Univ Washington, 185 Stevens Way, Seattle, WA 98105 USA
[2] Univ Washington, Box 352840, Seattle, WA 98195 USA
Funding
National Science Foundation (USA)
Keywords
Figure Retrieval; Information Retrieval; Crowdsourcing; Open Data; Bibliometrics; Scientometrics; Viziometrics
DOI
10.1145/2872518.2890523
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We present VizioMetrix, a platform that extracts visual information from the scientific literature and makes it available for use in new information retrieval applications and for studies that look at patterns of visual information across millions of papers. New ideas are conveyed visually in the scientific literature through figures (diagrams, photos, visualizations, tables), but these visual elements remain ensconced in the surrounding paper and difficult to use directly to facilitate information discovery tasks or longitudinal analytics. Very few applications in information retrieval, academic search, or bibliometrics make direct use of the figures, and none attempt to recognize and exploit the type of figure, which can be used to augment interactions with a large corpus of scholarly literature. The VizioMetrix platform processes a corpus of documents, classifies the figures, organizes the results into a cloud-hosted database, and drives three distinct applications to support bibliometric analysis and information retrieval. The first application supports information retrieval tasks by allowing rapid browsing of classified figures. The second application supports longitudinal analysis of visual patterns in the literature and facilitates data mining of these figures. The third application supports crowdsourced tagging of figures to improve classification, augment search, and facilitate new kinds of analyses. Our initial corpus is the entirety of PubMed Central (PMC), and will be released to the public alongside this paper; we welcome other researchers to make use of these resources.
Pages: 413-418
Number of pages: 6