Building A Large Collection of Multi-domain Electronic Theses and Dissertations

Cited by: 5
Authors
Uddin, Sami [1]
Banerjee, Bipasha [2]
Wu, Jian [1]
Ingram, William A. [3]
Fox, Edward A. [2]
Affiliations
[1] Old Dominion Univ, Comp Sci, Norfolk, VA 23529 USA
[2] Virginia Polytech Inst & State Univ, Comp Sci, Blacksburg, VA USA
[3] Virginia Polytech Inst & State Univ, Univ Lib, Blacksburg, VA USA
Keywords
ETD; OAI-PMH; Big data
DOI
10.1109/BigData52589.2021.9672058
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this work, we report our progress on building a collection of over 450k Electronic Theses and Dissertations (ETDs), including full text and metadata. Our goal is to close the accessibility gap between long-text and short-text documents and to create a new research opportunity for the scholarly community. To that end, we developed an ETD Ingestion Framework (EIF) that automatically harvests metadata and PDFs of ETDs from university libraries. We faced multiple challenges and learned many lessons during the process, which led us to propose solutions that overcome or mitigate the limitations of the current data. We also describe the data that we have collected. We hope our methods will be useful for building similar collections from university libraries and that the data can be used for research and education.
Pages: 6043-6045
Page count: 3