Building A Large Collection of Multi-domain Electronic Theses and Dissertations

Cited by: 5
Authors
Uddin, Sami [1]
Banerjee, Bipasha [2]
Wu, Jian [1]
Ingram, William A. [3]
Fox, Edward A. [2]
Affiliations
[1] Old Dominion Univ, Comp Sci, Norfolk, VA 23529 USA
[2] Virginia Polytech Inst & State Univ, Comp Sci, Blacksburg, VA USA
[3] Virginia Polytech Inst & State Univ, Univ Lib, Blacksburg, VA USA
Keywords
ETD; OAI-PMH; Big data
DOI
10.1109/BigData52589.2021.9672058
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this work, we report our progress on building a collection of over 450k Electronic Theses and Dissertations (ETDs), including full text and metadata. Our goal is to close the accessibility gap between long-text and short-text documents and to create a new research opportunity for the scholarly community. To that end, we developed an ETD Ingestion Framework (EIF) that automatically harvests metadata and PDFs of ETDs from university libraries. We faced multiple challenges and learned many lessons during the process, which led us to propose solutions that overcome or mitigate the limitations of the current data. We also describe the data that we have collected. We hope our methods will be useful for building similar collections from university libraries and that the data can be used for research and education.
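The abstract does not describe how EIF is implemented, so the following is only a minimal sketch of the kind of metadata harvesting that the OAI-PMH keyword suggests, not the authors' code. It parses one OAI-PMH `ListRecords` response in Dublin Core format and returns the records plus the resumption token used to request the next page. The function names, the sample XML, the `example.edu` URLs, and the rule that a `dc:identifier` ending in `.pdf` points to the full text are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response from an imaginary repository (illustration only).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.edu:etd-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Dissertation</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:identifier>https://example.edu/etd/1</dc:identifier>
          <dc:identifier>https://example.edu/etd/1/thesis.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header status="deleted"><identifier>oai:example.edu:etd-2</identifier></header>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def dc_values(record, tag):
    """Return all non-empty Dublin Core values for `tag` inside one record."""
    return [e.text.strip() for e in record.iterfind(f".//dc:{tag}", NS) if e.text]


def parse_list_records(xml_text):
    """Parse one ListRecords response.

    Returns (records, resumption_token); the token is None on the last page.
    """
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        # Deleted records carry a header but no metadata, so skip them.
        if header is None or header.get("status") == "deleted":
            continue
        identifiers = dc_values(rec, "identifier")
        records.append({
            "oai_id": header.findtext("oai:identifier", default="", namespaces=NS),
            "titles": dc_values(rec, "title"),
            "creators": dc_values(rec, "creator"),
            # Assumption: full-text links are identifiers ending in ".pdf".
            "pdf_urls": [u for u in identifiers if u.lower().endswith(".pdf")],
        })
    token_el = root.find(".//oai:resumptionToken", NS)
    # An absent or empty token marks the end of the result list.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return records, token or None


if __name__ == "__main__":
    recs, next_token = parse_list_records(SAMPLE)
    print(recs, next_token)
```

In a real harvest, each response would be fetched from a repository's OAI-PMH base URL with `verb=ListRecords&metadataPrefix=oai_dc`, and follow-up requests would pass `verb=ListRecords&resumptionToken=...` until no token is returned. Repositories differ in how they expose PDF links, so the `.pdf` rule would likely need per-repository adjustment.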
Pages: 6043-6045
Page count: 3
Related papers
50 in total
  • [31] Basic, fuller, fullest: Treatment options for electronic theses and dissertations
    McCutcheon, Sevim
    LIBRARY COLLECTIONS ACQUISITIONS & TECHNICAL SERVICES, 2011, 35 (2-3): 64-68
  • [32] The inevitable future of electronic theses and dissertations within Malaysia context
    Looi, EN
    Yeng, SW
    DIGITAL LIBRARIES: TECHNOLOGY AND MANAGEMENT OF INDIGENOUS KNOWLEDGE FOR GLOBAL ACCESS, 2003, 2911: 340-350
  • [33] Open access to research data in electronic theses and dissertations: an overview
    Schopfel, Joachim
    Chaudiron, Stephane
    Jacquemin, Bernard
    Prost, Helene
    Severo, Marta
    Thiault, Florence
    LIBRARY HI TECH, 2014, 32 (04): 612-627
  • [34] Electronic Theses and Dissertations Programs: A Review of the Critical Success Factors
    Rasuli, Behrooz
    Solaimani, Sam
    Alipour-Hafezi, Mehdi
    COLLEGE & RESEARCH LIBRARIES, 2019, 80 (01): 60-75
  • [35] Long-term preservation of electronic theses and dissertations in Algeria
    Bakelli, Y
    Benrahmoun, S
    LIBRI, 2003, 53 (04): 254-261
  • [36] Electronic theses and dissertations and academia: A preliminary study from India
    Vijayakumar, J. K.
    Murthy, T. A. V.
    Khan, M. T. M.
    JOURNAL OF ACADEMIC LIBRARIANSHIP, 2007, 33 (03): 417-421
  • [37] ETDPC: A Multimodality Framework for Classifying Pages in Electronic Theses and Dissertations
    Choudhury, Muntabir Hasan
    Salsabil, Lamia
    Ingram, William A.
    Fox, Edward A.
    Wu, Jian
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21, 2024: 22878-22884
  • [38] Status of Electronic Theses and Dissertations (ETDs) in Academic Libraries in Zimbabwe
    Chisita, Collence Takaingenhamo
    Enakrire, Rexwhite Tega
    Muziringa, Masimba Clyde
    INTERNATIONAL JOURNAL OF E-COLLABORATION, 2020, 16 (03): 96-108
  • [39] Non-conventional technologies for data collection in Brazilian dissertations and theses
    Candido de Oliveira Salvador, Petala Tuani
    Filgueira Martins Rodrigues, Claudia Cristiane
    Nunes de Lima, Kalya Yasmine
    Andrade Alves, Kisna Yasmin
    Pereira Santos, Viviane Euzebia
    REVISTA BRASILEIRA DE ENFERMAGEM, 2015, 68 (02): 243-251
  • [40] Communication channels and the adoption of digital libraries for electronic theses and dissertations
    Allard, S
    JCDL 2004: PROCEEDINGS OF THE FOURTH ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES: GLOBAL REACH AND DIVERSE IMPACT, 2004: 381-381